feat(V2): /health/ready with client ping + structured request-log middleware + request_id correlation

- /health/ready: pings every cached (exchange, env) client with a 2s timeout; status is ready|degraded|not_ready, with opt-in HTTP 503 via READY_FAILS_ON_DEGRADED.
- mcp.request middleware: one JSON line per HTTP request with request_id, method, path, status_code, duration_ms, actor, bot_tag, exchange, tool, client_ip, user_agent.
- request_id propagated to request.state, the audit log and the error envelope for cross-cutting correlation.
- Added async health() as a minimal probe to bybit/alpaca/macro/sentiment/deribit (hyperliquid already had it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:03:28 +02:00
parent 9afd087152
commit 8ecc1a24a9
13 changed files with 509 additions and 2 deletions
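
The readiness logic described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the registry shape (a dict keyed by `(exchange, env)`), the helper names `probe_client`, `readiness` and `http_status`, and the `ok` field in each client record are all assumptions made for the example.

```python
import asyncio
import inspect
import os

PROBE_TIMEOUT_S = 2.0  # per-client timeout stated in the commit message


async def probe_client(client) -> bool:
    """Return True if the client's probe completes without error (hypothetical helper)."""
    # Prefer health(); fall back to is_testnet() when health() is missing.
    probe = getattr(client, "health", None) or getattr(client, "is_testnet", None)
    if probe is None:
        return False
    try:
        result = probe()
        # Only awaitable probes can be bounded by the timeout;
        # a synchronous is_testnet() has already returned at this point.
        if inspect.isawaitable(result):
            await asyncio.wait_for(result, timeout=PROBE_TIMEOUT_S)
        return True
    except Exception:
        return False


async def readiness(registry: dict) -> dict:
    """Aggregate probes; registry maps (exchange, env) -> client (assumed shape)."""
    if not registry:
        return {"status": "not_ready", "clients": []}
    keys = list(registry)
    results = await asyncio.gather(*(probe_client(registry[k]) for k in keys))
    clients = [
        {"exchange": exchange, "env": env, "ok": ok}
        for (exchange, env), ok in zip(keys, results)
    ]
    status = "ready" if all(results) else "degraded"
    return {"status": status, "clients": clients}


def http_status(status: str, env=os.environ) -> int:
    """Return 503 only when READY_FAILS_ON_DEGRADED=true and status is not ready."""
    fail = env.get("READY_FAILS_ON_DEGRADED", "").lower() == "true"
    return 503 if fail and status != "ready" else 200
```

Probing the clients concurrently with `asyncio.gather` keeps the worst-case latency of the endpoint close to a single 2-second timeout rather than the sum over all clients; whether the actual implementation probes concurrently or sequentially is not stated in the commit.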
@@ -80,7 +80,8 @@ is not required on the public endpoints (`/health`, `/apidocs`,

 | Path | Description |
 |---|---|
-| `GET /health` | Healthcheck (no auth) |
+| `GET /health` | Liveness check (no auth) |
+| `GET /health/ready` | Readiness check that pings the exchange clients (no auth) |
 | `GET /apidocs` | Swagger UI (no auth) |
 | `GET /openapi.json` | Schema OpenAPI 3.1 (no auth) |
 | `POST /mcp-deribit/tools/{tool}` | Deribit exchange tools |
@@ -91,6 +92,44 @@ is not required on the public endpoints (`/health`, `/apidocs`,
 | `POST /mcp-sentiment/tools/{tool}` | Sentiment/news tools |
 | `GET /admin/audit` | Query the JSONL audit log (bearer required, no X-Bot-Tag) |

+## Observability
+
+### Health check
+
+The application exposes two separate monitoring endpoints:
+
+- `GET /health`: simple liveness check. It requires no authentication and
+  always returns HTTP 200 as long as the process is alive. Suited to a
+  Kubernetes liveness probe or to the Traefik health checker.
+- `GET /health/ready`: fuller readiness check. It iterates over all the
+  exchange clients in the registry and runs a lightweight probe on each
+  (`health()` if available, falling back to `is_testnet()`), with a
+  2-second timeout per client. The response carries a `status` field set
+  to `ready` (every client responds), `degraded` (at least one fails) or
+  `not_ready` (empty registry), plus a `clients` array with one record
+  per cached `(exchange, env)` pair. By default the endpoint always
+  answers HTTP 200; setting the environment variable
+  `READY_FAILS_ON_DEGRADED=true` makes it return HTTP 503 whenever the
+  status is not `ready`, which is useful for a Kubernetes readiness
+  probe.
+
+### Request log
+
+Every HTTP request passes through a middleware that emits one JSON line
+on the `mcp.request` logger with the following fields: `request_id`,
+`method`, `path`, `status_code`, `duration_ms`, `actor` (`testnet` or
+`mainnet`, only when authenticated), `bot_tag` (the `X-Bot-Tag` header,
+if present), `exchange` (extracted from the path `/mcp-{exchange}/...`),
+`tool` (the tool name when the path is `/mcp-X/tools/Y`), `client_ip`,
+`user_agent`. The same `request_id` is also included in the `mcp.audit`
+audit log records and in the error envelope returned to the client, so
+the three traces of a given request can be correlated.
+
+### Audit log
+
+See the "Audit query" section below for how to query the structured log
+of write operations.
+
 ## Audit query

 `GET /admin/audit` reads the JSONL file pointed to by `AUDIT_LOG_FILE` and