Hardening round 2: healthcheck, audit anchor, return_4h, exec config, signals

Six MEDIUM-priority changes to the system. 323 tests pass, mypy strict
is clean, ruff clean.

1. Docker HEALTHCHECK + cerbero-bite healthcheck:
   - new subcommand that exits 0 if kill_switch=0 and last_health_check
     is within --max-staleness-s (default 600s);
   - HEALTHCHECK directive in the Dockerfile (60s interval, 5s timeout,
     start_period 120s, retries 3);
   - healthcheck definition in docker-compose.yml.

2. Audit hash chain anti-truncation:
   - migration 0002: new column system_state.last_audit_hash;
   - AuditLog accepts an on_append callback, which dependencies.py
     wires to repository.set_last_audit_hash;
   - Orchestrator.boot verifies that the tail of the audit file matches
     the persisted anchor; a mismatch arms the kill switch (CRITICAL).
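
A minimal sketch of why a persisted anchor is needed on top of the hash
chain (item 2). A truncated chain is still internally consistent, so only
a comparison against the last hash stored elsewhere reveals the missing
tail. The line format, the GENESIS seed and the function names below are
assumptions for illustration, not the project's actual AuditLog layout.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # hypothetical seed used as "previous hash" of the first entry


def entry_hash(prev_hash: str, payload: str) -> str:
    """Hash of one entry, chained to the hash of the previous entry."""
    return hashlib.sha256(f"{prev_hash}|{payload}".encode()).hexdigest()


def append(lines: List[str], payload: str) -> str:
    """Append one entry and return its hash.

    The returned hash is what an on_append-style callback would persist
    as the anchor (here: the caller keeps it in a variable).
    """
    prev = json.loads(lines[-1])["hash"] if lines else GENESIS
    digest = entry_hash(prev, payload)
    lines.append(json.dumps({"prev": prev, "payload": payload, "hash": digest}))
    return digest


def tail_matches_anchor(lines: List[str], anchor: Optional[str]) -> bool:
    """Boot-time check: the hash of the last line must equal the anchor."""
    tail = json.loads(lines[-1])["hash"] if lines else None
    return tail == anchor
```

If the last line is removed from the file, every remaining link still
verifies, but the tail hash no longer equals the anchor, which is the
condition that would trigger the kill switch.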

3. return_4h bootstrap from deribit get_historical:
   - when dvol_history is empty, _fetch_return_4h falls back to
     deribit.historical_close (the 1h candle from 4h ago);
   - LOW alert if the fallback fails as well.

4. execution.environment + execution.eur_to_usd in strategy.yaml:
   - ExecutionConfig promoted to a typed schema with the two fields
     consumed at boot;
   - CLI start prefers the values from config; CLI flags override
     only when they differ from the defaults.

5. Cycle correlation ID:
   - structlog.contextvars.bind_contextvars in run_entry/run_monitor/
     run_health propagates cycle_id and cycle into the structured logs.
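
The mechanism behind item 5 can be illustrated with the standard library
alone. This is not the structlog API: it is a stdlib analogue, assuming a
context variable plus a logging filter stand in for what
structlog.contextvars does. The names _cycle_ctx, CycleContextFilter,
run_cycle and make_logger are hypothetical.

```python
import contextvars
import io
import logging
import uuid

# Hypothetical stand-in for the context that structlog.contextvars manages.
_cycle_ctx: contextvars.ContextVar = contextvars.ContextVar("cycle_ctx")


class CycleContextFilter(logging.Filter):
    """Copy the bound cycle fields onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        ctx = _cycle_ctx.get({})
        record.cycle = ctx.get("cycle", "-")
        record.cycle_id = ctx.get("cycle_id", "-")
        return True


def make_logger(stream: io.StringIO) -> logging.Logger:
    """Logger whose output lines start with the cycle name and id."""
    log = logging.getLogger("cycle_demo")
    log.setLevel(logging.INFO)
    log.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(CycleContextFilter())
    handler.setFormatter(logging.Formatter("%(cycle)s %(cycle_id)s %(message)s"))
    log.handlers = [handler]
    return log


def run_cycle(cycle: str, log: logging.Logger) -> str:
    """Bind a fresh cycle_id for the duration of one cycle, then unbind it."""
    cycle_id = uuid.uuid4().hex
    token = _cycle_ctx.set({"cycle": cycle, "cycle_id": cycle_id})
    try:
        log.info("cycle started")
        log.info("cycle finished")
    finally:
        _cycle_ctx.reset(token)
    return cycle_id
```

Every line logged inside run_cycle carries the same cycle_id without the
call sites passing it explicitly, which is the property the commit relies
on for correlating log lines of one cycle.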

6. SIGTERM/SIGINT clean shutdown:
   - run_forever installs loop.add_signal_handler for SIGTERM and
     SIGINT; the signal sets an asyncio.Event that ends the main
     block, then scheduler.shutdown and ctx.aclose finalize.
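
The shutdown pattern in item 6 can be sketched as follows. The function
name run_until_signalled and the finalize callback are hypothetical; the
callback stands in for the scheduler.shutdown and ctx.aclose steps, whose
real signatures are not shown in this commit. loop.add_signal_handler is
only available on Unix event loops.

```python
import asyncio
import os
import signal
from typing import Awaitable, Callable


async def run_until_signalled(finalize: Callable[[], Awaitable[None]]) -> str:
    """Block until SIGTERM or SIGINT arrives, finalize, return the signal name."""
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    received = []

    def _on_signal(sig: signal.Signals) -> None:
        # Runs inside the event loop, so touching the Event is safe.
        received.append(sig.name)
        stop.set()

    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, _on_signal, sig)
    try:
        await stop.wait()  # the "main block": idle until a signal sets the event
    finally:
        for sig in (signal.SIGTERM, signal.SIGINT):
            loop.remove_signal_handler(sig)
        await finalize()  # stand-in for scheduler shutdown and context close
    return received[0]
```

Because the handler only sets an event, the cleanup runs in normal
coroutine context rather than inside a signal handler, so it can await.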

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 00:37:39 +02:00
parent 411b747e93
commit b5b96f959c
15 changed files with 477 additions and 24 deletions
+80 -2
@@ -123,6 +123,69 @@ def status(db: Path) -> None:
)
+@main.command()
+@click.option(
+    "--db",
+    type=click.Path(dir_okay=False, path_type=Path),
+    default=_DEFAULT_DB_PATH,
+    show_default=True,
+)
+@click.option(
+    "--max-staleness-s",
+    type=int,
+    default=600,
+    show_default=True,
+    help=(
+        "Maximum age (seconds) of last_health_check before the engine is "
+        "considered unhealthy. Used by Docker HEALTHCHECK."
+    ),
+)
+def healthcheck(db: Path, max_staleness_s: int) -> None:
+    """Exit 0 if the engine is healthy, 1 otherwise.
+
+    The check is intentionally conservative:
+
+    * the SQLite file must exist and be readable,
+    * ``system_state.kill_switch`` must be 0,
+    * ``system_state.last_health_check`` must not be older than
+      ``--max-staleness-s`` seconds.
+
+    Wired as the container HEALTHCHECK in ``Dockerfile``.
+    """
+    if not db.exists():
+        console.print("[red]unhealthy[/red]: state.sqlite missing")
+        sys.exit(1)
+
+    try:
+        conn = connect_state(db)
+        try:
+            run_migrations(conn)
+            sys_state = Repository().get_system_state(conn)
+        finally:
+            conn.close()
+    except Exception as exc:
+        console.print(f"[red]unhealthy[/red]: {type(exc).__name__}: {exc}")
+        sys.exit(1)
+    if sys_state is None:
+        console.print("[red]unhealthy[/red]: system_state singleton missing")
+        sys.exit(1)
+    if sys_state.kill_switch == 1:
+        console.print(
+            f"[red]unhealthy[/red]: kill switch armed "
+            f"reason={sys_state.kill_reason!r}"
+        )
+        sys.exit(1)
+
+    age = (datetime.now(UTC) - sys_state.last_health_check).total_seconds()
+    if age > max_staleness_s:
+        console.print(
+            f"[red]unhealthy[/red]: last_health_check stale "
+            f"({age:.0f}s > {max_staleness_s}s)"
+        )
+        sys.exit(1)
+    console.print(f"[green]healthy[/green] last_check_age={age:.0f}s")
+
+
 def _engine_options(func: Callable[..., Any]) -> Callable[..., Any]:
     """Common options for the engine commands."""
     decorators = [
@@ -181,14 +244,29 @@ def _build_orchestrator(
 ) -> Orchestrator:
     loaded = load_strategy(strategy_path, enforce_hash=enforce_hash)
     token = load_token(path=token_file)
+    # Strategy file values win over the CLI defaults. Click can report
+    # whether a value came from its default (Context.get_parameter_source),
+    # but that is not used here: for now a CLI value is treated as
+    # authoritative only when it differs from the documented default, to
+    # keep the surface small.
+    cfg_env = loaded.config.execution.environment
+    cfg_fx = loaded.config.execution.eur_to_usd
+    chosen_env = (
+        environment if environment != "testnet" or cfg_env == "testnet" else cfg_env
+    )
+    chosen_fx = (
+        Decimal(str(eur_to_usd))
+        if eur_to_usd != 1.075
+        else cfg_fx
+    )
     return make_orchestrator(
         cfg=loaded.config,
         endpoints=load_endpoints(),
         token=token,
         db_path=db,
         audit_path=audit,
-        expected_environment=environment,  # type: ignore[arg-type]
-        eur_to_usd=Decimal(str(eur_to_usd)),
+        expected_environment=chosen_env,  # type: ignore[arg-type]
+        eur_to_usd=chosen_fx,
     )
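
The precedence rule in this hunk can be isolated as two pure functions,
which makes its edge case easier to see. This is a sketch that mirrors the
expressions above; the function names and the "mainnet" value used in the
usage note are assumptions, only "testnet" and 1.075 appear in the diff.

```python
from decimal import Decimal

DEFAULT_ENV = "testnet"  # documented CLI default, as compared against in the diff
DEFAULT_FX = 1.075       # documented CLI default, as compared against in the diff


def choose_environment(cli_env: str, cfg_env: str) -> str:
    """CLI wins only when it differs from the default; otherwise config wins."""
    return cli_env if cli_env != DEFAULT_ENV or cfg_env == DEFAULT_ENV else cfg_env


def choose_fx(cli_fx: float, cfg_fx: Decimal) -> Decimal:
    """Same rule for the EUR/USD rate, converted to Decimal via str()."""
    return Decimal(str(cli_fx)) if cli_fx != DEFAULT_FX else cfg_fx
```

With this rule, choose_environment("testnet", "mainnet") yields the config
value even if the user typed the flag explicitly, because an explicit
value equal to the default is indistinguishable from an omitted flag.
Click 8 exposes Context.get_parameter_source for telling the two apart,
should that distinction become necessary.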