Hardening round 2: healthcheck, audit anchor, return_4h, exec config, signals
Six MEDIUM-priority hardening changes to the system. 323 tests pass, mypy
strict is clean, ruff is clean.
1. Docker HEALTHCHECK + cerbero-bite healthcheck:
- new subcommand that exits 0 if kill_switch=0 and last_health_check
is within --max-staleness-s (default 600s);
- HEALTHCHECK directive in the Dockerfile (60s interval, 5s timeout,
start_period 120s, retries 3);
- healthcheck definition in docker-compose.yml.
2. Audit hash chain anti-truncation:
- migration 0002: new column system_state.last_audit_hash;
- AuditLog accepts an on_append callback; dependencies.py wires it to
repository.set_last_audit_hash;
- Orchestrator.boot verifies that the tail of the audit file matches the
persisted anchor; mismatch → kill switch CRITICAL.
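
The anchor idea in item 2 can be sketched as follows. This is a minimal illustration, assuming each audit line stores the SHA-256 of the previous hash plus the payload; the names `append_entry`, `tail_hash`, `verify_anchor` and the JSON line layout are hypothetical, and the real AuditLog format may differ.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Optional

GENESIS = "0" * 64  # assumed predecessor hash for the very first entry


def tail_hash(path: Path) -> Optional[str]:
    """Return the hash of the last entry, or None for a missing/empty file."""
    if not path.exists():
        return None
    lines = path.read_text(encoding="utf-8").splitlines()
    return json.loads(lines[-1])["hash"] if lines else None


def append_entry(path: Path, payload: dict, on_append: Callable[[str], None]) -> str:
    """Append one chained entry and report its hash through on_append."""
    prev = tail_hash(path) or GENESIS  # re-reads the file: fine for a sketch
    body = json.dumps(payload, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
    entry = {"prev": prev, "hash": digest, "payload": payload}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    on_append(digest)  # e.g. persist the anchor outside the log file
    return digest


def verify_anchor(path: Path, anchor: Optional[str]) -> bool:
    """True when the file tail matches the separately persisted anchor."""
    return tail_hash(path) == anchor
```

A truncated log is still an internally consistent chain, so walking the `prev` links alone would not notice missing trailing entries. Comparing the tail against an anchor stored elsewhere (here, whatever `on_append` writes to) is what exposes the truncation.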
3. return_4h bootstrap from deribit get_historical:
- when dvol_history is empty, _fetch_return_4h falls back to
deribit.historical_close (1h candle from 4h ago);
- LOW alert if the fallback also fails.
4. execution.environment + execution.eur_to_usd in strategy.yaml:
- ExecutionConfig promoted to a typed schema with the two fields
consumed at boot;
- CLI start prefers the values from config; CLI flags override
only when they differ from the defaults.
5. Cycle correlation ID:
- structlog.contextvars.bind_contextvars in run_entry/run_monitor/
run_health propagates cycle_id and cycle into the structured logs.
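
The propagation in item 5 can be illustrated with a standard-library stand-in. This sketch does not use structlog: it assumes plain `contextvars` plus a `logging.Filter` to show the same mechanism, and `cycle_id_var`, `CycleFilter`, `run_cycle` and `fetch_quotes` are hypothetical names.

```python
import asyncio
import contextvars
import logging
import uuid

# Hypothetical stand-ins for the values bound with bind_contextvars.
cycle_id_var = contextvars.ContextVar("cycle_id", default="-")
cycle_var = contextvars.ContextVar("cycle", default="-")


class CycleFilter(logging.Filter):
    """Copy the current context values onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.cycle_id = cycle_id_var.get()
        record.cycle = cycle_var.get()
        return True


async def fetch_quotes(log: logging.Logger) -> None:
    await asyncio.sleep(0)  # yield so concurrent cycles interleave
    # No cycle_id argument: the value travels with the task's context.
    log.info("fetching quotes")


async def run_cycle(name: str, log: logging.Logger) -> str:
    cycle_id = uuid.uuid4().hex
    cycle_id_var.set(cycle_id)
    cycle_var.set(name)
    await fetch_quotes(log)
    return cycle_id


def demo():
    """Run two concurrent cycles; return their IDs and the captured records."""
    captured = []

    class ListHandler(logging.Handler):
        def emit(self, record: logging.LogRecord) -> None:
            captured.append((record.cycle, record.cycle_id, record.getMessage()))

    handler = ListHandler()
    handler.addFilter(CycleFilter())
    log = logging.getLogger("cycle-demo")
    log.setLevel(logging.INFO)
    log.propagate = False
    log.addHandler(handler)

    async def main():
        # gather wraps each coroutine in its own task with a copied context.
        return await asyncio.gather(run_cycle("entry", log), run_cycle("monitor", log))

    try:
        ids = asyncio.run(main())
    finally:
        log.removeHandler(handler)
    return list(ids), captured
```

Because each task gets its own copy of the context, the two cycles keep distinct IDs even though they interleave, and nested calls pick up the ID without it being passed as a parameter.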
6. SIGTERM/SIGINT clean shutdown:
- run_forever installs loop.add_signal_handler for SIGTERM and
SIGINT; the signal sets an asyncio.Event that ends the
main block, then scheduler.shutdown and ctx.aclose finalize.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+80
-2
@@ -123,6 +123,69 @@ def status(db: Path) -> None:
     )
 
 
+@main.command()
+@click.option(
+    "--db",
+    type=click.Path(dir_okay=False, path_type=Path),
+    default=_DEFAULT_DB_PATH,
+    show_default=True,
+)
+@click.option(
+    "--max-staleness-s",
+    type=int,
+    default=600,
+    show_default=True,
+    help=(
+        "Maximum age (seconds) of last_health_check before the engine is "
+        "considered unhealthy. Used by Docker HEALTHCHECK."
+    ),
+)
+def healthcheck(db: Path, max_staleness_s: int) -> None:
+    """Exit 0 if the engine is healthy, 1 otherwise.
+
+    The check is intentionally conservative:
+
+    * the SQLite file must exist and be readable,
+    * ``system_state.kill_switch`` must be 0,
+    * ``system_state.last_health_check`` must not be older than
+      ``--max-staleness-s`` seconds.
+
+    Wired as the container HEALTHCHECK in ``Dockerfile``.
+    """
+    if not db.exists():
+        console.print("[red]unhealthy[/red]: state.sqlite missing")
+        sys.exit(1)
+    try:
+        conn = connect_state(db)
+        try:
+            run_migrations(conn)
+            sys_state = Repository().get_system_state(conn)
+        finally:
+            conn.close()
+    except Exception as exc:
+        console.print(f"[red]unhealthy[/red]: {type(exc).__name__}: {exc}")
+        sys.exit(1)
+
+    if sys_state is None:
+        console.print("[red]unhealthy[/red]: system_state singleton missing")
+        sys.exit(1)
+    if sys_state.kill_switch == 1:
+        console.print(
+            f"[red]unhealthy[/red]: kill switch armed "
+            f"reason={sys_state.kill_reason!r}"
+        )
+        sys.exit(1)
+
+    age = (datetime.now(UTC) - sys_state.last_health_check).total_seconds()
+    if age > max_staleness_s:
+        console.print(
+            f"[red]unhealthy[/red]: last_health_check stale "
+            f"({age:.0f}s > {max_staleness_s}s)"
+        )
+        sys.exit(1)
+    console.print(f"[green]healthy[/green] last_check_age={age:.0f}s")
+
+
 def _engine_options(func: Callable[..., Any]) -> Callable[..., Any]:
     """Common options for the engine commands."""
     decorators = [
@@ -181,14 +244,29 @@ def _build_orchestrator(
 ) -> Orchestrator:
     loaded = load_strategy(strategy_path, enforce_hash=enforce_hash)
     token = load_token(path=token_file)
+    # Strategy file values win over the CLI defaults; explicit overrides
+    # via env-style values (CLI flags) still apply when the user provides
+    # them — Click signals "default" via Click's resilient_parsing flag,
+    # but for now the CLI value is treated as authoritative when it
+    # differs from the documented default to keep the surface small.
+    cfg_env = loaded.config.execution.environment
+    cfg_fx = loaded.config.execution.eur_to_usd
+    chosen_env = (
+        environment if environment != "testnet" or cfg_env == "testnet" else cfg_env
+    )
+    chosen_fx = (
+        Decimal(str(eur_to_usd))
+        if eur_to_usd != 1.075
+        else cfg_fx
+    )
     return make_orchestrator(
         cfg=loaded.config,
         endpoints=load_endpoints(),
         token=token,
         db_path=db,
         audit_path=audit,
-        expected_environment=environment,  # type: ignore[arg-type]
-        eur_to_usd=Decimal(str(eur_to_usd)),
+        expected_environment=chosen_env,  # type: ignore[arg-type]
+        eur_to_usd=chosen_fx,
     )
 
 
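
The precedence rule in the hunk above reduces to "the CLI value wins only when it differs from its default". The sketch below restates that rule in isolation and contrasts it with a sentinel-based variant. The function names are hypothetical and "mainnet" is an assumed second environment value used only for illustration. With Click 8, `Context.get_parameter_source` could serve a similar purpose to the sentinel, though the commit does not use it.

```python
from decimal import Decimal
from typing import Optional

CLI_DEFAULT_ENV = "testnet"  # CLI defaults as they appear in the hunk
CLI_DEFAULT_FX = 1.075


def resolve_by_default(cli_env: str, cfg_env: str) -> str:
    """Rule from the hunk: the CLI wins only when it differs from its default."""
    return cli_env if cli_env != CLI_DEFAULT_ENV else cfg_env


def resolve_by_sentinel(cli_env: Optional[str], cfg_env: str) -> str:
    """Alternative: None means 'flag not given', so any explicit value wins."""
    return cli_env if cli_env is not None else cfg_env


def resolve_fx(cli_fx: float, cfg_fx: Decimal) -> Decimal:
    """Same rule for the FX rate; Decimal(str(x)) avoids binary float noise."""
    return Decimal(str(cli_fx)) if cli_fx != CLI_DEFAULT_FX else cfg_fx


# An explicit "--environment testnet" cannot override a non-testnet config:
print(resolve_by_default("testnet", "mainnet"))   # config value is used
print(resolve_by_sentinel("testnet", "mainnet"))  # explicit flag is honoured
```

The comparison-with-default rule keeps the surface small, as the code comment says, at the cost of one corner case: a flag explicitly set to its default value is indistinguishable from an omitted flag.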