Phase 0: project skeleton

- pyproject.toml with uv, deps for runtime + gui + backtest + dev
- ruff/mypy strict config, pre-commit hooks for ruff/mypy/pytest
- src/cerbero_bite/ layout with empty modules ready for Phase 1+
- structlog JSONL logger with daily rotation
- click CLI with placeholder subcommands (status, start, kill-switch, gui, replay, config hash, audit verify)
- 6 smoke tests passing, mypy --strict clean, ruff clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:10:30 +02:00
commit 881bc8a1bf
40 changed files with 6018 additions and 0 deletions
@@ -0,0 +1,268 @@
+# 08 — Testing & Validation
+
+TDD approach mandated by the `superpowers:test-driven-development` skill
+and consistent with `mcp_cerbero_brain`. No code without a test that fails
+first and passes afterwards.
+
+## Test pyramid
+
+```
+                ┌────────────────┐
+                │  golden / e2e  │   ~10 scenarios, slow, run pre-release
+                └────────────────┘
+              ┌──────────────────────┐
+              │   integration tests  │   ~50, fake MCP, run on PRs
+              └──────────────────────┘
+            ┌──────────────────────────┐
+            │       unit tests         │   ~300, < 5 s total, on every save
+            └──────────────────────────┘
+              ┌────────────────────┐
+              │   property tests   │   hypothesis on pure algorithms
+              └────────────────────┘
+```
+
+## Unit tests (`tests/unit/`)
+
+They cover every function in `core/`. They are **fast** (< 5 s total) and
+**deterministic** (no network, no clock, no randomness).
+
+Conventions:
+
+- One test file per module: `test_sizing_engine.py`, `test_exit_decision.py`, etc.
+- Naming: `test_<function>_<scenario>_<expectation>`. For example:
+  - `test_compute_contracts_capital_720_dvol_40_returns_one`
+  - `test_evaluate_mark_at_50pct_returns_close_profit`
+- Fixtures: pre-built dataclasses in `tests/fixtures/scenarios.py`.
+
+### Minimum required coverage
+
+| Module | Coverage |
+|---|---|
+| `core/*` | 100% statement + 100% branch |
+| `safety/*` | 100% statement |
+| `state/*` | ≥ 90% |
+| `clients/*` | ≥ 80% |
+| `runtime/*` | ≥ 80% |
+
+Coverage is measured with `coverage.py`; the thresholds are blocking in CI.
+
+### Examples of mandatory tests
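
Since `fail_under` in `coverage.py` is a single global threshold, the per-module table above needs a small extra check. The following is only a sketch: it assumes the report produced by `coverage json` has the layout `files -> <path> -> summary` with `covered_lines` and `num_statements`, that modules can be recognized by a directory name in the path, and it looks at statement coverage only (the branch requirement for `core/*` is not checked). The function names are hypothetical.

```python
from pathlib import PurePosixPath

# Thresholds in percent, copied from the table above.
THRESHOLDS = {"core": 100.0, "safety": 100.0, "state": 90.0,
              "clients": 80.0, "runtime": 80.0}


def module_coverage(report: dict) -> dict[str, float]:
    """Aggregate statement coverage per module directory.

    `report` is assumed to be the parsed output of `coverage json`,
    with POSIX-style paths as keys of report["files"].
    """
    totals: dict[str, list[int]] = {}
    for path, data in report["files"].items():
        parts = PurePosixPath(path).parts
        module = next((p for p in parts if p in THRESHOLDS), None)
        if module is None:
            continue  # file outside the modules listed in the table
        summary = data["summary"]
        acc = totals.setdefault(module, [0, 0])
        acc[0] += summary["covered_lines"]
        acc[1] += summary["num_statements"]
    return {m: (100.0 * c / n if n else 100.0) for m, (c, n) in totals.items()}


def failing_modules(report: dict) -> list[str]:
    """Return the modules whose aggregated coverage is below threshold."""
    cov = module_coverage(report)
    return sorted(m for m, pct in cov.items() if pct < THRESHOLDS[m])
```

A CI step could load the JSON report, call `failing_modules`, and exit non-zero when the list is not empty.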
+
+**`test_entry_validator.py`**:
+
+```python
+def test_validate_entry_capital_below_minimum_returns_fail(default_cfg):
+    ctx = EntryContext(capital_usd=Decimal("700"), dvol_now=Decimal("40"), ...)
+    result = validate_entry(ctx, default_cfg)
+    assert result.accepted is False
+    assert "capital_below_720" in result.reasons
+
+def test_validate_entry_dvol_too_high_returns_fail(default_cfg):
+    ctx = EntryContext(capital_usd=Decimal("1500"), dvol_now=Decimal("95"), ...)
+    result = validate_entry(ctx, default_cfg)
+    assert "dvol_above_90" in result.reasons
+
+def test_validate_entry_macro_event_inside_dte_returns_fail(default_cfg):
+    ctx = EntryContext(..., next_macro_event_in_days=5)
+    result = validate_entry(ctx, default_cfg)
+    assert "macro_event_within_dte" in result.reasons
+
+def test_validate_entry_all_conditions_met_returns_accepted(default_cfg):
+    ctx = EntryContext(...)
+    result = validate_entry(ctx, default_cfg)
+    assert result.accepted is True
+    assert result.reasons == []
+```
+
+**`test_sizing_engine.py`**:
+
+```python
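+from decimal import Decimal
+
+import pytest
+
+# SizingContext and sizing_engine come from the project's core package.
+@pytest.mark.parametrize("capital,dvol,expected_n", [
+    (720,  40, 1),
+    (1500, 40, 2),
+    (1500, 50, 1),    # adj 0.85, 195*0.85/93 ≈ 1.78 → 1
+    (5000, 40, 2),    # cap 200 EUR ≈ 215 USD; 215/93 ≈ 2.31 → 2
+    (100000, 40, 2),  # cap saturated
+    (500,  40, 0),    # undersized
+])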
+def test_compute_contracts(capital, dvol, expected_n, default_cfg):
+    ctx = SizingContext(
+        capital_usd=Decimal(capital),
+        max_loss_per_contract_usd=Decimal("93"),
+        dvol_now=Decimal(dvol),
+        eur_to_usd=Decimal("1.075"),
+        open_engagement_usd=Decimal(0),
+        other_open_positions=0,
+    )
+    assert sizing_engine.compute_contracts(ctx, default_cfg).n_contracts == expected_n
+```
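
The arithmetic behind the expected values in the table can be reproduced with a few lines. This is a sketch under stated assumptions, not the engine itself: the 13% risk fraction is inferred from the `195` in the comment (195 / 1500), the DVOL adjustment is known only for the two values that appear in the table (1.00 at 40 is implied, 0.85 at 50 is stated), and applying the EUR cap after the adjustment is a guess. `sketch_contracts` is a hypothetical name.

```python
from decimal import ROUND_FLOOR, Decimal

RISK_FRACTION = Decimal("0.13")  # assumption: inferred from 195 USD on 1500 USD
CAP_EUR = Decimal("200")         # per-trade cap quoted in the table comments
# Only the two adjustment values visible in the table; anything else is unknown.
DVOL_ADJ = {40: Decimal("1.00"), 50: Decimal("0.85")}


def sketch_contracts(
    capital_usd: int,
    dvol: int,
    max_loss_usd: Decimal = Decimal("93"),
    eur_to_usd: Decimal = Decimal("1.075"),
) -> int:
    """Risk budget, scaled by DVOL, capped in EUR, floored to whole contracts."""
    budget = Decimal(capital_usd) * RISK_FRACTION * DVOL_ADJ[dvol]
    budget = min(budget, CAP_EUR * eur_to_usd)  # 200 EUR = 215 USD at 1.075
    return int((budget / max_loss_usd).to_integral_value(rounding=ROUND_FLOOR))
```

Under these assumptions the sketch yields the same contract counts as the six parametrized rows, which makes it a convenient cross-check when a row is added or a constant changes.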
+
+**`test_exit_decision.py`**: every branch of the evaluation order
+must have at least one test.
+
+## Property tests (`tests/unit/test_*_properties.py`)
+
+We use `hypothesis` for the invariants:
+
+```python
+from hypothesis import given
+from hypothesis.strategies import decimals
+
+@given(
+    capital=decimals(min_value=720, max_value=200_000),
+    dvol=decimals(min_value=20, max_value=89),
+    max_loss=decimals(min_value=50, max_value=300),
+)
+def test_sizing_never_exceeds_cap_eur(capital, dvol, max_loss, default_cfg):
+    """Invariant: total risk never exceeds the EUR cap."""
+    ctx = SizingContext(capital_usd=capital, dvol_now=dvol, ...)
+    result = sizing_engine.compute_contracts(ctx, default_cfg)
+    cap_usd = Decimal(200) * default_cfg.eur_to_usd
+    assert result.risk_dollars <= cap_usd
+```
+
+Mandatory property tests:
+
+- Sizing: risk ≤ cap; n_contracts ≥ 0; never > 4.
+- Exit decision: trigger order respected (CLOSE_PROFIT before CLOSE_DELTA, etc.).
+- Combo builder: short_strike < long_strike for bear_call, > for bull_put.
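
The trigger-order invariant can be illustrated with an ordered rule list where the first rule that fires wins. This is a minimal sketch: `Snapshot`, `Exit`, `evaluate` and the 0.30 delta threshold are hypothetical, and only the "mark at 50% of credit" level comes from the integration table in this document. A small exhaustive grid stands in for a `hypothesis` strategy so the example needs only the standard library.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from itertools import product


class Exit(Enum):
    CLOSE_PROFIT = "close_profit"
    CLOSE_DELTA = "close_delta"
    HOLD = "hold"


@dataclass(frozen=True)
class Snapshot:
    mark_over_credit: Decimal  # current mark as a fraction of the credit received
    short_delta_abs: Decimal   # absolute delta of the short leg


PROFIT_AT = Decimal("0.50")  # from the doc: mark = 50% of credit
DELTA_AT = Decimal("0.30")   # hypothetical threshold, for illustration only

# Order matters: the first predicate that fires decides the action.
RULES = (
    (Exit.CLOSE_PROFIT, lambda s: s.mark_over_credit <= PROFIT_AT),
    (Exit.CLOSE_DELTA, lambda s: s.short_delta_abs >= DELTA_AT),
)


def evaluate(s: Snapshot) -> Exit:
    for action, fires in RULES:
        if fires(s):
            return action
    return Exit.HOLD


def profit_always_wins() -> bool:
    """Whenever the profit rule fires, no later rule may override it."""
    marks = [Decimal(n) / 10 for n in range(0, 11)]
    deltas = [Decimal(n) / 20 for n in range(0, 11)]
    return all(
        evaluate(Snapshot(m, d)) is Exit.CLOSE_PROFIT
        for m, d in product(marks, deltas)
        if m <= PROFIT_AT
    )
```

Keeping the rules in one ordered tuple means the order is data that a property test can inspect, rather than something implied by the layout of an `if` chain.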
+
+## Integration tests (`tests/integration/`)
+
+They test the interaction between `core/` + `clients/` + `state/` with
+**fake** (in-memory) MCPs.
+
+### Fake MCP
+
+```python
+class FakeDeribit(McpClient):
+    def __init__(self, scenario: dict): ...
+    async def index_price(self, asset): return Decimal(self._scenario["spot"])
+    async def dvol(self): return Decimal(self._scenario["dvol"])
+    # ...
+```
+
+The fakes are driven by a YAML scenario:
+
+```yaml
+# tests/fixtures/scenarios/happy_path.yaml
+spot: 2330
+dvol: 42
+funding_perp: 0.05
+funding_cross: 0.04
+macro_calendar: []
+chain:
+  - {instrument: ETH-13MAY26-1900-P, strike: 1900, delta: -0.12, mid: 0.0048, ...}
+  - ...
+```
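
A self-contained version of the fake might look like the sketch below. It does not subclass `McpClient` (that base class is not shown here), it takes the scenario as an already-parsed dict instead of reading YAML, and `load_snapshot` is a hypothetical helper added only to show the fake being awaited.

```python
import asyncio
from decimal import Decimal


class FakeDeribit:
    """In-memory stand-in exposing the two calls shown above."""

    def __init__(self, scenario: dict) -> None:
        self._scenario = scenario

    async def index_price(self, asset: str) -> Decimal:
        # Go through str() so YAML floats such as 0.05 do not carry
        # binary floating-point noise into the Decimal.
        return Decimal(str(self._scenario["spot"]))

    async def dvol(self) -> Decimal:
        return Decimal(str(self._scenario["dvol"]))


async def read_market(client: FakeDeribit) -> tuple[Decimal, Decimal]:
    """Fetch spot and DVOL concurrently, as a real client would."""
    spot, dvol = await asyncio.gather(client.index_price("ETH"), client.dvol())
    return spot, dvol


def load_snapshot(scenario: dict) -> tuple[Decimal, Decimal]:
    """Synchronous entry point, convenient inside plain pytest tests."""
    return asyncio.run(read_market(FakeDeribit(scenario)))
```

With the `happy_path` values above, `load_snapshot({"spot": 2330, "dvol": 42})` returns the pair of `Decimal` values without touching the network.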
+
+### What they cover
+
+| Test | Scenario |
+|---|---|
+| `test_weekly_open_happy_path` | Everything OK → proposal sent |
+| `test_weekly_open_no_strike_available` | Chain empty within the delta range |
+| `test_weekly_open_macro_blocks` | FOMC within 5 days |
+| `test_monitor_profit_take` | Mark = 50% of credit → close_profit |
+| `test_monitor_vol_stop` | DVOL +12 → close_vol |
+| `test_recovery_after_crash_open_position` | Crash mid-fill, restart, reconcile |
+| `test_kill_switch_blocks_new_entries` | Kill switch armed → no proposal |
+| `test_user_rejection_logs_and_skips` | Adriano says no → cancelled |
+| `test_user_timeout_with_revaluation` | Adriano replies after 30 min, slippage > 8% → abort |
+
+## Golden tests (`tests/golden/`)
+
+Replay of deterministic end-to-end scenarios with all MCPs faked.
+The output (decisions, logs) is compared byte-for-byte against a
+checked-in golden file.
+
+```
+tests/golden/
+├── 2026-04-27_weekly_open_bull_put.yaml      # input snapshot
+├── 2026-04-27_weekly_open_bull_put.golden    # expected output
+└── runner.py
+```
+
+An intentional change to an algorithm requires updating the golden file,
+with a commit message explaining why.
+
+## Deterministic backtest (historical replay)
+
+A dedicated command:
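
A byte-for-byte comparison with an explicit update switch could be sketched as follows. The function name `check_golden` and the `update` flag are hypothetical; the actual `runner.py` may work differently. The diff is produced only to make a failure readable, the pass/fail decision is taken on the raw bytes.

```python
import difflib
from pathlib import Path


def check_golden(actual: bytes, golden_path: Path, update: bool = False) -> str:
    """Return '' when `actual` matches the golden file, else a readable diff.

    With update=True the golden file is rewritten; this is meant for the
    intentional-change case and should go with an explanatory commit.
    """
    if update:
        golden_path.write_bytes(actual)
        return ""
    expected = golden_path.read_bytes()
    if actual == expected:  # the decision is byte-for-byte
        return ""
    diff = difflib.unified_diff(
        expected.decode("utf-8", "replace").splitlines(keepends=True),
        actual.decode("utf-8", "replace").splitlines(keepends=True),
        fromfile=str(golden_path),
        tofile="actual",
    )
    # Bytes can differ even when the decoded text looks identical.
    return "".join(diff) or "outputs differ at byte level only\n"
```

For this to be stable, whatever produces `actual` has to be deterministic too: sorted keys, no wall-clock timestamps, fixed number formatting.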
+
+```bash
+cerbero-bite replay --from 2024-01-01 --to 2026-04-25 --capital 1500 --dry-run
+```
+
+It loads:
+- Hourly ETH spot history (CSV)
+- DVOL history (CSV)
+- Historical macro calendar (CSV)
+- Historical option chains (Deribit API archive, where possible)
+
+It iterates day by day, applying exactly the same rules. Output:
+
+- CSV file with all positions, P&L, exit triggers
+- Equity curve plot
+- Comparison with the Monte Carlo simulation from the document
+
+The replay is **not a substitute** for the Monte Carlo simulation, but it
+is useful for validating the engine on real data once they are available.
+
+## Paper trading (go-live phase)
+
+Pre-live period of **at least 3 months**:
+
+- Engine in `--dry-run` but with real Telegram alerts (Adriano receives the
+  signals but does not execute them)
+- Adriano manually replicates some trades on Deribit testnet to get
+  hands-on experience
+- Daily comparison between paper P&L and the real replica to measure
+  realistic slippage
+
+Go-live thresholds:
+
+- ≥ 30 completed paper trades with results consistent with Monte Carlo
+- 0 operational incidents (timeouts, inconsistent state, broken hash chain)
+- Paper win rate ≥ 70%
+- Adriano explicitly signs the authorization on Telegram (logged in the audit chain)
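
The four thresholds above lend themselves to a single gate function that returns the list of blockers, so a failed check says which one failed. This is an illustrative sketch: `PaperStats`, its field names and the blocker labels are hypothetical, and how "consistent with Monte Carlo" is judged is left outside the function because the text does not define it. The 3-month minimum duration is not checked here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperStats:
    completed_trades: int
    winning_trades: int
    operational_incidents: int
    consistent_with_monte_carlo: bool  # decided elsewhere; criterion not specified
    signed_off: bool                   # explicit authorization recorded in the audit chain


def go_live_blockers(s: PaperStats) -> list[str]:
    """Return the unmet go-live thresholds; an empty list means go."""
    blockers: list[str] = []
    if s.completed_trades < 30:
        blockers.append("fewer_than_30_trades")
    if not s.consistent_with_monte_carlo:
        blockers.append("inconsistent_with_monte_carlo")
    if s.operational_incidents > 0:
        blockers.append("operational_incidents")
    # Integer comparison avoids float rounding right at the 70% boundary.
    if s.completed_trades == 0 or s.winning_trades * 100 < 70 * s.completed_trades:
        blockers.append("win_rate_below_70pct")
    if not s.signed_off:
        blockers.append("missing_sign_off")
    return blockers
```

For example, 30 trades with 21 wins sits exactly on the 70% boundary and passes, while 20 wins out of 30 is reported as `win_rate_below_70pct`.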
+
+## Live validation (initial phase)
+
+First 30 days live:
+
+- Halved cap: 100 EUR per trade, 500 EUR total engagement
+- Daily monitoring via the daily digest
+- Weekly review with Milito (in chat) for anomalies
+- Promotion to the full cap (200 / 1,000 EUR) only after 10 real trades
+  closed within the expected range
+
+## Linting and static analysis
+
+Every PR:
+
+- `ruff check`: passing
+- `ruff format --check`: passing
+- `mypy --strict src/`: passing
+- `pytest --cov`: coverage thresholds met
+- `bandit -r src/`: no security warnings
+
+No merge if any of the checks fails.
+
+## CI
+
+CI also runs locally (there is no GitHub remote at the moment). The pre-commit
+hook runs unit + integration in < 30 s. Pre-push runs the golden suite (~3 min).
+
+```yaml
+# .pre-commit-config.yaml
+repos:
+  - repo: local
+    hooks:
+      - id: ruff
+      - id: ruff-format
+      - id: mypy
+      - id: pytest-fast
+        entry: uv run pytest tests/unit tests/integration -x
+        stages: [commit]
+      - id: pytest-golden
+        entry: uv run pytest tests/golden -x
+        stages: [push]
+```