# 08 - Testing & Validation

The approach is TDD, as mandated by the `superpowers:test-driven-development` skill and consistent with `mcp_cerbero_brain`. No code is written without a test that fails before the change and passes after it.

## Test pyramid

```
       ┌────────────────┐
       │  golden / e2e  │       ~10 scenarios, slow, run pre-release
       └────────────────┘
    ┌──────────────────────┐
    │  integration tests   │    ~50, fake MCP, run on PRs
    └──────────────────────┘
  ┌──────────────────────────┐
  │        unit tests        │  ~300, < 5 s total, on every save
  └──────────────────────────┘
     ┌────────────────────┐
     │   property tests   │     hypothesis on pure algorithms
     └────────────────────┘
```

## Unit tests (`tests/unit/`)

They cover every function in `core/`. They are **fast** (< 5 s total) and **deterministic** (no network, no clock, no randomness).

Conventions:

- One test file per module: `test_sizing_engine.py`, `test_exit_decision.py`, etc.
- Naming: `test_<function>_<scenario>_<expected>`. Examples:
  - `test_compute_contracts_capital_720_dvol_40_returns_one`
  - `test_evaluate_mark_at_50pct_returns_close_profit`
- Fixtures: pre-built dataclasses in `tests/fixtures/scenarios.py`.

### Minimum required coverage

| Module | Coverage |
|---|---|
| `core/*` | 100% statement + 100% branch |
| `safety/*` | 100% statement |
| `state/*` | ≥ 90% |
| `clients/*` | ≥ 80% |
| `runtime/*` | ≥ 80% |

Coverage is measured with `coverage.py`; the thresholds are blocking in CI.

### Examples of mandatory tests

**`test_entry_validator.py`**:

```python
from decimal import Decimal

# Imports of EntryContext and validate_entry are omitted here.


def test_validate_entry_capital_below_minimum_returns_fail(default_cfg):
    ctx = EntryContext(capital_usd=Decimal("700"), dvol_now=Decimal("40"), ...)
    result = validate_entry(ctx, default_cfg)
    assert result.accepted is False
    assert "capital_below_720" in result.reasons


def test_validate_entry_dvol_too_high_returns_fail(default_cfg):
    ctx = EntryContext(capital_usd=Decimal("1500"), dvol_now=Decimal("95"), ...)
    result = validate_entry(ctx, default_cfg)
    assert "dvol_above_90" in result.reasons


def test_validate_entry_macro_event_inside_dte_returns_fail(default_cfg):
    ctx = EntryContext(..., next_macro_event_in_days=5)
    result = validate_entry(ctx, default_cfg)
    assert "macro_event_within_dte" in result.reasons


def test_validate_entry_all_conditions_met_returns_accepted(default_cfg):
    ctx = EntryContext(...)
    result = validate_entry(ctx, default_cfg)
    assert result.accepted is True
    assert result.reasons == []
```

**`test_sizing_engine.py`**:

```python
from decimal import Decimal

import pytest


@pytest.mark.parametrize("capital,dvol,expected_n", [
    (720, 40, 1),
    (1500, 40, 2),
    (1500, 50, 1),    # adj 0.85, 195*0.85/93 ≈ 1.78 → 1
    (5000, 40, 2),    # cap 200 EUR ≈ 215 USD; 215/93 ≈ 2.31 → 2
    (100000, 40, 2),  # cap saturated
    (500, 40, 0),     # undersized
])
def test_compute_contracts(capital, dvol, expected_n, default_cfg):
    ctx = SizingContext(
        capital_usd=Decimal(capital),
        max_loss_per_contract_usd=Decimal("93"),
        dvol_now=Decimal(dvol),
        eur_to_usd=Decimal("1.075"),
        open_engagement_usd=Decimal(0),
        other_open_positions=0,
    )
    assert sizing_engine.compute_contracts(ctx, default_cfg).n_contracts == expected_n
```

**`test_exit_decision.py`**: every branch of the evaluation order must have at least one test.

## Property tests (`tests/unit/test_*_properties.py`)

We use `hypothesis` for invariants:

```python
from decimal import Decimal

from hypothesis import given
from hypothesis.strategies import decimals


# Note: Hypothesis's function_scoped_fixture health check rejects
# function-scoped pytest fixtures, so default_cfg is assumed to be
# module- or session-scoped here.
@given(
    capital=decimals(min_value=720, max_value=200_000),
    dvol=decimals(min_value=20, max_value=89),
    max_loss=decimals(min_value=50, max_value=300),
)
def test_sizing_never_exceeds_cap_eur(capital, dvol, max_loss, default_cfg):
    """Invariant: total risk never exceeds the EUR cap."""
    ctx = SizingContext(capital_usd=capital, dvol_now=dvol, ...)
    result = sizing_engine.compute_contracts(ctx, default_cfg)
    cap_usd = Decimal(200) * default_cfg.eur_to_usd
    assert result.risk_dollars <= cap_usd
```

Mandatory property tests:

- Sizing: risk ≤ cap; n_contracts ≥ 0; never > 4.
- Exit decision: trigger order is respected (CLOSE_PROFIT before CLOSE_DELTA, etc.).
- Combo builder: short_strike < long_strike for bear_call, > for bull_put.

## Integration tests (`tests/integration/`)

They test the interaction between `core/`, `clients/`, and `state/` against **fake** (in-memory) MCPs.

### Fake MCP

```python
from decimal import Decimal


class FakeDeribit(McpClient):
    def __init__(self, scenario: dict):
        ...

    async def index_price(self, asset):
        # str() avoids binary-float artifacts if the YAML value parses as a float.
        return Decimal(str(self._scenario["spot"]))

    async def dvol(self):
        return Decimal(str(self._scenario["dvol"]))

    # ...
```

The fakes are driven by a YAML scenario:

```yaml
# tests/fixtures/scenarios/happy_path.yaml
spot: 2330
dvol: 42
funding_perp: 0.05
funding_cross: 0.04
macro_calendar: []
chain:
  - {instrument: ETH-13MAY26-1900-P, strike: 1900, delta: -0.12, mid: 0.0048, ...}
  - ...
```

### What they cover

| Test | Scenario |
|---|---|
| `test_weekly_open_happy_path` | Everything OK → proposal sent |
| `test_weekly_open_no_strike_available` | Chain empty within the delta range |
| `test_weekly_open_macro_blocks` | FOMC within 5 days |
| `test_monitor_profit_take` | Mark = 50% of credit → close_profit |
| `test_monitor_vol_stop` | DVOL +12 → close_vol |
| `test_recovery_after_crash_open_position` | Crash mid-fill, restart, reconcile |
| `test_kill_switch_blocks_new_entries` | Kill switch armed → no proposal |
| `test_user_rejection_logs_and_skips` | Adriano declines → cancelled |
| `test_user_timeout_with_revaluation` | Adriano replies after 30 min, slippage > 8% → abort |

## Golden tests (`tests/golden/`)

End-to-end replay of deterministic scenarios with all MCPs faked. The output (decisions, logs) is compared byte-for-byte with a checked-in golden file.

```
tests/golden/
├── 2026-04-27_weekly_open_bull_put.yaml     # input snapshot
├── 2026-04-27_weekly_open_bull_put.golden   # expected output
└── runner.py
```

An intentional change to an algorithm requires updating the golden file, with a commit message explaining why.
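As an illustration of the byte-for-byte comparison, here is a minimal sketch of a helper that a golden runner might use. The name `check_golden` and the `update` flag are hypothetical and not taken from the actual `runner.py`; the sketch assumes the scenario output is already available as `bytes`.

```python
from pathlib import Path


def check_golden(actual: bytes, golden_path: Path, update: bool = False) -> None:
    """Compare `actual` byte-for-byte against the checked-in golden file.

    Hypothetical helper: with update=True the golden file is rewritten
    instead, which is the path for an intentional algorithm change.
    """
    if update:
        golden_path.write_bytes(actual)
        return
    expected = golden_path.read_bytes()
    if actual != expected:
        # Find the first differing offset; if one output is a prefix of the
        # other, fall back to the length of the shorter one.
        offset = next(
            (i for i, (a, b) in enumerate(zip(actual, expected)) if a != b),
            min(len(actual), len(expected)),
        )
        raise AssertionError(
            f"{golden_path.name}: output differs from golden at byte {offset}"
        )
```

Comparing raw bytes rather than parsed structures means that even a change in log formatting or key ordering fails the test, which is the strictness the golden suite asks for.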
## Deterministic backtest (historical replay)

A dedicated command:

```bash
cerbero-bite replay --from 2024-01-01 --to 2026-04-25 --capital 1500 --dry-run
```

It loads:

- Hourly ETH spot history (CSV)
- DVOL history (CSV)
- Historical macro calendar (CSV)
- Historical option chains (Deribit API archive, where available)

It iterates day by day, applying exactly the same rules. Output:

- CSV file with all positions, P&L, and exit triggers
- Equity curve plot
- Comparison with the Monte Carlo simulation from the document

The replay is **not a substitute** for the Monte Carlo simulation, but it is useful for validating the engine against real data once they are available.

## Paper trading (go-live phase)

A pre-live period of **at least 3 months**:

- Engine in `--dry-run`, but with real Telegram alerts (Adriano receives the signals but does not execute them)
- Adriano manually replicates some trades on Deribit testnet to get hands-on experience
- Daily comparison between paper P&L and the real replica to measure realistic slippage

Go-live thresholds:

- ≥ 30 paper trades completed with outcomes consistent with Monte Carlo
- 0 operational incidents (timeouts, inconsistent state, broken hash chain)
- Paper win rate ≥ 70%
- Adriano explicitly signs off the authorization on Telegram (logged in the audit chain)

## Live validation (initial phase)

First 30 days live:

- Cap halved: 100 EUR per trade, 500 EUR total engagement
- Daily monitoring via the daily digest
- Weekly review with Milito (in chat) for anomalies
- Promotion to the full cap (200 / 1,000 EUR) only after 10 real trades closed within the expected range

## Linting and static analysis

On every PR:

- `ruff check`: passing
- `ruff format --check`: passing
- `mypy --strict src/`: passing
- `pytest --cov`: coverage thresholds met
- `bandit -r src/`: no security warnings

No merge if any check fails.

## CI

CI runs locally as well (there is no GitHub remote at the moment). The pre-commit hook runs the unit + integration tests in < 30 s. The pre-push hook runs the golden suite (~3 min).
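```yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      # name/entry/language for the three lint hooks are elided here;
      # pre-commit requires them on every local hook.
      - id: ruff
      - id: ruff-format
      - id: mypy
      - id: pytest-fast
        name: pytest (unit + integration)
        entry: uv run pytest tests/unit tests/integration -x
        language: system
        pass_filenames: false  # run the whole suite, not only staged files
        stages: [commit]
      - id: pytest-golden
        name: pytest (golden)
        entry: uv run pytest tests/golden -x
        language: system
        pass_filenames: false
        stages: [push]
```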
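The coverage thresholds that block CI differ by package, so a single global threshold such as coverage.py's `fail_under` does not express them. The following is a minimal sketch of a per-package gate, assuming the report has been exported with `coverage json` and loaded into a dict. The function name `coverage_failures` and the path-matching rule are hypothetical. It applies each threshold per file, which is a stricter reading than a package-level aggregate, and it relies on `percent_covered` as reported, so the branch requirement for `core/*` is only reflected if branch measurement is enabled.

```python
from fnmatch import fnmatch

# Thresholds taken from the coverage table in this document (percent).
THRESHOLDS = {
    "core/*": 100.0,
    "safety/*": 100.0,
    "state/*": 90.0,
    "clients/*": 80.0,
    "runtime/*": 80.0,
}


def coverage_failures(
    report: dict, thresholds: dict[str, float] = THRESHOLDS
) -> list[str]:
    """Return one message per file whose coverage is below its threshold.

    `report` is the dict loaded from the file written by `coverage json`.
    Hypothetical helper: a file matches a pattern if its path ends with it.
    """
    failures = []
    for path, data in report["files"].items():
        pct = data["summary"]["percent_covered"]
        for pattern, minimum in thresholds.items():
            if fnmatch(path, f"*{pattern}") and pct < minimum:
                failures.append(f"{path}: {pct:.1f}% < {minimum:.0f}%")
    return failures
```

A pre-commit or pre-push hook could run this after the test suite and exit non-zero when the returned list is not empty.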