docs: update README with full architecture + Phase 1 outcome

- Phase 1 status: completed (5/5 hard gates passed).
- Links to decision memo + technical report.
- Updated modular architecture (cerbero_ohlcv instead of ccxt, JSON parser, continuous fitness v1, aquarium dashboard).
- Corrected .env variables (no ANTHROPIC_API_KEY, models per tier).
- Actual typical costs ($0.07 per run, $0.19 Phase 1 total).
- Updated Cerbero MCP setup (uv run cerbero-mcp, port 9001).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 23:20:42 +02:00
parent 943aa38cf2
commit 690da30272
1 changed files with 147 additions and 15 deletions
@@ -1,33 +1,165 @@
-# Multi_Swarm_Coevolutive — Phase 1
+# Multi_Swarm_Coevolutive

-Lean spike of the PoC. See `docs/superpowers/specs/2026-05-09-decisione-strategica-design.md`
-for the rationale and `docs/superpowers/plans/2026-05-09-phase1-lean-spike.md` for the
-implementation plan.
+Proof of concept of a multi-agent co-evolutionary system for quantitative trading. A genetic algorithm evolves a population of LLM agents (the Hypothesis swarm) that generate trading strategies expressed as structured JSON; a deterministic Falsification layer backtests them on historical BTC-PERPETUAL data via Cerbero MCP; a heuristic Adversarial layer subjects them to red-team checks; the fitness combines the Deflated Sharpe Ratio (Bailey & López de Prado 2014), a normalized Sharpe ratio, and a drawdown penalty. The whole design is inspired by the philosophy of Renaissance Technologies, adapted to a retail, single-author setting with LLM agents.
+
+## Project status
+
+**Phase 1 (lean spike) completed** on 10 May 2026 with all 5 hard gates passed (loop convergence, parse success 100%, top-5 ratio 1116x, entropy 0.914, cost $0.069 vs. a $700 cap). Strategic decision: **GO for Phase 2** with three adjustments (tighter Adversarial thresholds, speciation, 70/30 walk-forward).
+
+Key documents:
+
+- [Strategic decision](docs/superpowers/specs/2026-05-09-decisione-strategica-design.md): why Phase 1 first, then Phase 2, then the Phase 3 forward test.
+- [Phase 1 implementation plan](docs/superpowers/plans/2026-05-09-phase1-lean-spike.md): 38 TDD-driven tasks.
+- [Phase 1 gate decision memo](docs/decisions/2026-05-10-gate-phase1.md): formal evaluation of the 5 hard gates.
+- [Phase 1 technical report](docs/reports/2026-05-10-phase1-technical-report.md): results, inspection of the top genomes, threats to validity.
+
+Pre-implementation context documents:
+
+- `00_documento_zero.md`: conceptual framework (Renaissance → co-evolutionary LLM swarm).
+- `coevolutive_swarm_system.md`: Track A design (complete system, 12-18 months).
+- `poc_trading_swarm.md`: Track B design (trading PoC, the source of Phase 1).
+
+## Architecture
+
+```
+src/multi_swarm/
+├── config.py                Pydantic settings (.env)
+├── data/
+│   ├── cerbero_ohlcv.py     OHLCV loader via Cerbero MCP + cache parquet
+│   └── splits.py            Walk-forward expanding splits
+├── backtest/
+│   ├── orders.py            Side/Order/Position/Trade
+│   └── engine.py            Event-driven backtest, 1-bar exec delay
+├── metrics/
+│   ├── basic.py             Sharpe, max drawdown, total return
+│   └── dsr.py               Deflated Sharpe Ratio (Bailey & López de Prado 2014)
+├── cerbero/
+│   ├── client.py            HTTP client (bearer + bot-tag + retry tenacity)
+│   └── tools.py             MCP tool wrappers (sma/rsi/atr/macd/realized_vol/funding)
+├── protocol/
+│   ├── grammar.py           Vocabulary of operators, indicators, features
+│   ├── parser.py            json.loads → AST dataclass tipizzato
+│   ├── validator.py         Arity checks, no-nesting indicators, whitelist
+│   └── compiler.py          AST → Callable[[df], Series[Side]]
+├── genome/
+│   ├── hypothesis.py        HypothesisAgentGenome (deterministic id)
+│   ├── mutation.py          4 operators (temp, lookback, features, style)
+│   └── crossover.py         Uniform crossover
+├── llm/
+│   ├── client.py            Unified LLMClient via OpenRouter (tier S/A/B/C/D)
+│   └── cost_tracker.py      Pricing per tier, breakdown
+├── agents/
+│   ├── hypothesis.py        LLM call + JSON extract + retry-with-feedback
+│   ├── falsification.py     Compile → backtest → DSR
+│   ├── adversarial.py       Red-team heuristics (no_trades/degenerate/over/under)
+│   └── market_summary.py    Market stats for the prompt
+├── ga/
+│   ├── selection.py         Tournament + elitism
+│   ├── fitness.py           Continuous v1: dsr + tanh(sharpe) × penalty(dd)
+│   ├── loop.py              next_generation step
+│   ├── summary.py           median/max/p90/entropy per gen
+│   └── initial.py           Initial population (6 cognitive styles)
+├── persistence/
+│   ├── schema.py            SQLite DDL: 6 tables + 3 indexes
+│   └── repository.py        CRUD for runs/genomes/evals/cost/findings/gen_summary
+├── orchestrator/
+│   └── run.py               End-to-end pipeline + persistence
+└── dashboard/
+    ├── streamlit_app.py     Multipage hub
+    ├── data.py              Reads runs.db for the pages
+    ├── aquarium.py          HTML5 canvas helper (fish data + JS template)
+    └── pages/
+        ├── 01_overview.py       Runs + aggregate metrics
+        ├── 02_ga_convergence.py Fitness convergence + entropy plot
+        ├── 03_genomes.py        Top-10 + system_prompt inspection
+        └── 04_aquarium.py       2D aquarium with click → info + lineage
+```
+
+Stack: Python 3.13, uv, pytest+pytest-mock+responses, openai SDK (pointed at OpenRouter), requests+tenacity, pandas+numpy+scipy, sqlmodel+sqlite, streamlit+plotly.

 ## Setup

 ```bash
 uv sync
-cp .env.example .env  # fill in tokens and API key
-uv run pytest         # verify that everything installs
+cp .env.example .env  # fill in CERBERO_*_TOKEN and OPENROUTER_API_KEY
+uv run pytest         # verify that everything installs (141 tests expected)
 ```

-## Local Cerbero
+### Required .env variables

-The Phase 1 backtest reads a cached OHLCV dataset, but some indicator features
-are delegated to Cerbero. Start a local Cerbero instance before executing a run:
+```bash
+# Cerbero MCP (local or VPS https://cerbero-mcp.tielogic.xyz)
+CERBERO_BASE_URL=http://localhost:9001
+CERBERO_TESTNET_TOKEN=<testnet bearer>
+CERBERO_MAINNET_TOKEN=<mainnet bearer>   # needed for real historical data
+CERBERO_BOT_TAG=swarm-poc-phase1
+
+# LLM provider (single endpoint via OpenRouter)
+OPENROUTER_API_KEY=<sk-or-v1-...>
+OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
+
+# Models per tier (override the defaults if needed)
+LLM_MODEL_TIER_S=anthropic/claude-opus-4-7
+LLM_MODEL_TIER_A=anthropic/claude-sonnet-4-6
+LLM_MODEL_TIER_B=anthropic/claude-sonnet-4-6
+LLM_MODEL_TIER_C=qwen/qwen-2.5-72b-instruct
+LLM_MODEL_TIER_D=meta-llama/llama-3.3-70b-instruct
+```
+
+### Cerbero MCP
+
+Phase 1 fetches OHLCV via Cerbero MCP (replacing ccxt). Start a local Cerbero instance before a real run:

 ```bash
 cd /home/adriano/Documenti/Git_XYZ/CerberoSuite/Cerbero_mcp
-docker compose up -d
+uv sync
+uv run cerbero-mcp   # listens on the port set in .env (default 9001 if 9000 is in use)
 ```

+Alternatively, use the existing VPS `https://cerbero-mcp.tielogic.xyz` (requires a bearer token).
+
 ## Main commands

 ```bash
-uv run pytest                                # all tests
+# Quality gates
+uv run pytest                       # all tests (141 PASSED expected)
 uv run pytest tests/unit -v         # unit only
-uv run pytest tests/integration -v -m integration  # integration only
-uv run python scripts/run_phase1.py          # full Phase 1 run
-uv run streamlit run src/multi_swarm/dashboard/streamlit_app.py
+uv run pytest tests/integration -v  # integration only
+uv run ruff check src/ tests/ scripts/
+uv run mypy src/ scripts/
+
+# Smoke run (MockLLM + synthetic OHLCV, no API calls)
+uv run python scripts/smoke_run.py
+
+# Real Phase 1 run (Cerbero + OpenRouter, ~$0.07 per run with K=20, 10 generations)
+uv run python scripts/run_phase1.py \
+  --name phase1-run-XXX \
+  --exchange deribit --symbol BTC-PERPETUAL --timeframe 1h \
+  --start 2024-01-01T00:00:00+00:00 \
+  --end 2026-01-01T00:00:00+00:00 \
+  --population-size 20 --n-generations 10
+
+# Dashboard
+DB_PATH=./runs.db uv run streamlit run src/multi_swarm/dashboard/streamlit_app.py
 ```
+
+## Dashboard
+
+Streamlit multipage app at `http://localhost:8501` (override with `--server.port`):
+
+- **Overview**: list of runs, status, cost, aggregate evaluation metrics (parse success %, top fitness, median).
+- **GA Convergence**: median/max/p90 fitness per generation, entropy with an hline at the gate threshold (0.5).
+- **Genomes**: top 10 sorted by fitness; click a row to inspect system_prompt + raw_text JSON strategy.
+- **Aquarium**: 2D HTML5 canvas visualization with one fish per agent; size ∝ fitness, color by cognitive_style, halo on the top 3, click on a fish → full info panel + BFS lineage (parents → grandparents → ...).
+
+## Typical Phase 1 costs
+
+Tier C (qwen-2.5-72b via OpenRouter): ~$0.40/1M tokens. A K=20 × 10-generation run ≈ $0.07. Phase 1 total (5 runs, including bug-fix iterations): $0.19.
+
+For Phase 2 with a B/C tier mix (Sonnet 4.6 = $3/$15 input/output per 1M tokens), the estimate is $3-15 per complete ablation.
+
+## Development
+
+Conventional commits with the prefixes `feat:` `fix:` `chore:` `docs:` `refactor:` `test:`. Body in Italian. Footer `Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>` on every collaborative commit.
+
+Current branch: `main`. No feature branches in Phase 1 (single author, lean spike). Phase 2 will consider feature branches for parallel ablations.
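
Reviewer note on `ga/fitness.py`: the README summarizes the v1 fitness only as `dsr + tanh(sharpe) × penalty(dd)` and does not define `penalty`. The sketch below is one possible reading, not the repository's implementation. It assumes standard operator precedence (the product binds tighter than the sum) and a hypothetical linear penalty that falls from 1 at zero drawdown to 0 at a made-up cap `dd_cap`; the names `drawdown_penalty`, `fitness_v1`, and `dd_cap` are illustrative.

```python
import math


def drawdown_penalty(max_drawdown: float, dd_cap: float = 0.5) -> float:
    """Hypothetical penalty: 1.0 at zero drawdown, falling linearly to 0.0 at dd_cap.

    Both the linear shape and the 0.5 cap are assumptions for illustration.
    """
    dd = abs(max_drawdown)  # accept drawdown expressed as either 0.2 or -0.2
    return max(0.0, 1.0 - dd / dd_cap)


def fitness_v1(dsr: float, sharpe: float, max_drawdown: float) -> float:
    """Continuous fitness, read as dsr + (tanh(sharpe) * penalty(dd)).

    tanh squashes the Sharpe ratio into (-1, 1), so an extreme Sharpe
    cannot dominate the DSR term.
    """
    return dsr + math.tanh(sharpe) * drawdown_penalty(max_drawdown)


if __name__ == "__main__":
    # Same DSR and Sharpe, shallow vs. deep drawdown.
    print(fitness_v1(dsr=0.9, sharpe=1.5, max_drawdown=0.10))
    print(fitness_v1(dsr=0.9, sharpe=1.5, max_drawdown=0.60))
```

Under these assumptions a strategy whose drawdown reaches the cap keeps only its DSR term, which keeps the fitness continuous instead of zeroing it out at a hard threshold.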