From c46525805be7f6b891237ed01f3d227cba01b37a Mon Sep 17 00:00:00 2001
From: AdrianoDev
Date: Sat, 9 May 2026 18:45:09 +0200
Subject: [PATCH] docs(plans): Phase 1 lean spike implementation plan

38-task TDD plan for Phase 1 of the Multi-Swarm Coevolutive PoC: project
skeleton, data layer (OHLCV via ccxt + walk-forward), backtest engine,
metrics (Sharpe + Deflated Sharpe Ratio), Cerbero wrapper, S-expression
protocol (parser/validator/compiler), genome + mutation/crossover, LLM
client (OpenRouter Qwen + Anthropic Sonnet), cost tracker, agents
(LLM-driven hypothesis, hand-crafted falsification and adversarial), GA
(fitness/selection/loop/summary), SQLite persistence, end-to-end
orchestrator, Streamlit dashboard (Overview/GA Convergence/Genomes), CLI
scripts, smoke run, real run, gate decision memo + technical report.

Plan generated via the writing-plans skill after strategic decision B3
(spec docs/superpowers/specs/2026-05-09-decisione-strategica-design.md).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../plans/2026-05-09-phase1-lean-spike.md | 5282 +++++++++++++++++
 1 file changed, 5282 insertions(+)
 create mode 100644 docs/superpowers/plans/2026-05-09-phase1-lean-spike.md

diff --git a/docs/superpowers/plans/2026-05-09-phase1-lean-spike.md b/docs/superpowers/plans/2026-05-09-phase1-lean-spike.md
new file mode 100644
index 0000000..7bad9c7
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-09-phase1-lean-spike.md
@@ -0,0 +1,5282 @@
+# Phase 1 — Lean Spike Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Build the end-to-end loop of the Multi-Swarm Coevolutive PoC (Hypothesis swarm K=20 + hand-crafted Falsification and Adversarial agents, GA with tournament selection, event-driven backtest, DSR-based fitness v0) and validate the 5 Phase 1 hard gates defined in the spec.

**Architecture:** Python single package `multi_swarm` with one submodule per responsibility (data, backtest, metrics, cerbero, protocol, genome, llm, agents, ga, persistence, orchestrator, dashboard). Synchronous single-threaded execution, SQLite persistence, datasets cached in Parquet, multipage Streamlit GUI. No parallelism in Phase 1: performance is not the goal here, validating the loop is.

**Tech Stack:** Python 3.13 + uv; pytest+pytest-mock+responses for testing; ccxt for OHLCV; pydantic v2 for config; sqlite3+sqlmodel for persistence; sexpdata for S-expression parsing; pandas+numpy+scipy for analytics; anthropic + openai SDK (the OpenAI SDK points at OpenRouter for tier C); streamlit + plotly for the dashboard.

**Reference spec:** `docs/superpowers/specs/2026-05-09-decisione-strategica-design.md` (section 4).

**Conventions:**
- TDD for all logic code: test first, minimal implementation, then refactor.
- Commit frequently, one commit per completed task (occasionally one per step when it makes sense).
- Branch: `main`. No feature branches in Phase 1; too much overhead for a single-author PoC.
- Commit message prefixes: `feat:` `test:` `chore:` `fix:` `docs:` `refactor:`.
- No Cerbero mocks in integration tests: use a local Docker instance (testnet token).
- No LLM mocks in e2e tests: real calls to Qwen via OpenRouter, but with population 5 and 2 generations to keep costs down.
+ +--- + +## Task 1: Project skeleton e tooling + +**Files:** +- Create: `pyproject.toml` +- Create: `.env.example` +- Create: `README.md` +- Create: `src/multi_swarm/__init__.py` +- Create: `tests/__init__.py` + +- [ ] **Step 1: Creare `pyproject.toml`** + +```toml +[project] +name = "multi-swarm" +version = "0.1.0" +description = "Multi-Swarm Coevolutive PoC trading swarm — Phase 1 lean spike" +authors = [{ name = "Adriano Dal Pastro", email = "adrianodalpastro@tielogic.com" }] +requires-python = ">=3.13" +dependencies = [ + "ccxt>=4.4", + "pandas>=2.2", + "numpy>=2.1", + "scipy>=1.14", + "pydantic>=2.9", + "pydantic-settings>=2.6", + "sqlmodel>=0.0.22", + "sexpdata>=1.0.2", + "anthropic>=0.39", + "openai>=1.55", + "httpx>=0.28", + "tenacity>=9.0", + "pyyaml>=6.0", + "streamlit>=1.40", + "plotly>=5.24", + "pyarrow>=18.0", +] + +[dependency-groups] +dev = [ + "pytest>=8.3", + "pytest-mock>=3.14", + "pytest-asyncio>=0.24", + "responses>=0.25", + "ruff>=0.7", + "mypy>=1.13", +] + +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[tool.hatch.build.targets.wheel] +packages = ["src/multi_swarm"] + +[tool.ruff] +line-length = 100 +target-version = "py313" + +[tool.ruff.lint] +select = ["E", "F", "W", "I", "N", "UP", "B", "RUF"] + +[tool.mypy] +python_version = "3.13" +strict = true + +[tool.pytest.ini_options] +testpaths = ["tests"] +addopts = "-v --tb=short" +markers = [ + "integration: tests that require external services (Cerbero, LLM API)", + "slow: tests that take more than 5 seconds", +] +``` + +- [ ] **Step 2: Creare `.env.example`** + +```bash +# Cerbero MCP (locale durante Phase 1) +CERBERO_BASE_URL=http://localhost:9000 +CERBERO_TESTNET_TOKEN= +CERBERO_MAINNET_TOKEN= +CERBERO_BOT_TAG=swarm-poc-phase1 + +# LLM providers +OPENROUTER_API_KEY= +ANTHROPIC_API_KEY= + +# Run config +RUN_NAME=phase1-spike-001 +DATA_DIR=./data +SERIES_DIR=./series +DB_PATH=./runs.db +``` + +- [ ] **Step 3: Creare `README.md` minimale** + +```markdown +# 
Multi_Swarm_Coevolutive — Phase 1

Lean spike of the PoC. See `docs/superpowers/specs/2026-05-09-decisione-strategica-design.md`
for the rationale and `docs/superpowers/plans/2026-05-09-phase1-lean-spike.md` for the
implementation plan.

## Setup

```bash
uv sync
cp .env.example .env  # fill in tokens and API keys
uv run pytest         # verify everything installs
```

## Local Cerbero

The Phase 1 backtest reads cached OHLCV datasets, but some indicator features
are delegated to Cerbero. Start a local Cerbero before executing a run:

```bash
cd /home/adriano/Documenti/Git_XYZ/CerberoSuite/Cerbero_mcp
docker compose up -d
```

## Main commands

```bash
uv run pytest                                      # all tests
uv run pytest tests/unit -v                        # unit only
uv run pytest tests/integration -v -m integration  # integration only
uv run python scripts/run_phase1.py                # full Phase 1 run
uv run streamlit run src/multi_swarm/dashboard/streamlit_app.py
```
```

- [ ] **Step 4: Create `src/multi_swarm/__init__.py` and `tests/__init__.py`**

```python
# src/multi_swarm/__init__.py
"""Multi_Swarm_Coevolutive — Phase 1 lean spike."""

__version__ = "0.1.0"
```

```python
# tests/__init__.py
```

- [ ] **Step 5: Sync dependencies and verify installation**

Run: `uv sync && uv run python -c "import multi_swarm; print(multi_swarm.__version__)"`
Expected: prints `0.1.0` with no errors.
+ +- [ ] **Step 6: Commit** + +```bash +git add pyproject.toml .env.example README.md src/multi_swarm/__init__.py tests/__init__.py uv.lock +git commit -m "chore: project skeleton with uv + pyproject + deps" +``` + +--- + +## Task 2: Config loader (Pydantic settings) + +**Files:** +- Create: `src/multi_swarm/config.py` +- Test: `tests/unit/test_config.py` + +- [ ] **Step 1: Scrivere il test fallente** + +```python +# tests/unit/test_config.py +import os +from multi_swarm.config import Settings + + +def test_settings_loads_from_env(monkeypatch): + monkeypatch.setenv("CERBERO_BASE_URL", "http://test:9000") + monkeypatch.setenv("CERBERO_TESTNET_TOKEN", "tok-test") + monkeypatch.setenv("CERBERO_MAINNET_TOKEN", "tok-main") + monkeypatch.setenv("CERBERO_BOT_TAG", "swarm-poc-phase1") + monkeypatch.setenv("OPENROUTER_API_KEY", "or-key") + monkeypatch.setenv("ANTHROPIC_API_KEY", "an-key") + monkeypatch.setenv("RUN_NAME", "test-run") + + s = Settings() + + assert s.cerbero_base_url == "http://test:9000" + assert s.cerbero_testnet_token == "tok-test" + assert s.run_name == "test-run" + assert s.data_dir.name == "data" + assert s.db_path.name == "runs.db" + + +def test_settings_requires_tokens(monkeypatch): + monkeypatch.delenv("CERBERO_TESTNET_TOKEN", raising=False) + monkeypatch.delenv("OPENROUTER_API_KEY", raising=False) + import pytest + from pydantic import ValidationError + + with pytest.raises(ValidationError): + Settings() +``` + +- [ ] **Step 2: Run test (deve fallire)** + +Run: `uv run pytest tests/unit/test_config.py -v` +Expected: FAIL — `ModuleNotFoundError: multi_swarm.config`. 
+ +- [ ] **Step 3: Implementare `Settings`** + +```python +# src/multi_swarm/config.py +from pathlib import Path +from pydantic import Field, SecretStr +from pydantic_settings import BaseSettings, SettingsConfigDict + + +class Settings(BaseSettings): + model_config = SettingsConfigDict( + env_file=".env", + env_file_encoding="utf-8", + extra="ignore", + case_sensitive=False, + ) + + cerbero_base_url: str = "http://localhost:9000" + cerbero_testnet_token: SecretStr + cerbero_mainnet_token: SecretStr | None = None + cerbero_bot_tag: str = "swarm-poc-phase1" + + openrouter_api_key: SecretStr + anthropic_api_key: SecretStr | None = None + + run_name: str = "phase1-spike-001" + data_dir: Path = Field(default=Path("./data")) + series_dir: Path = Field(default=Path("./series")) + db_path: Path = Field(default=Path("./runs.db")) + + +def load_settings() -> Settings: + return Settings() +``` + +- [ ] **Step 4: Run test (deve passare)** + +Run: `uv run pytest tests/unit/test_config.py -v` +Expected: PASS entrambi. 
+ +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/config.py tests/unit/test_config.py tests/unit/__init__.py +git commit -m "feat(config): pydantic settings loader from .env" +``` + +--- + +## Task 3: OHLCV loader (ccxt → parquet cache) + +**Files:** +- Create: `src/multi_swarm/data/__init__.py` +- Create: `src/multi_swarm/data/ohlcv_loader.py` +- Test: `tests/unit/test_ohlcv_loader.py` + +- [ ] **Step 1: Scrivere test fallente con mock ccxt** + +```python +# tests/unit/test_ohlcv_loader.py +from datetime import datetime, timezone +from pathlib import Path +import pandas as pd +import pytest +from multi_swarm.data.ohlcv_loader import OHLCVLoader, OHLCVRequest + + +@pytest.fixture +def sample_ohlcv_rows(): + base_ts = int(datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp() * 1000) + rows = [] + for i in range(48): + rows.append([base_ts + i * 3600 * 1000, 40000 + i, 40100 + i, 39900 + i, 40050 + i, 100.0 + i]) + return rows + + +def test_loader_fetches_and_caches(tmp_path: Path, mocker, sample_ohlcv_rows): + fake_exchange = mocker.MagicMock() + fake_exchange.fetch_ohlcv.return_value = sample_ohlcv_rows + mocker.patch("multi_swarm.data.ohlcv_loader.ccxt.binance", return_value=fake_exchange) + + loader = OHLCVLoader(cache_dir=tmp_path) + req = OHLCVRequest( + symbol="BTC/USDT", + timeframe="1h", + start=datetime(2024, 1, 1, tzinfo=timezone.utc), + end=datetime(2024, 1, 3, tzinfo=timezone.utc), + ) + df = loader.load(req) + + assert isinstance(df, pd.DataFrame) + assert list(df.columns) == ["open", "high", "low", "close", "volume"] + assert len(df) == 48 + assert df.index.is_monotonic_increasing + cache_files = list(tmp_path.glob("*.parquet")) + assert len(cache_files) == 1 + + +def test_loader_uses_cache_on_second_call(tmp_path: Path, mocker, sample_ohlcv_rows): + fake_exchange = mocker.MagicMock() + fake_exchange.fetch_ohlcv.return_value = sample_ohlcv_rows + mocker.patch("multi_swarm.data.ohlcv_loader.ccxt.binance", return_value=fake_exchange) + + 
    loader = OHLCVLoader(cache_dir=tmp_path)
    req = OHLCVRequest(
        symbol="BTC/USDT",
        timeframe="1h",
        start=datetime(2024, 1, 1, tzinfo=timezone.utc),
        end=datetime(2024, 1, 3, tzinfo=timezone.utc),
    )
    df1 = loader.load(req)
    df2 = loader.load(req)

    # First load fetches a single page (48 rows < page limit) and writes the
    # cache; second load reads the parquet cache without touching the exchange.
    assert fake_exchange.fetch_ohlcv.call_count == 1
    pd.testing.assert_frame_equal(df1, df2)
    fake_exchange.fetch_ohlcv.reset_mock()
    df3 = loader.load(req)
    assert fake_exchange.fetch_ohlcv.call_count == 0
    pd.testing.assert_frame_equal(df1, df3)
```

- [ ] **Step 2: Run test (must fail)**

Run: `uv run pytest tests/unit/test_ohlcv_loader.py -v`
Expected: FAIL — module does not exist.

- [ ] **Step 3: Implement `OHLCVLoader`**

```python
# src/multi_swarm/data/__init__.py
```

```python
# src/multi_swarm/data/ohlcv_loader.py
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path

import ccxt
import pandas as pd


@dataclass(frozen=True)
class OHLCVRequest:
    symbol: str
    timeframe: str
    start: datetime
    end: datetime

    def cache_key(self) -> str:
        s = f"{self.symbol}|{self.timeframe}|{self.start.isoformat()}|{self.end.isoformat()}"
        return hashlib.sha1(s.encode()).hexdigest()[:16]


class OHLCVLoader:
    """Load OHLCV via ccxt (Binance) and cache to parquet."""

    def __init__(self, cache_dir: Path, exchange_name: str = "binance"):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.exchange_name = exchange_name

    def load(self, req: OHLCVRequest) -> pd.DataFrame:
        cache_file = self.cache_dir / f"{req.cache_key()}.parquet"
        if cache_file.exists():
            return pd.read_parquet(cache_file)

        df = self._fetch_paginated(req)
        df.to_parquet(cache_file)
        return df

    def _fetch_paginated(self, req: OHLCVRequest) -> pd.DataFrame:
        exchange = getattr(ccxt, self.exchange_name)({"enableRateLimit": True})
        timeframe_ms = exchange.parse_timeframe(req.timeframe) * 1000
        since = int(req.start.timestamp() * 1000)
        end_ms = int(req.end.timestamp() * 1000)
        all_rows: list[list[float]] = []
        limit = 1000

        while since < end_ms:
            rows = exchange.fetch_ohlcv(req.symbol, req.timeframe, since=since, limit=limit)
            if not rows:
                break
            all_rows.extend(rows)
            last_ts = rows[-1][0]
            if last_ts <= since:
                break
            since = last_ts + timeframe_ms
            if len(rows) < limit:
                break

        df = pd.DataFrame(all_rows, columns=["ts", "open", "high", "low", "close", "volume"])
        df = df.drop_duplicates(subset=["ts"]).sort_values("ts")
        df["ts"] = pd.to_datetime(df["ts"], unit="ms", utc=True)
        df = df.set_index("ts")
        df = df[(df.index >= req.start) & (df.index < req.end)]
        return df[["open", "high", "low", "close", "volume"]].astype("float64")
```

- [ ] **Step 4: Run test (must pass)**

Run: `uv run pytest tests/unit/test_ohlcv_loader.py -v`
Expected: PASS, both tests.
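Note (illustration, not a plan step): the deterministic cache key is what lets the second `load` hit parquet instead of the exchange. A stdlib-only sketch mirroring `OHLCVRequest.cache_key` (the `CacheKeyDemo` name is hypothetical, for illustration only):

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CacheKeyDemo:
    """Illustrative mirror of OHLCVRequest.cache_key: same fields, same hash."""

    symbol: str
    timeframe: str
    start: datetime
    end: datetime

    def cache_key(self) -> str:
        # Hash the request fields so identical requests map to one parquet file.
        s = f"{self.symbol}|{self.timeframe}|{self.start.isoformat()}|{self.end.isoformat()}"
        return hashlib.sha1(s.encode()).hexdigest()[:16]


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 3, tzinfo=timezone.utc)
a = CacheKeyDemo("BTC/USDT", "1h", start, end)
b = CacheKeyDemo("BTC/USDT", "1h", start, end)
c = CacheKeyDemo("ETH/USDT", "1h", start, end)

# Equal requests share a cache file; a different symbol gets its own.
assert a.cache_key() == b.cache_key()
assert a.cache_key() != c.cache_key()
```

Frozen dataclass plus content hash keeps cache invalidation trivial: any change to the request fields changes the filename.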
+ +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/data/ tests/unit/test_ohlcv_loader.py +git commit -m "feat(data): OHLCV loader via ccxt with parquet cache" +``` + +--- + +## Task 4: Walk-forward expanding splits + +**Files:** +- Create: `src/multi_swarm/data/splits.py` +- Test: `tests/unit/test_splits.py` + +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_splits.py +from datetime import datetime, timezone, timedelta +import pandas as pd +import pytest +from multi_swarm.data.splits import expanding_walk_forward, Split + + +@pytest.fixture +def daily_index(): + return pd.date_range("2024-01-01", "2024-12-31", freq="D", tz="UTC") + + +def test_expanding_split_count(daily_index: pd.DatetimeIndex): + splits = expanding_walk_forward( + daily_index, train_ratio=0.7, n_folds=4, min_train_days=30 + ) + assert len(splits) == 4 + + +def test_expanding_split_train_grows(daily_index: pd.DatetimeIndex): + splits = expanding_walk_forward( + daily_index, train_ratio=0.7, n_folds=4, min_train_days=30 + ) + train_lengths = [len(s.train_idx) for s in splits] + assert train_lengths == sorted(train_lengths) + assert train_lengths[0] < train_lengths[-1] + + +def test_no_overlap_train_test(daily_index: pd.DatetimeIndex): + splits = expanding_walk_forward( + daily_index, train_ratio=0.7, n_folds=4, min_train_days=30 + ) + for s in splits: + assert s.train_idx[-1] < s.test_idx[0] + + +def test_min_train_days_respected(): + idx = pd.date_range("2024-01-01", "2024-02-15", freq="D", tz="UTC") + splits = expanding_walk_forward(idx, train_ratio=0.7, n_folds=2, min_train_days=20) + for s in splits: + assert len(s.train_idx) >= 20 +``` + +- [ ] **Step 2: Run test (deve fallire)** + +Run: `uv run pytest tests/unit/test_splits.py -v` +Expected: FAIL — modulo non esistente. 

- [ ] **Step 3: Implement splits**

```python
# src/multi_swarm/data/splits.py
from __future__ import annotations

from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class Split:
    fold: int
    train_idx: pd.DatetimeIndex
    test_idx: pd.DatetimeIndex


def expanding_walk_forward(
    index: pd.DatetimeIndex,
    train_ratio: float = 0.7,
    n_folds: int = 4,
    min_train_days: int = 30,
) -> list[Split]:
    """Generate expanding walk-forward splits: train grows, test is the next window.

    The initial train window covers train_ratio of the full index; the rest is
    divided into n_folds equal test windows. Fold f trains on everything up to
    the start of test window f, so each successive fold trains on strictly
    more data.
    """
    if n_folds < 1:
        raise ValueError("n_folds must be >= 1")
    if not 0 < train_ratio < 1:
        raise ValueError("train_ratio must be in (0,1)")

    total = len(index)
    initial_train = int(total * train_ratio)
    remaining = total - initial_train
    fold_size = max(1, remaining // n_folds)

    splits: list[Split] = []
    for f in range(n_folds):
        train_end = initial_train + f * fold_size
        test_start = train_end
        test_end = min(test_start + fold_size, total)
        train_idx = index[:train_end]
        test_idx = index[test_start:test_end]
        if len(train_idx) < min_train_days or len(test_idx) == 0:
            continue
        splits.append(Split(fold=f, train_idx=train_idx, test_idx=test_idx))

    return splits
```

- [ ] **Step 4: Run test (must pass)**

Run: `uv run pytest tests/unit/test_splits.py -v`
Expected: PASS, all 4.
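Note (illustration, not a plan step): to make the fold geometry concrete, a stdlib-only sketch of the same expanding logic on integer positions (the `expanding_folds` helper is hypothetical, mirroring `expanding_walk_forward` without pandas):

```python
def expanding_folds(total: int, train_ratio: float, n_folds: int) -> list[tuple[int, int, int]]:
    """Return (train_end, test_start, test_end) positional bounds per fold."""
    initial_train = int(total * train_ratio)
    fold_size = max(1, (total - initial_train) // n_folds)
    folds = []
    for f in range(n_folds):
        train_end = initial_train + f * fold_size  # train always starts at 0
        test_end = min(train_end + fold_size, total)
        folds.append((train_end, train_end, test_end))
    return folds


# 2024 has 366 days: initial train = 256 days, 4 test folds of 27 days each.
folds = expanding_folds(total=366, train_ratio=0.7, n_folds=4)
train_lens = [train_end for train_end, _, _ in folds]

# Train grows monotonically, and no train window overlaps its test window.
assert train_lens == sorted(train_lens)
for train_end, test_start, test_end in folds:
    assert train_end <= test_start < test_end
```

Any remainder after `remaining // n_folds` is dropped from the last fold, matching the integer division in the implementation above.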
+ +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/data/splits.py tests/unit/test_splits.py +git commit -m "feat(data): expanding walk-forward splits" +``` + +--- + +## Task 5: Backtest core dataclasses (Order, Position, Trade) + +**Files:** +- Create: `src/multi_swarm/backtest/__init__.py` +- Create: `src/multi_swarm/backtest/orders.py` +- Test: `tests/unit/test_backtest_orders.py` + +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_backtest_orders.py +from datetime import datetime, timezone +import pytest +from multi_swarm.backtest.orders import Order, Side, Position, Trade + + +def test_order_validates_side(): + o = Order(ts=datetime(2024, 1, 1, tzinfo=timezone.utc), side=Side.LONG, size=1.0) + assert o.side == Side.LONG + + +def test_position_pnl_long(): + pos = Position(side=Side.LONG, entry_price=100.0, size=2.0) + assert pos.unrealized_pnl(110.0) == pytest.approx(20.0) + assert pos.unrealized_pnl(90.0) == pytest.approx(-20.0) + + +def test_position_pnl_short(): + pos = Position(side=Side.SHORT, entry_price=100.0, size=2.0) + assert pos.unrealized_pnl(110.0) == pytest.approx(-20.0) + assert pos.unrealized_pnl(90.0) == pytest.approx(20.0) + + +def test_trade_realized_pnl_with_fees(): + t = Trade( + entry_ts=datetime(2024, 1, 1, tzinfo=timezone.utc), + exit_ts=datetime(2024, 1, 2, tzinfo=timezone.utc), + side=Side.LONG, + size=1.0, + entry_price=100.0, + exit_price=110.0, + fees_bp=5.0, + ) + # gross 10, fees = 5bp * (100+110) = 0.005 * 210 = 1.05 + assert t.gross_pnl == pytest.approx(10.0) + assert t.fees == pytest.approx(0.105) + assert t.net_pnl == pytest.approx(9.895) +``` + +- [ ] **Step 2: Run test (deve fallire)** + +Run: `uv run pytest tests/unit/test_backtest_orders.py -v` +Expected: FAIL. 
+ +- [ ] **Step 3: Implementare orders** + +```python +# src/multi_swarm/backtest/__init__.py +``` + +```python +# src/multi_swarm/backtest/orders.py +from __future__ import annotations + +from dataclasses import dataclass +from datetime import datetime +from enum import Enum + + +class Side(str, Enum): + LONG = "long" + SHORT = "short" + FLAT = "flat" + + +@dataclass(frozen=True) +class Order: + ts: datetime + side: Side + size: float + + +@dataclass(frozen=True) +class Position: + side: Side + entry_price: float + size: float + + def unrealized_pnl(self, current_price: float) -> float: + if self.side == Side.LONG: + return (current_price - self.entry_price) * self.size + if self.side == Side.SHORT: + return (self.entry_price - current_price) * self.size + return 0.0 + + +@dataclass(frozen=True) +class Trade: + entry_ts: datetime + exit_ts: datetime + side: Side + size: float + entry_price: float + exit_price: float + fees_bp: float = 5.0 + + @property + def gross_pnl(self) -> float: + if self.side == Side.LONG: + return (self.exit_price - self.entry_price) * self.size + return (self.entry_price - self.exit_price) * self.size + + @property + def fees(self) -> float: + notional_in = self.entry_price * self.size + notional_out = self.exit_price * self.size + return (self.fees_bp / 10000.0) * (notional_in + notional_out) + + @property + def net_pnl(self) -> float: + return self.gross_pnl - self.fees +``` + +- [ ] **Step 4: Run test (deve passare)** + +Run: `uv run pytest tests/unit/test_backtest_orders.py -v` +Expected: PASS. 
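Note (illustration, not a plan step): the fee arithmetic pinned down by `test_trade_realized_pnl_with_fees` can be checked by hand. A minimal standalone sketch of the same round-trip cost model, basis-point fees charged on both entry and exit notional (the helper name is hypothetical):

```python
def round_trip_net_pnl(entry_price: float, exit_price: float, size: float, fees_bp: float) -> float:
    """Net PnL of a long round trip: gross move minus bp fees on both notionals."""
    gross = (exit_price - entry_price) * size
    fees = (fees_bp / 10_000.0) * (entry_price * size + exit_price * size)
    return gross - fees


# 100 -> 110 at 5 bp per side: gross 10, fees 0.0005 * (100 + 110) = 0.105
pnl = round_trip_net_pnl(100.0, 110.0, 1.0, 5.0)
assert abs(pnl - 9.895) < 1e-9
```

Charging on both legs is what makes frequent flipping expensive for the GA, which is exactly the pressure the fee model is meant to apply.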
+ +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/backtest/ tests/unit/test_backtest_orders.py +git commit -m "feat(backtest): Order/Position/Trade dataclasses with fees" +``` + +--- + +## Task 6: Backtest engine event-driven semplificato + +**Files:** +- Create: `src/multi_swarm/backtest/engine.py` +- Test: `tests/unit/test_backtest_engine.py` + +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_backtest_engine.py +from datetime import datetime, timezone +import numpy as np +import pandas as pd +import pytest +from multi_swarm.backtest.engine import BacktestEngine, Signal +from multi_swarm.backtest.orders import Side + + +@pytest.fixture +def trending_ohlcv(): + idx = pd.date_range("2024-01-01", periods=100, freq="1h", tz="UTC") + close = np.linspace(100, 120, 100) + df = pd.DataFrame( + {"open": close, "high": close + 0.5, "low": close - 0.5, "close": close, "volume": 1.0}, + index=idx, + ) + return df + + +def test_engine_no_signals_zero_pnl(trending_ohlcv): + signals = pd.Series([Side.FLAT] * len(trending_ohlcv), index=trending_ohlcv.index) + engine = BacktestEngine(fees_bp=5.0) + result = engine.run(trending_ohlcv, signals) + assert result.equity_curve.iloc[-1] == pytest.approx(0.0) + assert len(result.trades) == 0 + + +def test_engine_long_in_uptrend_makes_profit(trending_ohlcv): + signals = pd.Series([Side.LONG] * len(trending_ohlcv), index=trending_ohlcv.index) + engine = BacktestEngine(fees_bp=5.0) + result = engine.run(trending_ohlcv, signals) + assert result.equity_curve.iloc[-1] > 0 + assert len(result.trades) == 1 + assert result.trades[0].side == Side.LONG + + +def test_engine_position_flips_on_side_change(trending_ohlcv): + half = len(trending_ohlcv) // 2 + signals = pd.Series( + [Side.LONG] * half + [Side.SHORT] * (len(trending_ohlcv) - half), + index=trending_ohlcv.index, + ) + engine = BacktestEngine(fees_bp=5.0) + result = engine.run(trending_ohlcv, signals) + assert len(result.trades) == 2 + assert 
result.trades[0].side == Side.LONG
    assert result.trades[1].side == Side.SHORT


def test_engine_fees_are_subtracted(trending_ohlcv):
    signals = pd.Series([Side.LONG] * len(trending_ohlcv), index=trending_ohlcv.index)
    engine_no_fees = BacktestEngine(fees_bp=0.0)
    engine_fees = BacktestEngine(fees_bp=10.0)
    r1 = engine_no_fees.run(trending_ohlcv, signals)
    r2 = engine_fees.run(trending_ohlcv, signals)
    assert r1.equity_curve.iloc[-1] > r2.equity_curve.iloc[-1]
```

- [ ] **Step 2: Run test (must fail)**

Run: `uv run pytest tests/unit/test_backtest_engine.py -v`
Expected: FAIL.

- [ ] **Step 3: Implement the engine**

```python
# src/multi_swarm/backtest/engine.py
from __future__ import annotations

from dataclasses import dataclass

import pandas as pd

from .orders import Position, Side, Trade


Signal = Side  # semantic alias


@dataclass(frozen=True)
class BacktestResult:
    equity_curve: pd.Series
    returns: pd.Series
    trades: list[Trade]


class BacktestEngine:
    """Synchronous event-driven engine: iterates bar by bar and applies signals
    with a 1-bar delay (signal at t → executed at the open of t+1) to avoid
    lookahead.

    Position sizing: 1 unit per position. Fees applied on entry+exit.
    No leverage, no liquidation, no funding (Phase 1 simplification).
    """

    def __init__(self, fees_bp: float = 5.0):
        self.fees_bp = fees_bp

    def run(self, ohlcv: pd.DataFrame, signals: pd.Series) -> BacktestResult:
        signals = signals.reindex(ohlcv.index).ffill().fillna(Side.FLAT)
        position: Position | None = None
        trades: list[Trade] = []
        equity = 0.0
        equity_history: list[float] = []
        returns_history: list[float] = []
        prev_equity = 0.0

        # Execution with 1-bar delay: the signal at t-1 executes at the open of t.
+ executed_side = pd.Series(Side.FLAT, index=ohlcv.index) + executed_side.iloc[1:] = signals.iloc[:-1].values + + for ts, row in ohlcv.iterrows(): + target_side = executed_side.loc[ts] + current_side = position.side if position else Side.FLAT + + if target_side != current_side: + if position is not None: + trade = Trade( + entry_ts=position_entry_ts, + exit_ts=ts, + side=position.side, + size=position.size, + entry_price=position.entry_price, + exit_price=row["open"], + fees_bp=self.fees_bp, + ) + trades.append(trade) + equity += trade.net_pnl + position = None + if target_side in (Side.LONG, Side.SHORT): + position = Position(side=target_side, entry_price=row["open"], size=1.0) + position_entry_ts = ts + + mark = row["close"] + mtm = position.unrealized_pnl(mark) if position else 0.0 + current_equity = equity + mtm + equity_history.append(current_equity) + returns_history.append(current_equity - prev_equity) + prev_equity = current_equity + + if position is not None: + last_ts = ohlcv.index[-1] + last_close = ohlcv["close"].iloc[-1] + trade = Trade( + entry_ts=position_entry_ts, + exit_ts=last_ts, + side=position.side, + size=position.size, + entry_price=position.entry_price, + exit_price=last_close, + fees_bp=self.fees_bp, + ) + trades.append(trade) + equity += trade.net_pnl + equity_history[-1] = equity + if len(returns_history) >= 2: + returns_history[-1] = equity - equity_history[-2] + + return BacktestResult( + equity_curve=pd.Series(equity_history, index=ohlcv.index, name="equity"), + returns=pd.Series(returns_history, index=ohlcv.index, name="returns"), + trades=trades, + ) +``` + +- [ ] **Step 4: Run test (deve passare)** + +Run: `uv run pytest tests/unit/test_backtest_engine.py -v` +Expected: PASS tutti e 4. 
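Note (illustration, not a plan step): the 1-bar execution delay is the engine's only lookahead guard, and it reduces to shifting the signal series forward by one bar with FLAT on the first bar. A stdlib sketch of exactly that shift (the helper name is hypothetical):

```python
FLAT, LONG, SHORT = "flat", "long", "short"


def executed_sides(signals: list[str]) -> list[str]:
    """Shift signals forward by one bar; nothing executes on the first bar."""
    return [FLAT] + signals[:-1]


signals = [LONG, LONG, SHORT, SHORT]
executed = executed_sides(signals)

# The SHORT decided at bar 2 is only acted on at bar 3's open.
assert executed == [FLAT, LONG, LONG, SHORT]
```

Without this shift, a signal computed from bar t's close would be filled at bar t's own open, which is information leakage; the flip test in Step 1 would still pass, so the delay has to be enforced in the engine, not the tests.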
+ +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/backtest/engine.py tests/unit/test_backtest_engine.py +git commit -m "feat(backtest): event-driven engine with 1-bar exec delay" +``` + +--- + +## Task 7: Metrics base (Sharpe, drawdown, returns) + +**Files:** +- Create: `src/multi_swarm/metrics/__init__.py` +- Create: `src/multi_swarm/metrics/basic.py` +- Test: `tests/unit/test_metrics_basic.py` + +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_metrics_basic.py +import numpy as np +import pandas as pd +import pytest +from multi_swarm.metrics.basic import sharpe_ratio, max_drawdown, total_return + + +def test_sharpe_zero_returns(): + r = pd.Series([0.0] * 100) + assert sharpe_ratio(r, periods_per_year=8760) == 0.0 + + +def test_sharpe_positive_returns(): + np.random.seed(42) + r = pd.Series(np.random.normal(0.001, 0.01, 1000)) + s = sharpe_ratio(r, periods_per_year=8760) + assert s > 0 + + +def test_sharpe_negative_returns(): + np.random.seed(42) + r = pd.Series(np.random.normal(-0.001, 0.01, 1000)) + s = sharpe_ratio(r, periods_per_year=8760) + assert s < 0 + + +def test_max_drawdown_monotonic_up(): + eq = pd.Series([100.0, 105.0, 110.0, 115.0, 120.0]) + assert max_drawdown(eq) == pytest.approx(0.0) + + +def test_max_drawdown_known_curve(): + eq = pd.Series([100.0, 110.0, 90.0, 95.0, 105.0]) + # peak 110, trough 90, drawdown = (110-90)/110 ≈ 0.1818 + assert max_drawdown(eq) == pytest.approx(20.0 / 110.0) + + +def test_total_return(): + eq = pd.Series([100.0, 110.0, 105.0, 120.0]) + assert total_return(eq) == pytest.approx(0.20) +``` + +- [ ] **Step 2: Run test (deve fallire)** + +Run: `uv run pytest tests/unit/test_metrics_basic.py -v` +Expected: FAIL. 

- [ ] **Step 3: Implement base metrics**

```python
# src/multi_swarm/metrics/__init__.py
```

```python
# src/multi_swarm/metrics/basic.py
from __future__ import annotations

import numpy as np
import pandas as pd


def sharpe_ratio(returns: pd.Series, periods_per_year: int = 8760, rf: float = 0.0) -> float:
    """Annualized Sharpe ratio. periods_per_year=8760 for hourly data."""
    excess = returns - rf / periods_per_year
    std = excess.std(ddof=1)
    if std == 0 or np.isnan(std):
        return 0.0
    return float(np.sqrt(periods_per_year) * excess.mean() / std)


def max_drawdown(equity: pd.Series) -> float:
    """Max drawdown as a positive fraction of the running peak."""
    peak = equity.cummax()
    dd = (peak - equity) / peak.replace(0, np.nan)
    dd = dd.fillna(0.0)
    return float(dd.max())


def total_return(equity: pd.Series) -> float:
    if equity.iloc[0] == 0:
        return float(equity.iloc[-1])
    return float(equity.iloc[-1] / equity.iloc[0] - 1.0)
```

- [ ] **Step 4: Run test (must pass)**

Run: `uv run pytest tests/unit/test_metrics_basic.py -v`
Expected: PASS, all 6.
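Note (illustration, not a plan step): the drawdown definition the tests pin down, distance from the running peak as a fraction of that peak, in a pure-Python sketch (the helper name is hypothetical):

```python
def max_drawdown_plain(equity: list[float]) -> float:
    """Max (peak - value) / peak over the curve; 0.0 for a monotonic rise."""
    peak = float("-inf")
    worst = 0.0
    for value in equity:
        peak = max(peak, value)  # running high-water mark
        if peak > 0:
            worst = max(worst, (peak - value) / peak)
    return worst


# Monotonic rise: never below the peak, so zero drawdown.
assert max_drawdown_plain([100.0, 105.0, 110.0]) == 0.0

# Peak 110, trough 90: drawdown = 20 / 110.
assert abs(max_drawdown_plain([100.0, 110.0, 90.0, 95.0, 105.0]) - 20.0 / 110.0) < 1e-12
```

The second assertion is the same known-curve case as `test_max_drawdown_known_curve`, so the sketch and the pandas implementation agree on the reference values.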
+ +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/metrics/ tests/unit/test_metrics_basic.py +git commit -m "feat(metrics): Sharpe + max drawdown + total return" +``` + +--- + +## Task 8: Deflated Sharpe Ratio (Bailey & López de Prado) + +**Files:** +- Create: `src/multi_swarm/metrics/dsr.py` +- Test: `tests/unit/test_metrics_dsr.py` + +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_metrics_dsr.py +import numpy as np +import pandas as pd +import pytest +from multi_swarm.metrics.dsr import deflated_sharpe_ratio, expected_max_sharpe + + +def test_expected_max_sharpe_grows_with_n_trials(): + e1 = expected_max_sharpe(n_trials=1, sharpe_var=1.0) + e10 = expected_max_sharpe(n_trials=10, sharpe_var=1.0) + e100 = expected_max_sharpe(n_trials=100, sharpe_var=1.0) + assert e1 < e10 < e100 + + +def test_dsr_zero_when_sharpe_equals_expected_max(): + np.random.seed(0) + returns = pd.Series(np.random.normal(0, 0.01, 500)) + dsr, p = deflated_sharpe_ratio( + returns, n_trials=10, periods_per_year=8760, sharpe_var=0.0 + ) + # Con sharpe_var=0 e Sharpe stimato vicino a 0, p-value deve essere alto. + assert 0.0 <= p <= 1.0 + + +def test_dsr_significant_for_strong_sharpe(): + np.random.seed(42) + returns = pd.Series(np.random.normal(0.005, 0.005, 1000)) + dsr, p = deflated_sharpe_ratio( + returns, n_trials=5, periods_per_year=8760, sharpe_var=1.0 + ) + # Sharpe atteso > 0 e p-value basso + assert dsr > 0 + assert p < 0.5 +``` + +- [ ] **Step 2: Run test (deve fallire)** + +Run: `uv run pytest tests/unit/test_metrics_dsr.py -v` +Expected: FAIL. 
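Note (illustration, not a plan step): before implementing, it helps to see what the E[max SR] correction does numerically. A stdlib-only sketch of the Bailey & López de Prado expected-maximum formula, using `statistics.NormalDist().inv_cdf` for Φ⁻¹ instead of scipy (the function name mirrors the one implemented in the next step; the sample trial counts are arbitrary):

```python
import math
from statistics import NormalDist

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe_sketch(n_trials: int, sharpe_var: float) -> float:
    """Expected max Sharpe across n_trials skill-less trials."""
    if n_trials < 2:
        return 0.0
    inv = NormalDist().inv_cdf  # standard normal quantile Φ⁻¹
    z1 = inv(1 - 1.0 / n_trials)
    z2 = inv(1 - 1.0 / (n_trials * math.e))
    return math.sqrt(sharpe_var) * ((1 - GAMMA) * z1 + GAMMA * z2)


# The bar a candidate Sharpe must clear rises with the number of strategies tried.
bars = {n: expected_max_sharpe_sketch(n, 1.0) for n in (2, 10, 100)}
assert bars[2] < bars[10] < bars[100]
```

This is the whole point of deflating: a GA that evaluates hundreds of genomes is running hundreds of trials, so the best raw Sharpe it finds is biased upward and must be measured against this rising bar.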

- [ ] **Step 3: Implement DSR**

```python
# src/multi_swarm/metrics/dsr.py
from __future__ import annotations

import numpy as np
import pandas as pd
from scipy import stats

from .basic import sharpe_ratio


EULER_MASCHERONI = 0.5772156649015329


def expected_max_sharpe(n_trials: int, sharpe_var: float) -> float:
    """E[max SR] over n_trials with Sharpe variance sharpe_var (Bailey & Lopez de Prado).

    Formula: sqrt(sharpe_var) * ((1-γ) * Φ⁻¹(1 - 1/N) + γ * Φ⁻¹(1 - 1/(N·e)))
    where γ is the Euler–Mascheroni constant.
    """
    if n_trials < 2:
        return 0.0
    e = np.e
    z1 = stats.norm.ppf(1 - 1.0 / n_trials)
    z2 = stats.norm.ppf(1 - 1.0 / (n_trials * e))
    return float(np.sqrt(sharpe_var) * ((1 - EULER_MASCHERONI) * z1 + EULER_MASCHERONI * z2))


def deflated_sharpe_ratio(
    returns: pd.Series,
    n_trials: int,
    periods_per_year: int = 8760,
    sharpe_var: float = 1.0,
    skewness: float | None = None,
    kurtosis: float | None = None,
) -> tuple[float, float]:
    """Deflated Sharpe Ratio (DSR) and associated p-value.

    Returns (DSR, p_value). DSR is the probability that the observed SR exceeds
    the expected max SR under the null of n_trials skill-less trials; p_value is
    its complement. Low p-values (e.g. < 0.05) indicate significance after the
    multiple-testing correction.
+ """ + n = len(returns) + if n < 30: + return 0.0, 1.0 + + sr = sharpe_ratio(returns, periods_per_year=periods_per_year) + sr_period = sr / np.sqrt(periods_per_year) + + if skewness is None: + skewness = float(stats.skew(returns, bias=False)) + if kurtosis is None: + kurtosis = float(stats.kurtosis(returns, fisher=True, bias=False)) + + sr_expected_max = expected_max_sharpe(n_trials, sharpe_var) / np.sqrt(periods_per_year) + + denom = np.sqrt( + max( + (1 - skewness * sr_period + ((kurtosis - 1) / 4.0) * sr_period**2) / (n - 1), + 1e-12, + ) + ) + z = (sr_period - sr_expected_max) / denom + p_value = float(1.0 - stats.norm.cdf(z)) + dsr = float(stats.norm.cdf(z)) + return dsr, p_value +``` + +- [ ] **Step 4: Run test (deve passare)** + +Run: `uv run pytest tests/unit/test_metrics_dsr.py -v` +Expected: PASS tutti e 3. + +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/metrics/dsr.py tests/unit/test_metrics_dsr.py +git commit -m "feat(metrics): Deflated Sharpe Ratio (Bailey & Lopez de Prado)" +``` + +--- + +## Task 9: Cerbero HTTP client + +**Files:** +- Create: `src/multi_swarm/cerbero/__init__.py` +- Create: `src/multi_swarm/cerbero/client.py` +- Test: `tests/unit/test_cerbero_client.py` + +- [ ] **Step 1: Scrivere test fallente con `responses`** + +```python +# tests/unit/test_cerbero_client.py +import responses +from multi_swarm.cerbero.client import CerberoClient + + +@responses.activate +def test_call_tool_passes_bearer_and_bot_tag(): + responses.add( + responses.POST, + "http://test:9000/mcp-deribit/tools/get_iv_rank", + json={"iv_rank": 0.42}, + status=200, + ) + client = CerberoClient(base_url="http://test:9000", token="tok-xyz", bot_tag="swarm-poc-phase1") + result = client.call_tool("deribit", "get_iv_rank", {"symbol": "BTC-PERPETUAL"}) + assert result == {"iv_rank": 0.42} + req = responses.calls[0].request + assert req.headers["Authorization"] == "Bearer tok-xyz" + assert req.headers["X-Bot-Tag"] == "swarm-poc-phase1" + + 
@responses.activate
def test_call_tool_raises_on_error():
    responses.add(
        responses.POST,
        "http://test:9000/mcp-deribit/tools/get_iv_rank",
        json={"error": "bad"},
        status=400,
    )
    client = CerberoClient(base_url="http://test:9000", token="tok-xyz", bot_tag="swarm-poc-phase1")
    import pytest
    with pytest.raises(RuntimeError):
        client.call_tool("deribit", "get_iv_rank", {})
```

- [ ] **Step 2: Run test (must fail)**

Run: `uv run pytest tests/unit/test_cerbero_client.py -v`
Expected: FAIL.

- [ ] **Step 3: Implement the client**

```python
# src/multi_swarm/cerbero/__init__.py
```

```python
# src/multi_swarm/cerbero/client.py
from __future__ import annotations

from typing import Any

import requests
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential


class CerberoClient:
    """Minimal HTTP client for the Cerbero MCP unified server.

    Uses `requests` (not httpx): the `responses` library in the testing stack
    only intercepts `requests` traffic, so the tests above depend on this.
    """

    def __init__(
        self,
        base_url: str,
        token: str,
        bot_tag: str,
        timeout_seconds: float = 10.0,
    ):
        self.base_url = base_url.rstrip("/")
        self.token = token
        self.bot_tag = bot_tag
        self.timeout_seconds = timeout_seconds
        self._session = requests.Session()
        self._session.headers.update(
            {
                "Authorization": f"Bearer {token}",
                "X-Bot-Tag": bot_tag,
                "Content-Type": "application/json",
            }
        )

    def close(self) -> None:
        self._session.close()

    def __enter__(self) -> CerberoClient:
        return self

    def __exit__(self, *exc: object) -> None:
        self.close()

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=0.5, min=0.5, max=4.0),
        retry=retry_if_exception_type(requests.ConnectionError),
        reraise=True,
    )
    def call_tool(self, exchange: str, tool: str, args: dict[str, Any]) -> Any:
        url = f"{self.base_url}/mcp-{exchange}/tools/{tool}"
        resp = self._session.post(url, json=args, timeout=self.timeout_seconds)
        if resp.status_code >= 400:
            raise RuntimeError(f"Cerbero {exchange}/{tool} returned {resp.status_code}: {resp.text}")
        return resp.json()
```

- [ ] **Step 4: Run test 
(must pass)**

Run: `uv run pytest tests/unit/test_cerbero_client.py -v`
Expected: PASS.

- [ ] **Step 5: Commit**

```bash
git add src/multi_swarm/cerbero/ tests/unit/test_cerbero_client.py
git commit -m "feat(cerbero): HTTP client with bearer + bot-tag + retry"
```

---

## Task 10: Cerbero tools wrapper (indicators used in Phase 1)

**Files:**
- Create: `src/multi_swarm/cerbero/tools.py`
- Test: `tests/unit/test_cerbero_tools.py`

In Phase 1 agents may request a limited subset of indicators: SMA, RSI, ATR, MACD (technical), realized_vol (volatility), funding_rate (microstructure). The wrapper exposes one Python function per indicator, hiding the HTTP details.

- [ ] **Step 1: Write the failing test**

```python
# tests/unit/test_cerbero_tools.py
import pytest
from multi_swarm.cerbero.tools import CerberoTools


def test_tools_dispatch_sma(mocker):
    fake_client = mocker.MagicMock()
    fake_client.call_tool.return_value = {"value": 100.0}
    t = CerberoTools(fake_client)
    out = t.sma(exchange="bybit", symbol="BTCUSDT", timeframe="1h", length=20)
    fake_client.call_tool.assert_called_once_with(
        "bybit", "sma", {"symbol": "BTCUSDT", "timeframe": "1h", "length": 20}
    )
    assert out == {"value": 100.0}


def test_tools_dispatch_rsi(mocker):
    fake_client = mocker.MagicMock()
    fake_client.call_tool.return_value = {"value": 55.0}
    t = CerberoTools(fake_client)
    out = t.rsi(exchange="bybit", symbol="BTCUSDT", timeframe="1h", length=14)
    fake_client.call_tool.assert_called_once_with(
        "bybit", "rsi", {"symbol": "BTCUSDT", "timeframe": "1h", "length": 14}
    )
    assert out == {"value": 55.0}


def test_tools_unknown_raises(mocker):
    fake_client = mocker.MagicMock()
    t = CerberoTools(fake_client)
    with pytest.raises(AttributeError):
        t.nonexistent_tool()  # type: ignore[attr-defined]
```

- [ ] **Step 2: Run test (must fail)**

Run: `uv run pytest tests/unit/test_cerbero_tools.py -v`
Expected: 
FAIL.

- [ ] **Step 3: Implement the wrapper**

```python
# src/multi_swarm/cerbero/tools.py
from __future__ import annotations

from typing import Any

from .client import CerberoClient


class CerberoTools:
    """Subset of MCP tools exposed to agents in Phase 1."""

    def __init__(self, client: CerberoClient):
        self._client = client

    def sma(self, exchange: str, symbol: str, timeframe: str, length: int) -> Any:
        return self._client.call_tool(
            exchange, "sma", {"symbol": symbol, "timeframe": timeframe, "length": length}
        )

    def rsi(self, exchange: str, symbol: str, timeframe: str, length: int = 14) -> Any:
        return self._client.call_tool(
            exchange, "rsi", {"symbol": symbol, "timeframe": timeframe, "length": length}
        )

    def atr(self, exchange: str, symbol: str, timeframe: str, length: int = 14) -> Any:
        return self._client.call_tool(
            exchange, "atr", {"symbol": symbol, "timeframe": timeframe, "length": length}
        )

    def macd(self, exchange: str, symbol: str, timeframe: str, fast: int = 12, slow: int = 26, signal: int = 9) -> Any:
        return self._client.call_tool(
            exchange, "macd",
            {"symbol": symbol, "timeframe": timeframe, "fast": fast, "slow": slow, "signal": signal},
        )

    def realized_vol(self, exchange: str, symbol: str, timeframe: str, window: int = 24) -> Any:
        return self._client.call_tool(
            exchange, "realized_vol",
            {"symbol": symbol, "timeframe": timeframe, "window": window},
        )

    def funding_rate(self, exchange: str, symbol: str) -> Any:
        return self._client.call_tool(exchange, "funding_rate", {"symbol": symbol})
```

- [ ] **Step 4: Run test (must pass)**

Run: `uv run pytest tests/unit/test_cerbero_tools.py -v`
Expected: PASS.
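
The wrapper is pure dispatch: every method just names a tool and packs its kwargs into the MCP payload. The pattern in isolation, as a self-contained sketch (`RecordingClient` and `ToolsSketch` are hypothetical stand-ins for `CerberoClient`/`CerberoTools`, so this runs without a server):

```python
# Dispatch pattern behind CerberoTools: a method is only a tool name plus
# a payload; the client does the transport.
from typing import Any


class RecordingClient:
    """Stand-in client that records every call instead of doing HTTP."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str, dict[str, Any]]] = []

    def call_tool(self, exchange: str, tool: str, args: dict[str, Any]) -> Any:
        self.calls.append((exchange, tool, args))
        return {"value": 42.0}


class ToolsSketch:
    def __init__(self, client: RecordingClient) -> None:
        self._client = client

    def sma(self, exchange: str, symbol: str, timeframe: str, length: int) -> Any:
        return self._client.call_tool(
            exchange, "sma", {"symbol": symbol, "timeframe": timeframe, "length": length}
        )


client = RecordingClient()
out = ToolsSketch(client).sma("bybit", "BTCUSDT", "1h", 20)
```

This is the same shape the `mocker.MagicMock()` tests above assert on.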

- [ ] **Step 5: Commit**

```bash
git add src/multi_swarm/cerbero/tools.py tests/unit/test_cerbero_tools.py
git commit -m "feat(cerbero): tools wrapper for Phase 1 indicator subset"
```

---

## Task 11: S-expression protocol — grammar and parser

**Files:**
- Create: `src/multi_swarm/protocol/__init__.py`
- Create: `src/multi_swarm/protocol/grammar.py`
- Create: `src/multi_swarm/protocol/parser.py`
- Test: `tests/unit/test_protocol_parser.py`

**Phase 1 grammar (15 verbs)**: `entry-long`, `entry-short`, `exit`, `flat`, `when`, `and`, `or`, `not`, `gt`, `lt`, `eq`, `feature`, `indicator`, `crossover`, `crossunder`.

Example strategy:
```lisp
(strategy
  (when (and (gt (indicator rsi 14) 70.0)
             (crossunder (feature close) (indicator sma 20)))
    (entry-short))
  (when (lt (indicator rsi 14) 30.0)
    (entry-long))
  (when (eq (indicator rsi 14) 50.0)
    (exit)))
```

- [ ] **Step 1: Write the failing test**

```python
# tests/unit/test_protocol_parser.py
import pytest
from multi_swarm.protocol.parser import parse_strategy, ParseError
from multi_swarm.protocol.grammar import VERBS


def test_grammar_has_15_verbs():
    assert len(VERBS) == 15


def test_parse_simple_strategy():
    src = "(strategy (when (gt (indicator rsi 14) 70.0) (entry-short)))"
    ast = parse_strategy(src)
    assert ast.kind == "strategy"
    assert len(ast.rules) == 1
    rule = ast.rules[0]
    assert rule.kind == "when"
    assert rule.condition.kind == "gt"
    assert rule.action.kind == "entry-short"


def test_parse_multiple_rules():
    src = """
    (strategy
      (when (gt (indicator rsi 14) 70.0) (entry-short))
      (when (lt (indicator rsi 14) 30.0) (entry-long)))
    """
    ast = parse_strategy(src)
    assert len(ast.rules) == 2


def test_parse_unknown_verb_raises():
    src = "(strategy (when (frobnicate 1 2) (entry-long)))"
    with pytest.raises(ParseError):
        parse_strategy(src)


def test_parse_malformed_raises():
    src = "(strategy (when"
    with 
pytest.raises(ParseError):
        parse_strategy(src)


def test_parse_empty_strategy_raises():
    src = "(strategy)"
    with pytest.raises(ParseError):
        parse_strategy(src)
```

- [ ] **Step 2: Run test (must fail)**

Run: `uv run pytest tests/unit/test_protocol_parser.py -v`
Expected: FAIL.

- [ ] **Step 3: Implement the grammar and parser**

```python
# src/multi_swarm/protocol/__init__.py
```

```python
# src/multi_swarm/protocol/grammar.py
from __future__ import annotations

VERBS: frozenset[str] = frozenset(
    {
        "entry-long",
        "entry-short",
        "exit",
        "flat",
        "when",
        "and",
        "or",
        "not",
        "gt",
        "lt",
        "eq",
        "feature",
        "indicator",
        "crossover",
        "crossunder",
    }
)

ACTION_VERBS: frozenset[str] = frozenset({"entry-long", "entry-short", "exit", "flat"})
LOGICAL_VERBS: frozenset[str] = frozenset({"and", "or", "not"})
COMPARATOR_VERBS: frozenset[str] = frozenset({"gt", "lt", "eq"})
DATA_VERBS: frozenset[str] = frozenset({"feature", "indicator", "crossover", "crossunder"})
```

```python
# src/multi_swarm/protocol/parser.py
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

import sexpdata

from .grammar import ACTION_VERBS, VERBS


class ParseError(Exception):
    pass


@dataclass
class Node:
    kind: str
    args: list[Any] = field(default_factory=list)


@dataclass
class Rule:
    kind: str  # "when"
    condition: Node
    action: Node


@dataclass
class Strategy:
    kind: str  # "strategy"
    rules: list[Rule]


def _to_node(token: Any) -> Node | float | int | str:
    if isinstance(token, sexpdata.Symbol):
        name = token.value()
        return Node(kind=name, args=[])
    if isinstance(token, list):
        if not token:
            raise ParseError("Empty s-expression")
        head = token[0]
        if not isinstance(head, sexpdata.Symbol):
            raise ParseError(f"Non-symbol head: {head!r}")
        name = head.value()
        if name not 
in VERBS and name != "strategy":
            raise ParseError(f"Unknown verb: {name}")
        return Node(kind=name, args=[_to_node(arg) for arg in token[1:]])
    return token


def parse_strategy(src: str) -> Strategy:
    try:
        parsed = sexpdata.loads(src)
    except Exception as e:
        raise ParseError(f"sexp parse error: {e}") from e

    if not isinstance(parsed, list) or not parsed:
        raise ParseError("Top-level must be (strategy ...)")
    head = parsed[0]
    if not isinstance(head, sexpdata.Symbol) or head.value() != "strategy":
        raise ParseError("Top-level must start with 'strategy'")

    raw_rules = parsed[1:]
    if not raw_rules:
        raise ParseError("Strategy must contain at least one rule")

    rules: list[Rule] = []
    for raw in raw_rules:
        if not isinstance(raw, list) or len(raw) != 3:
            raise ParseError(f"Rule must be (when <condition> <action>): {raw!r}")
        head_r = raw[0]
        if not isinstance(head_r, sexpdata.Symbol) or head_r.value() != "when":
            raise ParseError(f"Rule must start with 'when': {raw!r}")
        cond = _to_node(raw[1])
        action = _to_node(raw[2])
        if not isinstance(cond, Node):
            raise ParseError(f"Condition must be a node: {cond!r}")
        if not isinstance(action, Node):
            raise ParseError(f"Action must be a node: {action!r}")
        if action.kind not in ACTION_VERBS:
            raise ParseError(f"Action must be one of {ACTION_VERBS}, got {action.kind}")
        rules.append(Rule(kind="when", condition=cond, action=action))

    return Strategy(kind="strategy", rules=rules)
```

- [ ] **Step 4: Run test (must pass)**

Run: `uv run pytest tests/unit/test_protocol_parser.py -v`
Expected: all 6 PASS.

- [ ] **Step 5: Commit**

```bash
git add src/multi_swarm/protocol/ tests/unit/test_protocol_parser.py
git commit -m "feat(protocol): S-expression grammar (15 verbs) + parser"
```

---

## Task 12: Protocol — semantic validator

**Files:**
- Create: `src/multi_swarm/protocol/validator.py`
- Test: `tests/unit/test_protocol_validator.py`

The validator checks that verb arguments have the correct types (e.g. `gt` requires 2 numeric expressions, `indicator` requires a valid name plus an integer length).

- [ ] **Step 1: Write the failing test**

```python
# tests/unit/test_protocol_validator.py
import pytest
from multi_swarm.protocol.parser import parse_strategy
from multi_swarm.protocol.validator import validate_strategy, ValidationError


def test_valid_strategy_passes():
    src = "(strategy (when (gt (indicator rsi 14) 70.0) (entry-short)))"
    ast = parse_strategy(src)
    validate_strategy(ast)  # no exception


def test_indicator_unknown_name_fails():
    src = "(strategy (when (gt (indicator wibble 14) 70.0) (entry-short)))"
    ast = parse_strategy(src)
    with pytest.raises(ValidationError, match="unknown indicator"):
        validate_strategy(ast)


def test_indicator_wrong_arity_fails():
    src = "(strategy (when (gt (indicator rsi) 70.0) (entry-short)))"
    ast = parse_strategy(src)
    with pytest.raises(ValidationError):
        validate_strategy(ast)


def test_comparator_wrong_arity_fails():
    src = "(strategy (when (gt 1.0) (entry-long)))"
    ast = parse_strategy(src)
    with pytest.raises(ValidationError):
        validate_strategy(ast)


def test_feature_unknown_column_fails():
    src = "(strategy (when (gt (feature wibble) 100.0) (entry-long)))"
    ast = parse_strategy(src)
    with pytest.raises(ValidationError, match="unknown feature"):
        validate_strategy(ast)
```

- [ ] **Step 2: Run test (must fail)**

Run: `uv run pytest tests/unit/test_protocol_validator.py -v`
Expected: FAIL.

- [ ] **Step 3: Implement the validator**

```python
# src/multi_swarm/protocol/validator.py
from __future__ import annotations

from .parser import Node, Strategy
from .grammar import COMPARATOR_VERBS, LOGICAL_VERBS

KNOWN_INDICATORS: frozenset[str] = frozenset({"sma", "rsi", "atr", "macd", "realized_vol"})
KNOWN_FEATURES: frozenset[str] = frozenset({"open", "high", "low", "close", "volume"})


class ValidationError(Exception):
    pass


def validate_strategy(strategy: Strategy) -> None:
    for rule in strategy.rules:
        _validate_node(rule.condition, expect_bool=True)


def _validate_node(node: Node, expect_bool: bool) -> None:
    if node.kind in LOGICAL_VERBS:
        if node.kind == "not":
            if len(node.args) != 1:
                raise ValidationError(f"'not' needs 1 arg, got {len(node.args)}")
            _validate_node(node.args[0], expect_bool=True)
        else:
            if len(node.args) < 2:
                raise ValidationError(f"'{node.kind}' needs >=2 args")
            for a in node.args:
                _validate_node(a, expect_bool=True)
        return

    if node.kind in COMPARATOR_VERBS:
        if len(node.args) != 2:
            raise ValidationError(f"'{node.kind}' needs 2 args, got {len(node.args)}")
        for a in node.args:
            if isinstance(a, Node):
                _validate_node(a, expect_bool=False)
        return

    if node.kind in {"crossover", "crossunder"}:
        if len(node.args) != 2:
            raise ValidationError(f"'{node.kind}' needs 2 args")
        for a in node.args:
            if isinstance(a, Node):
                _validate_node(a, expect_bool=False)
        return

    if node.kind == "indicator":
        if len(node.args) < 2:
            raise ValidationError("'indicator' needs >=2 args (name, length)")
        name_node = node.args[0]
        if isinstance(name_node, Node):
            ind_name = name_node.kind
        else:
            ind_name = str(name_node)
        if ind_name not in KNOWN_INDICATORS:
            raise ValidationError(f"unknown indicator: {ind_name}")
        return

    if node.kind == "feature":
        if len(node.args) != 1:
            raise ValidationError("'feature' needs 1 arg")
        feat_node = node.args[0]
        if 
isinstance(feat_node, Node):
            feat_name = feat_node.kind
        else:
            feat_name = str(feat_node)
        if feat_name not in KNOWN_FEATURES:
            raise ValidationError(f"unknown feature: {feat_name}")
        return

    raise ValidationError(f"unexpected node kind in expression: {node.kind}")
```

- [ ] **Step 4: Run test (must pass)**

Run: `uv run pytest tests/unit/test_protocol_validator.py -v`
Expected: all 5 PASS.

- [ ] **Step 5: Commit**

```bash
git add src/multi_swarm/protocol/validator.py tests/unit/test_protocol_validator.py
git commit -m "feat(protocol): semantic validator for AST"
```

---

## Task 13: Protocol — compiler AST → callable rules

**Files:**
- Create: `src/multi_swarm/protocol/compiler.py`
- Test: `tests/unit/test_protocol_compiler.py`

The compiler turns the AST into a function `(ohlcv_window: pd.DataFrame) -> pd.Series[Side]` that, given a market snapshot, returns the position decision per timestamp. Indicators are computed by a built-in local library (no Cerbero in the compiler: Cerbero is called by agents for inspection, not by the compiler, which must be fast and deterministic).

- [ ] **Step 1: Write the failing test**

```python
# tests/unit/test_protocol_compiler.py
import numpy as np
import pandas as pd
import pytest
from multi_swarm.protocol.parser import parse_strategy
from multi_swarm.protocol.compiler import compile_strategy
from multi_swarm.backtest.orders import Side


@pytest.fixture
def ohlcv():
    idx = pd.date_range("2024-01-01", periods=200, freq="1h", tz="UTC")
    close = np.linspace(100, 120, 200)
    return pd.DataFrame(
        {"open": close, "high": close + 0.5, "low": close - 0.5, "close": close, "volume": 1.0},
        index=idx,
    )


def test_compile_simple_long(ohlcv):
    src = "(strategy (when (gt (indicator rsi 14) 50.0) (entry-long)))"
    ast = parse_strategy(src)
    fn = compile_strategy(ast)
    signals = fn(ohlcv)
    assert isinstance(signals, pd.Series)
    # On a monotone uptrend RSI saturates at 100; skip the warm-up bar at index 0.
    assert (signals.iloc[1:] == Side.LONG).all()


def test_compile_no_match_is_flat(ohlcv):
    src = "(strategy (when (gt (indicator rsi 14) 1000.0) (entry-long)))"
    ast = parse_strategy(src)
    fn = compile_strategy(ast)
    signals = fn(ohlcv)
    assert (signals == Side.FLAT).all()


def test_compile_two_rules_priority(ohlcv):
    src = """
    (strategy
      (when (gt (feature close) 110.0) (entry-long))
      (when (lt (feature close) 105.0) (entry-short)))
    """
    ast = parse_strategy(src)
    fn = compile_strategy(ast)
    signals = fn(ohlcv)
    last = signals.iloc[-1]
    assert last == Side.LONG  # final close is 120, rule 1 matches
```

- [ ] **Step 2: Run test (must fail)**

Run: `uv run pytest tests/unit/test_protocol_compiler.py -v`
Expected: FAIL.

- [ ] **Step 3: Implement the compiler**

```python
# src/multi_swarm/protocol/compiler.py
from __future__ import annotations

from typing import Callable

import numpy as np
import pandas as pd

from ..backtest.orders import Side
from .parser import Node, Strategy


def _sma(s: pd.Series, length: int) -> pd.Series:
    return s.rolling(length, min_periods=1).mean()


def _rsi(s: pd.Series, length: int) -> pd.Series:
    delta = s.diff()
    up = delta.clip(lower=0)
    down = -delta.clip(upper=0)
    roll_up = up.ewm(alpha=1.0 / length, adjust=False).mean()
    roll_down = down.ewm(alpha=1.0 / length, adjust=False).mean()
    rs = roll_up / roll_down.replace(0, np.nan)
    rsi = 100 - (100 / (1 + rs))
    # roll_down == 0 means no losses in the window: RSI saturates at 100
    # (without this, a monotone uptrend would yield NaN everywhere).
    return rsi.where(roll_down != 0, 100.0)


def _atr(df: pd.DataFrame, length: int) -> pd.Series:
    h_l = df["high"] - df["low"]
    h_c = (df["high"] - df["close"].shift()).abs()
    l_c = (df["low"] - df["close"].shift()).abs()
    tr = pd.concat([h_l, h_c, l_c], axis=1).max(axis=1)
    return tr.ewm(alpha=1.0 / length, adjust=False).mean()


def _realized_vol(s: pd.Series, window: int) -> pd.Series:
    returns = s.pct_change()
    return returns.rolling(window, min_periods=1).std() * np.sqrt(window)


INDICATOR_FNS: dict[str, Callable[..., pd.Series]] = {
    "sma": lambda df, length: _sma(df["close"], length),
    "rsi": lambda df, length: _rsi(df["close"], length),
    "atr": lambda df, length: _atr(df, length),
    "realized_vol": lambda df, length: _realized_vol(df["close"], length),
    # Simplified MACD line built on SMAs (no EMA, no signal line) for Phase 1.
    "macd": lambda df, fast=12, slow=26: (
        _sma(df["close"], fast) - _sma(df["close"], slow)
    ),
}


def _eval_node(node: Node, df: pd.DataFrame) -> pd.Series:
    if node.kind == "feature":
        feat = node.args[0]
        feat_name = feat.kind if isinstance(feat, Node) else str(feat)
        return df[feat_name]

    if node.kind == "indicator":
        name_node = node.args[0]
        ind_name = name_node.kind if isinstance(name_node, Node) else str(name_node)
        params = [a for a in node.args[1:] if not isinstance(a, Node)]
        return 
INDICATOR_FNS[ind_name](df, *params) + + if node.kind == "gt": + a = _eval_node(node.args[0], df) if isinstance(node.args[0], Node) else _to_series(node.args[0], df) + b = _eval_node(node.args[1], df) if isinstance(node.args[1], Node) else _to_series(node.args[1], df) + return (a > b).astype(bool) + + if node.kind == "lt": + a = _eval_node(node.args[0], df) if isinstance(node.args[0], Node) else _to_series(node.args[0], df) + b = _eval_node(node.args[1], df) if isinstance(node.args[1], Node) else _to_series(node.args[1], df) + return (a < b).astype(bool) + + if node.kind == "eq": + a = _eval_node(node.args[0], df) if isinstance(node.args[0], Node) else _to_series(node.args[0], df) + b = _eval_node(node.args[1], df) if isinstance(node.args[1], Node) else _to_series(node.args[1], df) + return (a == b).astype(bool) + + if node.kind == "and": + result = pd.Series(True, index=df.index) + for a in node.args: + s = _eval_node(a, df) if isinstance(a, Node) else pd.Series(bool(a), index=df.index) + result &= s.fillna(False).astype(bool) + return result + + if node.kind == "or": + result = pd.Series(False, index=df.index) + for a in node.args: + s = _eval_node(a, df) if isinstance(a, Node) else pd.Series(bool(a), index=df.index) + result |= s.fillna(False).astype(bool) + return result + + if node.kind == "not": + a = node.args[0] + s = _eval_node(a, df) if isinstance(a, Node) else pd.Series(bool(a), index=df.index) + return (~s.fillna(False).astype(bool)) + + if node.kind == "crossover": + a = _eval_node(node.args[0], df) if isinstance(node.args[0], Node) else _to_series(node.args[0], df) + b = _eval_node(node.args[1], df) if isinstance(node.args[1], Node) else _to_series(node.args[1], df) + return ((a > b) & (a.shift() <= b.shift())).fillna(False).astype(bool) + + if node.kind == "crossunder": + a = _eval_node(node.args[0], df) if isinstance(node.args[0], Node) else _to_series(node.args[0], df) + b = _eval_node(node.args[1], df) if isinstance(node.args[1], Node) else 
_to_series(node.args[1], df)
        return ((a < b) & (a.shift() >= b.shift())).fillna(False).astype(bool)

    raise RuntimeError(f"unsupported node in compiler: {node.kind}")


def _to_series(value: object, df: pd.DataFrame) -> pd.Series:
    return pd.Series(float(value), index=df.index)  # type: ignore[arg-type]


def _action_to_side(action: Node) -> Side:
    return {
        "entry-long": Side.LONG,
        "entry-short": Side.SHORT,
        "exit": Side.FLAT,
        "flat": Side.FLAT,
    }[action.kind]


def compile_strategy(strategy: Strategy) -> Callable[[pd.DataFrame], pd.Series]:
    """Compile the strategy into a function df → Series[Side].

    Rules are evaluated in order; the first that matches wins for each timestamp.
    Defaults to Side.FLAT when no rule matches.
    """

    def fn(df: pd.DataFrame) -> pd.Series:
        result = pd.Series(Side.FLAT, index=df.index, dtype=object)
        already_set = pd.Series(False, index=df.index)
        for rule in strategy.rules:
            match = _eval_node(rule.condition, df)
            target = _action_to_side(rule.action)
            apply_mask = match & ~already_set
            result[apply_mask] = target
            already_set |= apply_mask
        return result

    return fn
```

- [ ] **Step 4: Run test (must pass)**

Run: `uv run pytest tests/unit/test_protocol_compiler.py -v`
Expected: all 3 PASS.
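
The first-match-wins resolution can be isolated to its masking trick: earlier rules claim timestamps, later rules only fill whatever is left. A self-contained sketch with plain strings in place of `Side` and toy boolean masks in place of compiled conditions:

```python
# First-match-wins rule resolution: a rule may only write to timestamps
# that no earlier rule has already claimed.
import pandas as pd

idx = pd.RangeIndex(5)
result = pd.Series("FLAT", index=idx, dtype=object)
already_set = pd.Series(False, index=idx)

rules = [
    (pd.Series([True, False, True, False, False]), "LONG"),   # rule 1
    (pd.Series([True, True, False, True, False]), "SHORT"),   # rule 2
]
for match, target in rules:
    apply_mask = match & ~already_set   # only unclaimed timestamps
    result[apply_mask] = target
    already_set |= apply_mask
```

At index 0 both rules match but rule 1 wins; index 4 matches nothing and stays FLAT.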

- [ ] **Step 5: Commit**

```bash
git add src/multi_swarm/protocol/compiler.py tests/unit/test_protocol_compiler.py
git commit -m "feat(protocol): AST compiler to (df -> Series[Side]) signal fn"
```

---

## Task 14: Genome dataclass + serialization

**Files:**
- Create: `src/multi_swarm/genome/__init__.py`
- Create: `src/multi_swarm/genome/hypothesis.py`
- Test: `tests/unit/test_genome_hypothesis.py`

- [ ] **Step 1: Write the failing test**

```python
# tests/unit/test_genome_hypothesis.py
from multi_swarm.genome.hypothesis import HypothesisAgentGenome, ModelTier


def test_genome_creation_defaults():
    g = HypothesisAgentGenome(
        system_prompt="Think like a physicist.",
        feature_access=["close", "volume"],
        temperature=0.9,
        top_p=0.95,
        model_tier=ModelTier.C,
        lookback_window=200,
        cognitive_style="physicist",
    )
    assert g.id is not None
    assert g.parent_ids == []
    assert g.generation == 0


def test_genome_serialization_roundtrip():
    g = HypothesisAgentGenome(
        system_prompt="Think like a biologist.",
        feature_access=["close", "high", "low"],
        temperature=1.1,
        top_p=0.9,
        model_tier=ModelTier.C,
        lookback_window=300,
        cognitive_style="biologist",
        parent_ids=["abc"],
        generation=5,
    )
    payload = g.to_dict()
    g2 = HypothesisAgentGenome.from_dict(payload)
    assert g2.system_prompt == g.system_prompt
    assert g2.feature_access == g.feature_access
    assert g2.temperature == g.temperature
    assert g2.parent_ids == g.parent_ids
    assert g2.generation == g.generation
    assert g2.id == g.id


def test_genome_id_is_deterministic_on_content():
    g1 = HypothesisAgentGenome(
        system_prompt="X", feature_access=["close"], temperature=0.5,
        top_p=0.9, model_tier=ModelTier.C, lookback_window=100, cognitive_style="x",
    )
    g2 = HypothesisAgentGenome(
        system_prompt="X", feature_access=["close"], temperature=0.5,
        top_p=0.9, model_tier=ModelTier.C, lookback_window=100, cognitive_style="x",
    )
    assert g1.id == g2.id
```

- [ ] **Step 2: Run test (must fail)**

Run: `uv run pytest tests/unit/test_genome_hypothesis.py -v`
Expected: FAIL.

- [ ] **Step 3: Implement the genome**

```python
# src/multi_swarm/genome/__init__.py
```

```python
# src/multi_swarm/genome/hypothesis.py
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class ModelTier(str, Enum):
    B = "B"  # Sonnet 4.6 via Anthropic
    C = "C"  # Qwen 2.5 72B via OpenRouter


@dataclass
class HypothesisAgentGenome:
    system_prompt: str
    feature_access: list[str]
    temperature: float
    top_p: float
    model_tier: ModelTier
    lookback_window: int
    cognitive_style: str
    parent_ids: list[str] = field(default_factory=list)
    generation: int = 0
    id: str = ""

    def __post_init__(self) -> None:
        if not self.id:
            self.id = self._compute_id()

    def _compute_id(self) -> str:
        payload = {
            "system_prompt": self.system_prompt,
            "feature_access": sorted(self.feature_access),
            "temperature": round(self.temperature, 4),
            "top_p": round(self.top_p, 4),
            "model_tier": self.model_tier.value,
            "lookback_window": self.lookback_window,
            "cognitive_style": self.cognitive_style,
        }
        s = json.dumps(payload, sort_keys=True)
        return hashlib.sha1(s.encode()).hexdigest()[:16]

    def to_dict(self) -> dict[str, Any]:
        return {
            "id": self.id,
            "system_prompt": self.system_prompt,
            "feature_access": self.feature_access,
            "temperature": self.temperature,
            "top_p": self.top_p,
            "model_tier": self.model_tier.value,
            "lookback_window": self.lookback_window,
            "cognitive_style": self.cognitive_style,
            "parent_ids": self.parent_ids,
            "generation": self.generation,
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> HypothesisAgentGenome:
        return cls(
            system_prompt=data["system_prompt"],
            feature_access=list(data["feature_access"]),
            temperature=float(data["temperature"]),
            top_p=float(data["top_p"]),
            model_tier=ModelTier(data["model_tier"]),
            lookback_window=int(data["lookback_window"]),
            cognitive_style=data["cognitive_style"],
            parent_ids=list(data.get("parent_ids", [])),
            generation=int(data.get("generation", 0)),
            id=data.get("id", ""),
        )
```

- [ ] **Step 4: Run test (must pass)**

Run: `uv run pytest tests/unit/test_genome_hypothesis.py -v`
Expected: PASS.

- [ ] **Step 5: Commit**

```bash
git add src/multi_swarm/genome/ tests/unit/test_genome_hypothesis.py
git commit -m "feat(genome): HypothesisAgentGenome with deterministic id and serde"
```

---

## Task 15: Genome — mutation operators

**Files:**
- Create: `src/multi_swarm/genome/mutation.py`
- Test: `tests/unit/test_genome_mutation.py`

Mutation operators (one picked at random for each mutation):
1. `mutate_temperature`: ±0.1, clipped to [0.6, 1.3].
2. `mutate_lookback`: ±50 bars, clipped to [50, 500].
3. `mutate_feature_access`: add/remove one feature from a fixed pool.
4. `mutate_cognitive_style`: switch style within a fixed pool of 6.
5. `mutate_prompt_chunk`: the LLM rewrites part of the system_prompt (handled elsewhere, skipped for now: placeholder only).

In Phase 1 we only mutate the numeric/discrete fields, deterministically. LLM prompt mutations are delegated to the `agents` module when the "mutator agent" is invoked.
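
Operator 1 is a clipped random walk; the clamp, not the step size, is what keeps the parameter inside its search range no matter how many generations pass. A stdlib-only sketch (`step_temperature` is an illustrative name, not part of the module):

```python
# Bounded +/-0.1 temperature mutation: a random step clamped to [0.6, 1.3],
# so repeated mutations can never leave the allowed range.
import random


def step_temperature(t: float, rng: random.Random) -> float:
    delta = rng.choice([-0.1, 0.1])
    return max(0.6, min(1.3, t + delta))


rng = random.Random(0)
t = 0.9
history = []
for _ in range(100):
    t = step_temperature(t, rng)
    history.append(t)
```

The same shape is what `test_mutate_temperature_within_bounds` below checks over 50 draws.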

- [ ] **Step 1: Write the failing test**

```python
# tests/unit/test_genome_mutation.py
import random
import pytest
from multi_swarm.genome.hypothesis import HypothesisAgentGenome, ModelTier
from multi_swarm.genome.mutation import (
    mutate_temperature,
    mutate_lookback,
    mutate_feature_access,
    mutate_cognitive_style,
    FEATURE_POOL,
    COGNITIVE_STYLES,
)


@pytest.fixture
def base_genome():
    return HypothesisAgentGenome(
        system_prompt="x",
        feature_access=["close"],
        temperature=0.9,
        top_p=0.95,
        model_tier=ModelTier.C,
        lookback_window=200,
        cognitive_style="physicist",
    )


def test_mutate_temperature_within_bounds(base_genome):
    rng = random.Random(0)
    for _ in range(50):
        new = mutate_temperature(base_genome, rng)
        assert 0.6 <= new.temperature <= 1.3


def test_mutate_lookback_within_bounds(base_genome):
    rng = random.Random(0)
    for _ in range(50):
        new = mutate_lookback(base_genome, rng)
        assert 50 <= new.lookback_window <= 500


def test_mutate_feature_access_changes_set(base_genome):
    rng = random.Random(0)
    new = mutate_feature_access(base_genome, rng)
    # The operator must always change the set (a single-feature genome can only gain).
    assert set(new.feature_access) != set(base_genome.feature_access)
    assert all(f in FEATURE_POOL for f in new.feature_access)
    assert len(new.feature_access) >= 1


def test_mutate_cognitive_style_uses_pool(base_genome):
    rng = random.Random(0)
    new = mutate_cognitive_style(base_genome, rng)
    assert new.cognitive_style in COGNITIVE_STYLES


def test_mutation_preserves_lineage(base_genome):
    rng = random.Random(0)
    new = mutate_temperature(base_genome, rng)
    assert base_genome.id in new.parent_ids
    assert new.id != base_genome.id
```

- [ ] **Step 2: Run test (must fail)**

Run: `uv run pytest tests/unit/test_genome_mutation.py -v`
Expected: FAIL.
+ +- [ ] **Step 3: Implementare mutazioni** + +```python +# src/multi_swarm/genome/mutation.py +from __future__ import annotations + +import random + +from .hypothesis import HypothesisAgentGenome, ModelTier + + +FEATURE_POOL: tuple[str, ...] = ("open", "high", "low", "close", "volume") + +COGNITIVE_STYLES: tuple[str, ...] = ( + "physicist", "biologist", "historian", "meteorologist", + "ecologist", "engineer", +) + + +def _clone_with(g: HypothesisAgentGenome, **overrides: object) -> HypothesisAgentGenome: + payload = g.to_dict() + payload.update(overrides) # type: ignore[arg-type] + payload.pop("id", None) + payload["parent_ids"] = list(g.parent_ids) + [g.id] + payload["generation"] = g.generation + 1 + return HypothesisAgentGenome.from_dict(payload) + + +def mutate_temperature(g: HypothesisAgentGenome, rng: random.Random) -> HypothesisAgentGenome: + delta = rng.choice([-0.1, 0.1]) + new_t = max(0.6, min(1.3, g.temperature + delta)) + return _clone_with(g, temperature=round(new_t, 4)) + + +def mutate_lookback(g: HypothesisAgentGenome, rng: random.Random) -> HypothesisAgentGenome: + delta = rng.choice([-50, 50]) + new_lb = max(50, min(500, g.lookback_window + delta)) + return _clone_with(g, lookback_window=new_lb) + + +def mutate_feature_access(g: HypothesisAgentGenome, rng: random.Random) -> HypothesisAgentGenome: + current = set(g.feature_access) + if len(current) == len(FEATURE_POOL): + op = "remove" + elif not current: + op = "add" + else: + op = rng.choice(["add", "remove"]) + + if op == "add": + candidates = [f for f in FEATURE_POOL if f not in current] + choice = rng.choice(candidates) + new_set = current | {choice} + else: + if len(current) <= 1: + return _clone_with(g) + choice = rng.choice(sorted(current)) + new_set = current - {choice} + + return _clone_with(g, feature_access=sorted(new_set)) + + +def mutate_cognitive_style(g: HypothesisAgentGenome, rng: random.Random) -> HypothesisAgentGenome: + candidates = [s for s in COGNITIVE_STYLES if s != 
g.cognitive_style] + new_style = rng.choice(candidates) + return _clone_with(g, cognitive_style=new_style) + + +MUTATION_OPS = (mutate_temperature, mutate_lookback, mutate_feature_access, mutate_cognitive_style) + + +def random_mutate(g: HypothesisAgentGenome, rng: random.Random) -> HypothesisAgentGenome: + op = rng.choice(MUTATION_OPS) + return op(g, rng) +``` + +- [ ] **Step 4: Run test (deve passare)** + +Run: `uv run pytest tests/unit/test_genome_mutation.py -v` +Expected: PASS. + +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/genome/mutation.py tests/unit/test_genome_mutation.py +git commit -m "feat(genome): deterministic mutation operators (numeric + categorical)" +``` + +--- + +## Task 16: Genome — crossover + +**Files:** +- Create: `src/multi_swarm/genome/crossover.py` +- Test: `tests/unit/test_genome_crossover.py` + +Crossover uniforme: per ogni campo prende valore da parent1 o parent2 con prob 0.5. system_prompt: scelta intera (no merging in Phase 1). + +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_genome_crossover.py +import random +from multi_swarm.genome.hypothesis import HypothesisAgentGenome, ModelTier +from multi_swarm.genome.crossover import uniform_crossover + + +def make(name: str) -> HypothesisAgentGenome: + return HypothesisAgentGenome( + system_prompt=f"prompt-{name}", + feature_access=["close"] if name == "A" else ["close", "volume"], + temperature=0.7 if name == "A" else 1.1, + top_p=0.9, + model_tier=ModelTier.C, + lookback_window=100 if name == "A" else 300, + cognitive_style="physicist" if name == "A" else "biologist", + ) + + +def test_crossover_lineage(): + p1 = make("A") + p2 = make("B") + rng = random.Random(0) + child = uniform_crossover(p1, p2, rng) + assert sorted(child.parent_ids[-2:]) == sorted([p1.id, p2.id]) + assert child.generation == max(p1.generation, p2.generation) + 1 + + +def test_crossover_inherits_each_field_from_one_parent(): + p1 = make("A") + p2 = make("B") + rng = 
random.Random(0)
+    child = uniform_crossover(p1, p2, rng)
+    assert child.system_prompt in (p1.system_prompt, p2.system_prompt)
+    assert child.temperature in (p1.temperature, p2.temperature)
+    assert child.lookback_window in (p1.lookback_window, p2.lookback_window)
+    assert child.cognitive_style in (p1.cognitive_style, p2.cognitive_style)
+
+
+def test_crossover_deterministic_with_same_seed():
+    p1 = make("A")
+    p2 = make("B")
+    c1 = uniform_crossover(p1, p2, random.Random(42))
+    c2 = uniform_crossover(p1, p2, random.Random(42))
+    assert c1.to_dict() == c2.to_dict()
+```
+
+- [ ] **Step 2: Run test (must fail)**
+
+Run: `uv run pytest tests/unit/test_genome_crossover.py -v`
+Expected: FAIL.
+
+- [ ] **Step 3: Implement crossover**
+
+```python
+# src/multi_swarm/genome/crossover.py
+from __future__ import annotations
+
+import random
+
+from .hypothesis import HypothesisAgentGenome
+
+
+def uniform_crossover(
+    p1: HypothesisAgentGenome,
+    p2: HypothesisAgentGenome,
+    rng: random.Random,
+) -> HypothesisAgentGenome:
+    """For each field, inherit from p1 (prob 0.5) or p2."""
+
+    def pick(field: str) -> object:
+        return getattr(p1 if rng.random() < 0.5 else p2, field)
+
+    # Draw the tier exactly once: calling pick() repeatedly for the same
+    # field would consume extra rng draws and could mix both parents.
+    tier = pick("model_tier")
+    payload = {
+        "system_prompt": pick("system_prompt"),
+        "feature_access": list(pick("feature_access")),  # type: ignore[arg-type]
+        "temperature": pick("temperature"),
+        "top_p": pick("top_p"),
+        "model_tier": tier.value if hasattr(tier, "value") else tier,
+        "lookback_window": pick("lookback_window"),
+        "cognitive_style": pick("cognitive_style"),
+        "parent_ids": [p1.id, p2.id],
+        "generation": max(p1.generation, p2.generation) + 1,
+    }
+    return HypothesisAgenome.from_dict(payload) if False else HypothesisAgentGenome.from_dict(payload)
+```
+
+- [ ] **Step 4: Run test (must pass)**
+
+Run: `uv run pytest tests/unit/test_genome_crossover.py -v`
+Expected: PASS.
+ +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/genome/crossover.py tests/unit/test_genome_crossover.py +git commit -m "feat(genome): uniform crossover for hypothesis genomes" +``` + +--- + +## Task 17: LLM client (OpenRouter Qwen + Anthropic Sonnet) + +**Files:** +- Create: `src/multi_swarm/llm/__init__.py` +- Create: `src/multi_swarm/llm/client.py` +- Test: `tests/unit/test_llm_client.py` + +Wrapper unificato: `LLMClient.complete(genome, system, user) -> CompletionResult`. Sceglie tier da `genome.model_tier`. Per tier C usa OpenAI SDK con base_url = OpenRouter; per tier B usa anthropic SDK. + +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_llm_client.py +from multi_swarm.genome.hypothesis import HypothesisAgentGenome, ModelTier +from multi_swarm.llm.client import LLMClient, CompletionResult + + +def make_genome(tier: ModelTier) -> HypothesisAgentGenome: + return HypothesisAgentGenome( + system_prompt="x", feature_access=["close"], temperature=0.9, top_p=0.95, + model_tier=tier, lookback_window=200, cognitive_style="physicist", + ) + + +def test_completion_tier_c_uses_openrouter(mocker): + fake_openai = mocker.MagicMock() + fake_response = mocker.MagicMock() + fake_response.choices = [mocker.MagicMock(message=mocker.MagicMock(content="(strategy ...)"))] + fake_response.usage = mocker.MagicMock(prompt_tokens=100, completion_tokens=200) + fake_openai.chat.completions.create.return_value = fake_response + + mocker.patch("multi_swarm.llm.client.OpenAI", return_value=fake_openai) + + client = LLMClient(openrouter_api_key="or-x", anthropic_api_key=None) + g = make_genome(ModelTier.C) + out = client.complete(g, system="sys", user="usr") + + assert isinstance(out, CompletionResult) + assert out.text == "(strategy ...)" + assert out.input_tokens == 100 + assert out.output_tokens == 200 + assert out.tier == ModelTier.C + fake_openai.chat.completions.create.assert_called_once() + + +def test_completion_tier_b_uses_anthropic(mocker): 
+ fake_anthropic = mocker.MagicMock() + fake_msg = mocker.MagicMock() + fake_msg.content = [mocker.MagicMock(text="(strategy ...)")] + fake_msg.usage = mocker.MagicMock(input_tokens=80, output_tokens=150) + fake_anthropic.messages.create.return_value = fake_msg + mocker.patch("multi_swarm.llm.client.Anthropic", return_value=fake_anthropic) + + client = LLMClient(openrouter_api_key="or-x", anthropic_api_key="an-x") + g = make_genome(ModelTier.B) + out = client.complete(g, system="sys", user="usr") + + assert out.text == "(strategy ...)" + assert out.input_tokens == 80 + assert out.output_tokens == 150 + assert out.tier == ModelTier.B +``` + +- [ ] **Step 2: Run test (deve fallire)** + +Run: `uv run pytest tests/unit/test_llm_client.py -v` +Expected: FAIL. + +- [ ] **Step 3: Implementare LLM client** + +```python +# src/multi_swarm/llm/__init__.py +``` + +```python +# src/multi_swarm/llm/client.py +from __future__ import annotations + +from dataclasses import dataclass + +from anthropic import Anthropic +from openai import OpenAI + +from ..genome.hypothesis import HypothesisAgentGenome, ModelTier + + +# Modelli configurati per Phase 1 +MODEL_TIER_C = "qwen/qwen-2.5-72b-instruct" # via OpenRouter +MODEL_TIER_B = "claude-sonnet-4-6" # via Anthropic +OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1" + + +@dataclass(frozen=True) +class CompletionResult: + text: str + input_tokens: int + output_tokens: int + tier: ModelTier + model: str + + +class LLMClient: + def __init__( + self, + openrouter_api_key: str, + anthropic_api_key: str | None = None, + ): + self._openrouter = OpenAI(api_key=openrouter_api_key, base_url=OPENROUTER_BASE_URL) + self._anthropic = Anthropic(api_key=anthropic_api_key) if anthropic_api_key else None + + def complete( + self, + genome: HypothesisAgentGenome, + system: str, + user: str, + max_tokens: int = 2000, + ) -> CompletionResult: + if genome.model_tier == ModelTier.C: + resp = self._openrouter.chat.completions.create( + model=MODEL_TIER_C, 
+ messages=[ + {"role": "system", "content": system}, + {"role": "user", "content": user}, + ], + temperature=genome.temperature, + top_p=genome.top_p, + max_tokens=max_tokens, + ) + return CompletionResult( + text=resp.choices[0].message.content or "", + input_tokens=resp.usage.prompt_tokens, + output_tokens=resp.usage.completion_tokens, + tier=ModelTier.C, + model=MODEL_TIER_C, + ) + + if self._anthropic is None: + raise RuntimeError("ANTHROPIC_API_KEY required for tier B genomes") + + msg = self._anthropic.messages.create( + model=MODEL_TIER_B, + system=system, + messages=[{"role": "user", "content": user}], + temperature=genome.temperature, + top_p=genome.top_p, + max_tokens=max_tokens, + ) + text = "".join(block.text for block in msg.content if hasattr(block, "text")) + return CompletionResult( + text=text, + input_tokens=msg.usage.input_tokens, + output_tokens=msg.usage.output_tokens, + tier=ModelTier.B, + model=MODEL_TIER_B, + ) +``` + +- [ ] **Step 4: Run test (deve passare)** + +Run: `uv run pytest tests/unit/test_llm_client.py -v` +Expected: PASS. 
+ +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/llm/ tests/unit/test_llm_client.py +git commit -m "feat(llm): unified client for OpenRouter (Qwen) + Anthropic (Sonnet)" +``` + +--- + +## Task 18: Cost tracker + +**Files:** +- Create: `src/multi_swarm/llm/cost_tracker.py` +- Test: `tests/unit/test_cost_tracker.py` + +Pricing approssimativo Phase 1 (al token): +- tier C (Qwen 2.5 72B via OpenRouter): $0.40/M input, $0.40/M output +- tier B (Claude Sonnet 4.6): $3.00/M input, $15.00/M output + +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_cost_tracker.py +from multi_swarm.genome.hypothesis import ModelTier +from multi_swarm.llm.cost_tracker import CostTracker, estimate_cost + + +def test_estimate_cost_tier_c(): + cost = estimate_cost(input_tokens=1_000_000, output_tokens=1_000_000, tier=ModelTier.C) + assert cost == 0.40 + 0.40 + + +def test_estimate_cost_tier_b(): + cost = estimate_cost(input_tokens=1_000_000, output_tokens=1_000_000, tier=ModelTier.B) + assert cost == 3.00 + 15.00 + + +def test_tracker_accumulates(): + t = CostTracker() + t.record(input_tokens=10_000, output_tokens=20_000, tier=ModelTier.C, run_id="r", agent_id="a") + t.record(input_tokens=5_000, output_tokens=15_000, tier=ModelTier.C, run_id="r", agent_id="b") + summary = t.summary() + assert summary["calls"] == 2 + assert summary["input_tokens"] == 15_000 + assert summary["output_tokens"] == 35_000 + assert summary["cost_usd"] > 0 + + +def test_tracker_per_tier_breakdown(): + t = CostTracker() + t.record(input_tokens=10_000, output_tokens=10_000, tier=ModelTier.C, run_id="r", agent_id="a") + t.record(input_tokens=10_000, output_tokens=10_000, tier=ModelTier.B, run_id="r", agent_id="b") + summary = t.summary() + assert "C" in summary["by_tier"] + assert "B" in summary["by_tier"] +``` + +- [ ] **Step 2: Run test (deve fallire)** + +Run: `uv run pytest tests/unit/test_cost_tracker.py -v` +Expected: FAIL. 
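The pricing table above implies per-call costs that are easy to verify by hand; a standalone sketch of the arithmetic (the `cost_usd` helper here is hypothetical, separate from the `estimate_cost` the module will expose):

```python
# Price per million tokens (input, output), from the Phase 1 table above.
PRICE = {"C": (0.40, 0.40), "B": (3.00, 15.00)}

def cost_usd(tier: str, input_tokens: int, output_tokens: int) -> float:
    # Cost is linear in token counts: tokens / 1M times the per-M rate.
    p_in, p_out = PRICE[tier]
    return (input_tokens / 1_000_000) * p_in + (output_tokens / 1_000_000) * p_out

# A typical tier-C hypothesis call: ~2k prompt tokens, ~500 completion tokens.
assert abs(cost_usd("C", 2_000, 500) - 0.001) < 1e-9    # 0.0008 + 0.0002
# The same traffic on tier B costs roughly 13x more.
assert abs(cost_usd("B", 2_000, 500) - 0.0135) < 1e-9   # 0.006 + 0.0075
```

This is why keeping the bulk of the swarm on tier C matters: at these rates a 20-agent, 10-generation run of tier-C calls stays in the cents range, while the same run on tier B would not.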
+ +- [ ] **Step 3: Implementare cost tracker** + +```python +# src/multi_swarm/llm/cost_tracker.py +from __future__ import annotations + +from collections import defaultdict +from dataclasses import dataclass, field +from datetime import datetime, timezone +from typing import Any + +from ..genome.hypothesis import ModelTier + + +PRICE_PER_M_TOKENS: dict[ModelTier, dict[str, float]] = { + ModelTier.C: {"input": 0.40, "output": 0.40}, + ModelTier.B: {"input": 3.00, "output": 15.00}, +} + + +def estimate_cost(input_tokens: int, output_tokens: int, tier: ModelTier) -> float: + p = PRICE_PER_M_TOKENS[tier] + return (input_tokens / 1_000_000) * p["input"] + (output_tokens / 1_000_000) * p["output"] + + +@dataclass +class CostRecord: + ts: datetime + run_id: str + agent_id: str + tier: ModelTier + input_tokens: int + output_tokens: int + cost_usd: float + + +@dataclass +class CostTracker: + records: list[CostRecord] = field(default_factory=list) + + def record( + self, + input_tokens: int, + output_tokens: int, + tier: ModelTier, + run_id: str, + agent_id: str, + ) -> CostRecord: + cost = estimate_cost(input_tokens, output_tokens, tier) + rec = CostRecord( + ts=datetime.now(timezone.utc), + run_id=run_id, + agent_id=agent_id, + tier=tier, + input_tokens=input_tokens, + output_tokens=output_tokens, + cost_usd=cost, + ) + self.records.append(rec) + return rec + + def summary(self) -> dict[str, Any]: + by_tier: dict[str, dict[str, float]] = defaultdict( + lambda: {"calls": 0, "input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0} + ) + for r in self.records: + t = r.tier.value + by_tier[t]["calls"] += 1 + by_tier[t]["input_tokens"] += r.input_tokens + by_tier[t]["output_tokens"] += r.output_tokens + by_tier[t]["cost_usd"] += r.cost_usd + return { + "calls": len(self.records), + "input_tokens": sum(r.input_tokens for r in self.records), + "output_tokens": sum(r.output_tokens for r in self.records), + "cost_usd": sum(r.cost_usd for r in self.records), + "by_tier": 
dict(by_tier), + } +``` + +- [ ] **Step 4: Run test (deve passare)** + +Run: `uv run pytest tests/unit/test_cost_tracker.py -v` +Expected: PASS. + +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/llm/cost_tracker.py tests/unit/test_cost_tracker.py +git commit -m "feat(llm): cost tracker with per-tier pricing and breakdown" +``` + +--- + +## Task 19: Hypothesis agent (LLM call → S-expr) + +**Files:** +- Create: `src/multi_swarm/agents/__init__.py` +- Create: `src/multi_swarm/agents/hypothesis.py` +- Test: `tests/unit/test_hypothesis_agent.py` + +L'Hypothesis agent prende un genome + un summary di mercato (statistiche di base sull'OHLCV training set) e produce una strategia S-expression. Il prompt template è fissato; il system_prompt del genoma viene iniettato nel system message; il summary di mercato e i feature accessibili sono iniettati nel user message. + +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_hypothesis_agent.py +import pandas as pd +import numpy as np +from multi_swarm.genome.hypothesis import HypothesisAgentGenome, ModelTier +from multi_swarm.agents.hypothesis import HypothesisAgent, MarketSummary +from multi_swarm.llm.client import CompletionResult + + +def make_summary(): + return MarketSummary( + symbol="BTC/USDT", + timeframe="1h", + n_bars=1000, + return_mean=0.0001, + return_std=0.01, + skew=0.1, + kurtosis=3.5, + volatility_regime="high", + ) + + +def test_hypothesis_agent_calls_llm_and_parses(mocker): + fake_llm = mocker.MagicMock() + fake_llm.complete.return_value = CompletionResult( + text="(strategy (when (gt (indicator rsi 14) 70.0) (entry-short)))", + input_tokens=200, output_tokens=80, tier=ModelTier.C, model="qwen", + ) + g = HypothesisAgentGenome( + system_prompt="Pensa come un fisico.", feature_access=["close"], temperature=0.9, + top_p=0.95, model_tier=ModelTier.C, lookback_window=200, cognitive_style="physicist", + ) + agent = HypothesisAgent(llm=fake_llm) + proposal = agent.propose(g, 
make_summary()) + assert proposal.strategy is not None + assert proposal.raw_text.startswith("(strategy") + assert proposal.completion.input_tokens == 200 + fake_llm.complete.assert_called_once() + + +def test_hypothesis_agent_returns_none_on_parse_error(mocker): + fake_llm = mocker.MagicMock() + fake_llm.complete.return_value = CompletionResult( + text="this is not s-expression", + input_tokens=200, output_tokens=80, tier=ModelTier.C, model="qwen", + ) + g = HypothesisAgentGenome( + system_prompt="x", feature_access=["close"], temperature=0.9, + top_p=0.95, model_tier=ModelTier.C, lookback_window=200, cognitive_style="physicist", + ) + agent = HypothesisAgent(llm=fake_llm) + proposal = agent.propose(g, make_summary()) + assert proposal.strategy is None + assert proposal.parse_error is not None + + +def test_hypothesis_agent_extracts_sexp_from_markdown_fence(mocker): + fake_llm = mocker.MagicMock() + fake_llm.complete.return_value = CompletionResult( + text="Ecco la strategia:\n```lisp\n(strategy (when (lt (indicator rsi 14) 30.0) (entry-long)))\n```\nFatta.", + input_tokens=200, output_tokens=80, tier=ModelTier.C, model="qwen", + ) + g = HypothesisAgentGenome( + system_prompt="x", feature_access=["close"], temperature=0.9, + top_p=0.95, model_tier=ModelTier.C, lookback_window=200, cognitive_style="physicist", + ) + agent = HypothesisAgent(llm=fake_llm) + proposal = agent.propose(g, make_summary()) + assert proposal.strategy is not None +``` + +- [ ] **Step 2: Run test (deve fallire)** + +Run: `uv run pytest tests/unit/test_hypothesis_agent.py -v` +Expected: FAIL. 
+ +- [ ] **Step 3: Implementare agent** + +```python +# src/multi_swarm/agents/__init__.py +``` + +```python +# src/multi_swarm/agents/hypothesis.py +from __future__ import annotations + +import re +from dataclasses import dataclass + +from ..genome.hypothesis import HypothesisAgentGenome +from ..llm.client import CompletionResult, LLMClient +from ..protocol.parser import ParseError, Strategy, parse_strategy +from ..protocol.validator import ValidationError, validate_strategy + + +@dataclass(frozen=True) +class MarketSummary: + symbol: str + timeframe: str + n_bars: int + return_mean: float + return_std: float + skew: float + kurtosis: float + volatility_regime: str + + +@dataclass(frozen=True) +class HypothesisProposal: + strategy: Strategy | None + raw_text: str + completion: CompletionResult + parse_error: str | None = None + + +SYSTEM_TEMPLATE = """\ +Sei un agente generatore di ipotesi di trading quantitativo per un sistema swarm. + +Il tuo stile cognitivo: {cognitive_style} +Direttiva personale: {system_prompt} + +Devi proporre una strategia di trading espressa nel linguaggio S-expression +con i seguenti verbi disponibili: + + Azioni: entry-long, entry-short, exit, flat + Logici: and, or, not + Comparatori: gt, lt, eq + Dati: feature, indicator, crossover, crossunder + +Indicatori disponibili: sma , rsi , atr , macd, realized_vol . +Feature disponibili: open, high, low, close, volume. + +Le regole sono valutate in ordine; la prima che matcha vince per ogni timestamp. +La default action se nessuna regola matcha è 'flat'. + +Rispondi SOLO con la S-expression in un fence ```lisp ... ```, senza prosa, +senza spiegazioni. Esempio formato: + +```lisp +(strategy + (when (gt (indicator rsi 14) 70.0) (entry-short)) + (when (lt (indicator rsi 14) 30.0) (entry-long))) +``` +""" + + +USER_TEMPLATE = """\ +Mercato: {symbol} timeframe {timeframe}, {n_bars} barre osservate. 
+Statistiche return: mean={return_mean:.5f}, std={return_std:.5f}, skew={skew:.3f}, kurt={kurtosis:.3f}. +Regime volatilità: {volatility_regime}. + +Feature accessibili dal tuo genoma: {feature_access}. +Lookback massimo che puoi usare nel ragionamento: {lookback_window} barre. + +Genera una strategia che cerchi anomalie sfruttabili in questo regime. +""" + + +_SEXP_FENCE_RE = re.compile(r"```(?:lisp|scheme|sexp)?\s*(\(strategy[\s\S]*?\))\s*```", re.MULTILINE) + + +def _extract_sexp(text: str) -> str | None: + m = _SEXP_FENCE_RE.search(text) + if m: + return m.group(1) + if text.strip().startswith("(strategy"): + return text.strip() + return None + + +class HypothesisAgent: + def __init__(self, llm: LLMClient): + self._llm = llm + + def propose( + self, + genome: HypothesisAgentGenome, + market: MarketSummary, + ) -> HypothesisProposal: + system = SYSTEM_TEMPLATE.format( + cognitive_style=genome.cognitive_style, + system_prompt=genome.system_prompt, + ) + user = USER_TEMPLATE.format( + symbol=market.symbol, + timeframe=market.timeframe, + n_bars=market.n_bars, + return_mean=market.return_mean, + return_std=market.return_std, + skew=market.skew, + kurtosis=market.kurtosis, + volatility_regime=market.volatility_regime, + feature_access=", ".join(genome.feature_access), + lookback_window=genome.lookback_window, + ) + + completion = self._llm.complete(genome, system=system, user=user) + + sexp = _extract_sexp(completion.text) + if sexp is None: + return HypothesisProposal( + strategy=None, raw_text=completion.text, completion=completion, + parse_error="no s-expression found in output", + ) + try: + ast = parse_strategy(sexp) + validate_strategy(ast) + return HypothesisProposal( + strategy=ast, raw_text=completion.text, completion=completion, + ) + except (ParseError, ValidationError) as e: + return HypothesisProposal( + strategy=None, raw_text=completion.text, completion=completion, + parse_error=str(e), + ) +``` + +- [ ] **Step 4: Run test (deve passare)** + +Run: `uv 
run pytest tests/unit/test_hypothesis_agent.py -v` +Expected: PASS. + +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/agents/ tests/unit/test_hypothesis_agent.py +git commit -m "feat(agents): hypothesis agent with prompt template + s-expr extraction" +``` + +--- + +## Task 20: Falsification agent (hand-crafted) + +**Files:** +- Create: `src/multi_swarm/agents/falsification.py` +- Test: `tests/unit/test_falsification.py` + +In Phase 1 il Falsification è completamente deterministic: prende una strategy AST, la compila, fa girare il backtest sul training set, calcola DSR + drawdown + altre metriche, restituisce un `FalsificationReport`. + +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_falsification.py +from datetime import datetime, timezone +import numpy as np +import pandas as pd +import pytest +from multi_swarm.agents.falsification import FalsificationAgent, FalsificationReport +from multi_swarm.protocol.parser import parse_strategy + + +@pytest.fixture +def trending_ohlcv(): + idx = pd.date_range("2024-01-01", periods=500, freq="1h", tz="UTC") + close = 100 + np.cumsum(np.random.RandomState(0).normal(0.01, 1.0, 500)) + return pd.DataFrame( + {"open": close, "high": close + 0.5, "low": close - 0.5, "close": close, "volume": 1.0}, + index=idx, + ) + + +def test_falsification_returns_report(trending_ohlcv): + src = "(strategy (when (gt (indicator rsi 14) 70.0) (entry-short)) (when (lt (indicator rsi 14) 30.0) (entry-long)))" + ast = parse_strategy(src) + agent = FalsificationAgent(fees_bp=5.0, n_trials_dsr=20) + report = agent.evaluate(ast, trending_ohlcv) + assert isinstance(report, FalsificationReport) + assert isinstance(report.sharpe, float) + assert isinstance(report.dsr, float) + assert 0.0 <= report.dsr <= 1.0 + assert isinstance(report.max_drawdown, float) + assert isinstance(report.n_trades, int) + + +def test_falsification_zero_trades_returns_zero_metrics(trending_ohlcv): + src = "(strategy (when (gt (feature close) 
1e9) (entry-long)))" + ast = parse_strategy(src) + agent = FalsificationAgent(fees_bp=5.0, n_trials_dsr=20) + report = agent.evaluate(ast, trending_ohlcv) + assert report.n_trades == 0 + assert report.sharpe == 0.0 +``` + +- [ ] **Step 2: Run test (deve fallire)** + +Run: `uv run pytest tests/unit/test_falsification.py -v` +Expected: FAIL. + +- [ ] **Step 3: Implementare falsification** + +```python +# src/multi_swarm/agents/falsification.py +from __future__ import annotations + +from dataclasses import dataclass + +import pandas as pd + +from ..backtest.engine import BacktestEngine +from ..metrics.basic import max_drawdown, sharpe_ratio, total_return +from ..metrics.dsr import deflated_sharpe_ratio +from ..protocol.compiler import compile_strategy +from ..protocol.parser import Strategy + + +@dataclass(frozen=True) +class FalsificationReport: + sharpe: float + dsr: float + dsr_pvalue: float + max_drawdown: float + total_return: float + n_trades: int + n_bars: int + + +class FalsificationAgent: + def __init__(self, fees_bp: float = 5.0, n_trials_dsr: int = 50): + self._engine = BacktestEngine(fees_bp=fees_bp) + self._n_trials_dsr = n_trials_dsr + + def evaluate(self, strategy: Strategy, ohlcv: pd.DataFrame) -> FalsificationReport: + signal_fn = compile_strategy(strategy) + signals = signal_fn(ohlcv) + result = self._engine.run(ohlcv, signals) + + if len(result.trades) == 0: + return FalsificationReport( + sharpe=0.0, dsr=0.0, dsr_pvalue=1.0, max_drawdown=0.0, + total_return=0.0, n_trades=0, n_bars=len(ohlcv), + ) + + sr = sharpe_ratio(result.returns, periods_per_year=8760) + dsr, p = deflated_sharpe_ratio( + result.returns, + n_trials=self._n_trials_dsr, + periods_per_year=8760, + sharpe_var=1.0, + ) + return FalsificationReport( + sharpe=sr, + dsr=dsr, + dsr_pvalue=p, + max_drawdown=max_drawdown(result.equity_curve + 1.0), # +1 evita div per 0 + total_return=total_return(result.equity_curve + 1.0), + n_trades=len(result.trades), + n_bars=len(ohlcv), + ) +``` + +- 
[ ] **Step 4: Run test (deve passare)** + +Run: `uv run pytest tests/unit/test_falsification.py -v` +Expected: PASS. + +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/agents/falsification.py tests/unit/test_falsification.py +git commit -m "feat(agents): hand-crafted falsification (compile→backtest→DSR)" +``` + +--- + +## Task 21: Adversarial agent (hand-crafted) + +**Files:** +- Create: `src/multi_swarm/agents/adversarial.py` +- Test: `tests/unit/test_adversarial.py` + +In Phase 1 l'Adversarial è hand-crafted con check euristici deterministic, no LLM. Verifica: +- `lookahead_check`: il numero di trade è coerente con i segnali (no trade su barra t senza segnale a t-1). +- `degenerate_check`: la strategia non è banale (es. sempre long, sempre flat). +- `trade_frequency_check`: troppi trade (>1 ogni 5 bar) = strategia rumorosa, flag warning. +- `single_trade_check`: 1-2 trade su 500 barre = lucky shot, flag warning. + +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_adversarial.py +import numpy as np +import pandas as pd +import pytest +from multi_swarm.agents.adversarial import AdversarialAgent, AdversarialReport, Severity +from multi_swarm.protocol.parser import parse_strategy + + +@pytest.fixture +def ohlcv(): + idx = pd.date_range("2024-01-01", periods=500, freq="1h", tz="UTC") + close = 100 + np.cumsum(np.random.RandomState(0).normal(0.0, 1.0, 500)) + return pd.DataFrame( + {"open": close, "high": close + 0.5, "low": close - 0.5, "close": close, "volume": 1.0}, + index=idx, + ) + + +def test_degenerate_always_long_flagged(ohlcv): + src = "(strategy (when (gt (feature close) -1e9) (entry-long)))" + ast = parse_strategy(src) + agent = AdversarialAgent() + report = agent.review(ast, ohlcv) + assert any(f.name == "degenerate" and f.severity == Severity.HIGH for f in report.findings) + + +def test_no_findings_on_reasonable_strategy(ohlcv): + src = "(strategy (when (gt (indicator rsi 14) 70.0) (entry-short)) (when (lt (indicator 
rsi 14) 30.0) (entry-long)))" + ast = parse_strategy(src) + agent = AdversarialAgent() + report = agent.review(ast, ohlcv) + high_findings = [f for f in report.findings if f.severity == Severity.HIGH] + assert len(high_findings) == 0 + + +def test_zero_trade_strategy_flagged(ohlcv): + src = "(strategy (when (gt (feature close) 1e9) (entry-long)))" + ast = parse_strategy(src) + agent = AdversarialAgent() + report = agent.review(ast, ohlcv) + assert any(f.name == "no_trades" for f in report.findings) +``` + +- [ ] **Step 2: Run test (deve fallire)** + +Run: `uv run pytest tests/unit/test_adversarial.py -v` +Expected: FAIL. + +- [ ] **Step 3: Implementare adversarial** + +```python +# src/multi_swarm/agents/adversarial.py +from __future__ import annotations + +from dataclasses import dataclass, field +from enum import Enum + +import pandas as pd + +from ..backtest.engine import BacktestEngine +from ..backtest.orders import Side +from ..protocol.compiler import compile_strategy +from ..protocol.parser import Strategy + + +class Severity(str, Enum): + LOW = "low" + MEDIUM = "medium" + HIGH = "high" + + +@dataclass(frozen=True) +class Finding: + name: str + severity: Severity + detail: str + + +@dataclass +class AdversarialReport: + findings: list[Finding] = field(default_factory=list) + + +class AdversarialAgent: + def __init__(self, fees_bp: float = 5.0): + self._engine = BacktestEngine(fees_bp=fees_bp) + + def review(self, strategy: Strategy, ohlcv: pd.DataFrame) -> AdversarialReport: + signal_fn = compile_strategy(strategy) + signals = signal_fn(ohlcv) + result = self._engine.run(ohlcv, signals) + + report = AdversarialReport() + + if len(result.trades) == 0: + report.findings.append(Finding( + name="no_trades", severity=Severity.HIGH, + detail="Strategy never opens a position on training data", + )) + return report + + unique_signals = signals.unique() + if len(unique_signals) == 1 and unique_signals[0] in (Side.LONG, Side.SHORT): + report.findings.append(Finding( + 
name="degenerate", severity=Severity.HIGH,
+                detail=f"Strategy is always {unique_signals[0].value}, no real decision",
+            ))
+
+        n_bars = len(ohlcv)
+        n_trades = len(result.trades)
+        if n_trades > n_bars / 5:
+            report.findings.append(Finding(
+                name="overtrading", severity=Severity.MEDIUM,
+                detail=f"{n_trades} trades on {n_bars} bars (>1 per 5 bars)",
+            ))
+        if n_trades < 5:
+            report.findings.append(Finding(
+                name="undertrading", severity=Severity.MEDIUM,
+                detail=f"only {n_trades} trades, likely lucky shot",
+            ))
+
+        return report
+```
+
+- [ ] **Step 4: Run test (must pass)**
+
+Run: `uv run pytest tests/unit/test_adversarial.py -v`
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/multi_swarm/agents/adversarial.py tests/unit/test_adversarial.py
+git commit -m "feat(agents): hand-crafted adversarial with heuristic checks"
+```
+
+---
+
+## Task 22: Fitness function v0
+
+**Files:**
+- Create: `src/multi_swarm/ga/__init__.py`
+- Create: `src/multi_swarm/ga/fitness.py`
+- Test: `tests/unit/test_fitness.py`
+
+Fitness v0: `dsr - drawdown_penalty * max_drawdown`, with default `drawdown_penalty = 0.5`. A strategy with 0 trades gets fitness 0 (neutral, not negatively penalized).
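The v0 formula above is cheap to sanity-check by hand; a minimal standalone sketch of the arithmetic (the adversarial kill rule is deliberately left out here and handled by the reports):

```python
def fitness_v0(dsr: float, max_drawdown: float, n_trades: int,
               drawdown_penalty: float = 0.5) -> float:
    # Zero trades is neutral, not negative: the genome is useless but
    # should not be punished harder than a losing one.
    if n_trades == 0:
        return 0.0
    # DSR rewarded, drawdown linearly penalized, floored at 0.
    return max(0.0, dsr - drawdown_penalty * max_drawdown)

assert abs(fitness_v0(dsr=0.7, max_drawdown=0.2, n_trades=30) - 0.6) < 1e-12  # 0.7 - 0.5*0.2
assert fitness_v0(dsr=0.1, max_drawdown=0.8, n_trades=30) == 0.0              # floored
assert fitness_v0(dsr=0.9, max_drawdown=0.0, n_trades=0) == 0.0               # no trades
```

Flooring at 0 means selection pressure never rewards "less bad" degenerate genomes: everything below the bar is equally unfit, which keeps tournament selection focused on the genomes that actually clear it.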
+ +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_fitness.py +from multi_swarm.agents.falsification import FalsificationReport +from multi_swarm.agents.adversarial import AdversarialReport, Finding, Severity +from multi_swarm.ga.fitness import compute_fitness + + +def make_falsification(dsr=0.7, max_dd=0.2, n_trades=30): + return FalsificationReport( + sharpe=1.5, dsr=dsr, dsr_pvalue=0.05, max_drawdown=max_dd, + total_return=0.3, n_trades=n_trades, n_bars=500, + ) + + +def test_fitness_zero_trades_is_zero(): + f = make_falsification(n_trades=0) + a = AdversarialReport() + assert compute_fitness(f, a) == 0.0 + + +def test_fitness_increases_with_dsr(): + a = AdversarialReport() + f1 = make_falsification(dsr=0.5) + f2 = make_falsification(dsr=0.9) + assert compute_fitness(f2, a) > compute_fitness(f1, a) + + +def test_fitness_decreases_with_drawdown(): + a = AdversarialReport() + f1 = make_falsification(max_dd=0.1) + f2 = make_falsification(max_dd=0.4) + assert compute_fitness(f1, a) > compute_fitness(f2, a) + + +def test_fitness_zeroed_by_high_severity_finding(): + f = make_falsification() + a = AdversarialReport(findings=[Finding(name="degenerate", severity=Severity.HIGH, detail="x")]) + assert compute_fitness(f, a) == 0.0 +``` + +- [ ] **Step 2: Run test (deve fallire)** + +Run: `uv run pytest tests/unit/test_fitness.py -v` +Expected: FAIL. + +- [ ] **Step 3: Implementare fitness** + +```python +# src/multi_swarm/ga/__init__.py +``` + +```python +# src/multi_swarm/ga/fitness.py +from __future__ import annotations + +from ..agents.adversarial import AdversarialReport, Severity +from ..agents.falsification import FalsificationReport + + +def compute_fitness( + falsification: FalsificationReport, + adversarial: AdversarialReport, + drawdown_penalty: float = 0.5, +) -> float: + """Fitness v0 Phase 1. + + Logica: + 1. Se 0 trade → fitness 0. + 2. Se almeno un finding HIGH adversarial → fitness 0 (kill). + 3. 
Altrimenti: dsr - drawdown_penalty * max_drawdown, clamped a 0.
+    """
+    if falsification.n_trades == 0:
+        return 0.0
+    if any(f.severity == Severity.HIGH for f in adversarial.findings):
+        return 0.0
+    raw = falsification.dsr - drawdown_penalty * falsification.max_drawdown
+    return max(0.0, float(raw))
+```
+
+- [ ] **Step 4: Run test (deve passare)**
+
+Run: `uv run pytest tests/unit/test_fitness.py -v`
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/multi_swarm/ga/ tests/unit/test_fitness.py
+git commit -m "feat(ga): fitness v0 (DSR - dd_penalty * max_dd, kill on adversarial high)"
+```
+
+---
+
+## Task 23: GA — tournament selection + elitism
+
+**Files:**
+- Create: `src/multi_swarm/ga/selection.py`
+- Test: `tests/unit/test_selection.py`
+
+- [ ] **Step 1: Scrivere test fallente**
+
+```python
+# tests/unit/test_selection.py
+import random
+from multi_swarm.genome.hypothesis import HypothesisAgentGenome, ModelTier
+from multi_swarm.ga.selection import tournament_select, elite_select
+
+
+def make(idx: int) -> HypothesisAgentGenome:
+    return HypothesisAgentGenome(
+        system_prompt=f"p-{idx}", feature_access=["close"], temperature=0.9,
+        top_p=0.95, model_tier=ModelTier.C, lookback_window=100, cognitive_style="x",
+    )
+
+
+def test_tournament_picks_best_in_sample():
+    population = [make(i) for i in range(10)]
+    fitnesses = {g.id: float(i) for i, g in enumerate(population)}
+    rng = random.Random(0)
+    # k == len(population): the sample is the whole population, so the
+    # winner must be exactly the genome with the highest fitness.
+    winner = tournament_select(population, fitnesses, k=10, rng=rng)
+    assert isinstance(winner, HypothesisAgentGenome)
+    assert fitnesses[winner.id] == 9.0
+
+
+def test_tournament_size_one_is_random():
+    population = [make(i) for i in range(10)]
+    fitnesses = {g.id: float(i) for i, g in enumerate(population)}
+    rng = random.Random(0)
+    picks = [tournament_select(population, fitnesses, k=1, rng=rng) for _ in range(50)]
+    distinct = {p.id for p in picks}
+    assert len(distinct) > 1
+
+
+def test_elite_select_returns_top_k():
+    population = [make(i) for i in range(10)]
+    fitnesses = {g.id: float(i) for i, g in enumerate(population)}
+    elites = elite_select(population, fitnesses, k=3)
+    elite_fitnesses = sorted([fitnesses[g.id] for g in elites], reverse=True)
+    assert elite_fitnesses == [9.0, 8.0, 7.0]
+```
+
+- [ ] **Step 2: Run test (deve fallire)**
+
+Run: `uv run pytest tests/unit/test_selection.py -v`
+Expected: FAIL.
+
+- [ ] **Step 3: Implementare selection**
+
+```python
+# src/multi_swarm/ga/selection.py
+from __future__ import annotations
+
+import random
+
+from ..genome.hypothesis import HypothesisAgentGenome
+
+
+def tournament_select(
+    population: list[HypothesisAgentGenome],
+    fitnesses: dict[str, float],
+    k: int,
+    rng: random.Random,
+) -> HypothesisAgentGenome:
+    """Estrae k individui random e restituisce il migliore."""
+    if k < 1:
+        raise ValueError("k must be >= 1")
+    if not population:
+        raise ValueError("empty population")
+    candidates = rng.sample(population, k=min(k, len(population)))
+    return max(candidates, key=lambda g: fitnesses.get(g.id, 0.0))
+
+
+def elite_select(
+    population: list[HypothesisAgentGenome],
+    fitnesses: dict[str, float],
+    k: int,
+) -> list[HypothesisAgentGenome]:
+    """Restituisce i k genomi con fitness più alta."""
+    sorted_pop = sorted(population, key=lambda g: fitnesses.get(g.id, 0.0), reverse=True)
+    return sorted_pop[:k]
+```
+
+- [ ] **Step 4: Run test (deve passare)**
+
+Run: `uv run pytest tests/unit/test_selection.py -v`
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/multi_swarm/ga/selection.py tests/unit/test_selection.py
+git commit -m "feat(ga): tournament selection + elitism"
+```
+
+---
+
+## Task 24: GA — generation step (one-generation loop)
+
+**Files:**
+- Create: `src/multi_swarm/ga/loop.py`
+- Test: `tests/unit/test_ga_loop.py`
+
+`next_generation()`: given (population, fitnesses, config, RNG), produces the next population via elitism + tournament selection + (mutation OR crossover) to fill the remaining slots.
+ +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_ga_loop.py +import random +from multi_swarm.genome.hypothesis import HypothesisAgentGenome, ModelTier +from multi_swarm.ga.loop import next_generation, GAConfig + + +def make(idx: int) -> HypothesisAgentGenome: + return HypothesisAgentGenome( + system_prompt=f"p-{idx}", feature_access=["close"], temperature=0.9, + top_p=0.95, model_tier=ModelTier.C, lookback_window=100, cognitive_style="x", + ) + + +def test_next_generation_size_preserved(): + population = [make(i) for i in range(20)] + fitnesses = {g.id: float(i) for i, g in enumerate(population)} + cfg = GAConfig(population_size=20, elite_k=2, tournament_k=3, p_crossover=0.5) + new_pop = next_generation(population, fitnesses, cfg, rng=random.Random(0)) + assert len(new_pop) == 20 + + +def test_next_generation_includes_elites(): + population = [make(i) for i in range(20)] + fitnesses = {g.id: float(i) for i, g in enumerate(population)} + cfg = GAConfig(population_size=20, elite_k=2, tournament_k=3, p_crossover=0.5) + new_pop = next_generation(population, fitnesses, cfg, rng=random.Random(0)) + elite_ids = {g.id for g in sorted(population, key=lambda g: fitnesses[g.id], reverse=True)[:2]} + new_ids = {g.id for g in new_pop} + assert elite_ids.issubset(new_ids) + + +def test_next_generation_increments_generation_for_offspring(): + population = [make(i) for i in range(20)] + fitnesses = {g.id: float(i) for i, g in enumerate(population)} + cfg = GAConfig(population_size=20, elite_k=2, tournament_k=3, p_crossover=0.5) + new_pop = next_generation(population, fitnesses, cfg, rng=random.Random(0)) + new_offspring = [g for g in new_pop if g.id not in {p.id for p in population}] + assert all(g.generation > 0 for g in new_offspring) +``` + +- [ ] **Step 2: Run test (deve fallire)** + +Run: `uv run pytest tests/unit/test_ga_loop.py -v` +Expected: FAIL. 
+ +- [ ] **Step 3: Implementare loop** + +```python +# src/multi_swarm/ga/loop.py +from __future__ import annotations + +import random +from dataclasses import dataclass + +from ..genome.crossover import uniform_crossover +from ..genome.hypothesis import HypothesisAgentGenome +from ..genome.mutation import random_mutate +from .selection import elite_select, tournament_select + + +@dataclass(frozen=True) +class GAConfig: + population_size: int + elite_k: int + tournament_k: int + p_crossover: float + + +def next_generation( + population: list[HypothesisAgentGenome], + fitnesses: dict[str, float], + cfg: GAConfig, + rng: random.Random, +) -> list[HypothesisAgentGenome]: + new_pop: list[HypothesisAgentGenome] = list(elite_select(population, fitnesses, cfg.elite_k)) + + while len(new_pop) < cfg.population_size: + if rng.random() < cfg.p_crossover and len(population) >= 2: + p1 = tournament_select(population, fitnesses, cfg.tournament_k, rng) + p2 = tournament_select(population, fitnesses, cfg.tournament_k, rng) + child = uniform_crossover(p1, p2, rng) + else: + parent = tournament_select(population, fitnesses, cfg.tournament_k, rng) + child = random_mutate(parent, rng) + new_pop.append(child) + + return new_pop[: cfg.population_size] +``` + +- [ ] **Step 4: Run test (deve passare)** + +Run: `uv run pytest tests/unit/test_ga_loop.py -v` +Expected: PASS. 
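As a property check of the step just implemented, a toy sketch (plain floats stand in for genomes; illustrative only, not the project API) shows why elite carry-over makes the running best fitness non-decreasing across generations:

```python
import random

# Toy generational step: copy the elites verbatim, fill the remaining
# slots with tournament winners perturbed by a small "mutation".
def toy_step(pop: list[float], rng: random.Random,
             elite_k: int = 2, tournament_k: int = 3) -> list[float]:
    new = sorted(pop, reverse=True)[:elite_k]          # elitism
    while len(new) < len(pop):
        parent = max(rng.sample(pop, tournament_k))    # tournament selection
        new.append(parent + rng.uniform(-0.1, 0.1))    # mutation
    return new

rng = random.Random(0)
pop = [rng.uniform(0.0, 1.0) for _ in range(20)]
best = [max(pop)]
for _ in range(5):
    pop = toy_step(pop, rng)
    best.append(max(pop))
# Elites survive unchanged, so the best individual can never be lost.
assert best == sorted(best)
```

The same invariant (size preserved, elites present) is what the unit tests above assert on the real `next_generation`.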
+ +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/ga/loop.py tests/unit/test_ga_loop.py +git commit -m "feat(ga): next_generation step (elitism + tournament + mutate/crossover)" +``` + +--- + +## Task 25: SQLite schema + repository + +**Files:** +- Create: `src/multi_swarm/persistence/__init__.py` +- Create: `src/multi_swarm/persistence/schema.py` +- Create: `src/multi_swarm/persistence/repository.py` +- Test: `tests/unit/test_repository.py` + +Schema essenziale Phase 1: +- `runs(id, name, started_at, completed_at, status, config_json, total_cost_usd)` +- `generations(run_id, generation_idx, started_at, completed_at, n_genomes, fitness_median, fitness_max, fitness_p90, entropy)` +- `genomes(id, run_id, generation_idx, payload_json)` +- `evaluations(genome_id, run_id, fitness, dsr, dsr_pvalue, sharpe, max_dd, total_return, n_trades, parse_error, raw_text, eval_ts)` +- `cost_records(id, run_id, agent_id, ts, tier, input_tokens, output_tokens, cost_usd)` +- `adversarial_findings(genome_id, run_id, name, severity, detail)` + +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_repository.py +from pathlib import Path +import json +from multi_swarm.persistence.repository import Repository +from multi_swarm.genome.hypothesis import HypothesisAgentGenome, ModelTier + + +def make_genome(idx: int) -> HypothesisAgentGenome: + return HypothesisAgentGenome( + system_prompt=f"p-{idx}", feature_access=["close"], temperature=0.9, + top_p=0.95, model_tier=ModelTier.C, lookback_window=100, cognitive_style="x", + ) + + +def test_repository_creates_schema(tmp_path: Path): + repo = Repository(db_path=tmp_path / "runs.db") + repo.init_schema() + assert (tmp_path / "runs.db").exists() + + +def test_repository_create_run_and_get(tmp_path: Path): + repo = Repository(db_path=tmp_path / "runs.db") + repo.init_schema() + run_id = repo.create_run(name="phase1-test", config={"k": 20}) + run = repo.get_run(run_id) + assert run["name"] == "phase1-test" + assert 
json.loads(run["config_json"])["k"] == 20 + + +def test_repository_save_genome_and_evaluation(tmp_path: Path): + repo = Repository(db_path=tmp_path / "runs.db") + repo.init_schema() + run_id = repo.create_run(name="t", config={}) + g = make_genome(0) + repo.save_genome(run_id=run_id, generation_idx=0, genome=g) + repo.save_evaluation( + run_id=run_id, genome_id=g.id, fitness=0.5, dsr=0.7, dsr_pvalue=0.05, + sharpe=1.5, max_dd=0.2, total_return=0.3, n_trades=30, + parse_error=None, raw_text="(strategy ...)", + ) + evals = repo.list_evaluations(run_id) + assert len(evals) == 1 + assert evals[0]["fitness"] == 0.5 + + +def test_repository_save_generation_summary(tmp_path: Path): + repo = Repository(db_path=tmp_path / "runs.db") + repo.init_schema() + run_id = repo.create_run(name="t", config={}) + repo.save_generation_summary( + run_id=run_id, generation_idx=0, n_genomes=20, + fitness_median=0.3, fitness_max=0.8, fitness_p90=0.7, entropy=0.85, + ) + gens = repo.list_generations(run_id) + assert len(gens) == 1 + assert gens[0]["fitness_max"] == 0.8 +``` + +- [ ] **Step 2: Run test (deve fallire)** + +Run: `uv run pytest tests/unit/test_repository.py -v` +Expected: FAIL. 
+ +- [ ] **Step 3: Implementare schema + repository** + +```python +# src/multi_swarm/persistence/__init__.py +``` + +```python +# src/multi_swarm/persistence/schema.py +SCHEMA_SQL = """ +CREATE TABLE IF NOT EXISTS runs ( + id TEXT PRIMARY KEY, + name TEXT NOT NULL, + started_at TEXT NOT NULL, + completed_at TEXT, + status TEXT NOT NULL DEFAULT 'running', + config_json TEXT NOT NULL, + total_cost_usd REAL NOT NULL DEFAULT 0.0 +); + +CREATE TABLE IF NOT EXISTS generations ( + run_id TEXT NOT NULL, + generation_idx INTEGER NOT NULL, + started_at TEXT, + completed_at TEXT, + n_genomes INTEGER NOT NULL, + fitness_median REAL NOT NULL, + fitness_max REAL NOT NULL, + fitness_p90 REAL NOT NULL, + entropy REAL NOT NULL, + PRIMARY KEY (run_id, generation_idx), + FOREIGN KEY (run_id) REFERENCES runs(id) +); + +CREATE TABLE IF NOT EXISTS genomes ( + id TEXT NOT NULL, + run_id TEXT NOT NULL, + generation_idx INTEGER NOT NULL, + payload_json TEXT NOT NULL, + PRIMARY KEY (id, run_id, generation_idx), + FOREIGN KEY (run_id) REFERENCES runs(id) +); + +CREATE TABLE IF NOT EXISTS evaluations ( + run_id TEXT NOT NULL, + genome_id TEXT NOT NULL, + fitness REAL NOT NULL, + dsr REAL NOT NULL, + dsr_pvalue REAL NOT NULL, + sharpe REAL NOT NULL, + max_dd REAL NOT NULL, + total_return REAL NOT NULL, + n_trades INTEGER NOT NULL, + parse_error TEXT, + raw_text TEXT, + eval_ts TEXT NOT NULL, + PRIMARY KEY (run_id, genome_id), + FOREIGN KEY (run_id) REFERENCES runs(id) +); + +CREATE TABLE IF NOT EXISTS cost_records ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + run_id TEXT NOT NULL, + agent_id TEXT NOT NULL, + ts TEXT NOT NULL, + tier TEXT NOT NULL, + input_tokens INTEGER NOT NULL, + output_tokens INTEGER NOT NULL, + cost_usd REAL NOT NULL, + FOREIGN KEY (run_id) REFERENCES runs(id) +); + +CREATE TABLE IF NOT EXISTS adversarial_findings ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + run_id TEXT NOT NULL, + genome_id TEXT NOT NULL, + name TEXT NOT NULL, + severity TEXT NOT NULL, + detail TEXT NOT 
NULL, + FOREIGN KEY (run_id) REFERENCES runs(id) +); + +CREATE INDEX IF NOT EXISTS idx_evaluations_fitness ON evaluations(run_id, fitness DESC); +CREATE INDEX IF NOT EXISTS idx_genomes_generation ON genomes(run_id, generation_idx); +CREATE INDEX IF NOT EXISTS idx_cost_run ON cost_records(run_id); +""" +``` + +```python +# src/multi_swarm/persistence/repository.py +from __future__ import annotations + +import json +import sqlite3 +import uuid +from datetime import datetime, timezone +from pathlib import Path +from typing import Any + +from ..genome.hypothesis import HypothesisAgentGenome +from .schema import SCHEMA_SQL + + +class Repository: + def __init__(self, db_path: Path | str): + self.db_path = Path(db_path) + + def _conn(self) -> sqlite3.Connection: + conn = sqlite3.connect(self.db_path, isolation_level=None) + conn.row_factory = sqlite3.Row + conn.execute("PRAGMA foreign_keys = ON") + conn.execute("PRAGMA journal_mode = WAL") + return conn + + def init_schema(self) -> None: + self.db_path.parent.mkdir(parents=True, exist_ok=True) + with self._conn() as conn: + conn.executescript(SCHEMA_SQL) + + @staticmethod + def _now() -> str: + return datetime.now(timezone.utc).isoformat() + + # runs + def create_run(self, name: str, config: dict[str, Any]) -> str: + rid = uuid.uuid4().hex + with self._conn() as conn: + conn.execute( + "INSERT INTO runs (id, name, started_at, status, config_json) VALUES (?,?,?,?,?)", + (rid, name, self._now(), "running", json.dumps(config)), + ) + return rid + + def complete_run(self, run_id: str, total_cost: float, status: str = "completed") -> None: + with self._conn() as conn: + conn.execute( + "UPDATE runs SET completed_at=?, status=?, total_cost_usd=? 
WHERE id=?", + (self._now(), status, total_cost, run_id), + ) + + def get_run(self, run_id: str) -> dict[str, Any]: + with self._conn() as conn: + row = conn.execute("SELECT * FROM runs WHERE id=?", (run_id,)).fetchone() + if row is None: + raise KeyError(run_id) + return dict(row) + + def list_runs(self) -> list[dict[str, Any]]: + with self._conn() as conn: + rows = conn.execute("SELECT * FROM runs ORDER BY started_at DESC").fetchall() + return [dict(r) for r in rows] + + # generations + def save_generation_summary( + self, run_id: str, generation_idx: int, n_genomes: int, + fitness_median: float, fitness_max: float, fitness_p90: float, entropy: float, + ) -> None: + with self._conn() as conn: + conn.execute( + """INSERT OR REPLACE INTO generations + (run_id, generation_idx, completed_at, n_genomes, + fitness_median, fitness_max, fitness_p90, entropy) + VALUES (?,?,?,?,?,?,?,?)""", + (run_id, generation_idx, self._now(), n_genomes, + fitness_median, fitness_max, fitness_p90, entropy), + ) + + def list_generations(self, run_id: str) -> list[dict[str, Any]]: + with self._conn() as conn: + rows = conn.execute( + "SELECT * FROM generations WHERE run_id=? ORDER BY generation_idx", + (run_id,), + ).fetchall() + return [dict(r) for r in rows] + + # genomes + def save_genome(self, run_id: str, generation_idx: int, genome: HypothesisAgentGenome) -> None: + with self._conn() as conn: + conn.execute( + "INSERT OR REPLACE INTO genomes (id, run_id, generation_idx, payload_json) VALUES (?,?,?,?)", + (genome.id, run_id, generation_idx, json.dumps(genome.to_dict())), + ) + + def list_genomes(self, run_id: str, generation_idx: int | None = None) -> list[dict[str, Any]]: + with self._conn() as conn: + if generation_idx is None: + rows = conn.execute( + "SELECT * FROM genomes WHERE run_id=? ORDER BY generation_idx, id", (run_id,), + ).fetchall() + else: + rows = conn.execute( + "SELECT * FROM genomes WHERE run_id=? AND generation_idx=? 
ORDER BY id", + (run_id, generation_idx), + ).fetchall() + return [dict(r) for r in rows] + + # evaluations + def save_evaluation( + self, run_id: str, genome_id: str, fitness: float, dsr: float, dsr_pvalue: float, + sharpe: float, max_dd: float, total_return: float, n_trades: int, + parse_error: str | None, raw_text: str | None, + ) -> None: + with self._conn() as conn: + conn.execute( + """INSERT OR REPLACE INTO evaluations + (run_id, genome_id, fitness, dsr, dsr_pvalue, sharpe, max_dd, + total_return, n_trades, parse_error, raw_text, eval_ts) + VALUES (?,?,?,?,?,?,?,?,?,?,?,?)""", + (run_id, genome_id, fitness, dsr, dsr_pvalue, sharpe, max_dd, + total_return, n_trades, parse_error, raw_text, self._now()), + ) + + def list_evaluations(self, run_id: str) -> list[dict[str, Any]]: + with self._conn() as conn: + rows = conn.execute( + "SELECT * FROM evaluations WHERE run_id=? ORDER BY fitness DESC", + (run_id,), + ).fetchall() + return [dict(r) for r in rows] + + # cost + def save_cost_record( + self, run_id: str, agent_id: str, tier: str, + input_tokens: int, output_tokens: int, cost_usd: float, + ) -> None: + with self._conn() as conn: + conn.execute( + """INSERT INTO cost_records + (run_id, agent_id, ts, tier, input_tokens, output_tokens, cost_usd) + VALUES (?,?,?,?,?,?,?)""", + (run_id, agent_id, self._now(), tier, input_tokens, output_tokens, cost_usd), + ) + + def total_cost(self, run_id: str) -> float: + with self._conn() as conn: + row = conn.execute( + "SELECT COALESCE(SUM(cost_usd), 0.0) AS c FROM cost_records WHERE run_id=?", + (run_id,), + ).fetchone() + return float(row["c"]) + + # adversarial + def save_adversarial_finding( + self, run_id: str, genome_id: str, name: str, severity: str, detail: str, + ) -> None: + with self._conn() as conn: + conn.execute( + """INSERT INTO adversarial_findings + (run_id, genome_id, name, severity, detail) VALUES (?,?,?,?,?)""", + (run_id, genome_id, name, severity, detail), + ) + + def list_adversarial_findings(self, 
run_id: str) -> list[dict[str, Any]]: + with self._conn() as conn: + rows = conn.execute( + "SELECT * FROM adversarial_findings WHERE run_id=? ORDER BY id", (run_id,), + ).fetchall() + return [dict(r) for r in rows] +``` + +- [ ] **Step 4: Run test (deve passare)** + +Run: `uv run pytest tests/unit/test_repository.py -v` +Expected: PASS. + +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/persistence/ tests/unit/test_repository.py +git commit -m "feat(persistence): SQLite schema + repository for runs/genomes/evals/cost" +``` + +--- + +## Task 26: Generation summary utilities (entropy, percentili) + +**Files:** +- Create: `src/multi_swarm/ga/summary.py` +- Test: `tests/unit/test_ga_summary.py` + +Helper per calcolare metriche aggregate di una generazione: median, max, p90, entropy della distribuzione di fitness (binned). L'entropy serve come gate Phase 1 (#4 dello spec). + +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_ga_summary.py +import math +import pytest +from multi_swarm.ga.summary import generation_summary + + +def test_summary_basic_stats(): + fitnesses = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] + s = generation_summary(fitnesses, n_bins=5) + assert s["median"] == pytest.approx(0.45, abs=0.05) + assert s["max"] == pytest.approx(0.9) + assert 0.0 <= s["entropy"] <= math.log(5) + 0.01 + + +def test_summary_uniform_high_entropy(): + fitnesses = [0.1 * i for i in range(20)] + s_uniform = generation_summary(fitnesses, n_bins=5) + s_concentrated = generation_summary([0.5] * 20, n_bins=5) + assert s_uniform["entropy"] > s_concentrated["entropy"] + + +def test_summary_p90(): + fitnesses = list(range(100)) + s = generation_summary([float(x) for x in fitnesses], n_bins=10) + assert 88.0 <= s["p90"] <= 91.0 +``` + +- [ ] **Step 2: Run test (deve fallire)** + +Run: `uv run pytest tests/unit/test_ga_summary.py -v` +Expected: FAIL. 
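The entropy expectations encoded in these tests can be checked with a dependency-free sketch. Note this toy bins between min and max of the values, while Step 3 normalizes by the max over a fixed (0, 1) range; the qualitative behavior (uniform spread near log(n_bins), identical values at 0) is the same:

```python
import math

# Binned entropy of a fitness distribution: uniform spread across bins
# approaches log(n_bins); a degenerate distribution collapses to 0.
def binned_entropy(values: list[float], n_bins: int = 5) -> float:
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0          # degenerate range -> one bin
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    n = len(values)
    s = sum(c / n * math.log(c / n) for c in counts if c)
    return -s if s else 0.0

print(round(binned_entropy([0.1 * i for i in range(20)]), 3))  # 1.609 (= log 5)
print(binned_entropy([0.5] * 20))                              # 0.0
```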
+
+- [ ] **Step 3: Implementare summary**
+
+```python
+# src/multi_swarm/ga/summary.py
+from __future__ import annotations
+
+import math
+
+import numpy as np
+
+
+def generation_summary(fitnesses: list[float], n_bins: int = 10) -> dict[str, float]:
+    arr = np.asarray(fitnesses, dtype=float)
+    if arr.size == 0:
+        return {"median": 0.0, "max": 0.0, "p90": 0.0, "entropy": 0.0}
+    median = float(np.median(arr))
+    fmax = float(np.max(arr))
+    p90 = float(np.percentile(arr, 90))
+
+    if fmax > 0:
+        normalized = arr / fmax
+    else:
+        normalized = arr
+
+    hist, _ = np.histogram(normalized, bins=n_bins, range=(0.0, 1.0))
+    probs = hist / hist.sum() if hist.sum() > 0 else hist
+    entropy = float(-sum(p * math.log(p) for p in probs if p > 0))
+
+    return {"median": median, "max": fmax, "p90": p90, "entropy": entropy}
+```
+
+- [ ] **Step 4: Run test (deve passare)**
+
+Run: `uv run pytest tests/unit/test_ga_summary.py -v`
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/multi_swarm/ga/summary.py tests/unit/test_ga_summary.py
+git commit -m "feat(ga): generation summary stats (median/max/p90/entropy)"
+```
+
+---
+
+## Task 27: Initial population generator
+
+**Files:**
+- Create: `src/multi_swarm/ga/initial.py`
+- Test: `tests/unit/test_ga_initial.py`
+
+Generates the initial population K=20: cognitive styles distributed uniformly over the 6 styles, random temperature in [0.7, 1.2], random lookback in {100, 150, 200, 300}, prompts generated from fixed templates for each cognitive style.
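The uniform style distribution is just round-robin assignment, which the following standalone snippet illustrates (style order assumed to match `COGNITIVE_STYLES`; with K=20 the first two styles get 4 agents each and the remaining four get 3):

```python
from collections import Counter

# Round-robin style assignment, styles[i % 6], as used by the generator.
styles = ["physicist", "biologist", "historian",
          "meteorologist", "ecologist", "engineer"]
assignment = [styles[i % len(styles)] for i in range(20)]
print(Counter(assignment).most_common())
```

This is also why the test with k=12 can assert full coverage of all 6 styles.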
+ +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_ga_initial.py +import random +from multi_swarm.ga.initial import build_initial_population +from multi_swarm.genome.hypothesis import ModelTier + + +def test_initial_population_size(): + pop = build_initial_population(k=20, model_tier=ModelTier.C, rng=random.Random(0)) + assert len(pop) == 20 + + +def test_initial_population_unique_ids(): + pop = build_initial_population(k=20, model_tier=ModelTier.C, rng=random.Random(0)) + ids = {g.id for g in pop} + assert len(ids) == 20 + + +def test_initial_population_covers_all_styles(): + pop = build_initial_population(k=12, model_tier=ModelTier.C, rng=random.Random(0)) + styles = {g.cognitive_style for g in pop} + assert len(styles) == 6 + + +def test_initial_population_generation_zero(): + pop = build_initial_population(k=20, model_tier=ModelTier.C, rng=random.Random(0)) + assert all(g.generation == 0 for g in pop) + assert all(g.parent_ids == [] for g in pop) +``` + +- [ ] **Step 2: Run test (deve fallire)** + +Run: `uv run pytest tests/unit/test_ga_initial.py -v` +Expected: FAIL. + +- [ ] **Step 3: Implementare initial** + +```python +# src/multi_swarm/ga/initial.py +from __future__ import annotations + +import random + +from ..genome.hypothesis import HypothesisAgentGenome, ModelTier +from ..genome.mutation import COGNITIVE_STYLES + + +STYLE_PROMPTS: dict[str, str] = { + "physicist": "Cerca leggi conservative, simmetrie, regimi di scala. 
Pensa in termini di flussi e potenziali.", + "biologist": "Cerca pattern adattivi, nicchie ecologiche, predator-prey dynamics tra partecipanti del mercato.", + "historian": "Cerca pattern ricorrenti su scale temporali multiple, analogie con regimi storici, mean reversion strutturali.", + "meteorologist": "Cerca regimi di volatilità che si autoalimentano, transizioni di stato come fronti, persistenza locale.", + "ecologist": "Cerca interazioni multi-asset, correlazioni cluster, segnali di stress sistemico nelle dinamiche di flusso.", + "engineer": "Cerca segnali con rapporto S/N favorevole, filtri causali, robustezza a perturbazioni di calibrazione.", +} + + +def build_initial_population( + k: int, + model_tier: ModelTier, + rng: random.Random, + feature_pool: tuple[str, ...] = ("close", "high", "low", "volume"), +) -> list[HypothesisAgentGenome]: + """Costruisce una popolazione iniziale K varia per stile cognitivo + parametri.""" + population: list[HypothesisAgentGenome] = [] + for i in range(k): + style = COGNITIVE_STYLES[i % len(COGNITIVE_STYLES)] + n_features = rng.randint(1, len(feature_pool)) + feats = sorted(rng.sample(feature_pool, k=n_features)) + g = HypothesisAgentGenome( + system_prompt=STYLE_PROMPTS[style], + feature_access=feats, + temperature=round(rng.uniform(0.7, 1.2), 2), + top_p=0.95, + model_tier=model_tier, + lookback_window=rng.choice([100, 150, 200, 300]), + cognitive_style=style, + ) + # Seed per garantire id univoco se duplicato (raro ma possibile) + while any(g.id == p.id for p in population): + g = HypothesisAgentGenome( + system_prompt=g.system_prompt + f" [seed-{i}-{rng.randint(0, 1_000_000)}]", + feature_access=g.feature_access, + temperature=g.temperature, + top_p=g.top_p, + model_tier=g.model_tier, + lookback_window=g.lookback_window, + cognitive_style=g.cognitive_style, + ) + population.append(g) + return population +``` + +- [ ] **Step 4: Run test (deve passare)** + +Run: `uv run pytest tests/unit/test_ga_initial.py -v` +Expected: 
PASS. + +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/ga/initial.py tests/unit/test_ga_initial.py +git commit -m "feat(ga): initial population generator with cognitive style coverage" +``` + +--- + +## Task 28: Market summary builder (statistiche per il prompt) + +**Files:** +- Create: `src/multi_swarm/agents/market_summary.py` +- Test: `tests/unit/test_market_summary.py` + +Calcola le statistiche del training set che vengono iniettate nel prompt dell'Hypothesis agent. + +- [ ] **Step 1: Scrivere test fallente** + +```python +# tests/unit/test_market_summary.py +import numpy as np +import pandas as pd +from multi_swarm.agents.market_summary import build_market_summary + + +def test_build_summary_basic(): + idx = pd.date_range("2024-01-01", periods=200, freq="1h", tz="UTC") + np.random.seed(0) + close = 100 + np.cumsum(np.random.normal(0, 1, 200)) + df = pd.DataFrame( + {"open": close, "high": close + 0.5, "low": close - 0.5, "close": close, "volume": 1.0}, + index=idx, + ) + s = build_market_summary(df, symbol="BTC/USDT", timeframe="1h") + assert s.symbol == "BTC/USDT" + assert s.timeframe == "1h" + assert s.n_bars == 200 + assert isinstance(s.return_mean, float) + assert isinstance(s.return_std, float) + assert s.volatility_regime in {"low", "medium", "high"} + + +def test_volatility_regime_high_for_volatile(): + idx = pd.date_range("2024-01-01", periods=200, freq="1h", tz="UTC") + np.random.seed(0) + close = 100 + np.cumsum(np.random.normal(0, 5.0, 200)) # alta vol + df = pd.DataFrame( + {"open": close, "high": close + 0.5, "low": close - 0.5, "close": close, "volume": 1.0}, + index=idx, + ) + s = build_market_summary(df, symbol="BTC/USDT", timeframe="1h") + assert s.volatility_regime in {"medium", "high"} +``` + +- [ ] **Step 2: Run test (deve fallire)** + +Run: `uv run pytest tests/unit/test_market_summary.py -v` +Expected: FAIL. 
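The volatility-regime classification under test is a simple threshold ladder on the per-bar return standard deviation. A standalone restatement of the cutoffs used in Step 3 (0.5% and 2% are the plan's v0 choice, not calibrated values):

```python
# Regime ladder: std of per-bar returns below 0.5% -> low,
# below 2% -> medium, otherwise high.
def volatility_regime(return_std: float) -> str:
    if return_std < 0.005:
        return "low"
    if return_std < 0.02:
        return "medium"
    return "high"

print(volatility_regime(0.003), volatility_regime(0.01), volatility_regime(0.05))
# -> low medium high
```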
+ +- [ ] **Step 3: Implementare market summary** + +```python +# src/multi_swarm/agents/market_summary.py +from __future__ import annotations + +import numpy as np +import pandas as pd +from scipy import stats + +from .hypothesis import MarketSummary + + +def build_market_summary( + ohlcv: pd.DataFrame, symbol: str, timeframe: str, +) -> MarketSummary: + returns = ohlcv["close"].pct_change().dropna() + return_mean = float(returns.mean()) + return_std = float(returns.std(ddof=1)) + skew = float(stats.skew(returns, bias=False)) + kurt = float(stats.kurtosis(returns, fisher=True, bias=False)) + + if return_std < 0.005: + regime = "low" + elif return_std < 0.02: + regime = "medium" + else: + regime = "high" + + return MarketSummary( + symbol=symbol, + timeframe=timeframe, + n_bars=len(ohlcv), + return_mean=return_mean, + return_std=return_std, + skew=skew, + kurtosis=kurt, + volatility_regime=regime, + ) +``` + +- [ ] **Step 4: Run test (deve passare)** + +Run: `uv run pytest tests/unit/test_market_summary.py -v` +Expected: PASS. + +- [ ] **Step 5: Commit** + +```bash +git add src/multi_swarm/agents/market_summary.py tests/unit/test_market_summary.py +git commit -m "feat(agents): market summary builder for hypothesis prompt" +``` + +--- + +## Task 29: Run orchestrator (end-to-end loop) + +**Files:** +- Create: `src/multi_swarm/orchestrator/__init__.py` +- Create: `src/multi_swarm/orchestrator/run.py` +- Test: `tests/integration/test_e2e_minimal_run.py` + +L'orchestrator coordina: load OHLCV → build summary → init pop → per ogni gen: chiedi LLM, falsifica, adversarial, fitness → salva su DB → next_generation. Configurazione via dataclass `RunConfig`. 
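One serialization caveat before implementing: `RunConfig` carries a `Path` and a `ModelTier` enum, and `create_run` persists the config via `json.dumps`, which rejects both types. A minimal sketch of the pitfall and a catch-all fix (`Tier`/`Cfg` here are hypothetical stand-ins, not project classes):

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from pathlib import Path

class Tier(Enum):
    C = "C"

@dataclass
class Cfg:
    seed: int = 42
    model_tier: Tier = Tier.C
    db_path: Path = field(default_factory=lambda: Path("runs.db"))

cfg = Cfg()
# json.dumps(asdict(cfg)) raises TypeError on the Enum and the Path;
# either convert them explicitly (str(db_path), model_tier.value)
# or pass default=str as a catch-all.
payload = json.dumps(asdict(cfg), default=str)
print(payload)
```

Converting explicitly keeps the stored `config_json` predictable; `default=str` is the lazier fallback.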
+ +- [ ] **Step 1: Scrivere test integration** + +```python +# tests/integration/__init__.py +``` + +```python +# tests/integration/test_e2e_minimal_run.py +import random +from datetime import datetime, timezone +from pathlib import Path +import pytest +import numpy as np +import pandas as pd +from multi_swarm.orchestrator.run import RunConfig, run_phase1 +from multi_swarm.genome.hypothesis import ModelTier +from multi_swarm.persistence.repository import Repository +from multi_swarm.llm.client import CompletionResult + + +@pytest.fixture +def synthetic_ohlcv(): + idx = pd.date_range("2024-01-01", periods=500, freq="1h", tz="UTC") + close = 100 + np.cumsum(np.random.RandomState(0).normal(0.01, 1.0, 500)) + return pd.DataFrame( + {"open": close, "high": close + 0.5, "low": close - 0.5, "close": close, "volume": 1.0}, + index=idx, + ) + + +@pytest.fixture +def fake_llm(mocker): + """LLM mock che ritorna sempre una strategia valida.""" + fake = mocker.MagicMock() + fake.complete.return_value = CompletionResult( + text="```lisp\n(strategy (when (gt (indicator rsi 14) 70.0) (entry-short)) (when (lt (indicator rsi 14) 30.0) (entry-long)))\n```", + input_tokens=200, output_tokens=80, tier=ModelTier.C, model="qwen", + ) + return fake + + +def test_e2e_minimal_run_completes(tmp_path: Path, synthetic_ohlcv, fake_llm, mocker): + cfg = RunConfig( + run_name="e2e-test", + population_size=5, + n_generations=2, + elite_k=1, + tournament_k=2, + p_crossover=0.5, + seed=42, + model_tier=ModelTier.C, + symbol="BTC/USDT", + timeframe="1h", + fees_bp=5.0, + n_trials_dsr=10, + db_path=tmp_path / "runs.db", + ) + + run_id = run_phase1(cfg, ohlcv=synthetic_ohlcv, llm=fake_llm) + + repo = Repository(db_path=tmp_path / "runs.db") + run = repo.get_run(run_id) + assert run["status"] == "completed" + gens = repo.list_generations(run_id) + assert len(gens) == 2 + evals = repo.list_evaluations(run_id) + assert len(evals) >= 5 # almeno una popolazione +``` + +- [ ] **Step 2: Run test (deve 
fallire)**
+
+Run: `uv run pytest tests/integration/test_e2e_minimal_run.py -v`
+Expected: FAIL.
+
+- [ ] **Step 3: Implementare orchestrator**
+
+```python
+# src/multi_swarm/orchestrator/__init__.py
+```
+
+```python
+# src/multi_swarm/orchestrator/run.py
+from __future__ import annotations
+
+import random
+from dataclasses import dataclass, field
+from pathlib import Path
+
+import pandas as pd
+
+from ..agents.adversarial import AdversarialAgent
+from ..agents.falsification import FalsificationAgent
+from ..agents.hypothesis import HypothesisAgent
+from ..agents.market_summary import build_market_summary
+from ..ga.fitness import compute_fitness
+from ..ga.initial import build_initial_population
+from ..ga.loop import GAConfig, next_generation
+from ..ga.summary import generation_summary
+from ..genome.hypothesis import ModelTier
+from ..llm.client import LLMClient
+from ..llm.cost_tracker import CostTracker
+from ..persistence.repository import Repository
+
+
+@dataclass
+class RunConfig:
+    run_name: str
+    population_size: int = 20
+    n_generations: int = 10
+    elite_k: int = 2
+    tournament_k: int = 3
+    p_crossover: float = 0.5
+    seed: int = 42
+    model_tier: ModelTier = ModelTier.C
+    symbol: str = "BTC/USDT"
+    timeframe: str = "1h"
+    fees_bp: float = 5.0
+    n_trials_dsr: int = 50
+    db_path: Path = field(default_factory=lambda: Path("./runs.db"))
+
+
+def run_phase1(
+    cfg: RunConfig,
+    ohlcv: pd.DataFrame,
+    llm: LLMClient,
+) -> str:
+    rng = random.Random(cfg.seed)
+
+    repo = Repository(cfg.db_path)
+    repo.init_schema()
+    # Path and Enum values are not JSON-serializable: convert them before persisting.
+    config_payload = cfg.__dict__ | {
+        "db_path": str(cfg.db_path),
+        "model_tier": cfg.model_tier.value,
+    }
+    run_id = repo.create_run(name=cfg.run_name, config=config_payload)
+
+    market = build_market_summary(ohlcv, symbol=cfg.symbol, timeframe=cfg.timeframe)
+
+    hypothesis_agent = HypothesisAgent(llm=llm)
+    falsification_agent = FalsificationAgent(fees_bp=cfg.fees_bp, n_trials_dsr=cfg.n_trials_dsr)
+    adversarial_agent = AdversarialAgent(fees_bp=cfg.fees_bp)
+    cost_tracker = CostTracker()
+
+    population = build_initial_population(k=cfg.population_size, model_tier=cfg.model_tier, rng=rng)
+    fitnesses: dict[str, float] = {}
+
+    ga_cfg = GAConfig(
+        population_size=cfg.population_size,
+        elite_k=cfg.elite_k,
+        tournament_k=cfg.tournament_k,
+        p_crossover=cfg.p_crossover,
+    )
+
+    try:
+        for gen in range(cfg.n_generations):
+            for genome in population:
+                # Record every population member for this generation, elites
+                # included, so per-generation genome listings stay complete.
+                repo.save_genome(run_id=run_id, generation_idx=gen, genome=genome)
+                if genome.id in fitnesses:
+                    continue  # elite already evaluated
+                proposal = hypothesis_agent.propose(genome, market)
+                cost_record = cost_tracker.record(
+                    input_tokens=proposal.completion.input_tokens,
+                    output_tokens=proposal.completion.output_tokens,
+                    tier=proposal.completion.tier,
+                    run_id=run_id,
+                    agent_id=genome.id,
+                )
+                repo.save_cost_record(
+                    run_id=run_id, agent_id=genome.id, tier=cost_record.tier.value,
+                    input_tokens=cost_record.input_tokens, output_tokens=cost_record.output_tokens,
+                    cost_usd=cost_record.cost_usd,
+                )
+
+                if proposal.strategy is None:
+                    repo.save_evaluation(
+                        run_id=run_id, genome_id=genome.id, fitness=0.0,
+                        dsr=0.0, dsr_pvalue=1.0, sharpe=0.0, max_dd=0.0,
+                        total_return=0.0, n_trades=0,
+                        parse_error=proposal.parse_error, raw_text=proposal.raw_text,
+                    )
+                    fitnesses[genome.id] = 0.0
+                    continue
+
+                fals = falsification_agent.evaluate(proposal.strategy, ohlcv)
+                adv = adversarial_agent.review(proposal.strategy, ohlcv)
+                for finding in adv.findings:
+                    repo.save_adversarial_finding(
+                        run_id=run_id, genome_id=genome.id,
+                        name=finding.name, severity=finding.severity.value, detail=finding.detail,
+                    )
+                fit = compute_fitness(fals, adv)
+                repo.save_evaluation(
+                    run_id=run_id, genome_id=genome.id, fitness=fit,
+                    dsr=fals.dsr, dsr_pvalue=fals.dsr_pvalue, sharpe=fals.sharpe,
+                    max_dd=fals.max_drawdown, total_return=fals.total_return,
+                    n_trades=fals.n_trades, parse_error=None, raw_text=proposal.raw_text,
+                )
+                fitnesses[genome.id] = fit
+
+            gen_fitnesses = [fitnesses[g.id] for g in population]
+            summary = generation_summary(gen_fitnesses, n_bins=10)
+            repo.save_generation_summary(
+                run_id=run_id, generation_idx=gen, n_genomes=len(population),
+                fitness_median=summary["median"], fitness_max=summary["max"],
+                fitness_p90=summary["p90"], entropy=summary["entropy"],
+            )
+
+            if gen < cfg.n_generations - 1:
+                population = next_generation(population, fitnesses, ga_cfg, rng)
+
+        repo.complete_run(run_id, total_cost=repo.total_cost(run_id), status="completed")
+        return run_id
+    except Exception:
+        repo.complete_run(run_id, total_cost=repo.total_cost(run_id), status="failed")
+        raise
+```
+
+- [ ] **Step 4: Run test (deve passare)**
+
+Run: `uv run pytest tests/integration/test_e2e_minimal_run.py -v`
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/multi_swarm/orchestrator/ tests/integration/
+git commit -m "feat(orchestrator): end-to-end Phase 1 runner with persistence"
+```
+
+---
+
+## Task 30: Streamlit dashboard skeleton + Overview page
+
+**Files:**
+- Create: `src/multi_swarm/dashboard/__init__.py`
+- Create: `src/multi_swarm/dashboard/streamlit_app.py`
+- Create: `src/multi_swarm/dashboard/data.py`
+- Create: `src/multi_swarm/dashboard/pages/01_overview.py`
+- Test: `tests/integration/test_streamlit_smoke.py`
+
+`data.py` exposes read functions for the Streamlit pages; `streamlit_app.py` is the home page; `pages/01_overview.py` shows the latest run, its status, and its spend.
+ +- [ ] **Step 1: Implementare data layer della dashboard** + +```python +# src/multi_swarm/dashboard/__init__.py +``` + +```python +# src/multi_swarm/dashboard/data.py +from __future__ import annotations + +import json +from pathlib import Path + +import pandas as pd + +from ..persistence.repository import Repository + + +def get_repo(db_path: str | Path) -> Repository: + return Repository(db_path=db_path) + + +def list_runs_df(repo: Repository) -> pd.DataFrame: + return pd.DataFrame(repo.list_runs()) + + +def get_run_overview(repo: Repository, run_id: str) -> dict: + run = repo.get_run(run_id) + return { + "name": run["name"], + "started_at": run["started_at"], + "completed_at": run["completed_at"], + "status": run["status"], + "total_cost_usd": run["total_cost_usd"], + "config": json.loads(run["config_json"]), + } + + +def generations_df(repo: Repository, run_id: str) -> pd.DataFrame: + return pd.DataFrame(repo.list_generations(run_id)) + + +def evaluations_df(repo: Repository, run_id: str) -> pd.DataFrame: + return pd.DataFrame(repo.list_evaluations(run_id)) + + +def genomes_df(repo: Repository, run_id: str, generation_idx: int | None = None) -> pd.DataFrame: + rows = repo.list_genomes(run_id, generation_idx) + flat = [] + for r in rows: + payload = json.loads(r["payload_json"]) + flat.append({ + "id": r["id"], "generation_idx": r["generation_idx"], + **payload, + }) + return pd.DataFrame(flat) +``` + +- [ ] **Step 2: Streamlit home page** + +```python +# src/multi_swarm/dashboard/streamlit_app.py +from __future__ import annotations + +import os +from pathlib import Path + +import streamlit as st + +st.set_page_config(page_title="Multi-Swarm Phase 1", layout="wide") +st.title("Multi-Swarm Coevolutivo — Phase 1 dashboard") +st.markdown(""" +Naviga le pagine nel menu a sinistra: +- **Overview**: ultima run e stato globale. +- **GA Convergence**: fitness per generazione. +- **Genomes**: top-K genomi e ispezione qualitativa. 
+""") + +db_path = os.environ.get("DB_PATH", "./runs.db") +st.session_state["db_path"] = db_path +st.caption(f"DB path: `{Path(db_path).resolve()}`") +``` + +- [ ] **Step 3: Pagina Overview** + +```python +# src/multi_swarm/dashboard/pages/01_overview.py +from __future__ import annotations + +import streamlit as st + +from multi_swarm.dashboard.data import get_repo, get_run_overview, list_runs_df + +st.title("Overview") + +db_path = st.session_state.get("db_path", "./runs.db") +repo = get_repo(db_path) + +runs = list_runs_df(repo) +if runs.empty: + st.info("Nessuna run nel database. Esegui `scripts/run_phase1.py` per generarne una.") + st.stop() + +st.subheader("Tutte le run") +st.dataframe(runs[["id", "name", "started_at", "completed_at", "status", "total_cost_usd"]]) + +selected = st.selectbox("Seleziona run per dettaglio", runs["id"].tolist()) +overview = get_run_overview(repo, selected) + +col1, col2, col3, col4 = st.columns(4) +col1.metric("Status", overview["status"]) +col2.metric("Cost (USD)", f"{overview['total_cost_usd']:.4f}") +col3.metric("Started", overview["started_at"]) +col4.metric("Completed", overview["completed_at"] or "—") + +st.subheader("Config") +st.json(overview["config"]) +``` + +- [ ] **Step 4: Smoke test (importabilità)** + +```python +# tests/integration/test_streamlit_smoke.py +import importlib + + +def test_streamlit_app_imports(): + # Check the modules import without exec'ing Streamlit's runtime + importlib.import_module("multi_swarm.dashboard.data") + + +def test_dashboard_data_helpers_signatures(): + from multi_swarm.dashboard import data + assert hasattr(data, "list_runs_df") + assert hasattr(data, "generations_df") + assert hasattr(data, "evaluations_df") + assert hasattr(data, "genomes_df") +``` + +- [ ] **Step 5: Run smoke test** + +Run: `uv run pytest tests/integration/test_streamlit_smoke.py -v` +Expected: PASS. 
+ +- [ ] **Step 6: Commit** + +```bash +git add src/multi_swarm/dashboard/ tests/integration/test_streamlit_smoke.py +git commit -m "feat(dashboard): streamlit skeleton + Overview page + data layer" +``` + +--- + +## Task 31: Streamlit page — GA Convergence + +**Files:** +- Create: `src/multi_swarm/dashboard/pages/02_ga_convergence.py` + +- [ ] **Step 1: Implementare pagina** + +```python +# src/multi_swarm/dashboard/pages/02_ga_convergence.py +from __future__ import annotations + +import plotly.graph_objects as go +import streamlit as st + +from multi_swarm.dashboard.data import generations_df, get_repo, list_runs_df + +st.title("GA Convergence") + +db_path = st.session_state.get("db_path", "./runs.db") +repo = get_repo(db_path) + +runs = list_runs_df(repo) +if runs.empty: + st.info("Nessuna run.") + st.stop() + +selected = st.selectbox("Run", runs["id"].tolist()) +gens = generations_df(repo, selected) +if gens.empty: + st.warning("Nessuna generazione registrata per questa run.") + st.stop() + +fig = go.Figure() +fig.add_trace(go.Scatter(x=gens["generation_idx"], y=gens["fitness_median"], name="median", mode="lines+markers")) +fig.add_trace(go.Scatter(x=gens["generation_idx"], y=gens["fitness_max"], name="max", mode="lines+markers")) +fig.add_trace(go.Scatter(x=gens["generation_idx"], y=gens["fitness_p90"], name="p90", mode="lines+markers")) +fig.update_layout(xaxis_title="generation", yaxis_title="fitness", title="Fitness convergence") +st.plotly_chart(fig, use_container_width=True) + +st.subheader("Entropy") +fig2 = go.Figure() +fig2.add_trace(go.Scatter(x=gens["generation_idx"], y=gens["entropy"], mode="lines+markers")) +fig2.add_hline(y=0.5, line_dash="dash", annotation_text="gate threshold (0.5)") +fig2.update_layout(xaxis_title="generation", yaxis_title="entropy", title="Diversity (fitness entropy)") +st.plotly_chart(fig2, use_container_width=True) + +st.subheader("Tabella generazioni") +st.dataframe(gens) +``` + +- [ ] **Step 2: Smoke test (importabilità)** 
+
+Le pagine Streamlit con prefisso numerico (`02_...`) non hanno nomi di modulo Python validi e la directory `pages/` non è un package: l'import diretto non è affidabile. Il check di questo step si riduce quindi a una verifica via filesystem:
+
+```bash
+test -f src/multi_swarm/dashboard/pages/02_ga_convergence.py && echo OK
+```
+
+Expected: stampa `OK`.
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add src/multi_swarm/dashboard/pages/02_ga_convergence.py
+git commit -m "feat(dashboard): GA convergence page (median/max/p90 + entropy)"
+```
+
+---
+
+## Task 32: Streamlit page — Genomes (basic)
+
+**Files:**
+- Create: `src/multi_swarm/dashboard/pages/03_genomes.py`
+
+- [ ] **Step 1: Implementare pagina**
+
+```python
+# src/multi_swarm/dashboard/pages/03_genomes.py
+from __future__ import annotations
+
+import streamlit as st
+
+from multi_swarm.dashboard.data import (
+    evaluations_df, genomes_df, get_repo, list_runs_df,
+)
+
+st.title("Genomes")
+
+db_path = st.session_state.get("db_path", "./runs.db")
+repo = get_repo(db_path)
+
+runs = list_runs_df(repo)
+if runs.empty:
+    st.info("Nessuna run.")
+    st.stop()
+
+selected = st.selectbox("Run", runs["id"].tolist())
+evals = evaluations_df(repo, selected)
+genomes = genomes_df(repo, selected)
+
+if evals.empty:
+    st.warning("Nessuna evaluation.")
+    st.stop()
+
+merged = evals.merge(genomes, left_on="genome_id", right_on="id", how="left", suffixes=("", "_g"))
+top = merged.sort_values("fitness", ascending=False).head(10)
+
+st.subheader("Top-10 genomi (per fitness)")
+display_cols = [
+    "genome_id", "fitness", "dsr", "sharpe", "max_dd", "n_trades",
+    "cognitive_style", "temperature", "lookback_window", "feature_access",
+]
+existing = [c for c in display_cols if c in top.columns]
+st.dataframe(top[existing])
+
+st.subheader("Ispezione genoma")
+gid = st.selectbox("Seleziona genome_id", top["genome_id"].tolist())
+row = merged[merged["genome_id"] == gid].iloc[0]
+
+col1, col2 = st.columns(2)
+with col1:
+    st.metric("fitness", f"{row['fitness']:.3f}")
+    st.metric("DSR", f"{row['dsr']:.3f}")
+    st.metric("Sharpe", f"{row['sharpe']:.3f}")
+with col2:
+    st.metric("max DD", f"{row['max_dd']:.3f}")
+    st.metric("trades", int(row["n_trades"]))
+    st.metric("style", str(row.get("cognitive_style", "—")))
+
+st.subheader("System prompt")
+st.code(row.get("system_prompt", "—"))
+
+st.subheader("Raw LLM output")
+st.code(row.get("raw_text", "—"))
+
+if row.get("parse_error"):
+    st.error(f"Parse error: {row['parse_error']}")
+```
+
+- [ ] **Step 2: Smoke check filesystem**
+
+Run: `test -f src/multi_swarm/dashboard/pages/03_genomes.py && echo OK`
+Expected: stampa `OK`.
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add src/multi_swarm/dashboard/pages/03_genomes.py
+git commit -m "feat(dashboard): Genomes page (top-10 + inspection)"
+```
+
+---
+
+## Task 33: Script di entry point per Phase 1
+
+**Files:**
+- Create: `scripts/__init__.py`
+- Create: `scripts/run_phase1.py`
+
+Lo script orchestra il run reale: carica OHLCV, costruisce LLMClient con API key da .env, esegue `run_phase1`. Configurabile via CLI args con argparse.
+ +- [ ] **Step 1: Implementare script** + +```python +# scripts/__init__.py +``` + +```python +# scripts/run_phase1.py +from __future__ import annotations + +import argparse +from datetime import datetime, timezone +from pathlib import Path + +from multi_swarm.config import load_settings +from multi_swarm.data.ohlcv_loader import OHLCVLoader, OHLCVRequest +from multi_swarm.genome.hypothesis import ModelTier +from multi_swarm.llm.client import LLMClient +from multi_swarm.orchestrator.run import RunConfig, run_phase1 + + +def parse_args() -> argparse.Namespace: + p = argparse.ArgumentParser(description="Multi-Swarm Phase 1 runner") + p.add_argument("--name", default="phase1-spike-001") + p.add_argument("--population-size", type=int, default=20) + p.add_argument("--n-generations", type=int, default=10) + p.add_argument("--elite-k", type=int, default=2) + p.add_argument("--tournament-k", type=int, default=3) + p.add_argument("--p-crossover", type=float, default=0.5) + p.add_argument("--seed", type=int, default=42) + p.add_argument("--symbol", default="BTC/USDT") + p.add_argument("--timeframe", default="1h") + p.add_argument("--start", default="2024-01-01T00:00:00+00:00") + p.add_argument("--end", default="2026-01-01T00:00:00+00:00") + p.add_argument("--fees-bp", type=float, default=5.0) + p.add_argument("--n-trials-dsr", type=int, default=50) + return p.parse_args() + + +def main() -> None: + args = parse_args() + settings = load_settings() + + loader = OHLCVLoader(cache_dir=settings.series_dir) + req = OHLCVRequest( + symbol=args.symbol, + timeframe=args.timeframe, + start=datetime.fromisoformat(args.start), + end=datetime.fromisoformat(args.end), + ) + ohlcv = loader.load(req) + print(f"OHLCV loaded: {len(ohlcv)} bars from {ohlcv.index[0]} to {ohlcv.index[-1]}") + + llm = LLMClient( + openrouter_api_key=settings.openrouter_api_key.get_secret_value(), + anthropic_api_key=( + settings.anthropic_api_key.get_secret_value() + if settings.anthropic_api_key else None + ), 
+ ) + + cfg = RunConfig( + run_name=args.name, + population_size=args.population_size, + n_generations=args.n_generations, + elite_k=args.elite_k, + tournament_k=args.tournament_k, + p_crossover=args.p_crossover, + seed=args.seed, + model_tier=ModelTier.C, + symbol=args.symbol, + timeframe=args.timeframe, + fees_bp=args.fees_bp, + n_trials_dsr=args.n_trials_dsr, + db_path=settings.db_path, + ) + + run_id = run_phase1(cfg, ohlcv=ohlcv, llm=llm) + print(f"Run completed: {run_id}") + + +if __name__ == "__main__": + main() +``` + +- [ ] **Step 2: Verifica importabilità** + +Run: `uv run python -c "from scripts import run_phase1; print(run_phase1.__doc__ or 'ok')"` +Expected: stampa `ok`. + +- [ ] **Step 3: Commit** + +```bash +git add scripts/ +git commit -m "feat(scripts): Phase 1 runner CLI entry point" +``` + +--- + +## Task 34: Smoke run (popolazione minima, 1 generazione, dry data) + +**Files:** +- Create: `scripts/smoke_run.py` + +Smoke run usa OHLCV sintetico generato in memoria + popolazione 3 + 1 generazione. Niente API LLM reale: usa `MockLLMClient` che restituisce strategy fissa. Serve a validare che tutto il loop gira senza errori prima di spendere token reali. 
+ +- [ ] **Step 1: Implementare smoke** + +```python +# scripts/smoke_run.py +from __future__ import annotations + +from pathlib import Path + +import numpy as np +import pandas as pd + +from multi_swarm.genome.hypothesis import HypothesisAgentGenome, ModelTier +from multi_swarm.llm.client import CompletionResult +from multi_swarm.orchestrator.run import RunConfig, run_phase1 + + +class MockLLMClient: + def complete( + self, genome: HypothesisAgentGenome, system: str, user: str, + max_tokens: int = 2000, + ) -> CompletionResult: + text = ( + "```lisp\n" + "(strategy" + " (when (gt (indicator rsi 14) 70.0) (entry-short))" + " (when (lt (indicator rsi 14) 30.0) (entry-long)))\n" + "```" + ) + return CompletionResult( + text=text, input_tokens=120, output_tokens=60, + tier=genome.model_tier, model="mock", + ) + + +def main() -> None: + idx = pd.date_range("2024-01-01", periods=1000, freq="1h", tz="UTC") + close = 100 + np.cumsum(np.random.RandomState(0).normal(0.01, 1.0, 1000)) + ohlcv = pd.DataFrame( + {"open": close, "high": close + 0.5, "low": close - 0.5, "close": close, "volume": 1.0}, + index=idx, + ) + cfg = RunConfig( + run_name="smoke", + population_size=3, + n_generations=1, + elite_k=1, + tournament_k=2, + p_crossover=0.5, + seed=0, + model_tier=ModelTier.C, + db_path=Path("./runs.db"), + ) + run_id = run_phase1(cfg, ohlcv=ohlcv, llm=MockLLMClient()) # type: ignore[arg-type] + print(f"Smoke run completed: {run_id}") + + +if __name__ == "__main__": + main() +``` + +- [ ] **Step 2: Run smoke** + +Run: `uv run python scripts/smoke_run.py` +Expected: stampa `Smoke run completed: `. File `runs.db` esiste con 3 genomi e 1 generazione. 
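Per rendere ripetibile il check «3 genomi e 1 generazione», uno sketch di verifica diretta sul DB. Attenzione: i nomi di tabelle e colonne (`genomes`, `generations`, `run_id`) sono un'ipotesi basata sul repository del Task 25; adattarli allo schema reale se differiscono.

```python
# Sketch di verifica post-smoke run sul DB SQLite.
# Ipotesi: tabelle `genomes` e `generations` con colonna `run_id`
# (lo schema reale è definito dal repository del Task 25).
import sqlite3


def verify_smoke_db(db_path: str, run_id: str,
                    expect_genomes: int = 3, expect_generations: int = 1) -> None:
    con = sqlite3.connect(db_path)
    try:
        n_genomes = con.execute(
            "SELECT COUNT(*) FROM genomes WHERE run_id = ?", (run_id,)
        ).fetchone()[0]
        n_gens = con.execute(
            "SELECT COUNT(*) FROM generations WHERE run_id = ?", (run_id,)
        ).fetchone()[0]
    finally:
        con.close()
    assert n_genomes == expect_genomes, f"attesi {expect_genomes} genomi, trovati {n_genomes}"
    assert n_gens == expect_generations, f"attese {expect_generations} generazioni, trovate {n_gens}"
```

Da invocare dopo lo smoke run con `verify_smoke_db("./runs.db", run_id)`.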
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add scripts/smoke_run.py
+git commit -m "feat(scripts): smoke run with mock LLM and synthetic OHLCV"
+```
+
+---
+
+## Task 35: Validazione Streamlit dashboard via dataset reale dello smoke run
+
+**Files:**
+- (no new files)
+
+- [ ] **Step 1: Avviare dashboard sul DB della smoke run**
+
+Run: `DB_PATH=./runs.db uv run streamlit run src/multi_swarm/dashboard/streamlit_app.py`
+Expected: il browser apre `http://localhost:8501`. Le 3 pagine (Overview, GA Convergence, Genomes) mostrano dati senza errori.
+
+- [ ] **Step 2: Verifica visiva (lista da spuntare manualmente)**
+
+- [ ] Overview elenca la run "smoke" con status `completed` e cost > 0.
+- [ ] GA Convergence mostra un singolo punto sull'asse x (la smoke run ha solo la generazione 0).
+- [ ] Genomes mostra 3 genomi nella tabella.
+- [ ] Clic su un genome_id mostra system_prompt e raw_text.
+
+Se uno qualunque fallisce, fix prima di chiudere il task. Documenta eventuali bug in `docs/runs/`.
+
+- [ ] **Step 3: Stop dashboard, commit eventuali fix**
+
+```bash
+# Solo se sono stati fatti fix
+git add -A
+git commit -m "fix(dashboard): correggere <problema riscontrato>"
+```
+
+---
+
+## Task 36: Run completo Phase 1 con LLM reale (K=20, 10 generazioni, OHLCV 2 anni)
+
+**Files:**
+- Modify: nessuno (solo esecuzione)
+- Create: `docs/runs/2026-MM-DD-phase1-run-001.md`
+
+Questo è l'**evento operativo** della Phase 1: il primo run reale. Pre-requisiti:
+- Cerbero locale **non** strettamente necessario per Phase 1 (il compiler usa indicatori built-in). Avviare Cerbero solo se gli agenti vorranno chiamare tool MCP per ispezione, ma in Phase 1 il prompt non lo prevede esplicitamente.
+- API key OpenRouter configurata in `.env`.
+- Budget tracker attivato — monitorare la spesa durante il run.
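A supporto del monitoraggio budget e del gate 5 (cost predictability) serve una stima a priori della spesa contro cui misurare la deviazione. Uno sketch del conto, dove i token per chiamata e i prezzi per milione di token sono assunzioni illustrative da sostituire con le tariffe OpenRouter correnti (cfr. nota pricing del Task 18):

```python
# Stima a priori della spesa LLM del run (da confrontare col gate 5: +-30%).
# Qualunque valore numerico passato a queste funzioni è un'ipotesi, non una tariffa reale.

def estimate_llm_calls(population_size: int, n_generations: int, elite_k: int) -> int:
    # L'orchestrator salta i genomi con fitness già in cache (gli elite),
    # quindi solo i genomi nuovi generano una chiamata LLM.
    return population_size + (n_generations - 1) * (population_size - elite_k)


def estimate_run_cost_usd(
    n_calls: int,
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    usd_per_mtok_input: float,
    usd_per_mtok_output: float,
) -> float:
    per_call = (
        input_tokens_per_call * usd_per_mtok_input
        + output_tokens_per_call * usd_per_mtok_output
    ) / 1_000_000
    return n_calls * per_call


if __name__ == "__main__":
    calls = estimate_llm_calls(population_size=20, n_generations=10, elite_k=2)
    print(f"chiamate LLM stimate: {calls}")  # 20 + 9*18 = 182
```

Con K=20, 10 generazioni ed elite_k=2 le chiamate stimate sono 182, perché gli elite riusano la fitness in cache.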
+
+- [ ] **Step 1: Pre-flight check**
+
+```bash
+uv run pytest                   # tutta la suite verde
+uv run ruff check src/ tests/   # linter pulito
+uv run mypy src/multi_swarm     # type check pulito (ammessi ignore mirati documentati)
+```
+
+Expected: tutto verde.
+
+- [ ] **Step 2: Esegui run reale**
+
+```bash
+uv run python scripts/run_phase1.py \
+  --name phase1-run-001 \
+  --population-size 20 \
+  --n-generations 10 \
+  --elite-k 2 \
+  --tournament-k 3 \
+  --p-crossover 0.5 \
+  --seed 42 \
+  --symbol BTC/USDT \
+  --timeframe 1h \
+  --start 2024-01-01T00:00:00+00:00 \
+  --end 2026-01-01T00:00:00+00:00
+```
+
+Expected: durata stimata 30-90 minuti, spesa stimata $40-90 (single run, una su 5-10 totali fino a fine Phase 1).
+
+**Monitoring**: in un'altra shell, controllare il costo cumulato ogni 5 minuti via dashboard Overview, oppure:
+
+```bash
+sqlite3 runs.db "SELECT total_cost_usd FROM runs WHERE name='phase1-run-001'"
+```
+
+Stop manuale (`Ctrl+C`) se la spesa cumulata supera $120 — sintomo di token output runaway.
+
+- [ ] **Step 3: Apri dashboard e ispeziona**
+
+Run: `DB_PATH=./runs.db uv run streamlit run src/multi_swarm/dashboard/streamlit_app.py`
+
+Verifica che:
+- 10 generazioni siano presenti.
+- 20 genomi siano salvati in generazione 0 e 18 nuovi in ciascuna generazione successiva (gli elite riusano la fitness in cache e non vengono ri-salvati dall'orchestrator).
+- Almeno l'80% dei genomi valutati abbia `parse_error IS NULL` (gate 2).
+- Top-5 genomi abbiano DSR ragionevole (>0).
+
+- [ ] **Step 4: Documenta il run**
+
+Crea `docs/runs/2026-MM-DD-phase1-run-001.md` (sostituire MM-DD con la data effettiva) con:
+
+```markdown
+# Phase 1 — Run 001
+
+**Data**: <data run>
+**Config**: K=20, 10 gen, seed=42, symbol BTC/USDT 1h, dataset 2024-2026.
+**Costo finale**: $<X>
+**Durata wall-clock**: <hh:mm>
+
+## Risultati sintetici
+
+- Top fitness: <X>
+- Median fitness gen finale: <X>
+- Entropia gen finale: <X>
+- % parse success: <X>%
+- # genomi con DSR > 0.5: <N>
+
+## Anomalie
+
+- (es. parse error frequenti su prompt cognitive_style "engineer", da investigare)
+
+## Learning
+
+- ...
+
+## Action items
+
+- ...
+```
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add docs/runs/
+git commit -m "docs(runs): Phase 1 run-001 report"
+```
+
+---
+
+## Task 37: Decision memo Phase 1 (gate evaluation)
+
+**Files:**
+- Create: `docs/decisions/2026-MM-DD-gate-phase1.md`
+
+Compilare il decision memo gate Phase 1 sulla base dei risultati del run-001 (eventualmente più run se serve aggregare).
+
+- [ ] **Step 1: Author pass — scrivere il memo**
+
+```markdown
+# Gate Phase 1 — Decision Memo
+
+**Data**: <data>
+**Run analizzati**: phase1-run-001 [, phase1-run-002, ...]
+**Spesa totale Phase 1**: $<X> di $700 cap (<Y>%)
+**Tempo speso Phase 1**: <N> settimane
+
+## Hard gate evaluation
+
+| # | Gate | Soglia | Misura | Esito |
+|---|------|--------|--------|-------|
+| 1 | Loop converge (median ↑ ≥3 gen) | 3 gen consecutive crescita | <n gen> | PASS/FAIL |
+| 2 | Output formalizzabile | ≥80% parse success | <X>% | PASS/FAIL |
+| 3 | Tail superiore | top-5 DSR ≥ 1.5x median | <ratio> | PASS/FAIL |
+| 4 | Diversità non collassa | entropy > 0.5 a fine run | <entropy> | PASS/FAIL |
+| 5 | Cost predictability | spesa entro ±30% stima | <X>% deviazione | PASS/FAIL |
+
+## Conclusione (author)
+
+PASS / FAIL con razionale numerico ancorato alla tabella sopra.
+
+## Aggiustamenti raccomandati per Phase 2 (se PASS)
+
+- ...
+
+## Pivot/stop raccomandato (se FAIL)
+
+- ...
+```
+
+- [ ] **Step 2: Review pass — adversarial review del memo**
+
+Scegli una delle 3 opzioni dello spec sez. 9.2:
+- subagent Claude red-team con prompt esplicito
+- collega umano
+- timer 48h fresh-eyes pass
+
+Aggiungi al memo una sezione `## Review pass (red team)` con la critica e le contro-evidenze.
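Per ancorare le colonne «Misura» ed «Esito» della tabella dei gate a numeri riproducibili, uno sketch dei cinque check. Le interpretazioni delle soglie sono assunzioni da validare contro lo spec (sez. 4.4): «3 gen consecutive» è letta come tre incrementi consecutivi della mediana, e il confronto top-5 usa la media dei migliori 5 DSR contro la mediana di popolazione.

```python
# Sketch dei check per i 5 hard gate di Phase 1.
# Le soglie vengono dalla tabella del memo; le interpretazioni sono assunzioni.

def gate1_loop_converges(median_per_gen: list[float], window: int = 3) -> bool:
    # PASS se la fitness mediana cresce per `window` incrementi consecutivi.
    streak = 0
    for prev, curr in zip(median_per_gen, median_per_gen[1:]):
        streak = streak + 1 if curr > prev else 0
        if streak >= window:
            return True
    return False


def gate2_parse_success(n_parsed_ok: int, n_total: int, threshold: float = 0.80) -> bool:
    return n_total > 0 and n_parsed_ok / n_total >= threshold


def gate3_upper_tail(top5_dsr_mean: float, median_dsr: float, ratio: float = 1.5) -> bool:
    return top5_dsr_mean >= ratio * median_dsr


def gate4_diversity(final_entropy: float, threshold: float = 0.5) -> bool:
    return final_entropy > threshold


def gate5_cost_predictability(actual_usd: float, estimated_usd: float, tol: float = 0.30) -> bool:
    return estimated_usd > 0 and abs(actual_usd - estimated_usd) <= tol * estimated_usd
```

Con mediane che crescono per almeno tre incrementi consecutivi (es. `[0.10, 0.12, 0.15, 0.18]`) il gate 1 risulta PASS.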
+ +- [ ] **Step 3: Sintesi finale e decisione** + +Aggiungi `## Decisione finale` con uno di: +- GO Phase 2 (specificare scope, eventuali aggiustamenti) +- ITERATE Phase 1 (specificare cosa cambiare e re-run) +- PIVOT (specificare nuovo dominio o nuovo approach) +- STOP (specificare razionale e learnings) + +- [ ] **Step 4: Commit** + +```bash +git add docs/decisions/ +git commit -m "docs(decisions): Phase 1 gate decision memo with author + review pass" +``` + +--- + +## Task 38: Report tecnico Phase 1 + +**Files:** +- Create: `docs/reports/2026-MM-DD-phase1-technical-report.md` + +Report ~5 pagine come da spec Sez. 4.5. Contenuti: +1. Setup sperimentale (config, dataset, periodo, seed). +2. Loop convergence (grafico fitness mediana / max / p90 per generazione, screenshot dashboard). +3. Top-5 genomi: ispezione qualitativa (system_prompt, parametri, strategia generata, performance). +4. Parser failure modes: tassonomia degli errori di parse osservati, suggerimenti per Phase 2. +5. Costi reali vs preventivo: breakdown per tier, per agent, identificare ottimizzazioni. +6. Diversity metrics: entropia per generazione, distinct cognitive_style sopravvissuti. + +- [ ] **Step 1: Generare grafici dalla dashboard** + +Procedura: aprire la dashboard, fare screenshot delle pagine GA Convergence e Genomes, salvarli in `docs/reports/figures/phase1/`. + +- [ ] **Step 2: Scrivere il report** + +Fornire il file con la struttura sopra. Usare prosa italiana piena (regola CLAUDE.md per public artifacts). + +- [ ] **Step 3: Commit** + +```bash +git add docs/reports/ +git commit -m "docs(reports): Phase 1 technical report" +``` + +--- + +## Self-review + +Dopo aver completato la stesura, rilettura del plan a freddo per verificare: + +**1. Spec coverage** +- Scope IN Phase 1 (spec sez. 
4.1): + - Backtest engine event-driven 1h walk-forward 70/30 → Task 6 (engine), Task 4 (splits) ✓ + - Cerbero wrapper come tool layer → Task 9-10 ✓ + - Protocollo S-expr fisso 12-15 verbi → Task 11-13 ✓ + - Hypothesis Swarm K=20 tier C → Task 27 (initial) + Task 19 (agent) + Task 33 (run script) ✓ + - Falsification + Adversarial hand-crafted → Task 20-21 ✓ + - Fitness v0 (DSR + drawdown penalty) → Task 22 ✓ + - GA loop 8-12 generazioni, tournament + elitism → Task 23-24 + Task 33 (default 10 gen) ✓ +- Hard gates (spec sez. 4.4): + - 1 loop converge → Task 26 (summary helpers per misurare) + Task 37 (memo) ✓ + - 2 parser >80% → repository memorizza parse_error, Task 37 lo misura ✓ + - 3 tail superiore → query SQL su evaluations ✓ + - 4 entropy > 0.5 → Task 26 + Task 31 (dashboard mostra hline) ✓ + - 5 cost predictability → Task 18 (tracker) + Task 25 (DB) + Task 37 (memo) ✓ +- GUI Phase 1 (spec sez. 7.2): + - Overview ✓ Task 30 + - GA Convergence ✓ Task 31 + - Genomes basic ✓ Task 32 +- Deliverable Phase 1 (spec sez. 4.5): + - Codice testato ✓ tutti task con TDD + - Report tecnico ~5 pp ✓ Task 38 + - Decision memo ✓ Task 37 + +**2. Placeholder scan** +- Date YYYY-MM-DD lasciate da compilare nei task 36/37/38: questi sono naturalmente dipendenti dalla data di esecuzione, non sono placeholder di logica. Marcare come "compila al momento del run". +- Pricing LLM in Task 18 è approssimativo: aggiornare con valori reali se OpenRouter cambia tariffa (controllare a inizio run). +- Nessun TBD/TODO nel codice. + +**3. Type consistency** +- `HypothesisAgentGenome` interfaccia stabile in tutti i task (id, generation, parent_ids, model_tier). +- `Side` enum coerente: LONG/SHORT/FLAT in backtest, compiler, agents, dashboard. +- `Strategy`/`Rule`/`Node` AST consistenti fra parser → validator → compiler. +- `FalsificationReport` campi usati identici in fitness (Task 22) e repository (Task 25): `dsr`, `dsr_pvalue`, `sharpe`, `max_drawdown`, `total_return`, `n_trades`. 
✓ +- `AdversarialReport.findings` usato da fitness e repository: `name`, `severity`, `detail` consistenti. ✓ +- `CompletionResult` campi `text`, `input_tokens`, `output_tokens`, `tier`, `model`: identici fra LLMClient (Task 17), CostTracker (Task 18), HypothesisAgent (Task 19), Orchestrator (Task 29). ✓ + +**4. Granularità** +- Task piccoli e atomici (3-5 step), 38 task totali → ~150-200 step. Coerente con stima 4-6 settimane full-time. +- Test integration Task 29 e Task 35-36 richiedono setup più grande, ma sono passi singoli con sub-checklist esplicita. + +Nessuna correzione necessaria. Il plan è pronto. + +--- + +## Execution handoff + +Plan completo salvato in `docs/superpowers/plans/2026-05-09-phase1-lean-spike.md`. + +**Due opzioni di esecuzione:** + +1. **Subagent-Driven (raccomandata)** — un fresh subagent per task, review fra task, iterazione rapida. +2. **Inline Execution** — task eseguiti in questa stessa sessione con checkpoint per review. + +Quale approccio? +