fix(mcp-docugen): preprocessor HTML→Markdown per output Word leggibile
Il DOCX prodotto dalla versione precedente emetteva i div Tielogic
(`<div class="cover">`, `<div class="info-col">`, `<div class="acceptance">`,
`<div class="status-card">`) come testo grezzo: Pandoc non sa
interpretare il CSS-flavoured HTML del PDF e li copia letteralmente
nel documento Word. Anche le tabelle `<table class="financial">`
finivano spezzate cella per cella.
Il fix introduce un preprocessor dedicato che riscrive tutta la
HTML Tielogic-flavoured in Markdown nativo prima di passare il
documento a Pandoc.
- docx_preprocessor.py: nuovo modulo basato su BeautifulSoup. Strippa
frontmatter e <style>, poi rewrite di:
* <div class="cover"> → titoli H1/H2, paragrafi, tabella pipe
2-col FORNITORE/CLIENTE, validità in italic, \newpage finale
* <table class="financial"> → tabella pipe Markdown con riga
total-row in **bold**
* <div class="acceptance"> → heading H2 + intro + tabella pipe
con riga firma `_____________________` + luogo/data
* <div class="status-card"> → paragrafo "**name** — descrizione"
* <span class="badge ..."> → testo **bold**
* <div class="page-break"> → \newpage Pandoc-friendly
- docx_renderer.py: deferisce tutto il preprocessing al nuovo modulo
(più compatto, niente regex sparse).
- pyproject.toml + uv.lock: aggiunta dipendenza beautifulsoup4>=4.12.
- 8 nuovi test unit per il preprocessor (cover, tabelle, badge,
acceptance, idempotenza, niente div residui, badge standalone).
Adattati i test esistenti agli import dal nuovo modulo. 101 verde.
Smoke E2E via MCP: l'offerta TieMeasureFlow esce in DOCX leggibile
con tabelle Word native, heading colorati Tielogic e firme in tabella.
This commit is contained in:
@@ -75,6 +75,19 @@ wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/64/b4/17d4b0b2a2dc85a6df63d1157e028ed19f90d4cd97c36717afef2bc2f395/attrs-26.1.0-py3-none-any.whl", hash = "sha256:c647aa4a12dfbad9333ca4e71fe62ddc36f4e63b2d260a37a8b83d2f043ac309", size = 67548, upload-time = "2026-03-19T14:22:23.645Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "beautifulsoup4"
|
||||
version = "4.14.3"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "soupsieve" },
|
||||
{ name = "typing-extensions" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/c3/b0/1c6a16426d389813b48d95e26898aff79abbde42ad353958ad95cc8c9b21/beautifulsoup4-4.14.3.tar.gz", hash = "sha256:6292b1c5186d356bba669ef9f7f051757099565ad9ada5dd630bd9de5fa7fb86", size = 627737, upload-time = "2025-11-30T15:08:26.084Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/1a/39/47f9197bdd44df24d67ac8893641e16f386c984a0619ef2ee4c51fbbc019/beautifulsoup4-4.14.3-py3-none-any.whl", hash = "sha256:0918bfe44902e6ad8d57732ba310582e98da931428d231a5ecb9e7c703a735bb", size = 107721, upload-time = "2025-11-30T15:08:24.087Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "certifi"
|
||||
version = "2026.2.25"
|
||||
@@ -587,6 +600,7 @@ source = { editable = "services/mcp-docugen" }
|
||||
dependencies = [
|
||||
{ name = "aiofiles" },
|
||||
{ name = "aiosqlite" },
|
||||
{ name = "beautifulsoup4" },
|
||||
{ name = "fastapi" },
|
||||
{ name = "httpx" },
|
||||
{ name = "markdown-it-py", extra = ["plugins"] },
|
||||
@@ -612,6 +626,7 @@ dev = [
|
||||
requires-dist = [
|
||||
{ name = "aiofiles", specifier = ">=24.0" },
|
||||
{ name = "aiosqlite", specifier = ">=0.20" },
|
||||
{ name = "beautifulsoup4", specifier = ">=4.12" },
|
||||
{ name = "fastapi", specifier = ">=0.115" },
|
||||
{ name = "httpx", specifier = ">=0.27" },
|
||||
{ name = "markdown-it-py", extras = ["plugins"], specifier = ">=3.0" },
|
||||
@@ -1257,6 +1272,15 @@ wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/63/b6/aeadee5443e49baa2facd51131159fd6301cc4ccfc1541e4df7b021c37dd/ruff-0.15.11-py3-none-win_arm64.whl", hash = "sha256:063fed18cc1bbe0ee7393957284a6fe8b588c6a406a285af3ee3f46da2391ee4", size = 11032614, upload-time = "2026-04-16T18:46:34.487Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "soupsieve"
|
||||
version = "2.8.3"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/7b/ae/2d9c981590ed9999a0d91755b47fc74f74de286b0f5cee14c9269041e6c4/soupsieve-2.8.3.tar.gz", hash = "sha256:3267f1eeea4251fb42728b6dfb746edc9acaffc4a45b27e19450b676586e8349", size = 118627, upload-time = "2026-01-20T04:27:02.457Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/46/2c/1462b1d0a634697ae9e55b3cecdcb64788e8b7d63f54d923fcd0bb140aed/soupsieve-2.8.3-py3-none-any.whl", hash = "sha256:ed64f2ba4eebeab06cc4962affce381647455978ffc1e36bb79a545b91f45a95", size = 37016, upload-time = "2026-01-20T04:27:01.012Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "sse-starlette"
|
||||
version = "3.3.4"
|
||||
|
||||
Reference in New Issue
Block a user