c783fff040
Add generation of Word documents consistent with the Tielogic visual
identity, alongside the existing PDF renderer. The full pipeline is now
`bullet input → formatted Markdown → PDF and/or DOCX` in a single MCP
call.
- docx_renderer.py: Pandoc subprocess that reads the Markdown from stdin
and writes the binary .docx to stdout. Strips the YAML frontmatter and
the `<style>` blocks (present for the PDF, irrelevant in DOCX) before
conversion.
- mcp_tools.py: new tool `document_to_docx(markdown)` that returns
`{docx_b64, size_bytes}`; `document_generate` extended with
`output_format ∈ {md, pdf, docx, all}`. `build_mcp_server` now accepts
an optional `docx_reference_path` argument.
- config.py: `Settings.docx_reference_path` (default
/app/themes/tielogic-reference.docx).
- main.py: passes the new setting to `build_mcp_server`.
- mcp-docugen.Dockerfile: installs pandoc alongside the Chromium
libs.
- themes/tielogic-reference.docx: Word reference document (10 KB) with
Tielogic styles: blue/dark heading colors, Inter font, sizes aligned
with the web CSS. Generated by `scripts/build-reference-docx.py`, which
starts from Pandoc's default reference.docx and rewrites
`word/styles.xml` using regexes over the `<w:style>` blocks. Pandoc
applies it automatically (via `--reference-doc`) to every DOCX the
service produces.
- 9 new unit tests for docx_renderer (frontmatter/style stripping,
combined preprocessing, empty-input error, smoke test skipped in
environments without Pandoc): 92 tests in total.
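The reference-doc build step (rewriting `word/styles.xml` with regexes over `<w:style>` blocks) can be sketched roughly as follows. `scripts/build-reference-docx.py` is not part of this diff, so `set_style_color` and `rewrite_styles_xml` are hypothetical helpers, and the color values are placeholders, not the actual Tielogic palette. The sketch only replaces an existing `<w:color>` element, which keeps the OOXML child-element order inside `<w:rPr>` intact.

```python
from __future__ import annotations

import io
import re
import zipfile
from collections.abc import Callable

# Hypothetical helpers: the real scripts/build-reference-docx.py is not shown.


def _style_block_re(style_id: str) -> re.Pattern[str]:
    # One whole <w:style ...>...</w:style> block whose opening tag carries
    # w:styleId="<style_id>". The closing quote stops "Heading1" from
    # matching "Heading10".
    return re.compile(
        r'<w:style\b[^>]*w:styleId="' + re.escape(style_id) + r'"[^>]*>.*?</w:style>',
        re.DOTALL,
    )


def set_style_color(styles_xml: str, style_id: str, hex_color: str) -> str:
    """Replace the <w:color> of one style (e.g. "Heading1") with hex_color."""
    match = _style_block_re(style_id).search(styles_xml)
    if match is None:
        raise LookupError(f"style {style_id!r} not found")
    block = match.group(0)
    # Swap the existing self-closing color element (dropping any themeColor).
    new_block, count = re.subn(
        r"<w:color\b[^>]*/>", f'<w:color w:val="{hex_color}"/>', block, count=1
    )
    if count == 0:
        raise LookupError(f"style {style_id!r} has no <w:color> to replace")
    return styles_xml[: match.start()] + new_block + styles_xml[match.end() :]


def rewrite_styles_xml(docx_bytes: bytes, transform: Callable[[str], str]) -> bytes:
    """Return a copy of a .docx (a ZIP archive) with word/styles.xml transformed."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, zipfile.ZipFile(
        out, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        # Iterating infolist() preserves the original entry order.
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/styles.xml":
                data = transform(data.decode("utf-8")).encode("utf-8")
            dst.writestr(item.filename, data)
    return out.getvalue()
```

Regexes are workable here only because `styles.xml` is machine-generated and predictable. An XML parser would be more robust against arbitrary input.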
E2E smoke test via MCP: a single `document_generate` call with
`output_format=all` produces mutually consistent MD (14 KB), PDF
(137 KB, 4 A4 pages) and DOCX (12.7 KB) outputs.
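mcp_tools.py is not shown in this diff. As a sketch, assume the tool base64-encodes the DOCX bytes and `output_format=all` expands to all three formats. The `{docx_b64, size_bytes}` payload and the `output_format` option could then look like the code below. `_FORMATS`, `resolve_formats`, `docx_payload` and `decode_docx_payload` are hypothetical names. The client-side check relies only on the fact that a DOCX file is a ZIP archive containing `word/document.xml`.

```python
from __future__ import annotations

import base64
import io
import zipfile

# Hypothetical mapping from output_format to the artifacts to produce.
_FORMATS: dict[str, tuple[str, ...]] = {
    "md": ("md",),
    "pdf": ("pdf",),
    "docx": ("docx",),
    "all": ("md", "pdf", "docx"),
}


def resolve_formats(output_format: str) -> tuple[str, ...]:
    """Expand an output_format value into the list of formats to render."""
    try:
        return _FORMATS[output_format]
    except KeyError:
        raise ValueError(f"unsupported output_format: {output_format!r}") from None


def docx_payload(docx_bytes: bytes) -> dict[str, str | int]:
    """Build a {docx_b64, size_bytes} payload from raw DOCX bytes."""
    return {
        "docx_b64": base64.b64encode(docx_bytes).decode("ascii"),
        "size_bytes": len(docx_bytes),
    }


def decode_docx_payload(payload: dict[str, str | int]) -> bytes:
    """Client side: decode the payload and sanity-check the DOCX container."""
    data = base64.b64decode(payload["docx_b64"])
    if len(data) != payload["size_bytes"]:
        raise ValueError("size_bytes does not match the decoded length")
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        if "word/document.xml" not in zf.namelist():
            raise ValueError("not a Word document: word/document.xml missing")
    return data
```

Sending `size_bytes` alongside the base64 string lets a client detect a truncated transfer without unpacking the archive.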
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
90 lines
2.7 KiB
Python
from __future__ import annotations

import asyncio
import logging
import re
from dataclasses import dataclass
from pathlib import Path

logger = logging.getLogger(__name__)

# Matches a whole <style ...>...</style> block, across lines, case-insensitively.
_STYLE_BLOCK_RE = re.compile(r"<style\b[^>]*>.*?</style>", re.DOTALL | re.IGNORECASE)
_FRONTMATTER_DELIM = "---"


class DocxRenderError(Exception):
    """Raised when the Markdown input is unusable or Pandoc fails."""


@dataclass(frozen=True)
class DocxRenderResult:
    """Binary DOCX produced by Pandoc, plus its size in bytes."""

    docx_bytes: bytes
    size_bytes: int


def _strip_style_blocks(markdown_text: str) -> str:
    """Remove `<style>...</style>` blocks: they're meaningless in DOCX and
    Pandoc would otherwise embed them as raw text."""
    return _STYLE_BLOCK_RE.sub("", markdown_text)


def _strip_frontmatter(markdown_text: str) -> str:
    """Remove the YAML frontmatter so it doesn't appear as a body table in
    the DOCX. Frontmatter values were meant for the PDF renderer."""
    if not markdown_text.startswith(_FRONTMATTER_DELIM):
        return markdown_text
    # The frontmatter ends at the next line that is exactly "---".
    end_marker = f"\n{_FRONTMATTER_DELIM}\n"
    idx = markdown_text.find(end_marker, len(_FRONTMATTER_DELIM))
    if idx == -1:
        # No closing delimiter: leave the text untouched.
        return markdown_text
    return markdown_text[idx + len(end_marker) :].lstrip()


def _preprocess(markdown_text: str) -> str:
    """Apply the DOCX-specific cleanup: frontmatter first, then <style> blocks."""
    return _strip_style_blocks(_strip_frontmatter(markdown_text))


async def render_markdown_to_docx(
    markdown_text: str, reference_doc: Path | None = None
) -> DocxRenderResult:
    """Convert Markdown to a DOCX file via Pandoc subprocess.

    Pandoc reads from stdin and writes the binary DOCX on stdout, so no
    intermediate temp file is needed. The optional `reference_doc` is a
    `.docx` whose styles (heading colors, fonts, header/footer, page size)
    Pandoc will inherit; this is how Tielogic branding is applied to the
    Word output. If the file does not exist, Pandoc's default styles are used.
    """
    if not markdown_text.strip():
        raise DocxRenderError("empty markdown input")

    cleaned = _preprocess(markdown_text)
    if not cleaned.strip():
        raise DocxRenderError("nothing to render after stripping frontmatter/style")

    args = [
        "pandoc",
        "-f",
        "markdown+raw_html-implicit_figures",
        "-t",
        "docx",
        "-o",
        "-",  # "-o -" makes Pandoc write the binary DOCX to stdout
    ]
    if reference_doc is not None:
        if reference_doc.is_file():
            # Slice assignment at index 5 inserts the option before "-o -".
            args[5:5] = ["--reference-doc", str(reference_doc)]
        else:
            logger.warning(
                "DOCX reference doc %s not found; using Pandoc defaults", reference_doc
            )

    try:
        proc = await asyncio.create_subprocess_exec(
            *args,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
    except FileNotFoundError as exc:
        raise DocxRenderError("pandoc executable not found on PATH") from exc
    stdout, stderr = await proc.communicate(cleaned.encode("utf-8"))
    if proc.returncode != 0:
        raise DocxRenderError(
            f"pandoc exit {proc.returncode}: {stderr.decode('utf-8', errors='replace')}"
        )

    return DocxRenderResult(docx_bytes=stdout, size_bytes=len(stdout))
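The "smoke test skipped in environments without Pandoc" mentioned in the commit message can be approximated with a blocking variant of the same Pandoc invocation. The sketch below is hypothetical (`pandoc_docx_smoke` is not the project's actual test). It reuses the exact flags from `render_markdown_to_docx`, returns `None` when `pandoc` is not on `PATH`, and otherwise returns the DOCX bytes, which, being a ZIP archive, start with `PK`.

```python
from __future__ import annotations

import shutil
import subprocess


def pandoc_docx_smoke(markdown_text: str) -> bytes | None:
    """Convert Markdown to DOCX with a blocking Pandoc call.

    Returns None when pandoc is not installed, mirroring a smoke test
    that is skipped in environments without Pandoc.
    """
    if shutil.which("pandoc") is None:
        return None
    proc = subprocess.run(
        ["pandoc", "-f", "markdown+raw_html-implicit_figures", "-t", "docx", "-o", "-"],
        input=markdown_text.encode("utf-8"),
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    return proc.stdout
```

Assuming the module above is importable as `docx_renderer`, the async equivalent from synchronous code would be `asyncio.run(render_markdown_to_docx(text, Path("/app/themes/tielogic-reference.docx")))`.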