perf: Numba JIT kernel for score_by_shift (2.1x speedup)

- New module pm2d/_jit_kernels.py with _jit_score_by_shift, compiled with
  Numba njit (parallel + fastmath + boundscheck=False)
- Parallelized over output rows (no race condition on acc)
- Automatic fallback to NumPy if numba is not installed
- Automatic warmup at module import (avoids JIT lag on the first match)

Benchmark clip.png (13 instances):
  before (numpy + threads): 1.55s
  after (numba + threads):  0.72s
  speedup: 2.1x
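
For reference, a minimal way to reproduce this kind of comparison. How the timings above were actually measured is not stated here, so `best_of` and `speedup` are hypothetical helpers; only the two timings are taken from the benchmark.

```python
import time


def best_of(fn, repeats=5):
    """Return the fastest wall-clock time of fn() over `repeats` runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best


def speedup(before_s, after_s):
    """Ratio of the old runtime to the new one."""
    return before_s / after_s


# Timings reported in this commit message.
print(f"{speedup(1.55, 0.72):.2f}x")  # prints 2.15x
```

Taking the best of several runs keeps one-off costs such as JIT compilation or cold caches out of the comparison.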

Full pipeline total (refine+subpix): 0.80s

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 01:30:31 +02:00
parent 51ed53cedd
commit b20b11c029
5 changed files with 169 additions and 21 deletions
@@ -3,6 +3,7 @@ name = "shape-model-2d"
 version = "0.1.0"
 requires-python = ">=3.13"
 dependencies = [
+    "numba>=0.65.0",
     "numpy>=1.24",
     "opencv-python>=4.8",
 ]