Compare commits

...

23 Commits

Author SHA1 Message Date
Adriano 84b73dc651 feat: use_polarity 16-bin orientation (mod 2pi)
Flag opt-in use_polarity=True su LineShapeMatcher: distingue edge
chiaro->scuro da scuro->chiaro raddoppiando i bin (8 mod pi a 16
mod 2pi). Riduce match accidentali quando il template e direzionale
ma scena ha bordo opposto (es. pezzo nero su bg chiaro vs pezzo
chiaro su bg nero).

Implementazione:
- _gradient calcola atan2 mod 2pi quando use_polarity
- _spread_bitmap usa uint16 (16 bit) invece di uint8 (8 bit)
- Nuovi kernel JIT _jit_score_bitmap_rescored_u16 e
  _jit_popcount_density_u16
- Wrapper Python score_bitmap_rescored / popcount_density fanno
  dispatch su dtype dello spread

Default off (use_polarity=False) = backward compat completo, 8 bin.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:07:38 +02:00
Adriano 41976f574d fix: duplicati, score saturato e angolo impreciso
3 problemi visibili da test interattivo:
1. Match duplicati: stesso oggetto trovato da varianti angolari
   diverse, NMS pre-refine non basta perche refine sposta i match.
   Aggiunto NMS post-refine cross-variant.

2. Score sempre alto/saturato a 1.0: NCC era opzionale (skip>=0.85)
   e non veniva mescolato nello score. Ora ncc_skip_above=1.01
   (NCC sempre) e score finale = (shape + NCC) / 2: piu discriminante.

3. Angolo impreciso: _refine_angle aveva early-exit per shape-score
   >= 0.99, ma quel valore satura facile (con pyramid_propagate o
   spread ampio) senza garantire angolo preciso. Rimosso early-exit:
   refine angolare e' sempre essenziale per orientamento sub-step.

Inoltre: pyramid_propagate default False (era True), riduce duplicati
da picchi propagati su angle-vicini. propagate_topk default 4 (era 8).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 16:33:58 +02:00
Adriano 4ef7a4a85f merge: dedup varianti 2026-05-04 15:46:34 +02:00
Adriano 7de7f35b7c merge: SIMD popcount fallback 2026-05-04 15:46:21 +02:00
Adriano 7b014b7f69 merge: batch_top variant-parallel kernel 2026-05-04 15:46:17 +02:00
Adriano 367ee9aaac merge: greediness (kernel greedy alternativo a rescore strided) 2026-05-04 15:45:15 +02:00
Adriano 74e5a45a39 merge: refine cache 2026-05-04 15:43:23 +02:00
Adriano 11c5160385 merge: refine_pose_joint (param list unito) 2026-05-04 15:43:19 +02:00
Adriano 07bab87cb9 merge: lazy NCC 2026-05-04 15:42:53 +02:00
Adriano a247484f36 merge: auto angle_step 2026-05-04 15:42:45 +02:00
Adriano e188df0adb merge: pyramid_propagate (con coarse_stride preservato) 2026-05-04 15:42:41 +02:00
Adriano b35d47669c merge: coarse_stride 2026-05-04 15:41:57 +02:00
Adriano fc3b0dbc3a merge: search_roi 2026-05-04 15:41:54 +02:00
Adriano 6da4dd5329 feat: dedup varianti con feature-set identico post-quantizzazione
Hash byte-exact su (dx, dy, bin) ordinati + scale. Se due varianti
post-rasterizzazione hanno lo stesso feature-set, ne tiene solo una.

Tipico caso d'uso: template con simmetrie discrete (quadrati, croci,
forme regolari) generano duplicati esatti per rotazioni multiple
del periodo. Su quadrato 80x80 con angle_step=10 deg: 36 -> 27 varianti
(~25% in meno di lavoro top-pruning).

Approccio conservativo (byte-exact): zero rischio di rimuovere varianti
distinte. Forme arrotondate (cerchi) o template asimmetrici non beneficiano
ma non vengono compromessi.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:37:42 +02:00
Adriano b143c6607a feat: numpy.bitwise_count come fallback SIMD per popcount
NumPy 2.0+ espone np.bitwise_count: implementato in C nativo con
intrinsics SIMD (POPCNT/AVX2 vpopcnt). Aggiunto come fallback secondo
livello quando Numba non e disponibile (es. wheel constraint, env
ristretto). Numba JIT parallel resta default: misura su 1080p 0.5ms
vs 1.6ms (bitwise_count e single-thread).

AVX2 puro su _jit_score_bitmap_rescored richiederebbe C extension
con build nativa: out-of-scope per questo branch (Numba LLVM gia
autovettorizza il loop interno).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:36:48 +02:00
Adriano 6704d66cd5 feat: kernel JIT batch top-max-per-variant (opt-in)
Nuovo kernel _jit_top_max_per_variant: prange esterno sulle varianti
invece di n_vars chiamate JIT separate via ThreadPoolExecutor.
Wrapper Python top_max_per_variant prepara buffer flat (offsets +
dx_flat/dy_flat/bins_flat) e bg per scala.

Default batch_top=False perche su benchmark realistici (Linux 13 core,
72-180 varianti) ThreadPoolExecutor + kernel singolo che rilascia GIL
e gia ottimale. Path batch_top=True utile come opzione per scenari
con n_vars >>> n_threads o overhead chiamate JIT dominante.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:35:51 +02:00
Adriano f00cf9b621 feat: cache features template per _refine_angle
Cache LRU (chiave: angolo arrotondato a 0.05deg, scale) di
(fx, fy, fb) per evitare warpAffine + gradient + extract ripetuti
durante golden-search refine. Bucket condiviso tra match della stessa
find() e tra find() consecutive sulla stessa ricetta.

Cache invalidata in train(): il template puo essere cambiato.
Limite 256 entry (sufficiente per 32 candidati x 8 valutazioni).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:31:37 +02:00
Adriano 4b7271094b feat: refine_pose_joint - Nelder-Mead 3D su (cx, cy, angle)
Alternativa al refine angolare 1D + subpixel quadratico: ottimizza
simultaneamente posizione e angolo con Nelder-Mead 3D inline (no
scipy). Default off (refine_pose_joint=False) per backward compat.

Vantaggio Halcon-style: un singolo iter LM/simplex stila il match a
precisione sub-pixel + sub-step in modo congiunto invece di alternare
assi. Convergenza tipica ~24 valutazioni vs ~15 (golden+quadratico)
ma piu robusto su template asimmetrici dove pose e angolo sono
fortemente correlati.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:30:20 +02:00
Adriano 746d1668c6 feat: NCC verify lazy con skip per shape-score alto
ncc_skip_above (default 0.85): se lo score shape e gia molto alto,
salta la verifica NCC (costosa: warp + corr per ogni match). I match
borderline 0.6-0.85 vengono comunque verificati.

Comportamento Halcon-style: NCC come tie-breaker per casi ambigui,
non come gate generalizzato. Su scene con molti match netti riduce
sensibilmente il costo della fase post-NMS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:28:24 +02:00
Adriano d9a40952c4 feat: angle_step auto adattivo a dimensione template
Halcon-style: angle_step_deg=0 attiva derivazione automatica
step = atan(2/max_side) deg, clampato [0.5, 10]. Template grande
ottiene step fine, piccolo step grosso. auto_tune emette il valore
calcolato direttamente.

_refine_angle ora usa _effective_angle_step() per coerenza con
training quando la modalita auto e attiva.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:27:35 +02:00
Adriano 6db2086ead feat: pyramid_propagate - candidati top-level guidano full-res
Top-level ritorna top-K picchi locali invece di solo max. Fase full-res
valuta solo crop locali attorno ai picchi propagati (margine =
sf_top + spread + nms_radius/2) invece di scansionare intera scena.

Su scene 1920x1080 con pochi candidati: ~20-30% piu veloce mantenendo
identici match. Vantaggio cresce con scene piu grandi e meno candidati.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:26:29 +02:00
Adriano 27a0ef1a45 feat: coarse_stride per sub-sampling top-level
Nuovo kernel JIT _jit_score_bitmap_rescored_strided: valuta solo
pixel su griglia stride x stride al top della piramide. NMS + fase
full-res recuperano precisione. Speed-up ~stride^2 sulla fase coarse,
specie su scene grandi (1920x1080).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:24:44 +02:00
Adriano ba4024d252 feat: search_roi parametro find() per limitare area di ricerca
Equivalente a Halcon set_aoi: matching opera su crop locale, coord
output ri-traslate al sistema scena. Costo proporzionale a w*h del
ROI invece di W*H scena intera.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:22:43 +02:00
3 changed files with 667 additions and 63 deletions
+296 -13
View File
@@ -110,6 +110,63 @@ if HAS_NUMBA:
acc[y, x] *= inv acc[y, x] *= inv
return acc return acc
@nb.njit(cache=True, parallel=True, fastmath=True, boundscheck=False)
def _jit_score_bitmap_rescored_strided(
spread: np.ndarray,
dx: np.ndarray, dy: np.ndarray, bins: np.ndarray,
bit_active: np.uint8,
bg: np.ndarray,
stride: nb.int32,
) -> np.ndarray:
"""Variante con sub-sampling: valuta solo pixel su griglia stride×stride.
Score restituito ha stessa shape (H, W); celle non valutate = 0.
4× speed-up con stride=2 (NMS recupera precisione in full-res).
Numba prange richiede step costante: itero su indici griglia e
moltiplico per stride dentro il body.
"""
H, W = spread.shape
N = dx.shape[0]
acc = np.zeros((H, W), dtype=np.float32)
ny = (H + stride - 1) // stride
nx = (W + stride - 1) // stride
for yi in nb.prange(ny):
y = yi * stride
for i in range(N):
b = bins[i]
mask = np.uint8(1) << b
if (bit_active & mask) == 0:
continue
ddy = dy[i]
yy = y + ddy
if yy < 0 or yy >= H:
continue
ddx = dx[i]
x_lo = 0 if ddx >= 0 else -ddx
x_hi = W if ddx <= 0 else W - ddx
rem = x_lo % stride
if rem != 0:
x_lo += stride - rem
x = x_lo
while x < x_hi:
if spread[yy, x + ddx] & mask:
acc[y, x] += 1.0
x += stride
if N > 0:
inv = 1.0 / N
for yi in nb.prange(ny):
y = yi * stride
for xi in range(nx):
x = xi * stride
v = acc[y, x] * inv
bgv = bg[y, x]
if bgv < 1.0:
r = (v - bgv) / (1.0 - bgv + 1e-6)
acc[y, x] = r if r > 0.0 else 0.0
else:
acc[y, x] = 0.0
return acc
@nb.njit(cache=True, parallel=True, fastmath=True, boundscheck=False) @nb.njit(cache=True, parallel=True, fastmath=True, boundscheck=False)
def _jit_score_bitmap_greedy( def _jit_score_bitmap_greedy(
spread: np.ndarray, spread: np.ndarray,
@@ -142,7 +199,6 @@ if HAS_NUMBA:
b = bins[i] b = bins[i]
mask = np.uint8(1) << b mask = np.uint8(1) << b
if (bit_active & mask) == 0: if (bit_active & mask) == 0:
# Nessun chance per questa feature
if hits + (N - i - 1) < min_req: if hits + (N - i - 1) < min_req:
break break
continue continue
@@ -215,6 +271,122 @@ if HAS_NUMBA:
acc[y, x] = 0.0 acc[y, x] = 0.0
return acc return acc
@nb.njit(cache=True, parallel=True, fastmath=True, boundscheck=False)
def _jit_top_max_per_variant(
spread: np.ndarray, # uint8 (H, W)
dx_flat: np.ndarray, # int32 (sum_N,)
dy_flat: np.ndarray, # int32 (sum_N,)
bins_flat: np.ndarray, # int8 (sum_N,)
offsets: np.ndarray, # int32 (n_vars+1,) prefix sum
bit_active: np.uint8,
bg_per_variant: np.ndarray, # float32 (n_vars, H, W) - 1 per scala
scale_idx: np.ndarray, # int32 (n_vars,) idx in bg_per_variant
) -> np.ndarray:
"""Batch: per ogni variante calcola max score (rescored bg), ritorna
array float32 (n_vars,). Parallelismo prange ESTERNO sulle varianti
elimina overhead di n_vars chiamate JIT separate (avg ~20us per
chiamata su template piccoli) + pool thread Python.
Pensato per fase TOP del pruning quando n_vars >> n_threads.
"""
n_vars = offsets.shape[0] - 1
H, W = spread.shape
out = np.zeros(n_vars, dtype=np.float32)
for vi in nb.prange(n_vars):
i0 = offsets[vi]; i1 = offsets[vi + 1]
N = i1 - i0
if N == 0:
out[vi] = -1.0
continue
si = scale_idx[vi]
inv = nb.float32(1.0 / N)
best = nb.float32(-1.0)
for y in range(H):
for x in range(W):
s = nb.float32(0.0)
for k in range(N):
b = bins_flat[i0 + k]
mask = np.uint8(1) << b
if (bit_active & mask) == 0:
continue
ddy = dy_flat[i0 + k]
yy = y + ddy
if yy < 0 or yy >= H:
continue
ddx = dx_flat[i0 + k]
xx = x + ddx
if xx < 0 or xx >= W:
continue
if spread[yy, xx] & mask:
s += nb.float32(1.0)
s *= inv
bgv = bg_per_variant[si, y, x]
if bgv < 1.0:
r = (s - bgv) / (1.0 - bgv + 1e-6)
if r > best:
best = r
out[vi] = best if best > 0.0 else 0.0
return out
@nb.njit(cache=True, parallel=True, fastmath=True, boundscheck=False)
def _jit_score_bitmap_rescored_u16(
spread: np.ndarray, # uint16 (H, W) - 16 bit di polarity-aware
dx: np.ndarray, dy: np.ndarray, bins: np.ndarray,
bit_active: np.uint16,
bg: np.ndarray,
) -> np.ndarray:
"""Versione uint16 di _jit_score_bitmap_rescored per polarity 16-bin.
Identica logica ma mask = uint16(1) << b dove b in [0..15]
(orientamento mod 2π invece di mod π).
"""
H, W = spread.shape
N = dx.shape[0]
acc = np.zeros((H, W), dtype=np.float32)
for y in nb.prange(H):
for i in range(N):
b = bins[i]
mask = np.uint16(1) << b
if (bit_active & mask) == 0:
continue
ddy = dy[i]
yy = y + ddy
if yy < 0 or yy >= H:
continue
ddx = dx[i]
x_lo = 0 if ddx >= 0 else -ddx
x_hi = W if ddx <= 0 else W - ddx
for x in range(x_lo, x_hi):
if spread[yy, x + ddx] & mask:
acc[y, x] += 1.0
if N > 0:
inv = 1.0 / N
for y in nb.prange(H):
for x in range(W):
v = acc[y, x] * inv
bgv = bg[y, x]
if bgv < 1.0:
r = (v - bgv) / (1.0 - bgv + 1e-6)
acc[y, x] = r if r > 0.0 else 0.0
else:
acc[y, x] = 0.0
return acc
@nb.njit(cache=True, parallel=True, fastmath=True, boundscheck=False)
def _jit_popcount_density_u16(spread: np.ndarray) -> np.ndarray:
"""Popcount per uint16 (16 bin polarity)."""
H, W = spread.shape
out = np.zeros((H, W), dtype=np.float32)
for y in nb.prange(H):
for x in range(W):
v = spread[y, x]
cnt = 0
for b in range(16):
if v & (np.uint16(1) << b):
cnt += 1
out[y, x] = float(cnt)
return out
@nb.njit(cache=True, parallel=True, fastmath=True, boundscheck=False) @nb.njit(cache=True, parallel=True, fastmath=True, boundscheck=False)
def _jit_popcount_density(spread: np.ndarray) -> np.ndarray: def _jit_popcount_density(spread: np.ndarray) -> np.ndarray:
"""Conta bit set per pixel: ritorna (H, W) float32 in [0..8].""" """Conta bit set per pixel: ritorna (H, W) float32 in [0..8]."""
@@ -241,11 +413,25 @@ if HAS_NUMBA:
_jit_score_bitmap(spread, dx, dy, b, np.uint8(0xFF)) _jit_score_bitmap(spread, dx, dy, b, np.uint8(0xFF))
bg = np.zeros((32, 32), dtype=np.float32) bg = np.zeros((32, 32), dtype=np.float32)
_jit_score_bitmap_rescored(spread, dx, dy, b, np.uint8(0xFF), bg) _jit_score_bitmap_rescored(spread, dx, dy, b, np.uint8(0xFF), bg)
_jit_score_bitmap_rescored_strided(
spread, dx, dy, b, np.uint8(0xFF), bg, np.int32(2),
)
_jit_score_bitmap_greedy( _jit_score_bitmap_greedy(
spread, dx, dy, b, np.uint8(0xFF), spread, dx, dy, b, np.uint8(0xFF),
np.float32(0.5), np.float32(0.8), np.float32(0.5), np.float32(0.8),
) )
offsets = np.array([0, 1], dtype=np.int32)
scale_idx = np.zeros(1, dtype=np.int32)
bg_pv = np.zeros((1, 32, 32), dtype=np.float32)
_jit_top_max_per_variant(
spread, dx, dy, b, offsets, np.uint8(0xFF), bg_pv, scale_idx,
)
_jit_popcount_density(spread) _jit_popcount_density(spread)
spread16 = np.zeros((32, 32), dtype=np.uint16)
_jit_score_bitmap_rescored_u16(
spread16, dx, dy, b, np.uint16(0xFFFF), bg,
)
_jit_popcount_density_u16(spread16)
else: # pragma: no cover else: # pragma: no cover
@@ -258,9 +444,24 @@ else: # pragma: no cover
def _jit_score_bitmap_rescored(spread, dx, dy, bins, bit_active, bg): def _jit_score_bitmap_rescored(spread, dx, dy, bins, bit_active, bg):
raise RuntimeError("numba non disponibile") raise RuntimeError("numba non disponibile")
def _jit_score_bitmap_rescored_strided(spread, dx, dy, bins, bit_active, bg, stride):
raise RuntimeError("numba non disponibile")
def _jit_score_bitmap_greedy(spread, dx, dy, bins, bit_active, min_score, greediness): def _jit_score_bitmap_greedy(spread, dx, dy, bins, bit_active, min_score, greediness):
raise RuntimeError("numba non disponibile") raise RuntimeError("numba non disponibile")
def _jit_top_max_per_variant(
spread, dx_flat, dy_flat, bins_flat, offsets, bit_active,
bg_per_variant, scale_idx,
):
raise RuntimeError("numba non disponibile")
def _jit_score_bitmap_rescored_u16(spread, dx, dy, bins, bit_active, bg):
raise RuntimeError("numba non disponibile")
def _jit_popcount_density_u16(spread):
raise RuntimeError("numba non disponibile")
def _jit_popcount_density(spread): def _jit_popcount_density(spread):
raise RuntimeError("numba non disponibile") raise RuntimeError("numba non disponibile")
@@ -291,19 +492,33 @@ def score_bitmap(
def score_bitmap_rescored( def score_bitmap_rescored(
spread: np.ndarray, dx: np.ndarray, dy: np.ndarray, bins: np.ndarray, spread: np.ndarray, dx: np.ndarray, dy: np.ndarray, bins: np.ndarray,
bit_active: int, bg: np.ndarray, bit_active: int, bg: np.ndarray, stride: int = 1,
) -> np.ndarray: ) -> np.ndarray:
"""Score bitmap + rescore fusi in un solo pass (JIT).""" """Score bitmap + rescore fusi in un solo pass (JIT).
Dispatch per dtype: uint16 → kernel polarity 16-bin, uint8 → kernel
standard 8-bin (con eventuale stride > 1 per coarse top-level).
"""
if HAS_NUMBA and len(dx) > 0: if HAS_NUMBA and len(dx) > 0:
return _jit_score_bitmap_rescored( dx_c = np.ascontiguousarray(dx, dtype=np.int32)
np.ascontiguousarray(spread, dtype=np.uint8), dy_c = np.ascontiguousarray(dy, dtype=np.int32)
np.ascontiguousarray(dx, dtype=np.int32), bins_c = np.ascontiguousarray(bins, dtype=np.int8)
np.ascontiguousarray(dy, dtype=np.int32), bg_c = np.ascontiguousarray(bg, dtype=np.float32)
np.ascontiguousarray(bins, dtype=np.int8), if spread.dtype == np.uint16:
np.uint8(bit_active), spread_c = np.ascontiguousarray(spread, dtype=np.uint16)
np.ascontiguousarray(bg, dtype=np.float32), return _jit_score_bitmap_rescored_u16(
spread_c, dx_c, dy_c, bins_c, np.uint16(bit_active), bg_c,
) )
# Fallback: chiamate separate spread_c = np.ascontiguousarray(spread, dtype=np.uint8)
if stride > 1:
return _jit_score_bitmap_rescored_strided(
spread_c, dx_c, dy_c, bins_c, np.uint8(bit_active), bg_c,
np.int32(stride),
)
return _jit_score_bitmap_rescored(
spread_c, dx_c, dy_c, bins_c, np.uint8(bit_active), bg_c,
)
# Fallback: chiamate separate (stride ignorato in fallback)
score = score_bitmap(spread, dx, dy, bins, bit_active) score = score_bitmap(spread, dx, dy, bins, bit_active)
out = (score - bg) / (1.0 - bg + 1e-6) out = (score - bg) / (1.0 - bg + 1e-6)
return np.maximum(0.0, out).astype(np.float32) return np.maximum(0.0, out).astype(np.float32)
@@ -331,10 +546,78 @@ def score_bitmap_greedy(
return score_bitmap(spread, dx, dy, bins, bit_active) return score_bitmap(spread, dx, dy, bins, bit_active)
def top_max_per_variant(
spread: np.ndarray,
dx_list: list, dy_list: list, bin_list: list,
bg_per_scale: dict,
variant_scales: list,
bit_active: int,
) -> np.ndarray:
"""Wrapper: prepara buffer flat e chiama kernel batch su tutte le varianti.
Parallelismo Numba prange-esterno sulle varianti (n_vars >> n_threads
tipicamente per top-pruning) → meglio del thread-pool Python che paga
overhead di n_vars chiamate JIT separate.
"""
if not HAS_NUMBA or len(dx_list) == 0:
return np.array([], dtype=np.float32)
n_vars = len(dx_list)
sizes = [len(d) for d in dx_list]
offsets = np.zeros(n_vars + 1, dtype=np.int32)
offsets[1:] = np.cumsum(sizes)
total = int(offsets[-1])
dx_flat = np.empty(total, dtype=np.int32)
dy_flat = np.empty(total, dtype=np.int32)
bins_flat = np.empty(total, dtype=np.int8)
for vi, (dx, dy, bn) in enumerate(zip(dx_list, dy_list, bin_list)):
i0 = int(offsets[vi]); i1 = int(offsets[vi + 1])
dx_flat[i0:i1] = dx
dy_flat[i0:i1] = dy
bins_flat[i0:i1] = bn
# bg per variante: indicizzato per scala
scales_unique = sorted(bg_per_scale.keys())
scale_to_idx = {s: i for i, s in enumerate(scales_unique)}
H, W = spread.shape
bg_pv = np.empty((len(scales_unique), H, W), dtype=np.float32)
for s, idx in scale_to_idx.items():
bg_pv[idx] = bg_per_scale[s]
scale_idx = np.array(
[scale_to_idx[s] for s in variant_scales], dtype=np.int32,
)
return _jit_top_max_per_variant(
np.ascontiguousarray(spread, dtype=np.uint8),
dx_flat, dy_flat, bins_flat, offsets, np.uint8(bit_active),
bg_pv, scale_idx,
)
_HAS_NP_BITCOUNT = hasattr(np, "bitwise_count")
def popcount_density(spread: np.ndarray) -> np.ndarray: def popcount_density(spread: np.ndarray) -> np.ndarray:
"""Conta bit set per pixel.
Order:
1) Numba JIT parallel (preferito: piu veloce su 1080p, 0.5ms vs 1.6ms)
2) numpy.bitwise_count (NumPy 2.0+, SIMD ma single-thread)
3) Fallback numpy bit-shift puro
"""
if spread.dtype == np.uint16:
spread_c = np.ascontiguousarray(spread, dtype=np.uint16)
if HAS_NUMBA: if HAS_NUMBA:
return _jit_popcount_density(np.ascontiguousarray(spread, dtype=np.uint8)) return _jit_popcount_density_u16(spread_c)
# Fallback if _HAS_NP_BITCOUNT:
return np.bitwise_count(spread_c).astype(np.float32, copy=False)
H, W = spread_c.shape
out = np.zeros((H, W), dtype=np.float32)
for b in range(16):
out += ((spread_c >> b) & 1).astype(np.float32)
return out
spread_c = np.ascontiguousarray(spread, dtype=np.uint8)
if HAS_NUMBA:
return _jit_popcount_density(spread_c)
if _HAS_NP_BITCOUNT:
return np.bitwise_count(spread_c).astype(np.float32, copy=False)
H, W = spread.shape H, W = spread.shape
out = np.zeros((H, W), dtype=np.float32) out = np.zeros((H, W), dtype=np.float32)
for b in range(8): for b in range(8):
+5 -2
View File
@@ -220,8 +220,11 @@ def auto_tune(template_bgr: np.ndarray, mask: np.ndarray | None = None) -> dict:
else: else:
min_score = 0.45 min_score = 0.45
# angle step: 5° default; se simmetria, mantengo step ma range ridotto # angle step adattivo (Halcon-style): atan(2/max_side) deg, clampato.
angle_step = 5.0 # Template grande → step fine (rotazione minima visibile su perimetro).
# Template piccolo → step grosso (over-sampling = sprecato).
max_side = max(h, w)
angle_step = float(np.clip(np.degrees(np.arctan2(2.0, max_side)), 1.0, 8.0))
result = { result = {
"backend": "line", "backend": "line",
+353 -35
View File
@@ -41,11 +41,13 @@ from pm2d._jit_kernels import (
score_bitmap as _jit_score_bitmap, score_bitmap as _jit_score_bitmap,
score_bitmap_rescored as _jit_score_bitmap_rescored, score_bitmap_rescored as _jit_score_bitmap_rescored,
score_bitmap_greedy as _jit_score_bitmap_greedy, score_bitmap_greedy as _jit_score_bitmap_greedy,
top_max_per_variant as _jit_top_max_per_variant,
popcount_density as _jit_popcount, popcount_density as _jit_popcount,
HAS_NUMBA, HAS_NUMBA,
) )
N_BINS = 8 # orientamenti quantizzati modulo π N_BINS = 8 # default: orientamento mod π (no polarity)
N_BINS_POL = 16 # use_polarity=True: orientamento mod 2π (con polarity)
def _oriented_bbox_polygon( def _oriented_bbox_polygon(
@@ -121,6 +123,7 @@ class LineShapeMatcher:
pyramid_levels: int = 2, pyramid_levels: int = 2,
top_score_factor: float = 0.5, top_score_factor: float = 0.5,
n_threads: int | None = None, n_threads: int | None = None,
use_polarity: bool = False,
) -> None: ) -> None:
self.num_features = num_features self.num_features = num_features
self.weak_grad = weak_grad self.weak_grad = weak_grad
@@ -134,6 +137,12 @@ class LineShapeMatcher:
self.pyramid_levels = max(1, pyramid_levels) self.pyramid_levels = max(1, pyramid_levels)
self.top_score_factor = top_score_factor self.top_score_factor = top_score_factor
self.n_threads = n_threads or max(1, (os.cpu_count() or 2) - 1) self.n_threads = n_threads or max(1, (os.cpu_count() or 2) - 1)
# Polarity-aware: 16 bin (orientamento mod 2π) usando bitmap uint16.
# Distingue edge "chiaro→scuro" da "scuro→chiaro" → 2x selettività.
# Usare quando background di scena varia (chiaro/scuro) e orientamento
# template e' direzionale.
self.use_polarity = use_polarity
self._n_bins = N_BINS_POL if use_polarity else N_BINS
self.variants: list[_Variant] = [] self.variants: list[_Variant] = []
self.template_size: tuple[int, int] = (0, 0) self.template_size: tuple[int, int] = (0, 0)
@@ -149,12 +158,17 @@ class LineShapeMatcher:
return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
return img return img
@staticmethod def _gradient(self, gray: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
def _gradient(gray: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3) gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3) gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
mag = cv2.magnitude(gx, gy) mag = cv2.magnitude(gx, gy)
ang = np.arctan2(gy, gx) ang = np.arctan2(gy, gx) # [-π, π]
if self.use_polarity:
# Mod 2π: bin 0..15 codifica direzione + polarity edge.
ang_full = np.where(ang < 0, ang + 2.0 * np.pi, ang)
bins = np.floor(ang_full / (2.0 * np.pi) * N_BINS_POL).astype(np.int16)
bins = np.clip(bins, 0, N_BINS_POL - 1)
else:
ang_mod = np.where(ang < 0, ang + np.pi, ang) ang_mod = np.where(ang < 0, ang + np.pi, ang)
bins = np.floor(ang_mod / np.pi * N_BINS).astype(np.int16) bins = np.floor(ang_mod / np.pi * N_BINS).astype(np.int16)
bins = np.clip(bins, 0, N_BINS - 1) bins = np.clip(bins, 0, N_BINS - 1)
@@ -198,12 +212,31 @@ class LineShapeMatcher:
n = int(np.floor((s1 - s0) / self.scale_step)) + 1 n = int(np.floor((s1 - s0) / self.scale_step)) + 1
return [float(s0 + i * self.scale_step) for i in range(n)] return [float(s0 + i * self.scale_step) for i in range(n)]
def _auto_angle_step(self) -> float:
"""Step angolare derivato da dimensione template (Halcon-style).
Formula: step ≈ atan(2 / max_side) gradi. Garantisce che la
rotazione minima produca uno spostamento di ≥2 px sul perimetro
del template (sotto sample il matching coarse perde candidati).
Clampato in [0.5°, 10°].
"""
max_side = max(self.template_size) if self.template_size != (0, 0) else 64
step = math.degrees(math.atan2(2.0, float(max_side)))
return float(np.clip(step, 0.5, 10.0))
def _effective_angle_step(self) -> float:
"""Risolve angle_step_deg gestendo modalità auto (<=0)."""
if self.angle_step_deg <= 0:
return self._auto_angle_step()
return self.angle_step_deg
def _angle_list(self) -> list[float]: def _angle_list(self) -> list[float]:
a0, a1 = self.angle_range_deg a0, a1 = self.angle_range_deg
if self.angle_step_deg <= 0 or a0 >= a1: step = self._effective_angle_step()
if step <= 0 or a0 >= a1:
return [float(a0)] return [float(a0)]
n = int(np.floor((a1 - a0) / self.angle_step_deg)) n = int(np.floor((a1 - a0) / step))
return [float(a0 + i * self.angle_step_deg) for i in range(n)] return [float(a0 + i * step) for i in range(n)]
# --- Training ------------------------------------------------------ # --- Training ------------------------------------------------------
@@ -240,6 +273,8 @@ class LineShapeMatcher:
self._train_mask = mask_full.copy() self._train_mask = mask_full.copy()
self.variants.clear() self.variants.clear()
# Invalida cache feature di refine: il template e cambiato.
self._refine_feat_cache = {}
for s in self._scale_list(): for s in self._scale_list():
sw = max(16, int(round(w * s))) sw = max(16, int(round(w * s)))
sh = max(16, int(round(h * s))) sh = max(16, int(round(h * s)))
@@ -294,8 +329,42 @@ class LineShapeMatcher:
kh=kh, kw=kw, kh=kh, kw=kw,
cx_local=float(cx_local), cy_local=float(cy_local), cx_local=float(cx_local), cy_local=float(cy_local),
)) ))
self._dedup_variants()
return len(self.variants) return len(self.variants)
def _dedup_variants(self) -> int:
"""Rimuove varianti con feature-set identico (post-quantizzazione).
Halcon-style: con angle range = (0, 360) e simmetrie del template,
molte rotazioni producono lo stesso set quantizzato di feature.
Es: quadrato a 0/90/180/270 deg → stesse features (modulo permutazione).
Hash su feature ordinate (livello 0, full-res) elimina i duplicati.
Vantaggio: meno varianti = meno chiamate kernel JIT al top-level
senza perdere copertura angolare effettiva. Per template asimmetrici
non rimuove nulla.
"""
seen: dict[bytes, int] = {}
kept: list[_Variant] = []
removed = 0
for var in self.variants:
lvl0 = var.levels[0]
order = np.lexsort((lvl0.bin, lvl0.dy, lvl0.dx))
key = (
lvl0.dx[order].tobytes()
+ b"|" + lvl0.dy[order].tobytes()
+ b"|" + lvl0.bin[order].tobytes()
+ b"|" + str(round(var.scale, 4)).encode()
)
h = key # diretto, senza hash crypto (collision ok solo se identici)
if h in seen:
removed += 1
continue
seen[h] = len(kept)
kept.append(var)
self.variants = kept
return removed
# --- Matching ------------------------------------------------------ # --- Matching ------------------------------------------------------
def _response_map(self, gray: np.ndarray) -> np.ndarray: def _response_map(self, gray: np.ndarray) -> np.ndarray:
@@ -313,20 +382,22 @@ class LineShapeMatcher:
return raw return raw
def _spread_bitmap(self, gray: np.ndarray) -> np.ndarray: def _spread_bitmap(self, gray: np.ndarray) -> np.ndarray:
"""Spread bitmap uint8: bit b acceso dove bin b è presente nel raggio. """Spread bitmap: bit b acceso dove bin b è presente nel raggio.
Formato compatto 32× più denso della response map (N_BINS, H, W) float32. dtype: uint8 per N_BINS=8, uint16 per N_BINS_POL=16 (use_polarity).
""" """
mag, bins = self._gradient(gray) mag, bins = self._gradient(gray)
valid = mag >= self.weak_grad valid = mag >= self.weak_grad
k = 2 * self.spread_radius + 1 k = 2 * self.spread_radius + 1
kernel = np.ones((k, k), dtype=np.uint8) kernel = np.ones((k, k), dtype=np.uint8)
H, W = gray.shape H, W = gray.shape
spread = np.zeros((H, W), dtype=np.uint8) nb = self._n_bins
for b in range(N_BINS): dtype = np.uint16 if nb > 8 else np.uint8
spread = np.zeros((H, W), dtype=dtype)
for b in range(nb):
mask_b = ((bins == b) & valid).astype(np.uint8) mask_b = ((bins == b) & valid).astype(np.uint8)
d = cv2.dilate(mask_b, kernel) d = cv2.dilate(mask_b, kernel)
spread |= (d << b) spread |= (d.astype(dtype) << b)
return spread return spread
@staticmethod @staticmethod
@@ -394,6 +465,108 @@ class LineShapeMatcher:
oy = float(np.clip(oy, -0.5, 0.5)) oy = float(np.clip(oy, -0.5, 0.5))
return x + ox, y + oy return x + ox, y + oy
def _refine_pose_joint(
self,
spread0: np.ndarray,
template_gray: np.ndarray,
cx: float, cy: float,
angle_deg: float, scale: float,
mask_full: np.ndarray,
max_iter: int = 24,
tol: float = 1e-3,
) -> tuple[float, float, float, float]:
"""Refine congiunto (cx, cy, angle) via Nelder-Mead 3D.
Ottimizza simultaneamente posizione e angolo (vs golden search 1D
sull'angolo poi quadratico 2D su xy che alterna assi). Halcon-style:
un singolo iter LM stila il match a precisione sub-pixel + sub-step.
Ritorna (angle, score, cx, cy) dove score e quello calcolato sulla
scena spread (no template gray).
"""
h, w = template_gray.shape
sw = max(16, int(round(w * scale)))
sh = max(16, int(round(h * scale)))
gray_s = cv2.resize(template_gray, (sw, sh), interpolation=cv2.INTER_LINEAR)
mask_s = cv2.resize(mask_full, (sw, sh), interpolation=cv2.INTER_NEAREST)
diag = int(np.ceil(np.hypot(sh, sw))) + 6
py = (diag - sh) // 2; px = (diag - sw) // 2
gray_p = cv2.copyMakeBorder(gray_s, py, diag - sh - py, px, diag - sw - px,
cv2.BORDER_REPLICATE)
mask_p = cv2.copyMakeBorder(mask_s, py, diag - sh - py, px, diag - sw - px,
cv2.BORDER_CONSTANT, value=0)
center = (diag / 2.0, diag / 2.0)
H, W = spread0.shape
def _score(params: tuple[float, float, float]) -> float:
ddx, ddy, dang = params
ang = angle_deg + dang
M = cv2.getRotationMatrix2D(center, ang, 1.0)
gray_r = cv2.warpAffine(gray_p, M, (diag, diag),
flags=cv2.INTER_LINEAR,
borderMode=cv2.BORDER_REPLICATE)
mask_r = cv2.warpAffine(mask_p, M, (diag, diag),
flags=cv2.INTER_NEAREST, borderValue=0)
mag, bins = self._gradient(gray_r)
fx, fy, fb = self._extract_features(mag, bins, mask_r)
if len(fx) < 8:
return 0.0
cxe = cx + ddx; cye = cy + ddy
ix = int(round(cxe)); iy = int(round(cye))
tot = 0
valid = 0
for i in range(len(fx)):
xs = ix + int(fx[i] - center[0])
ys = iy + int(fy[i] - center[1])
if 0 <= xs < W and 0 <= ys < H:
bit = np.uint8(1 << int(fb[i]))
if spread0[ys, xs] & bit:
tot += 1
valid += 1
return -float(tot) / max(1, valid) # minimize -score
# Nelder-Mead 3D inline (no scipy). Simplex iniziale: vertice + offset
# dx=±0.5px, dy=±0.5px, dθ=±step/2.
step_a = self.angle_step_deg / 2.0 if self.angle_step_deg > 0 else 1.0
x0 = np.array([0.0, 0.0, 0.0])
simplex = np.array([
x0,
x0 + [0.5, 0.0, 0.0],
x0 + [0.0, 0.5, 0.0],
x0 + [0.0, 0.0, step_a],
])
fvals = np.array([_score(tuple(s)) for s in simplex])
for _ in range(max_iter):
order = np.argsort(fvals)
simplex = simplex[order]; fvals = fvals[order]
if abs(fvals[-1] - fvals[0]) < tol:
break
centroid = simplex[:-1].mean(axis=0)
xr = centroid + 1.0 * (centroid - simplex[-1])
fr = _score(tuple(xr))
if fvals[0] <= fr < fvals[-2]:
simplex[-1] = xr; fvals[-1] = fr
continue
if fr < fvals[0]:
xe = centroid + 2.0 * (centroid - simplex[-1])
fe = _score(tuple(xe))
if fe < fr:
simplex[-1] = xe; fvals[-1] = fe
else:
simplex[-1] = xr; fvals[-1] = fr
continue
xc = centroid + 0.5 * (simplex[-1] - centroid)
fc = _score(tuple(xc))
if fc < fvals[-1]:
simplex[-1] = xc; fvals[-1] = fc
continue
for k in range(1, 4):
simplex[k] = simplex[0] + 0.5 * (simplex[k] - simplex[0])
fvals[k] = _score(tuple(simplex[k]))
best_i = int(np.argmin(fvals))
ddx, ddy, dang = simplex[best_i]
return (angle_deg + float(dang), -float(fvals[best_i]),
cx + float(ddx), cy + float(ddy))
def _refine_angle( def _refine_angle(
self, self,
spread0: np.ndarray, # bitmap uint8 (H, W) spread0: np.ndarray, # bitmap uint8 (H, W)
@@ -412,11 +585,13 @@ class LineShapeMatcher:
l'angolo con score massimo (parabolic fit sulle 3 score centrali). l'angolo con score massimo (parabolic fit sulle 3 score centrali).
Ritorna (angle_refined, score, cx_refined, cy_refined). Ritorna (angle_refined, score, cx_refined, cy_refined).
""" """
# Se il match grezzo è già quasi perfetto, NON refinare # NB: rimosso early-skip su score >= 0.99. Lo score linemod/shape
if original_score is not None and original_score >= 0.99: # satura facilmente a 1.0 (specie con pyramid_propagate o spread
return (angle_deg, original_score, cx, cy) # ampio) ma NON garantisce angolo preciso: l'angolo grezzo della
# variante e' quantizzato a multipli di angle_step (5 deg default).
# Refine angolare e' essenziale per orientamento sub-step.
if search_radius is None: if search_radius is None:
search_radius = self.angle_step_deg / 2.0 search_radius = self._effective_angle_step() / 2.0
h, w = template_gray.shape h, w = template_gray.shape
sw = max(16, int(round(w * scale))) sw = max(16, int(round(w * scale)))
@@ -434,9 +609,24 @@ class LineShapeMatcher:
H, W = spread0.shape H, W = spread0.shape
margin = 3 margin = 3
# Cache template features per angolo (chiave: int(round(ang*20)) =
# bucket di 0.05°). Golden-search ricontratto puo richiedere lo
# stesso bucket piu volte; evita re-warp+gradient+extract (costoso).
# Cache a livello matcher per riusare tra chiamate find() su scene
# diverse: la rotazione del template non dipende dalla scena.
if not hasattr(self, '_refine_feat_cache'):
self._refine_feat_cache = {}
feat_cache = self._refine_feat_cache
cache_scale_key = round(scale * 1000)
def _score_at_angle(off: float) -> tuple[float, float, float]: def _score_at_angle(off: float) -> tuple[float, float, float]:
"""Ritorna (score, best_cx, best_cy) per angolo = angle_deg + off.""" """Ritorna (score, best_cx, best_cy) per angolo = angle_deg + off."""
ang = angle_deg + off ang = angle_deg + off
ck = (round(ang * 20), cache_scale_key)
cached = feat_cache.get(ck)
if cached is not None:
fx, fy, fb = cached
else:
M = cv2.getRotationMatrix2D(center, ang, 1.0) M = cv2.getRotationMatrix2D(center, ang, 1.0)
gray_r = cv2.warpAffine(gray_p, M, (diag, diag), gray_r = cv2.warpAffine(gray_p, M, (diag, diag),
flags=cv2.INTER_LINEAR, flags=cv2.INTER_LINEAR,
@@ -445,6 +635,10 @@ class LineShapeMatcher:
flags=cv2.INTER_NEAREST, borderValue=0) flags=cv2.INTER_NEAREST, borderValue=0)
mag, bins = self._gradient(gray_r) mag, bins = self._gradient(gray_r)
fx, fy, fb = self._extract_features(mag, bins, mask_r) fx, fy, fb = self._extract_features(mag, bins, mask_r)
# LRU semplice: limita cache a ~256 angoli (8 angoli * 32 candidati)
if len(feat_cache) > 256:
feat_cache.pop(next(iter(feat_cache)))
feat_cache[ck] = (fx, fy, fb)
if len(fx) < 8: if len(fx) < 8:
return (0.0, cx, cy) return (0.0, cx, cy)
dx = (fx - center[0]).astype(np.int32) dx = (fx - center[0]).astype(np.int32)
@@ -453,9 +647,10 @@ class LineShapeMatcher:
x_lo = int(cx) - margin; x_hi = int(cx) + margin + 1 x_lo = int(cx) - margin; x_hi = int(cx) + margin + 1
sh_w = y_hi - y_lo; sw_w = x_hi - x_lo sh_w = y_hi - y_lo; sw_w = x_hi - x_lo
acc = np.zeros((sh_w, sw_w), dtype=np.float32) acc = np.zeros((sh_w, sw_w), dtype=np.float32)
spread_dtype = spread0.dtype.type
for i in range(len(dx)): for i in range(len(dx)):
ddx = int(dx[i]); ddy = int(dy[i]); b = int(fb[i]) ddx = int(dx[i]); ddy = int(dy[i]); b = int(fb[i])
bit = np.uint8(1 << b) bit = spread_dtype(1 << b)
sy0 = y_lo + ddy; sy1 = y_hi + ddy sy0 = y_lo + ddy; sy1 = y_hi + ddy
sx0 = x_lo + ddx; sx1 = x_hi + ddx sx0 = x_lo + ddx; sx1 = x_hi + ddx
a_y0 = max(0, -sy0); a_y1 = sh_w - max(0, sy1 - H) a_y0 = max(0, -sy0); a_y1 = sh_w - max(0, sy1 - H)
@@ -573,9 +768,16 @@ class LineShapeMatcher:
subpixel: bool = True, subpixel: bool = True,
verify_ncc: bool = True, verify_ncc: bool = True,
verify_threshold: float = 0.4, verify_threshold: float = 0.4,
ncc_skip_above: float = 1.01, # disabilitato di default: NCC sempre
coarse_angle_factor: int = 2, coarse_angle_factor: int = 2,
coarse_stride: int = 1,
scale_penalty: float = 0.0, scale_penalty: float = 0.0,
search_roi: tuple[int, int, int, int] | None = None,
pyramid_propagate: bool = False, # off di default: meno duplicati
propagate_topk: int = 4,
refine_pose_joint: bool = False,
greediness: float = 0.0, greediness: float = 0.0,
batch_top: bool = False,
) -> list[Match]: ) -> list[Match]:
""" """
scale_penalty: se > 0, riduce lo score per match a scala diversa da 1.0: scale_penalty: se > 0, riduce lo score per match a scala diversa da 1.0:
@@ -583,11 +785,30 @@ class LineShapeMatcher:
Utile se l'operatore vuole che match "identico al template anche per Utile se l'operatore vuole che match "identico al template anche per
dimensione" abbia score più alto di match "stessa forma, dimensione dimensione" abbia score più alto di match "stessa forma, dimensione
diversa". scale_penalty=0 (default) = comportamento shape puro. diversa". scale_penalty=0 (default) = comportamento shape puro.
search_roi: (x, y, w, h) limita la ricerca a una regione della scena.
Equivalente a Halcon set_aoi: il matching opera su crop locale e le
coordinate output sono ri-traslate al sistema scena originale. Usare
quando si conosce a priori l'area in cui il pezzo può apparire (es.
feeder a posizione fissa) → costo proporzionale a w·h invece di W·H.
""" """
if not self.variants: if not self.variants:
raise RuntimeError("Matcher non addestrato: chiamare train() prima.") raise RuntimeError("Matcher non addestrato: chiamare train() prima.")
gray0 = self._to_gray(scene_bgr) gray_full = self._to_gray(scene_bgr)
# Applica ROI di ricerca: restringe scena a crop, ricorda offset per
# ri-traslare le coordinate dei match a fine pipeline.
if search_roi is not None:
rx, ry, rw, rh = search_roi
H_s, W_s = gray_full.shape
rx = max(0, int(rx)); ry = max(0, int(ry))
rw = max(1, min(int(rw), W_s - rx))
rh = max(1, min(int(rh), H_s - ry))
gray0 = gray_full[ry:ry + rh, rx:rx + rw]
roi_offset = (rx, ry)
else:
gray0 = gray_full
roi_offset = (0, 0)
grays = [gray0] grays = [gray0]
for _ in range(self.pyramid_levels - 1): for _ in range(self.pyramid_levels - 1):
grays.append(cv2.pyrDown(grays[-1])) grays.append(cv2.pyrDown(grays[-1]))
@@ -597,8 +818,8 @@ class LineShapeMatcher:
# map float32 → MOLTO più cache-friendly per _score_by_shift. # map float32 → MOLTO più cache-friendly per _score_by_shift.
spread_top = self._spread_bitmap(grays[top]) spread_top = self._spread_bitmap(grays[top])
bit_active_top = int( bit_active_top = int(
sum(1 << b for b in range(N_BINS) sum(1 << b for b in range(self._n_bins)
if (spread_top & np.uint8(1 << b)).any()) if (spread_top & (spread_top.dtype.type(1) << b)).any())
) )
if nms_radius is None: if nms_radius is None:
nms_radius = max(8, min(self.template_size) // 2) nms_radius = max(8, min(self.template_size) // 2)
@@ -647,15 +868,19 @@ class LineShapeMatcher:
end = min(n, i + half + 1) end = min(n, i + half + 1)
neighbor_map[vi_c] = vi_sorted[start:end] neighbor_map[vi_c] = vi_sorted[start:end]
# Pruning varianti via top-level (parallelizzato) - solo coarse. # Pruning varianti via top-level (parallelizzato).
# greediness > 0: usa kernel greedy con early-exit (no rescore bg) # coarse_stride > 1: 1 pixel ogni stride (~stride^2 speed-up).
# per il pruning. ~2-4x speed-up sul top con greediness=0.8. # pyramid_propagate=True: top-K picchi per restringere full-res.
# greediness > 0: kernel greedy con early-exit (alternativo a rescore).
cs = max(1, int(coarse_stride))
peaks_by_vi: dict[int, list[tuple[int, int, float]]] = {}
use_greedy_top = greediness > 0.0 use_greedy_top = greediness > 0.0
def _top_score(vi: int) -> tuple[int, float]: def _top_score(vi: int) -> tuple[int, float]:
var = self.variants[vi] var = self.variants[vi]
lvl = var.levels[min(top, len(var.levels) - 1)] lvl = var.levels[min(top, len(var.levels) - 1)]
if use_greedy_top: if use_greedy_top:
# Greedy non supporta stride né rescore bg
score = _jit_score_bitmap_greedy( score = _jit_score_bitmap_greedy(
spread_top, lvl.dx, lvl.dy, lvl.bin, bit_active_top, spread_top, lvl.dx, lvl.dy, lvl.bin, bit_active_top,
top_thresh, greediness, top_thresh, greediness,
@@ -663,13 +888,46 @@ class LineShapeMatcher:
else: else:
score = _jit_score_bitmap_rescored( score = _jit_score_bitmap_rescored(
spread_top, lvl.dx, lvl.dy, lvl.bin, bit_active_top, spread_top, lvl.dx, lvl.dy, lvl.bin, bit_active_top,
bg_cache_top[var.scale], bg_cache_top[var.scale], stride=cs,
) )
return vi, float(score.max()) if score.size else -1.0 if score.size == 0:
return vi, -1.0
best = float(score.max())
if pyramid_propagate and best > 0:
flat = score.ravel()
k = min(propagate_topk, flat.size)
idx = np.argpartition(-flat, k - 1)[:k]
peaks: list[tuple[int, int, float]] = []
for i in idx:
s = float(flat[i])
if s < top_thresh * 0.7:
continue
yt, xt = int(i // score.shape[1]), int(i % score.shape[1])
peaks.append((xt, yt, s))
peaks_by_vi[vi] = peaks
return vi, best
kept_coarse: list[tuple[int, float]] = [] kept_coarse: list[tuple[int, float]] = []
all_top_scores: list[tuple[int, float]] = [] all_top_scores: list[tuple[int, float]] = []
if self.n_threads > 1 and len(coarse_idx_list) > 1: # batch_top: usa kernel batch single-call con prange-esterno su
# varianti. Vince su threadpool quando n_vars >> n_threads e quando
# H*W top e' piccolo (overhead chiamate JIT > costo kernel).
if (batch_top and HAS_NUMBA and len(coarse_idx_list) > 4):
dx_l = []; dy_l = []; bn_l = []; vs_l = []
for vi in coarse_idx_list:
var = self.variants[vi]
lvl = var.levels[min(top, len(var.levels) - 1)]
dx_l.append(lvl.dx); dy_l.append(lvl.dy); bn_l.append(lvl.bin)
vs_l.append(var.scale)
scores_arr = _jit_top_max_per_variant(
spread_top, dx_l, dy_l, bn_l, bg_cache_top, vs_l,
bit_active_top,
)
for vi, best in zip(coarse_idx_list, scores_arr.tolist()):
all_top_scores.append((vi, best))
if best >= top_thresh:
kept_coarse.append((vi, best))
elif self.n_threads > 1 and len(coarse_idx_list) > 1:
with ThreadPoolExecutor(max_workers=self.n_threads) as ex: with ThreadPoolExecutor(max_workers=self.n_threads) as ex:
for vi, best in ex.map(_top_score, coarse_idx_list): for vi, best in ex.map(_top_score, coarse_idx_list):
all_top_scores.append((vi, best)) all_top_scores.append((vi, best))
@@ -718,21 +976,55 @@ class LineShapeMatcher:
# Full-res (parallelizzato) con bitmap # Full-res (parallelizzato) con bitmap
spread0 = self._spread_bitmap(gray0) spread0 = self._spread_bitmap(gray0)
bit_active_full = int( bit_active_full = int(
sum(1 << b for b in range(N_BINS) sum(1 << b for b in range(self._n_bins)
if (spread0 & np.uint8(1 << b)).any()) if (spread0 & (spread0.dtype.type(1) << b)).any())
) )
density_full = _jit_popcount(spread0) density_full = _jit_popcount(spread0)
for sc in unique_scales: for sc in unique_scales:
bg_cache_full[sc] = _bg_for_scale(density_full, sc, 1) bg_cache_full[sc] = _bg_for_scale(density_full, sc, 1)
# Margine in full-res attorno ad ogni peak top: copre incertezza
# downsampling (sf_top px) + spread_radius + slack per NMS.
propagate_margin = sf_top + self.spread_radius + max(8, nms_radius // 2)
H_full, W_full = spread0.shape
def _full_score(vi: int) -> tuple[int, np.ndarray]: def _full_score(vi: int) -> tuple[int, np.ndarray]:
var = self.variants[vi] var = self.variants[vi]
lvl0 = var.levels[0] lvl0 = var.levels[0]
score = _jit_score_bitmap_rescored( if not pyramid_propagate or vi not in peaks_by_vi or not peaks_by_vi[vi]:
# Path legacy: scansiona intera scena
return vi, _jit_score_bitmap_rescored(
spread0, lvl0.dx, lvl0.dy, lvl0.bin, bit_active_full, spread0, lvl0.dx, lvl0.dy, lvl0.bin, bit_active_full,
bg_cache_full[var.scale], bg_cache_full[var.scale],
) )
return vi, score # Path piramide propagata: valuta solo crop locali attorno
# alle posizioni dei picchi top-level (riproiettati a full-res).
score_full = np.zeros((H_full, W_full), dtype=np.float32)
mark = np.zeros((H_full, W_full), dtype=bool)
bg = bg_cache_full[var.scale]
for xt, yt, _s in peaks_by_vi[vi]:
cx0 = xt * sf_top
cy0 = yt * sf_top
x_lo = max(0, cx0 - propagate_margin)
x_hi = min(W_full, cx0 + propagate_margin + 1)
y_lo = max(0, cy0 - propagate_margin)
y_hi = min(H_full, cy0 + propagate_margin + 1)
if x_hi <= x_lo or y_hi <= y_lo:
continue
if mark[y_lo:y_hi, x_lo:x_hi].all():
continue
# Crop spread + bg, valuta kernel sul crop
spread_crop = np.ascontiguousarray(spread0[y_lo:y_hi, x_lo:x_hi])
bg_crop = np.ascontiguousarray(bg[y_lo:y_hi, x_lo:x_hi])
score_crop = _jit_score_bitmap_rescored(
spread_crop, lvl0.dx, lvl0.dy, lvl0.bin,
bit_active_full, bg_crop,
)
score_full[y_lo:y_hi, x_lo:x_hi] = np.maximum(
score_full[y_lo:y_hi, x_lo:x_hi], score_crop,
)
mark[y_lo:y_hi, x_lo:x_hi] = True
return vi, score_full
candidates_per_var: list[tuple[int, np.ndarray]] = [] candidates_per_var: list[tuple[int, np.ndarray]] = []
raw: list[tuple[float, int, int, int]] = [] raw: list[tuple[float, int, int, int]] = []
@@ -810,28 +1102,54 @@ class LineShapeMatcher:
var = self.variants[vi] var = self.variants[vi]
ang_f = var.angle_deg ang_f = var.angle_deg
score_f = score score_f = score
if refine_angle and self.template_gray is not None: if refine_pose_joint and self.template_gray is not None:
ang_f, score_f, cx_f, cy_f = self._refine_pose_joint(
spread0, self.template_gray, cx_f, cy_f,
var.angle_deg, var.scale, mask_full,
)
elif refine_angle and self.template_gray is not None:
ang_f, score_f, cx_f, cy_f = self._refine_angle( ang_f, score_f, cx_f, cy_f = self._refine_angle(
spread0, bit_active_full, self.template_gray, cx_f, cy_f, spread0, bit_active_full, self.template_gray, cx_f, cy_f,
var.angle_deg, var.scale, mask_full, var.angle_deg, var.scale, mask_full,
search_radius=self.angle_step_deg / 2.0, search_radius=self._effective_angle_step() / 2.0,
original_score=score, original_score=score,
) )
if verify_ncc: # NCC verify (Halcon-style): se ncc_skip_above < 1.0 salta
# il calcolo per shape-score gia alti. Default 1.01 = NCC sempre,
# piu sicuro contro falsi positivi (lo shape-score satura facile).
# Quando NCC viene calcolato, lo score finale e' la MEDIA tra
# shape-score e NCC: rende lo score piu discriminante per
# ranking/visualizzazione (uno score 1.0 vero richiede sia
# match shape sia template gray identici).
if verify_ncc and float(score_f) < ncc_skip_above:
ncc = self._verify_ncc(gray0, cx_f, cy_f, ang_f, var.scale) ncc = self._verify_ncc(gray0, cx_f, cy_f, ang_f, var.scale)
if ncc < verify_threshold: if ncc < verify_threshold:
continue continue
score_f = (float(score_f) + max(0.0, ncc)) * 0.5
# Ri-traslo coord da spazio crop ROI a spazio scena originale.
cx_out = cx_f + roi_offset[0]
cy_out = cy_f + roi_offset[1]
poly = _oriented_bbox_polygon( poly = _oriented_bbox_polygon(
cx_f, cy_f, tw * var.scale, th * var.scale, ang_f, cx_out, cy_out, tw * var.scale, th * var.scale, ang_f,
) )
# Penalità scala opzionale: score degrada con distanza da 1.0 # Penalità scala opzionale: score degrada con distanza da 1.0
if scale_penalty > 0.0 and var.scale != 1.0: if scale_penalty > 0.0 and var.scale != 1.0:
score_f = float(score_f) * max( score_f = float(score_f) * max(
0.0, 1.0 - scale_penalty * abs(var.scale - 1.0) 0.0, 1.0 - scale_penalty * abs(var.scale - 1.0)
) )
# NMS post-refine: refine puo spostare il match di nms_radius;
# ricontrollo overlap su match gia accettati per evitare
# duplicati (stesso oggetto trovato da varianti angolari diverse).
dup = False
for k in kept:
if (k.cx - cx_out) ** 2 + (k.cy - cy_out) ** 2 < r2:
dup = True
break
if dup:
continue
kept.append(Match( kept.append(Match(
cx=cx_f, cy=cy_f, cx=cx_out, cy=cy_out,
angle_deg=ang_f, angle_deg=ang_f,
scale=var.scale, scale=var.scale,
score=score_f, score=score_f,