Number7AI — Docs
Duplicate invoice detection
Invoice number matching alone misses a large share of real-world duplicate submissions. Production duplicate detection requires multi-signal scoring across exact matches, near-matches, and temporal patterns.
Last updated: April 2026
TL;DR
- Single-signal duplicate checks (invoice number only) miss 30–40% of real duplicate submissions.
- Exact matches catch re-uploads and clear re-submissions. Near-matches catch vendor name variants and amount-tolerance fraud.
- Temporal signals catch rapid resubmissions and end-of-period clustering that look clean in isolation.
- All duplicate candidates are routed to review, never auto-rejected or auto-approved.
Why invoice-number matching alone fails
The most common duplicate check is: does this invoice number already exist for this vendor? This catches obvious re-submissions but misses most real-world scenarios. Vendors reuse numbers across periods. AP staff manually key invoices with typos that don't exact-match. PDFs are re-scanned with different filenames. Partial payments get resubmitted as full invoices. None of these are caught by a simple invoice-number lookup.
Layer 1 — Exact match signals
| Signal | How it works |
|---|---|
| Invoice number | Exact string match after normalisation (strip spaces, hyphens, leading zeros). |
| Vendor + invoice number | Composite key — same number from two different vendors is not a duplicate. |
| Document hash | SHA-256 of the raw file catches re-uploads of a byte-identical PDF without any field-level comparison. Re-scanned copies hash differently and fall to the other signals. |
| Amount + vendor + date | Triple composite match for invoices from vendors that reuse numbers across periods. |
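The four exact-match signals in the table can be sketched as a set of lookup keys. This is a minimal illustration, not the product's implementation: the function names, the `vendor_id` field, upper-casing, and the choice to strip leading zeros from every digit run are all assumptions.

```python
import hashlib
import re
from datetime import date
from decimal import Decimal


def normalise_invoice_number(raw: str) -> str:
    """Strip spaces, hyphens and leading zeros (upper-casing is an added assumption)."""
    cleaned = raw.replace(" ", "").replace("-", "").upper()
    # Remove zeros that start a digit run, keeping at least one digit.
    return re.sub(r"(?<!\d)0+(?=\d)", "", cleaned)


def exact_match_keys(vendor_id: str, invoice_number: str, amount: Decimal,
                     invoice_date: date, file_bytes: bytes) -> dict:
    """Build one lookup key per Layer 1 signal (hypothetical key names)."""
    number = normalise_invoice_number(invoice_number)
    return {
        "invoice_number": number,
        "vendor_invoice": (vendor_id, number),
        "document_hash": hashlib.sha256(file_bytes).hexdigest(),
        "amount_vendor_date": (amount, vendor_id, invoice_date),
    }
```

Under these assumptions, `INV-0042` and `inv 42` normalise to the same string, while the composite `vendor_invoice` key still keeps two different vendors apart.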
Layer 2 — Near-match signals
Normalised vendor name
Levenshtein distance check handles 'Acme Pvt Ltd' vs 'Acme Private Ltd'. Vendor aliases from master data are pre-loaded.
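A sketch of how such a check could look, using a plain dynamic-programming Levenshtein distance. The `aliases` mapping, the case-folding step, and the `max_distance=4` threshold are illustrative assumptions and not documented settings.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute), two-row DP."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def vendor_names_match(name_a: str, name_b: str, aliases: dict,
                       max_distance: int = 4) -> bool:
    """Resolve aliases first, then fall back to edit distance (assumed threshold)."""
    key_a = name_a.strip().casefold()
    key_b = name_b.strip().casefold()
    canon_a = aliases.get(key_a, key_a)
    canon_b = aliases.get(key_b, key_b)
    return canon_a == canon_b or levenshtein(canon_a, canon_b) <= max_distance
```

'Acme Pvt Ltd' and 'Acme Private Ltd' are four insertions apart, so they match at this threshold. A fixed absolute threshold is loose for short names, so a length-relative cutoff may be the better choice in practice.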
Amount tolerance window
Invoices from the same vendor whose amounts differ by no more than ±1% and that fall within a 30-day window are flagged for review. This pattern is common in partial-payment resubmission scenarios.
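The rule can be sketched as a pairwise predicate. The `Invoice` type is hypothetical, and two details are assumptions the text does not settle: the 1% is taken relative to the existing invoice's amount, and the 30-day window is measured between invoice dates.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:  # hypothetical minimal record
    vendor_id: str
    amount: Decimal
    invoice_date: date


def within_amount_tolerance(new: Invoice, existing: Invoice,
                            tolerance: Decimal = Decimal("0.01"),
                            window_days: int = 30) -> bool:
    """True when same vendor, dates within the window, amounts within ±1%."""
    if new.vendor_id != existing.vendor_id:
        return False
    if abs((new.invoice_date - existing.invoice_date).days) > window_days:
        return False
    return abs(new.amount - existing.amount) <= tolerance * existing.amount
```

`Decimal` is used instead of `float` so that the boundary case (exactly 1% apart) is not affected by binary rounding.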
GSTIN similarity
Near-duplicate GSTIN strings (for example, a single-digit transposition) trigger a vendor identity exception, which is handled separately from the duplicate check.
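One way to detect the transposition case is to test whether two equal-length strings differ only by a swap of two adjacent characters. The function name is an assumption, and the sample values in the usage note are illustrative strings in GSTIN layout rather than verified registrations.

```python
def is_single_transposition(a: str, b: str) -> bool:
    """True when b equals a with exactly one pair of adjacent characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )
```

For instance, `is_single_transposition("27AAPFU0939F1ZV", "27AAPFU0993F1ZV")` holds because only the adjacent `3` and `9` are swapped. A hit here would raise the vendor identity exception, not a duplicate flag.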
Semantic hash
Key field fingerprint (vendor + date + total + invoice number) is stored and compared against a rolling 90-day window.
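A minimal sketch of the fingerprint and the rolling window, assuming SHA-256 over the four normalised fields and an in-memory store. The class name, the field normalisation, and the `|` separator are assumptions, and a production store would most likely be a database table, not a dict.

```python
import hashlib
from datetime import date, timedelta
from decimal import Decimal


def fingerprint(vendor: str, invoice_date: date, total: Decimal,
                invoice_number: str) -> str:
    """Hash the key fields after light normalisation (assumed rules)."""
    parts = [
        vendor.strip().casefold(),
        invoice_date.isoformat(),
        f"{total:.2f}",
        invoice_number.strip().upper(),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


class FingerprintWindow:
    """Remembers fingerprints for a rolling window (90 days by default)."""

    def __init__(self, window_days: int = 90):
        self.window = timedelta(days=window_days)
        self._seen = {}  # fingerprint -> date last seen

    def check_and_store(self, fp: str, submitted_on: date) -> bool:
        """Return True if fp was seen inside the window, then record it."""
        cutoff = submitted_on - self.window
        self._seen = {k: v for k, v in self._seen.items() if v >= cutoff}
        duplicate = fp in self._seen
        self._seen[fp] = submitted_on
        return duplicate
```

Because the fingerprint ignores the file itself, a re-scanned PDF with identical extracted fields still produces the same value.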
Layer 3 — Temporal pattern signals
Rapid resubmission
The same vendor submitting a similar amount within 72 hours is a common resubmission pattern. Such invoices are flagged regardless of invoice number.
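The check can be sketched as a scan over a vendor's recent submissions. "Similar amount" is not quantified in the text, so this example borrows the ±1% tolerance from Layer 2 as an assumption. The `Submission` type and function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from decimal import Decimal


@dataclass(frozen=True)
class Submission:  # hypothetical minimal record
    vendor_id: str
    amount: Decimal
    submitted_at: datetime


def rapid_resubmissions(new: Submission, history: list,
                        window: timedelta = timedelta(hours=72),
                        tolerance: Decimal = Decimal("0.01")) -> list:
    """Return prior submissions from the same vendor, close in time and amount."""
    matches = []
    for prior in history:
        same_vendor = prior.vendor_id == new.vendor_id
        close_in_time = abs(new.submitted_at - prior.submitted_at) <= window
        similar = abs(new.amount - prior.amount) <= tolerance * prior.amount
        if same_vendor and close_in_time and similar:
            matches.append(prior)
    return matches
```

The invoice number is deliberately absent from the comparison, which is what lets this signal catch resubmissions under a new number.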
End-of-period clustering
Unusually high submission volume from one vendor near period close is logged as a batch-duplicate risk signal.
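The text does not define "unusually high", so the following is only one possible reading: compare a vendor's daily submission rate in the last few days of the period against its rate over a preceding baseline. The three-day close window, 30-day baseline, and function name are all assumptions.

```python
from datetime import date, timedelta


def period_close_ratio(submission_dates: list, period_end: date,
                       close_days: int = 3, baseline_days: int = 30) -> float:
    """Ratio of the daily submission rate near period close to the baseline rate."""
    close_start = period_end - timedelta(days=close_days - 1)
    baseline_start = close_start - timedelta(days=baseline_days)
    close_count = sum(close_start <= d <= period_end for d in submission_dates)
    baseline_count = sum(baseline_start <= d < close_start for d in submission_dates)
    if baseline_count == 0:
        # No baseline activity: any close-period volume is treated as unbounded.
        return float("inf") if close_count else 0.0
    # Same as (close_count / close_days) / (baseline_count / baseline_days).
    return (close_count * baseline_days) / (baseline_count * close_days)
```

A caller could log the risk signal whenever the ratio exceeds a chosen multiple, for example 5, with that cutoff tuned per vendor.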
Round-number anomaly
Exact round amounts (₹10,000, ₹50,000) with no line-item breakdown are scored higher for manual review.
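As a sketch, the anomaly can be expressed as a score contribution that applies only when the total is a round multiple and no line items are present. The ₹1,000 rounding unit and the `0.2` boost are hypothetical values chosen for illustration.

```python
from decimal import Decimal


def round_number_boost(total: Decimal, line_item_count: int,
                       round_unit: Decimal = Decimal("1000"),
                       boost: Decimal = Decimal("0.2")) -> Decimal:
    """Extra review score for round totals with no line-item breakdown."""
    is_round = total > 0 and total % round_unit == 0
    if is_round and line_item_count == 0:
        return boost
    return Decimal("0")
```

With these defaults, a ₹50,000 invoice without line items receives the boost, while the same total backed by line items does not.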
Duplicate candidate resolution
All duplicate candidates — regardless of signal layer — are routed to an exception queue. Operators see the matched invoice, the signal that triggered the flag, and the confidence score. Rejections and approvals are both logged as part of the AP audit trail.
Auto-rejection is not offered. Finance-grade systems require a human in the loop for any duplicate determination that could result in a vendor not getting paid.
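The resolution flow described in this section can be sketched as a queue that only changes state through an operator decision. The class and field names are assumptions for illustration. The point of the sketch is that there is no code path that rejects or approves without a named operator, and that both outcomes land in the audit log.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class DuplicateCandidate:
    invoice_id: str
    matched_invoice_id: str  # the earlier invoice it resembles
    signal: str              # which signal raised the flag
    confidence: float


@dataclass
class ExceptionQueue:
    pending: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def route(self, candidate: DuplicateCandidate) -> None:
        """Every candidate goes to review, whatever layer flagged it."""
        self.pending.append(candidate)

    def resolve(self, candidate: DuplicateCandidate, decision: Decision,
                operator: str) -> None:
        """Record a human decision; approvals and rejections are both logged."""
        if not operator:
            raise ValueError("a human operator is required for every decision")
        self.pending.remove(candidate)
        self.audit_log.append({
            "invoice_id": candidate.invoice_id,
            "matched_invoice_id": candidate.matched_invoice_id,
            "signal": candidate.signal,
            "confidence": candidate.confidence,
            "decision": decision.value,
            "operator": operator,
            "decided_at": datetime.now(timezone.utc).isoformat(),
        })
```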