Number7AI — Docs

Duplicate invoice detection

Invoice number matching alone misses the majority of real-world duplicate submissions. Production duplicate detection requires multi-signal scoring across exact matches, near-matches, and temporal patterns.

Last updated: April 2026

TL;DR

  • Single-signal duplicate checks (invoice number only) miss 30–40% of real duplicate submissions.
  • Exact matches catch re-uploads and clear re-submissions. Near-matches catch vendor name variants and amount-tolerance fraud.
  • Temporal signals catch rapid resubmissions and end-of-period clustering that look clean in isolation.
  • All duplicate candidates are routed to review — never auto-rejected or auto-approved.

Why invoice-number matching alone fails

The most common duplicate check is: does this invoice number already exist for this vendor? This catches obvious re-submissions but misses most real-world scenarios. Vendors reuse numbers across periods. AP staff key invoices manually and introduce typos, so the numbers no longer match exactly. PDFs are re-scanned with different filenames. Partial payments get resubmitted as full invoices. None of these are caught by a simple invoice-number lookup.

Production data: in deployments where single-signal duplicate detection was already enabled, multi-signal checks flagged a further 2–4% of submitted invoices as duplicates after the invoice-number check had passed them.

Layer 1 — Exact match signals

| Signal | How it works |
| --- | --- |
| Invoice number | Exact string match after normalisation (strip spaces, hyphens, leading zeros). |
| Vendor + invoice number | Composite key: the same number from two different vendors is not a duplicate. |
| Document hash | SHA-256 of the raw file catches re-uploads of the identical PDF without re-submission logic. |
| Amount + vendor + date | Triple composite match for invoices from vendors that reuse numbers across periods. |
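
The exact-match signals above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the function names and the `vendor_id` field are hypothetical, and "leading zeros" is interpreted here as zeros at the very start of the cleaned string.

```python
import hashlib
import re
from datetime import date
from decimal import Decimal


def normalise_invoice_number(raw: str) -> str:
    """Strip spaces and hyphens, then leading zeros (assumed: at string start only)."""
    cleaned = re.sub(r"[\s\-]", "", raw)
    # Keep a single "0" if the number consisted only of zeros.
    return cleaned.lstrip("0") or "0"


def vendor_invoice_key(vendor_id: str, invoice_number: str) -> tuple[str, str]:
    """Composite key: the same number under two vendors yields two different keys."""
    return (vendor_id, normalise_invoice_number(invoice_number))


def document_hash(file_bytes: bytes) -> str:
    """SHA-256 hex digest of the raw uploaded file."""
    return hashlib.sha256(file_bytes).hexdigest()


def triple_key(vendor_id: str, amount: Decimal, invoice_date: date) -> tuple[str, Decimal, date]:
    """Amount + vendor + date key for vendors that reuse invoice numbers."""
    return (vendor_id, amount, invoice_date)
```

Each key can then be looked up in whatever store holds previously seen invoices; a hit on any of them makes the invoice a duplicate candidate for review.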

Layer 2 — Near-match signals

  • Normalised vendor name

    A Levenshtein distance check handles variants such as 'Acme Pvt Ltd' vs 'Acme Private Ltd'. Vendor aliases from master data are pre-loaded.

  • Amount tolerance window

    Invoices within ±1% of amount, same vendor, within a 30-day window are flagged for review — common in partial-payment resubmission scenarios.

  • GSTIN similarity

    Near-duplicate GSTIN strings (for example, two adjacent characters transposed) trigger a vendor identity exception, which is raised separately from the duplicate check.

  • Semantic hash

    Key field fingerprint (vendor + date + total + invoice number) is stored and compared against a rolling 90-day window.
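
The near-match signals in this layer could look roughly like the sketch below. Everything here is illustrative: the function names, the `max_distance` threshold of 4, the choice to measure the ±1% tolerance relative to the earlier invoice, and the use of SHA-256 for the fingerprint are assumptions, not documented behaviour. The rolling 90-day comparison and the alias lookup are left out.

```python
import hashlib
from datetime import date
from decimal import Decimal


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def normalise_vendor_name(name: str) -> str:
    """Lower-case and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def vendor_names_match(a: str, b: str, max_distance: int = 4) -> bool:
    # max_distance is a hypothetical threshold chosen for this example.
    return levenshtein(normalise_vendor_name(a), normalise_vendor_name(b)) <= max_distance


def in_amount_window(prev_amount: Decimal, prev_date: date,
                     new_amount: Decimal, new_date: date,
                     tolerance: Decimal = Decimal("0.01"),
                     window_days: int = 30) -> bool:
    """True when amounts differ by at most 1% and dates by at most 30 days."""
    close_amount = abs(new_amount - prev_amount) <= abs(prev_amount) * tolerance
    close_date = abs((new_date - prev_date).days) <= window_days
    return close_amount and close_date


def is_single_transposition(a: str, b: str) -> bool:
    """True when b equals a with exactly one adjacent pair of characters swapped."""
    if len(a) != len(b):
        return False
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
            and a[diffs[0]] == b[diffs[1]] and a[diffs[1]] == b[diffs[0]])


def semantic_fingerprint(vendor: str, invoice_date: date,
                         total: Decimal, invoice_number: str) -> str:
    """Hash of the key fields (SHA-256 is an assumption made for this sketch)."""
    parts = [normalise_vendor_name(vendor), invoice_date.isoformat(),
             f"{total:.2f}", invoice_number.strip()]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```

A fixed edit-distance threshold is the simplest option but is permissive on short names; a length-relative threshold would be a reasonable alternative.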

Layer 3 — Temporal pattern signals

  • Rapid resubmission

    Same vendor + similar amount within 72 hours is a common resubmission pattern. Flagged regardless of invoice number.

  • End-of-period clustering

    Unusually high submission volume from one vendor near period close is logged as a batch-duplicate risk signal.

  • Round-number anomaly

    Exact round amounts (₹10,000, ₹50,000) with no line-item breakdown are scored higher for manual review.
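
As a rough illustration of two of the temporal signals above, the sketch below checks for rapid resubmission and round-number anomalies. The names are hypothetical, "similar amount" is assumed to mean within 1% of the earlier invoice, and "round" is assumed to mean a positive multiple of 1,000; the source does not define either threshold. End-of-period clustering is omitted because it depends on a per-vendor volume baseline.

```python
from datetime import datetime, timedelta
from decimal import Decimal


def is_rapid_resubmission(prev_amount: Decimal, prev_time: datetime,
                          new_amount: Decimal, new_time: datetime,
                          window: timedelta = timedelta(hours=72),
                          tolerance: Decimal = Decimal("0.01")) -> bool:
    """Same-vendor invoices with similar amounts submitted within 72 hours.

    The caller is expected to have already filtered to a single vendor.
    The invoice number is deliberately not part of the check.
    """
    close_in_time = abs(new_time - prev_time) <= window
    similar_amount = abs(new_amount - prev_amount) <= abs(prev_amount) * tolerance
    return close_in_time and similar_amount


def is_round_number_anomaly(amount: Decimal, has_line_items: bool,
                            step: Decimal = Decimal("1000")) -> bool:
    """Round amount with no line-item breakdown (step is an assumed granularity)."""
    return amount > 0 and amount % step == 0 and not has_line_items
```

Both functions only return a flag; in line with the rest of this page, a positive result would raise the review score rather than reject the invoice.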

Duplicate candidate resolution

All duplicate candidates — regardless of signal layer — are routed to an exception queue. Operators see the matched invoice, the signal that triggered the flag, and the confidence score. Rejections and approvals are both logged as part of the AP audit trail.

Auto-rejection is not offered. Finance-grade systems require a human in the loop for any duplicate determination that could result in a vendor not getting paid.
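
The resolution flow described above can be pictured with a small sketch. The class and field names (`DuplicateCandidate`, `ExceptionQueue`, `operator`) are hypothetical and chosen for illustration; the point is only that every candidate enters the queue, a decision requires a named human operator, and both outcomes are appended to an audit log.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class DuplicateCandidate:
    invoice_id: str
    matched_invoice_id: str   # the earlier invoice the operator sees alongside
    signal: str               # which signal triggered the flag
    confidence: float         # score shown to the operator
    decision: Decision = Decision.PENDING


@dataclass
class ExceptionQueue:
    pending: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def route(self, candidate: DuplicateCandidate) -> None:
        """Every candidate is queued; nothing is decided automatically."""
        self.pending.append(candidate)

    def resolve(self, candidate: DuplicateCandidate,
                decision: Decision, operator: str) -> None:
        """Record a human decision and log it, whether approval or rejection."""
        if decision is Decision.PENDING:
            raise ValueError("a resolution must be APPROVED or REJECTED")
        if not operator:
            raise ValueError("a human operator is required")
        self.pending.remove(candidate)
        candidate.decision = decision
        self.audit_log.append({
            "invoice_id": candidate.invoice_id,
            "matched_invoice_id": candidate.matched_invoice_id,
            "signal": candidate.signal,
            "decision": decision.value,
            "operator": operator,
            "at": datetime.now(timezone.utc).isoformat(),
        })
```

A typical use would be `queue.route(candidate)` when any layer flags an invoice, followed later by `queue.resolve(candidate, Decision.REJECTED, operator="reviewer-id")` once a person has compared the two documents.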