Number7AI — Docs

Production benchmarks

Benchmarks from real AP deployments on Indian document sets — not controlled demo conditions. We report the metrics that matter for operations: straight-through processing, ERP-safe accuracy, exception volume, and cycle time.

Last updated: April 2026

TL;DR

  • Straight-through processing (STP) reached 90%+ in documented Indian AP deployments, against an industry baseline of ~60%.
  • Field accuracy reached 99.5% in production; 98.7% on mixed pilot datasets with diverse layouts.
  • Error volume dropped ~90% in a high-volume BPO AP workflow.
  • Median upload-to-export-ready time: ~4.2 minutes including validation.

How to read these numbers

Figures below come from production deployments and named pilots on real Indian AP workloads, not lab-only demos. Where a metric is tied to a single customer, we name that customer. Where we cite an industry range, it comes from third-party analyst research (e.g. AYR, Everest Group) and is shown for comparison only; we do not control those estimates.

Metric block | Primary basis
STP 90%+ vs ~60% baseline | Pransform Inc.: ~50k documents/month, BPO serving 300+ companies. The industry ~60% touchless figure comes from analyst-survey style baselines for mixed AP (see footnotes in STP & fair benchmarking).
Field accuracy 99.5% / 98.7% | Measured per field (vendor, invoice #, dates, each line, tax, total), not as document-level pass/fail. Pransform production (99.5%); JSK Automation pilot over 2,400 documents across WhatsApp, email, scanned PDF (98.7%).
Error volume ~2,500 → <250/month | Pransform, same throughput. An error here is a document that required correction after initial extraction (wrong field, missing line, failed validation), not a document that could not be processed at all.
Median 4.2 min to export-ready | End-to-end including QuickBooks master checks (vendor match, GL verification). Excludes customer approval SLA by design, because approval depth varies.
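To make the field-level versus document-level distinction concrete, here is a minimal sketch. The `DocResult` structure and the sample counts are illustrative assumptions, not Number7AI's scoring code or real deployment data.

```python
from dataclasses import dataclass


@dataclass
class DocResult:
    """Hypothetical per-document score against ground truth."""
    fields_total: int    # fields checked (vendor, invoice #, dates, lines, tax, total)
    fields_correct: int  # fields that matched ground truth exactly


def field_accuracy(docs):
    """Correct fields divided by all fields, pooled across documents."""
    total = sum(d.fields_total for d in docs)
    correct = sum(d.fields_correct for d in docs)
    return correct / total


def document_pass_rate(docs):
    """Share of documents where every field was correct."""
    passed = sum(1 for d in docs if d.fields_correct == d.fields_total)
    return passed / len(docs)


# Illustrative sample: four documents, 20 fields each, one wrong field overall.
docs = [DocResult(20, 20), DocResult(20, 19), DocResult(20, 20), DocResult(20, 20)]

print(f"field-level accuracy: {field_accuracy(docs):.4f}")      # 79/80
print(f"document pass rate:   {document_pass_rate(docs):.2f}")  # 3/4
```

The same extraction run scores very differently under the two definitions, so a reported accuracy figure is only comparable once you know which one was used.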

Why we measure this, not OCR accuracy

Headline OCR accuracy tells you how many characters were recognised correctly. It does not tell you how many invoices were posted correctly. A document can have 99.8% character accuracy and still produce a wrong ERP record if unit price and quantity are in the wrong columns.
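The column-swap case can be sketched in a few lines. The invoice values, the vendor name, and both scoring functions below are made up for illustration; `char_recall` is a deliberately position-blind stand-in for a character-level metric, not a standard OCR score.

```python
from collections import Counter

# Hypothetical ground truth and extraction: quantity and unit price are swapped.
truth = {"vendor": "Acme Traders", "quantity": "12",
         "unit_price": "250.00", "total": "3000.00"}
extracted = {"vendor": "Acme Traders", "quantity": "250.00",
             "unit_price": "12", "total": "3000.00"}


def char_recall(truth, extracted):
    """Share of ground-truth characters found anywhere in the extraction."""
    t = Counter("".join(truth.values()))
    e = Counter("".join(extracted.values()))
    matched = sum((t & e).values())  # multiset intersection of characters
    return matched / sum(t.values())


def field_accuracy(truth, extracted):
    """Share of fields whose extracted value equals the ground truth."""
    hits = sum(extracted.get(key) == value for key, value in truth.items())
    return hits / len(truth)


print(char_recall(truth, extracted))     # every character was recognised
print(field_accuracy(truth, extracted))  # but only half the fields are right
```

In this example the line total still equals quantity times unit price, so a simple arithmetic check would not catch the swap either.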

The metrics below reflect operational reality: how many documents flow end-to-end without human touch, how many errors reach the exception queue, and how fast a document moves from upload to export-ready state.

  • 90%+ straight-through processing (vs ~60% baseline)
  • 99.5% field accuracy in production (on Indian AP documents)
  • ~90% error volume reduction (in BPO AP workflow)
  • 4.2 min median cycle time (upload → export-ready)

Core processing metrics

The STP gap is an operational volume number, not a slogan. At Pransform scale (~50k docs/month), moving from ~60% to 90%+ STP frees roughly 15,000 documents per month from routine human touch (30 percentage points × 50k ≈ 15k). These are the same documents that previously sat in review or rework queues.
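The arithmetic behind that figure, as a small sketch you can rerun with your own volume. The function name is an illustrative choice; the inputs are the rounded figures quoted above.

```python
def touched_docs(monthly_docs: int, stp_percent: int) -> int:
    """Documents per month that still need a human, given an STP rate in percent."""
    return monthly_docs * (100 - stp_percent) // 100


monthly_docs = 50_000                      # ~50k docs/month (Pransform scale)
before = touched_docs(monthly_docs, 60)    # ~60% industry baseline
after = touched_docs(monthly_docs, 90)     # 90%+ observed

print(before, after, before - after)       # 20000 5000 15000
```

Integer percentages are used here only to keep the arithmetic exact; with a fractional STP rate, round the result instead.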

Metric | Industry baseline | AIdaptIQ (observed)
Straight-through processing rate | ~60% (analyst-style mixed AP) | 90%+ (Pransform)
AP field accuracy (production) | Up to ~99.9% ceiling on highly structured formats (analyst research) | 99.5% (Pransform)
Accuracy on mixed pilot dataset | Manual baseline | 98.7% (JSK pilot)
Error volume at scale (BPO) | ~2,500/month corrections | <250/month (Pransform)

Speed and throughput

Stage | Observed time | Notes
Single invoice extraction | <30 seconds | Including validation pass
Bulk PDF (10–40 invoices) | 2–5 minutes | Includes boundary detection
Upload → export-ready state | Median ~4.2 min | Extraction + validation; excludes approval SLA
Pilot turnaround (first doc → full report) | <72 hours | JSK Automation pilot
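If you want to reproduce the upload-to-export-ready median on your own logs, a minimal sketch follows. The field names (`uploaded_at`, `export_ready_at`, `approved_at`) and the timestamps are hypothetical; the point is only that approval time is deliberately left out of the measurement.

```python
from datetime import datetime
from statistics import median

# Hypothetical event log: one record per document, ISO 8601 timestamps.
records = [
    {"uploaded_at": "2026-04-01T10:00:00", "export_ready_at": "2026-04-01T10:03:00",
     "approved_at": "2026-04-01T15:00:00"},
    {"uploaded_at": "2026-04-01T10:05:00", "export_ready_at": "2026-04-01T10:09:30",
     "approved_at": "2026-04-02T09:00:00"},
    {"uploaded_at": "2026-04-01T10:10:00", "export_ready_at": "2026-04-01T10:22:00",
     "approved_at": "2026-04-01T10:40:00"},
]


def minutes_to_export_ready(record):
    """Upload to export-ready in minutes; approved_at is intentionally ignored."""
    start = datetime.fromisoformat(record["uploaded_at"])
    ready = datetime.fromisoformat(record["export_ready_at"])
    return (ready - start).total_seconds() / 60


cycle_times = [minutes_to_export_ready(r) for r in records]
print(f"median upload-to-export-ready: {median(cycle_times):.1f} min")
```

A median is used rather than a mean so that a few slow documents, such as the 12-minute one in this sample, do not dominate the figure.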

Matching and reconciliation (observed)

Cycle-time wins compound when matching moves from batch manual work to always-on validation.

Process | Before | After
LR-to-invoice matching (Fairlorry) | 2–3 days manual | Real-time automated
Duplicate detection trigger | Manual catch | Instant at upload
Debit/credit note generation | Hours of drafting | System-generated
PO intake (WhatsApp / email → structured record) | Hours per batch | Minutes

Labor, leakage, and payback (documented deployments)

Pransform (BPO)

~50k docs/month: team from 12 AP clerks to 3 exception handlers (~75% reduction), same volume.

Payback: under 1 month on eliminated labor vs. platform cost (internal customer economics).

JSK Automation

Shifted from dedicated data-entry headcount that scaled with order volume to software-elastic intake (zero dedicated data-entry headcount for that path).

Fairlorry

Reconciliation team moved from 2–3 day batch cycles to exceptions-only on flagged mismatches.

Billing leakage recovered: ~3–5% of freight revenue from previously late or missed reconciliation (weights, damage claims, rate errors).

Analyst IDP research (AYR) often cites 30–200% year-one ROI ranges driven primarily by labor. Pransform's sub-month payback sits at the aggressive end because savings were immediate and did not require a long retraining cycle, which is consistent with a zero-template extraction architecture.

Accuracy by document type and complexity

Use this table to set expectations for pilots: document complexity, not only OCR quality, shifts first-pass accuracy and the exception mix.

Document type | Observed accuracy | Notes
Standard vendor invoice (single page) | 99.5%+ | Core production AP
Multi-page invoice (line continuation) | 99%+ | Page-continuity detection
Mixed-language (English + Hindi/Gujarati) | 97%+ | Degrades at script boundaries
Multi-currency + exchange rate | 97%+ | Currency column semantics
Bulk PDF (mixed invoices, random order) | 94.7% | Boundary accuracy, first pass
Handwritten / partial handwriting | 95%+ | Handwriting module path
Photo capture (mobile) | 96%+ | JSK WhatsApp channel

Deployment and retraining economics

Metric | Value | Basis
Standard deployment time | <2 weeks | Fairlorry (4-module custom build)
Typical IDP deployment (industry) | 4–8 weeks | Includes template/retrain cycles (analyst / market baseline)
Retraining for new BPO client | None required | Zero-template architecture
Retraining for new vendor layout | None required | Layout-agnostic extraction path

Published competitor benchmarks (read the footnote)

Competitor figures below are from public vendor case studies. We cite them to show the headline numbers in the market, not to claim that they were measured under the same conditions as our AP deployments.

Metric | AIdaptIQ | Nanonets (published) | Docsumo (published) | ABBYY (published)
STP / touchless | 90%+ (mixed AP) | 99% (medical forms, PayGround) | 99% (invoices, Valtatech) | 90% (inbound docs, Paragon)
Time savings (case study) | 80% matching time (Fairlorry) | 90% PO time (Suzano) | 86% dispatch (NS Trucking) | 75% supplier PO (GEMLUX)
Staff / cost reduction | 75% AP headcount (Pransform) | 52% cost reduction (SafeRide) | N/A in cited story | N/A in cited story
Deployment time | <2 weeks (Fairlorry) | Not published in table source | 4–8 weeks (vendor-stated) | Weeks–months (partner-led)
Retrain per new vendor | None (layout-agnostic) | Required | Required | Required
India-first / Tally / multi-client BPO | Native / bi-directional / native | Not documented equivalently | Not documented equivalently | Not documented equivalently

How to read the STP row fairly: Nanonets' 99% STP on medical forms (PayGround) is not the same population as 90%+ STP on mixed-format Indian AP invoices (Pransform). Compare across vendors only after you align document mix, STP definition, and denominator. For document-layer stress tests on real invoices, see Competitor analysis.
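A short sketch of why the denominator matters. The counts are invented for illustration and describe no vendor: the same touchless count yields two different STP rates depending on whether documents rejected at intake are included.

```python
# Hypothetical monthly counts for one workflow.
received = 1_000            # everything that arrived
rejected_at_intake = 100    # unreadable or out-of-scope, never entered extraction
touchless = 810             # exported with no human touch


def stp_rate(touchless_docs: int, denominator: int) -> float:
    """STP as touchless documents over a chosen denominator."""
    return touchless_docs / denominator


stp_over_received = stp_rate(touchless, received)
stp_over_processable = stp_rate(touchless, received - rejected_at_intake)

print(f"STP over all received docs: {stp_over_received:.0%}")     # 81%
print(f"STP over processable docs:  {stp_over_processable:.0%}")  # 90%
```

Neither figure is wrong; they answer different questions. When comparing published numbers, ask which denominator each one uses.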

Documented residual failure modes (honest ceiling)

We do not claim 100% accuracy across the full diversity of production documents. Representative classes that we flag rather than silently post:

  1. Vendor name mismatch (header trade name vs remittance legal entity): ≈8% on utility-style invoices; flagged for review.
  2. Tax as rate only (e.g. "18% GST" without rupee amount): calculation depends on discount base; ≈8% residual when structure is ambiguous.
  3. Line items rendered as images in PDFs exported from legacy ERP: handled by the image extraction path, which is slower and slightly less accurate than native text.

Full failure-mode taxonomy →

Residual failure rates by class

Rates after AIdaptIQ processing on Indian production documents. Residual failures are routed to exception queues — they do not silently post.

Failure class | Residual range | Disposition
Multi-row table continuity | ~1.5–2% | Flagged for review
Locale/format numeric | <0.5% → ~5% | Auto-corrected (known) or flagged
Tax amount inference | ~8% | Flagged with context
Vendor identity mismatch | ~2–5% | Flagged for master match

Methodology and caveats

  • What is included

    Metrics combine production deployments and pilot datasets across Indian AP workflows. Accuracy is field-level where possible, not just document-level pass/fail.

  • Cycle time scope

    Cycle-time values include extraction and validation. Approval timing varies by customer policy and is not included in the median.

  • When numbers will be lower

    Mixed-language documents, multi-currency documents, and low-quality scans can lower first-pass confidence, which raises exception queue volume.

  • How to compare fairly

    Use your own production documents, not demo samples. Measure STP, silent error rate, and exception volume — not just headline accuracy.
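One way to score a pilot on those three measures is sketched below. The `PilotDoc` fields and the definitions chosen here (STP counted as "not flagged", silent errors counted as "not flagged but wrong") are assumptions for illustration; align them with your own definitions before comparing vendors.

```python
from dataclasses import dataclass


@dataclass
class PilotDoc:
    """Hypothetical per-document pilot outcome, checked against ground truth."""
    flagged: bool  # routed to the exception queue for human review
    correct: bool  # the system's output matched ground truth before any human fix


def scorecard(docs):
    """Compute STP rate, exception volume, and silent error rate for a pilot set."""
    n = len(docs)
    touchless = [d for d in docs if not d.flagged]
    silent_errors = [d for d in touchless if not d.correct]
    return {
        "stp_rate": len(touchless) / n,
        "exception_volume": n - len(touchless),
        "silent_error_rate": len(silent_errors) / n,
    }


# Illustrative pilot: 8 clean touchless docs, 1 silent error, 1 flagged exception.
pilot = ([PilotDoc(flagged=False, correct=True)] * 8
         + [PilotDoc(flagged=False, correct=False)]
         + [PilotDoc(flagged=True, correct=False)])

result = scorecard(pilot)
print(result)
```

Silent errors are reported separately from STP because a document that posts untouched but wrong is the costliest outcome, and a headline STP figure alone hides it.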

FAQ

Do these numbers include validation or only extraction?
They include validation. The figures reflect workflow performance across extraction, validation, and exception handling; they are not raw OCR numbers.
Is 99.5% guaranteed for every document set?
No. It is observed in specific AP conditions on Indian documents. Complexity and scan quality affect outcomes. Run your own pilot to get numbers for your corpus.
What should I compare first?
Compare STP, residual error volume, and time-to-export-ready on your real documents — not sanitised demo samples.