Number7AI — Docs
Production benchmarks
Benchmarks from real AP deployments on Indian document sets — not controlled demo conditions. We report the metrics that matter for operations: straight-through processing, ERP-safe accuracy, exception volume, and cycle time.
Last updated: April 2026
TL;DR
- STP moved from industry baseline ~60% to 90%+ in documented deployment conditions on Indian AP.
- Field accuracy reached 99.5% in production; 98.7% on mixed pilot datasets with diverse layouts.
- Error volume dropped ~90% in a high-volume BPO AP workflow.
- Median upload-to-export-ready time: ~4.2 minutes including validation.
How to read these numbers
Figures below come from production deployments and named pilots on real Indian AP workloads, not lab-only demos. Where a metric is tied to a single customer, we name that customer. Where we cite an industry range, it comes from third-party analyst research (e.g. AYR, Everest Group) and is included for comparison only; we do not control those estimates.
| Metric block | Primary basis |
|---|---|
| STP 90%+ vs ~60% baseline | Pransform Inc.: ~50k documents/month, BPO serving 300+ companies. Industry ~60% touchless figure: analyst-survey style baselines for mixed AP (see footnotes in STP & fair benchmarking). |
| Field accuracy 99.5% / 98.7% | Field-level measurement (vendor, invoice #, dates, each line, tax, total)—not document-level pass/fail. Pransform production (99.5%); JSK Automation pilot over 2,400 documents across WhatsApp, email, scanned PDF (98.7%). |
| Error volume ~2,500 → <250/month | Pransform, same throughput. An error is counted as a document that required correction after initial extraction (wrong field, missing line, failed validation), not one that “could not process at all.” |
| Median 4.2 min to export-ready | End-to-end including QuickBooks master checks (vendor match, GL verification). Excludes customer approval SLA—by design, because approval depth varies. |
Why we measure this, not OCR accuracy
Headline OCR accuracy tells you how many characters were recognised correctly. It does not tell you how many invoices were posted correctly. A document can have 99.8% character accuracy and still produce a wrong ERP record if unit price and quantity are in the wrong columns.
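The gap between character accuracy and posting accuracy can be illustrated with a minimal sketch. The invoice values, field names, and both scoring functions below are hypothetical, and the character score is a simplified proxy (it only checks that the right characters were recognised, regardless of field); it is not how any particular OCR engine reports accuracy.

```python
from collections import Counter

# Hypothetical ground truth for one invoice line.
truth = {"quantity": "12", "unit_price": "450.00", "line_total": "5400.00"}

# Every character was recognised, but quantity and unit price
# landed in each other's columns.
extracted = {"quantity": "450.00", "unit_price": "12", "line_total": "5400.00"}


def char_accuracy(truth: dict, extracted: dict) -> float:
    """Share of ground-truth characters that also appear in the output,
    ignoring which field they were assigned to (a simplified proxy)."""
    want = Counter("".join(truth.values()))
    got = Counter("".join(extracted.values()))
    matched = sum((want & got).values())
    return matched / sum(want.values())


def field_accuracy(truth: dict, extracted: dict) -> float:
    """Share of fields whose extracted value equals the ground truth."""
    correct = sum(extracted.get(name) == value for name, value in truth.items())
    return correct / len(truth)


print(f"character accuracy: {char_accuracy(truth, extracted):.1%}")
print(f"field accuracy:     {field_accuracy(truth, extracted):.1%}")
```

In this sketch the character score is 100% while only one field in three is correct. Because quantity × unit price still equals the line total after the swap, a purely arithmetic check would not catch it either.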
The metrics below reflect operational reality: how many documents flow end-to-end without human touch, how many errors reach the exception queue, and how fast a document moves from upload to export-ready state.
- 90%+ straight-through processing (vs ~60% baseline)
- 99.5% field accuracy in production (on Indian AP documents)
- ~90% error volume reduction (in BPO AP workflow)
- 4.2 min median cycle time (upload → export-ready)
Core processing metrics
The STP gap is an operational volume number, not a slogan. At Pransform scale (~50k docs/month), moving from ~60% to 90%+ STP frees roughly 15,000 documents per month from routine human touch (30 points × 50k ≈ 15k). These are the same documents that previously sat in review or rework queues.
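The arithmetic above can be reproduced with a small helper, which is also handy for plugging in your own volume and baseline. The function name is illustrative; integer percentage points are used to avoid floating-point rounding. Because the observed figure is "90%+", the result is best read as a lower bound.

```python
def docs_freed_per_month(monthly_volume: int,
                         baseline_stp_pct: int,
                         observed_stp_pct: int) -> int:
    """Documents per month that no longer need routine human touch."""
    gap_points = observed_stp_pct - baseline_stp_pct
    return monthly_volume * gap_points // 100


# Figures from the text: ~50k docs/month, ~60% baseline, 90%+ observed.
freed = docs_freed_per_month(50_000, 60, 90)
print(freed)  # 15000
```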
| Metric | Industry baseline | AIdaptIQ (observed) |
|---|---|---|
| Straight-through processing rate | ~60% (analyst-style mixed AP) | 90%+ (Pransform) |
| AP field accuracy (production) | Up to ~99.9% ceiling on highly structured formats (analyst research) | 99.5% (Pransform) |
| Accuracy on mixed pilot dataset | Manual baseline | 98.7% (JSK pilot) |
| Error volume at scale (BPO) | ~2,500/month corrections | <250/month (Pransform) |
Speed and throughput
| Stage | Observed time | Notes |
|---|---|---|
| Single invoice extraction | <30 seconds | Including validation pass |
| Bulk PDF (10–40 invoices) | 2–5 minutes | Includes boundary detection |
| Upload → export-ready state | Median ~4.2 min | Extraction + validation; excludes approval SLA |
| Pilot turnaround (first doc → full report) | <72 hours | JSK Automation pilot |
Matching and reconciliation (observed)
Cycle-time wins compound when matching moves from batch manual work to always-on validation.
| Process | Before | After |
|---|---|---|
| LR-to-invoice matching (Fairlorry) | 2–3 days manual | Real-time automated |
| Duplicate detection trigger | Manual catch | Instant at upload |
| Debit/credit note generation | Hours of drafting | System-generated |
| PO intake (WhatsApp / email → structured record) | Hours per batch | Minutes |
Labor, leakage, and payback (documented deployments)
Pransform (BPO)
~50k docs/month: team reduced from 12 AP clerks to 3 exception handlers (~75% reduction) at the same volume.
Payback: under 1 month on eliminated labor vs. platform cost (internal customer economics).
JSK Automation
Shifted from dedicated data-entry staffing that scaled with order volume to software-elastic intake (zero dedicated data-entry headcount for that path).
Fairlorry
Reconciliation team moved from 2–3 day batch cycles to exceptions-only on flagged mismatches.
Billing leakage recovered: ~3–5% of freight revenue from previously late or missed reconciliation (weights, damage claims, rate errors).
Analyst IDP research (AYR) often cites 30–200% year-one ROI ranges driven primarily by labor. Pransform's sub-month payback sits at the aggressive end because savings were immediate and did not require a long retraining cycle, which is consistent with a zero-template extraction architecture.
Accuracy by document type and complexity
Use this table to set expectations for pilots: complexity shifts first-pass accuracy and exception mix—not only OCR quality.
| Document type | Observed accuracy | Notes |
|---|---|---|
| Standard vendor invoice (single page) | 99.5%+ | Core production AP |
| Multi-page invoice (line continuation) | 99%+ | Page-continuity detection |
| Mixed-language (English + Hindi/Gujarati) | 97%+ | Degrades at script boundaries |
| Multi-currency + exchange rate | 97%+ | Currency column semantics |
| Bulk PDF (mixed invoices, random order) | 94.7% | Boundary accuracy, first pass |
| Handwritten / partial handwriting | 95%+ | Handwriting module path |
| Photo capture (mobile) | 96%+ | JSK WhatsApp channel |
Deployment and retraining economics
| Metric | Value | Basis |
|---|---|---|
| Standard deployment time | <2 weeks | Fairlorry (4-module custom build) |
| Typical IDP deployment (industry) | 4–8 weeks | Includes template/retrain cycles (analyst / market baseline) |
| Retraining for new BPO client | None required | Zero-template architecture |
| Retraining for new vendor layout | None required | Layout-agnostic extraction path |
Published competitor benchmarks (read the footnote)
Competitor figures below are from public vendor case studies—we cite them to show headline numbers in market, not to claim identical measurement conditions to our AP deployments.
| Metric | AIdaptIQ | Nanonets (published) | Docsumo (published) | ABBYY (published) |
|---|---|---|---|---|
| STP / touchless | 90%+ (mixed AP) | 99% (medical forms, PayGround) | 99% (invoices, Valtatech) | 90% (inbound docs, Paragon) |
| Time savings (case study) | 80% matching time (Fairlorry) | 90% PO time (Suzano) | 86% dispatch (NS Trucking) | 75% supplier PO (GEMLUX) |
| Staff / cost reduction | 75% AP headcount (Pransform) | 52% cost reduction (SafeRide) | N/A in cited story | N/A in cited story |
| Deployment time | <2 weeks (Fairlorry) | Not published in table source | 4–8 weeks (vendor-stated) | Weeks–months (partner-led) |
| Retrain per new vendor | None (layout-agnostic) | Required | Required | Required |
| India-first / Tally / multi-client BPO | Native / bi-directional / native | Not documented equivalently | Not documented equivalently | Not documented equivalently |
How to read this row fairly: Nanonets' 99% STP on medical forms (PayGround) is not the same population as 90%+ STP on mixed-format Indian AP invoices (Pransform). Compare across vendors only after you align document mix, STP definition, and denominator. For document-layer stress tests on real invoices, see Competitor analysis.
Documented residual failure modes (honest ceiling)
We do not claim 100% on production diversity. Representative classes we flag rather than silently post:
- Vendor name mismatch (header trade name vs remittance legal entity): ≈8% on utility-style invoices; flagged for review.
- Tax as rate only (e.g. "18% GST" without rupee amount): calculation depends on discount base; ≈8% residual when structure is ambiguous.
- Line items rendered as images in PDFs exported from legacy ERP: handled by the image extraction path, which is slower and slightly less accurate than native text.
Residual failure rates by class
Rates after AIdaptIQ processing on Indian production documents. Residual failures are routed to exception queues — they do not silently post.
| Failure class | Residual range | Disposition |
|---|---|---|
| Multi-row table continuity | ~1.5–2% | Flagged for review |
| Locale/format numeric | <0.5% → ~5% | Auto-corrected (known) or flagged |
| Tax amount inference | ~8% | Flagged with context |
| Vendor identity mismatch | ~2–5% | Flagged for master match |
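As an illustration of "flag rather than silently post", here is a minimal sketch for the tax amount inference class. It is a hypothetical rule written for this page, not AIdaptIQ's actual logic: the function name, parameters, and the choice to treat any non-zero discount as an ambiguous tax base are all assumptions.

```python
from decimal import Decimal
from typing import Optional, Tuple, Union


def resolve_tax(subtotal: Decimal,
                rate_pct: Optional[Decimal],
                stated_amount: Optional[Decimal],
                discount: Decimal) -> Tuple[str, Union[Decimal, str]]:
    """Return ("post", amount) or ("flag", reason) for one invoice."""
    if stated_amount is not None:
        # The document states the tax amount; nothing to infer.
        return ("post", stated_amount)
    if rate_pct is None:
        return ("flag", "no tax rate or amount found")
    if discount != 0:
        # Assumption: with a discount present, the taxable base is ambiguous.
        return ("flag", f"{rate_pct}% rate only; tax base unclear with discount {discount}")
    # Rate only, no discount: the subtotal is the only candidate base.
    amount = (subtotal * rate_pct / Decimal(100)).quantize(Decimal("0.01"))
    return ("post", amount)


print(resolve_tax(Decimal("1000.00"), Decimal("18"), None, Decimal("0")))
print(resolve_tax(Decimal("1000.00"), Decimal("18"), None, Decimal("50.00")))
```

The first call infers the amount because the base is unambiguous; the second returns a flag with context for the reviewer instead of guessing.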
Methodology and caveats
What is included
Metrics combine production deployments and pilot datasets across Indian AP workflows. Accuracy is field-level where possible, not just document-level pass/fail.
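The difference between field-level accuracy and document-level pass/fail can be sketched as follows. The field names and the two sample documents are hypothetical; each document is a pair of (ground truth, extracted) dictionaries.

```python
def field_level_accuracy(docs) -> float:
    """Correct fields divided by all fields, across every document."""
    correct = 0
    total = 0
    for truth, extracted in docs:
        for name, value in truth.items():
            total += 1
            correct += extracted.get(name) == value
    return correct / total


def document_level_pass_rate(docs) -> float:
    """Share of documents in which every field is correct."""
    passed = sum(
        all(extracted.get(name) == value for name, value in truth.items())
        for truth, extracted in docs
    )
    return passed / len(docs)


# Hypothetical sample: the second document has one wrong field (tax).
doc_a = {"vendor": "Acme", "invoice_number": "A-101", "date": "2026-04-01",
         "tax": "180.00", "total": "1180.00"}
doc_b = {"vendor": "Birla", "invoice_number": "B-202", "date": "2026-04-02",
         "tax": "90.00", "total": "590.00"}
docs = [(doc_a, dict(doc_a)), (doc_b, {**doc_b, "tax": "9.00"})]

print(field_level_accuracy(docs))      # 0.9
print(document_level_pass_rate(docs))  # 0.5
```

The same extraction output scores 90% at field level and 50% at document level, which is why the two figures should never be compared directly.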
Cycle time scope
Cycle-time values include extraction and validation. Approval timing varies by customer policy and is not included in the median.
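A minimal sketch of that scope, assuming each document record carries `uploaded`, `export_ready`, and `approved` timestamps (hypothetical field names and sample data). The approval timestamp is deliberately ignored.

```python
from datetime import datetime
from statistics import median


def median_minutes_to_export_ready(records) -> float:
    """Median upload-to-export-ready time in minutes; approval is excluded."""
    durations = [
        (r["export_ready"] - r["uploaded"]).total_seconds() / 60
        for r in records
    ]
    return median(durations)


# Hypothetical records; "approved" is present but not used in the metric.
records = [
    {"uploaded": datetime(2026, 4, 1, 10, 0),
     "export_ready": datetime(2026, 4, 1, 10, 3),
     "approved": datetime(2026, 4, 2, 9, 0)},
    {"uploaded": datetime(2026, 4, 1, 11, 0),
     "export_ready": datetime(2026, 4, 1, 11, 4),
     "approved": datetime(2026, 4, 1, 15, 30)},
    {"uploaded": datetime(2026, 4, 1, 12, 0),
     "export_ready": datetime(2026, 4, 1, 12, 10),
     "approved": datetime(2026, 4, 3, 8, 0)},
]

print(median_minutes_to_export_ready(records))  # 4.0
```

Using the median rather than the mean keeps a single slow document (10 minutes here) from dominating the reported figure.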
When numbers will be lower
Mixed-language or multi-currency documents and low-quality scans can lower first-pass confidence, which increases exception queue volume.
How to compare fairly
Use your own production documents, not demo samples. Measure STP, silent error rate, and exception volume — not just headline accuracy.
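One way to compute these three measures on a pilot corpus is sketched below. The record shape is hypothetical, and two definitions are assumptions made for the example: a document counts as straight-through when no human touched it, and the silent error rate is taken over auto-posted documents only (errors found by a later audit). Align these definitions with each vendor before comparing.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class DocOutcome:
    touched_by_human: bool   # routed to the exception queue
    posted_with_error: bool  # error found by audit after posting


def pilot_metrics(outcomes: List[DocOutcome]) -> Dict[str, Union[int, float]]:
    """STP rate, exception volume, and silent error rate for a pilot set."""
    total = len(outcomes)
    auto_posted = [o for o in outcomes if not o.touched_by_human]
    silent_errors = sum(o.posted_with_error for o in auto_posted)
    return {
        "stp_rate": len(auto_posted) / total,
        "exception_volume": total - len(auto_posted),
        "silent_error_rate": silent_errors / len(auto_posted) if auto_posted else 0.0,
    }


# Hypothetical pilot: 10 documents, 2 sent to review, 1 silent error.
outcomes = (
    [DocOutcome(False, False)] * 7
    + [DocOutcome(False, True)]
    + [DocOutcome(True, False)] * 2
)
print(pilot_metrics(outcomes))
```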
FAQ
- Do these numbers include validation or only extraction?
  They reflect workflow performance combining extraction, validation, and exception handling, not raw OCR numbers.
- Is 99.5% guaranteed for every document set?
  No. It is observed in specific AP conditions on Indian documents. Complexity and scan quality affect outcomes. Run your own pilot to get numbers for your corpus.
- What should I compare first?
  Compare STP, residual error volume, and time-to-export-ready on your real documents, not sanitised demo samples.
Related reading
- STP & self-healing: Industry baselines, the self-healing loop, and how to compare vendors fairly.
- Extraction failure modes: What classes of failure drive the exception queue.
- OCR vs. IDP: Why STP and ERP-safe rate matter more than OCR accuracy.
- Competitor analysis: How competing IDP products performed on the same test set.
- IDP market landscape: How to evaluate vendors for your deployment context.