Production benchmarks
Benchmarks from real AP deployments on Indian document sets — not controlled demo conditions. We report the metrics that matter for operations: straight-through processing, ERP-safe accuracy, exception volume, and cycle time.
Last updated: April 2026
TL;DR
- Straight-through processing (STP) moved from an industry baseline of ~60% to 90%+ under documented deployment conditions on Indian AP documents.
- Field accuracy reached 99.5% in production; 98.7% on mixed pilot datasets with diverse layouts.
- Error volume dropped ~90% in a high-volume BPO AP workflow.
- Median upload-to-export-ready time: ~4.2 minutes including validation.
Why we measure this, not OCR accuracy
Headline OCR accuracy tells you how many characters were recognised correctly. It does not tell you how many invoices were posted correctly. A document can have 99.8% character accuracy and still produce a wrong ERP record if unit price and quantity are in the wrong columns.
The metrics below reflect operational reality: how many documents flow end-to-end without human touch, how many errors reach the exception queue, and how fast a document moves from upload to export-ready state.
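To make the distinction concrete, here is a minimal sketch (hypothetical field values; Python's difflib stands in for a rough character-similarity score) of an extraction where unit price and quantity land in each other's columns. Character accuracy looks strong; the record would still post incorrectly.

```python
from difflib import SequenceMatcher

# Illustrative only: hypothetical field values where unit price and
# quantity swapped columns during extraction.
truth     = {"vendor": "Acme Pvt Ltd", "qty": "12", "unit_price": "450.00"}
extracted = {"vendor": "Acme Pvt Ltd", "qty": "450.00", "unit_price": "12"}

# Character-level accuracy: similarity of the raw recognised text.
char_accuracy = SequenceMatcher(
    None, "".join(truth.values()), "".join(extracted.values())
).ratio()

# Record-level correctness: every value must land in the right field.
record_correct = all(extracted[k] == truth[k] for k in truth)

print(f"character accuracy ~ {char_accuracy:.0%}")  # ~90%: looks fine
print(f"ERP-safe record?     {record_correct}")     # False: would post wrong
```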
Core processing metrics
| Metric | Industry baseline | AIdaptIQ (observed) |
|---|---|---|
| Straight-through processing rate | ~60% | 90%+ |
| AP field accuracy (production) | Varies by template and quality | 99.5% |
| Accuracy on mixed pilot dataset | Manual baseline | 98.7% |
| Error volume at scale (BPO) | ~2,500/month | <250/month |
Speed and throughput
| Stage | Observed time | Notes |
|---|---|---|
| Single invoice extraction | <30 seconds | Including validation pass |
| Bulk PDF (10–40 invoices) | 2–5 minutes | Includes boundary detection |
| Upload → export-ready state | Median ~4.2 min | Extraction + validation; excludes approval SLA |
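To reproduce the cycle-time median from your own logs, the calculation is simple. A minimal sketch, assuming a hypothetical event log of upload and export-ready timestamps; approval events are deliberately absent, matching the scope note in the methodology section below.

```python
from datetime import datetime
from statistics import median

# Hypothetical event log: (upload, export-ready) timestamps per document.
# The clock stops at export-ready, so approval time never enters the median.
events = [
    ("2026-04-01T09:00:00", "2026-04-01T09:03:40"),
    ("2026-04-01T09:01:10", "2026-04-01T09:05:30"),
    ("2026-04-01T09:02:05", "2026-04-01T09:06:50"),
]

FMT = "%Y-%m-%dT%H:%M:%S"

def minutes(start: str, end: str) -> float:
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60

print(f"median cycle time: {median(minutes(s, e) for s, e in events):.1f} min")
```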
Residual failure rates by class
Rates after AIdaptIQ processing on Indian production documents. Residual failures are routed to exception queues — they do not silently post.
| Failure class | Residual range | Disposition |
|---|---|---|
| Multi-row table continuity | ~1.5–2% | Flagged for review |
| Locale/format numeric | <0.5% to ~5% | Auto-corrected (known formats) or flagged |
| Tax amount inference | ~8% | Flagged with context |
| Vendor identity mismatch | ~2–5% | Flagged for master match |
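The disposition column is the important property: residual failures surface for review instead of posting. A minimal sketch of that routing rule, with illustrative failure-class names and an assumed auto-correctable set (not the product's internal labels):

```python
# Failure classes that can be corrected deterministically without review.
AUTO_CORRECTABLE = {"locale_numeric_known"}  # e.g. "1.234,56" -> "1234.56"

def route(failures: list[str]) -> str:
    if not failures:
        return "auto_post"                   # straight-through
    if set(failures) <= AUTO_CORRECTABLE:
        return "auto_correct_then_post"      # known locale/format issues
    return "exception_queue"                 # everything else never posts silently

print(route([]))                             # auto_post
print(route(["locale_numeric_known"]))       # auto_correct_then_post
print(route(["tax_amount_inference"]))       # exception_queue
```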
Methodology and caveats
What is included
Metrics combine production deployments and pilot datasets across Indian AP workflows. Accuracy is field-level where possible, not just document-level pass/fail.
Cycle time scope
Cycle-time values include extraction and validation. Approval timing varies by customer policy and is not included in the median.
When numbers will be lower
Mixed-language, multi-currency documents and low-quality scans can lower first-pass confidence, which pushes more documents into the exception queue.
How to compare fairly
Use your own production documents, not demo samples. Measure STP, silent error rate, and exception volume — not just headline accuracy.
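That comparison needs only a small scorecard. A sketch, assuming hypothetical per-document result rows from your pilot run:

```python
# Minimal pilot scorecard. Each row records whether the document
# auto-posted, was flagged, and whether a posted record was correct.
results = [
    {"auto_posted": True,  "flagged": False, "posted_correct": True},
    {"auto_posted": True,  "flagged": False, "posted_correct": False},  # silent error
    {"auto_posted": False, "flagged": True,  "posted_correct": None},   # exception
]

n = len(results)
stp = sum(r["auto_posted"] for r in results) / n
silent = sum(r["auto_posted"] and r["posted_correct"] is False for r in results) / n
exceptions = sum(r["flagged"] for r in results)

print(f"STP {stp:.0%} | silent error rate {silent:.0%} | exceptions {exceptions}")
```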
FAQ
- Do these numbers include validation or only extraction?
- They reflect workflow performance combining extraction, validation, and exception handling, not raw OCR numbers.
- Is 99.5% guaranteed for every document set?
- No. It is observed in specific AP conditions on Indian documents. Complexity and scan quality affect outcomes. Run your own pilot to get numbers for your corpus.
- What should I compare first?
- Compare STP, residual error volume, and time-to-export-ready on your real documents — not sanitised demo samples.
Related reading
- Extraction failure modes: what classes of failure drive the exception queue.
- OCR vs. IDP: why STP and ERP-safe rate matter more than OCR accuracy.
- Competitor analysis: how competing IDP products performed on the same test set.
- IDP market landscape: how to evaluate vendors for your deployment context.