Number7AI — Docs
Why every major IDP platform failed on real invoices
Direct testing on production-representative messy invoices during AIdaptIQ’s founding R&D—not a vendor scorecard. We lead with document and line-item truth because everything else in the AP cycle—routing, approvals, audit, vendor analytics—depends on data you can post. This page ties those lab findings to how the broader IDP market actually competes.
Last updated: April 2026
TL;DR
- Most IDP products look strong on curated demos; production diversity (language, table shape, bulk PDFs) is where results diverge.
- We observed structural misparses, not only low confidence: wrong columns, wrong lines, and silent ERP-risk outputs.
- “Accuracy” without semantic table understanding still corrupts posting when numbers sit in the wrong fields.
- Choose tools for your real document mix, deployment economics, and how much of the AP cycle you need beyond extraction.
R&D basis
How this document was written
Before building AIdaptIQ, our engineers spent months testing every IDP platform they could access on real invoices—not curated demo documents or clean PDFs from modern accounting software, but the actual messy documents that show up in production: scanned paper bills, mixed-language invoices, multi-currency PDFs, handwritten notes, and bulk files with multiple invoices in arbitrary page order.
The tests were not adversarial. We were looking for a tool that worked. Each platform was run on identical documents; we logged specific failure modes—wrong fields, missed line items, structural misparses—so we could understand why they failed, not only that they failed. That record became the foundation for AIdaptIQ.
What we actually ran
The test methodology
Documents used included:
- Indian supplier invoices with GST breakdowns
- Multi-page invoices where line items span pages
- Mixed English/Hindi and English/Gujarati content
- Non-standard number formats (Indian lakh notation, European comma–decimal)
- Bulk PDFs with several invoices in random page order
- Photocopied scans with moderate degradation
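To make the number-format point concrete, here is a minimal normalizer sketch for the formats above. The heuristics and the `normalize_amount` name are our illustration for this page, not any vendor's code, and real systems need more context than string rules:

```python
def normalize_amount(raw: str) -> float:
    """Normalize Indian (1,23,456.78), European (1.234,56), and
    Western (1,234.56) amount strings to a float.
    Heuristics are illustrative, not production-grade."""
    s = raw.strip().rstrip("/-").strip()  # drop trailing '/-' marker common on Indian invoices
    if "," in s and "." in s:
        # Whichever separator comes last is the decimal mark
        if s.rfind(",") > s.rfind("."):
            s = s.replace(".", "").replace(",", ".")  # European: 1.234,56
        else:
            s = s.replace(",", "")                    # Western/Indian grouping
    elif "," in s:
        head, _, tail = s.rpartition(",")
        if len(tail) == 2 and head.replace(",", "").isdigit():
            s = head.replace(",", "") + "." + tail    # decimal comma: 12,34
        else:
            s = s.replace(",", "")                    # grouping only, incl. lakh: 1,23,456
    return float(s)
```

Note what the sketch cannot do: `1,250` is ambiguous between European decimal and Western grouping, which is exactly why format anomalies need contextual validation rather than string rules alone.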
Evaluation criteria
- Column structure: did the extracted table match the real table?
- Field accuracy: vendor, amount, date, line items
- Contextual handling: were anomalies caught or silently passed through?
- Usability for ERP push: could the output be posted as-is to QuickBooks or Tally?
The last criterion matters most. An extraction that looks fine on screen but has wrong types or format anomalies can corrupt an import. “Mostly right” is often worse than “clearly wrong” because the error is silent.
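The ERP-push criterion can be made concrete as a pre-import gate. This is a sketch under our own assumptions about the extracted record shape (field names like `lines` and `grand_total` are illustrative, not any ERP's actual schema):

```python
from datetime import date

def erp_safe(invoice: dict, tol: float = 0.01) -> list[str]:
    """Return the reasons an extracted invoice is NOT safe to post.
    An empty list means it passes the gate. Field names are illustrative."""
    errors = []
    if not invoice.get("vendor"):
        errors.append("missing vendor")
    if not isinstance(invoice.get("invoice_date"), date):
        errors.append("invoice_date is not a parsed date")
    lines = invoice.get("lines", [])
    if not lines:
        errors.append("no line items")
    for i, ln in enumerate(lines):
        # Structural check: qty * unit_price must reconcile with line_total.
        # This catches swapped or misaligned columns, not just bad OCR digits.
        if abs(ln["qty"] * ln["unit_price"] - ln["line_total"]) > tol:
            errors.append(f"line {i}: qty*unit_price != line_total")
    total = sum(ln["line_total"] for ln in lines)
    if lines and abs(total - invoice.get("grand_total", total)) > tol:
        errors.append("line totals do not sum to grand total")
    return errors
```

A gate like this is why "mostly right" is dangerous: a record can pass a visual skim and still fail every check above.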
Nanonets
Beyond extraction
Nanonets also markets AP workflows and integrations; this block is only what we saw at the document layer on our Indian test set. If line structure is unsafe, downstream assignment, exception handling, and vendor analytics all sit on sand—regardless of workflow features elsewhere in the product.
What we found (document layer)
Good product experience, smooth UI, reasonable accuracy on clean, single-language invoices.
Where it failed in our R&D tests
On a standard Indian supplier invoice with multi-row product descriptions and sub-totals: wrong column mapping (description in quantity column), header row treated as the first data row (first line missed), and catalog fragments showing up in “model” fields. The system followed visual layout instead of understanding catalog-style line semantics.
Verdict
Works well when layouts stay simple and consistent. Where our samples were hard, the structural errors became a posting risk: finance needs correct line geometry before close, audit, or spend analytics can mean anything.
Rossum
Beyond extraction
Rossum is sold as enterprise AP automation, not a narrow OCR toy. The point of this test is: if core line structure fails on a common Indian GST pattern, the rest of the stack (matching, accruals, reporting) has nothing trustworthy to work with—no matter what the deck says about automation rates.
What we found (document layer)
Enterprise positioning and marketing for high-volume AP automation.
Where it failed in our R&D tests
On a standard Indian GST invoice: two real products (two lines each) merged into one phantom line; unit price column missed though visible; a “Particulars” column header misread as a product. Fundamental structural misparses on a document type that is a large share of South Asian AP volume—not a one-off edge case.
Verdict
In our run, the output did not match the marketing story for that document. Run your own production sample before you bet the operating plan on touchless rates.
Hyperscience
Beyond extraction
Hyperscience’s public positioning covers large-scale unstructured document work beyond AP. This section is a narrow, brutal test: mixed-language line cells on invoices. If that layer collapses, you do not get to a defensible close, a clean audit trail, or useful vendor metrics—whatever the rest of the platform can do in other departments.
What we found (document layer)
Strong enterprise story and claims of high accuracy in broad document automation.
Where it failed in our R&D tests
On a mixed-language invoice (English labels, Hindi line text): output unusable for ERP import—column boundaries failed, unit prices and names wrong or garbled. At script boundaries, spatial cues for columns are interpreted inconsistently, so text can be “correct” but from the wrong cell.
Verdict
On our mixed-language invoice sample, we would not have posted the result. Treat this as a reason to test your own script mix, not as a final grade on the entire product line.
DocSumo
Beyond extraction
DocSumo’s own marketing includes touchless storylines and operations dashboards. Those only compound value when semantic fields are right. If unit price and quantity are swapped, every downstream report—open payables, vendor concentration, DPO—turns into noise until someone fixes the document layer.
What we found (document layer)
Competitive for many structured Western-format documents; known strong vertical case studies in some US markets (per vendor public materials).
Where it failed in our R&D tests
Numbers appeared in plausible positions but were mapped to the wrong semantic fields (unit price vs. quantity vs. line total): the system followed visual position without semantic understanding when column headers did not match Western training patterns. The output was unsuitable for a straight ERP push even though the digits looked correct on screen.
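One class of swap here is cheaply detectable by arithmetic: unit price and line total can be disambiguated by testing which assignment satisfies qty × unit price = line total. A sketch (the function name and tolerance are ours; a quantity/unit-price swap is not resolvable this way because multiplication is commutative, so real systems need further signals):

```python
def disambiguate_price_total(qty: float, x: float, y: float, tol: float = 0.01):
    """Given a line's quantity and two numbers whose columns may be swapped,
    decide which is the unit price and which the line total by testing
    qty * unit_price == line_total. Returns (unit_price, line_total), or
    None when the check is ambiguous (e.g. qty == 1) or neither assignment
    fits, in which case the line should go to human review."""
    x_is_price = abs(qty * x - y) <= tol
    y_is_price = abs(qty * y - x) <= tol
    if x_is_price and not y_is_price:
        return (x, y)
    if y_is_price and not x_is_price:
        return (y, x)
    return None
```

The `None` path matters as much as the happy path: an ambiguous line must become a human exception, not a silent guess.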
Verdict
We only know how our Indian samples behaved. DocSumo may shine on other corpora. Bring your own PDFs and decide whether you trust the line items before you trust the dashboard.
Landing AI (agentic approach)
Beyond extraction
An agentic shell does not give you a finance-grade inbox, per-step audit, or GL-safe controls by itself—you still have to productize who approves, what evidence is kept, and how analytics tie back to posted lines.
Technical take
An agent can chain steps: detect the layout, infer fields, and emit structure. On simple documents that works. On complex financial layouts, each step in the chain can fail, and the errors compound into incoherent output, not just lower accuracy.
Controls and money movement
Finance can’t use unconstrained agency on money movement: a wrong vendor, amount, or GL code is a compliance and close problem. AP needs controlled AI with human-in-the-loop for exceptions, not open-ended “figure it out” behavior on every document.
Azure OCR and custom scripts
Beyond extraction
Cloud OCR plus glue code can feed a database, not a full finance system. Somebody still owns task assignment, exception SLAs, comment threads, retention for audit, and the reporting layer the CFO actually reads.
What we tried
We also tried Microsoft Azure’s OCR with custom post-processing. Better than raw OCR, but still brittle: a parsing path per layout does not scale when thousands of unique vendor layouts exist—every template change is maintenance, and simple scripts fail on contextual anomalies (e.g. 4200/- or 00:80) that need understanding, not regex alone.
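One way past regex alone is to parse anomalies contextually: generate candidate repairs and keep only the one that reconciles with another known field. A sketch of the idea (the candidate set and `repair_ocr_amount` name are illustrative, not what any vendor or we shipped):

```python
def repair_ocr_amount(raw: str, expected=None, tol: float = 0.01):
    """Parse an amount string; on anomalies, try candidate repairs for
    common invoice quirks (trailing '/-', ':' misread for '.') and, when
    an expected value is supplied (e.g. from a line-total cross-check),
    keep only the candidate that reconciles with it."""
    candidates = [raw.strip(),
                  raw.strip().rstrip("/-"),          # 4200/- -> 4200
                  raw.strip().replace(":", ".")]     # 00:80 -> 00.80
    parsed = []
    for c in candidates:
        try:
            v = float(c.replace(",", ""))
            if v not in parsed:
                parsed.append(v)
        except ValueError:
            continue
    if expected is not None:
        for v in parsed:
            if abs(v - expected) <= tol:
                return v
        return None  # nothing reconciles: flag for human review
    return parsed[0] if parsed else None
```

This is the distinction the paragraph above is making: the repair rules themselves are trivial regex-grade string work; the hard part is the reconciliation context that tells you which repair is true.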
Verdict
Viable for a small layout set and a single well-staffed engineering team. At real vendor diversity you need a productized finance layer on top, not only more scripts.
The pattern
What all these failures have in common
They optimize for demo performance, not production diversity.
Demos: clean, single-language, single-page, one invoice per file. Production: 40+ page PDFs, six vendors, random order, Hindi in some rows, handwriting, odd number formats, line items that span page breaks. No template-only system covers that without configuration per layout; no purely agentic stack meets financial precision requirements by itself.
A system that succeeds needs:
- Contextual structure understanding, not just templates
- Business validation, not only character recognition
- Format anomalies as first-class, not “edge cases we’ll fix later”
- Human judgment reserved for true exceptions, not every invoice
Where each platform actually competes
Every name below is a real product with real customers. The question is what each is optimized for versus your document mix, geography, and deployment economics.
Enterprise platforms (e.g. ABBYY, UiPath, Tungsten/Kofax)
Mature, analyst-recognized, multi-department scale, RPA and compliance depth, Western document baselines, partner-led implementations. Weaker fit for sub–2 week deployments, zero-template onboarding, and India-first document defaults unless scoped explicitly in the project.
Mid-market IDP (Nanonets, DocSumo, Rossum, and similar)
Often the same buyer as AIdaptIQ for mid-market AP. Public case studies show strong results on clean, consistent documents; our work isolates where Indian GST and mixed-language tables still stress extraction-only architectures—before you get to full workflow, audit, and analytics expectations.
Hyperscience
Positioned for broad unstructured work (contracts, reports). Different sweet spot from high-volume, field-precise AP on semi-structured invoices in mixed scripts.
How AIdaptIQ is positioned differently
We are not trying to be a generic “ABBYY for every department in India.” The build is focused on:
- Indian business documents—GST, lakh notation, Tally-relevant output, mixed-language lines—as core design, not afterthought.
- Multi-client and BPO-style economics: reduce per-vendor retraining and configuration drag.
- Template-light paths so onboarding measures in weeks, not only quarters.
- A full finance-operations direction: from intake and assignment through validation, collaboration, audit trail, and analytics—not field extraction alone.
| Bucket | Typical strength | Observed stress on messy Indian AP |
|---|---|---|
| Mid-market IDP | Fast value on clean, regular layouts | Layout drift, script mixing, retraining and template cost |
| Enterprise suites | Scale, partners, many document types | Long deployments; narrow AP scope can still be expensive |
| Cloud OCR (Azure, Google, Textract…) | Infra, APIs, batch scale | You still own workflow, policy, and ERP-safe semantics |
FAQ
- Is this an attack on these vendors?
- No. It is a test log and a market map. Every serious vendor wins some workloads; the point is to match product architecture to your documents and to your required depth of AP workflow, not to a score from a one-page demo.
- What beats raw extraction accuracy as a KPI?
- ERP-safe pass rate, silent-error risk, exception handling, and time-to-correct. If you cannot trust structure, analytics and close quality downstream are compromised.