For 30 years, OCR was the only way to extract data from documents. Today, Vision Language Models do something OCR never could: they understand what they read.
No templates · No manual rules · The model sees and understands
OCR (Optical Character Recognition) was born in the 1990s. It works simply: scan the document image, recognize individual characters, convert them to digital text. So far, so good. The problem is the next step: to extract structured data (invoice number, VAT ID, total), you need a template — a map telling the software "the invoice number is at position X,Y on the page."
This approach has a fundamental flaw: every supplier has a different layout. Every time a supplier changes, a layout shifts, or a slightly different document arrives, the template breaks. The system doesn't "understand" the document — it only knows coordinates. If the invoice is rotated 2 degrees, if there's a stain on the total, if the table has an extra column: error.
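The brittleness described above can be made concrete with a small sketch. This is a hypothetical template-based extractor (field names and coordinates are invented, not a real product's): each supplier needs its own coordinate map, and a shift of a few pixels silently loses a field.

```python
# Hypothetical template-based OCR extraction (invented field names and
# coordinates). The OCR engine returns words with page positions; the
# template says where each field is supposed to be.

TEMPLATE_SUPPLIER_A = {
    "invoice_number": (450, 40),   # expected (x, y) of the field on the page
    "total": (480, 760),
}

def extract_with_template(ocr_words, template, tolerance=10):
    """ocr_words: list of (text, x, y) tuples produced by the OCR engine."""
    result = {}
    for field, (fx, fy) in template.items():
        for text, x, y in ocr_words:
            if abs(x - fx) <= tolerance and abs(y - fy) <= tolerance:
                result[field] = text
                break
        else:
            result[field] = None  # layout shifted: the field is silently lost
    return result
```

A new supplier, a rotated scan, or an extra column moves the words outside the tolerance window, and the extractor returns nothing without raising any error: exactly the failure mode described above.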
In 30 years of development, OCR never solved this problem. It only added complexity layers: image pre-processing, automatic deskew, zone recognition. But the limitation is structural: OCR recognizes characters, it doesn't understand meaning.
A Vision Language Model (VLM) is an AI model that "looks" at documents the way a human would. It doesn't recognize individual characters: it understands the entire layout, relationships between fields, table structures, the meaning of numbers in context.
When a VLM reads an invoice, it doesn't look for "text at coordinate X,Y." It understands that the number in the bottom-right corner, below the word "Total," is the invoice total. It understands this even if the document is rotated, stained, handwritten, or in a layout it has never seen before.
Zero templates. Zero manual rules. Zero maintenance. The model receives a prompt ("extract invoice number, VAT ID, total...") and returns structured JSON. Changing document type means changing the prompt — not rewriting the software.
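The "prompt in, JSON out" contract can be sketched like this (the prompt texts and document types are invented for illustration, and we assume the model is instructed to reply with JSON only): changing document type is a dictionary entry, not new software.

```python
import json

# Hypothetical per-document-type prompts: switching document type means
# editing this table, not rewriting extraction code.
PROMPTS = {
    "invoice": "Extract invoice_number, vat_id and total. Reply with JSON only.",
    "delivery_note": "Extract note_number, date and carrier. Reply with JSON only.",
}

def parse_model_reply(reply: str) -> dict:
    """The model is asked for JSON only, but strip code fences defensively."""
    cleaned = reply.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.removeprefix("```json").removeprefix("```")
        cleaned = cleaned.removesuffix("```")
    return json.loads(cleaned.strip())
```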
Traditional OCR vs Vision Language Model — feature by feature.
We tested our proprietary VLM on 219 authentic Italian business documents: invoices, delivery notes, credit notes, receipts, payslips, and contracts.
Traditional OCR: estimated accuracy on documents with mixed layouts, smartphone scans, stained delivery notes, and non-standard invoices. The error rate rises dramatically on "imperfect" documents.
Our VLM: certified accuracy across all 219 real documents, including smartphone scans, non-standard layouts, complex tables, and multilingual documents. With auto-learning, we are targeting 99%+.
A new supplier sends an invoice with a layout different from all previous ones. OCR has no template: it doesn't know where to find fields. Result: null or severely erroneous extraction.
A warehouse delivery note with oil stains, folds, stamps overlapping text. OCR recognizes corrupted characters. The VLM "sees" the document like a human and reconstructs information from context.
An invoice with a complex table: rows spanning two lines, merged columns, intermediate subtotals. OCR loses the tabular structure. The VLM understands cell relationships.
An invoice from a foreign supplier: German header, Italian line items, European number format. OCR needs language-specific configuration. The VLM understands any language natively.
An operator photographs a receipt from their phone: imperfect angle, shadows, partial blur. OCR can't segment the text. The VLM interprets the image with human-like visual capability.
In all 5 scenarios, the VLM outperforms OCR. Not because it's a better OCR — but because it's a fundamentally different technology. It understands, it doesn't just recognize.
Via email, PEC, Telegram, REST API, or shared folder. 5 input channels, all automatically monitored.
Our proprietary Vision AI model "sees" the document, understands layout and content, extracts all required fields into structured JSON. Runs 100% locally via Ollama — no data goes to the cloud.
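As an illustration of a local extraction call, here is a sketch using the Ollama Python client (this assumes the `ollama` package is installed and a local daemon is running; the model name, field list, and prompt wording are placeholders, not the production pipeline):

```python
import json

# Placeholders: adapt model name and fields to the VLM actually deployed.
MODEL = "llama3.2-vision"
FIELDS = ["invoice_number", "vat_id", "total"]

def build_prompt(fields):
    return ("Extract the following fields from the attached document and "
            "reply with a single JSON object, no prose: " + ", ".join(fields))

def extract(image_path: str) -> dict:
    import ollama  # imported lazily; requires a running local Ollama daemon
    response = ollama.chat(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": build_prompt(FIELDS),
            "images": [image_path],  # the document page, sent to the local model
        }],
    )
    return json.loads(response.message.content)
```

Because inference happens against the local daemon, the document image never leaves the machine.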
Python verifies every field: net + VAT = total, valid VAT ID (11 digits), tax code (omocodia-aware), date formats. Low-confidence fields are flagged for human review.
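A simplified version of such checks might look like this (the field names, the DD/MM/YYYY format, and the rounding tolerance are assumptions; the full tax-code checksum and omocodia logic is omitted):

```python
import re
from datetime import datetime

def validate_invoice(data: dict, tolerance: float = 0.01) -> list:
    """Return a list of (field, problem) pairs; an empty list means all checks pass."""
    issues = []
    # Arithmetic check: net + VAT must equal the document total.
    if abs(data["net"] + data["vat"] - data["total"]) > tolerance:
        issues.append(("total", "net + VAT does not match total"))
    # Italian VAT ID (partita IVA): exactly 11 digits.
    if not re.fullmatch(r"\d{11}", data["vat_id"]):
        issues.append(("vat_id", "must be 11 digits"))
    # Date must parse in the expected format.
    try:
        datetime.strptime(data["date"], "%d/%m/%Y")
    except ValueError:
        issues.append(("date", "not a valid DD/MM/YYYY date"))
    return issues
```

Any field that fails a check, like any field the model extracted with low confidence, would be routed to human review instead of the ERP.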
Validated data is sent to your ERP via one of 18 native connectors: Fatture in Cloud, TeamSystem, Zucchetti, Mexal, Odoo, SAP, Salesforce, HubSpot, and more. Zero client-side configuration.
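Internally, a connector layer like this is often just a registry that maps a configured target name to a push function; the following is a hypothetical sketch of that pattern, not the product's actual connector API:

```python
# Hypothetical connector registry: each ERP target registers a push function,
# and validated records are dispatched by the configured target name.
CONNECTORS = {}

def connector(name):
    def register(fn):
        CONNECTORS[name] = fn
        return fn
    return register

@connector("odoo")
def push_to_odoo(record: dict) -> str:
    # A real connector would call the target ERP's API here.
    return f"odoo:{record['invoice_number']}"

def dispatch(target: str, record: dict) -> str:
    if target not in CONNECTORS:
        raise ValueError(f"no connector for {target!r}")
    return CONNECTORS[target](record)
```

Adding the nineteenth connector is one more registered function; nothing on the client side changes.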
All local. All automatic.
No data leaves your network. No cloud APIs. No per-page costs. No templates to maintain. The VLM is the difference between software that recognizes characters and one that understands documents.
Discover what our proprietary VLM can do with your documents. Request a demo or join the Early Adopter program to try it free for 6 months.