Blog · March 3, 2026

From PDF Chaos to Structured Data in Seconds

You have 200 supplier invoices on your desk. Each one has a different format. Normally, this means a full day of manual data entry. With DataUnchain, it takes 3 minutes.

Step 1: Scan and drop

Your scanner saves PDFs to a network folder. DataUnchain's Watchdog service picks up each new file as soon as it appears and automatically splits multi-page PDFs into individual page images.
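To make the folder-watching idea concrete, here is a minimal polling sketch using only the Python standard library. It is not DataUnchain's Watchdog service, whose internals are not described here; the function name `find_new_pdfs` and the `seen` set are hypothetical names chosen for illustration.

```python
from pathlib import Path


def find_new_pdfs(folder: Path, seen: set[str]) -> list[Path]:
    """Return PDFs in `folder` that earlier calls have not reported yet.

    Hypothetical helper: `seen` holds the file names already handed out,
    so calling this in a loop behaves like a simple folder watcher.
    """
    new_files = []
    for path in sorted(folder.iterdir()):
        # Scanners differ on extension case, so compare case-insensitively.
        if path.is_file() and path.suffix.lower() == ".pdf" and path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files
```

Calling `find_new_pdfs` every second or so with the same `seen` set yields each file exactly once. An event-based watcher would react faster, but polling is easier to reason about on network shares.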

Step 2: AI reads each page

Qwen 3.5 VL analyses each page image. Unlike traditional OCR, which only transcribes characters, it interprets the document's layout: it knows where the invoice number is, where the totals are, and can read handwritten notes next to the line items.

You've configured the extraction prompt once:

"Extract: invoice_number, date, supplier,
vat_id, subtotal, vat, total, line_items.
Reply in JSON."
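Because the prompt asks for JSON, the reply can be parsed and checked before anything is stored. The sketch below assumes the reply contains a single JSON object, possibly surrounded by extra text. `parse_reply` and `REQUIRED_FIELDS` are illustrative names, not part of DataUnchain.

```python
import json

# Field names taken from the extraction prompt above.
REQUIRED_FIELDS = (
    "invoice_number", "date", "supplier", "vat_id",
    "subtotal", "vat", "total", "line_items",
)


def parse_reply(reply: str) -> dict:
    """Parse the model's reply and verify that every requested field is present."""
    # Assumption: the model may add text around the JSON, so parse only
    # the span from the first "{" to the last "}".
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    record = json.loads(reply[start:end + 1])
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return record
```

A reply that fails this check can be retried or sent for manual review rather than written to the database with gaps.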

Step 3: Math validation

For each invoice, Python checks that subtotal + vat equals total to within a 2-cent tolerance. If the figures don't match, the record is flagged NEEDS_REVIEW instead of VALIDATED.

Out of 200 invoices, typically 3–5 get flagged, either because the AI misread a digit or because the original invoice contains an error.
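The math check from this step can be sketched in a few lines. This version uses `Decimal` rather than floats to avoid rounding surprises with currency, and it assumes the 2-cent tolerance is inclusive. The function name `validate_totals` is hypothetical.

```python
from decimal import Decimal

# Assumption: a difference of exactly 2 cents still counts as a match.
TOLERANCE = Decimal("0.02")


def validate_totals(subtotal, vat, total) -> str:
    """Return "VALIDATED" if subtotal + vat matches total within tolerance."""
    # Convert via str() so float inputs such as 0.1 become exact decimals.
    difference = abs(
        Decimal(str(subtotal)) + Decimal(str(vat)) - Decimal(str(total))
    )
    return "VALIDATED" if difference <= TOLERANCE else "NEEDS_REVIEW"


print(validate_totals("100.00", "22.00", "122.00"))  # VALIDATED
print(validate_totals("100.00", "22.00", "127.00"))  # NEEDS_REVIEW
```

The check cannot tell a misread digit from a genuine error on the invoice, which is why a mismatch is routed to a person instead of being corrected automatically.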

Step 4: Clean export

All 200 invoices are now in PostgreSQL. Export to Excel with one click. Upload to your accounting software. Done.
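DataUnchain's export mechanism is not described here, so the following is only a sketch of the idea. It writes validated records to CSV, which Excel and most accounting tools can open, using the standard library. The column list and the `status` field are assumptions, and nested `line_items` are left out of the flat export.

```python
import csv
import io

# Assumed flat columns; line_items would need a separate sheet or table.
COLUMNS = ["invoice_number", "date", "supplier", "vat_id",
           "subtotal", "vat", "total", "status"]


def export_csv(records, out) -> None:
    """Write one row per invoice record to the open text file `out`."""
    # extrasaction="ignore" skips keys that are not in COLUMNS.
    writer = csv.DictWriter(out, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)


# Usage with a made-up record, written to an in-memory buffer.
buffer = io.StringIO()
export_csv([{"invoice_number": "INV-1", "date": "2026-03-03",
             "supplier": "Acme", "vat_id": "IT123", "subtotal": "100.00",
             "vat": "22.00", "total": "122.00", "status": "VALIDATED",
             "line_items": []}], buffer)
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, so that line endings are not doubled on Windows.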

Total time: 3 minutes instead of 8 hours.