From PDF Chaos to Structured Data in Seconds

Step 1: Scan and drop

Your scanner outputs PDFs to a network folder. DataUnchain's Watchdog service detects each new file instantly. Multi-page PDFs are automatically split into individual page images.

Step 2: AI reads each page

Qwen 3.5 VL analyses each page image. Unlike OCR, it understands the document — it knows where the invoice number is, where the totals are, and can read handwritten notes next to the line items.

You've configured the extraction prompt once:

                "Extract: invoice_number, date, supplier,

                 vat_id, subtotal, vat, total, line_items.

                 Reply in JSON."

Step 3: Math validation

For each invoice, Python checks: subtotal + vat == total. If it doesn't match within a 2-cent tolerance, the record is flagged NEEDS_REVIEW instead of VALIDATED.

Out of 200 invoices, typically 3–5 get flagged — either because the AI misread a digit, or because the original invoice actually has an error.

Step 4: Clean export

All 200 invoices are now in PostgreSQL. Export to Excel with one click. Upload to your accounting software. Done.

Total time: 3 minutes instead of 8 hours.