How it Works — DataUnchain

📂 Phase 1 — Automatic Ingestion

Drop the files. DataUnchain automates the rest.

The ingestion engine uses automation libraries to monitor email inboxes or network-mounted volumes. The moment a new file appears — via network share, scanner or manual copy — processing begins automatically.

Multi-page PDFs are intelligently split into single, high-resolution images. Each page is queued for AI analysis.

Supported Formats

PDF, JPG, PNG, TIFF — including multi-page PDFs

Input Methods

Network share, scanners, manual drop, or API upload

Watchdog · monitoring /app/ingest

✓ New file detected: invoice_2026_001.pdf

✓ PDF split: 3 pages extracted

→ Sending page 1/3 to Model AI...

✓ Page 1 processed (0.8s)

→ Sending page 2/3 to Model AI...

✓ Page 2 processed (0.6s)

→ Sending page 3/3 to Model AI...

✓ Page 3 processed (0.7s)

✓ Moved to /app/processed/
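
The detect-and-archive steps in the log above can be approximated as follows. This is only a minimal sketch, not DataUnchain's actual ingestion engine: the log header mentions Watchdog, but the sketch swaps in plain directory polling from the Python standard library so it stays self-contained. The function names `find_new_files` and `archive` are hypothetical, and the extension set is taken from the Supported Formats list.

```python
import shutil
from pathlib import Path

# Extensions taken from the "Supported Formats" list on this page.
SUPPORTED = {".pdf", ".jpg", ".png", ".tiff"}


def find_new_files(ingest_dir: Path, seen: set) -> list:
    """Return supported files in ingest_dir that have not been seen yet."""
    new_files = []
    for path in sorted(ingest_dir.iterdir()):
        is_supported = path.suffix.lower() in SUPPORTED
        if path.is_file() and is_supported and path.name not in seen:
            seen.add(path.name)  # remember it so the next poll skips it
            new_files.append(path)
    return new_files


def archive(path: Path, processed_dir: Path) -> Path:
    """Move a handled file into processed_dir and return its new location."""
    processed_dir.mkdir(parents=True, exist_ok=True)
    target = processed_dir / path.name
    shutil.move(str(path), str(target))
    return target
```

A caller would invoke `find_new_files(Path("/app/ingest"), seen)` on a timer, hand each returned file to the page-splitting step, and then call `archive` with `/app/processed` as the target.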

AI Engine · document analysis

👁️

Native Vision — no OCR

Direct reading of the invoice layout

                    # Extraction Prompt (configurable):
                    "Extract: document_number,
                     issue_date, supplier,
                     vat_number, gross_total,
                     product_list.
                     Reply in JSON only."
                

👁️ Phase 2 — AI Vision

The model "sees" the document

Unlike traditional OCR, our Vision Language Model relies on native vision capability. It doesn't just read text — it understands spatial layout, tables, handwriting, and even rotated or blurry scans.

You control what is extracted using a simple text prompt in the .env file. Same code, different prompt = different sector.

Multimodal: reads images natively (no OCR layer)

1M token context window for lengthy documents

Understands tables, handwriting, stamps

Runs locally — no internet required
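
To illustrate the "same code, different prompt" idea, here is a hedged sketch of how a prompt could be read from the environment and how a JSON-only reply could be parsed. The variable name `EXTRACTION_PROMPT` and both function names are assumptions made for illustration; the page only states that the prompt lives in the .env file. The default text mirrors the extraction prompt shown earlier.

```python
import json
import os

# Default mirrors the configurable extraction prompt shown on this page.
DEFAULT_PROMPT = (
    "Extract: document_number, issue_date, supplier, "
    "vat_number, gross_total, product_list. Reply in JSON only."
)

FENCE = "`" * 3  # Markdown code-fence marker some models wrap around JSON


def load_prompt(env=os.environ) -> str:
    """Read the prompt from EXTRACTION_PROMPT (hypothetical variable name)."""
    return env.get("EXTRACTION_PROMPT", DEFAULT_PROMPT)


def parse_reply(reply: str) -> dict:
    """Parse the model's reply, tolerating a code fence around the JSON."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, then everything from the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data
```

Switching sector then amounts to changing the environment value, for example to a prompt that asks for contract fields instead of invoice fields, while `parse_reply` stays unchanged.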

✅ Phase 3 — High Precision Validation

Trust, but verify. Automatically.

No AI is 100% perfect. That's why DataUnchain executes three levels of automatic validation before saving any result:

🧮 Math Check

Python verifies arithmetic: Taxable + VAT = Total. If it doesn't add up, the record gets flagged for human review.

📊 Confidence Score

The model reports confidence per field. Fields under the 85% threshold are flagged with ⚠ for manual check.

🔀 Double-Check (Optional)

Process the same document with two different models. If the results diverge, the document is flagged for review.
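
The confidence check can be pictured as a simple threshold filter. The sketch below assumes the model's per-field confidences arrive as a mapping from field name to a score between 0 and 1; that shape and the function name `low_confidence_fields` are assumptions, and only the 85% threshold comes from this page.

```python
CONFIDENCE_THRESHOLD = 0.85  # the 85% threshold mentioned above


def low_confidence_fields(confidences: dict,
                          threshold: float = CONFIDENCE_THRESHOLD) -> list:
    """Return the names of fields whose confidence is below the threshold."""
    return sorted(
        name for name, score in confidences.items() if score < threshold
    )
```

For example, given scores of 0.97 for `supplier` and 0.62 for `vat_number`, only `vat_number` would be returned and marked for manual check.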

validation_engine.py

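                    # Math Check — "The Auditor"
                    def validate_data(extracted):
                        """Flag the record for review when taxable + VAT does not match the total."""
                        taxable = extracted['taxable']
                        vat = extracted['vat']
                        total = extracted['total']

                        # A tolerance of 0.02 absorbs small rounding differences.
                        if abs((taxable + vat) - total) > 0.02:
                            extracted['status'] = 'TO_REVIEW'
                        else:
                            extracted['status'] = 'VALIDATED'

                        return extracted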
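
The optional double-check could follow the same pattern as the math check. This is a sketch under the assumption that each model returns a flat dictionary of extracted fields; the function names are hypothetical, and the status labels are reused from `validate_data`.

```python
_MISSING = object()  # sentinel so a missing key differs from a None value


def diverging_fields(first: dict, second: dict) -> list:
    """Return field names whose values differ between two extractions."""
    keys = set(first) | set(second)
    return sorted(
        key for key in keys
        if first.get(key, _MISSING) != second.get(key, _MISSING)
    )


def double_check(first: dict, second: dict) -> str:
    """Return 'TO_REVIEW' if the two extractions disagree, else 'VALIDATED'."""
    return "TO_REVIEW" if diverging_fields(first, second) else "VALIDATED"
```

Returning the list of diverging fields, rather than only a status, lets a reviewer see exactly where the two models disagree.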
                

📊 Phase 4 — ERP Integration

Data loaded straight into your ERP

Validated data is written directly into your management system's database, or triggers an API call that logs goods receipt and accounting entries in a few milliseconds.

💾

PostgreSQL + JSONB

Every extraction is saved with raw data, validated data, source file path, and a timestamp.

📊

Excel Export

Export to .xlsx in one click — ready for the accountant, warehouse manager, or legal office.

🔗

Legacy Integration

Custom middleware for bidirectional data exchange with SAP, Zucchetti, TeamSystem, Microsoft Dynamics, and AS400 ERPs.
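
As a rough illustration of the PostgreSQL + JSONB storage described above, the sketch below prepares one extraction for insertion. The table name `extractions`, its column names, and the function name are hypothetical; the four stored values (raw data, validated data, source file path, timestamp) come from this page. The `%s` placeholders assume a DB-API driver such as psycopg, and no database connection is opened here.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Hypothetical table and columns; the two JSON payloads are cast to JSONB.
INSERT_SQL = (
    "INSERT INTO extractions "
    "(raw_data, validated_data, source_path, created_at) "
    "VALUES (%s::jsonb, %s::jsonb, %s, %s)"
)


def build_insert_params(raw: dict, validated: dict, source_path: str,
                        now: Optional[datetime] = None) -> tuple:
    """Serialize one extraction into the parameters expected by INSERT_SQL."""
    created_at = now or datetime.now(timezone.utc)
    return (json.dumps(raw), json.dumps(validated), source_path, created_at)
```

With a real connection, the statement and parameters would be passed together to the driver's `cursor.execute(INSERT_SQL, params)` so that values are never interpolated into the SQL string by hand.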

Ready to automate?

Three steps. Three minutes. Zero data entry.

