🏗️ Application Architecture

How DataUnchain Works

Four coordinated phases (Ingestion, AI Extraction, Scientific Validation and Integration) that send validated data to your ERP.

📂 Phase 1 — Automatic Ingestion

Drop the files. DataUnchain automates.

The ingestion engine uses automation libraries to monitor email inboxes or network-mounted volumes. The moment a new file appears — via network share, scanner or manual copy — processing begins automatically.

Multi-page PDFs are split into one high-resolution image per page, and each page is queued for AI analysis.

Supported Formats

PDF, JPG, PNG, TIFF — including multi-page PDFs

Input Methods

Network share, scanners, manual drop, or API upload
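
As a rough illustration of the ingestion step, the sketch below polls a folder and returns the files it has not seen before, filtered by the supported formats listed above. It uses only the Python standard library. The function name `find_new_files` and the polling approach are assumptions made for this example, not DataUnchain's actual watcher.

```python
from pathlib import Path

# Extensions taken from the "Supported Formats" list above.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".png", ".tiff"}


def find_new_files(ingest_dir, seen):
    """Return supported files in ingest_dir that are not yet in `seen`.

    `seen` is a set of file names and is updated in place, so calling
    the function repeatedly reports each file only once.
    """
    new_files = []
    for path in sorted(Path(ingest_dir).iterdir()):
        if not path.is_file():
            continue
        if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        if path.name in seen:
            continue
        seen.add(path.name)
        new_files.append(path)
    return new_files
```

Calling `find_new_files` in a loop with a short pause gives a simple watcher. A production setup would more likely react to filesystem events instead of polling.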

Watchdog · monitoring /app/ingest
✓ New file detected: invoice_2026_001.pdf
✓ PDF split: 3 pages extracted
→ Sending page 1/3 to Model AI...
✓ Page 1 processed (0.8s)
→ Sending page 2/3 to Model AI...
✓ Page 2 processed (0.6s)
→ Sending page 3/3 to Model AI...
✓ Page 3 processed (0.7s)
✓ Moved to /app/processed/
AI Engine · document analysis
👁️
Native Vision — no OCR
Direct reading of the invoice layout
# Extraction Prompt (configurable):
"Extract: document_number, issue_date, supplier, vat_number, gross_total, product_list. Reply in JSON only."
👁️ Phase 2 — AI Vision

The model "sees" the document

Unlike traditional OCR, our Vision Language Model relies on native vision capability. It doesn't just read text — it understands spatial layout, tables, handwriting, and even rotated or blurry scans.

You control what is extracted using a simple text prompt in the .env file. Same code, different prompt = different sector.

Multimodal: reads images natively (no OCR layer)
1M token context window for lengthy documents
Understands tables, handwriting, stamps
Runs locally — no internet required
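
To make the "same code, different prompt" idea concrete, here is a minimal sketch of reading the prompt from the environment and checking a JSON-only reply. The variable name `EXTRACTION_PROMPT` and the helpers `load_prompt` and `parse_reply` are hypothetical; the page states only that the prompt lives in the .env file. The call to the model itself is left out.

```python
import json
import os

# Default mirrors the example prompt shown on this page.
DEFAULT_PROMPT = (
    "Extract: document_number, issue_date, supplier, vat_number, "
    "gross_total, product_list. Reply in JSON only."
)

EXPECTED_FIELDS = [
    "document_number", "issue_date", "supplier",
    "vat_number", "gross_total", "product_list",
]


def load_prompt():
    """Read the prompt from the environment (EXTRACTION_PROMPT is a hypothetical name)."""
    return os.environ.get("EXTRACTION_PROMPT", DEFAULT_PROMPT)


def parse_reply(reply_text, expected_fields=EXPECTED_FIELDS):
    """Parse a JSON-only model reply and list any expected fields it lacks."""
    data = json.loads(reply_text)
    missing = [f for f in expected_fields if f not in data]
    return data, missing
```

Switching sector would then mean changing the environment value and the list of expected fields, with the parsing code left untouched.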
Phase 3 — High Precision Validation

Trust, but verify. Automatically.

No AI is 100% perfect. That's why DataUnchain executes three levels of automatic validation before saving any result:

🧮 Math Check

A Python check verifies the arithmetic: Taxable + VAT = Total, within a 0.02 rounding tolerance. If it doesn't add up, the record is flagged for human review.

📊 Confidence Score

The model reports a confidence score for each field. Fields under the 85% threshold are flagged with ⚠ for a manual check.

🔀 Double-Check (Optional)

Process the same document with two different models. If the results diverge, the record is flagged for review.

validation_engine.py
# Math Check: "The Auditor"
def validate_data(extracted):
    taxable = extracted['taxable']
    vat = extracted['vat']
    total = extracted['total']
    # Allow a 0.02 tolerance for rounding differences
    if abs((taxable + vat) - total) > 0.02:
        extracted['status'] = 'TO_REVIEW'
    else:
        extracted['status'] = 'VALIDATED'
    return extracted
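
The Confidence Score and Double-Check levels could be sketched in the same spirit as the math check. This is an illustration under assumptions: the page does not show how confidence is reported, so a dict mapping field names to scores between 0 and 1 is assumed here, and all three function names are hypothetical.

```python
CONFIDENCE_THRESHOLD = 0.85  # the 85% threshold mentioned above


def low_confidence_fields(confidences, threshold=CONFIDENCE_THRESHOLD):
    """Return the names of fields whose confidence is below the threshold."""
    return [name for name, score in confidences.items() if score < threshold]


def diverging_fields(result_a, result_b):
    """Return the sorted names of fields on which two model outputs disagree."""
    all_fields = set(result_a) | set(result_b)
    return sorted(f for f in all_fields if result_a.get(f) != result_b.get(f))


def review_status(confidences, result_a, result_b=None):
    """Combine both checks into a single status string."""
    flagged = low_confidence_fields(confidences)
    if result_b is not None:
        # Optional double-check: only runs when a second result is supplied.
        flagged += diverging_fields(result_a, result_b)
    return "TO_REVIEW" if flagged else "VALIDATED"
```

Passing `result_b=None` skips the second comparison, which matches the optional nature of the Double-Check level.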
📊 Phase 4 — ERP Integration

Data loaded straight into your ERP

Validated data is written directly into your management system's database, or triggers an API call that logs goods receipt and accounting entries in a few milliseconds.

💾

PostgreSQL + JSONB

Every extraction is saved with raw data, validated data, source file path, and a timestamp.

📊

Excel Export

Export to .xlsx in one click — ready for the accountant, warehouse manager, or legal office.

🔗

Legacy Integration

Custom middleware for bidirectional data exchange with SAP, Zucchetti, TeamSystem, Microsoft Dynamics and AS400 ERPs.
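
For the PostgreSQL + JSONB storage described above, a record could be shaped roughly as follows. The table name `extractions`, its column names, and the helper `build_insert_params` are hypothetical; only the four stored items (raw data, validated data, source file path, timestamp) come from this page. The sketch prepares the statement and its parameters without opening a database connection.

```python
import json
from datetime import datetime, timezone

# Hypothetical table; the columns follow the four items listed above.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS extractions (
    id           BIGSERIAL PRIMARY KEY,
    raw_data     JSONB NOT NULL,
    validated    JSONB NOT NULL,
    source_path  TEXT NOT NULL,
    created_at   TIMESTAMPTZ NOT NULL
);
"""

# %s placeholders follow the style used by the psycopg drivers.
INSERT_SQL = (
    "INSERT INTO extractions (raw_data, validated, source_path, created_at) "
    "VALUES (%s::jsonb, %s::jsonb, %s, %s)"
)


def build_insert_params(raw, validated, source_path, now=None):
    """Return the parameter tuple for INSERT_SQL, with dicts serialized to JSON."""
    created_at = now or datetime.now(timezone.utc)
    return (json.dumps(raw), json.dumps(validated), source_path, created_at)
```

With a driver such as psycopg, the tuple would be passed as `cursor.execute(INSERT_SQL, build_insert_params(raw, validated, path))`. Keeping the raw and validated payloads in separate JSONB columns makes it possible to audit what the model returned against what was accepted.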

Ready to automate?

Three steps. Three minutes. Zero data entry.
