Development Roadmap — DataUnchain

How to Read This Roadmap

Each milestone is grouped by quarter and version number. Features marked with a green check are already shipped. Everything else is planned or in active development.

✅

Shipped

The feature is live in production and available to every user right now.

🛠️

In Progress

Active development. Expected to land in the target quarter.

📋

Planned

On the backlog, validated by user research. Timing may shift based on feedback.

Now

v2.1 — Current State

March 2026. This is what DataUnchain can do today. Every feature below has been validated on real Italian invoices and deployed on-premise with early testers.

✔️

Proprietary VLM Extraction

Our proprietary Vision-Language Model reads invoices, delivery notes, and receipts directly from images — no OCR layer required. It handles rotated scans, stamps, handwriting, and complex table layouts.

✔️

Mathematical Validation Engine

A dedicated Python layer cross-checks every extracted number: Taxable + VAT = Total, line-item sums, rounding tolerance. Inconsistencies are flagged before anything reaches your ERP.
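To illustrate the kind of cross-check described above, here is a minimal sketch. The function name `validate_totals`, its parameters, and the default tolerance of 0.01 are assumptions made for this example; they are not DataUnchain's actual validation API.

```python
from decimal import Decimal


def validate_totals(taxable, vat, total, line_items=(), tolerance=Decimal("0.01")):
    """Return a list of inconsistencies; an empty list means every check passed."""
    issues = []
    # Check 1: Taxable + VAT = Total, within the rounding tolerance.
    if abs(taxable + vat - total) > tolerance:
        issues.append(f"taxable + vat = {taxable + vat}, but total = {total}")
    # Check 2: line items, when present, must add up to the taxable amount.
    if line_items:
        line_sum = sum(line_items, Decimal("0"))
        if abs(line_sum - taxable) > tolerance:
            issues.append(f"line items sum to {line_sum}, but taxable = {taxable}")
    return issues


# A consistent invoice and one whose total is off by one euro.
ok = validate_totals(
    Decimal("100.00"), Decimal("22.00"), Decimal("122.00"),
    line_items=[Decimal("60.00"), Decimal("40.00")],
)
bad = validate_totals(Decimal("100.00"), Decimal("22.00"), Decimal("123.00"))
```

The sketch uses `Decimal` rather than `float` so that currency amounts compare exactly and the tolerance only has to absorb genuine rounding differences.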

✔️

ERP Adapter Framework

Fattura Elettronica adapter is production-ready. The architecture is modular: each ERP connector is a standalone Python adapter that maps JSON extraction to the target format (XML, CSV, SQL).
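The adapter idea can be sketched as follows. This is only an illustration: the `ErpAdapter` base class, the `export` method, and the field names are hypothetical and do not reproduce the real DataUnchain interface or any ERP's import layout.

```python
import csv
import io
from abc import ABC, abstractmethod


class ErpAdapter(ABC):
    """Hypothetical base class: one adapter per target ERP format."""

    @abstractmethod
    def export(self, extraction: dict) -> str:
        """Map one extraction result (a JSON-like dict) to the target format."""


class CsvAdapter(ErpAdapter):
    # Column names are illustrative, not a real ERP import layout.
    FIELDS = ("invoice_number", "supplier", "taxable", "vat", "total")

    def export(self, extraction: dict) -> str:
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=self.FIELDS, lineterminator="\n")
        writer.writeheader()
        # Missing fields become empty cells; unknown fields are dropped.
        writer.writerow({name: extraction.get(name, "") for name in self.FIELDS})
        return buffer.getvalue()


extraction = {
    "invoice_number": "2026/001",
    "supplier": "Rossi Srl",
    "taxable": "100.00",
    "vat": "22.00",
    "total": "122.00",
    "notes": "not part of the CSV layout",
}
csv_text = CsvAdapter().export(extraction)
```

Keeping each target format behind the same small interface is what lets a new connector be written and tested without touching the extraction code.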

✔️

Progressive Learning v1

The system learns from corrections: when a user fixes an extraction, the correction is stored and used to improve future prompts. Accuracy improves as corrections accumulate.

✔️

Docker-Based Appliance

Complete containerised stack: VLM engine, processor, PostgreSQL, hot-folder watcher. Runs on a single GPU-equipped machine. No cloud dependency: documents stay on your own infrastructure.

✔️

Hot-Folder & REST API v1

Drop a PDF into a watched folder or POST it to the API. Extraction starts automatically. Results land in PostgreSQL and are available via JSON or webhook callback.
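A hot-folder watcher can be approximated with a simple polling scan. The sketch below is an assumption about how such a watcher might detect new files; the function name `find_new_pdfs` and the in-memory `seen` set are hypothetical, and the shipped watcher may work differently.

```python
import tempfile
from pathlib import Path


def find_new_pdfs(folder: Path, seen: set) -> list:
    """Return PDFs in `folder` not seen before, and remember them in `seen`."""
    new_files = []
    for path in sorted(folder.glob("*.pdf")):
        if path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files


# Demo against a throwaway directory standing in for the watched folder.
with tempfile.TemporaryDirectory() as tmp:
    hot_folder = Path(tmp)
    seen = set()
    (hot_folder / "invoice_a.pdf").write_bytes(b"%PDF-1.4")
    first_scan = [p.name for p in find_new_pdfs(hot_folder, seen)]
    (hot_folder / "invoice_b.pdf").write_bytes(b"%PDF-1.4")
    second_scan = [p.name for p in find_new_pdfs(hot_folder, seen)]
```

A production watcher would also need to wait until a file has finished copying before handing it to extraction; that step is left out here for brevity.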

✔️

Multi-Schema Configuration

Switch between document types — invoices, transport documents, contracts — by changing a single prompt string in the configuration. No retraining, no redeployment.
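As a rough illustration of prompt-driven schema switching, the sketch below keeps one prompt string per document type. The dictionary keys, the prompt wording, and the `build_config` helper are all invented for this example and are not taken from the DataUnchain configuration format.

```python
# Hypothetical prompt strings keyed by document type.
SCHEMA_PROMPTS = {
    "invoice": "Extract supplier, invoice number, taxable amount, VAT and total.",
    "transport_document": "Extract sender, recipient, goods description and date.",
    "contract": "Extract parties, effective date and duration.",
}


def build_config(document_type: str) -> dict:
    """Return the extraction config for one document type."""
    if document_type not in SCHEMA_PROMPTS:
        raise ValueError(f"unknown document type: {document_type!r}")
    return {"document_type": document_type, "prompt": SCHEMA_PROMPTS[document_type]}


# Switching document type only changes which prompt string is used.
config = build_config("transport_document")
```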

✔️

RPA Vision Engine

A lightweight vision-guided RPA module that types extracted data directly into legacy interfaces — no selectors, no brittle macros, no cloud APIs. Works with any screen-based ERP.

Q2 2026

v2.5 — Consolidation

April – June 2026. The focus shifts from “does it work?” to “can anyone set it up in one afternoon?” We harden the product, onboard the first paying customers, and ship the first third-party ERP connector.

🛠️

Early Adopter Programme Launch

Five selected businesses get a free, fully-managed installation, six months at zero cost, and a direct line to the founder. In return, they help us validate real-world edge cases and shape the product. Apply now →

🛠️

One-Click Onboarding Wizard

A guided setup experience that walks new users through GPU detection, Docker health checks, model download, and the first test extraction — all from a single web interface.

🛠️

Fine-Tuning Pipeline v1

A semi-automated pipeline that takes accumulated user corrections, packages them into training data, and fine-tunes the VLM on Italian invoice patterns. The first iteration targets global accuracy improvements across all suppliers.

🛠️

Danea Easyfatt Connector

The first official third-party ERP adapter. Danea Easyfatt is one of the most widely used invoicing packages among Italian micro-businesses. The connector maps extracted JSON directly into Danea's import format, closing the gap between AI extraction and daily bookkeeping.

🛠️

Comprehensive Documentation

Full installation guide, API reference, adapter development tutorial, and troubleshooting playbook. Published as a searchable docs site — because great software without great docs is just a puzzle.

🛠️

Dashboard & Extraction Review UI

A lightweight web dashboard where operators can review extractions side-by-side with the original document image, approve or correct fields, and trigger re-processing. The corrections feed directly back into Progressive Learning.

Q3 2026

v2.8 — Scalability

July – September 2026. DataUnchain learns to handle volume. Batch processing, per-supplier intelligence, multi-language support, and an upgraded API designed for system integrators.

📋

Advanced Auto-Learning

Progressive Learning v2 introduces automatic confidence scoring. When the model's confidence on a field drops below a configurable threshold, the extraction is routed to human review. Approved corrections are incorporated continuously, creating a flywheel that makes the system smarter with every batch.
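The routing rule described above can be sketched in a few lines. The function name `route_extraction`, the default threshold of 0.85, and the shape of the confidence data are assumptions for illustration; the planned feature may expose them differently.

```python
def route_extraction(field_confidences: dict, threshold: float = 0.85):
    """Return (decision, low-confidence field names) for one extraction."""
    low_fields = sorted(
        name for name, score in field_confidences.items() if score < threshold
    )
    # Any field under the threshold sends the whole document to a human.
    decision = "human_review" if low_fields else "auto_approve"
    return decision, low_fields


decision, low_fields = route_extraction({"total": 0.98, "vat": 0.62, "supplier": 0.91})
```

Returning the list of low-confidence fields alongside the decision lets a review screen highlight exactly what the operator needs to check.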

📋

Per-Supplier Fine-Tuning

Instead of relying on a single global model, the system will maintain lightweight per-supplier LoRA adapters. If a particular supplier uses an unusual layout, the VLM learns that layout specifically, without forgetting everything else. Think of it as muscle memory for each vendor's quirks.

📋

Batch Processing Engine

Queue hundreds of documents at once. The engine distributes work across available GPU resources, parallelises extraction, and reports results via a progress webhook. Perfect for month-end invoice dumps or bulk migrations from paper archives.
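A simplified picture of batch processing with progress reporting is shown below. It uses a thread pool and a plain callback as stand-ins for GPU scheduling and the progress webhook; `process_batch`, `extract`, and `on_progress` are hypothetical names, and document names are assumed to be unique.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_batch(documents, extract, on_progress, max_workers=2):
    """Run `extract` on every document in parallel and report progress."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the document it belongs to.
        futures = {pool.submit(extract, doc): doc for doc in documents}
        for done, future in enumerate(as_completed(futures), start=1):
            results[futures[future]] = future.result()
            # Stand-in for a progress webhook call.
            on_progress(done, len(futures))
    return results


progress = []
results = process_batch(
    ["a.pdf", "b.pdf", "c.pdf"],
    extract=lambda doc: {"source": doc, "status": "extracted"},
    on_progress=lambda done, total: progress.append((done, total)),
)
```

Documents may finish in any order, but the progress callback runs in the calling thread, so the counter it receives always increases by one.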

📋

Automatic Document Classification

Before extraction even begins, the VLM classifies the incoming document: invoice, credit note, delivery note, purchase order, receipt. The correct extraction schema is selected automatically, eliminating the need for manual routing.

📋

Multi-Language Support

Extend extraction beyond Italian. English, German, French, and Spanish invoices will be supported out of the box, with language auto-detection. Critical for Italian companies with international suppliers and for expanding DataUnchain into new markets.

📋

REST API v2 & Webhooks

A redesigned API with versioning, API key management, rate limiting, and proper OpenAPI documentation. Webhook callbacks support multiple endpoints per tenant. Designed so system integrators can embed DataUnchain into larger automation workflows without friction.

📋

TeamSystem & Zucchetti Connectors

Two more heavyweight ERP connectors for the Italian market. TeamSystem and Zucchetti together cover the majority of Italian SMEs and accounting firms. Both connectors follow the same adapter pattern established in v2.1, keeping the architecture clean and testable.

📋

Audit Log & Traceability

Every extraction, correction, approval, and export is logged with timestamps, user identity, and before/after values. Designed for compliance-sensitive environments where you need to prove exactly what happened to every document.
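One possible shape for such a log record is sketched below. The `AuditEntry` class and its field names are assumptions made for this example, not the planned storage schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """Hypothetical audit record: who did what to which document, and when."""

    document_id: str
    action: str  # e.g. "extraction", "correction", "approval", "export"
    user: str
    before: dict
    after: dict
    # UTC timestamp in ISO 8601 format, set when the entry is created.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(entry: AuditEntry) -> str:
    """Serialise an entry as one JSON line with a stable key order."""
    return json.dumps(asdict(entry), sort_keys=True)


entry = AuditEntry(
    document_id="doc-1",
    action="correction",
    user="operator@example.com",
    before={"total": "122.00"},
    after={"total": "123.00"},
)
log_line = to_log_line(entry)
```

Storing both the before and after values in the same record makes it possible to reconstruct every change to a field without consulting any other table.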

Q4 2026

v3.0 — Enterprise

October – December 2026. The version where DataUnchain becomes enterprise-grade. Multi-tenant architecture, a connector marketplace, hybrid OCR fallback, digital signature workflows, and analytics.

📋

Connector Marketplace

An open marketplace where third-party developers and system integrators can publish, share, and sell ERP adapters. We provide the SDK, the testing sandbox, and the distribution channel. The community builds the long tail of integrations we could never cover alone.

📋

Multi-Tenant Architecture

A single DataUnchain instance will serve multiple organisations with complete data isolation. Each tenant gets its own extraction schemas, learning history, user roles, and API keys. Ideal for accounting firms managing dozens of client companies from one appliance.

📋

Hybrid OCR Fallback

For edge cases where our proprietary VLM cannot confidently extract a field — think severely damaged documents or handwritten notes on thermal paper — a secondary OCR engine kicks in as a fallback. The system merges both outputs and surfaces the highest-confidence result.
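The merge step could look roughly like the sketch below, which keeps the higher-confidence value for each field. The `merge_outputs` function and the `(value, confidence)` pair format are assumptions for illustration; the real merge logic is not specified in this roadmap.

```python
def merge_outputs(vlm: dict, ocr: dict) -> dict:
    """For each field, keep the (value, confidence) pair with higher confidence."""
    merged = {}
    for name in sorted(vlm.keys() | ocr.keys()):
        candidates = [source[name] for source in (vlm, ocr) if name in source]
        # max() keeps the first candidate on ties, so the VLM wins a draw.
        merged[name] = max(candidates, key=lambda pair: pair[1])
    return merged


# Illustrative data: the VLM misreads a digit as the letter O with low confidence.
vlm_fields = {"total": ("122.00", 0.97), "invoice_number": ("2O26/001", 0.41)}
ocr_fields = {"invoice_number": ("2026/001", 0.88), "date": ("2026-03-01", 0.90)}
merged = merge_outputs(vlm_fields, ocr_fields)
```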

📋

Digital Signature Workflow

After extraction and validation, documents can be routed through a multi-step approval workflow and digitally signed. Supports the CAdES and PAdES signature formats used in Italy, integrated with major Italian qualified trust service providers.

📋

Analytics Dashboard

Real-time visibility into extraction volume, accuracy rates, processing times, error distributions, and per-supplier performance. Exportable charts help operations teams identify bottlenecks and justify ROI to management. Think of it as the control tower for your document pipeline.

📋

ISO 27001 Alignment

Full documentation and controls alignment with ISO 27001 information security standards. Encryption at rest, encryption in transit, role-based access control, penetration testing, and a formal incident response plan. The goal: make procurement teams at large enterprises comfortable signing off on DataUnchain without months of back-and-forth.

📋

RBAC & SSO

Granular role-based access control with predefined roles (admin, operator, auditor, read-only) plus custom role creation. Single Sign-On via SAML 2.0 and OpenID Connect for seamless integration with corporate identity providers like Azure AD, Okta, and Google Workspace.

📋

High-Availability Deployment

Kubernetes-native deployment with horizontal scaling, health checks, and automatic failover. For organisations that process thousands of documents per day and cannot afford downtime. Helm charts and Terraform modules included.

2027 Vision

Beyond v3.0 — The Bigger Picture

Looking further ahead. These are strategic directions, not committed features. They represent where we believe the market is going and where DataUnchain needs to be.

☁️

Hybrid SaaS Model

A managed cloud option for businesses that want DataUnchain without managing hardware. Your data stays encrypted and isolated; the VLM runs on dedicated GPU instances in European data centres. On-premise remains the default for privacy-first customers.

🇮🇹

Italian-First Fine-Tuned Model

A purpose-built VLM trained from scratch on hundreds of thousands of Italian fiscal documents. Not a generic multilingual model with Italian bolted on, but a model that thinks in Italian accounting terminology, understands Italian tax codes natively, and handles the idiosyncrasies of Italian invoice layouts out of the box.

🏛️

PagoPA Integration

Direct integration with Italy's PagoPA digital payments platform. Extract payment notices, reconcile them against open invoices, and trigger payments — all within a single automated workflow. A natural extension for public administration use cases.

🏥

Vertical Industry Packages

Pre-configured extraction schemas and adapter bundles for high-value verticals: healthcare (medical reports, lab results), logistics (bills of lading, customs declarations), construction (SAL certificates, material receipts), and legal (court filings, notarial deeds).

🤝

VAR & Reseller Partnerships

A partner programme for Value-Added Resellers, system integrators, and accounting software vendors. Partners get training, co-marketing support, volume licensing, and revenue sharing. The fastest path to scaling distribution across Italy and eventually across Europe.

🧠

Agentic Document Workflows

Move beyond extraction into full document orchestration. The AI agent autonomously decides the next step: extract, validate, route for approval, match against purchase orders, flag anomalies, and push to ERP — all without human intervention except for edge cases that truly require judgment.

Community-Driven

How You Shape the Roadmap

DataUnchain is not built in a vacuum. Every feature on this page exists because a real business told us they needed it. The best way to influence what we build next is to become an early adopter.

Early adopters get a direct communication channel with the founder and the engineering team. When you hit a limitation, we hear about it the same day. When you request a feature, it goes straight into our prioritisation matrix with real-world context attached. No feature-request black holes, no six-month silence. This is how good software gets built.

01

Use It Daily

Install DataUnchain, feed it your real invoices, and let us know what breaks. Real usage data is worth more than a hundred surveys.

02

Share Feedback

Every correction you make teaches the system. Every bug you report prevents it from happening to the next user. Every feature request shapes the next quarter.

03

Co-Create

Join monthly roadmap calls. Vote on feature priorities. Beta-test new capabilities before anyone else. Your name in the changelog.

Apply for Early Access →

Timeline at a Glance

A bird's-eye view of every milestone from now to 2027.

LIVE March 2026

v2.1 — Foundation

Proprietary VLM extraction, mathematical validation, Fattura Elettronica adapter, Progressive Learning v1, Docker appliance, hot-folder and REST API, multi-schema config, RPA vision engine.

IN PROGRESS Q2 2026

v2.5 — Consolidation

Early Adopter programme, one-click onboarding wizard, fine-tuning pipeline v1, Danea Easyfatt connector, comprehensive documentation, extraction review dashboard.

PLANNED Q3 2026

v2.8 — Scalability

Advanced auto-learning, per-supplier fine-tuning, batch processing, automatic document classification, multi-language support, REST API v2, TeamSystem and Zucchetti connectors, audit log.

PLANNED Q4 2026

v3.0 — Enterprise

Connector marketplace, multi-tenant architecture, hybrid OCR fallback, digital signature workflow, analytics dashboard, ISO 27001 alignment, RBAC and SSO, high-availability deployment.

VISION 2027

Beyond v3.0

Hybrid SaaS model, Italian-first fine-tuned VLM, PagoPA integration, vertical industry packages, VAR and reseller partnerships, agentic document workflows.

Frequently Asked Questions

Common questions about the roadmap and what comes next.

Is this roadmap a binding commitment?

No. This roadmap reflects our current plans and priorities. Dates and feature sets may shift based on user feedback, technical discoveries, and market conditions. We update this page regularly to keep it honest. If a planned feature gets deprioritised or fundamentally redesigned, we will say so openly.

Can I request a feature not listed here?

Absolutely. The best way is to join the Early Adopter Programme and bring your use case directly to us. We evaluate every request against three criteria: how many users would benefit, how well it fits the product vision, and how feasible it is with current resources. Even if your request doesn't make the next quarter, it enters our backlog with full context.

Will the on-premise model always be available?

Yes. On-premise deployment is a core architectural principle, not a temporary concession. Even when we introduce a managed cloud option in 2027, the self-hosted appliance will remain first-class. We believe data sovereignty is non-negotiable for many businesses, and we will never force a cloud dependency.

What hardware will v3.0 require?

We are committed to keeping the minimum hardware requirements reasonable. Currently, DataUnchain runs on a single machine with an NVIDIA GPU (16 GB+ VRAM recommended). As we add features like batch processing and per-supplier fine-tuning, you may benefit from more VRAM (24 GB+), but the base extraction pipeline will continue to work on 16 GB cards. The Kubernetes-based HA deployment in v3.0 is optional and designed for high-volume environments.

How do you handle breaking changes between versions?

We follow semantic versioning. Minor versions (2.1 to 2.5, 2.5 to 2.8) will not break existing integrations. The jump to v3.0 may introduce breaking changes in the API, but we will provide a detailed migration guide, a compatibility shim for the v1 API, and at least three months of parallel support. No surprises.

Can I build my own ERP connector today?

Yes. The adapter framework in v2.1 is fully extensible. Each connector is a standalone Python module that implements a standard interface. You can build a connector for any ERP that accepts CSV, XML, SQL, or API-based imports. The formal Connector Marketplace with SDK and distribution is planned for v3.0, but nothing stops you from building and using custom adapters right now.

Limited Spots

Want to influence the roadmap?
Become an Early Adopter.

Free installation, six months at zero cost, and a direct line to the team building the product. In return, help us validate real-world edge cases and shape what comes next.

Only five spots available. We are looking for Italian businesses that process at least 200 invoices per month and are tired of manual data entry. If that sounds like you, we should talk.

Apply for Early Access → See Current Features
