# Mildport — Every customer file arrives safely at the right port.

> The embeddable import wizard that turns messy CSVs, spreadsheets, PDFs — even scans — into exactly the records your product expects. EU-hosted cloud or self-hosted in your own infrastructure. Deterministic. Explainable. Yours.

Canonical: https://mildport.com/ · Agent index: https://mildport.com/llms.txt
Source of truth: this file mirrors the landing sections in `apps/landing/src/app/sections/`. Last updated: 2026-07-02.

**Guarantees:** EU cloud (built in Germany, GDPR) or runs in your VPC · Offline signed licenses · No black-box decisions · WCAG 2.2 AA review grid · Tenant #1 is our own CRM

Built by **Capitality** — and run in production by Capitality as tenant #1: the same widget, the same public API, the same license gates. We deleted our own importer to get here (the cutover commit: 142 files, +343 −12,236).

## Everything an importer owes you

- **Reads almost anything.** CSV, TSV, XLSX, ODS, JSON, XML — plus PDF table extraction, ZUGFeRD-style document facts, and OCR for scans and photos. Async jobs handle the heavy files.
- **Deterministic first, AI when it earns it.** Headers and value shapes scored by an explainable engine — same input, same answer. An optional, evidence-gated AI judge re-ranks only the uncertain tail: bring your own model, rationale shown in the grid, off by default.
- **Remembers every mapping.** Confirmed mappings persist per template and fingerprint; learned header aliases improve future matches. Your customer maps a file once.
- **Multi-file imports.** Contacts in one file, companies in another? Drop both, join on a key — left or inner — and import a single coherent dataset.
- **A review grid that fixes.** Inline editing, per-cell validation, error-row export, your own cleaning hooks and step gates — and a WCAG 2.2 AA accessible grid mode.
- **References that resolve.** Rows can point at records that already exist in your product. Mildport resolves them against your datasets — chips, pickers, graph apply.
- **Your brand, not ours.** Light-DOM theming over `--mildport-*` tokens: any design system, dark mode included.
- **Delivery you can audit.** HMAC-signed apply webhooks with durable retries and a delivery log — or browser-mode callbacks. Usage metering and retention policies built in.
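
On the receiving side of the signed apply webhooks listed above, a host service would typically recompute the HMAC over the raw request body and compare it in constant time. The sketch below assumes HMAC-SHA256 with a hex-encoded signature. The helper name `verifyApplySignature`, the secret, and the encoding are illustrative assumptions, not Mildport's documented wire format.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: SHA-256 and hex encoding are assumptions for
// illustration, not Mildport's documented signature format.
function verifyApplySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}

// Demo: sign a payload the way a sender might, then verify it.
const secret = "demo-secret";
const body = JSON.stringify({ rows: [{ "person.email": "ada@example.com" }] });
const signature = createHmac("sha256", secret).update(body, "utf8").digest("hex");

console.log(verifyApplySignature(body, signature, secret)); // true
console.log(verifyApplySignature(body + " ", signature, secret)); // false
```

Verifying against the raw body, rather than a re-serialized JSON object, avoids false mismatches caused by key ordering or whitespace.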

## Proof on hard files

The browser demo stays small so it can run entirely on the page. The full engine is built for the files buyers test in pilots:

- **Scanned PDFs and photos.** OCR turns image-only PDFs, receipts, screenshots, and camera photos into reviewable text, tables, or fact cards with confidence signals.
- **Messy spreadsheets.** XLSX/ODS decode through sidecars; CSV, TSV, JSON, XML, Markdown, and HTML tables run through dedicated parsers.
- **Mixed documents.** Meeting notes, forms, emails, invoices, and raw text can become proposed records: contacts, organizations, tasks, invoices, leads, or host-defined targets.
- **Human confirmation.** Mildport proposes mappings, facts, rows, or linked records. The user confirms, edits, or rejects the plan before anything is applied.

## Current product status

| Status                 | What it means                                                                      | Examples                                                                                                                                                                    |
| ---------------------- | ---------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Ready today            | Production-shaped capabilities already present in the codebase and self-host stack | CSV/TSV/JSON/XML/Markdown/HTML table parsing; XLSX/ODS sidecar; PDF table extraction; OCR sidecar; review grid; HMAC webhooks; Docker Compose; Helm chart; offline licenses |
| Pilot / design partner | Workflows to validate on hard customer data before self-serve packaging            | Capture-first mixed documents; proposed linked records; AI-assisted fact extraction with host-defined targets; hard-file calibration                                        |
| Roadmap                | The next layer for large-scale operation and self-serve purchase                   | Smart joins; huge XLSX joins server-side; air-gap runbooks; self-serve licensing; recurring imports; MCP server                                                             |

## Self-host operations

Self-hosting should feel boring:

- **Deploy.** Docker Compose today, or Helm for the engine and decode sidecars in your own cluster.
- **Store.** Mongo for import state; S3 or MinIO-compatible blob storage for durable file handling.
- **Secure.** Offline EdDSA licenses, no license phone-home, HMAC-signed apply deliveries, SSRF guards around host datasets.
- **Operate.** Health checks, preflight checks, retry queues, usage counters, retention controls, and delivery audit records.
- **Control AI.** Deterministic by default. If AI is enabled, customers bring the model endpoint and choose the deployment ceiling and tenant mode.
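
To illustrate why offline EdDSA licenses need no phone-home, here is a minimal sketch of verifying a signed license against a locally deployed public key using Node's built-in Ed25519 support. The token layout (`base64url(payload).base64url(signature)`), the `LicensePayload` fields, and the `verifyLicense` helper are all hypothetical; the real license format is not described here.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical license shape, for illustration only.
interface LicensePayload {
  tenant: string;
  plan: string;
  expiresAt: string; // ISO 8601 timestamp
}

// Assumed token layout: base64url(JSON payload) + "." + base64url(signature).
function verifyLicense(token: string, publicKey: KeyObject, now: Date): LicensePayload | null {
  const [payloadPart, signaturePart] = token.split(".");
  if (!payloadPart || !signaturePart) return null;
  const payloadBytes = Buffer.from(payloadPart, "base64url");
  const signatureBytes = Buffer.from(signaturePart, "base64url");
  // Ed25519 uses no separate digest, hence the null algorithm argument.
  if (!verify(null, payloadBytes, publicKey, signatureBytes)) return null;
  const payload = JSON.parse(payloadBytes.toString("utf8")) as LicensePayload;
  return new Date(payload.expiresAt) > now ? payload : null;
}

// Demo: issue a license with a throwaway key pair, then verify it with no network call.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const issued = Buffer.from(
  JSON.stringify({ tenant: "acme", plan: "pilot", expiresAt: "2030-01-01T00:00:00Z" }),
);
const token =
  issued.toString("base64url") + "." + sign(null, issued, privateKey).toString("base64url");

const checkedAt = new Date("2026-07-02T00:00:00Z");
console.log(verifyLicense(token, publicKey, checkedAt)?.tenant); // "acme"
```

Only the public key has to be deployed alongside the engine, so the check works in an air-gapped environment.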

## Concrete workflows

- **CRM onboarding.** Import messy contact and company files, join them, validate fields, fix errors, then apply clean contact and organization records to the host CRM.
- **Document-to-record capture.** Turn meeting notes, voice transcripts, business-card photos, forms, screenshots, or emails into proposed contacts, organizations, leads, tasks, and linked records.
- **Finance and operations intake.** Extract vendor, amount, date, line items, references, and confidence signals from invoices, receipts, scans, or supplier exports, then apply confirmed records to the target system.
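
The join step in the CRM onboarding workflow can be sketched as a keyed lookup. This is a simplified illustration of left versus inner join semantics, not the engine's implementation; `joinOnKey`, the `Row` type, and the sample column names are assumptions.

```typescript
type Row = Record<string, string>;
type JoinMode = "left" | "inner";

// Joins each left row to the first right row with the same key value.
// "left" keeps unmatched left rows; "inner" drops them.
// On column-name collisions the left row's value wins.
function joinOnKey(
  left: Row[],
  right: Row[],
  leftKey: string,
  rightKey: string,
  mode: JoinMode,
): Row[] {
  const index = new Map<string, Row>();
  for (const row of right) {
    const key = row[rightKey];
    if (key !== undefined && !index.has(key)) index.set(key, row);
  }
  const joined: Row[] = [];
  for (const row of left) {
    const key = row[leftKey];
    const match = key === undefined ? undefined : index.get(key);
    if (match) joined.push({ ...match, ...row });
    else if (mode === "left") joined.push({ ...row });
  }
  return joined;
}

const contacts: Row[] = [
  { email: "ada@example.com", companyId: "c1" },
  { email: "bob@example.com", companyId: "c9" }, // no matching company
];
const companies: Row[] = [{ id: "c1", companyName: "Analytical Engines Ltd" }];

console.log(joinOnKey(contacts, companies, "companyId", "id", "left").length); // 2
console.log(joinOnKey(contacts, companies, "companyId", "id", "inner").length); // 1
```

A left join suits onboarding flows where contacts without a company should still be imported; an inner join suits cases where an orphaned row indicates a data problem.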

## How it works — one element, four moments

1. **Drop in the element.** An Angular 21 custom element that works in React, Vue, Svelte or plain HTML. Attributes in, DOM events out — no SDK lock-in, no iframe.
2. **Your customer uploads anything.** CSV, TSV, XLSX, ODS, JSON, XML, PDFs with tables, scans and photos via OCR. Multiple files at once — joined on a key into one dataset.
3. **Mildport matches, they confirm.** Deterministic header + value matching with visible confidence, validation on every cell, inline fixes in the review grid, and your own cleaning hooks — client-side, before anything is delivered.
4. **You receive clean rows.** An HMAC-signed apply webhook with retries and a delivery audit — or an `onResults` callback in the browser. Either way: rows shaped exactly like your schema, with the mapping that produced them.

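```ts
import { defineImportSuiteElement, mountImportFlow } from '@capitality-io/mildport-widget';

// Register the custom element once (framework-agnostic).
await defineImportSuiteElement('mildport-import');

// Markup in your page or template; set license-key to your signed tenant key:
//   <mildport-import
//     api-base-url="https://imports.your-infra.example"
//     license-key="SIGNED_TENANT_KEY"
//   ></mildport-import>
const el = document.querySelector('mildport-import');
if (!el) throw new Error('<mildport-import> element not found');

mountImportFlow(el, {
  id: 'my-import',
  targets: [
    {
      id: 'contact',
      label: 'Contact',
      fields: [
        { key: 'person.email', label: 'Email', columnType: 'email' },
        { key: 'person.firstName', label: 'First name' },
      ],
    },
  ],
  delivery: { mode: 'browser' },
});

// Placeholder handler: replace with your own sync logic for applied rows.
const sync = (event: Event): void => console.log('import applied', event);
el.addEventListener('import-applied', sync); // or a signed apply webhook
```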

Published as `@capitality-io/mildport-widget` (React wrapper: `@capitality-io/mildport-react`) on GitHub Packages — restricted access, included with a pilot/license.
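
Step 3 above relies on deterministic header and value matching. As a rough illustration of how header names and value shapes can vote together and still give the same answer for the same input, consider the sketch below. The weights (0.6 and 0.4), the `Field` shape, the alias lists, and the function names are illustrative assumptions, not the engine's actual scoring.

```typescript
interface Field {
  key: string;
  aliases: string[];
  shape?: RegExp; // optional pattern that typical values should match
}
interface Match {
  key: string;
  score: number;
}

const normalize = (s: string): string => s.toLowerCase().replace(/[^a-z0-9]/g, "");

// Illustrative weights: 0.6 for a header/alias hit, plus up to 0.4 for the
// share of non-empty sample values matching the field's shape.
function scoreColumn(header: string, samples: string[], field: Field): number {
  const headerHit = field.aliases.some((a) => normalize(a) === normalize(header)) ? 0.6 : 0;
  const nonEmpty = samples.map((v) => v.trim()).filter((v) => v !== "");
  const shape = field.shape;
  const shapeShare =
    shape && nonEmpty.length > 0
      ? nonEmpty.filter((v) => shape.test(v)).length / nonEmpty.length
      : 0;
  return headerHit + 0.4 * shapeShare;
}

// Ranks fields by score; ties break on key so the order is reproducible.
function rankFields(header: string, samples: string[], fields: Field[]): Match[] {
  return fields
    .map((f) => ({ key: f.key, score: scoreColumn(header, samples, f) }))
    .sort((a, b) => b.score - a.score || a.key.localeCompare(b.key));
}

const fields: Field[] = [
  { key: "person.email", aliases: ["email", "e-mail"], shape: /^[^@\s]+@[^@\s]+\.[^@\s]+$/ },
  { key: "person.firstName", aliases: ["first name", "given name"] },
];

// An unhelpful header still maps to email because the values vote.
const ranked = rankFields("Column 3", ["ada@example.com", "bob@example.com", ""], fields);
console.log(ranked[0].key); // "person.email"
```

Because the score is a pure function of the header and the sampled values, it can be shown to the user as a visible confidence and reproduced later during an audit.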

## Why self-hosted — the quadrant the cloud importers can't take

Flatfile, OneSchema and Dromo are excellent — and they're someone else's cloud. Mildport is the import engine for the deals where that's the dealbreaker: regulated data, security reviews, procurement that reads the architecture diagram.

- **Runs in your infrastructure.** A self-contained Docker Compose stack — service, Mongo, decode sidecars, optional MinIO. Or deploy the engine into your own cluster with the Helm chart and self-host preflight checks.
- **Licenses verify offline.** Signed EdDSA license keys checked against a public key you deploy — no license server, no phone-home, air-gap friendly. Issue per-tenant keys with one CLI.
- **Explainable, reviewable.** Deterministic matchers with visible scores, reviewable normalization steps, webhook deliveries with an audit trail. When compliance asks "why this mapping?" — there's an answer.
- **AI as an accelerant, never a black box.** An optional AI judge re-ranks the low-confidence tail in hybrid mode, must cite evidence, and starts in shadow — it goes live per tenant only after its suggestions measurably agree with what your people accept. Bring your own model endpoint so AI traffic stays under your deployment controls, including in-VPC when required. Off by default, every decision logged.
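
The shadow-first rollout described above amounts to logging what the AI judge would have suggested, comparing that with what users actually confirmed, and promoting the judge only once the agreement is measurable. A minimal sketch, assuming a hypothetical decision-log shape and illustrative thresholds (50 samples, 90% agreement) that are not taken from the product:

```typescript
interface ShadowDecision {
  column: string;
  aiSuggestion: string; // field the AI judge would have picked
  acceptedField: string; // field the user actually confirmed
}

// Share of logged decisions where the AI suggestion matched the user's choice.
function agreementRate(log: ShadowDecision[]): number {
  if (log.length === 0) return 0;
  const agreed = log.filter((d) => d.aiSuggestion === d.acceptedField).length;
  return agreed / log.length;
}

// Hypothetical promotion rule: both thresholds are assumptions for illustration.
function readyToGoLive(log: ShadowDecision[], minSamples = 50, minAgreement = 0.9): boolean {
  return log.length >= minSamples && agreementRate(log) >= minAgreement;
}

const shadowLog: ShadowDecision[] = [
  { column: "Mail", aiSuggestion: "person.email", acceptedField: "person.email" },
  { column: "Vorname", aiSuggestion: "person.firstName", acceptedField: "person.firstName" },
  { column: "Ref", aiSuggestion: "person.lastName", acceptedField: "company.reference" },
  { column: "Tel", aiSuggestion: "person.phone", acceptedField: "person.phone" },
];

console.log(agreementRate(shadowLog)); // 0.75
console.log(readyToGoLive(shadowLog, 4, 0.9)); // false
```

Requiring a minimum sample size keeps a handful of lucky matches from switching the judge on for a tenant.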

## Shipped (from the repo's changelog, recent first)

- **ai 1.0 — AI mapping, deterministic-first, trust earned.** Evidence-gated judge re-ranks the low-confidence tail. Shadow-first per tenant, BYO model, decision log, AI picks tinted in the grid with rationale on hover.
- **Cutover — Capitality deleted its own importer.** Our CRM consumes the standalone Mildport service over REST as tenant #1.
- **v0.2.x — Multi-file imports.** Join contacts + companies on a key pair, left or inner, with join-health feedback.
- **v0.2.6 — Dark mode + theme polish.** Spartan-aligned `--mildport-*` tokens with light/dark presets.
- **v0.2.2 — OCR.** Image-only PDFs and receipt photos become reviewable text, tables, or fact cards with confidence signals and a mandatory extract-confirm step.
- **v0.2.x — PDF ingestion, both kinds.** Embedded tables extracted server-side; document-style PDFs (ZUGFeRD invoices) map grounded facts with a pdf.js preview.
- **v0.2.0 — Reference resolution.** Imported rows resolve against existing records — status chips, ambiguity pickers, graph apply, SSRF-guarded server datasets.
- **v0.1.0 — Value-based detection.** Value shapes (emails, IBANs, dates, phones) vote alongside header names.
- **Engine — self-host stack + signed everything.** Docker Compose with Mongo, decode sidecars, optional MinIO; HMAC-signed deliveries with audit; EdDSA offline licenses with revocation lists and per-plan entitlements.
- **Packages — Angular + React published.** `@capitality-io/mildport-widget` and `@capitality-io/mildport-react` on GitHub Packages, semver-pinned by hosts.

## Roadmap (published openly — NOT yet generally available)

| Status    | Item                                                      |
| --------- | --------------------------------------------------------- |
| Up next   | Smart joins + server-side XLSX joins                      |
| Up next   | Per-tenant AI config for the hosted tier                  |
| In design | AI transforms as reviewable diffs — never silent rewrites |
| Planned   | Vue + vanilla wrappers, headless REST mode                |
| Planned   | Helm hardening + air-gap guide                            |
| Planned   | Auditor-ready decision log for the whole import           |
| Planned   | Web-worker validation for huge sheets                     |
| Planned   | Self-serve licensing + sandbox tier                       |
| Horizon   | Pipelines: recurring imports (SFTP, S3, inbound email)    |
| Horizon   | MCP server — agent-drivable imports                       |

## Contact

- Pilots & licensing: licensing@mildport.com
- Security reports: security@mildport.com (see https://mildport.com/.well-known/security.txt)
- The live demo at https://mildport.com/ runs in-browser; uploaded files never leave your machine.

© 2026 Capitality. Proprietary software — all rights reserved.
