The product

One platform. Every kind of data.

Tabular, time-series, multi-table relational, free-text, geospatial — Doppelset learns it all and synthesizes a faithful clone.

The shape of a doppel

Three editions

One platform. Three ways to land it in your team.

Buy the SDK and ship in an afternoon. Bring Studio for the rest of the org. Run Vault when regulators are watching.

Doppel Studio

The notebook for non-engineers.

  • Point-and-click data connections
  • One-screen quality preview
  • Compliance officer mode (signed reports)
  • Org-wide governance and lineage

Doppel Engine

The Python SDK and REST API.

  • doppelset 0.42 — pip install doppelset
  • Streaming generation for billions of rows
  • Composable hooks for evals & guardrails
  • Plug into Airflow / dbt / Databricks

Doppel Vault

Self-hosted, air-gapped, regulated.

  • Runs in your VPC or on your on-prem cluster
  • Hardware-backed key management
  • BYOK / HSM / KMIP integrations
  • Tenants, RBAC, audit logs

Coverage

Every shape of data your team works with.

From a 200-row CSV to a 4-billion-row Parquet warehouse. From IoT to ICD-10. Doppelset learns it all.

Tabular

Flat tables, wide or long. Doppelset learns mixed continuous, categorical, ordinal and high-cardinality columns simultaneously.

CSV / Parquet / Postgres / Snowflake

Relational

Star schemas, snowflakes, parent/child joins. Foreign keys, cardinalities and check constraints are preserved end-to-end.

DDL import / Foreign keys / Cycles
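
A minimal sketch of what "preserved" can be checked against on any synthetic parent/child pair: every foreign key resolves to a parent, and the children-per-parent counts can be compared with the source. This is not part of the Doppelset SDK; the function names and the sample rows are hypothetical.

```python
from collections import Counter

def orphaned_keys(parent_rows, child_rows, parent_key, foreign_key):
    """Return the foreign-key values in child_rows that have no matching parent."""
    parent_ids = {row[parent_key] for row in parent_rows}
    return sorted({row[foreign_key] for row in child_rows} - parent_ids)

def children_per_parent(parent_rows, child_rows, parent_key, foreign_key):
    """Count child rows per parent, including parents with zero children."""
    counts = Counter(row[foreign_key] for row in child_rows)
    return {row[parent_key]: counts.get(row[parent_key], 0) for row in parent_rows}

# Hypothetical synthetic output: two customers, three transactions.
customers = [{"customer_id": 1}, {"customer_id": 2}]
transactions = [
    {"txn_id": 10, "customer_id": 1},
    {"txn_id": 11, "customer_id": 1},
    {"txn_id": 12, "customer_id": 2},
]

print(orphaned_keys(customers, transactions, "customer_id", "customer_id"))        # []
print(children_per_parent(customers, transactions, "customer_id", "customer_id"))  # {1: 2, 2: 1}
```

An empty orphan list means referential integrity holds. Comparing the count distribution with the source data shows whether cardinalities survived.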

Time-series

Hourly, daily, irregular. Seasonality, drift, regime shifts and rare anomalies are modelled together — not as add-ons.

Parquet / Kafka / Influx

Geospatial

Latitudes, postal codes, polygons, trajectories. Synthesised so density and movement patterns survive, but no person does.

GeoJSON / WKT / H3

Free-text

Short fields (addresses, ticket summaries) and longer documents (support transcripts, medical notes) — language-aware, PII-stripped.

32 languages

Event streams

Clickstreams, telemetry, transaction logs. Doppelset preserves session structure, dwell time, and conversion funnels.

JSON Lines / Avro

Under the hood

The architecture of a Doppelset run

Five stages, one job: produce statistically faithful data without ever emitting a real record in the output.

  1. Profile

    Doppelset inspects the schema, infers types, detects PII, and maps relationships.

  2. Train

    A hybrid model (tabular transformer + diffusion) learns the joint distribution of columns and rows.

  3. Govern

    Differential-privacy noise is added to gradients. Memorisation tests run automatically.

  4. Sample

    New rows are sampled, reshaped to your schema, and streamed to your sink of choice.

  5. Report

    A signed quality + privacy report is attached to every output so your auditor can verify the run.
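
The Govern step says noise is added to gradients. As a rough illustration, the sketch below shows the generic clip-then-noise recipe used in differentially private training (often called DP-SGD). It is an assumption that Doppelset follows this general pattern; the function names, the toy gradients, and the parameter values are hypothetical.

```python
import math
import random

def clip_to_norm(grad, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_to_norm(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm  # noise std scales with the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [n / len(clipped) for n in noisy]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]  # toy per-example gradients
print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single record can move the model, and the noise hides what remains. Turning a noise multiplier into an ε value needs a privacy accountant, which this sketch leaves out.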

from doppelset import Doppelset

client = Doppelset()

# Train on a 3-table relational schema
twin = client.train(
    source={
        "customers": "postgres://prod/customers",
        "transactions": "postgres://prod/transactions",
        "tickets": "postgres://prod/tickets",
    },
    relations="auto",          # learn the foreign keys
    privacy={"epsilon": 1.0},  # mathematical privacy budget
)

# Sample a 10x scaled-up version
out = twin.sample(scale=10, balance={"is_fraud": 0.5})

# Inspect the run
print(out.report.fidelity, out.report.epsilon)

Real example

A three-table doppel in one notebook cell.

The Engine learns the joint distribution of three tables (with foreign keys) and lets you scale, balance, and re-sample in one call.

3 tables / 4 FK relations / ε = 1.0 / 10× scale-up / fraud-rebalanced
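
To see what a 10× scale-up with a 50% fraud share implies, here is a back-of-the-envelope sketch. It assumes that scaling multiplies the total row count and that the balance value is the target share of positive rows. That reading of the parameters is an assumption, and `rebalance_plan` and the source counts are hypothetical.

```python
def rebalance_plan(n_positive, n_negative, scale, positive_share):
    """Plan row counts for a scaled, rebalanced sample of a binary label."""
    if n_positive <= 0 or n_negative <= 0:
        raise ValueError("both classes need at least one source row")
    if not 0.0 <= positive_share <= 1.0:
        raise ValueError("positive_share must be between 0 and 1")
    total = round((n_positive + n_negative) * scale)
    positives = round(total * positive_share)
    negatives = total - positives
    return {
        "positive_rows": positives,
        "negative_rows": negatives,
        # How many synthetic rows are needed per source row of each class.
        "positive_amplification": positives / n_positive,
        "negative_amplification": negatives / n_negative,
    }

# Hypothetical source: 1,000 transactions, 2% flagged as fraud.
plan = rebalance_plan(n_positive=20, n_negative=980, scale=10, positive_share=0.5)
print(plan["positive_rows"], plan["negative_rows"])  # 5000 5000
print(plan["positive_amplification"])                # 250.0
```

With a rare class, rebalancing asks the generator for far more minority rows than it ever saw, so the per-class quality figures in the report deserve a close look.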

Try a doppel now

Regulated industries

Built for the teams that get audited.

Doppelset is the only synthetic-data platform with signed quality and privacy reports, deterministic re-runs, and a tamper-evident run log. We help you say yes to data sharing — with a receipt.
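
For readers wondering what "tamper-evident" means in practice, one common construction is a hash chain: each log entry stores the hash of the previous one, so editing any past entry breaks every later link. The sketch below illustrates that idea with the Python standard library. It is not Doppelset's log format; the entry fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash, payload):
    """Hash the previous link together with a canonical JSON payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_entry(log, payload):
    """Append payload to log, linking it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev_hash, "payload": payload,
                "hash": entry_hash(prev_hash, payload)})

def verify(log):
    """Recompute every link; return False if any entry was altered or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry_hash(prev_hash, entry["payload"]) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

run_log = []
append_entry(run_log, {"step": "train", "epsilon": 1.0})
append_entry(run_log, {"step": "sample", "scale": 10})
print(verify(run_log))                  # True
run_log[0]["payload"]["epsilon"] = 8.0  # tamper with the first entry
print(verify(run_log))                  # False
```

A chain like this only detects edits if the latest hash is also signed or stored somewhere the editor cannot reach.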

SOC 2 Type II
ISO 27001
GDPR Art. 25
HIPAA Safe Harbor
DORA-aligned
EU AI Act ready

Try it now

Ship faster. Stop arguing with legal.

Generate your first 100,000 synthetic rows in the next ten minutes. No credit card.