The feature shelf

Everything in Doppelset, on one page.

Pick a row. Open a feature. Or skim the lot — we hide nothing, we oversell nothing. If a checkmark isn't here, the feature isn't built yet.

Default-on

Privacy controls

ε ∈ [0.1, 12]

Differential privacy

Add calibrated noise to gradients during training. Pick your ε per dataset; we'll report the budget consumed and warn before you exceed it.
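
For readers who want to see what "calibrated noise on gradients" usually means, here is a minimal sketch of one DP-SGD style aggregation step. It assumes the standard recipe (clip each per-example gradient, sum, add Gaussian noise, average); this page does not say which mechanism Doppelset uses, and the function name and parameters are hypothetical. Turning the noise multiplier into a consumed ε needs a privacy accountant, which is not shown.

```python
import math
import random


def privatize_gradients(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient to clip_norm, sum, add Gaussian noise, average.

    Hypothetical helper for illustration only; not part of any Doppelset API.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale down only gradients whose L2 norm exceeds the clipping bound.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            total[i] += g * scale
    # Noise is calibrated to the clipping bound, which caps each example's influence.
    sigma = noise_multiplier * clip_norm
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]


# Example call with a seeded generator so runs are repeatable.
noisy = privatize_gradients([[3.0, 4.0], [0.1, -0.2]], 1.0, 1.1, random.Random(0))
```

A larger noise multiplier spends less privacy budget per step at the cost of noisier updates, which is the trade-off behind picking ε per dataset.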

60+ entities

Direct identifier vault

60+ entity types across 32 languages: names, emails, IBANs, ICD-10 codes, GPS coordinates, NIE numbers… stripped before the model sees them.

k ≥ 30

Quasi-identifier protection

k-anonymity, l-diversity, and t-closeness checks run on every output. Records that look unique against background knowledge get re-sampled.
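
As an illustration of the simplest of these checks, the sketch below flags rows whose quasi-identifier combination appears fewer than k times. It covers k-anonymity only (l-diversity and t-closeness are not shown), and the function and column names are assumptions, not Doppelset's implementation.

```python
from collections import Counter


def k_anonymity_violations(rows, quasi_identifiers, k):
    """Return the rows whose quasi-identifier combination occurs fewer than k times."""

    def group_key(row):
        return tuple(row[col] for col in quasi_identifiers)

    # Count how many rows share each quasi-identifier combination.
    counts = Counter(group_key(row) for row in rows)
    return [row for row in rows if counts[group_key(row)] < k]


# Hypothetical toy table: the last row is unique on (zip, age).
rows = [
    {"zip": "28001", "age": 34},
    {"zip": "28001", "age": 34},
    {"zip": "08002", "age": 51},
]
flagged = k_anonymity_violations(rows, ["zip", "age"], k=2)
```

In a pipeline like the one described above, flagged rows would be the candidates for re-sampling.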

12 attacks

Memorisation test suite

Every generation runs a battery of attribute-disclosure and membership-inference attacks. The signed report includes pass/fail per attack.

Hybrid engine

Modelling

core

Tabular transformer

A masked-attention model that learns mixed categorical, ordinal and continuous columns jointly — no one-hot blow-up, no embedding hand-tuning.

default

Diffusion sampler

Iterative denoising at sample time reproduces distribution tails and rare regimes more faithfully than GAN-only approaches. Selected automatically for numeric-heavy tables.

multi-table

Relational coordinator

A second-pass model enforces foreign-key cardinality, parent/child structures, and join distributions across multi-table generations.

new

Time-series module

Captures seasonality, drift, change-points, and rare anomalies in a single state-space model. Generates irregularly sampled series too.

You can trust

Quality & evals

0–100

Statistical fidelity score

A single number, 0 → 100, that summarises 80+ tests: marginals, correlations, conditional distributions, downstream ML utility.

Distribution diff

Per-column overlap plots with KL divergence, Jensen-Shannon distance, and Wasserstein distance. Inline annotations explain why a column drifted.
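
To make two of these metrics concrete, here is a small standard-library sketch: Jensen-Shannon distance between two binned columns, and Wasserstein-1 distance between two equal-sized numeric samples. Both follow the textbook definitions; the function names are hypothetical and this is not how Doppelset computes its report.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # KL divergence in bits; terms with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = (kl(p, mixture) + kl(q, mixture)) / 2
    return math.sqrt(max(0.0, divergence))


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-sized one-dimensional samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles samples of equal size")
    # With equal sizes, the optimal transport plan pairs sorted values in order.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


real_hist = [0.5, 0.3, 0.2]
synth_hist = [0.4, 0.4, 0.2]
drift = js_distance(real_hist, synth_hist)
```

Jensen-Shannon distance is bounded between 0 and 1 in base 2, which makes it easy to compare across columns; Wasserstein distance is expressed in the column's own units.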

TSTR

Utility (TSTR) benchmark

Train-on-Synthetic, Test-on-Real benchmark across 14 model families. Pick a task — classification, regression, churn — get a number.
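
The TSTR idea fits in a few lines: fit a model on synthetic rows only, then score it on held-out real rows. The sketch below uses a toy nearest-centroid classifier as a stand-in for the model families mentioned above; every name in it is hypothetical.

```python
def fit_centroids(rows, labels):
    """Compute the mean feature vector per class."""
    sums, counts = {}, {}
    for features, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label of the closest class centroid (squared Euclidean distance)."""

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def tstr_accuracy(synth_rows, synth_labels, real_rows, real_labels):
    """Train on synthetic data only, then report accuracy on real data."""
    model = fit_centroids(synth_rows, synth_labels)
    hits = sum(predict(model, x) == y for x, y in zip(real_rows, real_labels))
    return hits / len(real_rows)
```

Comparing this number with the same model trained on real data shows how much utility the synthetic set preserves.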

audit-ready

Signed run receipts

Every output ships with a JSON receipt cryptographically signed by Doppelset. Hand it to your auditor, paste it in your DPIA.
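
A rough sketch of how a signed JSON receipt can be checked. This page does not state the signature scheme or the receipt fields, so the example assumes HMAC-SHA256 over canonical JSON as a standard-library stand-in; a receipt meant for third-party auditors would more plausibly use an asymmetric signature verified with a public key. Field names and the key are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(receipt):
    """Serialise with sorted keys and fixed separators so the bytes are stable."""
    return json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_receipt(receipt, key):
    """Return a hex HMAC-SHA256 tag over the canonical receipt bytes."""
    return hmac.new(key, canonical_bytes(receipt), hashlib.sha256).hexdigest()


def verify_receipt(receipt, signature, key):
    """Constant-time comparison of the expected and supplied signatures."""
    return hmac.compare_digest(sign_receipt(receipt, key), signature)


# Hypothetical receipt fields, for illustration only.
receipt = {"run_id": "example-run", "rows": 100000, "epsilon": 3.0}
signature = sign_receipt(receipt, b"demo-key")
```

Canonical serialisation matters: without sorted keys, two equivalent receipts could produce different signatures.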

Pipe-shaped

Integration

py 3.9+

Python SDK

pip install doppelset · type-checked · async-first · streams sample() output so 4 billion rows never have to live in RAM.
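
The streaming pattern itself is plain Python. The sketch below does not use the doppelset package, whose API is not documented on this page; it only shows, with a hypothetical helper, how a generator can hand rows downstream in bounded batches so the full output is never held in memory.

```python
from itertools import islice


def iter_batches(row_source, batch_size):
    """Yield lists of at most batch_size rows without materialising the source."""
    iterator = iter(row_source)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Any iterable works as the source; a generator keeps memory use flat.
row_stream = ({"id": i} for i in range(10))
for batch in iter_batches(row_stream, 4):
    pass  # write the batch to disk or a warehouse here
```

Peak memory is bounded by the batch size rather than by the total row count.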

TypeScript SDK

@doppelset/sdk on npm. Works in Node, Bun, Deno, and Cloudflare Workers. ESM first.

OpenAPI

REST API

OpenAPI 3.1, OAuth 2.1, idempotency keys, retries, server-sent events for long-running jobs.

12 integrations

Connectors

First-class connectors for Snowflake, Databricks, BigQuery, Redshift, Postgres, MySQL, S3, GCS, Airflow, dbt, Kafka, MongoDB.

Enterprise

Governance

Okta + 4

SSO + SCIM

Okta, Entra, Google, JumpCloud, OneLogin. Automatic provisioning, role propagation, just-in-time access.

Roles & projects

Granular RBAC. Synth-only seats for analysts, viewer seats for compliance officers, admin for platform owners.

SIEM

Audit log

Every generation, every config change, every download — append-only, hash-chained, exportable to your SIEM.
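
"Hash-chained" means each log entry commits to the one before it, so editing or deleting an earlier entry breaks every later hash. A minimal sketch of that idea, assuming SHA-256 and JSON-serialised events; the entry layout and function names are hypothetical and not Doppelset's log format.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, event):
    """Hash the previous entry's hash together with this event's canonical JSON."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log, event):
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({"event": event, "prev_hash": prev_hash, "hash": entry_hash(prev_hash, event)})


def verify_chain(log):
    """Recompute every hash in order; any edit to an earlier entry is detected."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

An exported copy of the log can be re-verified offline with the same function, which is what makes the chain useful to an auditor.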

EU · US · VPC

Data residency

EU (Madrid, Frankfurt), US (Virginia, Oregon), or your own VPC. Data never leaves the region you pick.

Try it now

Ship faster. Stop arguing with legal.

Generate your first 100,000 synthetic rows in the next ten minutes. No credit card.