The challenge
Atlas Health's data science team sat on roughly nine years of admissions, lab results, and imaging metadata across ten hospitals. Each new research collaboration meant a new data-sharing agreement, a new IRB cycle, and a new pseudonymisation pass by their compliance team. The median time to a cohort was eleven weeks.
Worse, every variant they generated drifted from production: masking erased rare conditions, k-anonymity collapsed long tails, and the resulting datasets were no longer useful for any model that needed to learn from outliers. The team was caught between two failure modes — too risky to share, or too lossy to learn from.
The solution
Atlas deployed Doppelset Vault inside their own VPC, behind their existing data perimeter. Doppelset ingested the full nine-year admissions schema (47 tables, 312 foreign keys, 8 free-text columns) and trained a relational doppel with ε = 1.4.
Every Friday afternoon, the platform now refreshes a synthetic mirror of the previous week's admissions. Researchers query the synthetic dataset directly through Atlas's existing JupyterHub. No PHI ever leaves the cluster — the synthetic mirror does.
Their compliance team built a one-pager for new collaborators: a hash of the synthetic dataset, a signed quality + privacy report from Doppelset, and a copy of the IRB blanket approval. Time from first email to first query: 14 days, end-to-end.
We used to spend more time pseudonymising than analysing. Doppelset turned that on its head — researchers get realistic data on Monday, and our DPO sleeps better.
The results
Cohorts in days, not weeks
Median time from researcher request to first cohort dropped from 11 weeks to 14 days — an 80% reduction, mostly recovered from the legal and pseudonymisation passes that are now skipped entirely.
Models that still know the long tail
An internal sepsis-prediction model retrained on the synthetic mirror sat within 0.4 AUC of the production version — and crucially, preserved the rare-condition signal that previous anonymisation passes had erased.
An audit trail that defends itself
Every weekly refresh ships with a signed quality + privacy receipt. Atlas's DPO now files a single quarterly report instead of per-project DPIAs.
Real PHI, never moved
Zero gigabytes of production PHI have crossed Atlas's data perimeter since deployment. The platform is fully VPC-resident; Doppelset sees telemetry, not data.
What's running
Next story
Norden Bank — banking & fraud