May 8, 2026 · Perspective · 4 min read

Privacy-safe by Construction

Patient records carry protected information that cannot be open-sourced. Synthetic dialogue built by reasoning models keeps full control over clinical content without touching real data.

Meddies Research

Clinical AI research at Meddies


The best clinical conversational data would be real doctor-patient transcripts. You cannot use them. Medical records carry protected health information (PHI), and the law rightly keeps them closed. So teams end up with capable models and no conversational data to train them on.

Why cleanup after the fact falls short

The usual workaround is to collect whatever data is loosely available and scrub it afterwards. De-identification is hard and never perfect, and a single missed detail can re-identify a patient. Building a privacy program on top of real records means forever trying to prove that nothing sensitive leaked, and no cleanup can fully guarantee that.

When real records are unavoidable, de-identification is still the tool, and it is what Meddies PII is built for. But when the data can be generated, we take the opposite path. Instead of removing private information after the fact, we never introduce it in the first place.

How we keep synthetic data accurate

Meddies Consultant is generated by reasoning models. Every persona, every symptom, every turn of dialogue is synthetic. There is no real patient behind any record, so there is nothing to de-identify and nothing to leak.

That does not mean the data is loose. Careless synthetic generation produces fluent nonsense: agreeable, medically shallow, and structurally wrong. We avoid that by constraining the generator within clinical frameworks and passing its output through a review gate that filters for clinical safety. Because no real patient is ever involved, privacy holds by construction and needs no further work, so the effort goes into accuracy.
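The generate-then-gate loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Meddies Consultant pipeline: `SyntheticPersona`, `FRAMEWORK`, `generate_dialogue`, and `review_gate` are hypothetical names, the "generator" is a fixed template standing in for a reasoning model, and the gate checks two toy rules (the doctor asks about every red-flag symptom for the complaint, and no turn asserts a diagnosis outright). What the sketch does show is the structural point: every record is built from a synthetic persona, so there is no real-patient field to scrub, and only dialogues that pass the gate are kept.

```python
import random
from dataclasses import dataclass

# Hypothetical persona schema: every field is generated, none is copied from a record.
@dataclass(frozen=True)
class SyntheticPersona:
    age: int
    sex: str
    chief_complaint: str
    red_flags: tuple[str, ...]  # symptoms a clinician must screen for with this complaint

# Hypothetical "clinical framework": complaint -> red-flag symptoms to ask about.
FRAMEWORK = {
    "chest pain": ("shortness of breath", "pain spreading to the arm"),
    "headache": ("sudden severe onset", "neck stiffness"),
}

def sample_persona(rng: random.Random) -> SyntheticPersona:
    """Draw a persona from the framework; nothing here refers to a real patient."""
    complaint = rng.choice(sorted(FRAMEWORK))
    return SyntheticPersona(
        age=rng.randint(18, 85),
        sex=rng.choice(["female", "male"]),
        chief_complaint=complaint,
        red_flags=FRAMEWORK[complaint],
    )

def generate_dialogue(persona: SyntheticPersona, skip_screening: bool) -> list[tuple[str, str]]:
    """Stand-in for a reasoning-model call; returns (speaker, text) turns.

    `skip_screening` simulates a careless generation that omits red-flag questions.
    """
    turns = [("patient", f"I am {persona.age} years old and my main complaint is {persona.chief_complaint}.")]
    if not skip_screening:
        for flag in persona.red_flags:
            turns.append(("doctor", f"Have you noticed any {flag}?"))
            turns.append(("patient", "No, I have not."))
    turns.append(("doctor", "Let's arrange an examination to find the cause."))
    return turns

def review_gate(persona: SyntheticPersona, turns: list[tuple[str, str]]) -> bool:
    """Toy safety filter: every red flag must be asked about, and no turn may assert a diagnosis."""
    doctor_text = " ".join(text.lower() for speaker, text in turns if speaker == "doctor")
    screened = all(flag in doctor_text for flag in persona.red_flags)
    overconfident = "you definitely have" in doctor_text
    return screened and not overconfident

def build_dataset(n: int, seed: int = 0) -> list[dict]:
    """Generate n candidate dialogues and keep only those that pass the gate."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        persona = sample_persona(rng)
        # In this toy setup, about 30% of generations are "careless".
        turns = generate_dialogue(persona, skip_screening=rng.random() < 0.3)
        if review_gate(persona, turns):  # rejected dialogues are dropped, not repaired
            kept.append({"persona": persona, "turns": turns})
    return kept
```

Calling `build_dataset(100)` returns only the dialogues that passed the gate; the careless generations are discarded. Rejecting rather than repairing keeps the gate a pure filter, which makes it easy to audit: anything in the output satisfied every rule.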

Why privacy by construction is stronger

A cleanup pipeline is only as good as its worst miss. A pipeline that never touches real data has no miss to make. That is why privacy by construction is stronger than privacy by cleanup.
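The "worst miss" argument is a compounding-risk calculation. As a back-of-the-envelope sketch, assume (hypothetically) that a cleanup pipeline misses an identifying detail in each record independently with probability p. The chance that a corpus of N records contains at least one miss is then 1 - (1 - p)^N, which climbs toward certainty as the corpus grows even when p is small; for example, p = 0.001 over 10,000 records gives a probability above 0.9999. A pipeline that never ingests real records has p = 0 by construction. The miss rate below is illustrative, not a measurement of any real de-identification system, and `prob_any_miss` is a hypothetical helper.

```python
def prob_any_miss(p_miss_per_record: float, n_records: int) -> float:
    """Probability that at least one of n records leaks, assuming independent misses."""
    # Complement of "every record was cleaned perfectly": 1 - (1 - p)^n.
    return 1.0 - (1.0 - p_miss_per_record) ** n_records

P_MISS = 0.001  # hypothetical per-record miss rate, for illustration only
for n in (100, 1_000, 10_000):
    print(f"{n:>6} records: P(at least one leak) = {prob_any_miss(P_MISS, n):.4f}")

# Synthetic-only pipeline: no real records enter, so there is nothing to miss.
print(prob_any_miss(0.0, 10_000))
```

The independence assumption is a simplification; correlated misses (the same rule failing on the same kind of detail) change the exact numbers but not the direction of the argument.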

For Vietnamese hospitals, where patient-data handling is both a legal obligation and a question of trust, the difference between cleanup and construction decides whether a model can be deployed at all. A dataset built by construction can be opened, inspected, and shared without putting a single real patient at risk, and that is what makes it usable inside a hospital.