May 8, 2026 · Methodology · 4 min read

Privacy-safe by Construction

Patient records carry protected information that cannot be open-sourced. Synthetic dialogue generated by reasoning models gives control over clinical content without touching real data.

Meddies Research

Clinical AI research at Meddies

The best clinical conversational data would be real doctor-patient transcripts. You cannot use them. Medical records carry protected personal information, and the law rightly keeps them closed. So engineers end up sitting on capable algorithms with no conversational data to train them on.

The wrong way around the wall

The usual workaround is to scrape whatever is loosely available and scrub it afterwards. De-identification is hard and never perfect, and a single missed detail can be enough to re-identify a patient. Building a privacy program on top of real records means spending forever proving a negative: that nothing sensitive leaked.
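
To make the "single missed detail" concrete, here is a minimal sketch of a pattern-based scrubber. Everything in it is an assumption for illustration: the `scrub` function, the two patterns, and the invented note are hypothetical and do not describe any real de-identification tool or any real record. The point is only that a scrubber removes what it was written to anticipate and nothing else.

```python
import re

# Hypothetical scrubber: it masks only the patterns its authors anticipated.
PATTERNS = [
    re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),                  # phone numbers such as 555-010-1234
    re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+[A-Z][a-z]+\b"),   # a title followed by a surname
]

def scrub(text: str) -> str:
    """Replace every match of the known patterns with a placeholder."""
    for pattern in PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

# Invented note, written for this example. No real patient is involved.
note = (
    "Mr. Tran, phone 555-010-1234, is the only lighthouse keeper "
    "in his district and presented with chest pain."
)

cleaned = scrub(note)
print(cleaned)
```

The name and the phone number are masked, but the occupation and location survive, and together they may still point to one person. No pattern list can enumerate every such combination in advance, which is why cleanup can only reduce the risk.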

We took the opposite path. Instead of removing private information after the fact, we never introduce it in the first place.

Synthetic, but disciplined

meddies-consultant is generated by reasoning models from the ground up. Every persona, every symptom, every turn of dialogue is synthetic. There is no real patient behind any record, so there is nothing to de-identify and nothing to leak.

That does not mean the data is loose. Synthetic generation done carelessly produces fluent nonsense: agreeable, medically shallow, structurally wrong. We avoid that by boxing the generator inside clinical frameworks and a review gate that filters for clinical safety. The control we gain over accuracy is the upside of synthesis. The privacy guarantee is the floor it stands on.
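
As a rough sketch of what "boxing the generator in" could look like, the snippet below filters candidate dialogues through a structural check and a safety check. It is an assumption-laden illustration, not the actual meddies-consultant pipeline: `REQUIRED_SECTIONS`, `UNSAFE_PHRASES`, `Dialogue`, `passes_review`, and `review_gate` are all hypothetical names, and the candidate dialogues are invented. A real review gate would rely on clinical expertise rather than a phrase list.

```python
from dataclasses import dataclass

# Hypothetical framework: sections every consultation is expected to cover.
REQUIRED_SECTIONS = ("chief_complaint", "history", "assessment", "plan")

# Hypothetical safety rule: phrases a reviewer would reject outright.
UNSAFE_PHRASES = ("stop taking your medication", "no need to see a doctor")

@dataclass
class Dialogue:
    persona: str
    sections: dict  # section name -> synthetic text

def passes_review(d: Dialogue) -> bool:
    # Structural check: every required section is present and non-empty.
    if any(not d.sections.get(name, "").strip() for name in REQUIRED_SECTIONS):
        return False
    # Safety check: no rejected phrase appears anywhere in the dialogue.
    text = " ".join(d.sections.values()).lower()
    return not any(phrase in text for phrase in UNSAFE_PHRASES)

def review_gate(candidates):
    """Keep only the candidates that pass both checks."""
    return [d for d in candidates if passes_review(d)]

# Invented candidates standing in for generator output.
candidates = [
    Dialogue("synthetic persona A", {
        "chief_complaint": "Cough for three days.",
        "history": "No fever, no known allergies.",
        "assessment": "Likely viral upper respiratory infection.",
        "plan": "Rest, fluids, return if symptoms worsen.",
    }),
    Dialogue("synthetic persona B", {   # structurally incomplete
        "chief_complaint": "Headache.",
        "history": "Started this morning.",
    }),
    Dialogue("synthetic persona C", {   # complete, but the advice is unsafe
        "chief_complaint": "Dizziness.",
        "history": "Takes blood pressure tablets.",
        "assessment": "Probably nothing.",
        "plan": "Stop taking your medication and rest.",
    }),
]

accepted = review_gate(candidates)
print([d.persona for d in accepted])
```

Only the first candidate passes. The design choice worth noting is that rejection is cheap here: because every record is synthetic, a discarded dialogue costs nothing but compute, so the gate can afford to be strict.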

What "by construction" buys you

Privacy by construction is stronger than privacy by cleanup. A cleanup pipeline is only as good as its worst miss. A construction that never touches real data has no miss to make.

For Vietnamese hospitals, where patient-data handling is both a legal obligation and a trust question, that distinction matters. The dataset that trains a model can be opened, inspected, and shared without putting a single real patient at risk. That is the point.