Why an AI Note Tool Should Sound Like You — And How That Actually Works
Notes from one of our early pilot physicians, edited and published with permission. — The ActiveScribe Team
Every doctor who has tried more than one AI scribe has noticed the same thing: the notes all sound the same. Same headers, same hedging language, same flat institutional voice. You can read three notes from three different vendors and not be able to tell them apart. None of them sounds the way you actually write.
This matters more than people realize. A clinical note isn't just a record — it's communication with future-you, with consulting colleagues, with the medical-legal apparatus that may read it years from now. If the note doesn't sound like you wrote it, you don't trust it, and if you don't trust it, you rewrite it, and now the AI has saved you nothing.
I want to explain — in plain English, from a non-engineer's perspective — how a good AI scribe can match your voice without doing the thing every doctor is rightfully terrified of: training a model on patient data.
The wrong way: fine-tuning on your patients
The first thing most people imagine when they hear "personalized AI" is fine-tuning. You give the AI a thousand of your old notes, it adjusts its weights, and now it writes like you. This is the technique behind most consumer AI personalization, and it is exactly the wrong technique for healthcare.
Fine-tuning means your patients' words become part of the model. Even if the model isn't shared, even if the training data is encrypted, you've crossed a line: a system that learns from PHI is fundamentally different from one that doesn't, both legally and ethically. It's also very hard to undo. What happens when a patient asks you to delete their record and the deletion has to propagate into an opaque trained model? You don't want to be the one explaining that to a privacy commissioner.
The right way: exemplars, lexicons, and runtime context
ActiveScribe takes a different approach. There's no fine-tuning on your data. Instead, the system uses three layers of context that get injected at the moment a note is generated:
The first layer is style. When you onboard, you upload a small number of your old notes — anonymized, scrubbed of patient identifiers — as examples. The system extracts your style from those examples: how dense your abbreviations are, how you structure section headers, whether you're terse or verbose, how you hedge uncertainty. This is called exemplar-based prompting in the technical literature, and it's a fundamentally different technique from fine-tuning. Your examples are shown to the model when it writes your note, not baked into the model's weights.
The second layer is your specialty. A psychiatric note has different requirements from an emergency department note, which in turn has different requirements from a pediatric well-baby visit. ActiveScribe maintains a set of specialty-specific rules — what sections to include, what never to omit, which timestamps matter — and overlays them on top of your style. This means a pediatrician's notes follow pediatric conventions while still sounding like that specific pediatrician.
The third layer is non-negotiable safety. No matter what your style says or what specialty you practice, certain rules can't be overridden: no hallucinated symptoms, no medication errors, no fabricated history. These rules sit at the top of the prompt stack, and a separate set of validators checks the generated note against them before you ever see it.
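For the technically curious, here is a minimal sketch in Python of what "injected at the moment a note is generated" could look like. This is my illustration, not ActiveScribe's actual code: the names (SAFETY_RULES, build_prompt, draft_note) are invented, and the real system is more involved. The shape is what matters: the three layers are assembled into a prompt at generation time, the same base model serves every clinician, and validators run before the note reaches you.

```python
# Hypothetical sketch of runtime prompt assembly. Every name here is made up;
# it illustrates one idea: customization lives in the prompt the model sees,
# not in the model's weights.
from typing import Callable, Sequence

SAFETY_RULES = (
    "Do not mention any symptom, medication, or history "
    "that does not appear in the transcript."
)

def build_prompt(
    style_exemplars: Sequence[str],  # anonymized past notes uploaded at onboarding
    specialty_rules: Sequence[str],  # e.g. pediatric sections that must never be omitted
    transcript: str,                 # the visit being documented
) -> str:
    """Stack the three context layers, safety on top, then hand over the transcript."""
    return "\n\n".join([
        "SAFETY (cannot be overridden):\n" + SAFETY_RULES,
        "SPECIALTY CONVENTIONS:\n" + "\n".join(specialty_rules),
        "STYLE EXAMPLES FROM THIS CLINICIAN:\n" + "\n---\n".join(style_exemplars),
        "TRANSCRIPT OF THIS VISIT:\n" + transcript,
        "Write the note for this visit in the clinician's usual style.",
    ])

def draft_note(
    generate: Callable[[str], str],                    # the same base model for everyone
    validators: Sequence[Callable[[str, str], None]],  # each raises on a safety violation
    style_exemplars: Sequence[str],
    specialty_rules: Sequence[str],
    transcript: str,
) -> str:
    prompt = build_prompt(style_exemplars, specialty_rules, transcript)
    note = generate(prompt)
    for check in validators:
        check(note, transcript)  # runs before the clinician ever sees the draft
    return note
```

Notice that nothing in this sketch changes the model. Remove the exemplars and the transcript, and there is nothing left behind for a deletion request to chase.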
Why this architecture matters for trust
There are a few things this approach gets you that fine-tuning wouldn't.
You can change your mind. If you decide tomorrow that you want your notes to be terser, you replace your exemplars and the system adapts on the next visit. There's no retraining, no waiting, no model versioning to manage. Your style is configuration, not code.
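A small, hypothetical illustration of what "configuration, not code" means in practice. The folder names and the load_scrubbed_notes helper below are invented for this sketch; the point is that your exemplars are ordinary data attached to your account, and swapping them is an ordinary update.

```python
from pathlib import Path

def load_scrubbed_notes(folder: str) -> list[str]:
    """Read anonymized exemplar notes from plain-text files (hypothetical helper)."""
    return [p.read_text() for p in sorted(Path(folder).glob("*.txt"))]

# Your style is just data attached to your account.
clinician_config = {
    "specialty": "pediatrics",
    "style_exemplars": load_scrubbed_notes("exemplars/dr_example/current"),
}

# Decide tomorrow you want terser notes? Replace the examples, and the very
# next visit is generated with the new style. No retraining, no model versioning.
clinician_config["style_exemplars"] = load_scrubbed_notes("exemplars/dr_example/terse")
```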
Your patients are not in the model. A model trained on your patients is permanently changed by them. A model that uses runtime context is the same model for everyone — your customization happens in the prompt, not in the weights. When a patient asks for their data to be deleted, the deletion is real, not aspirational.
Errors are inspectable. If a generated note has a problem, the team can look at the exact prompt that produced it — your style layer, your specialty layer, the safety layer, the transcript itself. Compare that to debugging a fine-tuned model, which is closer to debugging a memory than debugging a program. When something goes wrong with a clinical AI tool, "we don't really know why" is not an acceptable answer.
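Here is one hypothetical way that inspectability could be implemented; the record_audit_entry helper below is mine, not ActiveScribe's. The idea is simply that the exact prompt, with all of its layers, can be stored alongside every generated note and pulled up when something looks wrong.

```python
import hashlib
import json
from datetime import datetime, timezone

def record_audit_entry(note_id: str, prompt: str, note: str) -> str:
    """Hypothetical audit record: keep the exact prompt that produced each note."""
    entry = {
        "note_id": note_id,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "prompt": prompt,  # style layer + specialty layer + safety layer + transcript
        "note": note,
    }
    return json.dumps(entry, indent=2)
```

Debugging then becomes reading: you can see exactly what the model was shown and exactly what it wrote back.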
It's compatible with how regulators think. Medical data regulators are still figuring out how to govern AI in healthcare, but the consensus is forming around a key principle: clinical AI should be auditable, deletable, and free of hidden state. An exemplar-and-context architecture is all three. A fine-tuned model is none of them.
What this means in practice
When I sit down to review an ActiveScribe-generated note, it sounds like a note I would have written. Not a generic AI version of me — me. The abbreviations I use. The sections I always include. The phrasing I default to when describing a pediatric viral illness for the hundredth time. The system isn't pretending to be me; it's reflecting what I've shown it about how I write, in the moment it writes a note. When I change how I write, the system changes with me. When I delete a patient, the patient is gone.
I'm not sentimental about technology. I'm a working doctor in a busy practice. But when I think about which AI tools I want anywhere near my patients, the question I ask isn't "which one is most accurate" — it's "which one can I trust to be wrong in ways I can understand." Exemplar-based AI is a fundamentally different bet than fine-tuned AI, and for healthcare, it's the right one.
Want notes that sound like you?
Join the waitlist for early access to ActiveScribe. Your style, never your patients.
Join the Waitlist