The claim that writing the guidelines and training the AI from the same source produces superior results sounds elegant. It does not hold up under scrutiny.
MCG Health has been making the rounds with a compelling narrative about their Synapse product: their physicians wrote the guidelines, their physicians trained the AI, and therefore the interpretation of those guidelines is uniquely faithful and accurate. It’s a clean pitch. It’s also built on several assumptions that deserve serious scrutiny.
Let me walk through the core claims and why, from a clinical informatics perspective, the argument is significantly weaker than it appears.
Claim 1: IP guidelines are the “gold standard” for inpatient decisions
This framing positions MCG criteria as if they were some proprietary clinical oracle — elusive, uniquely authoritative, irreplaceable. They are not. Inpatient criteria, whether MCG, InterQual, or any other set, are fundamentally sequential clinical decision frameworks: a structured set of scenarios designed to determine whether a patient’s condition is severe and acute enough to warrant inpatient-level care.
They are good tools. They are not magic. Every set of guidelines carries its own blind spots, weighting choices, and gaps. The very act of condensing complex clinical medicine into a structured decision tree introduces reductions and edge cases that no single vendor has solved perfectly.
“Gold standard” is a marketing claim. Clinically, it means “the tool our payers happen to use most often.” That’s a market position, not a clinical designation.
Acknowledging this matters because it reframes the question: the issue for health systems and payers isn’t which guidelines are “gold,” it’s which platform can accurately interpret whichever guidelines their contracts require — today and as those contracts evolve.
Claim 2: The same source for criteria and AI produces more accurate matching
This is the centerpiece argument, and it contains a logical flaw that’s difficult to overlook: if the original source has structural weaknesses, those weaknesses transfer directly to the AI that was trained on it.
Closed-loop reasoning is not a quality guarantee. It is a fidelity guarantee: the AI will faithfully reproduce the guidelines as written, including every place where they are ambiguous, inconsistent, or clinically incomplete.
THE ACHILLES’ HEEL
“Despite observation level of care” — one of the most consequential phrases in inpatient criteria — has no standard clinical definition across conditions, settings, or payers. It appears frequently in MCG guidelines. There is no clear, consistent, structured way to capture what it means in real-world clinical documentation. An AI trained faithfully on language that is itself ambiguous will produce confidently ambiguous outputs.
To their credit, MCG’s own white paper frames Synapse as a tool to improve efficiency and consistency in level-of-care decisions — not a claim of superior accuracy per se. That is a more honest and defensible position. But it raises a problem they don’t address: consistency is only valuable if the underlying logic is sound. A tool that consistently applies criteria containing undefined terms doesn’t reduce ambiguity — it standardizes it. Reviewers still have to resolve the same clinical judgment calls that the criteria leave unanswered, just with an AI-generated summary in front of them instead of the raw chart. That is a workflow improvement. It is not a reasoning improvement.
THE CONSISTENCY TRAP
Consistent application of ambiguous criteria does not produce consistent outcomes — it produces consistently uncertain ones. The hard clinical judgment that determines whether a patient truly failed observation-level care cannot be automated away by surfacing the same ambiguous language more efficiently.
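A toy sketch can make the distinction between consistency and accuracy concrete. Everything in it is an assumption for illustration: the two "readings" of "despite observation level of care," the 24-hour window, and the four cases are invented, and none of it comes from MCG or any other guideline. Each reading is perfectly reproducible on its own, yet the two readings disagree on half of the same cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObsCase:
    hours_in_observation: int
    symptoms_improving: bool


# Two hypothetical readings of "despite observation level of care".
def reading_time_based(case: ObsCase) -> bool:
    """Failed observation = stayed past a fixed window (24h is an arbitrary placeholder)."""
    return case.hours_in_observation >= 24


def reading_response_based(case: ObsCase) -> bool:
    """Failed observation = not improving, regardless of elapsed time."""
    return not case.symptoms_improving


# Invented cases, chosen only to show where the readings diverge.
CASES = [
    ObsCase(30, True),
    ObsCase(10, False),
    ObsCase(36, False),
    ObsCase(8, True),
]


def self_agreement(rule, cases, runs=3) -> float:
    """Fraction of cases where repeated runs of one rule give the same answer."""
    stable = sum(len({rule(c) for _ in range(runs)}) == 1 for c in cases)
    return stable / len(cases)


def cross_agreement(rule_a, rule_b, cases) -> float:
    """Fraction of cases where two different readings give the same answer."""
    return sum(rule_a(c) == rule_b(c) for c in cases) / len(cases)


print(self_agreement(reading_time_based, CASES))       # 1.0
print(self_agreement(reading_response_based, CASES))   # 1.0
print(cross_agreement(reading_time_based, reading_response_based, CASES))  # 0.5
```

Automating either reading yields perfect repeatability. It says nothing about which reading the criterion intended, which is the judgment call the reviewer is still left with.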
Claim 3: Physician authorship translates to AI training expertise
This is perhaps the most generous assumption in the entire argument. The physicians who authored MCG guidelines did so based on clinical knowledge and evidence synthesis — a genuinely skilled task. But writing structured criteria is categorically different from training AI against unstructured clinical records.
Real-world clinical documentation does not look like criteria. Here is how a hospitalist actually documents a complex cardiovascular picture:
BP trending down overnight, 94/60 this AM. On 2L NC, sats 94%. Tachycardic to 108. Will refrain from further IV fluid administration given CHF. Will discuss with cards.
An MCG criterion, by contrast, calls for “evidence of hemodynamic instability as defined by specific thresholds.” Notice what the note does not say: it never uses the word “unstable.” The physician is actively managing competing risks — volume status against cardiac function — and communicating that reasoning in shorthand that any hospitalist would immediately understand. But there is no explicit criteria-matching language anywhere in it.
Bridging that gap — from a physician’s compressed, context-dependent clinical reasoning to the structured logic of a guideline — is where the actual difficulty lives. Writing the criteria does not confer any special ability to teach a model to do that. Those are different skills, and conflating them is a significant leap.
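The hospitalist note quoted above can be used to show the gap in miniature. The sketch below is a minimal illustration, not a description of how Synapse or any other product works: the function names are hypothetical, and the regular expressions are written to fit this one note's shorthand. Searching for the criterion's own vocabulary finds nothing, while the clinically relevant values are present for anyone (or anything) that knows how to read the shorthand.

```python
import re

NOTE = (
    "BP trending down overnight, 94/60 this AM. On 2L NC, sats 94%. "
    "Tachycardic to 108. Will refrain from further IV fluid administration "
    "given CHF. Will discuss with cards."
)


def keyword_match(note: str) -> bool:
    """Naive criteria matching: look for the criterion's own wording."""
    return bool(re.search(r"\bunstable\b|\binstability\b", note, re.IGNORECASE))


def extract_vitals(note: str) -> dict:
    """Pull discrete values out of shorthand. Patterns cover only this note's style."""
    vitals = {}
    bp = re.search(r"\b(\d{2,3})/(\d{2,3})\b", note)
    if bp:
        vitals["sbp"], vitals["dbp"] = int(bp.group(1)), int(bp.group(2))
    spo2 = re.search(r"sats?\s+(\d{2,3})\s*%", note, re.IGNORECASE)
    if spo2:
        vitals["spo2"] = int(spo2.group(1))
    hr = re.search(r"tachycardic\s+to\s+(\d{2,3})", note, re.IGNORECASE)
    if hr:
        vitals["hr"] = int(hr.group(1))
    o2 = re.search(r"(\d+(?:\.\d+)?)\s*L\s+NC", note)
    if o2:
        vitals["o2_lpm"] = float(o2.group(1))
    return vitals


print(keyword_match(NOTE))   # False
print(extract_vitals(NOTE))
# {'sbp': 94, 'dbp': 60, 'spo2': 94, 'hr': 108, 'o2_lpm': 2.0}
```

Even this works only because the patterns were hand-fitted to one physician's phrasing. A second hospitalist writing "SBP 90s, HR low 100s" would slip past every one of them.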
The core challenge they’re not talking about
Here is what I believe is the hardest problem in this entire space, and it’s one that MCG’s positioning quietly sidesteps:
Criteria are written in the broadest possible clinical language. “Patient is hemodynamically unstable.” “Evidence of altered mental status.” “Failure to respond to outpatient therapy.” These phrases are intentionally broad to capture a wide range of clinical presentations.
But an AI operating on clinical records has to work in the opposite direction — decomposing a vague clinical statement into its underlying data points: specific vitals, lab trends, medication response patterns, nursing documentation, imaging findings. That decomposition task is extraordinarily difficult. The distance between “patient is unstable” as a criterion and the constellation of discrete, scattered, physician-shorthand data points that actually constitute instability in a real chart is enormous — and highly variable across conditions, specialties, and documentation styles.
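One way to picture the decomposition is as a single broad phrase fanning out into many small signals, each living in a different part of the chart. The sketch below is an assumption-laden illustration: the `Signal` structure, the list of signals, the cutoffs (systolic BP under 100, heart rate over 100), and the vital-sign series are all invented placeholders, not taken from MCG, InterQual, or any real criteria set.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Signal:
    name: str
    source: str                      # where in the chart the evidence would live
    present: Callable[[dict], bool]  # predicate over a simplified chart dict


# Hypothetical decomposition of "hemodynamically unstable"; cutoffs are placeholders.
HEMODYNAMIC_INSTABILITY = [
    Signal("low systolic BP", "vitals",
           lambda c: min(c["sbp_series"]) < 100),
    Signal("falling BP trend", "vitals",
           lambda c: c["sbp_series"][-1] < c["sbp_series"][0]),
    Signal("tachycardia", "vitals",
           lambda c: max(c["hr_series"]) > 100),
    Signal("fluids withheld", "note text",
           lambda c: "refrain from further iv fluid" in c["note"].lower()),
    Signal("vasopressor started", "medication orders",
           lambda c: any("norepinephrine" in m.lower() for m in c["med_orders"])),
]


def evidence_for(criterion: list, chart: dict) -> list:
    """Return (signal name, chart source) for every signal found in the chart."""
    return [(s.name, s.source) for s in criterion if s.present(chart)]


# Invented chart data, loosely modeled on a short hospitalist note.
chart = {
    "sbp_series": [112, 104, 94],
    "hr_series": [96, 108],
    "note": "Will refrain from further IV fluid administration given CHF.",
    "med_orders": [],
}

for name, source in evidence_for(HEMODYNAMIC_INSTABILITY, chart):
    print(f"{name} (from {source})")
```

Five signals is a toy. A real decomposition would need many more per criterion, different ones per condition, and some way to weigh them against each other, which is exactly the judgment the broad criterion language leaves unstated.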
Key takeaways
Guidelines are sequential frameworks, not clinical oracles
Every guideline set, whether MCG, InterQual, or another, is a structured reduction of clinical complexity. None is comprehensive, and each carries its own gaps and edge cases.
Closed-loop reasoning amplifies source flaws, not just source strengths
Training AI on your own guidelines means the AI inherits every ambiguity in those guidelines — including undefined terms like “despite observation level of care.”
Consistency ≠ accuracy
Standardizing the application of ambiguous criteria doesn’t resolve the ambiguity — it just makes it more uniform. The hard clinical judgment calls remain exactly where they were.
Criteria authorship ≠ AI training expertise
Writing structured clinical logic and training models to extract signals from unstructured EHR documentation are fundamentally different disciplines.
The decomposition problem is the hardest part
Mapping broad criteria language (“patient is unstable”) to specific, heterogeneous clinical data points across thousands of EHR documentation styles is where the real technical challenge lives.
Flexibility across guidelines is the durable advantage
Health systems and payers operate across multiple guideline contracts simultaneously. Depth within a single ecosystem is a constraint, not a feature, when the market demands adaptability.
What this means for the market
MCG Synapse’s positioning is a depth play masquerading as a quality play. Depth within a single guideline ecosystem is real and defensible, but it is not the same as superior clinical accuracy. Nor is it a scalable advantage in a market where guideline contracts shift and multi-payer environments are the norm. There, the real competitive differentiator is how well a platform handles the messy, ambiguous, documentation-dependent reality of inpatient medicine.
The question payers and health systems should be asking is not “who wrote the guidelines?” It’s: “How does this platform perform when a hospitalist documents competing hemodynamic risks in three lines of shorthand without ever using the word ‘unstable’? How does it handle the dozens of ways different physicians across different specialties document the same clinical deterioration?” That’s the test. That’s where differentiation actually lives.
The most honest version of the MCG argument is: “We have deep expertise in our own criteria.” That’s a fair and credible claim. The overreach is implying that expertise in writing criteria translates to superior AI performance in the wild — where clinical records are written by humans, under time pressure, in shorthand, across dozens of specialties, without any awareness of the criteria they’ll eventually be matched against.
Aswani “Ash” Suthrave, MD, MBA, MHA, FACHE is a board-certified physician executive, practicing hospitalist, and Principal of Suthrave & Associates, LLC — a physician-led consulting firm specializing in medicolegal analysis, medical record review, expert witness services, utilization management, and healthcare strategy. He works with plaintiff and defense attorneys, legal teams, and healthcare organizations nationwide.

Your article does a great job of touching on the reality that developing these frameworks is far more complex than most realize, and we must be careful not to put blinders on during the process, or else the outcomes will continue to fall short. AI in healthcare requires a highly critical, introspective approach to truly address these hidden nuances. Great read!