Artificial intelligence is transforming healthcare at a pace that would have seemed implausible a decade ago. Natural language processing tools summarize discharge notes. Predictive models flag deteriorating patients before nurses recognize the signs. Generative AI drafts prior authorization letters in seconds.
And yet, in utilization management — one of the highest-stakes, highest-volume functions in American healthcare — AI is repeatedly getting it wrong.
Not wrong in the sense of technical failure. Wrong in the sense of solving the wrong problem.
The Seduction of Automation
Utilization management is expensive, slow, and administratively burdensome. Prior authorizations delay care. Denials generate appeals. Reviews consume physician time. The administrative cost of UM in the United States runs into the tens of billions of dollars annually.
So when AI vendors promise to automate the process, with faster approvals, fewer denials, and less friction, health plans and utilization review organizations are understandably interested. The pitch is compelling: replace expensive human reviewers with algorithms that process requests in milliseconds, apply criteria consistently, and never get tired.
The problem is that this framing misunderstands what utilization management is actually for.
UM Is Not a Sorting Problem
Most AI applied to utilization management treats it as a classification problem: given a request, output an approval or a denial. Train the model on historical decisions, optimize for consistency and speed, and let it run.
But medical necessity determination is not a sorting problem. It is a clinical reasoning problem — and the distinction matters enormously.
Consider a 68-year-old patient with COPD and newly diagnosed heart failure requesting a short inpatient stay following an emergency department visit. The ICD-10 codes are straightforward. The MCG or InterQual criteria might not be met on paper. An algorithm trained to match requests to criteria will generate a denial.
A physician reviewer who understands that patient's clinical trajectory (how decompensated heart failure behaves in a patient with underlying obstructive lung disease, the risk of rapid deterioration at home, the social determinants that make outpatient management unrealistic) will approve it.
The algorithm optimizes for criteria matching. The physician optimizes for the right clinical outcome. These are not the same thing.
The Criteria Problem
Utilization management criteria such as MCG (formerly Milliman Care Guidelines) and InterQual are evidence-based frameworks, not clinical mandates. They are designed to be applied by trained clinicians who exercise judgment, not by algorithms that apply them mechanically.
This distinction is not semantic. The guidelines themselves say so. MCG's own documentation states that its criteria are intended to support clinical decision-making, not replace it.
When AI systems apply these criteria without clinical judgment layered on top, they produce decisions that are technically defensible but clinically wrong. They approve things they shouldn't and deny things they should approve, not because the criteria are wrong, but because applying criteria without judgment is not utilization management. It is criteria matching.
What AI Gets Right — and Where It Belongs
None of this means AI has no place in utilization management. It has a significant place — just not the one most vendors are selling.
AI is genuinely valuable in UM when it is used to:
Support reviewers, not replace them. AI can surface relevant clinical literature, flag cases that are likely to require escalation, pre-populate review templates, and identify patterns across large claims datasets. These are high-value applications that reduce administrative burden without removing clinical judgment from the loop.
Identify outliers and fraud. Machine learning excels at pattern recognition across large datasets. Using AI to identify unusual billing patterns, statistically improbable utilization, or systematic overcoding is a legitimate and powerful application.
Streamline administrative workflows. Prior authorization submission, document collection, status tracking, and denial letter generation are administrative functions that AI can automate without clinical risk. The gains here are real and meaningful.
Predict high-risk cases for proactive management. Predictive models that identify members likely to require complex care coordination — before a crisis — represent perhaps the highest-value application of AI in the UM space. This is AI augmenting clinical judgment, not replacing it.
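As a rough illustration of the outlier-identification use described above, the sketch below flags providers whose utilization rate sits far from the peer median, using a modified z-score based on the median absolute deviation. The function name, the 3.5 threshold, and the sample rates are all hypothetical and chosen only for illustration. A flag here is a prompt for human review, not a finding of fraud or overcoding.

```python
from statistics import median


def flag_outliers(rates, threshold=3.5):
    """Return provider IDs whose rate is far from the peer median.

    rates: dict mapping a provider ID to a utilization rate
           (hypothetical units, e.g. imaging orders per 100 visits).
    threshold: cutoff on the modified z-score (an assumed default).
    """
    values = list(rates.values())
    med = median(values)
    # Median absolute deviation: a spread measure that a single
    # extreme provider cannot inflate the way it inflates a std dev.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread among peers, so nothing can be ranked
    flagged = []
    for provider, value in rates.items():
        score = 0.6745 * (value - med) / mad
        if abs(score) > threshold:
            flagged.append(provider)
    return flagged


# Hypothetical peer group: provider "E" orders far more than the rest.
sample = {"A": 12.0, "B": 14.0, "C": 11.0, "D": 13.0, "E": 41.0, "F": 12.5}
print(flag_outliers(sample))
```

The median-based score is used here because a mean and standard deviation would be pulled toward the very outlier the check is meant to find. Any real deployment would need peer groups adjusted for specialty and case mix.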
The Liability Nobody Is Talking About
There is a regulatory and legal dimension to AI-driven UM that the industry is only beginning to reckon with.
When a health plan denies a claim based on an AI recommendation — or worse, an AI decision — and that denial results in patient harm, the liability question is unresolved in most jurisdictions. Is the health plan liable? The AI vendor? The physician of record who rubber-stamped the denial without meaningful review?
Several states have already moved to require that adverse UM determinations be made or reviewed by licensed clinicians. The Centers for Medicare & Medicaid Services has signaled increased scrutiny of AI-driven prior authorization in Medicare Advantage. The regulatory environment is tightening precisely because the industry moved too fast and too far toward automation without clinical accountability.
The health plans and independent review organizations (IROs) that will fare best in this environment are those that implement AI as a tool for clinicians, not as a replacement for them.
A Better Framework
The right model for AI in utilization management is augmented clinical review, not automated decision-making.
In this model, AI handles the administrative heavy lifting: pulling records, applying initial screening criteria, flagging outliers, drafting correspondence. The clinical reviewer — a physician or appropriately credentialed clinician — then applies judgment to the cases that require it, supported by AI-generated summaries and analysis rather than buried under paperwork.
This model is faster than traditional UM. It is more defensible than fully automated UM. And it produces better clinical outcomes than either extreme.
It is also, notably, what the guidelines were designed to support in the first place.
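The division of labor described in this section can be sketched as a routing rule. This is a minimal sketch under stated assumptions, not a description of any real system: the field names, the two routes, and the idea that clean cases go to a fast-track queue are all hypothetical. The one property it is meant to show is structural. The automated step can only speed a case up or send it to a clinician, and no code path produces a denial.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    # Deliberately no DENY member: an adverse determination can
    # only come from the clinician, never from this function.
    FAST_TRACK = "fast_track"
    CLINICIAN_REVIEW = "clinician_review"


@dataclass
class Request:
    request_id: str
    criteria_met: bool      # result of initial screening (hypothetical field)
    outlier_flag: bool      # set by an upstream outlier check (hypothetical)
    records_complete: bool  # whether record collection finished (hypothetical)


def route(req: Request) -> Route:
    """Send a request to the fast-track queue or to a clinician."""
    if req.criteria_met and req.records_complete and not req.outlier_flag:
        return Route.FAST_TRACK
    # Anything unmet, incomplete, or unusual goes to a human reviewer.
    return Route.CLINICIAN_REVIEW


# The COPD and heart failure case from earlier: criteria not met on paper.
case = Request("req-001", criteria_met=False, outlier_flag=False,
               records_complete=True)
print(route(case).value)
```

Under this design, the case that a criteria-matching algorithm would deny instead lands in front of a physician, with the screening result attached as context rather than as a verdict.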
The Bottom Line
Healthcare AI is a powerful tool. Utilization management is a complex clinical function. The mistake the industry keeps making is treating the latter as a data processing problem that the former can simply solve.
It cannot. Not because the technology is inadequate, but because clinical judgment is not a bottleneck to be optimized away. It is the point.
The organizations that understand this distinction — and build AI systems that enhance clinical judgment rather than bypass it — will deliver better outcomes, face less regulatory risk, and build more durable businesses.
The ones that don’t will eventually face a denial of their own: from regulators, from the courts, or from the patients their algorithms failed.
Aswani “Ash” Suthrave, MD, MBA, MHA, FACHE, is a board-certified physician executive, practicing hospitalist, and Principal of Suthrave & Associates, LLC, a physician-led consulting firm specializing in healthcare strategy, utilization management, medicolegal analysis, and healthcare AI advisory. He advises healthcare organizations, digital health companies, independent review organizations, and legal teams navigating complex clinical and operational challenges.
