AI Hallucinations in Medical Billing: CPT Coding Errors, 300,000 Auto-Denied Claims, and False Claims Act Exposure
In early 2023, ProPublica reported that Cigna's automated review system rejected approximately 300,000 insurance claims in a single month — with individual claims reviewed for an average of 1.2 seconds each. The Patel v. Cigna class action, filed in March 2023 in the Northern District of California, alleges that this AI-driven denial process violated California insurance law by failing to conduct individualized medical reviews. For healthcare providers, the parallel risk is equally severe: AI systems generating incorrect CPT or ICD-10 codes create False Claims Act liability that can dwarf the original billing error.
Patel v. Cigna Corporation — Class Action Filing
| Item | Detail |
|---|---|
| Case filed | March 2023, N.D. California |
| Allegation | AI system reviewed and denied claims in avg. 1.2 seconds without individualized medical review |
| Claims volume | 300,000+ claims auto-denied in a single month (reported by ProPublica) |
| Legal basis | California insurance bad faith; ERISA § 502(a); state insurance regulations |
| FCA parallel | 31 U.S.C. § 3729 — providers who submit AI-miscoded claims face treble damages + $13,946/claim |
| AMA standard | CPT coding requires physician judgment per AMA CPT Editorial Panel guidelines |
The Cigna case illustrates the insurer side of AI billing errors. For healthcare providers, the risk runs in the opposite direction: AI systems that suggest or automatically submit CPT codes derived from clinical documentation may generate codes that don't reflect services actually rendered — creating False Claims Act exposure if those codes are submitted to federal payers (Medicare, Medicaid). A single AI coding suggestion that mismatches procedure to diagnosis, unbundles codes that should be bundled, or upcodes a service level can trigger FCA liability, with penalties of up to three times the claim amount plus $13,946 per false claim under 2026 adjustment tables.
How LLM Hallucinations Manifest in Medical Coding
Medical billing AI systems typically work by: (1) receiving clinical documentation (physician notes, procedure records, discharge summaries), (2) extracting clinical concepts from the text, (3) mapping concepts to CPT procedure codes and ICD-10 diagnosis codes, and (4) returning a recommended code set for human review or automated submission. The hallucination risk exists at steps 2 and 3 — and the consequences differ dramatically depending on whether a human reviews the output before submission.
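The four-step pipeline above can be sketched as follows. This is a minimal illustration under stated assumptions, not a vendor implementation: the concept map, the confidence score, and the review threshold are all hypothetical. The point is the step-4 gate, which defaults to human review rather than automated submission.

```python
from dataclasses import dataclass

@dataclass
class CodingSuggestion:
    cpt_codes: list[str]
    icd10_codes: list[str]
    confidence: float
    needs_human_review: bool

# Hypothetical concept-to-code map standing in for steps 2-3 (concept
# extraction and code mapping); a production system would use a trained model.
CONCEPT_MAP = {"echocardiogram": ("93306", "I50.9")}

def suggest_codes(note: str, review_threshold: float = 0.99) -> CodingSuggestion:
    cpt, icd = [], []
    for concept, (proc, dx) in CONCEPT_MAP.items():
        if concept in note.lower():
            cpt.append(proc)
            icd.append(dx)
    confidence = 0.80  # stub; a real model would emit a calibrated score
    # Step 4: anything below the bar routes to human review, never auto-submit.
    return CodingSuggestion(cpt, icd, confidence, confidence < review_threshold)
```

Setting the threshold near 1.0 makes human review the default path, which matters for the FCA knowledge standard discussed below.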
Type 1: Code Fabrication
LLMs trained on general medical text can generate CPT codes that plausibly resemble correct codes but do not exist in the current AMA CPT codebook. CPT codes are five-character codes (numeric for most categories, alphanumeric for Category II and III codes). A model might generate "99215" (a valid Level 5 office visit code) for documentation that supports only "99213" (Level 3), or fabricate a structurally plausible code that does not appear in the codebook at all. The AMA publishes approximately 10,000 CPT codes; updates are released annually and require model retraining to reflect current coding guidance.
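A basic defense against code fabrication is a hard validation gate against the current-year codebook before any suggested code reaches a claim. The sketch below assumes a loaded set of valid codes (`VALID_CPT_2026` is a stand-in for a licensed AMA codebook file, and the entries shown are illustrative); the structural pattern reflects that Category I codes are five digits while Category II, III, and PLA codes end in F, T, or U.

```python
import re

# Stand-in for a licensed AMA codebook load; entries are illustrative only.
VALID_CPT_2026 = {"99213", "99215", "93306", "29881"}

# Five characters: four digits plus a digit (Category I) or F/T/U suffix.
CPT_PATTERN = re.compile(r"^\d{4}[0-9FTU]$")

def screen_codes(suggested: list[str]) -> dict[str, list[str]]:
    """Partition AI-suggested codes into valid, malformed, and unknown."""
    out = {"valid": [], "malformed": [], "not_in_codebook": []}
    for code in suggested:
        if not CPT_PATTERN.match(code):
            out["malformed"].append(code)
        elif code not in VALID_CPT_2026:
            # Possible fabrication, or a code deleted in the current cycle.
            out["not_in_codebook"].append(code)
        else:
            out["valid"].append(code)
    return out
```

Anything in `malformed` or `not_in_codebook` should route to human review rather than being silently dropped or corrected.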
Type 2: Context Collapse in Complex Encounters
When a clinical encounter involves multiple procedures, AI coding systems must correctly apply AMA bundling rules, modifier requirements, and National Correct Coding Initiative (NCCI) edits. For example, CPT 29881 (knee arthroscopy with meniscectomy) bundles several component codes. An AI system that separately codes the component procedures — without recognizing that 29881 covers the entire surgical package — commits unbundling, which constitutes fraudulent billing when submitted to federal payers regardless of whether the error was algorithmic.
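An unbundling check of the kind described above can be expressed as a lookup from each comprehensive code to its bundled components. The bundle map below is a hypothetical stand-in for the CMS NCCI procedure-to-procedure tables; the component codes listed for 29881 are illustrative, not authoritative edits.

```python
# Hypothetical stand-in for NCCI bundling data: comprehensive code -> components.
BUNDLE_MAP = {
    "29881": {"29870", "29874"},  # components assumed bundled into 29881
}

def find_unbundling(claim_codes: set[str]) -> set[str]:
    """Return component codes billed alongside their comprehensive code."""
    flagged = set()
    for comprehensive, components in BUNDLE_MAP.items():
        if comprehensive in claim_codes:
            flagged |= claim_codes & components
    return flagged
```

A claim listing both the surgical package code and one of its components would be flagged before submission instead of after an audit.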
Type 3: ICD-10 Diagnosis-Procedure Mismatches
Medicare and commercial payers apply Local Coverage Determinations (LCDs) and National Coverage Determinations (NCDs) that require specific ICD-10 codes to support a procedure's medical necessity. An AI system that suggests CPT 93306 (echocardiography) paired with a diagnosis code that doesn't appear on the LCD's covered diagnosis list will generate a claim that is technically a false claim — the procedure may have been performed, but the coding doesn't support covered indication. The AI hallucination is not inventing a procedure; it's selecting a diagnosis code that doesn't accurately reflect the documented clinical indication.
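The LCD screening step reduces to a membership check: does the paired ICD-10 code appear on the covered-diagnosis list for the CPT code? The table below is a hypothetical excerpt, and real covered-diagnosis lists vary by MAC jurisdiction, so a production system would load jurisdiction-specific tables.

```python
# Hypothetical LCD excerpt: CPT code -> ICD-10 codes assumed to support coverage.
LCD_COVERED_DX = {
    "93306": {"I50.9", "I48.91", "R06.02"},
}

def passes_coverage_screen(cpt: str, icd10: str) -> bool:
    """True if the diagnosis supports coverage, or no LCD is on file."""
    covered = LCD_COVERED_DX.get(cpt)
    if covered is None:
        return True  # no LCD for this CPT in this jurisdiction: no screen applies
    return icd10 in covered
```

A failed screen should prompt a coder to recheck the documented indication, not an automatic diagnosis-code swap, since substituting a covered code without documentation support is itself a false claim.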
The False Claims Act Knowledge Standard
31 U.S.C. § 3729(b)(1) defines "knowingly" to include three mental states: actual knowledge, deliberate ignorance, or reckless disregard of the truth or falsity of the claim. This is critical for AI coding systems: an organization does not need to intend to submit false claims to incur FCA liability. Deploying an AI coding system without adequate human review can constitute reckless disregard when the organization knew or should have known the system was capable of generating coding errors.
DOJ's FCA enforcement guidance, supplemented by HHS OIG's Compliance Program Guidance for Third-Party Medical Billing Companies, establishes that billing organizations must implement controls to detect and correct coding errors before claim submission. Specific controls that courts and OIG have viewed as necessary include:
- Pre-submission coding review by certified coders (CPC or CCS credential from AAPC or AHIMA) for claims above a dollar threshold
- Statistical sampling programs that test AI coding output against human review on a defined percentage of claims
- Modifier validation checks against NCCI edits before submission to Medicare Administrative Contractors
- LCD/NCD coverage screening that validates diagnosis-procedure combinations against applicable coverage policies
- Audit trails that document which claims were AI-coded, which received human review, and the final code set submitted
The "AI suggested it" defense does not exist under the FCA: United States ex rel. Campie v. Gilead Sciences (9th Cir. 2017) established that knowledge of non-compliance can be imputed to an organization that had access to information indicating non-compliance and failed to investigate. An AI vendor's SOC 2 report does not establish that the vendor's coding suggestions are accurate. The provider organization is responsible for claims submitted under its NPI number regardless of whether a human or an AI system generated the codes.
AMA CPT Coding Standards and Why AI Struggles With Them
The AMA CPT Editorial Panel maintains the CPT code set under a continuous revision process, publishing updates effective January 1 each year. The 2026 CPT codebook contains 10,052 codes across six sections: Evaluation and Management (99202-99499), Anesthesia (00100-01999), Surgery (10004-69990), Radiology (70010-79999), Pathology and Laboratory (80047-89398), and Medicine (90281-99607), plus Category II and III codes.
AI coding systems face four structural challenges with CPT accuracy:
Challenge 1: Annual Code Updates Require Continuous Retraining
The 2026 CPT update added 267 new codes, revised 93 codes, and deleted 49 codes. An AI model trained on 2025 coding documentation will suggest deleted codes as valid, miss new code options that better reflect services rendered, and may misapply revised guidelines. Models must be retrained or updated on each year's codebook to remain accurate — and most AI billing vendors do not publish their model training data cutoff dates.
Challenge 2: E/M Coding Requires Clinical Judgment
Evaluation and Management codes (99202-99215 for office visits) require assessment of Medical Decision Making (MDM) or Total Time as the primary determinants since the 2021 E/M code revision. MDM scoring requires evaluating: number and complexity of problems addressed, amount and/or complexity of data reviewed and ordered, and risk of complications and/or morbidity or mortality. These are clinical judgments — an AI system parsing a physician note to extract MDM level is performing a task the AMA explicitly requires physician judgment to complete.
Challenge 3: Modifier Application is Rule-Based but Context-Dependent
CPT modifiers (two-digit suffixes appended to codes) adjust a code's meaning: Modifier 25 indicates a significant, separately identifiable E/M service on the same day as a procedure; Modifier 59 indicates a distinct procedural service not normally reported together with another service. NCCI edits define which code pairs require a modifier to be reimbursable and which pairs are mutually exclusive regardless of modifier. An AI system that incorrectly appends Modifier 25 to an E/M code where the E/M work was not separately identifiable is generating a claim that OIG specifically targets in its annual Work Plan audits.
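NCCI procedure-to-procedure edits carry a modifier indicator: "0" means no modifier can bypass the edit, while "1" means an appropriate modifier (such as 59 or the X{EPSU} subset modifiers) may. A gate implementing that logic might look like the sketch below; the code pairs in the table are placeholders, not real edit-table entries.

```python
# Placeholder edit table: (column-1 code, column-2 code) -> modifier indicator.
NCCI_EDITS = {
    ("29881", "29870"): "0",  # assumed: never separately payable
    ("11000", "11042"): "1",  # assumed: payable with an appropriate modifier
}

BYPASS_MODIFIERS = {"59", "XE", "XS", "XP", "XU"}

def edit_disposition(col1: str, col2: str, modifiers: set[str]) -> str:
    """Classify a code pair against the NCCI-style edit table."""
    indicator = NCCI_EDITS.get((col1, col2))
    if indicator is None:
        return "no_edit"
    if indicator == "0":
        return "deny_pair"  # no modifier can override this edit
    if indicator == "1" and modifiers & BYPASS_MODIFIERS:
        return "allow_with_modifier"  # route to human review, not auto-pass
    return "needs_review"
```

Note the "allow_with_modifier" outcome still warrants review: appending Modifier 59 mechanically to clear an edit is exactly the pattern OIG audits target.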
Challenge 4: Facility vs. Professional Fee Coding Differences
Hospital outpatient departments (HOPDs) code services using the same CPT codes but under different payment rules — OPPS (Outpatient Prospective Payment System) applies to facility fees, while the Medicare Physician Fee Schedule applies to professional fees. An AI system trained on professional fee coding that is deployed in an HOPD setting may suggest codes with different bundling logic than OPPS requires, generating systematic billing errors across every claim it processes.
Building a Billing Compliance Program for AI-Assisted Coding
OIG's Compliance Program Guidance for Third-Party Medical Billing Companies (63 Fed. Reg. 70138, December 18, 1998, updated through OIG Special Fraud Alerts) identifies seven essential elements of an effective compliance program. For AI-assisted billing, each element requires AI-specific implementation:
Element 1: Written Policies and Procedures
Policies must specifically address: which claim types are eligible for AI coding without human review, the dollar threshold above which human review is mandatory, the process for resolving conflicts between AI suggestions and coder judgment, and the procedure for correcting claims when AI errors are discovered post-submission.
Element 2: Compliance Officer and Committee
The compliance officer must have technical authority to suspend AI coding workflows pending investigation of suspected systematic errors. This requires organizational authority that many compliance officers currently lack relative to the technology teams that deploy AI billing systems.
Element 3: Training and Education
Coders who review AI suggestions must be trained to recognize AI-specific error patterns — not just traditional coding errors. This includes training on how LLMs can generate plausible but incorrect code sequences, and on the responsibility to exercise independent judgment rather than deferring to AI output.
Element 4: Effective Lines of Communication
Staff must have a mechanism to report suspected AI coding errors without fear of retaliation. If a coder identifies a systematic pattern of AI miscoding, the compliance program must capture this signal and investigate — not suppress it because it implicates an expensive technology investment.
AI Medical Billing Compliance Audit Checklist: 12 Controls
Confirm AI coding model's CPT codebook training data cutoff date. A model trained on 2024 data will suggest deleted 2025 codes and miss new 2026 codes. Request the vendor's model update schedule and confirm it aligns with January 1 annual CPT effective dates.
Implement NCCI edit validation as a hard gate before claim submission. The CMS NCCI edit tables are updated quarterly. AI coding systems must check against current NCCI tables, not cached versions. Failed NCCI edits must route to human review — not auto-override.
Establish a statistical sampling program for AI-coded claims. OIG recommends sampling 5-10% of claims monthly for accuracy review. Compare AI-suggested code sets against coder review for sampled claims. Track error rate by code category and set accuracy thresholds that trigger workflow suspension.
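The sampling control above can be sketched in a few lines. The 5% rate and 5% error threshold below are illustrative defaults, not OIG figures, and the seed makes the sample reproducible for audit purposes.

```python
import random

def sample_for_review(claim_ids: list[str], rate: float = 0.05, seed: int = 0):
    """Draw a reproducible monthly sample of AI-coded claims for coder review."""
    rng = random.Random(seed)  # seeded so the audit sample can be regenerated
    k = max(1, round(len(claim_ids) * rate))
    return rng.sample(claim_ids, k)

def should_suspend(disagreements: int, sampled: int, threshold: float = 0.05) -> bool:
    """True when the coder-vs-AI disagreement rate exceeds the threshold."""
    return sampled > 0 and disagreements / sampled > threshold
```

Tracking the disagreement rate by code category, as the checklist item recommends, just means running `should_suspend` per category rather than once overall.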
Maintain audit trails linking each submitted code to its source (AI or human coder). When OIG or DOJ investigates a billing pattern, the first document request is the audit trail. "The AI did it" is not a defense — "here is every code the AI suggested and the human review it received" is the beginning of a defense.
Configure LCD/NCD coverage screening for each active payer and service line. Coverage determinations vary by Medicare Administrative Contractor jurisdiction. A procedure covered under diagnosis code X in MAC Jurisdiction J may not be covered in Jurisdiction L. AI systems must apply jurisdiction-specific LCDs, not generic national coverage rules.
Apply mandatory human review for all E/M code level assignments. CPT E/M codes (99202-99215, 99221-99223) require clinical judgment to assign MDM level. AI E/M code suggestions should be treated as preliminary — not final — until reviewed by a certified coder or supervising physician.
Implement modifier validation rules that reflect current OIG Work Plan targets. OIG publishes its annual Work Plan identifying billing patterns under audit focus. Modifier 25, Modifier 59, and place of service mismatches are perennial targets. AI systems should flag code-modifier combinations that appear on the current Work Plan for heightened review.
Establish a self-disclosure protocol for AI coding errors identified post-submission. The OIG's Self-Disclosure Protocol offers reduced settlement multipliers (typically 1.5x vs. 3x) for organizations that voluntarily disclose and repay false claims before a government investigation. The protocol requires disclosure within a defined period after discovery — not after the claim is audited.
Review vendor BAA to confirm AI coding system is classified as a business associate. An AI system that accesses clinical documentation to suggest codes is processing PHI in connection with covered functions. A BAA is required. The BAA must address what happens to clinical notes after coding — are they retained for model improvement? Deleted after session?
Test AI coding accuracy against your specific specialty's RVU patterns. Aggregate accuracy rates mask specialty-specific error patterns. A coding AI trained predominantly on primary care encounters may perform poorly on interventional cardiology or complex surgical coding. Validate accuracy by specialty and procedure type before deployment in each service line.
Implement upcoding detection alerts for AI suggestions above historical norms. If AI coding shifts your E/M level distribution toward higher complexity codes relative to your pre-AI baseline, this is a red flag for systematic overcoding. Monitor code distribution changes as a compliance metric, not just revenue performance.
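One simple version of this alert compares the share of high-level E/M codes before and after AI deployment. The 10-percentage-point trigger below is an assumption for illustration; a real program would set it from the organization's own baseline variance.

```python
from collections import Counter

HIGH_LEVEL_EM = {"99214", "99215"}  # office-visit levels 4 and 5

def high_level_share(em_codes: list[str]) -> float:
    """Fraction of E/M codes at level 4 or 5."""
    counts = Counter(em_codes)
    total = sum(counts.values())
    return sum(counts[c] for c in HIGH_LEVEL_EM) / total if total else 0.0

def drift_alert(baseline: list[str], current: list[str],
                max_shift: float = 0.10) -> bool:
    """Flag a shift toward higher E/M levels beyond the allowed drift."""
    return high_level_share(current) - high_level_share(baseline) > max_shift
```

An alert here is a compliance signal to investigate documentation support for the higher levels, not proof of overcoding by itself.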
Confirm AI vendor's model update process does not change coding behavior without notification. "Silent" model updates that change coding suggestions without vendor notification create compliance risk — a code distribution shift that triggers OIG audit may trace back to a model update that your compliance team didn't know occurred. Require written notification 30 days before model changes affecting coding suggestions.
How Claire Approaches Billing Accuracy — Administrative Workflows Without Code Generation Risk
1. Claire Automates Scheduling and Pre-Authorization — Not Code Generation
Claire's healthcare AI scope is patient-facing administrative workflows: appointment scheduling, insurance verification, pre-authorization status checks, prescription refill routing, and post-visit follow-up communications. Claire does not generate CPT or ICD-10 codes from clinical documentation. This architectural boundary eliminates False Claims Act exposure from AI coding hallucinations — the highest-risk application of AI in the revenue cycle.
2. Insurance Verification via FHIR Coverage Resources — Not AI Interpretation
When Claire performs pre-authorization checks, it queries your EHR's Coverage and CoverageEligibilityRequest FHIR resources and communicates the structured payer response directly. Benefit information is read from authoritative payer systems — not inferred by an LLM from documentation. The accuracy of insurance verification is bounded by the payer's API response, not an AI model's interpretation.
3. Audit Trails for Every Administrative Action
Every action Claire takes that touches a patient record — appointment creation, insurance verification, pre-auth status update — generates a structured audit entry in your EHR. These entries record the FHIR resource modified, the action taken, the timestamp, and the session ID. This audit trail supports billing compliance documentation without requiring separate logging infrastructure.
The Liability Concentration Point
The Cigna 1.2-second claim denial story and the False Claims Act both point to the same risk: AI systems making consequential billing decisions at machine speed, without the deliberation that regulatory frameworks assumed would exist. For insurers, the risk is bad faith denial liability. For providers, the risk is FCA exposure from codes submitted under their NPI number regardless of who — or what — generated them.
The technical controls in this article's checklist are not optional compliance theater. NCCI validation, LCD screening, E/M human review requirements, and audit trail preservation are the controls that distinguish "AI-assisted billing with appropriate oversight" from "AI-generated claims submitted with reckless disregard" under the FCA knowledge standard. The difference between those two characterizations is the difference between a compliance investigation and a False Claims Act settlement.