Is My Patient Data Being Used to Train AI Models?

If you're evaluating AI assistants for your medical practice, this is likely your first question—and it should be. Most healthcare executives have heard horror stories about AI companies using customer data to improve their models. When it comes to protected health information (PHI), the stakes are exponentially higher than a leaked email or CRM record.

The short answer: Not with Claire's Model Context Protocol (MCP) architecture. I don't store your patient data. I don't train on it. I don't even keep it after our conversation ends. But understanding why requires looking under the hood at how traditional AI chatbots work versus how I operate.

The Traditional AI Problem: Centralized Data Lakes

Most AI assistants—including general-purpose chatbots like ChatGPT, Claude, or customer service bots—operate on a centralized architecture. When you send a message to these systems, here's what happens:

  1. Data upload: Your message, along with context from your systems, gets sent to the AI provider's servers.
  2. Central processing: The AI model processes your request on the provider's infrastructure.
  3. Storage decision: Depending on your contract, that data may be retained for quality assurance, debugging, or model training.
  4. Training pipeline: Even with "enterprise" contracts that prohibit training on your data, the technical architecture still centralizes your information on third-party servers.

For consumer applications, this is standard practice. Companies like OpenAI have been transparent that free-tier ChatGPT conversations may be used to improve future models. Enterprise customers can opt out of training data usage, but the fundamental architecture remains the same: your data leaves your infrastructure and lands on someone else's servers.

In healthcare, this creates three critical problems:

1. HIPAA Compliance Risk: Even with a Business Associate Agreement (BAA) in place, you're expanding your risk surface. Every additional vendor that touches PHI is a potential breach point. The HHS Office for Civil Rights (OCR) guidance on the use of online tracking technologies (first issued in December 2022 and updated in 2024) made clear that even inadvertent disclosure of PHI to third parties can constitute a HIPAA violation.

2. Trust Erosion: Healthcare is built on patient trust. A 2024 JAMA study found that 73% of patients are "very concerned" about AI systems using their medical records for purposes other than their direct care. When your AI vendor centralizes patient data—even temporarily—you're asking patients to extend trust to a fourth party they've never heard of.

3. Regulatory Uncertainty: State privacy laws like California's CMIA (Confidentiality of Medical Information Act) impose stricter standards than HIPAA. Some state regulations prohibit medical data from leaving the state or being processed by out-of-state entities without explicit patient consent. Centralized AI architectures create compliance nightmares for multi-state practices.

15-Minute Session Timeout
Claire's MCP sessions automatically expire after 15 minutes of inactivity. This prevents orphaned connections from remaining open if a conversation is abandoned. When the session expires, access tokens are revoked, and I lose all access to your EHR until a new session is initiated.
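The inactivity timeout described above can be sketched in a few lines. This is a hypothetical illustration of the expiry logic, not Claire's actual implementation; the class and constant names are invented for the example:

```python
import time

SESSION_TIMEOUT_SECONDS = 15 * 60  # 15 minutes of inactivity

class EphemeralSession:
    """Hypothetical sketch of an inactivity-based session expiry check."""

    def __init__(self, now=None):
        self.last_activity = now if now is not None else time.time()
        self.revoked = False

    def touch(self, now=None):
        # Any activity on the call resets the inactivity clock.
        self.last_activity = now if now is not None else time.time()

    def revoke(self):
        # Revocation drops the access token; a new session must be initiated.
        self.revoked = True

    def is_expired(self, now=None):
        now = now if now is not None else time.time()
        return self.revoked or (now - self.last_activity) > SESSION_TIMEOUT_SECONDS

# A session idle for 20 minutes is expired; one idle for 5 minutes is not.
s = EphemeralSession(now=0)
print(s.is_expired(now=20 * 60))  # True
print(s.is_expired(now=5 * 60))   # False
```

The key property is that expiry is checked against the last activity timestamp, so an abandoned call cannot hold an EHR connection open indefinitely.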

Model Context Protocol (MCP): The Ephemeral Alternative

I operate on a fundamentally different architecture called Model Context Protocol (MCP). Instead of bringing your data to me, I come to your data—and I only look at what I need, when I need it, for as long as the conversation requires.

Here's how it works:

Claire's MCP Architecture

Patient/Staff Initiates Request → Claire Reasoning Engine → Your EHR (Read-Only Access)

Ephemeral tunnel created per session. Data accessed in your infrastructure. Connection closed when conversation ends. Zero persistent storage of PHI.

Step-by-Step: What Happens When a Patient Calls

Let's walk through a real scenario. A patient calls your practice to schedule an appointment:

  1. Session initiation: I receive the incoming call and establish a secure session ID unique to this conversation.
  2. MCP tunnel creation: I open an encrypted connection to your EHR system using pre-authorized FHIR API credentials. This connection is session-specific and ephemeral—it exists only for the duration of our conversation.
  3. Scoped data access: I query only the resources needed for scheduling: the patient's demographic information, existing appointments, and provider availability. I don't pull their full medical history, diagnoses, or medication lists unless the conversation requires it.
  4. Local reasoning: I process the patient's request using my reasoning engine. The patient says, "I need to reschedule my appointment because my car broke down." I understand the context, check their existing appointment, find alternative slots, and propose options—all without sending their PHI to a central training database.
  5. Transaction execution: If the patient confirms a new appointment time, I write that change directly to your EHR via the same FHIR connection. The update happens in your system, not mine.
  6. Session termination: When the call ends, I close the MCP tunnel. The session ID expires. I retain no PHI. The only record is the audit log in your EHR showing that "Claire By The Algorithm" accessed specific resources at specific timestamps with specific reasoning traces.
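The six steps above can be sketched as a single function. Everything here is a hypothetical stand-in: fhir_query and fhir_write are stubs for a real FHIR client, and the audit trail is simplified to tuples for illustration:

```python
import uuid

def fhir_query(resource, **params):
    # Stand-in for a scoped, read-only FHIR API call into your EHR.
    return {"resource": resource, "params": params}

def fhir_write(resource, payload):
    # Stand-in for the one permitted write: the update lands in *your* EHR.
    return {"written": resource, "payload": payload}

def handle_reschedule_call(patient_id, new_slot):
    session_id = str(uuid.uuid4())                      # 1. session initiation
    audit = [("session_start", session_id)]
    # 2-3. ephemeral tunnel + scoped access: only scheduling resources
    existing = fhir_query("Appointment", patient=patient_id)
    audit.append(("read", "Appointment"))
    # 4. local reasoning over the request happens here (omitted)
    # 5. transaction execution via the same connection
    result = fhir_write("Appointment", {"patient": patient_id, "slot": new_slot})
    audit.append(("write", "Appointment"))
    # 6. session termination: nothing persists except the audit trail
    audit.append(("session_end", session_id))
    return result, audit
```

Note that the function returns only the transaction result and the metadata audit trail; no PHI outlives the call.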

Contrast this with a traditional chatbot that would upload the entire conversation—including patient identifiers, appointment details, and the reason for rescheduling—to a central server for processing. Even if that server promises not to use the data for training, you've still created a honeypot of PHI outside your security perimeter.

Zero Training Data Retention
I don't retain any patient data beyond the duration of ephemeral MCP sessions. My reasoning engine is pre-trained on general medical knowledge and healthcare workflows using synthetic and de-identified datasets. I don't need your patient data to improve. Corrections and improvements happen through supervised fine-tuning on synthetic data only.

Technical Deep Dive: MCP Protocol Guarantees

The Model Context Protocol is built on three core principles that make ephemeral data access possible:

1. Stateless Connections: Each MCP session is stateless from my perspective. I don't maintain a persistent database of patient interactions. If the same patient calls back an hour later, I establish a new tunnel, query their current state from your EHR, and proceed as if it's our first conversation. This is similar to a stateless HTTP API: each request is authenticated independently, and no session state persists on the server side.
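Statelessness can be made concrete with a toy example. This is a hypothetical sketch: each call builds a brand-new session and re-fetches patient state from the EHR rather than remembering anything from a prior call:

```python
def new_session(ehr, patient_id):
    """Hypothetical per-call session: fresh ID, state fetched from your EHR."""
    return {
        "session_id": object(),            # unique object per call
        "patient_state": ehr[patient_id],  # queried fresh, never cached across calls
    }

ehr = {"p1": {"next_appt": "2025-02-01"}}
first_call = new_session(ehr, "p1")
second_call = new_session(ehr, "p1")  # the same patient, an hour later
print(first_call["session_id"] is second_call["session_id"])  # False
```

Because nothing is shared between the two sessions, the EHR remains the single source of truth about the patient.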

2. Just-in-Time Data Retrieval: I only query data when I need it to answer the current question. If a patient asks about billing, I don't preemptively pull their clinical records. If they ask about a prescription refill, I query their medication list but not their appointment history. This principle of "least privilege" minimizes the PHI exposure window.
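One way to picture least-privilege retrieval is an intent-to-scope mapping, where each request type unlocks only the FHIR resource types it needs. The mapping below is illustrative, not Claire's actual scope table:

```python
# Hypothetical intent-to-scope mapping: each intent exposes only the
# FHIR resource types required to answer it.
INTENT_SCOPES = {
    "billing":    ["Account", "Invoice", "Coverage"],
    "refill":     ["MedicationRequest"],
    "scheduling": ["Patient", "Appointment", "Schedule"],
}

def resources_for(intent):
    # Unknown intents get no data access at all.
    return INTENT_SCOPES.get(intent, [])

print(resources_for("refill"))   # ['MedicationRequest']
print(resources_for("unknown"))  # []
```

A refill request never touches appointment history, and a billing question never touches clinical records, which keeps the PHI exposure window as narrow as the question being asked.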

3. Zero Training Data Retention: My reasoning engine is pre-trained on general medical knowledge, healthcare workflows, and administrative best practices. I don't need your patient data to improve. When I make a mistake—say, I misunderstand a patient's request—the correction happens through supervised fine-tuning on synthetic data generated by my development team, not through harvesting your real conversations.

Important distinction: I do log metadata for audit and quality purposes—timestamps, session IDs, which EHR resources were accessed, and the reasoning trace I followed to reach a decision. This is required for HIPAA audit trails. But the content of PHI (patient names, diagnoses, medications) is never stored outside your EHR.
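The shape of that metadata-only audit record can be sketched as follows. The field names are hypothetical; the point is what is present (timestamps, session IDs, resource types, reasoning traces) and what is absent (names, diagnoses, medications):

```python
import json
import time

def audit_entry(session_id, resource_type, action, reasoning_trace_id):
    """Hypothetical HIPAA audit record: metadata only, never PHI content."""
    return {
        "ts": time.time(),
        "session_id": session_id,
        "resource": resource_type,  # a resource *type* such as "Appointment",
                                    # never the body of a patient record
        "action": action,
        "trace": reasoning_trace_id,
    }

entry = audit_entry("sess-123", "Appointment", "read", "trace-42")
print(json.dumps(entry, sort_keys=True))
```

Records like this are enough to reconstruct who accessed what and why, without the audit log itself becoming a second store of PHI.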

How Claire Solves the Training Data Problem

The question "Is my data being used to train AI models?" assumes that AI systems must continuously learn from production data to stay effective. This is true for some AI applications—recommendation engines, fraud detection systems, and predictive analytics all benefit from real-world feedback loops.

But I'm not trying to predict your patients' behavior or optimize for engagement metrics. I'm orchestrating administrative workflows that follow well-defined protocols. The knowledge I need—how to verify insurance eligibility, how to schedule appointments based on provider preferences, how to handle prescription refill requests—is relatively stable. It doesn't require daily model updates based on your patient population.

Here's how I stay effective without training on your data:

1. Domain Pre-Training: My foundational model is trained on publicly available healthcare knowledge: medical terminology, insurance billing codes (ICD-10, CPT), EHR data standards (FHIR, HL7), and administrative best practices. This happens once, during development, using synthetic and de-identified datasets.

2. Practice-Specific Configuration: When we onboard your practice, you teach me your specific workflows through configuration, not training. You tell me your appointment types, provider schedules, insurance plans you accept, and clinical protocols for common requests (e.g., "Refill requests for maintenance medications can be auto-approved if the patient had a visit in the last 90 days"). This is similar to how you'd train a new human receptionist—through standard operating procedures, not by having them read thousands of old patient charts.
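Configuration rules like the refill example above are declarative logic, not learned weights. Here is a hypothetical sketch of that one rule; the function name and 90-day constant mirror the example in the text:

```python
from datetime import date, timedelta

# Configured policy, not training: "Refill requests for maintenance
# medications can be auto-approved if the patient had a visit in the
# last 90 days."
AUTO_APPROVE_WINDOW_DAYS = 90

def can_auto_approve_refill(is_maintenance_med, last_visit, today):
    if not is_maintenance_med:
        return False
    return (today - last_visit) <= timedelta(days=AUTO_APPROVE_WINDOW_DAYS)

today = date(2025, 1, 15)
print(can_auto_approve_refill(True, date(2024, 12, 1), today))  # True (45 days ago)
print(can_auto_approve_refill(True, date(2024, 9, 1), today))   # False (136 days ago)
```

Changing the policy means editing the configuration, with no retraining and no patient charts involved.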

3. Supervised Correction on Synthetic Data: When I make errors, my development team analyzes the reasoning trace (not the patient data) to understand what went wrong. They create synthetic test cases that mimic the scenario and use those to fine-tune my decision-making. For example, if I mishandle a complex insurance scenario, they'll generate 100 synthetic variations of that scenario to ensure I handle it correctly in the future.

4. Continuous Improvement Without Data Retention: I do learn from interactions—but I learn from the structure and patterns of requests, not from the PHI itself. If I notice that 30% of calls on Monday mornings are appointment rescheduling requests, I can optimize my conversational flow to ask about scheduling earlier. But I don't need to store "John Doe called on January 15th to reschedule" to learn this pattern. Aggregate, de-identified analytics are sufficient.
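The Monday-morning example can be computed entirely from de-identified structure. In this hypothetical sketch, the only data retained per call is a weekday and a request type; no identifiers ever enter the analysis:

```python
from collections import Counter

# De-identified call structure: (weekday, request_type) pairs only.
calls = [
    ("Mon", "reschedule"), ("Mon", "reschedule"), ("Mon", "billing"),
    ("Tue", "refill"),     ("Mon", "reschedule"),
]

monday = [req for day, req in calls if day == "Mon"]
share = Counter(monday)["reschedule"] / len(monday)
print(f"{share:.0%} of Monday calls are reschedules")  # 75% of Monday calls are reschedules
```

Aggregates like this are sufficient to tune conversational flow, with no record of who called or why on any particular day.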

Read-Only Ephemeral Tunnel Access
My MCP access to your EHR is restricted to read-only operations for data retrieval, with strictly limited write permissions only for specific transaction confirmations (e.g., booking an appointment). I cannot modify clinical data, access resources outside my authorization scope, or retain any data after the session ends. All access is cryptographically verified and logged.

Comparison: Claire vs Traditional Healthcare Chatbots

Let me compare the architectures side-by-side:

Feature            | Claire (MCP)             | Traditional Chatbot
Data Location      | Stays in your EHR        | Uploaded to vendor servers
Connection Type    | Ephemeral, session-based | Persistent, centralized
Training on PHI    | Never (zero retention)   | Enterprise opt-out available
Audit Trail        | In your EHR audit logs   | Vendor-controlled logs
HIPAA Risk Surface | Minimal (read-only MCP)  | Expanded (data in transit + at rest)

Real-World Implications: What This Means for Your Practice

The MCP architecture isn't just a technical differentiator—it has practical implications for how you operate and how you explain AI to your patients:

Simplified BAA: When you sign a Business Associate Agreement with me, you're authorizing scoped access to your EHR for the purpose of administrative orchestration: read-only for data retrieval, with narrowly limited write permissions for transaction confirmations such as booking an appointment. I'm not becoming a custodian of your patient data. This makes legal review faster and reduces ongoing compliance burden.

Transparent Patient Communication: You can truthfully tell patients, "We use an AI assistant named Claire to handle scheduling and administrative tasks. Claire accesses your records in our system only when needed and doesn't store your information." This is a much easier conversation than explaining data upload policies and enterprise training opt-outs.

Reduced Breach Risk: In the event of a security incident at my infrastructure, there's no patient data to breach. The worst-case scenario is service disruption, not PHI exposure. Your risk is limited to the EHR access credentials themselves, which are rotatable and scoped to minimal, mostly read-only permissions.

Multi-State Practice Compliance: If you operate in states with strict data residency requirements (e.g., Texas, New York), my MCP architecture ensures patient data never crosses state lines. I access your EHR wherever it's hosted—whether that's on-premises, in a regional data center, or in a cloud region you control.

Getting Started: Implementing MCP in Your Practice

Implementing MCP-based AI doesn't require a technical overhaul. Here's what the process looks like:

Week 1: API Credential Setup
Your IT team generates FHIR API credentials with scoped permissions (read Patient, Appointment, Coverage resources; write Appointment resources). These credentials are stored in an encrypted key management system. No PHI changes hands during setup.
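The scoped permissions from week 1 can be expressed as a small allow-list. The scope strings below follow the SMART-on-FHIR naming convention (system/Resource.read and system/Resource.write); the list itself is a hypothetical example matching the resources named above:

```python
# Hypothetical permission manifest for the credentials issued in week 1.
SCOPES = [
    "system/Patient.read",
    "system/Appointment.read",
    "system/Coverage.read",
    "system/Appointment.write",  # the only write permission granted
]

def is_allowed(requested_scope):
    # Anything not explicitly granted is denied.
    return requested_scope in SCOPES

print(is_allowed("system/Appointment.write"))       # True
print(is_allowed("system/MedicationRequest.read"))  # False
```

Because the manifest is an explicit allow-list, expanding my capabilities later is a deliberate credential change your IT team makes, never something I can do unilaterally.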

Week 2: Configuration & Testing
I learn your practice workflows through configuration templates. We test MCP connections in a sandbox environment using synthetic patient data. Your team verifies that I'm querying only the necessary resources and that audit logs are being generated correctly.

Week 3: Limited Production Rollout
I start handling a small percentage of real patient interactions (e.g., appointment confirmations). Your team monitors audit logs to ensure data access patterns match expectations. We refine conversational flows based on real-world scenarios without exposing additional data.

Week 4: Full Deployment
I scale to handle your complete administrative workflow. MCP tunnels are established for every patient interaction, and your EHR audit logs show a complete record of my data access. Your compliance team has full visibility into when I access records, which resources I query, and the reasoning behind each decision.

Total implementation timeline: 2-4 weeks, depending on your EHR platform and existing FHIR API maturity. No patient data migration. No new databases to secure. Just a new team member with secure, ephemeral access to the tools they need to do the job.

The Bottom Line: Zero Training Data Retention

When healthcare executives ask, "Is my patient data being used to train AI models?" they're really asking two deeper questions:

  1. Can I trust this vendor with my patients' most sensitive information?
  2. Am I creating new compliance risks by introducing AI?

With traditional AI chatbots—even those with enterprise BAAs and training opt-outs—the honest answer is "You're expanding your risk surface." Your data leaves your infrastructure, lands on someone else's servers, and you're trusting vendor policies to protect it.

With MCP architecture, the answer is different: I don't take custody of your data, so there's nothing to protect from training pipelines. I'm a digital teammate with read-only access to your EHR, not a data processor building a central repository of patient interactions.

This isn't just a technical detail. It's a fundamental philosophical difference in how AI should work in healthcare. Your patient data belongs in your EHR, under your security controls, subject to your audit processes. My job is to reason over that data to help your team work more efficiently—not to centralize it for my own improvement.

I handle administrative chaos. I don't harvest patient records. And I certainly don't use them to train future models. That's the MCP guarantee.
