BAA Sub-Processor Risks in AI Vendor Contracts: Chain of Custody, Model Training Data, and the $3M URMC Settlement

In November 2019, University of Rochester Medical Center (URMC) paid $3,000,000 to OCR following an investigation triggered by two separate breaches: a lost unencrypted flash drive and a stolen unencrypted personal laptop. OCR's findings were structural. URMC had not conducted an enterprise-wide risk analysis, had not implemented device and media controls, and had not encrypted ePHI on portable devices where it was reasonable to do so. ePHI was leaving the controlled environment on devices that no safeguard or agreement covered. Today's AI vendor risk is the direct analog: organizations deploy AI tools that access ePHI but fail to trace the full chain of BAA obligations through the vendor's infrastructure to its sub-processors, meaning the cloud platforms, inference APIs, logging services, and customer support tools that touch healthcare data downstream of the primary vendor relationship.

HHS OCR Resolution Agreement: University of Rochester Medical Center

Announced:	November 5, 2019
Settlement:	$3,000,000 plus corrective action plan
Covered Entity:	University of Rochester Medical Center, Rochester NY
Incidents:	Lost unencrypted flash drive (2013); stolen unencrypted personal laptop (2017)
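Core Findings:	No enterprise-wide risk analysis; insufficient risk management; no device and media controls; ePHI not encrypted on portable devices
Violations:	45 CFR §164.308(a)(1)(ii)(A) (risk analysis), §164.308(a)(1)(ii)(B) (risk management), §164.310(d) (device and media controls), §164.312(a)(2)(iv) (encryption)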


URMC's failure was structural: ePHI moved onto devices that its safeguards never reached. AI vendor relationships invite a similar error when access to ePHI is treated as a bilateral relationship between the covered entity and the vendor, because HIPAA's BAA requirement operates as a chain. Under the HIPAA Omnibus Rule (2013), business associates must enter into BAAs with their own sub-contractors (sub-processors) who access PHI on their behalf. A BAA between your organization and an AI vendor does not automatically create BAA coverage for the vendor's cloud infrastructure, model inference provider, logging platform, or customer support system, each of which may handle PHI in the course of providing the service you contracted for.

The BAA Chain of Custody Requirement

45 CFR §164.308(b)(1)

Business Associate Contracts Required

A covered entity must obtain satisfactory assurances that business associates will appropriately safeguard PHI. These assurances must be documented in a written BAA. There is no exception for vendors who advertise a "HIPAA-ready architecture" but have not signed a BAA.

45 CFR §164.502(e)(1)(ii)

Sub-Processor BAA Cascade

A business associate may disclose PHI to a sub-contractor and allow the sub-contractor to create, receive, maintain, or transmit PHI on its behalf only if the business associate enters into a BAA with the sub-contractor. The cascade extends to every tier of the vendor chain.

45 CFR §164.314(a)

BAA Content Requirements

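The BAA must require the business associate to comply with applicable Security Rule provisions, ensure sub-contractors agree to do the same, and report security incidents, including breaches of unsecured PHI. The Privacy Rule's companion provision, 45 CFR §164.504(e)(2), adds obligations such as making PHI available for access and amendment. Generic HIPAA compliance statements are not BAAs.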

The AI Vendor Sub-Processor Chain

A typical AI healthcare vendor's infrastructure involves multiple sub-processors, each of which may handle PHI during the delivery of the contracted service. The chain for a conversational AI scheduling system might look like:

1. AI Vendor (Primary BA): Your direct contract; processes patient conversations and EHR queries
2. Cloud Hosting Provider: AWS, Azure, or GCP providing compute infrastructure; hosts application servers that process PHI
3. LLM Inference Provider: Azure OpenAI, AWS Bedrock, or a third-party model host; receives prompts that may contain PHI
4. Speech-to-Text Provider: AWS Transcribe Medical, Google Speech-to-Text Medical, or similar; receives audio containing PHI
5. Logging and Observability Platform: Datadog, Splunk, New Relic, or similar; may receive application logs containing PHI if logging is not properly configured
6. Customer Support Platform: Zendesk, Intercom, or similar; support tickets may contain patient PHI when resolving service issues
7. Vector Database Provider: Pinecone, Weaviate, or similar; may store patient-derived embeddings

Your BAA with the AI vendor covers tier 1. Whether tiers 2-7 have BAA coverage depends entirely on the contracts your AI vendor has with its sub-processors — arrangements you had no involvement in and which may not reflect HIPAA BAA requirements at all.

The sub-processor disclosure gap: Most AI vendor BAAs include a generic clause stating the vendor will ensure sub-processors comply with HIPAA. What this clause does not do: identify the sub-processors, confirm they have signed BAAs, specify what PHI each sub-processor receives, or provide you with any mechanism to verify compliance. A BAA clause saying "we will ensure our sub-processors maintain a HIPAA-ready architecture" is not a substitute for a published sub-processor list with BAA status confirmation for each entry.
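The disclosure gap becomes concrete once the chain is written down as data. The sketch below is illustrative only: the `SubProcessor` record, its field names, and the example vendor names are assumptions, not a format any vendor publishes. It flags every entry that receives PHI without a confirmed BAA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubProcessor:
    """One row of a hypothetical sub-processor inventory."""
    name: str            # sub-processor name as disclosed by the vendor
    role: str            # role in service delivery
    receives_phi: bool   # does PHI reach this sub-processor?
    baa_confirmed: bool  # has the vendor confirmed a signed BAA?


def coverage_gaps(chain):
    """Return sub-processors that receive PHI without a confirmed BAA."""
    return [sp for sp in chain if sp.receives_phi and not sp.baa_confirmed]


# Placeholder inventory; names and values are invented for illustration.
chain = [
    SubProcessor("ExampleCloud", "cloud hosting", True, True),
    SubProcessor("ExampleLLM", "LLM inference", True, False),
    SubProcessor("ExampleLogs", "logging and observability", True, False),
    SubProcessor("ExampleUptime", "uptime monitoring", False, False),
]

gaps = coverage_gaps(chain)
for sp in gaps:
    print(f"GAP: {sp.name} ({sp.role}) receives PHI without a confirmed BAA")
```

An inventory like this is only as good as the vendor's answers. Its value is that an empty or unanswerable row is itself a finding.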

The Model Training Data Risk

AI vendors increasingly request or reserve the right to use customer data for model improvement. For healthcare AI vendors, this creates a specific PHI risk: patient conversations, clinical notes processed by the AI, and EHR query results may be retained and used to fine-tune or train the vendor's models. The BAA implications are significant:

Training Data Use Requires Explicit Authorization

Using patient PHI to improve an AI model is not a "healthcare operations" activity under 45 CFR §164.501 unless it qualifies as quality assurance or quality improvement for the covered entity's own operations. A vendor using a covered entity's patient data to improve a model sold to other customers is using PHI for a purpose outside the scope of the BAA — which permits PHI use only as necessary to provide the contracted service.

De-identification Claims Need Independent Validation

Many AI vendors claim they use only "de-identified" patient data for model training. As detailed in our PHI in AI Systems analysis, de-identification requires meeting either the Safe Harbor standard (removing 18 specific identifier categories) or the Expert Determination standard (statistical certification). Verify which standard the vendor applies, how the de-identification process is implemented, and whether an independent expert has validated the methodology. "We strip names and dates" is Safe Harbor only if all 18 identifier categories are removed — not just the most obvious ones.
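A quick way to test a "we strip names and dates" claim is to compare it against the full Safe Harbor list. The sketch below paraphrases the 18 identifier categories of 45 CFR §164.514(b)(2)(i) as short labels; the labels and the function name are assumptions made for illustration. Safe Harbor also requires that the vendor have no actual knowledge that the remaining data could identify an individual, which a category check alone cannot establish.

```python
# Short labels paraphrasing the 18 Safe Harbor identifier categories.
SAFE_HARBOR_CATEGORIES = frozenset({
    "names",
    "geographic_subdivisions_smaller_than_state",
    "dates_except_year",
    "telephone_numbers",
    "fax_numbers",
    "email_addresses",
    "social_security_numbers",
    "medical_record_numbers",
    "health_plan_beneficiary_numbers",
    "account_numbers",
    "certificate_license_numbers",
    "vehicle_identifiers",
    "device_identifiers",
    "web_urls",
    "ip_addresses",
    "biometric_identifiers",
    "full_face_photographs",
    "other_unique_identifiers",
})


def missing_safe_harbor_categories(removed):
    """Return the Safe Harbor categories a vendor's process does not address."""
    return sorted(SAFE_HARBOR_CATEGORIES - set(removed))


# Hypothetical vendor claim: "we strip names and dates".
vendor_claim = {"names", "dates_except_year"}
missing = missing_safe_harbor_categories(vendor_claim)
print(f"{len(missing)} of {len(SAFE_HARBOR_CATEGORIES)} categories not addressed")
```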

Model Artifacts as PHI Containers

Research has demonstrated that large language models trained on private data can memorize and reproduce training data verbatim — a phenomenon documented by Carlini et al. in their 2021 paper "Extracting Training Data from Large Language Models." If a vendor fine-tunes a model on patient conversation data and the fine-tuned model memorizes specific patient information, the model itself becomes a PHI-containing artifact. This creates an unprecedented HIPAA scenario: PHI encoded in model weights, with unclear obligations for access control, audit, breach notification, and disposal.

$3M

URMC OCR Settlement — November 2019

URMC's violations centered on ePHI held on unencrypted devices that its risk analysis and device controls never covered. AI vendor sub-processor chains create a similar structural gap at potentially greater scale: PHI flowing to third-party infrastructure without the covered entity knowing which sub-processors receive it, let alone whether BAA coverage exists for each.

What a Strong AI Vendor BAA Must Contain

The standard BAA template published by HHS was written for traditional business associate relationships — billing services, transcription companies, IT vendors. AI vendor BAAs require additional provisions that address AI-specific risk vectors. The following provisions should be present in any BAA with an AI healthcare vendor:

Provision 1: Complete Sub-Processor Disclosure

The BAA must either list all current sub-processors who receive PHI, or commit to maintaining a current sub-processor list accessible to the covered entity. Changes to the sub-processor list should require advance notice (30 days is common practice among GDPR-aligned vendors; apply the same standard to HIPAA). The disclosure must specify what PHI each sub-processor receives, not just assert that each one offers a "HIPAA-ready architecture."

Provision 2: Model Training Data Prohibition

The BAA must explicitly prohibit the vendor from using PHI, or data derived from PHI, to train, fine-tune, or evaluate AI models beyond the minimum necessary to provide the contracted service. If the vendor requests permission for model improvement, this must be a separate, explicitly authorized use with a specified de-identification standard, a verification methodology, and audit rights.

Provision 3: Breach Notification Using OCR's Discovery Standard

As analyzed in the Warby Parker settlement, breach notification must use OCR's definition of discovery: the date the vendor knew, or by exercising reasonable diligence would have known, of the breach. "Confirmed" breach language delays notification by weeks or months while forensic investigation proceeds. The BAA should require notification within 15 business days of discovery, with preliminary details and a timeline for the final report. The 60 calendar days allowed by 45 CFR §164.410 are the regulatory maximum, not the target.
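The difference between the contractual target and the regulatory ceiling is easy to compute. The sketch below assumes a 15-business-day contractual window counted Monday through Friday with no holiday calendar, and a 60-calendar-day regulatory maximum. The function names are illustrative.

```python
from datetime import date, timedelta


def add_business_days(start, days):
    """Advance `start` by `days` weekdays (Mon-Fri); holidays are ignored."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


def notification_deadlines(discovery):
    """Return both deadlines measured from the discovery date."""
    return {
        "contractual": add_business_days(discovery, 15),
        "regulatory_maximum": discovery + timedelta(days=60),
    }


# Example: breach discovered on Friday, 1 March 2024.
deadlines = notification_deadlines(date(2024, 3, 1))
print(deadlines["contractual"])         # 2024-03-22
print(deadlines["regulatory_maximum"])  # 2024-04-30
```

Both clocks start at discovery. A clause that starts either clock at "confirmation" moves both dates by however long the forensic investigation takes.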

Provision 4: Data Return and Destruction at Termination

At contract termination, the vendor must return or destroy all PHI in its possession — including PHI in sub-processor systems. For AI vendors, this must explicitly include: conversation logs, transcript archives, vector embeddings derived from patient data, any fine-tuned model weights trained on patient data, and audit logs containing patient identifiers. The vendor must certify destruction in writing within 30 days of termination.
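A destruction certificate can be checked mechanically against the enumerated categories. This is a minimal sketch: the category identifiers mirror the list above, and `review_certificate` and its return fields are hypothetical names, not part of any standard.

```python
from datetime import date

# Data categories named in Provision 4; the identifiers are illustrative.
REQUIRED_CATEGORIES = (
    "conversation_logs",
    "transcript_archives",
    "vector_embeddings",
    "fine_tuned_model_weights",
    "audit_logs_with_patient_identifiers",
)
CERTIFICATION_WINDOW_DAYS = 30


def review_certificate(certified_categories, terminated_on, certified_on):
    """Compare a destruction certificate against the enumerated categories."""
    missing = [c for c in REQUIRED_CATEGORIES if c not in certified_categories]
    days_elapsed = (certified_on - terminated_on).days
    return {
        "missing_categories": missing,
        "complete": not missing,
        "on_time": 0 <= days_elapsed <= CERTIFICATION_WINDOW_DAYS,
    }


# Hypothetical certificate that covers only "records"-style data.
result = review_certificate(
    certified_categories={
        "conversation_logs",
        "transcript_archives",
        "audit_logs_with_patient_identifiers",
    },
    terminated_on=date(2024, 1, 15),
    certified_on=date(2024, 2, 10),
)
print(result["missing_categories"])
```

In this example the certificate arrives on time but omits embeddings and fine-tuned weights, which is the gap a generic "records and data" clause tends to leave.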

Provision 5: Direct Audit Rights

The covered entity must have the right to audit the vendor's HIPAA compliance — not just request documentation, but conduct or commission an independent assessment. The audit right should extend to sub-processors who handle PHI. Notification period for audit: 30 days maximum (not "reasonable notice" which vendors may interpret as 90+ days).

# BAA Sub-Processor Verification Process

# TYPICAL: BAA with generic sub-processor clause — no verifiable protections
BAA_CLAUSE = """
5.3 Sub-Contractors. Business Associate shall ensure that any agent, including
a sub-contractor, that creates, receives, maintains, or transmits Protected Health
Information on behalf of Business Associate agrees to the same restrictions and
conditions that apply to Business Associate with respect to such information.
"""
# Problems with this clause:
# - No sub-processor list required or provided
# - "Same restrictions and conditions" not defined — does this require a BAA?
# - No advance notice requirement for sub-processor changes
# - No audit mechanism to verify sub-processor compliance
# - No PHI flow specification per sub-processor

# STRONG: Sub-processor clause with specific verifiable obligations
BAA_CLAUSE_STRONG = """
5.3 Sub-Contractors.

(a) Business Associate shall maintain a current list of all Sub-Contractors 
that create, receive, maintain, or transmit Protected Health Information in 
connection with the Services ("Sub-Processor List"), available to Covered 
Entity upon request.

(b) Business Associate shall enter into a written Business Associate Agreement 
with each Sub-Contractor prior to any disclosure of Protected Health Information 
to such Sub-Contractor. Business Associate shall make available to Covered Entity 
a summary of the BAA status for each Sub-Contractor upon request.

(c) Business Associate shall provide Covered Entity with 30 days' advance written 
notice of any addition or replacement of a Sub-Contractor that will access 
Protected Health Information. Covered Entity may object to such change within 
15 days of notice, and the parties shall work in good faith to resolve the 
objection before the Sub-Contractor begins accessing Protected Health Information.

(d) For each Sub-Contractor receiving Protected Health Information, the Sub-Processor 
List shall specify: (i) the Sub-Contractor's name and role; (ii) the categories 
of Protected Health Information the Sub-Contractor receives; (iii) the purpose 
for which the Sub-Contractor processes Protected Health Information; and (iv) 
the Sub-Contractor's data center locations.
"""
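
The notice and objection windows in clause 5.3(c) of the strong example can be checked against an actual change notice. The sketch below assumes calendar days and uses hypothetical function and field names.

```python
from datetime import date, timedelta

NOTICE_DAYS = 30     # advance notice required by clause 5.3(c)
OBJECTION_DAYS = 15  # covered entity's objection window under 5.3(c)


def review_change_notice(notice_date, effective_date):
    """Check a sub-processor change notice against the 5.3(c) windows."""
    notice_given = (effective_date - notice_date).days
    return {
        "notice_days": notice_given,
        "sufficient_notice": notice_given >= NOTICE_DAYS,
        "objection_deadline": notice_date + timedelta(days=OBJECTION_DAYS),
    }


# Hypothetical notice: sent 1 June 2024 for a change effective 20 June 2024.
review = review_change_notice(date(2024, 6, 1), date(2024, 6, 20))
print(review)
```

Here the vendor gave 19 days' notice, so the change falls short of the clause, and any objection is due by 16 June 2024.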

The URMC Lesson: Workforce Devices and BYOD Agreements

URMC's specific failure, ePHI stored on unencrypted personal and portable devices without effective device controls, has a direct AI parallel. Healthcare organizations that allow staff to access AI-powered patient management tools from personal devices without a formal BYOD policy and written workforce agreement are replicating the URMC compliance gap.

When a workforce member uses their personal iPhone to access an AI patient scheduling tool, two BAA-adjacent obligations apply: (1) the AI vendor must have a BAA with the covered entity; (2) the covered entity must have a written BYOD policy and security agreement with the workforce member that governs their personal device's use for ePHI access. The BYOD agreement is not technically a BAA (workforce members are part of the covered entity's workforce, not third-party business associates), but it must address: device encryption, remote wipe authorization, data segregation, and prohibition on local storage of ePHI.

BAA Sub-Processor and AI Vendor Audit Checklist: 12 Controls

Request a complete, current sub-processor list from your AI vendor before signing the BAA. The list must name each sub-processor, describe their role, specify what PHI they receive, and confirm BAA status. A vendor who cannot provide this list has not completed their own BAA due diligence.

Verify your BAA includes an advance notice requirement for sub-processor changes (30 days minimum). Generic sub-processor change clauses with "reasonable notice" language effectively permit sub-processor substitutions without your knowledge. Define "reasonable" as a specific number of days in the contract.

Include an explicit model training data prohibition in the BAA. Default AI vendor terms often include data use provisions permitting model improvement. Draft the BAA prohibition so that it takes precedence over the vendor's terms of service, for example through an order-of-precedence clause. If the vendor refuses this prohibition, that is material information for your procurement decision.

Define "termination" data return to include model artifacts, embeddings, and fine-tuned weights. Standard BAA termination provisions contemplate returning "records" and "data" — categories that do not clearly include vector embeddings or fine-tuned model weights. Explicitly enumerate each data category the vendor must return or destroy.

Confirm your BAA includes direct audit rights with a defined notification period. "Right to audit" clauses with 90-day notice requirements give vendors time to remediate issues before your auditors arrive. Require 30-day maximum notice for compliance audits and no notice for document production requests.

Verify breach notification uses OCR's discovery standard, not "confirmed breach" language. The 60-day HIPAA notification clock starts at discovery, not confirmation. A vendor BAA that starts the clock at "confirmed" breach can delay notification by months. Align the BAA language with the regulatory definition of discovery.

Audit whether AI vendor LLM inference providers (Azure OpenAI, AWS Bedrock, etc.) have signed BAAs with the AI vendor. Your BAA with the AI vendor covers the primary relationship. The AI vendor's BAA with its LLM provider covers the sub-processor relationship. Request confirmation that this sub-BAA exists and covers the specific use case (PHI-containing prompts).

Implement a BYOD policy requiring encrypted storage and remote wipe capability for all devices accessing AI healthcare tools. URMC paid $3M for a missing workforce device policy. Your AI vendor's mobile app access creates the same exposure. MDM enrollment, full-disk encryption, and remote wipe authorization must be mandatory for any device accessing AI systems that touch ePHI.

Review the AI vendor's logging platform sub-processor for PHI exposure risk. Application monitoring platforms (Datadog, New Relic, Splunk) receive application log data. If the AI vendor's logging configuration writes PHI to log output, the monitoring platform receives PHI without explicit authorization. Verify the logging configuration and confirm the monitoring sub-processor has a BAA or receives only PHI-free operational data.

Confirm the AI vendor's customer support platform does not receive PHI in support tickets. When support tickets are filed about patient interaction issues, support staff may include patient identifiers, conversation excerpts, or clinical context in their communications. Verify the support platform sub-processor has a BAA, or implement a policy prohibiting PHI inclusion in support tickets.

Request the AI vendor's most recent third-party SOC 2 Type II report covering sub-processor management controls. SOC 2 Type II audits test that controls are operating effectively over a period (typically 6-12 months). The sub-processor management section should verify that the vendor maintains a current sub-processor list, executes BAAs, and monitors sub-processor compliance. A SOC 2 Type I report tests only design, not operation.

Confirm the BAA includes a provision making the vendor directly liable for sub-processor PHI breaches. Standard BAA language requires vendors to flow down HIPAA obligations to sub-processors but does not make the vendor liable for sub-processor failures. Add explicit language: "Business Associate shall be liable to Covered Entity for breaches of this Agreement caused by acts or omissions of Business Associate's Sub-Contractors to the same extent as if Business Associate had committed the acts or omissions directly."
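For the logging control in the checklist above, one common mitigation is to redact obvious identifiers before log records leave the application. The sketch below uses Python's standard `logging.Filter`. The regular expressions and the `PhiRedactionFilter` name are illustrative assumptions: pattern matching will miss names and free-text clinical detail, so this is a backstop rather than de-identification, and the stronger control is not writing PHI to logs at all.

```python
import io
import logging
import re

# Illustrative patterns only; real PHI takes many more forms than these.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED-SSN]"),
    (re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE), "[REDACTED-MRN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "[REDACTED-EMAIL]"),
]


def redact(text):
    """Replace every pattern match with its placeholder."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class PhiRedactionFilter(logging.Filter):
    """Redact the fully formatted message before a handler emits it."""

    def filter(self, record):
        record.msg = redact(record.getMessage())
        record.args = None  # arguments are already merged into msg
        return True


# Demonstration with an in-memory stream standing in for a log shipper.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(PhiRedactionFilter())
logger = logging.getLogger("scheduling_demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info("Lookup failed for %s", "MRN 998877")
print(stream.getvalue().strip())  # Lookup failed for [REDACTED-MRN]
```

Attaching the filter to the handler that ships logs to the monitoring platform means redaction happens at the boundary where data leaves the vendor's own infrastructure.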

How Claire Manages the BAA Sub-Processor Chain

1. Published Sub-Processor List with BAA Status

Claire maintains a current sub-processor list published in our security documentation. Each sub-processor entry includes the company name, role in service delivery, categories of data accessed, data center locations, and BAA status confirmation. When sub-processors change, customers receive 30 days' advance notice with the right to object before the change takes effect. The list is updated within 10 business days of any sub-processor addition or change.

2. Contractual Prohibition on PHI Use for Model Training

Claire's BAA explicitly prohibits use of PHI — or data derived from PHI — to train, fine-tune, or evaluate any AI model. This is not a policy statement in documentation that vendors can change unilaterally; it is a contract term that requires mutual agreement to modify. The prohibition extends to sub-processors: Claire's sub-processor agreements include the same training data prohibition.

3. Architecture That Minimizes Sub-Processor PHI Exposure

Claire's MCP architecture accesses PHI ephemerally from your EHR via FHIR API — PHI is not stored in Claire's infrastructure or transmitted to sub-processors for storage. The LLM inference provider receives prompts containing session tokens (not patient PHI) and tool call results. The monitoring platform receives operational metrics (latency, error rates, throughput) — not patient data. Sub-processor PHI exposure is minimized by architecture, reducing the BAA chain risk surface.

4. 30-Day Data Return Certification at Termination

Upon contract termination, Claire provides written certification of data destruction within 30 days. The certification explicitly covers: session logs, audit records, configuration data, and any other Claire-held data associated with your organization. Because Claire's architecture stores no patient PHI, the destruction certification is straightforward — there is no embedding archive, model fine-tune, or conversation history to manage.

The BAA Is Not a Compliance Checkbox

URMC's $3M settlement was not for a spectacular breach. It was for a structural compliance failure: ePHI sat on devices that the organization's risk analysis, encryption, and device controls did not cover. A parallel structural failure is embedded in AI healthcare deployments where organizations sign a BAA with the primary vendor and consider the HIPAA obligation satisfied, while PHI flows to five or six sub-processors under the vendor's infrastructure without any compliance documentation.

The BAA is not a compliance checkbox. It is the legal foundation that makes every downstream PHI use either authorized or unauthorized. For AI vendors with complex infrastructure chains, building that foundation requires understanding exactly where patient data flows — not just at the vendor level, but through every sub-processor that receives or processes it. That understanding starts with the sub-processor list, and the list starts with asking for it before signing.