Conversation State in Enterprise AI: Why Air Canada Was Liable for Its Chatbot and the Architecture That Prevents It
Legal and Technical Reference
Moffatt v. Air Canada: What the Chatbot Actually Got Wrong
In November 2022, Jake Moffatt, traveling to his grandmother's funeral, purchased an Air Canada flight after the airline's chatbot informed him that he could apply for a bereavement fare discount within 90 days of the original ticket purchase, even if he booked at the regular price first. He bought the ticket at full price, expecting to claim the bereavement discount retroactively as the chatbot had suggested. Air Canada later refused the discount, stating that bereavement fares must be requested at the time of booking.
The BC Civil Resolution Tribunal's February 2024 decision ordered Air Canada to pay $812.02 CAD (approximately $650 in damages, plus pre-judgment interest and tribunal fees), holding that the chatbot's incorrect statement was a negligent misrepresentation by Air Canada. The tribunal's analysis is significant for enterprise AI deployments: it rejected Air Canada's argument that the chatbot was a separate legal entity whose statements were not binding. The organization is responsible for what its AI systems communicate to customers.
The immediate technical failure was not context loss in the traditional sense — the chatbot appears to have retrieved and stated incorrect policy information. But the case illustrates the broader category of AI customer-communication failure that context management problems contribute to: a chatbot that cannot maintain context across a conversation may contradict itself between turns, apply policies inconsistently depending on how questions are framed, and fail to honor commitments or statements it made earlier in the conversation.
The Context Window Problem and Memory Architecture Patterns
Every LLM has a context window — the maximum number of tokens (roughly: words and word-fragments) it can process in a single inference call. The context window defines the "working memory" available to the model: everything the model knows about the current conversation must fit within this window. When a conversation exceeds the context window, earlier parts of the conversation must be dropped — and with them, any context, commitments, or information from those earlier turns.
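The trimming behavior this paragraph describes can be sketched directly. The snippet below drops the oldest turns until the remainder fits a token budget; the 4-characters-per-token estimate is a rough heuristic for illustration, and a production system would use the model's actual tokenizer (e.g. tiktoken for OpenAI models).

```python
# Sketch: drop oldest turns when a conversation exceeds a token budget.
# Token counts are approximated as len(text) // 4 for illustration only;
# real systems should count with the model's tokenizer.

def estimate_tokens(text: str) -> int:
    """Rough heuristic: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

def fit_to_window(turns: list[str], budget: int) -> list[str]:
    """Keep the most recent turns whose combined estimate fits the budget."""
    kept, used = [], 0
    for turn in reversed(turns):          # walk newest-first
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break                         # everything older is dropped
        kept.append(turn)
        used += cost
    return list(reversed(kept))           # restore chronological order

history = ["Hi, I need a bereavement fare." * 3,
           "Sure, tell me your travel dates." * 3,
           "March 3rd, returning March 10th."]
trimmed = fit_to_window(history, budget=30)
```

With a 30-token budget only the newest turn survives; the earlier turns, and anything the assistant committed to in them, are gone.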
Context windows have grown dramatically: GPT-3.5-turbo launched with 4,096 tokens; GPT-4-turbo reaches 128,000 tokens; Claude 3's largest models support 200,000 tokens; Gemini 1.5 Pro supports 1,000,000 tokens. These larger windows reduce but do not eliminate the context management problem in enterprise conversational AI. Long enterprise interactions — multi-session support threads, complex multi-step workflows, extended sales interactions — can exceed even 200,000 token windows. And cost is a factor: inference cost scales with context window size, so larger contexts significantly increase operating costs.
Three Primary Memory Architecture Patterns
Production enterprise conversational AI systems use three complementary memory patterns:
Pattern 1: In-Context Memory (Conversation Buffer)
The simplest pattern: maintain the full conversation history in the context window for each inference call. This provides perfect recall of everything in the window, but recall is capped by window size, cost scales linearly with conversation length, and there is no persistence across sessions (when the context window resets, all context is lost). LangChain's ConversationBufferMemory implements this pattern. For short, single-session interactions that fit comfortably in the window, this is sufficient — but for enterprise use cases with extended conversations or multi-session requirements, it fails.
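A minimal sketch of the buffer pattern, illustrative rather than LangChain's actual implementation: every turn is appended and the entire history is replayed on each inference call.

```python
# Minimal in-context buffer: the full history is replayed on every call.
class ConversationBuffer:
    def __init__(self):
        self.turns: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})

    def as_prompt_messages(self) -> list[dict]:
        # Everything said so far is sent verbatim with each inference call,
        # so per-call cost grows with conversation length and nothing
        # survives a session reset.
        return list(self.turns)

buf = ConversationBuffer()
buf.add("user", "Can I claim a bereavement fare after booking?")
buf.add("assistant", "Let me check that policy for you.")
messages = buf.as_prompt_messages()
```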
Pattern 2: Vector Database Memory (Semantic Search)
Rather than keeping the full conversation in context, key information from past conversation turns is embedded as vectors and stored in a vector database (Pinecone, Weaviate, Chroma, pgvector in PostgreSQL). At each inference call, the current query is embedded and used to retrieve semantically relevant past context. This provides effectively unlimited memory depth — the vector database can contain years of interaction history. The limitation is retrieval precision: semantic search retrieves contextually similar content, but may miss specific factual statements from past turns that are not semantically related to the current query. LangChain's VectorStoreRetrieverMemory implements this pattern.
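The retrieval flow can be illustrated with a toy stand-in: here a bag-of-words counter plays the role of an embedding model, and a sorted list plays the role of Pinecone or pgvector. The mechanics are the same — embed the query, return the stored turns with the highest cosine similarity — and the example also hints at the precision limitation, since matching depends on semantic (here, lexical) overlap.

```python
import math
from collections import Counter

# Toy semantic memory: bag-of-words "embeddings" and cosine similarity
# stand in for a real embedding model plus vector database.

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class VectorMemory:
    def __init__(self):
        self.store: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self.store.append((embed(text), text))

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.store, key=lambda item: cosine(q, item[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]

mem = VectorMemory()
mem.add("user prefers morning appointments")
mem.add("user confirmed email address on file")
hits = mem.retrieve("what time of day does the user prefer appointments")
```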
Pattern 3: Session Store Memory (Key-Value with TTL)
Redis or other key-value stores provide fast, persistent session storage for conversation metadata: user identity confirmed in this session, workflow stage (which step of a multi-step process the conversation is at), explicitly stated user preferences, and commitments made by the AI system. Redis enables sub-millisecond access to session state and TTL-based automatic expiration of sessions. This pattern is complementary to vector memory: session stores handle structured state (the workflow is at Step 3, the user has confirmed their email) while vector stores handle unstructured semantic context (the user mentioned preferring morning appointments two messages ago).
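The TTL semantics can be sketched as follows. A plain dict models what Redis provides natively (in redis-py, roughly `r.hset(...)` plus `r.expire(key, ttl)`); the structured fields shown — workflow stage, verified identity, AI commitments — are illustrative names, not a standard schema.

```python
import time

# Sketch of TTL-based session state; a dict stands in for Redis.
class SessionStore:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._data = {}  # session_id -> (expires_at, state)

    def put(self, session_id: str, state: dict) -> None:
        self._data[session_id] = (time.monotonic() + self.ttl, state)

    def get(self, session_id: str):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, state = entry
        if time.monotonic() > expires_at:   # session expired: drop it
            del self._data[session_id]
            return None
        return state

store = SessionStore(ttl_seconds=1800)
store.put("sess-42", {"workflow_step": 3,
                      "email_verified": True,
                      "commitments": ["quoted bereavement fare policy"]})
state = store.get("sess-42")
```

Tracking AI commitments as an explicit field, as sketched here, is what makes the consistency checks in the audit checklist below testable at all.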
Context Compression: Managing Token Budgets in Long Conversations
Context compression is the process of reducing the token count of conversation history while preserving the information necessary for coherent continuation. Several compression strategies are used in production enterprise AI systems:
Summarization-Based Compression
When the conversation history approaches the context window limit, earlier portions of the conversation are summarized into a compact representation. The summary replaces the original turns in the context window, freeing space for new turns. LangChain's ConversationSummaryMemory and ConversationSummaryBufferMemory implement this pattern. The risk is summarization fidelity: specific factual statements, commitments, or numbers mentioned earlier in the conversation may be generalized or omitted in the summary. For enterprise applications where specific commitments were made (as in the Air Canada case), summarization may compress out the very context needed to maintain consistency.
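The mechanics, and the fidelity risk, can be shown with a sketch. The summarizer here is a trivial stub (first eight words of each turn) standing in for the LLM call that LangChain's ConversationSummaryBufferMemory would make; in a real system, that LLM call is exactly where a specific commitment can be generalized away.

```python
# Sketch of summary-buffer compression: once history exceeds a limit,
# the oldest turns are collapsed into a single summary "turn".
# stub_summarize is a placeholder for an LLM summarization call.

def stub_summarize(turns: list[str]) -> str:
    return "Summary: " + " / ".join(" ".join(t.split()[:8]) for t in turns)

def compress(turns: list[str], keep_recent: int) -> list[str]:
    if len(turns) <= keep_recent:
        return turns
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [stub_summarize(older)] + recent

history = [
    "user: Can I apply for the bereavement discount after I book at full price?",
    "assistant: Yes, you can apply within 90 days of purchase.",  # the commitment
    "user: Great, I will book now.",
    "assistant: Anything else I can help with?",
]
compressed = compress(history, keep_recent=2)
```

In this toy case the "90 days" commitment happens to survive; a real LLM summary offers no such guarantee, which is the motivation for the selective retention pattern below.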
Selective Retention
Rather than summarizing uniformly, selective retention identifies high-importance turns — those containing specific commitments, numerical information, user decisions, or conflict-prone statements — and retains those verbatim while summarizing lower-importance turns. Importance scoring can be implemented using a classifier model or heuristic rules (messages containing dollar amounts, specific dates, policy references, or user confirmations are high importance). This approach better preserves the factual consistency that the Air Canada case makes clear is legally necessary.
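A minimal heuristic scorer might look like the following. The regex patterns (dollar amounts, dates, policy references, confirmation language) are illustrative; a production system would tune them for its domain or replace them with a classifier model.

```python
import re

# Heuristic importance scoring: turns matching any pattern are retained
# verbatim; the rest are eligible for summarization.
HIGH_IMPORTANCE = [
    re.compile(r"\$\d"),                         # dollar amounts
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),  # dates like 3/10/2024
    re.compile(r"\bpolicy\b", re.IGNORECASE),    # policy references
    re.compile(r"\b(confirm|agree|promise)", re.IGNORECASE),
]

def is_high_importance(turn: str) -> bool:
    return any(p.search(turn) for p in HIGH_IMPORTANCE)

def partition(turns: list[str]) -> tuple[list[str], list[str]]:
    """Split turns into (retain verbatim, eligible for summarization)."""
    keep = [t for t in turns if is_high_importance(t)]
    summarize = [t for t in turns if not is_high_importance(t)]
    return keep, summarize

keep, summarize = partition([
    "assistant: Our bereavement policy allows a refund within 90 days.",
    "user: Thanks, that helps.",
    "assistant: The discounted fare is $420.",
])
```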
Entity Extraction and Tracking
Entity extraction identifies and tracks named entities mentioned in the conversation: people, organizations, dates, monetary amounts, policy references, locations. These entities are stored in a structured format in the session store and injected into the context at each inference call regardless of summarization. Entity tracking ensures that even highly summarized conversation history preserves the factual anchors necessary for consistent AI responses about specific policy applications, commitments, or user-specific circumstances.
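A minimal tracker for two entity types is sketched below: regexes pull monetary amounts and ISO-style dates out of each turn and accumulate them in structured state that can be injected into every prompt regardless of summarization. Production systems would typically combine an NER model (e.g. spaCy) with domain-specific patterns and cover more entity types.

```python
import re

# Minimal entity tracker: amounts and dates are extracted per turn and
# accumulated in session state so they survive history compression.
MONEY = re.compile(r"\$\d+(?:\.\d{2})?")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def extract_entities(turn: str) -> dict:
    return {"amounts": MONEY.findall(turn), "dates": DATE.findall(turn)}

def track(turns: list[str]) -> dict:
    tracked = {"amounts": [], "dates": []}
    for turn in turns:
        found = extract_entities(turn)
        for key in tracked:
            tracked[key].extend(found[key])
    return tracked

entities = track([
    "assistant: The bereavement fare is $420.50 if requested by 2024-03-10.",
    "user: I will book before 2024-03-10 then.",
])
```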
Conversation State Architecture Technical Audit Checklist
- Context Window Management — No Silent Truncation: Verify that conversation history is managed explicitly — not silently truncated by the LLM API. When approaching context window limits, implement explicit compression or summarization. Log when compression occurs. Verify compressed context preserves critical factual statements.
- Commitment Tracking — AI Statement Consistency: Implement structured tracking of commitments or representations made by the AI system during a conversation. Test that subsequent turns in the same conversation are consistent with prior AI statements. Test that context compression does not cause the AI to contradict prior statements.
- Redis Session State — TTL and Consistency: Verify Redis session store TTL is appropriate for expected conversation length. Verify session state is consistent across multiple API requests within a session. Implement session state versioning to detect state corruption. Test behavior when session expires during an active conversation.
- Vector Database Memory — Retrieval Accuracy Testing: Test semantic memory retrieval accuracy against a corpus of representative enterprise conversations. Measure recall of specific factual statements from earlier in the conversation. Verify retrieval accuracy meets requirements for the specific use case — customer commitments require higher recall than general preference tracking.
- Entity Extraction — Coverage of Critical Information: Test entity extraction against scenarios where specific numerical commitments, policy references, dates, and user decisions are mentioned. Verify entities are tracked correctly and injected into context for subsequent turns. Verify entity store is not vulnerable to injection attacks (entities should not contain executable content).
- Multi-Session Continuity — Cross-Session Memory: For use cases requiring multi-session continuity (customer support threads spanning multiple days), verify that relevant context from previous sessions is retrievable and correctly surfaced. Test that user identity is verified before previous session context is exposed.
- Context Compression Fidelity — Commitment Preservation: Test context compression against scenarios where specific commitments were made in the compressed portion of the conversation. Verify the compressed representation preserves sufficient fidelity for the AI to remain consistent with those commitments. High-stakes commitments should be retained verbatim, not summarized.
- Privacy — Session Data Isolation and Retention: Verify session state is isolated per user identity — no cross-user context leakage. Implement session data retention policy with automatic deletion after defined period. Verify session data is included in user data deletion fulfillment for GDPR/CCPA rights requests.
- Failure Mode — Graceful Degradation on Memory Failure: Test AI system behavior when memory retrieval fails (Redis unavailable, vector search timeout). System should degrade gracefully — acknowledging context limitation rather than providing hallucinated context from failed retrieval. Log memory failures for operational monitoring.
- Audit Trail — Conversation Reconstruction Capability: Verify that complete conversation history can be reconstructed from audit logs — including compressed and vectorized turns. The ability to reconstruct a conversation in an investigation (as the Air Canada case required) depends on retaining the full conversation record separately from the active context management layer.
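The graceful-degradation check in the list above can be sketched as a fallback wrapper around memory retrieval: on failure, the system logs the fault and substitutes an explicit "context unavailable" marker rather than letting the model improvise missing history. The names here (`fetch_session`, `RETRIEVAL_FALLBACK`) are illustrative, not from any library.

```python
import logging

logger = logging.getLogger("memory")

RETRIEVAL_FALLBACK = ("[Note: earlier conversation context is temporarily "
                      "unavailable; do not assume prior commitments.]")

def fetch_session(store: dict, session_id: str) -> str:
    # Stand-in for a Redis or vector-store lookup that can fail.
    state = store.get(session_id)
    if state is None:
        raise KeyError(session_id)
    return state

def context_for_prompt(store: dict, session_id: str) -> str:
    try:
        return fetch_session(store, session_id)
    except Exception:
        # Log for operational monitoring, then degrade explicitly.
        logger.warning("memory retrieval failed for %s", session_id)
        return RETRIEVAL_FALLBACK

ok = context_for_prompt({"s1": "user confirmed email"}, "s1")
degraded = context_for_prompt({}, "s1")
```

The fallback text is injected into the prompt so the model itself acknowledges the gap, rather than fabricating context that a retrieval failure erased.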