Your AI Bill Just Went Up 47 Percent. You Cannot Explain Why.
A practical guide to attributing LLM costs by workflow. Because the CFO is going to ask which workflows burned the money, and "all of them" is not an answer.
Last year I worked with a 24-property hotel group through their first uncomfortable AI budget review at the end of Q3.
The corporate CFO opened the vendor invoice. Up 47 percent quarter over quarter. She forwarded it to the COO with one question: which properties and which workflows drove the increase?
The COO forwarded it to the VP of Digital. The VP of Digital pulled the vendor dashboard. The vendor dashboard showed total token consumption broken down by API key. The hotel group used one API key. The dashboard showed one number. The number was the same number on the invoice.
The VP of Digital did not have a good answer. The COO did not have a good answer to forward. The CFO got a long email about "growth" and "new use cases" and "expanded deployment." The invoice got approved because nobody could make a credible case to cut it and nobody could make a credible case to keep it.
The same conversation happened in Q4. The same conversation is going to happen every quarter until somebody builds the attribution layer that connects tokens to workflows and workflows to outcomes.
This is the single most common AI FinOps failure pattern I see. The vendor reports total consumption. The CFO wants attribution. Engineering cannot bridge the gap. The bill grows.
How Do You Track and Allocate LLM API Costs by Workflow?
Most vendor dashboards show total consumption. Token counts by API key. Maybe by model. That is the granularity most providers offer.
For real cost attribution at the workflow level, you need instrumentation built into your own infrastructure. Every API call to a language model needs to carry a tag identifying the workflow that initiated it.
For the hotel group, that meant six tags. The front-desk concierge assistant was one workflow. The housekeeping dispatcher was another. The revenue-management copilot was a third. The guest-feedback summarizer, the corporate-side procurement assistant, and a payroll-question assistant rounded out the list. Each workflow carries a tag. Every API call carries that tag. When the invoice arrives, you break it down by tag and immediately see which workflows consumed the most tokens.
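A minimal sketch of the tagging idea above, assuming a hypothetical in-process ledger. The names `UsageLedger`, `record`, and `shares`, the tag strings, and the token counts are all illustrative and not from the hotel group's system. In a real deployment the token counts would come from the usage metadata your provider returns with each response.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UsageLedger:
    """Hypothetical in-memory ledger: total tokens per workflow tag."""
    tokens_by_workflow: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, workflow: str, input_tokens: int, output_tokens: int) -> None:
        # Refuse untagged calls so nothing reaches the invoice unattributed.
        if not workflow:
            raise ValueError("every model call must carry a workflow tag")
        self.tokens_by_workflow[workflow] += input_tokens + output_tokens

    def shares(self) -> dict:
        """Return each workflow's fraction of total recorded tokens."""
        total = sum(self.tokens_by_workflow.values())
        if total == 0:
            return {}
        return {w: t / total for w, t in self.tokens_by_workflow.items()}


ledger = UsageLedger()
# Illustrative numbers only.
ledger.record("front-desk-concierge", input_tokens=1200, output_tokens=300)
ledger.record("housekeeping-dispatch", input_tokens=400, output_tokens=100)
ledger.record("front-desk-concierge", input_tokens=900, output_tokens=100)

# Print the breakdown, largest consumer first.
for workflow, share in sorted(ledger.shares().items(), key=lambda kv: -kv[1]):
    print(f"{workflow}: {share:.0%}")
```

Rejecting untagged calls at write time is a design choice in this sketch. It keeps the "unattributed" bucket from growing quietly as new use cases are added.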
This sounds simple. In practice, most organizations do not have this instrumentation because their AI infrastructure grew organically. Each new use case added its own API calls. Nobody built the tagging layer that connects usage back to business function.
The result is the conversation the hotel group's VP of Digital had with the CFO. Total consumption. No attribution. No defensible budget.
What Causes Unexpected Spikes in Generative AI Infrastructure Bills?
Once attribution exists, the hidden cost drivers turn out to be the same across organizations. The hotel group's audit surfaced four of them.
Retry loops. A guest-feedback summarizer was hitting a malformed input from a single property's PMS integration. The retry logic fired. The call failed again. The retry fired again. The loop ran every night for three weeks before anyone noticed. The retries showed up on the invoice as normal usage. That single faulty integration accounted for roughly 12 percent of the quarter's increase.
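The article does not say how the hotel group fixed this loop. One common mitigation, sketched here under that assumption, is to cap the number of attempts and to fail fast on input that can never succeed. `call_with_retry_cap`, `MalformedInputError`, `max_attempts`, and `base_delay` are hypothetical names for illustration.

```python
import time


class MalformedInputError(Exception):
    """Hypothetical error for input that will fail however often it is retried."""


def call_with_retry_cap(fn, payload, max_attempts=3, base_delay=0.0):
    """Call fn(payload), retrying other failures up to max_attempts times in total."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(payload)
        except MalformedInputError:
            # Retrying cannot repair bad input, so stop after a single paid call.
            raise
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff between attempts.
                time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

With a cap in place, a broken integration costs at most `max_attempts` calls per run. Combined with workflow tags, the failures also show up as a countable event instead of as ordinary usage.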
Verbose system prompts. The revenue-management copilot's system prompt included the full property catalog, the standard pricing rules, the seasonal calendar, the loyalty-tier matrix, and a long instruction block on every possible query type. The prompt was 3,800 tokens long. Most of it was irrelevant to any specific query. Every interaction sent those 3,800 tokens before the user's actual question. Trimming to context-specific prompts cut the workflow's token cost by 64 percent.
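A rough sketch of what "context-specific prompts" can look like, assuming queries are classified into a type before the model call. The section names, placeholder text, query types, and the `build_system_prompt` helper are all hypothetical. The article does not describe how the hotel group actually split its prompt.

```python
# Hypothetical prompt sections; the text of each is a placeholder.
SECTIONS = {
    "core": "You are the revenue-management copilot. Answer concisely.",
    "pricing_rules": "<standard pricing rules>",
    "seasonal_calendar": "<seasonal calendar>",
    "loyalty_tiers": "<loyalty-tier matrix>",
    "property_catalog": "<full property catalog>",
}

# Assumed mapping from query type to the sections that type actually needs.
SECTIONS_BY_QUERY_TYPE = {
    "rate_question": ["pricing_rules", "seasonal_calendar"],
    "loyalty_question": ["loyalty_tiers"],
}


def build_system_prompt(query_type: str) -> str:
    """Assemble the core instructions plus only the sections this query type needs."""
    needed = SECTIONS_BY_QUERY_TYPE.get(query_type, [])
    return "\n\n".join(SECTIONS[name] for name in ["core", *needed])
```

An unrecognized query type falls back to the core instructions alone in this sketch. Whether that is the right default depends on how often unclassified queries need the larger context.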
Debug modes left on. During the concierge assistant's pilot, someone enabled verbose response formatting and expanded retrieval-window logging for troubleshooting. The system went to production with both settings still active. Every interaction cost about 2.4 times what it should have. The debug flag was still set because nobody remembered setting it.
Power users. The corporate procurement team had three users who had figured out they could treat the AI like an unlimited research assistant. They pasted entire contract drafts. They ran multi-step analyses. They conducted research projects through what had been designed as a quick-lookup interface. Three users accounted for 22 percent of the quarter's procurement-workflow spend.
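Finding a concentration like this requires a user identifier alongside the workflow tag. A small sketch, assuming each logged call is a dict with hypothetical `workflow`, `user`, and `tokens` keys. The sample records are made up for illustration.

```python
from collections import Counter


def top_user_share(calls, workflow, top_n=3):
    """Fraction of a workflow's tokens consumed by its top_n heaviest users."""
    per_user = Counter()
    for call in calls:
        if call["workflow"] == workflow:
            per_user[call["user"]] += call["tokens"]
    total = sum(per_user.values())
    if total == 0:
        return 0.0
    top = sum(tokens for _, tokens in per_user.most_common(top_n))
    return top / total


# Illustrative records only.
calls = [
    {"workflow": "procurement", "user": "u1", "tokens": 500},
    {"workflow": "procurement", "user": "u2", "tokens": 300},
    {"workflow": "procurement", "user": "u3", "tokens": 100},
    {"workflow": "procurement", "user": "u4", "tokens": 100},
    {"workflow": "concierge", "user": "u1", "tokens": 900},
]

print(top_user_share(calls, "procurement", top_n=2))
```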
None of these are visible in a vendor dashboard. All of them are visible in workflow-tagged cost data.
The Budget Review Loop
Without attribution, every budget review follows the same script.
The CFO receives the AI vendor invoice. Compares it to last quarter. It went up. Asks the VP of Digital to explain.
The VP of Digital pulls the vendor dashboard. Token counts by API key. Maybe by model. That is the available data.
Finance wants to know which business outcomes those tokens produced. Which workflows drove revenue. Which ones reduced operational cost. Which ones are running and producing no measurable value. The vendor dashboard cannot answer those questions because the vendor has no ground truth for your business outcomes.
Finance asks why the bill went up. Engineering says usage increased. Finance asks which usage. Engineering says all of it. Nobody is satisfied. The bill gets approved by default because there is no defensible case to cut it.
The conversation repeats next quarter. The bill grows.
This is the entire reason workflow-level attribution matters. Until your observability tells a story, your budget is just a plot twist waiting to happen.
What Are the Best Practices for Enterprise LLM Cost Optimization?
Effective cost management for generative AI spend takes four capabilities that the vendor will not give you.
1. Workflow-level attribution. Every token traced to a specific workflow and use case. When the invoice arrives, the breakdown shows guest concierge consumed 31 percent, revenue management 24, housekeeping dispatch 18, guest-feedback summarization 14, procurement 9, and unattributed 4. That breakdown is the foundation for every cost decision that follows.
2. Anomaly detection on spending patterns. If a workflow's token consumption spikes 300 percent in a week, something changed. A prompt was edited. A retry loop started. A new use case was added. Without anomaly detection you discover the spike when the monthly invoice arrives. With it you discover the spike the day it starts.
3. Per-workflow cost guardrails. A maximum daily or monthly token budget for each workflow. When a workflow hits its limit, it throttles, alerts, or stops. This prevents runaway costs from retry loops, verbose prompts, or unexpected usage patterns. The guardrail catches the problem before it shows up on the invoice.
4. Cost-to-outcome mapping. The most expensive workflow is fine if it produces the most value. The cheapest workflow might be the biggest waste if it runs continuously producing nothing useful. Connecting cost data to outcome data (tickets resolved, bookings recovered, ADR uplift influenced, complaint-resolution time saved) turns cost management from a cutting exercise into an optimization conversation.
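Capabilities 2 and 3 above can be sketched in a few lines, assuming you already have daily token totals per workflow. `is_spike`, `WorkflowBudget`, the `threshold` multiplier, and `alert_fraction` are hypothetical names and defaults. A 300 percent increase corresponds to four times the baseline, which is why the default multiplier here is 4.0.

```python
from statistics import mean


def is_spike(daily_tokens, today_tokens, threshold=4.0):
    """Flag today if it exceeds threshold times the trailing daily average."""
    if not daily_tokens:
        return False  # no baseline yet
    baseline = mean(daily_tokens)
    return baseline > 0 and today_tokens > threshold * baseline


class WorkflowBudget:
    """Hypothetical per-workflow daily token budget."""

    def __init__(self, daily_limit, alert_fraction=0.8):
        self.daily_limit = daily_limit
        self.alert_fraction = alert_fraction
        self.used = 0

    def check(self, tokens_requested):
        """Return 'allow', 'alert', or 'block' for a call of the given size."""
        projected = self.used + tokens_requested
        if projected > self.daily_limit:
            return "block"  # blocked calls do not count against the budget
        self.used = projected
        if projected >= self.alert_fraction * self.daily_limit:
            return "alert"
        return "allow"
```

A trailing average is the simplest possible baseline. Workflows with strong weekly seasonality, such as weekend check-in peaks, would likely need a comparison against the same weekday instead.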
Building the Story Before Finance Asks
If your finance review is a CSV from your vendor instead of a map of who used what for what outcome, you do not have cost control. You have a mystery with a credit card attached.
The story does not have to be complicated. It has to connect dollars to workflows and workflows to outcomes. That connection transforms AI spending from a line item that grows every quarter into a business case leadership can evaluate, optimize, and defend.
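The arithmetic behind that connection is simple once both sides are tagged by workflow. A minimal sketch, assuming cost and outcome counts are already aggregated per workflow. The function name, workflow keys, and figures are illustrative and not taken from the hotel group.

```python
def cost_per_outcome(cost_by_workflow, outcomes_by_workflow):
    """Dollars per outcome for each workflow; None where no outcomes were recorded."""
    result = {}
    for workflow, cost in cost_by_workflow.items():
        outcomes = outcomes_by_workflow.get(workflow, 0)
        result[workflow] = cost / outcomes if outcomes else None
    return result


# Illustrative figures only.
costs = {"concierge": 3100.0, "feedback-summarizer": 1400.0}
outcomes = {"concierge": 6200}  # e.g. guest requests resolved

print(cost_per_outcome(costs, outcomes))
```

A `None` result is the useful signal here. It marks a workflow that is spending money with no recorded outcome to set against it.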
Claire tags every interaction at write time. Workflow, property, use case, outcome category. The breakdown the hotel group's VP of Digital could not produce in Q3 is a default report in any deployment running on the platform. The CFO can see which properties drove the quarter's spend. The COO can see which workflows produced the highest cost-per-outcome. The conversation shifts from "why is the bill bigger" to "which workflow do we tune next."
Build the attribution before the CFO asks for it. Because they are going to ask.
The full FinOps operating model for AI spend.
Cost attribution, anomaly detection, per-workflow guardrails, and the outcome mapping that lets you defend the line item to finance.
Read the AI cost optimization guide
Maya Chen is the voice behind Maya Builds AI, a video and podcast series on enterprise AI infrastructure for the people building and operating these systems. Three new videos a week on YouTube. The podcast lands weekly on Spotify and Apple Podcasts. Want a quick way to express cost-to-outcome in numbers finance will engage with? Try the digital labor ROI calculator.