The agent didn’t give a wrong answer because the model was weak. It gave a wrong answer because the data it received carried no meaning.
Over the past year, the Model Context Protocol (MCP) has done something genuinely useful: it standardized how an AI agent connects to the systems it needs. One protocol, one way to discover tools, one way to call them. The M and the P, the model and the protocol, are largely solved.
The C is not. And the C is the whole point. Context is what tells an agent what a number actually means, which table to trust, and which definition of “revenue” the business stands behind. MCP gives an agent a clean way to reach your data. It does almost nothing to tell the agent what that data means.
M (Model): standardized
C (Context): still on you
P (Protocol): standardized
First, what “context” even means for an agent
A language model ships knowing nothing about your business. It has never seen your schemas, your fiscal calendar, or your definition of an active customer. Everything it knows about your situation has to be placed in front of it at the moment it answers. That working set of information is its context.
Anthropic, which created MCP, defines context plainly as “the set of tokens included when sampling from a large language model”: the instructions, the tools, the data pulled in over MCP, and the conversation so far. Getting an agent to behave is less about clever prompting and more about curating that set so the model has exactly what it needs, and little else.
Two things make this hard. First, context is finite. A model has an attention budget, and the more tokens you cram in, the less reliably it uses any of them, a documented failure mode researchers call context rot. Second, context is where meaning lives. The model can reason beautifully over what you give it, but if what you give it is ambiguous or unexplained, it will reason its way, with total confidence, to the wrong answer.
So the question that decides whether an enterprise agent works is not “how smart is the model.” It is “how good is the context the model receives.” Here is what that difference looks like on a single row of data, read raw and read grounded:
Agent task: “Report revenue for customer 8842291.”
The row the tool returns, with the grounding each field needs:
cust_id_v2: 8842291 (canonical key)
rev: 1250 (USD thousands)
status: A (A = Active)
region: 3 (3 = EMEA)
Raw, confidently wrong: “Customer 8842291 has $1,250 in revenue. Status A, region 3.”
Grounded, correct: “Active EMEA customer with $1.25M in recognized revenue.”
Same row, same model: context is the only variable
What MCP gives you, and what it leaves out
An MCP tool call is a connection, not an explanation. When an agent calls a tool that wraps a database or an API, what comes back is rows and columns, exactly as stored. The protocol carries the values. It does not carry the years of tribal knowledge a human analyst uses to read those values correctly.
That gap is not a detail. Analyses of enterprise MCP are blunt about it: without governance and semantic grounding, servers that expose raw database access let agents hallucinate on unfamiliar schemas, producing confident, inconsistent, untrustworthy results. The tool worked. The plumbing worked. The answer was still wrong.
The same row, two very different readings
Here is a single record from a customer table, the way an MCP tool hands it to an agent, next to what anyone on your data team already knows about it.
Raw from the tool → What it actually means
cust_id_v2: 8842291 → The canonical customer key. A migration left a legacy cust_id behind; only this one is current.
rev: 1250 → $1,250,000 in recognized revenue. Stored in thousands, in USD.
status: A → Active. A single-letter code: A = Active, C = Churned, P = Paused.
region: 3 → EMEA. A foreign key into a region lookup (1 = NA, 2 = LATAM, 3 = EMEA, 4 = APAC).
close_dt: 2026-03-31 → Fiscal Q1 close. The fiscal year starts Feb 1, so this is not calendar Q1.
A person reads around all of this without noticing. An agent cannot, because none of it is written in the data itself. It is the same wall the research community keeps hitting: the BIRD benchmark was built from real, messy enterprise databases whose columns carry terse, coded names that are impossible to interpret without outside documentation, and the scores show exactly how much that documentation is worth.
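To make the gap concrete, here is a minimal Python sketch of what "reading around" the row looks like once it is written down. The codes and units come from the table above; the `ColumnMeaning` structure, the `DICTIONARY` mapping, and the `ground_row` helper are hypothetical names invented for illustration, not part of MCP or any product API.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Lookup values taken from the table above.
STATUS_CODES = {"A": "Active", "C": "Churned", "P": "Paused"}
REGION_CODES = {1: "NA", 2: "LATAM", 3: "EMEA", 4: "APAC"}


@dataclass(frozen=True)
class ColumnMeaning:
    """Hypothetical data-dictionary entry for one raw column."""
    label: str                    # human-readable field name
    decode: Callable[[Any], Any]  # turns the stored value into its real value
    note: str                     # the tribal knowledge, written down


# Assumed dictionary for the example row; the structure is illustrative.
DICTIONARY = {
    "cust_id_v2": ColumnMeaning("customer_id", int, "canonical customer key"),
    "rev": ColumnMeaning("recognized_revenue_usd", lambda v: v * 1000,
                         "stored in USD thousands"),
    "status": ColumnMeaning("status", STATUS_CODES.__getitem__,
                            "single-letter code"),
    "region": ColumnMeaning("region", REGION_CODES.__getitem__,
                            "foreign key into the region lookup"),
}


def ground_row(raw: dict) -> dict:
    """Attach meaning to each column; flag anything undocumented."""
    grounded = {}
    for column, value in raw.items():
        meaning = DICTIONARY.get(column)
        if meaning is None:
            # Pass unknown columns through with a warning instead of guessing.
            grounded[column] = {"value": value, "note": "no documentation available"}
        else:
            grounded[meaning.label] = {
                "value": meaning.decode(value),
                "note": meaning.note,
            }
    return grounded


raw_row = {"cust_id_v2": 8842291, "rev": 1250, "status": "A", "region": 3}
grounded_row = ground_row(raw_row)
```

With this in place, `grounded_row["recognized_revenue_usd"]["value"]` is 1250000 rather than 1250, and the status and region arrive as words instead of codes. Undocumented columns are flagged rather than silently interpreted, which is a design choice of this sketch.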
Three ways missing context breaks an agent
Each of these is a real failure mode, and in each one the SQL is valid and the tool call succeeds.
1. Ambiguous names → the wrong column
Enterprise schemas carry thousands of abbreviated, near-duplicate names. The model links them by surface similarity and grabs the closest match: the legacy key instead of the canonical one.
-- agent joins on the legacy key
JOIN orders o ON o.cust_id = c.cust_id  -- should be c.cust_id_v2
Query runs clean, and silently drops ~30% of customers. The result looks fine and is quietly incomplete.
2. Hidden units → the wrong number
The query can be perfectly correct and the answer still wrong, because nothing tells the agent that rev is stored in thousands.
SELECT SUM(rev) FROM accounts;  -- returns 1250
-- reported as “$1,250”, but the real figure is $1,250,000
On BIRD, giving GPT‑4 the external knowledge that explains columns like these lifts execution accuracy from 34.88% to 54.89%, a 20‑point swing that comes from context, not a smarter model.
3. Missing lineage → the wrong table
With no signal for which tables are current, fresh, or owned, the agent queries a plausible-looking one: a replica last refreshed months ago, or a legacy ERP running 90 days behind.
SELECT * FROM accounts_replica_old;  -- valid SQL, last refreshed 9 months ago
-- stale numbers, presented as the truth
No freshness signal, no owner, no warning, just a confident answer built on stale data.
Valid SQL every time: the failure is missing meaning, not broken plumbing
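The first failure is easy to reproduce. The sketch below uses Python's built-in `sqlite3` module with a tiny made-up dataset; the table contents are assumptions chosen for illustration, with one customer created after the migration and therefore carrying no legacy `cust_id`. Both queries are valid and both run clean.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Illustrative data only: two pre-migration customers whose legacy key
# matches the canonical one, and one newer customer with no legacy key.
conn.execute("CREATE TABLE customers (cust_id INTEGER, cust_id_v2 INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (cust_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(101, 101, "Acme"), (102, 102, "Globex"), (None, 103, "Initech")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(101, 500), (102, 700), (103, 900)],
)

QUERY = """
SELECT COUNT(DISTINCT c.cust_id_v2), SUM(o.amount)
FROM customers c
JOIN orders o ON o.cust_id = c.{key}
"""

# What the agent writes: a join on the legacy key.
legacy = conn.execute(QUERY.format(key="cust_id")).fetchone()

# What the data team would write: a join on the canonical key.
canonical = conn.execute(QUERY.format(key="cust_id_v2")).fetchone()

print("legacy join:   ", legacy)     # 2 customers, 1200 in orders
print("canonical join:", canonical)  # 3 customers, 2100 in orders
```

Neither query raises an error or a warning. The only difference is which column the join uses, and nothing in the schema says which one is current.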
Why you can’t fix this in the prompt
The instinct is to paste the data dictionary into the system prompt and move on. It doesn’t scale, for two reasons.
Volume. Enterprise schemas carry thousands of columns, and the knowledge that explains them is scattered across wikis, tickets, and chat threads measured in millions of tokens. You can’t fit that in every request, keep it current, or enforce who is allowed to see what. And even if you could, you’d pay the context-rot tax on every call: the more you dump in, the less reliably the model uses any of it.
Stuff everything in the prompt: data dictionary, wikis, tickets, thousands of columns. Overflows the budget → context rot.
Ground it at the source: just the schema, units, lineage, and policy this call needs. High-signal, governed, repeatable.
Authority. Ask for “active customers in EMEA last quarter” and the agent needs the certified definition of active, the approved join logic, the right fiscal calendar, and the correct access rules. Finance and sales each have a real, different definition of “revenue.” Without something that says which one the business stands behind, an agent can pull a real number and still be wrong. Meaning has to be attached to the data before the question is ever asked, not improvised at inference time.
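Both constraints can be sketched in a few lines of Python: pick only the definitions a given request needs, prefer the certified one when teams disagree, and stop before the budget is exceeded. Everything here is hypothetical: the `Definition` record, the `GLOSSARY` entries (including the wording of the sales definition), and the use of character count as a crude stand-in for tokens are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    """Hypothetical glossary entry with an owner and a certification flag."""
    term: str
    text: str
    owner: str
    certified: bool


# Illustrative entries; the definition texts are invented for this sketch.
GLOSSARY = [
    Definition("revenue", "Recognized revenue, column rev, in USD thousands.", "finance", True),
    Definition("revenue", "Booked deal value at close.", "sales", False),
    Definition("active", "Customers with status = 'A'.", "finance", True),
    Definition("region", "1 = NA, 2 = LATAM, 3 = EMEA, 4 = APAC.", "data-platform", True),
    Definition("fiscal_quarter", "Fiscal year starts Feb 1.", "finance", True),
]


def select_context(terms, glossary, max_chars):
    """Return (lines, missing): certified definitions that fit the budget,
    plus the terms that could not be grounded."""
    lines, missing, used = [], [], 0
    for term in terms:
        certified = [d for d in glossary if d.term == term and d.certified]
        if not certified:
            missing.append(term)  # no definition the business stands behind
            continue
        chosen = certified[0]
        line = f"{term} ({chosen.owner}): {chosen.text}"
        if used + len(line) > max_chars:
            missing.append(term)  # would overflow the budget for this call
            continue
        lines.append(line)
        used += len(line)
    return lines, missing


lines, missing = select_context(["revenue", "active", "churn_risk"], GLOSSARY, 300)
```

For this request, `lines` holds the finance definition of revenue and the definition of active, and `missing` reports `churn_risk` so the caller can refuse or ask rather than improvise. The region and fiscal-calendar entries stay out of the prompt because this call did not need them.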
Ground the data before the agent sees it
The fix is to attach meaning at the source, so every tool call returns data that already carries it: the canonical schema, representative samples, the units and encodings, the relationships, the lineage that says which table is trustworthy, and the policies that say who may see it. Grounding happens once, and every agent inherits it.
Raw rows (cust_id_v2, rev, status, region, straight from the source) → Context layer (meaning attached: schema, samples, units, lineage, policy) → Answer-ready (customer 8842291 · $1.25M · Active · EMEA)
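As a rough illustration of the lineage and policy half of that layer, the Python sketch below wraps a tool result in an envelope that carries freshness, ownership, and column-level access rules. The `TABLES` catalog, the `POLICY` mapping, the role names, the dates, and the seven-day freshness threshold are all assumptions made up for this example; it does not describe how any particular product implements grounding.

```python
from datetime import date

# Hypothetical lineage catalog: who owns each table and when it was refreshed.
TABLES = {
    "accounts": {
        "owner": "finance-data",
        "last_refreshed": date(2026, 4, 1),
        "certified": True,
    },
    "accounts_replica_old": {
        "owner": None,
        "last_refreshed": date(2025, 7, 1),
        "certified": False,
    },
}

# Hypothetical policy: which columns each role may see.
POLICY = {
    "analyst": {"cust_id_v2", "status", "region"},
    "finance": {"cust_id_v2", "rev", "status", "region"},
}


def grounded_response(table, rows, role, today, max_age_days=7):
    """Wrap raw rows with lineage, freshness warnings, and column filtering."""
    lineage = TABLES[table]
    age_days = (today - lineage["last_refreshed"]).days

    warnings = []
    if not lineage["certified"]:
        warnings.append(f"{table} is not a certified source")
    if age_days > max_age_days:
        warnings.append(f"{table} was last refreshed {age_days} days ago")

    allowed = POLICY.get(role, set())  # unknown roles see no columns
    visible = [{k: v for k, v in row.items() if k in allowed} for row in rows]

    return {
        "rows": visible,
        "lineage": {"table": table, "owner": lineage["owner"], "age_days": age_days},
        "warnings": warnings,
    }


rows = [{"cust_id_v2": 8842291, "rev": 1250, "status": "A", "region": 3}]
today = date(2026, 4, 2)

stale = grounded_response("accounts_replica_old", rows, "analyst", today)
fresh = grounded_response("accounts", rows, "finance", today)
```

Here `stale["warnings"]` contains two entries and the `rev` column is withheld from the analyst role, while `fresh` comes back with no warnings and the full row. The agent receives the freshness and ownership signals alongside the data instead of having to infer them.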
At Nexla, that layer is Helix, a per-enterprise knowledge graph and vector database that ingests your schemas, documents, metadata, lineage, and prior agent runs, then feeds that grounding into every MCP tool call before data reaches the agent. It pairs with a second idea the research backs up: task-shaped MCP servers that carry only the handful of grounded tools a workflow needs, rather than one giant catalog. Fewer tools to misread means higher tool-selection accuracy and less for the agent to get wrong. Connection and context pull in the same direction.
The takeaway for anyone building on MCP: the M and the P are becoming commodities. The C is where the work, and the differentiation, actually lives. An agent is only ever as good as the meaning attached to the data it receives.
Give your agents data that already means something
MCP Studio builds governed, task-shaped MCP servers grounded by the Helix Context Layer, so every tool call returns answer-ready data with schema, lineage, and policy built in.