The “C” in MCP: Why Context Is the Hardest Part of Enterprise AI

The agent didn’t give a wrong answer because the model was weak. It gave a wrong answer because the data it received carried no meaning.

Over the past year, the Model Context Protocol (MCP) has done something genuinely useful: it standardized how an AI agent connects to the systems it needs. One protocol, one way to discover tools, one way to call them. The M and the P, the model and the protocol, are largely solved.

The C is not. And the C is the whole point. Context is what tells an agent what a number actually means, which table to trust, and which definition of “revenue” the business stands behind. MCP gives an agent a clean way to reach your data. It does almost nothing to tell the agent what that data means.

M: Model, standardized
C: Context, still on you
P: Protocol, standardized

First, what “context” even means for an agent

A language model ships knowing nothing about your business. It has never seen your schemas, your fiscal calendar, or your definition of an active customer. Everything it knows about your situation has to be placed in front of it at the moment it answers. That working set of information is its context.

Anthropic, which created MCP, defines it plainly: context is “the set of tokens included when sampling from a large language model.” That covers the instructions, the tools, the data pulled in over MCP, and the conversation so far. Getting an agent to behave is less about clever prompting and more about curating that set so the model has exactly what it needs, and little else.

Two things make this hard. First, context is finite. A model has an attention budget, and the more tokens you cram in, the less reliably it uses any of them, a documented failure mode researchers call context rot. Second, context is where meaning lives. The model can reason beautifully over what you give it, but if what you give it is ambiguous or unexplained, it will reason its way, with total confidence, to the wrong answer.

So the question that decides whether an enterprise agent works is not “how smart is the model.” It is “how good is the context the model receives.” Here is what that difference looks like on a single row of data.

Agent task: “Report revenue for customer 8842291.”

The row the tool returns (raw) | What grounding adds
cust_id_v2: 8842291 | canonical key
rev: 1250 | USD thousands
status: A | A = Active
region: 3 | 3 = EMEA

Raw (confidently wrong): “Customer 8842291 has $1,250 in revenue. Status A, region 3.”
Grounded (correct): “Active EMEA customer with $1.25M in recognized revenue.”

Same row, same model: context is the only variable.

What MCP gives you, and what it leaves out

An MCP tool call is a connection, not an explanation. When an agent calls a tool that wraps a database or an API, what comes back is rows and columns, exactly as stored. The protocol carries the values. It does not carry the years of tribal knowledge a human analyst uses to read those values correctly.

That gap is not a detail. Analyses of enterprise MCP are blunt about it: without governance and semantic grounding, servers that expose raw database access let agents hallucinate on unfamiliar schemas, producing confident, inconsistent, untrustworthy results. The tool worked. The plumbing worked. The answer was still wrong.

The same row, two very different readings

Here is a single record from a customer table, the way an MCP tool hands it to an agent, next to what anyone on your data team already knows about it.

Raw from the tool | What it actually means
cust_id_v2: 8842291 | The canonical customer key. A migration left a legacy cust_id behind; only this one is current.
rev: 1250 | $1,250,000 in recognized revenue. Stored in thousands, in USD.
status: A | Active. A single-letter code: A = Active, C = Churned, P = Paused.
region: 3 | EMEA. A foreign key into a region lookup (1 = NA, 2 = LATAM, 3 = EMEA, 4 = APAC).
close_dt: 2026-03-31 | A close date in fiscal Q1. The fiscal year starts Feb 1, so fiscal Q1 runs Feb through Apr and does not line up with calendar Q1.

A person reads around all of this without noticing. An agent cannot, because none of it is written in the data itself. It is the same wall the research community keeps hitting: the BIRD benchmark was built from real, messy enterprise databases whose columns carry terse, coded names that are impossible to interpret without outside documentation, and the scores show exactly how much that documentation is worth.
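To make the gap concrete, here is a minimal Python sketch of what "writing it down" could look like. It assumes a hand-built data dictionary: the `DATA_DICTIONARY` structure, the output field names, and the `ground_row` and `fiscal_quarter` helpers are all hypothetical, and only the codes, units, and fiscal calendar come from the table above.

```python
from datetime import date

# Hypothetical data dictionary. The codes, units, and fiscal calendar mirror
# the table above; the structure and the output labels are assumptions.
DATA_DICTIONARY = {
    "cust_id_v2": {"label": "customer_key"},
    "rev": {"label": "recognized_revenue_usd", "scale": 1_000},
    "status": {"label": "status", "codes": {"A": "Active", "C": "Churned", "P": "Paused"}},
    "region": {"label": "region", "codes": {1: "NA", 2: "LATAM", 3: "EMEA", 4: "APAC"}},
    "close_dt": {"label": "fiscal_quarter", "fiscal_year_start_month": 2},
}


def fiscal_quarter(day: date, start_month: int) -> int:
    """Return the fiscal quarter (1-4) for a fiscal year that starts in start_month."""
    months_into_year = (day.month - start_month) % 12
    return months_into_year // 3 + 1


def ground_row(raw: dict) -> dict:
    """Translate one raw row into named, decoded, unit-corrected fields."""
    grounded = {}
    for column, value in raw.items():
        meta = DATA_DICTIONARY.get(column)
        if meta is None:
            # Flag columns we cannot explain instead of guessing at their meaning.
            grounded.setdefault("_ungrounded", []).append(column)
            continue
        if "codes" in meta:
            value = meta["codes"][value]  # an unknown code raises KeyError
        elif "scale" in meta:
            value = value * meta["scale"]
        elif "fiscal_year_start_month" in meta:
            quarter = fiscal_quarter(date.fromisoformat(value), meta["fiscal_year_start_month"])
            value = f"Q{quarter}"
        grounded[meta["label"]] = value
    return grounded


raw_row = {"cust_id_v2": 8842291, "rev": 1250, "status": "A", "region": 3, "close_dt": "2026-03-31"}
grounded_row = ground_row(raw_row)
print(grounded_row)
```

Nothing here is clever. The point is that every lookup in the function is knowledge the raw row does not carry, and the agent has no way to derive it.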

Three ways missing context breaks an agent

Each of these is a real failure mode, and in each one the SQL is valid and the tool call succeeds.

1. Ambiguous names → the wrong column

Enterprise schemas carry thousands of abbreviated, near-duplicate names. The model links them by surface similarity and grabs the closest match: the legacy key instead of the canonical one.

-- agent joins on the legacy key
JOIN orders o ON o.cust_id = c.cust_id
-- should be c.cust_id_v2

The query runs clean and silently drops ~30% of customers. The result looks fine and is quietly incomplete.
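The silent drop is easy to reproduce. The sketch below uses an in-memory SQLite database with three invented customers; the table layout and rows are assumptions made for illustration, not the schema behind the figure above. Both joins execute without error, and only the row counts differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (cust_id INTEGER, cust_id_v2 INTEGER, name TEXT);
    CREATE TABLE orders (cust_id INTEGER, amount INTEGER);

    -- Invented rows: the legacy cust_id for Initech no longer matches its canonical key.
    INSERT INTO customers VALUES
        (8842291, 8842291, 'Acme'),
        (8842292, 8842292, 'Globex'),
        (17,      8842293, 'Initech');

    -- Assumption: orders.cust_id holds the canonical key, as the corrected join implies.
    INSERT INTO orders VALUES (8842291, 500), (8842292, 300), (8842293, 450);
    """
)

QUERY = """
    SELECT COUNT(DISTINCT c.cust_id_v2)
    FROM customers c
    JOIN orders o ON o.cust_id = c.{key}
"""


def customers_with_orders(key: str) -> int:
    """Count customers that survive the join on the given customers column."""
    return conn.execute(QUERY.format(key=key)).fetchone()[0]


legacy_count = customers_with_orders("cust_id")        # the join the agent wrote
canonical_count = customers_with_orders("cust_id_v2")  # the join it should have written
print(legacy_count, canonical_count)  # prints "2 3"
```

No exception, no warning, no empty result: one customer in three simply vanishes from the toy data, which is why this class of error survives review.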

2. Hidden units → the wrong number

The query can be perfectly correct and the answer still wrong, because nothing tells the agent that rev is stored in thousands.

SELECT SUM(rev) FROM accounts; -- returns 1250
-- reported as “$1,250”, but the real figure is $1,250,000

On BIRD, giving GPT‑4 the external knowledge that explains columns like these lifts execution accuracy from 34.88% to 54.89%, a 20‑point swing that comes from context, not a smarter model.

3. Missing lineage → the wrong table

With no signal for which tables are current, fresh, or owned, the agent queries a plausible-looking one: a replica last refreshed months ago, or a legacy ERP running 90 days behind.

SELECT * FROM accounts_replica_old;
-- valid SQL, last refreshed 9 months ago
-- stale numbers, presented as the truth

No freshness signal, no owner, no warning, just a confident answer built on stale data.
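A freshness signal does not need to be elaborate. Here is a minimal sketch, assuming a lineage record exists per table: the `LINEAGE` dictionary, its field names, the owner name, the dates, and the one-day threshold are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lineage records. Field names, owner, and dates are invented.
LINEAGE = {
    "accounts": {
        "owner": "finance-data",
        "certified": True,
        "last_refreshed": datetime(2026, 4, 1, 6, 0, tzinfo=timezone.utc),
    },
    "accounts_replica_old": {
        "owner": None,
        "certified": False,
        "last_refreshed": datetime(2025, 7, 1, 6, 0, tzinfo=timezone.utc),
    },
}


def lineage_warnings(table: str, now: datetime, max_age: timedelta = timedelta(days=1)) -> list:
    """Return the reasons a table should not be trusted; an empty list means none were found."""
    record = LINEAGE.get(table)
    if record is None:
        return [f"{table}: no lineage record, treat as untrusted"]
    warnings = []
    if not record["certified"]:
        warnings.append(f"{table}: not a certified source")
    if record["owner"] is None:
        warnings.append(f"{table}: no owner on record")
    age = now - record["last_refreshed"]
    if age > max_age:
        warnings.append(f"{table}: last refreshed {age.days} days ago")
    return warnings


now = datetime(2026, 4, 1, 12, 0, tzinfo=timezone.utc)
stale = lineage_warnings("accounts_replica_old", now)
fresh = lineage_warnings("accounts", now)
print(stale)
print(fresh)
```

With a check like this in front of the query, the stale replica comes back with three warnings attached instead of a confident number.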

Valid SQL every time: the failure is missing meaning, not broken plumbing

Why you can’t fix this in the prompt

The instinct is to paste the data dictionary into the system prompt and move on. It doesn’t scale, for two reasons.

Volume. Enterprise schemas carry thousands of columns, and the knowledge that explains them is scattered across wikis, tickets, and chat threads measured in millions of tokens. You can’t fit that in every request, keep it current, or enforce who is allowed to see what. And even if you could, you’d pay the context-rot tax on every call: the more you dump in, the less reliably the model uses any of it.

Stuff everything in the prompt: data dictionary · wikis · tickets · thousands of columns. Overflows the budget → context rot.

Ground it at the source: just the schema, units, lineage & policy this call needs. High-signal · governed · repeatable.

Authority. Ask for “active customers in EMEA last quarter” and the agent needs the certified definition of active, the approved join logic, the right fiscal calendar, and the correct access rules. Finance and sales each have a real, different definition of “revenue.” Without something that says which one the business stands behind, an agent can pull a real number and still be wrong. Meaning has to be attached to the data before the question is ever asked, not improvised at inference time.
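Both problems point to the same shape of solution: look up only what a given call needs, and accept only definitions someone has certified. The sketch below is one way that could look. The `Definition` record, the catalog contents, and the wording of each definition are invented for illustration; only the column notes echo the example row used earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    term: str
    owner: str
    text: str
    certified: bool


# Hypothetical catalog: finance and sales each define "revenue", one is certified.
DEFINITIONS = [
    Definition("revenue", "finance", "recognized revenue, USD thousands, fiscal calendar", True),
    Definition("revenue", "sales", "booked contract value at signature", False),
    Definition("active customer", "finance", "status = 'A'", True),
]

COLUMN_NOTES = {
    "rev": "USD thousands",
    "status": "A = Active, C = Churned, P = Paused",
    "region": "1 = NA, 2 = LATAM, 3 = EMEA, 4 = APAC",
    "close_dt": "fiscal year starts Feb 1",
}


def context_for_call(terms: list, columns: list) -> list:
    """Build the short context one call needs, refusing ambiguous or uncertified terms."""
    lines = []
    for term in terms:
        certified = [d for d in DEFINITIONS if d.term == term and d.certified]
        if len(certified) != 1:
            raise LookupError(f"no single certified definition for {term!r}")
        chosen = certified[0]
        lines.append(f"{term}: {chosen.text} (certified, owner: {chosen.owner})")
    for column in columns:
        if column in COLUMN_NOTES:
            lines.append(f"{column}: {COLUMN_NOTES[column]}")
    return lines


context = context_for_call(["revenue", "active customer"], ["rev", "status", "region"])
print("\n".join(context))
```

The request gets five short lines rather than the whole dictionary, and a term with no certified definition fails loudly instead of letting the agent pick one.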

Ground the data before the agent sees it

The fix is to attach meaning at the source, so every tool call returns data that already carries it: the canonical schema, representative samples, the units and encodings, the relationships, the lineage that says which table is trustworthy, and the policies that say who may see it. Grounding happens once, and every agent inherits it.

Raw rows: cust_id_v2, rev, status, region, straight from the source
Context layer: meaning attached (schema · samples · units · lineage · policy)
Answer-ready: customer 8842291 · $1.25M · Active · EMEA
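As a rough sketch of the "once, and every agent inherits it" idea, the wrapper below attaches units, lineage, and a column-level policy to whatever a raw tool returns. It is plain Python, not the MCP SDK and not a description of any vendor's implementation; `CONTEXT`, `grounded`, `fetch_customer`, and the role names are all hypothetical.

```python
from typing import Callable

# Hypothetical context-layer record for one source table.
CONTEXT = {
    "units": {"rev": "USD thousands"},
    "lineage": {"source": "accounts", "certified": True},
    "policy": {
        "analyst": {"cust_id_v2", "rev", "status", "region"},
        "support": {"cust_id_v2", "status"},
    },
}


def grounded(tool: Callable[[int], dict], context: dict) -> Callable[[int, str], dict]:
    """Wrap a raw tool so each result carries units and lineage, filtered by role."""

    def call(customer_id: int, role: str) -> dict:
        allowed = context["policy"].get(role, set())  # unknown roles see nothing
        row = tool(customer_id)
        visible = {column: value for column, value in row.items() if column in allowed}
        return {
            "row": visible,
            "units": {c: u for c, u in context["units"].items() if c in visible},
            "lineage": context["lineage"],
        }

    return call


def fetch_customer(customer_id: int) -> dict:
    """Stand-in for a raw tool that returns the row exactly as stored."""
    return {"cust_id_v2": customer_id, "rev": 1250, "status": "A", "region": 3}


get_customer = grounded(fetch_customer, CONTEXT)
analyst_view = get_customer(8842291, "analyst")
support_view = get_customer(8842291, "support")
print(analyst_view)
print(support_view)
```

The raw tool is untouched. The grounding lives in one place, so a second or tenth agent calling `get_customer` gets the same units, the same lineage, and the same access rules without any prompt changes.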

At Nexla, that layer is Helix, a per-enterprise knowledge graph and vector database that ingests your schemas, documents, metadata, lineage, and prior agent runs, then feeds that grounding into every MCP tool call before data reaches the agent. It pairs with a second idea the research backs up: task-shaped MCP servers that carry only the handful of grounded tools a workflow needs, rather than one giant catalog. Fewer tools to misread means higher tool-selection accuracy and less for the agent to get wrong. Connection and context pull in the same direction.

The takeaway for anyone building on MCP: the M and the P are becoming commodities. The C is where the work, and the differentiation, actually lives. An agent is only ever as good as the meaning attached to the data it receives.

Give your agents data that already means something

MCP Studio builds governed, task-shaped MCP servers grounded by the Helix Context Layer, so every tool call returns answer-ready data with schema, lineage, and policy built in.
