
mcp
April 16, 2026

The Architecture of Agency: A Playbook for Enterprise MCP

Adam Azzam
VP Product

We're past Hello World on the Model Context Protocol. The first wave was everyone connecting Claude to a SQLite database and tweeting the demo. The wave we're in now is production deployments, and that's where the failures show up.

At Prefect we maintain FastMCP, the most popular Python SDK for the protocol. So we end up looking at architecture diagrams from Fortune 500s, fintechs, and YC startups every week, and the message is almost always the same:

I connected my database to an MCP server, and the agent keeps hallucinating column names that don't exist. It calls the wrong endpoints. It times out trying to figure out which of my 400 tools to use. What did I do wrong?

The diagnosis rarely varies: you exposed your data model when the agent needed a workflow. The patterns below are how the teams who ship production MCP have learned to draw that line in their own codebases.

Context vs. Capability

Not everything needs to be an MCP server.

The most common over-engineering we see is teams building elaborate servers to hand an agent something static: internal coding standards, the supported CSS variables, the React naming guide. Every one of those servers is a network hop, an auth boundary, and a deployable artifact to maintain, all to deliver text that doesn't change between releases.

Static guidance belongs in your repo. Drop it into agents.md or .cursorrules and let the agent grep it; you get version control, zero latency, and editability for anyone with push access.

You only need MCP when one of two things is true. Portability: the same context has to be available across Cursor, Claude Desktop, an internal web UI, and whatever else your team runs, and you don't want to maintain a separate integration for each. Remote execution: the operation has to actually do something, like query a vector database, hit an API, or run a script with secrets that can't sit on a laptop. If neither applies, write the markdown file.
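When the remote-execution criterion does hold, the server itself can stay small. A minimal sketch using FastMCP — the tool name, corpus, and lookup logic are hypothetical placeholders, and the import is guarded in case fastmcp isn't installed:

```python
# Sketch of a minimal remote-execution MCP server. The tool body is a
# stand-in; a real version would query a vector DB or an API using
# secrets that stay server-side, never on a laptop.
def search_docs(query: str, limit: int = 5) -> list[str]:
    """Query internal docs; runs on the server, next to the credentials."""
    corpus = ["deploy guide", "auth runbook", "schema notes"]  # stand-in data
    return [d for d in corpus if query.lower() in d][:limit]

try:
    from fastmcp import FastMCP  # pip install fastmcp

    mcp = FastMCP("internal-docs")
    mcp.tool()(search_docs)  # equivalent to decorating with @mcp.tool()
    # mcp.run()              # uncomment to serve (stdio transport by default)
except ImportError:
    pass  # without fastmcp, the plain function above still shows the shape
```

One tool, one job; everything else stays a markdown file in the repo.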

The Browser Tab Heuristic

The next failure mode is the auto-generation reflex.

A platform team sees 5,000 tables in their warehouse and decides to expose all of them as MCP tools, or they look at their OpenAPI spec with 400 endpoints and wire up the whole thing in a weekend. Either way the result is the same: they deploy, connect an agent, watch it stall on the first prompt, and don't understand what went wrong.

Exposing every endpoint is the architectural equivalent of generating a UI that puts every form field on the home page. A human handed that interface would quit. The agent does the LLM equivalent: it hallucinates a field or picks the wrong one. This is a math problem more than an aesthetic one. As the option space grows, the model's probability of selecting the right tool drops, and past a few dozen tools the curve falls off a cliff.

The way out is to model jobs instead of schemas. When I'm designing an MCP server I sit down with someone who actually does the work and watch them. I count their tabs. A Sales Ops lead will have Salesforce open for customer history, a PDF viewer for the contract, and Gmail for recent communications. Three tabs. Your MCP server should expose three things: customer history, contract, recent comms. Not the Salesforce REST API. Not its 200 endpoints. Those three lookups, scoped to that workflow.

The right MCP server is a curated workspace, dosed for the job at hand.
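Sketched in plain Python, the Sales Ops workspace above is three functions, not 200 endpoints. All names and return shapes here are illustrative; in a FastMCP server each would be registered as a tool, and the bodies would call Salesforce, a document store, and Gmail respectively:

```python
# Curated-workspace sketch: three job-scoped lookups instead of a raw
# API surface. All data below is stand-in.

def get_customer_history(account_id: str) -> dict:
    """The Salesforce tab: deal stage and revenue for one account."""
    return {"account_id": account_id, "stage": "renewal", "arr": 120_000}

def get_contract(account_id: str) -> str:
    """The PDF-viewer tab: the current contract for the account."""
    return f"Contract for {account_id}: 12-month term, net-30."

def get_recent_comms(account_id: str, limit: int = 3) -> list[str]:
    """The Gmail tab: the last few messages on the account thread."""
    thread = ["intro call recap", "pricing question", "renewal reminder"]
    return thread[:limit]

# The whole workspace: three tools, scoped to one workflow.
WORKSPACE = [get_customer_history, get_contract, get_recent_comms]
```

The agent's option space is now three verbs it can't confuse, which is the entire point of counting tabs.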

Scaling Disclosure with Code Execution

But what if the job genuinely needs hundreds of tools?

If you're building an analytics agent that has to clean, transform, and visualize data across a dozen tables, the one-tool-per-function pattern breaks down: the agent will either chain 50 sequential calls and burn through tokens before it gets anywhere, or it will get lost in the catalog of definitions and pick the wrong one for the job at hand.

The recent move toward code execution, with Anthropic's code execution tool and Cloudflare's "code mode," is the answer for this regime. Instead of exposing fifty tools, you expose a sandbox with a library and let the agent write a script. The agent reads the library docs, writes one program that does the loop and the filter and the sort, runs it in the sandbox, and gets back a single result. The intermediate noise never enters the context window.

People online keep arguing that code execution replaces MCP. It doesn't. MCP standardizes what resources are reachable; code execution standardizes how the model uses them. For high-complexity workflows, build sandboxes. For everything else, the curated-workspace pattern from the previous section still wins.
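The interface can be sketched as a single tool that accepts a script, runs it against pre-loaded data, and hands back only the final value. The restricted `exec` below is illustrative only — real sandboxes isolate with containers or WASM, not trimmed builtins — and the row data is invented:

```python
# Code-execution sketch: one tool, one agent-written script, one result.
# WARNING: exec with trimmed builtins is NOT real isolation; this only
# illustrates the interface, not the security model.

ROWS = [{"region": "emea", "rev": 40}, {"region": "amer", "rev": 90},
        {"region": "emea", "rev": 25}]  # stand-in for the warehouse

def run_script(source: str) -> object:
    """Execute an agent-authored script; only `result` comes back."""
    namespace = {"rows": ROWS, "__builtins__": {"sorted": sorted, "sum": sum}}
    exec(source, namespace)         # the loop/filter/sort happens in here
    return namespace.get("result")  # intermediate noise never leaves

# What the agent writes: filter, aggregate, return one number.
script = "result = sum(r['rev'] for r in rows if r['region'] == 'emea')"
```

Fifty sequential tool calls collapse into one script and one return value in the context window.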

Compound Actions

Reasoning models keep getting better at chaining atomic operations, but reliability across multiple steps in a row is still the thing that breaks first when you put an agent into anything resembling production load.

If your workflow forces an agent to call get_user_id, then query_database with that ID, then filter_records with that result, then update_row, you've handed it four chances to hallucinate an argument or lose context between calls. The math compounds in the wrong direction.
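To put a number on it, assume each call independently succeeds 95% of the time — a generous figure — and watch what chaining does:

```python
# End-to-end success falls geometrically with chain length, assuming
# each call independently succeeds with probability p.
p = 0.95
for steps in (1, 2, 4, 8):
    print(steps, round(p ** steps, 3))
# Four chained 95%-reliable calls succeed end-to-end only ~81% of the
# time; a single compound tool gets one chance to fail instead of four.
```

That roughly one-in-five failure rate is what "breaks first under production load" looks like in practice.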

Until reasoning models close that gap, encapsulate multi-step business processes as single tools. Replace the four atomic calls with one onboard_employee. The cost is real: you've baked the business process into the server, which now has to be redeployed when onboarding changes, and the agent loses any flexibility to handle the long-tail edge cases that didn't make it into the encapsulated version. The benefit is a tool that actually works under production load. As models improve you can decompose onboard_employee back into atoms. For now, ship the verb.
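The encapsulation itself is mechanical. A sketch with stubbed-out atomic steps — every name and body here is illustrative, standing in for real directory, database, and HR systems:

```python
# Compound-action sketch: one verb the agent can call, four steps the
# server owns. Step bodies are stubs for real systems.

def get_user_id(email: str) -> int:
    return abs(hash(email)) % 10_000           # stub directory lookup

def query_database(user_id: int) -> dict:
    return {"user_id": user_id, "status": "pending", "badge": None}

def filter_records(record: dict) -> dict:
    return {k: v for k, v in record.items() if v is not None}

def update_row(record: dict) -> dict:
    return {**record, "status": "active"}

def onboard_employee(email: str) -> dict:
    """The single tool the agent sees; the chain lives server-side."""
    user_id = get_user_id(email)
    record = query_database(user_id)
    record = filter_records(record)
    return update_row(record)
```

The agent gets one argument to supply and one result to read; the four hallucination opportunities move behind the server boundary, where they stop being hallucination opportunities at all.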

Identity-Aware Gateways

A GitHub MCP server with write access to every repo, handed to every developer, is a security incident waiting to happen.

The pattern that works is scoped instantiation. Run two GitHub servers: a read-only Explorer for junior developers and exploratory agents, and a write-access Builder for senior engineers and trusted CI/CD agents. Same protocol, same code path, different blast radius.

At any real scale this requires a layer in the middle: an identity-aware gateway that sees both the user identity and the agent identity on each request, evaluates policy on the combination, and routes to whichever scoped instance the caller is allowed to reach. Without that layer, your scoping decisions live in copy-pasted URLs in Slack messages, which is barely scoping at all.
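The gateway's core decision reduces to a policy function over the (user, agent) pair. The roles, agent kinds, and URLs below are hypothetical, and the policy shown — write access only when both identities qualify — is one possible choice, not the only one:

```python
# Identity-aware routing sketch: policy evaluates the combination of
# user identity and agent identity, then picks a scoped instance.

SCOPED_INSTANCES = {
    "explorer": "https://mcp.internal/github-readonly",  # hypothetical URLs
    "builder": "https://mcp.internal/github-write",
}

def route(user_role: str, agent_kind: str) -> str:
    """Return the scoped server this caller may reach."""
    trusted_user = user_role == "senior_engineer"
    trusted_agent = agent_kind == "ci_cd"
    if trusted_user and trusted_agent:
        return SCOPED_INSTANCES["builder"]  # write access: both must qualify
    return SCOPED_INSTANCES["explorer"]     # everyone else gets read-only
```

Same protocol and code path either way; only the blast radius changes, and the decision is logged policy instead of a URL in a Slack message.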

The Registry Problem

The other thing that breaks at scale is discovery.

Six months in you'll have the data team's Snowflake server, the platform team's Kubernetes server, three forks of a GitHub server in three Slack channels, and no consistent answer to "which one am I supposed to point at?" Developers paste URLs to each other. Credentials end up in DMs. Someone's still running the version from the spike two months ago because nobody told them about the rewrite, and they only find out it's deprecated when their automation breaks at 2am during an incident.

You need a centralized registry: a single list of which servers are blessed, where they live, and who maintains them. Without one, every team is effectively downloading random binaries off the internet.
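The registry can start as nothing fancier than a versioned lookup table. The entries below are invented, but the shape — name, location, owner, status — is the minimum that answers "which one am I supposed to point at?":

```python
# Minimal registry sketch: blessed servers, their locations, and owners.
# In practice this lives in one versioned place, not in Slack threads.

REGISTRY = {
    "snowflake": {"url": "https://mcp.internal/snowflake",  # hypothetical
                  "owner": "data-team", "status": "blessed"},
    "kubernetes": {"url": "https://mcp.internal/k8s",
                   "owner": "platform-team", "status": "blessed"},
    "github-spike": {"url": "https://mcp.internal/gh-old",
                     "owner": "unknown", "status": "deprecated"},
}

def resolve(name: str) -> str:
    """One lookup instead of a URL pasted in a DM."""
    entry = REGISTRY[name]
    if entry["status"] != "blessed":
        raise LookupError(f"{name} is {entry['status']}; check the registry")
    return entry["url"]
```

The deprecated spike from two months ago now fails loudly at resolve time instead of silently at 2am.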

Where to Start

The teams shipping production MCP today aren't the ones with the most data connectivity. They're the ones who've turned messy, tribal workflows into a small number of executable services.

Pick one workflow that gets done by hand a lot. Find the person who does it. Count their tabs. Build a server that exposes exactly those tabs and nothing else. Put it behind a gateway so different roles get different scopes. Register it somewhere your team can find it. That's your first production deployment.

The rest of the patterns in this post scale from there.

You can also pay someone else to assemble these pieces. Horizon ships the gateway, the registry, and a hosted FastMCP runtime as one platform; the comparison of MCP deployment platforms walks through how it stacks up against the alternatives. The Prefect MCP server post is the curated-workspace pattern in working code.