How MCP Locks Down AI Agents: Practical Security, Red-Teaming, and Hardening Advice

October 1, 2025 · 5 min

What MCP is and why it matters

Model Context Protocol (MCP) is an open JSON-RPC–based standard that makes model-to-tool integrations explicit and auditable. By exposing three well-defined primitives—tools, resources, and prompts—over standard transports (stdio for local, Streamable HTTP for remote), MCP turns previously ad-hoc agent connectors into first-class, inspectable components. That clarity is valuable for security teams: it creates well-scoped trust boundaries, enforces audience-bound authorization, and enables repeatable red-team scenarios.

What MCP standardizes

MCP servers publish three surfaces:

Tools: typed actions the model can call (schema-defined).
Resources: data objects clients can fetch and inject as context.
Prompts: reusable, parameterized templates, typically initiated by users.

Separating these surfaces clarifies who controls each interaction: the model for tools, the application for resources, and the user for prompts. Those distinctions matter in threat modeling; for example, prompt injection usually targets model-controlled paths, while unsafe output handling often arises where application logic merges model outputs into downstream processes.

Transports and lifecycle

The spec defines stdio and Streamable HTTP transports and allows pluggable alternatives. Local stdio minimizes network exposure; Streamable HTTP supports multi-client and resumable streams. MCP also formalizes discovery and session negotiation, enabling consistent instrumentation, structured logging, and automated pre/postcondition checks across clients and servers.

Normative authorization controls

MCP’s authorization rules are unusually prescriptive for an integration protocol and should be enforced strictly:

No token passthrough. Servers must not forward the client’s token upstream; they act as OAuth 2.1 resource servers and should accept tokens that are audience-bound via RFC 8707 resource indicators.
Audience binding and validation. Servers must validate that the access token audience matches themselves before honoring requests.

These requirements prevent confused-deputy attacks, preserve upstream audit and quota controls, and make servers standalone principals with their own credentials, scopes, and logs.

Practical security benefits

Clear trust boundaries: the client/server edge becomes an inspectable surface for consent UIs, permissioning, and structured audit logs.
Containment and least privilege: servers can expose constrained capabilities (for example, “fetch secret by policy label”) and mint short-lived credentials rather than giving broad tokens to models.
Deterministic attack surfaces for red teams: typed schemas and replayable transports let teams build fixtures that simulate adversarial inputs at tool boundaries and verify post-conditions reproducibly.

Case study: the first malicious MCP server

In late September 2025 researchers disclosed a trojanized postmark-mcp npm package that impersonated a Postmark email MCP server. From v1.0.16 the malicious build silently BCC-exfiltrated every email sent through it to an attacker-controlled address. The package was removed and guidance recommended uninstalling the affected version and rotating credentials.

This incident highlights that MCP servers frequently run with high trust and therefore must be vetted, pinned, and monitored like any privileged connector. Operational takeaways include allowing only approved servers, pinning versions and checksums, requiring signed releases and SBOMs, monitoring egress patterns, and practicing credential rotation and bulk disconnect drills.

Using MCP to structure red-team exercises

Practical red-team playbook items:

Prompt-injection and unsafe-output drills at the tool boundary. Feed adversarial corpora via resources and assert that clients sanitize outputs and that server post-conditions hold.
Confused-deputy probes. Try to induce a server to use a client-issued token or call an unintended upstream audience. A compliant server should reject foreign-audience tokens; treat any success as high severity.
Session and stream resilience tests. Exercise reconnection, resumption, and multi-client concurrency to surface session-fixation and hijack risks.
Supply-chain kill-chain drills. In a lab, introduce a trojaned server and verify that allowlists, signature checks, and egress detection catch it—measure time to detection and MTTR for credential rotation.
Baseline with trusted public servers. Use vetted public MCP servers (for example, data sources or least-privilege secret brokers) as stable substrates for reproducible tests.

Implementation-focused security hardening checklist

Client side:

Show the exact command/configuration used to start local servers; require explicit user consent and enumerate the tools/resources enabled.
Maintain an allowlist of servers with pinned versions and checksums; deny unknown servers by default.
Log every tool call and resource fetch with metadata so attack paths can be reconstructed.

Server side:

Implement OAuth 2.1 resource-server behavior: validate tokens and audiences; never forward client-issued tokens upstream.
Minimize scopes; prefer short-lived credentials and capability-encoded actions.
For local servers, prefer stdio inside a sandbox/container and restrict filesystem/network capabilities; for remote, use Streamable HTTP with TLS, rate limits, and structured audit logs.

Detection & response:

Alert on anomalous egress (unexpected destinations or BCC-like email exfil patterns) and sudden capability changes between versions.
Prepare automation to revoke client approvals and rotate upstream secrets quickly when a server is flagged.

Governance and adoption

MCP’s separation of concerns aligns well with frameworks such as NIST AI RMF and OWASP LLM guidance: clients orchestrate, servers are scoped principals with typed capabilities, and red-team evaluation becomes more straightforward. Use those frameworks to justify controls and acceptance criteria in security reviews.

Current adoption you can test against

Several implementations and public servers offer practical test surfaces: Claude’s ecosystem uses MCP for tool connections; Google’s Data Commons MCP provides stable public datasets for reproducible tasks; Delinea’s MCP demonstrates least-privilege secret brokering. Those servers are useful for permissioning, logging, and red-team baselines.

Key takeaways

MCP is not a drop-in security product but a protocol that provides enforceable levers: audience-bound tokens, explicit client/server boundaries, typed tool schemas, and instrumentable transports. Treat MCP servers as privileged connectors—vet, pin, and monitor them—and use MCP to make agent behavior observable and replayable for red-team validation.