MCP Risks Exposed: Tool Poisoning, Hijacking and Rug Pulls
'The Model Context Protocol enables useful integrations but introduces risks such as tool poisoning, cross-server hijacking, and rug pulls; this article explains each attack and lists practical mitigations.'
The Model Context Protocol (MCP) helps LLMs interact with external tools and data by providing structured tool definitions and metadata. That structure improves transparency, but it also creates new attack surfaces: malicious actors can exploit metadata, hidden instructions, and dynamic changes to tools to make models perform harmful or unauthorized actions.
Tool Poisoning
A Tool Poisoning attack embeds hidden malicious instructions inside a tool's metadata or description. Users typically see a simplified, clean description in the UI, while the LLM receives the full tool definition, including any concealed prompts, backdoor commands, or manipulated instructions. That discrepancy allows attackers to stealthily influence model behavior.
Common characteristics:
- Hidden prompts or conditional instructions included in the tool definition.
- UI-friendly descriptions that mask the full, unsafe definition.
- Attacks that trigger only when the LLM executes the tool, making them hard for users to notice.
Mitigations:
- Validate and sanitize tool metadata on the client side before exposing definitions to models.
- Require cryptographic signing of tool manifests and verify signatures at runtime.
- Display full tool definitions or automated diffs to auditors and integrators, not only simplified descriptions.
- Restrict the model's ability to act on untrusted or high-risk instructions embedded in tool definitions.
Tool Hijacking
Tool Hijacking occurs when multiple MCP servers are connected to the same client and a malicious server injects hidden instructions designed to override or manipulate tools provided by a trusted server. For example, a malicious Server B might advertise a harmless add() tool but include hidden directives intended to hijack Server A's email_sender tool.
Why this is dangerous:
- Cross-server interactions expand trust boundaries and create opportunities for one server to influence tools from another.
- Hidden overrides can reroute outputs, leak sensitive data, or change invocation parameters without user consent.
Defenses:
- Isolate tool namespaces by server origin and enforce origin-bound policies.
- Enforce least privilege: disallow servers from referencing or overriding tools they don't own.
- Monitor and log cross-server tool resolution and raise alerts on ambiguous or conflicting tool definitions.
MCP Rug Pulls
An MCP Rug Pull happens when a server changes its tool definitions after users have already approved them. It mirrors the app-store scenario where a previously trusted application updates into malware. Because users rarely re-review tool specs, the client may continue to use a tool under false assumptions.
Key risks:
- Silent behavior changes after initial approval.
- Difficulty of detection because clients often cache or persist previously approved definitions.
- Potential for gradual escalation where small updates accumulate into dangerous capabilities.
Prevention strategies:
- Require versioned tool manifests and explicit re-approval for updates that change behavior, capabilities, or permissions.
- Use attestation and signing for each manifest version and reject unsigned updates.
- Implement automatic change-detection and notify human operators when definitions change significantly.
- Periodically review and revalidate high-risk tools rather than assuming perpetual trust.
Practical guidance for implementers
- Treat tool metadata as code: review, sign, and verify it just like executable artifacts.
- Build transparent UIs that can expose the full tool definition to trusted operators and provide readable diffs on updates.
- Limit the model's autonomy in executing tools by requiring explicit confirmation for sensitive actions and by applying runtime guards.
- Log every tool resolution and invocation with provenance information to enable post-incident analysis.
Careful design, provenance, and rigorous verification are essential to keep MCP-based integrations safe. Without those controls, the benefits of structured context can become vectors for subtle and powerful attacks.
Сменить язык
Читать эту статью на русском