Introducing Klavis Guardrails (beta)

Today we're introducing Klavis Guardrails (beta)—a comprehensive security layer designed to protect MCP integrations from the growing threat landscape. Starting today, it's available to enterprise customers through our beta program, with broader availability coming in the next few months.

As MCP rapidly transforms AI integration, we're witnessing an unprecedented security crisis. Recent vulnerability discoveries have exposed critical flaws in MCP implementations, from GitHub repository data leaks to Supabase SQL injection attacks. The GitHub MCP vulnerability alone demonstrated how malicious issues can hijack agents and exfiltrate private repository data. Meanwhile, researchers uncovered how Supabase MCP implementations can leak entire SQL databases through prompt injection.As MCP rapidly transforms AI integration, it's introducing a new class of security challenges.

The explosion of MCP adoption has created a perfect storm: powerful AI agents accessing sensitive systems through protocols that lack fundamental security controls. Tool poisoning attacks, command injection vulnerabilities, and remote code execution flaws are becoming commonplace, yet most organizations remain defensively unprepared.

The Expanding Attack Surface

MCP's architecture inherently amplifies security risks. Unlike traditional APIs, MCP servers expose tools, resources, and prompts directly to AI agents, creating multiple attack vectors. For example, there are "toxic agent flows": sophisticated attacks where malicious instructions embedded in external content manipulate agents into unintended actions.

Diagram illustrating how toxic agent flows exploit MCP vulnerabilities through prompt injection

Diagram illustrating how toxic agent flows exploit MCP vulnerabilities through prompt injection

The threat landscape spans several critical categories:

Prompt Injection via Tool Descriptions: Attackers embed malicious instructions within MCP tool metadata, invisible to users but interpreted by AI models. These attacks can bypass security controls and manipulate agent behavior without user knowledge.
Cross-Repository Information Leakage: As demonstrated in the GitHub MCP exploit, agents can be coerced into accessing private repositories and leaking sensitive data through public channels.
Command Injection and RCE: Multiple MCP servers suffer from basic security flaws, including command injection vulnerabilities that allow arbitrary code execution. The recent CVE-2025-6514 in mcp-remote achieved full remote code execution on client systems.
Credential Theft and Privilege Escalation: MCP servers store OAuth tokens and API credentials, creating high-value targets for attackers. Compromised servers can provide access to entire ecosystems of connected services.

Introducing Klavis Guardrails

At Klavis AI, we've designed Klavis Guardrails as the comprehensive security layer that MCP integrations require. Our system operates as an intelligent proxy between MCP clients and servers, providing real-time threat detection and policy enforcement without disrupting existing workflows.

Klavis Guardrails architecture providing security layer for MCP interactions

Klavis Guardrails architecture providing security layer for MCP interactions

Our approach combines multiple detection engines to address the full spectrum of MCP threats:

Tool Poisoning Detection: Our system continuously monitors MCP tool metadata for malicious alterations, using behavioral analysis to identify when tools deviate from their declared functionality. This prevents attacks where seemingly legitimate tools are modified to perform unauthorized actions.

Prompt Injection Prevention: Our system uses advanced natural language processing to semantically analyze prompts for conflicting or malicious instructions. This allows it to detect sophisticated attacks, such as instructions hidden in a GitHub issue that tell an agent to curl sensitive data to an external server, and block them before they reach the model.

Privilege Escalation Monitoring: Klavis Guardrails enforces granular access controls, ensuring MCP servers operate under the principle of least privilege. Our system detects and blocks attempts to access resources beyond authorized boundaries.

Command Injection Mitigation: We perform deep inspection of all tool invocations. By maintaining a strict allowlist of commands and sanitizing all inputs against known attack patterns, we prevent the kind of vulnerabilities that lead to Remote Code Execution, ensuring that only validated command structures are executed.

The Time to Act is Now

The MCP ecosystem is at a critical juncture where traditional security tools fall short. Don't wait for a breach to expose your data. Klavis Guardrails is already helping our enterprise beta partners significantly reduce their risk.

Ready to secure your MCP infrastructure? Join our beta by scheduling a 15-minute call with us, or reach out directly at security@klavis.ai.