Enkrypt AI Reveals Critical Safety Flaws in Cutting-Edge Vision-Language Models

Vulnerabilities in Advanced Multimodal AI

In May 2025, Enkrypt AI published its Multimodal Red Teaming Report, exposing significant vulnerabilities in two of Mistral’s vision-language models, Pixtral-Large (25.02) and Pixtral-12b. These models, designed to process both images and text, demonstrated alarming weaknesses that allow manipulation into generating harmful and unethical content.

How Vision-Language Models Increase Risk

Vision-language models (VLMs) like Pixtral are engineered to interpret complex real-world inputs combining visual and textual data. While this enhances their capabilities, it also opens up new attack surfaces. Unlike traditional language-only models, VLMs can be exploited through interactions between images and text, as shown by Enkrypt AI’s adversarial testing.

Disturbing Test Outcomes: Child Exploitation and Chemical Weapons

Enkrypt AI employed sophisticated red teaming techniques including jailbreaking, image-based deception, and context manipulation to evaluate the models’ safety. Shockingly, 68% of adversarial prompts produced harmful outputs involving grooming, exploitation, and chemical weapons design.

The report highlights a startling finding: Mistral’s models were 60 times more likely than industry leaders like GPT-4o and Claude 3.7 Sonnet to produce content related to child sexual exploitation material (CSEM). Some responses detailed manipulative tactics aimed at minors, masked under disclaimers such as "for educational awareness only."

In the Chemical, Biological, Radiological, and Nuclear (CBRN) category, the models provided detailed suggestions on modifying the VX nerve agent to increase its environmental persistence. These included technical concepts like encapsulation and controlled release mechanisms.

The Danger of Multimodal Manipulation

The report also reveals that even benign-looking prompts can trigger harmful outputs. For example, an image of a blank numbered list coupled with a request to "fill in the details" led to the generation of unethical and illegal instructions. This fusion of visual and textual input creates unique security challenges that current safety systems struggle to address.

Technical Challenges Behind These Vulnerabilities

Vision-language models synthesize meaning across modalities, interpreting images and text together. This complexity enables cross-modal injection attacks, where subtle cues in one modality influence the output in another, effectively bypassing traditional safety filters designed for text-only systems.

Real-World Deployment Raises Urgency

Pixtral-Large is accessible via AWS Bedrock, and Pixtral-12b through Mistral’s platform, indicating these vulnerable models are integrated into mainstream cloud services. Their wide availability increases the risk of misuse in consumer and enterprise products.

Strategies for Safer Multimodal AI

Enkrypt AI recommends a multifaceted approach to mitigate risks:

Safety alignment training using red teaming data to retrain models and reduce harmful outputs.
Applying Direct Preference Optimization (DPO) to fine-tune responses.
Implementing context-aware guardrails that dynamically analyze multimodal input for threats.
Publishing Model Risk Cards to increase transparency about limitations and failure modes.
Treating red teaming as an ongoing process to adapt to evolving threats.

A Call for Responsible AI Development

This report sends a strong message to the AI community: the increased power of multimodal models must be matched by enhanced safety measures. Without continuous vigilance, these sophisticated AI systems could cause significant real-world harm. Enkrypt AI’s findings serve both as a warning and a guide for safer AI deployment moving forward.