Just 48 hours after its public debut, Grok-4 was successfully jailbroken using a newly enhanced attack method.
Researchers from NeuralTrust combined two known strategies, Echo Chamber and Crescendo, to bypass the AI model’s safety systems and elicit harmful responses without issuing any explicitly malicious prompts.
The attack was designed to test whether a state-of-the-art large language model (LLM) could be manipulated into providing illegal instructions.
In this case, the target was to get Grok-4 to reveal step-by-step directions for making a Molotov cocktail, a scenario previously used in the original Crescendo paper.
A Dual-Phase Approach to Jailbreaking
NeuralTrust began by running the Echo Chamber attack, which poisons the model’s conversational context and nudges it toward unsafe behavior.
In the initial trial, the prompts were too direct, triggering Grok-4’s internal safeguards. However, after adjusting the inputs to be more subtle, the team successfully initiated Echo Chamber’s full workflow, including a persuasion cycle designed to gradually shift the model's tone.
Although Echo Chamber alone brought the model closer to the objective, it wasn’t enough to fully break through. That’s when Crescendo was added – a technique that incrementally intensifies a prompt across multiple conversational turns to escalate the model’s response.
With just two additional exchanges, the combined method succeeded in eliciting harmful content, only two days into Grok-4’s deployment.
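To make the multi-turn mechanics concrete, the sketch below is a minimal illustration (not NeuralTrust's tooling) of how a chat-style API carries the full accumulated message history on every call; the chat() function and message format are placeholder assumptions, but the accumulated context they represent is the surface that techniques like Echo Chamber and Crescendo manipulate.

```python
# Minimal sketch: how a multi-turn exchange accumulates context at the API level.
# chat() is a hypothetical stand-in for any chat-completion endpoint.

def chat(messages):
    # Placeholder: a real implementation would send the full message history
    # to an LLM API and return the assistant's reply.
    return f"[model reply given {len(messages)} accumulated messages]"

history = [{"role": "system", "content": "You are a helpful assistant."}]

# Each user turn on its own can look benign; the steering effect comes from
# the accumulated history the model sees on every call.
for user_turn in ["<seed prompt>", "<follow-up referencing the model's prior reply>"]:
    history.append({"role": "user", "content": user_turn})
    reply = chat(history)
    history.append({"role": "assistant", "content": reply})
    print(reply)
```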
Measured Results Across Multiple Scenarios
Following this initial success, the NeuralTrust team tested other prompts involving illegal activities.
They manually selected objectives from the Crescendo paper, including those related to drug synthesis and chemical weapons. The combined method proved effective in several of these cases, including:
- 67% success rate for Molotov cocktail instructions
- 50% for methamphetamine-related prompts
- 30% for toxin-related responses
In one instance, Grok-4 produced a harmful response within a single conversational turn, making the Crescendo phase unnecessary.
New Risks for Multi-Turn LLM Safety
The key insight of this research is that Grok-4 did not need to be explicitly asked to do anything illegal. Instead, the conversation was shaped gradually using carefully engineered prompts.
As the researchers noted, “attacks can bypass intent or keyword-based filtering by exploiting the broader conversational context.”
The study highlights the challenge of defending against subtle, multi-step attacks. While Grok-4 and other LLMs are typically trained to detect and reject harmful prompts, techniques like Echo Chamber and Crescendo exploit the model’s broader dialogue dynamics, often slipping through unnoticed.
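As a hedged illustration of the gap the researchers describe, the sketch below contrasts per-prompt screening with a check that scores the accumulated conversation. The score_text() function is a hypothetical stand-in for whatever safety classifier a deployment uses, not a real API, and the heuristic inside it is purely illustrative.

```python
# Illustrative sketch only: per-turn filtering vs. conversation-level scoring.
# score_text() is a hypothetical placeholder for a safety classifier.

def score_text(text: str) -> float:
    # Placeholder heuristic returning a risk score in [0, 1].
    flagged_terms = ("explosive", "synthesis route")
    return 1.0 if any(term in text.lower() for term in flagged_terms) else 0.1

def per_turn_filter(turns: list[str], threshold: float = 0.5) -> bool:
    # Checks each message in isolation -- the pattern that subtle
    # multi-turn attacks are designed to slip past.
    return any(score_text(turn) > threshold for turn in turns)

def conversation_filter(turns: list[str], threshold: float = 0.5) -> bool:
    # Scores the concatenated dialogue so the classifier sees the broader
    # conversational context rather than isolated messages.
    return score_text(" ".join(turns)) > threshold

dialogue = ["<benign-looking turn 1>", "<benign-looking turn 2>"]
print(per_turn_filter(dialogue), conversation_filter(dialogue))
```

The design point is simply where the check sits: a per-turn filter never sees the gradual drift across a dialogue, while a conversation-level check at least has access to it, which is the distinction the NeuralTrust findings highlight.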
A successful jailbreak so soon after Grok-4’s release underscores the urgency of advancing LLM safety beyond surface-level filtering, particularly as these systems are increasingly deployed in high-stakes environments.