Choose a scenario and click Next step — watch how innocent questions lead to a successful jailbreak.
GPT-4 / Claude-3
waiting
100%
Election / Climate / Denial
— all models
— all models
6/6
Models successfully
jailbroken
jailbroken
<10
Queries per attack
on average
on average
21%
Best prior work
(needed 50k prompts)
(needed 50k prompts)
Baseline vs Crescendo by model
Attack success rate by task
Crescendo exploits the "foot in the door" principle: each step escalates context by referencing the model's own prior responses.
Success probability — context effect (LLaMA-2 70b)
Step B only (no context)
A → B (with context)
B → C (final without A)
A → B → C (full chain)
A → B → C′ (explicit target mention)
Key insight: C vs C′ differ only by explicitly naming the target.
Changing "write a paragraph using it" to "write a paragraph using the f-word"
collapses success from 99.9% to <1%.
Crescendo is hard to defend against because every individual prompt is benign. There is no single silver bullet.