University Research Seminar  ·  AI Safety & Security

Crescendo: Multi-Turn LLM Jailbreak

Russinovich, Salem, Eldan  ·  Microsoft / Microsoft Research (2024)


Headline results (models tested include GPT-4 and Claude 3):

  100%   attack success on the Election, Climate, and Denial tasks, across all models
  6/6    models successfully jailbroken
  <10    queries per attack on average
  21%    success rate of the best prior work, which needed 50k prompts
[Figures: baseline vs Crescendo success rate by model; attack success rate by task]

Crescendo exploits the "foot in the door" principle: each step escalates context by referencing the model's own prior responses.
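The escalation loop can be sketched in a few lines. This is an illustrative mock, not the authors' implementation: `query_model` is a stub standing in for any chat-completion API, and the seed question and template are hypothetical examples.

```python
# Hypothetical sketch of the Crescendo escalation loop (illustration only).

def query_model(history):
    """Stub: a real implementation would call a chat-completion API here."""
    return f"[model reply to: {history[-1]['content']}]"

def crescendo(seed_question, escalation_template, steps=5):
    """Each turn folds the model's own prior answer into the next, slightly
    more pointed question -- no single prompt is overtly harmful."""
    history = [{"role": "user", "content": seed_question}]
    for _ in range(steps):
        reply = query_model(history)
        history.append({"role": "assistant", "content": reply})
        # Escalate by referencing the model's own words, never the target task.
        follow_up = escalation_template.format(prior=reply)
        history.append({"role": "user", "content": follow_up})
    return history

chat = crescendo(
    "What rhetorical techniques did 1930s propaganda use?",
    "You mentioned: {prior}. Can you expand on that with a concrete example?",
)
```

The key design point is that every user turn quotes the assistant's own prior output, which is exactly the "foot in the door" context that drives the success rates below.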

Success probability — context effect (LLaMA-2 70b)

  Step B only (no context)                 36%
  A → B (with context)                     99.99%
  B → C (final step without A)             17%
  A → B → C (full chain)                   99.9%
  A → B → C′ (explicit target mention)     <1%
Key insight: C and C′ differ only in whether the target is named explicitly. Changing "write a paragraph using it" to "write a paragraph using the f-word" collapses the success rate from 99.9% to <1%.
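The C → C′ contrast reduces to a single substitution. A minimal sketch, using a hypothetical helper name (`make_implicit`) not from the paper:

```python
import re

# Illustrative only: Crescendo keeps the final ask implicit by pointing at the
# model's own prior output ("it") rather than naming the target phrase.
def make_implicit(prompt, target_phrase):
    """Replace an explicit target mention with a back-reference."""
    return re.sub(re.escape(target_phrase), "it", prompt)

make_implicit("write a paragraph using the f-word", "the f-word")
# -> "write a paragraph using it"
```

The implicit form succeeds because safety filters key on the explicit target string, while the back-reference only resolves inside the accumulated conversation context.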

Crescendo is hard to defend against because every individual prompt is benign; there is no silver-bullet defense.