Crescendo — Multi-Turn LLM Jailbreak

Choose a scenario and click Next step — watch how innocent questions lead to a successful jailbreak.

GPT-4 / Claude-3 waiting

100%

Election / Climate / Denial
— all models

6/6

Models successfully
jailbroken

<10

Queries per attack
on average

21%

Best prior work
(needed 50k prompts)

Baseline vs Crescendo by model

Attack success rate by task

Crescendo exploits the "foot in the door" principle: each step escalates context by referencing the model's own prior responses.

Success probability — context effect (LLaMA-2 70b)

Step B only (no context)

36%

A → B (with context)

99.99%

B → C (final without A)

17%

A → B → C (full chain)

99.9%

A → B → C′ (explicit target mention)

<1%

Key insight: C vs C′ differ only by explicitly naming the target. Changing "write a paragraph using it" to "write a paragraph using the f-word" collapses success from 99.9% to <1%.

Crescendo is hard to defend against because every individual prompt is benign. There is no single silver bullet.

Crescendo: Multi-Turn LLM Jailbreak