Common patterns

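Roleplay: “Pretend you are DAN (Do Anything Now)”
Encoding: Base64, ROT13, leetspeak
Multi-turn: gradually shift the conversation's context away from the policy
Character set tricks: Unicode confusables (look-alike characters substituted for ordinary letters)
Adversarial suffixes (GCG): token sequences found by Greedy Coordinate Gradient search that, appended to a prompt, push the model past its safety refusals
Crescendo: a multi-turn attack that escalates step by step from benign requests toward sensitive content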
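
Several of the patterns above (encoding and character set tricks in particular) work by hiding the request from whatever inspects the input. A minimal sketch of one countermeasure, assuming the decoded "views" are handed to a separate classifier: generate alternative readings of the prompt before classification. The names `LEET_MAP`, `BASE64_RUN`, `try_base64` and `candidate_views` are hypothetical and the leetspeak table is deliberately tiny. Note that NFKC normalization folds compatibility forms such as fullwidth letters, but it does not map cross-script look-alikes (for example Cyrillic "а" to Latin "a"); that needs a dedicated confusables table.

```python
import base64
import binascii
import codecs
import re
import unicodedata

# Hypothetical, deliberately small leetspeak table; real mappings are larger.
LEET_MAP = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)

# Runs of 16+ Base64 alphabet characters, optionally padded (assumed threshold).
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def try_base64(text: str) -> list[str]:
    """Return printable UTF-8 decodings of Base64-looking runs in text."""
    decoded = []
    for match in BASE64_RUN.findall(text):
        try:
            candidate = base64.b64decode(match, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64, or not text once decoded
        if candidate.isprintable():
            decoded.append(candidate)
    return decoded


def candidate_views(text: str) -> list[str]:
    """Produce alternative readings of text for a downstream classifier."""
    # NFKC folds compatibility characters (e.g. fullwidth letters) to plain forms.
    folded = unicodedata.normalize("NFKC", text).casefold()
    views = [
        folded,                            # normalized original
        codecs.decode(folded, "rot_13"),   # ROT13 reading
        folded.translate(LEET_MAP),        # leetspeak reading
    ]
    views.extend(try_base64(text))         # any embedded Base64 payloads
    return views
```

Each view would be classified independently, so a prompt is flagged if any reading of it is flagged. This only covers the obfuscations it knows about; it does nothing against roleplay, multi-turn, or adversarial-suffix attacks.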

Defences

System prompt with policy reminders
Input classification (detect intent before the prompt reaches the model)
Output classification (block harmful responses before they are returned)
Refusal training
Constitutional AI patterns
External moderation models
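
To illustrate how input and output classification wrap a model call, here is a minimal sketch. Everything in it is an assumption made for illustration: `guarded_call`, `Verdict`, `flags`, and the marker tuples are hypothetical names, and the substring checks stand in for trained classifiers or an external moderation model. The model is passed in as a plain callable so the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical placeholder markers. A real deployment would call trained
# classifiers or an external moderation model instead of matching substrings.
BLOCKED_INPUT_MARKERS = ("do anything now", "ignore previous instructions")
BLOCKED_OUTPUT_MARKERS = ("step-by-step synthesis",)

REFUSAL = "Sorry, I can't help with that."


@dataclass
class Verdict:
    allowed: bool
    layer: str  # which layer blocked the exchange, or "none"
    text: str   # what is returned to the caller


def flags(text: str, markers: tuple[str, ...]) -> bool:
    """Stand-in classifier: True if any marker occurs in the text."""
    lowered = text.casefold()
    return any(marker in lowered for marker in markers)


def guarded_call(prompt: str, model: Callable[[str], str]) -> Verdict:
    """Run the input check, then the model, then the output check."""
    if flags(prompt, BLOCKED_INPUT_MARKERS):
        return Verdict(False, "input", REFUSAL)   # model is never called
    reply = model(prompt)
    if flags(reply, BLOCKED_OUTPUT_MARKERS):
        return Verdict(False, "output", REFUSAL)  # reply is withheld
    return Verdict(True, "none", reply)
```

Recording which layer blocked an exchange is a design choice that helps later analysis: many output-layer blocks suggest prompts are slipping past the input check.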

No defence is complete. In practice, the workable approach is to layer several defences and monitor for what gets through.
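
As one possible shape for the monitoring side, the sketch below keeps a sliding window of guard decisions per conversation and signals when blocked turns accumulate, which is the kind of signal that may surface multi-turn attacks. `ConversationMonitor`, its `record` method, and the default `window` and `threshold` values are hypothetical choices for illustration, not recommended settings.

```python
from collections import defaultdict, deque


class ConversationMonitor:
    """Track recent guard decisions per conversation and flag repeated blocks."""

    def __init__(self, window: int = 10, threshold: int = 3) -> None:
        # Assumed defaults: look at the last 10 turns, escalate at 3 blocks.
        self.window = window
        self.threshold = threshold
        self._recent = defaultdict(lambda: deque(maxlen=window))

    def record(self, conversation_id: str, blocked: bool) -> bool:
        """Record one turn; return True when the conversation needs review."""
        history = self._recent[conversation_id]
        history.append(blocked)  # the oldest turn drops out once the window is full
        return sum(history) >= self.threshold
```

A caller would invoke `record` after every guarded exchange and route conversations that return `True` to logging, rate limiting, or human review.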

Module 12 · LLM Jailbreak Defence
