Adversarial Examples — FGSM, PGD, Transfer Attacks (Image and Text)

Manish Garg, Associate of (ISC)² · RingSafe
Apr 29, 2026
An L∞ perturbation of just ε = 0.007, invisible to humans, makes a deep learning classifier confidently misclassify a panda as a gibbon. This 2014 demonstration brought adversarial examples to wide attention. The defences remain imperfect, and the attacks have since spread to text, audio, and multimodal models. This module covers FGSM, PGD, transfer attacks, and the realistic threat model for production systems.

Adversarial examples are inputs crafted to fool classifiers while looking benign to humans. Originally an image classifier issue, the techniques generalise to text, audio, and any differentiable model. For production security teams, the question is rarely “is the model adversarially robust” (it is not) but “can attackers exploit this in your context.”

The math in one paragraph

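A neural network classifier outputs a probability per class. The attacker searches for an input perturbation δ, small in some norm (typically ‖δ‖∞ ≤ ε, with ε around 0.03 for pixel values scaled to [0, 1]), that maximises the loss on the correct class. The perturbation is computed by gradient ascent on the loss with respect to the input. FGSM (Fast Gradient Sign Method) takes a single step of size ε in the direction of the sign of the gradient: x' = x + ε · sign(∇ₓ L(x, y)). It is cheap and somewhat effective. PGD (Projected Gradient Descent) takes many small FGSM-like steps and, after each one, projects the result back onto the L∞ ball of radius ε around the original input. It is more effective. Carlini-Wagner is an optimisation-based attack that minimises the L2 norm of the perturbation. It is slow but produces minimal-perturbation adversarial examples. All three require white-box access to gradients, but transfer attacks work without it: adversarial examples crafted against a surrogate model the attacker controls often fool the target model as well.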
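To make the paragraph above concrete, here is a minimal sketch of FGSM, PGD, and a transfer attack. It is not taken from any particular library. It assumes a toy logistic-regression "classifier" so that the input gradient can be written by hand using only the Python standard library. The weights, input, ε, and step size are hypothetical values chosen for illustration. A real attack would obtain the gradient from an autodiff framework instead.

```python
import math

def predict(w, b, x):
    """P(class 1 | x) for a toy logistic-regression classifier."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. the input x.
    For logistic regression this is (p - y) * w."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def fgsm(w, b, x, y, eps):
    """One step of size eps along the sign of the input gradient."""
    g = input_gradient(w, b, x, y)
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, g)]

def pgd(w, b, x, y, eps, alpha, steps):
    """Repeated small sign steps, each followed by projection onto the
    L-infinity ball of radius eps around x and onto the valid range [0, 1]."""
    x_adv = list(x)
    for _ in range(steps):
        g = input_gradient(w, b, x_adv, y)
        stepped = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [clip(clip(s, xi - eps, xi + eps), 0.0, 1.0)
                 for s, xi in zip(stepped, x)]
    return x_adv

def linf(a, b):
    """L-infinity distance between two inputs."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))

# Hypothetical target model and input (illustrative values only).
W_TARGET, B_TARGET = [2.0, -3.0, 1.5, -1.0], -0.1
# Hypothetical surrogate: different weights, similar decision direction.
W_SURR, B_SURR = [1.5, -2.0, 2.0, -0.5], 0.0
X, Y = [0.6, 0.4, 0.5, 0.5], 1
EPS = 0.03

x_fgsm = fgsm(W_TARGET, B_TARGET, X, Y, EPS)
x_pgd = pgd(W_TARGET, B_TARGET, X, Y, EPS, alpha=0.01, steps=10)
# Transfer: craft on the surrogate only, then evaluate on the target.
x_transfer = fgsm(W_SURR, B_SURR, X, Y, EPS)

print("clean    p(class 1):", predict(W_TARGET, B_TARGET, X))
print("FGSM     p(class 1):", predict(W_TARGET, B_TARGET, x_fgsm))
print("PGD      p(class 1):", predict(W_TARGET, B_TARGET, x_pgd))
print("transfer p(class 1):", predict(W_TARGET, B_TARGET, x_transfer))
```

In this toy setup the clean input is classified as class 1 and each perturbed input falls below 0.5 on the target, while staying within ε of the original in L∞. Because the model is linear, PGD ends up at the same point as FGSM here. The gap between them only appears on non-linear models, where the gradient changes between steps. The transfer case succeeds only because the surrogate's gradient signs happen to match the target's. On real models, transfer is common but not guaranteed.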
