Adversarial examples are inputs crafted to fool classifiers while looking benign to humans. Although first studied in image classifiers, the techniques generalise to text, audio, and any differentiable model. For production security teams, the question is rarely “is the model adversarially robust” (it is not) but “can attackers exploit this in your context.”
The math in one paragraph
A neural network classifier outputs a probability for each class. The attacker looks for an input perturbation δ, small in some norm (typically ‖δ‖∞ ≤ 0.03 for inputs scaled to [0, 1]), that maximises the loss on the correct class, and finds it by gradient ascent on the loss with respect to the input. FGSM (Fast Gradient Sign Method) takes one step of size ε in the direction of the sign of the gradient; it is cheap and somewhat effective. PGD (Projected Gradient Descent) takes many small FGSM-like steps, projecting back onto the L∞ ball after each one; it is more effective. Carlini-Wagner is an optimisation-based attack that minimises the L2 norm of the perturbation; it is slow but produces minimal-perturbation adversarial examples. All three require white-box access to gradients, but transfer attacks work without it: examples crafted against a surrogate model often fool the target model too.
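To make FGSM and PGD concrete, here is a minimal sketch in plain Python. It assumes a hypothetical toy model (binary logistic regression with made-up weights `W` and bias `B`) so that the input gradient can be written by hand; a real attack would obtain the gradient from an autodiff framework and would also clip the result to the valid input range. The function names and the sample input are illustrative, not taken from any library.

```python
import math

# Hypothetical toy model: binary logistic regression with fixed, made-up weights.
W = [2.0, -3.0, 1.0]
B = 0.5


def predict(x):
    """Return the model's probability that x belongs to class 1."""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))


def loss(x, y):
    """Cross-entropy loss for the true label y (0 or 1)."""
    p = predict(x)
    return -math.log(p if y == 1 else 1.0 - p)


def input_gradient(x, y):
    """Gradient of the loss w.r.t. the input; for this model it is (p - y) * W."""
    p = predict(x)
    return [(p - y) * w for w in W]


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(x, y, eps):
    """One step of size eps along the sign of the input gradient."""
    g = input_gradient(x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def pgd(x, y, eps, alpha, steps):
    """Repeated sign-gradient steps of size alpha, each followed by
    projection back onto the L-infinity ball of radius eps around x."""
    x_adv = list(x)
    for _ in range(steps):
        g = input_gradient(x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv


# Illustrative input that the toy model assigns to class 1.
x_clean = [0.1, 0.2, 0.05]
x_fgsm = fgsm(x_clean, y=1, eps=0.03)
x_pgd = pgd(x_clean, y=1, eps=0.03, alpha=0.01, steps=10)
print("clean:", predict(x_clean), "fgsm:", predict(x_fgsm), "pgd:", predict(x_pgd))
```

With this sample input, a perturbation of at most 0.03 per feature is enough to push the prediction across the 0.5 decision threshold. Because the toy model is linear, PGD ends up at the same point as FGSM; the extra iterations pay off on non-linear networks, where the gradient direction changes as the input moves.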