LLM02 · OWASP LLM Top 10 (2025)

Sensitive Information Disclosure

LLMs leak. Training data leaks, system prompts leak, retrieved documents leak, conversation history leaks across users. Sensitive disclosure is the breach class that turns "interesting demo" into "DPDP incident."

01 What it is

Any unintended exposure of confidential information by an LLM application — to users, to other tenants, to logs accessible to third parties, or back to the model provider. Sources include training data memorisation, RAG indices that mix tenants, prompt logs sent to providers, system prompts echoed back, and conversation context leaking between sessions.

02 Why it matters

In India, personal data under the DPDP Act triggers consent, purpose-limitation, and breach-notification obligations. A casual LLM deployment that puts customer chats and HR documents in the same vector store is a DPDP breach waiting to happen. Beyond regulation, leaked system prompts hand attackers your business logic, and leaked customer data ends careers.

03 Attack vectors

  • System prompt extraction via injection or roleplay framing.
  • Cross-tenant retrieval — your RAG index returns chunks from another customer because tenant filtering happens after similarity search instead of before.
  • Training-data memorisation — frontier models occasionally regurgitate training examples verbatim. Fine-tunes on customer data make this worse.
  • Conversation-history leak — sloppy session management where one user reads another's chat.
  • Provider-side logging — by default, API providers such as OpenAI and Anthropic retain submitted prompts for a period (for example, for abuse monitoring) unless you have a zero-data-retention agreement.
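The conversation-history leak usually comes down to code that trusts a session ID without checking who owns it. A minimal sketch, assuming an in-memory store and a hypothetical `ChatHistoryStore` class (not from any library), that binds each session to its owner and checks ownership on every read and write:

```python
class SessionAccessError(Exception):
    """Raised when a caller touches a session it does not own."""


class ChatHistoryStore:
    """Hypothetical in-memory chat store; a real one would sit on a database."""

    def __init__(self) -> None:
        # session_id -> (owner_user_id, list of messages)
        self._sessions: dict[str, tuple[str, list[str]]] = {}

    def _owned(self, session_id: str, user_id: str) -> list[str]:
        # Central ownership check shared by every read and write path.
        owner, messages = self._sessions[session_id]
        if owner != user_id:
            raise SessionAccessError(
                f"user {user_id!r} does not own session {session_id!r}"
            )
        return messages

    def append(self, session_id: str, user_id: str, message: str) -> None:
        # The first writer becomes the owner; later writers must match.
        self._sessions.setdefault(session_id, (user_id, []))
        self._owned(session_id, user_id).append(message)

    def history(self, session_id: str, user_id: str) -> list[str]:
        if session_id not in self._sessions:
            return []
        # Return a copy so callers cannot mutate stored history.
        return list(self._owned(session_id, user_id))
```

Session IDs should themselves be server-issued and unguessable (for example from Python's `secrets.token_urlsafe`); the ownership check is the backstop for when one leaks anyway.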

04 Defence patterns

  • Tenant-scope retrieval — apply metadata filters BEFORE the vector search, not after. Test it by querying for another tenant's documents while authenticated as your own tenant; nothing should come back.
  • PII scrubbing — Microsoft Presidio or Guardrails AI in front of every LLM call. Treat this as a baseline for any application that handles personal data under DPDP.
  • Output redaction — re-scan LLM output for PII / secrets / system-prompt fragments before display.
  • Provider agreements — zero-retention or self-hosted models for regulated workloads.
  • Canary tokens — embed unique markers in system prompts; alert on detection in output or logs.
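Tenant-scoped retrieval can be sketched with a toy in-memory index. Everything here is an illustrative assumption: the `Chunk` type, the `search` function, and the hand-written two-dimensional embeddings. Production vector databases usually express the same idea as a metadata filter passed with the query; check your store's documentation for how and when it applies that filter. The property that matters is that the tenant check runs before ranking, inside the search call, so another tenant's chunk is never a candidate and no caller can forget the filter.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    embedding: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: list[Chunk], query: tuple[float, ...], tenant_id: str, k: int = 3) -> list[Chunk]:
    # 1. Scope first: only the caller's own chunks are ever candidates.
    candidates = [c for c in index if c.tenant_id == tenant_id]
    # 2. Then rank the scoped candidates by similarity and keep the top k.
    candidates.sort(key=lambda c: cosine(c.embedding, query), reverse=True)
    return candidates[:k]
```

The cross-tenant test from the bullet above: query as `acme` with an embedding that exactly matches a `globex` document, and confirm only `acme` chunks come back.

```python
index = [
    Chunk("acme", "Acme refund policy", (1.0, 0.0)),
    Chunk("globex", "Globex salary bands", (0.0, 1.0)),
]
results = search(index, (0.0, 1.0), tenant_id="acme")
assert all(c.tenant_id == "acme" for c in results)
```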

05 Detection

Signals to watch

Monitor responses for PII patterns (Aadhaar, PAN, phone, email, account numbers). Watch for unusually long outputs (memorisation signal). Audit RAG returns for cross-tenant chunks. Log every prompt + response and run nightly classifiers.
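A minimal output scanner covering several of these signals, plus the canary-token and output-redaction patterns from the defence list, can be sketched with the standard library alone. The regexes below are illustrative assumptions, not production detectors: the Aadhaar pattern skips the Verhoeff checksum, the phone pattern only covers Indian mobile numbers, account numbers are omitted because formats vary by bank, and the 4,000-character `max_chars` threshold is an arbitrary placeholder. `scan_output`, `redact`, and `make_canary` are hypothetical names.

```python
import re
import secrets

# Illustrative patterns only; tune and validate before relying on them.
PII_PATTERNS = {
    "aadhaar": re.compile(r"(?<!\d)[2-9]\d{3}[ -]?\d{4}[ -]?\d{4}(?!\d)"),
    "pan": re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),
    "phone_in": re.compile(r"(?<!\d)(?:\+?91[ -]?)?[6-9]\d{9}(?!\d)"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def make_canary() -> str:
    # Unique marker to embed in the system prompt; it should never appear in output.
    return f"CANARY-{secrets.token_hex(8)}"


def scan_output(text: str, canary: str, max_chars: int = 4000) -> list[str]:
    # Names of every PII pattern that matches anywhere in the response.
    findings = [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
    if canary in text:
        findings.append("canary_leak")  # system prompt content is escaping
    if len(text) > max_chars:
        findings.append("long_output")  # possible memorisation signal
    return findings


def redact(text: str) -> str:
    # Replace each match with a labelled placeholder before display.
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{name.upper()} REDACTED]", text)
    return text
```

For example, `redact("Call 9876543210 or mail a.b@example.com")` replaces the phone number and the email address with `[PHONE_IN REDACTED]` and `[EMAIL REDACTED]`. A non-empty `scan_output` result is a reason to block or review the response, not proof of a leak.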

06 India context

DPDP · RBI · CERT-In

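DPDP Act 2023, Section 8(6): on a personal data breach, the Data Fiduciary must inform the Data Protection Board and each affected Data Principal; the accompanying DPDP Rules set the timeline, including a 72-hour window for a detailed report to the Board. CERT-In's April 2022 directions separately require reporting cyber incidents, including data breaches and data leaks, within 6 hours of noticing them. For financial-services data, the RBI Master Direction on Outsourcing of IT Services applies when a regulated entity uses an LLM provider that holds customer chats. Healthcare data adds an ABDM compliance overlay.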

07 MITRE ATLAS mapping

AML.T0057 — LLM Data Leakage
AML.T0056 — LLM Meta Prompt Extraction (system prompt leakage)

08 Related modules on RingSafe

09 Further reading