Shadow AI data leakage has moved from a hypothetical worry to a measurable, top-tier insider threat — and most organisations cannot see it happening. The pattern is mundane: an engineer pastes a stack trace into a public chatbot, a sales rep summarises a customer list, a developer asks an assistant to refactor a proprietary function. Each interaction quietly ships sensitive data to a third party that the security team never approved and cannot audit.
What the 2026 DBIR says about shadow AI data leakage
The clearest signal that this is no longer a fringe concern comes from Verizon’s 2026 Data Breach Investigations Report (DBIR), which, according to Verizon, names shadow AI a top insider threat. The report analysed 858,440 data-loss-prevention (DLP) events involving uploads to generative-AI tools. The category that ranked first by a large margin was source code, followed by images and structured data.
That ordering matters for how defenders should think about the problem. Source code ranking first means the highest-volume offender is not a careless marketing intern dumping a spreadsheet. It is technical staff pasting proprietary logic, secrets embedded in code, and internal architecture into tools that may retain that input. Images and structured data ranking next suggest that screenshots of dashboards and exports of customer or financial tables are routinely flowing out too.
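To make "secrets embedded in code" concrete, here is a minimal sketch of the kind of check a DLP rule might run on text before it reaches an AI tool. The three patterns and the names `SECRET_PATTERNS` and `find_secrets` are illustrative assumptions, not the rule set of any particular product; real detectors are far broader and tuned against false positives.

```python
import re

# Illustrative detectors only (assumed for this sketch); real DLP rule sets are much larger.
SECRET_PATTERNS = {
    # AWS access key IDs start with "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM header of a private key, e.g. "-----BEGIN RSA PRIVATE KEY-----".
    "private_key_block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    # A credential-like name assigned to a quoted literal of 8+ characters.
    "hardcoded_credential": re.compile(
        r"(?i)(?:password|passwd|secret|api_key|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_secrets(text: str) -> list[str]:
    """Return the sorted names of every pattern that matches the pasted text."""
    return sorted(
        name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)
    )
```

A paste that trips any pattern could be blocked or flagged for review; an empty list means only that these three heuristics found nothing, not that the text is safe.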
The scale of a single application is striking. ChatGPT alone reportedly generated more than 410 million DLP policy violations in 2025, each one an attempt to move sensitive data out of an organisation through a single AI product. That is a reported figure rather than an audited industry constant, but it still reframes the risk: this is a continuous, high-frequency exfiltration channel, not an occasional lapse.
Why shadow AI risks are mostly invisible
The reason shadow AI risks stay hidden is a governance gap, not a technology gap. Salesforce’s 2026 Workforce AI Survey, according to Salesforce, found that 67% of employees use AI tools at work, yet only 18% of organisations have a formal AI security policy. In the same survey, 98% of organisations reported unsanctioned AI use, and 49% expected a shadow-AI incident within the next 12 months.
Put those numbers together and the picture is uncomfortable: roughly two-thirds of the workforce is already routing work through AI, almost every organisation knows it is happening informally, but fewer than one in five has written rules governing it. The tools arrived faster than the controls. Employees are not malicious — they are productive, and the path of least resistance is a public chatbot with no DLP, no logging, and no data-retention guarantees the security team has reviewed.
The browser-extension vector almost nobody is watching
Most discussion of enterprise AI data leakage assumes a deliberate act: someone copies text and pastes it into a prompt. There is a subtler and more dangerous vector. An AI browser extension can leak the contents of internal portals, ticketing systems, and SaaS dashboards simply because an employee is browsing them — no file is ever uploaded, no prompt is ever typed.
An extension granted broad page-access permissions can read the DOM of whatever tab is open. If that tab is an internal Jira board, a customer-support console, or a billing dashboard, the extension may transmit page contents to its backend for “context.” The employee believes they are using a convenience tool; in practice they have wired a read pipe from sensitive internal systems to an unvetted third party. This is the same class of risk that makes prompt injection (OWASP LLM01) so potent — untrusted content and trusted data sharing one execution context — and it deserves equal attention from anyone running an AI security programme.
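One way to start watching this vector is to inspect what each installed extension asks for. The sketch below assumes a parsed Chrome-style `manifest.json` and flags the match patterns that grant access to every site; the function name `broad_host_access` and the short `BROAD_PATTERNS` list are assumptions for illustration. It does not catch narrower but still risky patterns (for example, a wildcard over an internal domain), so treat it as a first-pass filter.

```python
# Match patterns that cover every site (Chrome match-pattern syntax).
BROAD_PATTERNS = {"<all_urls>", "*://*/*", "http://*/*", "https://*/*"}


def broad_host_access(manifest: dict) -> list[str]:
    """List the all-sites patterns requested by a parsed manifest.json."""
    requested = set()
    # Manifest V3 lists host patterns under "host_permissions";
    # Manifest V2 mixed them into "permissions".
    for key in ("host_permissions", "optional_host_permissions", "permissions"):
        requested.update(p for p in manifest.get(key, []) if isinstance(p, str))
    # Content scripts run inside every page that their "matches" cover.
    for script in manifest.get("content_scripts", []):
        requested.update(script.get("matches", []))
    return sorted(requested & BROAD_PATTERNS)
```

An extension that returns a non-empty list here is able to read any page the employee opens, including internal portals, and is a reasonable candidate for manual review before it is allowed.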
What this means for Indian organisations and DPDP exposure
For organisations operating under Indian law, shadow AI is not only a trade-secret problem — it is a compliance one. Pasting personal data into public AI tools raises exposure under the Digital Personal Data Protection (DPDP) Act. A support agent who drops a customer’s name, phone number, and complaint history into a public chatbot to draft a reply has, in effect, handed personal data to a third party with no agreement, no purpose limitation, and no control over retention or onward use.
That single action can undercut the lawful-processing basis an organisation worked hard to establish, and it can happen many times a day across a typical support or sales floor. Teams mapping their obligations should fold AI usage directly into their data-flow inventory. Guidance on DPDP compliance, and on AI compliance for India across DPDP, RBI and the EU AI Act, treats unsanctioned AI as a first-class data-processing channel rather than an afterthought.
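As an illustration of the support-agent scenario, a sanctioned path could mask obvious identifiers before any text reaches an AI tool. This is a minimal sketch under stated assumptions: the regexes cover only email addresses and Indian mobile numbers, the names `EMAIL`, `IN_MOBILE` and `redact` are hypothetical, and personal names or free-text complaint details pass through untouched. Redaction of this kind reduces exposure; it is not by itself a DPDP compliance measure.

```python
import re

# Simple email shape: local part, "@", then a dotted domain.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# Indian mobile numbers: optional +91 prefix, then ten digits starting 6-9,
# not embedded in a longer run of digits.
IN_MOBILE = re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)")


def redact(text: str) -> str:
    """Replace emails and Indian mobile numbers with fixed placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return IN_MOBILE.sub("[PHONE]", text)
```

Note that the customer's name in a sentence like "Call Asha on +91 9876543210" survives redaction, which is why pattern-based masking needs to sit alongside policy and training rather than replace them.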
Defences against shadow AI data leakage
The goal is not to ban AI — prohibition simply drives usage further underground, where there is even less visibility. The workable strategy is to give employees a safe, sanctioned path and then close the unsafe ones. A practical control set:
- A clear AI usage policy. Write down what data may and may not enter AI tools, which tools are approved, and what the consequences of misuse are. With only 18% of organisations in the Salesforce survey reporting a formal AI security policy, the policy itself is often the missing piece.
- A sanctioned enterprise AI option with DLP. Offer an approved tool with data-retention guarantees, no training on inputs, and DLP inspection. If the safe option is as fast as the public one, most users will choose it.
- Browser and endpoint controls. Inventory and restrict AI browser extensions, block uploads to unapproved domains, and apply DLP at the egress point — not just at the application layer.
- Data classification. Source code and customer records cannot be protected if systems do not know which data is sensitive. Classification is what lets DLP make a meaningful decision.
- Employee training. Most leakage is well-intentioned. Show staff concrete examples — the pasted stack trace, the customer summary — so the abstract policy becomes a recognisable behaviour to avoid.
These controls reinforce each other: classification feeds DLP, the sanctioned tool gives the policy somewhere to point, and training closes the human gap. Building the muscle internally is also worthwhile — the AI Security learning track walks teams through how these systems leak and how to test them. For organisations standing up a programme from scratch, the enterprise AI security checklist sequences the work.
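The way classification feeds an egress decision can be sketched in a few lines. Everything named here is an assumption for illustration: the three-level `Sensitivity` scale, the hypothetical host `ai.internal.example`, and the policy choice that unapproved tools may receive public data only. A real deployment would take labels from a classification system and enforce the decision in a proxy or endpoint agent.

```python
from enum import IntEnum
from urllib.parse import urlsplit


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2  # e.g. source code, customer records


# Hypothetical sanctioned tool, mapped to the most sensitive label it may receive.
APPROVED_AI_HOSTS = {"ai.internal.example": Sensitivity.RESTRICTED}


def allow_upload(url: str, label: Sensitivity) -> bool:
    """Allow an upload only if the destination is cleared for the data's label."""
    host = (urlsplit(url).hostname or "").lower()
    ceiling = APPROVED_AI_HOSTS.get(host)
    if ceiling is None:
        # Assumed policy: unapproved destinations may receive public data only.
        return label == Sensitivity.PUBLIC
    return label <= ceiling
```

Without a label, the function has nothing to decide on, which is the practical reason classification comes before DLP in the control set.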
Takeaway
Shadow AI data leakage is now an everyday exfiltration channel rather than an edge case: the 2026 DBIR’s 858,440-event dataset and the more than 410 million violations reported for ChatGPT both point to volume, not novelty. The organisations that fare best will be the ones that replace an unenforceable ban with a sanctioned tool, real DLP, browser-level controls, classification, and training, while treating personal data in public AI tools as a live DPDP exposure.
To find where AI usage is already leaking data in your environment — and to stress-test the controls before an incident does — book a scoping call with RingSafe.