Introduction
The Digital Personal Data Protection Act was enacted in 2023 and matured through the 2024 rules, the 2025 Board notifications, and the early enforcement orders of 2026. For Data Fiduciaries (the term the Act uses for entities that determine the purpose and means of processing personal data), the strategic question is no longer “what does the law require” but “how do we operationalise these requirements at scale across hundreds of data flows, dozens of vendors, and thousands of employees?”
This guide is the engineering counterpart to the legal commentary that already saturates the Indian privacy market. We focus on the technical controls, the data architecture changes, the operational runbooks, and the team patterns that an Indian enterprise needs to actually deliver compliance — not just claim it on a slide.
Background: What the Act Actually Requires
At the level of operational impact, DPDP imposes seven core obligations on Data Fiduciaries:
- Lawful basis. Process personal data only for a specified, legitimate purpose, with consent or under one of the limited “legitimate use” exceptions.
- Consent management. Record consent, make withdrawal as easy as giving consent was, and stop processing once consent is withdrawn.
- Notice. Inform Data Principals (the people whose data you process) about the purpose, the categories of data, the rights they have, and the mechanism to exercise those rights.
- Reasonable security safeguards (Section 8(5)). Protect personal data against breach.
- Breach notification. Notify the Data Protection Board and affected Data Principals within 72 hours of becoming aware.
- Data minimisation and erasure. Retain data only as long as necessary; erase on completion of the specified purpose.
- Data Principal rights fulfilment. Access, correction, completion, updating, erasure, grievance redressal.
Significant Data Fiduciaries (SDFs, notified by the Central Government on the basis of factors such as the volume and sensitivity of the personal data processed) bear additional obligations: a Data Protection Officer, Data Protection Impact Assessments for high-risk processing, periodic audits, and explicit duties around algorithmic accountability.
Theory: The Data Inventory Problem
Every DPDP control depends on one foundational artefact: a current, accurate inventory of personal data processed by the organisation. Without this, you cannot answer the basic questions:
- What personal data do we have?
- What is the lawful basis for each category?
- Where is it stored, who processes it, who has access?
- What is the retention period?
- When a Data Principal exercises an erasure right, where does the data actually live?
The naive approach is a one-time spreadsheet exercise, typically led by Legal with input from IT. It is consistently wrong within six months because: (a) new data flows are added without updating the inventory; (b) the spreadsheet does not reconcile with actual data stores; (c) third-party processors are usually missed entirely.
The engineering approach is to treat the inventory as code: a continuously-updated artefact generated by combining (i) data discovery scans against actual storage systems, (ii) annotations in application code marking fields as personal data with a category and lawful basis, and (iii) data-flow tracing from API gateway access logs.
Technical Deep Dive: Data Discovery and Classification
Several commercial and open-source tools perform automated PII discovery against structured and unstructured stores:
- BigID, OneTrust DataDiscovery, Securiti.ai — commercial enterprise platforms with broad connector coverage.
- Microsoft Purview — if you are already in the M365 estate, this is the path of least resistance.
- Open-source: Trivy with the config-data plugin, TruffleHog for secrets-in-repos, Presidio for unstructured-text PII detection.
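To make concrete what a discovery scan does at its core, here is a minimal sketch that flags a few identifier formats in free text using only the standard library. The patterns and the `scan_text` name are illustrative assumptions, not the behaviour of any tool listed above; dedicated scanners add context analysis and validation that plain regular expressions cannot provide.

```python
import re

# Illustrative patterns only; real scanners add checksum and context validation.
PATTERNS = {
    # PAN format: five letters, four digits, one letter
    "pan": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # Ten-digit Indian mobile number with an optional +91 prefix
    "mobile_in": re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)"),
}


def scan_text(text: str) -> dict[str, list[str]]:
    """Return all matches per category, omitting categories with no hits."""
    hits: dict[str, list[str]] = {}
    for category, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[category] = found
    return hits
```

Run against a log line or a free-text column, the result is a category-to-matches mapping that can feed the inventory. Expect false positives and false negatives at this level of sophistication.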
A scan is necessary but not sufficient. Two patterns make discovery scalable:
In-code annotations. Application engineers annotate fields directly:
# dpdp_annotations stands for an in-house annotation module (illustrative name)
from dpdp_annotations import PersonalData, LegalBasis, Retention


class Customer:
    # Each field declares its data category, lawful basis, and retention period
    name: str = PersonalData(category="identity", basis=LegalBasis.CONTRACT, retention=Retention.years(7))
    pan: str = PersonalData(category="identity_token", basis=LegalBasis.LEGAL_OBLIGATION, retention=Retention.years(8))
    email: str = PersonalData(category="contact", basis=LegalBasis.CONSENT, retention=Retention.years(3))
    marketing_opt_in: bool = PersonalData(category="preference", basis=LegalBasis.CONSENT, retention=Retention.until_withdrawn())
A CI step extracts these annotations into a build-time manifest that feeds the central inventory.
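One way such an annotation layer and extraction step could be built is with standard-library dataclasses, storing the DPDP attributes in field metadata. This is a sketch under assumptions: `personal_data` and `build_manifest` are hypothetical names, and retention is simplified to a number of days.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Optional


def personal_data(category: str, basis: str, retention_days: Optional[int]):
    """Hypothetical helper: attach DPDP metadata to a dataclass field."""
    return field(default="", metadata={
        "pii": True,
        "category": category,
        "basis": basis,
        "retention_days": retention_days,
    })


@dataclass
class Customer:
    name: str = personal_data("identity", "contract", 7 * 365)
    email: str = personal_data("contact", "consent", 3 * 365)
    loyalty_tier: str = "basic"  # not personal data, so no annotation


def build_manifest(*models) -> list[dict]:
    """Collect every annotated field into rows suitable for a JSON manifest."""
    rows = []
    for model in models:
        for f in fields(model):
            if f.metadata.get("pii"):
                attrs = {k: v for k, v in f.metadata.items() if k != "pii"}
                rows.append({"model": model.__name__, "field": f.name, **attrs})
    return rows


# What a CI job might write to disk as the build-time manifest
manifest_json = json.dumps(build_manifest(Customer), indent=2)
```

Because the metadata lives next to the field definition, a new personal-data field cannot reach the manifest unannotated without a reviewer seeing the omission in the diff.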
Schema-as-source-of-truth. Database schemas carry PII tags in column comments; a nightly job introspects every database and reconciles against the application-declared inventory.
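The reconciliation itself reduces to a set difference. The sketch below assumes the nightly job has already read column comments into a nested dictionary and that PII columns carry a comment starting with `pii:`; both the tag convention and the function names are hypothetical.

```python
PII_PREFIX = "pii:"  # hypothetical tag convention for column comments


def tagged_columns(schema_comments: dict[str, dict[str, str]]) -> set[tuple[str, str]]:
    """Return (table, column) pairs whose comment starts with the PII tag."""
    return {
        (table, column)
        for table, columns in schema_comments.items()
        for column, comment in columns.items()
        if comment.startswith(PII_PREFIX)
    }


def reconcile(
    declared: set[tuple[str, str]],
    discovered: set[tuple[str, str]],
) -> dict[str, list[tuple[str, str]]]:
    """Diff the application-declared inventory against the database tags."""
    return {
        # Tagged in the database but absent from the application manifest
        "undeclared": sorted(discovered - declared),
        # Declared by the application but not found tagged in the database
        "stale": sorted(declared - discovered),
    }
```

Either list being non-empty is a signal worth failing a build or paging an owner over, since each entry is a place where the inventory and reality disagree.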
Theory: Consent Engineering
Consent is the lawful basis that gets the most regulatory attention, and the one most often mishandled by engineering teams. The Act requires that consent be: freely given, specific, informed, unconditional, unambiguous, and capable of being withdrawn.
Three engineering principles follow:
Consent is a record, not a flag. A boolean marketing_opt_in field is not enough. The record must contain the consent text shown to the user, the version of that text, the timestamp, the channel, and an immutable audit trail of changes.
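A minimal sketch of what such a record could look like: an append-only ledger in which each event stores the digest of the previous one, so later edits to history become detectable. The class and field names are assumptions for illustration, and the in-memory list stands in for durable storage.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder hash for the first event in a chain


@dataclass(frozen=True)
class ConsentEvent:
    principal_id: str
    purpose: str
    action: str          # "granted" or "withdrawn"
    notice_text: str     # exact wording shown to the user
    notice_version: str
    channel: str         # e.g. "web", "android", "call-centre"
    occurred_at: str     # ISO-8601 timestamp in UTC
    prev_hash: str       # digest of the previous event; links the audit trail

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class ConsentLedger:
    """Append-only list of consent events, hash-chained to expose tampering."""

    def __init__(self) -> None:
        self._events: list[ConsentEvent] = []

    def append(self, **details: str) -> ConsentEvent:
        prev = self._events[-1].digest() if self._events else GENESIS
        details.setdefault("occurred_at", datetime.now(timezone.utc).isoformat())
        event = ConsentEvent(prev_hash=prev, **details)
        self._events.append(event)
        return event

    def verify(self) -> bool:
        """Recompute the chain; any edited earlier event breaks a link."""
        prev = GENESIS
        for event in self._events:
            if event.prev_hash != prev:
                return False
            prev = event.digest()
        return True
```

A chain like this only exposes changes to events that have a successor. The most recent digest still needs to be anchored somewhere the ledger's operators cannot rewrite, such as a separate audit store.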
Withdrawal must propagate. When a user withdraws consent, every downstream system that received the data on that consent basis must stop processing. This is hard. It typically requires either a synchronous fanout (slow, error-prone) or an event-driven architecture where consent changes are published to a consent topic that downstream systems subscribe to.
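The event-driven variant can be sketched with an in-memory publish/subscribe bus standing in for a real message broker. `ConsentBus`, the topic name, and the two example consumers are all hypothetical; the point is that the publisher learns which consumers failed instead of assuming delivery.

```python
from collections import defaultdict
from typing import Callable


class ConsentBus:
    """In-memory stand-in for a message broker topic."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> list[str]:
        """Deliver to every subscriber; return names of handlers that failed."""
        failed = []
        for handler in self._handlers[topic]:
            try:
                handler(event)
            except Exception:
                # A real system would retry or route to a dead-letter queue
                failed.append(handler.__name__)
        return failed


suppressed_emails: set[str] = set()


def email_service(event: dict) -> None:
    """Example consumer: stop marketing email for the withdrawing principal."""
    if event["action"] == "withdrawn" and event["purpose"] == "marketing_email":
        suppressed_emails.add(event["principal_id"])


def analytics_pipeline(event: dict) -> None:
    """Example consumer that is down, to show failure reporting."""
    raise ConnectionError("warehouse unreachable")  # simulated outage
```

One failing consumer does not block the others here, which is the main operational advantage over a synchronous fanout. The returned failure list is what a retry job would work from.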
Granular, not blanket. A single “I agree to the terms” checkbox is not specific consent. Each processing purpose (marketing emails, analytics, sharing with partners) needs its own consent capture and withdrawal mechanism.
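In code, granularity means the consent lookup is keyed by purpose and defaults to deny. A minimal sketch, with `PurposeConsents` as a hypothetical name:

```python
class PurposeConsents:
    """Tracks consent per (principal, purpose); anything not granted is denied."""

    def __init__(self) -> None:
        self._granted: set[tuple[str, str]] = set()

    def grant(self, principal_id: str, purpose: str) -> None:
        self._granted.add((principal_id, purpose))

    def withdraw(self, principal_id: str, purpose: str) -> None:
        # Withdrawing one purpose leaves the principal's other consents intact
        self._granted.discard((principal_id, purpose))

    def allows(self, principal_id: str, purpose: str) -> bool:
        return (principal_id, purpose) in self._granted
```

Every processing job then asks `allows(principal_id, purpose)` for its own purpose, so consent to marketing email never silently authorises partner sharing.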
Technical Deep Dive: Right-to-Erasure Implementation
The Data Principal’s right to erasure is the single most operationally complex DPDP requirement. Naive deletion (“DELETE FROM users WHERE id=X”) fails almost immediately because personal data does not live in one place. It lives in:
- Operational databases (primary, read replicas, regional replicas)
- Data warehouse (Snowflake / BigQuery / Redshift)
- Data lake (S3 / GCS / ADLS) in Parquet or Delta format
- Backup snapshots (retained for disaster recovery)
- Logs (application, audit, access)
- Caches (Redis, Memcached)
- CDN logs
- Third-party processors (CRM, email service provider, analytics, customer support tools)
- Vendor support tickets (Salesforce, Zendesk)
A defensible erasure implementation has three layers:
Layer 1: Hard delete from operational stores. Synchronous; user-visible. Triggered by the erasure request UI.
Layer 2: Tombstone propagation. An erasure event is published to a topic. Downstream consumers (warehouse, data lake, CDP, third-party processors via API) subscribe and apply their own deletion logic. Each consumer reports completion.
Layer 3: Long-tail handling. Backups retain personal data by design. The right approach is to document the retention period of backups, ensure they are not used for any purpose other than restoration, and apply erasure during the restoration path. Most data protection authorities accept this; the Act’s Section 8(7) acknowledges it implicitly through the “as far as may be possible” qualifier on erasure.
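Layers 2 and 3 can be sketched together: a coordinator fans the erasure out to registered consumers, tracks which ones have confirmed, and keeps a tombstone set that the backup restore path filters against. All names here are hypothetical, and direct function calls stand in for topic subscriptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ErasureStatus:
    principal_id: str
    pending: set[str] = field(default_factory=set)
    completed: set[str] = field(default_factory=set)

    @property
    def complete(self) -> bool:
        return not self.pending


class ErasureCoordinator:
    def __init__(self, consumers: dict[str, Callable[[str], bool]]) -> None:
        # Consumer name -> delete function that returns True on success
        self.consumers = consumers
        self.tombstones: set[str] = set()

    def erase(self, principal_id: str) -> ErasureStatus:
        self.tombstones.add(principal_id)  # kept for the backup restore path
        status = ErasureStatus(principal_id, pending=set(self.consumers))
        for name, delete in self.consumers.items():  # Layer 2 fan-out
            if delete(principal_id):
                status.pending.discard(name)
                status.completed.add(name)
        return status

    def filter_restored(self, rows: list[dict]) -> list[dict]:
        """Layer 3: drop erased principals from rows restored from a backup."""
        return [row for row in rows if row["principal_id"] not in self.tombstones]
```

The tombstone set is itself a list of identifiers, so in practice it would hold a pseudonymous or hashed ID and nothing more. A request stays open until `pending` is empty, which gives the audit trail a per-consumer completion record.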
Practical Implementation: The 72-Hour Breach Notification Runbook
Section 8(6) requires breach notification “in such form and manner as may be prescribed.” The Rules clarified the 72-hour window. Three things must be in place before you need them:
Detection. You cannot notify within 72 hours if your detection lag is longer than 72 hours. SIEM coverage of personal data stores is the prerequisite control.
Escalation authority. A documented escalation path that gets a breach in front of the named officer (DPO for SDFs; whoever has been designated otherwise) within hours, not days. The RACI matrix lives in the incident response (IR) runbook.
Notification template. Pre-drafted notification to the Board and to affected Data Principals. The Rules specify the content; have the boilerplate ready so the team is only filling in incident-specific fields under time pressure.
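A pre-drafted template can also refuse to render when an incident-specific field is missing, which is a useful guard under time pressure. The sketch below uses the standard library's `string.Template`. The field list is a placeholder assumption and must be replaced with the content the Rules actually prescribe.

```python
from string import Template

# Placeholder fields for illustration; align these with the prescribed content.
BOARD_NOTICE = Template(
    "Incident reference: $incident_id\n"
    "Became aware at (UTC): $aware_at\n"
    "Nature of breach: $nature\n"
    "Data Principals affected (estimate): $affected_count\n"
    "Mitigation underway: $mitigation\n"
)


def render_board_notice(**incident: str) -> str:
    """Fill the template, failing loudly if any field has been left out."""
    try:
        return BOARD_NOTICE.substitute(incident)
    except KeyError as missing:
        raise ValueError(f"missing incident field: {missing.args[0]}") from None
```

Using `substitute` (not `safe_substitute`) is the deliberate choice here: a half-filled notification should never leave the building.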
A typical runbook structure:
T+0 Incident detection (SIEM alert, employee report, vendor disclosure)
T+1h Triage: is this a personal-data breach under DPDP definition?
T+4h Initial scoping: how many Data Principals, what categories, vendor/process involved
T+12h Decision: notification required? If yes, route to DPO for authorisation
T+24h Draft Board notification + affected-DP notification ready for sign-off
T+48h Notification sent to Board (DigiLocker portal or designated channel)
T+72h Notification to affected Data Principals via documented channel
T+7d Post-incident review, root cause, corrective actions
T+30d Forensic report submission if requested by Board
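The hour-based steps of the runbook above translate directly into a deadline calculator that an incident tool could use to flag overdue steps. The offsets are copied from the runbook; the step keys and function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Offsets copied from the runbook above; the step keys are illustrative.
RUNBOOK = [
    (timedelta(hours=1), "triage"),
    (timedelta(hours=4), "initial_scoping"),
    (timedelta(hours=12), "notification_decision"),
    (timedelta(hours=24), "drafts_ready"),
    (timedelta(hours=48), "board_notified"),
    (timedelta(hours=72), "principals_notified"),
]


def deadlines(detected_at: datetime) -> dict[str, datetime]:
    """Absolute due time for each step, measured from detection (T+0)."""
    return {step: detected_at + offset for offset, step in RUNBOOK}


def overdue(detected_at: datetime, now: datetime, completed: set[str]) -> list[str]:
    """Steps whose deadline has passed and that are not marked complete."""
    return [
        step
        for step, due in deadlines(detected_at).items()
        if step not in completed and now > due
    ]
```

Timestamps should be timezone-aware (for example `datetime(2026, 1, 5, 9, 0, tzinfo=timezone.utc)`), because an incident that spans teams in different zones is exactly when naive timestamps cause a missed window.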
Enterprise Use Cases
Fintech and lending NBFCs. Public-bucket misconfiguration exposing KYC documents is the highest-frequency Section 8(5) failure pattern we see at audit time. NBFCs should treat cloud security posture management (CSPM) scanning and tokenisation of KYC documents as baseline; cloud misconfiguration is the kind of evidence the Board can use to demonstrate failure of “reasonable security safeguards.”
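As a simplified illustration of the kind of check a CSPM tool automates, the sketch below inspects an S3-style bucket policy document for Allow statements open to any principal. It is deliberately narrow: it ignores ACLs, account-level public access settings, and the semantics of Condition blocks, and the function names are hypothetical.

```python
def _is_wildcard(principal) -> bool:
    """True when the Principal element grants access to everyone."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS")
        values = aws if isinstance(aws, list) else [aws]
        return "*" in values
    return False


def public_statements(policy: dict) -> list[dict]:
    """Return Allow statements with a wildcard principal and no Condition."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]
    return [
        s for s in statements
        if s.get("Effect") == "Allow"
        and "Condition" not in s
        and _is_wildcard(s.get("Principal"))
    ]
```

Any non-empty result on a bucket holding KYC documents is worth treating as an incident candidate, not a backlog ticket.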
Healthcare and ABDM-linked providers. Patient data is sensitive personal data under the prevailing interpretation, attracting heightened scrutiny. Federated health IDs propagate breach exposure; the right architecture is per-provider tokenisation with re-identification only at the provider boundary.
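Per-provider tokenisation can be sketched with a keyed hash: each provider derives tokens from its own secret key, so the same health ID yields unrelated tokens at different providers, and only the issuing provider can map a token back. `ProviderTokeniser` is a hypothetical name, and the in-memory lookup stands in for a protected token vault.

```python
import hashlib
import hmac


class ProviderTokeniser:
    """Derives provider-specific tokens and re-identifies only locally."""

    def __init__(self, provider_key: bytes) -> None:
        self._key = provider_key
        self._lookup: dict[str, str] = {}  # token -> health ID; stays inside the provider

    def tokenise(self, health_id: str) -> str:
        # HMAC-SHA256 keyed per provider: deterministic within a provider,
        # unlinkable across providers without the key
        token = hmac.new(self._key, health_id.encode("utf-8"), hashlib.sha256).hexdigest()
        self._lookup[token] = health_id
        return token

    def reidentify(self, token: str) -> str:
        """Map a token back to the health ID; raises KeyError if unknown."""
        return self._lookup[token]
```

A dataset leaked from one provider then exposes tokens that cannot be joined against another provider's records, which limits how far a single breach propagates.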
EdTech. Children under 18 require verifiable parental consent — this is the most operationally difficult class of DPDP requirement. Most current implementations use simple age-gate checkboxes that will not survive enforcement scrutiny.
BPOs and IT services exporters. Processing personal data of EU, UK, or US Data Principals on Indian infrastructure invokes both DPDP and the source-jurisdiction law. Most BPOs have a GDPR programme; folding DPDP into it usually adds the consent management, breach notification, and Data Principal rights workflows.
Common Pitfalls
- Treating DPDP as a Legal project. Legal can interpret the Act, but only engineering can implement the controls. Without engineering ownership, the inventory rots.
- Consent banners that violate “freely given.” Cookie banners with pre-ticked boxes or “reject” buttons buried three clicks deep do not constitute valid consent.
- Forgotten data flows. Analytics, marketing automation, customer support, and fraud-detection vendors typically receive personal data, are typically not in the inventory, and typically have weaker security than the Data Fiduciary’s primary stack.
- “We will localise data” without understanding the model. DPDP does not require general data localisation; sectoral rules (RBI on payments, IRDAI on insurance) do. Conflating them produces over-architected, expensive solutions.
- No mechanism for “withdraw consent with the same ease as giving it.” Many platforms make consent capture two clicks and withdrawal a customer support ticket. This is non-compliant.
Action Items for the Next 90 Days
- Commission a data inventory exercise — either automated discovery or, if budget-constrained, a structured workshop with each engineering team.
- Map every personal data flow to a lawful basis.
- Inventory third-party processors; verify Data Processing Agreements exist and are current.
- Tabletop your 72-hour breach notification runbook with a realistic scenario.
- Build an erasure workflow that propagates to at least your top 5 downstream systems.
- Designate a DPO if you are a Significant Data Fiduciary or expect to be designated.
DPDP compliance, treated as an engineering programme rather than a paperwork exercise, takes 12-18 months to mature at a typical Indian enterprise. Start now; the enforcement curve is steepening.