Network Telemetry — NetFlow, sFlow, IPFIX, and What a SOC Actually Watches

Manish Garg, Associate of (ISC)² · RingSafe
Apr 27, 2026

Last updated: May 1, 2026

Network telemetry is the per-flow metadata your routers and switches export: who talked to whom, when, how much, on what ports. NetFlow (Cisco), sFlow (multivendor, originally from InMon), and IPFIX (the IETF standard) are the three protocols you will meet. PCAP captures everything; telemetry captures structured summaries that scale to enterprise traffic volumes. This module covers the protocol differences, what a SOC team actually does with telemetry, the modern enrichment stack (Zeek, Suricata, Elastic), and the operator checklist for getting useful telemetry without melting the network.

A SOC analyst with full PCAP for everything that crossed the network would be in heaven, and immediately bankrupt: at terabytes per day, full PCAP is unaffordable beyond perimeter sensors. Telemetry is the structured-summary alternative. Every flow becomes a row of metadata (source, destination, ports, bytes, packets, flags, timestamps) that is exported continuously and is easy to index and query. This module is a working introduction to NetFlow, sFlow, and IPFIX and to how SOCs use them.

Why telemetry matters — the volume and visibility tradeoff

Full PCAP at 10 Gbps = 4.5 TB/hour. Storing days or weeks is six-figure infrastructure. Telemetry — one row per flow rather than every packet — typically reduces storage by 100-1000x while keeping the questions a SOC actually asks: who talked to whom, when, how much, what protocol, was it accepted or rejected.
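The arithmetic behind these figures can be checked in a few lines. This is a back-of-the-envelope sketch: the function names are illustrative, the link is assumed to run at sustained line rate, and the 100x and 1000x factors are simply the two ends of the range quoted above.

```python
def pcap_bytes_per_hour(link_gbps: float, utilisation: float = 1.0) -> float:
    """Bytes of raw packet capture produced per hour on one link."""
    bits_per_second = link_gbps * 1e9 * utilisation
    return bits_per_second / 8 * 3600


def telemetry_bytes_per_hour(pcap_bytes: float, reduction_factor: float) -> float:
    """Flow-record volume, given an assumed reduction factor versus PCAP."""
    return pcap_bytes / reduction_factor


full = pcap_bytes_per_hour(10)  # 4.5e12 bytes, i.e. 4.5 TB/hour
print(f"Full PCAP: {full / 1e12:.1f} TB/hour")
for factor in (100, 1000):
    flow = telemetry_bytes_per_hour(full, factor)
    print(f"{factor}x reduction: {flow / 1e9:.1f} GB/hour")
```

Lowering `utilisation` to your real average link load gives a more honest storage estimate than the line-rate worst case.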

What you lose: payload content, exact packet timing, application-layer evidence.

What you keep: complete connection-level visibility for forensics, baseline traffic patterns, anomaly detection signals.

The operating principle: telemetry everywhere; selective PCAP at perimeter and high-value sensors; Zeek/Suricata logs everywhere on critical segments. The combination (telemetry for breadth, PCAP for depth, NIDS logs for signal) is the modern SOC visibility stack.

NetFlow — the originator

NetFlow (Cisco, late 1990s) defines a flow as a unidirectional sequence of packets sharing a 7-tuple (source IP, destination IP, source port, destination port, protocol, type-of-service, ingress interface). The router maintains an internal flow cache; when a flow ages out (defaults: 15 seconds idle, 30 minutes active), the router exports a record to a collector over UDP.

Versions: v5 (fixed format, IPv4 only, deprecated) and v9 (template-based, IPv6 supported, extensible; the version currently in wide deployment).

Sampling: many routers sample (1-in-N packets) to manage CPU. This is useful for traffic engineering but dangerous for security, because low-volume activity can be missed. Set the sampling rate explicitly per role (full unsampled at security boundaries; sampled in transit).

Caveat: NetFlow on a router only sees traffic that traverses the router, so east-west traffic within a switch is invisible. Pair with switch-side sFlow or virtual-switch telemetry.
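To make the flow-cache behaviour concrete, here is a minimal sketch of a cache keyed on the 7-tuple with idle and active timers. The class and field names are hypothetical, and real exporters do this in hardware or optimised firmware; they also typically expire flows on TCP FIN/RST or cache pressure, which this sketch omits.

```python
from dataclasses import dataclass
from typing import NamedTuple


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    tos: int
    ingress_if: int


@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


class FlowCache:
    def __init__(self, idle_timeout=15.0, active_timeout=1800.0):
        self.idle_timeout = idle_timeout      # seconds without a packet
        self.active_timeout = active_timeout  # maximum age of a cached flow
        self.flows = {}

    def observe(self, key, size, now):
        """Account one packet of `size` bytes to its flow."""
        record = self.flows.get(key)
        if record is None:
            record = self.flows[key] = FlowRecord(first_seen=now, last_seen=now)
        record.last_seen = now
        record.packets += 1
        record.octets += size

    def expire(self, now):
        """Remove and return flows that hit the idle or active timer."""
        exported = []
        for key, record in list(self.flows.items()):
            idle = now - record.last_seen >= self.idle_timeout
            too_old = now - record.first_seen >= self.active_timeout
            if idle or too_old:
                exported.append((key, self.flows.pop(key)))
        return exported


cache = FlowCache()
key = FlowKey("10.0.0.5", "203.0.113.10", 51000, 443, 6, 0, 1)
cache.observe(key, 1500, now=0.0)
cache.observe(key, 400, now=5.0)
print(cache.expire(now=10.0))  # still within the idle timer: nothing exported
print(cache.expire(now=21.0))  # idle for 16 s: the record is exported
```

Because flows are unidirectional, the reply traffic of the same connection would be a second key with source and destination swapped.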

sFlow — the multivendor sibling

sFlow (sFlow.org, early 2000s) is a sampling protocol: the header of every Nth packet is exported (up to a configurable cutoff length), along with periodic interface counters. Unlike NetFlow, sFlow is fundamentally sampled; the device never maintains a flow cache.

Vendors: Arista, HP/Aruba, Dell, Juniper, and Cumulus all support it natively.

Use cases: traffic engineering (where is the bandwidth going), DDoS detection (sudden spikes in single-source flows), basic security visibility.

Tradeoff vs NetFlow: sFlow is cheaper for the device CPU and gives you raw packet headers (more flexible analytics) but is statistical (sampled) by nature. Many SOCs run NetFlow at security choke points and sFlow at the campus core.
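Because sFlow is sampled, volumes are estimates obtained by multiplying what was sampled by the sampling rate. The sketch below shows that scaling, plus a rough 95% error bound based on the usual normal approximation for random sampling (relative error of about 1.96/sqrt(c) for c samples). The function name and inputs are illustrative, and the bound strictly applies to the packet count rather than to bytes.

```python
import math


def estimate_from_samples(sampled_frame_bytes, sampling_rate):
    """Scale sampled frames up to estimated totals for one traffic class.

    sampled_frame_bytes: original frame length of each sampled packet.
    sampling_rate: N in "1-in-N" sampling.
    """
    samples = len(sampled_frame_bytes)
    est_packets = samples * sampling_rate
    est_bytes = sum(sampled_frame_bytes) * sampling_rate
    # Assumption: samples are independent, so the 95% relative error
    # of the packet estimate shrinks with the square root of the sample count.
    rel_error = 1.96 * math.sqrt(1 / samples) if samples else float("inf")
    return est_packets, est_bytes, rel_error


packets, octets, error = estimate_from_samples([1500] * 100, sampling_rate=1000)
print(f"~{packets} packets, ~{octets / 1e6:.0f} MB, +/- {error:.0%}")
```

The practical reading: a class of traffic seen in only a handful of samples has an error bar too wide to base a security conclusion on.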

IPFIX — the standard

IPFIX (RFC 7011, IETF) is the standardised successor to NetFlow v9 — same template-based design, same export protocol structure, but IETF-specified rather than Cisco-specific. Most modern devices export “NetFlow v9 / IPFIX” interchangeably.

Why care about IPFIX specifically: it adds rich extensibility (vendors define new fields per template) and proper handling of variable-length fields, and it is the format most modern collectors expect.

The 2026 reality: when you say “NetFlow” in a procurement document and the vendor offers IPFIX, accept; they are operationally interchangeable. The exception is a legacy collector that only parses v5 / v9.
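One place the two formats visibly differ is the export packet header, which is how a collector tells them apart. The sketch below reads the header layouts as I understand them from RFC 7011 (IPFIX, version 10, 16 bytes) and RFC 3954 (NetFlow v9, 20 bytes). The function and dictionary key names are illustrative, and it does not parse templates or data records.

```python
import struct

# Network byte order throughout.
IPFIX_HEADER = struct.Struct("!HHIII")   # version, length, export time, sequence, observation domain
NFV9_HEADER = struct.Struct("!HHIIII")   # version, count, sysUptime, unix secs, sequence, source id


def parse_export_header(datagram: bytes) -> dict:
    """Identify a NetFlow v9 or IPFIX datagram and decode its header."""
    if len(datagram) < 2:
        raise ValueError("datagram too short to contain a version field")
    (version,) = struct.unpack_from("!H", datagram)
    if version == 10:
        if len(datagram) < IPFIX_HEADER.size:
            raise ValueError("truncated IPFIX header")
        _, length, export_time, sequence, domain = IPFIX_HEADER.unpack_from(datagram)
        return {"format": "ipfix", "length": length, "export_time": export_time,
                "sequence": sequence, "observation_domain": domain}
    if version == 9:
        if len(datagram) < NFV9_HEADER.size:
            raise ValueError("truncated NetFlow v9 header")
        _, count, uptime_ms, unix_secs, sequence, source_id = NFV9_HEADER.unpack_from(datagram)
        return {"format": "netflow_v9", "count": count, "sys_uptime_ms": uptime_ms,
                "unix_secs": unix_secs, "sequence": sequence, "source_id": source_id}
    raise ValueError(f"unsupported export version {version}")


example = struct.pack("!HHIII", 10, 16, 1700000000, 42, 7)
print(parse_export_header(example))
```

Tracking the sequence number per exporter is a cheap way to notice dropped UDP export packets.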

What a SOC actually does with telemetry

1. Connection inventory: weekly/daily reports of every external destination by host. New external destinations are noteworthy (could be a new business app, could be new C2).
2. Beaconing detection: flow-pair regularity (every 60 seconds plus minor jitter) is a malware C2 fingerprint. Flow timestamps make this straightforward to query.
3. Data-volume anomalies: a workstation suddenly transferring 50 GB to an unfamiliar IP at 3 AM is a strong exfiltration indicator.
4. Lateral movement detection: east-west scans (one source touching many internal IPs on uncommon ports) light up clearly in flow data.
5. Compliance: “show me every connection from the cardholder-data segment” answers PCI-DSS questions in seconds with telemetry; it is impractical without.
6. Capacity: traffic engineering and capacity planning use the same flow data the SOC consumes.

Tooling: open source (nfdump, ntopng, ElastiFlow); commercial (Splunk, Cisco Stealthwatch, Arista AVA, Plixer Scrutinizer).
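The beaconing check in item 2 can be sketched as a regularity score over flow start times for one source/destination pair. This is one simple approach (coefficient of variation of the gaps between flows), not the only one; the function names and the 0.1 threshold are assumptions that would need tuning against real traffic.

```python
from statistics import mean, pstdev


def interval_cv(timestamps, min_events=6):
    """Coefficient of variation of the gaps between flow start times.

    Returns None when there are too few flows to judge.
    """
    if len(timestamps) < min_events:
        return None
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(gaps)
    if average == 0:
        return None
    return pstdev(gaps) / average


def looks_like_beacon(timestamps, max_cv=0.1):
    """True when the gaps are regular enough to resemble a timer."""
    cv = interval_cv(timestamps)
    return cv is not None and cv <= max_cv


# Flow start times (epoch seconds) for one source/destination pair.
beacon = [i * 60 + (i % 3) for i in range(20)]   # about 60 s apart with small jitter
browsing = [0, 5, 300, 310, 2000, 2005, 9000]    # bursty, human-driven traffic
print(looks_like_beacon(beacon), looks_like_beacon(browsing))
```

Legitimate software also beacons (update checks, NTP, monitoring agents), so in practice this score is combined with the "new external destination" signal from item 1 rather than alerted on alone.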

The Zeek / Suricata layer — beyond raw flow

Zeek (formerly Bro) is a network-protocol-aware analysis engine that produces structured logs per protocol: conn.log (every connection), dns.log, http.log, ssl.log, files.log, x509.log, weird.log. Zeek is not signature-based; it produces evidence that humans and ML models reason over. Suricata is a signature-based IPS/IDS in the Snort lineage with modern features (Lua scripting, EVE JSON output, file extraction). Modern SOC stacks run both: Zeek for behavioural / forensic logs, Suricata for known-bad signature alerts. Both produce JSON streams ingested by SIEM (Elastic, Splunk, Sentinel).

What you gain over telemetry alone: protocol-level fields (HTTP user-agent, TLS SNI, cert details, DNS query/response details) without storing PCAP.

Cost: a Zeek sensor at line-rate 10 Gbps requires 16-32 cores and tuning; budget for hardware and skill.
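As an illustration of how these logs get used, the sketch below totals bytes per host pair from Zeek conn.log entries. It assumes Zeek is writing JSON (one object per line) with the standard conn.log field names; the sample records and their `uid` values are made up for the example.

```python
import json
from collections import Counter


def bytes_per_pair(json_lines):
    """Sum orig_bytes + resp_bytes per (originator, responder) pair."""
    totals = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        pair = (record["id.orig_h"], record["id.resp_h"])
        # Byte counts can be absent on some connections, so default to 0.
        totals[pair] += record.get("orig_bytes", 0) + record.get("resp_bytes", 0)
    return totals


sample = [
    '{"ts": 1777473131.0, "uid": "Cexample1", "id.orig_h": "10.0.0.5", "id.orig_p": 51000,'
    ' "id.resp_h": "203.0.113.10", "id.resp_p": 443, "proto": "tcp",'
    ' "orig_bytes": 1200, "resp_bytes": 186223}',
    '{"ts": 1777473140.0, "uid": "Cexample2", "id.orig_h": "10.0.0.5", "id.orig_p": 51001,'
    ' "id.resp_h": "203.0.113.10", "id.resp_p": 443, "proto": "tcp"}',
]
for pair, total in bytes_per_pair(sample).most_common(5):
    print(pair, total)
```

In a real deployment the same aggregation would normally be a SIEM query; the point here is only that conn.log rows carry the same fields as a flow record, plus Zeek's connection `uid` for pivoting into the other protocol logs.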

Operator checklist — getting useful telemetry without surprises

1. Enable telemetry on every Layer 3 hop: edge routers, core routers, distribution switches that do inter-VLAN routing, cloud VPC flow logs.
2. Set explicit sampling rates per role: 1:1 (unsampled) at security-critical points; 1:1000 acceptable for backbone traffic engineering. Document the rate so analysts can interpret volumes correctly.
3. Standardise template version: configure all devices for IPFIX (v9 acceptable) so collectors do not need to handle a zoo.
4. Deploy a tested collector: nfdump for raw + a SIEM/analytics layer on top (Elasticsearch, Splunk). Test the ingestion path with synthetic flows quarterly.
5. Retention policy: 90 days online + 1-2 years offline is the typical floor for incident response; longer if regulated (RBI requires 2-7 years for some BFSI segments).
6. Enrich: GeoIP (MaxMind), ASN (CAIDA / IRR), threat intel feeds. Raw flows are useful; enriched flows are powerful.
7. Validate: every quarter, generate a known flow and confirm it appears in the SOC dashboard within 5 minutes.
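The enrichment step in item 6 amounts to attaching context to each flow row before it reaches the SIEM. The sketch below does this with a hypothetical internal address plan and a hypothetical watchlist, using only the standard library. GeoIP and ASN lookups need external databases and are not shown.

```python
import ipaddress

# Hypothetical address plan and watchlist; replace with your own data.
SEGMENTS = {
    "10.0.0.0/24": "workstations",
    "10.0.20.0/24": "cardholder-data",
    "10.0.0.0/8": "internal-other",
}
WATCHLIST = {"203.0.113.10"}

# Most specific prefix first, so a /24 wins over the enclosing /8.
_NETWORKS = sorted(
    ((ipaddress.ip_network(cidr), label) for cidr, label in SEGMENTS.items()),
    key=lambda item: item[0].prefixlen,
    reverse=True,
)


def segment_for(address: str) -> str:
    """Label an address with its segment, or 'external' if unknown."""
    ip = ipaddress.ip_address(address)
    for network, label in _NETWORKS:
        if ip in network:
            return label
    return "external"


def enrich(flow: dict) -> dict:
    """Return a copy of the flow with segment and watchlist fields added."""
    enriched = dict(flow)
    enriched["src_segment"] = segment_for(flow["src"])
    enriched["dst_segment"] = segment_for(flow["dst"])
    enriched["dst_on_watchlist"] = flow["dst"] in WATCHLIST
    return enriched


print(enrich({"src": "10.0.20.14", "dst": "203.0.113.10", "bytes": 187423}))
```

With segment labels attached, the compliance question "every connection from the cardholder-data segment" becomes a single filter on `src_segment`.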

Open source vs commercial — the realistic SOC stack

For a 5-10 person security team in India, the open source stack is genuinely production-grade: Zeek (protocol-aware analysis), Suricata (signature IDS), nfdump or GoFlow (flow ingestion), Elasticsearch + Kibana (or OpenSearch) for search and visualisation, Grafana for dashboards, MISP for threat intel curation. Annual operating cost is mostly people. For 20+ person teams with mature processes and a need for vendor support, commercial stacks (Splunk Enterprise Security, Microsoft Sentinel, Google Chronicle, Elastic Security commercial tier) reduce maintenance burden.

The transition pattern: many Indian teams start with open source, develop expertise, and selectively add commercial layers (managed SIEM, premium threat intel) as the budget grows. Either path works.

The wrong path: deploying expensive commercial tooling without the analyst capacity to use it, which produces a dashboard with red dots that nobody looks at.

The metrics that matter — what to measure your SOC by

1. Mean time to detect (MTTD): from initial attacker action to first alert that a human triages.
2. Mean time to respond (MTTR): from alert to containment.
3. Alert fidelity: percentage of alerts that lead to action rather than being false positives. Should be >40% in a healthy SOC; <10% means alert fatigue.
4. Coverage: percentage of MITRE ATT&CK techniques with at least one detection. Aim for 60%+ on the relevant ATT&CK matrix.
5. Hunt-to-find ratio: hunts that uncover new findings per 100 hunts run.
6. Telemetry coverage: percentage of network segments with active telemetry. Aim for 100% on production.

For management reporting: pair these technical metrics with business-impact metrics (incidents averted, cost of dwell time). The combination tells the story for the board.
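Several of these metrics are simple ratios and averages once the timestamps are recorded consistently. The sketch below computes four of them from hypothetical incident records; the field names (`attack_start`, `first_alert`, `contained`) are assumptions about how a team might log incidents, not a standard schema.

```python
from statistics import mean


def soc_metrics(incidents, alerts_total, alerts_actioned,
                segments_total, segments_covered):
    """Compute MTTD, MTTR, alert fidelity, and telemetry coverage.

    incidents: dicts with epoch-second timestamps for attack_start,
    first_alert (first human-triaged alert), and contained.
    """
    return {
        "mttd_hours": mean(i["first_alert"] - i["attack_start"] for i in incidents) / 3600,
        "mttr_hours": mean(i["contained"] - i["first_alert"] for i in incidents) / 3600,
        "alert_fidelity_pct": 100 * alerts_actioned / alerts_total,
        "telemetry_coverage_pct": 100 * segments_covered / segments_total,
    }


incidents = [
    {"attack_start": 0, "first_alert": 7200, "contained": 10800},
    {"attack_start": 0, "first_alert": 14400, "contained": 21600},
]
print(soc_metrics(incidents, alerts_total=100, alerts_actioned=45,
                  segments_total=20, segments_covered=18))
```

The hard part in practice is the `attack_start` timestamp, which is usually only known after the investigation, so MTTD tends to be computed retrospectively.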

Detection engineering — turning telemetry into alerts

Telemetry is data; detections are what convert data into action. Detection engineering is the discipline of writing, testing, and maintaining detection rules. Sigma is the open-source detection-rule format that translates to multiple SIEMs (Splunk, Elastic, Sentinel). The MITRE ATT&CK matrix gives you a coverage target — for each technique relevant to your environment, do you have a detection? The detection lifecycle:

1. Hypothesis (“attackers exfiltrate via DNS tunnelling”).
2. Telemetry available? (DNS query logs, payload length per query).
3. Rule draft (alert on queries longer than 100 characters to a single SLD at a rate above 10/min from one host).
4. Deploy in test mode; tune false positives.
5. Promote to production with a clear runbook.
6. Metric: hits, FP rate, escalation outcomes.
7. Periodic review: has the technique evolved?

For Indian SOCs growing detection capability: start with the MITRE ATT&CK techniques most relevant to your industry (for BFSI, initial access via phishing and supply chain; for healthcare, exposed databases). Build 50 quality detections first; expand as the team’s tuning capacity grows.
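The rule draft in step 3 of the lifecycle can be prototyped before it is written in a SIEM rule language. The sketch below is one possible reading of that rule, using fixed one-minute buckets. The event shape `(timestamp, host, query name)` and the function names are assumptions, and the domain extraction naively takes the last two labels, which is wrong for suffixes such as `co.in` (a production rule would use a public-suffix list).

```python
from collections import Counter


def registered_domain(qname: str) -> str:
    """Naive SLD extraction: the last two labels of the query name."""
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def dns_tunnel_alerts(events, min_len=100, max_per_min=10):
    """Return (host, domain) pairs that exceed the rule thresholds.

    events: iterable of (epoch_seconds, source_host, query_name).
    """
    counts = Counter()
    for ts, host, qname in events:
        if len(qname) > min_len:
            bucket = int(ts // 60)  # fixed one-minute window
            counts[(host, registered_domain(qname), bucket)] += 1
    return sorted({(host, domain)
                   for (host, domain, _), n in counts.items()
                   if n > max_per_min})


def long_query(i: int) -> str:
    # Synthetic tunnelling-style query, longer than 100 characters.
    return ".".join(["a" * 50, "b" * 50, str(i), "tunnel", "example"])


events = [(float(i), "10.0.0.5", long_query(i)) for i in range(11)]
events.append((3.0, "10.0.0.6", "www.example.com"))
print(dns_tunnel_alerts(events))
```

Running the prototype over a week of historical DNS logs is a quick way to do the "deploy in test mode" step and see which legitimate services (CDNs, telemetry endpoints) need to be excluded.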

Sampling vs full capture — the math behind the choice

A 10 Gbps link at line rate carries roughly 1.5 million packets per second at a typical ~800-byte average packet size (and up to ~14.9 million at minimum-size frames). Full unsampled NetFlow on this can mean millions of flow records per minute: manageable but expensive. 1:1000 sampling cuts the packets inspected by 1000×, at the cost of missing low-volume flows entirely.

The math: a flow of 10 packets (a small DNS lookup) has roughly a 1-in-100 chance of having any of its packets sampled at a 1:1000 rate, so most short flows go invisible. A flow of 10,000 packets (a file transfer) is virtually guaranteed to be observed.
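These probabilities follow from P(observed) = 1 - (1 - 1/N)^k for a flow of k packets at 1-in-N sampling, assuming each packet is sampled independently at random. A short sketch (the function name is illustrative) makes it easy to try other flow sizes and rates:

```python
def p_flow_observed(packets: int, sampling_rate: int) -> float:
    """Probability that at least one packet of a flow is sampled."""
    return 1 - (1 - 1 / sampling_rate) ** packets


for packets in (10, 100, 1000, 10000):
    p = p_flow_observed(packets, sampling_rate=1000)
    print(f"{packets:>6} packets at 1:1000 -> {p:.4%} chance of being seen")
```

Deterministic every-Nth-packet samplers behave a little differently for any single flow, but on a busy link where flows interleave the random model is a reasonable approximation.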

Implication for security: sampling is fine for traffic engineering and bulk-volume DDoS detection. It is dangerous for security visibility because most attacker activity (DNS lookups, beacon connections) produces small flows.

The right operational choice: full unsampled telemetry at security-critical sensors (perimeter, DC ingress, admin VLAN); sampling acceptable on backbone links for traffic-engineering-only purposes. Document the rate per role; ensure analysts know which datasets carry which fidelity.

Diagrams

NetFlow / IPFIX export pipeline
  Device (router/switch/sensor)
     │ flow cache
     │   ┌─────────────────────────────────────────────┐
     │   │ Flow records (key=7-tuple, value=counters)  │
     │   └──────────────┬──────────────────────────────┘
     │                  │ on idle/active timer
     │                  ▼
     │            UDP export to collector (port 2055/4739)
     ▼
  Collector (nfdump / ntopng / Elastic)
     │
     ▼
  Enrichment (GeoIP, ASN, threat intel)
     │
     ▼
  SIEM / dashboards / alerting

Telemetry vs PCAP — what each tells you
  Telemetry (1 row / flow):
     2026-04-29T14:32:11  10.0.0.5  ──▶  203.0.113.10  TCP 443
     bytes=187423  packets=412  flags=0x1B  duration=42s

  Protocol logs (Zeek http.log / dns.log):
     ts=...  host=ringsafe.in  uri=/api/login  status=200  ua=...

  PCAP (every packet):
     [00.000s] SYN  ...
     [00.011s] SYN+ACK  ...
     [00.012s] ACK  ...
     [00.014s] TLS ClientHello  ...

  Cost ratio (rough):
     Telemetry  : 1
     Zeek logs  : 5-10
     PCAP       : 100-1000

FAQ

Is telemetry enough or do I also need PCAP?

For most investigations, telemetry + Zeek logs answer the questions. PCAP becomes essential for malware reverse engineering, payload extraction, and reconstructing exact application-level interactions. The right model: telemetry-everywhere, PCAP-at-choke-points (perimeter sensors), with the ability to spin up targeted PCAP for an incident.

How much does Zeek scale?

A single Zeek worker handles roughly 1-2 Gbps with default scripts; a multi-worker deployment with PF_RING / AF_PACKET pinning scales to 40-100 Gbps on commodity hardware. Most enterprises start with 1-10 Gbps sensors at internet edge and DC ingress, scaling out as visibility expands.

What about cloud telemetry?

AWS VPC Flow Logs, Azure NSG flow logs, GCP VPC Flow Logs all export NetFlow-equivalent data. Coverage is per-network-interface; missing data is unusual but watch for samples lost during scaling events. AWS Traffic Mirroring exists for full PCAP at premium cost.
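For a sense of how close cloud flow logs are to NetFlow rows, the sketch below parses one AWS VPC Flow Logs record in the default (version 2) space-separated format. The field order is the documented default as I understand it; custom log formats reorder or add fields, so treat the `FIELDS` tuple as an assumption to verify against your own configuration.

```python
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)
INT_FIELDS = {"version", "srcport", "dstport", "protocol",
              "packets", "bytes", "start", "end"}


def parse_vpc_flow_record(line: str) -> dict:
    """Parse one default-format VPC Flow Logs line into a dict."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = {}
    for name, value in zip(FIELDS, values):
        if value == "-":
            record[name] = None  # NODATA / SKIPDATA records leave fields empty
        elif name in INT_FIELDS:
            record[name] = int(value)
        else:
            record[name] = value
    return record


line = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
        "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
print(parse_vpc_flow_record(line))
```

The `action` field (ACCEPT or REJECT) is the cloud counterpart of "was it accepted or rejected", and `log_status` values other than OK are the place to look for the lost records mentioned above.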

Should I use commercial SOC tools or open source?

For a 10-50 person security team, commercial (Splunk + Stealthwatch, Sentinel + Defender for Identity, Chronicle + Mandiant) usually wins on integration. For a 5-person team that wants to skill up, open source (Elastic + Zeek + Suricata + Wazuh) is a powerful learning environment but operationally heavier. Most Indian enterprises end up hybrid.

How long should we retain flow logs?

Industry baseline: 30-90 days online, 1-2 years cold storage for incident response. RBI/SEBI/IRDAI/CERT-In may push this longer for specific entity types. Cost-of-storage drops every year; aim for the longest retention your budget supports.

How do I justify the cost of full telemetry to leadership?

Frame it as breach-cost reduction. The IBM Cost of a Data Breach 2024 report put the average breach cost in India at ~₹19 crore, and dwell time correlates strongly with cost. Telemetry-driven detection can cut dwell time from months to days. The annual telemetry stack cost (~₹50 lakh-2 crore for mid-market) is small next to the expected breach cost it helps avoid.

What is the biggest blindspot in most SOC telemetry?

East-west traffic inside data centres and across cloud workloads. Most enterprises have decent perimeter telemetry and weak internal visibility. Plan to deploy telemetry on inter-VLAN routers, virtual switches, and cloud VPC flow logs alongside the perimeter sensors.


⚖️ Legal: Use any techniques described here only on networks you own or have explicit written authorisation to test. In India, unauthorised access is punishable under IT Act §66 (up to 3 years + fine). Pair offensive testing with a signed Statement of Work / Rules of Engagement; pair forensic activity with §65B-aligned chain of custody.
