Module 2 · OSINT Collection for CTI

Manish Garg Associate CISSP · RingSafe
April 22, 2026

Open-Source Intelligence (OSINT) is the practice of collecting information from publicly available sources: no hacking, no paid access required. For CTI analysts, it fills 60-80% of the picture at zero marginal cost. This module covers the tools, techniques, tradecraft, and operational security a researcher needs to do OSINT safely and effectively.

What OSINT covers

  • Surface web: search engines, social media, company websites, press releases
  • Technical data: DNS records, WHOIS, SSL certs, passive DNS
  • Code / infrastructure: GitHub, Docker Hub, PyPI, cloud metadata
  • Dark web / paste sites: forums, marketplaces, leaked data dumps
  • Archive / historical: Wayback Machine, archive.today, cached versions
  • Adjacent data: business records, court filings, regulatory disclosures

Core tool categories

Search engine operators

site:linkedin.com "security engineer" "target company"
intitle:"index of" "backup" site:target.com
filetype:pdf "confidential" site:target.com
-site:target.com "target company" "password"
cache:target.com/admin            # Google's cached version (Google retired this operator in 2024; it may no longer return results)

Google dorks still work, but queries are rate-limited. Rotate search engines (Bing, DuckDuckGo, Yandex, Baidu): different indices surface different results.
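
The rotation advice above can be sketched as a pair of small shell helpers. This is a minimal illustration, not a finished tool: the function names (`build_dorks`, `urlencode_dork`, `search_urls`) are hypothetical, the encoder only handles the characters these dorks use, and operator support differs between engines, so expect some queries to behave differently outside Google.

```shell
# Print the dork set from this section for one target.
# Usage: build_dorks DOMAIN "COMPANY NAME"
build_dorks() (
  domain=$1
  company=$2
  printf '%s\n' "site:linkedin.com \"security engineer\" \"$company\""
  printf '%s\n' "intitle:\"index of\" \"backup\" site:$domain"
  printf '%s\n' "filetype:pdf \"confidential\" site:$domain"
  printf '%s\n' "-site:$domain \"$company\" \"password\""
)

# Minimal URL encoder for the characters these dorks contain.
# Assumption: this is NOT a general-purpose encoder.
urlencode_dork() {
  sed -e 's/%/%25/g' -e 's/+/%2B/g' -e 's/&/%26/g' -e 's/#/%23/g' \
      -e 's/"/%22/g' -e 's/:/%3A/g' -e 's/ /+/g'
}

# Print one search URL per engine for a single dork, so the same query
# can be opened against several indices.
search_urls() (
  q=$(printf '%s' "$1" | urlencode_dork)
  printf 'https://www.google.com/search?q=%s\n' "$q"
  printf 'https://www.bing.com/search?q=%s\n' "$q"
  printf 'https://duckduckgo.com/?q=%s\n' "$q"
  printf 'https://yandex.com/search/?text=%s\n' "$q"
  printf 'https://www.baidu.com/s?wd=%s\n' "$q"
)
```

For example, `build_dorks target.com "Target Co" | while IFS= read -r d; do search_urls "$d"; done` prints every dork against every engine; opening the URLs by hand keeps the request rate low.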

Technical infrastructure

  • Shodan: indexes internet-exposed services by banner, port, and certificate. A core tool
  • Censys: similar to Shodan, with stronger certificate and protocol search
  • BinaryEdge: another Shodan competitor, with strong leak data
  • SecurityTrails: historical DNS, subdomain enumeration
  • crt.sh: free certificate transparency search (subdomain discovery via issued certs)
  • dnsdumpster.com / ViewDNS.info: DNS and reverse-DNS lookups
  • Censys certs: search certificates by organization, SAN, fingerprint

Subdomain enumeration

# Free, passive
subfinder -d target.com
assetfinder target.com
amass enum -passive -d target.com

# Cert-based (%25 is the URL-encoded % wildcard)
curl -s "https://crt.sh/?q=%25.target.com&output=json" | jq -r '.[].name_value' | sort -u

# Combine with active resolution
subfinder -d target.com | dnsx -resp -a -silent
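
Each tool above returns an overlapping list, and certificate data can include wildcard names and out-of-scope hosts. A rough sketch of a merge step follows; `normalize_subdomains` is a hypothetical helper name, and the cleanup rules (lowercasing, stripping a leading `*.` and a trailing dot, keeping only names under the target domain) are assumptions about what a typical workflow wants.

```shell
# Read hostnames on stdin (one per line, from any mix of tools) and print
# the unique, lowercased names that are the target domain or fall under it.
# Usage: cat subfinder.txt crtsh.txt | normalize_subdomains target.com
normalize_subdomains() (
  domain=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]')
  tr '[:upper:]' '[:lower:]' |
    tr -d '\r' |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        -e 's/^\*\.//' -e 's/\.$//' |
    awk -v d="$domain" '
      # Exact match on the apex domain.
      $0 == d { print; next }
      # Otherwise the name must end in ".<domain>" with a label in front.
      length($0) > length(d) + 1 &&
        substr($0, length($0) - length(d)) == "." d { print }
    ' |
    sort -u
)
```

The suffix check is deliberately strict: `nottarget.com` and `target.com.attacker.net` are dropped, which a plain `grep target.com` would let through.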

Social / people

  • LinkedIn: employee enumeration. Use Sales Navigator if budget allows; otherwise scrape manually
  • Have I Been Pwned: email breach history
  • Hunter.io / Snov.io: email pattern inference
  • Maltego: graph-based OSINT pivot tool; commercial and community editions
  • OSINT Framework (osintframework.com): directory of tools organized by category
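
To make "email pattern inference" concrete: once a format is known, it can be applied to any enumerated employee name. The sketch below is illustrative only. `email_candidates` is a hypothetical helper, and the four formats it prints are common conventions assumed for the example, not output from Hunter.io or Snov.io.

```shell
# Print candidate addresses for one person under a few common formats.
# Usage: email_candidates "First Last" DOMAIN
email_candidates() (
  first=$(printf '%s\n' "$1" | awk '{ print tolower($1) }')
  last=$(printf '%s\n' "$1" | awk '{ print tolower($NF) }')
  initial=$(printf '%s\n' "$first" | cut -c1)
  domain=$2
  printf '%s.%s@%s\n' "$first" "$last" "$domain"     # first.last
  printf '%s%s@%s\n'  "$initial" "$last" "$domain"   # flast
  printf '%s%s@%s\n'  "$first" "$last" "$domain"     # firstlast
  printf '%s@%s\n'    "$first" "$domain"             # first
)
```

Calling `email_candidates "Jane Doe" target.com` prints four candidates; which format a given organization uses still has to be confirmed from a known address.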

Code and leaks

  • GitHub dorks: "target.com" password, "target.com" api_key, org:target-inc extension:env
  • GitLeaks / TruffleHog: scan git history for leaked secrets (run against the target's public repos)
  • pastebin.com + DuckDuckGo: site:pastebin.com "target.com"
  • Grep.app: fast code search across millions of repos
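
The idea behind secret scanners can be shown with a single `grep`. This is a much-simplified sketch and not how GitLeaks or TruffleHog are implemented: `scan_secrets` is a hypothetical name, and the three patterns (AWS access key IDs beginning with `AKIA`, PEM private key headers, and `password`/`api_key`-style assignments) are a small illustrative set that will produce both false positives and misses.

```shell
# Print "LINE_NUMBER:matching line" for stdin lines that look like secrets.
# Assumption: matching is case-sensitive and covers only three patterns.
scan_secrets() {
  grep -nE \
    -e 'AKIA[0-9A-Z]{16}' \
    -e '-----BEGIN [A-Z ]*PRIVATE KEY-----' \
    -e '(password|passwd|api_key|secret)[[:space:]]*[:=][[:space:]]*[^[:space:]]+'
}
```

Against a local clone of a public repository, `git -C repo log -p --all | scan_secrets` searches the full patch history rather than only the current files, which is where removed credentials tend to remain.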

Historical / archive

  • Wayback Machine: historical snapshots of web pages. Find deleted content, old API endpoints
  • archive.today: similar; sometimes captures what Wayback misses
  • Google cache: short-term, and once useful for same-day deletions; Google retired its public cache in 2024, so rely on the archives above
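
Finding old endpoints in the Wayback Machine is easier through its CDX index than through the web interface. The sketch below builds a CDX query and filters the results; `cdx_url` and `interesting_urls` are hypothetical helper names, and the path and extension list in the filter is an assumed starting heuristic to adapt per target.

```shell
# Build a Wayback CDX query that lists each archived URL once for a
# domain and its subdomains (plain-text output, one URL per line).
cdx_url() (
  printf 'https://web.archive.org/cdx/search/cdx?url=%s&matchType=domain&fl=original&collapse=urlkey\n' "$1"
)

# Keep archived URLs that often matter for CTI: API and admin paths,
# plus config, dump, and backup file extensions. Heuristic only.
interesting_urls() {
  grep -iE '/(api|admin|backup|old|internal)(/|$)|\.(env|bak|sql|zip|json|conf)(\?|$)' |
    sort -u
}
```

A typical run would be `curl -s "$(cdx_url target.com)" | interesting_urls`; each hit can then be opened at a specific snapshot to see what the page or file contained.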

Breach data

  • DeHashed, IntelligenceX, BreachForums (commercial access or careful browsing): leaked credentials searchable by domain
  • The DeHashed API is the commercial option; for research, it is often integrated into CTI workflows

OpSec for researchers

Researchers sometimes touch adversary infrastructure, dark web forums, or closed communities. OpSec:
