Open-Source Intelligence (OSINT) is the practice of collecting information from publicly available sources: no hacking, no paid access required. For CTI analysts, it fills 60-80% of the picture at zero marginal cost. This module covers the tools, techniques, tradecraft, and operational security a researcher needs to do OSINT safely and effectively.
What OSINT covers
- Surface web: search engines, social media, company websites, press releases
- Technical data: DNS records, WHOIS, SSL certs, passive DNS
- Code / infrastructure: GitHub, Docker Hub, PyPI, cloud metadata
- Dark web / paste sites: forums, marketplaces, leaked data dumps
- Archive / historical: Wayback Machine, archive.today, cached versions
- Adjacent data: business records, court filings, regulatory disclosures
Core tool categories
Search engine operators
site:linkedin.com "security engineer" "target company"
intitle:"index of" "backup" site:target.com
filetype:pdf "confidential" site:target.com
-site:target.com "target company" "password"
cache:target.com/admin # Google's cached version (Google retired this operator in 2024; use an archive instead)
Google dorks still work but are rate-limited. Rotate search engines (Bing, DuckDuckGo, Yandex, Baidu); different indexes surface different results.
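The operators above can be templated so the same query set is generated for each target and each engine. The following is a minimal Python sketch, not a finished tool: the names `build_dorks` and `search_urls` are hypothetical, the URL prefixes are the engines' ordinary search-page formats, and operator support differs between engines (not every engine honors `intitle:` or `filetype:` the way Google does).

```python
from urllib.parse import quote_plus

# Query templates taken from the operators listed above.
DORK_TEMPLATES = [
    'site:linkedin.com "security engineer" "{company}"',
    'intitle:"index of" "backup" site:{domain}',
    'filetype:pdf "confidential" site:{domain}',
    '-site:{domain} "{company}" "password"',
]

# Search-page URL prefixes (assumed stable; verify before relying on them).
ENGINES = {
    "google": "https://www.google.com/search?q=",
    "bing": "https://www.bing.com/search?q=",
    "duckduckgo": "https://duckduckgo.com/?q=",
}

def build_dorks(domain, company):
    """Fill every template for one target."""
    return [t.format(domain=domain, company=company) for t in DORK_TEMPLATES]

def search_urls(domain, company):
    """Map each engine name to URL-encoded search links, one per dork."""
    queries = build_dorks(domain, company)
    return {
        name: [prefix + quote_plus(q) for q in queries]
        for name, prefix in ENGINES.items()
    }

# Print the first generated link per engine for a placeholder target.
for engine, urls in search_urls("target.com", "target company").items():
    print(engine, urls[0])
```

Generating links rather than fetching results keeps the sketch within the rate limits mentioned above: the analyst opens the links manually or paces any automation separately.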
Technical infrastructure
- Shodan: indexes internet-exposed services by banner, port, and certificate. A core tool
- Censys: similar to Shodan, with stronger certificate/protocol search
- BinaryEdge: another competitor, with strong leak data
- SecurityTrails: historical DNS, subdomain enumeration
- crt.sh: free certificate transparency search (subdomain discovery via issued certs)
- dnsdumpster.com / ViewDNS.info: DNS and reverse-DNS lookups
- Censys certs: search certificates by organization, SAN, fingerprint
Subdomain enumeration
# Free, passive
subfinder -d target.com
assetfinder target.com
amass enum -passive -d target.com
# Cert-based
curl -s "https://crt.sh/?q=%.target.com&output=json" | jq -r '.[].name_value' | sort -u
# Combine with active resolution
subfinder -d target.com | dnsx -resp -a -silent
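The crt.sh one-liner above can also be post-processed in a script, which makes it easier to strip wildcard labels and discard names that fall outside the target domain. This is a minimal Python sketch that works on an already-downloaded JSON response; the function name `subdomains_from_crtsh` and the sample records are hypothetical, and only the `name_value` field (which can hold several newline-separated names per certificate) is assumed.

```python
import json

def subdomains_from_crtsh(raw_json, domain):
    """Return sorted unique hostnames under `domain` from crt.sh JSON output."""
    names = set()
    for entry in json.loads(raw_json):
        # One certificate may list several names, separated by newlines.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # drop the wildcard label
            # Keep the apex and true subdomains only (rejects e.g. nottarget.com).
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)

# Hypothetical sample shaped like a crt.sh response (not real data).
sample = json.dumps([
    {"name_value": "www.target.com\nTarget.com"},
    {"name_value": "*.dev.target.com"},
    {"name_value": "mail.target.com"},
    {"name_value": "www.target.com"},
    {"name_value": "nottarget.com"},
])

print(subdomains_from_crtsh(sample, "target.com"))
```

The output can then be piped into an active resolver such as the `dnsx` step shown above.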
Social / people
- LinkedIn: employee enumeration. Sales Navigator if budget allows; otherwise manual scraping
- Have I Been Pwned: email breach history
- Hunter.io / Snov.io: email pattern inference
- Maltego: graph-based OSINT pivot tool; commercial + community editions
- OSINT Framework (osintframework.com): directory of tools organized by category
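Email pattern inference, as offered by the services above, can be illustrated with a small script: given a few known name/address pairs, find the address format that explains most of them, then apply it to other employees. This is a simplified Python sketch of the idea, not a description of how Hunter.io or Snov.io work internally; the pattern list, function names, and sample people are all hypothetical.

```python
from collections import Counter

# Candidate local-part formats (illustrative and far from exhaustive).
PATTERNS = {
    "{first}.{last}": lambda f, l: f + "." + l,
    "{f}{last}": lambda f, l: f[0] + l,
    "{first}{l}": lambda f, l: f + l[0],
    "{first}_{last}": lambda f, l: f + "_" + l,
    "{first}": lambda f, l: f,
}

def infer_pattern(known):
    """known: iterable of (first, last, email). Return the most frequent matching pattern or None."""
    counts = Counter()
    for first, last, email in known:
        local = email.split("@", 1)[0].lower()
        f, l = first.lower(), last.lower()
        for name, render in PATTERNS.items():
            if render(f, l) == local:
                counts[name] += 1
    return counts.most_common(1)[0][0] if counts else None

def guess_email(first, last, pattern, domain):
    """Apply an inferred pattern to a new name."""
    return PATTERNS[pattern](first.lower(), last.lower()) + "@" + domain

# Hypothetical observations (not real people or addresses).
known = [
    ("Jane", "Doe", "jane.doe@target.com"),
    ("Raj", "Patel", "raj.patel@target.com"),
    ("Ana", "Lim", "alim@target.com"),
]
pattern = infer_pattern(known)
print(pattern, guess_email("Sam", "Lee", pattern, "target.com"))
```

A guessed address is only a hypothesis; it still needs corroboration, for example against breach history.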
Code and leaks
- GitHub dorks:
"target.com" password
"target.com" api_key
org:target-inc extension:env
- GitLeaks / TruffleHog: scan git history for leaked secrets (run against the target’s public repos)
- pastebin.com + DuckDuckGo:
site:pastebin.com "target.com"
- Grep.app: fast code search across millions of repos
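To show what pattern-based secret detection looks like in principle, here is a toy Python scanner that checks text line by line against a few regular expressions. It is a sketch only: GitLeaks and TruffleHog walk full git history and ship much larger rule sets, while this example scans a single string. The rule names, regexes, and sample content are assumptions made up for illustration (the `AKIA` prefix followed by 16 uppercase alphanumerics is the familiar shape of an AWS access key ID).

```python
import re

# Illustrative rules only; real scanners use far more, plus entropy checks.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "password_assignment": re.compile(
        r"(?:password|passwd|pwd)\s*[:=]\s*['\"]?[^\s'\"]{6,}", re.IGNORECASE
    ),
    "api_key_assignment": re.compile(
        r"api[_-]?key\s*[:=]\s*['\"]?[^\s'\"]{8,}", re.IGNORECASE
    ),
}

def scan_text(text):
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings

# Hypothetical file content resembling a leaked .env file (fake values).
sample = """\
DB_HOST=db.target.com
DB_PASSWORD=hunter2hunter2
aws_key = "AKIAABCDEFGHIJKLMNOP"
API_KEY: 'abcd1234efgh5678'
"""

for lineno, rule in scan_text(sample):
    print(f"line {lineno}: {rule}")
```

Regex rules like these produce false positives (placeholders, test fixtures), so every hit needs manual review before it is treated as a finding.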
Historical / archive
- Wayback Machine: historical snapshots of web pages. Find deleted content, old API endpoints
- archive.today: similar; sometimes captures what Wayback misses
- Google cache: short-term, but it sometimes surfaced same-day deletions; Google retired its cache links and the cache: operator in 2024
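Snapshot lookups can be scripted against the Internet Archive's availability endpoint, which returns the capture closest to a requested timestamp. The Python sketch below assumes the public `https://archive.org/wayback/available` JSON API and its `archived_snapshots.closest` response shape; the function names and the sample response are hypothetical. Parsing is kept separate from the network call so it can be checked offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"

def availability_url(page_url, timestamp=None):
    """Build the lookup URL; timestamp is YYYYMMDD[hhmmss] and selects the nearest capture."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urlencode(params)

def closest_snapshot(response_json):
    """Return (snapshot_url, timestamp) from an API response, or None if nothing is archived."""
    closest = json.loads(response_json).get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None

def lookup(page_url, timestamp=None):
    """Perform the live request (needs network access)."""
    with urlopen(availability_url(page_url, timestamp), timeout=15) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))

# Hypothetical response, shaped like the API's output (not a real capture).
sample_response = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20200101000000/http://target.com/admin",
            "timestamp": "20200101000000",
            "status": "200",
        }
    }
})

print(availability_url("target.com/admin", "20200101"))
print(closest_snapshot(sample_response))
```

Calling `lookup("target.com/admin", "20200101")` would run the same parsing against a live response.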
Breach data
- DeHashed, Intelligence X, BreachForums (commercial services or careful browsing): leaked credentials searchable by domain
- The DeHashed API is the commercial option; it is often integrated into CTI research workflows
OpSec for researchers
Researchers sometimes touch adversary infrastructure, dark web forums, or closed communities. OpSec: