Last updated: May 1, 2026
EASY
⏱ 90 min
Module 2 of 8
What you’ll learn
- Passive DNS reconnaissance — finding assets without touching the target
- Subdomain enumeration with multiple sources (crt.sh, Subfinder, Amass, VirusTotal)
- Technology fingerprinting — how to read what a server is running
- Directory and file brute-forcing with ffuf, gobuster, and smart wordlists
- JavaScript bundle analysis — extracting hidden endpoints from client code
Prerequisites: Module 1 (HTTP & Web Fundamentals). Familiarity with a Linux terminal.
Before you test anything, you need to know what’s there. In real engagements, “what’s there” is almost always more than the documentation claims. Staging subdomains that accidentally made it to production DNS. Old API versions that were “deprecated” three years ago but never actually turned off. Admin panels that were hidden behind “security through obscurity” — no link to them anywhere, but the URL still responds if you know it. Reconnaissance is the phase where you find all of this.
This module covers the reconnaissance techniques that produce the most value per hour for web application testers. We’re going to skip the reconnaissance frameworks that try to do everything and focus on the commands that actually matter.
Passive vs active reconnaissance
Passive reconnaissance: gathering information without sending traffic to the target. Public DNS records, search-engine results, certificate transparency logs, archived web pages. The target doesn’t see it.
Active reconnaissance: directly probing the target. Port scans, directory brute-forcing, HTTP requests. The target’s logs will show activity.
For web app testing with client permission, active reconnaissance is fine — you’re scoped. For pre-engagement OSINT or bug bounty work on shared infrastructure, start passive and only go active within explicit scope.
Subdomain enumeration
Finding subdomains is the single highest-yield reconnaissance activity. Every subdomain is potentially a separate application with its own vulnerabilities. Acquisitions, internal tools, staging environments, forgotten projects — all leak through subdomain enumeration.
1. Certificate transparency logs (crt.sh)
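Chrome has required publicly trusted TLS certificates to be logged in Certificate Transparency (CT) logs since 2018, so in practice virtually every certificate issued by a public CA ends up there. Any subdomain that has had its own certificate issued appears in these logs by name; hosts covered only by a wildcard such as *.example.com show up as the wildcard entry, not individually. This is the highest-signal passive source.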
# Via web
https://crt.sh/?q=%25.example.com
# Via API, extract unique domain names (%25 is the URL-encoded % wildcard)
curl -s "https://crt.sh/?q=%25.example.com&output=json" |
jq -r '.[].name_value' | sed 's/^\*\.//' | sort -u
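The same extraction can be sketched in Python when you want to post-process the results. This is a minimal sketch, assuming the response is a JSON array whose records carry a `name_value` field (the field the jq filter above reads) that may hold several newline-separated names. The function name `extract_hostnames` and the sample records are hypothetical.

```python
import json

def extract_hostnames(crtsh_json: str, domain: str) -> list[str]:
    """Return sorted unique hostnames under `domain` from a crt.sh-style JSON body."""
    hostnames = set()
    for record in json.loads(crtsh_json):
        # name_value may hold several names separated by newlines
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            # Strip a leading wildcard label: *.dev.example.com -> dev.example.com
            if name.startswith("*."):
                name = name[2:]
            # Keep only the apex and its subdomains (drops e-mail style entries)
            if name == domain or name.endswith("." + domain):
                hostnames.add(name)
    return sorted(hostnames)

# Hypothetical sample shaped like the API output (only the field used here)
sample = json.dumps([
    {"name_value": "www.example.com\nexample.com"},
    {"name_value": "*.staging.example.com"},
    {"name_value": "API.example.com"},
    {"name_value": "user@example.com"},
])
print(extract_hostnames(sample, "example.com"))
```

Normalising case and wildcard prefixes before deduplication keeps the list clean when it is later merged with output from other tools.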
2. Subfinder — aggregating multiple sources
Subfinder queries dozens of passive sources (crt.sh, VirusTotal, Shodan, SecurityTrails, etc.) and merges results. With API keys for premium sources, the results are dramatically better.
subfinder -d example.com -all -silent -o subs.txt
# -all uses all sources including slower ones
# Typical output: hundreds to thousands of subdomains for a medium target
3. Amass — deeper, slower, more thorough
Amass does passive sources plus active DNS resolution plus certificate analysis plus reverse DNS. It takes much longer than Subfinder but finds things Subfinder misses.
amass enum -d example.com -o amass.txt
# Slow (hours for large targets); most thorough free option
4. Validate discovered subdomains
Raw lists contain dead subdomains (old records pointing to nothing). Validate which are actually live with httpx:
cat subs.txt | httpx -silent -status-code -title -tech-detect -o live.txt
# Outputs: URL | status code | page title | detected technology
5. Subdomain brute-forcing (active)
When passive sources run dry, brute-force with a wordlist:
gobuster dns -d example.com -w /path/to/subdomain-wordlist.txt -t 50
# Good wordlists: SecLists' dns-Jhaddix.txt, or bitquark's top-1million
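One pitfall with DNS brute-forcing is wildcard DNS: if the zone answers for any label, every wordlist entry looks like a hit. The sketch below shows the idea of probing a random label first. It is an illustration of the concept, not a description of how gobuster implements it; the function names are hypothetical, and the resolver is injectable so the logic can be exercised without sending traffic.

```python
import secrets
import socket
from typing import Callable, Iterable

def system_resolves(hostname: str) -> bool:
    """Default resolver: True if the system resolver returns any address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except OSError:
        return False

def has_wildcard(domain: str, resolver: Callable[[str], bool] = system_resolves) -> bool:
    # A random 16-hex-character label should not exist; if it resolves,
    # the zone is almost certainly answering for everything.
    return resolver(f"{secrets.token_hex(8)}.{domain}")

def brute_force(domain: str, words: Iterable[str],
                resolver: Callable[[str], bool] = system_resolves) -> list[str]:
    if has_wildcard(domain, resolver):
        raise RuntimeError(f"{domain} uses wildcard DNS; raw hits would be meaningless")
    return [f"{w}.{domain}" for w in words if resolver(f"{w}.{domain}")]

# Demonstration with a fake resolver (no network traffic)
known = {"www.example.com", "dev.example.com"}
fake_resolver = lambda host: host in known
found = brute_force("example.com", ["www", "mail", "dev"], fake_resolver)
print(found)
```

Passing the resolver as a parameter also makes it easy to swap in a specific DNS server later without changing the enumeration logic.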
Technology fingerprinting
Identifying the server stack tells you what classes of vulnerability to test first. A PHP 7.4 app on Apache has different attack patterns than a Node.js app on Cloudflare Workers.
Sources of fingerprints:
- HTTP response headers: Server, X-Powered-By, X-AspNet-Version. These should be stripped in production but often aren’t.
- Cookie names: PHPSESSID (PHP), JSESSIONID (Java), csrftoken (Django), laravel_session (Laravel), etc.
- Error pages: trigger a 404 or 500 and observe the stack trace or error template. Framework signatures are distinctive.
- Static asset paths: /wp-content/ screams WordPress, /_next/ is Next.js, /__webpack_hmr exposes a Webpack dev server.
- Favicon hash: favicon.ico hashes map to specific applications via Shodan’s favicon search.
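The header and cookie checks from the list above can be sketched as a small lookup. This is a minimal sketch under stated assumptions: the header list is supplied as (name, value) pairs you have already captured, the cookie mapping contains only the four names mentioned above, and the function name `fingerprint` is hypothetical.

```python
COOKIE_HINTS = {
    "PHPSESSID": "PHP",
    "JSESSIONID": "Java",
    "csrftoken": "Django",
    "laravel_session": "Laravel",
}
REVEALING_HEADERS = {"server", "x-powered-by", "x-aspnet-version"}

def fingerprint(headers: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Collect revealing headers and framework hints from cookie names."""
    findings = {"headers": [], "cookies": []}
    for name, value in headers:
        lowered = name.lower()  # header names are case-insensitive
        if lowered in REVEALING_HEADERS:
            findings["headers"].append(f"{name}: {value}")
        elif lowered == "set-cookie":
            # The cookie name is everything before the first "="
            cookie_name = value.split("=", 1)[0].strip()
            if cookie_name in COOKIE_HINTS:
                findings["cookies"].append(f"{cookie_name} -> {COOKIE_HINTS[cookie_name]}")
    return findings

# Hypothetical captured response headers
captured = [
    ("Server", "Apache/2.4.41"),
    ("X-Powered-By", "PHP/7.4.3"),
    ("Set-Cookie", "PHPSESSID=abc123; Path=/; HttpOnly"),
    ("Content-Type", "text/html"),
]
print(fingerprint(captured))
```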
Tools:
# Wappalyzer CLI
wappalyzer https://target.example.com
# Httpx with tech detection
echo target.example.com | httpx -tech-detect
# Manual header inspection
curl -I https://target.example.com
# Response body analysis with Nuclei technology templates
# (path in the current nuclei-templates layout; older releases used technologies/)
nuclei -u https://target.example.com -t http/technologies/
Directory and file enumeration
Web applications have more URLs than they advertise. Admin consoles, API endpoints, backup files, .git directories, configuration files — all may be present and reachable.
ffuf — the modern standard
ffuf -u https://target.example.com/FUZZ \
  -w /usr/share/seclists/Discovery/Web-Content/raft-medium-directories.txt \
  -mc 200,204,301,302,307,401,403 \
  -fs 0 \
  -o results.json -of json
# Breakdown:
# FUZZ is the placeholder where wordlist entries go
# -mc matches specified status codes (interesting responses)
# -fs 0 filters responses of exact size 0 (empty responses)
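To make the interaction between -mc and -fs concrete, here is a minimal sketch of the decision applied to each response: a result is reported only if its status code is matched and its size is not filtered. The `Response` class and the sample values are hypothetical; this illustrates the logic, not ffuf's source code.

```python
from dataclasses import dataclass

@dataclass
class Response:
    path: str
    status: int
    size: int  # body size in bytes

MATCH_CODES = {200, 204, 301, 302, 307, 401, 403}  # mirrors -mc above
FILTER_SIZES = {0}                                  # mirrors -fs 0

def is_reported(resp: Response) -> bool:
    # Keep a response only if its status matches and its size is not filtered
    return resp.status in MATCH_CODES and resp.size not in FILTER_SIZES

# Hypothetical responses from a fuzzing run
responses = [
    Response("/admin", 403, 162),
    Response("/backup", 200, 0),      # matched status, but empty body is filtered
    Response("/missing", 404, 548),   # status not in the match list
    Response("/login", 200, 2317),
]
reported = [r.path for r in responses if is_reported(r)]
print(reported)
```

The same pattern extends to filtering on a soft-404 page size: if every miss returns a 200 with an identical body length, adding that length to the filtered sizes removes the noise.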
Recursion — go deeper after discovery
When you find /admin/, enumerate inside it. ffuf supports recursion with -recursion; add -recursion-depth to cap how deep it goes.
Extension bruting
# Append extensions to every wordlist entry
ffuf -u https://target.example.com/FUZZ \
  -w wordlist.txt \
  -e .php,.bak,.old,.backup,.zip,.sql,.env,.log \
  -mc 200,204
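Extension bruting multiplies the request count, which is worth estimating before you launch it. The sketch below assumes each wordlist entry is tried both bare and with every extension appended, which is my understanding of how -e behaves; treat that as an assumption and confirm against your ffuf version. The function name `expand` is hypothetical.

```python
from typing import Iterable, Iterator

def expand(words: Iterable[str], extensions: Iterable[str]) -> Iterator[str]:
    """Yield each word as-is, then the word with each extension appended."""
    extensions = list(extensions)
    for word in words:
        yield word
        for ext in extensions:
            yield word + ext

words = ["backup", "config"]
extensions = [".php", ".bak", ".old", ".backup", ".zip", ".sql", ".env", ".log"]
candidates = list(expand(words, extensions))
print(len(candidates))   # 2 words * (1 bare + 8 extensions)
print(candidates[:3])
```

With eight extensions, a wordlist produces nine times as many requests, so a short, targeted extension list is usually the better first pass.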
Wordlist selection
Good wordlists: SecLists (/Discovery/Web-Content/ directory). Specific picks:
- raft-medium-directories.txt: general starting point
- common.txt: quick first pass
- api/ directory wordlists: for API endpoints
- Technology-specific wordlists (apache.txt, iis.txt) once you know the stack
JavaScript bundle analysis
Modern web apps ship significant logic to the client in JavaScript bundles. Those bundles often contain:
- API endpoints the app calls — revealing the full backend API surface
- Role-based feature flags — revealing admin-only features
- Hardcoded URLs to internal systems
- Leaked API keys, access tokens, or credentials
- Comments with TODO notes referencing bugs or test accounts
Pull and analyze bundles
# Find JavaScript files
curl -s https://target.example.com/ | grep -oE 'src="[^"]*\.js[^"]*"' | sort -u
# Or use linkfinder / JSFinder for automated endpoint extraction
python3 linkfinder.py -i https://target.example.com -o cli
# For minified bundles, unminify before reading
js-beautify app.bundle.min.js > app.bundle.js
What to grep for in bundles
grep -oE '"/api/[^"]+"' app.bundle.js # API endpoints
grep -iE 'token|secret|key|apikey|password' app.bundle.js
grep -oE 'https?://[^"]+' app.bundle.js # internal URLs
grep -E 'TODO|FIXME|HACK' app.bundle.js # developer notes
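When you need structured output instead of raw grep lines, the same patterns translate directly to Python. This is a minimal sketch: the regular expressions mirror the grep commands above, while the function name `analyse_bundle` and the sample bundle contents are hypothetical.

```python
import re

API_PATH = re.compile(r'"(/api/[^"]+)"')
URL = re.compile(r"https?://[^\"'\s]+")
SECRET_HINT = re.compile(r"token|secret|key|apikey|password", re.IGNORECASE)

def analyse_bundle(source: str) -> dict[str, list[str]]:
    """Extract API paths, absolute URLs, and lines that mention secret-like words."""
    return {
        "api_paths": sorted(set(API_PATH.findall(source))),
        "urls": sorted(set(URL.findall(source))),
        "secret_lines": [
            line.strip() for line in source.splitlines() if SECRET_HINT.search(line)
        ],
    }

# Hypothetical beautified bundle excerpt
sample_bundle = '''
fetch("/api/v1/users");
fetch("/api/v1/admin/export");
const base = "https://internal.example.com/graphql";
const apiKey = "REDACTED";
'''
report = analyse_bundle(sample_bundle)
print(report)
```

The secret pattern is deliberately broad and will produce false positives (any variable named `key`, for instance), so treat its output as a list of lines to read, not as confirmed findings.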
Wayback and archive reconnaissance
archive.org has snapshots of many public sites going back years. Old versions reveal endpoints that no longer exist but may still respond, old JavaScript with different logic, forgotten admin panels.
waybackurls target.example.com > wayback.txt
gau target.example.com >> wayback.txt # getallurls: Wayback + Common Crawl + OTX + URLScan
sort -u wayback.txt -o wayback.txt # dedupe the merged list
# Filter for interesting paths
grep -E '/admin|/api|/internal|\.env|\.bak|\.log' wayback.txt
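The dots in a filter like this need escaping, because an unescaped `.env` matches any character followed by `env` (for example `/environment`). The sketch below applies the escaped pattern to the URL path only and deduplicates while preserving order. The function name and sample URLs are hypothetical.

```python
import re
from urllib.parse import urlsplit

INTERESTING = re.compile(r"/admin|/api|/internal|\.env|\.bak|\.log")

def interesting_urls(urls: list[str]) -> list[str]:
    """Return unique URLs whose path matches the pattern, in first-seen order."""
    seen = set()
    hits = []
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        # Match against the path only, so hostnames and query strings do not trigger hits
        if INTERESTING.search(urlsplit(url).path):
            hits.append(url)
    return hits

# Hypothetical archive results
archived = [
    "https://target.example.com/admin/login",
    "https://target.example.com/blog/environment-day",
    "https://target.example.com/.env",
    "https://target.example.com/admin/login",
    "https://target.example.com/static/app.js",
    "https://target.example.com/old/db.bak",
]
hits = interesting_urls(archived)
print(hits)
```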
Cloud asset discovery
Modern apps have cloud footprints beyond the primary domain:
- S3 buckets named after the company: {company}-backups, {company}-static, etc.
- Azure Blob storage: {company}.blob.core.windows.net
- GCP Cloud Storage: storage.googleapis.com/{company}-*
- CloudFront distributions, Elastic Load Balancers: often have predictable patterns
Tools: cloud_enum, s3scanner, Google dorks like site:amazonaws.com "companyname".
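The naming patterns above are easy to turn into a candidate list for manual checking. This is a minimal sketch: it only builds URLs and sends nothing, the suffixes are the two examples from the list, and the helper names are hypothetical. It also strips hyphens for Azure, since storage account names allow only lowercase letters and digits.

```python
import re

def slugify(company: str) -> str:
    # Lowercase and collapse anything that is not a-z or 0-9 into single hyphens
    return re.sub(r"[^a-z0-9]+", "-", company.lower()).strip("-")

def candidate_targets(company: str, suffixes=("backups", "static")) -> dict[str, list[str]]:
    """Build candidate storage URLs from a company name; nothing is requested."""
    slug = slugify(company)
    buckets = [slug] + [f"{slug}-{suffix}" for suffix in suffixes]
    azure_account = slug.replace("-", "")  # Azure account names cannot contain hyphens
    return {
        "s3": [f"https://{b}.s3.amazonaws.com" for b in buckets],
        "gcs": [f"https://storage.googleapis.com/{b}" for b in buckets],
        "azure": [f"https://{azure_account}.blob.core.windows.net"],
    }

targets = candidate_targets("Acme Corp")
for provider, urls in targets.items():
    print(provider, urls)
```

Only probe the resulting candidates when cloud assets are explicitly in scope; a bucket with a matching name may belong to an unrelated party.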
Exercises
1. Enumerate a target. Pick any domain you have authorization to test (or use example.com for practice). Run Subfinder, then validate with httpx. How many live subdomains did you find? Then query crt.sh for the same domain: what percentage of the subdomains visible there did Subfinder miss? Compare approaches.
2. Fingerprint live targets. For 5 discovered subdomains, identify the technology stack using headers, cookies, error pages. Write a one-line summary of each.
3. Find a JavaScript endpoint. Pull a JavaScript bundle from any public web application you use. Grep for /api/ references. How many API endpoints can you extract from the bundle? How many are documented anywhere public?
Check your understanding
- Which passive source is most likely to reveal subdomains that brute-forcing will miss?
- Why is certificate transparency such a strong reconnaissance source?
- What’s the tradeoff between gobuster dns brute-forcing and passive enumeration?
- Where in an HTTP response would you look for technology fingerprints?
- Why are JavaScript bundles a high-yield reconnaissance target?
Key takeaways
- Passive sources (CT logs, aggregators) find subdomains brute-forcing never will.
- Technology fingerprinting dictates which vulnerability classes to test first.
- Directory enumeration finds undocumented endpoints — admin panels, backup files, APIs.
- JavaScript bundles leak API routes and occasionally secrets; always pull and grep.
- Wayback archives preserve endpoints that no longer exist but may still be reachable.
Take the 20-question quiz below to confirm your understanding. Pass with 70%+ to mark this module complete. Unlimited retries.
Real-World Case Study: Capital One, 2019
The story. A former AWS engineer exfiltrated 106 million US and Canadian Capital One customer records — names, addresses, credit scores, 140,000 SSNs, 80,000 bank account numbers. The breach didn’t start with an exploit. It started with recon.
The technical chain.
- External recon — the attacker enumerated Capital One’s external surface and found a misconfigured WAF.
- SSRF discovery — the WAF endpoint accepted user-supplied URLs and fetched them server-side.
- EC2 metadata abuse: the attacker fetched http://169.254.169.254/latest/meta-data/iam/security-credentials/ and obtained temporary credentials for the IAM role WAF-Role.
- S3 enumeration: those credentials had over-broad s3:ListBucket + s3:GetObject permissions across hundreds of buckets.
- Exfiltration: two days of aws s3 sync later, ~30 GB of customer data had been exfiltrated.
What enumeration revealed. The attacker didn’t break in — they read the documentation. AWS metadata endpoints are public knowledge. nmap -sV against Capital One’s external IPs would have shown the WAF banner. gobuster on common WAF management paths would have surfaced the SSRF-able endpoint. The exploit was 4 lines of curl.
The takeaway. Recon is not “low-impact”. It’s the entire kill chain. Run external recon against your own perimeter before attackers do — and treat IMDSv1 as already compromised. Force IMDSv2 on every EC2 instance you own.
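To see why IMDSv2 blunts this kind of SSRF, compare the requests involved. IMDSv1 answers a plain GET, which is exactly what a URL-fetching SSRF provides. IMDSv2 first requires a PUT with a TTL header to obtain a session token, then a GET carrying that token, and an attacker who controls only the URL usually cannot set the method or headers. The sketch below only constructs the request objects and never sends them; the helper names are hypothetical, and it is meant to run anywhere, not just on EC2.

```python
import urllib.request

IMDS = "http://169.254.169.254"
CREDS_PATH = "/latest/meta-data/iam/security-credentials/"

def imdsv1_request(path: str) -> urllib.request.Request:
    # IMDSv1: a bare GET, reachable through any URL-fetching SSRF
    return urllib.request.Request(f"{IMDS}{path}", method="GET")

def imdsv2_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    # IMDSv2 step 1: PUT with a TTL header to obtain a session token
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )

def imdsv2_request(path: str, token: str) -> urllib.request.Request:
    # IMDSv2 step 2: GET that must carry the token from step 1
    return urllib.request.Request(
        f"{IMDS}{path}",
        method="GET",
        headers={"X-aws-ec2-metadata-token": token},
    )

v1 = imdsv1_request(CREDS_PATH)
token_req = imdsv2_token_request()
v2 = imdsv2_request(CREDS_PATH, "example-token")  # placeholder token value
for req in (v1, token_req, v2):
    print(req.get_method(), req.full_url, dict(req.header_items()))
```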
Practical depth on what to tune, what to harden, and how this maps to Indian regulatory expectations.
Optimising recon — speed without losing accuracy
- ffuf with -t 100 threads is usually fine on internet targets; tune down to 20-40 against rate-limited endpoints.
- Start with raft-medium-directories.txt (~30k entries); only escalate to seclists-big.txt when the medium pass is dry.
- Filtering with -fc 404,400,403 reduces noise; analyse interesting 200/301/302/401/500 responses.
- cewl generates target-specific wordlists from the site’s own content.
- httpx resolves and dedupes.
Subdomain enumeration: passive plus active, in that order
Passive sources (free, fast, no traffic to target): subfinder aggregates Censys, Shodan, VirusTotal, and crt.sh; amass covers the same plus more sources; sublist3r remains for legacy workflows. Querying crt.sh directly (https://crt.sh/?q=%25.target.com&output=json) returns every logged certificate for the domain, including staging/dev/test hosts that should not be public.
Active sources (slower, leaves traces): amass enum -active, brute-force with massdns plus a ~1M-entry wordlist, shuffledns for filtered results.
Always pipe the results through httpx -title -tech-detect to identify which subdomains actually serve content.
For Indian targets specifically: ASN-based discovery (every Indian operator announces its IP ranges under a published ASN) catches infrastructure that DNS-only enumeration misses.
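A minimal sketch of the comparison behind ASN-based discovery: given the prefixes announced under the target's ASN and a set of observed live addresses, list the ones that fall inside those prefixes but were never seen in DNS enumeration. The prefixes and addresses here are documentation ranges used as placeholders; real prefixes would come from a BGP or WHOIS data source, and the function name is hypothetical.

```python
import ipaddress

def uncovered_ips(observed_ips: list[str], dns_ips: list[str],
                  prefixes: list[str]) -> list[str]:
    """Return observed IPs inside the ASN prefixes that DNS enumeration never surfaced."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    known = {ipaddress.ip_address(ip) for ip in dns_ips}
    result = []
    for ip in observed_ips:
        addr = ipaddress.ip_address(ip)
        if addr not in known and any(addr in net for net in networks):
            result.append(str(addr))
    return sorted(result)

# Placeholder data (documentation address ranges, not real assignments)
asn_prefixes = ["203.0.113.0/24"]
from_dns = ["203.0.113.10"]                                    # IPs behind enumerated subdomains
observed = ["203.0.113.10", "203.0.113.77", "198.51.100.7"]    # e.g. from scan results
print(uncovered_ips(observed, from_dns, asn_prefixes))
```

Anything this returns is a host the organisation routes but does not name in DNS, which is exactly the infrastructure subdomain enumeration cannot reach.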
Operational checklist — recon that produces actionable findings
- recon-ng workspaces.
Detection: when blue teams catch your recon
Modern WAFs (Cloudflare, Akamai, Imperva) detect recon by request-rate per IP, JA3/JA4 fingerprint of common scanner clients, User-Agent strings, and request-pattern (sequential alphabetical paths).
Evasion patterns: rotate through residential proxies (Bright Data, Smart Proxy), randomise User-Agent, slow down to “human-like” rates, mask JA3 (Burp’s upstream proxy + uTLS-based clients).
Defender takeaway: recon detection is one of the highest-ROI WAF rule sets; alerts on “host probed >50 distinct paths in 5 minutes from one source” catch most automated tooling.
For Indian SOCs: integrate WAF logs into the SIEM and create a “recon detected” rule with auto-block-then-review. The 2022 CERT-In directive expects 180-day log retention, and that window is what makes retroactive recon analysis possible; such analysis often reveals the precursor to a later breach.
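The “more than 50 distinct paths in 5 minutes from one source” rule can be sketched as a sliding window per source. This is a minimal in-memory illustration, assuming events arrive in timestamp order; the class name `ReconDetector` is hypothetical, and a production rule would live in the WAF or SIEM.

```python
from collections import defaultdict, deque

class ReconDetector:
    """Flag a source that requests too many distinct paths within a time window."""

    def __init__(self, max_paths: int = 50, window_seconds: int = 300):
        self.max_paths = max_paths
        self.window = window_seconds
        self.events = defaultdict(deque)  # source -> deque of (timestamp, path)

    def observe(self, source: str, path: str, timestamp: float) -> bool:
        queue = self.events[source]
        queue.append((timestamp, path))
        # Drop events that have fallen out of the window
        while queue and queue[0][0] <= timestamp - self.window:
            queue.popleft()
        distinct_paths = {p for _, p in queue}
        return len(distinct_paths) > self.max_paths

# A scanner sending one new path per second trips the rule on its 51st path
fast = ReconDetector()
fast_alerts = [fast.observe("198.51.100.7", f"/path{i}", i) for i in range(60)]
print(fast_alerts.index(True))

# The same paths at one request every 10 seconds stay under the threshold
slow = ReconDetector()
slow_alerts = [slow.observe("198.51.100.7", f"/path{i}", i * 10) for i in range(60)]
print(any(slow_alerts))
```

The second run shows why slowing down is the evasion that matters most against this rule, and why defenders often pair it with a longer-window, lower-threshold variant.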
TARGET
│
▼
[Passive sources]
crt.sh • Censys • Shodan • VirusTotal • GitHub
│
▼
[Aggregation] subfinder, amass passive
│
▼
[Resolution] massdns / shuffledns
│
▼
[Live check] httpx -title -tech-detect
│
▼
[Active probe] ffuf, gobuster, nuclei (with caution)
│
▼
[Manual triage] interesting endpoints, admin panels, staging
Further reading
- ProjectDiscovery (subfinder, httpx, nuclei)
- OWASP Amass
- crt.sh — Certificate Transparency search
- SecLists (wordlists)
Additional FAQs
Is automated recon legal in India?
Authorised testing under signed Statement of Work / Rules of Engagement is fully legal. Unauthorised scanning even of public targets risks IT Act §43/§66 charges. Always have written authorisation; never assume “the target is public” implies consent.
How do I avoid getting blocked during recon?
Rotate IPs (residential proxies), spread requests over time, vary User-Agent, mimic browser TLS fingerprints, and respect robots.txt as a soft signal. The fastest way to be blocked is sequential 1-thread brute-force on a single IP.
When should I escalate from recon to active testing?
Once you have a complete inventory of in-scope hosts, services, and obvious tech stack. Active testing without complete recon misses the easy wins (forgotten staging, debug endpoints, leaked credentials).