
Last updated: May 1, 2026

Subdomain enumeration, technology fingerprinting, directory brute-forcing, JavaScript bundle analysis, and Wayback reconnaissance.

🎯 WEB APP PENTEST PATH
EASY
⏱ 90 min
Module 2 of 8

What you’ll learn

Passive DNS reconnaissance — finding assets without touching the target
Subdomain enumeration with multiple sources (crt.sh, Subfinder, Amass, VirusTotal)
Technology fingerprinting — how to read what a server is running
Directory and file brute-forcing with ffuf, gobuster, and smart wordlists
JavaScript bundle analysis — extracting hidden endpoints from client code

Prerequisites: Module 1 (HTTP & Web Fundamentals). Familiarity with a Linux terminal.

Before you test anything, you need to know what’s there. In real engagements, “what’s there” is almost always more than the documentation claims. Staging subdomains that accidentally made it to production DNS. Old API versions that were “deprecated” three years ago but never actually turned off. Admin panels that were hidden behind “security through obscurity” — no link to them anywhere, but the URL still responds if you know it. Reconnaissance is the phase where you find all of this.

This module covers the reconnaissance techniques that produce the most value per hour for web application testers. We’re going to skip the reconnaissance frameworks that try to do everything and focus on the commands that actually matter.

Passive vs active reconnaissance

Passive reconnaissance: gathering information without sending traffic to the target. Public DNS records, search-engine results, certificate transparency logs, archived web pages. The target doesn’t see it.

Active reconnaissance: directly probing the target. Port scans, directory brute-forcing, HTTP requests. The target’s logs will show activity.

For web app testing with client permission, active reconnaissance is fine — you’re scoped. For pre-engagement OSINT or bug bounty work on shared infrastructure, start passive and only go active within explicit scope.

Subdomain enumeration

Finding subdomains is the single highest-yield reconnaissance activity. Every subdomain is potentially a separate application with its own vulnerabilities. Acquisitions, internal tools, staging environments, forgotten projects — all leak through subdomain enumeration.

1. Certificate transparency logs (crt.sh)

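Since 2018, Chrome has required publicly trusted TLS certificates to be logged in Certificate Transparency (CT) logs, so in practice nearly every certificate issued by a public CA ends up there. Wildcard certificates for *.example.com appear in these logs along with every subdomain that has had its own certificate issued. This is the highest-signal passive source.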

# Via web
https://crt.sh/?q=%.example.com

# Via API, extract unique domain names (%25 is the URL-encoded % wildcard)
curl -s "https://crt.sh/?q=%25.example.com&output=json" |
  jq -r '.[].name_value' | sort -u
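The name_value field tends to be messy: wildcard entries such as *.example.com sit next to concrete hosts, case varies, and unrelated names from shared certificates can slip in. Below is a minimal cleanup sketch, assuming the names have already been extracted one per line (for instance by the jq command above). normalize_ct_names is a hypothetical helper, not part of crt.sh or any tool.

```shell
# Hypothetical helper: clean hostnames pulled from CT logs (one per line on stdin).
# Lowercases, trims trailing whitespace, strips a leading "*." wildcard label,
# keeps only names at or under the given domain, and removes duplicates.
normalize_ct_names() {
  ct_domain_re=$(printf '%s' "$1" | sed 's/\./\\./g')   # escape dots for grep
  tr '[:upper:]' '[:lower:]' |
    sed -e 's/[[:space:]]*$//' -e 's/^\*\.//' |
    grep -E "(^|\.)${ct_domain_re}\$" |
    LC_ALL=C sort -u
}

# Example use (not run here):
#   curl -s "..." | jq -r '.[].name_value' | normalize_ct_names example.com
```

Stripping the wildcard label keeps the parent name as a candidate, which is usually what you want before feeding the list to a resolver.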

2. Subfinder — aggregating multiple sources

Subfinder queries dozens of passive sources (crt.sh, VirusTotal, Shodan, SecurityTrails, etc.) and merges results. With API keys for premium sources, the results are dramatically better.

subfinder -d example.com -all -silent -o subs.txt
# -all uses all sources including slower ones
# Typical output: hundreds to thousands of subdomains for a medium target

3. Amass — deeper, slower, more thorough

Amass does passive sources plus active DNS resolution plus certificate analysis plus reverse DNS. It takes much longer than Subfinder but finds things Subfinder misses.

amass enum -d example.com -o amass.txt
# Slow (hours for large targets); most thorough free option

4. Validate discovered subdomains

Raw lists contain dead subdomains (old records pointing to nothing). Validate which are actually live with httpx:

cat subs.txt | httpx -silent -status-code -title -tech-detect -o live.txt
# Outputs: URL | status code | page title | detected technology

5. Subdomain brute-forcing (active)

When passive sources run dry, brute-force with a wordlist:

gobuster dns -d example.com -w /path/to/subdomain-wordlist.txt -t 50
# Good wordlists: SecLists' dns-Jhaddix.txt, or bitquark's top-1million

Technology fingerprinting

Identifying the server stack tells you what classes of vulnerability to test first. A PHP 7.4 app on Apache has different attack patterns than a Node.js app on Cloudflare Workers.

Sources of fingerprints:

HTTP response headers — Server, X-Powered-By, X-AspNet-Version. These should be stripped in production but often aren’t.
Cookie names — PHPSESSID (PHP), JSESSIONID (Java), csrftoken (Django), laravel_session (Laravel), etc.
Error pages — trigger a 404 or 500 and observe the stack trace or error template. Framework signatures are distinctive.
Static asset paths — /wp-content/ screams WordPress, /_next/ is Next.js, /__webpack_hmr exposes Webpack dev server.
Favicon hash — favicon.ico hashes map to specific applications via Shodan’s favicon search.
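As a rough illustration of the cookie-name signal, the sketch below reads raw response headers (for example, the output of curl -sI) and maps each Set-Cookie name to a likely stack. It only knows the four cookie names listed above. Both function names are hypothetical, and a match is a hint to verify manually, not proof.

```shell
# Map a cookie name to a likely stack (only the names covered in this module).
guess_stack_from_cookie() {
  case "$1" in
    PHPSESSID)       printf '%s\n' "PHP" ;;
    JSESSIONID)      printf '%s\n' "Java" ;;
    csrftoken)       printf '%s\n' "Django" ;;
    laravel_session) printf '%s\n' "Laravel" ;;
    *)               printf '%s\n' "unknown" ;;
  esac
}

# Read raw response headers on stdin; print "cookie-name<TAB>guess"
# for every Set-Cookie header found.
fingerprint_cookies() {
  grep -i '^set-cookie:' |
    sed -e 's/^[^:]*:[[:space:]]*//' -e 's/=.*$//' |
    while IFS= read -r cookie_name; do
      printf '%s\t%s\n' "$cookie_name" "$(guess_stack_from_cookie "$cookie_name")"
    done
}

# Example use (not run here):
#   curl -sI https://target.example.com | fingerprint_cookies
```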

Tools:

# Wappalyzer CLI
wappalyzer https://target.example.com

# Httpx with tech detection
echo target.example.com | httpx -tech-detect

# Manual header inspection
curl -I https://target.example.com

# Response body analysis with Nuclei technology templates
nuclei -u https://target.example.com -t technologies/

Directory and file enumeration

Web applications have more URLs than they advertise. Admin consoles, API endpoints, backup files, .git directories, configuration files — all may be present and reachable.

ffuf — the modern standard

ffuf -u https://target.example.com/FUZZ \
  -w /usr/share/seclists/Discovery/Web-Content/raft-medium-directories.txt \
  -mc 200,204,301,302,307,401,403 \
  -fs 0 \
  -o results.json -of json

# Breakdown:
# FUZZ is the placeholder where wordlist entries go
# -mc matches specified status codes (interesting responses)
# -fs 0 filters responses of exact size 0 (empty responses)
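Filtering on size 0 only removes empty responses. Many applications answer every unknown path with the same custom "not found" page, returned as a 200 with a fixed body size, and that page will flood the results. One way to choose a better -fs value is to look for the size that dominates a first pass. The sketch below assumes the results have been flattened to "status size url" lines, which is a hypothetical format and not ffuf's native output. ffuf also offers -ac for automatic calibration.

```shell
# Hypothetical helper: read "status size url" lines on stdin and print the
# response size that occurs most often, a candidate value for ffuf's -fs filter.
most_common_size() {
  awk '
    { count[$2]++ }
    END {
      best = ""
      for (s in count)
        if (best == "" || count[s] > count[best]) best = s
      if (best != "") print best
    }'
}
```

If one size accounts for most of the responses, rerun with that size in -fs and see what remains.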

Recursion — go deeper after discovery

When you find /admin/, enumerate inside it. ffuf supports recursion with -recursion.

Extension bruting

# Append extensions to every wordlist entry
ffuf -u https://target.example.com/FUZZ \
  -w wordlist.txt \
  -e .php,.bak,.old,.backup,.zip,.sql,.env,.log \
  -mc 200,204

Wordlist selection

Good wordlists: SecLists (/Discovery/Web-Content/ directory). Specific picks:

raft-medium-directories.txt — general starting point
common.txt — quick first pass
api/ directory wordlists — for API endpoints
Technology-specific wordlists (apache.txt, iis.txt) once you know the stack

JavaScript bundle analysis

Modern web apps ship significant logic to the client in JavaScript bundles. Those bundles often contain:

API endpoints the app calls — revealing the full backend API surface
Role-based feature flags — revealing admin-only features
Hardcoded URLs to internal systems
Leaked API keys, access tokens, or credentials
Comments with TODO notes referencing bugs or test accounts

Pull and analyze bundles

# Find JavaScript files
curl -s https://target.example.com/ | grep -oE 'src="[^"]*\.js[^"]*"' | sort -u

# Or use linkfinder / JSFinder for automated endpoint extraction
python3 linkfinder.py -i https://target.example.com -o cli

# For minified bundles, unminify before reading
js-beautify app.bundle.min.js > app.bundle.js

What to grep for in bundles

grep -oE '"/api/[^"]+"' app.bundle.js    # API endpoints
grep -iE 'token|secret|key|apikey|password' app.bundle.js
grep -oE 'https?://[^"]+' app.bundle.js   # internal URLs
grep -E 'TODO|FIXME|HACK' app.bundle.js   # developer notes
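The first grep above only catches double-quoted paths. Bundles also use single quotes and template literals, so a slightly broader pattern helps. The following is a sketch under that assumption. extract_api_paths is a hypothetical helper, and it will still miss paths that are assembled from fragments at runtime.

```shell
# Hypothetical helper: list unique /api/ paths in a JavaScript file, whether
# they are wrapped in double quotes, single quotes, or backticks.
extract_api_paths() {
  grep -oE "[\"'\`]/api/[^\"'\` ]+" "$1" |
    sed 's/^.//' |          # drop the opening quote character
    LC_ALL=C sort -u
}

# Example use (not run here):
#   extract_api_paths app.bundle.js
```

Template literals keep their placeholders (for example ${id}), which is useful: they mark where an object identifier goes in the route.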

Wayback and archive reconnaissance

archive.org has snapshots of many public sites going back years. Old versions reveal endpoints that no longer exist but may still respond, old JavaScript with different logic, forgotten admin panels.

waybackurls target.example.com > wayback.txt
gau target.example.com >> wayback.txt   # getallurls: Wayback + Common Crawl + OTX + URLScan
sort -u wayback.txt -o wayback.txt      # dedupe the merged list
# Filter for interesting paths
grep -E '/admin|/api|/internal|\.env|\.bak|\.log' wayback.txt
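Archived URL lists are dominated by the same path repeated with different query strings. Before re-probing old endpoints against the live site, it can help to reduce the list to unique paths. Below is a minimal sketch, assuming one absolute URL per line on stdin. unique_paths is a hypothetical helper.

```shell
# Hypothetical helper: reduce a URL list (stdin) to unique paths by removing
# the scheme and host, then any query string or fragment.
unique_paths() {
  sed -E -e 's#^[a-zA-Z]+://[^/]+##' -e 's/[?#].*$//' |
    grep -v '^$' |          # URLs with no path collapse to empty lines
    LC_ALL=C sort -u
}

# Example use (not run here):
#   unique_paths < wayback.txt > wayback-paths.txt
```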

Cloud asset discovery

Modern apps have cloud footprints beyond the primary domain:

S3 buckets named after the company — {company}-backups, {company}-static, etc.
Azure Blob storage — {company}.blob.core.windows.net
GCP Cloud Storage — storage.googleapis.com/{company}-*
CloudFront distributions, Elastic Load Balancers — often have predictable patterns

Tools: cloud_enum, s3scanner, Google dorks like site:amazonaws.com "companyname".
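The naming patterns above can be turned into a candidate list to hand to whichever tool you use. In this sketch the "backups" and "static" suffixes come from the examples above. The remaining suffixes are illustrative guesses, and bucket_candidates is a hypothetical helper. Only test names that fall inside your agreed scope.

```shell
# Hypothetical helper: print candidate bucket names for a company keyword.
# "backups" and "static" follow the patterns named in this module; the other
# suffixes are assumptions added for illustration.
bucket_candidates() {
  for bucket_suffix in backups static assets dev staging logs; do
    printf '%s-%s\n' "$1" "$bucket_suffix"
  done
}

# Example use (not run here):
#   bucket_candidates acme > bucket-names.txt
```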

Exercises

1. Enumerate a target. Pick any domain you have authorization to test (or use example.com for practice). Run Subfinder, then validate with httpx. How many live subdomains did you find? What percentage had SSL certificates visible in crt.sh that Subfinder missed? Compare approaches.

2. Fingerprint live targets. For 5 discovered subdomains, identify the technology stack using headers, cookies, error pages. Write a one-line summary of each.

3. Find a JavaScript endpoint. Pull a JavaScript bundle from any public web application you use. Grep for /api/ references. How many API endpoints can you extract from the bundle? How many are documented anywhere public?

Check your understanding

Which passive source is most likely to reveal subdomains that brute-forcing will miss?
Why is certificate transparency such a strong reconnaissance source?
What’s the tradeoff between gobuster dns brute-forcing and passive enumeration?
Where in an HTTP response would you look for technology fingerprints?
Why are JavaScript bundles a high-yield reconnaissance target?

Key takeaways

Passive sources (CT logs, aggregators) find subdomains brute-forcing never will.
Technology fingerprinting dictates which vulnerability classes to test first.
Directory enumeration finds undocumented endpoints — admin panels, backup files, APIs.
JavaScript bundles leak API routes and occasionally secrets; always pull and grep.
Wayback archives preserve endpoints that no longer exist but may still be reachable.

Take the 20-question quiz below to confirm your understanding. Pass with 70%+ to mark this module complete. Unlimited retries.


Real-World Case Study: Capital One, 2019

The story. A former AWS engineer exfiltrated 106 million US and Canadian Capital One customer records — names, addresses, credit scores, 140,000 SSNs, 80,000 bank account numbers. The breach didn’t start with an exploit. It started with recon.

The technical chain.

External recon: the attacker enumerated Capital One’s external surface and found a misconfigured WAF.
SSRF discovery: the WAF endpoint accepted user-supplied URLs and fetched them server-side.
EC2 metadata abuse: the attacker fetched http://169.254.169.254/latest/meta-data/iam/security-credentials/ and obtained temporary credentials for the IAM role WAF-Role.
S3 enumeration: those credentials had over-broad s3:ListBucket + s3:GetObject permissions across hundreds of buckets.
Exfiltration: two days of aws s3 sync later, ~30 GB of customer data had been exfiltrated.

What enumeration revealed. The attacker didn’t break in — they read the documentation. AWS metadata endpoints are public knowledge. nmap -sV against Capital One’s external IPs would have shown the WAF banner. gobuster on common WAF management paths would have surfaced the SSRF-able endpoint. The exploit was 4 lines of curl.

The takeaway. Recon is not “low-impact”. It’s the entire kill chain. Run external recon against your own perimeter before attackers do — and treat IMDSv1 as already compromised. Force IMDSv2 on every EC2 instance you own.
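For the IMDSv2 step, the AWS CLI command is aws ec2 modify-instance-metadata-options with --http-tokens required. The sketch below is a dry-run helper: it prints the command for each instance ID so the list can be reviewed before anything changes. print_imdsv2_commands is a hypothetical name, and the instance ID in the usage line is a placeholder.

```shell
# Hypothetical dry-run helper: read EC2 instance IDs (one per line) on stdin
# and print, without executing, the AWS CLI command that makes IMDSv2
# session tokens mandatory on each instance.
print_imdsv2_commands() {
  while IFS= read -r instance_id; do
    [ -n "$instance_id" ] || continue    # skip blank lines
    printf 'aws ec2 modify-instance-metadata-options --instance-id %s --http-tokens required --http-endpoint enabled\n' "$instance_id"
  done
}

# Example use (not run here):
#   printf '%s\n' i-0123456789abcdef0 | print_imdsv2_commands
```

Applications that still call the metadata service without a session token will break once tokens are required, so check for IMDSv1 usage before applying the change broadly.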

⚙ Optimisation · Performance · Security — extended

Practical depth on what to tune, what to harden, and how this maps to Indian regulatory expectations.

Optimising recon — speed without losing accuracy

1. Parallelism: ffuf with -t 100 threads is usually fine on internet targets; tune down to 20-40 against rate-limited endpoints.

2. Wordlist tiering: start with raft-medium-directories.txt (~30k entries); only escalate to seclists-big.txt when the medium pass is dry.

3. Status-code filtering: -fc 404,400,403 reduces noise; analyse interesting 200/301/302/401/500 responses.

4. Smart wordlists: cewl generates target-specific wordlists from the site’s own content.

5. Skip duplicate hosts: subdomains often share infrastructure; httpx resolves and dedupes.

6. Cache previous recon results; do not re-scan static targets every run. The difference between 30 minutes and 4 hours of recon is mostly tooling discipline.

Subdomain enumeration — passive plus active, in that order

Passive sources (free, fast, no traffic to target): subfinder aggregates Censys, Shodan, VirusTotal, and crt.sh; amass covers the same ground plus more sources; sublist3r remains an option for legacy workflows. Querying crt.sh directly at https://crt.sh/?q=%25.target.com&output=json returns the certificates logged for the domain, including staging, dev, and test hosts that were never meant to be public.

Active sources (slower, leaves traces): amass enum -active, brute-forcing with massdns and a ~1M-entry wordlist, and shuffledns for filtered results.

Always pipe the combined list through httpx -title -tech-detect to identify which subdomains actually serve content.

For Indian targets specifically: ASN-based discovery (every Indian operator has a published ASN range) catches infrastructure that DNS-only enumeration misses.

Operational checklist — recon that produces actionable findings

1. Define scope in writing before starting; out-of-scope targets generate legal risk under IT Act §66 in India.

2. Maintain a target-specific wordlist that grows with each engagement.

3. Save all output (JSON preferred) for later correlation.

4. Run continuous recon on long-term clients via recon-ng workspaces.

5. Validate every interesting finding manually; automation produces false positives constantly.

6. Generate diff reports between runs to highlight changes.

7. Flag exposed admin panels, debug endpoints, or staging instances as P1 findings.

8. Keep a personal “interesting strings” wordlist (banking words for BFSI clients, healthcare terms for hospital clients) that improves discovery quality over time.
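The diff report in the checklist can be as simple as comparing two saved host lists. The following is a minimal sketch using sort and comm, assuming each run was saved as a plain text file with one hostname per line. diff_runs and the file names in the usage line are hypothetical.

```shell
# Hypothetical helper: compare two saved host lists (previous run, current run).
# Prints "+ host" for hosts that are new and "- host" for hosts that disappeared.
diff_runs() {
  prev_sorted=$(mktemp) && curr_sorted=$(mktemp) || return 1
  LC_ALL=C sort -u "$1" > "$prev_sorted"
  LC_ALL=C sort -u "$2" > "$curr_sorted"
  LC_ALL=C comm -13 "$prev_sorted" "$curr_sorted" | sed 's/^/+ /'   # only in current
  LC_ALL=C comm -23 "$prev_sorted" "$curr_sorted" | sed 's/^/- /'   # only in previous
  rm -f "$prev_sorted" "$curr_sorted"
}

# Example use (not run here):
#   diff_runs subs-last-week.txt subs-today.txt
```

New hosts are the ones to triage first. Hosts that vanished are worth a note too, since a removed DNS record sometimes leaves a dangling cloud resource behind.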

Detection — when blue teams catch your recon

Modern WAFs (Cloudflare, Akamai, Imperva) detect recon by request-rate per IP, JA3/JA4 fingerprint of common scanner clients, User-Agent strings, and request-pattern (sequential alphabetical paths).

Evasion patterns: rotate through residential proxies (Bright Data, Smart Proxy), randomise User-Agent, slow down to “human-like” rates, mask JA3 (Burp’s upstream proxy + uTLS-based clients).

Defender takeaway: recon detection is one of the highest-ROI WAF rule sets; alerts on “host probed >50 distinct paths in 5 minutes from one source” catch most automated tooling.
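To make the ">50 distinct paths in 5 minutes" rule concrete, here is a sketch over a simplified log. It assumes each line is "epoch-seconds source-ip path", a hypothetical format, so real WAF or access logs would need converting first. It uses fixed 5-minute buckets rather than a true sliding window, which means a burst that straddles a bucket boundary can be undercounted.

```shell
# Hypothetical detector: read "epoch-seconds source-ip path" lines on stdin and
# print each source IP that requested more than N distinct paths within one
# fixed 5-minute bucket. N defaults to 50.
detect_recon() {
  awk -v limit="${1:-50}" '
    {
      bucket = int($1 / 300)                 # 300 s = 5 minutes
      seen_key = $2 SUBSEP bucket SUBSEP $3
      if (!(seen_key in seen)) {             # count each path once per IP and bucket
        seen[seen_key] = 1
        distinct[$2 SUBSEP bucket]++
      }
    }
    END {
      for (k in distinct)
        if (distinct[k] > limit + 0) {
          split(k, parts, SUBSEP)
          flagged[parts[1]] = 1
        }
      for (ip in flagged) print ip
    }' | LC_ALL=C sort -u
}

# Example use (not run here):
#   detect_recon 50 < requests.log
```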

For Indian SOCs: integrate WAF logs into the SIEM and create a “recon detected” rule with auto-block-then-review. The 2022 CERT-In directive expects 180-day log retention specifically because retroactive recon analysis often reveals the precursor to a later breach.

Recon pipeline — passive then active

  TARGET
    │
    ▼
  [Passive sources]
    crt.sh  •  Censys  •  Shodan  •  VirusTotal  •  GitHub
    │
    ▼
  [Aggregation]   subfinder, amass passive
    │
    ▼
  [Resolution]    massdns / shuffledns
    │
    ▼
  [Live check]    httpx -title -tech-detect
    │
    ▼
  [Active probe]  ffuf, gobuster, nuclei (with caution)
    │
    ▼
  [Manual triage] interesting endpoints, admin panels, staging

Additional FAQs

Is automated recon legal in India?

Authorised testing under signed Statement of Work / Rules of Engagement is fully legal. Unauthorised scanning even of public targets risks IT Act §43/§66 charges. Always have written authorisation; never assume “the target is public” implies consent.

How do I avoid getting blocked during recon?

Rotate IPs (residential proxies), spread requests over time, vary User-Agent, mimic browser TLS fingerprints, and respect robots.txt as a soft signal. The fastest way to be blocked is sequential 1-thread brute-force on a single IP.

When should I escalate from recon to active testing?

Once you have a complete inventory of in-scope hosts, services, and obvious tech stack. Active testing without complete recon misses the easy wins (forgotten staging, debug endpoints, leaked credentials).

