Module 2 · Web Enumeration & Recon

Manish Garg Associate CISSP · RingSafe
April 19, 2026
7 min read
🎯 WEB APP PENTEST PATH
EASY
⏱ 90 min
Module 2 of 8

What you’ll learn

  • Passive DNS reconnaissance — finding assets without touching the target
  • Subdomain enumeration with multiple sources (crt.sh, Subfinder, Amass, VirusTotal)
  • Technology fingerprinting — how to read what a server is running
  • Directory and file brute-forcing with ffuf, gobuster, and smart wordlists
  • JavaScript bundle analysis — extracting hidden endpoints from client code

Prerequisites: Module 1 (HTTP & Web Fundamentals). Familiarity with a Linux terminal.

Before you test anything, you need to know what’s there. In real engagements, “what’s there” is almost always more than the documentation claims. Staging subdomains that accidentally made it to production DNS. Old API versions that were “deprecated” three years ago but never actually turned off. Admin panels that were hidden behind “security through obscurity” — no link to them anywhere, but the URL still responds if you know it. Reconnaissance is the phase where you find all of this.

This module covers the reconnaissance techniques that produce the most value per hour for web application testers. We’re going to skip the reconnaissance frameworks that try to do everything and focus on the commands that actually matter.

Passive vs active reconnaissance

Passive reconnaissance: gathering information without sending traffic to the target. Public DNS records, search-engine results, certificate transparency logs, archived web pages. The target doesn’t see it.

Active reconnaissance: directly probing the target. Port scans, directory brute-forcing, HTTP requests. The target’s logs will show activity.

For web app testing with client permission, active reconnaissance is fine — you’re scoped. For pre-engagement OSINT or bug bounty work on shared infrastructure, start passive and only go active within explicit scope.
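One way to make "only go active within explicit scope" mechanical is to check each hostname against a scope list before probing it. A minimal sketch, assuming a hypothetical scope file with one in-scope root domain per line; the function name in_scope is illustrative, not part of any tool:

```shell
# in_scope HOST SCOPE_FILE
# Succeeds (exit 0) if HOST equals, or is a subdomain of, any root domain
# listed one per line in SCOPE_FILE. Both names are hypothetical.
in_scope() {
  host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  while IFS= read -r root; do
    [ -n "$root" ] || continue            # skip blank lines
    case "$host" in
      "$root"|*".$root") return 0 ;;      # exact match or dot-delimited suffix
    esac
  done < "$2"
  return 1
}
```

The dot-delimited suffix match matters: evil-example.com must not pass just because it ends in example.com.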

Subdomain enumeration

Finding subdomains is the single highest-yield reconnaissance activity. Every subdomain is potentially a separate application with its own vulnerabilities. Acquisitions, internal tools, staging environments, forgotten projects — all leak through subdomain enumeration.

1. Certificate transparency logs (crt.sh)

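Publicly trusted CAs submit the certificates they issue to Certificate Transparency (CT) logs, and since April 2018 Chrome has required this for newly issued certificates, so coverage is close to complete. Every hostname named in a certificate for example.com shows up in these logs. A wildcard certificate (*.example.com) only exposes the wildcard entry, not the individual hosts behind it. This is the highest-signal passive source.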

# Via web
https://crt.sh/?q=%.example.com

# Via API, extract unique domain names (%25 is the URL-encoded % wildcard)
curl -s "https://crt.sh/?q=%25.example.com&output=json" |
  jq -r '.[].name_value' | sort -u
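Raw CT output is noisy: it mixes case, includes wildcard entries, and can contain names outside the domain you asked for. A small cleanup sketch (the function name normalize_names is illustrative) that you could pipe the jq output through:

```shell
# normalize_names DOMAIN
# Reads hostnames on stdin (one per line), lowercases them, strips a leading
# "*." wildcard label, keeps only names equal to or under DOMAIN, and
# de-duplicates the result.
normalize_names() {
  tr '[:upper:]' '[:lower:]' |
    sed 's/^\*\.//' |
    awk -v d="$1" '
      $0 == d { print; next }
      length($0) > length(d) + 1 &&
        substr($0, length($0) - length(d)) == "." d { print }
    ' |
    sort -u
}
```

Usage would look like: curl ... | jq -r '.[].name_value' | normalize_names example.com > crt.txt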

2. Subfinder — aggregating multiple sources

Subfinder queries dozens of passive sources (crt.sh, VirusTotal, Shodan, SecurityTrails, etc.) and merges results. With API keys for premium sources, the results are dramatically better.

subfinder -d example.com -all -silent -o subs.txt
# -all uses all sources including slower ones
# Typical output: hundreds to thousands of subdomains for a medium target

3. Amass — deeper, slower, more thorough

Amass does passive sources plus active DNS resolution plus certificate analysis plus reverse DNS. It takes much longer than Subfinder but finds things Subfinder misses.

amass enum -d example.com -o amass.txt
# Slow (hours for large targets); most thorough free option
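To see what the slower tool actually added, and to build one combined list, plain text tools are enough. A sketch assuming subs.txt and amass.txt hold one hostname per line; the function names are illustrative:

```shell
# only_in_second FIRST SECOND
# Prints hostnames that appear in SECOND but not in FIRST.
only_in_second() {
  grep -Fxv -f "$1" "$2" | sort -u     # -F fixed strings, -x whole line, -v invert
}

# merge_lists FILE...
# Prints the de-duplicated union of all input files.
merge_lists() {
  sort -u "$@"
}
```

For example, only_in_second subs.txt amass.txt lists what Amass found that Subfinder missed, and merge_lists subs.txt amass.txt > all-subs.txt gives a single list to validate.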

4. Validate discovered subdomains

Raw lists contain dead subdomains (old records pointing to nothing). Validate which are actually live with httpx:

cat subs.txt | httpx -silent -status-code -title -tech-detect -o live.txt
# Each output line: URL [status code] [page title] [detected technologies]
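Once live.txt exists, you usually want to slice it by status code: 200s to browse, 401/403s to probe for access-control gaps. A sketch that assumes each line is plain text of the form URL [status] [title] [tech] with no colour codes; verify against your own httpx output before relying on it:

```shell
# by_status CODE
# Reads httpx-style lines on stdin and prints the URL of every line whose
# second whitespace-separated field is exactly "[CODE]".
by_status() {
  awk -v want="[$1]" '$2 == want { print $1 }'
}
```

For example, by_status 403 < live.txt prints only the hosts that answered 403.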

5. Subdomain brute-forcing (active)

When passive sources run dry, brute-force with a wordlist:

gobuster dns -d example.com -w /path/to/subdomain-wordlist.txt -t 50
# Flag names vary between gobuster releases; confirm with: gobuster dns --help
# Good wordlists: SecLists' dns-Jhaddix.txt, or bitquark's top-1million
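Conceptually, DNS brute-forcing prepends each wordlist entry to the domain and tries to resolve the result. A sketch of just the candidate-generation half, which makes it easy to see (and count) what will be queried; gen_candidates is an illustrative name:

```shell
# gen_candidates DOMAIN
# Reads a wordlist on stdin and prints one lowercase candidate hostname per
# word, skipping blank lines and lines starting with "#".
gen_candidates() {
  awk -v d="$1" 'NF && $0 !~ /^#/ { print tolower($1) "." d }'
}
```

The output can be handed to whatever resolver you prefer; the generation step itself sends no traffic.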

Technology fingerprinting

Identifying the server stack tells you what classes of vulnerability to test first. A PHP 7.4 app on Apache has different attack patterns than a Node.js app on Cloudflare Workers.

Sources of fingerprints:

  • HTTP response headers: Server, X-Powered-By, X-AspNet-Version. These should be stripped in production but often aren’t.
  • Cookie names: PHPSESSID (PHP), JSESSIONID (Java), csrftoken (Django), laravel_session (Laravel), etc.
  • Error pages: trigger a 404 or 500 and observe the stack trace or error template. Framework signatures are distinctive.
  • Static asset paths: /wp-content/ screams WordPress, /_next/ is Next.js, /__webpack_hmr exposes Webpack dev server.
  • Favicon hash: favicon.ico hashes map to specific applications via Shodan’s favicon search.
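The header and cookie checks in this list are easy to script. A minimal sketch that reads raw response headers on stdin and prints a hint for each signal named above; the function name and output format are illustrative, and the mapping covers only the cookies listed here:

```shell
# fingerprint_headers
# Reads raw HTTP response headers on stdin (for example from curl -sI URL)
# and prints one hint per recognised header or session-cookie name.
fingerprint_headers() {
  tr -d '\r' | awk '
    {
      i = index($0, ":")                       # 0 when the line has no colon
      name = tolower(substr($0, 1, i - 1))
      val = substr($0, i + 1)
      sub(/^[ \t]+/, "", val)
    }
    name == "server" || name == "x-powered-by" || name == "x-aspnet-version" {
      print name ": " val
    }
    name == "set-cookie" && val ~ /^PHPSESSID=/       { print "cookie: PHP" }
    name == "set-cookie" && val ~ /^JSESSIONID=/      { print "cookie: Java" }
    name == "set-cookie" && val ~ /^csrftoken=/       { print "cookie: Django" }
    name == "set-cookie" && val ~ /^laravel_session=/ { print "cookie: Laravel" }
  '
}
```

Usage: curl -sI https://target.example.com | fingerprint_headers. Treat the output as a hint, since headers and cookie names can be renamed or spoofed.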

Tools:

# Wappalyzer CLI
wappalyzer https://target.example.com

# Httpx with tech detection
echo target.example.com | httpx -tech-detect

# Manual header inspection
curl -I https://target.example.com

# Technology detection with Nuclei (runs the templates tagged "tech")
nuclei -u https://target.example.com -tags tech

Directory and file enumeration

Web applications have more URLs than they advertise. Admin consoles, API endpoints, backup files, .git directories, configuration files — all may be present and reachable.

ffuf — the modern standard

ffuf -u https://target.example.com/FUZZ \
  -w /usr/share/seclists/Discovery/Web-Content/raft-medium-directories.txt \
  -mc 200,204,301,302,307,401,403 \
  -fs 0 \
  -o results.json -of json

# Breakdown:
# FUZZ is the placeholder where wordlist entries go
# -mc matches specified status codes (interesting responses)
# -fs 0 filters responses of exact size 0 (empty responses)
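Filtering on size 0 only removes empty responses. Many applications answer every unknown path with the same "not found" page, so a better -fs value is often whatever size dominates a first run. A sketch that assumes you have reduced your results to hypothetical lines of the form STATUS SIZE URL:

```shell
# most_common_size
# Reads "STATUS SIZE URL" lines on stdin and prints the response size that
# occurs most often, followed by how many times it occurred.
most_common_size() {
  awk '{ n[$2]++ } END { for (s in n) print n[s], s }' |
    sort -k1,1nr -k2,2n | head -n 1 | awk '{ print $2, $1 }'
}
```

If ffuf's JSON output keeps hits under .results with status, length, and url fields (an assumption to check against your own results.json), something like jq -r '.results[] | "\(.status) \(.length) \(.url)"' results.json would produce the expected input.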

Recursion — go deeper after discovery

When you find /admin/, enumerate inside it. ffuf supports this with -recursion, and -recursion-depth caps how many levels it descends.

Extension bruting

# Append extensions to every wordlist entry
ffuf -u https://target.example.com/FUZZ \
  -w wordlist.txt \
  -e .php,.bak,.old,.backup,.zip,.sql,.env,.log \
  -mc 200,204
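The same idea works in reverse on files you have already found: editors and admins tend to leave copies like config.php.bak next to the original. A sketch that expands known paths into backup-style candidates, using a subset of the extensions from the -e list above; backup_candidates is an illustrative name:

```shell
# backup_candidates
# Reads known file paths on stdin and prints each one with every
# backup-style suffix appended.
backup_candidates() {
  while IFS= read -r path; do
    [ -n "$path" ] || continue           # skip blank lines
    for ext in .bak .old .backup .zip; do
      printf '%s%s\n' "$path" "$ext"
    done
  done
}
```

The resulting list can be used as a normal ffuf wordlist with -w.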

Wordlist selection

Good wordlists: SecLists (/Discovery/Web-Content/ directory). Specific picks:

  • raft-medium-directories.txt — general starting point
  • common.txt — quick first pass
  • api/ directory wordlists — for API endpoints
  • Technology-specific wordlists (apache.txt, iis.txt) once you know the stack

JavaScript bundle analysis

Modern web apps ship significant logic to the client in JavaScript bundles. Those bundles often contain:

  • API endpoints the app calls — revealing the full backend API surface
  • Role-based feature flags — revealing admin-only features
  • Hardcoded URLs to internal systems
  • Leaked API keys, access tokens, or credentials
  • Comments with TODO notes referencing bugs or test accounts

Pull and analyze bundles

# Find JavaScript files
curl -s https://target.example.com/ | grep -oE 'src="[^"]*\.js[^"]*"' | sort -u

# Or use linkfinder / JSFinder for automated endpoint extraction
python3 linkfinder.py -i https://target.example.com -o cli

# For minified bundles, unminify before reading
js-beautify app.bundle.min.js > app.bundle.js
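The grep above prints raw src attributes, which are often relative. To download the bundles you need absolute URLs. A sketch that resolves the common cases; script_urls is an illustrative name, and it does not handle ../ segments or a base tag:

```shell
# script_urls BASE_URL
# Reads HTML on stdin, extracts src="...js..." attributes, and prints unique
# absolute URLs. Absolute, protocol-relative, and root-relative values are
# handled; anything else is joined to BASE_URL as-is.
script_urls() {
  base=${1%/}                                  # drop one trailing slash
  origin=$(printf '%s' "$base" | sed -E 's#^(https?://[^/]+).*#\1#')
  grep -oE 'src="[^"]*\.js[^"]*"' |
    sed -E 's/^src="//; s/"$//' |
    awk -v base="$base" -v origin="$origin" '
      /^https?:\/\// { print; next }           # already absolute
      /^\/\//        { print "https:" $0; next }
      /^\//          { print origin $0; next }
                     { print base "/" $0 }
    ' | sort -u
}
```

Usage: curl -s https://target.example.com/ | script_urls https://target.example.com/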

What to grep for in bundles

grep -oE '"/api/[^"]+"' app.bundle.js    # API endpoints
grep -iE 'token|secret|key|apikey|password' app.bundle.js
grep -oE 'https?://[^"]+' app.bundle.js   # internal URLs
grep -E 'TODO|FIXME|HACK' app.bundle.js   # developer notes
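The first grep only catches double-quoted strings, and bundles also use single quotes and template literals. A slightly broader sketch that strips the quotes and de-duplicates, so the output is a clean endpoint list; api_endpoints is an illustrative name:

```shell
# api_endpoints
# Reads JavaScript on stdin and prints unique quoted strings that start
# with /api/, with the surrounding quote characters removed.
api_endpoints() {
  grep -oE "[\"'\`]/api/[^\"'\`]+[\"'\`]" |
    sed -E 's/^.//; s/.$//' |
    sort -u
}
```

Usage: api_endpoints < app.bundle.js. Endpoints built by string concatenation will still be missed, so treat the list as a lower bound.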

Wayback and archive reconnaissance

archive.org has snapshots of many public sites going back years. Old versions reveal endpoints that no longer exist but may still respond, old JavaScript with different logic, forgotten admin panels.

waybackurls target.example.com > wayback.txt
gau target.example.com >> wayback.txt   # getallurls: Wayback + Common Crawl + OTX + URLScan
sort -u wayback.txt -o wayback.txt      # de-duplicate the combined list
# Filter for interesting paths
grep -E '/admin|/api|/internal|\.env|\.bak|\.log' wayback.txt
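Archived URLs also make a good target-specific wordlist, because they are paths this application really served at some point. A sketch that reduces a URL list to unique paths; url_paths is an illustrative name:

```shell
# url_paths
# Reads URLs on stdin and prints the unique paths with scheme, host, leading
# slash, query string, and fragment removed. Bare-host URLs are dropped.
url_paths() {
  sed -E 's#^[a-zA-Z][a-zA-Z0-9+.-]*://[^/]*/?##; s/[?#].*$//' |
    grep -v '^$' |
    sort -u
}
```

For example, url_paths < wayback.txt > target-paths.txt produces a list that fits the /FUZZ position used in the ffuf commands earlier.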

Cloud asset discovery

Modern apps have cloud footprints beyond the primary domain:

  • S3 buckets named after the company — {company}-backups, {company}-static, etc.
  • Azure Blob storage — {company}.blob.core.windows.net
  • GCP Cloud Storage — storage.googleapis.com/{company}-*
  • CloudFront distributions, Elastic Load Balancers — often have predictable patterns

Tools: cloud_enum, s3scanner, Google dorks like site:amazonaws.com "companyname".
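The naming patterns above are simple enough to generate yourself before handing them to a scanner. A sketch in which the suffix list is an assumption: backups and static come from the examples above, the rest are illustrative guesses you should replace with terms relevant to the target:

```shell
# bucket_candidates COMPANY
# Prints candidate storage-bucket names for COMPANY: the bare name, then the
# name joined to each suffix with a hyphen. The suffix list is illustrative.
bucket_candidates() {
  company=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf '%s\n' "$company"
  for suffix in backups static assets dev staging; do
    printf '%s-%s\n' "$company" "$suffix"
  done
}
```

Generating names sends no traffic; checking whether they exist is active reconnaissance and needs to be in scope.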

Exercises

1. Enumerate a target. Pick a domain you own or are authorized to test. Run Subfinder, then validate with httpx. How many live subdomains did you find? Query crt.sh for the same domain: how many subdomains appear there that Subfinder missed? Compare the two approaches.

2. Fingerprint live targets. For 5 discovered subdomains, identify the technology stack using headers, cookies, error pages. Write a one-line summary of each.

3. Find a JavaScript endpoint. Pull a JavaScript bundle from any public web application you use. Grep for /api/ references. How many API endpoints can you extract from the bundle? How many are documented anywhere public?

Check your understanding

  • Which passive source is most likely to reveal subdomains that brute-forcing will miss?
  • Why is certificate transparency such a strong reconnaissance source?
  • What’s the tradeoff between gobuster dns brute-forcing and passive enumeration?
  • Where in an HTTP response would you look for technology fingerprints?
  • Why are JavaScript bundles a high-yield reconnaissance target?

Key takeaways

  • Passive sources (CT logs, aggregators) find subdomains brute-forcing never will.
  • Technology fingerprinting dictates which vulnerability classes to test first.
  • Directory enumeration finds undocumented endpoints — admin panels, backup files, APIs.
  • JavaScript bundles leak API routes and occasionally secrets; always pull and grep.
  • Wayback archives preserve endpoints that no longer exist but may still be reachable.

Take the 20-question quiz below to confirm your understanding. Pass with 70%+ to mark this module complete. Unlimited retries.

Module Quiz · 20 questions