Web Content Discovery

Attacking a web application requires an understanding of its attack surface, such as accessible URLs and associated input fields. Performing this type of mapping is relatively simple, but failing to map a target effectively can result in vulnerabilities being missed.

It’s important to note that you may want to repeat some of these steps as new information becomes available.

Determining Virtual Hosts

Most web servers support virtual hosting, where a single IP address serves multiple websites. To configure this in Apache, add a site configuration file to sites-available and activate it with a2ensite:

┌──(root㉿kali)-[/etc/apache2/sites-available]
└─# cat /etc/apache2/sites-available/dev.bordergate.co.uk.conf
<VirtualHost *:80>
    ServerAdmin test@test.com
    ServerName dev.bordergate.co.uk
    ServerAlias www.dev.bordergate.co.uk
    DocumentRoot /var/www/dev.bordergate.co.uk/public_html
    <Directory /var/www/dev.bordergate.co.uk/public_html>
        Options Indexes FollowSymLinks
        AllowOverride All
        Require all granted
    </Directory>
    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined
    <IfModule mod_dir.c>
        DirectoryIndex index.html index.php
    </IfModule>
</VirtualHost>

┌──(root㉿kali)-[/etc/apache2/sites-available]
└─# sudo a2ensite dev.bordergate.co.uk

The web server decides which website to serve by looking at the Host header in an HTTP request:

GET / HTTP/1.1
Host: dev.bordergate.co.uk
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: close
Upgrade-Insecure-Requests: 1

We can use ffuf to brute-force the Host header and identify the subdomain we just configured. In this instance, we’re filtering out any responses of size 10701, since this is the size of the server’s default response. ffuf can also be supplied the “-ac” flag to attempt to auto-calibrate its response filtering.

ffuf -w /usr/share/wordlists/seclists/Discovery/DNS/subdomains-top1million-5000.txt \
-u http://127.0.0.1 -H "Host: FUZZ.bordergate.co.uk" -fs 10701

        /'___\  /'___\           /'___\       
       /\ \__/ /\ \__/  __  __  /\ \__/       
       \ \ ,__\\ \ ,__\/\ \/\ \ \ \ ,__\      
        \ \ \_/ \ \ \_/\ \ \_\ \ \ \ \_/      
         \ \_\   \ \_\  \ \____/  \ \_\       
          \/_/    \/_/   \/___/    \/_/       

       v2.0.0-dev
________________________________________________

 :: Method           : GET
 :: URL              : http://127.0.0.1
 :: Wordlist         : FUZZ: /usr/share/wordlists/seclists/Discovery/DNS/subdomains-top1million-5000.txt
 :: Header           : Host: FUZZ.bordergate.co.uk
 :: Follow redirects : false
 :: Calibration      : false
 :: Timeout          : 10
 :: Threads          : 40
 :: Matcher          : Response status: 200,204,301,302,307,401,403,405,500
 :: Filter           : Response size: 10701
________________________________________________

[Status: 200, Size: 23, Words: 2, Lines: 2, Duration: 0ms]
    * FUZZ: dev

[Status: 200, Size: 23, Words: 2, Lines: 2, Duration: 587ms]
    * FUZZ: www.dev

:: Progress: [4989/4989] :: Job [1/1] :: 66 req/sec :: Duration: [0:00:04] :: Errors: 0 ::
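
To make the mechanics concrete, here is a minimal Python sketch of the same idea: send one request per candidate name to the same IP address, override the Host header, and keep only the responses whose size differs from the default site. The function names are my own for illustration, and measuring the body length is an assumption about how closely this mirrors ffuf's size filter.

```python
import http.client


def candidate_hosts(words, domain):
    """Build one Host header value per wordlist entry, e.g. dev.bordergate.co.uk."""
    return [f"{word.strip()}.{domain}" for word in words if word.strip()]


def response_size(ip, host, port=80, timeout=10):
    """Request / from the server at `ip`, overriding the Host header."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("GET", "/", headers={"Host": host})
        return len(conn.getresponse().read())
    finally:
        conn.close()


def interesting(sizes, default_size):
    """Drop hosts whose body size matches the default site (similar to -fs)."""
    return sorted(host for host, size in sizes.items() if size != default_size)


# Example use against the lab server from this post (not executed here):
#   hosts = candidate_hosts(open("subdomains.txt"), "bordergate.co.uk")
#   sizes = {h: response_size("127.0.0.1", h) for h in hosts}
#   print(interesting(sizes, default_size=10701))
```

This is single threaded and far slower than ffuf, so treat it as an explanation of the technique rather than a replacement for the tool.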


Manually Examine the Target

One mistake I’ve seen people make is starting off by running tools against the target application. It’s worth visiting the site first and making notes of anything of interest. This could include:

  • Any login functionality
  • Any input forms
  • Contact details of site users
  • The contents of robots.txt
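
For the robots.txt item, the entries of most interest are usually the Disallow rules, since they list paths the site owner would rather not have indexed. A rough Python sketch for pulling them out is below. The sample file contents are made up for illustration, and the parsing is deliberately simplistic (it ignores which User-agent group a rule belongs to).

```python
def disallowed_paths(robots_txt):
    """Return the non-empty Disallow values found in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt content, not taken from a real site.
sample = """User-agent: *
Disallow: /private/
Disallow: /wp-admin/  # admin area
Allow: /public/
Disallow:
"""

print(disallowed_paths(sample))
```

Each path returned is worth visiting manually and adding to your notes.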

Finding Hidden Comments

In BurpSuite, selecting Engagement Tools > Find Comments will show any comments included in HTML. Comments often reveal the technology in use on the site, and may include useful notes left by developers.
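
If you want to do something similar outside of Burp, the Python standard library can extract HTML comments from a saved page. This is only a sketch of the idea, not how Burp implements the feature, and the sample markup is invented.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every HTML comment fed to the parser."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        text = data.strip()
        if text:
            self.comments.append(text)


def find_comments(html):
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return collector.comments


# Hypothetical page source for illustration.
page = (
    "<html><!-- TODO: remove test account --><body><p>Hi</p>"
    "<!-- built with ExampleCMS 1.2 --></body></html>"
)
print(find_comments(page))
```

Note that this only covers HTML comments. Comments inside JavaScript or CSS files would need separate handling.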


Identifying Technology

Wappalyzer is a great extension for Firefox that will assist in determining the technology in use on a site.

On the command line, whatweb can be used to determine the same type of information:

whatweb www.bordergate.co.uk
http://www.bordergate.co.uk [302 Found] Apache, Country[UNITED STATES][US], HTTPServer[Apache], IP[15.237.112.139], RedirectLocation[https://www.bordergate.co.uk/], Title[302 Found], X-Frame-Options[SAMEORIGIN]
https://www.bordergate.co.uk/ [200 OK] Apache, Bootstrap[4.0.0], Country[UNITED STATES][US], HTML5, HTTPServer[Apache], IP[15.237.112.139], JQuery[3.6.4], MetaGenerator[WordPress 6.2.1], Open-Graph-Protocol[website], PHP[7.4.13], Script[application/ld+json], Title[BorderGate &gt; Penetration Testing &amp; Exploit Development], UncommonHeaders[link,x-mod-pagespeed], WordPress[6.2.1], X-Frame-Options[SAMEORIGIN], X-Powered-By[PHP/7.4.13]
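
Much of this information comes straight from the response headers. As a heavily simplified sketch of that one part of fingerprinting (whatweb itself uses a large plugin set, which this does not attempt to reproduce), the following picks out headers that commonly disclose the software in use. The list of header names is my own choice.

```python
def fingerprint(headers):
    """Return the response headers that commonly disclose server technology."""
    revealing = ("server", "x-powered-by", "x-generator", "x-aspnet-version")
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in revealing if name in lowered}


# Header values matching the whatweb output above.
observed = {
    "Server": "Apache",
    "X-Powered-By": "PHP/7.4.13",
    "X-Frame-Options": "SAMEORIGIN",
}
print(fingerprint(observed))
```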

Having a general overview of the technology in use will then allow you to select scanners specifically designed for the target. For instance, wpscan could be used for WordPress sites.

For sites not based on a specific technology, a couple of good scanners are nikto and nuclei:

nuclei -u dev.bordergate.co.uk
nikto --url http://dev.bordergate.co.uk

Mapping Web Pages & Parameters

There are two approaches to discovering web pages:

  • Spidering the site based on links to content identified
  • Guessing file names based on common naming conventions

BurpSuite will attempt both when you select Engagement Tools > Discover Content.

Whilst BurpSuite does a good job on most sites, using dedicated fuzzing tools can yield additional results.
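
The spidering approach boils down to extracting links from each page and queueing the ones that stay on the target host. A minimal sketch of the link extraction step in Python is shown below. The class and function names are illustrative, the sample HTML is invented, and a real spider would also fetch each discovered link and repeat the process.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from href, src and action attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.add(absolute)


def same_site_links(base_url, html):
    """Return the sorted links that point at the same host as base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    return sorted(link for link in collector.links if urlparse(link).netloc == host)


# Hypothetical page content for illustration.
html = (
    '<a href="/private/">p</a><a href="https://example.com/">x</a>'
    '<form action="login.php"></form>'
    '<script src="/javascript/jquery/jquery.js"></script>'
)
print(same_site_links("http://dev.bordergate.co.uk/", html))
```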

Fuzz Directories & Files

ffuf can be used to recursively brute-force directories that exist in the target application. raft-medium-directories.txt from SecLists is normally good for this purpose.

ffuf -w /usr/share/wordlists/seclists/Discovery/Web-Content/raft-medium-directories.txt \
-u http://dev.bordergate.co.uk/FUZZ -recursion -o output.json

________________________________________________

 :: Method           : GET
 :: URL              : http://dev.bordergate.co.uk/FUZZ
 :: Wordlist         : FUZZ: /usr/share/wordlists/seclists/Discovery/Web-Content/raft-medium-directories.txt
 :: Output file      : output.json
 :: File format      : json
 :: Follow redirects : false
 :: Calibration      : false
 :: Timeout          : 10
 :: Threads          : 40
 :: Matcher          : Response status: 200,204,301,302,307,401,403,405,500
________________________________________________

[Status: 301, Size: 330, Words: 20, Lines: 10, Duration: 0ms]
    * FUZZ: private
[INFO] Adding a new job to the queue: http://dev.bordergate.co.uk/private/FUZZ
[Status: 301, Size: 333, Words: 20, Lines: 10, Duration: 0ms]
    * FUZZ: javascript
[INFO] Adding a new job to the queue: http://dev.bordergate.co.uk/javascript/FUZZ
[Status: 301, Size: 329, Words: 20, Lines: 10, Duration: 0ms]
    * FUZZ: public
[INFO] Adding a new job to the queue: http://dev.bordergate.co.uk/public/FUZZ
[Status: 200, Size: 21282, Words: 495, Lines: 351, Duration: 0ms]
    * FUZZ: server-status
[Status: 200, Size: 23, Words: 2, Lines: 2, Duration: 0ms]
    * FUZZ: 
[INFO] Starting queued job on target: http://dev.bordergate.co.uk/private/FUZZ
[Status: 301, Size: 337, Words: 20, Lines: 10, Duration: 0ms]
    * FUZZ: secret
[INFO] Adding a new job to the queue: http://dev.bordergate.co.uk/private/secret/FUZZ

The JSON output can then be parsed with jq:

cat output.json | jq | grep "url"
      "url": "http://dev.bordergate.co.uk/private",
      "url": "http://dev.bordergate.co.uk/javascript",
      "url": "http://dev.bordergate.co.uk/public",
      "url": "http://dev.bordergate.co.uk/server-status",
      "url": "http://dev.bordergate.co.uk/",
      "url": "http://dev.bordergate.co.uk/private/secret",
      "url": "http://dev.bordergate.co.uk/private/",
      "url": "http://dev.bordergate.co.uk/javascript/jquery",
      "url": "http://dev.bordergate.co.uk/javascript/",
      "url": "http://dev.bordergate.co.uk/javascript/skeleton",
      "url": "http://dev.bordergate.co.uk/javascript/jquery-ui",
      "url": "http://dev.bordergate.co.uk/public/",
      "url": "http://dev.bordergate.co.uk/private/secret/files",
      "url": "http://dev.bordergate.co.uk/private/secret/",
      "url": "http://dev.bordergate.co.uk/javascript/jquery/jquery",

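If you would rather post-process the results in a script, the same file can be read with Python. The sketch below assumes the report has a top-level "results" list whose entries carry "status", "length" and "url" fields, which is what I have seen in ffuf JSON output, but check it against your own file. The embedded sample is a trimmed, hand-written stand-in using values from the run above.

```python
import json


def ffuf_urls(json_text, statuses=None):
    """Return (status, length, url) tuples, optionally limited to given statuses."""
    report = json.loads(json_text)
    rows = []
    for result in report.get("results", []):
        if statuses is None or result.get("status") in statuses:
            rows.append((result.get("status"), result.get("length"), result.get("url")))
    return rows


# Trimmed stand-in for output.json (assumed structure, see above).
sample_report = json.dumps({
    "results": [
        {"status": 301, "length": 330, "url": "http://dev.bordergate.co.uk/private"},
        {"status": 200, "length": 21282, "url": "http://dev.bordergate.co.uk/server-status"},
    ]
})

for status, length, url in ffuf_urls(sample_report, statuses={200}):
    print(status, length, url)
```

With a real report you would pass open("output.json").read() instead of the sample string.
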
Once the site directories have been identified, we can fuzz for files within them:

ffuf -w /usr/share/wordlists/seclists/Discovery/Web-Content/raft-medium-files.txt -u http://dev.bordergate.co.uk/private/secret/files/FUZZ -fs 285

Fuzz File Extensions

When you first visit a website, you will most likely be redirected to its index page. This can be written in a variety of languages. Fuzzing the extension of index can help identify the language in use if it’s not readily apparent.

ffuf -w /usr/share/wordlists/seclists/Discovery/Web-Content/raft-small-extensions.txt -u http://dev.bordergate.co.uk/indexFUZZ

Finding Hidden GET Parameters

We can brute-force parameter names to find parameters that are not referenced anywhere in the site’s content (and as such wouldn’t be identified by spidering the site):

ffuf -w /usr/share/wordlists/seclists/Discovery/Web-Content/burp-parameter-names.txt \
-u http://dev.bordergate.co.uk/private/secret/files/users.php?FUZZ=test -fs 13

________________________________________________

 :: Method           : GET
 :: URL              : http://dev.bordergate.co.uk/private/secret/files/users.php?FUZZ=test
 :: Wordlist         : FUZZ: /usr/share/wordlists/seclists/Discovery/Web-Content/burp-parameter-names.txt
 :: Follow redirects : false
 :: Calibration      : false
 :: Timeout          : 10
 :: Threads          : 40
 :: Matcher          : Response status: 200,204,301,302,307,401,403,405,500
 :: Filter           : Response size: 13
________________________________________________

[Status: 200, Size: 17, Words: 2, Lines: 3, Duration: 0ms]
    * FUZZ: id
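
The logic here is simple: request the page once per candidate parameter name and flag any response whose size differs from the baseline. A small Python sketch of the two pure steps (building the URLs and comparing sizes) follows. The helper names are my own, and fetching the pages is left out so that the example stays self-contained.

```python
from urllib.parse import urlencode


def param_urls(base_url, names, value="test"):
    """Build one URL per candidate parameter name, e.g. users.php?id=test."""
    separator = "&" if "?" in base_url else "?"
    return [f"{base_url}{separator}{urlencode({name: value})}" for name in names]


def changed_responses(sizes, baseline_size):
    """Return parameter names whose response size differs from the baseline."""
    return [name for name, size in sizes.items() if size != baseline_size]


base = "http://dev.bordergate.co.uk/private/secret/files/users.php"
print(param_urls(base, ["id", "debug"]))

# Sizes matching the run above: 13 bytes is the baseline, "id" returned 17.
print(changed_responses({"id": 17, "debug": 13}, baseline_size=13))
```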

The Param Miner BurpSuite extension can also be used for the same purpose.

Finding Hidden POST Parameters

ffuf can also fuzz POST requests, which requires setting the request method with “-X” and setting a Content-Type header:

ffuf -w params.txt -u "http://dev.bordergate.co.uk/private/secret/files/users.php" \
-X POST -d 'FUZZ=parameter' -H 'Content-Type: application/x-www-form-urlencoded' -ac

________________________________________________

 :: Method           : POST
 :: URL              : http://dev.bordergate.co.uk/private/secret/files/users.php
 :: Wordlist         : FUZZ: /home/kali/params.txt
 :: Header           : Content-Type: application/x-www-form-urlencoded
 :: Data             : FUZZ=parameter
 :: Follow redirects : false
 :: Calibration      : true
 :: Timeout          : 10
 :: Threads          : 40
 :: Matcher          : Response status: 200,204,301,302,307,401,403,405,500
________________________________________________

[Status: 200, Size: 22, Words: 2, Lines: 3, Duration: 0ms]
    * FUZZ: test
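
To show what each of these POST requests looks like, here is a sketch that builds (but does not send) an equivalent request with Python's urllib. The post_probe name is hypothetical. The body and header mirror the -d and -H options used above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def post_probe(url, name, value="parameter"):
    """Build a form-encoded POST request carrying a single candidate parameter."""
    body = urlencode({name: value}).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = post_probe(
    "http://dev.bordergate.co.uk/private/secret/files/users.php", "test"
)
print(request.get_method(), request.data)
```

Passing the request to urllib.request.urlopen would send it. As with GET parameters, the response size can then be compared against a baseline to spot names the application reacts to.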

Extracting the Site Map

Typically, I use BurpSuite as the primary way of tracking a site’s contents. If another tool detects some additional content, I’ll visit the page through BurpSuite to make sure it’s added to its site map.

The BurpSuite site map can be extracted using the cunningly named extension “Site Map Extractor”.

In Conclusion

Mapping all functionality presented in a web application is essential to discovering otherwise hard-to-find vulnerabilities. Bear in mind that sending large numbers of HTTP requests may cause a denial of service condition, so traffic should be rate limited as required.
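
ffuf has a -rate option to cap requests per second. For custom scripts, one simple way to apply the same idea is a generator that spaces out the items it yields. The sketch below is one possible approach, with a made-up function name, rather than anything taken from a specific tool.

```python
import time


def throttled(items, per_second):
    """Yield items no faster than `per_second` items per second."""
    interval = 1.0 / per_second
    next_slot = time.monotonic()
    for item in items:
        delay = next_slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        # Schedule the next item one interval after this one is released.
        next_slot = max(next_slot, time.monotonic()) + interval
        yield item


# Example: iterate over candidate URLs at a maximum of 50 per second.
for url in throttled(["/a", "/b", "/c"], per_second=50):
    pass  # send the request for `url` here
```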