x86/x86-64 ASSEMBLY

IMPORTANT NOTE: There are two main styles of assembly code: AT&T syntax and Intel syntax. AT&T syntax can be recognized through its use of parentheses, whereas Intel syntax uses square brackets. This document attempts to remain independent of either syntax, but when it is impossible to be independent, we will use AT&T syntax, as that is what objdump uses by default.

Most of the information in this document pertains to both 32-bit architectures (x86) and 64-bit architectures (x86-64, sometimes known as x64) with code compiled using C/C++ compilers. When we have to choose a specific architecture (for example, when explaining memory addressing syntax), we assume a 64-bit machine. When discussing the different sections of executables, we only discuss Executable and Linkable Format (ELF) executables.

==================================================================
What is a program?
==================================================================

At its most basic level, a program is just a series of instructions for a computer to execute. In CS 1570 you learned the high-level programming language C++. However, computers don't understand high-level constructs such as for loops, if statements, and expressions, so we can't just run our C++ code. Instead, we have to compile a program and run the compiled code. Compiling is the process of converting high-level instructions into the simple low-level instructions computers can understand.

Assembly code is a human-readable version of these low-level instructions, consisting of short mnemonics for each type of instruction. Several of the most common instructions are:

    mov  - copy a value from one location to another
    add  - add two numbers
    sub  - subtract two numbers
    mul  - multiply two numbers
    div  - divide two numbers
    jmp  - start executing instructions at a given memory address
    call - call a function that starts at the given memory address
    cmp  - compare two numbers (used in condition checking)
    ret  - return from a function
    push - push a value onto the stack (more on the stack later)
    pop  - pop a value off of the stack
    nop  - no operation (no-op). This instruction literally tells the CPU to do nothing

Instruction mnemonics can have a suffix letter, where the different letters denote how much data is being operated on. The pattern is as follows:

    b - byte (8 bits)
    w - word (16 bits)
    l - long (32 bits)
    q - quadword (64 bits)
    no suffix - 32 bits by default

Here are a few examples. Note that the last example has no suffix and therefore defaults to 32 bits.

    movl - copies 4 bytes of data
    movb - copies 1 byte of data
    addq - adds two 64-bit integers
    cmpw - compares two 16-bit integers
    sub  - subtracts two 32-bit integers

==================================================================
Memory - The stack
==================================================================

So now that we have an executable, we need to understand how the program keeps track of all our variables. Programs contain two types of memory relevant here: the stack and the heap. The stack is where function call information and local variables are stored. The heap is where dynamically allocated memory is stored. For the rest of this assembly tutorial, we will pretend that the heap is magic and only worry about the stack.

The stack works just like a stack of blocks, where each block is some number of bytes. Values can be "pushed" onto the top of the stack, and "popped" off of the top.
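As a minimal, made-up sketch (not taken from any real program), a push and a pop look like this in AT&T syntax. Here %rax and %rbx are registers (covered in the Registers section below), the q suffix marks these as 64-bit (8-byte) operations, and '#' starts a comment:

    pushq %rax      # copy the 8-byte value in rax onto the top of the stack
    popq  %rbx      # remove the top 8 bytes of the stack and place them in rbx

After the pop, the stack is back to the size it started at.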
Values anywhere in the stack can be read from and written to, but the stack's size can only be changed by changing where the top is (shrink the stack by popping, and grow the stack by pushing). Local variables are placed on the stack, one after another, usually in the order they are declared in the source program. If you push so many things onto the stack that there is no more stack space, you get a stack overflow error, and the program crashes.

So we have a stack, but where exactly is it? How do we access it? There are two values that keep track of the stack: the stack pointer and the base pointer. The base pointer holds the memory address of the bottom of the current stack frame, and the stack pointer holds the memory address of the top of the stack. The difference between the two values yields the size of the current stack frame in bytes.

    |           |   high memory address
    +-----------+ <-- base pointer
    |   stack   |
    +-----------+ <-- stack pointer
    |     |     |
    |     V     |
    |           |
    |     A     |
    |     |     |
    +-----------+
    |   heap    |
    +-----------+
    |  program  |
    +-----------+
    |           |   low memory address

Notice how the stack is shown "growing down", with the base of the stack at the top of the image and the top of the stack at the bottom of the image. The reason for this is that the stack always starts at a high memory address and grows down toward lower memory addresses. So pushing a 4-byte value onto the stack actually decrements the stack pointer by 4, and popping off 4 bytes increments it by 4. The direction in which the stack grows is a fundamental idea behind exploits we will explore later, such as buffer overflows.

Function calls and the stack

We said earlier that the stack pointer and base pointer keep track of the current "stack frame", but what is a stack frame? A stack frame is a section of the stack that manages the memory for a single function call.

A program is linear and each instruction has its own memory address. As a program executes, the address of the instruction currently being executed is stored in a value called the instruction pointer. When a function is called, the program jumps to the first instruction of the called function (callee) and continues executing, which is equivalent to setting the instruction pointer to the memory address of the first instruction of the callee. However, when the function returns, the program needs to know where it originally came from so it can resume executing the calling function (caller). It does this by pushing the memory address of the instruction AFTER the call instruction onto the stack. The address pushed onto the stack is called the "return address". For example, if instruction 57 is a call to function foo, and the next instruction is instruction 58 which adds 2 and 8, then when the program performs instruction 57 (the function call), it pushes the value '58' onto the stack so that it knows where to continue from when foo returns. When a function returns, it simply pops the return address off of the stack and puts it in the instruction pointer, resuming execution of the caller.

However, this usually isn't all that is needed to keep track of function calls. We can only reference memory on the stack relative to the stack pointer and base pointer because of something called address space layout randomization (ASLR). ASLR is what causes programs to be loaded at different memory locations every time they run, so you can't use hardcoded addresses.
It is a safety feature to help prevent exploits, but for the purposes of this tutorial, you just need to know that it exists, not how or why it works.

While a function is executing, values can be pushed and popped onto and off of the stack a lot, so using the stack pointer as a reference can get ugly because of all the changes it undergoes. This means we want to reference local variables relative to the base pointer, since it doesn't (shouldn't) change during a function call. Functions can be called at any time, regardless of how much memory is already on the stack, meaning that a function's stack frame can be located anywhere on the stack. To provide a "constant" point for our function to reference its local variables from, the first thing a function usually does is set the base pointer to the stack pointer (the top of the stack). From that point on, the function knows that its first local variable (assuming it is 4 bytes) is always at location "base pointer - 4". Everything local to the function now has a constant, known offset from the base pointer and can be easily referenced.

But we just screwed everything up. Now when we return from our function, the base pointer is different, and the previous function won't be able to properly reference its local variables. The fix for this is the same as for the instruction pointer: just push the original value of the base pointer onto the stack, and, just before we return from the function, pop that value back off the stack and store it back into the base pointer.

To summarize, here is what happens when a function is called:

 1. Caller pushes the address of the next instruction of the caller on the stack
 2. Instruction pointer gets set to the address of the first instruction of the callee
 3. Callee pushes the value of the base pointer on the stack
 4. Callee sets the value of the base pointer to the value of the stack pointer
 5. Callee allocates stack space for local variables (changes the stack pointer)
 6. Callee does its thing
 7. Callee deallocates the space it used for local variables (sets the stack pointer to the value of the base pointer)
 8. Callee pops the original value of the base pointer off the stack and into the base pointer
 9. Callee returns, popping the return address into the instruction pointer
10. Execution continues wherever the instruction pointer is pointing (the instruction at the return address)

==================================================================
Registers
==================================================================

Accessing data on the stack is relatively slow. There are several reasons for this in how the hardware works, but you don't need to know them for this tutorial. Because the stack is slow, it would be nice if we had a few places to store values that are used really often and need fast access. These places are called registers. Each register can hold a single value (up to 64 bits on a 64-bit machine, or 32 bits on a 32-bit machine), and although computers have many registers, you will usually only see a few used consistently, because many of the programs you will look at are unoptimized.

Each register has a name, and the name specifies information about the register. The name consists of the base register name and a prefix or suffix denoting what part of the register is being referenced. Every register is the same size, but the different parts you can reference are different sizes, allowing operations on smaller data types, like 8-bit values instead of 32- or 64-bit values.
Some of the most common registers are:

    di - destination index
    si - source index
    ax - accumulator
    bx - base
    cx - counter
    dx - data
    bp - base pointer
    sp - stack pointer
    ip - instruction pointer

The base, counter, and data registers (bx, cx, and dx) are essentially just general purpose registers, so don't worry about the names meaning anything. However, the accumulator (ax) register is special: instructions that produce results place them in the accumulator if no other destination is or can be specified. Note that the special values mentioned while discussing how the stack works (stack pointer, base pointer, and instruction pointer) are actually stored in registers specifically devoted to them. This is because those three values are so essential to how every program works.

The prefix/suffix rules for naming registers are as follows. Note that these rules are different from those used in naming instructions.

Prefixes:
    e - 32 bits
    r - 64 bits

Suffixes (replaces the 'x' in ax, bx, cx, and dx):
    l - lower 8 bits (least significant 8 bits) of the 16 bit value
    h - upper 8 bits (most significant 8 bits) of the 16 bit value
    no prefix or suffix - 16 bits

Examples:

register | accumulator    | base           | counter        | data           |
---------+----------------+----------------+----------------+----------------+
 64 bit  | rax            | rbx            | rcx            | rdx            |
 32 bit  |        |   eax |        |   ebx |        |   ecx |        |   edx |
 16 bit  |            | ax|            | bx|            | cx|            | dx|
  8 bit  |          |ah|al|          |bh|bl|          |ch|cl|          |dh|dl|

register | stack pointer  | base pointer   | source         | destination    |
---------+----------------+----------------+----------------+----------------+
 64 bit  | rsp            | rbp            | rsi            | rdi            |
 32 bit  |        |   esp |        |   ebp |        |   esi |        |   edi |
 16 bit  |            | sp|            | bp|            | si|            | di|
  8 bit  | These registers do not have 8 bit accessible versions

==================================================================
Assembly code itself (bringing it all together)
==================================================================

Each instruction consists of an opcode (operation code) followed by its parameters. An opcode is just an arbitrary number that tells the computer which instruction it is. For example, adding the constant 5 to the register ax could have the opcode 0x1337. The opcode would then be followed by only 0x0005 (16 bits, since we are adding 16-bit values). Note that we didn't add ax as a parameter: if registers are used as operands to an instruction, the register to use is "baked" into the opcode. For example, adding 5 to ax could have the opcode 0x1337, but adding 5 to rax could have the opcode 0x1338 instead. Both would still be followed by the value 5, except the first instruction would be followed by 0x0005 (16 bits), and the second instruction would be followed by 0x0000000000000005 (64 bits).

When naming a register in assembly code, it must be prefixed by a '%'. When constants appear in assembly code (called immediates), they must be prefixed by a '$'. For example, an instruction adding 5 to the eax register would look like:

    addl $5, %eax

In whatever assembly syntax you are working in (AT&T or Intel), it is important to note the order of the operands. In AT&T syntax, instructions follow the pattern:

    instruction source, destination

The source is the first operand (literally the source in the case of mov instructions), and the destination is the second operand (literally the destination of the result for many instructions, particularly math instructions). For example, the addl instruction above adds 5 to eax and stores the result in eax.
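To make the source-then-destination order concrete, here is a short, made-up pair of instructions (the values and register choices are arbitrary):

    movl $3, %ecx        # copy the immediate 3 into ecx             (ecx = 3)
    addl %ecx, %eax      # add ecx to eax and store the sum in eax   (eax = eax + 3)

In Intel syntax the operands would appear in the opposite order (add eax, ecx), which is worth remembering when reading disassembly from other tools.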
Likewise, a sub instruction subtracts the first operand from the second and stores the result in the second operand.

Operands to instructions don't have to be constant/known values like 5 or a specific register. Operands can also be the data at a certain address. Remember when we talked about the stack, and how, by using the base pointer as a reference, we can access local variables and data on the stack? In assembly, to get the data at an address that is stored in a register, surround the register with parentheses. For example, suppose we wanted to copy the top-most byte on the stack into the low byte of the accumulator (the al register). The instruction would look like "movb (%rsp), %al". Here, "(%rsp)" means the data at the address contained in the rsp register. When dealing with addressing, you must use the full 32 or 64 bits required by the CPU architecture. For example, if you use only 32 bits for an address in x86-64, the computer will interpret that address as having all zeroes in the top 32 bits, and will likely give you a segfault for accessing memory outside of the memory designated to your program.

Only being able to reference data at addresses that are contained in registers is rather limiting, so we also have the ability to provide an offset to the address. The offset comes just before the opening parenthesis containing the register with the base address. Suppose we still wanted to copy a byte of data into the al register, but instead of the topmost byte, we wanted to copy the twelfth byte on the stack. Now the instruction looks like "movb -12(%rbp), %al". This moves the single byte at the address "rbp - 12" into the lower byte of the ax register. Note that because lower memory addresses are toward the top of the stack, we have to offset by a full -12 instead of -11, which is what it would be if the stack grew toward higher memory addresses.

When referencing memory by an address, the address specifies the start of the value, and the value takes up the following bytes at increasing memory addresses. This makes sense if you think about it like this: if you had a line of blocks and were told to pick up four blocks starting at block 5, you would pick up the fifth, sixth, seventh, and eighth blocks, not the fifth, fourth, third, and second. This also means that if we wanted the first byte on the stack, it would be addressed as "-1(%rbp)", since the byte at the address held in rbp is actually the first byte of the original base pointer value we pushed onto the stack as part of the stack frame. This is also why referencing the byte on the top of the stack through the stack pointer required no offset. The image below illustrates this concept in x86.

              stack         |    old ebp     |  return address   |
       < - - - - - - -      |                |                   |
    +---+---+---+---+---+---+---+---+---+----+----+----+----+----+
    | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
    +---+---+---+---+---+---+---+---+---+----+----+----+----+----+
          ^                   ^
          |                   |
         esp                 ebp

Since the base pointer (ebp) holds the address of byte 7, the current stack data starts at byte 6 and goes all the way down to byte 2, meaning that we have 5 bytes on the stack. If we access the 4-byte value at ebp via "(%ebp)", we get the value that starts at byte 7 (the initial address we told the computer) and ends four bytes later at byte 10. This happens to be the base pointer value for the calling function. To get the first 4-byte value on the stack, we have to go four bytes to the left, so that when the computer counts to the right, we get the data we want. This is done by accessing the 4-byte value at "-4(%ebp)".
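To tie the stack frame and base-pointer addressing together, here is a minimal, hypothetical function (the label name, offsets, and values are made up for illustration); the comments reference the numbered steps from the function call summary earlier:

    myFunction:
        pushq %rbp              # save the caller's base pointer          (step 3)
        movq  %rsp, %rbp        # start this function's stack frame       (step 4)
        subq  $16, %rsp         # reserve 16 bytes for local variables    (step 5)
        movl  $7, -4(%rbp)      # store 7 in a 4-byte local at rbp - 4
        movl  -4(%rbp), %eax    # load that local back into eax
        movq  %rbp, %rsp        # release the local variable space        (step 7)
        popq  %rbp              # restore the caller's base pointer       (step 8)
        ret                     # pop the return address into rip         (step 9)

Compilers and disassemblers vary a little in what they print (objdump usually omits the q suffixes, and you may see a single leave instruction in place of the final movq/popq pair), but the overall shape is the same.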
In most assembly code, offsets and other constants are usually written in hexadecimal, to make it easy to determine what value each individual byte has in multi-byte values. This means that, although completely valid with a base 10 offset, our "movb -12(%rbp), %al" instruction would most likely be seen in the wild as "movb -0xc(%rbp), %al".

Comments are also allowed in assembly code. Although comments in common programming languages usually start with either '#' or '//', assembly language comments start with ';'. That's right, a semicolon. However, some assemblers will allow comments starting with '#'.

Jumps/Branching:

The jmp instruction is used to start executing code at a different location. This is essentially equivalent to setting the instruction pointer to a given value. Jumps can also be conditional, in which case the processor jumps to the given address only if a certain condition is met. These conditions are usually represented by flags in the CPU that are set by certain instructions. The most common instruction that sets flags is the cmp instruction, which subtracts its operands and sets the appropriate flags. The most common flags are the zero flag (ZF), which gets set if the result is zero, and the sign flag (SF), which gets set if the result is negative. The SF flag is equivalent to the most significant bit (sign bit) of the result.

Another instruction commonly used to set flags is the "test" instruction. This instruction performs a logical AND on its operands and sets the appropriate flags. A common use is testing whether a value is zero by testing it with itself: a logical AND is faster than a subtraction, and if the value isn't zero, then ANDing it with itself will not be zero either, hence ZF will not be set.

Conditional jump instructions are named after the type of condition they check. For example, jne Jumps if Not Equal, and jle Jumps if Less than or Equal. The order of the operands obviously makes a difference, and it is one of the big differences between AT&T syntax and Intel syntax: the two syntaxes swap the order of the operands in almost every case. If you get the order of operands mixed up, you could interpret a jle conditional jump as jumping in the cases where it doesn't actually jump.

Jump instructions require a target address to jump to, but if you are writing your own assembly code, you don't know what address each instruction is at until you compile your program. Also, if you do happen to know the target address, adding an instruction between the jump and the target location will change the address of the desired target instruction. To get around these issues, assembly code has what are called labels. Labels just give a name to a location/instruction in a program. They follow the format "myLabel:" and occur before the instruction they name. For example, given the code below:

        movl $5, %eax
    here:
        addl $3, %eax
        jmp here

This code initializes eax to 5, then continually adds 3 to it in an infinite loop. The label is "here", and that is where we jump after adding 3. (A slightly larger loop that combines cmp and a conditional jump is sketched a little further below.)

==================================================================
Calling functions with arguments
==================================================================

Calling functions requires what's known as a calling convention, which specifies how callers and callees interact (manage stack frames) and share data such as parameters and return values. This allows proper communication between functions and lets calling work smoothly.
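Before getting into the details of calling conventions, here is the loop sketch promised above. It is a made-up example (the label name, registers, and values are arbitrary) that uses cmp, the flags it sets, and a conditional jump to add up the numbers 1 through 10:

        movl $0, %eax        # eax holds the running total
        movl $1, %ecx        # ecx is the loop counter
    loop_top:
        addl %ecx, %eax      # total = total + counter
        addl $1, %ecx        # counter = counter + 1
        cmpl $10, %ecx       # set flags based on (ecx - 10)
        jle  loop_top        # jump back to loop_top while the counter is <= 10

When the counter reaches 11, the cmp result is positive, neither ZF nor SF is set, the jle falls through, and eax is left holding 55.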
If there were no calling convention, a caller could put function arguments on the stack while the callee looked for them in registers, and functions would never do what you expected. The calling conventions in x86 and x86-64 are mostly the same except for how they pass arguments. In x86, the convention is to pass arguments on the stack, whereas in x86-64, the convention is to pass arguments via registers if there are only a few of them, and otherwise pass the rest on the stack.

Note that this is called a convention. This means it is only a suggestion, and programs can actually do whatever they want as long as it works. As such, the rules outlined in this section apply to C and C++ programs compiled for x86 and x86-64 architecture processors running Unix-based operating systems such as Linux and Mac. Compilers for other languages and operating systems may not use the same conventions, but these are the ones you will see ninety percent of the time in CTFs. For more information about the different calling conventions, see https://en.wikipedia.org/wiki/X86_calling_conventions#List_of_x86_calling_conventions.

We have already partially discussed the calling convention with regard to managing stack frames during function calls. The caller saves the return address and the callee saves the base pointer. When returning, the callee restores the base pointer and sets the instruction pointer to resume execution at the return address. One thing we did not mention is that the callee's return value is always stored in the accumulator register. This is because function return values are usually operated on immediately, and the accumulator has the most capability for operations.

Now on to passing arguments.

x86 calling convention:

In an x86 architecture, passing arguments to functions is done by pushing all arguments onto the stack in reverse order. For example, the function call foo(one, two, three) translates to:

    push three
    push two
    push one
    call foo

where one, two, and three are placeholders for the first, second, and third arguments, respectively. Once foo has been called and its stack frame created (pushed eip and ebp), the stack will look like this:

    |              |  caller's stack data
    +--------------+
    |    three     |  third parameter   \
    |     two      |  second parameter  |- parameters to foo()
    |     one      |  first parameter   /
    +--------------+
    |     eip      |  which instruction we stopped at in the caller  \  stack frame
    |     ebp      |  address of the bottom of the caller's stack    /  management
    +--------------+
    |              |  foo's stack data

Note how the arguments are located before the callee's stack frame. This makes them technically part of the caller's stack.

x86-64 calling convention:

Remember how we said that accessing the stack is slow and registers are fast? x86-64 architectures take advantage of this when calling functions with six or fewer arguments. If there are more than six arguments, the first six are passed via registers (the same ones used for six or fewer arguments) and the rest of the arguments are pushed onto the stack in reverse order, like in the x86 calling convention. The registers used to pass arguments are, in order:

    rdi - argument 1
    rsi - argument 2
    rdx - argument 3
    rcx - argument 4
    r8  - argument 5
    r9  - argument 6

We have only discussed the commonly used registers, but in the transition from x86 to x86-64, eight more general purpose registers were added: r8 through r15.
Here is an example of passing three arguments. The same function call from above, foo(one, two, three), would result in the assembly code:

    mov one, %rdi
    mov two, %rsi
    mov three, %rdx
    call foo

where one, two, and three are placeholders for the actual arguments. Here is another example, but this time with nine arguments. foo(one, two, three, four, five, six, seven, eight, nine) yields the assembly code:

    mov one, %rdi
    mov two, %rsi
    mov three, %rdx
    mov four, %rcx
    mov five, %r8
    mov six, %r9
    push nine
    push eight
    push seven
    call foo

resulting in the stack:

    |              |  caller's stack data
    +--------------+
    |     nine     |  ninth parameter    \
    |    eight     |  eighth parameter   |- last three parameters to foo()
    |    seven     |  seventh parameter  /
    +--------------+
    |     rip      |  which instruction we stopped at in the caller  \  stack frame
    |     rbp      |  address of the bottom of the caller's stack    /  management
    +--------------+
    |              |  foo's stack data

As you can see, the first six arguments are passed via the agreed upon registers, and the rest are passed on the stack like in x86. For more information on stack frames and calling conventions, visit: http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64

==================================================================
Viewing assembly code - objdump
==================================================================

You can view the assembly code for an executable with the command line utility "objdump", which should be available on Unix-based systems. The most commonly used command is "objdump -d myExecutable > asm". This command disassembles the "interesting" parts of myExecutable (the parts that we are interested in, not the entire binary). The -d option is what tells objdump to disassemble only the interesting parts. To disassemble the entire executable, use the -D option instead. By default, objdump prints its output to stdout (i.e. the screen), so the last bit of the command redirects the output into a file we called "asm". The following discussion goes through the various parts of the output from an executable disassembled with the -d option.

There are several sections to the standard disassembly. The first section you will see is the PLT section, which stands for "procedure linkage table". This is a table of all the library functions used by the program; when the program is run, the operating system changes some placeholder values to the actual addresses of the library functions. This is one of the mechanisms behind dynamic linking. The other dynamic linking mechanism uses the global offset table, or GOT. A good 20-part article series on linking can be found here: http://www.airs.com/blog/archives/38. There aren't links to the other parts on each page, but the 20 parts go from archive 38 to archive 57, so just manually change the number at the end of the link.

After the PLT section, there may be a few program management functions called things like "deregister_frame" that you shouldn't need to worry about. Functions are named by the label on the line just before each block of instructions, with the function's contents indented by a few spaces. After the special management functions, we find the functions that actually carry out the functionality of the program, such as main. In nearly all challenges, this is the only part you care about. After the normal functions, there are more helper/utility functions for simply running a program, and potentially some program data as well, but you can generally always ignore this part.
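To give a feel for the overall shape of this output, below is a heavily abbreviated, made-up skeleton of what an objdump -d run might print. The addresses and function names are invented for illustration, though 55 and 48 89 e5 really are the encodings of the usual push %rbp / mov %rsp,%rbp prologue:

    Disassembly of section .plt:

    0000000000001030 <printf@plt>:
        ...

    Disassembly of section .text:

    0000000000001139 <main>:
        1139:  55                      push   %rbp
        113a:  48 89 e5                mov    %rsp,%rbp
        ...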
Now you may be wondering, "if an executable is just a bunch of instructions, where did objdump get all the function names?" The answer is in the stored symbols. Part of an executable file stores the names of functions and, depending on how it was compiled, sometimes extra information for debugging, such as local variable names and the information required to compute source code line numbers for each instruction. On the other hand, code can be compiled stripped of symbols. This means that objdump doesn't even know where functions start and end, let alone what their names are. In these cases, it is usually possible to split the disassembly into functions by manually entering breaks after every "ret" instruction, and/or before every "push %rbp" "mov %rsp, %rbp" sequence, as these are the two instructions that manage the base pointer in the stack frame. CTFs sometimes provide stripped binaries, making them slightly more difficult to debug. You can tell whether a binary is stripped without disassembling it by running the "file" command on it. Figuring out which function is main usually requires a debugger and breakpoints to guess and check which function gets executed first. Sometimes you can figure it out by looking at what functions get called in each function. For example, main will usually be the function that gathers input from the user, so it will usually have a call to "read", "gets", or "fgets", as well as "printf" or "puts" for printing data or prompts.

Every line of the objdump assembly code has the following format:

    memory address:   hex bytes of the instruction   assembly instruction

The memory address is the address of the instruction itself. Think of it as the value found in rip when this instruction is being executed. However, rip doesn't store this literal address. If it did, the program would always have to be loaded into memory at the same spot, and if you had two programs that needed the same memory locations, you wouldn't be able to run them at the same time! We previously mentioned address space layout randomization (ASLR) and how it randomly decides what memory addresses to give to a program for its stack and heap space. Fortunately, ASLR also applies to code memory, solving this issue. Memory virtualization also plays a part in this, but it is mostly transparent to the programs running. Likely the only time you will look at the memory address is when using a debugger and looking for where to set breakpoints, or when trying to pass an address to an exploitable program.

After the memory address is the hex representation (two hex digits per byte) of the bytes that make up the instruction. This can be useful for finding instructions. For example, suppose you are reverse engineering a binary to guess a special input, and the first thing the program does is call sleep(20), which causes the program to hang for 20 seconds. This immediately makes brute forcing the input impractical, since each attempt takes at least 20 seconds. However, if you open the executable in a hex editor, you can search for the hex/byte sequence that represents that instruction and change all the bytes of the call instruction to nop instructions, effectively removing the function call. Now the program doesn't hang for 20 seconds and you can brute force the input! A bunch of nop instructions in a row is sometimes called a "nop sled", since the program "slides" over them.

After the hex comes the assembly code containing the human-readable version of the instruction. Most of your time will be spent reading this part.
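Breaking a single (made-up) line into its three parts, where the address is invented but 55 really is the one-byte encoding of push %rbp:

    1139:  55                      push   %rbp
    -----  --                      -----------
    memory  hex bytes of           human-readable
    address the instruction        assembly code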
This instruction line format is specific to the output of objdump, but it also follows the general format of assembly code. If you are writing assembly code, you only need the third part for each instruction.

All of the above information is about the different parts of the standard disassembly. However, there are more parts to an executable. Executables are broken up into sections, each containing a different kind of data. Below are a few of the different sections and what they contain. Section names begin with a period.

Section | Contents
--------+---------------------------------------------------------------
.text   | Contains all the code and actual instructions
--------+---------------------------------------------------------------
.data   | Contains global variables
--------+---------------------------------------------------------------
.rodata | Contains Read-Only DATA (hence rodata). This is where your
        | constants go that are too large to fit into instructions.
        | String constants are the most common data found here.
--------+---------------------------------------------------------------
.bss    | Reserves space for uninitialized static variables (variables
        | that exist for the lifetime of the program, yet are not
        | globals). Since uninitialized static variables are
        | automatically set to zero, a bunch of zeroes don't need to be
        | stored. The .bss section therefore really only stores the
        | number of bytes needed for all the static variables, and
        | doesn't actually contain anything.
--------+---------------------------------------------------------------

Some other sections you might see in objdump disassembly are .init, .fini, and .plt.got. You likely will never need to worry about these.

To view a hexdump of a specific section, use the following command:

    objdump -s -j <section name> <executable>
The -s option tells objdump to display the full contents of the requested section, and the -j option is used to specify the section name. The provided section name must include the initial period. For example, to find all the string constants used in a program called bob.out, you would run:

    objdump -s -j .rodata bob.out

For more information about the ELF file format and the different sections, visit: http://wiki.osdev.org/ELF

==================================================================
More Assembly
==================================================================

Everything mentioned in this document deals solely with integers. The stack acts the same regardless of whether you are working with integers or floats, but floating point values require registers specifically designed to handle them. There are also other kinds of registers, such as the xmm registers, and special instructions that deal with them, as well as a multitude of other instructions that you don't need in order to understand 99% of assembly code.