1 15-BufferOverflow

1.1 Program inputs??

Besides user-input, what should be considered program inputs?

A bio-centric attack:
https://www.technologyreview.com/s/608596/scientists-hack-a-computer-using-dna/
http://dnasec.cs.washington.edu/dnasec.pdf
http://dnasec.cs.washington.edu/

1.2 Extra reading

13b-ReverseEngineering/assembly64.pdf
https://www.hacksplaining.com/exercises/buffer-overflows
https://www.hacksplaining.com/prevention/buffer-overflows

1.3 Screencasts

Password for the Vimeo videos is in Zulip chat.
SP21: https://vimeo.com/534609228
FS20: https://vimeo.com/408220458
Tip: If anyone wants to speed up the lecture videos a little, inspect the page, go to the browser console, and paste this in:
document.querySelector('video').playbackRate = 1.2

1.4 Background

https://en.wikipedia.org/wiki/Buffer_overflow

1.4.1 Address space

Every program needs to access memory in order to run.
For simplicity's sake, it would be nice to allow each process (i.e., each executing program) to act as if it owns all of memory.
The address space model is used to accomplish this.
Each process can allocate space anywhere it wants in memory.
Most kernels manage each process’ allocation of memory through the virtual memory model.
How the memory is managed is irrelevant to the process.

1.4.2 Virtual memory

Mapping virtual addresses to real addresses
15-BufferOverflow/virt_mem.png

Virtual memory maps memory addresses used by a program, called virtual addresses, into physical addresses in computer memory.
Main storage, as seen by a process or task, appears as a contiguous address space or collection of contiguous segments.
The operating system manages virtual address spaces and the assignment of real memory to virtual memory.
Address translation hardware in the CPU, often referred to as a memory management unit or MMU, automatically translates virtual addresses to physical addresses.
Primary benefits of virtual memory include
- freeing applications from having to manage a shared memory space,
- increased security due to memory isolation, and
- being able to conceptually use more memory than might be physically available, using the technique of paging.
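
The translation step can be sketched in a few lines of C. This is a toy single-level page table, not how any particular kernel or MMU does it; the 4 KiB page size, the 8-page address space, and the names `pte` and `translate` are all assumptions made for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size: 4 KiB */
#define NUM_PAGES 8u      /* toy address space: 8 virtual pages */

/* One page-table entry: is the page mapped, and to which physical frame? */
struct pte {
    bool     present;
    uint32_t frame;
};

/* Translate a virtual address using a single-level page table.
 * Returns true and stores the physical address in *phys on success.
 * Returns false for an unmapped page (a real MMU would raise a page fault). */
bool translate(const struct pte table[NUM_PAGES], uint32_t virt, uint32_t *phys)
{
    uint32_t page   = virt / PAGE_SIZE;  /* virtual page number */
    uint32_t offset = virt % PAGE_SIZE;  /* offset within the page */

    if (page >= NUM_PAGES || !table[page].present) {
        return false;
    }
    *phys = table[page].frame * PAGE_SIZE + offset;
    return true;
}
```

The process only ever sees `virt`; which physical frame backs a page is the kernel's business, which is what gives each process the illusion of owning all of memory.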

1.4.3 Memory organization

15-BufferOverflow/memory_diagram_stack_heap.png
15-BufferOverflow/stack00.png

Text: machine code of the program, compiled from the source code
Data: static program variables initialized in the source code prior to execution
BSS (block started by symbol): static variables that are uninitialized
Heap: data dynamically generated during the execution of a process
Stack: structure that grows downwards and keeps track of the activated method calls, their arguments and local variables
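
To see these regions on your own machine, a small sketch like the following prints one address from each. The names are made up for the example, and the exact addresses (and even their relative order) depend on the OS, compiler, and address randomization.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* Data: static variable with an initializer */
int uninitialized_global;      /* BSS: static variable without one, zeroed at start-up */

/* Prints one address from each region; the function itself lives in Text. */
void show_segments(void)
{
    int local = 0;                          /* Stack: local variable */
    int *dynamic = malloc(sizeof *dynamic); /* Heap: allocated at run time */

    /* Function pointers are converted through an integer type for printing */
    printf("text  %p\n", (void *)(uintptr_t)&show_segments);
    printf("data  %p\n", (void *)&initialized_global);
    printf("bss   %p\n", (void *)&uninitialized_global);
    printf("heap  %p\n", (void *)dynamic);
    printf("stack %p\n", (void *)&local);
    free(dynamic);
}
```

Call `show_segments()` from `main` and run the program a few times; on most current systems the heap and stack addresses change between runs.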

1.4.4 Stack

Stack growing downward here
15-BufferOverflow/call_stack.png

1.4.5 Stack and heap growth

Stack grows down, heap grows up
15-BufferOverflow/f4-crop.png

1.4.6 Frame

Each frame on the stack:
15-BufferOverflow/f3-crop.png

1.5 Buffer overflow

1.5.1 Exploits

What is an Exploit?

An exploit is any input (e.g., a piece of software, an argument string, or a sequence of commands) that takes advantage of a bug, glitch, or vulnerability in order to carry out an attack.
An attack is unintended or unanticipated behavior in computer software, hardware, or other electronics that brings an advantage to the attacker.

1.5.2 Definition

Buffer Overflow Attack

One of the most common software and OS bugs is a buffer overflow
The developer fails to include code that checks whether an input fits into its buffer array
An input to the running process exceeds the length of the buffer
The input string overwrites a portion of the memory of the process
Causes the application to behave improperly and unexpectedly
A very common attack mechanism
First widely used by the Morris Worm in 1988
Prevention techniques known
Still of major concern
Legacy of buggy code in widely deployed operating systems and applications
Continued careless programming practices by programmers…
- Having done work in “human factors” and “engineering psychology”, I actually question this notion — bad human factors in the design of the languages themselves is at fault!
- These do not have to occur! Just use a modern low level language like Rust.
- The language should support the human.
Effect of a buffer overflow:
- The process can operate on malicious data or execute malicious code passed in by the attacker.
- Further, if the process is executed as root, the malicious code will be executing with root privileges.

1.5.3 History

Buffer overflow attack history
15-BufferOverflow/image4.png
Still quite common due to legacy code in C/C++/asm!

A buffer overflow, also known as a buffer overrun, is defined in the NIST Glossary of Key Information Security Terms as follows:
“A condition at an interface under which more input can be placed into a buffer or data holding area than the capacity allocated, overwriting other information. Attackers exploit such a condition to crash a system or to insert specially crafted code that allows them to gain control of the system.”

1.5.4 Buffer Overflow Basics

Programming error when a process attempts to store data beyond the limits of a fixed-sized buffer
- Overwrites adjacent memory locations
- Locations could hold other program variables, parameters, or program control flow data
- Buffer could be located on the stack, in the heap, or in the data section of the process
Consequences:
- Corruption of program data,
- Unexpected transfer of control,
- Memory access violations,
- Execution of code chosen by attacker

1.5.5 Example in C

Contiguous memory in C

unsigned short B = 1979;
char A[8] = "";

Lower in stack 15-BufferOverflow/over00.png Higher in stack
Contiguous memory in C

Unsafe:

strcpy(A, "excessive");

Lower in stack 15-BufferOverflow/over01.png Higher in stack
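
The effect in the diagram can be reproduced without undefined behavior by putting A and B in one structure and copying over the structure as a whole. This is only a sketch: the struct `neighbours` and the function `unchecked_copy` are invented for the illustration, and it assumes a 2-byte unsigned short with no padding between A and B, which holds on common platforms.

```c
#include <string.h>

/* A and B packed next to each other, as in the diagram. */
struct neighbours {
    char           A[8];
    unsigned short B;
};

/* Copies n bytes over the whole structure, starting at A, with no regard
 * for where A ends. Bytes beyond the eighth land in B.
 * The caller must keep n <= sizeof(struct neighbours). */
void unchecked_copy(struct neighbours *v, const char *src, size_t n)
{
    memcpy(v, src, n);
}
```

Copying the 10 bytes of "excessive" (9 characters plus the zero byte) into a structure that starts with B = 1979 leaves B holding the bytes 'e' and 0: that reads as 101 on a little-endian machine and 25856 on a big-endian one.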

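Safer: To prevent the buffer overflow from happening in this example, the call to strcpy could be replaced with strncpy, which takes the maximum capacity of A as an additional parameter and ensures that no more than this amount of data is written to A. Note that strncpy does not add a terminating null character when the source fills the whole buffer, so A must be terminated by hand:

strncpy(A, "excessive", sizeof(A));
A[sizeof(A) - 1] = '\0';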

Safest: Use a modern fast systems-capable language like Rust.

1.5.6 strcpy() vs. strncpy()

Explicit is better than implicit!

Function strcpy() copies the string in the second argument into the first argument
- e.g., strcpy(dest, src)
- If the source string is longer than the destination buffer, the overflow characters may occupy the memory space used by other variables
- The null character is appended at the end automatically
Function strncpy() copies the string by specifying the number n of characters to copy
- e.g., strncpy(dest, src, n); dest[n - 1] = '\0'; where n is the size of dest
- If the source string is longer than n characters, the extra characters are discarded automatically
- If the source string is n characters or longer, no null character is written, so you have to place it manually.
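
The strncpy pattern above is easy to get subtly wrong, so it is often wrapped once and reused. A minimal sketch, with a made-up helper name `copy_truncated`:

```c
#include <stdbool.h>
#include <string.h>

/* Copies src into dest (capacity dest_size bytes, including the terminator).
 * Always null-terminates when dest_size > 0.
 * Returns true if the whole string fit, false if it was truncated. */
bool copy_truncated(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0) {
        return false;
    }
    strncpy(dest, src, dest_size - 1);  /* copies at most dest_size - 1 bytes */
    dest[dest_size - 1] = '\0';         /* strncpy does not do this on truncation */
    return strlen(src) < dest_size;
}
```

With `char A[8]`, copying "excessive" through this helper leaves "excessi" in A and reports truncation, so the caller can decide whether a shortened string is acceptable.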

1.5.7 Attacks

Buffer Overflow Attacks

To exploit a buffer overflow an attacker needs:
- To identify a buffer overflow vulnerability in some program that can be triggered using externally sourced data under the attacker’s control
- To understand how that buffer is stored in memory and determine potential for corruption
Identifying vulnerable programs can be done by:
- Inspection of program source manually
- Tracing the execution of programs as they process over-sized input
- Using tools such as fuzzing to automatically identify potentially vulnerable programs:

1.5.7.1 Fuzzing

https://en.wikipedia.org/wiki/Fuzzing
https://www.fuzzingbook.org/

This is like randomized unit testing, but barrages the binary with far more randomized inputs.
Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program.
The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks.
Typically, fuzzers are used to test programs that take structured inputs.
This structure is specified, e.g., in a file format or protocol and distinguishes valid from invalid input.
An effective fuzzer generates semi-valid inputs that are “valid enough” in that they are not directly rejected by the parser, but do create unexpected behaviors deeper in the program and are “invalid enough” to expose corner cases that have not been properly dealt with.
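
A very small random fuzzer can be sketched as follows. Everything here is hypothetical: `parse_record` is a toy target whose "bug" is reported through a return value instead of actually corrupting memory, and the generator is plain xorshift32 so that runs repeat. Real fuzzers are far smarter about generating semi-valid inputs.

```c
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD_CAPACITY 16u

/* Toy target: input[0] is a length field, the rest is payload.
 * Returns 0 if the record is accepted, -1 if it is rejected as malformed,
 * and 1 if the length field would overrun a 16-byte buffer; a real target
 * would corrupt memory or crash at that point. */
int parse_record(const uint8_t *input, size_t size)
{
    if (size < 1) return -1;
    size_t claimed = input[0];
    if (claimed > size - 1) return -1;         /* length exceeds the data supplied */
    if (claimed > PAYLOAD_CAPACITY) return 1;  /* planted bug: would overflow */
    return 0;
}

/* xorshift32: a small deterministic pseudo-random generator. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Feeds `iterations` random inputs to parse_record and counts how many
 * reach the planted bug. seed must be non-zero. */
unsigned fuzz(unsigned iterations, uint32_t seed)
{
    uint8_t input[64];
    unsigned hits = 0;
    uint32_t state = seed;

    for (unsigned i = 0; i < iterations; i++) {
        size_t size = next_random(&state) % (sizeof input + 1);  /* 0..64 bytes */
        for (size_t j = 0; j < size; j++) {
            input[j] = (uint8_t)(next_random(&state) >> 24);
        }
        if (parse_record(input, size) == 1) {
            hits++;
        }
    }
    return hits;
}
```

Note that only inputs that get past the first length check reach the bug; those are the "valid enough" inputs described above.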

1.5.8 Programming language

At the machine level data manipulated by machine instructions executed by the computer processor are stored in either the processor’s registers or in memory
Assembly language programmer is responsible for the correct interpretation of any saved data value
Modern high-level languages have a strong notion of type and valid operations;
- Not vulnerable to buffer overflows;
- Does incur overhead, some limits on use
C and related languages have high-level control structures, but allow direct access to memory;
- Hence are vulnerable to buffer overflow;
- Have a large legacy of widely used, unsafe, and hence vulnerable code

Memory organization
15-BufferOverflow/memory_diagram_stack_heap.png
15-BufferOverflow/stack00.png

1.6 Stack buffer overflow

Stack buffer overflow or stack buffer overrun occurs when a program writes to a memory address on the program’s call stack outside of the intended data structure, which is usually a fixed-length buffer.
A program writes more data to a buffer located on the stack than what is actually allocated for that buffer.
This almost always results in corruption of adjacent data on the stack, and in cases where the overflow was triggered by mistake, will often cause the program to crash or operate incorrectly.
Stack buffer overflow is a type of the more general programming malfunction known as buffer overflow (or buffer overrun).
Overfilling a buffer on the stack is more likely to derail program execution than overfilling a buffer on the heap because the stack contains the return addresses for all active function calls.

1.6.1 Simple example

Stack buffer overflow vulnerability example in C

#include <string.h>

void foo(char *bar)
{
    char  c[12];
    strcpy(c, bar);  // no bounds checking
}

int main(int argc, char **argv)
{
    foo(argv[1]);
    return 0;
}
// g++ source.cpp
// ./a.out arg1tooooloooooongg

+++++++++++++++ Cahoot15-1

Stack buffer overflow vulnerability example in C

The previous code takes an argument from the command line and copies it to a local stack variable c.
This works fine for command line arguments shorter than 12 characters (as you can see in the “Hello” figure below).
Any argument longer than 11 characters will result in corruption of the stack.
The maximum number of characters that is safe is one less than the size of the buffer here because in the C programming language strings are terminated by a zero byte character.
A twelve-character input thus requires thirteen bytes to store, the input followed by the sentinel zero byte.
The zero byte then ends up overwriting a memory location that’s one byte beyond the end of the buffer.
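
The 11-character limit can be enforced with a length check before the copy. A sketch of a bounds-checked variant; the name `foo_checked` and the int return value are additions for this example, not part of the original program:

```c
#include <string.h>

#define BUF_SIZE 12

/* Bounds-checked variant of foo.
 * Returns 0 after copying, or -1 if bar plus its terminating zero byte
 * would not fit in the 12-byte buffer. */
int foo_checked(const char *bar)
{
    char c[BUF_SIZE];

    if (strlen(bar) >= sizeof c) {
        return -1;   /* 12 or more characters: refuse instead of overflowing */
    }
    strcpy(c, bar);  /* safe here: the length was checked above */
    return 0;
}
```

An 11-character argument is accepted, a 12-character one is rejected, which matches the "one less than the size of the buffer" rule above.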

Stack buffer overflow vulnerability example Before data is copied
NOTE: this image series is “upside-down” compared to most stack diagrams
15-BufferOverflow/overflow3.png

Stack buffer overflow vulnerability example Hello is the first command line argument
15-BufferOverflow/overflow1.png

Stack buffer overflow vulnerability example
“AAAAAAAAAAAAAAAAAAAA
is the first command line argument (address of what?)
15-BufferOverflow/overflow2.png

1.6.2 Another example

Buffer overflow attack
15-BufferOverflow/image6.png

Buffer overflow attack
15-BufferOverflow/image7.png

1.6.3 Recall

Password example buffer_overflow.cpp uploaded during the last section.

1.7 Shellcode injection

An exploit aims to take control of the attacked computer, so it injects code that “spawns a shell”, known as “shellcode”.
A shellcode is:
- Code assembled in the CPU’s native instruction set (e.g., x86, x86-64, ARM, SPARC, or other RISC architectures)
- Injected as a part of the buffer that is overflowed.
- We inject the code directly into the buffer that we send for the attack
- A buffer containing shellcode is a “payload”
Code supplied by attacker
- Often saved in buffer being overflowed
- Traditionally transferred control to a user command-line interpreter (shell)
Machine code
- Specific to processor and operating system
- Traditionally needed good assembly language skills to create
- More recently a number of sites and tools have been developed that automate this process

Side note:

Metasploit Project
- https://www.metasploit.com/
- https://en.wikipedia.org/wiki/Metasploit_Project
- Provides useful information to people who perform penetration testing, IDS signature development, and exploit research.
- I recommend testing this out on a virtual network, on virtual machines, which are isolated from the internet at large.
- Do NOT let this loose on the campus network, or other real-world environments; doing so would likely be illegal, and irresponsible (or malicious), unless you were paid to do some kind of audit, for example.

1.7.1 Stack Overflow Variants

Target

A trusted system utility
Network service daemon
Commonly used library code

Shellcode functions

Launch a remote shell when connected to
Create a reverse shell that connects back to the hacker
Use local exploits that establish a shell
Flush firewall rules that currently block other attacks
Break out of a chroot (restricted execution) environment (like a Docker/OCI container), giving full access to the system.
- Side note: Containers are much easier to escape than real VMs…

1.8 Defenses

Two approaches to buffer overflow defense

Compile time
Aim to harden programs to resist attacks in new programs.

Run time
Aim to detect and abort attacks in existing programs.

1.8.1 Compile-Time Defenses: Programming Language

Solution:
Use a modern high-level language (Python, Rust, Ruby, Lua, Go, etc.).

Advantages

Not vulnerable to buffer overflow attacks that exploit memory in this way
Compiler enforces range checks and permissible operations on variables

Disadvantages

Additional code must be executed at run time to impose checks
Flexibility and safety come at a cost in resource use
Distance from the underlying machine language and architecture means that access to some instructions and hardware resources is lost
Limits their usefulness in writing code, such as device drivers, that must interact with such resources

1.8.2 Compile-Time Defenses: Safe Coding Techniques

C designers placed much more emphasis on space efficiency and performance considerations than on type safety
Assumed programmers would exercise due care in writing code…
- hah, people exercise care and discretion… right; they’ve clearly never taught large numbers of students!
Programmers need to inspect the code and rewrite any unsafe coding… yea right.
An example of this is the OpenBSD project, where programmers have audited the existing code base, including the operating system, standard libraries, and common utilities, and this has resulted in what is widely regarded as one of the safest operating systems in widespread use.
It’s better to automate safety! Clearly humans can’t handle it.

1.8.3 Run-time defenses

Will discuss more soon

1.9 Heap overflow

Attack buffer located in heap
- Typically located above program code
- Memory is requested by programs to use in dynamic data structures (such as linked lists of records)
No return address
- Hence no easy transfer of control
- May have function pointers that an attacker can exploit
- Or manipulate management data structures
Defenses
- Making the heap non-executable
- Randomizing the allocation of memory on the heap

1.10 Defenses

Two approaches to buffer overflow defense

Compile time
Aim to harden programs to resist attacks in new programs.

Run time
Aim to detect and abort attacks in existing programs.

1.10.1 Compile time

Prevent or detect buffer overflows by instrumenting programs when they are compiled.
Examples include:
- choosing a high-level language that does not permit buffer overflows,
- safe coding standards,
- safe standard libraries, or
- including additional code to detect corruption of the stack frame

1.10.1.1 Programming language

like above

Examples
15-BufferOverflow/languages.png

1.10.1.1.1 Classic overflow

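#include <string.h>

void copyData(const char *userId)
{
    char smallBuffer[10]; // room for 9 characters plus the terminating '\0'
    strcpy(smallBuffer, userId); // no bounds checking
}

int main(int argc, char *argv[])
{
    // Payload of 11 characters, 12 bytes with the terminating '\0'
    const char *userId = "01234567890";
    // this shall cause a buffer overflow
    copyData(userId);
    return 0;
}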

Unsafe functions
15-BufferOverflow/image14.png

Range checking with strncpy, strncat, cin.getline, etc., is often suggested, as human-factors hand-holding / mommying you.

Classic overflow fix?

#include <iostream>
#include <cstring>

int main(void)
{
    char strDest[3]= "hi";
    char strSrc[] = "Welcome";
    char anotherCstring[] = "Hello";
    strncpy(strDest, strSrc, 5);
    std::cout << strDest;
    return 0;
}

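strncpy can cause another overflow too: it does not check the size of the destination (here 5 bytes are written into a 3-byte array), and it does not write a terminating null character when the source is too long.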

Unsafe functions
15-BufferOverflow/funcs.png

Additional checks on buffer size are performed
The *_s functions originated on Windows (and are optional in C11 as Annex K); the strl* functions come from BSD Unix.

1.10.1.1.2 Format Strings

Format string buffer overflows (usually called “format string vulnerabilities”) are highly specialized buffer overflows that can have the same effects as other buffer overflow attacks.
Format string vulnerabilities take advantage of the mixture of data and control information in certain functions, such as C/C++’s printf.

Some format parameters:

%x  // hexadecimal (unsigned int)
%s  // string ((const) (unsigned) char *)
%n  // stores the number of bytes written so far (int *)
%d  // decimal (int)
%u  // unsigned decimal (unsigned int)

Example:
printf ("Hello:%s\n", a273150);

Issues:

printf(User_Input);        // Vulnerability: user input is used as the format string
// Attack input:
"%s%s%s%s%s%s%s%s%s%s%s%s"
printf("%s", User_Input);  // Fix: user input is passed as data only

For every % in the argument the printf function finds, it assumes that there is an associated value on the stack.
In this way, the function walks the stack downwards, reading the corresponding values from the stack, and printing them to the user.
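
The fix amounts to never letting user input sit in the format-string position. A minimal sketch; the function name `format_greeting` is made up, and snprintf is used so the result can be inspected instead of printed:

```c
#include <stdio.h>

/* Writes user_input into out as plain data. The format string is a
 * constant, so any % characters in user_input are copied, not interpreted.
 * Returns the value of snprintf: the length the full output would have. */
int format_greeting(char *out, size_t out_size, const char *user_input)
{
    return snprintf(out, out_size, "Hello:%s\n", user_input);
}
```

Passing the attack string "%s%s%n%x" as `user_input` simply yields "Hello:%s%s%n%x" followed by a newline; nothing is read from or written to the stack on its behalf.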

1.10.1.1.3 Integer overflow

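#include <stdio.h>

int main(void)
{
    int val;
    val = 0x7fffffff;     /* 2147483647 */
    printf("val=%d(0x%x)\n", val, val);
    /* Overflow the int: signed overflow is undefined behavior in C,
       but typically wraps around on two's complement machines */
    printf("val+1=%d(0x%x)\n", val + 1, val + 1);
    return 0;
}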

The binary representation of 0x7fffffff is a zero sign bit followed by 31 one bits: 0111 1111 1111 1111 1111 1111 1111 1111.
This integer is initialized with the highest positive value a 32-bit signed integer can hold.
Here, when we add 1 to 0x7fffffff, the value overflows and becomes a negative number (0x7fffffff + 1 = 0x80000000).
In decimal this is -2147483648.

Integer overflow

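#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *result and returns true,
   or returns false if the sum would not fit in an int */
bool safe_add(int a, int b, int *result)
{
    if(a > 0 && b > INT_MAX - a)
    {
        return false; /* would overflow */
    }
    else if(a < 0 && b < INT_MIN - a)
    {
        return false; /* would underflow */
    }
    *result = a + b;
    return true;
}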
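
Integer overflow matters for buffer overflows mainly when the integer is a size: if count * size wraps around, the allocation is too small and later writes run past it. A sketch of the same check-before-computing idea for unsigned sizes; `checked_mul` and `alloc_array` are hypothetical helper names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stores count * size in *total and returns true, or returns false if
 * the product does not fit in size_t. */
bool checked_mul(size_t count, size_t size, size_t *total)
{
    if (size != 0 && count > SIZE_MAX / size) {
        return false;
    }
    *total = count * size;
    return true;
}

/* Allocates room for count elements of the given size, or returns NULL
 * if the byte count would wrap around. */
void *alloc_array(size_t count, size_t size)
{
    size_t total;
    if (!checked_mul(count, size, &total)) {
        return NULL;
    }
    return malloc(total);
}
```

The standard calloc(count, size) is expected to perform this kind of check itself, which is one reason to prefer it over malloc(count * size).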

1.10.1.1.4 Array bounds checking

Code copies len bytes out of the from array into the to array starting at position pos and returning the end position.
Unfortunately, this function is given no information about the actual size of the destination buffer to and hence is unable to ensure an overflow does not occur.
In this case, the calling code should ensure that the value of pos+len is not larger than the size of the to array.
Also, input is not necessarily a string; it could just as easily be binary data, just carelessly manipulated.
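
The function being described is not reproduced in these notes, so the following is a reconstruction from the description, not the original code. The first version has the problem discussed above; the second adds a `to_size` parameter (an assumption of this sketch) so the function can check for itself.

```c
#include <stddef.h>
#include <string.h>

/* Unchecked version as described in the text:
 * it knows nothing about the size of `to`. */
size_t copy_buf(char *to, size_t pos, const char *from, size_t len)
{
    memcpy(to + pos, from, len);
    return pos + len;
}

/* Checked variant: to_size is the capacity of `to` in bytes.
 * Returns the new end position, or (size_t)-1 if the copy would not fit. */
size_t copy_buf_checked(char *to, size_t to_size, size_t pos,
                        const char *from, size_t len)
{
    if (pos > to_size || len > to_size - pos) {  /* avoids overflow in pos + len */
        return (size_t)-1;
    }
    memcpy(to + pos, from, len);
    return pos + len;
}
```

memcpy is used because, as noted, the data need not be a string; nothing here depends on a terminating zero byte.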

1.10.1.2 Language Extensions/Safe Libraries

A common concern with C comes from the use of unsafe standard library routines, especially some of the string manipulation routines.
One approach to improving the safety of systems has been to replace these with safer variants.
Using these usually requires rewriting the source to conform to the new, safer semantics.
Programs and libraries then need to be recompiled.
This is likely to cause problems with third-party applications.
Libsafe is an example that avoids recompilation: it is implemented as a dynamic library arranged to load before the existing standard libraries.

1.10.1.3 Stack protection

Add function entry and exit code to check stack for signs of corruption of the stack frame
- Insert random canary below the old frame pointer address, before the allocation of space for local variables
  - Value needs to be unpredictable
  - Should be different on different systems
- Added function exit code checks that the canary value has not changed before continuing with the usual function exit operations of restoring the old frame pointer and transferring control back to the return address.
- Requires recompiling
GCC extensions that include additional function entry and exit code
- Function entry writes a copy of the return address to a safe region of memory
- Function exit code checks the return address in the stack frame against the saved copy
- If change is found, aborts the program.
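
The canary check can be imitated by hand to see the idea. This is only a model: a real compiler (for example GCC with -fstack-protector) inserts the entry and exit code itself and uses a random value, whereas here the "frame" is an ordinary struct and the canary value is passed in by the caller. All names are made up for the sketch, which also assumes a 4-byte unsigned placed directly after the buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for part of a stack frame: a local buffer with a canary after it. */
struct frame {
    char     buffer[8];
    unsigned canary;
};

/* Mimics function entry, an unchecked copy, and the exit-time check.
 * Returns true if the canary survived. The copy targets the whole struct,
 * so bytes past buffer[7] spill into canary, as they would on a real stack.
 * The caller must keep n <= sizeof(struct frame). */
bool run_with_canary(const char *input, size_t n, unsigned secret)
{
    struct frame f;
    f.canary = secret;          /* entry code: place the canary */
    memcpy(&f, input, n);       /* body: copy with no bounds check on buffer */
    return f.canary == secret;  /* exit code: verify before returning */
}
```

A 6-byte copy of "Hello" leaves the canary alone; a 12-byte copy of 'A' characters overwrites it, and a real implementation would abort the program at that point instead of returning.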

1.10.2 Run-time defenses

Most of the compile-time approaches require recompilation of existing programs.
Hence, there is interest in run-time defenses that can be deployed as operating systems updates to provide some protection for existing vulnerable programs.
These defenses involve changes to the memory management of the virtual address space of processes.
These changes act to either alter the properties of regions of memory, or to make predicting the location of targeted buffers sufficiently difficult to thwart many types of attacks.
Most of these are annoying extra complexity, which really should not need to exist, if compile time was done correctly…

1.10.2.1 Executable address space protection

Overview

Many of the buffer overflow attacks, such as the stack overflow examples in this chapter, involve copying machine code into the targeted buffer and then transferring execution to it.
A defense is to block the execution of code on the stack, on the assumption that executable code should only be found elsewhere in the process’s address space.
Making the stack (and heap) non-executable provides a high degree of protection against many types of buffer overflow attacks for existing programs, hence the inclusion of this practice is standard in a number of recent operating systems releases.

Issues

One issue is support for programs that do need to place executable code on the stack.

1.10.2.2 Address space randomization

To implement the classic stack overflow attack, the attacker needs to be able to predict the approximate location of the targeted buffer to determine a suitable return address to use in the attack to transfer control to the shell-code.
One technique to greatly increase the difficulty of this prediction is to change the address at which the stack is located in a random manner for each process.
The range of addresses available on modern processors is large (32 bits, and far more on 64-bit systems), and most programs only need a small fraction of that.
Therefore, moving the stack memory region around by a megabyte or so has minimal impact on most programs but makes predicting the targeted buffer’s address almost impossible.
This amount of variation is also much larger than the size of most vulnerable buffers.
Here, the arms race introduces unfortunate extra complexity.

1.10.2.3 Random dynamic memory allocation

Similar to address space randomization
There is a class of heap buffer overflow attacks that exploit the expected proximity of successive memory allocations, or indeed the arrangement of the heap management data structures.
Randomizing the allocation of memory on the heap makes the possibility of predicting the address of targeted buffers extremely difficult, thus thwarting the successful execution of some heap overflow attacks.

1.10.2.4 Random standard library locations

Another target of attack is the location of standard library routines.
In an attempt to bypass protections such as non-executable stacks, some buffer overflow variants exploit existing code in standard libraries.
These are typically loaded at the same address by the same program.
To counter this form of attack, we can use a security extension that randomizes the order of loading standard libraries by a program and their virtual memory address locations.
This makes the address of any specific function sufficiently unpredictable to render the chance of a given attack correctly predicting its address very low.

1.10.2.5 Guard pages

Similar to canary above.
A process has much more virtual memory available than it typically needs
Place gaps (guard pages) between regions of memory (either large divisions like stack and heap, or smaller like stack frames)
Flagged as illegal addresses
Any attempted access aborts process
A further extension places guard pages between stack frames and heap buffers
Cost in execution time to support the large number of page mappings necessary

1.11 Other attack types

1.11.1 Replacement stack frame

Variant that overwrites buffer and saved frame pointer address
- Saved frame pointer value is changed to refer to a dummy stack frame
- Current function returns to calling function
- Then when calling function returns, control is transferred to the shell-code in the overwritten buffer
- Can be used to overcome limitations in buffer overflow size or content
Off-by-one vulnerability:
- Coding error that allows one more byte to be copied than there is space available
- Defenses
  - Any stack protection mechanisms to detect modifications to the stack frame or return address by function exit code
  - Use non-executable stacks
  - Randomization of the stack in memory and of system libraries

1.11.2 Return to system call

Stack overflow variant replaces return address with standard library function

Response to non-executable stack defenses
Transfer to system call, and trick the system call to then execute malicious code

Defenses

Any stack protection mechanisms to detect modifications to the stack frame or return address by function exit code
Use non-executable stacks
Randomization of the stack in memory and of system libraries

1.11.3 Heap overflow

Attack buffer located in heap
- Typically located above program code, and grows up
- Memory is requested by programs to use in dynamic data structures (such as linked lists of records)
No return address
- Hence no easy transfer of control
- However, if there are pointers to functions, control can be transferred
- Or manipulate management data structures
Defenses
- Making the heap non-executable
- Randomizing the allocation of memory on the heap

1.11.4 Global data overflow

Can attack buffer located in global data

May be located above program code
If has function pointer and vulnerable buffer
Or adjacent process management tables, e.g., with references to destructor function
Aim to overwrite function pointer later called

Defenses

Non-executable or random global data region
Move function pointers
Guard pages

+++++++++++++++++++++++ Cahoot15-2

Next: 16-Databases.html