Tip: If anyone wants to speed up the lecture videos a little,
inspect the page, go to the browser console, and paste this in: document.querySelector('video').playbackRate = 1.2
Every program needs to access memory in order to run.
For simplicity's sake, it would be nice to allow each process (i.e.,
each executing program) to act as if it owns all of memory.
The address space model is used to accomplish this.
Each process can allocate space anywhere it wants in memory.
Most kernels manage each process’ allocation of memory through the
virtual memory model.
How the memory is managed is irrelevant to the process.
1.4.2 Virtual memory
Mapping virtual addresses to real addresses
Virtual memory maps memory addresses used by a program, called
virtual addresses, into physical addresses in computer memory.
Main storage, as seen by a process or task, appears as a contiguous
address space or collection of contiguous segments.
The operating system manages virtual address spaces and the
assignment of real memory to virtual memory.
Address translation hardware in the CPU, often referred to as a
memory management unit or MMU, automatically translates virtual
addresses to physical addresses.
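To make the translation step concrete, here is a minimal sketch of what the MMU conceptually does. It assumes a 4 KiB page size and a toy single-level page table; `page_table`, `translate`, and the table contents are hypothetical, and real hardware uses multi-level tables and caches translations.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size */
#define NUM_PAGES 8u      /* toy address space of 8 virtual pages */

/* Hypothetical single-level page table:
   index = virtual page number, value = physical frame number. */
static const uint32_t page_table[NUM_PAGES] = {3, 5, 0, 7, 1, 2, 6, 4};

/* Split the virtual address into page number and offset,
   look up the frame, and recombine. */
uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;
    assert(vpn < NUM_PAGES);   /* real hardware would raise a fault here */
    return page_table[vpn] * PAGE_SIZE + offset;
}
```

With this table, virtual address 0x1234 (page 1, offset 0x234) maps to physical address 0x5234, because page 1 is backed by frame 5.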
Primary benefits of virtual memory include
freeing applications from having to manage a shared memory
space,
increased security due to memory isolation, and
being able to conceptually use more memory than might be physically
available, using the technique of paging.
1.4.3 Memory organization
Text: machine code of the program, compiled from
the source code
Data: static program variables initialized in the
source code prior to execution
BSS (block started by symbol): static variables
that are uninitialized
Heap: data dynamically generated during the
execution of a process
Stack: structure that grows downwards and keeps
track of the activated method calls, their arguments and local
variables
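The regions listed above can be observed from inside a program. The following is a rough sketch, not a definitive tool: the names `sample_segments` and `struct segment_sample` are made up for illustration, and the exact addresses and their ordering depend on the platform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

int initialized_global = 42;   /* Data: initialized static variable */
int uninitialized_global;      /* BSS: uninitialized static variable */

struct segment_sample {
    uintptr_t text, data, bss, heap, stack;
};

/* Collect one sample address from each region. */
struct segment_sample sample_segments(void)
{
    struct segment_sample s;
    int local = 0;                           /* lives in this stack frame */
    int *dynamic = malloc(sizeof *dynamic);  /* lives on the heap */

    s.text = (uintptr_t)&sample_segments;    /* machine code of this function */
    s.data = (uintptr_t)&initialized_global;
    s.bss = (uintptr_t)&uninitialized_global;
    s.heap = (uintptr_t)dynamic;             /* 0 if malloc failed */
    s.stack = (uintptr_t)&local;
    free(dynamic);
    return s;
}
```

Printing the five fields on a typical Linux system tends to show the text address lowest and the stack address highest, but that ordering is not guaranteed.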
1.4.4 Stack
Stack growing downward here
1.4.5 Stack and heap growth
Stack grows down, heap grows up
1.4.6 Frame
[Figure: contents of each frame on the stack]
1.5 Buffer overflow
1.5.1 Exploits
What is an Exploit?
An exploit is any input (i.e., a piece of software, an argument
string, or sequence of commands) that takes advantage of a bug, glitch
or vulnerability in order to cause an attack.
An attack is unintended or unanticipated behavior induced in
computer software, hardware, or another electronic system, that brings an
advantage to the attacker.
1.5.2 Definition
Buffer Overflow Attack
One of the most common software and OS bugs is a buffer
overflow
The developer fails to include code that checks whether an input
fits into its buffer array
An input to the running process exceeds the length of the
buffer
The input string overwrites a portion of the memory of the
process
Causes the application to behave improperly and unexpectedly
A very common attack mechanism
First widely used by the Morris Worm in 1988
Prevention techniques known
Still of major concern
Legacy of buggy code in widely deployed operating systems and
applications
Continued careless programming practices by programmers…
Having done work in “human factors” and “engineering psychology”, I
actually question this notion: poor human factors in the design of the
languages themselves are at fault!
These do not have to occur! Just use a modern low-level language
like Rust.
The language should support the human.
Effect of a buffer overflow:
The process can operate on malicious data or execute malicious code
passed in by the attacker.
Further, if the process is executed as root, the malicious code will
be executing with root privileges.
1.5.3 History
Buffer overflow attack history
Still quite common due to legacy code in C/C++/asm!
A buffer overflow, also known as a buffer overrun, is defined in the
NIST Glossary of Key Information Security Terms as follows:
“A condition at an interface under which more input can be placed into a
buffer or data holding area than the capacity allocated, overwriting
other information. Attackers exploit such a condition to crash a system
or to insert specially crafted code that allows them to gain control of
the system.”
1.5.4 Buffer Overflow Basics
Programming error when a process attempts to store data beyond the
limits of a fixed-sized buffer
Overwrites adjacent memory locations
Locations could hold other program variables, parameters, or program
control flow data
Buffer could be located on the stack, in the heap, or in the data
section of the process
Consequences:
Corruption of program data,
Unexpected transfer of control,
Memory access violations,
Execution of code chosen by attacker
1.5.5 Example in C
Contiguous memory in C
unsigned short B = 1979;
char A[8] = "";
[Figure: memory layout of A and B, from lower in stack to higher in stack]
Contiguous memory in C
Unsafe:
strcpy(A, "excessive");
[Figure: memory layout of A and B after the copy, from lower in stack to higher in stack]
Safer: To prevent the buffer overflow from happening
in this example, the call to strcpy could be replaced with strncpy,
which takes the maximum capacity of A as an additional parameter and
ensures that no more than this amount of data is written to A:
strncpy(A, "excessive", sizeof(A));
Note that A is then not null-terminated, because strncpy writes no
terminator when the source is at least as long as the limit.
Safest: Use a modern fast systems-capable language
like Rust.
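The effect of the unsafe copy on B can be reproduced in a controlled way. This sketch makes one assumption explicit: it puts A and B in a struct (the hypothetical `struct frame`) to force them to be adjacent, and it copies into the struct as a whole so the demonstration stays within defined behavior instead of relying on how a compiler happens to lay out locals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: A is immediately followed by B. */
struct frame {
    char A[8];
    unsigned short B;
};

/* Copy len bytes starting at A without checking A's capacity.
   The assert only keeps the demo inside the struct itself. */
void unchecked_copy(struct frame *f, const char *input, size_t len)
{
    assert(len <= sizeof *f);
    memcpy(f, input, len);
}
```

Starting from `struct frame f = { "", 1979 };`, calling `unchecked_copy(&f, "excessive", 10)` writes the ninth character and the terminating zero byte into B, so B no longer holds 1979. The new value depends on the machine's byte order.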
1.5.6 strcpy() vs. strncpy()
Explicit is better than implicit!
Function strcpy() copies the string in the second argument into the
first argument
e.g., strcpy(dest, src)
If the source string is longer than the destination buffer, the
overflow characters may occupy the memory space used by other variables
The null character is appended at the end automatically
Function strncpy() copies the string by specifying the number n of
characters to copy
e.g., strncpy(dest, src, n); dest[n - 1] = '\0'; (with n the size of dest)
If the source string is longer than n characters, the excess
characters are not copied
You have to place the null character manually.
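The difference matters in practice, so here is a small sketch of it. The helper names `is_terminated` and `copy_truncate` are invented for this example; only strncpy, memchr, and strlen come from the standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf[0..n-1] contains a terminating NUL byte. */
int is_terminated(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Copy src into dest (capacity n > 0), truncating if needed and always
   leaving dest NUL-terminated. Returns 1 if src was truncated. */
int copy_truncate(char *dest, size_t n, const char *src)
{
    assert(dest != NULL && src != NULL && n > 0);
    strncpy(dest, src, n - 1);   /* copies at most n - 1 characters */
    dest[n - 1] = '\0';          /* strncpy does not do this for us */
    return strlen(src) >= n;
}
```

A plain `strncpy(raw, "excessive", 8)` into an 8-byte array leaves no terminator at all, whereas `copy_truncate` with the same arguments yields the string "excessi".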
1.5.7 Attacks
Buffer Overflow Attacks
To exploit a buffer overflow an attacker needs:
To identify a buffer overflow vulnerability in some program that can
be triggered using externally sourced data under the attacker’s
control
To understand how that buffer is stored in memory and determine
potential for corruption
Identifying vulnerable programs can be done by:
Inspection of program source manually
Tracing the execution of programs as they process over-sized
input
Using tools such as fuzzing to automatically
identify potentially vulnerable programs:
This is like randomized unit testing, but barrages the binary with
far more randomized inputs.
Fuzzing or fuzz testing is an automated software testing technique
that involves providing invalid, unexpected, or random data as inputs to
a computer program.
The program is then monitored for exceptions such as crashes,
failing built-in code assertions, or potential memory leaks.
Typically, fuzzers are used to test programs that take structured
inputs.
This structure is specified, e.g., in a file format or protocol and
distinguishes valid from invalid input.
An effective fuzzer generates semi-valid inputs that are “valid
enough” in that they are not directly rejected by the parser, but do
create unexpected behaviors deeper in the program and are “invalid
enough” to expose corner cases that have not been properly dealt
with.
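As a rough illustration of the idea, the sketch below fuzzes a tiny parser with purely random inputs. Everything here is hypothetical: `parse_record` is a made-up target whose first input byte declares the payload length, and `fuzz` is a naive random generator, far simpler than the coverage-guided fuzzers used in practice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define FIELD_CAP 16

/* Hypothetical target: input[0] declares how many payload bytes follow.
   Returns the number of bytes copied, or -1 if the input is rejected. */
int parse_record(const unsigned char *input, size_t size,
                 unsigned char *field)
{
    if (size == 0) return -1;
    size_t declared = input[0];
    if (declared > size - 1 || declared > FIELD_CAP) return -1;
    memcpy(field, input + 1, declared);
    return (int)declared;
}

/* Throw `rounds` random inputs at the target and count how often an
   invariant is violated (too many bytes reported, or sentinel damaged). */
int fuzz(unsigned seed, int rounds)
{
    int failures = 0;
    srand(seed);
    for (int r = 0; r < rounds; r++) {
        unsigned char input[64];
        unsigned char field[FIELD_CAP + 1];
        size_t size = (size_t)(rand() % (int)(sizeof input + 1));
        for (size_t i = 0; i < size; i++)
            input[i] = (unsigned char)(rand() % 256);
        field[FIELD_CAP] = 0xA5;   /* sentinel just past the field */
        int n = parse_record(input, size, field);
        if (n > FIELD_CAP || field[FIELD_CAP] != 0xA5)
            failures++;
    }
    return failures;
}
```

Removing the `declared > FIELD_CAP` check from the target is a quick way to see the sentinel test start reporting failures.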
1.5.8 Programming language
At the machine level data manipulated by machine instructions
executed by the computer processor are stored in either the processor’s
registers or in memory
Assembly language programmer is responsible for the correct
interpretation of any saved data value
Modern high-level languages have a strong notion of type and valid
operations;
Not vulnerable to buffer overflows;
Does incur overhead, some limits on use
C and related languages have high-level control structures, but
allow direct access to memory;
Hence are vulnerable to buffer overflow;
Have a large legacy of widely used, unsafe, and hence vulnerable
code
Memory organization
1.6 Stack buffer overflow
Stack buffer overflow or stack buffer overrun occurs when a program
writes to a memory address on the program’s call stack outside of the
intended data structure, which is usually a fixed-length buffer.
A program writes more data to a buffer located on the stack than
what is actually allocated for that buffer.
This almost always results in corruption of adjacent data on the
stack, and in cases where the overflow was triggered by mistake, will
often cause the program to crash or operate incorrectly.
Stack buffer overflow is a type of the more general programming
malfunction known as buffer overflow (or buffer overrun).
Overfilling a buffer on the stack is more likely to derail program
execution than overfilling a buffer on the heap because the stack
contains the return addresses for all active function calls.
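The code that the following discussion refers to is not reproduced in these notes. A minimal sketch consistent with the description (a 12-byte local buffer named c, filled from a caller-supplied string with no bounds check) might look like this; the function names `foo` and `fits` are assumptions, and a real program would pass the first command line argument to `foo`.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE 12

/* Copies bar into a 12-byte local buffer with no bounds checking. */
size_t foo(const char *bar)
{
    char c[BUF_SIZE];
    strcpy(c, bar);   /* overflows c when strlen(bar) >= BUF_SIZE */
    return strlen(c);
}

/* Hypothetical helper: does bar, plus its terminating zero byte, fit in c? */
int fits(const char *bar)
{
    assert(bar != NULL);
    return strlen(bar) + 1 <= BUF_SIZE;
}
```

An 11-character argument such as "01234567890" fits exactly (11 characters plus the zero byte), while a 12-character argument does not.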
The previous code takes an argument from the command line and copies
it to a local stack variable c.
This works fine for command line arguments smaller than 12
characters (as you can see in figure B below).
Any argument longer than 11 characters will result in
corruption of the stack.
The maximum number of characters that is safe is one less than the
size of the buffer here because in the C programming language strings
are terminated by a zero byte character.
A twelve-character input thus requires thirteen bytes to store, the
input followed by the sentinel zero byte.
The zero byte then ends up overwriting a memory location that’s one
byte beyond the end of the buffer.
Figure A: Stack buffer overflow vulnerability example, before data is
copied. NOTE: this image series is “upside-down” compared to most stack
diagrams.
Figure B: Stack buffer overflow vulnerability example, “Hello” is the
first command line argument.
Figure C: Stack buffer overflow vulnerability example,
“AAAAAAAAAAAAAAAAAAAA” is the first command line argument
(address of what?)
1.6.2 Another example
Buffer overflow attack
1.6.3 Recall
Password example buffer_overflow.cpp uploaded during the
last section.
1.7 Shellcode injection
An exploit aims to take control of the attacked computer, so it injects
code to “spawn a shell”, known as “shellcode”
A shellcode is:
Code assembled in the CPU’s native instruction set
(e.g., x86, x86-64, ARM, SPARC, RISC, etc.)
Injected as a part of the buffer that is
overflowed.
We inject the code directly into the buffer that we send for the
attack
A buffer containing shellcode is a “payload”
Code supplied by attacker
Often saved in buffer being overflowed
Traditionally transferred control to a user command-line interpreter
(shell)
Machine code
Specific to processor and operating system
Traditionally needed good assembly language skills to create
More recently a number of sites and tools have been
developed that automate this process
These provide useful information to people who perform penetration
testing, IDS signature development, and exploit research.
I recommend testing this out on a virtual network, on virtual
machines, which are isolated from the internet at large.
Do NOT let this loose on the campus network, or other real-world
environments; doing so would likely be illegal, and irresponsible (or
malicious), unless you were paid to do some kind of audit, for
example.
1.7.1 Stack Overflow Variants
Target
A trusted system utility
Network service daemon
Commonly used library code
Shellcode functions
Launch a remote shell when connected to
Create a reverse shell that connects back to the hacker
Use local exploits that establish a shell
Flush firewall rules that currently block other attacks
Break out of a chroot (restricted execution) environment (like a
Docker/OCI container), giving full access to the system.
Side note: Containers are much easier to escape than real VMs…
1.8 Defenses
Two approaches to buffer overflow defense
Compile time
Aim to harden programs to resist attacks in new programs.
Run time
Aim to detect and abort attacks in existing programs.
1.8.1 Compile-Time Defenses: Programming Language
Solution:
Use a modern high-level language (Python, Rust, Ruby, Lua, Go, etc.).
Advantages
Not vulnerable to buffer overflow attacks that exploit memory in
this way
Compiler enforces range checks and permissible operations on
variables
Disadvantages
Additional code must be executed at run time to impose checks
Flexibility and safety come at a cost in resource use
Distance from the underlying machine language and architecture means
that access to some instructions and hardware resources is lost
Limits their usefulness in writing code, such as device drivers,
that must interact with such resources
C designers placed much more emphasis on space efficiency and
performance considerations than on type safety
Assumed programmers would exercise due care in writing code…
hah, people exercise care and discretion… right; they’ve clearly
never taught large numbers of students!
Programmers need to inspect the code and rewrite any unsafe coding…
yea right.
An example of this is the OpenBSD project, where programmers have
audited the existing code base, including the operating system, standard
libraries, and common utilities, and this has resulted in what is widely
regarded as one of the safest operating systems in widespread use.
It’s better to automate safety! Clearly humans can’t handle it.
1.8.3 Run-time defenses
Will discuss more soon
1.9 Heap overflow
Attack buffer located in heap
Typically located above program code
Memory is requested by programs to use in dynamic data structures
(such as linked lists of records)
No return address
Hence no easy transfer of control
May have function pointers that an attacker can exploit
Or manipulate management data structures
Defenses
Making the heap non-executable
Randomizing the allocation of memory on the heap
1.10 Defenses
1.10.1 Compile time
Prevent or detect buffer overflows by instrumenting programs when
they are compiled.
Examples include:
choosing a high-level language that does not permit
buffer overflows,
safe coding standards,
safe standard libraries, or
including additional code to detect corruption of
the stack frame
1.10.1.1 Programming language
like above
Examples
1.10.1.1.1 Classic overflow
#include <string.h>

void copyData(char *userId) {
    char smallBuffer[10];          // size of 10
    strcpy(smallBuffer, userId);   // no bounds check
}

int main(int argc, char *argv[]) {
    // Payload of 11 characters (12 bytes with the terminating NUL)
    char *userId = "01234567890";
    // this shall cause a buffer overflow
    copyData(userId);
    return 0;
}
Unsafe functions
Range checking with strncpy, strncat,
cin.getline, etc., is often suggested, as human-factors
hand-holding / mommying you.
strncpy can cause another overflow too (it does not guarantee a
terminating NUL, so later reads can run past the buffer)
Unsafe functions
Additional checks on buffer size are performed
The *_s variants (e.g., strcpy_s) are Windows, and the strl* variants (e.g., strlcpy) are Unix
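Since neither family is available everywhere, one portable way to get the same "checked copy" behavior is to build it on snprintf from standard C. This is a sketch under that assumption; `bounded_copy` is a made-up name, not a library function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dest (capacity cap > 0). The result is always
   NUL-terminated. Returns 0 on success, -1 if src did not fit
   and was truncated. */
int bounded_copy(char *dest, size_t cap, const char *src)
{
    assert(dest != NULL && src != NULL && cap > 0);
    int needed = snprintf(dest, cap, "%s", src);
    return (needed < 0 || (size_t)needed >= cap) ? -1 : 0;
}
```

Applied to the classic overflow example, `bounded_copy(smallBuffer, sizeof smallBuffer, userId)` stores "012345678" and returns -1, so the caller can decide how to treat the truncated input.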
1.10.1.1.2 Format Strings
Format string buffer overflows (usually called “format string
vulnerabilities”) are highly specialized buffer overflows that can have
the same effects as other buffer overflow attacks.
Format string vulnerabilities take advantage of the mixture of data
and control information in certain functions, such as C/C++’s
printf.
Some format parameters:
%x // hexadecimal (unsigned int)
%s // string ((const) (unsigned) char *)
%n // number of bytes written so far (int *)
%d // decimal (int)
%u // unsigned decimal (unsigned int)
For every % in the argument the printf function finds,
it assumes that there is an associated value on the stack.
In this way, the function walks the stack downwards, reading the
corresponding values from the stack, and printing them to the user.
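The usual fix is to never let user input act as the format string. The sketch below shows the safe pattern with snprintf; `format_safely` is a hypothetical wrapper used only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format user input into out. Because the format is the fixed string
   "%s", any % directives inside user_input are copied as plain data
   and never interpreted. */
int format_safely(char *out, size_t cap, const char *user_input)
{
    assert(out != NULL && user_input != NULL && cap > 0);
    return snprintf(out, cap, "%s", user_input);
    /* Vulnerable pattern, for contrast (do not use):
       snprintf(out, cap, user_input);               */
}
```

Passing the input "%x %x %n" through `format_safely` yields exactly those eight characters in the output, with no stack values read and no memory written through %n.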
1.10.1.1.3 Integer overflow
#include <stdio.h>

int main(void) {
    int val;
    val = 0x7fffffff;  /* 2147483647 */
    printf("val=%d(0x%x)\n", val, val);
    /* Overflow the int (formally undefined behavior in C; typically wraps) */
    printf("val+1=%d(0x%x)\n", val + 1, val + 1);
    return 0;
}
The binary representation of 0x7fffffff is
1111111111111111111111111111111.
This integer is initialized with the highest positive value a signed
32-bit integer can hold.
Here when we add 1 to the hex value of 0x7fffffff the value of the
integer overflows and goes to a negative number (0x7fffffff + 1 =
0x80000000).
In decimal this is -2147483648.
Strictly speaking, signed overflow is undefined behavior in C; the
wraparound is what typical two's-complement machines produce.
Integer overflow
#include <limits.h>

int safe_add(int a, int b) {
    if (a > 0 && b > INT_MAX - a) {
        /* handle overflow (and return before the addition below) */
    } else if (a < 0 && b < INT_MIN - a) {
        /* handle underflow (and return before the addition below) */
    }
    return a + b;
}
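One way to fill in the "handle overflow" placeholders is to report failure to the caller instead of performing the addition. This is a sketch of that design choice; the name `checked_add` and the out-parameter convention are assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Store a + b in *result and return true, or return false (leaving
   *result untouched) if the sum would not fit in an int. */
bool checked_add(int a, int b, int *result)
{
    assert(result != NULL);
    if (a > 0 && b > INT_MAX - a) return false;   /* would overflow */
    if (a < 0 && b < INT_MIN - a) return false;   /* would underflow */
    *result = a + b;
    return true;
}
```

The range checks run before the addition, so the undefined signed overflow is never executed.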
1.10.1.1.4 Array bounds checking
Code copies len bytes out of the from array into the to array
starting at position pos and returning the end position.
Unfortunately, this function is given no information about the
actual size of the destination buffer to and hence is unable to ensure
an overflow does not occur.
In this case, the calling code should ensure that the value of
pos+len is not larger than the size of the to array.
Also, input is not necessarily a string; it could just as easily be
binary data, just carelessly manipulated.
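The function being described is not shown in these notes, so the following is a reconstruction from the prose rather than the original code. It pairs an unchecked copy with a checked variant; the extra `to_size` parameter and the use of size_t are assumptions added for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Unchecked version as described in the text: trusts pos and len. */
size_t copy_buf(char *to, size_t pos, const char *from, size_t len)
{
    for (size_t i = 0; i < len; i++)
        to[pos++] = from[i];
    return pos;
}

/* Checked variant: knowing the destination size lets the function
   refuse copies that would run past the end of `to`.
   Returns the new end position, or (size_t)-1 if the copy is rejected. */
size_t copy_buf_checked(char *to, size_t to_size, size_t pos,
                        const char *from, size_t len)
{
    assert(to != NULL && from != NULL);
    if (pos > to_size || len > to_size - pos)
        return (size_t)-1;
    memcpy(to + pos, from, len);   /* memcpy: the data need not be a string */
    return pos + len;
}
```

The comparison is written as `len > to_size - pos` rather than `pos + len > to_size` so that the check itself cannot wrap around.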
1.10.1.2 Language Extensions/Safe Libraries
A common concern with C comes from the use of unsafe standard
library routines, especially some of the string manipulation
routines.
One approach to improving the safety of systems has been to replace
these with safer variants.
Using these requires rewriting the source to conform to the new
safer semantics.
Programs and libraries need to be recompiled
Likely to have problems with third-party applications
An alternative is to swap in the safer variants at load time.
Libsafe is an example implemented as a dynamic library arranged to
load before the existing standard libraries, so programs don’t need to
be recompiled.
1.10.1.3 Stack protection
Add function entry and exit code to check stack for signs of
corruption of the stack frame
Insert random canary below the old frame pointer
address, before the allocation of space for local variables
Value needs to be unpredictable
Should be different on different systems
Added function exit code checks that the canary value has not
changed before continuing with the usual function exit operations of
restoring the old frame pointer and transferring control back to the
return address.
Requires recompiling
GCC extensions that include additional function entry and exit code
Function entry writes a copy of the return address to a safe region
of memory
Function exit code checks the return address in the stack frame
against the saved copy
If change is found, aborts the program.
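The canary check can be simulated in ordinary C to show the logic. This is only a model: real stack protectors (for example GCC's -fstack-protector options) insert the equivalent code automatically and use an unpredictable value, whereas here the frame is a hypothetical struct and the canary is passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated frame: the canary sits between the local buffer and the
   saved return address. */
struct sim_frame {
    char buffer[16];
    uint32_t canary;
    uint32_t saved_return;   /* stand-in for a return address */
};

/* "Function entry": plant the canary. */
void frame_enter(struct sim_frame *f, uint32_t canary, uint32_t ret)
{
    memset(f->buffer, 0, sizeof f->buffer);
    f->canary = canary;
    f->saved_return = ret;
}

/* Unchecked write into the frame, standing in for a vulnerable strcpy. */
void frame_write(struct sim_frame *f, const void *data, size_t len)
{
    assert(len <= sizeof *f);   /* keep the simulation inside the struct */
    memcpy(f, data, len);
}

/* "Function exit": returns 1 if the canary is intact, 0 if corrupted. */
int frame_exit_ok(const struct sim_frame *f, uint32_t canary)
{
    return f->canary == canary;
}
```

An overflow long enough to reach `saved_return` has to pass through `canary` first, which is why the exit check catches it before the corrupted return address is used.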
1.10.2 Run-time defenses
Most of the compile-time approaches require recompilation of
existing programs.
Hence, there is interest in run-time defenses that can be deployed
as operating systems updates to provide some protection for existing
vulnerable programs.
These defenses involve changes to the memory management of the
virtual address space of processes.
These changes act to either alter the properties of regions of
memory, or to make predicting the location of targeted buffers
sufficiently difficult to thwart many types of attacks.
Most of these are annoying extra complexity, which really should not
need to exist, if compile time was done correctly…
1.10.2.1 Executable address space protection
Overview
Many of the buffer overflow attacks, such as the stack overflow
examples in this chapter, involve copying machine code into the targeted
buffer and then transferring execution to it.
A defense is to block the execution of code on the stack, on the
assumption that executable code should only be found elsewhere in the
process's address space.
Making the stack (and heap) non-executable provides a high degree of
protection against many types of buffer overflow attacks for existing
programs, hence the inclusion of this practice is standard in a number
of recent operating systems releases.
Issues
One issue is support for programs that do need to place executable
code on the stack.
1.10.2.2 Address space randomization
To implement the classic stack overflow attack, the attacker needs
to be able to predict the approximate location of the targeted buffer to
determine a suitable return address to use in the attack to transfer
control to the shell-code.
One technique to greatly increase the difficulty of this prediction
is to change the address at which the stack is located in a random
manner for each process.
The range of addresses available on modern processors is large (32
bits, or 64 bits on current systems), and most programs only need a
small fraction of that.
Therefore, moving the stack memory region around by a megabyte or so
has minimal impact on most programs but makes predicting the targeted
buffer’s address almost impossible.
This amount of variation is also much larger than the size of most
vulnerable buffers.
Here, the arms race introduces unfortunate extra complexity.
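Whether randomization is active on a given system can be checked empirically. The sketch below simply reports where a stack variable and a heap block ended up; the function names are hypothetical, and the output only becomes informative when the program is run several times and the values are compared.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Record the address of a stack variable and of a heap allocation
   as integers, so separate runs can be compared. */
void sample_addresses(uintptr_t *stack_addr, uintptr_t *heap_addr)
{
    int local = 0;
    void *block = malloc(16);
    assert(stack_addr != NULL && heap_addr != NULL);
    *stack_addr = (uintptr_t)&local;
    *heap_addr = (uintptr_t)block;   /* 0 if malloc failed */
    free(block);
}

/* Print both sample addresses in hexadecimal. */
void print_addresses(void)
{
    uintptr_t s, h;
    sample_addresses(&s, &h);
    printf("stack=0x%" PRIxPTR " heap=0x%" PRIxPTR "\n", s, h);
}
```

If the printed addresses differ between runs of the same binary, the system is randomizing those regions; identical values across runs suggest it is not.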
1.10.2.3 Random dynamic memory allocation
Similar to address space randomization
There is a class of heap buffer overflow attacks that exploit the
expected proximity of successive memory allocations, or indeed the
arrangement of the heap management data structures.
Randomizing the allocation of memory on the heap makes the
possibility of predicting the address of targeted buffers extremely
difficult, thus thwarting the successful execution of some heap overflow
attacks.
1.10.2.4 Random standard library locations
Another target of attack is the location of standard library
routines.
In an attempt to bypass protections such as non-executable stacks,
some buffer overflow variants exploit existing code in standard
libraries.
These are typically loaded at the same address by the same
program.
To counter this form of attack, we can use a security extension that
randomizes the order of loading standard libraries by a program and
their virtual memory address locations.
This makes the address of any specific function sufficiently
unpredictable as to render the chance of a given attack correctly
predicting its address very low.
1.10.2.5 Guard pages
Similar to canary above.
A process has much more virtual memory available than it typically
needs
Place gaps (guard pages) between regions of memory (either large
divisions like stack and heap, or smaller like stack frames)
Flagged as illegal addresses
Any attempted access aborts process
Further extension places guard pages between stack frames and heap
buffers
Cost in execution time to support the large number of page mappings
necessary
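A program can set up a guard page by hand, which shows the mechanism. This sketch assumes a Unix-like system that provides mmap, mprotect, and MAP_ANONYMOUS (widely supported, though not part of strict POSIX); `alloc_with_guard` is a made-up helper.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate `pages` usable pages followed by one inaccessible guard page.
   Returns NULL on failure; on success *usable_bytes is the usable size. */
void *alloc_with_guard(size_t pages, size_t *usable_bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(usable_bytes != NULL);
    if (page <= 0 || pages == 0) return NULL;

    size_t usable = pages * (size_t)page;
    char *base = mmap(NULL, usable + (size_t)page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;

    /* Flag the last page as illegal: any access now faults. */
    if (mprotect(base + usable, (size_t)page, PROT_NONE) != 0) {
        munmap(base, usable + (size_t)page);
        return NULL;
    }
    *usable_bytes = usable;
    return base;
}
```

Writes within the usable region succeed, while a write to the first byte after it lands in the guard page and the process is typically terminated by the operating system.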
1.11 Other attack types
1.11.1 Replacement stack frame
Variant that overwrites buffer and saved frame pointer address
Saved frame pointer value is changed to refer to a dummy stack
frame
Current function returns to calling function
Then when calling function returns, control is transferred to the
shell-code in the overwritten buffer
Can be used to overcome limitations in buffer overflow size or
content
Off-by-one vulnerability:
Coding error that allows one more byte to be copied than there is
space available
Defenses
Any stack protection mechanisms to detect modifications to the stack
frame or return address by function exit code
Use non-executable stacks
Randomization of the stack in memory and of system libraries
1.11.2 Return to system call
Stack overflow variant replaces return address with the address of a
standard library function
Response to non-executable stack defenses
Transfer to system call, and trick the system call to then execute
malicious code
Defenses
Any stack protection mechanisms to detect modifications to the stack
frame or return address by function exit code
Use non-executable stacks
Randomization of the stack in memory and of system libraries
1.11.3 Heap overflow
Attack buffer located in heap
Typically located above program code, and grows up
Memory is requested by programs to use in dynamic data structures
(such as linked lists of records)
No return address
Hence no easy transfer of control
However, if there are pointers to functions, control can be
transferred
Or manipulate management data structures
Defenses
Making the heap non-executable
Randomizing the allocation of memory on the heap
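The function pointer case can be sketched safely by simulating the overflow inside a single heap object. The struct layout and all names here (`struct record`, `record_fill`, and so on) are hypothetical; the corrupted pointer is only compared byte by byte and is never called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef void (*handler_fn)(void);

/* Hypothetical heap record: a buffer followed by a function pointer. */
struct record {
    char name[16];
    handler_fn on_done;
};

static void default_handler(void) {}

/* Allocate a record on the heap with a known-good handler. */
struct record *record_new(void)
{
    struct record *r = calloc(1, sizeof *r);
    if (r != NULL) r->on_done = default_handler;
    return r;
}

/* Unchecked write starting at name, standing in for a heap overflow. */
void record_fill(struct record *r, const void *data, size_t len)
{
    assert(len <= sizeof *r);   /* keep the simulation inside the record */
    memcpy(r, data, len);
}

/* Returns 1 if the stored pointer still holds default_handler's bytes. */
int record_handler_intact(const struct record *r)
{
    handler_fn expected = default_handler;
    return memcmp(&r->on_done, &expected, sizeof expected) == 0;
}
```

Once input longer than `name` has been written, `on_done` holds attacker-chosen bytes, and any later call through it would transfer control wherever those bytes point.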
1.11.4 Global data overflow
Can attack buffer located in global data
May be located above program code
Exploitable if the region holds a function pointer next to a vulnerable buffer,
or adjacent process management tables, e.g., with references to
destructor functions