Tip: If anyone wants to speed up the lecture videos a little,
inspect the page, go to the browser console, and paste this in:
document.querySelector('video').playbackRate = 1.2
1.2 Introduction
Software security is not all memory issues and buffer overflows,
despite what some older textbooks may cover.
Many vulnerabilities result from poor programming practices.
They are often a consequence of insufficient checking and validation
of input data and error codes.
Awareness of these issues is a critical initial step in writing more
secure program code.
1.2.3 Software error super-categories:
Insecure interaction between components
Risky resource management
Porous defenses
1.2.3.1 1) Component interaction
1.2.3.2 2) Resource management
1.2.3.3 3) Porous defenses
1.2.4 Quality vs. Security
To what extent are these the same?
Software quality and reliability:
Concerned with the accidental failure of a program as a result of some
theoretically random, unanticipated input, system interaction, or use of
incorrect code
Improve using structured design and testing to identify and
eliminate as many bugs as possible from a program
Concern is not how many bugs, but how often they are triggered, and
consequences
Software security:
Attacker targets bugs that result in a failure that can be exploited
by the attacker
Triggered by inputs that differ dramatically from what is usually
expected
Unlikely to be identified by common testing approaches
1.2.5 Defensive programming
Designing and implementing software so that it continues to function
even when under attack
Requires attention to all aspects of program execution, environment,
and type of data it processes
Software is able to detect erroneous conditions resulting from some
attack
Also referred to as secure programming
Key rule is to never assume anything, check all assumptions,
minimize dependencies, and handle any possible error states
1.2.6 What are all the program inputs?
Neat/atypical example:
model-stealing attacks on deep learning vision models
Sometimes poisoning is better than hiding.
For example, web browsers that crawl random sites to pollute your
internet history.
1.2.7 Defensive programming
Programmers often make assumptions about the type of inputs a
program will receive and the environment it executes in.
Assumptions need to be validated by the program, and all potential
failures handled gracefully and safely.
Requires a different mindset from traditional programming
practices.
Programmers have to understand how failures can occur, and the steps
needed to reduce the chance of them occurring in their programs.
Conflicts with business pressures to keep development times as short
as possible to maximize market advantage.
1.2.8 Security by design
Security and reliability are common design goals in most engineering
disciplines; imagine building an untested bridge!
Software development is not as mature
Recent years have seen increasing efforts to improve secure software
development processes
Software Assurance Forum for Excellence in Code (SAFECode):
Develop publications outlining industry best practices for software
assurance and providing practical advice for implementing proven methods
for secure software development: https://www.safecode.org/
1.3 Program input
Handling program input safely
Incorrect input handling is a very common failing!
Must identify all data sources, obvious and non-obvious
Input is any source of data from outside and whose value is not
explicitly known by the programmer when the code was written
Explicitly validate assumptions on size and type of values before
use (pre-conditions and post-conditions)
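As a minimal sketch of explicitly validating size and type before use, the function below checks an untrusted text field against a strict pattern and a range. The name parse_quantity, the 4-digit limit, and the 1 to 1000 range are assumptions made up for this example.

```python
import re

# Hypothetical rule for this example: 1 to 4 ASCII digits, nothing else.
_QUANTITY_RE = re.compile(r"[0-9]{1,4}")


def parse_quantity(raw, lo=1, hi=1000):
    """Validate untrusted text and return it as an int within [lo, hi]."""
    if not isinstance(raw, str):
        raise TypeError("quantity must be text")
    # fullmatch rejects empty strings, signs, whitespace, and trailing junk.
    if _QUANTITY_RE.fullmatch(raw) is None:
        raise ValueError("quantity must be 1-4 digits")
    value = int(raw)
    # Range check acts as the pre-condition for any code that uses the value.
    if not lo <= value <= hi:
        raise ValueError(f"quantity must be between {lo} and {hi}")
    return value
```

Anything that does not match the expected form is rejected outright rather than "cleaned up", which keeps the set of accepted inputs easy to reason about.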
1.3.1 Input size
Input size and buffer overflow
Programmers often make assumptions about the maximum expected size
of input
Allocated buffer size is not confirmed
Resulting in buffer overflow
Testing may not identify vulnerability
Test inputs are unlikely to include large enough inputs to trigger
the overflow
Safe coding treats all input as dangerous
Data can be inputted indirectly
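One way to avoid assuming a maximum input size is to enforce it when reading. The sketch below reads at most one byte more than the limit and rejects anything larger. The names read_bounded and MAX_REQUEST_BYTES and the 4096-byte limit are assumptions for illustration, and the code assumes a buffered stream (such as a file opened in binary mode or io.BytesIO).

```python
import io

MAX_REQUEST_BYTES = 4096  # hypothetical limit for this example


def read_bounded(stream, limit=MAX_REQUEST_BYTES):
    """Read from a buffered binary stream, refusing input larger than limit."""
    # Ask for one extra byte so oversized input can be detected
    # without reading an unbounded amount of data into memory.
    data = stream.read(limit + 1)
    if len(data) > limit:
        raise ValueError(f"input exceeds {limit} bytes")
    return data
```

For example, read_bounded(io.BytesIO(b"hello")) returns the five bytes, while a stream holding 5000 bytes raises ValueError.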
1.3.2 Input interpretation
A broad problem, one example of which is the SQLi attack we covered
last time.
1.3.2.1 Interpretation of program
input
Program input may be binary or text
Binary interpretation depends on encoding, and is usually
application-specific
When processing binary data, the program assumes some interpretation
of the raw binary values as representing integers, floating-point
numbers, character strings, or some more complex structured data
representation.
For text input, there is an increasing variety of character sets
being used
Care is needed to identify just which set is being used, and which
characters are being read
Failure to validate encoding and input may result in an exploitable
vulnerability
E.g., the 2014 Heartbleed OpenSSL bug is an example of a failure
to check the validity of a binary input value, leading to a buffer
over-read.
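The kind of check that was missing in a Heartbleed-style bug can be sketched as follows: a message carries a length field, and the parser must confirm that the claimed length does not exceed the data actually received. The message layout here (2-byte big-endian length followed by the payload) and the name parse_echo_request are assumptions for illustration, not the real TLS heartbeat format. Python slices cannot over-read memory the way C can, so this only illustrates the validation step.

```python
import struct


def parse_echo_request(packet):
    """Parse [2-byte big-endian length][payload] and return the payload."""
    if len(packet) < 2:
        raise ValueError("packet too short for length header")
    # "!H" = network byte order, unsigned 16-bit integer.
    (claimed_len,) = struct.unpack_from("!H", packet, 0)
    payload = packet[2:]
    # The critical check: never trust the sender's claimed length.
    if claimed_len > len(payload):
        raise ValueError("claimed length exceeds received payload")
    return payload[:claimed_len]
```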
Injection attacks: flaws relating to invalid handling of input data,
specifically when program input data can accidentally or deliberately
influence the flow of execution of the program
More often occur in scripting languages
Encourage re-use of other programs and system utilities where
possible to save coding effort
Often used as Web CGI scripts
Bash command injection via a Perl script
Attack input: xxx; echo attack success; ls -l finger*
Bash command injection attack on a script that calls the finger command
(display user)
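A rough Python analogue of the defense, as a sketch: validate the user name against a strict pattern and pass it to the program as a separate argument, so no shell ever parses it. The name finger_argv, the user-name pattern, and the path /usr/bin/finger are assumptions for this example.

```python
import re

# Hypothetical user-name rule: letters, digits, underscore first;
# then letters, digits, underscore, dot, or hyphen; at most 32 characters.
_USERNAME_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,31}")
FINGER_PATH = "/usr/bin/finger"  # assumed install location


def finger_argv(user):
    """Return an argument list for finger, or raise on suspicious input."""
    if _USERNAME_RE.fullmatch(user) is None:
        raise ValueError("invalid user name")
    # A list of arguments is passed to the program directly; characters
    # like ';' have no special meaning because no shell is involved.
    return [FINGER_PATH, user]
```

The list would then be run with subprocess.run(finger_argv(name)) and without shell=True. The attack string above is rejected by the pattern before anything is executed.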
SQL injection
An input such as "Bobby'; drop table suppliers" results in
the specified record being retrieved, followed by deletion of the entire
table!
But, the safer version below uses a function to sanitize the input
before building a string (sanitization function not shown).
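A different defense from the sanitization function mentioned above is to avoid building the SQL string at all and use a parameterized query. The sketch below uses Python's sqlite3 module with an in-memory database. The suppliers schema and the names find_supplier and demo are assumptions for illustration.

```python
import sqlite3


def find_supplier(conn, name):
    """Look up a supplier by name using a parameterized query."""
    # The "?" placeholder sends the value separately from the SQL text,
    # so quotes and semicolons in name are treated as data, not code.
    cur = conn.execute(
        "SELECT name, city FROM suppliers WHERE name = ?", (name,)
    )
    return cur.fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE suppliers (name TEXT, city TEXT)")
    conn.execute("INSERT INTO suppliers VALUES ('Bobby', 'Leeds')")
    rows = find_supplier(conn, "Bobby'; drop table suppliers")
    # The table still exists and still holds its one row.
    remaining = conn.execute("SELECT COUNT(*) FROM suppliers").fetchone()[0]
    conn.close()
    return rows, remaining
```

Here the malicious input simply matches no supplier name, so the query returns no rows and the table is untouched.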
PHP code injection
The script below was not intended to be called directly.
Rather, it is a component of a larger, multi-file program.
The main script sets the value of the $path variable to
refer to the main directory containing the program, and all its code and
data files.
$path can be set maliciously
Fix: block assignment of form field values to global variables
Another defense is to only use constant values in include (and
require) commands
Fuzzing: a software testing technique that uses randomly generated
data as inputs to a program
Range of possible inputs to most programs is very large
Intent is to determine if the program or function correctly handles
abnormal inputs
Simple, free of assumptions, cheap
Assists with reliability as well as security
Can also use templates to generate classes of known problem inputs
Disadvantage is that bugs triggered by other forms of input would be
missed
Combination of approaches is needed for reasonably comprehensive
coverage of the inputs
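A very small random fuzzer can be sketched in a few lines. It feeds random byte strings to a target function and records any exception other than the ones the target is documented to raise. The names fuzz and buggy_first_field are hypothetical, and the target contains a deliberate bug so there is something to find.

```python
import random


def buggy_first_field(data):
    """Hypothetical target: trusts the first byte as an index (a bug)."""
    if not data:
        raise ValueError("empty input")
    return data[data[0]]  # IndexError when data[0] >= len(data)


def fuzz(target, runs=1000, max_len=64, seed=0, expected=(ValueError,)):
    """Call target with random bytes; return inputs that crash it."""
    rng = random.Random(seed)  # fixed seed makes failures reproducible
    failures = []
    for _ in range(runs):
        length = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except expected:
            pass  # documented, graceful rejection of bad input
        except Exception as exc:  # anything else is a finding
            failures.append((data, exc))
    return failures
```

Calling fuzz(buggy_first_field) returns a list of crashing inputs, each paired with the IndexError it triggered. Real fuzzers add coverage feedback and input templates, which this sketch omits.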
1.3.4 Unexpected executable sources
Code may be executed in more situations than a programmer or user is
aware of:
Examples:
JavaScript from websites runs in a browser sandbox, but sandbox-escape
bugs have repeatedly been found
Importing libraries relies on the integrity of those libraries, and
their imports!
Python pickle file reading:
A pickle file can contain essentially any Python objects.
Pickle files can thus include malicious payloads.
Don’t just load any random pickle you find on the internet!
It could have harmful code in it that would run arbitrary commands
when you try to load it.
If you really really want to handle suspect code (i.e., a pickle),
opening it in a contained environment is one way to mitigate risks…
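To make the risk concrete, the sketch below builds a harmless payload whose unpickling calls eval, then shows a restricted unpickler, following the "restricting globals" pattern from the Python pickle documentation, that refuses it. The names Payload, SAFE_BUILTINS, and restricted_loads are chosen for this example. The allow-list is an assumption and is not a complete defense against untrusted pickles.

```python
import builtins
import io
import pickle

SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class Payload:
    """Benign stand-in for a malicious object: unpickling calls eval."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow a small set of harmless built-in classes.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads, but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

With plain pickle.loads the payload runs eval during loading, while restricted_loads raises pickle.UnpicklingError. Plain containers of numbers and strings still load normally.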
1.4 Safe coding
Writing safe program code
Security issues:
Correct algorithm implementation
Correct machine instructions for algorithm
Valid manipulation of data
1.4.1 Correct algorithm implementation
Issue of good program development technique
Algorithm may not correctly handle all problem variants
Consequence of deficiency is a bug in the resulting program that
could be exploited
Algorithm problem: TCP/IP exploit
Initial sequence numbers used by many TCP/IP implementations were
too predictable
Combination of the sequence number as an identifier and
authenticator of packets and the failure to make them sufficiently
unpredictable enables the attack to occur.
Algorithm problem: Deep learning
Fun example: Vulnerable to inputs optimized to be terrible for
learning.
Artifactual debugging code (backdoor)
Another variant is when the programmers deliberately include
additional code in a program to help test and debug it
Often code remains in production release of a program and could
inappropriately release information
May permit a user to bypass security checks and perform actions they
would not otherwise be allowed to perform
This vulnerability was exploited by the Morris Internet Worm
1.4.2 Binary matches algorithm?
Might you have a malicious compiler?
Issue is ignored by most programmers.
Assumption is that the compiler or interpreter generates or executes
code that validly implements the language statements.
Requires manually comparing machine code with original source: Slow
and difficult (but possible).
Development of computer systems with very high assurance level is
one area where this level of checking is required.
1.4.3 Interpretation of data values?
Correct data interpretation
Different languages provide different capabilities for restricting
and validating interpretation of data in variables
Strongly typed languages are often more restrictive, and safer in some
regards.
Other languages allow more liberal interpretation of data and permit
program code to explicitly change their interpretation
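As a small illustration of how the same bytes can be given different interpretations, the sketch below decodes one 4-byte value as an unsigned integer and as an IEEE 754 single-precision float using Python's struct module. The function name interpretations is made up for this example.

```python
import struct


def interpretations(raw):
    """Decode the same 4 little-endian bytes as an integer and as a float."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    return {
        "uint32": struct.unpack("<I", raw)[0],   # unsigned 32-bit integer
        "float32": struct.unpack("<f", raw)[0],  # IEEE 754 single precision
    }


raw = struct.pack("<I", 0x3F800000)
result = interpretations(raw)
# result["uint32"] is 1065353216, result["float32"] is 1.0
```

The bytes never change; only the declared interpretation does, which is why code that lets callers choose the interpretation needs to validate the result.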
1.4.4 Use of memory
Issue of dynamic memory allocation
Used to manipulate unknown amounts of data
Allocated when needed, released when done
Memory leak:
steady reduction in memory available on the heap to the point where
it is completely exhausted
Many older languages have no explicit support for dynamic memory
allocation (e.g., C), and use standard library routines to allocate and
release memory
Modern languages handle this automatically
1.4.5 Race conditions with shared memory
Without synchronization of accesses it is possible that values may
be corrupted or changes lost due to overlapping access, use, and
replacement of shared values
Requires correct synchronizations
Bad sync procedures can result in deadlock
Processes or threads wait on a resource held by the other
One or more programs has to be terminated
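Synchronized access to a shared value can be sketched with a lock around the read-modify-write step. The Counter class and run function are hypothetical names for this example. Using a single lock, always acquired through a with block, also avoids the lock-ordering mistakes that lead to deadlock.

```python
import threading


class Counter:
    """Shared counter whose updates are protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read, add, and write back happen as one protected step.
        with self._lock:
            self.value += 1


def run(n_threads=8, per_thread=10000):
    counter = Counter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, run(4, 1000) always returns 4000. Without it, overlapping updates could be lost.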
1.5 OS-program, program-program
Operating system interaction
Programs execute on systems under the control of an operating system
Mediates and shares access to resources
Constructs execution environment
Includes environment variables and arguments
Systems have a concept of multiple users
Resources are owned by a user and have permissions granting access
with various rights to different categories of users
Programs need access to various resources, however excessive levels
of access are dangerous
Concerns when multiple programs access shared resources such as a
common file
1.5.1 Environment variables
Collection of string values inherited by each process from its
parent
Can affect the way a running process behaves
Included in the process's memory when it is constructed
Can be modified by the program process at any time
Modifications will be passed to its children
Another source of untrusted program input
Most common use is by a local user attempting to gain increased
privileges
Goal is to subvert a program that grants superuser or administrator
privileges
Example (do in class):
$ printenv lists these variables in Linux.
$ env lets you pre-modify the environment that a program runs in by
passing a set of variable definitions ahead of the command.
A script: DefensiveProgramming/vuln.sh
#!/bin/bash
echo $HOME
echo $PATH
echo $VAR1
Run it with a modified environment:
env VAR1="hacked" bash vuln.sh
or
VAR1="hacked" bash vuln.sh
In the figure below, this simple script calls two separate programs:
sed and grep
Assumes that the standard system versions of these programs would be
called.
To locate the actual program, the shell will search each directory
named in the shell $PATH variable for a file with the
desired name.
The attacker simply has to redefine the $PATH variable
to include a directory they control, which contains a program called
grep, for example.
Fix for (a) at least?
To address, use absolute names for each program or
$PATH variable could be reset to a known default value by
the script
Why is (b) still vulnerable:
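The IFS environment variable is used to separate the words that form a
line of commands;
It defaults to a space, tab, or newline character, but an attacker could
add “=” as well
The assignment of a new value to $PATH in the script is then parsed as a
command named PATH with the directory list as its argument
If the attacker has also changed the $PATH variable to
include a directory with an attack program named PATH, then this
will be executed when the script is run.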
Compiled programs are also vulnerable
Programs can be vulnerable to $PATH variable
manipulation
Must reset to “safe” values
If dynamically linked may be vulnerable to manipulation of
LD_LIBRARY_PATH
Used to locate suitable dynamic library
Must either statically link privileged programs or prevent use of
this variable
Mention: fixed env vulnerability in a popular program here…
In general, try not to rely on environment variables!
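The two fixes above (absolute program names and resetting the environment to known values) can be sketched in Python as follows. The names clean_env and run_clean, the SAFE_PATH value, and the pass-through list are assumptions for this example.

```python
import os
import subprocess

SAFE_PATH = "/usr/sbin:/usr/bin:/sbin:/bin"  # assumed known-good default
PASS_THROUGH = ("LANG", "TZ")  # hypothetical allow-list of harmless variables


def clean_env(parent=None):
    """Build a minimal environment instead of inheriting the caller's."""
    parent = os.environ if parent is None else parent
    env = {"PATH": SAFE_PATH}
    # Copy only allow-listed variables; PATH, LD_LIBRARY_PATH, IFS and
    # anything else set by the caller are dropped.
    for name in PASS_THROUGH:
        if name in parent:
            env[name] = parent[name]
    return env


def run_clean(argv):
    """Run a program by absolute path with the sanitized environment."""
    if not os.path.isabs(argv[0]):
        raise ValueError("program must be given as an absolute path")
    return subprocess.run(
        argv, env=clean_env(), capture_output=True, text=True, check=True
    )
```

Starting from an empty environment and adding back only what is needed is usually easier to get right than trying to remove every dangerous variable.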
1.5.2 Least privilege
Principle of least privilege
Privilege escalation:
Exploit of flaws may give attacker greater privileges
Least privilege:
Run programs with least privilege needed to complete their
function
Determine appropriate user and group privileges required:
Decide whether to grant extra user or just group privileges
Ensure that privileged program can modify only those files and
directories necessary
Admin privilege
Programs with root/administrator privileges are a major target of
attackers
They provide highest levels of system access and control
Are needed to manage access to protected system resources
Often privilege is only needed at start
Can then run as normal user
Good design should partition complex programs into smaller modules
with needed privileges
Provides a greater degree of isolation between the components
Reduces the consequences of a security breach in one component
Easier to test and verify
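The "privilege only needed at start" pattern can be sketched on a POSIX system as follows: do the privileged setup as root, then permanently switch to an unprivileged user. The function name drop_privileges is hypothetical, and the target uid and gid are supplied by the caller. This is a sketch of the usual ordering, not a complete hardening routine.

```python
import os


def drop_privileges(uid, gid):
    """Permanently switch from root to an unprivileged uid/gid (POSIX)."""
    if uid == 0 or gid == 0:
        raise ValueError("target uid/gid must not be root")
    if os.getuid() != 0:
        return False  # not running as root, nothing to drop
    # Order matters: clear supplementary groups and set the group
    # while still root, then give up the user id last.
    os.setgroups([])
    os.setgid(gid)
    os.setuid(uid)
    # Confirm the change actually took effect before continuing.
    if os.getuid() == 0 or os.geteuid() == 0:
        raise RuntimeError("failed to drop root privileges")
    return True
```

A server would typically call this right after binding its privileged port or opening its protected files.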
1.5.3 System calls and standard libraries
Programs use system calls and standard library functions for common
operations
Programmers make assumptions about their operation
If these assumptions are incorrect, behavior is not what is expected
May be a result of system optimizing access to shared resources
Results in requests for services being buffered, re-sequenced, or
otherwise modified to optimize system use
Optimizations can conflict with program goals
1.5.3.1 Secure file deletion
Secure delete operations may not reach the disk as expected because
of “efficient” OS procedures.
Example: chattr +s filetosecurelydelete
Not universally reliable though; better off just encrypting the full
disk.
1.5.4 Race conditions with system resources
1.5.4.1 Preventing race conditions
Programs may need to access a common system resource
Need suitable synchronization mechanisms
Most common technique is to acquire a lock on the shared file
Lockfile: Process must create and own the lockfile in order to gain
access to the shared resource
Concerns:
If a program chooses to ignore the existence of the lockfile and
access the shared resource the system will not prevent this
All programs using this form of synchronization must cooperate
Implementation
Lockfile creation should happen as one event
Rather than as a check for existence and then creation. Why?
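On the "why": if the program first checks that the lockfile is absent and then creates it, another process can create the file in the gap between the two steps, and both will believe they hold the lock. A minimal sketch of atomic creation uses os.open with O_CREAT and O_EXCL, which fails if the file already exists. The function names acquire_lock and release_lock are hypothetical.

```python
import os


def acquire_lock(path):
    """Try to create the lockfile atomically; return True on success."""
    try:
        # O_CREAT | O_EXCL makes "check and create" a single operation:
        # the call fails if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False  # someone else holds the lock
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner for debugging
    return True


def release_lock(path):
    """Remove the lockfile so that others can acquire it."""
    os.remove(path)
```

This remains a cooperative scheme: a program that ignores the lockfile is not stopped by it.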
1.5.5 Safe temporary files
Many programs use temporary files
Often in common, shared system area
Must be unique, not accessed by others
Commonly create name using process ID
Unique, but predictable
Attacker might guess and attempt to create own file between program
checking and creating
Secure temporary file creation and use requires the use of random
names
Show: stat /tmp and ll /tmp
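In Python, tempfile.mkstemp covers these requirements: it picks a random name, creates the file exclusively, and makes it readable and writable only by the owner. The sketch below wraps it in a small helper. The name write_scratch and the "scratch-" prefix are assumptions for this example.

```python
import os
import tempfile


def write_scratch(data):
    """Write bytes to a new, securely created temporary file; return its path."""
    # mkstemp returns an already-open descriptor, so there is no gap
    # between choosing the name and creating the file.
    fd, path = tempfile.mkstemp(prefix="scratch-")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path
```

The caller is responsible for deleting the file when done, for example with os.remove(path).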
1.5.6 Interaction with other programs
Data flowing among various programs must be protected for
confidentiality and integrity.
When these programs are running on the same computer system,
appropriate use of system functionality such as pipes or temporary files
provides this protection.
If the programs run on different systems, linked by a suitable
network connection, then appropriate security mechanisms should be
employed by these network connections.
Alternatives include the use of IP Security (IPSec), Transport
Layer Security/Secure Sockets Layer (TLS/SSL), or Secure Shell (SSH)
connections.
1.6 Handling program output safely
Final component is program output
May be stored for future use, sent over net, displayed
May be binary or text
Important from a program security perspective that the output
conform to the expected form and interpretation
Programs must identify what is permissible output content and filter
any possibly untrusted data to ensure that only valid output is
displayed
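For output that will be displayed in a web page, one common form of this filtering is to escape untrusted text before inserting it into HTML. The sketch below uses html.escape from the Python standard library. The function name render_comment and the markup are made up for this example.

```python
import html


def render_comment(author, text):
    """Return an HTML fragment with untrusted fields escaped."""
    # html.escape converts <, >, &, and quotes to character references,
    # so user-supplied markup is displayed as text instead of interpreted.
    return f"<p><b>{html.escape(author)}</b>: {html.escape(text)}</p>"
```

A comment containing a script tag is then shown literally to the reader rather than executed by the browser.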