1 17-DefensiveProgramming


17-DefensiveProgramming/joke-developer-problems.jpg
or at least go find more existing problems…

1.1 Screencasts

Password for the Vimeo videos is in Zulip chat.
SP21: https://vimeo.com/537439535
FS20: https://vimeo.com/411152382
Tip: If anyone wants to speed up the lecture videos a little, inspect the page, go to the browser console, and paste this in: document.querySelector('video').playbackRate = 1.2

1.2 Introduction

Software security is not all memory issues and buffer overflows, despite what some older textbooks may cover.

1.2.1 Extra reading

Web-sec guides here:
- https://guides.codepath.com/websecurity
- https://www.hacksplaining.com/lessons
If you enjoy web and database security, after reading the above guides on SQL and web-sec, try these demos:
https://www.hackthissite.org/
https://docs.fedoraproject.org/en-US/Fedora_Security_Team/1/html/Defensive_Coding/index.html
https://docs-old.fedoraproject.org/en-US/Fedora_Security_Team/1/html/Defensive_Coding/index.html

1.2.2 Software security issues

Many vulnerabilities result from poor programming practices.
They are often a consequence of insufficient checking and validation of input data and error codes.
Awareness of these issues is a critical initial step in writing more secure program code.

1.2.3 Software error super-categories:

Insecure interaction between components
Risky resource management
Porous defenses

1.2.3.1 1) Component interaction

1.2.3.2 2) Resource management

1.2.3.3 3) Porous defenses

1.2.4 Quality vs. Security

To what extent are these the same?

Software quality and reliability:
- Concerned with the accidental failure of a program as a result of some theoretically random, unanticipated input, system interaction, or use of incorrect code
- Improve using structured design and testing to identify and eliminate as many bugs as possible from a program
- Concern is not how many bugs, but how often they are triggered, and consequences
Software security:
- Attacker targets bugs that result in a failure that can be exploited by the attacker
- Triggered by inputs that differ dramatically from what is usually expected
- Unlikely to be identified by common testing approaches

1.2.5 Defensive programming

Designing and implementing software so that it continues to function even when under attack
Requires attention to all aspects of program execution, environment, and type of data it processes
Software is able to detect erroneous conditions resulting from some attack
Also referred to as secure programming
Key rule is to never assume anything, check all assumptions, minimize dependencies, and handle any possible error states

1.2.6 What are all the program inputs?

Neat/atypical example:

model-stealing
attacking deep learning vision models, model-stealing

model-poisoning
https://arxiv.org/pdf/2310.13828.pdf
https://venturebeat.com/ai/meet-nightshade-the-new-tool-allowing-artists-to-poison-ai-models-with-corrupted-training-data/

Sometimes poisoning is better than hiding.
For example, web browsers that crawl random sites to pollute your internet history.

1.2.7 Defensive programming

Programmers often make assumptions about the type of inputs a program will receive and the environment it executes in.
Assumptions need to be validated by the program, and all potential failures handled gracefully and safely.
Requires a different mindset from traditional programming practices.
Programmers have to understand how failures can occur, and the steps needed to reduce the chance of them occurring in their programs.
Conflicts with business pressures to keep development times as short as possible to maximize market advantage.

1.2.8 Security by design

Security and reliability are common design goals in most engineering disciplines; imagine building an untested bridge!
Software development is not as mature
Recent years have seen increasing efforts to improve secure software development processes
Software Assurance Forum for Excellence in Code (SAFECode):
- Develop publications outlining industry best practices for software assurance and providing practical advice for implementing proven methods for secure software development: https://www.safecode.org/

1.3 Program input

Handling program input safely

Incorrect input handling is a very common failing!
Must identify all data sources, obvious and non-obvious
Input is any data that comes from outside the program and whose value was not explicitly known by the programmer when the code was written
Explicitly validate assumptions on size and type of values before use (pre-conditions and post-conditions)

1.3.1 Input size

Input size and buffer overflow

Programmers often make assumptions about the maximum expected size of input
The allocated buffer size is not checked against the actual input size,
resulting in a buffer overflow
Testing may not identify vulnerability
Test inputs are unlikely to include large enough inputs to trigger the overflow
Safe coding treats all input as dangerous
Data can be inputted indirectly
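
A minimal sketch of treating input size as untrusted: rather than assuming the input fits, the function below reads one byte past an explicit limit and rejects anything longer. The name `read_bounded` and the 64-byte limit are illustrative assumptions, not something from these notes.

```python
import io

MAX_INPUT_BYTES = 64  # hypothetical limit chosen for illustration


def read_bounded(stream, limit=MAX_INPUT_BYTES):
    """Read at most `limit` bytes from `stream`; reject anything longer."""
    # Ask for one byte more than the limit so oversized input is detectable.
    data = stream.read(limit + 1)
    if len(data) > limit:
        raise ValueError(f"input exceeds {limit} bytes")
    return data
```

For example, `read_bounded(io.BytesIO(b"abc"))` returns the three bytes, while a 65-byte input raises `ValueError` instead of being silently truncated or copied.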

1.3.2 Input interpretation

A broad problem, one example of which is the SQLi attack we covered last time.

1.3.2.1 Interpretation of program input

Program input may be binary or text
Binary interpretation depends on encoding, and is usually application-specific
When processing binary data, the program assumes some interpretation of the raw binary values as representing integers, floating-point numbers, character strings, or some more complex structured data representation.
For text input, there is an increasing variety of character sets being used
Care is needed to identify just which set is being used, and which characters are being read
Failure to validate encoding and input may result in an exploitable vulnerability
E.g., the 2014 Heartbleed bug in OpenSSL was a failure to check the validity of a binary input value (a claimed payload length), leading to a buffer over-read.
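
The same class of mistake can be sketched in a few lines. The record layout here (a 2-byte big-endian length followed by the payload) is a made-up format for illustration, not the actual TLS heartbeat message; the point is only that the claimed length is compared with the bytes actually received.

```python
import struct

HEADER = struct.Struct("!H")  # hypothetical header: 2-byte unsigned big-endian length


def parse_record(packet: bytes) -> bytes:
    """Return the payload of a length-prefixed record, or raise ValueError."""
    if len(packet) < HEADER.size:
        raise ValueError("packet too short for header")
    (claimed_len,) = HEADER.unpack_from(packet, 0)
    payload = packet[HEADER.size:]
    # Never trust the claimed length: compare it with what actually arrived.
    if claimed_len != len(payload):
        raise ValueError("length field does not match payload size")
    return payload
```

A packet such as `b"\xff\xffabc"` claims 65535 bytes but carries 3, so it is rejected rather than answered with whatever happens to sit in memory.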

1.3.2.2 Command injection attacks

https://www.hacksplaining.com/exercises/command-execution
(do this in lecture)

../../ComputationalThinking/Content/24-EvilEval.html

Flaws relating to invalid handling of input data, specifically when program input data can accidentally or deliberately influence the flow of execution of the program
More often occur in scripting languages
Encourage re-use of other programs and system utilities where possible to save coding effort
Often used as Web CGI scripts

bash injection of perl
Attack: xxx; echo attack success; ls -l finger*
17-DefensiveProgramming/image12a.png
Bash command injection attack on finger command (display user)
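
One common defense, sketched here in Python rather than the Perl/bash of the figure, is to never hand user input to a shell at all. Passing an argument list to `subprocess.run` (with the default `shell=False`) delivers the input to the program as a single literal argument. The helper name `run_without_shell` is an assumption for illustration.

```python
import subprocess
import sys


def run_without_shell(program, user_input):
    """Run `program` (a list) with `user_input` as one literal argument."""
    # With a list argv and no shell, ';' and '*' are never interpreted;
    # they reach the program as ordinary characters.
    argv = [*program, user_input]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Stand-in for finger: a child process that just echoes its argument.
    echo = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
    print(run_without_shell(echo, "xxx; echo attack success; ls -l finger*"))
```

Here the attack string is printed back verbatim; no second command runs. In a real script the first list element would be the absolute path of the intended utility.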

SQL injection
An input such as "Bobby'; drop table suppliers" results in the specified record being retrieved, followed by deletion of the entire table!
17-DefensiveProgramming/image13.png
But, the safer version below uses a function to sanitize the input before building a string (sanitization function not shown).
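
Besides sanitizing the string, most database APIs offer bound parameters, which keep the input out of the SQL text entirely. A minimal sketch with Python's built-in `sqlite3` module follows; the `suppliers` table comes from the example above, while the `name` column and the helper names are assumptions.

```python
import sqlite3


def demo_connection():
    """Build a throwaway in-memory database for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE suppliers (name TEXT)")
    conn.execute("INSERT INTO suppliers VALUES ('Bobby')")
    return conn


def find_supplier(conn, name):
    """Look up suppliers by name using a bound parameter."""
    # The '?' placeholder sends `name` as data; it is never parsed as SQL.
    cur = conn.execute("SELECT name FROM suppliers WHERE name = ?", (name,))
    return [row[0] for row in cur.fetchall()]
```

With this version, the input `Bobby'; drop table suppliers` simply matches no rows, and the table is still there afterwards.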

PHP code injection

The script below was not intended to be called directly.
Rather, it is a component of a larger, multi-file program.
The main script set the value of the $path variable to refer to the main directory containing the program, and all its code and data files.
$path can be set maliciously
Fix: block assignment of form field values to global variables
Another defense is to only use constant values in include (and require) commands

Cross Site Scripting (XSS) Attacks
Do this exercise in lecture: https://www.hacksplaining.com/exercises/xss-stored

Attacks where input provided by one user is subsequently provided to another user
Commonly seen in scripted Web applications (e.g., comment boxes)
Vulnerability involves the inclusion of script code in the HTML content
Script code may need to access data associated with other pages
Browsers impose security checks and restrict data access to pages originating from the same site
Exploit assumption that all content from one site is equally trusted and hence is permitted to interact with other content from the site
XSS reflection vulnerability: Attacker includes the malicious script content in data supplied to a site

For example, paste the above into a comment box on someone else’s site to get visitors to execute your malicious JavaScript, perhaps remotely hosted.
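
A minimal sketch of the usual server-side defense: escape user-supplied text before placing it in HTML, so that markup in a comment is displayed rather than executed. The function name `render_comment` is hypothetical.

```python
import html


def render_comment(comment: str) -> str:
    """Return an HTML fragment with the user's comment escaped."""
    # html.escape turns <, >, & and quotes into entities, so the browser
    # shows the text instead of interpreting it as markup.
    return "<p>" + html.escape(comment, quote=True) + "</p>"
```

A comment containing `<script>alert(1)</script>` is rendered as visible text, and no script element reaches other visitors.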

+++++++++++++++++ Cahoot-17.1

1.3.2.3 Validating input syntax

It is necessary to ensure that data conform with any assumptions made about the data before subsequent use
Input data should be compared against what is wanted
Alternative is to compare the input data with known dangerous values
By only accepting known safe data the program is more likely to remain secure
Strict positive formulations less likely to fail than negative exclusions
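
As a sketch of a strict positive formulation, the check below accepts only strings that fully match a known-safe pattern instead of hunting for dangerous characters. The specific rule (1 to 32 lowercase letters, digits, `_` or `-`) is an assumption chosen for illustration.

```python
import re

# Hypothetical rule: usernames are 1-32 lowercase letters, digits, '_' or '-'.
USERNAME_RE = re.compile(r"[a-z0-9_-]{1,32}")


def is_valid_username(value: str) -> bool:
    """Accept only strings that match the known-safe pattern in full."""
    # fullmatch requires the whole string to match, including any trailing newline.
    return USERNAME_RE.fullmatch(value) is not None
```

The attack string from the finger example fails this check without the code ever needing to know that `;` is dangerous.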

1.3.2.3.1 Alternate encodings

Alternate encodings give attackers a way to slip past sanitization and input validation in the ongoing arms race.

May have multiple means of encoding text
Growing requirement to support users around the globe and to interact with them using their own languages
Unicode used for internationalization
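- Originally designed around 16-bit values; code points now range up to U+10FFFF
- UTF-8 encodes each code point as a 1-4 byte sequence
- Lenient decoders may accept overlong (non-shortest) byte sequences as equivalent to the standard encoding of a character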
Canonicalization
- Transforming input data into a single, standard, minimal representation
- Once this is done the input data can be compared with a single representation of acceptable input values
- PDF sanitizing (Qubes-OS and others), copy-paste sanitizing (Qubes-OS)

Example in SQLi defense
attempting to exclude some characters may not work if you miss one of the encodings!
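
A small sketch of why canonicalization comes first, using Python's standard `unicodedata` module. The fullwidth apostrophe (U+FF07) is not the ASCII quote, so a naive check misses it, yet compatibility normalization (NFKC) folds it to `'`. The helper names are assumptions; an allowlist applied after canonicalization is still the stronger design.

```python
import unicodedata


def canonicalize(text: str) -> str:
    """Fold compatibility characters (e.g. fullwidth forms) to a standard form."""
    return unicodedata.normalize("NFKC", text)


def contains_quote(text: str) -> bool:
    """Look for an ASCII apostrophe only after canonicalizing."""
    return "'" in canonicalize(text)


def decode_strict(raw: bytes) -> str:
    """Decode UTF-8, rejecting overlong or otherwise malformed sequences."""
    return raw.decode("utf-8", errors="strict")
```

Python's strict decoder also refuses the overlong byte pair `b"\xc0\xa7"`, one of the alternate encodings that lenient decoders have historically accepted as an apostrophe.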

1.3.2.3.2 Validating numerical input

Additional concern when input data represents numeric values
Internally stored in fixed sized value
- 8, 16, 32, 64-bit integers
- Floating point numbers depend on the processor used
- Values may be signed or unsigned
Must correctly interpret text form and process consistently
- Have issues comparing signed to unsigned
- Could be used to thwart buffer overflow check
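
The signed/unsigned problem can be sketched with the `struct` module. The same four bytes read as a signed integer give -1, which passes a naive "length <= maximum" check, while the unsigned reading gives a value that is correctly rejected. `MAX_LEN` and the function names are illustrative assumptions.

```python
import struct

MAX_LEN = 1024  # hypothetical buffer capacity


def as_signed(raw: bytes) -> int:
    """Interpret 4 little-endian bytes as a signed 32-bit integer."""
    return struct.unpack("<i", raw)[0]


def parse_length(raw: bytes) -> int:
    """Interpret 4 little-endian bytes as an unsigned length and range-check it."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    (length,) = struct.unpack("<I", raw)  # 'I' = unsigned 32-bit
    if length > MAX_LEN:
        raise ValueError("length out of range")
    return length
```

Choosing one interpretation and checking both ends of the valid range keeps the size check from being bypassed.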

1.3.3 Input fuzzing

https://en.wikipedia.org/wiki/Fuzzing
https://www.fuzzingbook.org/

Software testing technique that uses randomly generated data as inputs to a program
- Range of possible inputs to most programs is very large
- Intent is to determine if the program or function correctly handles abnormal inputs
- Simple, free of assumptions, cheap
- Assists with reliability as well as security
Can also use templates to generate classes of known problem inputs
- Disadvantage is that bugs triggered by other forms of input would be missed
- Combination of approaches is needed for reasonably comprehensive coverage of the inputs
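
A toy random fuzzer, as a sketch only (real tools such as those covered in the Fuzzing Book are far more capable). It feeds random byte strings to a target and records inputs that raise anything other than a deliberate `ValueError`. All names here, including the fragile target, are made up for illustration.

```python
import random

TABLE = list(range(128))  # lookup table that only covers 7-bit values


def fragile_lookup(data: bytes):
    """Toy target: silently assumes every byte is below 128."""
    return [TABLE[b] for b in data]


def fuzz(target, runs=200, max_len=64, seed=0):
    """Call `target` with random byte strings; collect inputs that crash it."""
    rng = random.Random(seed)  # seeded so a failing run can be reproduced
    crashes = []
    for _ in range(runs):
        length = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except ValueError:
            pass  # a deliberate rejection of bad input is not a bug
        except Exception as exc:  # anything else is an unhandled failure
            crashes.append((data, type(exc).__name__))
    return crashes
```

Running `fuzz(fragile_lookup)` quickly turns up inputs that raise `IndexError`, the kind of abnormal input that hand-written tests tend to omit.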

1.3.4 Unexpected executable sources

Code may be executed in more situations than a programmer or user is aware of:

Examples:

JavaScript runs automatically on most websites, and escapes from the browser sandbox have been demonstrated
Importing libraries relies on the integrity of those libraries, and their imports!
Python pickle file reading:
- A pickle file can contain essentially any python objects.
- Pickle files can thus include malicious payloads.
- Don’t just load any random pickle you find on the internet!
- It could contain harmful code that runs arbitrary commands when you try to load (unpickle) it.
- If you really really want to handle suspect code (i.e., a pickle), opening it in a contained environment is one way to mitigate risks…
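
A harmless sketch of the problem: an object's `__reduce__` method tells pickle which function to call when loading, so loading is code execution. Here the payload only evaluates `1 + 1`. The second half shows one mitigation described in the Python documentation, overriding `Unpickler.find_class`; the class and function names are assumptions, and this reduces rather than removes the risk.

```python
import io
import pickle


class Payload:
    """Stand-in for a malicious object: unpickling it calls a chosen function."""

    def __reduce__(self):
        # Harmless here, but an attacker could name os.system instead of eval.
        return (eval, ("1 + 1",))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global name (class or function)."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Load plain containers and scalars only."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

`pickle.loads(pickle.dumps(Payload()))` returns 2 because `eval` ran during loading; `restricted_loads` raises `UnpicklingError` on the same bytes but still loads plain dictionaries and lists. For data exchange, a format such as JSON avoids the issue altogether.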

++++++++++++++++++++++ Cahoot-17.2

1.4 Safe coding

Writing safe program code

Security issues:
- Correct algorithm implementation
- Correct machine instructions for algorithm
- Valid manipulation of data

1.4.1 Correct algorithm implementation

Issue of good program development technique
Algorithm may not correctly handle all problem variants
Consequence of deficiency is a bug in the resulting program that could be exploited

Algorithm problem: TCP/IP exploit

Initial sequence numbers used by many TCP/IP implementations were too predictable
Combination of the sequence number as an identifier and authenticator of packets and the failure to make them sufficiently unpredictable enables the attack to occur.
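
The underlying issue, predictable values used as authenticators, can be sketched in Python. A seeded general-purpose generator yields the same "sequence number" to anyone who knows or guesses the seed, whereas `secrets` draws from the operating system's cryptographic source. This is an analogy only; real TCP stacks use more elaborate schemes, and both function names are hypothetical.

```python
import random
import secrets


def predictable_isn(seed: int) -> int:
    """32-bit value from a seeded PRNG: reproducible by anyone who knows the seed."""
    return random.Random(seed).getrandbits(32)


def unpredictable_isn() -> int:
    """32-bit value drawn from the OS cryptographic random source."""
    return secrets.randbits(32)
```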

Algorithm problem: Deep learning

Fun example: Vulnerable to inputs optimized to be terrible for learning.

Artifactual debugging code (backdoor)

Another variant is when the programmers deliberately include additional code in a program to help test and debug it
Often code remains in production release of a program and could inappropriately release information
May permit a user to bypass security checks and perform actions they would not otherwise be allowed to perform
This vulnerability was exploited by the Morris Internet Worm

1.4.2 Binary matches algorithm?

Might you have a malicious compiler?

Issue is ignored by most programmers.
Assumption is that the compiler or interpreter generates or executes code that validly implements the language statements.
Requires manually comparing machine code with original source: Slow and difficult (but possible).
Development of computer systems with very high assurance level is one area where this level of checking is required.

1.4.3 Interpretation of data values?

Correct data interpretation

Different languages provide different capabilities for restricting and validating interpretation of data in variables
Strongly typed languages are often more restrictive, and therefore safer in some regards.
Other languages allow more liberal interpretation of data and permit program code to explicitly change their interpretation

1.4.4 Use of memory

Issue of dynamic memory allocation
- Used to manipulate unknown amounts of data
- Allocated when needed, released when done
Memory leak:
- steady reduction in memory available on the heap to the point where it is completely exhausted
Many older languages have no explicit support for dynamic memory allocation (e.g., C), and use standard library routines to allocate and release memory
Many modern languages handle allocation and release automatically

1.4.5 Race conditions with shared memory

Without synchronization of accesses it is possible that values may be corrupted or changes lost due to overlapping access, use, and replacement of shared values
Requires correct synchronization
Bad sync procedures can result in deadlock
- Processes or threads wait on a resource held by the other
- One or more programs has to be terminated
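
A minimal sketch of synchronized access to a shared value using `threading.Lock`. The `Counter` class and `run_workers` helper are assumptions for illustration; the lock makes each read-modify-write step atomic with respect to the other threads.

```python
import threading


class Counter:
    """Shared counter whose increments are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write below must not interleave between threads.
        with self._lock:
            self.value += 1


def run_workers(counter, workers=4, per_worker=10_000):
    """Increment `counter` from several threads and return the final value."""

    def work():
        for _ in range(per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

Using a single lock acquired through `with` also sidesteps the deadlock case above, since no thread ever holds one lock while waiting for another.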

1.5 OS-program, program-program

Operating system interaction

Programs execute on systems under the control of an operating system
- Mediates and shares access to resources
- Constructs execution environment
- Includes environment variables and arguments
Systems have a concept of multiple users
- Resources are owned by a user and have permissions granting access with various rights to different categories of users
- Programs need access to various resources, however excessive levels of access are dangerous
- Concerns when multiple programs access shared resources such as a common file

1.5.1 Environment variables

Collection of string values inherited by each process from its parent
Can affect the way a running process behaves
Included in the process's memory when it is constructed
Can be modified by the program process at any time
Modifications will be passed to its children
Another source of untrusted program input
Most common use is by a local user attempting to gain increased privileges
Goal is to subvert a program that grants superuser or administrator privileges

Example (do in class)
$ printenv lists these variables in Linux.
$ env lets you pre-modify the environment that programs run in by passing a set of variable definitions into a command, as shown after the script.
A script: 17-DefensiveProgramming/vuln.sh

#!/bin/bash

# Quote each expansion so its value is printed exactly as inherited
echo "$HOME"
echo "$PATH"
echo "$VAR1"

Can see the environment:
env VAR1="hacked" bash vuln.sh
or
VAR1="hacked" bash vuln.sh

Decent tutorial:
https://www.digitalocean.com/community/tutorials/how-to-read-and-set-environmental-and-shell-variables-on-a-linux-vps

In the figure below, this simple script calls two separate programs: sed and grep
Assumes that the standard system versions of these programs would be called.
To locate the actual program, the shell will search each directory named in the shell $PATH variable for a file with the desired name.
The attacker simply has to redefine the $PATH variable to include a directory they control, which contains a program called grep, for example.
Fix for (a) at least?
- To address, use absolute names for each program or $PATH variable could be reset to a known default value by the script
Why is (b) still vulnerable:
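- The IFS environment variable lists the characters used to separate the words that form a line of commands;
  - It defaults to a space, tab, or newline character, but could have “=” added as well
  - With “=” as a separator, an assignment such as PATH=... can be read as a call to a program named PATH, with the rest of the line as its arguments
- If the attacker has also changed the $PATH variable to include a directory with an attack program named PATH, then this will be executed when the script is run.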

Compiled programs are also vulnerable

Programs can be vulnerable to $PATH variable manipulation
Must reset to “safe” values
If dynamically linked may be vulnerable to manipulation of LD_LIBRARY_PATH
- Used to locate suitable dynamic library
- Must either statically link privileged programs or prevent use of this variable

Mention: fixed env vulnerability in a popular program here…

In general, try not to rely on environment variables!

1.5.2 Least privilege

Principle of least privilege

Privilege escalation:
- Exploit of flaws may give attacker greater privileges
Least privilege:
- Run programs with least privilege needed to complete their function
Determine appropriate user and group privileges required:
- Decide whether to grant extra user or just group privileges
Ensure that privileged program can modify only those files and directories necessary

Admin privilege

Programs with root/administrator privileges are a major target of attackers
- They provide highest levels of system access and control
- Are needed to manage access to protected system resources
Often privilege is only needed at start
- Can then run as normal user
Good design should partition complex programs in smaller modules with needed privileges
- Provides a greater degree of isolation between the components
- Reduces the consequences of a security breach in one component
- Easier to test and verify

1.5.3 System calls and standard libraries

Programs use system calls and standard library functions for common operations
Programmers make assumptions about their operation
If those assumptions are incorrect, the behavior is not what is expected
May be a result of system optimizing access to shared resources
Results in requests for services being buffered, re-sequenced, or otherwise modified to optimize system use
Optimizations can conflict with program goals

1.5.3.1 Secure file deletion

17-DefensiveProgramming/image20.png
Secure delete operations don’t make it to the disk as expected because of “efficient” OS procedures
chattr +s filetosecurelydelete
Not universally reliable though; better off just encrypting the full disk.

1.5.4 Race conditions with system resources

1.5.4.1 Preventing race conditions

Programs may need to access a common system resource
Need suitable synchronization mechanisms
Most common technique is to acquire a lock on the shared file
Lockfile: Process must create and own the lockfile in order to gain access to the shared resource
Concerns:
- If a program chooses to ignore the existence of the lockfile and access the shared resource the system will not prevent this
- All programs using this form of synchronization must cooperate
- Implementation

Lockfile creation should happen as one event
17-DefensiveProgramming/image21.png
Rather than as a check for existence and then creation. Why?
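
One answer: between a separate existence check and the create, another process can create the file, and both then believe they hold the lock. A minimal sketch of single-step creation using `os.open` with `O_CREAT | O_EXCL` follows; the function names and the lockfile path in the demo are assumptions.

```python
import os
import tempfile


def acquire_lock(path):
    """Atomically create the lockfile; return True only if we created it."""
    try:
        # O_CREAT | O_EXCL makes "check and create" one operation:
        # the call fails if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record the owner for debugging
    return True


def release_lock(path):
    """Remove the lockfile so another process can acquire it."""
    os.unlink(path)


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "demo.lock")
    print(acquire_lock(lock_path), acquire_lock(lock_path))
    release_lock(lock_path)
```

The cooperation concern above still applies: this only protects against programs that also call `acquire_lock` before touching the shared resource.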

1.5.5 Safe temporary files

Many programs use temporary files
Often in common, shared system area
Must be unique, not accessed by others
Commonly create name using process ID
- Unique, but predictable
- Attacker might guess and attempt to create own file between program checking and creating
Secure temporary file creation and use requires the use of random names
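
In Python, the standard `tempfile` module covers this. The sketch below uses `tempfile.mkstemp`, which creates the file in one step with a random name and permissions limited to the creating user. The helper name `write_temp` and the `app-` prefix are assumptions.

```python
import os
import stat
import tempfile


def write_temp(data: bytes) -> str:
    """Write data to a new temporary file with an unpredictable name."""
    # mkstemp creates and opens the file atomically, so there is no gap
    # between choosing a name and creating the file.
    fd, path = tempfile.mkstemp(prefix="app-")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


if __name__ == "__main__":
    temp_path = write_temp(b"scratch data")
    print(temp_path, oct(stat.S_IMODE(os.stat(temp_path).st_mode)))
    os.remove(temp_path)
```

The caller is responsible for deleting the file when finished.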

Show
stat /tmp
ll /tmp

1.5.6 Interaction with other programs

Data flowing among various programs may need protection from being read or altered by others.
When these programs are running on the same computer system, appropriate use of system functionality such as pipes or temporary files provides this protection.
If the programs run on different systems, linked by a suitable network connection, then appropriate security mechanisms should be employed by these network connections.
Alternatives include the use of IP Security (IPsec), Transport Layer Security/Secure Sockets Layer (TLS/SSL), or Secure Shell (SSH) connections.
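
For the TLS option, a minimal client-side sketch using Python's `ssl` module. The point is to start from the library's secure defaults, which verify the server certificate and hostname, rather than assembling settings by hand. The function name is an assumption.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a TLS client context with certificate and hostname checks enabled."""
    # create_default_context() turns on certificate verification and
    # hostname checking, and loads the system's trusted CA certificates.
    return ssl.create_default_context()
```

The returned context can then wrap a socket with `wrap_socket(sock, server_hostname=...)`; disabling `check_hostname` or `verify_mode` to "make it work" removes the protection.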

1.6 Handling program output safely

Final component is program output
May be stored for future use, sent over net, displayed
May be binary or text
Important from a program security perspective that the output conform to the expected form and interpretation
Programs must identify what is permissible output content and filter any possibly untrusted data to ensure that only valid output is displayed
Character set should be explicitly specified
