1 11-Hashing


Previous: 10b-SymmetricStream.html

To err is human;
to make a real mess,
you need a computer.

1.1 Screencasts

1.2 Reading

Chapter 10 - Hash functions
https://github.com/crypto101/crypto101.github.io/raw/master/Crypto101.pdf

1.3 Message authentication basics

(a quick aside on crypto-systems)

More to come later, here: 12a-AppliedCryptoSystems.html

Authentication:

Protects against active attacks.

Verifies a received message is authentic and that:
message contents have not been altered,
a message is from an authentic source,
message delivery is in the correct sequence.

Can use conventional encryption as part of a larger system (or other methods).

Only sender and receiver share a key.

1.3.1 Message Authentication Code (MAC)

MAC is an over-overloaded term!
This MAC is not the same MAC term as:
a media access control (MAC) address,
or the mandatory access control (MAC) permissions control,
nor as the big MAC…
11-Hashing/f3-crop-b.png

Alice (A) and Bob (B), share a common secret key KAB and compute a MAC.

MACM = F(KAB, M)

Where F():
DES is used to generate an ­encrypted version of the message,
and the last number of bits of ciphertext are used as the code.
A 16 or 32 bit code is typical.
Is this reversible or lossy?

+++++++++++++++++++++
Cahoot-11.1

1.4 Hashing

Unlike a MAC, a hash function does not take secret key as input.
Unlike symmetric or asymmetric cryptography,
it is not reversible (there is information loss).
This is like a deterministic meat grinder for your data:
11-Hashing/f4-crop.png

1.4.0.1 Three authentication schemes using hashing

The goal of authentication can be satisfied using hashing as part of a system:
11-Hashing/f5-crop.png
Enumerated descriptions below correspond to images above:

  1. encrypt hash of message with pre-exchanged symmetric key
    (must coordinate key exchange)

  2. encrypt hash of message with public key
    (no need to coordinate key exchange, assuming authentication method)

  3. hash function over the concatenation of the secret key and the message:
    Where || is simple concatenation,
    H() is your hash function,
    M is message,
    and K is key,
    MDM = H( K || M || K ).

1.4.0.2 Secure hash function characteristics

Can be applied to a block of data of any size.

Produces a fixed-length output.

H(x) should be relatively easy to compute, for any given x.
Exception to this idea:
Key derivation functions and slow hash functions should ideally be slow!

One-way or pre-image resistant.
Assuming H(x) = h,
it should be computationally infeasible to find x.

Weak collision resistant
Given x,
it should be computationally infeasible to find y != x,
such that H(y) = H(x).

Strong collision resistance.
It should be computationally infeasible to find any pair (x, y),
such that H(x) = H(y).

+++++++++++++++++++++
Cahoot-11.2

1.4.0.3 Security of Hash Functions

There are two approaches to attacking a secure hash function:

  1. Cryptanalysis:
  2. Brute-force attack:
1.4.0.3.1 Additional secure hash function applications

Passwords:
Hash of a password is stored by an operating system (better when this is a slow hash).

Intrusion detection:
Store H(F) for each file on a system and secure the hash values

1.4.1 Example

There are a number of hash functions.
SHA family is most widely used hashing algorithms
SHA-3 is one:

1.4.1.0.1 Modern SHA-3

Uses A sponge construction.
Data is “absorbed” into the sponge,
then the result is “squeezed” out.
In absorbing phase,
message blocks are XOR’ed into a subset of the state,
which is then transformed as a whole using a permutation function f.
11-Hashing/SpongeConstruction.png
Pi are input, Zi are hashed output.
The unused “capacity” c should be twice the desired,
to insure resistance to collision or pre-image attacks.
More details later!

1.5 Secure hash functions

https://en.wikipedia.org/wiki/Cryptographic_hash_function

An ideal cryptographic hash function should have the following properties, that:
1. it is deterministic, so the same message always results in the same hash

  1. it is relatively efficient to compute the hash value for any given message

  2. it is not feasible to generate a message from its hash value, except by trying all possible messages

  3. a small change to a message should change the hash value extensively, so that the new hash value appears uncorrelated with the old hash value.

  4. it is not feasible to find two different messages with the same hash value

Functions of hashing include:
Verifying the integrity of messages and files.
Cryptographic signature generation and verification.
Password verification.
Proof-of-work (e.g., bitcoin).
File or data identifier.

Hashing is not typically sufficient for authentication alone!

1.5.0.0.1 Simple hash function example using bitwise XOR

Fold your message into a matrix (notice a pattern yet??), then XOR down columns:
11-Hashing/00a.png
Ask:
Is this a robust and/or secure hash function?
Why?
Can we recover the message from the hash?
Is there data-loss here?
Why?

1.5.0.0.2 Key Properties for Secure Hash

One-way
The most important property of a hash function is being one-way:
Easy to compute, but hard to “invert”.
Inversion with hashing is not really even inversion at all, but re-generation.
Given M, it should be somewhat easy to compute h(M).
Given y = h(M), it should be hard to find M.

Collision resistant
Cryptographic hash functions should be collision resistant:
It should be hard to find two messages with the same hash value.
Given M, it should be hard to find a different message M’ such that h(M’) = h(M).

1.5.0.0.3 Merkle-Damgard construction

11-Hashing/MD_hash.png
f is a one-way compression function.
Truly 1-way, not reversible, unlike encryption,
with hashing producing real formal information loss.
Used for MD5, SHA-1, and SHA-2 below:

1.5.0.1 MD5

https://en.wikipedia.org/wiki/MD5
MD5 is not considered secure, and is vulnerable to finding collisions.
It should only be used for applications not requiring cryptographic security.

Example when NOT to use md5:
expand on downloading an apk from outside an app store.

In python

#!/usr/bin/python3

import hashlib
hashlib.md5('stringtohash').hexdigest()

hashlib.md5('stringtohash_plusMitMinjection').hexdigest()

help(md5)

In bash

#!/bin/bash

echo "hey" >filetohash.txt
md5sum filetohash.txt

echo "hey" >>filetohash.txt
md5sum filetohash.txt

1.5.0.2 Secure Hash Algorithm (SHA)

https://en.wikipedia.org/wiki/Secure_Hash_Algorithms
11-Hashing/sha_family.png
The Secure Hash Algorithms (SHA) are a family of cryptographic hash functions published by the National Institute of Standards and Technology (NIST) as a U.S. Federal Information Processing Standard (FIPS), including:

SHA-0:
A retronym applied to the original version of the 160-bit hash function,
published in 1993 under the name “SHA”.
It was withdrawn shortly after publication,
due to an undisclosed “significant flaw”,
and replaced by the slightly revised version SHA-1.

SHA-1:
A 160-bit hash function which resembles the earlier MD5 algorithm.
This was designed by the National Security Agency (NSA),
to be part of the Digital Signature Algorithm.
Cryptographic weaknesses were discovered in SHA-1,
and the standard was no longer approved for most cryptographic uses after 2010.

SHA-2:
A family of two similar hash functions,
with different block sizes,
known as SHA-256 and SHA-512.
They differ in the word size;
SHA-256 uses 32-bit words where SHA-512 uses 64-bit words.
There are also truncated versions of each standard,
known as SHA-224, SHA-384, SHA-512/224 and SHA-512/256.
These were also designed by the NSA.

SHA-3:
A hash function formerly called Keccak,
chosen in 2012 after a public competition among non-NSA designers.
It supports the same hash lengths as SHA-2,
and its internal structure differs significantly from the rest of the SHA family.

1.5.0.2.1 SHA-1 is not secure on its own

https://en.wikipedia.org/wiki/SHA-1
Takes arbitrary data as input,
outputs a 160-bit (20-byte) hash value,
known as a message digest,
typically rendered as a hexadecimal number, 40 digits long.
11-Hashing/sha1.png

One iteration within the SHA-1 compression function:
A, B, C, D and E are 32-bit words of the state.
F is a nonlinear function that varies.
<<<n (Left shift n) denotes a left bit rotation by n places,
where n varies for each operation.
Wt is the expanded message word of round t.
Kt is the round constant of round t.
Red square denotes addition modulo 232.

In python

#!/usr/bin/python3

import hashlib
#sha1 - 160bit
hashlib.sha1('stringtohash').hexdigest()

From bash

#!/bin/bash

echo "hey" >filetohash.txt
sha1sum filetohash.txt
man sha1sum

# or
rhash --sha1 filetohash.txt
man rhash

SHA was originally developed by NIST.
Published as FIPS 180 in 1993.
Was revised in 1995 as SHA-1 (Produces 160-bit hash values).
NIST issued revised FIPS 180-2 in 2002.
Adds 3 additional versions of SHA.
SHA-256, SHA-384, SHA-512,
256/384/512-bit hash values.
Same basic structure as SHA-1, but greater security.
In 2005 NIST announced the intention to phase out approval of SHA-1,
and move to a reliance on the other SHA versions by 2010
SHA-1 is not secure on its own,
but can be in the context of another system like HMAC

1.5.0.3 Comparison of SHA Parameters

in bits
11-Hashing/01a.png
When you don’t see a -1 or -2 before the bit-size, it’s SHA-2.
For example, SHA-512 above is SHA-2-512 (below).

1.5.0.4 SHA-2

https://en.wikipedia.org/wiki/SHA-2
224-512 bit versions
11-Hashing/SHA-2.png
Similar to SHA-1, though an improvement.
Still vulnerable to extension attacks,
and thus needs to be used as part of a MAC,
if the goal is authentication.

In python

#!/usr/bin/env python3

import hashlib

# sha2-512
hashlib.sha512('stringtohash').hexdigest()

From bash

#!/bin/bash

# Note the sha binaries on my system:
ls /usr/bin/sha*
#/usr/bin/sha1hmac*
#/usr/bin/sha1sum*
#/usr/bin/sha224hmac*
#/usr/bin/sha224sum*
#/usr/bin/sha256hmac*
#/usr/bin/sha256sum*
#/usr/bin/sha384hmac*
#/usr/bin/sha384sum*
#/usr/bin/sha512hmac*
#/usr/bin/sha512sum*

echo "hey" >filetohash.txt
sha256sum filetohash.txt
sha512sum filetohash.txt

# or
rhash --sha256 filetohash.txt
rhash --sha512 filetohash.txt

As noted above, sha### is SHA-2, with the bit size specified.

1.5.0.4.1 Message Digest Generation Using SHA-512
11-Hashing/f2-crop.png
1.5.0.4.2 SHA-512 Processing of a Single 1024-Bit Block
11-Hashing/f3-crop.png

1.5.0.5 SHA-3

https://en.wikipedia.org/wiki/SHA-3
Is the first hash function we’re covering that is not Merkle-Damgard,
and is a class of it’s own,
with some extra beneficial properties,
discussed below.

The sponge construction for hash functions:
Pi are input,
Zi are hashed output.
Data are “absorbed” into the sponge,
then the result is “squeezed” out.
In the absorbing phase,
message blocks are XOR’ed into a subset of the state,
which is then transformed as a whole using a permutation function f.
In the “squeeze” phase,
output blocks are read from the same subset of the state,
alternated with the state transformation function f.
The size of the part of the state that is written and read is called the “rate” (denoted r),
and the size of the part that is untouched by input/output is called the “capacity” (denoted c).

11-Hashing/SHA-3.png

In python

#!/usr/bin/python3

import hashlib
hashlib.sha3_512('stringtohash').hexdigest()

From bash

#!/bin/bash

echo "hey" >filetohash.txt
rhash --sha3-512 filetohash.txt

SHA-2 shared same structure and mathematical operations as its predecessors,
and there was some concern for its future.
Due to time required to replace SHA-2,
should it become vulnerable,
NIST announced in 2007 a competition to produce SHA-3
SHA-3 does not need a MAC to be reasonably used for authentication,
but HMAC is just an improvement anyway,
so you might as well use SHA-3 within HMAC, for example.

1.6 Key derivation functions

https://en.wikipedia.org/wiki/Key_derivation_function
A key derivation function (KDF) takes a secret initial value such as:
a master key, a password, or a passphrase.
Using a pseudorandom function,
it derives one or more secret keys from that original value.

A KDF can be used to “stretch” keys into longer keys,
or to obtain keys of a required format,
such as converting a group element that is the result of a Diffie Hellman key exchange,
into a symmetric key for use with AES.

They can ideally also be made slower to compute,
and thus are better for password hashing,
in key stretching and key strengthening;
a fast hash enables brute forcing hashed passwords.
bcrypt and scrypt are popular KDFs.
We will talk about salt values in password hashing later…
(salt is another over-overloaded term).

1.7 Example

Next: 12a-AppliedCryptoSystems.html