1 02-Application

1.1 Audio-recording

Password for the Vimeo videos is in Zulip chat.
- Part 1
  - https://vimeo.com/671561564 (section 1)
  - https://vimeo.com/671562212 (section 2)
- Part 2
  - https://vimeo.com/671562955 (section 1)
  - https://vimeo.com/671563409 (section 2)
- Part 3
  - https://vimeo.com/674616608
  - https://vimeo.com/674616664
- Part 4
  - https://vimeo.com/674616694
  - https://vimeo.com/674616727
- Part 5
  - https://vimeo.com/677005697
  - https://vimeo.com/677006637
- Part 6
  - With a projector having tech issues, I forgot to press “start recording” on this chaotic Friday…
- Part 7
  - https://vimeo.com/680719670
  - https://vimeo.com/680721078
Tip: If anyone wants to speed up the lecture videos a little, inspect the page, go to the browser console, and paste this in: document.querySelector('video').playbackRate = 1.2

1.2 Opening thought

“Web users ultimately want to get at data quickly and easily.
They don’t care as much about attractive sites and pretty design.”
- Tim Berners-Lee

Is this assumption currently true?
Was it ever true?
Is it true for some people at least?

1.3 Reading

Overview, HTTP, email, FTP, DNS, P2P, Multimedia (CDN, P2P), Socket programming intro.

Reading:
https://www.computer-networking.info/1st/html/index.html (Part 2: The application layer)
https://www.binarytides.com/netcat-tutorial-for-beginners/ (instead of telnet!)
Kurose chapter 2

1.4 Overview

Principles of network applications
Web and HTTP
Electronic mail: SMTP, POP3, IMAP
DNS
P2P applications
Socket programming with UDP and TCP

1.5 Network applications

Classic text-based applications that became popular in the 1970s and 1980s:
text email, remote access to computers, file transfers, and newsgroups.
Killer application of the mid-1990s:
the World Wide Web, encompassing Web surfing, search, and electronic commerce.
Instant messaging and P2P file sharing, the two killer applications introduced at the end of the millennium.
Since 2000, voice-over-IP (VoIP), YouTube, Netflix, World of Warcraft, Facebook, and Twitter

End system communication at application layer
02-Application/app_layer00.png

Goal: write programs that:
run on (different) end systems,
communicate over network, e.g.,
web server software communicates with browser software

No need (or desire) to write software for network-core devices:
Network-core devices do not (and SHOULD not) run user applications!
Applications on end systems allow for rapid app development, propagation, and innovation.
This is changing somewhat, which could impede the development of new protocols.

Client-server versus peer-to-peer (P2P)
02-Application/app_layer01.png

1.5.1 Client-server architecture

1.5.1.1 Server

Always-on host, called the server, which services requests from many other hosts, called clients.
The server has a permanent address: either an IP address, or
an overlay/hash address of some kind, in the case of IP-anonymizing networks.
Web server services requests from browsers running on client hosts.
When a Web server receives a request for an object from a client host,
it responds by sending the requested object to the client host.
Now in data centers for scaling

1.5.1.2 Clients

Communicate with server.
May be intermittently connected.
May have dynamic IP addresses (or dynamically changing overlay addresses).
Do not usually communicate directly with each other (unless the design is a hybrid of client-server and P2P).

1.5.2 P2P architecture

Minimal (or no) reliance on dedicated servers in data centers.
Direct communication between pairs of intermittently connected hosts, called peers.
The peers are not owned by the service provider,
but are instead desktops and laptops controlled by users,
with most of the peers residing in homes, universities, and offices.
Peers request service from other peers, provide service in return to other peers
Self-scalable, since each peer adds service capacity to the system by distributing files to other peers.
Cost effective, since they normally don’t require significant server infrastructure and server bandwidth
(in contrast with client-server designs with data centers).
Peers often exchange IP addresses (or overlay addresses,
e.g., onion, garlic routing and crypto addresses).
Complex management, implementation, debugging?

++++++++++++++ Cahoot-02-1

Discussion question:
If p2p works well, why has it not become the norm?
What are some other pros/cons of p2p architectures?

1.5.3 Process communication via sockets

This is a hint of the next layer, transport
02-Application/app_layer02.png

1.5.3.1 Processes

It’s not actually programs, but processes that communicate with each other.
A process is an instance of a program that the OS is running within an end system.
When processes are running on the same end system,
they can communicate with each other with inter-process communication (IPC).
Processes on two different end systems communicate with each other,
by exchanging messages across the computer network.
A sending process creates and sends messages into the network.
A receiving process receives these messages and possibly responds by sending messages back.
A process initiates the communication by initially contacting the other process at the beginning of the session.
The initiator is labeled as the client.
The process that waits to be contacted to begin the session is nominally the server.

Client process: process that initiates communication.
Server process: process that waits to be contacted.

Aside: applications with P2P architectures have both client processes and server processes.

1.5.3.2 Sockets

Any message sent from one process to another must go through the underlying network.
A process sends messages into, and receives messages from, the network,
through an OS-defined software interface called a socket.
A socket is the interface between the application layer and the transport layer within a host OS.
It is also referred to as the Application Programming Interface (API) between the application and the network,
since the socket is the programming interface with which network applications are built.

1.5.3.3 Addresses

To receive messages, a process must have an identifier.

Q: does the IP address of a host, on which a process runs, suffice for identifying the process?
A: no, many processes can be running on one host

In order for a process running on one host to send packets to a process running on another host,
the receiving process needs to have an address with both:
an address of the host OS itself (IP), and
an identifier that specifies the specific receiving process in the destination host OS (port).

Popular applications have been assigned specific port numbers, e.g.,
A Web server (using the HTTP protocol) is usually identified by port number 80.
A mail server process (using the SMTP protocol) is usually identified by port number 25.
An SSH server is usually identified on port number 22

For example, to send an HTTP message to the gaia.cs.umass.edu web server:

IP address: 128.119.245.12
port number: 80

Hosts are often identified by IP address:

In IPv4, an IP address is a 32-bit quantity identifying the host.
In IPv6, addresses are 128 bits.
We ran out of ipv4 addresses…

The sending process must also identify the receiving process (more specifically, the receiving socket) running in the host.
This information is needed because in general a host could be running many network applications.
A destination port number serves this purpose.
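
A minimal Python sketch of the addressing ideas above, using only the standard library. The IPv4 address and port are the gaia.cs.umass.edu values from the example; the IPv6 address is a documentation-only address picked for illustration, and the variable names are hypothetical.

```python
import ipaddress

# The (IP address, port) pair from the example above.
server_address = ("128.119.245.12", 80)

v4 = ipaddress.ip_address(server_address[0])
v6 = ipaddress.ip_address("2001:db8::1")  # illustrative IPv6 address (assumption)

# max_prefixlen is the width of the address in bits.
print(v4.version, v4.max_prefixlen)  # IPv4 width
print(v6.version, v6.max_prefixlen)  # IPv6 width

# Port numbers are 16-bit fields in the TCP and UDP headers.
MAX_PORT = 2**16 - 1
print(0 < server_address[1] <= MAX_PORT)
```

The host part picks the machine, and the port picks the receiving socket on that machine.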

++++++++++++++ Cahoot-02-2

1.5.4 Preview of transport services

Data integrity
Some apps (e.g., file transfer, web transactions) require 100% reliable data transfer.
Other apps (e.g., audio) can tolerate some loss.

Timing
Some apps (e.g., Internet telephony, interactive games) require low delay to be “effective”.
Some apps (e.g., file download) tolerate delay.

Throughput
Some apps (e.g., multimedia) require minimum amount of throughput to be “effective”.
Other apps (“elastic apps”) make use of whatever throughput they get.

Security
Encryption, data integrity, …

1.5.4.1 Multiple types of service:

Unreliable datagram (UDP)
02-Application/service.png

Byte-stream (TCP)
02-Application/stream.png

Service requirements for applications?
02-Application/app_layer03.png

Applications usually choose between TCP and UDP
TCP and UDP are transport layer.
Employed by most application layer programs.
Other transport-layer, or pseudo-transport-layer, protocols exist.
SCTP (stream control transmission protocol), SSU (I2P app), DCCP, RUDP, UDP-lite, etc.
An application designer could design their own transport layer protocol,
since Transport layer and up runs on end hosts, as opposed to network infrastructure.
Could build into the core/kernel of end operating systems and languages as new socket type.
Could also just design the features into the application layer,
rather than actually get a transport protocol built into the kernel of an OS.
UDP lets you build new things!
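
As a rough sketch of designing a feature into the application layer on top of UDP, the code below adds a sequence number to each datagram so the receiver can restore order and drop duplicates. The 4-byte header format and all function names are assumptions made for illustration, not part of any real protocol; the "network" is simulated with a list so the sketch runs without sockets.

```python
import struct

HEADER = struct.Struct("!I")  # 4-byte sequence number, network byte order (assumed format)

def make_datagram(seq, payload):
    """Prefix the payload with its sequence number."""
    return HEADER.pack(seq) + payload

def parse_datagram(datagram):
    """Split a datagram back into (sequence number, payload)."""
    (seq,) = HEADER.unpack_from(datagram)
    return seq, datagram[HEADER.size:]

def reassemble(datagrams):
    """Put payloads back in order; duplicates collapse onto the same key."""
    by_seq = dict(parse_datagram(d) for d in datagrams)
    return b"".join(by_seq[seq] for seq in sorted(by_seq))

# Simulate datagrams arriving out of order, with one duplicate.
sent = [make_datagram(i, chunk) for i, chunk in enumerate([b"he", b"ll", b"o!"])]
arrived = [sent[2], sent[0], sent[1], sent[0]]
print(reassemble(arrived))
```

Retransmission of lost datagrams is left out here; a fuller design would also need acknowledgements and timers.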

1.5.4.1.1 TCP

Connection-oriented service
TCP has the client and server exchange transport-layer control information with each other,
before the application-level messages begin to flow.
This so-called handshaking procedure alerts the client and server,
allowing them to prepare for an onslaught of packets.
After the handshaking phase,
a TCP connection is said to exist between the sockets of the two processes.
The connection is a full-duplex connection,
in that the two processes can both send messages to each other,
over the connection at the same time, bi-directionally
When the application finishes sending messages,
it must tear down the connection.

TCP has a Reliable data transfer service
The communicating processes can rely on TCP to deliver all data sent without error and in the proper order.

TCP also includes a congestion-control mechanism
The TCP congestion-control mechanism throttles a sending process (client or server),
when the network is congested between sender and receiver.

Summary
Reliable transport: between sending and receiving process.
Flow control: sender won’t overwhelm receiver.
Congestion control: throttle sender when network overloaded.
Does not provide: timing, minimum throughput guarantee, security.
Connection-oriented: setup required between client and server processes.

1.5.4.1.2 UDP

UDP is a no-frills, lightweight transport protocol, providing minimal services.
UDP is connectionless, so there is no handshaking before the two processes start to communicate.
UDP provides an unreliable data transfer service.
When a process sends a message into a UDP socket,
UDP provides no guarantee that the message will ever reach the receiving process.
Messages that do arrive at the receiving process may arrive out of order.
UDP does not include a congestion-control mechanism,
so the sending side of UDP can attempt to pump data into the layer below (the network layer) at any rate it pleases.

UDP service:
Provides: Unreliable data transfer between sending and receiving process
Does not provide: Reliability, flow control, congestion control, timing, throughput guarantee, security, or connection setup.

Discussion questions:
Why bother with UDP?
With TCP, why is there a UDP?

++++++++++++++ Cahoot-02-3

1.5.4.2 Encryption

https://en.wikipedia.org/wiki/Transport_Layer_Security

Neither base TCP nor UDP provides any encryption!

An Enhancement for TCP provides:
1. encryption,
2. data integrity, and
3. end-point authentication.

The great thing about a TLS joke,
is that you can tell if it’s not the original…

This security enhancement used to be called Secure Sockets Layer (SSL).
Transport Layer Security (TLS) is just a newer version of SSL.
SSL was the name of now-defunct versions of what is now modern TLS.
TLS is not a third Internet transport protocol,
on the same level as TCP and UDP, but an enhancement of TCP, at the application layer.
Applications need to include TLS code (existing libraries) in both the client and server sides of the application.
TLS has its own socket API that is similar to the traditional TCP socket API.
The sending process passes cleartext data to the TLS socket;
TLS in the sending host then encrypts the data and passes the encrypted data to the TCP socket.
The encrypted data travels over the Internet to the TCP socket in the receiving process.
The receiving socket passes the encrypted data to TLS, which decrypts the data.
TLS passes the cleartext data through its TLS socket to the receiving process.
TLS socket API is like that of a TCP socket (just import and use).
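
A hedged sketch of what "just import and use" can look like with Python's standard ssl module. The function names are hypothetical, error handling is omitted, and fetch_over_tls needs network access, so it is defined here but not called.

```python
import socket
import ssl

def make_tls_context():
    """Client-side context with certificate and hostname checks enabled."""
    return ssl.create_default_context()

def fetch_over_tls(host, port=443):
    """Send a minimal HTTP/1.0 request over TLS and return the raw response."""
    context = make_tls_context()
    with socket.create_connection((host, port), timeout=10) as tcp_sock:
        # wrap_socket runs the TLS handshake on top of the existing TCP connection.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            request = f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n"
            tls_sock.sendall(request.encode("ascii"))  # cleartext in, ciphertext on the wire
            chunks = []
            while chunk := tls_sock.recv(4096):        # decrypted data comes back out
                chunks.append(chunk)
    return b"".join(chunks)
```

The application code reads and writes cleartext on tls_sock exactly as it would on a plain TCP socket; the encryption happens between that socket and the TCP socket underneath.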

Transport layer protocols used
02-Application/app_layer04.png

Tunneling:
Inner-most -> Outer-more… -> Outer-most
Application -> TLS -> TCP -> IP -> MAC -> Ethernet -> Physical

More detail here:
05-Security.html
../../Security/Content/12a-AppliedCryptoSystems.html

1.5.5 Application-layer protocols

An application-layer protocol defines how an application’s processes,
running on different end systems, pass messages to each other, for example:

The types of messages exchanged, for example, request messages and response messages.

The syntax of the various message types, such as the fields in the message and how the fields are delineated.

The semantics of the fields.
meaning of the information in the fields.

Rules for determining when and how a process sends messages, responds to messages, and changes state.

Open protocols:
Defined in RFCs.
Allows for interoperability.
e.g., HTTP, SMTP

Proprietary protocols:
e.g., Skype (used to be open, fun story)

1.6 Web and HTTP

https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol
https://tools.ietf.org/html/rfc7230 (HTTP, show this)
https://media.pearsoncmg.com/aw/ecs_kurose_compnetwork_7/cw/content/interactiveanimations/http-delay-estimation/index.html
https://www.computer-networking.info/1st/html/application/http.html
https://www.computer-networking.info/2nd/html/protocols/http.html

1.6.1 Web protocol example

Observe HTTP with Wireshark:

$ nc -C info.cern.ch 80
GET / HTTP/1.1
Host: info.cern.ch

$ ncat -C hackware.ru 80
GET / HTTP/1.0
Host: hackware.ru

Trace HTTP conversation in Wireshark
Observe each packet has headers from multiple layers

These must be typed exactly, or they will not work!
ncat -C option is for crlf: $ man ncat to read more

Encrypted option:

$ ncat -C --ssl hackware.ru 443
GET / HTTP/1.0
Host: hackware.ru

1.6.2 Web server and clients

HTTP: hypertext transfer protocol is the Web’s application layer protocol.

Client/Server model:
Client: Browser that requests, receives, (using HTTP protocol) and “displays” Web objects.
Server: Web server sends (using HTTP protocol) objects in response to requests.

Web pages
A Web page (also called a document) consists of objects.
An object is simply a file such as an HTML file, a JPEG image, a Java applet (lol…),
or a video clip that is addressable by a single URL.
If a Web page contains HTML text and five JPEG images,
then the Web page has six objects: the base HTML file plus the five images.
The base HTML file references the other objects in the page with the objects’ URLs.

Each URL has two components:
the hostname of the server that houses the object and
the object’s path name.

For example, the URL:
http://www.someSchool.edu/someDepartment/picture.gif
has www.someSchool.edu for a hostname and
/someDepartment/picture.gif for a path name.
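
The same split can be checked with Python's standard urllib.parse module; this is just a small illustration using the example URL above.

```python
from urllib.parse import urlsplit

url = "http://www.someSchool.edu/someDepartment/picture.gif"
parts = urlsplit(url)

print(parts.scheme)  # protocol used to fetch the object
print(parts.netloc)  # hostname of the server that houses the object
print(parts.path)    # path name of the object on that server
```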

02-Application/webpage.png
http://www.w3.org/MarkUp/ defines the standard

HTTP/1 and HTTP/2 use TCP (not UDP).

The HTTP client first initiates a TCP connection with the server.
Once the connection is established, the browser and the server processes access TCP through their socket interfaces.
Server sends requested files to clients without storing any state information about the client, which makes HTTP a stateless protocol.

Client initiates TCP connection (creates socket) to server, port 80 or 443.
server accepts TCP connection from client.
HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server).
TCP connection closed.
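
The four steps above can be sketched with a plain TCP socket. This is a minimal illustration, not a full HTTP client: build_request and http_get are hypothetical names, the request carries only the headers needed for the demo, and http_get needs network access, so it is defined but not called here.

```python
import socket

def build_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request; every line ends in CRLF."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close the connection after responding
        "",                   # the blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def http_get(host, port=80, path="/"):
    """Open a TCP connection, send one request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as sock:  # TCP handshake
        sock.sendall(build_request(host, path))                      # HTTP request
        chunks = []
        while chunk := sock.recv(4096):                              # HTTP response
            chunks.append(chunk)
    return b"".join(chunks)  # leaving the with-block closed the socket

print(build_request("info.cern.ch"))
```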

1.6.3 Persistence

HTTP is “stateless”.
Server maintains no information about past client requests as a part of the protocol itself.
It could do so in other ways, but that would not be part of HTTP.

Protocols that maintain “state” are complex!
Past history (state) must be maintained.
If server/client crashes, their views of “state” may be inconsistent, must be reconciled.

HTTP can choose either:
Each request/response pair sent over a separate TCP connection (non-persistent connections), or
all of the requests and their corresponding responses sent over the same TCP connection (persistent connections).

HTTP sequence
A base HTML file and 10 JPEG images, and that all 11 of these objects reside on the same server:
<http://www.someSchool.edu/someDepartment/home.index>

HTTP client process initiates a TCP connection to the server www.someSchool.edu on port number 80,
which is the default port number for HTTP.
Associated with the TCP connection, there will be a socket at the client and a socket at the server.
HTTP client sends an HTTP request message to the server via its socket.
The request message includes the path name /someDepartment/home.index.
HTTP server process receives the request message via its socket,
retrieves the object /someDepartment/home.index from its storage (RAM or disk),
encapsulates the object in an HTTP response message,
and sends the response message to the client via its socket.
HTTP server process tells TCP to close the TCP connection.
But TCP doesn’t actually terminate the connection,
until it knows for sure that the client has received the response message intact.
HTTP client receives the response message.
The TCP connection terminates.
The message indicates that the encapsulated object is an HTML file.
The client extracts the file from the response message, examines the HTML file,
and finds references to the 10 JPEG objects.
The first four steps are then repeated for each of the referenced JPEG objects.

Time to fill a request
02-Application/app_layer06.png

Non-persistent HTTP
At most one object sent over TCP connection.
Connection then closed.
Downloading multiple objects requires multiple connections.

Persistent HTTP
Multiple objects can be sent over single TCP connection between client, server

Disadvantages of non-persistent connections
First, a brand-new connection must be established and maintained for each requested object.
For each of these connections, TCP buffers must be allocated and TCP variables must be kept in both the client and server.
Each object suffers a delivery delay of two RTTs: one RTT to establish the TCP connection and one RTT to request and receive an object.

RTT (definition): time for a small packet to travel from client to server and back

HTTP response time:
One RTT to initiate TCP connection,
one RTT for HTTP request and first few bytes of HTTP response to return,
plus the file transmission time.

Non-persistent HTTP response time:
2RTT + file transmission time.

Non-persistent HTTP issues:
Requires 2 RTTs per object
OS overhead for each TCP connection.
Browsers often open parallel TCP connections to fetch referenced objects.

Persistent connections
With persistent connections, the server leaves the TCP connection open after sending a response.
Subsequent requests and responses between the same client and server can be sent over the same connection.
Multiple Web pages residing on the same server can be sent from the server to the same client over a single persistent TCP connection.
Requests for objects can be made back-to-back, without waiting for replies to pending requests (pipelining).
Typically, the HTTP server closes a connection when it isn’t used for a certain time (a configurable timeout interval).

Persistent HTTP:
Server leaves connection open after sending response.
Subsequent HTTP messages between same client/server sent over open connection.
Client sends requests as soon as it encounters a referenced object.
As little as one RTT for all the referenced objects.
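
A back-of-the-envelope comparison for the page with a base HTML file and 10 images. The model is deliberately simplified and the numbers are assumptions: a fixed RTT, equal transmission time per object, serial (not parallel) non-persistent connections, and no TCP slow start.

```python
def non_persistent_time(rtt, transmit, objects):
    """Serial non-persistent HTTP: every object pays 2 RTTs plus its transmission time."""
    return objects * (2 * rtt + transmit)

def persistent_pipelined_time(rtt, transmit, referenced):
    """One connection: 2 RTTs for the base file, then 1 RTT for all pipelined requests."""
    base = 2 * rtt + transmit           # connection setup + request/response for base file
    rest = rtt + referenced * transmit  # one RTT covers all the referenced objects
    return base + rest

RTT = 0.100       # seconds (assumed)
TRANSMIT = 0.010  # seconds per object (assumed)
print(round(non_persistent_time(RTT, TRANSMIT, objects=11), 3))
print(round(persistent_pipelined_time(RTT, TRANSMIT, referenced=10), 3))
```

With these assumed values, almost all of the non-persistent total is RTT overhead rather than transmission time.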

++++++++++++++ Cahoot-02-4

1.6.4 Message format

Two types of HTTP messages:
1. request,
2. response

HTTP request message:
ASCII (human-readable format)

1.6.4.1 HTTP request message

GET /somedir/page.html HTTP/1.1
Host: www.mst.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: en

General request
02-Application/app_layer07.png
sp=space; cr=carriage return; lf=line feed

Example:

GET /index.html HTTP/1.1\r\n
Host: www-net.cs.umass.edu\r\n
User-Agent: Firefox/3.6.10\r\n
Accept: text/html,application/xhtml+xml\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
\r\n
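
To make the format concrete, here is a small parser sketch for a request like the one above. parse_request is a hypothetical helper that assumes a well-formed request; a real server would need to handle malformed input.

```python
def parse_request(raw):
    """Split an HTTP request into (method, path, version, headers)."""
    head, _, _body = raw.partition(b"\r\n\r\n")        # the blank line ends the header section
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, path, version = request_line.split(" ")    # fields are separated by single spaces
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, path, version, headers

raw = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: www-net.cs.umass.edu\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
)
method, path, version, headers = parse_request(raw)
print(method, path, version)
print(headers["host"])
```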

1.6.4.2 HTTP Response Message

HTTP/1.1 200 OK
Connection: close
Date: Tue, 09 Aug 2011 15:44:04 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Tue, 09 Aug 2011 15:11:03 GMT
Content-Length: 6821
Content-Type: text/html
(data data data data data ...
The entity body is the
meat of the message,
it contains the requested object itself)

General reply
02-Application/app_layer08.png
sp=space; cr=carriage return; lf=line feed

Example:

HTTP/1.1 200 OK\r\n
Date: Sun, 26 Sep 2010 20:09:20 GMT\r\n
Server: Apache/2.0.52 (CentOS)\r\n
Last-Modified: Tue, 30 Oct 2007 17:00:02 GMT\r\n
ETag: "17dc6-a5c-bf716880"\r\n
Accept-Ranges: bytes\r\n
Content-Length: 2652\r\n
Keep-Alive: timeout=10, max=100\r\n
Connection: Keep-Alive\r\n
Content-Type: text/html; charset=ISO-8859-1\r\n
\r\n
data data data data data ...

Server responses
Status code appears in 1st line in server-to-client response message.

200 OK: Request succeeded and the information is returned in the response.
301 Moved Permanently: Requested object has been permanently moved; the new URL is specified in Location: header of the response message. The client software will automatically retrieve the new URL.
400 Bad Request: This is a generic error code indicating that the request could not be understood by the server.
404 Not Found: The requested document does not exist on this server.
505 HTTP Version Not Supported: The requested HTTP protocol version is not supported by the server.
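
Python's standard library carries these codes and phrases in http.HTTPStatus, which is a handy way to look them up:

```python
from http import HTTPStatus

for code in (200, 301, 400, 404, 505):
    status = HTTPStatus(code)
    # The first digit gives the class: 2xx success, 3xx redirection,
    # 4xx client error, 5xx server error.
    print(status.value, status.phrase, f"({code // 100}xx)")
```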

The best thing about 404 jokes is …
wait, damnit, it’s around here somewhere…

418: I’m a teapot

A “real” joke built into the protocol
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/418
https://en.wikipedia.org/wiki/Hyper_Text_Coffee_Pot_Control_Protocol
https://save418.com/

Old example: Open TCP connection, send GET request

telnet cis.poly.edu 80
GET /~ross/ HTTP/1.1
Host: cis.poly.edu

E.g.,
02-Application/Http_request_telnet.png

Note: ncat has generally replaced telnet, though they both still work

nc -C info.cern.ch 80
GET / HTTP/1.1
Host: info.cern.ch

1.6.4.3 Uploading form input

POST method:
Web page often includes form input.
Input is uploaded to server in entity body (i.e., message part of packet).

URL method:
Uses GET method.
Input is uploaded in URL field of request line:
www.somesite.com/animalsearch?monkeys&banana

We’ll demo this later!
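
In the meantime, a sketch of the two upload styles using urllib.parse. The field names (animal, food) are hypothetical key=value pairs added for illustration; the host and path come from the URL example above.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields, for illustration only.
form = {"animal": "monkeys", "food": "banana"}
encoded = urlencode(form)

# URL method: the input rides in the request line of a GET.
get_request_line = f"GET /animalsearch?{encoded} HTTP/1.1"

# POST method: the same encoded input goes in the entity body instead.
body = encoded.encode("ascii")
post_headers = (
    "POST /animalsearch HTTP/1.1\r\n"
    "Host: www.somesite.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
)

print(get_request_line)
print(parse_qs(encoded))  # what a server-side parser would recover
```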

1.6.5 Method types

HTTP/1.0:
GET
POST
HEAD (asks server to leave requested object out of response)

HTTP/1.1:
GET, POST, HEAD
PUT (uploads file in entity body to path specified in URL field)
DELETE (deletes file specified in the URL field)

Demonstrate:
More wireshark HTTP examples in detail
http://info.cern.ch/
via nc
via a browser that does not generate junk traffic:
epiphany or surf or qutebrowser
Record it in Wireshark
Identify HTTP headers, match them to fields

1.6.6 Cookies

Many Web sites use cookies.
Four components:
1) cookie header line of HTTP response message
2) cookie header line in next HTTP request message
3) cookie file kept on user’s host, managed by user’s browser
4) back-end database at Web site

Example:

Susan always accesses the Internet from her PC.
She visits a specific e-commerce site for the first time.
When the initial HTTP request arrives at the site, the site creates both a:
unique ID, and
entry in backend database for ID
02-Application/app_layer09.png

What cookies can be used for:
authorization,
shopping carts,
recommendations,
user session state (Web e-mail),
tracking.

How to keep “state”:
Protocol endpoints: maintain state at sender/receiver over multiple transactions.
Cookies: http messages carry state.
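
A minimal sketch of the cookie round trip using the standard http.cookies module. The cookie name id, the ID value 1678, and the dict standing in for the back-end database are all assumptions for illustration.

```python
from http.cookies import SimpleCookie

backend_db = {}  # stands in for the site's back-end database

def first_visit_response(user_id):
    """Server side: record the new ID and build the Set-Cookie header line."""
    backend_db[user_id] = {"visits": 1}
    cookie = SimpleCookie()
    cookie["id"] = user_id
    return cookie.output(header="Set-Cookie:")

def later_request(cookie_header):
    """Server side: read the ID from the Cookie header value and update state."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    record = backend_db[cookie["id"].value]
    record["visits"] += 1
    return record

print(first_visit_response("1678"))  # header line sent in the HTTP response
print(later_request("id=1678"))      # value the browser sends back in its Cookie header
```

HTTP itself stays stateless; the state lives in the browser's cookie file and the site's database, and the header lines only carry the key between them.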

Cookies and privacy:
Cookies permit sites to learn a lot about you.
You may supply name and e-mail to sites.

1.6.7 Caching

Proxy server can cache web file data.
Goal: satisfy client request without involving origin server.

User sets browser: Web accesses via cache.
Browser sends all HTTP requests to cache.
If object in cache, then cache returns object.
Else cache requests object from origin server
then returns object to client.

Bottleneck
02-Application/app_layer11.png

Caching helps bottleneck
02-Application/app_layer12.png

Demos
Briefly show (not link) $ Webserver.py
Run it, show visiting in browser <http://localhost:6789>

1.6.8 QUIC

Sometimes UDP is used simply because it allows new or experimental protocols to run entirely as user-space applications.
No kernel updates are required, as would be the case with TCP changes.
Google has created a protocol named QUIC (Quick UDP Internet Connections,
chromium.org/quic) in this category, rather specifically to support the HTTP protocol.
QUIC can in fact be viewed as a transport protocol specifically tailored to HTTPS:
HTTP plus TLS encryption.

Some interesting reading on QUIC (Google’s web protocol on top of UDP):
http://intronetworks.cs.luc.edu/current2/uhtml/udp.html#quic
https://en.wikipedia.org/wiki/QUIC
https://daniel.haxx.se/blog/2018/11/11/http-3/
https://blog.cloudflare.com/the-road-to-quic/
QUIC is innovative, but it also breaks federated interoperability, which has pros and cons.

I received this HTTP 200 joke.
It was OK…

++++++++++++++ Cahoot-02-5

1.6.9 Demo first Wireshark lab on HTTP myself in class

Simple file
http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file1.html
GET, OK, etc.

Refreshing a cached page
http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file2.html
IF-MODIFIED-SINCE, refresh, re-sent?

Large file
http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file3.html
Initial HTTP GET, TCP segments, how many HTTP OK, when?

Multiple-parts
http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file4.html
Notice the image retrieval.
What entity is responsible for requesting the multiple objects in a page, when?

“Secure” web-page with login
http://gaia.cs.umass.edu/wireshark-labs/protected_pages/HTTP-wireshark-file5.html
username: wireshark-students
password: network
auth field of request

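#!/usr/bin/python3

import base64

# Base64 of the username alone; 24 characters form complete groups, so no "=" padding is needed.
coded_string = "d2lyZXNoYXJrLXN0dWRlbnRz"
print(base64.b64decode(coded_string).decode())

#!/bin/bash

echo -n d2lyZXNoYXJrLXN0dWRlbnRz | base64 -d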

Let’s think about http, privacy, and security in various scenarios:
https://www.eff.org/pages/tor-and-https

1.6.10 Reverse http proxy

https://en.wikipedia.org/wiki/Reverse_proxy
Reverse proxy allows multiple HTTP endpoints (via more than “Host:”),
at one IP/domain.
* [ ] expand on this maybe later.

1.6.11 CORS security in browser

https://en.wikipedia.org/wiki/Cross-origin_resource_sharing

1.7 Socket programming

Goal
Learn how to build client/server applications that communicate using sockets.

Socket:
A tunnel between application processes, in an end-to-end transport protocol

Two primary socket types for two transport services exist.
UDP is an unreliable, lightweight datagram service.
TCP is a reliable, heavier, byte-stream, connection oriented service.

1.7.1 Example

Application example we’ll put in code, in order:

Client inputs a line of characters (data) from the keyboard, and
sends the data to server.

Server receives the data,
converts the characters to uppercase, and
sends the modified data to client.

Client receives modified data, and
displays it as a printed line on the screen

1.7.1.1 UDP

UDP involves no persistent “connection” between a client and server.
No handshaking occurs before sending data.
A sender explicitly attaches a destination IP address and port number to each packet.
A receiver extracts the sender’s IP address and port number from each received packet.
Transmitted data may be lost.
Transmitted data may be received out-of-order.
UDP provides unreliable transfer of groups of bytes (“datagrams”) between client and server.
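
A self-contained loopback sketch of the uppercase example over UDP. This is an illustration written for these notes, not the contents of the course's socket_01_UDP_*.py files; the function names are hypothetical, and the server runs in a thread only so the whole demo fits in one script.

```python
import socket
import threading

def serve_once(server_sock):
    """Receive one datagram and send the uppercased text back to its sender."""
    data, client_addr = server_sock.recvfrom(2048)  # sender's (IP, port) arrives with the data
    server_sock.sendto(data.upper(), client_addr)

def uppercase_via_udp(message):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server_sock, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client_sock:
        server_sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        server_sock.settimeout(5)
        client_sock.settimeout(5)
        server_addr = server_sock.getsockname()
        worker = threading.Thread(target=serve_once, args=(server_sock,))
        worker.start()
        client_sock.sendto(message, server_addr)  # no handshake: address attached per datagram
        reply, _ = client_sock.recvfrom(2048)
        worker.join()
        return reply

print(uppercase_via_udp(b"hello udp"))
```

Note that the client never calls connect(); the destination is passed to sendto() on every datagram.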

UDP socket code:
02-Application/socket_01_UDP_server.py
02-Application/socket_01_UDP_client.py

Demonstrate:
0. Run in background:
python3 socket_01_UDP_server.py

Show Wireshark watching the client and server code:
sudo wireshark &
Connect with:

python3 socket_01_UDP_client.py
nc -uC 127.0.0.1 6789
man nc # ncat can send UDP packets too!

Show how nc or multiple python clients can block

+++++++++++++++++++++++++++++++++ Cahoot-02-13

1.7.1.2 TCP

A server process must first be running.
The server must have created a TCP socket,
that welcomes a client’s contact.
A client contacts a server.
The client specifies an IP address and port number of a server process.
The client uses that address to create a TCP socket.
The client’s TCP socket establishes a connection to the server.
When contacted by a client on the welcoming socket,
the server’s TCP socket creates a secondary new socket,
for the server process to communicate with that particular client.
This allows server to talk with multiple clients.
Source port numbers distinguish different clients.
TCP provides a reliable, in-order, byte-stream transfer between a client and server.
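
The same uppercase example over TCP, again as a self-contained loopback sketch rather than the course's socket_02_TCP_*.py files. Names are hypothetical, and a single recv() is enough only because the message is short; real code should loop, since TCP is a byte stream without message boundaries.

```python
import socket
import threading

def serve_one_client(welcoming_sock):
    """Accept one connection and uppercase whatever the client sends."""
    conn, client_addr = welcoming_sock.accept()  # new socket dedicated to this client
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

def uppercase_via_tcp(message):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as welcoming_sock:
        welcoming_sock.bind(("127.0.0.1", 0))    # port 0 lets the OS pick a free port
        welcoming_sock.listen()
        welcoming_sock.settimeout(5)
        server_addr = welcoming_sock.getsockname()
        worker = threading.Thread(target=serve_one_client, args=(welcoming_sock,))
        worker.start()
        # Connecting triggers the handshake before any application data flows.
        with socket.create_connection(server_addr, timeout=5) as client_sock:
            client_sock.sendall(message)
            reply = client_sock.recv(1024)
        worker.join()
        return reply

print(uppercase_via_tcp(b"hello tcp"))
```

The welcoming socket only accepts connections; all data for this client moves over the socket returned by accept().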

TCP socket code:
Example 1:

02-Application/socket_02_TCP_server.py
02-Application/socket_02_TCP_client.py

Example 2:

02-Application/socket_02_TCP_server2.py
02-Application/socket_02_TCP_client2.py

Demonstrate:
0. Run a server:
python3 socket_02_TCP_server.py

Show Wireshark watching the client and server code:
sudo wireshark &
Connect with

python3 socket_02_TCP_client.py
nc -C 127.0.0.1 6789

show currently connected sockets with:

# new option
man ss
ss

# old, ss is better
man netstat
netstat -an

# another option
man lsof
lsof -i -n

02-Application/app_layer28.png
The term “port” is not the same idea or definition as the term “socket”.

Socket is an instance object dually created both:
within a requesting application, and
within an operating system for that requesting application.

Port is a designation dually configured both:
as field in the transport-layer headers, in actual packets, and
in the OS’s kernel networking core, and firewall configuration.
The OS’s kernel routes packets to the application.

Step by step:
02-Application/app_layer29.png

+++++++++++++++++++++++++++++++++ Cahoot-02-14

1.7.2 Concurrency in Python

https://realpython.com/python-concurrency/

1.7.2.1 Global interpreter lock

https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock
https://wiki.python.org/moin/GlobalInterpreterLock
https://realpython.com/python-gil/
Don’t like the GIL?
PyPy is often faster, though note that it has a GIL too:
https://www.pypy.org
https://realpython.com/pypy-faster-python/

You can use any of:
multithreading
multiprocessing
asyncio

When should you use each?
multithreading to deal with simple blocking (no real speed up).
multiprocessing to run over multiple cores (speed up).
asyncio to deal with more complex or larger-scale needs (often blocking).
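
A small sketch of the multithreading case: threads let blocking waits overlap, even though the GIL keeps CPU-bound Python code from running in parallel. time.sleep stands in for a blocking call such as recv(); the delays are arbitrary values chosen for the demo.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_task(seconds):
    """Stand-in for a blocking call; sleeping releases the GIL."""
    time.sleep(seconds)
    return seconds

delays = [0.2, 0.2, 0.2, 0.2]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(delays)) as pool:
    results = list(pool.map(blocking_task, delays))
elapsed = time.perf_counter() - start

# The waits overlap, so wall time is close to one delay rather than their sum.
print(f"{sum(results):.1f} s of sleeping took {elapsed:.2f} s of wall time")
```

Replacing the sleep with a CPU-heavy loop would not show the same gain under threads; that is the case for multiprocessing.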

1.7.2.2 Multiprocessing

https://www.geeksforgeeks.org/multiprocessing-python-set-1/
https://www.geeksforgeeks.org/multiprocessing-python-set-2/

1.7.2.3 Multithreading

https://www.geeksforgeeks.org/multithreading-python-set-1/
https://www.geeksforgeeks.org/multithreading-in-python-set-2-synchronization/
https://realpython.com/intro-to-python-threading/

See my code now:
02-Application/thread_00_none.py
02-Application/thread_01_unrolled.py
02-Application/thread_02_fake.py
02-Application/thread_03_storage.py

Show multi-threaded examples now:
02-Application/socket_04_TCP_server_mt.py
02-Application/socket_04_TCP_client_mt.py
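
For comparison, a thread-per-client server can also be sketched with the standard socketserver module. This is an independent illustration, not the contents of socket_04_TCP_server_mt.py; the class and function names are hypothetical.

```python
import socket
import socketserver
import threading

class UppercaseHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Runs in its own thread, one per connected client.
        for line in self.rfile:
            self.wfile.write(line.upper())

class ThreadedServer(socketserver.ThreadingTCPServer):
    daemon_threads = True       # do not wait for idle clients at shutdown
    allow_reuse_address = True

def demo():
    with ThreadedServer(("127.0.0.1", 0), UppercaseHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        addr = server.server_address
        # The first client connects and then stays silent, like an idle nc session.
        with socket.create_connection(addr, timeout=5) as idle_client, \
             socket.create_connection(addr, timeout=5) as busy_client:
            busy_client.sendall(b"still served\n")
            reply = busy_client.makefile("rb").readline()
        server.shutdown()
        return reply

print(demo())
```

The idle first client holds only its own handler thread, so the second client still gets an answer.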

Now, nc does not block the server from other clients:

python3 socket_04_TCP_server_mt.py &
nc -C 127.0.0.1 50002
python3 socket_04_TCP_client_mt.py
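The idea behind a thread-per-client server can be sketched as follows.
This is not the course's socket_04_TCP_server_mt.py; the function names, the upper-casing "service", and the OS-chosen port are all assumptions made for a self-contained demo.

```python
import socket
import threading

def handle_client(conn):
    """Serve one client in its own thread: read once, reply, close."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

def serve(server_sock, n_clients):
    """Accept n_clients connections, handing each to a new thread."""
    threads = []
    for _ in range(n_clients):
        conn, _addr = server_sock.accept()
        t = threading.Thread(target=handle_client, args=(conn,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))   # port 0: let the OS choose
server_sock.listen()
addr = server_sock.getsockname()

acceptor = threading.Thread(target=serve, args=(server_sock, 2))
acceptor.start()

# The first client connects but stays silent (like an idle nc session)...
idle = socket.create_connection(addr)
# ...yet a second client is still served right away.
busy = socket.create_connection(addr)
busy.sendall(b"hello")
reply = busy.recv(1024)
print(reply)

# Now let the idle client finish too.
idle.sendall(b"bye")
idle_reply = idle.recv(1024)

for s in (idle, busy, server_sock):
    s.close()
acceptor.join()
```

With a single-threaded server, the busy client would wait until the idle one had been handled.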

1.7.3 General socket functionality

https://docs.python.org/3/library/socket.html
https://realpython.com/python-sockets/

Let’s review some program-internal functions.

1.7.3.1 socket.socket.bind

>>> help(socket.socket.bind)
bind(...)
bind(address)
Bind the socket to a local address.
For IP sockets, the address is a pair (host, port);
the host must refer to the local host.
For raw packet sockets the address is a tuple
(ifname, proto [,pkttype [,hatype [,addr]]])

socket.socket.bind takes a tuple: (hostname or IP, port)
https://serverfault.com/questions/78048/whats-the-difference-between-ip-address-0-0-0-0-and-127-0-0-1
What are valid hostname or IP addresses to use?

The use of the term “local” above is ambiguous.
Q: What does it mean here, operationally?
A: That the IP address being bound is assigned to an interface managed by your operating system!
More to come on interfaces when we cover the network layer:
04-NetworkData.html

""
defaults to all traffic to the machine.
It is the same as 0.0.0.0 for IPv4.
It’s easier for IPv6.

0.0.0.0
which also listens to all traffic on the machine
(0.0.0.0 means various different things in different contexts).
https://www.rfc-editor.org/rfc/rfc1122#page-29 section 3.2.1.3
(a) { 0, 0 }
This host on this network.
MUST NOT be sent,
except as a source address as part of an initialization procedure,
by which the host learns its own IP address.
See also Section 3.3.6 for a non-standard use of {0,0}.
https://www.rfc-editor.org/rfc/rfc5735#section-3
0.0.0.0/8 - Addresses in this block refer to source hosts on “this” network.
Address 0.0.0.0/32 may be used as a source address for this host on this network;
other addresses within 0.0.0.0/8 may be used to refer to specified hosts on this network ([RFC1122], Section 3.2.1.3).
Despite the standard, 0.0.0.6 for example, won’t bind in python3.

<hostname>
https://docs.python.org/3/library/socket.html
If you use a hostname in the host portion of IPv4/v6 socket address,
the program may show a nondeterministic behavior,
as Python uses the first address returned from the DNS resolution.
The socket address will be resolved differently into an actual IPv4/v6 address,
depending on the results from DNS resolution and/or the host configuration.
For deterministic behavior use a numeric address in host portion.
On my Fedora machine, it resolves to 127.0.0.1.
Hostname is a shallow alias, implemented via checking: /etc/hosts

127.0.0.1 through 127.255.255.254 (CIDR notation: 127.0.0.0/8)
https://www.rfc-editor.org/rfc/rfc5735#section-3
127.0.0.0/8 - This block is assigned for use as the Internet host loopback address.
A datagram sent by a higher-level protocol to an address anywhere within this block loops back inside the host.
This is ordinarily implemented using only 127.0.0.1/32 for loopback.
As described in [RFC1122], Section 3.2.1.3,
addresses within the entire 127.0.0.0/8 block do not legitimately appear on any network anywhere.
Your local machine only.
You can use 127.0.0.4 (or whatever in the range),
but that socket will only be reachable on that IP.
Python’s sending socket defaults to 127.0.0.1 as the sending IP,
when sending to any localhost address.

A LAN-only IP address
10.0.0.0 - 10.255.255.255 (10.0.0.0/8 prefix)
172.16.0.0 - 172.31.255.255 (172.16.0.0/12 prefix)
192.168.0.0 - 192.168.255.255 (192.168.0.0/16 prefix)
https://datatracker.ietf.org/doc/html/rfc1918
These IP ranges are declared as LAN IPs,
as opposed to public, globally routable IPs,
or to localhost IPs, etc.
If you have an interface assigned an IP in one of these ranges,
then you can bind that IP.
If no interface in your OS is assigned that IP,
then you cannot bind the socket to it in python either.

A public, globally routable IP address
More-or-less anything not in the below list:
https://www.iana.org/assignments/iana-ipv4-special-registry/iana-ipv4-special-registry.xhtml
https://en.wikipedia.org/wiki/IPv4#Special-use_addresses
If you have an interface assigned a public IP,
then you can bind that IP.
If no interface in your OS is assigned that IP,
then you cannot bind the socket to it in python either.

Which should you choose?

If you’re debugging locally,
then use 127.0.0.1.

If you are lazy,
then use “” or 0.0.0.0

If you want more security,
then consider using a specific IP,
of an interface on your machine.
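A quick way to try these choices.
This is a sketch, assuming port 0 (OS picks a free port) and using 192.0.2.1, a documentation-only address, as a stand-in for "an IP that no local interface has".

```python
import socket

def bound_address(host):
    """Bind a TCP socket to (host, any free port); return the IP the OS reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[0]

all_interfaces = bound_address("")        # same as 0.0.0.0 for IPv4
loopback = bound_address("127.0.0.1")     # reachable from this machine only
print(all_interfaces, loopback)

# 192.0.2.1 (TEST-NET-1) is assumed NOT to be assigned to any local interface,
# so on a typical machine this bind is refused.
try:
    bound_address("192.0.2.1")
except OSError as err:
    print("cannot bind a non-local address:", err)
```

Replacing 192.0.2.1 with the address of one of your own interfaces should make the last bind succeed.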

1.7.3.2 POSIX sockets

Below, we illustrate state diagrams for UDP and TCP sockets.
These are standard POSIX sockets,
also known as BSD or Berkeley sockets.
https://en.wikipedia.org/wiki/Berkeley_sockets
Many languages expose BSD-style sockets similar to those in the C language.
Python’s sockets also follow the API below.

Discussion question:
What is the value of having a POSIX standard?
What is the value of specifying the socket API itself as part of POSIX?
https://en.wikipedia.org/wiki/POSIX

1.7.3.2.1 UDP

02-Application/UDP-socket-programming.png
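The UDP call sequence, as a minimal sketch with both ends in one process.
Assumptions: loopback only, an OS-chosen port, and a 2-second timeout so that a lost datagram cannot hang the demo.

```python
import socket

# "Server": create a datagram socket and bind it; no listen() or accept().
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))          # port 0: the OS picks a free port
server.settimeout(2)
server_addr = server.getsockname()

# "Client": no connect() needed; each sendto() names its destination.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2)
client.sendto(b"ping", server_addr)

# The server learns the client's address from recvfrom() and replies to it.
data, client_addr = server.recvfrom(2048)
server.sendto(data.upper(), client_addr)

reply, _ = client.recvfrom(2048)
print(reply)

client.close()
server.close()
```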

1.7.3.2.2 TCP

The overview
02-Application/sockets-ethernet-interface.webp

The states:
02-Application/sockets-tcp-flow.webp

02-Application/socket_Berkeley_SOCKET.jpg

To think ahead to what we’re covering next,
TCP’s actual internal FSM is much more detailed than this!
The images above show just the high-level API.
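The same high-level call order, written out as a sketch.
Both ends run in one process on loopback with an OS-chosen port (assumptions for a self-contained demo); it works without threads because the kernel completes the handshake while the connection waits in the listen queue.

```python
import socket

# Server side: socket() -> bind() -> listen()
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

# Client side: socket() -> connect()
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())

# Server side: accept() returns a new socket dedicated to this client.
conn, client_addr = listener.accept()

# send() / recv() in both directions
client.sendall(b"request")
request = conn.recv(1024)
conn.sendall(b"response to " + request)
response = client.recv(1024)
print(response)

# close() on every socket
conn.close()
client.close()
listener.close()
```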

1.8 FTP

File Transfer Protocol

https://en.wikipedia.org/wiki/File_Transfer_Protocol
https://tools.ietf.org/html/rfc2428

Transfer file to/from remote host.
Client/Server model:
- Client: side that initiates transfer (either to/from remote)
- Server: remote host
ftp: RFC 959
ftp server: port 21

FTP client contacts FTP server at port 21, using TCP.
Client authorized over control connection.
Client browses remote directory, sends commands over control connection.
When server receives file transfer command, server opens 2nd TCP data connection (for file) to client.
After transferring one file, server closes data connection.
Server opens another TCP data connection to transfer another file.
Control connection: “out of band”.
FTP server maintains “state”: current directory, earlier authentication.

FTP control and data connections
FTP uses two parallel TCP connections to transfer a file, a control connection and a data connection.
The control connection is used for sending control information between the two hosts,
information such as user identification, password, commands to change remote directory,
and commands to “put” and “get” files.
The data connection is used to actually send a file.
FTP is said to send its control information out-of-band.
In contrast, HTTP sends request and response header lines into the same TCP connection that carries the transferred file itself,
which is called in-band.

FTP sequence
When a user starts an FTP session with a remote host,
the client side of FTP (user) first initiates a control TCP connection with the server side (remote host) on server port number 21.
Client side of FTP sends the user identification and password over this control connection.
Client side of FTP also sends, over the control connection, commands to change the remote directory.
When the server side receives a command for a file transfer over the control connection (either to, or from, the remote host),
the server side initiates a TCP data connection to the client side.
FTP sends exactly one file over the data connection and then closes the data connection.
If, during the same session, the user wants to transfer another file,
then FTP opens another data connection.
Control connection remains open throughout the duration of the user session,
but a new data connection is created for each file transferred within a session
(data connections are non-persistent).

FTP requests
Commands, from client to server, and replies, from server to client, are sent across the control connection in 7-bit ASCII format.
In order to delineate successive commands, a carriage return and line feed end each command.
Each command consists of four uppercase ASCII characters, some with optional arguments:

USER username: Used to send the user identification to the server.

PASS password: Used to send the user password to the server.

LIST: Used to ask the server to send back a list of all the files in the current remote directory. The list of files is sent over a (new and non-persistent) data connection rather than the control TCP connection.

RETR filename: Used to retrieve (that is, get) a file from the current directory of the remote host. This command causes the remote host to initiate a data connection and to send the requested file over the data connection.

STOR filename: Used to store (that is, put) a file into the current directory of the remote host.

FTP replies
Some typical replies, along with their possible messages, are as follows:

331 Username OK, password required
125 Data connection already open; transfer starting
425 Can’t open data connection
452 Error writing file
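To make the wire format concrete, here is a small sketch of building a command line and splitting a reply line.
The helper names and the username "alice" are hypothetical, and only single-line replies are handled.

```python
CRLF = "\r\n"

def format_command(verb, argument=None):
    """Build one control-connection command, terminated by CRLF."""
    line = verb if argument is None else f"{verb} {argument}"
    return (line + CRLF).encode("ascii")

def parse_reply(line):
    """Split a single-line reply into (numeric code, message text)."""
    text = line.decode("ascii").rstrip(CRLF)
    code, _, message = text.partition(" ")
    return int(code), message

user_cmd = format_command("USER", "alice")   # "alice" is a made-up username
code, message = parse_reply(b"331 Username OK, password required\r\n")
print(user_cmd)
print(code, message)
```

For real use, Python's standard library ships ftplib, which takes care of both connections.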

Demonstrate:
[ ] Find an open ftp site, watch connection with wireshark
With sftp, do we see any application layer protocol details with Wireshark?

1.9 E-mail

https://www.computer-networking.info/1st/html/application/email.html
https://www.computer-networking.info/2nd/html/protocols/email.html

https://tools.ietf.org/html/rfc5321 (SMTP)
https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol

https://tools.ietf.org/html/rfc1939 (POP3)
https://en.wikipedia.org/wiki/Post_Office_Protocol

https://tools.ietf.org/html/rfc3501 (IMAP)
https://en.wikipedia.org/wiki/Internet_Message_Access_Protocol

Three major components:
1. user agents
2. mail servers
3. simple mail transfer protocol: SMTP

User Agent
a.k.a. “mail reader”
For composing, editing, reading mail messages.
e.g., Thunderbird, K9, Outlook, Kmail, iPhone mail client, etc.
Outgoing and incoming messages can be stored on server.

Mail servers:
Mailbox contains incoming messages for user.
Message queue of outgoing (to be sent) mail messages.
SMTP protocol between mail servers to send email messages, with each entity:
client: sending mail server
server: receiving mail server

1.9.1 Email protocol example

Observe SMTP with Wireshark (does any of this show in Wireshark?)

ncat -C smtp.zoho.com 587

Does any of the application layer information show in wireshark here?
ncat --ssl -C smtp.zoho.com 465
HELO web.site, MAIL FROM, RCPT TO, DATA, QUIT

Observe POP

ncat --ssl -C pop.zoho.com 995
user bob
pass password
list

1.9.2 Overview: SMTP, mail servers, mail user agents

User-agent is local, but also remote.
User-agent used to be on remote machine.
Then, real mail user-agents were used.
Then user-agent moved back onto remote machines.

Mail server is a messy multi-part aggregate of things in software.

How many people still use a real, local, MUA?

1.9.3 SMTP

Electronic Mail: SMTP
[RFC 5321, which obsoletes RFC 2821]

Uses TCP to reliably transfer email message from client to server, port 25
Direct transfer: sending server to receiving server
Three phases of transfer
- Handshaking (greeting)
- Transfer of messages
- Closure
Command/response interaction (like HTTP)
- Commands: ASCII text
- Response: status code and phrase
Messages must be in 7-bit ASCII

Alice sends a message to Bob
02-Application/app_layer16.png

Basic process

Alice invokes her user agent for e-mail, provides Bob’s e-mail address (for example, bob@someschool.edu), composes a message, and instructs the user agent to send the message.
Alice’s user agent sends the message to her mail server, where it is placed in a message queue.
The client side of SMTP, running on Alice’s mail server, sees the message in the message queue. It opens a TCP connection to an SMTP server, running on Bob’s mail server.
After some initial SMTP handshaking, the SMTP client sends Alice’s message into the TCP connection.
At Bob’s mail server, the server side of SMTP receives the message. Bob’s mail server then places the message in Bob’s mailbox.
Bob invokes his user agent to read the message at his convenience.

Example SMTP transcript
Hostname of the client is crepes.fr
Hostname of the server is server.edu

S: 220 server.edu
C: EHLO crepes.fr // a nicer HELO
S: 250 Hello crepes.fr, pleased to meet you
C: MAIL FROM: <alice@crepes.fr>
S: 250 alice@crepes.fr ... Sender ok
C: RCPT TO: <bob@server.edu>
S: 250 bob@server.edu ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 server.edu closing connection
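The client half of this transcript can be generated mechanically.
This is a sketch with a hypothetical helper name; it mirrors the transcript's spelling of the commands, and a real client would also wait for each server reply before sending the next line.

```python
def smtp_client_lines(client_host, sender, recipient, body_lines):
    """Return the lines a client sends, in order, each terminated by CRLF."""
    lines = [
        f"EHLO {client_host}",
        f"MAIL FROM: <{sender}>",
        f"RCPT TO: <{recipient}>",
        "DATA",
        *body_lines,
        ".",            # a lone "." on its own line ends the message
        "QUIT",
    ]
    return [line + "\r\n" for line in lines]

lines = smtp_client_lines(
    "crepes.fr",
    "alice@crepes.fr",
    "bob@server.edu",
    ["Do you like ketchup?", "How about pickles?"],
)
for line in lines:
    print(repr(line))
```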

Another SMTP example
02-Application/smtp.png

base64 encoding is required for username and password:
https://en.wikipedia.org/wiki/Base64

c: AUTH LOGIN
s: 334 VXNlcm5hbWU6
c: yourusernameinb64encoding
s: 334 UGFzc3dvcmQ6
c: yourpasswordinb64encoding

To get base64 encoding of a string:

# encode in bash
$ echo -n 'string' | base64

# decode in bash
$ echo -n 'c3RyaW5n' | base64 -d

# In python:
>>> import base64
>>> base64.b64encode('string'.encode())
>>> base64.b64decode('c3RyaW5n')
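The 334 prompts in an AUTH LOGIN exchange are themselves base64 text, which you can check.
A small sketch; the username "alice" is made up.

```python
import base64

# The server's 334 prompt decodes to readable text.
prompt = base64.b64decode("VXNlcm5hbWU6").decode()
print(prompt)

# Encode a reply; "alice" is a made-up username for illustration.
username_reply = base64.b64encode("alice".encode()).decode()
print(username_reply)

# Decoding gets the original back: base64 is an encoding, not encryption.
round_trip = base64.b64decode(username_reply).decode()
print(round_trip)
```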

then, you can proceed sending:

C: MAIL FROM: <alice@crepes.fr>
S: 250 alice@crepes.fr ... Sender ok
C: RCPT TO: <bob@server.edu>
S: 250 bob@server.edu ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 server.edu closing connection

1.9.3.2 Notes

SMTP uses persistent connections
SMTP requires message (header and body) to be in 7-bit ASCII
SMTP server uses CRLF.CRLF to determine end of message

Comparison with HTTP:

HTTP: pull
SMTP: push
both have ASCII command/response interaction, status codes
HTTP: each object encapsulated in its own response message
SMTP: multiple objects sent in multipart message

1.9.4 Message format

SMTP: protocol for exchanging email messages
RFC 822 (since superseded by RFC 5322): standard for text message format:
- header lines, e.g.,
  - To:
  - From:
  - Subject:
  - different from SMTP MAIL FROM, RCPT TO: commands!
- Body: the “message”
  - ASCII characters only

Message header

Header containing peripheral information that precedes the body of the message itself.
The header lines and the body of the message are separated by a blank line (CRLF).
RFC 5322 specifies the exact format for mail header lines as well as their semantic interpretations.
As with HTTP, each header line contains readable text, consisting of a keyword followed by a colon followed by a value.
Some of the keywords are required and others are optional.
Per RFC 5322, every header must have a From: header line and a Date: header line; in practice a header nearly always includes a To: header line too, and may include a Subject: header line as well as other optional header lines.

From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Searching for the meaning of life.
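A sketch of how a receiver could split such a message into header lines and body.
The body text is borrowed from the SMTP transcript, and the parsing is deliberately naive (no folded header lines); the standard library's email package does this properly.

```python
raw = (
    "From: alice@crepes.fr\r\n"
    "To: bob@hamburger.edu\r\n"
    "Subject: Searching for the meaning of life.\r\n"
    "\r\n"
    "Do you like ketchup?\r\n"
)

# The first blank line (CRLF CRLF) separates the header from the body.
header_text, _, body = raw.partition("\r\n\r\n")

# Each header line is "Keyword: value".
headers = {}
for line in header_text.split("\r\n"):
    keyword, _, value = line.partition(": ")
    headers[keyword] = value

print(headers)
print(repr(body))
```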

Show: Open an email in Mutt/raw to illustrate headers, MIME, multipart

1.9.5 Access protocols

Email protocols and direction of communication
When Alice sends Bob an email, how does Bob, running a user agent on his local PC, obtain his messages, which are sitting on a mail server at Bob’s mail provider?
02-Application/app_layer17.png
* Post Office Protocol—Version 3 (POP3)
* Internet Message Access Protocol (IMAP)
* HTTP

SMTP: delivery/storage to receiver’s server
mail access protocol: retrieval from server
- POP: Post Office Protocol [RFC 1939]: authorization, download
- IMAP: Internet Message Access Protocol [RFC 3501]: more features, including manipulation of stored messages on server
- HTTP: mailbox.org, gmail, Hotmail, Yahoo! Mail, etc.
  - Not really HTTP. You just access the user-agent via HTTP.

1.9.5.1 POP3

In a POP3 transaction, the user agent issues commands, and the server responds to each command with a reply.
The authorization phase has two principal commands:
- user
- pass

C: client
S: server

authorization phase
- client commands:
  - user: declare username
  - pass: password
- server responses
  - +OK
  - -ERR

ncat mailServer 110
S: +OK POP3 server ready
C: user bob
S: +OK
C: pass hungry
S: +OK user successfully logged on

transaction phase client:
- list: list message numbers
- retr: retrieve message by number
- dele: delete
- quit

C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 1
C: retr 2
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 2
C: quit
S: +OK POP3 server signing off
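The list reply in this transcript is easy to parse by hand.
A sketch with a hypothetical helper name, assuming the lines have already been read from the socket and stripped of CRLF.

```python
def parse_list_response(lines):
    """Turn the body of a list reply into {message number: size in bytes}."""
    sizes = {}
    for line in lines:
        if line == ".":          # a lone "." terminates the listing
            break
        number, size = line.split()
        sizes[int(number)] = int(size)
    return sizes

sizes = parse_list_response(["1 498", "2 912", "."])
print(sizes)
```

Python's standard library also includes poplib, which wraps these commands.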

Another POP3 example
02-Application/pop3.png

more about POP3
- previous example uses POP3 “download and delete” mode
  - Bob cannot re-read e-mail if he changes client
- POP3 “download-and-keep”: copies of messages on different clients
- POP3 is stateless across sessions

1.9.5.2 IMAP

keeps all messages in one place: at server
allows user to organize messages in folders
keeps user state across sessions:
names of folders and mappings between message IDs and folder name
MUCH more complex
Less secure

++++++++++++++ Cahoot-02-6

1.10 Domain name resolution (DNS)

https://en.wikipedia.org/wiki/Name_server
https://en.wikipedia.org/wiki/Domain_Name_System
https://www.computer-networking.info/1st/html/application/dns.html
https://www.computer-networking.info/2nd/html/protocols/dns.html
http://intronetworks.cs.luc.edu/current2/uhtml/ipv4companions.html#dns
https://en.wikipedia.org/wiki/Domain_Name_System#RFC_documents
https://media.pearsoncmg.com/aw/ecs_kurose_compnetwork_7/cw/content/interactiveanimations/recursive-iterative-queries-in-dns/index.html (recursive query demo)
https://www.homenethowto.com/basics/dns-linking-names-with-ip-addresses/

Discussion question:
At first guess, would you think the internet has a kill-switch, like it might in a Hollywood movie?
If it did, what might the consequences be?
On businesses?
On people?
In the USA?
In China?
In Russia?
In Kazakhstan?
etc.

“The Domain Name Server (DNS) is the Achilles heel of the Web.
The important thing is that it’s managed responsibly.”
-Tim Berners-Lee

1.10.1 Basic idea

People: many identifiers:

SSN, name, passport #, etc.,

Internet hosts, routers:

IP address (32 bit for IPv4) - used for addressing datagrams
Easy to remember “name”, e.g., https://www.yahoo.com - used by humans

The big questions:

How to securely and fairly map between IP address and name, and vice versa?
- What happens if you want a name someone else has?
- What happens if someone wants the name you have?
- Is the entity with the name really the remote entity?
- Are security and fairness opposed?
- Are both opposed to policing the space, to censorship?

1.10.2 DNS: the problem

(double-meaning intended)

In the early days of the Internet, there were only a few hosts (mainly minicomputers) connected to the network.
The most popular applications were remote login and file transfer.
By 1983, there were already five hundred hosts attached to the Internet (whoa!).
Each of these hosts was identified by a unique IPv4 address.
Forcing human users to remember the IPv4 addresses of the remote hosts that they want to use was not user-friendly.
Human users find it easier to remember names.
Using names or aliases for addresses or operations is a common technique in computing.
It simplifies the development of applications and allows the developer to ignore the low level details.
For example, by using a programming language instead of writing machine code, a developer can write software without knowing whether the variables that it uses are stored in memory or inside registers.
One identifier for a networked computer host is its hostname.
Hostnames can operate on a LAN, or the WAN (internet).
Hostnames, such as cnn.com, yahoo.com, gaia.cs.umass.edu, and cis.poly.edu, are mnemonic and are therefore appreciated by humans.
However, hostnames provide little, if any, information about the location within the Internet of the host.
Hosts are also identified by so-called IP addresses, like : 121.7.106.83
An IP address consists of four bytes and has a rigid hierarchical structure.
Each period separates one of the bytes expressed in decimal notation from 0 to 255.
An IP address is hierarchical; as we scan the address from left to right, we obtain more and more specific information about where the host is located in the Internet (that is, within which network, in the network of networks).

1.10.3 Basics of DNS: host aliasing

Web browser example:

User machine runs the client side of the DNS application.
A web browser extracts the hostname, www.someschool.edu, from a URL entered by the user, and passes the hostname to the client side of the DNS application.
The local DNS client sends a query containing the hostname to a somewhat-remote DNS server (or a chain of such servers).
The DNS client eventually receives a reply, which includes the IP address for the hostname.
Once the web browser receives the IP address from DNS, it can initiate a TCP connection to the HTTP server process located at port 80 at that IP address.

Domain Name System:

is a distributed database implemented in a hierarchy of many name servers
application-layer protocol:
- hosts, name servers communicate to resolve names (address/name translation)
- clients request name/address bindings
This is a core Internet function.
- an example of keeping the complexity at network’s “edge”, where it can be maintained.

1.10.4 DNS: services, structure

DNS services
- hostname to IP address translation
- host aliasing
- canonical, alias names
- mail server aliasing
- load distribution
- replicated Web servers: many IP addresses correspond to one name

1.10.5 History of DNS

Hosts.txt on ARPANET
- The hosts.txt file is not maintained anymore. A historical snapshot retrieved on April 15th, 1984 is available from:
- 02-Application/hosts.txt (my copy of this snapshot).
- http://ftp.univie.ac.at/netinfo/netinfo/hosts.txt
- https://web.archive.org/web/20140130121347/ftp.univie.ac.at/netinfo/netinfo/hosts.txt
single point of failure
traffic volume
distant centralized database
maintenance
Does not scale that well!
Some remnants exist, show:
/etc/hosts

Discussion Question:
What are several reasons an entity might want to steal a network name?
Would you guess that all such purposes are bad?

1.10.6 Partial hierarchy of DNS servers

02-Application/app_layer18.png
DNS is just a pyramid scheme…

client wants IP for www.amazon.com; 1st approximation:

client queries public root server (directly or indirectly) to find .com DNS server
client queries .com DNS server to get amazon.com’s public-facing DNS server
client queries amazon.com’s public-facing DNS server to get IP address for www.amazon.com

1.10.6.1 Root DNS servers

https://en.wikipedia.org/wiki/Root_name_server
https://en.wikipedia.org/wiki/DNS_root_zone
02-Application/app_layer19.png

root servers are contacted by local name server that can not resolve name
root contacts authoritative name server if name mapping not known
- gets mapping
- returns mapping to local name server

1.10.6.2 Top-level domain (TLD) servers

https://en.wikipedia.org/wiki/Top-level_domain

responsible for com, org, net, edu, aero, jobs, museum, and all top-level country domains, e.g.: uk, fr, ca, jp
Verisign maintains servers for .com TLD
Educause for .edu TLD

If https://mst.eu is available…
What fun things could we do with that…?

Ask: How can one “be” an EU resident on the internet?

1.10.6.3 Authoritative DNS servers

organization’s own local out-facing DNS server(s), providing authoritative hostname to IP mappings for organization’s named hosts
can be maintained by organization or service provider
Note: these out-facing servers are different than in-facing resolvers (more later)

1.10.6.4 Local DNS name server / Resolver

The client side OS-provided API for DNS is called a DNS resolver.
does not strictly belong to a hierarchical structure
each ISP (residential ISP, company, university) usually has one in-facing DNS resolver
- also called “default name server”
when a host on the local network sends a DNS query outward, query is sent to its local in-facing DNS server
- has local cache of recent name-to-address translation pairs (but may be out of date!)
- acts as proxy, forwards query into outer/upper hierarchy
A resolver is responsible for initiating and sequencing the queries that ultimately lead to a full resolution (translation) of the resource sought, e.g., translation of a domain name into an IP address.
DNS resolvers are classified by a variety of query methods, such as recursive, non-recursive, and iterative.
A resolution process may use a combination of these methods.

1.10.7 DNS query example

Show: some Wireshark observations of nslookup for various types of record (overview this time, more detail again lower).

Visit https://mst.edu with web browser
Make a manual query using command line tools

#!/bin/bash

nslookup mst.edu
nslookup www.mst.edu

dig mst.edu
dig www.mst.edu

whois mst.edu
whois icann.org

# What are the authoritative servers?
nslookup -type=NS mst.edu

# What do the authoritative servers say?
nslookup mst.edu ns-1.mst.edu

Use python to query the same

#!/usr/bin/python3

import socket

print(socket.gethostbyname('mst.edu'))

https://en.wikipedia.org/wiki/WHOIS
WHOIS going to tell us a Domain Name joke?

1.10.8 DNS server interaction

Standard iterated query
Some host at cis.poly.edu wants IP address for gaia.cs.umass.edu
02-Application/app_layer20.png
Iterated query:
Contacted server replies with name of server to contact.
“I don’t know this name,
but ask this other server who is responsible for knowing,
or is responsible for asking some server that is.”
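A toy simulation of the iterated pattern.
Everything in the table is made up for illustration (server names, the referral structure, and the documentation-range address); no real DNS traffic is involved.

```python
# Toy data: every name and address here is invented for illustration.
# A str value is a referral ("ask this server"); a tuple is a final answer.
SERVERS = {
    "root":          {"edu": "tld-edu"},
    "tld-edu":       {"umass.edu": "dns.umass.edu"},
    "dns.umass.edu": {"gaia.cs.umass.edu": ("A", "192.0.2.10")},
}

def ask(server, name):
    """Return the entry for the longest suffix of name that this server knows."""
    table = SERVERS[server]
    labels = name.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in table:
            return table[suffix]
    raise LookupError(f"{server} knows nothing about {name}")

def resolve_iteratively(name, start="root"):
    """The querier itself contacts each server in turn, following referrals."""
    server, path = start, []
    while True:
        path.append(server)
        answer = ask(server, name)
        if isinstance(answer, tuple):
            return answer[1], path
        server = answer          # "I don't know, but ask this other server"

address, path = resolve_iteratively("gaia.cs.umass.edu")
print(address)
print(" -> ".join(path))
```

In the recursive style, each contacted server would instead call the next one itself and pass the final answer back.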

Recursive queries
02-Application/app_layer21.png
Recursive query:
Puts burden of name resolution on contacted name server.
Heavy load at upper levels of hierarchy?

Q:
Can one’s own machine just do the query to root, TLD, and authoritative?
Why bother with the institutional resolver?

A:
Yes, if you set up your own DNS server (easy).
Just install bind, and configure it.
It’s just extra functionality not built into every client.

++++++++++++++ Cahoot-02-7

DNS caching, updating records
02-Application/DNS_Architecture.png

once (any) name server / resolver learns mapping, it caches the mapping for a while
- cache entries timeout (disappear) after some Time To Live (TTL)
- TLD server information is typically cached in local name servers
  - thus root name servers visited less
cached entries may be out-of-date (best effort name-to-address translation!)
- if a named host changes IP address, the change may not be known Internet-wide until all TTLs expire
update/notify mechanisms are a proposed IETF standard
- RFC 2136
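Caching with a TTL can be sketched in a few lines.
The class name, the injectable clock, and the example address are assumptions for a demo that does not need to wait for real time to pass.

```python
import time

class DnsCache:
    """Name-to-address cache whose entries expire after their TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}       # name -> (address, expiry time)

    def put(self, name, address, ttl):
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]   # timed out: forget the mapping
            return None
        return address

# A fake clock lets us "fast-forward" past the TTL.
now = [0.0]
cache = DnsCache(clock=lambda: now[0])
cache.put("mst.edu", "192.0.2.20", ttl=300)   # made-up address

fresh = cache.get("mst.edu")
now[0] = 301.0
stale = cache.get("mst.edu")
print(fresh, stale)
```

Until the entry expires, the cache keeps answering with the old address, even if the real mapping has changed.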

DNS is at the root of many internet problems…

1.10.9 DNS protocol, message format

https://en.wikipedia.org/wiki/Domain_Name_System#DNS_message_format

Query and reply messages, both with same overall message format

Message header

identification:
- 16 bit number for query,
- reply to query uses same number as query
flags:
- query or reply
- recursion desired
- recursion available
- reply is authoritative

The header of DNS messages is composed of 12 bytes and its structure is shown in the figure below.

QR
- Indicates if the message is a query (0) or a reply (1)
- 1 bit
OPCODE
- The type can be
  - QUERY (standard query, 0),
  - IQUERY (inverse query, 1), or
  - STATUS (server status request, 2)
- 4 bits
AA
- Authoritative Answer, in a response, indicates if the DNS server is authoritative for the queried hostname
- 1 bit
TC
- TrunCation, indicates that this message was truncated due to excessive length
- 1 bit
RD
- Recursion Desired, indicates if the client wants a recursive query
- 1 bit
RA
- Recursion Available, in a response, indicates if the replying DNS server supports recursion
- 1 bit
Z
- Zero, reserved for future use
- 3 bits
RCODE
- Response code, can be
  - NOERROR (0),
  - FORMERR (1, Format error),
  - SERVFAIL (2),
  - NXDOMAIN (3, Non-existent domain), etc.
- 4 bits

Together, these flag fields occupy 16 bits; with the 16-bit ID and four 16-bit counts, that gives the 12-byte header.

The ID (identifier) is a 16-bit random value chosen by the client.
When a client sends a question to a DNS server, it remembers the question and its identifier.
When a server returns an answer, it returns in the ID field, the identifier chosen by the client.
Thanks to this identifier, the client can match the received answer with the question that it sent.

The QR flag is set to 0 in DNS queries and 1 in DNS answers.

The Opcode is used to specify the type of query.
For instance, a standard query is when a client sends a name, and the server returns the corresponding data.
An update request is when the client sends a name, and new data, and the server then updates its database.

The AA bit is set, when the server that sent the response has authority for the domain name found in the question section.
In the original DNS deployments, two types of servers were considered : authoritative servers and non-authoritative servers.
The authoritative servers are managed by the system administrators responsible for a given domain.
They always store the most recent information about a domain.
Non-authoritative servers are servers or resolvers that store DNS information about external domains without being managed by the owners of a domain.
They may thus provide answers that are out of date.
From a security point of view, the authoritative bit is not an absolute indication about the validity of an answer.

Ask: Is this secure?
It uses UDP; what does this imply?

Where TC is set, the partial RRSet that would not completely fit may be left in the response.
When a DNS client receives a reply with TC set, it should ignore that response, and query again, using a mechanism, such as a TCP connection, that will permit larger replies.

The RD (recursion desired) bit is set by a client when it sends a query to a resolver.
Such a query is said to be recursive because the resolver will recurse through the DNS hierarchy to retrieve the answer on behalf of the client.
In the past, all resolvers were configured to perform recursive queries on behalf of any Internet host.
However, this exposes the resolvers to several security risks.
The simplest one is that the resolver could become overloaded by having too many recursive queries to process.
As of this writing, most resolvers only allow recursive queries from clients belonging to their company or network and discard all other recursive queries.

The RA bit indicates whether the server supports recursion.

The RCODE is used to distinguish between different types of errors. See RFC 1035 for additional details.

The last four fields indicate the size of the Question, Answer, Authority and Additional sections of the DNS message.
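The 12-byte header can be packed and unpacked with the struct module.
This is a sketch, assuming the RFC 1035 layout (a 16-bit ID, 16 bits of flags with 1-bit QR/AA/TC/RD/RA, 4-bit OPCODE and RCODE, 3-bit Z, then four 16-bit counts); the function names and the ID value are illustrative.

```python
import struct

def build_header(ident, qr=0, opcode=0, aa=0, tc=0, rd=1, ra=0, rcode=0,
                 qdcount=1, ancount=0, nscount=0, arcount=0):
    """Pack a DNS header: six 16-bit big-endian fields, 12 bytes total."""
    flags = ((qr << 15) | (opcode << 11) | (aa << 10) | (tc << 9)
             | (rd << 8) | (ra << 7) | rcode)      # the 3 Z bits stay 0
    return struct.pack("!HHHHHH", ident, flags,
                       qdcount, ancount, nscount, arcount)

def parse_header(data):
    """Unpack the first 12 bytes of a DNS message into named fields."""
    ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", data[:12])
    return {
        "id": ident,
        "qr": (flags >> 15) & 0x1,
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 0x1,
        "tc": (flags >> 9) & 0x1,
        "rd": (flags >> 8) & 0x1,
        "ra": (flags >> 7) & 0x1,
        "rcode": flags & 0xF,
        "counts": (qd, an, ns, ar),
    }

header = build_header(0x1234)      # a standard query with recursion desired
fields = parse_header(header)
print(header.hex())
print(fields)
```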

1.10.10 DNS records (RR)

The last four sections of the DNS message contain Resource Records (RR).
All RRs have the same top level format shown in the figure below.
DNS: distributed database storing resource records (RR)
RR format: (name, value, type, ttl)
NAME
- Name of the node to which this record pertains
- Variable bytes
TYPE
- Type of RR in numeric form (A, AAAA, MX, TXT, etc.) (e.g., 15 for MX RRs)
- 2 bytes
CLASS
- Class code
- 2 bytes
TTL
- Count of seconds that the RR stays valid (The maximum is 2^31 − 1, which is about 68 years…)
- 4 bytes
RDLENGTH
- Length of RDATA field (specified in octets)
- 2 bytes
RDATA
- Additional RR-specific data
- Variable bytes, as per RDLENGTH

Name indicates the name of the node to which this resource record pertains.

The two bytes Type field indicates the type of resource record.

The Class field was used to support the utilization of the DNS in other environments than the Internet.

The TTL field indicates the lifetime of the Resource Record in seconds.
This field is set by the server that returns an answer and indicates for how long a client or a resolver can store the Resource Record inside its cache.
A long TTL indicates a stable RR.
Some companies use short TTL values for mobile hosts and also for popular servers.
For example, a web hosting company that wants to spread the load over a pool of a hundred servers can configure its nameservers to return different answers to different clients.
If each answer has a small TTL, the clients will be forced to send DNS queries regularly.
The nameserver will reply to these queries by supplying the address of the least loaded server.

The RDLength field is the length of the RData field that contains the information of the type specified in the Type field.

Several types of DNS RR are used in practice.

The A type is used to encode the IPv4 address that corresponds to the specified name.
The AAAA type is used to encode the IPv6 address that corresponds to the specified name.
A NS record contains the name of the DNS server that is responsible for a given domain.

type=A

name is hostname
value is IP address

type=NS

name is domain (e.g., foo.com)
value is hostname of authoritative name server for this domain

type=CNAME

name is alias name for some “canonical” (the real) name
www.ibm.com is really servereast.backup2.ibm.com
value is canonical name
CNAME (canonical name) records are used to define aliases.
- For example, www.example.com could be a CNAME for pc12.example.com, which is the actual name of the server on which the web server for www.example.com runs.

type=MX

value is name of mailserver associated with name

There are more record types (summary of commonly used):
https://en.wikipedia.org/wiki/List_of_DNS_record_types

Address Mapping record (A Record) also known as a DNS host record, stores a hostname and its corresponding IPv4 address.
IP Version 6 Address record (AAAA Record) stores a hostname and its corresponding IPv6 address.
Canonical Name record (CNAME Record) can be used to alias a hostname to another hostname. When a DNS client requests a record that contains a CNAME, which points to another hostname, the DNS resolution process is repeated with the new hostname.
Mail exchanger record (MX Record) specifies an SMTP email server for the domain, used to route outgoing emails to an email server.
Name Server records (NS Record) specifies that a DNS Zone, such as “example.com” is delegated to a specific Authoritative Name Server, and provides the address of the name server.
Reverse-lookup Pointer records (PTR Record) allows a DNS resolver to provide an IP address and receive a hostname (reverse DNS lookup).
Certificate record (CERT Record) stores encryption certificates PKIX, SPKI, PGP, and so on.
Service Location (SRV Record) a service location record, like MX but for other communication protocols.
Text Record (TXT Record) typically carries machine-readable data such as opportunistic encryption, sender policy framework, DKIM, DMARC, etc.
Start of Authority (SOA Record) this record appears at the beginning of a DNS zone file, and indicates the Authoritative Name Server for the current DNS zone, contact details for the domain administrator, domain serial number, and information on how frequently DNS information for this zone should be refreshed.

02-Application/dns_iterative_resolve.png

+++++++++++++++++ Cahoot-02-8

1.10.11 Wireshark protocol details

Show: some Wireshark observations of nslookup for various types of record, this time in detail about the fields.

Visit https://mst.edu with web browser
Make a manual query using command line tools

#!/bin/bash

nslookup mst.edu
nslookup www.mst.edu

dig mst.edu
dig www.mst.edu

whois mst.edu
whois icann.org

# What are the authoritative servers?
nslookup -type=NS mst.edu

# What do the authoritative servers say?
nslookup mst.edu ns-1.mst.edu

Use Python to make the same query

#!/usr/bin/python3

import socket

print(socket.gethostbyname('mst.edu'))

1.10.12 Inserting records into DNS

example: new startup “Network Utopia”
register name networkutopia.com at low-level DNS registrar (e.g., Network Solutions)
- provide the registrar with names, IP addresses of authoritative name server (primary and secondary)
- low level registrar inserts two RRs into .com TLD server:
  - (networkutopia.com, dns1.networkutopia.com, NS)
  - (dns1.networkutopia.com, 212.212.212.1, A)
in the authoritative server, create a type A record for www.networkutopia.com;
- and a type MX record for networkutopia.com

02-Application/dns_lookup_v_registration.png

1.10.13 Reverse DNS

https://en.wikipedia.org/wiki/Reverse_DNS_lookup

The DNS is mainly used to find the IP address that corresponds to a given name.
However, it is sometimes useful to obtain the name that corresponds to an IP address.
This is done by using the PTR (pointer) RR.
The RData part of a PTR RR contains the name while the Name part of the RR contains the IP address encoded in the in-addr.arpa domain.
IPv4 addresses are encoded in the in-addr.arpa domain by reversing the four numbers that compose the dotted decimal representation of the address.
For example, consider IPv4 address 192.0.2.11.
The hostname associated to this address can be found by requesting the PTR RR that corresponds to 11.2.0.192.in-addr.arpa.
A similar solution is used to support IPv6 addresses, see RFC 3596.
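
The encoding can be checked with a short sketch using Python's standard `ipaddress` module. The helper names `ptr_name` and `ptr_name_manual` are made up for illustration; the IPv6 address is an arbitrary documentation-range example.

```python
import ipaddress

def ptr_name(ip):
    """Name under which the PTR RR for this address would be requested."""
    return ipaddress.ip_address(ip).reverse_pointer

def ptr_name_manual(ipv4):
    """Same result for IPv4, done by hand: reverse the four numbers."""
    return ".".join(reversed(ipv4.split("."))) + ".in-addr.arpa"

print(ptr_name("192.0.2.11"))
print(ptr_name_manual("192.0.2.11"))
print(ptr_name("2001:db8::1"))  # IPv6 uses reversed hex digits under ip6.arpa
```

With a working resolver, `socket.gethostbyaddr()` performs the actual reverse lookup.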

1.10.14 Attacks on DNS

https://en.wikipedia.org/wiki/Domain_Name_System#Security_issues
https://en.wikipedia.org/wiki/Domain_Name_System#Privacy_and_tracking_issues
02-Application/dns_abuse.jpg

1.10.14.1 Specific attacks

DDoS bandwidth-flooding attack
An attacker could attempt to send to each DNS root server a deluge of packets,
so many that the majority of legitimate DNS queries never get answered.
Bombard root servers with traffic.
This has not really been successful to date.

Defenses include:
Traffic filtering.
Local DNS servers cache IPs of TLD servers,
allowing root server bypass.

Bombarding TLD servers is potentially more dangerous.
02-Application/dns_flood.jpg

Man-in-the-middle attack
The attacker intercepts queries from hosts and returns bogus replies.
https://en.wikipedia.org/wiki/DNS_hijacking
(show in class)
02-Application/dns_spoof.jpg

DNS poisoning attack
The attacker sends bogus replies to a DNS server,
which is making outgoing requests itself,
tricking the server into accepting bogus records into its cache.
In short: send bogus replies to a DNS server, which caches them.
https://en.wikipedia.org/wiki/DNS_spoofing
(show in class)
02-Application/dns_poison.jpg

DNS reflection (amplification) attack
Another important DNS attack is not an attack on the DNS service, per se,
but instead exploits the DNS infrastructure,
to launch a DDoS attack against a targeted host.
Attacker sends DNS queries to many authoritative DNS servers,
with each query having the spoofed source address of the targeted host.
The DNS servers then send their replies directly to the targeted host.

Exploit DNS for DDoS:
send queries whose spoofed source address is the target’s IP.
This often requires amplification (small queries that trigger much larger replies).

DNS as exfiltration / infiltration / tunneling
One can sneak data through DNS requests or replies.
02-Application/dns_tunneling.jpg

+++++++++++++++++ Cahoot-02-9

1.10.15 DNS protections

Ways to avoid those attacks:

Just encrypt the connections to the server:
https://en.wikipedia.org/wiki/DNS_over_HTTPS
https://en.wikipedia.org/wiki/DNS_over_TLS

Tor/VPN/Proxy (privacy, but also some security).

Cryptographic signatures on DNS messages
https://en.wikipedia.org/wiki/Domain_Name_System_Security_Extensions
https://en.wikipedia.org/wiki/DNS-based_Authentication_of_Named_Entities
https://en.wikipedia.org/wiki/DNS_Certification_Authority_Authorization

1.10.16 Who controls DNS?

The Lord of the DNS

One DNS to rule them all,
One DNS to find them,
One DNS to bring them all,
and in the darkness bind them…
02-Application/sauron.webp
(i.e., a big boring Sauron committee…)
https://en.wikipedia.org/wiki/ICANN
https://en.wikipedia.org/wiki/ICANN#Criticism

https://en.wikipedia.org/wiki/Internet_Assigned_Numbers_Authority

https://en.wikipedia.org/wiki/OpenNIC
(permitted to be an open alternative)
02-Application/dns_overview.png

+++++++++++++++++ Cahoot-02-10

1.11 Alternative name systems

Fellowship of the DNS…

Fair, robust, distributed, decentralized, non-exploitable name resolution,
is a bit of a:
https://en.wikipedia.org/wiki/Catch-22_(logic)
and a real difficult problem to solve…

Discussion question:
What might a reliable distributed solution look like?
Might they come with their own exploits and problems?
Might a p2p system end up even more dictatorially problematic than DNS?
(e.g., Mr. Robot’s Evil Corp cryptocurrency)?

GNU name system
https://gnunet.org/gns
https://lsd.gnunet.org/lsd0001/
https://news.ycombinator.com/item?id=30154830
(discuss proposal to replace DNS!)
ICANN
https://icann.zoom.us/rec/play/znYwyZWPwrNraKqiZCLwOkHp_NITBj0QdhMpIrZPTrJumDRxIaecB8DHAygsgO-8PxQKkYx5ESGj6pBl.vZAWJHZoGeNyX9R4?startTime=1572978711000&_x_zm_rtaid=M4Wj53e3QXyaUK9nI6hiQg.1644387258044.8569edd15b9c2bafee5b5a283ad9fa90&_x_zm_rhtaid=108
of using GNUnet instead of DNS

I2P web-of-trust name system
https://geti2p.net/en/docs/naming
(web of trust based)

Crypto-currency-based
https://ens.domains/
https://docs.ens.domains/en/latest/introduction.html

https://www.namecoin.org/
https://en.wikipedia.org/wiki/Namecoin

1.12 Hosting a site

Do you need to buy a name to host a site on clearnet?
Do you need to buy a static IP to host a site on clearnet,
or does a dynamic IP suffice?
What about dynamic DNS?
https://en.wikipedia.org/wiki/Dynamic_DNS
Do you need to buy an actual machine? A virtual one?
Do you need to buy an HTTPS certificate?
Do you need to buy anything else?

What about overlay layers or darknets for simple free hosting?
https://en.wikipedia.org/wiki/I2P
https://en.wikipedia.org/wiki/Tor_(anonymity_network)
Can one circumvent DNS editing as a censorship technique?
Can one block sites at all with common darknets?

What is the easiest way to set up an independent site on your own hardware, or a VPS you rent?
Static websites:
https://onionshare.org/
http://lldan5gahapx5k7iafb3s4ikijc4ni7gx5iywdflkba5y2ezyg6sjgyd.onion/

Demo this one in class, with Wireshark

sudo dnf install tor
pip3 install --upgrade onionshare-cli --user
echo "cool publicly accessible website" >index.html
onionshare-cli --website --public index.html

You could even host a website like this on your phone,
in under 10 minutes:
https://onionshare.org/mobile/#download
Anywhere that had an internet connection,
you could leave your phone plugged in and host a website there…

https://medium.com/axon-technologies/hosting-anonymous-website-on-tor-network-3a82394d7a01
Interactive backend easily possible with tor process and Apache.

https://geti2p.net/en/faq#myI2P%20Site

1.13 P2P

Today:
Theoretical difficulties with P2P and their solutions (general).
An overview of protocols and services provided by P2P overlay applications (general).
High level protocol specification for an example P2P application (BitTorrent).

1.13.1 A variety of protocols

There are many P2P protocols.
BitTorrent is just one we will review today.

1.13.2 P2P vs. Client server

No always-on server exists.
Arbitrary end systems directly communicate.
Peers are intermittently connected, and change IP addresses.

Examples:

File distribution (BitTorrent)
Streaming (KanKan)
VoIP (Skype - partial, funny story of history)

File distribution problem: Client server vs. P2P

Upload/download capacity is limited resource!

Question:
How much time to distribute file (size F),
from one server, to N peers?
02-Application/app_layer23.png

1.13.2.1 Client-server time to distribute file

Let’s first determine the distribution time for the client-server architecture,
which we denote by D_cs. In the client-server architecture, none of the peers aids in
distributing the file. We make the following observations:

The server must transmit one copy of the file to each of the N peers.
Thus, the server must transmit N * F bits.
Since the server’s upload rate is u_s,
the time to distribute the file must be at least (N * F) / u_s

Server transmission
Must sequentially send (upload) a number (N) of file (F) copies:

Server upload rate:
u_s

Time to send one copy:
F / u_s

\[ time = \frac{bits}{\frac{bits}{seconds}} \]

\[ time = bits * \frac{seconds}{bits} \]

Time to send N copies:
(N * F) / u_s

Let d_min denote the download rate of the peer with the lowest download rate,
that is, d_min = min{d₁, d₂, …, d_N}.
The peer with the lowest download rate
cannot obtain all F bits of the file in less than F / d_min seconds.
Thus the minimum distribution time is at least F / d_min.
That bound, however, will almost never be the actual distribution time,
as the server must distribute to many peers.

Client: each client must download file copy.

d_min = min client download rate
min client download time: F / d_min

Time to distribute F to N clients using client-server approach:
D_cs >= max{ (N * F) / u_s, F / d_min }
The first term increases linearly with N.

Question: how much time to distribute file (size F) from one server to N peers?
02-Application/app_layer23.png

1.13.2.2 P2P time to distribute file

At the beginning of the distribution, only the server has the file.
To get this file into the community of peers, the server must send each bit of the file at least once into its access link.
Thus, the minimum distribution time is at least F / u_s
Unlike the client-server scheme, a bit sent once by the server may not have to be sent by the server again, as the peers may redistribute the bit among themselves.

Server transmission
Must upload at least one copy.
Time to send one copy:
F / u_s

As with the client-server architecture, the peer with the lowest download rate,
cannot obtain all F bits of the file in less than F / d_min seconds.
Thus the minimum distribution time is at least F / d_min
Unlike with the client-server model, with p2p,
this could actually be (and often is) the server’s bandwidth contribution.

Client: each client must download file copy.
Min client download time: F / d_min

The total upload capacity of the system as a whole,
is equal to the upload rate of the server,
plus the upload rates of each of the individual peers, that is:
u_total = u_s + u₁ + … + u_N
The system must deliver (upload) F bits to each of the N peers,
thus delivering a total of N * F bits.
This cannot be done at a rate faster than u_total.
Thus, the minimum distribution time is also at least:
(N * F) / (u_s + u₁ + … + u_N).

Clients: in aggregate, the N peers must download N * F bits
Max upload rate (limiting max download rate) is u_s + sum(u_i)

Time to distribute F to N clients using P2P approach:
D_P2P >= max{ F / u_s, F / d_min, (N * F) / (u_s + sum(u_i)) }
The numerator of the last term increases linearly with N,
but so does the denominator,
since each peer provides service capacity.
1.13.2.3 Distribution time for P2P vs. Client-server

02-Application/app_layer24.png
Net client upload rate = u
F / u = 1 hour
u_s = 10u
d_min >= u_s

P2P vs Client server
For the P2P architecture the minimal distribution time is never greater (and usually much less),
compared to the distribution time of the client-server architecture.
With the parameters above, it also stays below a fixed duration (one hour), for any number of peers N!
Applications with the P2P architecture can be self-scaling.
This scalability is a direct consequence of peers being re-distributors,
as well as consumers of bits.
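
The two lower bounds can be compared numerically with a small sketch. It assumes the parameters listed above (F / u = 1 hour, u_s = 10u, and d_min = u_s as the boundary case of d_min >= u_s); the function names and the unit choice u = 1 are illustrative.

```python
def d_cs(n, f, u_s, d_min):
    """Lower bound on client-server distribution time."""
    return max(n * f / u_s, f / d_min)

def d_p2p(n, f, u_s, d_min, peer_uploads):
    """Lower bound on P2P distribution time."""
    return max(f / u_s, f / d_min, n * f / (u_s + sum(peer_uploads)))

U = 1.0        # each peer's upload rate u (arbitrary unit)
F = 1.0        # file size chosen so that F / u = 1 hour
U_S = 10 * U   # server upload rate
D_MIN = U_S    # assumption: slowest download rate equals u_s

for n in (5, 10, 20, 30):
    # columns: N, client-server hours, P2P hours
    print(n, d_cs(n, F, U_S, D_MIN), d_p2p(n, F, U_S, D_MIN, [U] * n))
```

With these numbers the client-server bound is N / 10 hours, while the P2P bound is N / (10 + N) hours, which approaches but never reaches one hour.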

1.13.3 BitTorrent overview

Standard protocol,
many clients (Vuze, BiglyBT, I2P-Snark, the official BitTorrent client, etc.),
and versions of tracker software (some server-based trackers).

File divided into 256 KB chunks (or other equal size).
Peers in torrent send/receive file chunks.
02-Application/app_layer25.png
Tracker:
Tracks peers participating in torrent (or DHT).
Runs their own choice of tracker software.
Used to be only a server-side operation, now also can be P2P!

Torrent:
Meta-data and group of peers exchanging chunks of a file.

Client:
Uploads and downloads files.
Runs their own client torrent software.

Process:
Alice arrives, chooses a torrent, and using the torrent meta-data,
obtains a list of peers from tracker server (or distributed tracker),
and finally begins exchanging file chunks with peers in torrent.

++++++++++++++++ Cahoot-02-11

1.13.3.1 Overview of process

Peer joining torrent:

New peer has no chunks, but will accumulate them over time from other peers.
Registers with tracker (server or distributed) to get list of peers who have the torrent of interest, connects to subset of peers (“neighbors”).

After joining:

While downloading, peer uploads chunks to other peers.
Peer may change peers, with whom it exchanges chunks.

Churn:

Peers may come and go.
Once peer has entire file, it may (selfishly) leave or (altruistically) remain in torrent.

Requesting chunks:

At any given time, different peers have different subsets of file chunks.
Periodically, Alice asks each peer for list of chunks that they have.
Alice requests missing chunks from peers, rarest first.
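
A minimal sketch of rarest-first ordering, assuming Alice already holds each neighbor's chunk list. The function name, peer names, and chunk sets are hypothetical; real clients add randomization and other refinements not shown here.

```python
from collections import Counter

def rarest_first(my_chunks, neighbor_chunks):
    """Return the chunk indices Alice is missing, rarest among neighbors first."""
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks)  # how many neighbors hold each chunk
    missing = [c for c in counts if c not in my_chunks]
    # Sort by number of holders; ties broken by chunk index.
    return sorted(missing, key=lambda c: (counts[c], c))

neighbors = {"bob": {0, 1, 2}, "carol": {1, 2, 3}, "dave": {2, 3}}
print(rarest_first({2}, neighbors))
```

Here chunk 0 is held by only one neighbor, so it is requested before chunks 1 and 3.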

Ask: Why rarest first?

Discussion question:
Why not just be a leech (download but not contribute)?
How might you design a protocol with incentives?
What might an incentive look like?
Should you build incentives into protocols?
Do people follow incentives?

1.13.3.2 BitTorrent participation incentive

How do we put a kink in the wires of those who don’t contribute enough,
slowing down their transfers,
to encourage every peer to reciprocate?

Sending chunks: tit-for-tat incentives
https://en.wikipedia.org/wiki/Tit_for_tat

Alice sends chunks to those four peers currently sending her chunks at highest rate, rewarding them with more data.
- Other peers are choked by Alice (do not receive chunks from her).
Re-evaluate top 4 every 10 secs
Every 30 secs: randomly select another peer and start sending it chunks
- “optimistically un-choke” this peer, in hope that the new peer reciprocates
- newly chosen peer may join top 4

Overview:
(1) Alice “optimistically un-chokes” a new participant, Bob, in hopes that Bob reciprocates
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers

All this results in higher upload rate, finding better trading partners, and getting the file faster!

Sharing is caring…

Due to risk or costs in internet speed or throughput,
individuals could potentially download, but not upload.

1.13.3.2.2 Solution 0: Participation incentives

An interesting read, game theory in software design and CompSci:
http://bittorrent.org/bittorrentecon.pdf

Alice gives priority to the neighbors that are currently supplying her data at the highest rate.
Alice continually measures the rate at which she receives bits and determines the four peers that are providing her bits at the highest rate.
She then reciprocates by sending chunks to these same four peers.
Every 10 seconds, she recalculates the rates and possibly modifies the set of four peers.
Every 30 seconds, she also picks one additional neighbor at random, say Bob, and sends it chunks.
Because Alice is sending data to Bob, she may become one of Bob’s top four uploaders, in which case Bob would start to send data to Alice.
If the rate at which Bob sends data to Alice is high enough, Bob could then, in turn, become one of Alice’s top four uploaders.
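
The selection step described above can be sketched as follows. This only models one round of choosing whom to un-choke; the 10-second and 30-second timers are left out, and the function name, peer names, and rates are invented for illustration.

```python
import random

def choose_unchoked(rates, rng, top_n=4):
    """rates maps peer -> measured rate at which that peer sends us data.

    Returns (top peers by rate, one randomly chosen optimistic un-choke).
    """
    top = sorted(rates, key=rates.get, reverse=True)[:top_n]
    others = [p for p in rates if p not in top]
    optimistic = rng.choice(others) if others else None
    return top, optimistic

# Hypothetical measurements (e.g., in KB/s); Bob has sent nothing yet.
rates = {"p1": 50, "p2": 10, "p3": 80, "p4": 30, "p5": 70, "bob": 0, "p6": 5}
top, optimistic = choose_unchoked(rates, random.Random(0))
print(top, optimistic)
```

Every peer outside `top` and `optimistic` stays choked for this round; a newcomer like Bob can only get started through the random slot.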

1.13.4 Distributed Hash Table (DHT)

General (not BitTorrent specific)

++++++++++++++++ Cahoot-02-12

Hash table
DHT paradigm
Circular DHT and overlay networks
Peer churn
DHT
- https://en.wikipedia.org/wiki/Distributed_hash_table
- https://en.wikipedia.org/wiki/Kademlia

Review: dictionaries, maps, and hash tables

Simple database with (key, value) pairs:
key: human name;
value: social security number
02-Application/pasted_image.png

key = hash(original key)
02-Application/pasted_image001.png
02-Application/hash_table.png

O(1) complexity regardless of size of data.
Can store large sparse key-space in smaller array with constant access time

Note: There are potentially two distributed databases (or merged into one) in some p2p networks:
1. Routing table for overlay network peers, who are defined by their addresses
2. Database of torrents: addresses/peers

1.13.4.1 Distributed database

It’s easy to keep a database on a server,
but how do we increase the censorship resistance and robustness?

1.13.4.1.1 Problem 1: how to keep a database everywhere?

Distribute (key, value) pairs over millions of peers
- pairs are evenly distributed over peers
Any peer can query database with a key
- database returns value for the key
- To resolve query, small number of messages exchanged among peers
Each peer only knows about a small number of other peers
Robust to peers coming and going (churn)
How to keep a database of pairings between IP addresses and torrents without a central server?
Keys could be content names (e.g., names of books and software), and the value could be the IP address at which the content is stored; in this case, an example key-value pair is (ComputerNetworkingEssentials.pdf, 128.17.123.38)
Building such a database is straightforward with a client-server architecture that stores all the (key, value) pairs in one central server.

DHT
02-Application/DHT.png

1.13.4.1.2 Solution 1: distributed hash table (DHT)

N users (peers)
Each user identifier is an integer in the range [0, 2ⁿ - 1], for some fixed n
Hash the key (author/book name) into a number in the same range (mod 2ⁿ)
The user that has the closest value after the hashed key stores the item
- rule: assign key-value pair to the peer that has the closest ID.
- convention: closest is the immediate successor of the key.
- e.g., ID space {0,1,2,3,…,63}
- suppose 8 peers: 1, 12, 13, 25, 32, 40, 48, 60
  - If key = 51, then assigned to peer 60
  - If key = 60, then assigned to peer 60
  - If key = 61, then assigned to peer 1
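
The successor convention from the example above can be sketched with a sorted list and a binary search. The function name `responsible_peer` is an assumption; the peer IDs and keys are the ones listed above.

```python
import bisect

def responsible_peer(key, peer_ids):
    """Return the first peer ID >= key, wrapping around to the smallest ID."""
    ids = sorted(peer_ids)
    i = bisect.bisect_left(ids, key)
    return ids[i % len(ids)]

peers = [1, 12, 13, 25, 32, 40, 48, 60]
for key in (51, 60, 61):
    print(key, "->", responsible_peer(key, peers))
```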

Problem

How to lookup which user is storing a particular hashed key?

Solution

Each peer is only aware of its immediate successor and predecessor.
Circular DHT (a)
- Only index forward neighbors
- About N/2 messages on average, i.e., O(N), to resolve a query when there are N peers

Storing indices of more neighbors increases messaging efficiency, and increases storage overhead
- Each peer keeps track of IP addresses of predecessor, successor, short cuts.
- A balance of connections: space versus time
- DHT can be designed so that both the number of neighbors per peer as well as the average number of messages per query is O(log N), where N is the number of peers.
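
To see where the N/2 figure for the plain circular DHT comes from, here is a rough simulation that counts how many times a query is forwarded to the next successor. It reuses the 8 example peer IDs from earlier and, for counting purposes only, computes the responsible peer with global knowledge that a real peer would not have. Shortcuts are not modelled.

```python
def lookup_hops(start, key, peer_ids):
    """Forward a query successor by successor; return (owner, forwards needed)."""
    ids = sorted(peer_ids)
    # Responsible peer: first ID >= key, wrapping to the smallest ID.
    owner = next((p for p in ids if p >= key), ids[0])
    i = ids.index(start)
    hops = 0
    while ids[i] != owner:
        i = (i + 1) % len(ids)  # pass the query to the immediate successor
        hops += 1
    return owner, hops

peers = [1, 12, 13, 25, 32, 40, 48, 60]
print(lookup_hops(12, 51, peers))
average = sum(lookup_hops(s, 51, peers)[1] for s in peers) / len(peers)
print(average)
```

Averaged over all starting peers, the count is (N - 1) / 2, which grows linearly with N.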

1.13.4.2 Peer churn

Peers come and go, and the network must adapt.
02-Application/app_layer26.png

1.13.4.2.1 Problem 2: peers turn over

Example, peer 5 abruptly leaves, or is disconnected

1.13.4.2.2 Solution 2: synchronization procedure

Handling peer churn:

peers may come and go (churn)
each peer knows address of its two successors
each peer periodically pings its two successors to check aliveness
if immediate successor leaves, choose next successor as new immediate successor

Example: peer 5 abruptly leaves:

peer 4 detects peer 5’s departure; makes 8 its immediate successor
4 asks 8 who its immediate successor is; makes 8’s immediate successor its second successor.
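
The repair step for peer 4 can be sketched as below. The full ring membership is an assumption (the notes only name peers 4, 5, and 8), and the "ping" is reduced to a set lookup.

```python
def successors(ring):
    """Map each peer to its (first, second) successor on the sorted ring."""
    ids = sorted(ring)
    n = len(ids)
    return {p: (ids[(i + 1) % n], ids[(i + 2) % n]) for i, p in enumerate(ids)}

def repair(table, peer, alive):
    """If peer's immediate successor is gone, promote the second successor."""
    first, second = table[peer]
    if first not in alive:
        first = second
        second = table[first][0]  # ask the new immediate successor for its successor
        table[peer] = (first, second)
    return table[peer]

ring = [1, 3, 4, 5, 8, 10, 12, 15]  # assumed ring membership
table = successors(ring)
alive = set(ring) - {5}             # peer 5 abruptly leaves
print(table[4], "->", repair(table, 4, alive))
```

Other peers that listed peer 5 as their second successor would need a similar update; that part is left out of this sketch.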

1.13.5 BitTorrent protocols

https://en.wikipedia.org/wiki/Bittorrent
https://wiki.wireshark.org/BitTorrent
http://bittorrent.org/beps/bep_0000.html

1.13.5.0.1 Main protocol and torrent files

https://www.bittorrent.org/beps/bep_0003.html (show in class)
https://en.wikipedia.org/wiki/Torrent_file (show in class)
https://en.wikipedia.org/wiki/Magnet_URI_scheme
Show a real torrent file, map to specifications.
For example, https://ftp.qubes-os.org/iso/Qubes-R4.1.0-x86_64.torrent

1.13.5.0.2 DHT server-less tracker protocol extension

https://en.wikipedia.org/wiki/BitTorrent#Distributed_trackers
https://www.bittorrent.org/beps/bep_0005.html
Creates a tracker-server that is much more resistant to being taken down, and is more reliable.
BitTorrent now can use a “distributed sloppy hash table” (DHT) for storing peer contact information for “trackerless” torrents.
In effect, each peer becomes a tracker. The protocol is based on Kademlia DHT and is implemented over UDP.
A “peer” is a client/server listening on a TCP port that implements the BitTorrent protocol.
A “node” is a client/server listening on a UDP port implementing the distributed hash table protocol.
The DHT is composed of nodes and stores the location of peers.
BitTorrent clients include a DHT node, which is used to contact other nodes in the DHT to get the location of peers to download from using the BitTorrent protocol.

1.13.5.0.3 Transport layer protocols used by BitTorrent

BitTorrent protocol: two main transport level choices

Option 1: BitTorrent started with using TCP as its transport protocol.

The well-known TCP ports for BitTorrent traffic are 6881-6889,
and 6969 for the tracker.
Show with torrent download (bt-dht)

Option 2: UDP-based Micro Transport Protocol, called uTP.

https://www.bittorrent.org/beps/bep_0029.html
The motivation for uTP is for BitTorrent clients to not disrupt internet connections, while still utilizing the unused bandwidth fully.
When using regular TCP connections, BitTorrent quickly fills up the send buffer, adding multiple seconds delay to all interactive traffic.
More detail on this when we get to the details of TCP buffers (next major topic is transport layer).
Show with torrent download (bt-utp header):

0       4       8               16              24              32
+-------+-------+---------------+---------------+---------------+
| type  | ver   | extension     | connection_id                 |
+-------+-------+---------------+---------------+---------------+
| timestamp_microseconds                                        |
+---------------+---------------+---------------+---------------+
| timestamp_difference_microseconds                             |
+---------------+---------------+---------------+---------------+
| wnd_size                                                      |
+---------------+---------------+---------------+---------------+
| seq_nr                        | ack_nr                        |
+---------------+---------------+---------------+---------------+
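
The layout in the diagram above can be unpacked with Python's `struct` module. This is a sketch that assumes network byte order (big endian) for all fields and `type` in the high 4 bits of the first byte, as the diagram suggests; the sample values are made up, and extension headers are not handled.

```python
import struct

# 1 + 1 + 2 + 4 + 4 + 4 + 2 + 2 = 20 bytes, matching the five 32-bit rows.
UTP_HEADER = struct.Struct("!BBHIIIHH")

def parse_utp_header(data):
    """Split the first 20 bytes of a uTP packet into named fields."""
    (type_ver, extension, connection_id, ts_us, ts_diff_us,
     wnd_size, seq_nr, ack_nr) = UTP_HEADER.unpack(data[:UTP_HEADER.size])
    return {
        "type": type_ver >> 4,    # high 4 bits
        "ver": type_ver & 0x0F,   # low 4 bits
        "extension": extension,
        "connection_id": connection_id,
        "timestamp_microseconds": ts_us,
        "timestamp_difference_microseconds": ts_diff_us,
        "wnd_size": wnd_size,
        "seq_nr": seq_nr,
        "ack_nr": ack_nr,
    }

# Made-up header: type 4, version 1, no extension.
sample = UTP_HEADER.pack((4 << 4) | 1, 0, 12345, 1000, 0, 65535, 1, 0)
print(parse_utp_header(sample))
```

Comparing the printed fields with what Wireshark shows for a bt-utp packet is a quick way to check the reading of the diagram.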

1.13.5.0.4 BT video streaming (P2P)

An extension
Similar protocol to BitTorrent, but prioritizes chunks you need for viewing
A number of BT clients implemented non-official protocol additions first
- https://www.tribler.org/
- https://www.vuze.com/ (also the first to cross-seed with I2P!)
Afterwards, official BitTorrent added the feature into the core protocol.
How does this compare to upcoming CDN video distribution?
Non-bittorrent hybrid P2P approaches (Xunlei KanKan) are still very efficient for network resources

1.13.5.1 BitTorrent Protocol

Show/demo: Wireshark downloading Linux ISO with transmission

1.13.6 P2P security

1.13.6.1 Problems:

DoS and DDoS attacks
- Centralized directory (like tracker-server)
- Query flooding
- DHT
MitM attacks
Worm/Malware propagation
Human/privacy/anonymity attacks
Rational attacks (i.e., leeching)
File poisoning (e.g., bad video distributed by media producer)
Sybil attack (assume many identities)
Eclipse attack (split network)
Network-level censoring/blocking of protocol (e.g., malicious ISPs, oppressive governments)

1.13.6.2 Solutions:

Encrypt traffic
- For example: https://en.wikipedia.org/wiki/BitTorrent_protocol_encryption
Improve core protocol
Randomization of peers
Anonymization
- https://en.wikipedia.org/wiki/Comparison_of_file_sharing_applications
- https://en.wikipedia.org/wiki/Anonymous_P2P
- https://geti2p.net/en/docs/applications/bittorrent (strong anonymization)
- https://www.tribler.org/ (newer, somewhat experimental)
- https://www.vuze.com/ now open source (strong via I2P / or mixed clearnet)
Pure P2P (no server)
Reputation management / web of trust
Distributed file store:
- IPFS: https://ipfs.io
- FreeNet: https://freenetproject.org/
- Some GNUnet services: https://www.gnunet.org/en/applications.html
- etc.,
Generalized overlay layers

Next: 03-Transport.html