1 02-Application


Previous: 01-Overview.html

1.1 Audio-recording

1.2 Opening thought

“Web users ultimately want to get at data quickly and easily.
They don’t care as much about attractive sites and pretty design.”

- Tim Berners-Lee

Is this assumption currently true?
Was it ever true?
Is it true for some people at least?

1.3 Reading

Overview, HTTP, email, FTP, DNS, P2P, Multimedia (CDN, P2P), Socket programming intro.

Reading:
https://www.computer-networking.info/1st/html/index.html (Part 2: The application layer)
https://www.binarytides.com/netcat-tutorial-for-beginners/ (instead of telnet!)
Kurose chapter 2

1.4 Overview

Principles of network applications
Web and HTTP
Electronic mail: SMTP, POP3, IMAP
DNS
P2P applications
Socket programming with UDP and TCP

1.5 Network applications

Classic text-based applications that became popular in the 1970s and 1980s:
text email, remote access to computers, file transfers, and newsgroups.
Killer application of the mid-1990s:
the World Wide Web, encompassing Web surfing, search, and electronic commerce.
Instant messaging and P2P file sharing, the two killer applications introduced at the end of the millennium.
Since 2000, voice-over-IP (VoIP), YouTube, Netflix, World of Warcraft, Facebook, and Twitter

End system communication at application layer
02-Application/app_layer00.png

Goal: write programs that:
run on (different) end systems,
communicate over network, e.g.,
web server software communicates with browser software

No need (or desire) to write software for network-core devices:
Network-core devices do not (and SHOULD not) run user applications!
Applications on end systems allows for rapid app development, propagation, and innovation.
This is changing somewhat, which could impede the development of new protocols.

Client-server versus peer-to-peer (P2P)
02-Application/app_layer01.png

1.5.1 Client-server architecture

02-Application/client-serrver.png

1.5.1.1 Server

Always-on host, called the server, which services requests from many other hosts, called clients.
Permanent addresses are either an IP address, or
overlay/hash address of some kind, in the case of IP-anonymizing networks
Web server services requests from browsers running on client hosts.
When a Web server receives a request for an object from a client host,
it responds by sending the requested object to the client host.
Now in data centers for scaling

1.5.1.2 Clients

Communicate with server.
May be intermittently connected.
May have dynamic IP addresses (or dynamically changing overlay addresses).
Do not usually communicate directly with each other (unless hybrid client-server/p2p with p2p).

1.5.2 P2P architecture

Minimal (or no) reliance on dedicated servers in data centers.
Direct communication between pairs of intermittently connected hosts, called peers.
The peers are not owned by the service provider,
but are instead desktops and laptops controlled by users,
with most of the peers residing in homes, universities, and offices.
Peers request service from other peers, provide service in return to other peers
Self-scalabile, since each peer adds service capacity to the system by distributing files to other peers.
Cost effective, since they normally don’t require significant server infrastructure and server bandwidth
(in contrast with clients-server designs with data centers).
Peers often exchange IP addresses (or overlay addresses,
e.g., onion, garlic routing and crypto addresses).
Complex management, implementation, debugging?

++++++++++++++ Cahoot-02-1

Discussion question:
If p2p works well, why has it not become the norm?
What are some other pros/cons of p2p architectures?

1.5.3 Process communication via sockets

This is a hint of the next layer, transport
02-Application/app_layer02.png

1.5.3.1 Processes

It’s not actually programs, but processes that communicate with each other.
A process is assigned by OS to a program that is running within an end system.
When processes are running on the same end system,
they can communicate with each other with inter-process communication (IPC).
Processes on two different end systems communicate with each other,
by exchanging messages across the computer network.
A sending process creates and sends messages into the network.
A receiving process receives these messages and possibly responds by sending messages back.
A process initiates the communication by initially contacting the other process at the beginning of the session.
The initiator labeled as the client.
The process that waits to be contacted to begin the session is nominally the server.

Client process: process that initiates communication.
Server process: process that waits to be contacted.

Aside: applications with P2P architectures have both client processes and server processes.

1.5.3.2 Sockets

Any message sent from one process to another must go through the underlying network.
A process sends messages into, and receives messages from, the network,
through an OS-defined software interface called a socket.
A socket is the interface between the application layer and the transport layer within a host OS.
It is also referred to as the Application Programming Interface (API) between the application and the network,
since the socket is the programming interface with which network applications are built.

1.5.3.3 Addresses

To receive messages, process must have identifier

Q: does the IP address of a host, on which a process runs, suffice for identifying the process?
A: no, many processes can be running on one host

In order for a process running on one host to send packets to a process running on another host,
the receiving process needs to have an address with both:
an address of the host OS itself (IP), and
an identifier that specifies the specific receiving process in the destination host OS (port).

Popular applications have been assigned specific port numbers, e.g.,
A Web server (using the HTTP protocol) is usually identified by port number 80.
A mail server process (using the SMTP protocol) is usually identified by port number 25.
An SSH server is usually identified on port number 22

For example, to send HTTP message to gaia.cs.umass.edu web server:

IP address: 128.119.245.12
port number: 80

Hosts are often identified by IP address:

IP address is a 32-bit quantity uniquely identifying the host, in ipv4.
Addresses are 128 bit for ipv6.
We ran out of ipv4 addresses…

The sending process must also identify the receiving process (more specifically, the receiving socket) running in the host.
This information is needed because in general a host could be running many network applications.
A destination port number serves this purpose.

++++++++++++++ Cahoot-02-2

1.5.4 Preview of transport services

Data integrity
Some apps (e.g., file transfer, web transactions) require 100% reliable data transfer.
Other apps (e.g., audio) can tolerate some loss.

Timing
Some apps (e.g., Internet telephony, interactive games) require low delay to be “effective”.
Some apps (e.g., file download) tolerate delay.

Throughput
Some apps (e.g., multimedia) require minimum amount of throughput to be “effective”.
Other apps (“elastic apps”) make use of whatever throughput they get.

Security
Encryption, data integrity, …

1.5.4.1 Multiple types of service:

Unreliable datagram (UDP)
02-Application/service.png

Byte-stream (TCP)
02-Application/stream.png

Service requirements for applications?
02-Application/app_layer03.png

Applications usually choose between TCP and UDP
TCP and UDP are transport layer.
Employed by most application layer programs.
Other Transport layer, or pseudo-transport layer protocols exist.
SCTP (stream control transmission protocol), SSU (I2P app), DCCP, RUDP, UDP-lite, etc.
An application designer could design their own transport layer protocol,
since Transport layer and up runs on end hosts, as opposed to network infrastructure.
Could build into the core/kernel of end operating systems and languages as new socket type.
Could also just design the features into the application layer,
rather than actually get a transport protocol built into the kernel of an OS.
UDP lets you build new things!

1.5.4.1.1 TCP

Connection-oriented service
TCP has the client and server exchange transport-layer control information with each other,
before the application-level messages begin to flow.
This so-called handshaking procedure alerts the client and server,
allowing them to prepare for an onslaught of packets.
After the handshaking phase,
a TCP connection is said to exist between the sockets of the two processes.
The connection is a full-duplex connection,
in that the two processes can both send messages to each other,
over the connection at the same time, bi-directionally
When the application finishes sending messages,
it must tear down the connection.

TCP has a Reliable data transfer service
The communicating processes can rely on TCP to deliver all data sent without error and in the proper order.

TCP also includes a congestion-control mechanism
The TCP congestion-control mechanism throttles a sending process (client or server),
when the network is congested between sender and receiver.

Summary
Reliable transport: between sending and receiving process.
Flow control: sender won’t overwhelm receiver.
Congestion control: throttle sender when network overloaded.
Does not provide: timing, minimum throughput guarantee, security.
Connection-oriented: setup required between client and server processes.

1.5.4.1.2 UDP

UDP is a no-frills, lightweight transport protocol, providing minimal services.
UDP is connectionless, so there is no handshaking before the two processes start to communicate.
UDP provides an unreliable data transfer service.
When a process sends a message into a UDP socket,
UDP provides no guarantee that the message will ever reach the receiving process.
Messages that do arrive at the receiving process may arrive out of order.
UDP does not include a congestion-control mechanism,
so the sending side of UDP can attempt to pump data into the layer below (the network layer) at any rate it pleases.

UDP service:
Provides: Unreliable data transfer between sending and receiving process
Does not provide: Reliability, flow control, congestion control, timing, throughput guarantee, security, or connection setup,

Discussion questions:
Why bother with UDP?
With TCP, why is there a UDP?

++++++++++++++ Cahoot-02-3

1.5.4.2 Encryption

https://en.wikipedia.org/wiki/Transport_Layer_Security

Neither base TCP nor UDP provide any encryption!

An Enhancement for TCP provides:
1. encryption,
2. data integrity, and
3. end-point authentication.

The great thing about a TLS joke,
is that you can tell if it’s not the original…

This security enhancement used to be called called Secure Sockets Layer (SSL).
Transport layer security (TLS) is just a newer version of SSL.
SSL was the name of now-defunct versions of what is now modern TLS.
TLS is not a third Internet transport protocol,
on the same level as TCP and UDP, but an enhancement of TCP, at the application layer.
Applications needs to include TLS code (existing libraries) in both the client and server sides of the application.
TLS has its own socket API that is similar to the traditional TCP socket API.
sending process passes cleartext data to the SSL socket;
TLS in the sending host then encrypts the data and passes the encrypted data to the TCP socket.
encrypted data travels over the Internet to the TCP socket in the receiving process.
receiving socket passes the encrypted data to SSL, which decrypts the data.
TLS passes the cleartext data through its SSL socket to the receiving process.
TLS socket API is like that of a TCP socket (just import and use).

Transport layer protocols used
02-Application/app_layer04.png

Tunneling:
Inner-most -> Outer-more… -> Outer-most
Application -> TLS -> TCP -> IP -> MAC -> Ethernet -> Physical

More detail here:
05-Security.html
../../Security/Content/12a-AppliedCryptoSystems.html

1.5.5 Application-layer protocols

An application-layer protocol defines how an application’s processes,
running on different end systems, pass messages to each other, for example:

The types of messages exchanged, for example, request messages and response messages.
E.g., request, response.

The syntax of the various message types, such as the fields in the message and how the fields are delineated.

The semantics of the fields.
meaning of the information in the fields.

Rules for determining when and how a process sends messages and responds to messages, and change state.

Open protocols:
Defined in RFCs.
Allows for interoperability.
e.g., HTTP, SMTP

Proprietary protocols:
e.g., Skype (used to be open, fun story)

1.6 Web and HTTP

https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol
https://tools.ietf.org/html/rfc7230 (HTTP, show this)
https://media.pearsoncmg.com/aw/ecs_kurose_compnetwork_7/cw/content/interactiveanimations/http-delay-estimation/index.html
https://www.computer-networking.info/1st/html/application/http.html
https://www.computer-networking.info/2nd/html/protocols/http.html

1.6.1 Web protocol example

Observe HTTP with Wireshark:

$ nc -C info.cern.ch 80
$ GET / HTTP/1.1
$ Host: info.cern.ch
$ ncat -C hackware.ru 80
$ GET / HTTP/1.0
$ Host: hackware.ru

Trace HTTP conversation in Wireshark
Observe each packet has headers from multiple layers

These must be typed exactly, or they will not work!
ncat -C option is for crlf: $ man ncat to read more

Encrypted option:

$ ncat -C --ssl hackware.ru 443
$ GET / HTTP/1.0
$ Host: hackware.ru

1.6.2 Web server and clients

HTTP: hypertext transfer protocol is the Web’s application layer protocol.

Client/Server model:
Client: Browser that requests, receives, (using HTTP protocol) and “displays” Web objects.
Server: Web server sends (using HTTP protocol) objects in response to requests.

02-Application/app_layer05.png
02-Application/httpreqresp.png

Web pages
A Web page (also called a document) consists of objects.
An object is simply a file such as an HTML file, a JPEG image, a Java applet (lol…),
or a video clip that is addressable by a single URL.
If a Web page contains HTML text and five JPEG images,
then the Web page has six objects: the base HTML file plus the five images.
The base HTML file references the other objects in the page with the objects’ URLs.

Each URL has two components:
the hostname of the server that houses the object and
the object’s path name.

For example, the URL:
http://www.someSchool.edu/someDepartment/picture.gif
has www.someSchool.edu for a hostname and
/someDepartment/picture.gif for a path name.

02-Application/webpage.png
http://www.w3.org/MarkUp/ defines the standard

HTTP/1 and HTTP/2 use TCP (not UDP).

The HTTP client first initiates a TCP connection with the server.
Once the connection is established, the browser and the server processes access TCP through their socket interfaces.
Server sends requested files to clients without storing any state information about the client, a stateless protocol.

Client initiates TCP connection (creates socket) to server, port 80 or 443.
server accepts TCP connection from client.
HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server).
TCP connection closed.

1.6.3 Persistence

HTTP is “stateless”.
Server maintains no information about past client requests as a part of the protocol itself.
It could do so in other ways, but that would not be part of HTTP.

Protocols that maintain “state” are complex!
Past history (state) must be maintained.
If server/client crashes, their views of “state” may be inconsistent, must be reconciled.

HTTP can choose either:
Each request/response pair sent over a separate TCP connection (non-persistent connections), or
all of the requests and their corresponding responses sent over the same TCP connection (persistent connections).

HTTP sequence
A base HTML file and 10 JPEG images, and that all 11 of these objects reside on the same server:
<http://www.someSchool.edu/someDepartment/home.index>

  1. HTTP client process initiates a TCP connection to the server https://www.someSchool.edu on port number 80,
    which is the default port number for HTTP.
    Associated with the TCP connection, there will be a socket at the client and a socket at the server.

  2. HTTP client sends an HTTP request message to the server via its socket.
    The request message includes the path name /someDepartment/home.index.

  3. HTTP server process receives the request message via its socket,
    retrieves the object /someDepartment/home.index from its storage (RAM or disk),
    encapsulates the object in an HTTP response message,
    and sends the response message to the client via its socket.

  4. HTTP server process tells TCP to close the TCP connection.
    But TCP doesn’t actually terminate the connection,
    until it knows for sure that the client has received the response message intact.

  5. HTTP client receives the response message.
    The TCP connection terminates.
    The message indicates that the encapsulated object is an HTML file.
    The client extracts the file from the response message, examines the HTML file,
    and finds references to the 10 JPEG objects.

  6. first four steps are then repeated for each of the referenced JPEG objects.

Time to fill a request
02-Application/app_layer06.png

Non-persistent HTTP
At most one object sent over TCP connection.
Connection then closed.
Downloading multiple objects required multiple connections.

Persistent HTTP
Multiple objects can be sent over single TCP connection between client, server

Disadvantages of non-persistent connections
First, a brand-new connection must be established and maintained for each requested object.
For each of these connections, TCP buffers must be allocated and TCP variables must be kept in both the client and server.
Each object suffers a delivery delay of two RTTs one RTT to establish the TCP connection and one RTT to request and receive an object.

RTT (definition): time for a small packet to travel from client to server and back

HTTP response time:
One RTT to initiate TCP connection,
one RTT for HTTP request and first few bytes of HTTP response to return,
plus the file transmission time.

Non-persistent HTTP response time:
2RTT + file transmission time.

Non-persistent HTTP issues:
Requires 2 RTTs per object
OS overhead for each TCP connection.
Browsers often open parallel TCP connections to fetch referenced objects.

Persistent connections
With persistent connections, the server leaves the TCP connection open after sending a response.
Subsequent requests and responses between the same client and server can be sent over the same connection.
Multiple Web pages residing on the same server can be sent from the server to the same client over a single persistent TCP connection.
Requests for objects can be made back-to-back, without waiting for replies to pending requests (pipelining).
Typically, the HTTP server closes a connection when it isn’t used for a certain time (a configurable timeout interval).

Persistent HTTP:
Server leaves connection open after sending response.
Subsequent HTTP messages between same client/server sent over open connection.
Client sends requests as soon as it encounters a referenced object.
As little as one RTT for all the referenced objects.

++++++++++++++ Cahoot-02-4

1.6.4 Message format

two types of HTTP messages:
1. request,
2. response

HTTP request message:
ASCII (human-readable format)

1.6.4.1 HTTP request message

GET /somedir/page.html HTTP/1.1
Host: www.mst.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: en

General request
02-Application/app_layer07.png
sp=space; cr=carriage return; lf=line feed

Example:

GET /index.html HTTP/1.1\r\n
Host: www-net.cs.umass.edu\r\n
User-Agent: Firefox/3.6.10\r\n
Accept: text/html,application/xhtml+xml\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
\r\n

1.6.4.2 HTTP Response Message

HTTP/1.1 200 OK
Connection: close
Date: Tue, 09 Aug 2011 15:44:04 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Tue, 09 Aug 2011 15:11:03 GMT
Content-Length: 6821
Content-Type: text/html
(data data data data data ...
The entity body is the
meat of the message,
it contains the requested object itself)

General reply
02-Application/app_layer08.png
sp=space; cr=carriage return; lf=line feed

Example:

HTTP/1.1 200 OK\r\n
Date: Sun, 26 Sep 2010 20:09:20 GMT\r\n
Server: Apache/2.0.52 (CentOS)\r\n
Last-Modified: Tue, 30 Oct 2007 17:00:02 GMT\r\n
ETag: "17dc6-a5c-bf716880"\r\n
Accept-Ranges: bytes\r\n
Content-Length: 2652\r\n
Keep-Alive: timeout=10, max=100\r\n
Connection: Keep-Alive\r\n
Content-Type: text/html; charset=ISO-8859-1\r\n
\r\n
data data data data data ...

Server responses
Status code appears in 1st line in server-to-client response message.

The best thing about 404 jokes is …
wait, damnit, it’s around here somewhere…

418: I’m a teapot

A “real” joke built into the protocol
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/418
https://en.wikipedia.org/wiki/Hyper_Text_Coffee_Pot_Control_Protocol
https://save418.com/

Old example: Open TCP connection, send GET request

telnet cis.poly.edu 80
GET /~ross/ HTTP/1.1
Host: cis.poly.edu

E.g.,
02-Application/Http_request_telnet.png

Note: ncat has generally replaced telnet, though they both still work

nc -C info.cern.ch 80
GET / HTTP/1.1
Host: info.cern.ch

1.6.4.3 Uploading form input

POST method:
Web page often includes form input.
Input is uploaded to server in entity body (i.e., message part of packet).

URL method:
Uses GET method.
Input is uploaded in URL field of request line:
www.somesite.com/animalsearch?monkeys&banana

We’ll demo this later!

1.6.5 Method types

HTTP/1.0:
GET
POST
HEAD (asks server to leave requested object out of response)

HTTP/1.1:
GET, POST, HEAD
PUT (uploads file in entity body to path specified in URL field)
DELETE (deletes file specified in the URL field)

Demonstrate:
More wireshark HTTP examples in detail
http://info.cern.ch/
via nc
via a browser that does not generate junk traffic:
epiphany or surf or qutebrowser
Record it in Wireshark
Identify HTTP headers, match them to fields

1.6.6 Cookies

Many Web sites use cookies.
Four components:
1) cookie header line of HTTP response message
2) cookie header line in next HTTP request message
3) cookie file kept on user’s host, managed by user’s browser
4) back-end database at Web site

Example:

Susan always access Internet from PC.
She visits specific e-commerce site for first time.
When initial HTTP requests arrives at site, site creates both a:
unique ID, and
entry in backend database for ID
02-Application/app_layer09.png

What cookies can be used for:
authorization,
shopping carts,
recommendations,
user session state (Web e-mail),
tracking.

How to keep “state”:
Protocol endpoints: maintain state at sender/receiver over multiple transactions.
Cookies: http messages carry state.

Cookies and privacy:
Cookies permit sites to learn a lot about you.
You may supply name and e-mail to sites.

1.6.7 Caching

Proxy server can cache web file data.
Goal: satisfy client request without involving origin server.

User sets browser: Web accesses via cache.
Browser sends all HTTP requests to cache.
If object in cache, then cache returns object.
Else cache requests object from origin server
then returns object to client.

02-Application/app_layer10.png

Bottleneck
02-Application/app_layer11.png

Caching helps bottleneck
02-Application/app_layer12.png

Demos
Briefly show (not link) $ Webserver.py
Run it, show visiting in browser <http://localhost:6789>

1.6.8 QUIC

Sometimes UDP is used simply because it allows new or experimental protocols to run entirely as user-space applications.
No kernel updates are required, as would be the case with TCP changes.
Google has created a protocol named QUIC (Quick UDP Internet Connections,
chromium.org/quic) in this category, rather specifically to support the HTTP protocol.
QUIC can in fact be viewed as a transport protocol specifically tailored to HTTPS:
HTTP plus TLS encryption.

Some interesting reading on QUIC (Google’s web protocol on top of UDP):
http://intronetworks.cs.luc.edu/current2/uhtml/udp.html#quic
https://en.wikipedia.org/wiki/QUIC
https://daniel.haxx.se/blog/2018/11/11/http-3/
https://blog.cloudflare.com/the-road-to-quic/
Both innovative, and breaks federated interoperability, which has pros and cons.

I received this HTTP 200 joke.
It was OK…

++++++++++++++ Cahoot-02-5

1.6.9 Demo first Wireshark lab on HTTP myself in class

Simple file
http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file1.html
GET, OK, etc.

Refreshing a cached page
http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file2.html
IF-MODIFIED-SINCE, refresh, re-sent?

Large file
http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file3.html
Initial HTTP GET, TCP segments, how many HTTP OK, when?

Multiple-parts
http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file4.html
Notice the image retrieval.
What entity is responsible for requesting the multiple objects in a page, when?

“Secure” web-page with login
http://gaia.cs.umass.edu/wireshark-labs/protected_pages/HTTP-wireshark-file5.html
username: wireshark-students
password: network
auth field of request

#!/usr/bin/python3

import base64
coded_string = "d2lyZXNoYXJrLXN0dWRlbnRz="
base64.b64decode(coded_string)
#!/bin/bash

echo -n d2lyZXNoYXJrLXN0dWRlbnRz= | base64 -d

Let’s think about http, privacy, and security in various scenarios:
https://www.eff.org/pages/tor-and-https

1.6.10 Reverse http proxy

https://en.wikipedia.org/wiki/Reverse_proxy
Reverse proxy allows multiple HTTP endpoints (via more than “Host:”),
at one IP/domain.
* [ ] expand on this maybe later.

1.6.11 CORS security in browser

https://en.wikipedia.org/wiki/Cross-origin_resource_sharing

1.7 Socket programming

Goal
Learn how to build client/server applications that communicate using sockets.

Socket:
A tunnel between application processes, in an end-to-end transport protocol

Two primary socket types for two transport services exist.
UDP is an unreliable, lightweight datagram service.
TCP is a reliable, heavier, byte-stream, connection oriented service.

1.7.1 Example

Application example we’ll put in code, in order:

Client inputs a line of characters (data) from the keyboard, and
sends the data to server.

Server receives the data,
converts the characters to uppercase, and
sends the modified data to client.

Client receives modified data, and
displays it as a printed line on the screen

1.7.1.1 UDP

UDP involves no persistent “connection” between a client and server.
No handshaking occurs before sending data.
A sender explicitly attaches a destination IP address and port number to each packet.
A receiver extracts the sender’s IP address and port number from each received packet.
Transmitted data may be lost.
Transmitted data may be received out-of-order.
UDP provides unreliable transfer of groups of bytes (“datagrams”) between client and server.

UDP socket code:
02-Application/socket_01_UDP_server.py
02-Application/socket_01_UDP_client.py

Demonstrate:
0. Run in background:
python3 socket_01_UDP_server.py

  1. Show Wireshark watching the client and server code:
    sudo wireshark &

  2. Connect with:

python3 socket_01_UDP_client.py
nc -uC 127.0.0.1 6789
man nc # ncat can send UDP packets too!

Show how nc or multiple python clients can block

02-Application/app_layer27.png

+++++++++++++++++++++++++++++++++ Cahoot-02-13

1.7.1.2 TCP

A server process must first be running.
The server must have created a TCP socket,
that welcomes a client’s contact.
A client contacts a server.
The client specifies an IP address and port number of a server process.
The client uses that address to create a TCP socket.
The client’s TCP socket establishes a connection to the server.
When contacted by a client on the welcoming socket,
the server’s TCP socket creates a secondary new socket,
for the server process to communicate with that particular client.
This allows server to talk with multiple clients.
Source port numbers distinguish different clients.
TCP provides a reliable, in-order, byte-stream transfer between a client and server.

TCP socket code:
Example 1:

02-Application/socket_02_TCP_server.py
02-Application/socket_02_TCP_client.py

Example 2:

02-Application/socket_02_TCP_server2.py
02-Application/socket_02_TCP_client2.py

Demonstrate:
0. Run a server:
python3 socket_02_TCP_server.py

  1. Show Wireshark watching the client and server code:
    sudo wireshark &

  2. Connect with

python socket_02_TCP_client.py
nc -C 127.0.0.1 6789
  1. show currently connected sockets with:
# new option
man ss
ss
# old, ss is better
man netstat
netstat -an
# another option
man lsof
lsof -i -n

02-Application/app_layer28.png
The term “port” is not the same idea or definition as the term “socket”.

Socket is an instance object dually created both:
within a requesting application, and
within an operating system for that requesting application.

Port is a designation dually configured both:
as field in the transport-layer headers, in actual packets, and
in the OS’s kernel networking core, and firewall configuration.
The OS’s kernel routes packets to the application.

Step by step:
02-Application/app_layer29.png

+++++++++++++++++++++++++++++++++ Cahoot-02-14

1.7.2 Concurrency in in python

https://realpython.com/python-concurrency/

1.7.2.1 Global interpreter lock

https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock
https://wiki.python.org/moin/GlobalInterpreterLock
https://realpython.com/python-gil/
Don’t like the GIL,
perhaps go with pypy:
https://www.pypy.org
https://realpython.com/pypy-faster-python/

You can use either:
multithreading
multiprocessing
asyncio

When should you use each?
multithreading to deal with simple blocking (no real speed up).
multiprocessing to run over multiple cores (speed up).
asyncio to deal more more complex or larger-scale needs (often blocking).

1.7.2.2 Multiprocessing

https://www.geeksforgeeks.org/multiprocessing-python-set-1/
https://www.geeksforgeeks.org/multiprocessing-python-set-2/

1.7.2.3 Multithreading

https://www.geeksforgeeks.org/multithreading-python-set-1/
https://www.geeksforgeeks.org/multithreading-in-python-set-2-synchronization/
https://realpython.com/intro-to-python-threading/

See my code now:
02-Application/thread_00_none.py
02-Application/thread_01_unrolled.py
02-Application/thread_02_fake.py
02-Application/thread_03_storage.py

Show multi-threaded examples now:
02-Application/socket_04_TCP_server_mt.py
02-Application/socket_04_TCP_client_mt.py

Now, nc does not block the server from other client’s:

python3 socket_04_TCP_server_mt.py &
nc -C 127.0.0.1 50002
python3 socket_04_TCP_server_mt.py

1.7.3 General socket functionality

https://docs.python.org/3/library/socket.html
https://realpython.com/python-sockets/

Let’s review some program-internal functions.

1.7.3.1 socket.socket.bind

>>> help(socket.socket.bind)
bind(...)
bind(address)
Bind the socket to a local address.
For IP sockets, the address is a pair (host, port);
the host must refer to the local host.
For raw packet sockets the address is a tuple
(ifname, proto [,pkttype [,hatype [,addr]]])

socket.socket.bind takes a tuple: (hostname or IP, port)
https://serverfault.com/questions/78048/whats-the-difference-between-ip-address-0-0-0-0-and-127-0-0-1
What are valid hostname or IP addresses to use?

The use of the term “local” above is ambiguous.
Q: What does it mean here, operationally?
A: That the IP address being bound is assigned to an interface managed by your operating system!
More to come on interfaces when we cover the network layer:
04-NetworkData.html

""
defaults to all traffic to the machine.
It is the same as 0.0.0.0 for IPv4.
It’s easier for IPv6.

0.0.0.0
which also listens to all traffic on the machine
(0.0.0.0 means various different things in different contexts).
https://www.rfc-editor.org/rfc/rfc1122#page-29 section 3.2.1.3
(a) { 0, 0 }
This host on this network.
MUST NOT be sent,
except as a source address as part of an initialization procedure,
by which the host learns its own IP address.
See also Section 3.3.6 for a non-standard use of {0,0}.
https://www.rfc-editor.org/rfc/rfc5735#section-3
0.0.0.0/8 - Addresses in this block refer to source hosts on “this” network.
Address 0.0.0.0/32 may be used as a source address for this host on this network;
other addresses within 0.0.0.0/8 may be used to refer to specified hosts on this network ([RFC1122], Section 3.2.1.3).
Despite the standard, 0.0.0.6 for example, won’t bind in python3.

<hostname>
https://docs.python.org/3/library/socket.html
If you use a hostname in the host portion of IPv4/v6 socket address,
the program may show a nondeterministic behavior,
as Python uses the first address returned from the DNS resolution.
The socket address will be resolved differently into an actual IPv4/v6 address,
depending on the results from DNS resolution and/or the host configuration.
For deterministic behavior use a numeric address in host portion.
On my Fedora machine, it resolves to 127.0.0.1.
Hostname is a shallow alias, implemented via checking: /etc/hosts

127.0.0.1 through 127.255.255.254 (CIDER notation: 127.0.0.0/8)
https://www.rfc-editor.org/rfc/rfc5735#section-3
127.0.0.0/8 - This block is assigned for use as the Internet host loopback address.
A datagram sent by a higher-level protocol to an address anywhere within this block loops back inside the host.
This is ordinarily implemented using only 127.0.0.1/32 for loopback.
As described in [RFC1122], Section 3.2.1.3,
addresses within the entire 127.0.0.0/8 block do not legitimately appear on any network anywhere.
Your local machine only.
You can use 127.0.0.4 (or whatever in the range),
but that socket will only be reachable on that IP.
Python’s sending socket defaults to 127.0.0.1 as the sending IP,
when sending to any localhost address.

A LAN-only IP address
10.0.0.0 - 10.255.255.255 (10.0.0.0/8 prefix)
172.16.0.0 - 172.31.255.255 (172.16.0.0/12 prefix)
192.168.0.0 - 192.168.255.255 (192.168.0.0/16 prefix)
https://datatracker.ietf.org/doc/html/rfc1918
These IP ranges are declared as LAN IPs,
as opposed to public, globally routable IPs,
or to localhost IPs, etc.
If you have an interface bound to an IP in this range,
then you could bind any of these.
If your interface in the OS is not bound to one,
then you can not bind the socket in python either.

A public, globally routable IP address
More-or-less anything not in the below list:
https://www.iana.org/assignments/iana-ipv4-special-registry/iana-ipv4-special-registry.xhtml
https://en.wikipedia.org/wiki/IPv4#Special-use_addresses
If you have an interface bound to an IP in this range,
then you could bind any of these.
If your interface in the OS is not bound to one,
then you can not bind the socket in python either.

Which should you choose?

If you’re debugging locally,
then use 127.0.0.1.

If you are lazy,
then use “” or 0.0.0.0

If you want more security,
then consider using a specific IP,
of an interface on your machine.

1.7.3.2 POSIX sockets

Below, we illustrate state diagrams for UDP and TCP sockets.
These are standard POSIX sockets,
also known as BSD or Berkeley sockets.
https://en.wikipedia.org/wiki/Berkeley_sockets
Many languages use similar BSD sockets to those in the C language.
Python’s also follow the below API.

Discussion question:
What is the value of having a POSIX standard?
What is the value of specifying the socket API itself as part of POSIX?
https://en.wikipedia.org/wiki/POSIX

1.7.3.2.1 UDP
02-Application/UDP-socket-programming.png
02-Application/UDP-socket1.png
02-Application/UDP-socket2.png
02-Application/UDP-socket3.png
02-Application/UDP-socket4.png
02-Application/UDP-sockets.jpg
1.7.3.2.2 TCP

The overview
02-Application/sockets-ethernet-interface.webp

The states:
02-Application/sockets-tcp-flow.webp

02-Application/socket_BSDflow.png
02-Application/socket_Berkeley_SOCKET.jpg
02-Application/socket_diag.png
02-Application/socket_review.gif

To think ahead to what we’re covering next,
TCP’s actual internal FSM is much more detailed than this!
These images below are just the high-level API.

1.8 FTP

File Transfer Protocol

https://en.wikipedia.org/wiki/File_Transfer_Protocol
https://tools.ietf.org/html/rfc2428

02-Application/app_layer13.png

FTP client contacts FTP server at port 21, using TCP.
Client authorized over control connection.
Client browses remote directory, sends commands over control connection.
When server receives file transfer command, server opens 2nd TCP data connection (for file) to client.
After transferring one file, server closes data connection.
Server opens another TCP data connection to transfer another file.
Control connection: “out of band”.
FTP server maintains “state”: current directory, earlier authentication.

FTP control and data connections
FTP uses two parallel TCP connections to transfer a file, a control connection and a data connection.
The control connection is used for sending control information between the two hosts,
information such as user identification, password, commands to change remote directory,
and commands to “put” and “get” files.
The data connection is used to actually send a file.
FTP is said to send its control information out-of-band.
HTTP sends request and response header lines into the same TCP connection that carries the transferred file itself,
named in-band.

02-Application/app_layer14.png

FTP sequence
When a user starts an FTP session with a remote host,
the client side of FTP (user) first initiates a control TCP connection with the server side (remote host) on server port number 21.
Client side of FTP sends the user identification and password over this control connection.
Client side of FTP also sends, over the control connection, commands to change the remote directory.
When the server side receives a command for a file transfer over the control connection (either to, or from, the remote host),
the server side initiates a TCP data connection to the client side.
FTP sends exactly one file over the data connection and then closes the data connection.
If, during the same session, the user wants to transfer another file,
then FTP opens another data connection.
Control connection remains open throughout the duration of the user session,
but a new data connection is created for each file transferred within a session
(data connections are non-persistent).

FTP requests
Commands, from client to server, and replies, from server to client, are sent across the control connection in 7-bit ASCII format.
In order to delineate successive commands, a carriage return and line feed end each command.
Each command consists of four uppercase ASCII characters, some with optional arguments:

USER username: Used to send the user identification to the server.

PASS password: Used to send the user password to the server.

LIST: Used to ask the server to send back a list of all the files in the current remote directory. The list of files is sent over a (new and non-persistent) data connection rather than the control TCP connection.

RETR filename: Used to retrieve (that is, get) a file from the current directory of the remote host. This command causes the remote host to initiate a data connection and to send the requested file over the data connection.

STOR filename: Used to store (that is, put) a file into the current directory of the remote host.

FTP replies Some typical replies, along with their possible messages, are as follows:

331 Username OK, password required
125 Data connection already open; transfer starting
425 Can’t open data connection
452 Error writing file

Demonstrate:
[ ] Find an open ftp site, watch connection with wireshark
With sftp, do we see any application layer protocol details with Wireshark?

1.9 E-mail

https://www.computer-networking.info/1st/html/application/email.html
https://www.computer-networking.info/2nd/html/protocols/email.html

https://tools.ietf.org/html/rfc5321 (SMTP)
https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol

https://tools.ietf.org/html/rfc1939 (POP3)
https://en.wikipedia.org/wiki/Post_Office_Protocol

https://tools.ietf.org/html/rfc3501 (IMAP)
https://en.wikipedia.org/wiki/Internet_Message_Access_Protocol

Three major components:
1. user agents
2. mail servers
3. simple mail transfer protocol: SMTP

User Agent
a.k.a. “mail reader”
For composing, editing, reading mail messages.
e.g., Thunderbird, K9, Outlook, Kmail, iPhone mail client, etc.
Outgoing and incoming messages can be stored on server.

Mail servers:
Mailbox contains incoming messages for user.
Message queue of outgoing (to be sent) mail messages.
SMTP protocol between mail servers to send email messages, with each entity:
client: sending mail server
server: receiving mail server

1.9.1 Email protocol example

Observe SMTP with wireshark (does any of this show in wireshark)

ncat -C smtp.zoho.com 587

Does any of the application layer information show in wireshark here?
ncat --ssl -C smtp.zoho.com 465
HELO web.site, MAIL FROM, RCPT TO, DATA, QUIT

Observe POP

ncat --ssl -C pop.zoho.com 995
user bob
pass password
list

1.9.2 Overview: SMTP, mail servers, mail user agents

02-Application/app_layer15.png

User-agent is local, but also remote.
User-agent used to be on remote machine.
Then, real mail user-agents were used.
Then user-agent moved back onto remote machines.

Mail server is a messy multi-part aggregate of things in software.

How many people still use a real, local, MUA?

1.9.3 SMTP

Electronic Mail: SMTP
[RFC 2821]

Alice sends a message to Bob
02-Application/app_layer16.png

Basic process

  1. Alice invokes her user agent for e-mail, provides Bob’s e-mail address (for example, bob@someschool.edu), composes a message, and instructs the user agent to send the message.
  2. Alice’s user agent sends the message to her mail server, where it is placed in a message queue.
  3. The client side of SMTP, running on Alice’s mail server, sees the message in the message queue. It opens a TCP connection to an SMTP server, running on Bob’s mail server.
  4. After some initial SMTP handshaking, the SMTP client sends Alice’s message into the TCP connection.
  5. At Bob’s mail server, the server side of SMTP receives the message. Bob’s mail server then places the message in Bob’s mailbox.
  6. Bob invokes his user agent to read the message at his convenience.

Example SMTP transcript
Hostname of the client is crepes.fr
Hostname of the server is server.edu

S: 220 server.edu
C: EHLO crepes.fr // a nicer HELO
S: 250 Hello crepes.fr, pleased to meet you
C: MAIL FROM: <alice@crepes.fr>
S: 250 alice@crepes.fr ... Sender ok
C: RCPT TO: <bob@server.edu>
S: 250 bob@server.edu ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 server.edu closing connection

Another SMTP example
02-Application/smtp.png

1.9.3.1 If login is required

base64 encoding is required for username and password:
https://en.wikipedia.org/wiki/Base64

c: AUTH LOGIN
s: 334 VXNlcm5hbWU6
c: yourusernameinb64encoding
s: 334 VXNlcm5hbWU6
c: yourpasswordinb64encoding

To get base64 encoding of a string:

# encode in bash
$ echo -n 'string' | base64

# decode in bash
$ echo -n c3RyaW5nCg== | base64 -d
# In python:
>>> import base64
>>> base64.b64encode('string'.encode())
>>> base64.b64decode('c3RyaW5n')

then, you can proceed sending:

C: MAIL FROM: <alice@crepes.fr>
S: 250 alice@crepes.fr ... Sender ok
C: RCPT TO: <bob@server.edu>
S: 250 bob@server.edu ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 server.edu closing connection

1.9.3.2 Notes

SMTP uses persistent connections
SMTP requires message (header and body) to be in 7-bit ASCII
SMTP server uses CRLF.CRLF to determine end of message

Comparison with HTTP:

1.9.4 Message format

Message header

From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Searching for the meaning of life.

Show: Open an email in Mutt/raw to illustrate headers, MIME, multipart

1.9.5 Access protocols

Email protocols and direction of communication
When sent an email by Alice, how does a recipient like Bob, running a user agent on his local PC, obtain his messages, which are sitting in a mail server within Bob’s mail provider?
02-Application/app_layer17.png
* Post Office Protocol—Version 3 (POP3)
* Internet Mail Access Protocol (IMAP)
* HTTP

1.9.5.1 POP3

C: client
S: server
ncat mailServer 110
S: +OK POP3 server ready
C: user bob
S: +OK
C: pass hungry
S: +OK user successfully logged on
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 1
C: retr 2
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 2
C: quit
S: +OK POP3 server signing off

Another POP3 example
02-Application/pop3.png

1.9.5.2 IMAP

++++++++++++++ Cahoot-02-6

1.10 Domain name resolution (DNS)

Discussion question:
At first guess, would you think the internet has a kill-switch, like it might in a Hollywood movie?
If it did, what might the consequences be?
On businesses?
On people?
In the USA?
In China?
In Russia?
In Kazakhstan?
etc.

02-Application/DNS-Server.png

“The Domain Name Server (DNS) is the Achilles heel of the Web.
The important thing is that it’s managed responsibly.”

-Tim Berners-Lee

1.10.1 Basic idea

People: many identifiers:

Internet hosts, routers:

The big questions:

1.10.2 DNS: the problem

(double-meaning intended)

1.10.3 Basics of DNS: host aliasing

Web browser example:

  1. User machine runs the client side of the DNS application.
  2. A web browser extracts the hostname, <https://www.someschool.edu>, from a URL entered by the user, and passes the hostname to the client side of the DNS application.
  3. The local DNS client sends a query containing the hostname to a somewhat-remote DNS server (or a chain of such servers).
  4. The DNS client eventually receives a reply, which includes the IP address for the hostname.
  5. Once the web browser receives the IP address from DNS, it can initiate a TCP connection to the HTTP server process located at port 80 at that IP address.

Domain Name System:

1.10.4 DNS: services, structure

1.10.5 History of DNS

Discussion Question:
What are several reasons an entity might want to steal a network name?
Would you guess that all such purposes bad?

02-Application/dns_process.png

1.10.6 Partial hierarchy of DNS servers

02-Application/app_layer18.png
DNS is just a pyramid scheme…

client wants IP for <https://www.amazon.com>; 1st approximation:

1.10.6.1 Root DNS servers

https://en.wikipedia.org/wiki/Root_name_server
https://en.wikipedia.org/wiki/DNS_root_zone
02-Application/app_layer19.png

1.10.6.2 Top-level domain (TLD) servers

https://en.wikipedia.org/wiki/Top-level_domain

If https://mst.eu is available…
What fun things could we do with that…?

Ask: How can one “be” an EU resident on the internet?

1.10.6.3 Authoritative DNS servers

1.10.6.4 Local DNS name server / Resolver

02-Application/dns_resolve.jpg

1.10.7 DNS query example

Show: some Wireshark observations of nslookup for various types of record (overview this time, more detail again lower).

  1. Visit https://mst.edu with web browser

  2. Make a manual query using command line tools

#!/bin/bash

nslookup mst.edu
nslookup www.mst.edu

dig mst.edu
dig www.mst.edu

whois mst.edu
whois icann.org

# What are the authoritative servers?
nslookup -type=NS mst.edu

# What do the authoritative servers say?
nslookup mst.edu ns-1.mst.edu
  1. Use python to query the same
#!/usr/bin/python3

import socket

print(socket.gethostbyname('mst.edu'))

https://en.wikipedia.org/wiki/WHOIS
WHOIS going to tell us a Domain Name joke?

1.10.8 DNS server interaction

02-Application/dnslevels.png

Standard iterated query
Some host at cis.poly.edu wants IP address for gaia.cs.umass.edu
02-Application/app_layer20.png
Iterated query:
Contacted server replies with name of server to contact.
“I don’t know this name,
but ask this other server who is responsible for knowing,
or is responsible for asking some server that is.”

Recursive queries
02-Application/app_layer21.png
Recursive query:
Puts burden of name resolution on contacted name server.
Heavy load at upper levels of hierarchy?

Q:
Can one’s own machine just do the query to root, TLD, and authoritative?
Why bother with the institutional resolver?

A:
Yes, if you set up your own DNS server (easy).
Just install bind, and configure it.
It’s just extra functionality not built into every client.

++++++++++++++ Cahoot-02-7

DNS caching, updating records
02-Application/DNS_Architecture.png

DNS is at the root of many internet problems…

1.10.9 DNS protocol, message format

https://en.wikipedia.org/wiki/Domain_Name_System#DNS_message_format

Query and reply messages, both with same overall message format

Message header

The header of DNS messages is composed of 12 bytes and its structure is shown in the figure below.

The QR flag is set to 0 in DNS queries and 1 in DNS answers.

The Opcode is used to specify the type of query.
For instance, a standard query is when a client sends a name, and the server returns the corresponding data.
An update request is when the client sends a name, and new data, and the server then updates its database.

The AA bit is set, when the server that sent the response has authority for the domain name found in the question section.
In the original DNS deployments, two types of servers were considered : authoritative servers and non-authoritative servers.
The authoritative servers are managed by the system administrators responsible for a given domain.
They always store the most recent information about a domain.
Non-authoritative servers are servers or resolvers that store DNS information about external domains without being managed by the owners of a domain.
They may thus provide answers that are out of date.
From a security point of view, the authoritative bit is not an absolute indication about the validity of an answer.

Ask: Is this secure?
It uses UDP; what does this imply?

Where TC is set, the partial RRSet that would not completely fit may be left in the response.
When a DNS client receives a reply with TC set, it should ignore that response, and query again, using a mechanism, such as a TCP connection, that will permit larger replies.

The RD (recursion desired) bit is set by a client when it sends a query to a resolver.
Such a query is said to be recursive because the resolver will recurse through the DNS hierarchy to retrieve the answer on behalf of the client.
In the past, all resolvers were configured to perform recursive queries on behalf of any Internet host.
However, this exposes the resolvers to several security risks.
The simplest one is that the resolver could become overloaded by having too many recursive queries to process.
As of this writing, most resolvers only allow recursive queries from clients belonging to their company or network and discard all other recursive queries.

The RA bit indicates whether the server supports recursion.

The RCODE is used to distinguish between different types of errors. See RFC 1035 for additional details.

The last four fields indicate the size of the Question, Answer, Authority and Additional sections of the DNS message.

1.10.10 DNS records (RR)

Name indicates the name of the node to which this resource record pertains.

The two bytes Type field indicates the type of resource record.

The Class field was used to support the utilization of the DNS in other environments than the Internet.

The TTL field indicates the lifetime of the Resource Record in seconds.
This field is set by the server that returns an answer and indicates for how long a client or a resolver can store the Resource Record inside its cache.
A long TTL indicates a stable RR.
Some companies use short TTL values for mobile hosts and also for popular servers.
For example, a web hosting company that wants to spread the load over a pool of hundred servers can configure its nameservers to return different answers to different clients.
If each answer has a small TTL, the clients will be forced to send DNS queries regularly.
The nameserver will reply to these queries by supplying the address of the less loaded server.

The RDLength field is the length of the RData field that contains the information of the type specified in the Type field.

Several types of DNS RR are used in practice.

type=A

type=NS

type=CNAME

type=MX

There are more record types (summary of commonly used):
https://en.wikipedia.org/wiki/List_of_DNS_record_types

02-Application/dns_iterative_resolve.png

+++++++++++++++++ Cahoot-02-8

1.10.11 Wireshark protocol details

Show: some Wireshark observations of nslookup for various types of record, this time in detail about the fields.

  1. Visit https://mst.edu with web browser

  2. Make a manual query using command line tools

#!/bin/bash

nslookup mst.edu
nslookup www.mst.edu

dig mst.edu
dig www.mst.edu

whois mst.edu
whois icann.org

# What are the authoritative servers?
nslookup -type=NS mst.edu

# What do the authoritative servers say?
nslookup mst.edu ns-1.mst.edu
  1. Use python to query the same
#!/usr/bin/python3

import socket

print(socket.gethostbyname('mst.edu'))

1.10.12 Inserting records into DNS

02-Application/dns_lookup_v_registration.png

1.10.13 Reverse DNS

https://en.wikipedia.org/wiki/Reverse_DNS_lookup

1.10.14 Attacks on DNS

https://en.wikipedia.org/wiki/Domain_Name_System#Security_issues
https://en.wikipedia.org/wiki/Domain_Name_System#Privacy_and_tracking_issues
02-Application/dns_abuse.jpg

1.10.14.1 Specific attacks

DDoS bandwidth-flooding attack
An attacker could attempt to send to each DNS root server a deluge of packets,
so many that the majority of legitimate DNS queries never get answered.
Bombard root servers with traffic.
This has not really been successful to date.

Defenses include:
Traffic filtering.
Local DNS servers cache IPs of TLD servers,
allowing root server bypass.

Bombarding TLD servers is potentially more dangerous.
02-Application/dns_flood.jpg

Man-in-the-middle attack
The attacker intercepts queries from hosts and returns bogus replies.
https://en.wikipedia.org/wiki/DNS_hijacking
(show in class)
02-Application/dns_spoof.jpg

DNS poisoning attack
The attacker sends bogus replies to a DNS server,
who is making outgoing requests itself,
tricking the server into accepting bogus records into its cache.
Send bogus replies to DNS server, which caches
https://en.wikipedia.org/wiki/DNS_spoofing
(show in class)
02-Application/dns_poison.jpg

DNS redirection
Another important DNS attack is not an attack on the DNS service, per se,
but instead exploits the DNS infrastructure,
to launch a DDoS attack against a targeted host.
Attacker sends DNS queries to many authoritative DNS servers,
with each query having the spoofed source address of the targeted host.
The DNS servers then send their replies directly to the targeted host.

Exploit DNS for DDoS:
send queries with spoofed source address and target IP.
This often requires amplification

DNS as exfiltration / infiltration / tunneling
One can sneak data through DNS requests or replies.
02-Application/dns_tunneling.jpg

+++++++++++++++++ Cahoot-02-9

1.10.15 DNS protections

Ways to avoid those attacks:

Just encrypt the connections to the server:
https://en.wikipedia.org/wiki/DNS_over_HTTPS
https://en.wikipedia.org/wiki/DNS_over_TLS

Tor/VPN/Proxy (privacy, but also some security).

Cryptographic signatures on DNS messages
https://en.wikipedia.org/wiki/Domain_Name_System_Security_Extensions
https://en.wikipedia.org/wiki/DNS-based_Authentication_of_Named_Entities
https://en.wikipedia.org/wiki/DNS_Certification_Authority_Authorization

1.10.16 Who controls DNS?

The Lord of the DNS

One DNS to rule them all,
One DNS to find them,
One DNS to bring them all,
and in the darkness bind them…
02-Application/sauron.webp
(i.e., a big boring Sauron committee…)
https://en.wikipedia.org/wiki/ICANN
https://en.wikipedia.org/wiki/ICANN#Criticism

https://en.wikipedia.org/wiki/Internet_Assigned_Numbers_Authority

https://en.wikipedia.org/wiki/OpenNIC
(permitted to be an open alternative)
02-Application/dns_overview.png

+++++++++++++++++ Cahoot-02-10

1.11 Alternative name systems

Fellowship of the DNS…

Fair, robust, distributed, decentralized, non-exploitable name resolution,
is a bit of a:
https://en.wikipedia.org/wiki/Catch-22_(logic)
and a real difficult problem to solve…

Discussion question:
What might a reliable distributed solution look like?
Might they come with their own exploits and problems?
Might a p2p system end up even more dictatorially problematic than DNS?
(e.g., Mr. Robot’s Evil Corp cryptocurrency)?

GNU name system
https://gnunet.org/gns
https://lsd.gnunet.org/lsd0001/
https://news.ycombinator.com/item?id=30154830
(discuss proposal to replace DNS!)
ICANN
https://icann.zoom.us/rec/play/znYwyZWPwrNraKqiZCLwOkHp_NITBj0QdhMpIrZPTrJumDRxIaecB8DHAygsgO-8PxQKkYx5ESGj6pBl.vZAWJHZoGeNyX9R4?startTime=1572978711000&_x_zm_rtaid=M4Wj53e3QXyaUK9nI6hiQg.1644387258044.8569edd15b9c2bafee5b5a283ad9fa90&_x_zm_rhtaid=108
of using GNUnet instead of DNS

I2P web-of-trust name system
https://geti2p.net/en/docs/naming
(web of trust based)

Crypto-currency-based
https://ens.domains/
https://docs.ens.domains/en/latest/introduction.html

https://www.namecoin.org/
https://en.wikipedia.org/wiki/Namecoin

1.12 Hosting a site

Do you need to buy a name to host a site on clearnet?
Do you need to buy an static IP to host a site on clearnet,
or does a dynamic IP suffice?
What about dynamic DNS?
https://en.wikipedia.org/wiki/Dynamic_DNS
Do you need to buy act actual machine? A virtual one?
Do you need to buy an HTTPS certificate?
Do you need to buy anything else?

What about overlay layers or darknets for simple free hosting?
https://en.wikipedia.org/wiki/I2P
https://en.wikipedia.org/wiki/Tor_(anonymity_network)
Can one circumvent DNS editing as a censorship technique?
Can one block sites at all with common darknets?

What is the easiest way to set up an independent site on your own hardware, or a VPS you rent?
Static websites:
https://onionshare.org/
http://lldan5gahapx5k7iafb3s4ikijc4ni7gx5iywdflkba5y2ezyg6sjgyd.onion/

sudo dnf install tor
pip3 install --upgrade onionshare-cli --user
echo "cool publically accessible website" >index.html
onionshare-cli --website --public index.html

You could even host a website like this on your phone,
in under 10 minutes:
https://onionshare.org/mobile/#download
Anywhere that had an internet connection,
you could leave your phone plugged in an host a website there…

https://medium.com/axon-technologies/hosting-anonymous-website-on-tor-network-3a82394d7a01
Interactive backend easily possible with tor process and Apache.

https://geti2p.net/en/faq#myI2P%20Site

1.13 P2P

Today:
Theoretical difficulties with P2P and their solutions (general).
An overview of protocols and services provided by P2P overlay applications (general).
High level protocol specification for an example P2P application (BitTorrent).

1.13.1 A variety of protocols

There are many P2P protocols.
BitTorrent is just one we will review today.

1.13.2 P2P vs. Client server

02-Application/p2p00.png

Examples:

02-Application/p2p01.png

File distribution problem: Client server vs. P2P

Upload/download capacity is limited resource!

Question:
How much time to distribute file (size F),
from one server, to N peers?
02-Application/app_layer23.png

1.13.2.1 Client-server time to distribute file

Let’s first determine the distribution time for the client-server architecture,
which we denote by Dcs. In the client-server architecture, none of the peers aids in
distributing the file. We make the following observations:

The server must transmit one copy of the file to each of the N peers.
Thus, the server must transmit N * F bits.
Since the server’s upload rate is us,
the time to distribute the file must be at least (N * F) / us

Server transmission
Must sequentially send (upload) a number (N) of file (F) copies:

Server upload rate:
us

Time to send one copy:
F / us

\[ time = \frac{bits}{\frac{bits}{seconds}} \]

\[ time = bits * \frac{seconds}{bits} \]

Time to send N copies:
(N * F) / us

Let d min denote the download rate of the peer with the lowest download rate,
that is, dmin = min{d1, d2, …, d~N}.
The peer with the lowest download rate,
cannot obtain all F bits of the file in less than F / dmin seconds.
Thus the minimum distribution time is at least F / dmin
That however, will almost never be the real time,
as the server must distribute to many peers.

Client: each client must download file copy.

Time to distribute F to N clients using client-server approach:
Dcs > max{ (N * F) / us, F / dmin }
Max numerator increases linearly with N.

Question: how much time to distribute file (size F) from one server to N peers?
02-Application/app_layer23.png

1.13.2.2 P2P time to distribute file

At the beginning of the distribution, only the server has the file.
To get this file into the community of peers, the server must send each bit of the file at least once into its access link.
Thus, the minimum distribution time is at least F / us
Unlike the client-server scheme, a bit sent once by the server may not have to be sent by the server again, as the peers may redistribute the bit among themselves.

Server transmission
Must upload at least one copy.
Time to send one copy:
F / us

As with the client-server architecture, the peer with the lowest download rate,
cannot obtain all F bits of the file in less than F / dmin seconds.
Thus the minimum distribution time is at least F / dmin
Unlike with the client-server model, with p2p,
this could actually (and often is) the server’s bandwidth contribution.

Client: each client must download file copy.
Min client download time: F / dmin

The total upload capacity of the system as a whole,
is equal to the upload rate of the server,
plus the upload rates of each of the individual peers, that is:
utotal = us + u1 + … + uN
The system must deliver (upload) F bits to each of the N peers,
thus delivering a total of N * F bits.
This cannot be done at a rate faster than utotal.
Thus, the minimum distribution time is also at least:
(N * F) / (us + u1 + … + uN).

Clients: as aggregate, each individual (i) must download N * F bits
Max upload rate (limiting max download rate) is us + sum(ui)

Time to distribute F to N clients using P2P approach:
DP2P > max{ F / us, F / dmin, (N * F) / (us + sum(ui)) }
Max numerator increases with N
But, so does the denominator,
since each peer provides service capacity

1.13.2.3 Distribution time for P2P vs. Client-server

02-Application/app_layer24.png
Net client upload rate = u
F / u = 1 hour
us = 10u
dmin >= us

P2P vs Client server
For the P2P architecture the minimal distribution time is always lesser,
compared to the distribution time of the client-server architecture.
It is also less than a fixed duration, above some number of peers N!
Applications with the P2P architecture can be self-scaling.
This scalability is a direct consequence of peers being re-distributors,
as well as consumers of bits.

1.13.3 BitTorrent overview

Standard protocol,
many clients (Vuze, BigglyBt, I2P-Snark, Bittorrent-official, etc., ),
and versions of tracker software (some server-based trackers).

File divided into 256Kb chunks (or other equal size).
Peers in torrent send/receive file chunks.
02-Application/app_layer25.png
Tracker:
Tracks peers participating in torrent (or DHT).
Runs their own choice of tracker software.
Used to be only a server-side operation, now also can be P2P!

Torrent:
Meta-data and group of peers exchanging chunks of a file.

Client:
Uploads and downloads files.
Runs their own client torrent software.

Process:
Alice arrives, chooses a torrent, and using the torrent meta-data,
obtains a list of peers from tracker server (or distributed tracker),
and finally begins exchanging file chunks with peers in torrent.

++++++++++++++++ Cahoot-02-11

1.13.3.1 Overview of process

Peer joining torrent:

After joining:

Churn:

Requesting chunks:

Ask: Why rarest first?

Discussion question:
Why not just be a leech (download but not contribute)?
How might you design a protocol with incentives?
What might an incentive look like?
Should you build incentives into protocols?
Do people follow incentives?

1.13.3.2 BitTorrent participation incentive

How do we put a kink in the wires of those who don’t contribute enough,
slowing down their transfers,
to encourage every peer to reciprocate?

Sending chunks: tit-for-tat incentives
https://en.wikipedia.org/wiki/Tit_for_tat

Overview:
(1) Alice “optimistically un-chokes” a new participant, Bob, in hopes that reciprocates
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers

All this results in higher upload rate, finding better trading partners, and getting file faster !

Sharing is caring…

1.13.3.2.1 Problem 0: How to get people to share

Due to risk or costs in internet speed or throughput,
individuals could potentially download, but not upload.

1.13.3.2.2 Solution 0: Participation incentives

An interesting read, game theory in software design and CompSci:
http://bittorrent.org/bittorrentecon.pdf

1.13.4 Distributed Hash Table (DHT)

General (not BitTorrent specific)

++++++++++++++++ Cahoot-02-12

Review: dictionaries, maps, and hash tables

Simple database with (key, value) pairs:
key: human name;
value: social security number
02-Application/pasted_image.png

key = hash(original key)
02-Application/pasted_image001.png
02-Application/hash_table.png

Note: There are potentially two distributed databases (or merged into one) in some p2p networks:
1. Routing table for overlay network peers, who are defined by their addresses
2. Database of torrents: addresses/peers

1.13.4.1 Distributed database

It’s easy to keep a database on a server,
but how do we increase the censorship resistance and robustness?

1.13.4.1.1 Problem 1: how to keep a database everywhere?

DHT
02-Application/DHT.png

1.13.4.1.2 Solution 1: distributed hash table (DHT)

Problem

Solution

  1. Storing indices of more neighbors increases messaging efficiency, and increases storage overhead

1.13.4.2 Peer churn

Peers come an go, and the network must adapt.
02-Application/app_layer26.png

1.13.4.2.1 Problem 2: peers turn over

Example, peer 5 abruptly leaves, or is disconnected

1.13.4.2.2 Solution 2: synchronization procedure

Handling peer churn:

Example: peer 5 abruptly leaves:

1.13.5 BitTorrent protocols

https://en.wikipedia.org/wiki/Bittorrent
https://wiki.wireshark.org/BitTorrent
http://bittorrent.org/beps/bep_0000.html

1.13.5.0.1 Main protocol and torrent files

https://www.bittorrent.org/beps/bep_0003.html (show in class)
https://en.wikipedia.org/wiki/Torrent_file (show in class)
https://en.wikipedia.org/wiki/Magnet_URI_scheme
Show a real torrent file, map to specifications.
For example, https://ftp.qubes-os.org/iso/Qubes-R4.1.0-x86_64.torrent

1.13.5.0.2 DHT server-less tracker protocol extension
1.13.5.0.3 Transport layer protocols used by BitTorrent

BitTorrent protocol: two main transport level choices

  1. Option 1: BitTorrent started with using TCP as its transport protocol.
  1. Option 2: UDP-based Micro Transport Protocol, called uTP.
0       4       8               16              24              32
+-------+-------+---------------+---------------+---------------+
| type  | ver   | extension     | connection_id                 |
+-------+-------+---------------+---------------+---------------+
| timestamp_microseconds                                        |
+---------------+---------------+---------------+---------------+
| timestamp_difference_microseconds                             |
+---------------+---------------+---------------+---------------+
| wnd_size                                                      |
+---------------+---------------+---------------+---------------+
| seq_nr                        | ack_nr                        |
+---------------+---------------+---------------+---------------+
1.13.5.0.4 BT video streaming (P2P)
02-Application/vuze-hp.png

1.13.5.1 BitTorrent Protocol

Show/demo: Wireshark downloading Linux ISO with transmission

1.13.6 P2P security

1.13.6.1 Problems:

1.13.6.2 Solutions:

Next: 03-Transport.html