Lecture#14-21 Transport Layer (Computer Networks Part-3)
Outline
- Transport-layer services
- Multiplexing and demultiplexing
- Connectionless transport: UDP
- Principles of reliable data transfer
- Connection-oriented transport: TCP
  - segment structure
  - reliable data transfer
  - flow control
  - connection management
- Principles of congestion control
- TCP congestion control
Transport services and protocols
- provide logical communication between app processes running on different hosts
- transport protocols run in end systems
  - send side: breaks app messages into segments, passes to network layer
  - rcv side: reassembles segments into messages, passes to app layer
- more than one transport protocol available to apps
  - Internet: TCP and UDP
[Figure: logical end-end transport between the application/transport layers of two end systems, across intermediate nodes that implement only network, data link, and physical layers]
Transport vs. network layer
- network layer: logical communication between hosts
- transport layer: logical communication between processes
  - relies on, enhances, network layer services

Household analogy: 12 kids sending letters to 12 kids
- processes = kids
- app messages = letters in envelopes
- hosts = houses
- transport protocol = Ann and Bill
- network-layer protocol = postal service
Internet transport-layer protocols
- reliable, in-order delivery: TCP
  - congestion control
  - flow control
  - connection setup
- unreliable, unordered delivery: UDP
  - no-frills extension of "best-effort" IP
- services not available:
  - delay guarantees
  - bandwidth guarantees
[Figure: logical end-end transport between two end systems]
Multiplexing/demultiplexing
- Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)
- Demultiplexing at rcv host: delivering received segments to correct socket
[Figure: sockets and processes P1-P4 on hosts 1, 2, and 3]
How demultiplexing works
- host receives IP datagrams
  - each datagram has source IP address, destination IP address
  - each datagram carries 1 transport-layer segment
  - each segment has source, destination port number (recall: well-known port numbers for specific applications)
- host uses IP addresses & port numbers to direct segment to appropriate socket

TCP/UDP segment format (32 bits wide):
  source port # | dest port #
  other header fields
  application data (message)
Connectionless demultiplexing
- Create sockets with port numbers:
  DatagramSocket mySocket1 = new DatagramSocket(9911);
  DatagramSocket mySocket2 = new DatagramSocket(9922);
- UDP socket identified by two-tuple: (dest IP address, dest port number)
- When host receives UDP segment:
  - checks destination port number in segment
  - directs UDP segment to socket with that port number
- IP datagrams with different source IP addresses and/or source port numbers directed to same socket
Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428);
[Figure: client processes on two hosts send UDP segments to server port 6428; all arrive at the same socket]
Connection-oriented demux
- TCP socket identified by 4-tuple:
  - source IP address
  - source port number
  - dest IP address
  - dest port number
- recv host uses all four values to direct segment to appropriate socket
- Server host may support many simultaneous TCP sockets:
  - each socket identified by its own 4-tuple
- Web servers have different sockets for each connecting client
  - non-persistent HTTP will have different socket for each request
Connection-oriented demux (cont)
[Figure: several clients connect to server C on port 80; e.g., a segment with SP: 5775, DP: 80, S-IP: B, D-IP: C is directed to the socket for that exact 4-tuple]

Connection-oriented demux: Threaded Web Server
[Figure: same scenario, but one server process with one thread per connection]
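The 4-tuple demultiplexing described above can be sketched as a lookup table, using the figure's example values (SP: 5775, DP: 80, S-IP: B, D-IP: C). This is only an illustrative sketch; the `ConnKey` record and socket labels are hypothetical, not part of a real TCP stack.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of TCP demultiplexing by 4-tuple (illustrative only).
public class TcpDemux {
    // A TCP socket is identified by all four values.
    record ConnKey(String srcIp, int srcPort, String dstIp, int dstPort) {}

    private final Map<ConnKey, String> sockets = new HashMap<>();

    void register(String srcIp, int srcPort, String dstIp, int dstPort, String socket) {
        sockets.put(new ConnKey(srcIp, srcPort, dstIp, dstPort), socket);
    }

    // Receiver uses source IP/port AND dest IP/port to pick the socket.
    String demux(String srcIp, int srcPort, String dstIp, int dstPort) {
        return sockets.get(new ConnKey(srcIp, srcPort, dstIp, dstPort));
    }

    public static void main(String[] args) {
        TcpDemux host = new TcpDemux();
        // Two clients connect to the same server port 80: different sockets.
        host.register("A", 9157, "C", 80, "socketP1");
        host.register("B", 5775, "C", 80, "socketP2");
        System.out.println(host.demux("B", 5775, "C", 80)); // -> socketP2
    }
}
```

Note that a UDP demultiplexer would key the map only on (dest IP, dest port), which is why segments from different sources can land in the same UDP socket.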
Figure 11-1
Figure 11-2 UDP versus IP
Figure 11-3 Port numbers
Figure 11-4 IP addresses versus port numbers
Figure 11-5 IANA ranges
Figure 11-6 Socket addresses
UDP: User Datagram Protocol [RFC 768]
UDP: more
- often used for streaming multimedia apps
  - loss tolerant
  - rate sensitive
- other UDP uses: DNS, SNMP
- reliable transfer over UDP: add reliability at application layer
  - application-specific error recovery!

UDP segment format (32 bits wide):
  source port # | dest port #
  length | checksum
  Application data (message)
(length: in bytes of UDP segment, including header)
UDP checksum
Goal: detect "errors" (e.g., flipped bits) in transmitted segment

Sender:
- treat segment contents as sequence of 16-bit integers
- checksum: addition (1's complement sum) of segment contents
- sender puts checksum value into UDP checksum field

Receiver:
- compute checksum of received segment
- check if computed checksum equals checksum field value:
  - NO: error detected
  - YES: no error detected. But maybe errors nonetheless? More later ….
Internet Checksum Example
Note: when adding numbers, a carryout from the most significant bit needs to be added to the result.

Example: add two 16-bit integers

               1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
               1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
  wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
  sum          1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
  checksum     0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
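The worked example above can be checked in code. This is a minimal sketch of the 1's-complement sum; real UDP/TCP checksums also cover a pseudo-header, omitted here.

```java
// 1's-complement Internet checksum over 16-bit words, matching the
// worked example above (sketch; pseudo-header handling omitted).
public class InternetChecksum {
    static int checksum(int[] words) {
        int sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            if ((sum & 0x10000) != 0)      // carryout from the MSB...
                sum = (sum & 0xFFFF) + 1;  // ...is wrapped around and added
        }
        return ~sum & 0xFFFF;              // 1's complement of the sum
    }

    public static void main(String[] args) {
        // 1110011001100110 = 0xE666, 1101010101010101 = 0xD555
        System.out.printf("%04X%n", checksum(new int[]{0xE666, 0xD555})); // 4443
    }
}
```

0x4443 is 0100010001000011 in binary, the checksum row of the example.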
Principles of Reliable data transfer
- important in app., transport, link layers
- top-10 list of important networking topics!
- characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Reliable data transfer: getting started
- rdt_send(): called from above (e.g., by app.). Passed data to deliver to receiver upper layer
- deliver_data(): called by rdt to deliver data to upper layer
[Figure: send side and receive side of the rdt protocol]
Reliable data transfer: getting started
We'll:
- incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
- consider only unidirectional data transfer
  - but control info will flow on both directions!
- use finite state machines (FSM) to specify sender, receiver

state: when in this "state", next state uniquely determined by next event
[FSM notation: transition from state 1 to state 2 labeled "event causing state transition / actions taken on state transition"]
Rdt1.0: reliable transfer over a reliable channel
[Figure: sender and receiver FSMs]
Rdt2.0: channel with bit errors
- underlying channel may flip bits in packet
  - checksum to detect bit errors
rdt2.0: FSM specification

sender:
- Wait for call from above:
  rdt_send(data):
    sndpkt = make_pkt(data, checksum)
    udt_send(sndpkt)
- Wait for ACK or NAK:
  rdt_rcv(rcvpkt) && isNAK(rcvpkt):
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && isACK(rcvpkt):
    (return to Wait for call from above)

receiver:
- rdt_rcv(rcvpkt) && corrupt(rcvpkt):
  udt_send(NAK)
- rdt_rcv(rcvpkt) && notcorrupt(rcvpkt):
  extract(rcvpkt,data)
  deliver_data(data)
  udt_send(ACK)
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
- sender doesn't know what happened at receiver!
- can't just retransmit: possible duplicate

Handling duplicates:
- sender adds sequence number to each pkt
- sender retransmits current pkt if ACK/NAK garbled
- receiver discards (doesn't deliver up) duplicate pkt
rdt2.1: discussion
Sender:
- seq # added to pkt
- two seq. #'s (0,1) will suffice. Why?
- must check if received ACK/NAK corrupted
- twice as many states
  - state must "remember" whether "current" pkt has 0 or 1 seq. #

Receiver:
- must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
- note: receiver can not know if its last ACK/NAK received OK at sender
rdt2.2: a NAK-free protocol
- same functionality as rdt2.1, using ACKs only
- instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
- duplicate ACK at sender results in same action as NAK: retransmit current pkt
rdt3.0: channels with errors and loss

rdt3.0 in action
[Timeline figures: operation with no loss, and with packet/ACK loss and timeout-driven retransmission]
Performance of rdt3.0
- rdt3.0 works, but performance stinks
- example: 1 Gbps link, 15 ms e-e prop. delay, 1KB packet:
  - transmission time L/R = 8000 bits / 10^9 bps = 8 microseconds

    U_sender = (L/R) / (RTT + L/R) = .008 / 30.008 = 0.00027

- U_sender: utilization, fraction of time sender busy sending
- 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
- network protocol limits use of physical resources!
rdt3.0: stop-and-wait operation
[Timeline: first packet bit transmitted at t = 0; last packet bit transmitted at t = L/R; ACK arrives and next packet is sent at t = RTT + L/R]

    U_sender = (L/R) / (RTT + L/R) = .008 / 30.008 = 0.00027
Pipelined protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
- range of sequence numbers must be increased
- buffering at sender and/or receiver

Sending 3 packets per RTT increases utilization by a factor of 3!

    U_sender = (3 * L/R) / (RTT + L/R) = .024 / 30.008 = 0.0008
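The stop-and-wait and pipelined utilization formulas above differ only in the number of packets N sent back-to-back per RTT. A small sketch using the slides' numbers (L = 8000 bits, R = 1 Gbps, RTT = 30 ms):

```java
// Sender utilization for stop-and-wait (N = 1) vs. pipelining (N > 1).
public class Utilization {
    static double utilization(int n, double lBits, double rBps, double rttSec) {
        double trans = lBits / rBps;          // transmission time L/R
        return (n * trans) / (rttSec + trans);
    }

    public static void main(String[] args) {
        System.out.println(utilization(1, 8000, 1e9, 0.030)); // ~0.00027
        System.out.println(utilization(3, 8000, 1e9, 0.030)); // ~0.0008, 3x better
    }
}
```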
Go-Back-N
Sender:
- k-bit seq # in pkt header
- "window" of up to N, consecutive unack'ed pkts allowed
Selective repeat
sender:
- data from above: if next available seq # in window, send pkt
receiver:
- pkt n in [rcvbase, rcvbase+N-1]: send ACK(n); if out-of-order: buffer

Selective repeat: dilemma
Example: seq #'s 0, 1, 2, 3; window size = 3
- receiver sees no difference in two scenarios!
- incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
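The standard answer to the question above is that the window size must be at most half the sequence-number space; with seq #'s 0-3, a window of 3 violates this, which is exactly the dilemma shown. A tiny check, as a sketch:

```java
// Safe-window rule for selective repeat: window <= seqSpace / 2.
public class SeqWindow {
    static boolean isSafe(int seqSpace, int windowSize) {
        return windowSize <= seqSpace / 2;
    }

    public static void main(String[] args) {
        System.out.println(isSafe(4, 3)); // false: the dilemma above
        System.out.println(isSafe(4, 2)); // true
    }
}
```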
TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581
Figure 12-19 TCP segment format
Figure 12-20 Control field
TCP segment structure (32 bits wide):

  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | urg data pointer
  options (variable length)
  application data (variable length)

- sequence and acknowledgement numbers: counting by bytes of data (not segments!)
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection estab (setup, teardown commands)
- receive window: # bytes rcvr willing to accept
- Internet checksum (as in UDP)
TCP seq. #'s and ACKs
Seq. #'s:
- byte stream "number" of first byte in segment's data
ACKs:
- seq # of next byte expected from other side
- cumulative ACK
Q: how receiver handles out-of-order segments
- A: TCP spec doesn't say, up to implementor

Simple telnet scenario (Host A, Host B):
1. User types 'C'; A sends: Seq=42, ACK=79, data = 'C'
2. B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
3. A ACKs receipt of echoed 'C': Seq=43, ACK=80
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
- longer than RTT
  - but RTT varies
- too short: premature timeout
  - unnecessary retransmissions
- too long: slow reaction to segment loss

Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
- SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
TCP Round Trip Time and Timeout

    EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT

- exponential weighted moving average
- typical value: α = 0.125
Example RTT estimation:
[Figure: SampleRTT and EstimatedRTT for gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, roughly 100-350) vs. time (seconds)]
TCP Round Trip Time and Timeout
Setting the timeout
- EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
- first estimate of how much SampleRTT deviates from EstimatedRTT:

    DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|

  (typically, β = 0.25)
- then set timeout interval:

    TimeoutInterval = EstimatedRTT + 4*DevRTT
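The EstimatedRTT, DevRTT, and TimeoutInterval formulas above combine into one small estimator. A sketch with the typical values α = 0.125 and β = 0.25; the initial DevRTT choice is an assumption, not from the slides:

```java
// EWMA RTT estimation and timeout, following the slide formulas.
public class RttEstimator {
    static final double ALPHA = 0.125, BETA = 0.25;
    double estimatedRtt, devRtt;

    RttEstimator(double firstSample) {
        estimatedRtt = firstSample;
        devRtt = firstSample / 2;   // common initial choice (assumption)
    }

    void update(double sampleRtt) {
        // DevRTT uses the previous EstimatedRTT, then EstimatedRTT is updated.
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRtt - estimatedRtt);
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRtt;
    }

    double timeoutInterval() {      // EstimatedRTT plus safety margin
        return estimatedRtt + 4 * devRtt;
    }

    public static void main(String[] args) {
        RttEstimator r = new RttEstimator(100);
        r.update(120);
        System.out.println(r.estimatedRtt);      // 102.5
        System.out.println(r.timeoutInterval()); // 272.5
    }
}
```

With EstimatedRTT = 100 and a new sample of 120: DevRTT = 0.75*50 + 0.25*20 = 42.5, EstimatedRTT = 0.875*100 + 0.125*120 = 102.5, so TimeoutInterval = 102.5 + 4*42.5 = 272.5.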
RTO calculation for Time-out (Karn's Algorithm)
TCP sender events:
data rcvd from app:
- Create segment with seq #
- seq # is byte-stream number of first data byte in segment
- start timer if not already running (think of timer as for oldest unacked segment)
- expiration interval: TimeOutInterval

timeout:
- retransmit segment that caused timeout
- restart timer

Ack rcvd:
- If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
TCP Flow Control
- receive side of TCP connection has a receive buffer
- app process may be slow at reading from buffer
- speed-matching service: matching the send rate to the receiving app's drain rate

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
- spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
- Rcvr advertises spare room by including value of RcvWindow in segments
- Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
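The spare-room formula above as a one-line sketch; the buffer size and byte counts are made-up example values:

```java
// Receiver's advertised window, from RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead).
public class FlowControl {
    static long rcvWindow(long rcvBuffer, long lastByteRcvd, long lastByteRead) {
        return rcvBuffer - (lastByteRcvd - lastByteRead); // spare room in buffer
    }

    public static void main(String[] args) {
        // 64 KB buffer, 20000 bytes received, app has read 12000 of them:
        System.out.println(rcvWindow(65536, 20000, 12000)); // 57536 bytes spare
    }
}
```

When the app stops reading, LastByteRcvd - LastByteRead grows toward RcvBuffer and the advertised window shrinks toward 0, which is exactly how the receiver throttles the sender.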
Chapter 3 outline
3.1 Transport-layer services
3.2 Multiplexing and demultiplexing
3.3 Connectionless transport: UDP
3.4 Principles of reliable data transfer
3.5 Connection-oriented transport: TCP
  - segment structure
  - reliable data transfer
  - flow control
  - connection management
3.6 Principles of congestion control
3.7 TCP congestion control
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
- initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
- client: connection initiator
  Socket clientSocket = new Socket("hostname","port number");
- server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
- specifies initial seq #
- no data
Step 2: server host receives SYN, replies with SYNACK segment
- server allocates buffers
- specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
Figure 12-28 Three-way handshaking
TCP Connection Management (cont.)
Closing a connection:
Step 1: client sends TCP FIN segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
Step 3: client receives FIN, replies with ACK
- enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK; connection closed.
Figure 12-29 Four-way handshaking
TCP Connection Management (cont)
[State diagrams: TCP server lifecycle; TCP client lifecycle]
Principles of Congestion Control
Congestion:
- informally: "too many sources sending too much data too fast for network to handle"
- different from flow control!
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
- a top-10 problem!

Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers
- no retransmission
- large delays when congested
- maximum achievable throughput
[Figure: Hosts A and B share one router with unlimited shared output link buffers]
Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmission of lost packet
[Figure: three plots (a, b, c) of goodput λ_out vs. offered load; with retransmissions, goodput saturates below capacity, e.g. at R/3 and R/4]

"costs" of congestion:
- more work (retrans) for given "goodput"
- unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
- λ_in: original data
- λ'_in: original data, plus retransmitted data
[Figure: Hosts A-D sending over finite shared output link buffers]
TCP Slow Start (more)
- when connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
- Summary: initial rate is slow but ramps up exponentially fast
[Timeline figure: Host A sends one segment, then two segments, then four segments in successive RTTs; congestion window grows, e.g. from 8 Kbytes to 16 Kbytes]

Implementation:
- Variable Threshold
- At loss event, Threshold is set to 1/2 of CongWin just before loss event
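The doubling-per-RTT rule above can be sketched in MSS units; loss events and the additive-increase phase after Threshold is reached are omitted:

```java
// Slow-start growth: CongWin doubles each RTT until it reaches Threshold
// (sketch in MSS units; congestion avoidance and loss handling omitted).
public class SlowStart {
    static int congWinAfter(int rtts, int threshold) {
        int congWin = 1;                               // start at 1 MSS
        for (int i = 0; i < rtts; i++) {
            if (congWin >= threshold) break;           // exponential phase ends
            congWin = Math.min(2 * congWin, threshold); // double every RTT
        }
        return congWin;
    }

    public static void main(String[] args) {
        System.out.println(congWinAfter(3, 64)); // 1 -> 2 -> 4 -> 8
    }
}
```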
TCP Futures
- example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
- throughput in terms of loss rate:

    throughput = 1.22 * MSS / (RTT * √L)

- ➜ requires loss rate L = 2·10^-10 Wow
- New versions of TCP for high-speed needed!
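Solving the throughput formula above for L reproduces the quoted loss rate. A sketch, with MSS given in bits:

```java
// Loss rate needed for a target TCP throughput, from
// throughput = 1.22 * MSS / (RTT * sqrt(L)), solved for L.
public class TcpFutures {
    static double requiredLossRate(double mssBits, double rttSec, double targetBps) {
        double x = 1.22 * mssBits / (rttSec * targetBps);
        return x * x;   // L = (1.22 * MSS / (RTT * throughput))^2
    }

    public static void main(String[] args) {
        // 1500-byte segments (12000 bits), 100 ms RTT, 10 Gbps target:
        System.out.println(requiredLossRate(12000, 0.1, 1e10)); // ~2e-10
    }
}
```

A loss rate of roughly one segment in five billion is far below what real networks deliver, which is the slide's point.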
TCP Fairness
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]