

e-PG Pathshala

Subject : Computer Science

Paper: Computer Networks


Module: HTTP Part 2 & SMTP
Module No: CS/CN/4

Quadrant 1 – e-text

In our top-down approach to understanding network protocols, we started with HTTP,
one of the most widely used application layer protocols. We will continue with some of the
interesting features that have helped HTTP scale and survive for over two and a half decades.
We will then look at another protocol, SMTP, used in another very common application,
namely, e-mail.
The objectives for this module are as follows.
Learning Objectives
• To understand support for cookies & caching in HTTP
• To understand the basic protocol used for e-mail: SMTP

4.1 HTTP overview


Let us quickly review some of the facts about HTTP. It is a request-response protocol used in a
client-server architecture, meant for accessing objects from the web. It uses the reliable
service provided by TCP. It is a stateless protocol, where each request is viewed and
serviced independently of previous requests. Every object has to be requested by a
separate HTTP request. This can make the process of retrieving web pages with many
objects very slow if a TCP connection has to be established for each object. To cut down
on this overhead, HTTP uses a persistent mode of operation where a single connection
can be used to fetch multiple objects from the server.
4.1.1 HTTP Growth
Web servers now handle many times the traffic they did when the HTTP protocol
was originally designed, yet the same basic protocol still works. The secret to this ability to
handle scale is that HTTP is a stateless protocol. Statelessness keeps the protocol simple and
light, which helps it to scale. Another aspect of HTTP is that, since its request and
response messages are designed to carry many meaningful header lines, it has been possible to add
functionality required by the applications around it simply by adding the necessary headers.
We will now look at two such functionalities that have been added to enhance the
performance of web servers using HTTP.
4.1 Cookies
All of us are familiar with the term 'cookies'. We often encounter a message in the browser
saying that cookies will be installed on our machine. But what exactly is a cookie, and how does it
work?
A cookie is an identifier maintained in a file at the browser, and is used
by servers to track the user. As you can guess, this is a mechanism through which the web
server can identify the user so that it can maintain some state about his/her behavior.
Since HTTP is stateless, it just passes every request to the server program. But the server
can maintain some information about these requests in a back-end server, and
use this information to track the user. For this, however, it needs to identify requests as
coming from the same user, without the user having to log in. It is for this identification that
cookies are used. Many websites use cookies.
To understand this further, let us look at how this works. When a user visits a web-site for
the first time, i.e., the browser sends an HTTP request to that server, the server allocates an identity
to this user, and saves the user's usage, behavior pattern or any information regarding the user's
transactions in a back-end database. When it sends back the HTTP response, it sends
this id as part of the header using a header line called "Set-cookie". The browser, on
receiving this, saves the id and the corresponding web-site details in a cookie file. The next
time the same web-site is to be accessed, the browser first checks the cookie file to see if
any cookie information has been stored for that site. If so, it sends this cookie id
along with the HTTP request message as part of its header, using a header line called
"cookie". The server, on receiving this, checks its database for this id, and uses the
corresponding information to better serve this user.
Let us look at a typical transaction using cookies (Fig. 4.1). The cookie file is just a text file
which stores the web-site URL and the corresponding cookie-id. Assume that we already
have an entry for the 'ebay' website (cookie value 8734) on the client's browser. Now when
the client accesses the 'Amazon' website for the first time, the Amazon server creates an ID
for this user (say 1678), and stores it in a back-end database. When it sends the HTTP
response message to the client, it adds a "set-cookie: 1678" header line to the
response. The client browser, on seeing this header, adds an entry in the cookie file for
the Amazon site. When the client accesses the same website the next time, the browser
sends this information as part of its header, "cookie: 1678". The server then consults its
back-end database, uses this information to identify the user, and performs user-specific
actions. Subsequent requests sent to the same website also carry this
information, and the user continues to get user-specific responses, even a week later.
Figure 4.1 An example of cookies in action
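The exchange in Fig. 4.1 can be sketched in a few lines of Python. This is only an illustrative model, not how any real browser or server is implemented: the class names ToyServer and ToyBrowser are made up for this example, and, like the text above, it sends the bare id in the header (real Set-Cookie headers carry name=value pairs and attributes).

```python
import itertools


class ToyServer:
    """Hypothetical origin server that assigns an id on first contact."""

    def __init__(self, first_id):
        self._next_id = itertools.count(first_id)
        self.backend = {}  # back-end database: cookie id -> paths requested

    def handle(self, path, request_headers):
        cookie = request_headers.get("Cookie")
        response_headers = {}
        if cookie is None or cookie not in self.backend:
            # First visit: allocate an id and tell the browser to store it.
            cookie = str(next(self._next_id))
            self.backend[cookie] = []
            response_headers["Set-Cookie"] = cookie
        # Stateful bookkeeping happens here, outside HTTP itself.
        self.backend[cookie].append(path)
        return response_headers


class ToyBrowser:
    """Hypothetical browser holding a cookie file: site -> cookie id."""

    def __init__(self):
        self.cookie_file = {}

    def get(self, site, server, path):
        request_headers = {}
        if site in self.cookie_file:
            request_headers["Cookie"] = self.cookie_file[site]
        response_headers = server.handle(path, request_headers)
        if "Set-Cookie" in response_headers:
            self.cookie_file[site] = response_headers["Set-Cookie"]
        return request_headers, response_headers


amazon = ToyServer(first_id=1678)
browser = ToyBrowser()
browser.cookie_file["ebay"] = "8734"  # entry that already exists

first = browser.get("amazon", amazon, "/index.html")
second = browser.get("amazon", amazon, "/cart.html")
print(first)   # request had no cookie, response sets one
print(second)  # request carries the cookie, response sets nothing
```

Note that HTTP itself stays stateless in this sketch: all the state lives in the server's back-end dictionary and in the browser's cookie file.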

Cookies are used for many purposes. A website could use them for authentication. If a user is
logged into a secure area of the website, information related
to the login can be kept in a cookie. The browser can send this information automatically to
the site whenever it is accessed, so that the user need not enter it again and
again when accessing the site.
Cookies can be used to keep track of sessions ("session cookies", as they are called). These can
be used to keep track of the various page activities performed by a user so that the user
can resume from where he/she left off.
They can be used by e-commerce websites to keep track of users' purchases with shopping
carts. The user may not choose all that he/she wants to buy in one visit to the web-site.
So, a cookie may be used to track the items being added to the cart over multiple visits to
the site.
Cookies are also used by ad-sites to track user preferences and behavior. Similarly, many
websites allow users to customize the presentation or layout of the page. Such
preferences can be remembered with the help of cookies.
While there are many such uses, there is also a downside to the use of cookies that the user
should be aware of. Cookies can be a source of security and privacy concerns. Since
cookies keep track of behavior across various websites, they can be used as a form of
spyware. They are not as malicious as viruses (they do not execute code), but they can leak
a lot of information about the user's web behavior and interactions.

4.2 Web caches


The second important feature used with the HTTP protocol is the idea of web caches and
proxy servers. This feature, too, addresses a consequence of the statelessness of HTTP. If the
same webpage is requested by a client multiple times, it will be sent back every time as
per the HTTP protocol. Obviously this is a waste of time and bandwidth, especially if the
content has not changed (static content). One way of handling this is by adding some
intelligence at the browser, wherein the browser maintains a local cache for such
web-pages, and displays the content from the cache itself instead of fetching it from the
server. Of course, the browser must know that the content has not changed. We need a
mechanism to get that information.
Now consider another scenario: an institution or enterprise with multiple
users. Many of them may be requesting the same page, yet a separate request goes to the server
for the same content each time. Here again there is an opportunity for optimization, if we
could cache the content at some common location that is closer to the users at the
institution. We could have a machine at the edge of the institution which caches the
content received in responses. It could then intercept all HTTP requests going out of the
institution and, if it already has the content, serve the request from the cache itself,
without having to go to the origin server. In other words, it can proxy for the server. We call
such devices proxy servers (Fig. 4.2).
Proxy servers are deployed in many networks to reduce bandwidth usage and to get
quicker responses. Getting a response from the proxy will usually be much faster
than getting it from the origin server, which could be many hops away in the Internet. Again,
we need a mechanism for the proxy server to know that the content has not changed.
HTTP provides such a mechanism by means of a corresponding pair of headers in
the request and response messages. HTTP allows for what is called a "Conditional
GET": a GET message sent with the header line "If-modified-since: <date>". This
header line tells the server to send the object only if the
object has been modified since the <date> specified. This date information is stored
along with the content in the cache at the proxy server. Note that when any object is sent
by the server, the date of the response (the Date header) as well as the date on which the
object was last updated (the Last-Modified header) are sent as part of the header. So the proxy
can store this information, and later ask the server to send the object only if it has been
modified since then.
If it has not been modified, the server responds with a status message 'HTTP/1.0 304 Not
Modified'. Now, the proxy knows that its content is up-to-date and can service the request
from its cache itself. If it is modified, the server will anyway send back the complete file
with an OK status, and the proxy can now cache this new content.
Figure 4.2 Proxy server using web caching
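The conditional GET logic described above can be sketched as follows. This is a simplified in-memory model under stated assumptions, not a real proxy: ToyOrigin and ToyProxy are hypothetical names, no network is involved, and the proxy revalidates on every request (real caches also use freshness lifetimes). The only library calls used are the standard email.utils helpers for HTTP-style date strings.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


class ToyOrigin:
    """Hypothetical origin server holding objects and their update times."""

    def __init__(self):
        self.objects = {}        # path -> (body, last_modified datetime)
        self.full_responses = 0  # how many times a full object was sent

    def put(self, path, body, last_modified):
        self.objects[path] = (body, last_modified)

    def get(self, path, if_modified_since=None):
        body, last_modified = self.objects[path]
        if if_modified_since is not None:
            since = parsedate_to_datetime(if_modified_since)
            if last_modified <= since:
                return 304, {}, b""  # Not Modified: headers only, no body
        self.full_responses += 1
        headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
        return 200, headers, body


class ToyProxy:
    """Hypothetical web cache that revalidates its copy on each request."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}  # path -> (body, Last-Modified header value)

    def get(self, path):
        cached = self.cache.get(path)
        since = cached[1] if cached else None
        status, headers, body = self.origin.get(path, since)
        if status == 304:
            return cached[0], "validated cache copy"
        self.cache[path] = (body, headers["Last-Modified"])
        return body, "fetched from origin"


origin = ToyOrigin()
origin.put("/a.html", b"v1", datetime(2014, 9, 1, tzinfo=timezone.utc))
proxy = ToyProxy(origin)

r1 = proxy.get("/a.html")  # cache miss: full object fetched
r2 = proxy.get("/a.html")  # conditional GET answered with 304
origin.put("/a.html", b"v2", datetime(2014, 9, 7, tzinfo=timezone.utc))
r3 = proxy.get("/a.html")  # object changed: full object sent again
print(r1, r2, r3, origin.full_responses)
```

The second request still costs a round trip to the origin, but the 304 reply carries no object, which is where the bandwidth saving comes from.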
Now what is the performance benefit that we get by using a proxy server? For an
institution, it helps to improve the response time, even without having to use high
bandwidth links. That is, it provides a low-cost solution for faster response times. Let us
consider an example to understand this benefit.
Example 4.1: Consider an institutional LAN of 1 Gbps connected to the Internet through
an access link of 1.54 Mbps as shown in Fig. 4.3(a). Assume that the average size of
objects being retrieved by the nodes is 100 Kbits, and that the average request rate is 15
requests/sec from browsers to origin servers. Let the RTT from the institutional router to any
origin server be 2 secs. Calculate the LAN utilization and access link utilization (a) without
and (b) with a web cache.

Figure 4.3 Web caching example (a) Without web cache (b) with web cache
(a) Without web cache:
• Data rate required for object retrieval per node = 15 requests/sec * 100 Kbits/request = 1.5 Mbps
• LAN utilization = 1.5 Mbps / 1 Gbps = 0.15%
• Access link utilization = 1.5 Mbps / 1.54 Mbps = 97%
• Total delay = Internet delay + access delay + LAN delay = 2 sec + minutes + usecs
Even a single client can clog the access link: at 15 requests/sec it offers 1.5 Mbps, and the
1.54 Mbps link can carry only about 15 such objects per second. As the utilization approaches 1,
the queue at the access link grows, and the access delay can run to minutes depending on the
number of clients. With 10 clients (at 15 requests/sec/client), the offered load is about ten
times the link capacity, so requests queue up faster than they can be served; with about 100
clients the delays would be far worse.
One solution would be to use a higher-bandwidth access link, but that would be an
expensive solution.
(b) Instead, let us use a proxy server (web cache) at the edge of the institutional
network and look at how the total delay decreases. Assume that we have
a hit rate of 40% in the proxy's cache. That is, 40% of the requests are served by
the cache, and 60% have to go to the origin server.
Access link utilization
• 60% of requests use the access link
• Data rate to browsers over access link = 0.6 * 1.5 Mbps = 0.9 Mbps
• Access link utilization = 0.9 / 1.54 = 0.58 (58%)
Total delay
• Total delay = 0.6 * (delay from origin servers) + 0.4 * (delay when satisfied at cache)
• = 0.6 (2.01 secs) + 0.4 (~msecs)
• = ~1.2 secs
We can see from this example, that use of cache gives a significant improvement in
performance even for a 40% hit rate. Another advantage of using cache is that the load
at the server is also reduced.
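The arithmetic of Example 4.1 can be checked with a short script. The input values are those given in the example; the 10 ms figure used for a cache hit is an assumption standing in for the "~msecs" term, and the variable names are chosen only for this sketch.

```python
# Inputs taken from Example 4.1
OBJECT_BITS = 100e3        # average object size: 100 Kbits
REQUESTS_PER_SEC = 15      # average request rate
LAN_BPS = 1e9              # institutional LAN: 1 Gbps
ACCESS_BPS = 1.54e6        # access link: 1.54 Mbps
HIT_RATE = 0.4             # fraction of requests served by the cache
MISS_DELAY_S = 2.01        # delay when the origin server must be contacted
HIT_DELAY_S = 0.01         # ASSUMPTION: 10 ms stands in for "~msecs"

offered_bps = REQUESTS_PER_SEC * OBJECT_BITS

# (a) Without a web cache
lan_utilization = offered_bps / LAN_BPS
access_utilization = offered_bps / ACCESS_BPS

# (b) With a web cache: only misses cross the access link
access_utilization_cached = (1 - HIT_RATE) * offered_bps / ACCESS_BPS
average_delay_s = (1 - HIT_RATE) * MISS_DELAY_S + HIT_RATE * HIT_DELAY_S

print(f"offered load           : {offered_bps / 1e6:.2f} Mbps")
print(f"LAN utilization        : {lan_utilization:.2%}")
print(f"access link (no cache) : {access_utilization:.1%}")
print(f"access link (cache)    : {access_utilization_cached:.1%}")
print(f"average delay (cache)  : {average_delay_s:.2f} s")
```

Changing HIT_RATE shows how sensitive both the access link utilization and the average delay are to the cache hit rate.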
Hence, caches and proxy servers are regularly deployed by most institutions and by ISPs at
various levels. Taking this idea further, many web caches are
deployed at many places in the network and work in a cooperative fashion. With
cooperating caches, the clients first check with these caches; the caches talk to each other
and try to service the request, thereby reducing the number of requests going to the origin
server.
It is interesting to note that web caches act as both servers and clients. A cache acts as a
server if the object is available in the cache. If it is not in the cache, the cache acts as a client to
contact another cache or the origin server. Another advantage of web caches is that an
Internet dense with caches enables even "poor" content providers to deliver
content effectively.
4.3 Electronic Mail - SMTP
The next application we consider is email and the Simple Mail Transfer Protocol (SMTP)
used by this application. As the name says, it is a simple protocol where messages -
commands and responses - are exchanged by mail servers. User agents facilitate access
to the mail servers. Thus, an email application has three components (Fig. 4.4).
• User agents
• Mail servers, and the
• Simple mail transfer protocol: SMTP.
In addition to these, there is a set of protocols used by the user agents to interact with
the servers. User agents send mail messages to the servers using SMTP, and receive or
retrieve messages from the servers using protocols such as POP3 or IMAP. We will
look at the details of these protocols later.

Figure 4.4 Components of e-mail


4.3.1 User Agents and Mail Servers
The user agent, also known as the mail reader, is an application that allows you to
compose, edit, read and in general manage mail messages. Microsoft Outlook and Mozilla
Thunderbird are examples of such agents. One can organize and manage mail messages
on the local machine using such agents.
Mail servers are responsible for storing the incoming and outgoing
messages. Incoming messages are stored in the user's mailbox, and outgoing (to be sent) mail
messages are stored in message queues at the servers. Two mail servers talk to each
other using the SMTP protocol.
4.3.2 SMTP details
When the SMTP protocol is used between mail servers to send email messages, the sending
mail server acts as the "client", and the receiving mail server is the "server". The details of
SMTP can be found in RFC 2821 (since superseded by RFC 5321). We will present an overview
of this protocol here. It
uses TCP to reliably transfer email messages from client to server, using the server port
number 25. There are three phases involved in the message transfer, namely,
I. handshaking (greeting)
II. transfer of messages, and
III. closure.
Handshaking is for the client and server to acknowledge each other and agree on the
message transfer. A command/response interaction takes place between the client and
server for this purpose. All commands are in the form of ASCII text, and the responses are in
the form of status codes and status phrases. Thus a transcript of an
exchange can be read and understood easily.
Let us consider the scenario shown in Fig. 4.5 where a user Alice sends a message to
Bob. As mentioned earlier, SMTP will be used by Alice's user agent to send the message
to her mail server. This mail server uses SMTP to transfer the message to Bob's mail
server. Bob's user agent then retrieves the message using a mail access protocol such as
POP or IMAP.

Figure 4.5 Scenario: Alice sends message to Bob


Following are the steps involved.
1) Alice uses her UA to compose a message and sends it "to" [email protected].
2) Alice's UA sends the message to her mail server, and the message is placed in the
message queue.
3) The client side of SMTP opens a TCP connection with Bob's mail server.
4) The SMTP client sends Alice's message over the TCP connection.
5) Bob's mail server places the message in Bob's mailbox.
6) Bob invokes his user agent to read the message.
An example transcript of the message exchange that takes place between the servers as
part of the SMTP protocol is shown in Fig. 4.6. The server and the client are denoted by S
and C respectively. As soon as the TCP connection is established, the server responds
with a 220 status code indicating its willingness to receive the message. The first few lines
before "DATA" show the commands used for the initial handshake. Agreement on who is
sending the mail and who is to receive it is part of this exchange. The actual message is
sent following the DATA command. The client marks the end of the message with the
character sequence CRLF.CRLF, that is, a dot on a line by itself. A QUIT command followed by a
corresponding status response completes the transaction. SMTP uses persistent TCP
connections: if the client has several messages for the same server, it can send them all over
the same connection, which is closed only after the QUIT exchange.

S: 220 hamburger.edu
C: HELO crepes.fr
S: 250 Hello crepes.fr, pleased to meet you
C: MAIL FROM: <[email protected]>
S: 250 [email protected]... Sender ok
C: RCPT TO: <[email protected]>
S: 250 [email protected] ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 hamburger.edu closing connection

Figure 4.6 Transcript of mail exchange using SMTP
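To make the command/response structure of Fig. 4.6 concrete, here is a minimal sketch of the server side as a small state machine. It is not a real mail server: the class ToySmtpServer is hypothetical, it works on strings rather than a TCP socket, it implements only the commands seen in the transcript, and the addresses alice@example.org and bob@example.com are placeholders chosen for this example.

```python
class ToySmtpServer:
    """Hypothetical in-memory model of the server side of Fig. 4.6."""

    def __init__(self, hostname):
        self.hostname = hostname
        self.in_data = False   # True while the message body is being received
        self.sender = None
        self.recipients = []
        self.lines = []
        self.mailbox = []      # delivered (sender, recipients, body) tuples

    def greeting(self):
        # Sent as soon as the TCP connection is established.
        return f"220 {self.hostname}"

    def handle(self, line):
        """Return the reply to one client line, or None while reading DATA."""
        if self.in_data:
            if line == ".":  # a dot on a line by itself ends the message
                self.in_data = False
                body = "\n".join(self.lines)
                self.mailbox.append((self.sender, list(self.recipients), body))
                self.sender, self.recipients, self.lines = None, [], []
                return "250 Message accepted for delivery"
            self.lines.append(line)
            return None
        verb = line.upper()
        if verb.startswith("HELO"):
            return f"250 Hello {line[5:].strip()}, pleased to meet you"
        if verb.startswith("MAIL FROM:"):
            self.sender = line[10:].strip()
            return "250 Sender ok"
        if verb.startswith("RCPT TO:"):
            if self.sender is None:
                return "503 Need MAIL before RCPT"
            self.recipients.append(line[8:].strip())
            return "250 Recipient ok"
        if verb == "DATA":
            if not self.recipients:
                return "503 Need RCPT before DATA"
            self.in_data = True
            return '354 Enter mail, end with "." on a line by itself'
        if verb == "QUIT":
            return f"221 {self.hostname} closing connection"
        return "502 Command not implemented"


server = ToySmtpServer("hamburger.edu")
session = [
    "HELO crepes.fr",
    "MAIL FROM: <alice@example.org>",
    "RCPT TO: <bob@example.com>",
    "DATA",
    "Do you like ketchup?",
    "How about pickles?",
    ".",
    "QUIT",
]
replies = [server.greeting()] + [server.handle(line) for line in session]
for reply in replies:
    if reply is not None:
        print("S:", reply)
```

Because the server keeps running after "250 Message accepted for delivery", a client could issue another MAIL FROM on the same connection before QUIT, which is what a persistent connection allows.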


Since SMTP uses a "." to indicate end of data, a quick question that could arise here is:
How can you send a message containing "." on a line by itself? The answer is
simple. The client sends a line containing ".." (2 periods, where one dot acts as an escape
character). The receiving agent removes one dot from the message.
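This escaping (often called dot-stuffing) can be sketched as a pair of helper functions. The function names are made up for this example; the rule they apply, adding a dot to every body line that starts with a dot and removing it at the receiver, is the general form of the ".." case described above.

```python
def dot_stuff(lines):
    """Sender side: prefix an extra '.' to every line that starts with '.'."""
    return [("." + line) if line.startswith(".") else line for line in lines]


def dot_unstuff(lines):
    """Receiver side: drop the leading '.' that the sender added."""
    return [line[1:] if line.startswith(".") else line for line in lines]


body = ["Here is a lone dot:", ".", "..and a line starting with two dots", "Bye."]
on_the_wire = dot_stuff(body)
received = dot_unstuff(on_the_wire)
print(on_the_wire)
print(received == body)
```

After stuffing, no body line consists of a single dot, so the terminating "." sent by the client cannot be confused with message content.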
A few commands used above such as HELO, MAIL FROM, RCPT TO, DATA, and QUIT
are self-explanatory. A few more commands and their purpose are given below in Table
4.1.
Table 4.1 Some SMTP Commands

Command   Explanation
VRFY      Confirm that a name is a valid recipient
EXPN      Expand an alias (group email address)
TURN      Switch roles (sender <==> receiver)
SOML      Send Or Mail: if recipient is logged in, display message on terminal,
          otherwise email
SAML      Send And Mail
NOOP      Send back a positive reply code
RSET      Abort current transaction

The SMTP server may not implement (or allow) some commands. If so, it returns a "502"
response: 502 5.5.1 Command not implemented: "SOML".

4.3.2.1 Playing SMTP client


One of the better ways of understanding SMTP, is to try out the SMTP interaction with an
SMTP server over a telnet connection. Since all SMTP messages are in 7-bit ASCII, it
becomes very easy to send them directly over a telnet session. That is, we are not using a
mail reader. The following is what you need to do.
I. telnet servername 25 (connect to the SMTP server running at port number 25 at the
'servername' )
II. see 220 reply from server
III. enter HELO, MAIL FROM, RCPT TO, DATA, QUIT commands in order as you
receive the corresponding status messages.
4.3.3 Mail message format
You will be very familiar with mail messages. The SMTP commands are not to be
confused with the header lines that appear in a mail message, even though they look similar.
The mail message has a header with lines such as
To:
From:
Subject:
followed by a blank line and then the body of the message in plain ASCII.
A sample message is shown in Table 4.2.
Table 4.2 A sample mail message

From [email protected] Tue Sep 7 15:13:27 2014
Return-Path: <[email protected]>
Date: Tue, 07 Sep 2014 13:46:35 +0700
From: Harry Hacker <[email protected]>
Subject: You just won one million dollars
To: Joe Victim <[email protected]>
Message-id: <[email protected]>
Content-Length: 423
Status: R

Congratulations. You just won one millions dollars!
To claim your prize please go to: http://www.gullible.com
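The split between header lines and body can be seen by parsing a message with Python's standard email package. The message below is a shortened variant of the sample; the addresses harry@example.org and joe@example.com are placeholders, since the addresses in the sample are not legible in this copy.

```python
from email import message_from_string
from email.utils import parseaddr

# Header lines, then a blank line, then the body.
raw = (
    "From: Harry Hacker <harry@example.org>\n"
    "To: Joe Victim <joe@example.com>\n"
    "Subject: You just won one million dollars\n"
    "\n"
    "Congratulations. You just won one million dollars!\n"
)

msg = message_from_string(raw)
subject = msg["Subject"]              # header lookup is case-insensitive
to_name, to_addr = parseaddr(msg["To"])
body = msg.get_payload()

print(subject)
print(to_name, to_addr)
print(body)
```

These header lines are part of the message itself and are sent after the DATA command; they are distinct from the MAIL FROM and RCPT TO commands of the SMTP dialogue.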

4.3.4 Comparison with HTTP


At this stage, a comparison of the SMTP protocol with the HTTP protocol is in
order. Both have ASCII command/response interaction and status codes. The main
difference, however, is that HTTP is a pull mechanism, whereas SMTP is a push
mechanism. The HTTP client predominantly pulls information from the server, whereas
the SMTP client pushes mail content to the server. In HTTP, each object is encapsulated
in its own response message, whereas in SMTP, multiple objects are sent in a single
multipart message. We will look at this later when we discuss the MIME (Multipurpose
Internet Mail Extensions) formats used for message transfer.
4.4 Summary
To summarize, in this session, we have discussed parts of two protocols - HTTP and
SMTP. With respect to HTTP we have seen the benefits and uses of Cookies, and
caching, and how they work. Similarly, we have seen the details (in terms of message
formats and exchanges) of the SMTP used for e-mail. We will be going into mail access
protocols such as POP3 and IMAP in the next session.
Thank you !

References
1. Jim Kurose, Keith Ross, Computer Networking: A Top-Down Approach, 6th edition,
Addison-Wesley, 2012.
2. Larry L. Peterson, Bruce S. Davie, Computer Networks: A Systems Approach, 5th edition,
Morgan Kaufmann, 2012.
