Web server

From Wikipedia, the free encyclopedia

Revision as of 18:35, 11 December 2021

PC clients communicating via network with a web server serving static content only.
The inside and front of a Dell PowerEdge server, a computer designed to be mounted in a rack-mount environment. It is often used as a server, in this case as a web server.
Multiple web servers may be used for a high-traffic website.
A web server farm with thousands of web servers, used for super-high-traffic websites.

A web server is computer software and underlying hardware that accepts requests via HTTP, the network protocol created to distribute web content, or its secure variant HTTPS. A user agent, commonly a web browser or web crawler, initiates communication by making a request for a web page or other resource using HTTP, and the server responds with the content of that resource or an error message. A web server can also accept and store resources sent from the user agent if configured to do so.[1][2]

The hardware used to run a web server can vary according to the volume of requests that it needs to handle. At the low end of the range are embedded systems, such as a printer that runs a small web server as its configuration interface. A high-traffic Internet website might handle requests with hundreds of servers that run on racks of high-speed computers.

A resource sent from a web server can be a preexisting file (static content) available to the web server, or it can be generated at the time of the request (dynamic content) by another program that communicates with the server software. The former usually can be served faster and can be more easily cached for repeated requests, while the latter supports a broader range of applications.

Technologies such as REST and SOAP, which use HTTP as a basis for general computer-to-computer communication, as well as support for WebDAV extensions, have extended the application of web servers well beyond their original purpose of serving human-readable pages.

History

The world's first web server, a NeXT Computer workstation with Ethernet, 1990. The case label reads: "This machine is a server. DO NOT POWER IT DOWN!!"
Sun's Cobalt Qube 3 – a computer server appliance (2002, discontinued)

In March 1989 Sir Tim Berners-Lee proposed a new project to his employer CERN, with the goal of easing the exchange of information between scientists by using a hypertext system.[3][4] The following year Berners-Lee introduced the first web server, CERN httpd, and the first browser, WorldWideWeb, both of which ran on NeXT workstations.

In the mid-1990s, the National Center for Supercomputing Applications developed NCSA httpd, a server that ran on a variety of Unix-based operating systems and could serve dynamically generated content. These capabilities, along with the multimedia features of NCSA's Mosaic browser, highlighted the potential of web technology for publishing and distributed computing applications. The introduction of new servers continued through the 1990s and beyond, with Netscape Enterprise Server and Microsoft's Internet Information Services (IIS) among the leading commercial options. The freely available and open-source Apache HTTP server held the lead as the preferred server from 1997 to 2012, after which it declined amid competition from IIS and the likewise open-source Nginx server, which addressed some of its performance limitations.

Although each web server offers its own particular set of features, the HTTP and HTTPS protocols used to communicate with them have been maintained collaboratively as open standards through the Internet Engineering Task Force.

Technical overview

The following technical overview is only a brief survey of some features that may be implemented by a web server and some of the tasks that it may perform.

A web server program plays the role of a server in a client–server model by implementing one or more versions of the HTTP protocol, often including the HTTPS secure variant and other features and extensions that are considered useful for its planned usage.

The complexity and the efficiency of a web server program may vary considerably depending on, for example:[1]

  • the common features implemented;
  • the common tasks performed;
  • the performance and scalability level aimed for;
  • the software model and techniques adopted to reach the desired performance and scalability level;
  • the target hardware and category of usage, e.g. embedded system, low- to medium-traffic web server, high-traffic Internet web server.

Common features

Although web server programs differ in how they are implemented, most of them offer the following basic features.

  • Static content serving: the ability to serve static content (web files) to clients via the HTTP protocol.
  • HTTP: support for one or more versions of the HTTP protocol, in order to send versions of HTTP responses compatible with the versions of client HTTP requests, e.g. HTTP/1.0, HTTP/1.1 (possibly also over encrypted HTTPS connections), plus, if available, HTTP/2 and HTTP/3.
  • Logging: the capability of logging information about client requests and server responses to log files, for security and statistical purposes.
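The three basic features above can be illustrated with a minimal sketch built on Python's standard http.server module; the port, directory, and class name are illustrative, not any particular server's configuration.

```python
# Minimal sketch: static file serving (GET/HEAD come from
# SimpleHTTPRequestHandler), the HTTP protocol, and request logging.
from http.server import HTTPServer, SimpleHTTPRequestHandler

class LoggingStaticHandler(SimpleHTTPRequestHandler):
    def log_message(self, fmt, *args):
        # Print each request line and status code (the "logging" feature).
        print("%s - %s" % (self.address_string(), fmt % args))

def make_server(port=8080, directory="."):
    # The handler serves files found under `directory` to HTTP clients.
    handler = lambda *a, **kw: LoggingStaticHandler(*a, directory=directory, **kw)
    return HTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server().serve_forever()` would serve the current directory on localhost; a production server would add configuration, security checks, and far more robust connection handling.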

More advanced features, such as dynamic content serving and URL rewriting, are also popular; some of them are described in the sections below.

Common tasks

A web server program, when it is running, usually performs several general tasks, for example:[1]

  • starts up, optionally reads and applies settings found in its configuration file(s) or elsewhere, optionally opens the log file, starts listening for client connections / requests;
  • optionally tries to adapt its general behavior according to its settings and its current load conditions (number of client connections, etc.);
  • manages client connection(s) (accepting new ones or closing existing ones as required);
  • receives client requests (by reading HTTP messages);
  • executes or refuses each requested HTTP method;
  • replies to client requests by sending proper HTTP responses (e.g. requested resources or error messages), possibly verifying or adding HTTP headers to those sent by dynamic programs / modules;
  • optionally logs client requests and/or its responses (e.g. all of them, only errors, etc.);
  • optionally generates statistics about the web traffic managed and/or its performance.

Read request message

Web server programs are able:[5][6][7]

  • to read an HTTP request message;
  • to interpret it;
  • to verify its syntax;
  • to identify known HTTP headers and to extract their values.

Once a request message has been decoded and verified, its values can be used to determine whether that request can be satisfied or not; many other steps, including security checks, are then performed to do so.
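The reading steps above can be sketched with a toy parser; this is a deliberately simplified illustration (a real server handles folded headers, size limits, malformed input, and more).

```python
# Toy illustration: read an HTTP request message, check its syntax,
# and extract known header names and values.
def parse_request(raw: bytes):
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ")      # request line
    if not version.startswith("HTTP/"):
        raise ValueError("malformed request line")     # syntax check
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header values
    return method, target, version, headers
```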

URL normalization

Web server programs usually perform some type of URL normalization (on the URL found in most HTTP request messages) in order:

  • to make the resource path always a clean, uniform path from the root directory of the website;
  • to lower security risks (e.g. by more easily intercepting attempts to access static resources outside the root directory of the website, or to access portions of the path below the website root directory that are forbidden or require authorization);
  • to make the paths of web resources more recognizable to human beings, log analyzers, etc.

The term URL normalization refers to the process of modifying and standardizing a URL in a consistent manner. There are several types of normalization that may be performed, including conversion of the URL's domain name to lowercase, removal of "." and ".." path segments, adding a trailing slash to a non-empty path component, etc.
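A few of the normalizations just mentioned can be sketched as follows; this is a simplified illustration (the full rules are defined in RFC 3986), and the function name is ours.

```python
# Sketch of URL normalization: lowercase the scheme and host,
# remove "." and ".." path segments, keep any trailing slash.
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    parts = urlsplit(url)
    segments = []
    for seg in parts.path.split("/"):
        if seg in ("", "."):
            continue                  # drop empty and "." segments
        if seg == "..":
            if segments:
                segments.pop()        # never climb above the root
        else:
            segments.append(seg)
    path = "/" + "/".join(segments)
    if parts.path.endswith("/") and not path.endswith("/"):
        path += "/"                   # preserve a directory's final slash
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, parts.query, parts.fragment))
```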

URL mapping

"URL mapping is the process by which a URL is analyzed to figure out what resource it is referring to, so that that resource can be returned to the requesting client. This process is performed with every request that is made to a web server, with some of the requests being served with a file, such as an HTML document, or a gif image, others with the results of running a CGI program, and others by some other process, such as a built-in module handler, a PHP document, or a Java servlet."[8]

In practice, web server programs that implement advanced features beyond simple static content serving, e.g.:

  • a URL rewrite engine;
  • dynamic content serving;
  • etc.;

usually have to figure out how that URL has to be handled, e.g.:

  • as a URL redirection, a redirection to another URL;
  • as a static request for file content;
  • as a dynamic request for:
    • a directory listing of the files or other sub-directories contained in that directory;
    • some other type of dynamic content, requiring identification of the program / module processor able to handle that kind of URL path and passing it the other URL parts, i.e. usually path-info and query string variables;
  • etc.

One or more configuration files of the web server may specify the mapping of parts of the URL path (including the filename extension, etc.) to a specific URL handler (file, directory, program, module, etc.).[9]

When a web server implements one or more of the above-mentioned advanced features, the path part of a valid URL may not always match an existing file system path under the website's directory tree (a file or a directory in the file system), because it can refer to a virtual name of an internal or external module processor for dynamic requests.
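A URL-mapping step can be sketched as a small rule table; the rules and handler names below are purely illustrative and do not reflect any particular server's configuration syntax.

```python
# Hypothetical URL mapping: configuration rules associate URL path
# prefixes with handler kinds, as described above.
def map_url(path: str) -> str:
    rules = [                       # illustrative configuration
        ("/old-docs/", "redirect"),
        ("/cgi-bin/", "dynamic"),
    ]
    for prefix, kind in rules:
        if path.startswith(prefix):
            return kind
    if path.endswith("/"):
        return "directory-listing"  # virtual resource, no file behind it
    return "static"                 # default: a file under the root dir
```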

URL path translation to file system

Web server programs are able to translate a URL path (all or part of it) that refers to a physical file system path into an absolute path under the target website's root directory.[9]

The website's root directory may be specified by a configuration file or by some internal rule of the web server, using the name of the website, which is the host part of the URL found in the HTTP client request.[9]

Path translation to the file system is done for the following types of web resources:

  • a local, usually non-executable, file (static request for file content);
  • a local directory (dynamic request: directory listing generated on the fly);
  • a program name (dynamic request executed using a CGI or SCGI interface, whose output is read by the web server and resent to the client that made the HTTP request).

The web server appends the path found in the requested URL (HTTP request message) to the path of the website's (Host) root directory. On an Apache server, this is commonly /home/www/website (on Unix machines, usually it is /var/www/website). See the following examples of how the translation may result.

URL path translation for a static file request

Example of a static request of an existing file specified by the following URL:

http://www.example.com/path/file.html

The client's user agent connects to www.example.com and then sends the following HTTP/1.1 request:

GET /path/file.html HTTP/1.1
Host: www.example.com
Connection: keep-alive

The result is the local file system resource:

/home/www/www.example.com/path/file.html

The web server then reads the file, if it exists, and sends a response to the client's web browser. The response will describe the content of the file and contain the file itself, or an error message will be returned saying that the file does not exist or that access to it is forbidden.

URL path translation for a directory request (without a static index file)

Example of an implicit dynamic request of an existing directory specified by the following URL:

http://www.example.com/directory1/directory2/

The client's user agent connects to www.example.com and then sends the following HTTP/1.1 request:

GET /directory1/directory2/ HTTP/1.1
Host: www.example.com
Connection: keep-alive

The result is the local directory path:

/home/www/www.example.com/directory1/directory2/

The web server then verifies the existence of the directory; if it exists and can be accessed, the server tries to find an index file (which in this case does not exist), so it passes the request to an internal module or a program dedicated to directory listings, reads the generated output, and sends a response to the client's web browser. The response will describe the content of the directory (the list of contained subdirectories and files), or an error message will be returned saying that the directory does not exist or that access to it is forbidden.

URL path translation for a dynamic program request

For a dynamic request, the URL path specified by the client should refer to an existing external program (usually an executable invoked via CGI) used by the web server to generate dynamic content.[10]

Example of a dynamic request using a program file to generate output:

http://www.example.com/cgi-bin/forum.php?action=view&orderby=thread&date=2021-10-15

The client's user agent connects to www.example.com and then sends the following HTTP/1.1 request:

GET /cgi-bin/forum.php?action=view&orderby=thread&date=2021-10-15 HTTP/1.1
Host: www.example.com
Connection: keep-alive

The result is the local file path of the program (in this example a PHP program):

/home/www/www.example.com/cgi-bin/forum.php

The web server executes that program, passing it the path-info and the query string action=view&orderby=thread&date=2021-10-15 so that the program knows what to do (in this case, to return an HTML document containing a view of forum entries ordered by thread since October 15, 2021). The web server then reads the data sent by that external program and resends it to the client that made the request.
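The query string in the example above decodes into the name/value pairs the program receives; Python's standard urllib illustrates the split.

```python
# Decode the example query string into its name/value pairs.
from urllib.parse import parse_qs

query = "action=view&orderby=thread&date=2021-10-15"
params = {k: v[0] for k, v in parse_qs(query).items()}
```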

Manage request message

Once a request has been read, interpreted, and verified, it has to be managed depending on its method, its URL, and its parameters, which may include the values of HTTP headers.

In practice, the web server has to handle the request by using one of these response paths:[9]

  • if something in the request was not acceptable (in the status line, etc.), the web server has already sent an error response;
  • if the request has a method (e.g. OPTIONS) that can be satisfied by the general code of the web server, a successful response is sent;
  • if the URL requires authorization, an authorization error message is sent;
  • if the URL maps to a redirection, a redirect message is sent;
  • if the URL maps to a dynamic resource (a virtual path or a directory listing), its handler (an internal module or an external program) is called and the request parameters (query string, path info, etc.) are passed to it so that it can reply to the request;
  • if the URL maps to a static resource (usually a file on the file system), the internal static handler is called to send that file;
  • if the request method is not known, or some other unacceptable condition exists (e.g. resource not found, internal server error, etc.), an error response is sent.
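The response paths above can be sketched as a toy dispatcher returning HTTP status codes; the rule sets (redirects, protected paths, known files) are illustrative data, not any real server's API.

```python
# Toy request dispatcher following the response paths listed above.
def handle(method: str, path: str) -> int:
    redirects = {"/directory1/directory2": "/directory1/directory2/"}
    protected = {"/private/"}
    static_files = {"/path/file.html"}
    if method not in ("GET", "HEAD", "OPTIONS", "POST"):
        return 501                  # unknown method: error response
    if method == "OPTIONS":
        return 204                  # satisfied by general server code
    if any(path.startswith(p) for p in protected):
        return 401                  # authorization error message
    if path in redirects:
        return 301                  # redirect message
    if path in static_files:
        return 200                  # static handler sends the file
    return 404                      # resource not found
```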

Serve static content

PC clients communicating via network with a web server serving static content only.

If a web server program is capable of serving static content and has been configured to do so, then it can send file content whenever a request message has a valid URL path that matches (after URL mapping, URL translation and URL redirection) the path of an existing file under the root directory of a website, and the file's attributes match the internal rules of the web server program.[9]

That kind of content is called static because it usually is not changed by the web server when it is sent to clients, and because it remains the same until it is modified (file modification) by some program.

NOTE: when serving static content only, a web server program usually does not change the file contents of the served websites (they are only read, never written), so it suffices to support only these HTTP methods:

  • OPTIONS
  • HEAD
  • GET

Responses containing static file content can be sped up by a file cache.

Directory index files

If a web server program receives a client request message with a URL whose path matches that of an existing directory, the directory is accessible, and serving directory index file(s) is enabled, then the web server may try to serve the first of the known (or configured) static index file names (a regular file) found in that directory; if no index file is found, or other conditions are not met, an error message is returned.

Typical names for static index files are: index.html, index.htm, Default.htm, etc.
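The index-file lookup just described amounts to trying each configured name in order; a minimal sketch, with the typical names above as the assumed configuration:

```python
# Try each configured index name in order; return the first regular
# file found in the directory, or None if there is no index file.
import os

INDEX_NAMES = ["index.html", "index.htm", "Default.htm"]

def find_index(directory: str):
    for name in INDEX_NAMES:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```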

Regular files

If a web server program receives a client request message with a URL whose path matches the name of an existing file, the file is accessible by the web server program, and its attributes match the internal rules of the web server program, then the web server can send that file to the client.

Usually, for security reasons, most web server programs are pre-configured to serve only regular files and to avoid special file types (such as device files), along with symbolic links or hard links to them, in order to avoid potentially undesirable side effects when serving static web resources.[citation needed]

Serve dynamic content

PC clients communicating via network with a web server serving static and dynamic content.

If a web server program is capable of serving dynamic content and has been configured to do so, then it can communicate with the proper internal module or external program (associated with the requested URL path) and pass it the parameters of the client request; the web server then reads the generated data response (often produced on the fly) and resends it to the client program that made the request.[citation needed]

NOTE: when serving static and dynamic content, a web server program usually also has to support the following HTTP method in order to safely receive data from clients, and so to be able to host websites with interactive forms that may send large data sets (e.g. lots of data entry, file uploads, etc.) to the web server and its external programs / modules:

  • POST

In order to communicate with its internal modules and/or external programs, a web server program must implement one or more of the many available gateway interfaces (see also Web Server Gateway Interfaces used for dynamic content).

The three standard and historical gateway interfaces are the following.

CGI
An external CGI program is run by the web server program for each dynamic request; the web server then reads the generated data response and resends it to the client.
SCGI
An external SCGI program (usually a long-running process) is started once by the web server program or by some other program / process and then waits for network connections; every time there is a new request for it, the web server program makes a new network connection to it in order to send the request parameters and read its data response, then the connection is closed.
FastCGI
An external FastCGI program (usually a long-running process) is started once by the web server program or by some other program / process and then waits for a network connection which is established permanently by the web server; the request parameters are sent and the data responses are read through that permanent connection.
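A minimal CGI program can illustrate the first of these interfaces: the web server passes the request parameters through environment variables such as QUERY_STRING and relays whatever the program writes to standard output. The function below is a simplified sketch, not a complete CGI implementation.

```python
# Minimal CGI sketch: read the query string from the environment and
# emit a Content-Type header, a blank line, then the response body.
import os

def cgi_response() -> str:
    query = os.environ.get("QUERY_STRING", "")
    body = "<html><body>query was: %s</body></html>" % query
    return "Content-Type: text/html\r\n\r\n" + body
```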
Directory listings

Directory listing dynamically generated by a web server.

A web server program may be capable of managing the dynamic (on-the-fly) generation of a directory index listing files and sub-directories.[11]

If a web server program is configured to do so, a requested URL path matches an existing directory, access to it is allowed, and no static index file is found under that directory, then a web page (usually in HTML format) containing the list of files and/or subdirectories of the above-mentioned directory is dynamically generated (on the fly). If it cannot be generated, an error is returned.

Some web server programs allow the customization of directory listings by supporting a web page template (interpreted by the web server) and/or the usage of dynamic index programs such as CGIs, e.g. index.cgi.

Usage of dynamically generated directory listings is usually avoided, or limited to a few selected directories of a website, because such generation takes many more OS resources than sending a static index page.

The main usage of directory listings is to allow the download of files as they are (usually when their names, sizes, modification dates, etc. may change randomly or frequently), without requiring further information to be provided to the requesting user.[citation needed]
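An on-the-fly listing can be generated roughly as follows; this sketch produces a bare HTML page, whereas real servers add file sizes, dates, sorting options, and stricter escaping.

```python
# Generate a simple HTML directory listing on the fly.
import html, os

def listing_page(directory: str) -> str:
    rows = "".join(
        '<li><a href="%s">%s</a></li>' % (html.escape(name, quote=True),
                                          html.escape(name))
        for name in sorted(os.listdir(directory))
    )
    return "<html><body><h1>Index of %s</h1><ul>%s</ul></body></html>" % (
        html.escape(directory), rows)
```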

Program or module processing

An external program or an internal module (processing unit) can execute some sort of application function that may be used to get data from, or to store data to, one or more data repositories.[citation needed]

A processing unit can return any kind of web content, also by using data retrieved from a data repository.[citation needed]

In practice, whenever there is content that may vary depending on one or more parameters contained in the client request or in configuration settings, it is usually generated dynamically.

Send response message

Web server programs are able to send response messages as replies to client request messages.[5]

An error response message may be sent when a request message could not be successfully read, decoded, analyzed, or executed.[6]

NOTE: the following sections are reported only as examples to help understand what a web server, more or less, does; these sections are by no means exhaustive or complete.

Error message

A web server program may reply to a client request message with many kinds of error messages; these errors are mainly divided into two categories: HTTP client errors (4xx status codes) and HTTP server errors (5xx status codes).

When an error response / message is received by a client browser, and it is related to the main user request (e.g. a URL of a web resource such as a web page), then that error message is usually shown in the browser window.

URL authorization

A web server program may be able to verify whether the requested URL path:[14]

  • can be freely accessed by everybody;
  • requires user authentication (a request for user credentials, e.g. user name and password);
  • is forbidden to some or all kinds of users.

If the authorization / access rights feature has been implemented and enabled, and access to the web resource is not granted, then, depending on the required access rights, the web server program:

  • can deny access by sending a specific error message (e.g. access forbidden);
  • may deny access by sending a specific error message (e.g. access unauthorized) that usually forces the client browser to ask the human user for the required credentials; if authentication credentials are provided, the web server program verifies and accepts or rejects them.

URL redirection

A web server program may have the capability of performing URL redirections to new URLs (new locations), which consists in replying to a client request message with a response message containing a new URL suited to reach a valid or existing web resource (the client should then redo the request with the new URL).[15]

URL redirection is used:[15]

  • to fix a directory name by adding a final slash '/';[11]
  • to give a new URL for a no-longer-existing URL path, pointing to a new path where that kind of web resource can be found;
  • etc.

Example 1: a URL path points to a directory name but does not have a final slash '/', so the web server sends a redirect to the client to instruct it to redo the request with the fixed path name.[11]

From:
  /directory1/directory2
To:
  /directory1/directory2/

Example 2: a whole set of documents has been moved inside the website in order to reorganize their file system paths.

From:
  /directory1/directory2/2021-10-08/
To:
  /directory1/directory2/2021/10/08/

Example 3: a whole set of documents has been moved to a new website, and it is now mandatory to use secure HTTPS connections to access them.

From:
  http://www.example.com/directory1/directory2/2021-10-08/
To:
  https://docs.example.com/directory1/2021-10-08/

The above examples are only a few of the possible kinds of redirection.
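The trailing-slash fix of Example 1 can be sketched as a tiny response builder; only the status line and Location header are shown, and the function name is illustrative.

```python
# Build a 301 redirect for a directory path missing its final slash.
def redirect_directory(path: str):
    if not path.endswith("/"):
        new_path = path + "/"
        return ("HTTP/1.1 301 Moved Permanently\r\n"
                "Location: %s\r\n\r\n" % new_path)
    return None  # path already has a final slash: no redirect needed
```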

Successful message

A web server program is able to reply to a valid client request message with a successful message, optionally containing the requested web resource data.[16]

If web resource data is sent back to the client, it can be static or dynamic content depending on how it was retrieved (from a file or from the output of some program / module).

Content cache

In order to speed up web server responses by lowering average HTTP response times and the hardware resources used, many popular web servers implement one or more content caches, each specialized in a content category.[17][18]

Content is usually cached by its origin: static content (read from files) in a file cache, and dynamic content (generated output) in a dynamic cache, as described below.

File cache

Historically, static content found in files had to be retrieved from disks, which were very slow compared with RAM; for this reason, since early OSes, file cache sub-systems have been developed to speed up I/O operations on frequently accessed files.

As serving static files is, or at least was, considered a performance-critical task, using only the OS file cache may not always suffice to reach the desired maximum number of requests / responses per second under certain conditions; the problem has therefore been studied since the early years of web server development.[19][20]

In practice, many popular high-performance web servers nowadays also include their own second-level file cache, tailored to web server usage and using implementation-specific parameters.[21][22][23]

Dynamic cache

Dynamic content output by a module, an external program, etc. may not change very frequently (given unique keys / parameters), and so for a while (e.g. from one second to several hours or more) the resulting output can be cached in RAM or even on a fast disk.[24]

The typical usage of a dynamic cache is when a website has dynamic web pages about news, weather, images, maps, etc. that do not change frequently (e.g. every n minutes) and that are accessed by a huge number of clients per minute / hour; in those cases it is useful to return cached content (without calling the internal module or the external program) because clients often do not have an updated copy of the requested content in their browser caches.[25]

In most cases, however, such caches are implemented by external servers (e.g. a reverse proxy) or by storing dynamic data output in separate computers managed by specific applications (e.g. memcached), in order not to compete for hardware resources (CPU, RAM, disks) with the web server(s).[26][27]
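The core idea of a dynamic cache can be sketched as a time-to-live (TTL) cache: keep generated output for a while and call the module / program only on a miss. This in-process class is a sketch for brevity; as noted above, production setups often delegate this to an external cache instead.

```python
# Sketch of a TTL cache for dynamically generated content.
import time

class TtlCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}               # key -> (expiry time, value)

    def get_or_generate(self, key, generate):
        entry = self.store.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]           # fresh cached copy: no generation
        value = generate()            # call the module/program on a miss
        self.store[key] = (time.monotonic() + self.ttl, value)
        return value
```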

Kernel-mode and user-mode web servers

Web server software can either be incorporated into the OS and executed in kernel space, or be executed in user space (like other regular applications).

Web servers that run in kernel mode (usually called kernel-space web servers) can have direct access to kernel resources and can therefore be, in theory, faster than those running in user mode; however, there are disadvantages in running a web server in kernel mode, e.g. difficulties in developing (debugging) the software, and run-time critical errors that may lead to serious problems in the OS kernel.

Web servers that run in user mode have to ask the system for permission to use more memory or more CPU resources. Not only do these requests to the kernel take time, but they are not always satisfied, because the system reserves resources for its own usage and has the responsibility of sharing hardware resources with all the other running applications. Executing in user mode can also mean extra buffer copies, which are another limitation for user-mode web servers.

Nowadays almost all web server software is executed in user mode, because many of the above disadvantages have been overcome by faster hardware, new OS versions, much faster OS system calls and new web server software. See also comparison of web server software to discover which of them run in kernel mode and which in user mode (also referred to as kernel space or user space).

Performance

To improve the user experience on the client / browser side, a web server should reply quickly to client requests; unless the response is deliberately throttled by configuration for certain types of files (e.g. big files), the returned data should also be sent as fast as possible (high transfer speed).

In other words, a web server should always be very responsive, even under a high load of web traffic, in order to keep the user's total wait for a response (the sum of browser time, network time and web server response time) as low as possible.

Performance metrics

For web server software, the main key performance metrics (measured under varying operating conditions) usually include at least the following:[28] [29]

  • number of requests per second (RPS, similar to QPS), depending on HTTP version and configuration, type of HTTP requests, etc.;
  • number of connections per second (CPS), i.e. the number of connections per second accepted by the web server (useful when using HTTP/1.0 or HTTP/1.1 with a very low limit of requests / responses per connection, e.g. 1 to 20);
  • network latency plus response time for each new client request; benchmark tools usually report how many requests were satisfied within given time ranges (e.g. within 1 ms, 3 ms, 5 ms, 10 ms) and/or the shortest, average and longest response times;
  • throughput of responses, in bytes per second.
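As a small illustration, the metrics above can be derived from raw benchmark samples; in the following sketch every number is made up, and each sample is a pair (response time in seconds, response size in bytes).

```python
# Deriving RPS, response-time statistics and throughput from raw samples.
# The sample data and the 1-second test duration are purely illustrative.

samples = [(0.001, 512), (0.003, 2048), (0.002, 1024), (0.010, 4096)]
total_time = 1.0  # wall-clock duration of the test run, in seconds (assumed)

requests_per_second = len(samples) / total_time          # RPS
latencies = sorted(t for t, _ in samples)
shortest, longest = latencies[0], latencies[-1]          # best / worst case
average = sum(latencies) / len(latencies)                # mean response time
throughput_bytes_per_second = sum(b for _, b in samples) / total_time
```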

Among the operating conditions, the number of concurrent client connections used during a test is an important parameter, because it makes it possible to correlate the concurrency level supported by the web server with the results of the tested performance metrics.

Software efficiency

The specific software design and process model adopted by a web server, e.g.:

  • single process or multi-process;
  • single thread (no threads) or multi-thread for each process;
  • etc.;

along with the other programming techniques used to implement the program, can strongly affect its performance, and in particular the scalability level that can be reached under heavy load or on high-end hardware (many CPUs, disks, etc.).

In practice some web server software models may require more OS resources (more CPUs, more RAM, etc.) than others in order to work well and reach target performance.
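As a toy illustration of the multi-threaded model, Python's standard library offers a thread-per-connection HTTP server; the following sketch (not a production design) serves each connection in its own thread.

```python
import threading
import urllib.request
from http.server import ThreadingHTTPServer, BaseHTTPRequestHandler

# ThreadingHTTPServer spawns one thread per incoming connection: a simple
# example of the multi-thread-per-process model. Sketch only.

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging for this example

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
    data = resp.read()

server.shutdown()
```

A single-threaded design would instead use `HTTPServer` from the same module, handling one connection at a time.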

Operating conditions

There are many operating conditions that can affect the performance of a web server; performance values may vary depending on:

  • the settings of the web server (including whether log files are enabled or not, etc.);
  • the HTTP version used by client requests;
  • the average HTTP request type (method, length of HTTP headers and optional body, etc.);
  • whether the requested content is static or dynamic;
  • whether the content is cached or not cached (by server and/or by client);
  • whether the content is compressed on the fly (when transferred), pre-compressed (i.e. a file resource is stored on disk already compressed, so that the web server can send it directly to the network with only an indication that its content is compressed) or not compressed at all;
  • whether the connections are or are not encrypted;
  • the average network speed between web server and its clients;
  • the number of active TCP connections;
  • the number of active processes managed by the web server (including external CGI programs, etc.);
  • the hardware and software limitations or settings of the OS of the computer(s) on which the web server runs;
  • etc.

Benchmarking

Performances of a web server are typically benchmarked by using one or more of the available automated load testing tools.
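A minimal load-generation loop, in the spirit of such tools, might look like the following Python sketch; the "request" here is a stub sleep standing in for a real HTTP round trip, and the concurrency level and request count are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# A toy load generator: issue TOTAL_REQUESTS "requests" with a fixed
# concurrency level and record each response time. A real tool would perform
# actual HTTP requests instead of the stub sleep below.

def do_request(_):
    start = time.perf_counter()
    time.sleep(0.001)            # stand-in for a real HTTP round trip
    return time.perf_counter() - start

CONCURRENCY = 4
TOTAL_REQUESTS = 20
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = list(pool.map(do_request, range(TOTAL_REQUESTS)))

completed = len(latencies)
```

From the collected `latencies`, the metrics described under "Performance metrics" (RPS, shortest/average/longest response time, etc.) can then be computed.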

Load limits

A web server (program installation) usually has pre-defined load limits for each combination of operating conditions, both because it is limited by OS resources and because it can handle only a limited number of concurrent client connections (usually between 2 and several tens of thousands for each active web server process; see also the C10k problem and the C10M problem).

When a web server is near or over its load limits, it becomes overloaded and may become unresponsive.

Causes of overload

At any time web servers can be overloaded due to one or more of the following causes.

  • Excess legitimate web traffic. Thousands or even millions of clients connecting to the website in a short amount of time, e.g., Slashdot effect.
  • Distributed Denial of Service attacks. A denial-of-service attack (DoS attack) or distributed denial-of-service attack (DDoS attack) is an attempt to make a computer or network resource unavailable to its intended users.
  • Computer worms, which sometimes cause abnormal traffic because of millions of infected (uncoordinated) computers.
  • XSS worms can cause high traffic because of millions of infected browsers or web servers.
  • Internet bot traffic that is not filtered/limited on large websites with very few resources (bandwidth, etc.).
  • Internet (network) slowdowns (due to packet losses, etc.), so that client requests are served more slowly and the number of connections grows until server limits are reached.
  • Partial unavailability of web server computers. This can happen because of required or urgent maintenance or upgrades, or hardware or software failures such as back-end (e.g. database) failures; in these cases the remaining web servers may receive too much traffic and become overloaded.

Symptoms of overload

The symptoms of an overloaded web server are usually the following.

  • Requests are served with (possibly long) delays (from 1 second to a few hundred seconds).
  • The web server returns an HTTP error code, such as 500, 502,[30][31] 503,[32] 504,[33] 408, or even an intermittent 404.
  • The web server refuses or resets (interrupts) TCP connections before it returns any content.
  • In very rare cases, the web server returns only a part of the requested content. This behavior can be considered a bug, even if it usually arises as a symptom of overload.
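As an illustration of how the 503 symptom typically arises, a server may deliberately reject new requests once too many are already in flight. The following Python sketch shows the idea; the limit of 2 concurrent requests, the function names and the Retry-After value are all illustrative.

```python
import threading

# Reject requests with 503 Service Unavailable (plus a Retry-After hint)
# when the number of in-flight requests exceeds a configured limit,
# instead of queueing them indefinitely. Sketch only.

MAX_CONCURRENT = 2
_active = 0
_lock = threading.Lock()

def handle_request(process):
    global _active
    with _lock:
        if _active >= MAX_CONCURRENT:
            return 503, {"Retry-After": "5"}, b"Service Unavailable"
        _active += 1
    try:
        return 200, {}, process()   # run the actual request handler
    finally:
        with _lock:
            _active -= 1
```

With this scheme, clients receive a fast, explicit overload signal (and a hint about when to retry) rather than a slow or dropped connection.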

Anti-overload techniques

To partially overcome load limits and to prevent overload, most popular websites use common techniques like the following.

  • Tuning OS parameters for hardware capabilities and usage.
  • Tuning web server parameters to improve security, performance, etc.
  • Deploying web cache techniques (not only for static content but, whenever possible, for dynamic content too).
  • Managing network traffic, by using:
    • Firewalls to block unwanted traffic coming from bad IP sources or having bad patterns;
    • HTTP traffic managers to drop, redirect or rewrite requests having bad HTTP patterns;
    • Bandwidth management and traffic shaping, in order to smooth out peaks in network usage.
  • Using different domain names, IP addresses and computers to serve different kinds of content (static and dynamic). The aim is to separate big or huge files (download.*, a domain that might also be replaced by a CDN) from small and medium-sized files (static.*) and from the main dynamic site (www.*, where some content may be stored in a back-end database). The idea is to efficiently serve big or huge (10 – 1000 MB or more) files (maybe throttling downloads) and to fully cache small and medium-sized files, without affecting the performance of the dynamic site under heavy load, by using different settings for each group of web server computers, e.g.:
    • http://download.example.com
    • http://static.example.com
    • http://www.example.com
  • Using many web servers (computers) that are grouped together behind a load balancer so that they act or are seen as one big web server.
  • Adding more hardware resources (e.g. RAM, fast disks) to each computer.
  • Using more efficient computer programs for web servers (see also: software efficiency, etc.).
  • Using the most efficient Web Server Gateway Interface to process dynamic requests (spawning one or more external programs every time a dynamic page is retrieved kills performance).
  • Using other programming techniques and workarounds, especially if dynamic content is involved, to speed up HTTP responses (e.g. avoiding dynamic calls to retrieve objects, such as style sheets or images, that never change or change very rarely, by copying that content to static files once and then keeping those files synchronized with the dynamic content).
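A sketch of the "copy rarely changing dynamic output to static files" workaround mentioned above; the paths and the render function are illustrative.

```python
import pathlib
import tempfile

# Render a rarely changing dynamic object once, write it to a static file in
# the document root, and let the web server serve the file directly from then
# on. Re-run only when the underlying source changes. Sketch only.

def render_stylesheet():
    # stand-in for a database query or template engine call
    return "body { color: #333; }"

docroot = pathlib.Path(tempfile.mkdtemp())   # illustrative document root
static_file = docroot / "site.css"
static_file.write_text(render_stylesheet())  # regenerate only when sources change

# from now on the web server serves docroot/site.css as a plain static file
content = static_file.read_text()
```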
  • Using the latest, most efficient versions of HTTP (e.g. beyond the common HTTP/1.1, also enabling HTTP/2 and maybe HTTP/3, whenever the available web server software has reliable support for them), in order to greatly reduce the number of TCP/IP connections started by each client and the size of data exchanged (thanks to a more compact representation of HTTP headers, data compression, etc.). However, even if the newer HTTP/2 and HTTP/3 protocols usually generate less network traffic per request / response, they may require more OS resources (RAM and CPU) on the web server side (because of encrypted data, many stream buffers and other implementation details). Besides this, HTTP/2 and HTTP/3, depending also on the settings of the web server and the client program, may not be the best options for uploading big or huge files at very high speed, because their data streams are optimized for concurrency of requests; in many cases, using HTTP/1.1 TCP/IP connections may lead to better results / higher upload speeds (your mileage may vary).[34][35]

Market share

The LAMP software bundle (shown here with an additional Squid proxy), composed entirely of free and open-source software, is a high-performance, high-availability, heavy-duty solution for a hostile environment.

[Chart: Market share of all sites of major web servers, 2005–2021]


October 2021

Below are the statistics of the market share of all sites for the top web servers on the Internet, according to the Netcraft October 2021 Web Server Survey.

Product Vendor Percent
nginx NGINX, Inc. 34.95%
Apache Apache 24.63%
OpenResty OpenResty Software Foundation 6.45%
Cloudflare Server Cloudflare, Inc. 4.87%
IIS Microsoft 4.00% (*)
GWS Google 4.00% (*)

All other web servers are used by less than 22% of all websites.

NOTE: (*) percentage rounded to an integer, because decimal values are not publicly reported by the source page (only the rounded value is reported in the graph).

February 2021

Below are the statistics of the market share of all sites for the top web servers on the Internet, according to the Netcraft February 2021 Web Server Survey.

Product Vendor Percent
nginx NGINX, Inc. 34.54%
Apache Apache 26.32%
IIS Microsoft 6.5%
OpenResty OpenResty Software Foundation 6.36%
Cloudflare Server Cloudflare, Inc. 5.0%
GWS Google 3.90%

All other web servers are used by less than 18% of all websites.

February 2020

Below are the statistics of the market share of all sites for the top web servers on the Internet, according to the Netcraft February 2020 Web Server Survey.

Product Vendor Percent
nginx NGINX, Inc. 36.48%
Apache Apache 24.5%
IIS Microsoft 14.21%
OpenResty OpenResty Software Foundation 4.00%
GWS Google 3.18%
Cloudflare Server Cloudflare, Inc. 3.0%

All other web servers are used by less than 15% of all websites.

February 2019

Below are the statistics of the market share of all sites for the top web servers on the Internet, according to the Netcraft February 2019 Web Server Survey.

Product Vendor Percent
IIS Microsoft 28.42%
Apache Apache 26.16%
nginx NGINX, Inc. 25.34%
GWS Google 1.66%

All other web servers are used by less than 19% of all websites.

February 2018

Below are the statistics of the market share of all sites for the top web servers on the Internet, according to the Netcraft February 2018 Web Server Survey.

Product Vendor Percent
IIS Microsoft 34.50%
Apache Apache 27.45%
nginx NGINX, Inc. 24.32%
GWS Google 1.20%

All other web servers are used by less than 13% of all websites.

February 2017

Below are the statistics of the market share of all sites for the top web servers on the Internet, according to the Netcraft February 2017 Web Server Survey.

Product Vendor January 2017 Percent February 2017 Percent Change Chart color
IIS Microsoft 821,905,283 45.66% 773,552,454 43.16% −2.50 red
Apache Apache 387,211,503 21.51% 374,297,080 20.89% −0.63 black
nginx NGINX, Inc. 317,398,317 17.63% 348,025,788 19.42% 1.79 green
GWS Google 17,933,762 1.00% 18,438,702 1.03% 0.03 blue

All other web servers are used by less than 15% of all websites.

February 2016

Below are the statistics of the market share of all sites for the top web servers on the Internet, according to the Netcraft February 2016 Web Server Survey.

Product Vendor January 2016 Percent February 2016 Percent Change Chart color
Apache Apache 304,271,061 33.56% 306,292,557 32.80% 0.76 black
IIS Microsoft 262,471,886 28.95% 278,593,041 29.83% 0.88 red
nginx NGINX, Inc. 141,443,630 15.60% 137,459,391 16.61% −0.88 green
GWS Google 20,799,087 2.29% 20,640,058 2.21% −0.08 blue

All other web servers are used by less than 19% of all websites.

Apache, IIS and Nginx are the most used web servers on the World Wide Web.[36][37]

See also

References

  1. ^ a b c Nancy J. Yeager; Robert E. McGrath (1996). Web Server Technology. ISBN 1-55860-376-X. Retrieved 22 January 2021.
  2. ^ William Nelson; Arvind Srinivasan; Murthy Chintalapati (2009). Sun Web Server: The Essential Guide. ISBN 978-0-13-712892-1. Retrieved 14 October 2021.
  3. ^ Zolfagharifard, Ellie (24 November 2018). "'Father of the web' Sir Tim Berners-Lee on his plan to fight fake news". The Telegraph. London. ISSN 0307-1235. Retrieved 1 February 2019.
  4. ^ "History of Computers and Computing, Internet, Birth, The World Wide Web of Tim Berners-Lee". history-computer.com. Retrieved 1 February 2019.
  5. ^ a b "Client/Server Messaging". RFC 7230, HTTP/1.1: Message Syntax and Routing. pp. 7–8. sec. 2.1. doi:10.17487/RFC7230. RFC 7230.
  6. ^ a b "Handling Incomplete Messages". RFC 7230, HTTP/1.1: Message Syntax and Routing. p. 34. sec. 3.4. doi:10.17487/RFC7230. RFC 7230.
  7. ^ "Message Parsing Robustness". RFC 7230, HTTP/1.1: Message Syntax and Routing. pp. 34–35. sec. 3.5. doi:10.17487/RFC7230. RFC 7230.
  8. ^ R. Bowen (29 September 2002). "URL Mapping" (PDF). Apache software foundation. Retrieved 15 November 2021.
  9. ^ a b c d e "Mapping URLs to Filesystem Locations". Apache: HTTPd server project. 2021. Retrieved 19 October 2021.
  10. ^ "Dynamic Content with CGI". Apache: HTTPd server project. 2021. Retrieved 19 October 2021.
  11. ^ a b c ASF Infrabot (22 May 2019). "Directory listings". Apache foundation: HTTPd server project. Retrieved 16 November 2021.
  12. ^ "Client Error 4xx". RFC 7231, HTTP/1.1: Semantics and Content. p. 58. sec. 6.5. doi:10.17487/RFC7231. RFC 7231.
  13. ^ "Server Error 5xx". RFC 7231, HTTP/1.1: Semantics and Content. pp. 62–63. sec. 6.6. doi:10.17487/RFC7231. RFC 7231.
  14. ^ "Introduction". RFC 7235, HTTP/1.1: Authentication. p. 3. sec. 1. doi:10.17487/RFC7235. RFC 7235.
  15. ^ a b "Response Status Codes: Redirection 3xx". RFC 7231, HTTP/1.1: Semantics and Content. pp. 53–54. sec. 6.4. doi:10.17487/RFC7231. RFC 7231.
  16. ^ "Successful 2xx". RFC 7231, HTTP/1.1: Semantics and Content. pp. 51–54. sec. 6.3. doi:10.17487/RFC7231. RFC 7231.
  17. ^ "Caching Guide". Apache: HTTPd server project. 2021. Retrieved 9 December 2021.
  18. ^ "NGINX Content Caching". F5 NGINX. 2021. Retrieved 9 December 2021.
  19. ^ Evangelos P. Markatos (1996). "Main Memory Caching of Web Documents". Computer networks and ISDN Systems. Retrieved 9 December 2021.
  20. ^ B.V. Pawar; J.B. Patil (23 September 2010). "A New Intelligent Predictive Caching Algorithm for Internet Web Servers". Oriental Journal of Computer Science and Technology. Retrieved 9 December 2021.
  21. ^ "IPlanet Web Server 7.0.9: file-cache". Oracle. 2010. Retrieved 9 December 2021.
  22. ^ "Apache Module mod_file_cache". Apache: HTTPd server project. 2021. Retrieved 9 December 2021.
  23. ^ "HTTP server: configuration: file cache". GNU. 2021. Retrieved 9 December 2021.
  24. ^ "Apache Module mod_cache_disk". Apache: HTTPd server project. 2021. Retrieved 9 December 2021.
  25. ^ "What is dynamic cache?". Educative. 2021. Retrieved 9 December 2021.
  26. ^ "Dynamic Cache Option Tutorial". Siteground. 2021. Retrieved 9 December 2021.
  27. ^ Arun Iyengar; Jim Challenger (2000). "Improving Web Server Performance by Caching Dynamic Data" (PDF). Usenix. Retrieved 9 December 2021.
  28. ^ Omid H. Jader; Subhi R. M. Zeebaree; Rizgar R. Zebari (12 December 2019). "A State of Art Survey For Web Server Performance Measurement And Load Balancing Mechanisms" (PDF). IJSTR: International Journal of Scientific & Technology Research. Retrieved 4 November 2021.
  29. ^ Jussara M. Almeida; Virgilio Almeida; David J. Yates (7 July 1997). "WebMonitor: a tool for measuring World Wide Web server performance". First Monday. Retrieved 4 November 2021.
  30. ^ Fisher, Tim; Lifewire. "Getting a 502 Bad Gateway Error? Here's What to Do". Lifewire. Retrieved 1 February 2019.
  31. ^ "What is a 502 bad gateway and how do you fix it?". IT PRO. Retrieved 1 February 2019.
  32. ^ Fisher, Tim; Lifewire. "Getting a 503 Service Unavailable Error? Here's What to Do". Lifewire. Retrieved 1 February 2019.
  33. ^ Fisher, Tim; Lifewire. "Getting a 504 Gateway Timeout Error? Here's What to Do". Lifewire. Retrieved 1 February 2019.
  34. ^ many (24 January 2021). "Slow uploads with HTTP/2". github. Retrieved 15 November 2021.
  35. ^ Junho Choi (24 August 2020). "Delivering HTTP/2 upload speed improvements". Cloudflare. Retrieved 15 November 2021.
  36. ^ Vaughan-Nichols, Steven J. "Apache and IIS' Web server rival NGINX is growing fast". ZDNet. Retrieved 1 February 2019.
  37. ^ Hadi, Nahari (2011). Web commerce security: design and development. Krutz, Ronald L. Indianapolis: Wiley Pub. ISBN 9781118098899. OCLC 757394142.