
Effect of Database Server Arrangement to the

Performance of Load Balancing Systems

Jian-Bo Chen1, Tsang-Long Pao2, and Kun-Dah Lee2


1
Department of Information and Telecommunications Engineering
Ming Chuan University, Taiwan
[email protected]
2
Department of Computer Science and Engineering
Tatung University, Taiwan
[email protected], [email protected]

Abstract. Load balancing architectures can be used to solve overloading
problems on web sites. Nowadays, most web page content is retrieved from a
database, so the arrangement of the database server within a load balancing
architecture is one of the most important factors affecting overall
performance. In this paper, we analyze the performance of load balancing
architectures with a centralized database arrangement and a decentralized
database arrangement. Through a series of experiments, we identify the most
appropriate arrangement for the database server. The experimental results
show that when the number of client requests is small, the decentralized
arrangement yields a lower average response time because no network
communication is needed. When the number of client requests is large,
however, the centralized arrangement achieves higher performance because the
dedicated database server shares the load of the web servers.

Keywords: load balance, centralized arrangement, decentralized arrangement.

1 Introduction
With the popularity of the Internet, the number of Internet users is increasing
rapidly. Creating attractive web pages while avoiding server overload is an
important issue for administrators. A popular web site needs powerful hardware
to support its services, but even if the performance of a single server is
improved, requests to the web site may increase even more dramatically. Thus,
load balancing architectures are probably the most appropriate solution for
these kinds of web sites [1,2]. In a load balancing architecture, the
administrator can easily add or remove backend servers as needed. The
flexibility and availability obtained from the load balancing architecture
improve the overall performance of the web site.
Most popular web sites create dynamic web pages for their clients. The contents of
dynamic web pages are always retrieved from the database. All of the data are stored in
the database server. When a client requests a web page whose data are stored in
the database, the web server needs to generate the contents by retrieving the
required data from the database, and then responds with a web page to the
client. Because the data are stored in the database server, the web pages can
be diverse, and the web sites can provide real-time information to clients.

A. Hua and S.-L. Chang (Eds.): ICA3PP 2009, LNCS 5574, pp. 146–155, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Generally speaking, static web pages require more I/O processing, while
dynamic web pages require more CPU resources. In a load balanced web server
architecture with multiple backend web servers, the arrangement of the
database server is an important issue for alleviating the load imposed by
dynamic web pages. In this paper, we discuss two kinds of arrangement for
database servers: the centralized and the decentralized arrangement [3,4]. In
the centralized database arrangement architecture, the database server is a
stand-alone server that provides only the database function for all the
backend web servers. Each web server retrieves content from the centralized
database server, generates the dynamic pages, and transfers them to the
clients. The other architecture is the decentralized database arrangement
architecture, in which each backend server acts as both web server and
database server and the data are replicated on every backend server. When a
client requests a dynamic web page, the web server retrieves the data from
its own local database, generates the dynamic page, and responds to the
client.
In this paper, we discuss the performance difference between these two kinds
of arrangement under various load conditions. Any load balancing algorithm,
such as Round Robin, least connections, or hashing, can be used in the load
balancing system. Because our focus is on comparing the two arrangements, the
choice of load balancing algorithm is not critical; in our experimental
environment we therefore use the simplest one, the Round Robin algorithm. The
backend servers can be homogeneous or heterogeneous: in a homogeneous
environment, all the backend servers have the same hardware specifications
and thus the same processing power, while in a heterogeneous environment they
differ. To avoid the influence of this factor, our environment uses
homogeneous backend servers. We use a number of clients to issue a large
volume of requests to the load balancing system and evaluate the performance
of both the centralized and decentralized arrangements [5].
This paper is organized as follows: Section 2 introduces related work on load
balancing systems. Section 3 describes the system architecture and
experimental environment. Section 4 presents the experimental results.
Section 5 concludes the paper.

2 Related Work
There are different kinds of load balancing architectures, including hardware-based,
software-based, and hybrid architectures [6]. Different load balancing architectures can
be used in different environments. We will discuss some load balancing architectures in
this section.

2.1 DNS-Based Load Balance

In this architecture, a number of backend servers work together as a cluster
of servers, and an authoritative DNS server is set up as the cluster DNS
server. This distributed web server architecture uses request routing
mechanisms on the cluster side, and no additional action is needed on the
client side. Architecture transparency is typically obtained through a single
virtual interface to the outside world, at least at the URL level. The
cluster DNS server performs a one-to-many mapping to translate the symbolic
site domain name (URL) into the IP addresses of the backend web servers. This
process allows the cluster DNS server to implement many strategies to select
the appropriate server and distribute client requests. The DNS system,
however, has only limited control over the requests reaching the web cluster.
Between the client and the cluster DNS server, many intermediate name servers
may cache the logical-name-to-IP-address mapping to reduce network traffic.
Moreover, the client will also cache the result of the address resolution.
In addition to providing the IP address of a node, the DNS server also
specifies a validity period (Time-To-Live, or TTL) for caching the result of
the logical name resolution [7]. When the TTL expires, the address-mapping
request is forwarded to the cluster DNS server to obtain the IP address
mapping again; otherwise, an intermediate name server handles the request.
If an intermediate name server holds a valid mapping for the cluster URL, it resolves
the address-mapping request without forwarding it to an upper level name server.
Otherwise, the address request reaches the cluster DNS, which selects the IP address of
one of the backend web servers [8]. The URL-to-IP-address mapping and the prede-
fined TTL value are forwarded to all intermediate name servers along the path and to
the client.
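The TTL-driven resolution process described above can be sketched as follows.
This is an illustrative Python model, not code from any real DNS
implementation; the domain name, backend addresses, and TTL value are
invented for the example.

```python
import itertools
import time

class ClusterDNS:
    """Cluster-side authoritative DNS: one-to-many name mapping."""

    def __init__(self, backends, ttl=30):
        self.ttl = ttl
        # Round Robin over the backend IP addresses.
        self._cycle = itertools.cycle(backends)

    def resolve(self, name):
        # Each query may yield a different backend, plus the TTL.
        return next(self._cycle), self.ttl

class IntermediateNameServer:
    """Caches answers until their TTL expires, as described above."""

    def __init__(self, upstream):
        self.upstream = upstream
        self._cache = {}  # name -> (ip, expiry timestamp)

    def resolve(self, name, now=None):
        now = time.time() if now is None else now
        hit = self._cache.get(name)
        if hit and hit[1] > now:
            return hit[0]  # valid mapping: answer without forwarding
        # TTL expired or unknown name: forward to the cluster DNS.
        ip, ttl = self.upstream.resolve(name)
        self._cache[name] = (ip, now + ttl)
        return ip
```

Note how the intermediate server's cache limits the cluster DNS server's
control: while a cached mapping is valid, requests never reach the cluster.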

2.2 HTTP Redirection-Based Load Balance

In this architecture, a centralized dispatcher receives all incoming requests and dis-
tributes them among the web server nodes through the HTTP redirection mechanism
[9]. The dispatcher redirects a request by specifying the redirection status code [10] in
the response, indicating in its header the server address where the client can get the
desired document. Such redirection is largely transparent; at most, users might notice
an increased response time. Unlike most dispatcher-based solutions, HTTP redirection
does not require IP address modification of packets reaching or leaving the web server
system. HTTP redirection can be implemented through one of the two techniques de-
scribed below.
Server-state-based dispatching is used in the Distributed Server Groups
architecture [11]. It adds new methods to the HTTP protocol to administer the
web system and exchange messages between the dispatcher and the servers.
Since the dispatcher must be aware of each server's load, every server
periodically reports the number of processes in its run queue and the number
of requests received per second. The dispatcher then selects the
least-loaded server.
Location-based dispatching is used in Cisco Systems' Distributed Director
appliance [12], which provides two dispatching modes. The first applies the
DNS-based approach using client and server state information; the other uses
HTTP redirection. The Distributed Director estimates the proximity between
client and server and the node availability with algorithms similar to those
of the DNS-based solution. Client requests are redirected to the server
evaluated as most suitable for each request at a given time.
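The server-state-based variant above can be sketched in a few lines. This is
an illustrative Python model under our own assumptions: each backend's report
is reduced to a single load number, and the host names are placeholders.

```python
def least_loaded(reports):
    """Pick the backend with the smallest reported load.

    reports maps host name -> load metric (e.g., run-queue length).
    """
    return min(reports, key=reports.get)

def redirect_response(path, reports):
    """Build an HTTP redirection answer naming the chosen server.

    The redirection status code tells the client to retry the request
    at the address given in the Location header.
    """
    target = least_loaded(reports)
    return ("HTTP/1.1 302 Found\r\n"
            f"Location: http://{target}{path}\r\n\r\n")
```

As the text notes, this redirection is largely transparent to users; the only
visible cost is the extra round trip to the dispatcher.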

2.3 Dispatcher-Based Load Balance

To centralize request scheduling and completely control client-request
routing, a network component in the web server system acts as a dispatcher.
Request routing among servers is transparent: unlike DNS-based architectures,
which deal with addresses at the URL level, the dispatcher presents a single
virtual IP address (IP-SVA). The dispatcher uniquely identifies each backend
server in the system through a private address that can be at different
protocol levels, depending on the architecture. Dispatcher-based
architectures are differentiated by their routing mechanism: packet
single-rewriting and packet double-rewriting [6].
Packet Single-Rewriting: In some architectures, the dispatcher reroutes
client-to-server packets by rewriting their IP address, as in the basic TCP
router mechanism. The web server cluster consists of a group of backend
servers and a load balancer that acts as an IP address dispatcher. All HTTP
client requests reach the dispatcher because the IP-SVA is the only public
address. The dispatcher selects a backend server for each HTTP request using
a Round Robin algorithm and forwards each incoming packet by rewriting its
destination IP address, replacing the IP-SVA with the IP address of the
selected server. Because a request consists of several IP packets, the
dispatcher tracks the source IP address of every established TCP connection
in an address table, so that it can route packets of the same connection to
the same web server. Furthermore, the web server must replace its own IP
address with the dispatcher's IP-SVA before sending the response packets to
the client, so the client is not aware that its requests are handled by a
hidden web server.
Packet Double-Rewriting: This mechanism also relies on a centralized
dispatcher to schedule and control client requests, but differs from packet
single-rewriting in that the addresses of all packets exchanged between
server and client are modified. Packet double-rewriting is based on the
Network Address Translation (NAT) mechanism published by the Internet
Engineering Task Force. The dispatcher receives a client request, selects the
web server, modifies the IP header of each incoming packet, and also modifies
the outgoing packets that compose the requested document.
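The single-rewriting flow, with its per-connection address table, can be
sketched as follows. This is an illustrative Python model, not real packet
processing: packets are plain dictionaries and all addresses are invented.

```python
import itertools

class SingleRewriteDispatcher:
    """Sketch of packet single-rewriting with a connection table."""

    def __init__(self, ip_sva, backends):
        self.ip_sva = ip_sva                      # the only public address
        self._cycle = itertools.cycle(backends)   # Round Robin selection
        self._conn_table = {}  # (client ip, client port) -> backend ip

    def forward(self, packet):
        key = (packet["src"], packet["sport"])
        backend = self._conn_table.get(key)
        if backend is None:
            # New TCP connection: choose the next backend in order.
            backend = next(self._cycle)
            self._conn_table[key] = backend
        # Rewrite the destination: replace the IP-SVA with the backend's
        # address, so all packets of one connection reach the same server.
        return dict(packet, dst=backend)
```

The reverse rewriting (server address back to IP-SVA on responses) would be
the mirror image of `forward`, performed on the server side.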

3 System Architecture and Experimental Environment

3.1 System Architecture

In the centralized arrangement architecture, the web servers and the database
server are connected to an Ethernet switch. The web servers are dedicated to
serving web requests, and the database server provides the database function.
To prevent the database server from becoming the bottleneck of the entire
system, we install the database on powerful hardware [13,14]. A log server,
which is not part of the load balancing system, is used to collect the
experimental data for further analysis. The dispatcher distributes the client
requests to the backend web servers according to the Round Robin algorithm:
when a client issues a request to the dispatcher, the dispatcher selects a
web server and redirects the request to it. Our focus is on how the web
server retrieves contents from the database, generates the dynamic web pages,
and responds to the client. The architecture for the centralized database
arrangement is shown in Fig. 1.

Fig. 1. Centralized database arrangement architecture

In the decentralized arrangement architecture, each backend server acts as
both web server and database server, and every backend server has a copy of
the contents stored in its own database. When a client issues a request to
the dispatcher, the dispatcher selects a backend web server and redirects the
request to that server. When the web server generates the dynamic web pages,
it retrieves the contents from its own database, so the data retrieval incurs
no network transmission time. Theoretically, the performance of the
decentralized arrangement should therefore be better than that of the
centralized arrangement. This is true, however, only while the number of
requests stays below a certain threshold; beyond that threshold, the
centralized arrangement performs better than the decentralized one. The
experimental results in Section 4 illustrate this behavior. The architecture
for the decentralized database arrangement is shown in Fig. 2.
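From a backend server's point of view, the only configuration difference
between the two arrangements is where its web application points its database
connection. The sketch below is our own illustration, not the paper's code;
the host names and DSN format are assumptions.

```python
def database_host(arrangement, central_host="db.cluster.local"):
    """Return the database host a backend should query."""
    if arrangement == "centralized":
        # All backends share one stand-alone database server, so every
        # query crosses the network to the central host.
        return central_host
    if arrangement == "decentralized":
        # Each backend keeps its own copy of the data and queries it
        # locally, avoiding network transmission time.
        return "127.0.0.1"
    raise ValueError(f"unknown arrangement: {arrangement}")

def dsn(arrangement):
    # e.g. "mysql://webapp@db.cluster.local/site" (placeholder format)
    return f"mysql://webapp@{database_host(arrangement)}/site"
```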
The comparisons between these two kinds of arrangement are shown in Table 1.
Each architecture has its own advantages and disadvantages. In the
centralized arrangement, only one database server license is required, while
in the decentralized case with N backend servers, we need N database server
licenses. In terms of maintenance, because the data in the centralized
arrangement are stored in only one database server, the administrator needs
to maintain only one copy of the contents; in the decentralized arrangement,
the administrator must maintain N copies, one per backend server. On the
other hand, if N backend servers are required, the decentralized case needs
only N servers while the centralized case needs N+1. Furthermore, the
centralized arrangement introduces a single point of failure, while the
decentralized arrangement does not. The comparisons and performance
evaluations are shown in Section 4.

[Fig. 2 shows the request flow: (1) the client sends a request to the
dispatcher; (2) the dispatcher redirects it to a selected backend; (3) the
client requests the page from that combined web server and database node;
(4) the node responds.]

Fig. 2. Decentralized database arrangement architecture

Table 1. Comparisons between two architectures

Architecture    Database Licenses   Ease of Maintenance   No. of Servers   Single Point of Failure
Centralized     1                   Easy                  N+1              Yes
Decentralized   N                   Difficult             N                No

3.2 Experimental Environment

As shown in Fig. 3, there are four modules in our experiment setup: dispatcher, web
server, database server and clients. A log server is setup to collect the experimental
data. In order to issue the requests at the same time for each client, the clients must be
synchronized first using the Network Time Protocol. After all the clients are synchro-
nized, the clients will issue a requests for dynamic contents at a pre-determined time.
The web server then retrieves contents from the database server in both centralized or
decentralized arrangement. The log server records the start and end time for each re-
quest. When finishing the experiment, all the records are stored in the log server. We
can analyze the experimental results from these records to discover which arrangement
achieves better performance.
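The measurement loop each client runs can be sketched as follows. This is an
illustrative Python version under our own assumptions (the paper does not
give its client code): clocks are already NTP-synchronized, the URL is a
placeholder, and only the standard library is used.

```python
import time
import urllib.request

def timed_request(url, start_at, fetch=urllib.request.urlopen):
    """Fire one request at an agreed time and record its timings."""
    # Wait until the pre-determined start time so that all synchronized
    # clients issue their requests together.
    while time.time() < start_at:
        time.sleep(0.001)
    start = time.time()
    fetch(url).read()          # issue the request, consume the response
    end = time.time()
    # This record is what each client sends to the log server.
    return {"url": url, "start": start, "end": end,
            "response_time": end - start}
```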
Table 2 lists our experimental hardware. Since our focus is to find the best
database arrangement in the load balancing system, we use four web servers to
share the load. In the centralized arrangement, only one database server is
needed; to avoid the database server becoming the bottleneck of the load
balancing system, we use powerful hardware for it. We use twenty clients in
our experiments to generate the required request load on the load balancing
system.

Fig. 3. Experimental modules

Table 2. Hardware specification list

Device            CPU                      RAM     OS            No.
Web Server        Intel Pentium 3.0 GHz    2 GB    FreeBSD 7.0   4
Database Server   Xeon 1.6 GHz x4          16 GB   FreeBSD 7.0   1
Log Server        Xeon 1.6 GHz             2 GB    FreeBSD 7.0   1
Client            Intel Pentium 3.40 GHz   2 GB    FreeBSD 7.0   20

4 Experimental Results
In this section, we describe and analyze the experimental results. The
performance comparison focuses on the average response time: the clients
issue requests and record the response time, and the more requests the
clients issue, the longer the average response time. We want to compare the
performance of the centralized and decentralized arrangements.

Fig. 4. The comparison for three backend servers

Fig. 5. The comparison for four backend servers

Figure 4 shows the experimental results when we adopt three web servers,
with both centralized and decentralized arrangements. In this figure, we can
see that as the requests increase, the average response time also increases.
Below 880 requests per second, however, the decentralized arrangement
achieves a lower response time than the centralized arrangement. In Fig. 5,
we can see similar results: when the requests are fewer than 920, the
decentralized arrangement performs better than the centralized arrangement.
However, when the client requests increase further, the experimental results
change. When the clients' requests
increase to a certain number, such as 880 in the three web server experiment and 920 in
the four web server experiment, the performance of the centralized arrangement is
better than the decentralized arrangement.
The reason for this phenomenon is that when the requests are below the
threshold, the load on both the web servers and the database server is light.
In the centralized arrangement there is network communication between the web
servers and the database server, so its average response time is higher than
that of the decentralized arrangement. When the requests from clients exceed
the threshold, however, the load on both the web servers and the database
server becomes heavy. In the decentralized arrangement, each backend server
acts as both web server and database server, so its load increases more
rapidly. The centralized arrangement therefore performs better than the
decentralized arrangement under a huge number of requests, because the
dedicated database server shares the load of the web servers.
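The crossover can be illustrated with a deliberately simple cost model. This
is our own toy illustration, not the paper's analysis, and all constants are
invented: the centralized case pays a fixed network cost per request but
offloads database work to a separate machine, while the decentralized case is
network-free but stacks web and database work on the same backend.

```python
WEB_COST = 1.0   # web processing cost per unit of load (ms)
DB_COST = 1.0    # database processing cost per unit of load (ms)
NET_COST = 0.5   # web<->database network round trip (ms)

def centralized_time(load):
    # The backend handles only web work; database work runs on the
    # dedicated database server, at the price of a network round trip.
    return WEB_COST * load + NET_COST + DB_COST

def decentralized_time(load):
    # The backend handles web AND database work, so its response time
    # grows faster with load, even though no network cost is paid.
    return (WEB_COST + DB_COST) * load
```

At light load the fixed network cost dominates and the decentralized
arrangement wins; at heavy load the doubled per-request work dominates and
the centralized arrangement wins, matching the measured thresholds.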

5 Conclusion
Web sites with dynamic contents are more attractive than those with static
contents. To generate dynamic web pages, the web server must retrieve the
contents from the database server, so the arrangement of the database server
is an important factor influencing the performance of a load balancing
system. This paper compared the centralized and decentralized arrangement
architectures. The experimental results show that when the requests from
clients are below a threshold, the decentralized arrangement performs better
than the centralized arrangement because it avoids the overhead of network
communication. When the number of requests exceeds the threshold, the
centralized arrangement performs better because the database server shares
the load of the web servers. Based on these results, an administrator can
choose a suitable arrangement for the load balancing system.

References
1. Pao, T.-L., Chen, J.-B.: Capacity Based Load Balancing Scheme for Fair Request Dis-
patching. Asian Journal of Information Technology 5(11), 1284–1290 (2006)
2. Pao, T.-L., Chen, J.-B.: The Scalability of Heterogeneous Dispatcher-Based Web Server
Load Balancing Architecture. Parallel and Distributed Computing, Applications and
Technologies (PDCAT 2006), 229–233 (December 2006)
3. Choi, E., Lim, Y., Min, D.: Performance Comparison of Various Web Cluster Architectures.
In: Baik, D.-K. (ed.) AsiaSim 2004. LNCS, vol. 3398, pp. 617–624. Springer, Heidelberg
(2005)
4. Guo, J., Bhuyan, L.N.: Load Balancing in a Cluster-Based Web Server for Multimedia
Applications. IEEE Trans. Parallel and Distributed Systems 17(11), 1321–1334 (2006)
5. Li, C., Peng, G., Gopalan, K., Chiueh, T.-c.: Performance Guarantees for Cluster-Based
Internet Services. In: Proc. 23rd International Conference on Distributed Computing Sys-
tems, pp. 378–385 (May 2003)
6. Yu, P.S., Cardellini, V., Colajanni, M.: Dynamic Load Balancing on Web-server Systems.
IEEE Trans. Internet Computing, 28–39 (May/June 1999)
7. Colajanni, M., Yu, P.S.: Adaptive TTL Schemes for Load Balancing of Distributed Web
Servers. ACM SIGMETRICS Performance Evaluation Review 25(2), 36–42 (1997)
8. Yu, P.S., Colajanni, M., Cardellini, V.: Dynamic Load Balancing in Geographically
Distributed Heterogeneous Web Servers. In: IEEE International Conference on Distributed
Computing Systems (ICDCS 1998), Amsterdam, Netherlands, pp. 26–29 (May 1998)
9. Kopparapu, C.: Load Balancing Servers, Firewalls, and Caches. Wiley Computer, Chich-
ester (2002)
10. W3C World Wide Web Consortium, http://www.w3c.org
11. Garland, M., Grassia, S., Monroe, R., Puri, S.: Implementing Distributed Server Groups for
the World Wide Web. Technical Report CMUCS-95-114 (January 1995)
12. Cisco System, http://www.cisco.com
13. Haney, D., Madsen, K.S.: Load-balancing for MySQL. Kobenhavns Universitet (2003)
14. Hellerstein, J.M., Stonebraker, M., Hamilton, J.: Architecture of a Database System.
Foundations and Trends in Databases 1(2), 141–259 (2007)
