Providing Database As A Service
Hakan Hacıgümüş
Department of Information and
Computer Science
University of California
Irvine, CA 92697, USA
[email protected]
Bala Iyer
IBM Silicon Valley Lab.
San Jose, CA 95141, USA
[email protected]
Abstract
In this paper, we explore a new paradigm for data
management in which a third party service provider hosts
database as a service providing its customers seamless
mechanisms to create, store, and access their databases
at the host site. Such a model alleviates the need for
organizations to purchase expensive hardware and software, deal with software upgrades, and hire professionals
for administrative and maintenance tasks which are taken
over by the service provider. We have developed and deployed a database service on the Internet, called NetDB2,
which is in constant use. In a sense, the data management
model supported by NetDB2 provides an effective mechanism for organizations to purchase data management as a
service, thereby freeing them to concentrate on their core
businesses. Among the primary challenges introduced by
database as a service are additional overhead of remote
access to data, an infrastructure to guarantee data privacy,
and user interface design for such a service. These issues
are investigated in the study. We identify data privacy as
a particularly vital problem and propose alternative solutions based on data encryption. This paper is meant as a
challenges paper for the database community to explore a
rich set of research issues that arise in developing such a
service.
1. Introduction
Advances in the networking technologies have triggered
one of the key industry responses, the software as a service initiative, also referred to as the application service provider (ASP) model. In this paper, we explore the
database as a service paradigm and the challenges it introduces.
Today, efficient data processing is a fundamental and vital issue for almost every scientific, academic, or business
Sharad Mehrotra
Department of Information and
Computer Science
University of California
Irvine, CA 92697, USA
[email protected]
the Internet.
In the system, data and all of the necessary database
products are located on the server site. A user makes a
connection to the system through the Internet and performs
the database queries and other relevant tasks over the data
through a web browser or an application programming interface such as JDBC [6]. The design principle of the system is to absorb complexity and workload on the server site
as much as possible. The goal is to keep the client side
lightweight, possibly requiring only a web browser to access the system. This makes the system portable and readily available from any location without any installation and
configuration at the client side. Through a web browser
based connection and web interface, the user can
access and use the whole set of database products, which are
professionally managed, without worrying about system
administration, maintenance, or upgrades.
The rest of the paper is organized as follows. Section 2
presents NetDB2's system architecture. Section 3 describes
the user interface design of NetDB2. Section 4 discusses the additional overheads due to World Wide Web access to the
NetDB2 system and presents experimental results based on
TPC-H benchmark queries. In Section 5 we describe our
solution to the data privacy problem and provide experimental
results for the different alternatives proposed in the study. We
conclude the paper in Section 6.
2. System Architecture
The basic NetDB2 system is implemented as a three-tier
architecture, namely the presentation layer, the application
layer, and the data management layer (Figure 1). There are two
benefits of separating NetDB2 into layers. The first is the insulation of software components of one layer from another,
ager and a backup/recovery server. The servlet engine communicates with the database using the JDBC protocol [6].
The database server and the backup/recovery server communicate, on a set schedule, through a private and secure
high-speed network, without human involvement. On a set
schedule, backed-up data is automatically restored to a warm
standby NetDB2 system with takeover capability.
3. User Interface
3.1. NetDB2's Visual Interface
NetDB2 provides a web interface, which makes the system accessible from any computer running a web browser
via the Internet. Through NetDB2's user interface, one can
create/remove tables, views, triggers, indexes, abstract data
types, SQL queries, generate and call user defined functions
[Figure: end-user login flow. Labels: End-User, Login, HTTP Request, HTTP Response, Intermediate UI, NetDB2, HTTP Request/Response Redirection.]
[Figure: Performance Ratio versus Scale Factor for DB2 and NetDB2.]
5. Data Privacy
Privacy on the Internet is an issue of significant interest. There are two fundamental issues: 1) privacy of data
during transmission and 2) privacy of stored data. The first
issue, privacy during network transmission, has been studied widely in the Internet area and is addressed by the Secure
Socket Layer (SSL) protocol [8] and the Transport Layer Security (TLS) protocol [4]. The second issue, privacy of stored
data in relational databases, is less studied and of greater
relevance to the database as a service model. If database as a
service is to be successful, and customer data is to reside
on the site of the database service provider, then the service
provider needs to find a way to preserve the privacy of the
user data. Security measures need to be in place so
that even if the data is stolen, the thief cannot make sense of
it.
Encryption is the perfect technique to solve this problem. Prior work [7] [2] does not address the critical issue of
performance. In this work, for the first time, we have
addressed and evaluated the most critical issue for the success of encryption in databases: performance. To achieve
that, we have analyzed different solution alternatives.
There are two dimensions to encryption support in
databases. One is the granularity of data to be encrypted
or decrypted. The field, the row, and the page (typically
4KB) are the alternatives. The field may appear to be the
best choice, because it would minimize the number of bytes
encrypted. However, as we have discovered, practical methods of embedding encryption within relational databases
entail a significant start-up cost for an encryption operation.
Row or page level encryption amortizes this cost over
larger data. The second dimension is software versus hardware level implementation of encryption algorithms. Our
results show that the choice has a significant impact on
performance.
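The amortization argument can be illustrated with a toy cost model. This is only a sketch: the start-up and per-byte constants, the function names, and the row layout are hypothetical values chosen for illustration, not measurements from the study.

```python
# Toy cost model for encryption granularity.
# All constants are illustrative assumptions, not measured values.
STARTUP_COST_US = 50.0    # assumed fixed cost per encryption call (microseconds)
PER_BYTE_COST_US = 0.01   # assumed cost per encrypted byte (microseconds)


def encryption_cost_us(num_calls: int, total_bytes: int) -> float:
    """One start-up charge per call plus a charge per encrypted byte."""
    return num_calls * STARTUP_COST_US + total_bytes * PER_BYTE_COST_US


def compare_granularities(rows: int, fields_per_row: int, field_bytes: int,
                          encrypted_fields: int, page_bytes: int = 4096) -> dict:
    """Estimate the cost of encrypting a table at field, row, and page level."""
    row_bytes = fields_per_row * field_bytes
    rows_per_page = max(1, page_bytes // row_bytes)
    pages = -(-rows // rows_per_page)  # ceiling division
    return {
        # Field level: fewest bytes, but one call per sensitive field.
        "field": encryption_cost_us(rows * encrypted_fields,
                                    rows * encrypted_fields * field_bytes),
        # Row level: one call per row, whole row encrypted.
        "row": encryption_cost_us(rows, rows * row_bytes),
        # Page level: one call per page, whole page encrypted.
        "page": encryption_cost_us(pages, pages * page_bytes),
    }


costs = compare_granularities(rows=1000, fields_per_row=8, field_bytes=16,
                              encrypted_fields=2)
for level, cost in costs.items():
    print(f"{level:>5}: {cost:.1f} us")
```

Under these assumed constants, field level encryption touches the fewest bytes yet costs the most, because the start-up charge is paid once per field. With a much smaller start-up constant the ordering can reverse, which is why the choice has to be evaluated experimentally.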
Specialized encryption hardware, the IBM S/390 Cryptographic Coprocessor, is available under IBM OS/390 environment with Integrated Cryptographic Service Facility
(ICSF) libraries. IBM DB2 for OS/390 provides a facility
called editproc (or edit routine), which can be associated
with a database table. An edit routine is invoked for a whole
row of the database table, whenever the row is accessed by
the DBMS.
We registered an encryption/decryption edit routine for
the tables. When a read/write request arrives for a row
in one of these tables, the edit routine invokes the encryption/decryption algorithm, which is implemented in hardware, for the whole row. We used the DES [3] algorithm option
of the encryption hardware.
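The row-level hook described above can be sketched as follows. This is not the DB2 editproc interface: the class and method names are hypothetical, and a toy XOR cipher stands in for the hardware DES implementation purely so that the example is self-contained. It is not secure.

```python
from itertools import cycle

FIELD_SEPARATOR = "\x1f"  # assumed delimiter for serializing a row


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher (NOT secure); a stand-in for hardware DES."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


class EncryptedTable:
    """In-memory table whose rows pass through an edit routine on every access."""

    def __init__(self, key: bytes):
        self._key = key
        self._stored_rows = []  # rows as stored: encrypted bytes only

    def _edit_encode(self, row: tuple) -> bytes:
        # Invoked on write: serialize the whole row, then encrypt it.
        plain = FIELD_SEPARATOR.join(str(v) for v in row).encode("utf-8")
        return xor_cipher(plain, self._key)

    def _edit_decode(self, stored: bytes) -> tuple:
        # Invoked on read: decrypt the whole row, then split it into fields.
        plain = xor_cipher(stored, self._key).decode("utf-8")
        return tuple(plain.split(FIELD_SEPARATOR))

    def insert(self, row: tuple) -> None:
        self._stored_rows.append(self._edit_encode(row))

    def scan(self):
        for stored in self._stored_rows:
            yield self._edit_decode(stored)

    def raw_rows(self):
        """What someone reading the stored data directly would see."""
        return list(self._stored_rows)


table = EncryptedTable(key=b"secret")
table.insert(("1995-03-15", "0.05", "N"))
decoded = list(table.scan())
stolen = table.raw_rows()
```

The callers of `insert` and `scan` never see ciphertext, while the stored bytes are unreadable without the key. Field values come back as strings here because the sketch serializes the row as text.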
[Figure: Performance Ratio versus Scale Factor for NetDB2.]
select
    l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - decrypt(l_discount, key))) as sum_disc_price,
    sum(l_extendedprice * (1 - decrypt(l_discount, key)) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price, avg(decrypt(l_discount, key)) as avg_disc, count(*) as count_order
from lineitem
where
    l_shipdate <= date('1998-12-01') - 90 day
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus
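A query of this shape, which calls a decryption function on an encrypted column inside aggregates, can be tried out with a small sketch. The sketch uses Python's built-in sqlite3 module rather than DB2, a reduced lineitem table with three columns, and a toy single-byte XOR cipher as the `decrypt` function. The key, the sample rows, and the helper names are all hypothetical.

```python
import sqlite3

KEY = 0x5A  # hypothetical single-byte key for the toy cipher


def toy_encrypt(value: float, key: int) -> str:
    """Toy cipher (NOT secure): XOR the decimal text of the value, hex-encode it."""
    plain = repr(value).encode("ascii")
    return bytes(b ^ key for b in plain).hex()


def toy_decrypt(ciphertext: str, key: int) -> float:
    """Inverse of toy_encrypt; registered below as the SQL function decrypt()."""
    plain = bytes(b ^ key for b in bytes.fromhex(ciphertext))
    return float(plain.decode("ascii"))


conn = sqlite3.connect(":memory:")
conn.create_function("decrypt", 2, toy_decrypt)
conn.execute(
    "create table lineitem "
    "(l_returnflag text, l_extendedprice real, l_discount text)"
)

# Only l_discount is stored encrypted, as in the query above.
sample = [("N", 100.0, 0.10), ("N", 200.0, 0.05), ("R", 50.0, 0.0)]
conn.executemany(
    "insert into lineitem values (?, ?, ?)",
    [(flag, price, toy_encrypt(disc, KEY)) for flag, price, disc in sample],
)

rows = conn.execute(
    """
    select l_returnflag,
           sum(l_extendedprice * (1 - decrypt(l_discount, :key))) as sum_disc_price,
           avg(decrypt(l_discount, :key)) as avg_disc
    from lineitem
    group by l_returnflag
    order by l_returnflag
    """,
    {"key": KEY},
).fetchall()

for row in rows:
    print(row)
```

The function is invoked once per reference per row, so every aggregate over the encrypted column pays the decryption cost again. This per-call cost is what makes the performance of field level encryption worth measuring.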
[Figures: plots with series 'SW', 'HW', and No encryption; x-axis: Number of Rows.]
Let $A_R$ denote the set of attributes of relation $R$, where $R \in T$ and $T$ denotes the set of tables with their columns in the FROM clause of the query $Q$. $C$ denotes the set of base columns in the SELECT clause. We define three groups of columns mentioned in the SELECT clause:
1) non-aggregation columns are denoted by $N$;
2) aggregation columns are written as $f(c)$, where $f$ is one of the aggregation functions of SQL. The set of aggregation columns is denoted by $G$;
3) UDF columns are in the form of $u(c)$, where $u$ is the name of any valid user defined function. The set of UDF columns is denoted by $U$.
Similarly, $C'$ denotes the set of the columns in $Q'$, and
1. Mapping $M_1$ is defined as $M_1(c) = c$ for $c \in N$; i.e., each element of the non-aggregation columns of the query will be mapped directly.
2. Mapping $M_2$ is defined as $M_2(f(c)) = c$ for $f(c) \in G$, where $c$ is the operand of aggregate function $f$; i.e., aggregation functions are removed in the mapping.
3. Mapping $M_3$ is defined as $M_3(u(c)) = c$ for $u(c) \in U$, where $c$ is the operand of user defined function $u$; i.e., user
6. Conclusion
In this paper, we introduced NetDB2, an internet-based
database service built on top of DB2 that provides users
with tools for application development, creating and loading tables, and performing queries and transactions.
The database as a service model introduces many significant challenges, primary among which are the additional overhead of remote access to data (service delivery penalty),
an infrastructure to guarantee data privacy, and user interface design for such a service. We have addressed these issues. Our experiments using the TPC-H benchmark showed
that the network overhead is tolerable. Data privacy can
be achieved by using a suitable encryption algorithm. We
proposed, implemented, and evaluated different encryption
schemes. First, software level encryption techniques were investigated. Field level encryption was implemented and evaluated. In this scheme a selected number of fields of the given
Acknowledgements
We thank Dante Aubert, Girma Bizuneh, Thomas Burke,
Glen Deen, Joseph Demuth, Mohan Desouza, Linda Distel,
Nick Donofrio, Anne Gardner, Satish Gupta, Don Haderle,
Katharine Harris, Anant Jhingran, Kirk Jordan, Michael
Kelley, Gopal Krishnan, Charles Lickel, Bruce McAlister,
Diane Moebus, Inderpal Narang, Robert Pederslie, Tony
Rall, and Guraraj Rao for their generous support.
References
[1] A. Aho, S. Johnson, and J. Ullman. Code generation for expressions with subexpressions. Journal of the ACM, Jan., 1977.
[2] G. Davida, D. Wells, and J. Kam. A database encryption system with subkeys. ACM Transactions on Database Systems,
6(2), 1981.
[3] DES. Data encryption standard. FIPS PUB 46, Federal
Information Processing Standards Publication, 1977.
[4] T. Dierks and C. Allen. The TLS protocol. Internet Draft,
Nov., 1997.
[5] D. Gupta, P. Jalote, and G. Barua. A formal framework for
on-line software version change. IEEE Transactions on Software Engineering, 22(2):120-131, 1996.
[6] G. Hamilton and R. Cattell. JDBC: A Java SQL API.
http://splash.javasoft.com/jdbc/.
[7] J. He and M. Wang. Encryption in relational database management systems. In Proc. Fourteenth Annual IFIP WG
11.3 Working Conference on Database Security (DBSec'00),
Schoorl, The Netherlands, 2000.
[8] P. Karlton, A. Freier, and P. Kocher. The SSL protocol v3.0.
Internet Draft, Nov., 1996.
[9] S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers, 1997.
[10] R. L. Rivest, A. Shamir, and L. M. Adleman. A method for
obtaining digital signatures and public key cryptosystems.
Communications of the ACM, 21(2):120-126, 1978.
[11] B. Schneier. Description of a new variable-length key, block
cipher (Blowfish), fast software encryption. In Cambridge
Security Workshop Proceedings, pages 191-204, 1994.
[12] B. Schneier. Applied Cryptography. John Wiley & Sons,
Inc., 1996.
[13] I. Sommerville. Software Engineering. Addison-Wesley, 6th
Edition, 2001.
[14] D. Srivastava, S. Dar, H. Jagadish, and A. Levy. Answering
queries with aggregation using views. In Proc. 22nd VLDB
Conference, India, 1996.
[15] TPC-H. Benchmark Specification. http://www.tpc.org.
[16] A. F. Westin. Freebies and privacy: What net users
think. Technical report, Opinion Research Corporation,
http://www.privacyexchange.org/iss/surveys/sr990714.html,
1999.