History of Database Applications

Hehua Chi
University of Rochester
Wegmans Hall 1210, Rochester, NY 14620
[email protected]

Yihe Yang
University of Rochester
Wegmans Hall 1210, Rochester, NY 14620
[email protected]

ABSTRACT
In this paper, we explore the history of database applications, which centers on three revolutions in database technology. The first revolution was driven by the emergence of the electronic computer. In the 20 years following the widespread adoption of electronic computers, a range of increasingly sophisticated database systems emerged. The second revolution was the emergence of the relational database. Shortly after the definition of the relational model in 1970, almost every significant database system shared a common architecture. The three pillars of this architecture were the relational model, ACID transactions, and the SQL language. However, starting around 2008, the third revolution has resulted in an explosion of non-relational database alternatives driven by the demands of modern applications that require global scope and continuous availability. Many new kinds of database systems appeared: key-value databases, document databases, graph databases, column databases, and even SSD and in-memory databases. The next generation of databases will be NoSQL, NewSQL, and Big Data platforms [1].

1. INTRODUCTION
Wikipedia defines a database as an “organized collection of data.” Although the term database entered our vocabulary only in the late 1960s, collecting and organizing data has been an integral factor in the development of human civilization and technology. Books, libraries, and other indexed archives of information represent preindustrial equivalents of modern database systems [1].

The emergence of electronic computers following the Second World War represented the first revolution in databases. The development of indexing methods such as ISAM (Indexed Sequential Access Method) and similar indexing structures powered the first electronic databases. However, there was as yet no Database Management System (DBMS) to minimize programmer overhead and ensure the performance and integrity of data access routines [1].

By the early 1970s, two major models of DBMS were competing for dominance. The network model was formalized by the CODASYL standard and implemented in databases such as IDMS, while the hierarchical model provided a somewhat simpler approach, most notably found in IBM’s IMS (Information Management System) [1].

However, these systems had several notable drawbacks. First, the navigational databases were extremely inflexible in terms of data structure and query capabilities, and it was extremely difficult to add new data elements to an existing system. Second, the database systems were centered on record-at-a-time transaction processing. Query operations, especially the sort of complex analytic queries that we today associate with business intelligence, required complex coding [1].

Arguably, no single person has had more influence over database technology than Edgar Codd. He harbored significant reservations about the design of the database systems of his day. In particular, he identified the following shortcomings [1, 2]:
1. Existing databases were too hard to use. Databases of the day could only be accessed by people with specialized programming skills.
2. Existing databases lacked a theoretical foundation. Codd’s mathematical background encouraged him to think about data in terms of formal structures and logical operations.
3. Existing databases mixed logical and physical implementations.

Codd published an internal IBM paper outlining his ideas for a more formalized model for database systems, which then led to his 1970 paper “A Relational Model of Data for Large Shared Data Banks.” This classic paper contained the core ideas that defined the relational database model, which became the most significant, almost universal, model for database systems for a generation [1, 2].

The relational model does not itself define the way in which the database handles concurrent data change requests. These changes are generally referred to as database transactions. Jim Gray defined the most widely accepted transaction model in the late 1970s. This soon became popularized as ACID transactions: Atomic, Consistent, Isolated, and Durable [2].

However, the restriction on scalability beyond a single data center implied by the ACID transaction model has been a key motivator for the development of new database architectures. The difference in application architectures between the client-server era and the era of massive web-scale applications created pressures on the relational database that could not be relieved through incremental innovation [1].

An explosion of new database systems occurred in 2008 and 2009: literally dozens of new systems emerged in this short period. In late 2009 in particular, the term NoSQL quickly caught on as shorthand for any database system that broke with the traditional SQL database [3]. By 2011, the term NewSQL became popularized as a means of describing a new breed of databases that, while not representing a complete break with the relational model, enhanced or significantly modified its fundamental principles. Finally, the term Big Data burst onto mainstream consciousness in early 2012. Although the term refers mostly to the new ways in which data is being leveraged to create value, we generally understand "Big Data solutions" as convenient shorthand for technologies that support
large and unstructured datasets such as Hadoop. NoSQL, NewSQL, and Big Data are in many respects vaguely defined, overhyped, and overloaded terms. However, they represent the most widely understood phrases for referring to next-generation database technologies [1].

The remainder of this paper is organized as follows: the history of database applications is reviewed in Section 2. Two promising sub-areas of future database systems are introduced in Section 3. The considerations and requirements for choosing appropriate database systems in different applications are summarized in Section 4. Finally, the conclusion is in Section 5.

2. HISTORY OF DATABASE APPLICATIONS
2.1. Timeline of database development
The first revolution was driven by the emergence of the electronic computer, after which many pre-relational databases sprang up. The second revolution was driven by a classic paper containing the core ideas that defined the relational database model, which became the most significant, almost universal, model for database systems for a generation [5]. The third revolution, that of next-generation databases, has resulted in an explosion of non-relational database alternatives to meet the needs of the era of massive web-scale and big data applications [4]. Figure 1 illustrates three major eras in database technology. In this section, we provide an overview of these three waves of database technologies and discuss the forces leading to today's and future next-generation databases.

Figure 1. Illustrating three major eras in database technology [1].

2.1.1 The pre-relational database system
Early database systems enforced both a schema (a definition of the structure of the data within the database) and an access path (a fixed means of navigating from one record to another). Table 1 illustrates the pre-relational database system development. By the early 1970s, two major models of DBMS were competing for dominance: the network model and the hierarchical model [1].

Table 1. Pre-relational database system development
Year  Pre-relational database system
1951  Magnetic tape
1955  Magnetic disk
1961  ISAM
1965  Hierarchical model
1968  IMS
1969  Network model
1971  IDMS

2.1.2 Relational database system
Relational database theory, at its essence, describes how a given set of data should be presented to the user, rather than how it should be stored on disk or in memory. A row in a table should be identifiable and efficiently accessed by a unique key value, and every column in that row must be dependent on that key value and no other identifier. Arrays and other structures that contain nested information are, therefore, not directly supported [1]. Table 2 illustrates the development of relational database systems. While each of these systems attempts to differentiate itself by claiming superior performance, availability, functionality, or economy, they are virtually identical in their reliance on three key principles: Codd’s relational model, the SQL language, and the ACID transaction model [1].

Table 2. Relational database system development
Year  Relational database system
1970  Codd’s paper
1974  System R
1978  Oracle
1980  Commercial Ingres
1981  Informix
1984  DB2
1987  Sybase
1989  Postgres
1989  SQL Server
1995  MySQL

2.1.3 The next generation database system
By the middle of the 2000s, the relational database seemed completely entrenched. In fact, the era of complete relational database supremacy was just about to end. The difference in application architectures between the client-server era and the era of massive web-scale applications created pressures on the relational database that could not be relieved through incremental innovation [1]. Table 3 illustrates the development of current and future databases. In Section 3, two promising sub-areas of future databases are introduced in detail.

Table 3. Current and future database development
Year  Current and future databases
2003  MarkLogic
2004  MapReduce
2005  Hadoop
2005  Vertica
2007  Dynamo
2008  Cassandra
2008  HBase
2008  NuoDB
2009  MongoDB
2010  VoltDB
2010  Hana
2011  Riak
2012  Aerospike
2014  Splice Machine

2.2 Summary: three platforms corresponding to three waves of databases
The three waves of databases roughly correspond to three waves of computer applications. The three platforms shown in Figure 2 are often used to illustrate the development of database systems. The first platform was the mainframe, which was supported by pre-relational database systems. The second platform, client-server and early web applications, was supported by relational databases. The third platform is characterized by applications that involve cloud computing, mobile presence, social networking, and the Internet of Things. The third platform demands a third wave of database technologies that include but are not limited to relational systems [1]. Figure 2 summarizes how the three platforms correspond to the three waves of database revolutions.

Figure 2. Three platforms correspond to three waves of database technology [1].

3. TWO PROMISING FUTURE DATABASES
The relational database was already well established. However, faced with the demands of modern applications that require global scope and continuous availability, relational databases proved inadequate to deal with the volume and velocity of big data. In particular, the difference in application architectures between the client-server era and the era of massive web-scale applications created pressures on the relational database that could not be relieved through incremental innovation. Large web sites faced scalability challenges in growing their infrastructure from thousands to millions of users. Even the most expensive commercial Relational Database Management System (RDBMS), such as Oracle, could not provide sufficient scalability to meet the demands of these sites. Sharding at sites like Facebook has allowed a MySQL-based system to scale up to massive levels. However, sharding has downsides: many relational operations and database-level ACID transactions are lost, and it becomes impossible to perform joins or maintain transactional integrity across shards [1].

Finally, the operational costs of sharding, together with the loss of relational features, made many seek alternatives to the RDBMS [1].

3.1 NoSQL database
A NoSQL (Not Only SQL) database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases. NoSQL databases operate without a schema, allowing you to freely add fields to database records without having to define any changes in structure first. This is particularly useful when dealing with non-uniform data and custom fields. In summary, the common characteristics of NoSQL databases are:
(1) They do not use the relational model;
(2) They run well on clusters;
(3) They are usually open-source;
(4) They’re built for the 21st century web estates;
(5) They’re for the most part schemaless;
(6) The most important result of the rise of NoSQL is Polyglot Persistence.

There are four main types of NoSQL data models: key-value databases, document databases, column databases, and graph databases.

3.1.1 Key-Value databases
A key-value database, or key-value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary or hash. Key-value databases are beneficial in the following scenarios [4].

(1) Storing Session Information;
Generally, every web session is unique and is assigned a unique SessionID value. Applications that store the SessionID on disk or in an RDBMS will greatly benefit from moving to a key-value store, since everything about the session can be stored by a single PUT request or retrieved using GET. This single-request operation makes it very fast, as everything about the session is stored in a single object. Solutions such as Memcached are used by many web applications, and Riak can be used when availability is important [1].

(2) User Profiles, Preferences;
Almost every user has a unique UserId, Username, or some other attribute, as well as preferences such as language, color, time zone, and so on. This can all be put into an object, so getting the preferences of a user takes a single GET operation. Product profiles can be stored similarly.

(3) Shopping Cart Data;
E-commerce websites have shopping carts tied to the user. As we want the shopping carts to be available all the time, across browsers, machines, and sessions, all the shopping information can be put into the value where the key is the userid. A Riak cluster would be best suited for these kinds of applications.

3.1.2 Document databases
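The contrast between the key-value stores of Section 3.1.1 and the document databases of this subsection can be sketched in a few lines of Python. This is only an in-memory illustration built on plain dictionaries, not the API of Riak, Memcached, or any document database product; the class and method names (KeyValueStore, DocumentStore, put, get, insert, find) and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Optional


class KeyValueStore:
    """Key-value model: the store treats each value as an opaque object."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # One PUT stores everything associated with the key.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # One GET returns the whole value, or None if the key is absent.
        return self._data.get(key)


class DocumentStore:
    """Document model: the store can filter on fields inside a document."""

    def __init__(self) -> None:
        self._docs: List[Dict[str, Any]] = []

    def insert(self, doc: Dict[str, Any]) -> None:
        # No schema is declared; each document may carry different fields.
        self._docs.append(dict(doc))

    def find(self, **criteria: Any) -> List[Dict[str, Any]]:
        # Return the documents that contain every requested field value.
        return [
            d for d in self._docs
            if all(k in d and d[k] == v for k, v in criteria.items())
        ]


# Session storage: the session ID is the key, the session is the value.
sessions = KeyValueStore()
sessions.put("session:42", {"user": "alice", "cart": ["book"]})
session = sessions.get("session:42")

# Blog posts and log events share one collection despite different fields.
posts = DocumentStore()
posts.insert({"type": "blog", "author": "alice", "title": "Hello"})
posts.insert({"type": "event", "level": "error", "message": "disk full"})
alice_posts = posts.find(author="alice")
```

In the key-value case the store never inspects the value, so the key is the only access path. In the document case the store can select on fields inside each document even though no schema was declared in advance.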
A document database is designed to store semi-structured data as documents, typically in JSON or XML format. Document databases are beneficial in the following scenarios: event logging; content management systems and blogging platforms; web analytics or real-time analytics; and e-commerce applications.

3.1.3 Column databases
A column store database is a type of database that stores data using a column-oriented model. Column databases are beneficial in the following scenarios: event logging; content management systems and blogging platforms; counters; and expiring usage.

3.1.4 Graph databases
A graph database is a database that uses graph structures for semantic queries, with nodes, edges, and properties to represent and store data. Graph databases are beneficial in the following scenarios: connected data; routing, dispatch, and location-based services; and recommendation engines.

3.2 NewSQL database
The term NewSQL is not quite as broad as NoSQL. NewSQL describes a new group of databases that share much of the functionality of traditional SQL relational databases, while offering some of the benefits of NoSQL technologies. NewSQL systems aim to offer the best of both worlds: the relational data model and ACID transactional consistency of traditional operational databases; the familiarity and interactivity of SQL; and the scalability and speed of NoSQL. Some offer stronger consistency guarantees than are available with NoSQL solutions, although others limit this to ‘tunable’ consistency and thus aren’t fully ACID-compliant [1]. The NewSQL advantages include:
(1) Reduced application complexity, stronger consistency, and often full transactional support.
(2) Familiar SQL and standard tooling.
(3) Richer analytics leveraging SQL and extensions.
(4) Many systems offer NoSQL-style clustering with more traditional data and query models.

The NewSQL disadvantages include:
(1) No NewSQL systems are as general-purpose as traditional SQL systems set out to be.
(2) In-memory architectures may be inappropriate for volumes exceeding a few terabytes.
(3) Only partial access to the rich tooling of traditional SQL systems.

4. DATABASE CONSIDERATIONS AND REQUIREMENTS
The first and most obvious purpose of a database is to store, update, and access data. All database systems allow these operations in one form or another. Other functional and nonfunctional system considerations and requirements for choosing appropriate database systems in different applications include: (1) consistency, availability, and partition tolerance (CAP); (2) robustness and reliability; (3) scalability; (4) performance and speed; (5) partitioning ability; (7) in-database analytics and monitoring; (8) operational and querying capabilities; (9) storage management; (10) talent pool and availability of relevant skills; (11) database integrity and constraints; (12) data model flexibility; (13) database security; and so on. Figure 3 shows the decisions involved in choosing the correct database.

Figure 3. Decisions involved in choosing the correct database [1].

5. CONCLUSIONS
It's an exciting time to be working in the database industry. For a generation of software professionals, innovation in database technology occurred largely within the constraints of ACID-compliant relational databases. Now that the hegemony of the RDBMS has been broken, we are free to design database systems whose only constraint is our imagination. It's well known that failure drives innovation. Some of these new database system concepts might not survive the test of time; however, there seems little chance that a single model will dominate the immediate future as completely as the relational model once did. Database professionals will need to choose the most appropriate technology for their circumstances with care; in many cases, relational technology will continue to be a better fit, but not always [1].

NoSQL, NewSQL, and Big Data are in many respects vaguely defined, overhyped, and overloaded terms. However, they represent the most widely understood phrases for referring to next-generation database technologies [1].

Loosely speaking, NoSQL databases reject the constraints of the relational model, including strict consistency and schemas. NewSQL databases retain many features of the relational model but amend the underlying technology in significant ways. Big Data systems are generally oriented around technologies within the Hadoop ecosystem, increasingly including Spark [1].

6. REFERENCES
[1] Harrison, Guy. Next Generation Databases. Publisher: Apress. December 26, 2015.
[2] Ramez Elmasri, Shamkant B. Navathe. Fundamentals of Database Systems (7th Edition). Publisher: Pearson. June 18, 2015.
[3] Hugh E. Williams, David Lane. Web Database Applications with PHP and MySQL. Publisher: O’Reilly Media. May 2004.
[4] Haseeb, Abdul, and Geeta Pattun. "A review on NoSQL: Applications and challenges." International Journal of Advanced Research in Computer Science 8.1 (2017).
[5] Codd, Edgar F. "A relational model of data for large shared data banks." Communications of the ACM 13.6 (1970): 377-387.
