This document discusses the history of database applications and revolutions. It describes 3 revolutions: 1) emergence of electronic computers driving early database systems, 2) emergence of the relational database model in 1970 defined by Codd, and 3) a new explosion of non-relational databases beginning in 2008 driven by demands of modern applications. The relational model and SQL language became universal standards, but new systems have emerged for applications requiring global scope and continuous availability.
University of Rochester, Wegmans Hall 1210, Rochester, NY 14620
[email protected], [email protected]
ABSTRACT
In this paper, we explore the history of database applications, which centers on three revolutions in database technology. The first revolution was driven by the emergence of the electronic computer. In the 20 years following the widespread adoption of electronic computers, a range of increasingly sophisticated database systems emerged. The second revolution was the emergence of the relational database. Shortly after the definition of the relational model in 1970, almost every significant database system shared a common architecture. The three pillars of this architecture were the relational model, ACID transactions, and the SQL language. However, starting around 2008, the third revolution has resulted in an explosion of non-relational database alternatives, driven by the demands of modern applications that require global scope and continuous availability. A wide range of new database systems appeared: key-value, document, graph, and column databases, as well as SSD-based and in-memory databases. The next generation of databases will comprise NoSQL, NewSQL, and big data platforms [1].

1. INTRODUCTION
Wikipedia defines a database as an “organized collection of data.” Although the term database entered our vocabulary only in the late 1960s, collecting and organizing data has been an integral factor in the development of human civilization and technology. Books, libraries, and other indexed archives of information represent preindustrial equivalents of modern database systems [1].

The emergence of electronic computers following the Second World War represented the first revolution in databases. The development of indexing methods such as ISAM (Indexed Sequential Access Method) and similar indexing structures powered the first electronic databases. However, there were as yet no Database Management Systems (DBMS) to minimize programmer overhead and ensure the performance and integrity of data access routines [1].

By the early 1970s, two major models of DBMS were competing for dominance. The network model was formalized by the CODASYL standard and implemented in databases such as IDMS. The hierarchical model provided a somewhat simpler approach and was most notably found in IBM’s IMS (Information Management System) [1].

However, these systems had several notable drawbacks. First, these navigational databases were extremely inflexible in terms of data structure and query capabilities, and it was extremely difficult to add new data elements to an existing system. Second, the database systems were centered on record-at-a-time transaction processing. Query operations, especially the sort of complex analytic queries that we today associate with business intelligence, required complex coding [1].

Arguably, no single person has had more influence over database technology than Edgar Codd. He harbored significant reservations about the design of the existing databases. In particular, he identified the following shortcomings [1, 2]:
1. Existing databases were too hard to use. Databases of the day could only be accessed by people with specialized programming skills.
2. Existing databases lacked a theoretical foundation. Codd’s mathematical background encouraged him to think about data in terms of formal structures and logical operations.
3. Existing databases mixed logical and physical implementations.

Codd published an internal IBM paper outlining his ideas for a more formalized model for database systems, which then led to his 1970 paper “A Relational Model of Data for Large Shared Data Banks.” This classic paper contained the core ideas that defined the relational database model, which became the most significant (almost universal) model for database systems for a generation [1, 2].

The relational model does not itself define the way in which the database handles concurrent data change requests. These changes are generally referred to as database transactions. Jim Gray defined the most widely accepted transaction model in the late 1970s, and it soon became popularized as ACID transactions: Atomic, Consistent, Isolated, and Durable [2].

However, the restriction on scalability beyond a single data center implied by the ACID transaction model has been a key motivator for the development of new database architectures. The difference in application architectures between the client-server era and the era of massive web-scale applications created pressures on the relational database that could not be relieved through incremental innovation [1].

An explosion of databases occurred in 2008-2009: literally dozens of new database systems emerged in this short period. In late 2009, the term NoSQL quickly caught on as shorthand for any database system that broke with the traditional SQL database [3]. By 2011, the term NewSQL had become popular as a way of describing a new breed of databases. These databases did not represent a complete break with the relational model, but they enhanced or significantly modified its fundamental principles. Finally, the term Big Data burst into mainstream consciousness in early 2012. The term refers mostly to the new ways in which data is being leveraged to create value. Even so, we generally understand “Big Data solutions” as convenient shorthand for technologies, such as Hadoop, that support large and unstructured datasets. NoSQL, NewSQL, and Big Data are in many respects vaguely defined, overhyped, and overloaded terms. However, they represent the most widely understood phrases for referring to next-generation database technologies [1].

The remainder of this paper is organized as follows. The history of database applications is reviewed in Section 2. Two promising sub-areas of future database systems are introduced in Section 3. The considerations and requirements for choosing appropriate database systems in different applications are summarized in Section 4. Finally, Section 5 concludes the paper.

2. HISTORY OF DATABASE APPLICATIONS
2.1 Timeline of database development
The first revolution was driven by the emergence of the electronic computer, after which many pre-relational databases sprang up like mushrooms. The second revolution was driven by a classic paper containing the core ideas that defined the relational database model, which became the most significant (almost universal) model for database systems for a generation [5]. The third revolution, toward next-generation databases, has resulted in an explosion of non-relational database alternatives that meet the needs of massive web-scale and big data applications [4]. Figure 1 illustrates these three major eras in database technology. In this section, we provide an overview of these three waves of database technologies and discuss the forces driving today’s and future next-generation databases.

Figure 1. Illustrating three major eras in database technology [1].

2.1.1 The pre-relational database system
Early database systems enforced both a schema (a definition of the structure of the data within the database) and an access path (a fixed means of navigating from one record to another). Table 1 illustrates the development of pre-relational database systems. By the early 1970s, two major models of DBMS were competing for dominance: the network model and the hierarchical model [1].

Table 1. Pre-relational database system development
Year   Pre-relational database system
1951   Magnetic tape
1955   Magnetic disk
1961   ISAM
1965   Hierarchical model
1968   IMS
1969   Network model
1971   IDMS

2.1.2 Relational database system
Setting aside the intricacies of relational database theory, the relational model at its essence describes how a given set of data should be presented to the user, rather than how it should be stored on disk or in memory. A row in a table should be identifiable and efficiently accessible by a unique key value. Every column in that row must be dependent on that key value and on no other identifier. Arrays and other structures that contain nested information are, therefore, not directly supported [1]. Table 2 illustrates the development of relational database systems. Each of these systems attempts to differentiate itself by claiming superior performance, availability, functionality, or economy. Even so, they are virtually identical in their reliance on three key principles: Codd’s relational model, the SQL language, and the ACID transaction model [1].

Table 2. Relational database system development
Year   Relational database system
1970   Codd’s paper
1974   System R
1978   Oracle
1980   Commercial Ingres
1981   Informix
1984   DB2
1987   Sybase
1989   Postgres
1989   SQL Server
1995   MySQL

2.1.3 The next generation database system
By the middle of the 2000s, the relational database seemed completely entrenched. In fact, the era of complete relational database supremacy was just about to end. The difference in application architectures between the client-server era and the era of massive web-scale applications created pressures on the relational database that could not be relieved through incremental innovation [1]. Table 3 illustrates the development of today’s and future databases. Two promising sub-areas of future databases are introduced in detail in Section 3.

Table 3. Today and future database development
Year   Today and future databases
2003   MarkLogic
2004   MapReduce
2005   Hadoop
2005   Vertica
2007   Dynamo
2008   Cassandra
2008   HBase
2008   NuoDB
2009   MongoDB
2010   VoltDB
2010   HANA
2011   Riak
2012   Aerospike
2014   Splice Machine

2.2 Summary: three platforms corresponding to the three waves of databases
The three waves of databases roughly correspond to three waves of computer applications. The three platforms shown in Figure 2 are often used to illustrate the development of database systems. The first platform was the mainframe, which was supported by pre-relational database systems. The second platform, client-server and early web applications, was supported by relational databases. The third platform is characterized by applications that involve cloud computing, mobile presence, social networking, and the Internet of Things. It demands a third wave of database technologies that include, but are not limited to, relational systems [1]. Figure 2 summarizes how the three platforms correspond to the three waves of database revolutions.

Figure 2. Three platforms correspond to three waves of database technology [1].

3. TWO PROMISING FUTURE DATABASES
The relational database was already well established. However, modern applications require global scope and continuous availability, and relational databases proved inadequate to deal with the volume and velocity of big data. In particular, the difference in application architectures between the client-server era and the era of massive web-scale applications created pressures on the relational database that could not be relieved through incremental innovation. Web-scale sites faced challenges in scaling their infrastructure from thousands to millions of users. Even the most expensive commercial Relational Database Management Systems (RDBMS), such as Oracle, could not provide sufficient scalability to meet the demands of these sites. Sharding means partitioning the data across many independent database servers. At sites like Facebook, it has allowed a MySQL-based system to scale up to massive levels. However, sharding has a cost: many relational operations and database-level ACID transactions are lost, because it becomes impossible to perform joins or maintain transactional integrity across shards [1]. Finally, the operational costs of sharding, together with the loss of relational features, led many to seek alternatives to the RDBMS [1].

3.1 NoSQL database
A NoSQL (Not Only SQL) database provides a mechanism for storing and retrieving data that is modeled by means other than the tabular relations used in relational databases. NoSQL databases typically operate without a schema, allowing you to freely add fields to database records without having to define any changes in structure first. This is particularly useful when dealing with non-uniform data and custom fields. In summary, the common characteristics of NoSQL databases are:
(1) They do not use the relational model;
(2) They run well on clusters;
(3) They are usually open source;
(4) They are built for 21st-century web estates;
(5) They are, for the most part, schemaless.
The most important result of the rise of NoSQL is polyglot persistence, that is, the use of different data storage technologies for different needs.

There are four main types of NoSQL data models: key-value databases, document databases, column databases, and graph databases.

3.1.1 Key-Value databases
A key-value database, or key-value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays. This data structure is more commonly known today as a dictionary or hash. Key-value databases are beneficial in the following scenarios [4]:
(1) Storing session information. Generally, every web session is unique and is assigned a unique SessionID value. Some applications store the SessionID on disk or in an RDBMS. They will greatly benefit from moving to a key-value store, since everything about the session can be stored with a single PUT request or retrieved with a single GET. This single-request operation is very fast, because everything about the session is stored in a single object. Solutions such as Memcached are used by many web applications, and Riak can be used when availability is important [1].
(2) User profiles and preferences. Almost every user has a unique UserId, Username, or some other attribute, as well as preferences such as language, color, time zone, and so on. All of this can be put into one object, so getting the preferences of a user takes a single GET operation. Product profiles can be stored similarly.
(3) Shopping cart data. E-commerce websites have shopping carts tied to the user. Shopping carts should be available at all times, across browsers, machines, and sessions. To achieve this, all the shopping information can be put into the value, with the user ID as the key. A Riak cluster would be best suited for these kinds of applications.

3.1.2 Document databases
A document database is designed to store semi-structured data as documents, typically in JSON or XML format. Document databases are beneficial in the following scenarios: event logging; content management systems and blogging platforms; web analytics or real-time analytics; and e-commerce applications.

3.1.3 Column databases
A column store database is a type of database that stores data using a column-oriented model. Column databases are beneficial in the following scenarios: event logging; content management systems and blogging platforms; counters; and expiring usage.

3.1.4 Graph databases
A graph database uses graph structures, with nodes, edges, and properties, to represent and store data and to answer semantic queries. Graph databases are beneficial in the following scenarios: connected data; routing, dispatch, and location-based services; and recommendation engines.

3.2 NewSQL database
The term NewSQL is not quite as broad as NoSQL. NewSQL describes a new group of databases that share much of the functionality of traditional SQL relational databases while offering some of the benefits of NoSQL technologies. NewSQL systems offer the best of both worlds:
- the relational data model and ACID transactional consistency of traditional operational databases;
- the familiarity and interactivity of SQL;
- the scalability and speed of NoSQL.
Some offer stronger consistency guarantees than are available with NoSQL solutions. Others limit this to ‘tunable’ consistency and thus aren’t fully ACID-compliant [1]. The advantages of NewSQL include:
(1) Minimized application complexity, stronger consistency, and often full transactional support.
(2) Familiar SQL and standard tooling.
(3) Richer analytics leveraging SQL and its extensions.
(4) Many systems offer NoSQL-style clustering with more traditional data and query models.
The disadvantages of NewSQL include:
(1) No NewSQL systems are as general-purpose as traditional SQL systems set out to be.
(2) In-memory architectures may be inappropriate for volumes exceeding a few terabytes.
(3) They offer only partial access to the rich tooling of traditional SQL systems.

4. DATABASE CONSIDERATIONS AND REQUIREMENTS
The first and most obvious purpose of a database is to store, update, and access data. All database systems allow these operations in one form or another. Other functional and nonfunctional considerations and requirements for choosing appropriate database systems in different applications include: (1) consistency, availability, and partition tolerance (CAP); (2) robustness and reliability; (3) scalability; (4) performance and speed; (5) partitioning ability; (7) in-database analytics and monitoring; (8) operational and querying capabilities; (9) storage management; (10) talent pool and availability of relevant skills; (11) database integrity and constraints; (12) data model flexibility; (13) database security; and so on. Figure 3 shows the decisions involved in choosing the correct database.

Figure 3. Decisions involved in choosing the correct database [1].

5. CONCLUSIONS
It's an exciting time to be working in the database industry. For a generation of software professionals, innovation in database technology occurred largely within the constraints of ACID-compliant relational databases. Now that the hegemony of the RDBMS has been broken, we are free to design database systems whose only constraint is our imagination. It's well known that failure drives innovation. Some of these new database system concepts might not survive the test of time. However, there seems little chance that a single model will dominate the immediate future as completely as the relational model did. Database professionals will need to choose the most appropriate technology for their circumstances with care. In many cases, relational technology will continue to be a better fit, but not always [1].

NoSQL, NewSQL, and Big Data are in many respects vaguely defined, overhyped, and overloaded terms. However, they represent the most widely understood phrases for referring to next-generation database technologies [1]. Loosely speaking, NoSQL databases reject the constraints of the relational model, including strict consistency and schemas. NewSQL databases retain many features of the relational model but amend the underlying technology in significant ways. Big Data systems are generally oriented around technologies within the Hadoop ecosystem, increasingly including Spark [1].

6. REFERENCES
[1] Harrison, Guy. Next Generation Databases. Apress, December 26, 2015.
[2] Elmasri, Ramez, and Shamkant B. Navathe. Fundamentals of Database Systems (7th Edition). Pearson, June 18, 2015.
[3] Williams, Hugh E., and David Lane. Web Database Applications with PHP and MySQL. O’Reilly Media, May 2004.
[4] Haseeb, Abdul, and Geeta Pattun. “A review on NoSQL: Applications and challenges.” International Journal of Advanced Research in Computer Science 8.1 (2017).
[5] Codd, Edgar F. “A relational model of data for large shared data banks.” Communications of the ACM 13.6 (1970): 377-387.