Learning PostgreSQL
1/5
()
About this ebook
Related to Learning PostgreSQL
Related ebooks
Mastering MariaDB Rating: 0 out of 5 stars0 ratingsPostgreSQL Server Programming - Second Edition Rating: 0 out of 5 stars0 ratingsMastering Elasticsearch - Second Edition Rating: 0 out of 5 stars0 ratingsSQL Server 2016 Developer's Guide Rating: 0 out of 5 stars0 ratingsPostgreSQL 9.0 High Performance Rating: 4 out of 5 stars4/5Big Data Analytics Rating: 0 out of 5 stars0 ratingsMastering Django: Core Rating: 0 out of 5 stars0 ratingsLearning R Programming Rating: 5 out of 5 stars5/5Expert Cube Development with Microsoft SQL Server 2008 Analysis Services Rating: 5 out of 5 stars5/5Functional Python Programming Rating: 0 out of 5 stars0 ratingsMastering Objectoriented Python Rating: 5 out of 5 stars5/5Python Essentials Rating: 5 out of 5 stars5/5The SQL Workshop: Learn to create, manipulate and secure data and manage relational databases with SQL Rating: 0 out of 5 stars0 ratingsLearning Apache Spark 2 Rating: 0 out of 5 stars0 ratingsPostgreSQL for Data Architects Rating: 0 out of 5 stars0 ratingsPostgreSQL Administration Essentials Rating: 0 out of 5 stars0 ratingsInstant MongoDB Rating: 0 out of 5 stars0 ratingsMastering PostgreSQL 9.6 Rating: 0 out of 5 stars0 ratingsPostgreSQL Development Essentials Rating: 5 out of 5 stars5/5Troubleshooting PostgreSQL Rating: 5 out of 5 stars5/5PostgreSQL High Performance Cookbook Rating: 0 out of 5 stars0 ratingsPostgreSQL Server Programming Rating: 0 out of 5 stars0 ratingsGet Programming with Scala Rating: 0 out of 5 stars0 ratingsEffective MySQL Optimizing SQL Statements Rating: 3 out of 5 stars3/5Practical SQL Rating: 4 out of 5 stars4/5Data Analysis with Python and PySpark Rating: 0 out of 5 stars0 ratingsLearning Flask Framework Rating: 4 out of 5 stars4/5Learn T-SQL Querying: A guide to developing efficient and elegant T-SQL code Rating: 0 out of 5 stars0 ratings
Databases For You
Grokking Algorithms: An illustrated guide for programmers and other curious people Rating: 4 out of 5 stars4/5Access 2019 For Dummies Rating: 0 out of 5 stars0 ratingsSQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL Rating: 4 out of 5 stars4/5Learn SQL Server Administration in a Month of Lunches Rating: 3 out of 5 stars3/5Access 2016 For Dummies Rating: 0 out of 5 stars0 ratingsManaging Data Using Excel Rating: 5 out of 5 stars5/5Blockchain Basics: A Non-Technical Introduction in 25 Steps Rating: 4 out of 5 stars4/5Practical Data Analysis Rating: 4 out of 5 stars4/5COMPUTER SCIENCE FOR ROOKIES Rating: 0 out of 5 stars0 ratingsCompTIA DataSys+ Study Guide: Exam DS0-001 Rating: 0 out of 5 stars0 ratingsLearning NumPy Array Rating: 0 out of 5 stars0 ratingsQuery Store for SQL Server 2019: Identify and Fix Poorly Performing Queries Rating: 0 out of 5 stars0 ratingsPython Projects for Everyone Rating: 0 out of 5 stars0 ratingsSql : The Ultimate Beginner to Advanced Guide To Master SQL Quickly with Step-by-Step Practical Examples Rating: 0 out of 5 stars0 ratingsSchaum’s Outline of Fundamentals of SQL Programming Rating: 3 out of 5 stars3/5A concise guide to PHP MySQL and Apache Rating: 4 out of 5 stars4/5Learn SQL in 24 Hours Rating: 5 out of 5 stars5/5Entity Framework Core in Action, Second Edition Rating: 0 out of 5 stars0 ratingsData Science Strategy For Dummies Rating: 0 out of 5 stars0 ratingsExcel 2021 Rating: 4 out of 5 stars4/5Professional Access 2013 Programming Rating: 0 out of 5 stars0 ratingsData Warehousing: Optimizing Data Storage And Retrieval For Business Success Rating: 0 out of 5 stars0 ratingsVisualizing Graph Data Rating: 0 out of 5 stars0 ratingsRaspberry Pi Server Essentials Rating: 0 out of 5 stars0 ratingsAccess 2021 / Microsoft 365 Programming by Example: Mastering VBA for Data Management and Automation Rating: 0 out of 5 stars0 ratingsNode.js Design Patterns - Second Edition Rating: 4 out of 5 stars4/5
Reviews for Learning PostgreSQL
1 rating0 reviews
Book preview
Learning PostgreSQL - Juba Salahaldin
Table of Contents
Learning PostgreSQL
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Relational Databases
Database management systems
A brief history
Database categories
The NoSQL databases
The CAP theorem
NoSQL motivation
Key value databases
Columnar databases
Document databases
Graph databases
Relational and object relational databases
ACID properties
The SQL Language
Basic concepts
Relation
Tuple
Attribute
Constraint
Domain integrity constraint
Entity integrity constraint
Referential integrity constraints
Semantic constraints
Relational algebra
The SELECT and PROJECT operations
The RENAME operation
The Set theory operations
The CROSS JOIN (Cartesian product) operation
Data modeling
Data model perspectives
The entity-relation model
Sample application
Entities, attributes, and keys
Mapping ER to Relations
UML class diagrams
Summary
2. PostgreSQL in Action
An overview of PostgreSQL
PostgreSQL history
The advantages of PostgreSQL
Business advantages of PostgreSQL
PostgreSQL user advantages
PostgreSQL applications
Success stories
Forks
PostgreSQL architecture
PostgreSQL abstract architecture
The PostgreSQL community
PostgreSQL capabilities
Replication
Security
Extension
NoSQL capabilities
Foreign data wrapper
Performance
Very rich SQL constructs
Installing PostgreSQL
Installing PostgreSQL on Ubuntu
Client installation
Server installation
Basic server configuration
Installing PostgreSQL on Windows
The PostgreSQL clients
The psql client
Psql advanced settings
PostgreSQL utility tools
Backup and replication
Utilities
PgAdmin III
Summary
3. PostgreSQL Basic Building Blocks
Database coding
Database naming conventions
PostgreSQL identifiers
Documentation
Version control system
PostgreSQL objects hierarchy
Template databases
User databases
Roles
Tablespace
Template procedural languages
Settings
Setting parameters
Setting a context
PostgreSQL high-level object interaction
PostgreSQL database components
Schema
Schema usages
Table
PostgreSQL native data types
Numeric types
Character types
Date and time types
The car web portal database
Summary
4. PostgreSQL Advanced Building Blocks
Views
View synopsis
Views categories
Materialized views
Updatable views
Indexes
Index types
Partial indexes
Indexes on expressions
Unique indexes
Multicolumn indexes
Best practices on indexes
Functions
PostgreSQL native programming languages
Creating a function in the C language
Creating functions in the SQL language
Creating a function in the PL/pgSQL language
PostgreSQL function usages
PostgreSQL function dependency
PostgreSQL function categories
PostgreSQL anonymous functions
PostgreSQL user-defined data types
The PostgreSQL CREATE DOMAIN command
The PostgreSQL CREATE TYPE command
Triggers and rule systems
The PostgreSQL rule system
The PostgreSQL trigger system
Triggers with arguments
Using triggers to make views updatable
Summary
5. SQL Language
SQL fundamentals
SQL lexical structure
Querying the data with the SELECT statement
The structure of the SELECT query
Select-list
SQL expressions
DISTINCT
FROM clause
Selecting from multiple tables
Self-joins
WHERE clause
Comparison operators
Pattern matching
Row and array comparison constructs
Grouping and aggregation
GROUP BY clause
HAVING clause
Ordering and limiting the results
Subqueries
Set operations – UNION, EXCEPT, and INTERSECT
Dealing with NULLs
Changing the data in the database
INSERT statement
UPDATE statement
UPDATE using sub-select
UPDATE using additional tables
DELETE statement
TRUNCATE statement
Summary
6. Advanced Query Writing
Common table expressions
Reusing SQL code with CTE
Recursive and hierarchical queries
Changing data in multiple tables at a time
Window functions
Window definition
The WINDOW clause
Using window functions
Window functions with grouping and aggregation
Advanced SQL
Selecting the first records
Set returning functions
Lateral subqueries
Advanced usage of aggregating functions
Transaction isolation and multiversion concurrency control
Summary
7. Server-Side Programming with PL/pgSQL
Introduction
SQL language and PL/pgSQL – a comparison
PostgreSQL function parameters
Function authorization-related parameters
Function planner-related parameters
Function configuration-related parameters
The PostgreSQL PL/pgSQL control statements
Declaration statements
Assignment statements
Conditional statements
Iteration
The loop statement
The while loop statement
The for loop statement
Returning from the function
Returning void
Returning a single row
Returning multiple rows
Function predefined variables
Exception handling
Dynamic SQL
Executing DDL statements in dynamic SQL
Executing DML statements in dynamic SQL
Dynamic SQL and the caching effect
Recommended practices when using dynamic SQL
Summary
8. PostgreSQL Security
Authentication in PostgreSQL
PostgreSQL pg_hba.conf
Listen addresses
Authentication best practices
PostgreSQL default access privileges
Role system and proxy authentication
PostgreSQL security levels
Database security level
Schema security level
Table-level security
Column-level security
Row-level security
Encrypting data
PostgreSQL role password encryption
pgcrypto
One-way encryption
Two-way encryption
Summary
9. The PostgreSQL System Catalog and System Administration Functions
The system catalog
Getting the database cluster and client tools version
Getting ready
How to do it…
There's more…
Terminating and canceling user sessions
Getting ready
How to do it…
How it works…
There's more…
Setting and getting database cluster settings
Getting ready
How to do it…
There's more…
Getting the database and database object size
Getting ready
How to do it…
There's more…
Cleaning up the database
Getting ready
How to do it…
There's more…
Cleaning up data in the database
Getting ready
How to do it…
There's more…
Managing database locks
Adding missing indexes on foreign keys and altering the default statistic
Getting ready
How to do it…
Getting the views dependency tree
Getting ready
How to do it…
There's more…
Summary
10. Optimizing Database Performance
PostgreSQL configuration tuning
Maximum number of connections
Memory settings
Hard disk settings
Planner-related settings
Benchmarking is your friend
Tuning PostgreSQL queries
The EXPLAIN command and execution plan
Detecting problems in query plans
Common mistakes in writing queries
Unnecessary operations
Misplaced indexes
Unnecessary table or index scans
Using correlated nested queries
Using CTE when not mandatory
Using the PL/pgSQL procedural language consideration
Cross column correlation
Table partitioning
Constraint exclusion limitations
Summary
11. Beyond Conventional Data types
PostgreSQL arrays
Common functions of arrays and their operators
Modifying and accessing arrays
Indexing arrays in PostgreSQL
Hash store
Modifying and accessing an hstore
Indexing an hstore in PostgreSQL
The PostgreSQL JSON data type
JSON and XML
The JSON data type
Modifying and accessing JSON types
Indexing a JSON data type
The PostgreSQL RESTful API with JSON
A PostgreSQL full text search
The tsquery and tsvector data types
The tsvector data type
The tsquery data type
Pattern matching
Full text search indexing
Summary
12. Testing
Unit testing
Unit testing in databases
Unit test frameworks
Schema difference
The interfaces test
Data difference
PostgreSQL benchmarks
Summary
13. PostgreSQL JDBC
Introduction to JDBC
Connecting to a PostgreSQL database
Installing the driver
Initializing the driver
Obtaining a connection
Error handling
SQLWarnings
Issuing a query and processing the results
Static statements
PreparedStatements
Using a ResultSet
Navigating through a ResultSet
Reading row data
Handling null values
Scrollable and updateable ResultSets
Navigating through a ResultSet
Changing the data in a ResultSet
Using cursors
Getting information about the table structure
Function handling
Calling a stored function
Getting a ResultSet from a stored function
Getting a ResultSet from a function returning SETOF
Getting a ResultSet from a function returning a refcursor
Design considerations
Summary
14. PostgreSQL and Hibernate
Introduction to ORM and Hibernate
Hibernate overview and architecture
Installation and configuration
Installation of Hibernate
Configuring Hibernate
Getting a session from the SessionFactory
Mapping classes to tables
Creating an entity class
Creating a mapping file
Using annotation-based mapping
Working with entities
States of an entity
Making a new entity persistent
Loading an entity from the database
Loading a list of entries
Named queries
Creating dynamic queries
Modifying entities
Deleting entities
Using association mapping
One-to-many and many-to-one mappings
One-to-one mapping and component mapping
Many-to-many mapping
Fetching strategies
Configuring the fetch type
Configuring the fetch mode
Tuning the performance of Hibernate
Using caching
Using connection pools
Dealing with partitioned tables
Summary
Index
Learning PostgreSQL
Learning PostgreSQL
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2015
Production reference: 1241115
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78398-918-8
www.packtpub.com
Credits
Authors
Salahaldin Juba
Achim Vannahme
Andrey Volkov
Reviewers
Ângelo Marcos Rigo
Dr. Isabel Rosa
Commissioning Editor
Julian Ursell
Acquisition Editor
Tushar Gupta
Greg Wild
Content Development Editor
Parita Khedekar
Technical Editor
Vijin Boricha
Copy Editor
Shruti Iyer
Sonia Mathur
Project Coordinator
Judie Jose
Proofreader
Safis Editing
Indexer
Monica Ajmera Mehta
Graphics
Disha Haria
Production Coordinator
Conidon Miranda
Cover Work
Conidon Miranda
About the Authors
Salahaldin Juba has over 10 years of experience in industry and academia, with a focus on database development for large-scale and enterprise applications. He holds a master's degree of science in environmental management and a bachelor's degree of engineering in computer systems.
I would like to express my deepest gratitude to my colleagues Achim Vannahme and Andrey Volkov for making this work possible. Also, I would like to thank all those who provided support, especially the Packt publishing team, especially the persons whom I interacted with—Vijin Boricha, Greg Wild, and Parita Khedekar—for their great help with proofreading, design, comments, and remarks.
I would also like to thank my mother, Wedad; my wife, Rana; and the rest of my family, who supported me despite all of the time that I had to devote to this book over them.
Achim Vannahme works as a senior software developer at a mobile messaging operator, where he focuses on software quality and test automation. He holds a degree in computer science and has over 10 years of experience in using Java and PostgreSQL in distributed and high-performance applications.
Andrey Volkov pursued his education in information systems in the banking sector. He started his career as a financial analyst in a commercial bank. Here, Andrey worked with a database as a data source for his analysis and soon realized that querying the database directly is much more efficient for ad hoc analyses than using any visual report-generating software. He joined the data warehouse team, and after a while, he led the team by taking up the position of a data warehouse architect. Andrey worked mainly with Oracle databases to develop logical and physical models of finance and accounting data, created them in a database, implemented procedures to load and process data, and performed analytical tasks. He was also responsible for teaching users how to use data warehouse and BI tools, and SQL training was a part of his job as well.
After many years of being greatly interested in the aspects of his job that were related to IT rather than accounting or banking, Andrey changed fields. Currently, he works as a database developer in a telecommunication company. Here, Andrey works mainly with PostgreSQL databases and is responsible for data modeling, implementing data structures in databases, developing stored procedures, integrating databases with other software components, and developing a data warehouse.
Having worked with both Oracle and PostgreSQL—the former is a leading commercial and the latter is one of the most advanced open source RDBMSes—he is able to compare them and recognize and evaluate the key advantages of both. Andrey's extensive experience, therefore, made him able and willing to work on this book.
About the Reviewers
Ângelo Marcos Rigo has a strong background in web development, which he has worked with since 1998, with a focus on content management systems, hibryd mobile apps and custom web based systems. He holds a degree in systems information and also has extensive experience in managing, customizing, and developing extensions for the moodle LMS. Ângelo can be reached on his website, http://www.u4w.com.br, for consultation. He has also reviewed, Moodle Security, Packt Publishing.
I would like to thank my wife, Janaina de Souza, and my daughter, Lorena Rigo, for their support while I was away to review this book.
Dr. Isabel Rosa is a research associate at Imperial College London and one of the cofounders of Earthindicators. She has a PhD in computational ecology from Imperial College London and extensive experience in data mining and predictive modeling. For the last five years, Dr. Rosa worked as a researcher with Imperial College London. During her academic career, she acquired several skills such as statistical analysis, programming (R, C++, Python), working with geographic information systems (ArcGIS and QGIS), and creating databases (PostgreSQL/PostGIS, SQLServer). Dr. Rosa is also the lead author and coauthor of several scientific papers published in top-quality scientific journals, such as Global Change Biology. She has presented her work at several national and international scientific conferences and is the lead coordinator of Land Use Forum (London).
www.PacktPub.com
Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Free access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Preface
Picking the right database management system is a difficult task due to the vast number of options on the market. Depending on the business model, one can pick a commercial database or an open source database with commercial support. In addition to this, there are several technical and nontechnical factors to assess. When it comes to a relational database management system, PostgreSQL stands at the top for several reasons. The PostgreSQL slogan, The world's most advanced open source database
, shows the sophistication of PostgreSQL features and community confidence.
PostgreSQL is an open source object relational database management system. It emphasizes extensibility and competes with major relational database vendors such as Oracle, SQL server, and MySQL. Due to its rich extensions and open source license, it is often used for research purposes, but PostgreSQL code is also the base for many commercial database management systems such as Greenplum and Vertica. Furthermore, start-up companies often favor PostgreSQL due to its licensing costs and because there are a lot of companies that provide commercial support.
PostgreSQL runs on most modern operating systems, including Windows, Mac, and Linux flavors. Also, there are several extensions to access, manage, and monitor PostgreSQL clusters, such as pgAdmin III. PostgreSQL installation and configuration is moderately easy as it is supported by most packaging tools, such as yum and apt.
Database developers can easily learn and use PostgreSQL because it complies with ANSI SQL standards and comes with many client tools such as psql and pgAdmin III. Other than this, there are a lot of resources to help developers learn PostgreSQL; it has a very good documentation manual and a very active and organized community.
PostgreSQL can be used for both OLTP and OLAP applications. As it is ACID compliant, it can be used out of the box for OLTP applications. For OLAP applications, PostgreSQL supports Window functions, FDW, and table inheritance; there are many external extensions for this purpose as well.
Even though PostgreSQL is ACID compliant, it has very good performance as it utilizes state of the art algorithms and techniques. For example, PostgreSQL utilizes MVCC architecture to allow concurrent access to data. Also, PostgreSQL provides a very good analyzer and advanced features, such as data partitioning using table inheritance and constraint exclusion, to speed up the handling of very large data. PostgreSQL supports several types of indexes such as B-Tree, GiN, and GiST, and BRIN indexes are also supported by PostgreSQL 9.5 at the time of writing this book.
PostgreSQL is scalable thanks to the many replication solutions in the market, such as Slony and pgpool-II. Additionally, PostgreSQL supports out-of-the-box synchronous and asynchronous streaming replication. This makes PostgreSQL very attractive because it can be used to set up highly available and performant systems.
What this book covers
Chapter 1, Relational Databases, introduces relational database system concepts, including relational database properties, relational algebra, and database modeling. Also, it describes different database management systems such as graph, document, key value, and columnar databases.
Chapter 2, PostgreSQL in Action, provides first-hand experience in installing the PostgreSQL server and client tools on different platforms. This chapter also introduces PostgreSQL capabilities, such as out-of-the-box replication support and its very rich data types.
Chapter 3, PostgreSQL Basic Building Blocks, provides some coding best practices, such as coding conventions, identifier names, and so on. This chapter describes the PostgreSQL basic building blocks and the interaction between these blocks, mainly template databases, user databases, tablespaces, roles, and settings. Also, it describes basic data types and tables.
Chapter 4, PostgreSQL Advanced Building Blocks, introduces several building blocks, including views, indexes, functions, user-defined data types, triggers, and rules. This chapter provides use cases of these building blocks and compares building blocks that can be used for the same case, such as rules and triggers.
Chapter 5, SQL Language, introduces Structured Query Language (SQL) which is used to interact with a database, create and maintain data structures, and enter data into databases, change it, retrieve it, and delete it. SQL has commands related to Data Definition Language (DDL), Data Manipulation Language (DML), and Data Control Language (DCL). Four SQL statements form the basis of DML—SELECT, INSERT, UPDATE, and DELETE—which are described in this chapter.
The SELECT statement is examined in detail to explain SQL concepts such as grouping and filtering to show what SQL expressions and conditions are and how to use subqueries. Some relational algebra topics are also covered in application to joining tables.
Chapter 6, Advanced Query Writing, describes advanced SQL concepts and features, such as common table expressions and window functions. This helps you implement a logic that would not be possible without them, such as recursive queries. Other techniques explained here, such as the DISTINCT ON clause, the FILTER clause, or lateral subqueries, are not that irreplaceable. However, they can help make a query smaller, easier, and faster.
Chapter 7, Server-Side Programming with PL/pgSQL, describes PL/pgSQL. It introduces function parameters, such as the number of returned rows, and function cost, which is mainly used by the query planner. Also, it presents control statements such as conditional and iteration ones. Finally, it explains the concept of dynamic SQL and some recommended practices when using dynamic SQL.
Chapter 8, PostgreSQL Security, discusses the concepts of authentication and authorization. It describes PostgreSQL authentication methods and explains the structure of a PostgreSQL host-based authentication configuration file. It also discusses the permissions that can be granted to database building objects such as schemas, tables, views, indexes, and columns. Finally, it shows how sensitive data, such as passwords, can be protected using different techniques, including one-way and two-way encryption.
Chapter 9, The PostgreSQL System Catalog and System Administration Functions, provides several recipes to maintain a database cluster, including cleaning up data, maintaining user processes, cleaning up indexes and unused databases objects, discovering and adding indexes to foreign keys, and so on.
Chapter 10, Optimizing Database Performance, discusses several approaches to optimize performance. It presents PostgreSQL cluster configuration settings, which are used in tuning the whole cluster's performance. Also, it presents common mistakes in writing queries and discusses several approaches to increase performance, such as using indexes or table partitioning and constraint exclusion.
Chapter 11, Beyond Conventional Data types, discusses several rich data types, including arrays, hash stores, and documents. It presents use cases as well as operations and functions for each data type. Additionally, it presents full-text search.
Chapter 12, Testing, covers some aspects of the software testing process and how it can be applied to databases. Unit tests for databases can be written as SQL scripts or stored functions in a database. There are several frameworks that help us write unit tests and process the results of testing.
Chapter 13, PostgreSQL JDBC, introduces the JDBC API. It covers basic operations, including executing SQL statements and accessing their results as well as more advanced features such as executing stored procedures and accessing the metainformation of databases and tables.
Chapter 14, PostgreSQL and Hibernate, covers the concept of Object-Relational Mapping, which is introduced using the Hibernate framework. This chapter explains how to execute CRUD operations in Hibernate and fetch strategies and associative mappings and also covers techniques such as caching and pooling for performance optimization.
What you need for this book
In general, PostgreSQL server and client tools do not need an exceptional hardware. PostgreSQL can be installed on almost all modern platforms, including Linux, Windows, and Mac. Also, in the book, when a certain library is needed, the installation instructions are given.
The example provided in this book requires PostgreSQL version 9.4; however, most of the examples can be executed on earlier versions as well. In order to execute the sample code, scripts, and examples provided in the book, you need to have at least a PostgreSQL client tool installed on your machine—preferably psql—and access to a remote server running the PostgreSQL server. In a Windows environment, the cmd.exe command prompt is not very convenient; thus, the user might consider using Cygwin http://www.cygwin.com/ or another alternative such as Powershell.
For some chapters, such as Chapter 13, PostgreSQL JDBC and Chapter 14, PostgreSQL and Hibernate, one needs to install a development kit (JDK). Also, it is convenient to use the NetBeans or Eclipse integrated development environment (IDE).
Who this book is for
If you are a student, database developer, or an administrator interested in developing and maintaining a PostgreSQL database, this book is for you. No knowledge of database programming or administration is necessary.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: The customer_service associates the customer and the service relations.
A block of code is set as follows:
carportal_app
>
account
>
account_id
>
Any command-line input or output is written as follows:
SELECT first_name, last_name, service_id FROM customer AS c CROSS JOIN customer_service AS cs WHERE c.customer_id=cs.customer_id AND c.customer_id = 3;
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: Another option is to use a Linux emulator such as Cygwin and MobaXterm.
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
Questions
If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
Chapter 1. Relational Databases
This chapter will provide a high-level overview of topics related to database development. Understanding the basic relational database concepts enables the developers to not only come up with clean designs, but also to master relational databases. This chapter is not restricted to learning PostgreSQL, but covers all relational databases.
The topics covered in this chapter include the following:
Database management systems: Understanding the different database categories enables the developer to utilize the best in each world.
Relational algebra: Understanding relational algebra enables the developers to master the SQL language, especially, SQL code rewriting.
Data modeling: Using data modeling techniques leads to better communication.
Database management systems
Different database management systems support diverse application scenarios, use cases, and requirements. Database management systems have a long history. First we will quickly take a look at the recent history, and then explore the market-dominant database management system categories.
A brief history
Broadly, the term database can be used to present a collection of things. Moreover, this term brings to mind many other terms including data, information, data structure, and management. A database can be defined as a collection or a repository of data, which has a certain structure, managed by a database management system (DBMS). Data can be structured as tabular data, semi-structured as XML documents, or unstructured data that does not fit a predefined data model.
In early days, databases were mainly aimed at supporting business applications; this led us to the well-defined relational algebra and relational database systems. With the introduction of object-oriented languages, new paradigms of database management systems appeared such as object-relational databases and object-oriented databases. Also, many businesses as well as scientific applications use arrays, images, and spatial data; thus, new models such as raster, map, and array algebra are supported. Graph databases are used to support graph queries such as the shortest path from one node to another along with supporting traversal queries easily.
With the advent of web applications such as social portals, it is now necessary to support a huge number of requests in a distributed manner. This has led to another new paradigm of databases called NoSQL (Not Only SQL) which has different requirements such as performance over fault tolerance and horizontal scaling capabilities.
In general, the timeline of database evolution was greatly affected by many factors such as:
Functional requirements: The nature of the applications using a DBMS has led to the development of extensions on top of relational databases such as PostGIS (for spatial data) or even dedicated DBMS such as SCI-DB (for scientific data analytics).
Nonfunctional requirements: The success of object-oriented programming languages has created new trends such as object-oriented databases. Object relational database management systems have appeared to bridge the gap between relational databases and the object-oriented programming languages. Data explosion and the necessity to handle terabytes of data on commodity hardware have led to columnar databases, which can easily scale up horizontally.
Database categories
Many database models have appeared and vanished such as the network model and hierarchal model. The predominant categories now in the market are relational, object-relational databases, and NoSQL databases. One should not think of NoSQL and SQL databases as rivals; they are complementary to each other. By utilizing different database systems, one can overcome many limitations, and get the best of different technologies.
The NoSQL databases
The NoSQL databases are affected by the CAP theorem, also known as Brewer's theorem. In 2002, S. Gilbert and N. Lynch published a formal proof of the CAP theorem in their article: Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services
. In 2009, the NoSQL movement began. Currently, there are over 150 NoSQL databases (nosql-database.org).
The CAP theorem
The CAP theorem states that it is impossible for a distributed computing system to simultaneously provide all three of the following guarantees:
Consistency: All clients see (immediately) the latest data even in the case of updates.
Availability: All clients can find a replica of some data even in the case of a node failure. That means even if some part of the system goes down, the clients can still access the data.
Partition tolerance: The system continues to work regardless of arbitrary message loss or failure of part of the system.
The choice of which feature to discard determines the nature of the system. For example, one could sacrifice consistency to get a scalable, simple, and high-performance database management system.
Often, the main difference between a relational database and a NoSQL database is consistency. A relational database enforces ACID. ACID is the acronym for the following properties: Atomicity, Consistency, Isolation, and Durability. In contrast, many NoSQL databases adopt the basically available soft-state, eventual-consistency (BASE) model.
NoSQL motivation
A NoSQL database provides a means for data storage, manipulation, and retrieval for non-relational data. The NoSQL databases are distributed, open source and horizontally scalable. NoSQL often adopts the BASE model, which prizes availability over consistency, and informally guarantees that if no new updates are made on a data item, eventually all access to that data item will return the latest version of that data item. The advantages of this approach include the following:
Simplicity of design
Horizontal scaling and easy replication
Schema free
Huge amount of data support
We will now explore a few types of NoSQL databases.
Key value databases
The key value store is the simplest database store. In this database model, the storage, as its name suggests, is based on maps or hash tables. Some key-value databases allow complex values to be stored as lists and hash tables. Key-value pairs are extremely fast for certain scenarios, but lack the support for complex queries and aggregation. Some of the existing open source key-value databases are Riak, Redis, Memebase, and MemcacheDB.
Columnar databases
Columnar or column-oriented databases are based on columns. Data in a certain column in a two dimensional relation is stored together. Unlike relational databases, adding columns is inexpensive, and is done on a row-by-row basis. Rows can have a different set of columns. Tables can benefit from this structure by eliminating the storage cost of the null values. This model is best suited for distributed databases. HBase is one of the most famous columnar databases. It is based on the Google big table storage system. Column-oriented databases are designed for huge data scenarios, so they scale up easily. For small datasets, HBase is not a suitable architecture. First, the recommended hardware topology for HBase is a five-node or server deployment. Also, it needs a lot of administration, and is difficult to master and learn.
Document databases
A document-oriented database is suitable for documents and semi-structured data. The central concept of a document-oriented database is the notion of a document. Documents encapsulate and encode data (or information) in some standard formats or encodings such as XML, JSON, and BSON. Documents do not adhere to a standard schema or have the same structure; so, they provide a high degree of flexibility. Unlike relational databases, changing the structure of the document is simple, and does not lock the clients from accessing the data.
Document databases merge the power of relational databases and column-oriented databases. They provide support for ad-hoc queries, and can be scaled up easily. Depending on the design of the document database, MongoDB is designed to handle a huge amount of data efficiently. On the other hand, CouchDB provides high availability even in the case of hardware failure.
Graph databases
Graph databases are based on the graph theory, where a database consists of nodes and edges. The nodes as well as the edges can be assigned data. Graph databases allow traversing between the nodes using edges. Since a graph is a generic data structure, graph databases are capable of representing different data. A famous implementation of an open source commercially supported graph databases is Neo4j.
Relational and object relational databases
Relational database management systems are one of the most-used DBMSs in the world. It is highly unlikely that any organization, institution, or personal computer today does not have or use a piece of software that does not rely on RBDMS. Software applications can use relational databases via dedicated database servers or via lightweight RDBMS engines, embedded in the software applications as shared libraries.
The capabilities of a relational database management system vary from