Sqlfordevscom Next Level Database Techniques For Developers Pages 21 30
-- PostgreSQL
SELECT DISTINCT ON (customer_id) *
FROM orders
WHERE EXTRACT (YEAR FROM created_at) = 2022
ORDER BY customer_id ASC, price DESC;
Sometimes you have numerous rows but only want one of them for, e.g., every customer. You
can use a for-each-loop-like lateral join (see Table Joins With A For-Each Loop) or
PostgreSQL's DISTINCT ON extension. A standard DISTINCT query only removes rows that match
exactly on all selected columns. With DISTINCT ON, you specify a subset of columns to make
distinct, and only the first matching row after the sort is kept. In the example above, that
is the most expensive order of 2022 for every customer.
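For databases without DISTINCT ON, a comparable result can be sketched with a window
function. This is a substitute technique, not the DISTINCT ON feature itself. It assumes
MySQL 8.0 or later (window functions) and that created_at is a date or timestamp column, so
the year filter can be written as a portable date range. The names row_num and ranked_orders
are made up for illustration.

```sql
-- MySQL 8.0+, PostgreSQL
SELECT *
FROM (
  SELECT
    orders.*,
    -- Number the orders of every customer, most expensive first
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY price DESC
    ) AS row_num
  FROM orders
  WHERE created_at >= '2022-01-01' AND created_at < '2023-01-01'
) AS ranked_orders
WHERE row_num = 1 -- keep only the first row per customer
ORDER BY customer_id ASC;
```

Unlike DISTINCT ON, the result carries the extra row_num column, so you may prefer to list
the needed columns explicitly in the outer query.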
Multiple Aggregates In One Query
-- MySQL
SELECT
SUM(released_at = 2001) AS released_2001,
SUM(released_at = 2002) AS released_2002,
SUM(director = 'Steven Spielberg') AS director_stevenspielberg,
SUM(director = 'James Cameron') AS director_jamescameron
FROM movies
WHERE streamingservice = 'Netflix';
-- PostgreSQL
SELECT
COUNT(*) FILTER (WHERE released_at = 2001) AS released_2001,
COUNT(*) FILTER (WHERE released_at = 2002) AS released_2002,
COUNT(*) FILTER (WHERE director = 'Steven Spielberg') AS director_stevenspielberg,
COUNT(*) FILTER (WHERE director = 'James Cameron') AS director_jamescameron
FROM movies
WHERE streamingservice = 'Netflix';
In some cases, you need to calculate several different statistics. Instead of executing
numerous queries, you can write a single one that collects all the information in one pass
through the data. Depending on your data and indexes, this can speed up or slow down the
execution time compared to separate queries, so you should test it with your application's
data.
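If the same statement has to run on both databases, the conditions can be written as CASE
expressions. The following is a minimal sketch of that portable variant using the same
hypothetical movies table, not a replacement for the database-specific forms above.

```sql
-- MySQL, PostgreSQL
SELECT
  SUM(CASE WHEN released_at = 2001 THEN 1 ELSE 0 END) AS released_2001,
  SUM(CASE WHEN released_at = 2002 THEN 1 ELSE 0 END) AS released_2002,
  SUM(CASE WHEN director = 'Steven Spielberg' THEN 1 ELSE 0 END) AS director_stevenspielberg,
  SUM(CASE WHEN director = 'James Cameron' THEN 1 ELSE 0 END) AS director_jamescameron
FROM movies
WHERE streamingservice = 'Netflix';
```

One difference to keep in mind: when no row matches the WHERE clause at all, SUM returns
NULL whereas COUNT(*) FILTER returns 0. Wrapping each SUM in COALESCE(..., 0) evens that out.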
Notice: I have written a more extensive text about this topic on my database-focused
website SqlForDevs.com: Multiple Aggregates in One Query
Limit Rows Also Including Ties
-- PostgreSQL
SELECT *
FROM teams
ORDER BY winning_games DESC
FETCH FIRST 3 ROWS WITH TIES;
Imagine you want to rank the teams of a sports league and show the top three. Occasionally,
two or more teams will have won the same number of games at the end of the season. If they
share 3rd place, you may want to expand your limit to include all of them. The WITH TIES
option (available since PostgreSQL 13) does precisely that: rows that would be cut off
despite having the same ORDER BY values as the last included row are returned too, even
though the limit is exceeded.
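Where WITH TIES is not available, the same selection can be approximated with the RANK()
window function, because tied rows receive the same rank. This sketch assumes MySQL 8.0 or
later. The names league_rank and ranked_teams are made up for illustration (rank itself is a
reserved word in MySQL 8.0).

```sql
-- MySQL 8.0+, PostgreSQL
SELECT *
FROM (
  SELECT
    teams.*,
    -- Teams with the same number of won games share a rank
    RANK() OVER (ORDER BY winning_games DESC) AS league_rank
  FROM teams
) AS ranked_teams
WHERE league_rank <= 3
ORDER BY league_rank;
```

With 10, 9, 8 and 8 won games the ranks are 1, 2, 3 and 3, so all four teams are returned.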
Fast Row Count Estimates
-- MySQL
EXPLAIN FORMAT=TREE SELECT * FROM movies WHERE rating = 'NC-17' AND
price < 4.99;
-- PostgreSQL
EXPLAIN SELECT * FROM movies WHERE rating = 'NC-17' AND price < 4.99;
Showing the number of matching rows is a crucial feature for most applications, but it is
sometimes hard to implement for large databases. The larger a database is, the slower
counting the number of rows will be. The query will be very slow when no index exists to
help calculate the count. But even an existing index will not make counting hundreds of
thousands of index entries fast. However, an approximate count of rows may be good enough
for some use cases. The database's query planner always calculates an approximate row count
for a query, and you can read it by asking the database for the execution plan.
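To use the estimate in an application, it has to be extracted from the plan. The following
is a minimal PostgreSQL sketch that reads the "Plan Rows" value of the top-level plan node
from the JSON plan format. The function name estimate_rows is hypothetical. The function
only plans the given statement and does not run it, but it concatenates SQL text, so it
must never receive untrusted input.

```sql
-- PostgreSQL
CREATE FUNCTION estimate_rows(query text) RETURNS bigint AS $$
DECLARE
  query_plan json;
BEGIN
  -- EXPLAIN (FORMAT JSON) returns one row with one json column
  EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO query_plan;
  -- The planner's estimate is stored on the top-level plan node
  RETURN (query_plan -> 0 -> 'Plan' ->> 'Plan Rows')::bigint;
END;
$$ LANGUAGE plpgsql;

SELECT estimate_rows(
  $q$SELECT * FROM movies WHERE rating = 'NC-17' AND price < 4.99$q$
);
```

The quality of the estimate depends on up-to-date table statistics, so it can be noticeably
off for tables that have changed a lot since they were last analyzed.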
Date-Based Statistical Queries With Gap-Filling
-- MySQL
SET cte_max_recursion_depth = 4294967295;
WITH RECURSIVE dates_without_gaps(day) AS (
SELECT DATE_SUB(CURRENT_DATE, INTERVAL 14 DAY) as day
UNION ALL
SELECT DATE_ADD(day, INTERVAL 1 DAY) as day
FROM dates_without_gaps
WHERE day < CURRENT_DATE
)
SELECT dates_without_gaps.day, COALESCE(SUM(statistics.count), 0)
FROM dates_without_gaps
LEFT JOIN statistics ON(statistics.day = dates_without_gaps.day)
GROUP BY dates_without_gaps.day;
-- PostgreSQL
SELECT dates_without_gaps.day, COALESCE(SUM(statistics.count), 0)
FROM generate_series(
CURRENT_DATE - INTERVAL '14 days',
CURRENT_DATE,
'1 day'
) as dates_without_gaps(day)
LEFT JOIN statistics ON(statistics.day = dates_without_gaps.day)
GROUP BY dates_without_gaps.day;
The results of some statistical calculations will have gaps because no information was
saved for specific days. Instead of back-filling these holes with application code, the
database query can be restructured: a gapless sequence of days is created as the source, and
the statistical data is joined to it. In PostgreSQL, the generate_series function can create
the sequence, whereas in MySQL it has to be built manually with a recursive common table
expression (CTE).
Notice: I have written a more extensive text about this topic on my database-focused
website SqlForDevs.com: Fill Gaps in Statistical Time Series Results
Table Joins With A For-Each Loop
-- MySQL, PostgreSQL
SELECT customers.*, recent_sales.*
FROM customers
LEFT JOIN LATERAL (
SELECT *
FROM sales
WHERE sales.customer_id = customers.id
ORDER BY created_at DESC
LIMIT 3
) AS recent_sales ON true;
When joining tables, the rows of both tables are linked together based on some conditions.
However, a join always includes all matching rows of the other table. It is impossible to
control the number of rows for every iteration of the join, e.g. to limit the bought
products for every customer to just the last three.
The special lateral join type combines a join and a subquery. The subquery is executed for
every row of the join's source table. Within that subquery, you can e.g. select only the last
three bought products of a customer. And as the subquery already selects only the matching
sales for every customer, the join condition is simply true, which indicates that all of its
rows will be used. You can now make for-each loops within your database. You've learned the
holy grail of SQL!
Notice: I have written a more extensive text about this topic on my database-focused
website SqlForDevs.com: For each loops with LATERAL Joins
Schema
The schema is probably the most crucial part of your database. The more complex your
schema is, the longer new developers will need to become productive with your application.
But the schema also gives you the chance to take new approaches and simplify it by using
modern database features. Many of those features can move a lot of custom application logic
into the database and make development faster.
The schema chapter will show you, e.g., how JSON documents can replace many tables, how data
can be saved for faster querying and a simpler approach for storing trees.
Rows Without Overlapping Dates
Preventing e.g. multiple concurrent reservations for a meeting room is a complicated task
because of race conditions. Without pessimistic locking by the application or careful
planning, simultaneous requests can create room reservations for the same timeframe or
overlapping ones. The work can be offloaded to the database with a PostgreSQL exclusion
constraint that will prevent any overlapping ranges for the same room number. This safety
feature is available for integer, numeric, date and timestamp ranges.
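As a minimal sketch of such a constraint, assume a hypothetical bookings table with a room
number and a timestamp range. The table and column names are made up for illustration. The
btree_gist extension is needed so that the plain integer column can take part in the GiST
index with the = operator.

```sql
-- PostgreSQL
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE bookings (
  room_number int NOT NULL,
  reservation tstzrange NOT NULL,
  -- No two rows may have the same room AND overlapping (&&) ranges
  EXCLUDE USING gist (room_number WITH =, reservation WITH &&)
);

INSERT INTO bookings (room_number, reservation)
VALUES (5, '[2022-08-20 16:00:00+00,2022-08-20 17:30:00+00)');

-- Adjacent slot: accepted, because the upper bound ")" is exclusive
INSERT INTO bookings (room_number, reservation)
VALUES (5, '[2022-08-20 17:30:00+00,2022-08-20 19:00:00+00)');

-- Overlapping slot for the same room: rejected with the error
-- "conflicting key value violates exclusion constraint"
INSERT INTO bookings (room_number, reservation)
VALUES (5, '[2022-08-20 17:00:00+00,2022-08-20 18:00:00+00)');
```

Because the check happens inside the database, two simultaneous requests cannot both
succeed, no matter how the application handles locking.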
Store Trees As Materialized Paths
-- MySQL
CREATE TABLE tree (path varchar(255));
INSERT INTO tree (path) VALUES ('Food');
INSERT INTO tree (path) VALUES ('Food.Fruit');
INSERT INTO tree (path) VALUES ('Food.Fruit.Cherry');
INSERT INTO tree (path) VALUES ('Food.Fruit.Banana');
INSERT INTO tree (path) VALUES ('Food.Meat');
INSERT INTO tree (path) VALUES ('Food.Meat.Beef');
INSERT INTO tree (path) VALUES ('Food.Meat.Pork');
SELECT * FROM tree WHERE path like 'Food.Fruit.%';
SELECT * FROM tree WHERE path IN('Food', 'Food.Fruit');
-- PostgreSQL
CREATE EXTENSION ltree;
CREATE TABLE tree (path ltree);
INSERT INTO tree (path) VALUES ('Food');
INSERT INTO tree (path) VALUES ('Food.Fruit');
INSERT INTO tree (path) VALUES ('Food.Fruit.Cherry');
INSERT INTO tree (path) VALUES ('Food.Fruit.Banana');
INSERT INTO tree (path) VALUES ('Food.Meat');
INSERT INTO tree (path) VALUES ('Food.Meat.Beef');
INSERT INTO tree (path) VALUES ('Food.Meat.Pork');
SELECT * FROM tree WHERE path ~ 'Food.Fruit.*{1,}';
SELECT * FROM tree WHERE path @> subpath('Food.Fruit.Banana', 0, -1);
In addition to the widely known nested set and adjacency list approaches, you can use the
lesser-known materialized path approach for storing trees. Every row stores its full path
from the root of the tree, making tree-searching queries relatively easy. With
PostgreSQL, you get a wide range of querying and manipulation functionality from the
label tree (ltree) extension, while with MySQL you have to rely on simple text-search
functionality such as LIKE.
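The tree queries stay fast on larger tables only with a suitable index. The statements
below are a sketch using the tree table from the listings above. The index name is made up,
and the two extra PostgreSQL queries merely illustrate further ltree operators.

```sql
-- MySQL: a regular B-tree index can serve prefix searches like 'Food.Fruit.%'
CREATE INDEX tree_path_index ON tree (path);

-- PostgreSQL: a GiST index supports the ltree operators
CREATE INDEX tree_path_index ON tree USING gist (path);

-- PostgreSQL: the node 'Food.Fruit' itself and all of its descendants
SELECT * FROM tree WHERE path <@ 'Food.Fruit';

-- PostgreSQL: only the direct children of 'Food'
SELECT * FROM tree WHERE path ~ 'Food.*{1}';
```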
JSON Columns to Combine NoSQL and Relational Databases
-- MySQL
CREATE TABLE books (
id bigint PRIMARY KEY,
author_id bigint NOT NULL,
category_id bigint NOT NULL,
name varchar(255) NOT NULL,
price numeric(15, 2) NOT NULL,
attributes json NOT NULL DEFAULT ('{}') -- expression default, needs MySQL 8.0.13+
);
-- PostgreSQL
CREATE TABLE books (
id bigint PRIMARY KEY,
author_id bigint NOT NULL,
category_id bigint NOT NULL,
name text NOT NULL,
price numeric(15, 2) NOT NULL,
attributes jsonb NOT NULL DEFAULT '{}'
);
Many database schemas can be simplified by copying ideas from NoSQL databases. The
querying and data modification logic becomes a lot easier by e.g. avoiding a lot of joins or
complex architectures like the Entity–Attribute–Value approach (EAV). However, you should
still store most of your data in a standard relational schema. You can use JSON
columns to simplify the schema by following these rules:
- Move seldom-used data (e.g. joined from other tables) into JSON arrays and objects
  for easier querying.
- Think thoroughly about whether to store references to other tables in JSON documents, as
  you can't enforce foreign-key relationships. You need a good reason to do so.
- Never use deeply nested collections. Any modifications and queries to those
  documents will be a mess.
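To illustrate how such a column is used, here is a small sketch against the books table
from the listings above. The attribute keys (language, pages, tags), the sample row and the
index name are made up for illustration.

```sql
-- MySQL, PostgreSQL: store flat, seldom-used attributes in the JSON column
INSERT INTO books (id, author_id, category_id, name, price, attributes)
VALUES (1, 1, 1, 'Example Book', 19.99,
  '{"language": "en", "pages": 320, "tags": ["sql", "databases"]}');

-- MySQL: read one attribute and filter on an array element
SELECT name, attributes->>'$.language' AS language
FROM books
WHERE JSON_CONTAINS(attributes, '"sql"', '$.tags');

-- PostgreSQL: the same query with the jsonb containment operator
SELECT name, attributes->>'language' AS language
FROM books
WHERE attributes @> '{"tags": ["sql"]}';

-- PostgreSQL: a GIN index speeds up containment (@>) searches
CREATE INDEX books_attributes_index ON books USING gin (attributes);
```

Keeping the document flat, as in this sample row, is what keeps these queries short.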
Notice: I have already written a more extensive text about this topic on
SqlForDevs.com: JSON columns