SQL Tutorial: SELECT Statement - Extended Query Capabilities
ORDER BY Clause
The ORDER BY clause is optional. If used, it must be the last clause in the SELECT
statement. The ORDER BY clause requests sorting for the results of a query.
When the ORDER BY clause is missing, the result rows from a query have no defined
order (they are unordered). The ORDER BY clause defines the ordering of rows based
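on columns from the SELECT clause. The ORDER BY clause has the following general
format:
ORDER BY column-1 [ASC|DESC] [, column-2 [ASC|DESC]] ...
column-1 and column-2 are the ordering columns. ASC requests an ascending sort and is
the default; DESC requests a descending sort.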
ORDER BY sorts rows using the ordering columns in left-to-right, major-to-minor order.
The rows are sorted first on the first column name in the list. If there are any duplicate
values for the first column, the duplicates are sorted on the second column (within the
first column sort) in the Order By list, and so on. There is no defined inner ordering for
rows that have duplicate values for all Order By columns.
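Database nulls require special processing in ORDER BY. Here, a null column sorts higher
than all regular values; this is reversed for DESC. SQL92 leaves it to the implementation
whether nulls sort above or below all regular values, so other systems may differ.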
In sorting, nulls are considered duplicates of each other for ORDER BY.
Sorting on hidden information makes little sense when using the results of a query. This
is why SQL only allows select list columns in ORDER BY.
For convenience when using expressions in the select list, select items can be specified
by number (starting with 1). Names and numbers can be intermixed.
Example queries:
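As a hedged illustration of the rules above, the sketch below runs two ORDER BY queries through Python's built-in sqlite3 module. SQLite as the engine and the column types chosen for sp are assumptions made only so the queries can be executed; the rows are the sp rows used elsewhere in this tutorial. The sort columns contain no nulls, because null placement differs between systems.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the tutorial's database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200),
                          ('S3','P1',1000), ('S3','P2',200);
""")

# Major sort on pno (descending), minor sort on sno (ascending, the default).
by_name = conn.execute(
    "SELECT pno, sno, qty FROM sp ORDER BY pno DESC, sno").fetchall()

# The same ordering, naming the select items by number instead of by name.
by_number = conn.execute(
    "SELECT pno, sno, qty FROM sp ORDER BY 1 DESC, 2").fetchall()

for row in by_name:
    print(row)
```

Both queries return the P2 row first, followed by the three P1 rows in supplier order.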
Expressions
In the previous subsection on basic Select statements, column values are used in the
select list and where predicate. SQL allows a scalar value expression to be used instead.
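A SQL value expression can be a literal, a column reference, a SQL function, a system
value, a special construct such as CAST or CASE, or a combination of these built with
expression operators.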
Literals
• String -- ASCII text framed by single quotes ('). Within a literal, a single quote is
represented by 2 single quotes ('').
• Numeric -- numeric digits (at least 1) with an optional decimal point and
exponent. The format is
[ddd][[.]ddd][E[+|-]ddd]
Numeric literals with no exponent or decimal point are typed as Integer. Those
with a decimal point but no exponent are typed as Decimal. Those with an
exponent are typed as Float.
• Datetime -- datetime literals begin with a keyword identifying the type, followed
by a string literal:
o Date -- DATE 'yyyy-mm-dd'
o Time -- TIME 'hh:mm:ss[.fff]'
o Timestamp -- TIMESTAMP 'yyyy-mm-dd hh:mm:ss[.fff]'
o Interval -- INTERVAL [+|-] string interval-qualifier
The format of the string in the Interval literal depends on the interval qualifier.
For year-month intervals, the format is: 'dd[-dd]'. For day-time intervals, the
format is '[dd ]dd[:dd[:dd]][.fff]'.
SQL Functions
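• SUBSTRING(exp-1 FROM exp-2 [FOR exp-3])
Extracts a substring from a string - exp-1, beginning at the integer value - exp-2,
for the length of the integer value - exp-3. exp-2 is 1 relative. If FOR exp-3 is
omitted, the length of the remaining string is used. Returns the substring.
• UPPER(exp-1)
Converts any lowercase characters in a string - exp-1, to uppercase. Returns the
converted string.
• LOWER(exp-1)
Converts any uppercase characters in a string - exp-1, to lowercase. Returns the
converted string.
• TRIM([[LEADING|TRAILING|BOTH] [exp-2] FROM] exp-1)
Trims leading, trailing or both characters from a string - exp-1. The trim character
is a space, or if exp-2 is specified, it supplies the trim character. If LEADING,
TRAILING, BOTH are missing, the default is BOTH. Returns the trimmed string.
• POSITION(exp-1 IN exp-2)
Searches a string - exp-2, for the first occurrence of the string - exp-1. Returns the
1 relative integer position of the match, or 0 if exp-1 is not found.
• CHAR_LENGTH(exp-1)
CHARACTER_LENGTH(exp-1)
Returns the integer number of characters in the string - exp-1.
• OCTET_LENGTH(exp-1)
Returns the integer number of octets (8-bit bytes) needed to represent the string -
exp-1.
• EXTRACT(sub-field FROM exp-1)
Returns the numeric sub-field extracted from a datetime value - exp-1. sub-field is
YEAR, QUARTER, MONTH, DAY, HOUR, MINUTE, SECOND,
TIMEZONE_HOUR or TIMEZONE_MINUTE. TIMEZONE_HOUR and
TIMEZONE_MINUTE extract sub-fields from the Timezone portion of exp-1.
QUARTER is (MONTH-1)/3+1, using integer division.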
System Values
SQL System Values are reserved names used to access builtin values, such as USER,
CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP.
SQL also provides special constructs for type conversion and conditional values:
• CAST(exp-1 AS data-type)
Converts the value - exp-1, into the specified data-type. Returns the converted
value.
• COALESCE(exp-1, exp-2 [, exp-3] ...)
Returns exp-1 if it is not null, otherwise returns exp-2 if it is not null, otherwise
returns exp-3, and so on. Returns null if all values are null.
• CASE exp-1 { WHEN exp-2 THEN exp-3 } ... [ELSE exp-4] END
CASE { WHEN predicate-1 THEN exp-3 } ... [ELSE exp-4] END
The first form of the CASE construct compares exp-1 to exp-2 in each WHEN
clause. If a match is found, CASE returns exp-3 from the corresponding THEN
clause. If no matches are found, it returns exp-4 from the ELSE clause or null if
the ELSE clause is omitted.
The second form of the CASE construct evaluates predicate-1 in each WHEN
clause. If the predicate is true, CASE returns exp-3 from the corresponding THEN
clause. If no predicates evaluate to true, it returns exp-4 from the ELSE clause or
null if the ELSE clause is omitted.
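The constructs described above can be tried with a minimal sketch like the following, again run through Python's sqlite3 module (an assumed engine, not one named by this tutorial). The size labels and the 500 threshold are hypothetical values chosen for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite database holding the tutorial's sp rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200),
                          ('S3','P1',1000), ('S3','P2',200);
""")

rows = conn.execute("""
    SELECT sno, pno,
           COALESCE(qty, 0),                      -- null quantity becomes 0
           CASE pno WHEN 'P1' THEN 'part one'     -- first form: compare values
                    WHEN 'P2' THEN 'part two' END,
           CASE WHEN qty IS NULL THEN 'unknown'   -- second form: predicates
                WHEN qty >= 500  THEN 'large'
                ELSE 'small' END,
           CAST(qty AS TEXT)                      -- number converted to string
    FROM sp
    ORDER BY sno, pno
""").fetchall()

for row in rows:
    print(row)
```

For supplier S1 the null quantity yields 0 from COALESCE, 'unknown' from the second CASE form, and a null from CAST, since converting a null gives a null.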
Expression Operators
• String Operators
There is just one string operator - ||, for string concatenation. Both operands of ||
must be strings. The operator concatenates the second string to the end of the first.
For example,
'ab' || 'cd' ==> 'abcd'
• Numeric operators
o + -- addition
o - -- subtraction
o * -- multiplication
o / -- division
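All numeric operators can be used on the standard numeric data types.
The numeric operators can be applied to datetime values, with some restrictions.
The basic rules for datetime expressions are:
o A date, time or timestamp value plus or minus an interval yields a value of the
same datetime type.
o An interval plus or minus a compatible interval yields an interval.
o An interval multiplied or divided by a numeric value yields an interval.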
A special form can be used to subtract a date, time, timestamp value from another
date, time, timestamp value to yield an interval value:
(datetime-1 - datetime-2) interval-qualifier
The interval-qualifier specifies the specific interval type for the result.
In expressions, parentheses are used for grouping.
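A short sketch of the string and numeric operators, evaluated through Python's sqlite3 module (an assumed engine). The literal values are arbitrary. Datetime arithmetic is left out because SQLite does not implement the SQL92 interval types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Concatenation, the four numeric operators, and parentheses for grouping.
row = conn.execute(
    "SELECT 'ab' || 'cd', 7 + 3, 7 - 3, 7 * 3, 8 / 2, 1 + 2 * 3, (1 + 2) * 3"
).fetchone()
print(row)
```

The last two columns differ (7 and 9) because multiplication binds tighter than addition unless parentheses group the addition first.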
Joining Tables
The FROM clause allows more than 1 table in its list. However, simply listing more than
one table will very rarely produce the expected results. The rows from one table must be
correlated with the rows of the others. This correlation is known as joining.
An example can best illustrate the rationale behind joins. The following query:
SELECT *
FROM sp, p
combines every row of sp with every row of p, whether or not the rows are related.
A more usable query would correlate the rows from sp with rows from p, for instance
matching on the common column -- pno:
SELECT *
FROM sp, p
WHERE sp.pno = p.pno
This produces:
sno pno qty pno descr color
S1 P1 NULL P1 Widget Blue
S2 P1 200 P1 Widget Blue
S3 P1 1000 P1 Widget Blue
S3 P2 200 P2 Widget Red
Rows for each part in p are combined with rows in sp for the same part by matching on
part number (pno). In this query, the WHERE Clause provides the join predicate,
matching pno from p with pno from sp.
The join in this example is known as an inner equi-join. Equi means that the join
predicate uses = (equals) to match the join columns. Other types of joins use different
comparison operators. For example, a query might use a greater-than join.
The term inner means only rows that match are included. Rows in the first table that have
no matching rows in the second table are excluded and vice versa (in the above join, the
row in p with pno P3 is not included in the result.) An outer join includes unmatched
rows in the result. See Outer Join below.
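The difference between listing two tables and joining them can be checked with a sketch such as the one below, using Python's sqlite3 module as an assumed engine. The descr and color values for part P3 are placeholders, since this section only states that P3 exists in p.

```python
import sqlite3

# Assumption: in-memory SQLite database; P3's descr and color are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    CREATE TABLE p  (pno TEXT, descr TEXT, color TEXT);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200),
                          ('S3','P1',1000), ('S3','P2',200);
    INSERT INTO p  VALUES ('P1','Widget','Blue'), ('P2','Widget','Red'),
                          ('P3','Dongle','Green');
""")

# No join predicate: every sp row is paired with every p row.
unjoined = conn.execute("SELECT COUNT(*) FROM sp, p").fetchone()[0]

# Inner equi-join on pno: only matching rows survive, so P3 is absent.
joined = conn.execute("""
    SELECT sp.sno, sp.pno, sp.qty, p.descr, p.color
    FROM sp, p
    WHERE sp.pno = p.pno
    ORDER BY sp.sno, sp.pno
""").fetchall()

print(unjoined, len(joined))
```

With 4 rows in sp and 3 in p, the unjoined query yields 12 combinations while the equi-join yields 4.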
More than 2 tables can participate in a join. This is basically just an extension of a 2 table
join. 3 tables -- a, b, c, might be joined in various ways, for example a joined to b with b
joined to c, or a joined to both b and c, plus several other variations. With inner joins,
this structure is not explicit. It is implicit in the nature of the join predicates. With
outer joins, it is explicit; see below.
Outer Joins
An inner join excludes rows from either table that don't have a matching row in the other
table. An outer join provides the ability to include unmatched rows in the query results.
The outer join combines the unmatched row in one of the tables with an artificial row for
the other table. This artificial row has all columns set to null.
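The outer join is specified in the FROM clause and has the following general format:
table-1 { LEFT | RIGHT | FULL } OUTER JOIN table-2 ON predicate-1
predicate-1 is the join predicate. LEFT, RIGHT and FULL determine which unmatched
rows are retained: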
• LEFT -- only unmatched rows from the left side table (table-1) are retained
• RIGHT -- only unmatched rows from the right side table (table-2) are retained
• FULL -- unmatched rows from both tables (table-1 and table-2) are retained
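A minimal sketch of a LEFT outer join, run through Python's sqlite3 module (an assumed engine). Only LEFT is shown because older SQLite releases lack RIGHT and FULL; P3's descr and color are placeholder values.

```python
import sqlite3

# Assumption: in-memory SQLite database; P3's descr and color are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    CREATE TABLE p  (pno TEXT, descr TEXT, color TEXT);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200),
                          ('S3','P1',1000), ('S3','P2',200);
    INSERT INTO p  VALUES ('P1','Widget','Blue'), ('P2','Widget','Red'),
                          ('P3','Dongle','Green');
""")

# p is the left table, so the unmatched part P3 is retained and its
# sp columns are filled from an artificial all-null row.
rows = conn.execute("""
    SELECT p.pno, p.descr, sp.sno, sp.qty
    FROM p LEFT OUTER JOIN sp ON p.pno = sp.pno
    ORDER BY p.pno, sp.sno
""").fetchall()

for row in rows:
    print(row)
```

The P3 row appears with null sno and qty, whereas the inner join drops it entirely.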
Self Joins
A query can join a table to itself. Self joins have a number of real world uses. For
example, a self join can determine which parts have more than one supplier:
SELECT DISTINCT a.pno
FROM sp a, sp b
WHERE a.pno = b.pno
AND a.sno <> b.sno
pno
P1
As illustrated in the above example, self joins use correlation names to distinguish
columns in the select list and where predicate. In this case, the references to the same
table are renamed - a and b.
Subqueries
Predicate Subqueries
Predicate subqueries are used in the WHERE (and HAVING) clause. Each is a special
logical construct. Except for EXISTS, predicate subqueries must retrieve one column (in
their select list.)
• IN Subquery
The IN Subquery tests whether a scalar value matches the single query column
value in any subquery result row. It has the following general format:
value-1 [NOT] IN (query-1)
For example, to list parts that have suppliers:
SELECT *
FROM p
WHERE pno IN (SELECT pno FROM sp)
pno descr color
P1 Widget Blue
P2 Widget Red
The Self Join example in the previous subsection can be expressed with an IN
Subquery:
SELECT DISTINCT a.pno
FROM sp a
WHERE a.pno IN (SELECT b.pno FROM sp b WHERE a.sno <> b.sno)
Note that the subquery where clause references a column in the outer query
(a.sno). This is known as an outer reference. Subqueries with outer references are
sometimes known as correlated subqueries.
• Quantified Subqueries
A quantified subquery allows several types of tests and can use the full set of
comparison operators. It has the following general format:
value-1 {=|>|<|>=|<=|<>} {ANY|ALL|SOME} (query-1)
The comparison operator specifies how to compare value-1 to the single query
column value from each subquery result row. The ANY, ALL, SOME specifiers
give the type of match expected. ANY and SOME must match at least one row in
the subquery. ALL must match all rows in the subquery, or the subquery must be
empty (produce no rows).
SELECT *
FROM p
WHERE pno =ANY (SELECT pno FROM sp)
pno descr color
P1 Widget Blue
P2 Widget Red
A correlated subquery on the same table (in effect a self join) is used to list the
supplier with the highest quantity of each part (ignoring null quantities):
SELECT *
FROM sp a
WHERE qty >ALL (SELECT qty FROM sp b
WHERE a.pno = b.pno
AND a.sno <> b.sno
AND qty IS NOT NULL)
sno pno qty
S3 P1 1000
S3 P2 200
• EXISTS Subqueries
The EXISTS Subquery tests whether a subquery retrieves at least one row, that is,
whether a qualifying row exists. It has the following general format:
EXISTS(query-1)
Note: the select list in the EXISTS subquery is not actually used in evaluating the
EXISTS, so it can contain any valid select list (though * is normally used).
To list parts that have suppliers:
SELECT *
FROM p
WHERE EXISTS(SELECT * FROM sp WHERE p.pno = sp.pno)
pno descr color
P1 Widget Blue
P2 Widget Red
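The IN and EXISTS forms can be compared side by side with a sketch like this one, run through Python's sqlite3 module (an assumed engine; quantified ANY/ALL subqueries are omitted because SQLite does not support them). P3's descr and color are placeholder values.

```python
import sqlite3

# Assumption: in-memory SQLite database; P3's descr and color are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    CREATE TABLE p  (pno TEXT, descr TEXT, color TEXT);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200),
                          ('S3','P1',1000), ('S3','P2',200);
    INSERT INTO p  VALUES ('P1','Widget','Blue'), ('P2','Widget','Red'),
                          ('P3','Dongle','Green');
""")

# Parts that have suppliers, written with IN ...
with_in = conn.execute(
    "SELECT pno FROM p WHERE pno IN (SELECT pno FROM sp) ORDER BY pno"
).fetchall()

# ... and with a correlated EXISTS (p.pno is the outer reference).
with_exists = conn.execute("""
    SELECT pno FROM p
    WHERE EXISTS (SELECT * FROM sp WHERE p.pno = sp.pno)
    ORDER BY pno
""").fetchall()

# Negating the EXISTS predicate lists parts without suppliers.
without = conn.execute("""
    SELECT pno FROM p
    WHERE NOT EXISTS (SELECT * FROM sp WHERE p.pno = sp.pno)
""").fetchall()

print(with_in, with_exists, without)
```

Both positive forms return P1 and P2; the negated form returns only P3.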
Scalar Subqueries
The Scalar Subquery can be used anywhere a value can be used. The subquery must
reference just one column in the select list. It must also retrieve no more than one row.
When the subquery returns a single row, the value of the single select list column
becomes the value of the Scalar Subquery. When the subquery returns no rows, a
database null is used as the result of the subquery. Should the subquery retrieve more
than one row, it is a run-time error and aborts query execution.
A Scalar Subquery can appear as a scalar value in the select list and where predicate of
another query. The following query on the sp table uses a Scalar Subquery in the select
list to retrieve the supplier city associated with the supplier number (sno column in sp):
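One possible form of such a query is sketched below through Python's sqlite3 module (an assumed engine). The supplier table name s, its columns, and the city values are hypothetical, since this section does not show them; supplier S2 is deliberately left out of s to show the null result for an empty subquery.

```python
import sqlite3

# Assumption: the supplier table is called s with columns (sno, city);
# the city values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    CREATE TABLE s  (sno TEXT, city TEXT);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200),
                          ('S3','P1',1000), ('S3','P2',200);
    INSERT INTO s  VALUES ('S1','Paris'), ('S3','Rome');
""")

# The scalar subquery is evaluated once per sp row; sp.sno is an outer reference.
rows = conn.execute("""
    SELECT sno, pno, qty,
           (SELECT city FROM s WHERE s.sno = sp.sno)
    FROM sp
    ORDER BY sno, pno
""").fetchall()

for row in rows:
    print(row)
```

The S2 row gets a null city because the subquery finds no matching supplier row.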
Table Subqueries
Table Subqueries are queries used in the FROM clause, replacing a table name. Basically,
the result set of the Table Subquery acts like a base table in the from list. Table
Subqueries can have a correlation name in the from list. They can also be in outer joins.
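A hedged sketch of a Table Subquery, run through Python's sqlite3 module (an assumed engine). The correlation name t and the qty >= 200 filter are arbitrary choices for illustration; P3's descr and color are placeholders.

```python
import sqlite3

# Assumption: in-memory SQLite database; P3's descr and color are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    CREATE TABLE p  (pno TEXT, descr TEXT, color TEXT);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200),
                          ('S3','P1',1000), ('S3','P2',200);
    INSERT INTO p  VALUES ('P1','Widget','Blue'), ('P2','Widget','Red'),
                          ('P3','Dongle','Green');
""")

# The parenthesized query acts like a table named t in the from list
# and is joined to p like any base table.
rows = conn.execute("""
    SELECT t.sno, p.descr, p.color
    FROM (SELECT sno, pno FROM sp WHERE qty >= 200) t, p
    WHERE t.pno = p.pno
    ORDER BY t.sno, t.pno
""").fetchall()

for row in rows:
    print(row)
```

The S1 row is filtered out inside the subquery, because comparing its null qty with 200 is not true.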
Grouping Queries
A Grouping Query is a special type of query that groups and summarizes rows. It uses the
GROUP BY Clause.
A Grouping Query groups rows based on common values in a set of grouping columns.
Rows with the same values for the grouping columns are placed together in one group. Each
group is treated as a single row in the query result.
Even though a group is treated as a single row, the underlying rows can be subject to
summary operations known as Set Functions whose results can be included in the query.
The optional HAVING Clause supports filtering for group rows in the same manner as
the WHERE clause filters FROM rows.
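For example, grouping the sp table on the pno column produces 2 groups: the P1 group
holds the rows (S1, P1, NULL), (S2, P1, 200) and (S3, P1, 1000), and the P2 group holds
the row (S3, P2, 200).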
GROUP BY Clause
GROUP BY is an optional clause in a query. It follows the WHERE clause or the FROM
clause if the WHERE clause is missing. A query containing a GROUP BY clause is a
Grouping Query. The GROUP BY clause has the following general format:
GROUP BY column-1 [, column-2] ...
column-1 and column-2 are the grouping columns. They must be names of columns from
tables in the FROM clause; they can't be expressions.
GROUP BY operates on the rows from the FROM clause as filtered by the WHERE
clause. It collects the rows into groups based on common values in the grouping columns.
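Except nulls, rows with the same set of values for the grouping columns are placed in the
same group. If any grouping column for a row contains a null, the row is given its own
group. This treatment of nulls is not universal: the SQL92 standard treats nulls as equal
for grouping, so many systems place such rows together in one group.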
For example,
SELECT pno
FROM sp
GROUP BY pno
pno
P1
P2
In Grouping Queries, the select list can only contain grouping columns, plus literals,
outer references and expressions involving these elements. Non-grouping columns from
the underlying FROM tables cannot be referenced directly. However, non-grouping
columns can be used in the select list as arguments to Set Functions. Set Functions
summarize columns from the underlying rows of a group.
Set Functions
Set Functions are special summarizing functions used with Grouping Queries and
Aggregate Queries. They summarize columns from the underlying rows of a group or
aggregate.
Using the Group By example from above, grouping the sp table on the pno column gives a
P1 group with the qty values NULL, 200 and 1000, and a P2 group with the qty value 200.
The Set Function -- SUM, computes the arithmetic sum of a numeric column in a set of
grouped/aggregate rows. Null columns are ignored in computing the summary.
For example,
SELECT pno, SUM(qty)
FROM sp
GROUP BY pno
pno
P1 1200
P2 200
Set Functions have the following general format:
set-function ( [DISTINCT|ALL] column-1 )
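set-function is:
• COUNT -- count of rows
• SUM -- arithmetic sum of a numeric column
• AVG -- arithmetic average of a numeric column
• MIN -- minimum value found in the column
• MAX -- maximum value found in the column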
The result of the COUNT function is always integer. The result of all other Set Functions
is the same data type as the argument.
The Set Functions skip columns with nulls, summarizing non-null values. COUNT
counts rows with non-null values, AVG averages non-null values, and so on. COUNT
returns 0 when no non-null column values are found; the other functions return null when
there are no values to summarize.
The DISTINCT and ALL specifiers are optional. ALL specifies that all non-null values
are summarized; it is the default. DISTINCT specifies that distinct column values are
summarized; duplicate values are skipped. Note: DISTINCT has no effect on MIN and
MAX results.
In addition, COUNT has a special format:
COUNT(*)
... which counts the underlying rows regardless of column contents.
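The null handling of the Set Functions can be observed with a sketch like the following, run through Python's sqlite3 module (an assumed engine). Note one engine-specific detail: SQLite returns AVG as a floating point value even for an integer column.

```python
import sqlite3

# Assumption: in-memory SQLite database holding the tutorial's sp rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200),
                          ('S3','P1',1000), ('S3','P2',200);
""")

# COUNT(*) counts rows; the other functions skip the null qty in group P1.
per_part = conn.execute("""
    SELECT pno, COUNT(*), COUNT(qty), SUM(qty), AVG(qty), MIN(qty), MAX(qty)
    FROM sp
    GROUP BY pno
    ORDER BY pno
""").fetchall()

# DISTINCT drops the duplicate 200 before counting.
counts = conn.execute(
    "SELECT COUNT(qty), COUNT(DISTINCT qty) FROM sp").fetchone()

for row in per_part:
    print(row)
print(counts)
```

For P1, COUNT(*) is 3 while COUNT(qty) is 2, and AVG is 1200 / 2 rather than 1200 / 3.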
HAVING Clause
The HAVING Clause is associated with Grouping Queries and Aggregate Queries. It is
optional in both cases. In Grouping Queries, it follows the GROUP BY clause. In
Aggregate Queries, HAVING follows the WHERE clause or the FROM clause if the
WHERE clause is missing. The HAVING clause has the following general format:
HAVING predicate
Like the WHERE Clause, HAVING filters the query result rows. WHERE filters the
rows from the FROM clause. HAVING filters the grouped rows (from the GROUP BY
clause) or the aggregate row (for Aggregate Queries).
predicate is a logical expression referencing grouped columns and set functions. It has
the same restrictions as the select list for Grouping Queries and Aggregate Queries.
If the Having predicate evaluates to true for a grouped or aggregate row, the row is
included in the query result, otherwise, the row is skipped (not included in the query
result).
For example,
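One way to try the HAVING clause is the sketch below, run through Python's sqlite3 module (an assumed engine). The two predicates, COUNT(*) > 1 and SUM(qty) > 500, are arbitrary illustrations.

```python
import sqlite3

# Assumption: in-memory SQLite database holding the tutorial's sp rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200),
                          ('S3','P1',1000), ('S3','P2',200);
""")

# Suppliers that supply more than one part: only the S3 group passes.
multi_part = conn.execute("""
    SELECT sno, COUNT(*)
    FROM sp
    GROUP BY sno
    HAVING COUNT(*) > 1
""").fetchall()

# Parts whose total quantity exceeds 500: only the P1 group passes.
large_parts = conn.execute("""
    SELECT pno, SUM(qty)
    FROM sp
    GROUP BY pno
    HAVING SUM(qty) > 500
""").fetchall()

print(multi_part, large_parts)
```

WHERE could not express either filter, because the predicates depend on values computed per group.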
Aggregate Queries
An Aggregate Query can use Set Functions and a HAVING Clause. It is similar to a
Grouping Query except there are no grouping columns. The underlying rows from the
FROM and WHERE clauses are grouped into a single aggregate row. An Aggregate
Query always returns a single row, except when the Having clause is used.
An Aggregate Query is a query containing Set Functions in the select list but no GROUP
BY clause. The Set Functions operate on the columns of the underlying rows of the single
aggregate row. Except for outer references, any columns used in the select list must be
arguments to Set Functions. See Set Functions above.
An aggregate query may also have a Having clause. The Having clause filters the single
aggregate row. If the Having predicate evaluates to true, the query result contains the
aggregate row. Otherwise, the query result contains no rows. See HAVING Clause
above.
For example,
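A minimal sketch of Aggregate Queries, run through Python's sqlite3 module (an assumed engine). The part number P9 is a hypothetical value that matches no rows. HAVING is left out here because some older SQLite releases reject it without GROUP BY.

```python
import sqlite3

# Assumption: in-memory SQLite database holding the tutorial's sp rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200),
                          ('S3','P1',1000), ('S3','P2',200);
""")

# No GROUP BY: all four rows form a single aggregate row.
totals = conn.execute(
    "SELECT COUNT(*), COUNT(qty), SUM(qty), MAX(qty) FROM sp").fetchall()

# Even when WHERE removes every row, one aggregate row is still returned:
# COUNT gives 0 and SUM gives null.
empty = conn.execute(
    "SELECT COUNT(qty), SUM(qty) FROM sp WHERE pno = 'P9'").fetchall()

print(totals, empty)
```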
Union Queries
The SQL UNION operator combines the results of two queries into a composite result.
The component queries can be SELECT/FROM queries with optional WHERE/GROUP
BY/HAVING clauses. The UNION operator has the following general format:
query-1 UNION [ALL] query-2
query-1 and query-2 are full query specifications. The UNION operator creates a new
query result that includes rows from each component query.
By default, UNION eliminates duplicate rows in its composite results. The optional ALL
specifier requests that duplicates be retained in the UNION result.
The component queries of a Union Query can also be Union Queries themselves.
Parentheses are used for grouping queries.
The select lists from the component queries must be union-compatible. They must match
in degree (number of columns). For Entry Level SQL92, the column descriptor (data type
and precision, scale) for each corresponding column must match. The rules for
Intermediate Level SQL92 are less restrictive. See Union-Compatible Queries.
Union-Compatible Queries
For Entry Level SQL92, each corresponding column of both queries must have the same
column descriptor in order for two queries to be union-compatible. The rules are less
restrictive for Intermediate Level SQL92. It supports automatic conversion within type
categories. In general, the resulting data type will be the broader type. The corresponding
columns need only be in the same data type category.
UNION Examples
SELECT * FROM sp
UNION
SELECT CAST(' ' AS VARCHAR(5)), pno, CAST(0 AS INT)
FROM p
WHERE pno NOT IN (SELECT pno FROM sp)
sno pno qty
S1 P1 NULL
S2 P1 200
S3 P1 1000
S3 P2 200
P3 0
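The query above can be reproduced with a sketch like the following, run through Python's sqlite3 module (an assumed engine). An ORDER BY on column numbers is added so the row order is predictable, and P3's descr and color are placeholder values. A second pair of queries contrasts UNION with UNION ALL.

```python
import sqlite3

# Assumption: in-memory SQLite database; P3's descr and color are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    CREATE TABLE p  (pno TEXT, descr TEXT, color TEXT);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200),
                          ('S3','P1',1000), ('S3','P2',200);
    INSERT INTO p  VALUES ('P1','Widget','Blue'), ('P2','Widget','Red'),
                          ('P3','Dongle','Green');
""")

# All sp rows, plus a filler row for each part that has no supplier.
rows = conn.execute("""
    SELECT * FROM sp
    UNION
    SELECT CAST(' ' AS VARCHAR(5)), pno, CAST(0 AS INT)
    FROM p
    WHERE pno NOT IN (SELECT pno FROM sp)
    ORDER BY 1, 2
""").fetchall()

# UNION removes duplicate rows; UNION ALL keeps them.
distinct_pnos = conn.execute(
    "SELECT pno FROM sp UNION SELECT pno FROM p ORDER BY 1").fetchall()
all_pnos = conn.execute(
    "SELECT pno FROM sp UNION ALL SELECT pno FROM p").fetchall()

for row in rows:
    print(row)
print(len(distinct_pnos), len(all_pnos))
```

The filler row for P3 sorts first because its sno is a single space.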
The remaining SQL-Data Statements (SQL DML) are the SQL Modification Statements,
described in the next sub-section.