We will be playing with a website titled "A Gentle Introduction to SQL" originally developed by Andrew Cummings of the School of Computing of Napier University, Edinburgh, UK. It is accessible at sqlzoo.net.
SELECT attribute-list FROM table-list WHERE qualifying-condition
To begin with, we will consider the case where the table-list is a single table (thus no join).
Tutorials:
Of note:
SELECT attribute-list
SELECT *will display all original attributes, in defined order
Can control the choice of and order of the resulting attributes based on the attribute-list, e.g.,
SELECT name,regionvs.
SELECT region,name
Can rename columns to choose alternate labels, e.g.,
SELECT name AS country, area AS "area (in sq. km)"
Can create new attributes on the fly, if computable from existing attributes, e.g., with numeric data
SELECT gdp/population AS percapitaor with concatenating string data,
SELECT first || ' ' || last AS name
FROM table-list
Though the FROM clause may simply be a list of one or more
existing tables, e.g.,
Though use of such an alias is unnecessary on straightforward queries, especially with only one table, it will be come helpful, and in some cases necessary, as we look at more complex queries involving nested queries and joins.
WHERE qualifying-condition
Arbitrary boolean expression can be formed using operators AND, OR, NOT and parenthesizing if not wanting default operator precedence.
Can use syntax
attr IN setto check for containment in a set, e.g.,
WHERE name IN ('France','Germany','Italy')
Can use LIKE keyword to do partial matches of strings, using '%' as wildcard for zero or more characters, e.g.,
WHERE name LIKE 'A%a'or '_' as a wildcard for exactly one character.
Can use BETWEEN keyword to check ranges, e.g.,
WHERE area BETWEEN 207600 AND 244820
From a pure relational algebra point of view, a table instance is a set of tuples, where no two tuples are identical and where the order of the tuples is arbitrary. Yet for SQL, we often may want to process the results of a query to ensure that only distinct tuples appear in the results and we may want to specify the order in which the tuples are displayed.
As for the first desire, of removing duplicated, this is accomplished by adding the optional keyword DISTINCT after the keyword SELECT yet before the list of attributes.
The order can be controlled by adding an ORDER BY clause to the end of a query. Together the syntax for a query appears as:
SELECT [DISTINCT] attribute-list FROM table-list WHERE qualifying-condition [ORDER BY attribute-list]Note that you can specify DISTINCT without ORDER BY and likewise ORDER BY without DISTINCT.
Tutorials:
Of note:
If more than one attribute is given in the ORDER BY list, the tuples will primarily be sorted by the first attribute, but in case of a tie sorted by the secondary attribute, and in case of a further tie the third, and so on.
By default, the sorting based upon an attribute is so that the values are in ascending order. You may change that to descending order by optionally placing the keyword DESC after an attribute on the ORDER BY list.
There are several built-in functions which can take a set of many tuples and compute aggregate results for the entire group. The most commonly used such functions are MAX, MIN, SUM, AVG, COUNT. The result of a query using an aggregate function is a single tuple with the aggregate value(s).
Of note for aggregates:
Once you include an aggregate function in the SELECT list, you may only use aggregate functions, not regular attributes. For example, if all countries have been aggregated to compute the average area, it no longer makes sense to refer to the name attribute, as the names are not all the same.
You may combine DISTINCT with a particular
aggregate expression by placing the keyword within the
parentheses, e.g.,
The expression COUNT is generally used in the context of COUNT(*) which counts the overall number of tuples. However COUNT can be used with other attributes, in which case it counts the number of tuples with non-null values for such an attribute, e.g. COUNT(area), or the number of distinct non-null values as in COUNT(DISTINCT area).
As discussed above, when using aggregates the default behavior is that all tuples are grouped together. However it is possible to break the overall list of tuples into several groups rather than one. This is done by specifying a list of attributes for grouping, where the underlying tuples are then grouped based on those with common values for those attributes.
Furthermore, once many such groups are formed, we have one additional chance to filter out which of those groups interest us. This is done through a HAVING clause which gives a boolean condition. The HAVING clause for groups is analogous to the WHERE clause for selecting the original tuples.
At this point, we have seen the general syntax
SELECT [DISTINCT] select-list FROM from-list WHERE qualification GROUP BY grouping-list HAVING group-qualification ORDER BY order-list
It is important to understand the (conceptual) order of evaluation for such a statement:
Tutorials:
Of note for grouping:
If using a GROUP BY clause, the only expressions which can then appear in the SELECT clause are either aggregates or original attributes which are among those used for the grouping. Notice that since all tuples in a particular group have a common value for one of those attributes, that attribute value is well-defined for the group.
Uncorrelated Nested Queries
The most typical use of nested queries is when the
inner selection is self-sufficient and can be
independently evaluated. An example is
SELECT C.name FROM cia C WHERE C.area > (SELECT area FROM cia WHERE name='United States')Which finds all countries with area at least as large as the United States.
Queries generally behave as if the WHERE clause is evaluated separately for each such C in the range of the FROM clause. Of course, since the nested select is independent of C, there is really no reason to reevaluate that statement for each C. A query optimizer will realize this and thus evaluate the nested query once and only once.
Correlated Nested Queries
In some cases, we wish to use a nested query which is not
independent of the contest. Consider the following goal:
For each region, report the country with the largest area in that
region. One way to structure this query is as follows:
SELECT C.region,C.name FROM cia C WHERE C.area >= ALL (SELECT D.area FROM cia D WHERE C.region = D.region)Note carefully the occurrence of range variable C in the inner select! This means that we need to consider reevaluating the inner select for each choice of C.
Nested Queries in the FROM clause
Though typically, the table(s) in the from clause are
actual tables, it is possible to put a nested select
statement in the from clause, with that resulting table
used as the source for answering the larger query. Here
is one simple (though unnecessary) example of such a
syntax, for finding all European countries which start
with the letter A.
SELECT C.name FROM (SELECT name FROM cia WHERE region='Europe') AS European WHERE European.name LIKE 'A%'Note that when creating such a from table on-the-fly, we will want to give it a name with an associated range variable (European, in this example). Note that this example could have been written more directly without need for such a technique, by expanding the outer WHERE clause to do the filtering for region='Europe'. If there is a more clear approach, one should avoid using nested queries as part of the from clause.
There are several operators which allow you to work with one or more sets. For all of the following, we let (...) represent a general set, typical that of a nested select statement.
For the final four operators above which take two sets as operands, you must ensure that the two queries return a set of results of equivalent form (i.e. have tuples with the same types of attributes in the same order).
When more than one table name is given as part of the FROM-list, the tables are to be "joined". The result is the cross product of the sets of tuples from the joined tables. For example, if two tables are joined, the result will be a table with one column for each of the columns of the original two tables. Then tuples are created by taking every possible row of the first table combined with every possible row of the second table. The concept extends to three or more tables as well.
Tutorials:
Special cases of joins:
Equi-joins
The most common type of a join is when the WHERE clause
consists entirely of a conjunction of equality tests
Natural Joins
In fact, if the names of the attributes in an Equi-join
are all the same, we call this a natural join.
Self Joins
Example: "comparable" countries
SELECT C.name, D.name FROM cia C, cia D WHERE C.region = D.region AND C.name != D.name AND C.area > 10000 AND C.area between D.area and D.area+1000
Outer Joins
When joining table A to table B, the standard join
includes a tuple from A paired with a tuple of B, only if
the specified WHERE condition is satisfied. If a tuple
from A does not 'match' any tuple from B, it does not
participate in the result. Similarly, a tuple from B
which does not 'match' any tuple from A is not a part of
the result.
However, there may be times where we still wish to have such unmatched tuples appear in the result. In a "left outer join", the result includes all tuples from the regular join as well as one tuple for each entry of the original left table which was never matched. That tuple will be padded with NULL values for any expected attributes that are not otherwise known.
In similar fashion, a right outer join will include resulting tuples from the regular join as well as one tuple for each tuple of the original right table which went unmatched. Finally, an "outer join" is equivalent to the union of the left and right outer joins.
A standard join of two tables in SQL uses a FROM syntax of
The corresponding syntax for outer joins in SQL is
Though we are used to writing boolean conditions, which are either true or false, it is important to note that NULL values, when they exist in the database, are treated in a special way. What we may think of as a boolean expression is actually viewed as an expression with three possible results true, false, and unknown. Based on this, we can expand the standard true tables for boolean operators like AND, OR, NOT in the sensible way.
Yet it is still important to know what happens to a tuple in a query if the WHERE clause evaluates to unknown. In this case, it is not included in the result. Only those tuples for which the WHERE clause was unquestionably true will appear.
Yet for constraints, or CHECK expressions, the opposite convention is used. If enforcing a check, the system will only stop operations which unquestionably lead to a violation of the check. If the boolean expression evaluated to unknown, the benefit of the doubt is given and the constraint is presumed to be satisfied.
There are also special operators to check if an attribute is a NULL value or not, namely IS NULL and IS NOT NULL.