A Gentle Introduction to SQL

We will be playing with a website titled "A Gentle Introduction to SQL" originally developed by Andrew Cummings of the School of Computing of Napier University, Edinburgh, UK.

It is accessible at sqlzoo.net.

We will use parts of this database as time goes.


On Thursday, 11 September 2003, we explored the following tutorials from sqlzoo:
  • warmup for basic SQL queries (CIA)
  • basic SQL queries (CIA)
  • warmup for nested selects (CIA)
  • nested selects (CIA)
  • warmup for aggregate functions (CIA)
  • warmup for GROUP BY and HAVING (CIA)
  • aggregate functions (CIA)
  • By the end of this day, we had seen SQL queries of the following general from (except thus far, we have only seen queries for which the from-list was a single entry):
    SELECT   [DISTINCT] select-list
    FROM     from-list
    WHERE    qualification
    GROUP BY grouping-list
    HAVING   group-qualification
    ORDER BY order-list
    
    As important was an understanding of the (conceptual) order of evaluation for such a statement, as follows:
    1. All tables in the from line are "joined" (more below).
    2. For each row of such a table, the where qualification is evaluated to see whether or not to include the row.
    3. All of the rows chosen by WHERE are then grouped, according to the grouping-list, so that each group is aggregated into a single row.
    4. For each remaining group, the HAVING qualification is used to decide whether to include that group's row in the final results.
    5. All remaining rows are sorted according to the order-list.
    6. If DISTINCT is included in the SELECT line, then duplicates are removed (turning a multi-set of tuples into a true set of tuples)
    7. Finally, only the columns specified in the select-list are given in the final result.

    On Tuesday, 16 September 2003, we continued by revisiting the form of the from-list.
  • Range Variables
    Though it is acceptable to have a query such as:
    SELECT   name,population
    FROM     CIA
    WHERE    population > 100000000
    
    It is also permissible to assign a range variable to identify each row considered. For example, the above query can be restated using a range variable C for each country in the CIA table.
    SELECT   C.name, C.population
    FROM     CIA C
    WHERE    C.population > 100000000
    
    This notation may make it more clear that the WHERE condition is being evaluated separately for each such C in the range.


  • Correlated Nested Queries
    Thus far, the nested queries we considered were such that the inner query was independent of the outer query. For example, in finding all countries with area at least as larege as the United States, we could write the following query.
    SELECT   C.name
    FROM     CIA C
    WHERE    C.area > (SELECT area FROM cia WHERE name='United States')
    
    Such a query should behave as if the WHERE clause is evaluated separately for each such C in the range of the FROM clause. Of course, since the nested select is independent of C, there is really no reason to reevaluate that statement for each C. A query optimizer will realize this and thus evaluate the nested query once and only once.

    However, we consider the following goal: For each region, report the country with the largest area in that region. One way to structure this query is as follows:

    SELECT   C.region,C.name
    FROM     CIA C
    WHERE    C.area >=  ALL (SELECT D.area FROM cia D WHERE C.region = D.region)
    
    Note carefully the occurance of range variable C in the inner select! This means that we need to consider reevaluating the inner select for each choice of C.


  • Joins
    When more than one table name is given as part of the FROM-list, the tables are to be "joined". The result is the cross product of the sets of tuples from the joined tables. For example, if two tables are joined, the result will be a table with one column for each of the columns of the original two tables. Then tuples are created by taking every possible row of the first table combined with every possible row of the second table. The concept extends to three or more tables as well.

    To explore joins, we will start by considering another database from sqlzoo.net, modeling the Internet Movie Database (imdb.com) We explored the following tutorials from sqlzoo:

  • warmup for basic joins (movies)
  • basic joins (movies)

  • Self Joins
    Example: "comparable" countries
    SELECT C.name, D.name
    FROM   CIA C, CIA D
    WHERE  C.region = D.region
       AND C.name != D.name
       AND C.area > 10000
       AND C.area between D.area and D.area+1000
    


  • Natural Joins

  • null attribute values


  • Outer Joins vs. Inner Joins

  • Michael Goldwasser