Assignments | Course Home | PostgreSQL | Schedule & Lecture Notes | Submit

Saint Louis University

Computer Science 371
Databases

Michael Goldwasser

Fall 2005

Dept. of Math & Computer Science

Homework Assignment 03

SQL Queries

Contents:


Overview

Topic: SQL Queries
Related Reading: Ch. 5, my personal overview of SQL queries.
Due: 8:00pm, Monday 3 October 2005


Collaboration Policy

For this assignment you must work individually. Please make sure you adhere to the policies on academic integrity in this regard.

You are free to discuss any of the "practice" problems which you may find on the website, but you should not discuss the specific problems to be submitted with anyone other than the instructor.


Problems to be Submitted (20 points)

The goal of the assignment is to create SQL queries that generate each of the desired views of the data. As a data set, we will use the cia table as well as the movie,actor,casting tables, as originally provided on sqlzoo.net. You may either test out your queries directly on that website (make sure to select their Postgres engine), or alternatively you can test out your entire script on turing. We have placed copies (or near copies?) of those tables into a csci371 database on turing. To connect to that database using psql, type psql -d csci371 at the shell prompt, or once within psql, type \c csci371.

You should submit a single SQL script which includes queries for all of the questiosn below. You should not submit the generated output, only the SQL query which can be used to answer the question. In fact, if you wish to see the output for the queries below (other than the extra credit), here they are!

Please follow the instructions precisely as to the desired result and the format of the user's external view. The most likely mistakes are those caused by misreading the goal. Also, your queries must be designed to work independently of any other knowledge about the data. For example, if we asked you to give a query which identifies the name of the country with the largest gdp, the following query happens to generate the proper output on this dataset:

SELECT C.name, C.gdp
FROM cia C
WHERE C.gdp >= 9000000000000
but it would be an incorrect answer for this homework. It is not general, as it relies on the significance of value 9000000000000.

  1. (2 points)

    Produce an alphabetized list of all regions that contain a country which includes the letter "z" in the country name (consider both upper and lower case z's). Make sure a chosen region is not included more than once.

  2. (2 points)

    Produce a table with columns (name,area) which includes all countries in either North or South America. Furthermore, make sure that all North American countries appear earlier in the table then the South American countries, and within each continent ensure that countries appear from largest to smallest areas.

  3. (2 points)

    Generate a table with columns (region,people) where the people column should be the cumulative population of all countries in that region. Order your table from most populated region to least populated region.

  4. (2 points)

    Generate a table identifying the name and gdp of the country with the largest gdp of all countries. (Gee, I wonder what country that will be!)

  5. (2 points)

    Generate a table with columns (name,density) that contains all countries which have a population density (i.e. people per square kilometer) which lies between 300 and 400.

    Hint: some countries are listed with an area of 0, wich makes the density calculation undefined. You'll have to make sure to remove those to get an executable query. Worse yet, SQL standards do not mandate that engines "short circuit" boolean expression evaluation. That is if you type something such as "expr1 AND expr2" both expressions may get evaluated, even if expr1 is false. So try to form a query which does not rely on short circuiting.

  6. (2 points)

    Generate a table listing the titles and years, chronologically, of all movies which included 'Dan Aykroyd' in the cast.

  7. (2 points)

    Generate a table listing the titles and years, chronologically, of all movies which included both 'Dan Aykroyd' and 'Chevy Chase' in the cast.

  8. (2 points)

    Generate a table which includes the names of all actors who were in both 'Ghostbusters' as well as 'Ghostbusters II'.

    Hint: INTERSECT

  9. (2 points)

    Generate a table with columns (title, releases) listing all movie titles which have been used for two or more independent releases, as well as the total number of such releases for each title. Please order the titles alphabetically.

  10. (2 points)

    If you got the previous query to work, you would have noticed that no movie title has been used more than twice in the database. For this problem, you may assume this fact, namely that no title has been released three or more times.

    Your goal here is to generate a table with columns (title, original, remake) that lists the same titles as with the previous problem, together with the respective years of the two releases.

    Hint: self-join


Extra Credit

  1. (1 point)

    Generate a table with columns (actor1,actor2), for every pair of actors who have worked together in seven or more different movies.

    Note: depending on how your write such a query, it could be very computationally expensive for the database engine (in fact, sqlzoo may even timeout your query). If you want to really impress, design a query which is evaluated more quickly by the engine.

  2. (1 point)

    Generate a query which identifies the specific pair (actor1,actor2) who have appeared in the greatest number of movies together (or in case of a tie, all possible such pairs).

    (I should note that I have not yet made the effort to try this myself, so I'm not sure of the difficulty. But in spirit, the data is there so there should be a way to phrase the query).


Michael Goldwasser
CSCI 371, Fall 2005
Last modified: Thursday, 29 September 2005
Assignments | Course Home | PostgreSQL | Schedule & Lecture Notes | Submit