[an error occurred while processing this directive] [an error occurred while processing this directive]


Assignment [an error occurred while processing this directive]

Overview

Topic: SQL Queries
Related Reading: Ch. 3.4, 5.1-5.5, as well as details on the sqlzoo.net database
Due: 12:45pm Thursday, 25 September 2003 [an error occurred while processing this directive]
For this assignment, you may either work alone or with one other person. If you do work as a pair, both of you should submit a copy of the solutions and you should make sure that both names are at the top of the file.

You are also encouraged to seek help from the instructor, whether you are working alone or as a pair.


The goal of the assignment is to create SQL queries that generate each of the desired views of the data. You do not need to submit the generated output, only the SQL query which can be used. We will use the tables from sqlzoo.net and assume you are using the choice of Oracle engine there.

Your answers should be typed in a single file and submitted electronically in the folder /home/cs341/Submit/_username_/sqlhw/ on the machine patel2.

Please follow the instructions precisely as to the desired result and the format of the user's external view. The most likely mistakes are those caused by misreading the goal.

Also, your queries must be designed to work independently of any other knowledge about the data. For example, if we asked you to give a query which identifies the name of the country with the largest gdp, the following query happens to generate the proper output on this dataset:

SELECT C.name, C.gdp
FROM cia C
WHERE C.gdp >= 9000000000000
but it would be an incorrect answer for this homework. It is not general, as it relies on the significance of value 9000000000000.


  1. Produce an alphabetized list of all regions that contain a country which includes the letter "z" in the country name (consider both upper and lower case z's). Make sure a chosen region is not included more than once.

  2. Produce a table with columns (name,area) which includes all countries in either North or South America. Furthermore, make sure that all North American countries appear earlier in the table then the South American countries, and within each continent ensure that countries appear from largest to smallest areas.

  3. Generate a table with columns (region,people) where the people column should be the cumulative population of all countries in that region. Order your table from most populated region to least populated region.

  4. Generate a table identifying the name and gdp of the country with the largest gdp of all countries. (Gee, I wonder what country that will be!)

  5. Generate a table with columns (name,density) that contains all countries which have a population density (i.e. people per square kilometer) which lies between 300 and 400.

    Hint: some countries are listed with an area of 0, wich makes the density calculation undefined. You'll have to make sure to remove those to get an executable query. Worse yet, SQL standards do not mandate that engines "short circuit" boolean expression evaluation. That is if you type something such as "expr1 AND expr2" both expressions may get evaluated, even if expr1 is false. So try to form a query which does not rely on short circuiting.

  6. Generate a table listing the titles and years, chronologically, of all movies which included 'Dan Aykroyd' in the cast.

  7. Generate a table listing the titles and years, chronologically, of all movies which included both 'Dan Aykroyd' and 'Chevy Chase' in the cast.

  8. Generate a table which includes the names of all actors who were in both 'Ghostbusters' as well as 'Ghostbusters II'.

    Hint: INTERSECT

  9. Generate a table with columns (title, releases) listing all movie titles which have been used for two or more independent releases, as well as the total number of such releases for each title. Please order the titles alphabetically.

  10. If you got the previous query to work, you would have noticed that no movie title has been used more than twice in the database. For this problem, you may assume this fact, namely that no title has been released three or more times.

    Your goal here is to generate a table with columns (title, original, remake) that lists the same titles as with the previous problem, together with the respective years of the two releases.

    Hint: self-join

Extra Credit

  1. Generate a table with columns (actor1,actor2), for every pair of actors who have worked together in ten or more different movies.

    Note: depending on how your write such a query, it could be very computationally expensive for the database engine (in fact, sqlzoo may even timeout your query). If you want to really impress, design a query which is evaluated more quickly by the engine.

  2. Generate a query which identifies the specific pair (actor1,actor2) who have appeared in the greatest number of movies together (or in case of a tie, all possible such pairs).

    (I should note that I have not yet made the effort to try this myself, so I'm not sure of the difficulty. But in spirit, the data is there so there should be a way to phrase the query).


[an error occurred while processing this directive]
[an error occurred while processing this directive]