Computer Science 371
Topic: SQL Queries
Related Reading: Ch. 5, my personal overview of SQL queries.
Due: 8:00pm, Monday 3 October 2005
For this assignment you must work individually. Please make sure you adhere to the policies on academic integrity in this regard.
You are free to discuss any of the "practice" problems which you may find on the website, but you should not discuss the specific problems to be submitted with anyone other than the instructor.
The goal of the assignment is to create SQL queries that
generate each of the desired views of the data.
As a data set, we will use the cia table as well as the
movie,actor,casting tables, as originally
provided on sqlzoo.net.
You may either test out your queries directly on that website (make
sure to select their Postgres engine), or alternatively you can test
out your entire script on turing. We have placed copies (or near
copies?) of those tables into a csci371 database on turing.
To connect to that database using psql, type
You should submit a single SQL script which includes queries for all of the questiosn below. You should not submit the generated output, only the SQL query which can be used to answer the question. In fact, if you wish to see the output for the queries below (other than the extra credit), here they are!
Please follow the instructions precisely as to the desired result and the format of the user's external view. The most likely mistakes are those caused by misreading the goal. Also, your queries must be designed to work independently of any other knowledge about the data. For example, if we asked you to give a query which identifies the name of the country with the largest gdp, the following query happens to generate the proper output on this dataset:
SELECT C.name, C.gdp FROM cia C WHERE C.gdp >= 9000000000000but it would be an incorrect answer for this homework. It is not general, as it relies on the significance of value 9000000000000.
Produce an alphabetized list of all regions that contain a country which includes the letter "z" in the country name (consider both upper and lower case z's). Make sure a chosen region is not included more than once.
Produce a table with columns (name,area) which includes all countries in either North or South America. Furthermore, make sure that all North American countries appear earlier in the table then the South American countries, and within each continent ensure that countries appear from largest to smallest areas.
Generate a table with columns (region,people) where the people column should be the cumulative population of all countries in that region. Order your table from most populated region to least populated region.
Generate a table identifying the name and gdp of the country with the largest gdp of all countries. (Gee, I wonder what country that will be!)
Generate a table with columns (name,density) that contains all countries which have a population density (i.e. people per square kilometer) which lies between 300 and 400.
Hint: some countries are listed with an area of 0, wich makes the density calculation undefined. You'll have to make sure to remove those to get an executable query. Worse yet, SQL standards do not mandate that engines "short circuit" boolean expression evaluation. That is if you type something such as "expr1 AND expr2" both expressions may get evaluated, even if expr1 is false. So try to form a query which does not rely on short circuiting.
Generate a table listing the titles and years, chronologically, of all movies which included 'Dan Aykroyd' in the cast.
Generate a table listing the titles and years, chronologically, of all movies which included both 'Dan Aykroyd' and 'Chevy Chase' in the cast.
Generate a table which includes the names of all actors who were in both 'Ghostbusters' as well as 'Ghostbusters II'.
Generate a table with columns (title, releases) listing all movie titles which have been used for two or more independent releases, as well as the total number of such releases for each title. Please order the titles alphabetically.
If you got the previous query to work, you would have noticed that no movie title has been used more than twice in the database. For this problem, you may assume this fact, namely that no title has been released three or more times.
Your goal here is to generate a table with columns (title, original, remake) that lists the same titles as with the previous problem, together with the respective years of the two releases.
Generate a table with columns (actor1,actor2), for every pair of actors who have worked together in seven or more different movies.
Note: depending on how your write such a query, it could be very computationally expensive for the database engine (in fact, sqlzoo may even timeout your query). If you want to really impress, design a query which is evaluated more quickly by the engine.
Generate a query which identifies the specific pair (actor1,actor2) who have appeared in the greatest number of movies together (or in case of a tie, all possible such pairs).
(I should note that I have not yet made the effort to try this myself, so I'm not sure of the difficulty. But in spirit, the data is there so there should be a way to phrase the query).