Overall Reading | |
---|---|
Brookshear: | Ch. 9.1, 9.3 |
Outline:
We saw some data structures which might help do this efficiently.
However, as we described it then, this was one-dimensional.
That is, we were only allowed to search for information based on one
particular field of information (the name).
More generally, we want databases to handle information about many different items, each of which may have many different fields of data.
Employee Record:
If efficiency important, perhaps I should keep two copies of all of the records, with one version sorted by name and another version sorted by birthdate.
For instance, payroll should have access to my salary and home address, but general employees should probably only see my name, office, and phone number. My manager might know my salary, but not my social security number.
The database portrays the data as if it were stored in rectangular tables, called relations.
Example (from Fig. 9.3):
EmplID | Name | Address | SSNum |
---|---|---|---|
25X15 | Joe E. Baker | 33 Nowhere St. | 111223333 |
34Y70 | Cheryl H. Clark | 563 Downtown Ave. | 999009999 |
23Y34 | G. Jerry Smith | 1555 Circle Dr. | 111005555 |
. | . | . | . |
. | . | . | . |
. | . | . | . |
We call one row of this table a tuple.
Name | Birthday | Zodiac | Birthstone |
---|---|---|---|
Art Alexakis | Apr. 12 | Aries | diamond |
Hank Azaria | Apr. 25 | Tauras | diamond |
Antonio Banderas | Aug. 10 | Leo | peridot |
Lucas Black | Nov. 29 | Sagittarius | citrine |
Matthew Broderick | Mar. 21 | Pisces | aquamarine |
Sandra Bullock | July 26 | Leo | ruby |
Steve Buscemi | Dec. 13 | Sagittarius | turquoise |
Nicolas Cage | Jan. 7 | Capricorn | garnet |
Jim Carrey | Jan. 17 | Capricorn | garnet |
George Clooney | May 6 | Tauras | emerald |
Courteney Cox | June 15 | Gemini | pearl |
John Cusack | June 28 | Cancer | pearl |
Joan Cusack | Oct. 11 | Libra | opal |
Matt Damon | Oct. 8 | Libra | opal |
Robert De Niro | Aug. 17 | Leo | peridot |
Leonardo DiCaprio | Nov. 11 | Scorpio | citrine |
Cameron Diaz | Aug. 30 | Virgo | peridot |
Stephen Dorff | July 29 | Leo | ruby |
Minnie Driver | Jan. 31 | Aquarius | garnet |
Calista Flockhart | Nov. 11 | Scorpio | citrine |
Harrison Ford | July 13 | Cancer | ruby |
Jodie Foster | Nov. 19 | Scorpio | citrine |
Mel Gibson | Jan. 3 | Capricorn | garnet |
Joseph Gordon-Levitt | Feb. 17 | Aquarius | amethyst |
Tom Hanks | July 9 | Cancer | ruby |
Dustin Hoffman | Aug. 8 | Leo | peridot |
Timothy Hutton | Aug. 16 | Leo | peridot |
Jeremy Irons | Sep. 19 | Virgo | sapphire |
Joshua Jackson | June 11 | Gemini | pearl |
Greg Kinnear | June 17 | Gemini | pearl |
Kevin Kline | Oct. 24 | Scorpio | opal |
Jeremy London | Nov. 7 | Scorpio | citrine |
Courtney Love | July 9 | Cancer | ruby |
Peter MacNicol | Apr. 10 | Aries | diamond |
William H. Macy | Mar. 13 | Pisces | aquamarine |
John Malkovich | Dec. 9 | Sagittarius | turquoise |
Julianna Margulies | June 8 | Gemini | pearl |
Matthew McConaughey | Nov. 4 | Scorpio | citrine |
Dylan McDermott | Oct. 26 | Scorpio | opal |
Frances McDormand | June 23 | Cancer | pearl |
Jack Nicholson | Apr. 22 | Aries | diamond |
Edward Norton | Aug. 18 | Leo | peridot |
Gwyneth Paltrow | Sep. 28 | Libra | sapphire |
Sean Penn | Aug. 17 | Leo | peridot |
Matthew Perry | Aug. 19 | Leo | peridot |
Michelle Pfeiffer | Apr. 29 | Tauras | diamond |
Brad Pitt | Dec. 18 | Sagittarius | turquoise |
Natalie Portman | June 9 | Gemini | pearl |
Aidan Quinn | Mar. 8 | Pisces | aquamarine |
Giovanni Ribisi | Mar. 31 | Aries | aquamarine |
Julia Roberts | Oct. 28 | Scorpio | opal |
Keri Russell | Mar. 23 | Pisces | aquamarine |
Adam Sandler | Sep. 9 | Virgo | sapphire |
Susan Sarandon | Oct. 4 | Libra | opal |
Rick Schroder | Apr. 13 | Aries | diamond |
Gary Sinise | Mar. 17 | Pisces | aquamarine |
Kevin Spacey | July 26 | Leo | ruby |
Rider Strong | Dec. 11 | Sagittarius | turquoise |
Billy Bob Thornton | Aug. 4 | Leo | peridot |
Uma Thurman | Apr. 29 | Tauras | diamond |
Skeet Ulrich | Jan. 20 | Capricorn | garnet |
Eddie Vedder | Dec. 23 | Capricorn | turquoise |
Robin Williams | July 21 | Cancer | ruby |
Kate Winslet | Oct. 5 | Libra | opal |
Robin Wright Penn | Apr. 8 | Aries | diamond |
Though this may contain the information that we want, there are several problems with such a structure.
A better way: We can instead represent our database using a combination of three relations, as follows:
Name | Birthday |
---|---|
Art Alexakis | Apr. 12 |
Hank Azaria | Apr. 25 |
Antonio Banderas | Aug. 10 |
Lucas Black | Nov. 29 |
Matthew Broderick | Mar. 21 |
Sandra Bullock | July 26 |
Steve Buscemi | Dec. 13 |
Nicolas Cage | Jan. 7 |
Jim Carrey | Jan. 17 |
George Clooney | May 6 |
Courteney Cox | June 15 |
John Cusack | June 28 |
Joan Cusack | Oct. 11 |
Matt Damon | Oct. 8 |
Robert De Niro | Aug. 17 |
Leonardo DiCaprio | Nov. 11 |
Cameron Diaz | Aug. 30 |
Stephen Dorff | July 29 |
Minnie Driver | Jan. 31 |
Calista Flockhart | Nov. 11 |
Harrison Ford | July 13 |
Jodie Foster | Nov. 19 |
Mel Gibson | Jan. 3 |
Joseph Gordon-Levitt | Feb. 17 |
Tom Hanks | July 9 |
Dustin Hoffman | Aug. 8 |
Timothy Hutton | Aug. 16 |
Jeremy Irons | Sep. 19 |
Joshua Jackson | June 11 |
Greg Kinnear | June 17 |
Kevin Kline | Oct. 24 |
Jeremy London | Nov. 7 |
Courtney Love | July 9 |
Peter MacNicol | Apr. 10 |
William H. Macy | Mar. 13 |
John Malkovich | Dec. 9 |
Julianna Margulies | June 8 |
Matthew McConaughey | Nov. 4 |
Dylan McDermott | Oct. 26 |
Frances McDormand | June 23 |
Jack Nicholson | Apr. 22 |
Edward Norton | Aug. 18 |
Gwyneth Paltrow | Sep. 28 |
Sean Penn | Aug. 17 |
Matthew Perry | Aug. 19 |
Michelle Pfeiffer | Apr. 29 |
Brad Pitt | Dec. 18 |
Natalie Portman | June 9 |
Aidan Quinn | Mar. 8 |
Giovanni Ribisi | Mar. 31 |
Julia Roberts | Oct. 28 |
Keri Russell | Mar. 23 |
Adam Sandler | Sep. 9 |
Susan Sarandon | Oct. 4 |
Rick Schroder | Apr. 13 |
Gary Sinise | Mar. 17 |
Kevin Spacey | July 26 |
Rider Strong | Dec. 11 |
Billy Bob Thornton | Aug. 4 |
Uma Thurman | Apr. 29 |
Skeet Ulrich | Jan. 20 |
Eddie Vedder | Dec. 23 |
Robin Williams | July 21 |
Kate Winslet | Oct. 5 |
Robin Wright Penn | Apr. 8 |
StartDate | EndDate | Birthstone |
---|---|---|
January 1 | January 31 | garnet |
February 1 | February 29 | amethyst |
March 1 | March 31 | aquamarine |
April 1 | April 30 | diamond |
May 1 | May 31 | emerald |
June 1 | June 30 | pearl |
July 1 | July 31 | ruby |
August 1 | August 31 | peridot |
September 1 | September 30 | sapphire |
October 1 | October 31 | opal |
November 1 | November 30 | citrine |
December 1 | December 31 | turquoise |
StartDate | EndDate | Zodiac |
---|---|---|
Feb. 19 | Mar. 20 | Pisces |
Mar. 21 | Apr. 19 | Aries |
Apr. 20 | May 20. | Tauras |
May 21 | June 20 | Gemini |
June 21 | July 22 | Cancer |
July 23 | Aug. 22 | Leo |
Aug. 23 | Sep. 22 | Virgo |
Sep. 23 | Oct. 22 | Libra |
Oct. 23 | Nov. 21 | Scorpio |
Nov. 22 | Dec. 21 | Sagittarius |
Dec. 22 | Jan. 19 | Capricorn |
Jan. 20 | Feb. 18 | Aquarius |
Query: Find all Capricorns
First approach (Fig. 9.4 of text):
EmplID | Name | Address | SSNum | JobID | JobTitle | SkillCode | Dept | StartDate | TermDate |
---|---|---|---|---|---|---|---|---|---|
25X15 | Joe E. Baker | 33 Nowhere St. | 111223333 | F5 | Floor manager | FM3 | Sales | 9-1-1998 | 9-30-1999 |
25X15 | Joe E. Baker | 33 Nowhere St. | 111223333 | D7 | Dept. head | D2 | Sales | 10-1-1999 | present |
34Y70 | Cheryl H. Clark | 563 Downtown Ave. | 999009999 | F5 | Floor manager | FM3 | Sales | 10-1-1998 | present |
23Y34 | G. Jerry Smith | 1555 Circle Dr. | 111005555 | S25X | Secretary | T5 | Personnel | 3-1-1996 | 4-30-1998 |
23Y34 | G. Jerry Smith | 1555 Circle Dr. | 111005555 | S25X | Secretary | T6 | Accounting | 5-1-1998 | present |
. | . | . | . | . | . | . | . | . | . |
. | . | . | . | . | . | . | . | . | . |
. | . | . | . | . | . | . | . | . | . |
Again, this suffers from the same pitfalls
An alternative solution: Use three separate relations. (Fig. 9.5)
EmplID | Name | Address | SS Num |
---|---|---|---|
25X15 | Joe E. Baker | 33 Nowhere St. | 111223333 |
34Y70 | Cheryl H. Clark | 563 Downtown Ave. | 999009999 |
23Y34 | G. Jerry Smith | 1555 Circle Dr. | 111005555 |
. | . | . | . |
. | . | . | . |
. | . | . | . |
JobID | JobTitle | SkillCode | Dept |
---|---|---|---|
S25X | Secretary | T5 | Personnel |
S26Z | Secretary | T6 | Accounting |
F5 | Floor manager | FM3 | Sales |
. | . | . | . |
. | . | . | . |
. | . | . | . |
EmplID | JobID | StartDate | TermDate |
---|---|---|---|
23Y34 | S25X | 3-1-1996 | 4-30-1998 |
34Y70 | F5 | 10-1-1998 | present |
23Y34 | S25Z | 5-1-1998 | present |
. | . | . | . |
. | . | . | . |
. | . | . | . |
Query: Find all people in Accounting
Imagine that you had the following relation:
EmplID | JobTitle | Dept |
---|---|---|
25X15 | Floor manager | Sales |
28Z25 | Secretary | Sales |
23Y34 | Secretary | Accounting |
and that you tried to replace it with a combination of the following two relations:
EmplID | JobTitle |
---|---|
25X15 | Floor manager |
28Z25 | Secretary |
23Y34 | Secretary |
JobTitle | Dept |
---|---|
Floor manager | Sales |
Secretary | Sales |
Secretary | Accounting |
Is this new formation equivalent? No!
Example: How do you determine the employees in the Sales department?
EmplID | Name | Address | SS Num |
---|---|---|---|
25X15 | Joe E. Baker | 33 Nowhere St. | 111223333 |
34Y70 | Cheryl H. Clark | 563 Downtown Ave. | 999009999 |
23Y34 | G. Jerry Smith | 1555 Circle Dr. | 111005555 |
. | . | . | . |
. | . | . | . |
. | . | . | . |
SELECT from EMPLOYEE where EmplID="34Y70"
EmplID | Name | Address | SS Num |
---|---|---|---|
34Y70 | Cheryl H. Clark | 563 Downtown Ave. | 999009999 |
EmplID | Name | Address | SS Num |
---|---|---|---|
25X15 | Joe E. Baker | 33 Nowhere St. | 111223333 |
34Y70 | Cheryl H. Clark | 563 Downtown Ave. | 999009999 |
23Y34 | G. Jerry Smith | 1555 Circle Dr. | 111005555 |
. | . | . | . |
. | . | . | . |
. | . | . | . |
PROJECT Name,Address from EMPLOYEE
Name | Address |
---|---|
Joe E. Baker | 33 Nowhere St. |
Cheryl H. Clark | 563 Downtown Ave. |
G. Jerry Smith | 1555 Circle Dr. |
. | . |
. | . |
. | . |
V | W |
---|---|
r | 2 |
t | 4 |
p | 6 |
Relation B
X | Y | Z |
---|---|---|
5 | g | p |
4 | d | e |
2 | m | q |
4 | t | f |
JOIN A and B
A.V | A.W | B.X | B.Y | B.Z |
---|---|---|---|---|
r | 2 | 5 | g | p |
r | 2 | 4 | d | e |
r | 2 | 2 | m | q |
r | 2 | 4 | t | f |
t | 4 | 5 | g | p |
t | 4 | 4 | d | e |
t | 4 | 2 | m | q |
t | 4 | 4 | t | f |
p | 6 | 5 | g | p |
p | 6 | 4 | d | e |
p | 6 | 2 | m | q |
p | 6 | 4 | t | f |
The attribute names from each of the original relations do not necessarily need to have the same names, and if they do, it may be a coincidence. However, we often do not want to join all possible pairs of tuples, but only those in which some particular attributes match.
V | W |
---|---|
r | 2 |
t | 4 |
p | 6 |
Relation B
X | Y | Z |
---|---|---|
5 | g | p |
4 | d | e |
2 | m | q |
4 | t | f |
JOIN A and B where A.W = B.X
A.V | A.W | B.X | B.Y | B.Z |
---|---|---|---|---|
r | 2 | 2 | m | q |
t | 4 | 4 | d | e |
t | 4 | 4 | t | f |
Example,
NEW1 <- JOIN ASSIGNMENT and JOB where ASSIGNMENT.JobID=JOB.JobID
NEW2 <- SELECT form NEW1 where ASSIGNMENT.TermDate="present"
LIST <- PROJECT ASSIGNMENT.EmplID,JOB.Dept from NEW2
Also, it should be noted that the user interface for many databases will present these same concepts, however using different syntax or graphical interaction.
Though its terminology differs somewhat from what we have introduced above, a single SQL statement may be used to express a combination of SELECT, PROJECT and JOIN operations.
Format:
select <list of attributes> from <list of tables> where <list of conditions>Let's revisit some sample queries:
PROJECT Name, Address from EMPLOYEE
In SQL:
select Name, Address from EMPLOYEE[Note well: the terminology 'select' in SQL is not the same as the operation SELECT we have discussed]
select EmplID, Name, Address, SSN from EMPLOYEE where EmplID = '34Y70'
select * from A, B where A.W = B.X
select EmplID, Dept from ASSIGNMENT, JOB where ASSIGNMENT.JobID = JOB.JobID and ASSIGNMENT.TermDate = "present"
SQL statements can use additional features not mentioned in the text. Just a few such examples are:
name region area population gdp
Afghanistan
Asia
652000
25838797
21000000000
Albania
Europe
28748
3490435
5600000000
Algeria
Africa
2381740
31193917
147600000000
.
.
.
.
.
.
.
.
.
.
id title yr score votes
1
Star Wars
1977
8.8
53567
2
Shawshank Redemption, The
1994
9
44974
3
Pulp Fiction
1994
8.6
43993
4
Titanic
1997
7.2
43371
.
.
.
.
.
.
.
.
.
.
id name
1
Woody Allen
2
Clint Eastwood
3
Robert De Niro
4
Sean Connery
.
.
.
.
movieid actorid ord
972
588
1
849
588
2
1575
588
3
47
590
4
.
.
.
.
.
.