DB Associates Report Spring2008
Project Report
Database Associates
Vit Bubak
Lian Duan
Ray Hylock
Todd Papke
[ 06K:272 ] Database Associates – Internet Movie Database – Project Report – May 9, 2008
Introduction
While IMDb (http://www.imdb.com/) serves as a useful repository of movie
information, its use as a source for aggregate movie reviews is limited. Sites like Rotten Tomatoes
(http://www.rottentomatoes.com/) serve as a community portal where reviewers
collectively rate movies, but they fall short in letting the user quickly track the
contributing artists that are part of the movie production (e.g., directors, actors, producers).
Additionally, box office receipts and weekly standings are not a component of either, but remain the focus
of sites such as Hollywood Reporter (http://www.hollywoodreporter.com/hr/index.jsp).
To make EverythingMovies a more comprehensive site, the database schema must support
multiple simultaneous queries (generated from HTML user forms
through a JSP tag library architecture) through a “round-robin” JDBC connection pool, while still allowing
for real-time updates and contributions by the user community. The schema must also be designed to
allow for table abstraction across a hardware topology, with an index that resides on its own network
server (again for ease of scalability across a server topology as the connection pool grows to accommodate
the anticipated user community). Our intent is to make the database public domain with a Creative
Commons usage license. The license will allow for reuse as long as a click-through
“EverythingMovies” brand icon is present on any web site that makes use of our database engine.
EverythingMovies will use a click-through revenue-sharing scheme as its primary revenue model.
While the Oracle DB architecture has historically proven to be scalable through a variety of
software and hardware optimization strategies, we also recognize that a schema that implies
Oracle exclusivity may not be in the best interest of the adopting user community that we want to attract
with our data offering. Therefore, every attempt will be made to keep the SQL vendor-neutral so that
the data can be loaded into other database engines, specifically PostgreSQL and MySQL. The initial POC effort
may use one of these “open source” database engines as necessary due to budgetary constraints.
Basic Requirements
The goal of our client is to create a new type of movie web site that is more comprehensive than the
popular ones that currently are available on the Internet.
The primary goal of our effort is to create a “proof of concept” system that could serve as a prototype for
illustrating the concepts to potential Venture Capital funding sources. Additionally, the database schema
will be used to iron out potential scalability issues that might arise if views are required that weren’t
considered during database design (this could arise, for instance, if the initial VC round comes with
functional considerations that were outside the initial scope of the project).
The three different types of users that will use our database are:
1. Clients of the database - people who want to use the database to find information about movies. This
group can potentially be divided into (a) casual clients, who would search the database for any (common)
information about the movies, their actors, directors, awards won, et cetera, and (b) specialized clients
with more complicated search requests. In either case, the implementation of the query interface is the
same for most of the users.
2. Contributors to the database - people who are going to add new information to the database. These
people differ from the system administrators in that they only add information based on prespecified
constraints.
3. System administrators - people who will manage and upgrade/alter/program the database. This group,
however, is – in every sense – the same as the database administrators.
Below, we include examples of the queries that the two basic types of users of the database (defined as
clients and contributors above) might find useful. (Note that the example queries are further
discussed in Chapter 4 (Queries), and the results of the queries are given in Chapter 6 (Interface and
Reports).)
Note 1a)
Cardinalities of the binary relationships between two strong classes are included in the data dictionary.
Example: in the binary relationship class Hand_out {ORGANIZATIONS:AWARDS}, included in the
description of the Class Entity {ORGANIZATIONS}, the cardinality shown is [1:M]; that is, it
states how many awards an organization can hand out.
Note 1b)
Cardinalities of the binary relationships between one strong and one (derived) weak class follow the
same description as Note 1a) and are included in the data dictionary.
Note 2)
Cardinalities of the ternary relationships that include one derived weak class are as follows. Note that, in
the data dictionary, the cardinalities for the ternary relationships of this type are described as if the
relationships were effectively binary. In those cases, therefore, the cardinalities in the dictionary follow
the same reasoning as in Note 1).
Each actor can act in zero to many shows:   CARD-R-CO(Act_in, ACTORS, SHOWS) IN [0:M]
Each show can have from one to many actors: CARD-R-CO(Act_in, SHOWS, ACTORS) IN [1:M]
Note 4)
Cardinalities of the ternary relationships are not included directly in the data dictionary. Instead, they are
marked by Cardinality [↓] and are discussed below.
Cardinalities in the ternary relationship Won_by {SHOWS:AWARD_INSTANCES:PEOPLE}
Each {SHOWS} and {PEOPLE} combination can win from [0:M] awards
CARD-R-CO(Won_by, SHOWS, AWARD_INSTANCES, PEOPLE) IN [0:M]
Each {SHOWS} and {AWARD_INSTANCES} combination can have from [0:M] people
CARD-R-CO(Won_by, SHOWS, AWARD_INSTANCES, PEOPLE) IN [0:M]
Each {PEOPLE} and {AWARD_INSTANCES} combination can have from [0:1] shows
CARD-R-CO(Won_by, SHOWS, AWARD_INSTANCES, PEOPLE) IN [0:1]
Note 5)
Cardinalities of the aggregate class – derived class relationship are also included in the data dictionary.
Example: in the relationship between the aggregate class {COUNTRY_GROUPS} and
{COUNTRIES}, the cardinality [1:M] shown at {COUNTRY_GROUPS} in the data dictionary gives
the number of countries that can make up a country group. Similarly, cardinality [1:1] gives the number
of country groups that a country can belong to (i.e., at least one and at most one).
Note 6)
Cardinalities of the multivalued attributes not included in the data dictionary are as follows.
Screen names can have 0 to many values: CARD-A(ACTORS, screenFN) IN [0:M]
                                        CARD-A(ACTORS, screenMN) IN [0:M]
                                        CARD-A(ACTORS, screenLN) IN [0:M]
Actors can play one to many roles:      CARD-A(ACT_HISTORY, role) IN [1:M]
ACTORS(personID)
foreign key (personID) references PEOPLE(personID) ON DELETE CASCADE
In 4NF since there are no non-key attributes.
DIRECTORS(personID)
foreign key (personID) references PEOPLE(personID) ON DELETE CASCADE
In 4NF since there are no non-key attributes.
WRITERS(personID)
foreign key (personID) references PEOPLE(personID) ON DELETE CASCADE
In 4NF since there are no non-key attributes.
PRODUCERS(personID)
foreign key (personID) references PEOPLE(personID) ON DELETE CASCADE
In 4NF since there are no non-key attributes.
COMPOSERS(personID)
foreign key (personID) references PEOPLE(personID) ON DELETE CASCADE
In 4NF since there are no non-key attributes.
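The subclass relations above all follow one pattern: the subclass key is also a foreign key to PEOPLE, and deleting the person removes the subclass row. The following is a minimal sketch of that behaviour using Python's built-in sqlite3 module rather than the Oracle instance used in the project. The name column and the sample values are assumptions made purely for illustration.

```python
import sqlite3


def build_people_schema() -> sqlite3.Connection:
    """Create PEOPLE and one subclass relation (ACTORS) in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE people (
            personID CHAR(10) PRIMARY KEY,
            name     VARCHAR(30)  -- assumed attribute, for illustration only
        );
        CREATE TABLE actors (
            personID CHAR(10) PRIMARY KEY,
            FOREIGN KEY (personID) REFERENCES people(personID) ON DELETE CASCADE
        );
    """)
    return conn


def actor_count_after_delete() -> int:
    """Insert one person who is an actor, delete the person, count remaining actors."""
    conn = build_people_schema()
    conn.execute("INSERT INTO people VALUES ('P1', 'Sample Person')")
    conn.execute("INSERT INTO actors VALUES ('P1')")
    # Deleting the superclass row cascades to the subclass row.
    conn.execute("DELETE FROM people WHERE personID = 'P1'")
    (count,) = conn.execute("SELECT COUNT(*) FROM actors").fetchone()
    return count
```

Calling actor_count_after_delete() shows that no ACTORS row survives the deletion of its PEOPLE row, which is the effect the ON DELETE CASCADE clauses are meant to have.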
EDITORS(personID)
foreign key (personID) references PEOPLE(personID) ON DELETE CASCADE
In 4NF since there are no non-key attributes.
COLLECTIONS(colID)
In 4NF since there are no non-key attributes.
HAVE_FILMS(showID)
foreign key (showID) references FILMS(showID) ON DELETE CASCADE
In 4NF since there are no non-key attributes.
DIRECT(personID, showID)
foreign key (personID) references DIRECTORS(personID) ON DELETE SET NULL
foreign key (showID) references SHOWS(showID) ON DELETE SET NULL
In 4NF since there are no non-key attributes.
WRITE(personID, showID)
foreign key (personID) references WRITERS(personID) ON DELETE SET NULL
foreign key (showID) references SHOWS(showID) ON DELETE SET NULL
In 4NF since there are no non-key attributes.
PRODUCE(personID, showID)
foreign key (personID) references PRODUCERS(personID) ON DELETE SET NULL
foreign key (showID) references SHOWS(showID) ON DELETE SET NULL
In 4NF since there are no non-key attributes.
COMPOSE(personID, showID)
foreign key (personID) references COMPOSERS(personID) ON DELETE SET NULL
foreign key (showID) references SHOWS(showID) ON DELETE SET NULL
In 4NF since there are no non-key attributes.
EDIT(personID, showID)
foreign key (personID) references EDITORS(personID) ON DELETE SET NULL
foreign key (showID) references SHOWS(showID) ON DELETE SET NULL
In 4NF since there are no non-key attributes.
F = {personID, showID, salID -> amount}: In 4NF since the only non-key attribute cannot be a determinant
and transitivity does not exist because there is only one non-key attribute.
DISTRIBUTORS(distID, name)
F = {distID -> name}: In 4NF since the only non-key attribute cannot be a determinant and transitivity does
not exist because there is only one non-key attribute.
DISTRIBUTE(distID, showID)
foreign key (distID) references DISTRIBUTORS(distID) ON DELETE SET NULL
foreign key (showID) references SHOWS(showID) ON DELETE SET NULL
In 4NF since there are no non-key attributes.
RATINGS(rateID, rating)
check (rating >= 0) and (rating <= 5)
F = {rateID -> rating}: In 4NF since the only non-key attribute cannot be a determinant and transitivity does
not exist because there is only one non-key attribute.
RECEIVE(showID, rateID)
foreign key (showID) references SHOWS(showID) ON DELETE CASCADE
foreign key (rateID) references RATINGS(rateID) ON DELETE CASCADE
In 4NF since there are no non-key attributes.
COUNTRIES(countryID, name)
F = {countryID -> name}: In 4NF since the only non-key attribute cannot be a determinant and transitivity
does not exist because there is only one non-key attribute.
COUNTRY_GROUPS(cgID)
In 4NF since there are no non-key attributes.
COUNTRIES_MAKE_UP(countryID, cgID)
foreign key (countryID) references COUNTRIES(countryID) ON DELETE CASCADE
foreign key (cgID) references COUNTRY_GROUPS(cgID) ON DELETE CASCADE
F = {countryID -> cgID}: In 4NF since the only non-key attribute cannot be a determinant and transitivity does
not exist because there is only one non-key attribute.
CURRENCIES(curID, name)
F = {curID -> name}: In 4NF since the only non-key attribute cannot be a determinant and transitivity does not
exist because there is only one non-key attribute.
ORGANIZATIONS(orgID, name)
F = {orgID -> name}: In 4NF since the only non-key attribute cannot be a determinant and transitivity does not
exist because there is only one non-key attribute.
AWARDS(awardID, name)
F = {awardID -> name}: In 4NF since the only non-key attribute cannot be a determinant and transitivity does
not exist because there is only one non-key attribute.
HAND_OUT(awardID, orgID)
foreign key (awardID) references AWARDS(awardID) ON DELETE SET NULL
foreign key (orgID) references ORGANIZATIONS(orgID) ON DELETE SET NULL
F = {awardID -> orgID}: In 4NF since the only non-key attribute cannot be a determinant and transitivity does
not exist because there is only one non-key attribute.
WON_BY_PEOPLE(aiID, personID)
foreign key (aiID) references AWARD_INSTANCES(aiID) ON DELETE SET NULL
foreign key (personID) references PEOPLE(personID) ON DELETE SET NULL
In 4NF since there are no non-key attributes.
WON_BY_SHOWS(aiID, showID)
foreign key (aiID) references AWARD_INSTANCES(aiID) ON DELETE SET NULL
foreign key (showID) references SHOWS(showID) ON DELETE SET NULL
F = {aiID -> showID}: In 4NF since the only non-key attribute cannot be a determinant and transitivity does not
exist because there is only one non-key attribute.
For subclass (1): We used option A because there is a relationship that involves all of the subclasses (PEOPLE to
AWARD_INSTANCES) as well as relationships that need each individual subclass and only that particular subclass.
Using option B would leave us with many relationships to AWARD_INSTANCES, and we would have to add the
PEOPLE attributes to each subclass. If we were to use option C, we would need logic to make sure that we had
the correct subclass of people (i.e., only ACTORS for the act_in relationship) and would have to add six more
attributes to identify the type (we cannot simply use one since we have a cover type [1:M]). In this case, the
best alternative depends on whether one would rather write more code or manage more tables. Our
choice would be to use option B, since adding more tables would be easier for us to handle than more logic.
OPTION B
WON_BY_ACTORS(aiID, actorID)
foreign key (aiID) references AWARD_INSTANCES(aiID) ON DELETE SET NULL
foreign key (actorID) references ACTORS(actorID) ON DELETE SET NULL
In 4NF since there are no non-key attributes.
WON_BY_DIRECTORS(aiID, directorID)
foreign key (aiID) references AWARD_INSTANCES(aiID) ON DELETE SET NULL
foreign key (directorID) references DIRECTORS(directorID) ON DELETE SET NULL
In 4NF since there are no non-key attributes.
WON_BY_WRITERS(aiID, writerID)
foreign key (aiID) references AWARD_INSTANCES(aiID) ON DELETE SET NULL
foreign key (writerID) references WRITERS(writerID) ON DELETE SET NULL
In 4NF since there are no non-key attributes.
WON_BY_PRODUCERS(aiID, producerID)
foreign key (aiID) references AWARD_INSTANCES(aiID) ON DELETE SET NULL
foreign key (producerID) references PRODUCERS(producerID) ON DELETE SET NULL
In 4NF since there are no non-key attributes.
WON_BY_COMPOSERS(aiID, composerID)
foreign key (aiID) references AWARD_INSTANCES(aiID) ON DELETE SET NULL
foreign key (composerID) references COMPOSERS(composerID) ON DELETE SET NULL
In 4NF since there are no non-key attributes.
WON_BY_EDITORS(aiID, editorID)
foreign key (aiID) references AWARD_INSTANCES(aiID) ON DELETE SET NULL
foreign key (editorID) references EDITORS(editorID) ON DELETE SET NULL
In 4NF since there are no non-key attributes.
For subclass (2): We used option A because, again, we have relationships extending from both the
superclass and the subclasses. Using option B would increase the number of relationship tables by 11,
and we would have to add the attributes from SHOWS to the two subclasses. If we chose option C, we would have
to write some logic to help with the aggregate entity class COLLECTIONS (to make sure its members were films) and
the typing class (for episodes). In this case, since we have a partition, we would only need to add one
attribute to SHOWS to differentiate which type (film or episode) the tuple belongs to, but
we would have to add all of the attributes from both subclasses, so we would end up with at minimum 2
null attributes per record and at most 3. So, for our alternative, we decided to go with option C because a
little bit of logic and some empty fields are much easier to program and maintain than 11 additional tables.
OPTION C
HAVE_FILMS(showID, colID)
foreign key (showID) references SHOWS(showID) ON DELETE CASCADE
foreign key (colID) references COLLECTIONS(colID) ON DELETE CASCADE
In 4NF since there are no non-key attributes.
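To make the trade-off of option C for SHOWS more concrete, the following sketch folds both subclasses into a single table with one discriminator attribute, and expresses the "little bit of logic" as a CHECK constraint. It uses Python's sqlite3 module rather than Oracle. The showType column, its codes 'F' and 'T', and the exact split of subclass attributes (year, runtime, relDate for films; startDate, endDate for TV shows) are assumptions inferred from the data dictionary, not the project's actual DDL.

```python
import sqlite3

# Hypothetical single-table (option C) layout for SHOWS.
OPTION_C_DDL = """
CREATE TABLE shows (
    showID    CHAR(10) PRIMARY KEY,
    title     VARCHAR(30),
    showType  CHAR(1) NOT NULL CHECK (showType IN ('F', 'T')),
    year      NUMERIC(4),   -- film attribute, NULL for TV shows
    runtime   INTEGER,      -- film attribute, NULL for TV shows
    relDate   DATE,         -- film attribute, NULL for TV shows
    startDate DATE,         -- TV show attribute, NULL for films
    endDate   DATE,         -- TV show attribute, NULL for films
    CHECK (
        (showType = 'F' AND startDate IS NULL AND endDate IS NULL) OR
        (showType = 'T' AND year IS NULL AND runtime IS NULL AND relDate IS NULL)
    )
);
"""


def make_db() -> sqlite3.Connection:
    """Create the option C table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(OPTION_C_DDL)
    return conn


def try_insert(conn: sqlite3.Connection, row: tuple) -> bool:
    """Return True if the row satisfies the constraints, False if it is rejected."""
    try:
        conn.execute("INSERT INTO shows VALUES (?, ?, ?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

Under these assumptions a film row leaves two attributes empty and a TV show row leaves three, which matches the "at minimum 2 null attributes per record and at most 3" estimate given above.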
For subclass (3): Again, we used option A, but this time, it was simply because they were separate themes
and had different attributes. Since the superclass is the only one of the three that has a relationship, we
would not want to add complexity by using option B. So option C is the simplest alternative.
OPTION C
EDITORS      Relation representing the entity subclass EDITORS; stores information on editors
  personID     Identifying number of the person                char(10)      FK (PEOPLE)
  FD: personID -> personID
  Check constraint: foreign key (personID) references PEOPLE(personID) ON DELETE CASCADE

SHOWS        Relation representing the entity class SHOWS; stores information on films/shows
  showID       Identifying number for the show/film            char(10)      Primary Key
  title        Name of the show/film                           varchar2(30)
  rating       Rating of the show/film                         char(5)
  language     Language of the show/film                       varchar2(15)
  genre        Genre of the show/film                          varchar2(15)
  Check constraint: rating in ('G', 'PG', 'PG-13', 'R', 'NC-17', 'NR', 'TV-Y',
                    'TV-Y7', 'TV-G', 'TV-PG', 'TV-14', 'TV-MA')
  FD: showID -> title, genre, language, rating

FILMS        Relation representing the sub-class FILMS; stores information on films
  showID       Identifying number for the film                 char(10)      FK (SHOWS)
  year         Year when the film was made                     Numeric(4)
  runtime      Year when the film was aired                    Date
  Check constraint: (runtime > 0) or (runtime = 'NA')
  Check constraint: foreign key (showID) references SHOWS(showID) ON DELETE SET NULL
  FD: showID -> runtime, relDate

TV_SHOWS     Relation representing the sub-class TV_SHOWS; stores information on TV shows
  showID       Identifying number for the TV show              char(10)      FK (SHOWS)
  startDate    The date the TV show started airing             Date
  endDate      The date the TV show ended airing               Date
  Check constraint: (runtime > 0) or (runtime = 'NA')
  Check constraint: foreign key (showID) references SHOWS(showID) ON DELETE SET NULL
  FD: showID -> startDate, endDate

COLLECTIONS  Relation representing the aggregate class COLLECTIONS
  colID        Identifying number for the collection of films  char(10)      Primary Key
  colName      Name for the collection                         char(10)
  bonFeat      Bonus features coming with the collection       varchar(20)

HAVE_FILMS   Relation representing the relationship HAVE_FILMS
  colID        Identifying number for the collection of films  char(10)      FK (COLLECTIONS)
  showID       Identifying number for the show (film)          char(10)      FK (SHOWS)
  Check constraint: foreign key (showID) references FILMS(showID) ON DELETE CASCADE
  Check constraint: foreign key (colID) references COLLECTIONS(colID) ON DELETE CASCADE

EPISODES     Relation representing the instantiation class EPISODES; stores information on episodes
3.A Appendix
Data Population
Key to the success of the POC was the initial loading (or population) of the database. IMDb
supplies a public domain version of its data, so we started with an initial load into our schema from
that data source. As we did not manage to locate a public domain review source, we first did a limited
“spider” sampling of a number of existing sites in order to load reviews for our database schema
tests and our initial application layer. This, too, proved very difficult, so we proceeded
with building a parser for the IMDb database and a loader for our schema, as explained in Chapter 7.
Difficult as building the parser and the loader was (in addition, we also had to
derive the relations entirely from simple keys), we succeeded to the extent that we were able to
invoke and test the queries presented below.
Queries
Below, we include both the queries that we created initially for our database (see MS3) followed in each
instance by the query that we implemented at the end. The results of the queries can be found in
Chapter 6, Interface and Reports.
Query 1) Given a week, find the top 10 box office movies for that week in North America.
SELECT *
FROM (SHOWS NATURAL JOIN
      (SELECT showID, amount
       FROM (SELECT *
             FROM revenue_history
             WHERE curID IN (SELECT curID
                             FROM currencies
                             WHERE name = 'dollar')
               AND rhDate > 'xx-xx-xxxx'
               AND rhDate < 'xx-xx-xxxx'
               AND cgID = 'abc'
             ORDER BY amount DESC)
       WHERE rownum <= 10)
     );
The query was implemented in the database (see page 38) as shown below. In the implementation we let
the user search for the 5 top-grossing films while leaving out the currency (for which we have no data loaded
yet) and the period. (The basic intent is to demonstrate the use of revenue histories.)
The query proposed originally (see above) was later implemented in the database as shown below (see
pages 44 to 45 for the results). Note that the implementation actually consists of six queries that run
one after the other. The first, for the movies that the actor has acted in, is the first part of the
query. It joins act_history to shows, then to salaries, then to salaries_points (to
grab any percentages the actor may receive on top of or in lieu of a salary), and selects the records of the
person of interest (who comes from the query string). The second part is the same for the other five queries: we simply
join the table of interest (in the example's case direct, which stores all directors and the movies they directed)
to shows based on the person we are searching for.
SELECT s.showid, title, type, amount
FROM act_history a JOIN shows s ON a.showid = s.showid
     JOIN salaries sa ON a.showid = sa.showid AND a.personid = sa.personid
     JOIN salaries_points sp ON a.showid = sp.showid AND a.personid = sp.personid
WHERE a.personid = :PERSONID;

SELECT s.showid, s.title
FROM direct a JOIN shows s ON a.showid = s.showid
WHERE a.personid = :PERSONID;
The query proposed originally (see above) was later implemented in the database as shown below (see
page 40 for the results). The only difference is that we showed only the top five movies.
SELECT *
FROM (SELECT s.showid, title, avg(rateid) AS average, count(rateid) AS count
FROM shows s JOIN rating_history rh ON s.showid=rh.showid
GROUP BY s.showid, title
ORDER BY average DESC)
WHERE rownum <= 5;
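The top-five ranking above depends on Oracle's ROWNUM being applied to an already sorted inline view. As a hedged illustration of the same idea on another engine, the sketch below runs the equivalent aggregation in SQLite through Python's sqlite3 module, where LIMIT takes the place of the outer ROWNUM filter. The sample rows and the votes alias are invented for the example.

```python
import sqlite3

# Same grouping and ordering as the Oracle query; LIMIT replaces
# the outer "WHERE rownum <= 5" wrapper.
TOP_RATED_SQL = """
SELECT s.showid, s.title, AVG(rh.rateid) AS average, COUNT(rh.rateid) AS votes
FROM shows s JOIN rating_history rh ON s.showid = rh.showid
GROUP BY s.showid, s.title
ORDER BY average DESC
LIMIT :n
"""


def top_rated(n: int = 5) -> list:
    """Return the n best-rated shows from a small, made-up data set."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE shows (showid TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE rating_history (showid TEXT, rateid INTEGER);
        INSERT INTO shows VALUES ('S1', 'Film A'), ('S2', 'Film B'), ('S3', 'Film C');
        INSERT INTO rating_history VALUES
            ('S1', 5), ('S1', 4), ('S2', 3), ('S3', 5), ('S3', 5);
    """)
    return conn.execute(TOP_RATED_SQL, {"n": n}).fetchall()
```

Because the limit is applied after the ORDER BY, the sorted-subquery wrapper that ROWNUM requires is not needed in this form.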
Query 4) Given the title of the movie (e.g., “ABC”), find all the cast members for the movie.
SELECT *
FROM people NATURAL JOIN
     actors NATURAL JOIN
     (SELECT personID
      FROM act_history
      WHERE showID IN (SELECT showID
                       FROM shows
                       WHERE title = 'The bucket list')
     );
The query proposed originally (see above) was later implemented in the database as shown below (see
page 44 for the results). The query, simple at heart, was modified for illustrative purposes as follows.
We start by joining shows (which is where the query string begins) and act_history to get a
table of all actors that have acted in a show. Then, we join that with people to get their names. Next, we
select only those rows where showid equals the value in the query string (the show that has been selected). Finally, we
take only name and print that out.
SELECT name
FROM shows s JOIN act_history a
ON s.showid = a.showid JOIN people p
ON a.personid = p.personid
WHERE (s.showid = :SHOWID);
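The join chain for the cast query (shows to act_history to people, filtered by the show taken from the query string) can be tried out on a toy data set. The sketch below uses Python's sqlite3 module; the table contents are invented, and the ORDER BY clause is an addition made only so that the result order is predictable.

```python
import sqlite3

# The :SHOWID bind variable plays the role of the value parsed from the query string.
CAST_SQL = """
SELECT p.name
FROM shows s
     JOIN act_history a ON s.showid = a.showid
     JOIN people p ON a.personid = p.personid
WHERE s.showid = :SHOWID
ORDER BY p.name
"""


def cast_of(show_id: str) -> list:
    """Return the names of everyone recorded in act_history for the given show."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE shows (showid TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE people (personid TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE act_history (personid TEXT, showid TEXT);
        INSERT INTO shows VALUES ('S1', 'Film A'), ('S2', 'Film B');
        INSERT INTO people VALUES
            ('P1', 'Actor One'), ('P2', 'Actor Two'), ('P3', 'Actor Three');
        INSERT INTO act_history VALUES ('P1', 'S1'), ('P2', 'S1'), ('P3', 'S2');
    """)
    rows = conn.execute(CAST_SQL, {"SHOWID": show_id}).fetchall()
    return [name for (name,) in rows]
```

Passing the show identifier as a bind variable, rather than concatenating it into the SQL text, keeps the statement safe against malformed query strings.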
Query 5) Find all the awards won by a given movie.
SELECT *
FROM (SELECT *
      FROM award_instances
      WHERE aiID IN (SELECT aiID
                     FROM won_by_shows
                     WHERE showID IN (SELECT showID
                                      FROM shows
                                      WHERE title = 'ABC')))
     NATURAL JOIN (SELECT * FROM awards NATURAL JOIN organizations);
The query proposed originally (see above) was later implemented in the database as shown below (see
page 46 for the results). In this query, we widened our goal from finding all the awards won by the
movie to collecting all awards won by individuals, groups, and shows. Hence, we had to join
the won_by_people and won_by_shows tables in order to get a full set of awards, people, and
shows. Then we joined that to shows and people to get the actual names. Next, we joined the result to
award_instances to get a history of awards. Finally, the whole result was joined back to awards to
get the award names, and we selected the results that correspond to the showid in the query string.
In the two sections that follow (Chapter 5.1 and Chapter 5.2), we list (and comment on) some of the
triggers, procedures (and sequences) created for the site as described in the preceding chapter.
The two triggers listed below are for entering new records into rating_history.
The calculation first determines how many months are between the current date (sysdate) and the new
date of birth. It then divides that number by 12 to get years, which is then rounded to two
decimal places (this is accurate to the day). The final step is to set the new age value to the calculated age
and continue the insert or update.
This trigger can also be converted into a procedure that we could run every morning. We note that some
sites (like IMDb) display today’s birthdays; this procedure can be used to ensure that no one is forgotten.
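-- Trigger header assumed from the END label and the description above.
CREATE OR REPLACE TRIGGER update_age
BEFORE INSERT OR UPDATE ON people
FOR EACH ROW
DECLARE
  v_newAge people.age%type;
  temp_dob people.dob%type;
BEGIN
  -- Age in years = months between today and the date of birth, divided by 12.
  temp_dob := :new.dob;
  v_newAge := ROUND((MONTHS_BETWEEN(sysdate, temp_dob)/12), 2);
  :new.age := v_newAge;
END update_age;
/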
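The age calculation in the trigger can be checked outside the database. The sketch below re-implements it in Python under the documented behaviour of Oracle's MONTHS_BETWEEN for pure dates: the result is a whole number when both dates fall on the same day of the month or are both month ends, and otherwise the fraction is based on a 31-day month. It ignores the time component that sysdate carries, and Python's round may differ from Oracle's ROUND on exact ties, so treat it as an approximation rather than a replacement for the trigger. The function names are hypothetical.

```python
import calendar
from datetime import date


def months_between(later: date, earlier: date) -> float:
    """Approximate Oracle's MONTHS_BETWEEN for dates without a time component."""
    whole_months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    last_day_later = calendar.monthrange(later.year, later.month)[1]
    last_day_earlier = calendar.monthrange(earlier.year, earlier.month)[1]
    same_day = later.day == earlier.day
    both_month_end = later.day == last_day_later and earlier.day == last_day_earlier
    if same_day or both_month_end:
        return float(whole_months)
    # Otherwise the fractional part is based on a 31-day month.
    return whole_months + (later.day - earlier.day) / 31


def age_in_years(dob: date, today: date) -> float:
    """Mirror the trigger: months between the dates, divided by 12, two decimals."""
    return round(months_between(today, dob) / 12, 2)
```

For example, a date of birth of May 9, 1978 evaluated on May 9, 2008 gives an age of exactly 30.0 years.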
The procedure from the previous page returns the following result (see the next page):
Figure 1
There are a few more additions to the site that we made in order to give it the look and feel of
multimedia sites today. The first is an image popup tooltip (Figure 2 on the next page). This image
shows up when you hover over any one of the preset movie posters on the page². We also decided to
use the light box effect, more specifically LightWindow v2.0³, for movie trailer presentation, which is
increasing in popularity. The JavaScript and CSS files are very easy to install and reference. For this
project, we used references to QuickTime movies from Apple Movie Trailers⁴. To instantiate the light
window, we just added the following parameters to the anchor tag (just after <a href="…" in the
HTML code): class="lightwindow page-options" params="lightwindow_width=320,
lightwindow_height=260". You can see this effect in Figure 3 below.
1 For more information and the actual code used (other than FileMaker), refer to
http://instruct.biz.uiowa.edu/courses/6K070AAA/rhylock/funStuff.htm and select the SE Code tab.
2 For the full code, refer to http://instruct.biz.uiowa.edu/courses/6K070AAA/rhylock/funStuff.htm and go to the DW Example tab and
Figure 2
3 http://www.stickmanlabs.com/lightwindow/
4 http://www.apple.com/trailers/
Figure 3
Here is where we begin our tour. First we will cover the menu options (see Figure 4). As you
can see, there are six different options to choose from on the bar and three in the header (the logo is
also a link). Both Home and the logo point back to the home page. New Releases has two sub-options:
In Theaters and DVD. Neither works at this point, but there is a placeholder. Best Movies also has two
options: Highest in US and User Rating. The first one links to Figure 5, where we can see the list of
top-grossing US films. This is only to demonstrate the use of revenue histories. Currently, we only need
to list the highest without aggregation because, in order to combine all country group revenues into one
cohesive value, we would have to take into consideration, for example, exchange and inflation rates.
The query is as follows:
SELECT title, amount
FROM (SELECT *
      FROM shows s
           JOIN films f ON s.showid = f.showid
           JOIN revenue_history rh ON s.showid = rh.showid
           JOIN recorded_in r ON rh.showid = r.showid AND
                                 rh.cgid = r.cgid AND
                                 rh.revid = r.revid
           JOIN currencies c ON r.curid = c.curid
      WHERE c.name = 'Dollar'
      ORDER BY amount DESC)
WHERE rownum <= 5;
This query actually grabs the top 5; however, we have yet to find a reliable way to code nested select
statements in ASP.NET VB (which none of us is familiar with). So, for the site, we cut it back to the inner select
statement (replacing * with title, amount) for the demo. The statement itself is pretty straightforward.
For the inner select, we first join together all of the required tables, then we select the currency type
Dollar since we are only interested in the US. Finally, we sort by amount descending. Then, we select
title and amount from the join results and grab the top 5.
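One way to keep a "top 5" listing while issuing only the inner select is to let the application stop reading after five rows of the sorted result. The sketch below shows that idea with Python's sqlite3 module instead of the ASP.NET VB layer used on the site. The schema is deliberately collapsed to two tables with a currency column, and the sample rows are invented; both are assumptions for illustration, not the project's implementation.

```python
import sqlite3

# Inner select only: join, filter on currency, sort by amount (no ROWNUM wrapper).
INNER_SQL = """
SELECT s.title, rh.amount
FROM shows s JOIN revenue_history rh ON s.showid = rh.showid
WHERE rh.currency = 'Dollar'
ORDER BY rh.amount DESC
"""


def top_grossing(limit: int = 5) -> list:
    """Run the sorted inner select and keep only the first `limit` rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE shows (showid TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE revenue_history (showid TEXT, currency TEXT, amount INTEGER);
        INSERT INTO shows VALUES
            ('S1', 'Film A'), ('S2', 'Film B'), ('S3', 'Film C'),
            ('S4', 'Film D'), ('S5', 'Film E'), ('S6', 'Film F'), ('S7', 'Film G');
        INSERT INTO revenue_history VALUES
            ('S1', 'Dollar', 100), ('S2', 'Dollar', 700), ('S3', 'Dollar', 300),
            ('S4', 'Dollar', 600), ('S5', 'Dollar', 200), ('S6', 'Dollar', 500),
            ('S7', 'Euro', 900);
    """)
    cursor = conn.execute(INNER_SQL)
    # Read only the first `limit` rows of the sorted result on the client side.
    return cursor.fetchmany(limit)
```

The database still sorts every matching row, so this is a convenience for the presentation layer rather than a performance optimization.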
Figure 4
Figure 5
The second option, User Rating, takes us to the top user ratings page for shows (Figure 6). Again, we
wanted to grab only the top 5, but with the nested select statement issues, we were forced to come up
with an alternative.
We removed the outer statement and added a constraint on how low the average rating could be.
In this case we used 4.5. The modified query is below. First, we join the two tables needed,
rating_history (which stores the show ID and rate ID) and shows (to get the titles). Then, we select the
show id, title, average rate id (which returns the average rating), and the count of all users that voted.
The results are grouped by show id and title. Then, the found set is constrained to those at or above
4.5 and, finally, sorted in descending order by average rating.
The rest of the menu pages and Contact Us are not worth writing about simply because they are
placeholders or contain limited textual information that is irrelevant to this portion of the paper. We
will now move on to the search element.
SELECT s.showid, title, avg(rateid) AS average, count(rateid) AS count
FROM shows s JOIN rating_history rh ON s.showid=rh.showid
GROUP BY s.showid, title
HAVING avg(rateid) >= 4.5
ORDER BY average DESC;
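To see how the grouping and the HAVING threshold interact, the query can be run against a small in-memory database. This is a sketch in Python with SQLite rather than the site's Oracle setup: the two tables are reduced to the columns the query uses, the function name top_rated and the sample votes are invented, and the count column is renamed votes purely for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shows (showid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE rating_history (showid INTEGER, rateid INTEGER);
    -- Made-up votes: Show A averages 4.5, Show B about 4.67, Show C 3.0.
    INSERT INTO shows VALUES (1, 'Show A'), (2, 'Show B'), (3, 'Show C');
    INSERT INTO rating_history VALUES
        (1, 5), (1, 4), (2, 5), (2, 5), (2, 4), (3, 3);
""")


def top_rated(conn, min_average=4.5):
    """Return (showid, title, average, votes) for shows at or above min_average."""
    return conn.execute(
        """
        SELECT s.showid, s.title,
               AVG(rh.rateid) AS average, COUNT(rh.rateid) AS votes
        FROM shows s JOIN rating_history rh ON s.showid = rh.showid
        GROUP BY s.showid, s.title
        HAVING AVG(rh.rateid) >= ?
        ORDER BY average DESC
        """,
        (min_average,),
    ).fetchall()


for showid, title, average, votes in top_rated(conn):
    print(showid, title, round(average, 2), votes)
```

A show sitting exactly on the threshold is kept, and a show with a single low vote is dropped. With a fixed threshold the number of rows returned varies with the data, which is the trade-off accepted when the top 5 limit was removed.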
Figure 6
The search box can be found in one of two places: on the left-hand side of the
home page (as seen in Figure 7) and on the right-hand side elsewhere (as seen in Figure 8). Simply type in
any portion of a person’s name or show title, and the site does the rest. Say, for example, you wanted to
search for anything having to do with “saving”. It could match a movie or a person (not in this case, but in
some). So, we enter “saving” (case does not matter) and we can see our results in Figure 9.
Figure 7 Figure 8
Figure 9
The result set includes Movies, Actors, Directors, Producers, Composers, Editors, and Writers. You
simply select a tab and the results are displayed. The query for this is really simple. We return the entire
set of values (which are later scaled back on the site) from shows (this is the Movies tab) where the
search term NAME (which is parsed from the query string) appears in the title. We convert everything to
lowercase in order to avoid any case-sensitivity issues.
SELECT *
FROM shows
WHERE (lower(title) LIKE '%' || lower(:NAME) || '%');
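The same case-insensitive substring search can be sketched in Python with SQLite. Everything here is illustrative: the function names search_shows and like_pattern are made up, the shows table is reduced to two columns, and the wildcard escaping goes beyond what the site's query does. It is included because a search term containing % or _ would otherwise be treated as a LIKE wildcard.

```python
import sqlite3


def like_pattern(term):
    """Lowercase the term and escape LIKE wildcards so they match literally."""
    escaped = (
        term.lower()
        .replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )
    return "%" + escaped + "%"


def search_shows(conn, term):
    """Return (showid, title) rows whose title contains term, ignoring case."""
    return conn.execute(
        "SELECT showid, title FROM shows "
        "WHERE lower(title) LIKE ? ESCAPE '\\' ORDER BY title",
        (like_pattern(term),),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shows (showid INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO shows VALUES
        (1, 'Saving Private Ryan'), (2, 'Air Force One'), (3, 'The Bucket List');
""")

print(search_shows(conn, "SAVING"))
```

The search term is passed as a bound parameter rather than concatenated into the SQL text, which is also what the :NAME placeholder in the query above achieves.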
After selecting a tab and object, you are taken to the details view for that object. The tour will continue
using “Saving Private Ryan”. The details for this movie are below in Figure 10. As you can see, the
basics about the movie are listed. In the full version, we of course would have all of the possible
information we could collect, but for this demo, we simply added a few items. Like the search results,
each show has a tab set as well. This is to keep the page from going on forever, as we have all seen on
other sites. By keeping the information tightly packed and organized, we hope to increase the ease with
which people browse for movie-related information.
In Figure 10 (see the next page), there are two new categories: Awards and Revenues. Awards
lists all awards associated with the movie, whether won by an individual, a group, or the show itself. Under
Revenues, we have a simple view of the amounts received so far by country group. Again, this is just for demo
purposes, so we simply listed the values instead of putting them into context, such as the value at
the date recorded and how it compares to today’s value (inflation).
Basically, countries are clustered together for revenue reporting. As you can see in Figure 10, country
group 1 consists of the United States, Canada, and Mexico. We just put these together to show how it
works. In reality, they would be clustered by currency values. The query groups by the group id,
recorded date, and amount and returns only the unique group results. We also have a procedure that
lists all of the countries and, for the first in each group, the date and amount. That procedure
can be found in Chapter 5.2 (Procedures) on pages 35 and 36 (output).
Figure 10
Now, we will perform a new search for “morgan”. This will bring up the actor Morgan Freeman.
Select him (the results are in Figure 11). As you can see, we have a comprehensive list of all the
categories rolled into one. This could be broken up into tabs like the others, but we will have to wait
and see if this is necessary.
The query is as follows:
SELECT s.showid, title, type, amount
FROM act_history a
     JOIN shows s ON a.showid = s.showid
     JOIN salaries sa ON a.showid = sa.showid AND a.personid = sa.personid
     JOIN salaries_points sp ON a.showid = sp.showid AND a.personid = sp.personid
WHERE a.personid = :PERSONID;
There are actually six queries that run one after the other. The first is for the movies the person has
acted in, and it is the query shown above. Basically, the query joins act_history to shows, then
to salaries, then to salaries_points (to grab any percentages they may receive on top of or in lieu of a
salary). We select the records for the person of interest (whose ID comes from the query string). The other
five queries all share the same form: we simply join the table of interest (for example, direct, which stores
all directors and the movies they directed) to shows, based on the person we are searching for.
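Since the remaining queries share one shape, they can be driven from a small table of role names. The sketch below, in Python with SQLite, illustrates that idea only and is not how the site is built. The table names act_history and direct come from the report, but their (showid, personid) column layout, the function name filmography, and the sample rows are assumptions, and only two of the six roles are shown.

```python
import sqlite3

# Role label -> link table. act_history and direct are named in the report;
# the (showid, personid) column layout assumed for them here is a guess.
ROLE_TABLES = {"Actor": "act_history", "Director": "direct"}


def filmography(conn, personid):
    """Return {role: [titles]} by running one query per role table."""
    result = {}
    for role, table in ROLE_TABLES.items():
        # Table names come from the fixed mapping above, never from user input.
        rows = conn.execute(
            f"SELECT s.title FROM {table} r JOIN shows s ON r.showid = s.showid "
            "WHERE r.personid = ? ORDER BY s.title",
            (personid,),
        ).fetchall()
        result[role] = [title for (title,) in rows]
    return result


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shows (showid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE act_history (showid INTEGER, personid INTEGER);
    CREATE TABLE direct (showid INTEGER, personid INTEGER);
    -- Made-up rows: person 1 acted in two shows and directed a third.
    INSERT INTO shows VALUES (1, 'Show A'), (2, 'Show B'), (3, 'Show C');
    INSERT INTO act_history VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO direct VALUES (3, 1);
""")

print(filmography(conn, 1))
```

Keeping the person ID as a bound parameter while taking the table name from a fixed mapping avoids building SQL from anything the user typed.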
From here, select “The Bucket List” (Figure 12). As you can see, there are two actors listed, Morgan
Freeman and Jack Nicholson.
The query for this is very simple. We start by joining shows (the table the query string
refers to) and act_history to get a table of all actors that have acted in a show. Then, we join that with
people to get their names. Next, we select only those rows where the show id equals the value in the query
string (the show that has been selected). Finally, we take only the name and print it out.
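The cast query is described above but not listed, so here is one way it could look, written as a Python and SQLite sketch. The join order follows the description (shows to act_history to people). The column names showid, personid, and name match those used in the report's other queries, while the function name cast_for_show, the ORDER BY clause, and the sample IDs are assumptions.

```python
import sqlite3


def cast_for_show(conn, showid):
    """Return the names of everyone with an act_history row for showid."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM shows s
             JOIN act_history a ON s.showid = a.showid
             JOIN people p ON a.personid = p.personid
        WHERE s.showid = ?
        ORDER BY p.name
        """,
        (showid,),
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shows (showid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE people (personid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE act_history (showid INTEGER, personid INTEGER);
    -- Illustrative IDs only; they do not come from the project database.
    INSERT INTO shows VALUES (1, 'The Bucket List'), (2, 'Air Force One');
    INSERT INTO people VALUES
        (1, 'Morgan Freeman'), (2, 'Jack Nicholson'), (3, 'Harrison Ford');
    INSERT INTO act_history VALUES (1, 1), (1, 2), (2, 3);
""")

print(cast_for_show(conn, 1))
```

Joining through shows is not strictly required to get the names, since act_history already carries the show id, but it keeps the sketch aligned with the description and makes the title available if the page needs it.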
Figure 11
Figure 12
Next, select the Awards tab. As you can see (Figure 13), Jack Nicholson won an Oscar for his
performance. The query to return this value is as follows:
SELECT p.personid, p.name AS name, a.name AS award
FROM won_by_shows ws
     JOIN won_by_people wp ON ws.aiid = wp.aiid
     JOIN shows s ON ws.showid = s.showid
     JOIN people p ON wp.personid = p.personid
     JOIN award_instances ai ON ws.aiid = ai.aiid
     JOIN awards a ON ai.awardid = a.awardid
WHERE (s.showid = :SHOWID);
Here, our goal is to collect all awards won by individuals, groups, and the shows themselves, as mentioned
earlier. So, we have to join the won_by_people and won_by_shows tables in order to get a full set
of awards, people, and shows. Then we join that to shows and people to get the actual names. Next, it
is joined to award_instances to get a history of awards. Finally, it is joined to awards to get the
award names. Then, we select the results that correspond to the show id in the query string.
Figure 13
Now we will move on to the inserts. First, go back to the home page, scroll down to the
bottom, and select the Admin button (Figure 14). This will bring up a list of items to edit (Figure 15).
There are only two here; the other insert is rating a movie, which I will discuss later. Both look and feel
the same, so we are going to cover Shows. So, select Shows and we will get started.
Figure 14
Figure 15
Figure 16
In Figure 16 above, we see the view for the insert, update, and delete process. In the
screenshot, I have selected Air Force One to populate the details view to the right. Here, you can choose
to edit or delete this record, or add a new one.
Finally, this site has the ability to save user ratings of a particular movie. To get to this page,
simply select Rate A Show on any page from the links panel on either the right or left side. Once there,
click New to begin the process (this brings you to Figure 17). To rate a show, select the title from the
list box and then select a number of stars from the drop-down list (5 being the best). Once you do, hit
Insert and you will have something that looks like Figure 18, which confirms your rating.
Figure 17
Figure 18
The code for this was written using Visual Basic and is listed below. Basically, it performs the insert into
the database for the specified data source, then retrieves the variables from the list box and drop down
list, and then passes those values to the review page (Figure 18) via a query string which is then parsed
by the page.
<script runat="server">
    Protected Sub InsertButton_Click(ByVal sender As Object, _
                                     ByVal e As System.EventArgs)
        ' Insert the new rating through the FormView's data source.
        FormView1.InsertItem(True)
        ' Read the selected show and star rating from the form controls.
        Dim newshowid = CType(FormView1.FindControl("ListBox1"), _
                              ListBox).SelectedItem.Value
        Dim newrating = CType(FormView1.FindControl("DropDownList1"), _
                              DropDownList).SelectedValue
        ' Pass both values to the confirmation page in the query string.
        Dim newurl = _
            "Http://instruct.biz.uiowa.edu/courses/6k186/" + _
            "6k186_databaseassociates/editRatingsSubmit.aspx?showid=" + _
            newshowid + "&rating=" + newrating + "&btnSubmit"
        Response.Redirect(newurl, True)
    End Sub
</script>
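The round trip through the query string (build the URL on one page, parse it on the next) can be sketched independently of ASP.NET. The Python below only illustrates that hand-off and makes several assumptions: the host example.com and the function names build_rating_url and parse_rating_url are hypothetical, and the valueless btnSubmit flag from the VB code is left out.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_rating_url(base_url, showid, rating):
    """Append showid and rating to base_url as an encoded query string."""
    return base_url + "?" + urlencode({"showid": showid, "rating": rating})


def parse_rating_url(url):
    """Recover (showid, rating) as integers from a URL built as above."""
    params = parse_qs(urlsplit(url).query)
    return int(params["showid"][0]), int(params["rating"][0])


# Hypothetical host; only the page name is taken from the VB code.
url = build_rating_url("http://example.com/editRatingsSubmit.aspx", 12, 5)
print(url)
print(parse_rating_url(url))
```

Encoding the values instead of concatenating them keeps the URL valid if a value ever contains characters such as & or a space, and converting back to integers on the receiving page rejects anything that is not a number.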
There are many different types of functionality and security that we will add to this site in the
future. We are still far from complete when it comes to calculating currency values across time for the
purpose of comparison. Also, we need to add more insert/update/delete forms for all topics. This is
where the security comes in. We would like to have the content generated in much the same way as a
Wiki page, with open editing, but we do not want it to be entirely open. We will have to come up
with some sort of validation process for submissions, or allow editing only by users with relevant
backgrounds who have proven themselves to be trustworthy. This is still open to debate, but these are some of
the ideas. Also, the editing pages are currently reached by clicking on Admin on the home page. This of
course will be removed and replaced with a login control.
What we learned
Adapting data that already exists within a functional framework can be difficult, especially if the
application that you are creating doesn’t share all of the relations and functional aspects in an easily
mappable manner. This project was not difficult to design, but building a parser for the IMDB database
and a loader for our schema was more difficult than originally expected.
The database, while public domain, is in a text format, and the relations have to be derived entirely from
simple keys. However, due to the nature of our theme (movies), the IMDB database was the richest
target for an initial source of much of our data.
Implementation
The implementation of this system should be straightforward. We didn’t make use of any
Oracle-specific constraints, so either PostgreSQL or MySQL is an acceptable DB platform to
start with.
Please see the following two pages for the Contract Estimate Summary – Option 1 (p. 49) and the
Contract Estimate Summary – Option 2 (p. 50).
7.A Appendix
1 Year support software support and monthly change $3,000 NA $3,000 Yearly
contract (Note: The support includes one stance of software
maintenance/upgrades as well as two (2) hours of Web site
updates/changes per month)
1 Year support software support and monthly change $3,000 NA $3,000 Yearly
contract (Note: The support includes one stance of software
maintenance/upgrades as well as two (2) hours of Web site
updates/changes per month)