Introduction To Database Design: San Diego Supercomputer Center


SAN DIEGO SUPERCOMPUTER CENTER


Database Design Agenda

•General Design Considerations
•Entity-Relationship Model
•Tutorial
•Normalization
•Star Schemas
•Additional Information
•Q&A


General Design Considerations

•Users
•Legacy Systems/Data
•Application Requirements



Users

•Who are they?
  •Administrative
  •Scientific
  •Technical
•Impact
  •Access Controls
  •Interfaces
  •Service levels


Legacy Systems/Data

•What systems are currently in place?
•Where does the data come from?
•How is it generated?
•What format is it in?
•What is the data used for?
•Which parts of the system must remain static?


Application Requirements

•What kind of database?
  •Online Analytical Processing (OLAP)
  •Online Transaction Processing (OLTP)
•Budget
•Platform / Vendor
•Workflow?
  •Order of operations
  •Error handling
  •Reporting


Entity-Relationship Model

A logical design method that emphasizes simplicity and readability.

•The basic objects of the model are:
  •Entities
  •Relationships
  •Attributes


Entities

Data objects described by the information in the database.
•Denoted by rectangles in the model.

[Figure: entity rectangles labeled Employee and Department]


Attributes

Characteristics of entities or relationships.

•Denoted by ellipses in the model.

[Figure: Employee with attributes Name and SSN; Department with attributes Name and Budget]


Relationships

Represent associations between entities.

•Denoted by diamonds in the model.

[Figure: Employee (Name, SSN) linked by the diamond "works in" (attribute Start date) to Department (Name, Budget)]


Relationship Connectivity

Constraints on how instances of the associated entities map to each other in the relationship.
•Denoted by cardinality labels (such as 1, N, or M) placed between the related entities.
•Generally, connectivity values are expressed as "one" or "many".

[Figure: Employee (N) "works in" (1) Department; Employee has Name and SSN, the relationship has Start date, Department has Name and Budget]


Connectivity
one-to-one (1:1): Department has Manager
one-to-many (1:N): Department has Project
many-to-many (M:N): Employee works on Project


ER example

A volleyball coach needs to collect information about his team.

•The coach requires information on:
  •Players
  •Player statistics
  •Games
  •Sales


Team Entities & Attributes

•Players - statistics, name, start date, end date
•Games - date, opponent, result
•Sales - date, tickets, merchandise

[Figure: entities Games (date, opponent, result), Players (Name, Statistics, Start date, End date), and Sales (tickets, merchandise) with their attribute ellipses]


Team Relationships
Identify the relationships.

•The player statistics are recorded at each game, so the player and game entities are related.
•For each game we have multiple players, so the relationship is treated here as one-to-many. (Each player also plays in many games; this is revisited when the team is normalized.)

[Figure: Games (1) "play" (N) Players]


Team Relationships
Identify the relationships.

•The sales are generated at each game, so the sales and games entities are related.
•There is only one set of sales numbers for each game, so the relationship is one-to-one.

[Figure: Games (1) "generates" (1) Sales]


Team ER Diagram
[Figure: Games (date, opponent, result) linked 1:N by "play" to Players (Name, Start date, End date, Statistics) and 1:1 by "generates" to Sales (tickets, merchandise)]


Logical Design to Physical Design

Creating relational SQL schemas from entity-relationship models.
•Transform each entity into a table with the key and its attributes.
•Transform each relationship into either a relationship table (many-to-many) or a “foreign key” (one-to-one and one-to-many).


Entity tables

Transform each entity into a table with a key and its attributes.

[Figure: Employee entity with attributes Name and SSN]

    create table employee
      (emp_no number,
       name   varchar2(256),
       ssn    number,
       primary key (emp_no));
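
The entity-table rule can be tried without an Oracle instance. The following is a minimal sketch that assumes Python's standard-library sqlite3 module as a stand-in database, with the slide's number and varchar2 types mapped to SQLite's INTEGER and TEXT; the try_insert helper is hypothetical. It shows the primary key refusing a second employee with the same emp_no.

```python
import sqlite3

# In-memory SQLite database standing in for the slide's Oracle schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_no INTEGER,
        name   TEXT,
        ssn    INTEGER,
        PRIMARY KEY (emp_no)
    )
""")


def try_insert(emp_no, name):
    """Hypothetical helper: insert an employee, return False if the key is taken."""
    try:
        # ssn is left NULL here rather than inventing identifiers.
        conn.execute("INSERT INTO employee (emp_no, name) VALUES (?, ?)",
                     (emp_no, name))
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert(1, "Nora Edwards")     # new key: accepted
duplicate = try_insert(1, "Ajay Patel")   # emp_no 1 already used: rejected
print(first, duplicate)  # True False
```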


Foreign Keys
Transform each one-to-one or one-to-many relationship into a “foreign key”.
•A foreign key is a reference in the child (many) table to the primary key of the parent (one) table.

[Figure: Department (1) "has" (N) Employee]

    create table department
      (dept_no number,
       name    varchar2(50),
       primary key (dept_no));

    create table employee
      (emp_no  number,
       dept_no number,
       name    varchar2(256),
       ssn     number,
       primary key (emp_no),
       foreign key (dept_no) references department);
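
To see the foreign key at work, here is a sketch under the same assumption (SQLite through Python's sqlite3 module, not Oracle). SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and, like the slide's DDL, it lets REFERENCES department default to that table's primary key. The employee name "Hypothetical Hire" is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys after this per-connection pragma.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (
        dept_no INTEGER,
        name    TEXT,
        PRIMARY KEY (dept_no)
    );
    CREATE TABLE employee (
        emp_no  INTEGER,
        dept_no INTEGER,
        name    TEXT,
        ssn     INTEGER,
        PRIMARY KEY (emp_no),
        -- No column list: the reference defaults to department's primary key.
        FOREIGN KEY (dept_no) REFERENCES department
    );
""")

conn.execute("INSERT INTO department VALUES (1, 'Accounting')")
conn.execute("INSERT INTO employee (emp_no, dept_no, name) VALUES (4, 1, 'Brian Burnett')")

try:
    # dept_no 9 has no parent row in department, so this child row is refused.
    conn.execute("INSERT INTO employee (emp_no, dept_no, name) "
                 "VALUES (7, 9, 'Hypothetical Hire')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```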


Foreign Key
Department
dept_no  name
1        Accounting
2        Human Resources
3        IT

Employee
emp_no  dept_no  name
1       2        Nora Edwards
2       3        Ajay Patel
3       2        Ben Smith
4       1        Brian Burnett
5       3        John O'Leary
6       3        Julia Lenin

Accounting has 1 employee: Brian Burnett
Human Resources has 2 employees: Nora Edwards, Ben Smith
IT has 3 employees: Ajay Patel, John O'Leary, Julia Lenin
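
Following the foreign keys from employee back to department reproduces the headcounts listed above. A sketch, again assuming SQLite via Python's sqlite3 in place of the workshop database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        emp_no  INTEGER PRIMARY KEY,
        dept_no INTEGER REFERENCES department,
        name    TEXT
    );
""")
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(1, "Accounting"), (2, "Human Resources"), (3, "IT")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    (1, 2, "Nora Edwards"), (2, 3, "Ajay Patel"), (3, 2, "Ben Smith"),
    (4, 1, "Brian Burnett"), (5, 3, "John O'Leary"), (6, 3, "Julia Lenin"),
])

# Match each employee's dept_no to its parent department and count per department.
headcount = dict(conn.execute("""
    SELECT d.name, COUNT(e.emp_no)
    FROM department d
    LEFT JOIN employee e ON e.dept_no = d.dept_no
    GROUP BY d.dept_no, d.name
    ORDER BY d.dept_no
""").fetchall())
print(headcount)  # {'Accounting': 1, 'Human Resources': 2, 'IT': 3}
```

The LEFT JOIN keeps departments that have no employees (with a count of 0), which an inner join would drop.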


Many-to-Many tables

Transform each many-to-many relationship into a table.

•The relationship table contains the foreign keys to the related entities as well as any relationship attributes.

[Figure: Project and Employee joined by the many-to-many (M:N) relationship "has", with relationship attribute Start date]

    create table proj_has_emp
      (proj_no    number,
       emp_no     number,
       start_date date,
       primary key (proj_no, emp_no),
       foreign key (proj_no) references project,
       foreign key (emp_no) references employee);


Many-to-Many tables
Project
proj_no  name
1        Employee Audit
2        Budget
3        Intranet

proj_has_emp
proj_no  emp_no  start_date
1        4       4/7/03
3        6       8/12/02
3        5       3/4/01
2        6       11/11/02
3        2       12/2/03
2        1       7/21/04

Employee
emp_no  dept_no  name
1       2        Nora Edwards
2       3        Ajay Patel
3       2        Ben Smith
4       1        Brian Burnett
5       3        John O'Leary
6       3        Julia Lenin

Employee Audit has 1 employee: Brian Burnett
Budget has 2 employees: Julia Lenin, Nora Edwards
Intranet has 3 employees: Julia Lenin, John O'Leary, Ajay Patel

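
The relationship table is what lets one employee (Julia Lenin) appear on two projects. This sketch, assuming SQLite via Python's sqlite3, joins project to employee through proj_has_emp; start dates are kept as the slide's strings because SQLite has no native DATE type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project  (proj_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, dept_no INTEGER, name TEXT);
    CREATE TABLE proj_has_emp (
        proj_no    INTEGER REFERENCES project,
        emp_no     INTEGER REFERENCES employee,
        start_date TEXT,
        PRIMARY KEY (proj_no, emp_no)
    );
""")
conn.executemany("INSERT INTO project VALUES (?, ?)",
                 [(1, "Employee Audit"), (2, "Budget"), (3, "Intranet")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    (1, 2, "Nora Edwards"), (2, 3, "Ajay Patel"), (3, 2, "Ben Smith"),
    (4, 1, "Brian Burnett"), (5, 3, "John O'Leary"), (6, 3, "Julia Lenin"),
])
conn.executemany("INSERT INTO proj_has_emp VALUES (?, ?, ?)", [
    (1, 4, "4/7/03"), (3, 6, "8/12/02"), (3, 5, "3/4/01"),
    (2, 6, "11/11/02"), (3, 2, "12/2/03"), (2, 1, "7/21/04"),
])

# Two joins: project -> proj_has_emp -> employee.
rows = conn.execute("""
    SELECT p.name, e.name
    FROM project p
    JOIN proj_has_emp pe ON pe.proj_no = p.proj_no
    JOIN employee e      ON e.emp_no  = pe.emp_no
    ORDER BY p.proj_no, e.name
""").fetchall()

staff = {}
for project, employee in rows:
    staff.setdefault(project, []).append(employee)
print(staff)
```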
Tutorial

Entering the physical design into the database.

•Log on to the system using SSH.
  % ssh [email protected]
•Set up the database instance environment:
  (csh or tcsh)
  % source /dbms/db2/home/db2i010/sqllib/db2cshrc
  (sh, ksh, or bash)
  $ . /dbms/db2/home/db2i010/sqllib/db2profile
•Run the DB2 command line processor (CLP):
  % db2


Tutorial

•The db2 prompt appears after the version information:
  db2 =>
•Connect to the workshop database:
  db2 => connect to workshop
•Create the department table:
  db2 => create table department \
  db2 (cont.) => (dept_no smallint not null, \
  db2 (cont.) => name varchar(50), \
  db2 (cont.) => primary key (dept_no))


Tutorial

•Create the employee table:
  db2 => create table employee \
  db2 (cont.) => (emp_no smallint not null, \
  db2 (cont.) => dept_no smallint not null, \
  db2 (cont.) => name varchar(50), \
  db2 (cont.) => ssn int not null, \
  db2 (cont.) => primary key (emp_no), \
  db2 (cont.) => foreign key (dept_no) references department)
•List the tables:
  db2 => list tables for schema <user>
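
If the DB2 workshop system is not available, the same two tables can be rehearsed locally. This is a sketch that assumes SQLite (through Python's sqlite3) as a substitute: SQLite accepts the smallint, int, and varchar(50) type names, and querying its sqlite_master catalog plays roughly the role of DB2's list tables. It is not a DB2 session and does not use the CLP.

```python
import sqlite3

# Local stand-in for "connect to workshop": an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_no smallint NOT NULL,
        name    varchar(50),
        PRIMARY KEY (dept_no)
    );
    CREATE TABLE employee (
        emp_no  smallint NOT NULL,
        dept_no smallint NOT NULL,
        name    varchar(50),
        ssn     int NOT NULL,
        PRIMARY KEY (emp_no),
        FOREIGN KEY (dept_no) REFERENCES department
    );
""")

# Rough equivalent of "list tables": SQLite keeps its catalog in sqlite_master.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['department', 'employee']

# The "not null" on ssn rejects an employee row that omits it.
try:
    conn.execute("INSERT INTO employee (emp_no, dept_no, name) "
                 "VALUES (1, 1, 'Nora Edwards')")
    missing_ssn_rejected = False
except sqlite3.IntegrityError:
    missing_ssn_rejected = True
```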


Normalization

A logical design method that minimizes data redundancy and reduces design flaws.

•Consists of applying various “normal” forms to the database design.
•The normal forms break large tables down into smaller subsets.


First Normal Form (1NF)

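Each attribute must be atomic.
• No repeating columns within a row.
• No multi-valued columns.

1NF simplifies attributes.
• Queries become easier: for example, finding every employee who knows Java becomes an equality test on skills instead of a search inside a comma-separated list.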


1NF
Employee (unnormalized)
emp_no name dept_no dept_name skills
1 Kevin Jacobs 201 R&D C, Perl, Java
2 Barbara Jones 224 IT Linux, Mac
3 Jake Rivera 201 R&D DB2, Oracle, Java

Employee (1NF)
emp_no name dept_no dept_name skills
1 Kevin Jacobs 201 R&D C
1 Kevin Jacobs 201 R&D Perl
1 Kevin Jacobs 201 R&D Java
2 Barbara Jones 224 IT Linux
2 Barbara Jones 224 IT Mac
3 Jake Rivera 201 R&D DB2
3 Jake Rivera 201 R&D Oracle
3 Jake Rivera 201 R&D Java

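
The 1NF step amounts to splitting the multi-valued skills column and repeating the other columns once per value. A minimal Python sketch; the comma-separated input format is an assumption based on the unnormalized table, and to_1nf is a hypothetical helper.

```python
# Unnormalized rows: skills holds a comma-separated list.
unnormalized = [
    (1, "Kevin Jacobs", 201, "R&D", "C, Perl, Java"),
    (2, "Barbara Jones", 224, "IT", "Linux, Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2, Oracle, Java"),
]


def to_1nf(rows):
    """Emit one row per skill, repeating the other columns, so every value is atomic."""
    return [
        (emp_no, name, dept_no, dept_name, skill.strip())
        for emp_no, name, dept_no, dept_name, skills in rows
        for skill in skills.split(",")
    ]


first_nf = to_1nf(unnormalized)
for row in first_nf:
    print(row)
```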


Second Normal Form (2NF)

Each non-key attribute must be functionally dependent on the whole primary key, not just part of it.
• Functional dependence - the property of one or more attributes that uniquely determines the value of other attributes.
• Any attributes that do not depend on the whole key are moved into a smaller (subset) table.
2NF improves data integrity.
• Reduces update, insert, and delete anomalies.


Functional Dependence
Employee (1NF)
emp_no name dept_no dept_name skills
1 Kevin Jacobs 201 R&D C
1 Kevin Jacobs 201 R&D Perl
1 Kevin Jacobs 201 R&D Java
2 Barbara Jones 224 IT Linux
2 Barbara Jones 224 IT Mac
3 Jake Rivera 201 R&D DB2
3 Jake Rivera 201 R&D Oracle
3 Jake Rivera 201 R&D Java

Name, dept_no, and dept_name are functionally dependent on emp_no (emp_no -> name, dept_no, dept_name).

Skills is not functionally dependent on emp_no, since a single emp_no appears with several different skills.
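
A functional dependency can be checked mechanically against the table above: rows that agree on the left-hand columns must agree on the right-hand columns. This Python sketch (fd_holds is a hypothetical helper) tests one table instance, so it can refute a dependency but cannot prove one; real dependencies come from the meaning of the data.

```python
# Columns and rows of the Employee (1NF) table.
COLUMNS = ("emp_no", "name", "dept_no", "dept_name", "skills")
ROWS = [
    (1, "Kevin Jacobs", 201, "R&D", "C"),
    (1, "Kevin Jacobs", 201, "R&D", "Perl"),
    (1, "Kevin Jacobs", 201, "R&D", "Java"),
    (2, "Barbara Jones", 224, "IT", "Linux"),
    (2, "Barbara Jones", 224, "IT", "Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2"),
    (3, "Jake Rivera", 201, "R&D", "Oracle"),
    (3, "Jake Rivera", 201, "R&D", "Java"),
]


def fd_holds(lhs, rhs, rows=ROWS, columns=COLUMNS):
    """Check lhs -> rhs on this table instance: rows that agree on the lhs
    columns must also agree on the rhs columns."""
    lhs_idx = [columns.index(c) for c in lhs]
    rhs_idx = [columns.index(c) for c in rhs]
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs_idx)
        value = tuple(row[i] for i in rhs_idx)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


print(fd_holds(["emp_no"], ["name", "dept_no", "dept_name"]))  # True
print(fd_holds(["emp_no"], ["skills"]))                         # False
```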


2NF
Employee (1NF)
emp_no name dept_no dept_name skills
1 Kevin Jacobs 201 R&D C
1 Kevin Jacobs 201 R&D Perl
1 Kevin Jacobs 201 R&D Java
2 Barbara Jones 224 IT Linux
2 Barbara Jones 224 IT Mac
3 Jake Rivera 201 R&D DB2
3 Jake Rivera 201 R&D Oracle
3 Jake Rivera 201 R&D Java

Employee (2NF)
emp_no name dept_no dept_name
1 Kevin Jacobs 201 R&D
2 Barbara Jones 224 IT
3 Jake Rivera 201 R&D

Skills (2NF)
emp_no skills
1 C
1 Perl
1 Java
2 Linux
2 Mac
3 DB2
3 Oracle
3 Java
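
The 2NF split is two relational projections of the 1NF table, with duplicate rows removed. A minimal Python sketch; the project helper is hypothetical:

```python
# Employee (1NF) rows: (emp_no, name, dept_no, dept_name, skills)
ROWS = [
    (1, "Kevin Jacobs", 201, "R&D", "C"),
    (1, "Kevin Jacobs", 201, "R&D", "Perl"),
    (1, "Kevin Jacobs", 201, "R&D", "Java"),
    (2, "Barbara Jones", 224, "IT", "Linux"),
    (2, "Barbara Jones", 224, "IT", "Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2"),
    (3, "Jake Rivera", 201, "R&D", "Oracle"),
    (3, "Jake Rivera", 201, "R&D", "Java"),
]


def project(rows, indexes):
    """Relational projection: keep the chosen columns and drop duplicate
    tuples, preserving the order in which they first appear."""
    result = []
    for row in rows:
        picked = tuple(row[i] for i in indexes)
        if picked not in result:
            result.append(picked)
    return result


employee_2nf = project(ROWS, [0, 1, 2, 3])  # emp_no, name, dept_no, dept_name
skills_2nf = project(ROWS, [0, 4])          # emp_no, skills

for row in employee_2nf:
    print(row)
```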


Data Integrity
Employee (1NF)
emp_no name dept_no dept_name skills
1 Kevin Jacobs 201 R&D C
1 Kevin Jacobs 201 R&D Perl
1 Kevin Jacobs 201 R&D Java
2 Barbara Jones 224 IT Linux
2 Barbara Jones 224 IT Mac
3 Jake Rivera 201 R&D DB2
3 Jake Rivera 201 R&D Oracle
3 Jake Rivera 201 R&D Java

• Insert anomaly - adding null values. E.g., a new department with no employees cannot be inserted without a null emp_no, even though emp_no is part of the primary key.
• Update anomaly - a single name change requires multiple updates, which degrades performance and risks leaving inconsistent copies. E.g., changing the IT dept_name to IS means updating every IT row.
• Delete anomaly - deleting wanted information. E.g., deleting employee Barbara Jones also removes the only record of the IT department.
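
The update and delete anomalies can be reproduced directly. This sketch assumes SQLite via Python's sqlite3; the table names employee_1nf and department are illustrative. It counts how many rows the IT-to-IS rename touches in each design, then deletes Barbara Jones from the 1NF table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_1nf (
        emp_no INTEGER, name TEXT, dept_no INTEGER, dept_name TEXT, skills TEXT,
        PRIMARY KEY (emp_no, skills)
    );
    CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
""")
conn.executemany("INSERT INTO employee_1nf VALUES (?, ?, ?, ?, ?)", [
    (1, "Kevin Jacobs", 201, "R&D", "C"),
    (1, "Kevin Jacobs", 201, "R&D", "Perl"),
    (1, "Kevin Jacobs", 201, "R&D", "Java"),
    (2, "Barbara Jones", 224, "IT", "Linux"),
    (2, "Barbara Jones", 224, "IT", "Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2"),
    (3, "Jake Rivera", 201, "R&D", "Oracle"),
    (3, "Jake Rivera", 201, "R&D", "Java"),
])
conn.executemany("INSERT INTO department VALUES (?, ?)", [(201, "R&D"), (224, "IT")])

# Update anomaly: renaming IT to IS rewrites every copy in the 1NF table...
updated_1nf = conn.execute(
    "UPDATE employee_1nf SET dept_name = 'IS' WHERE dept_no = 224").rowcount
# ...but only one row once departments live in their own table.
updated_3nf = conn.execute(
    "UPDATE department SET dept_name = 'IS' WHERE dept_no = 224").rowcount

# Delete anomaly: removing Barbara Jones removes every row for dept 224
# from the 1NF table, while the department table still records it.
conn.execute("DELETE FROM employee_1nf WHERE emp_no = 2")
dept_224_in_1nf = conn.execute(
    "SELECT COUNT(*) FROM employee_1nf WHERE dept_no = 224").fetchone()[0]
dept_224_in_3nf = conn.execute(
    "SELECT COUNT(*) FROM department WHERE dept_no = 224").fetchone()[0]

print(updated_1nf, updated_3nf, dept_224_in_1nf, dept_224_in_3nf)  # 2 1 0 1
```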


Third Normal Form (3NF)

Remove transitive dependencies.
• Transitive dependence - a non-key attribute depends on another non-key attribute rather than directly on the key (A -> B -> C), which usually means two separate entities are stored in one table.
• Any transitive dependencies are moved into a smaller (subset) table.
3NF further improves data integrity.
• Prevents the update, insert, and delete anomalies shown for the 1NF table.


Transitive Dependence
Employee (2NF)
emp_no name dept_no dept_name
1 Kevin Jacobs 201 R&D
2 Barbara Jones 224 IT
3 Jake Rivera 201 R&D

dept_no and dept_name are functionally dependent on emp_no; however, dept_name depends on emp_no only through dept_no (emp_no -> dept_no -> dept_name), so department can be considered a separate entity.


3NF
Employee (2NF)
emp_no name dept_no dept_name
1 Kevin Jacobs 201 R&D
2 Barbara Jones 224 IT
3 Jake Rivera 201 R&D

Employee (3NF)
emp_no name dept_no
1 Kevin Jacobs 201
2 Barbara Jones 224
3 Jake Rivera 201

Department (3NF)
dept_no dept_name
201 R&D
224 IT
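
The 3NF step moves dept_no -> dept_name into its own table; joining back on dept_no should rebuild the 2NF table exactly, which is a quick check that the split lost nothing. A minimal Python sketch:

```python
# Employee (2NF) rows: (emp_no, name, dept_no, dept_name)
employee_2nf = [
    (1, "Kevin Jacobs", 201, "R&D"),
    (2, "Barbara Jones", 224, "IT"),
    (3, "Jake Rivera", 201, "R&D"),
]

# Employee keeps dept_no as a foreign key; dept_name moves to Department.
employee_3nf = [(emp_no, name, dept_no) for emp_no, name, dept_no, _ in employee_2nf]
department_3nf = sorted({(dept_no, dept_name) for _, _, dept_no, dept_name in employee_2nf})

# Join back on dept_no. Because dept_no is Department's key, a dict lookup
# finds the single matching department row for each employee.
dept_name_of = dict(department_3nf)
rejoined = [
    (emp_no, name, dept_no, dept_name_of[dept_no])
    for emp_no, name, dept_no in employee_3nf
]

print(department_3nf)            # [(201, 'R&D'), (224, 'IT')]
print(rejoined == employee_2nf)  # True
```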


Other Normal Forms

Boyce-Codd Normal Form (BCNF)
• Strengthens 3NF by requiring the determinant (left-hand side) of every non-trivial functional dependency to be a superkey (a column or columns that uniquely identify a row).
Fourth Normal Form (4NF)
• Eliminates non-trivial multivalued dependencies whose determinant is not a superkey.
Fifth Normal Form (5NF)
• Eliminates join dependencies that are not implied by the candidate keys.
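
BCNF can be tested with the standard attribute-closure algorithm: a non-trivial dependency X -> Y violates BCNF when the closure of X does not cover every column of the relation. This Python sketch uses the dependencies stated on the earlier slides (closure and bcnf_violations are hypothetical helpers); it checks only the dependencies passed in for each relation, which is a simplification.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose left-hand side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                  # non-trivial
        and not set(relation) <= closure(lhs, fds)   # lhs is not a superkey
    ]


# Employee (2NF) and the dependencies stated for it on the slides.
EMPLOYEE_2NF = ("emp_no", "name", "dept_no", "dept_name")
FDS = [
    (("emp_no",), ("name", "dept_no", "dept_name")),
    (("dept_no",), ("dept_name",)),
]

print(closure(["emp_no"], FDS))            # all four columns: emp_no is a key
print(bcnf_violations(EMPLOYEE_2NF, FDS))  # [(('dept_no',), ('dept_name',))]
```

After the 3NF split, Department (dept_no, dept_name) with dept_no -> dept_name has no violations, because dept_no is its key.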


Normalizing our team (1NF)
games
game_id date opponent result
34 6/3/05 Chicago W
35 6/8/05 Seattle W
40 6/15/05 Phoenix L
42 6/20/05 LA W

sales
sales_id game_id merch tickets
120 34 5000 25000
122 35 4500 30000
125 40 2500 15000
126 42 6500 40000

players (- = no value)
player_id game_id name start_date end_date aces blocks spikes digs
45 34 Mike Speedy 1/1/00 - 12 3 20 5
45 35 Mike Speedy 1/1/00 - 10 2 15 4
45 40 Mike Speedy 1/1/00 - 7 2 10 3
78 42 Frank Newmon 5/1/05 - - - - -
102 34 Joe Powers 1/1/02 7/1/05 8 6 18 10
102 35 Joe Powers 1/1/02 7/1/05 10 8 24 12
103 42 Tony Tough 1/1/05 - 15 10 20 14


Normalizing our team (2NF & 3NF)

games
game_id date opponent result
34 6/3/05 Chicago W
35 6/8/05 Seattle W
40 6/15/05 Phoenix L
42 6/20/05 LA W

sales
sales_id game_id merch tickets
120 34 5000 25000
122 35 4500 30000
125 40 2500 15000
126 42 6500 40000

players (- = no value)
player_id name start_date end_date
45 Mike Speedy 1/1/00 -
78 Frank Newmon 5/1/05 -
102 Joe Powers 1/1/02 7/1/05
103 Tony Tough 1/1/05 -

player_stats
player_id game_id aces blocks spikes digs
45 34 12 3 20 5
45 35 10 2 15 4
45 40 7 2 10 3
102 34 8 6 18 10
102 35 10 8 24 12
103 42 15 10 20 14
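
The normalized team tables translate directly into DDL. This sketch assumes SQLite via Python's sqlite3; the UNIQUE constraint on sales.game_id is an added assumption that enforces the one-to-one games/sales relationship, and values missing on the slide are stored as NULL. The final query totals each player's games played and spikes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE games (
        game_id INTEGER PRIMARY KEY, date TEXT, opponent TEXT, result TEXT);
    CREATE TABLE sales (
        sales_id INTEGER PRIMARY KEY,
        game_id  INTEGER UNIQUE REFERENCES games,  -- UNIQUE: one sales row per game
        merch    INTEGER,
        tickets  INTEGER);
    CREATE TABLE players (
        player_id INTEGER PRIMARY KEY, name TEXT, start_date TEXT, end_date TEXT);
    CREATE TABLE player_stats (
        player_id INTEGER REFERENCES players,
        game_id   INTEGER REFERENCES games,
        aces INTEGER, blocks INTEGER, spikes INTEGER, digs INTEGER,
        PRIMARY KEY (player_id, game_id));
""")
conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?)", [
    (34, "6/3/05", "Chicago", "W"), (35, "6/8/05", "Seattle", "W"),
    (40, "6/15/05", "Phoenix", "L"), (42, "6/20/05", "LA", "W"),
])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (120, 34, 5000, 25000), (122, 35, 4500, 30000),
    (125, 40, 2500, 15000), (126, 42, 6500, 40000),
])
conn.executemany("INSERT INTO players VALUES (?, ?, ?, ?)", [
    (45, "Mike Speedy", "1/1/00", None), (78, "Frank Newmon", "5/1/05", None),
    (102, "Joe Powers", "1/1/02", "7/1/05"), (103, "Tony Tough", "1/1/05", None),
])
conn.executemany("INSERT INTO player_stats VALUES (?, ?, ?, ?, ?, ?)", [
    (45, 34, 12, 3, 20, 5), (45, 35, 10, 2, 15, 4), (45, 40, 7, 2, 10, 3),
    (102, 34, 8, 6, 18, 10), (102, 35, 10, 8, 24, 12), (103, 42, 15, 10, 20, 14),
])

# Games played and total spikes per player (players without stats are skipped).
totals = conn.execute("""
    SELECT p.name, COUNT(*), SUM(s.spikes)
    FROM players p
    JOIN player_stats s ON s.player_id = p.player_id
    GROUP BY p.player_id, p.name
    ORDER BY p.player_id
""").fetchall()
print(totals)  # [('Mike Speedy', 3, 45), ('Joe Powers', 2, 42), ('Tony Tough', 1, 20)]
```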


Revisit team ER diagram
[Figure: games (date, opponent, result) 1:1 "generates" sales (tickets, merchandise); games 1:N "recorded by" player_stats (aces, blocks, digs, spikes); players (Name, Start date, End date) 1:N "tracked" player_stats]


Star Schemas

Designed for data retrieval.
• Best suited to decision-support tasks such as data warehouses and data marts.
• Denormalized: allows faster querying because fewer joins are needed.
• Slower performance for insert, delete, and update transactions.
• Composed of two types of tables: facts and dimensions.


Fact Table

The main table in a star schema is the fact table.
• Contains groupings of measures of an event to be analyzed.
• Measure - numeric data

Invoice Facts
  units sold
  unit amount
  total sale price


Dimension Table
Dimension tables are groupings of descriptors and measures that describe the facts.
•Descriptor - non-numeric data

Customer Dimension: cust_dim_key, name, address, phone
Time Dimension: time_dim_key, invoice date, due date, delivered date
Location Dimension: loc_dim_key, store number, store address, store phone
Product Dimension: prod_dim_key, product, price, cost

Star Schema
Each dimension table has a one-to-many relationship with the fact table: one dimension row can be referenced by many fact rows.

Invoice Facts: cust_dim_key, loc_dim_key, time_dim_key, prod_dim_key, units sold, unit amount, total sale price

Customer Dimension (cust_dim_key, name, address, phone) 1:N Invoice Facts
Time Dimension (time_dim_key, invoice date, due date, delivered date) 1:N Invoice Facts
Location Dimension (loc_dim_key, store number, store address, store phone) 1:N Invoice Facts
Product Dimension (prod_dim_key, product, price, cost) 1:N Invoice Facts
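
The invoice star schema can be written down as DDL with one foreign key per dimension. No invoice data appears in the slides, so this sketch (assuming SQLite via Python's sqlite3; table and column names such as customer_dim and units_sold are hypothetical spellings of the slide's labels) only creates the tables, confirms the fact table references all four dimensions, and runs the typical fact-to-dimension aggregation on the empty tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (
        cust_dim_key INTEGER PRIMARY KEY, name TEXT, address TEXT, phone TEXT);
    CREATE TABLE time_dim (
        time_dim_key INTEGER PRIMARY KEY,
        invoice_date TEXT, due_date TEXT, delivered_date TEXT);
    CREATE TABLE location_dim (
        loc_dim_key INTEGER PRIMARY KEY,
        store_number INTEGER, store_address TEXT, store_phone TEXT);
    CREATE TABLE product_dim (
        prod_dim_key INTEGER PRIMARY KEY, product TEXT, price REAL, cost REAL);
    -- Fact table: one foreign key per dimension plus the numeric measures.
    CREATE TABLE invoice_facts (
        cust_dim_key INTEGER REFERENCES customer_dim,
        loc_dim_key  INTEGER REFERENCES location_dim,
        time_dim_key INTEGER REFERENCES time_dim,
        prod_dim_key INTEGER REFERENCES product_dim,
        units_sold INTEGER, unit_amount REAL, total_sale_price REAL);
""")

# Index 2 of each PRAGMA foreign_key_list row is the referenced (parent) table.
parents = sorted(row[2] for row in conn.execute("PRAGMA foreign_key_list(invoice_facts)"))
print(parents)  # ['customer_dim', 'location_dim', 'product_dim', 'time_dim']

# Typical star query shape: join the facts to one dimension and aggregate.
sales_by_store = conn.execute("""
    SELECT l.store_number, SUM(f.total_sale_price)
    FROM invoice_facts f
    JOIN location_dim l ON l.loc_dim_key = f.loc_dim_key
    GROUP BY l.store_number
""").fetchall()  # empty here, since no rows have been loaded
```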


Analyzing the team

The coach needs to analyze how the team generates income.
• We therefore use the sales table to create our fact table.

Team Facts
  date
  merchandise
  tickets


Team Dimensions

We have two dimensions for the schema: players and games.

Game Dimension: game_dim_key, opponent, result
Player Dimension: player_dim_key, name, start_date, end_date, aces, blocks, spikes, digs


Team Star Schema

Team Facts
player_dim_key
game_dim_key
date
merchandise
tickets
N N
Player Dimension
1
Game Dimension 1 player_dim_key
name
start_date
game_dim_key end_date
opponent aces
result blocks
spikes
SAN DIEGO SUPERCOMPUTER CENTER digs
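
With the sales figures from the team tables, the star schema answers the coach's income question with one join. This sketch assumes SQLite via Python's sqlite3, reuses the game_id values as game_dim_key, omits the player dimension for brevity, and assumes merchandise and tickets are in the same units so they can be added; it totals income by game result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE game_dim (game_dim_key INTEGER PRIMARY KEY, opponent TEXT, result TEXT);
    CREATE TABLE team_facts (
        game_dim_key INTEGER REFERENCES game_dim,
        date TEXT, merchandise INTEGER, tickets INTEGER);
""")
conn.executemany("INSERT INTO game_dim VALUES (?, ?, ?)", [
    (34, "Chicago", "W"), (35, "Seattle", "W"), (40, "Phoenix", "L"), (42, "LA", "W"),
])
conn.executemany("INSERT INTO team_facts VALUES (?, ?, ?, ?)", [
    (34, "6/3/05", 5000, 25000), (35, "6/8/05", 4500, 30000),
    (40, "6/15/05", 2500, 15000), (42, "6/20/05", 6500, 40000),
])

# Join each fact row to its game and aggregate the measures by a descriptor.
income_by_result = conn.execute("""
    SELECT g.result, SUM(f.merchandise + f.tickets)
    FROM team_facts f
    JOIN game_dim g ON g.game_dim_key = f.game_dim_key
    GROUP BY g.result
    ORDER BY g.result
""").fetchall()
print(income_by_result)  # [('L', 17500), ('W', 111000)]
```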
Books and Reference

•Database Design for Mere Mortals, Michael J. Hernandez
•Information Modeling and Relational Databases, Terry Halpin
•Database Modeling and Design, Toby J. Teorey


Continuing Education

UCSD Extension
•Data Management Courses
•DBA Certificate Program
•Database Application Developer Certificate Program


Data Central

The Data Services Group provides data allocations for the scientific community.
•http://datacentral.sdsc.edu/
•Tools and expertise for making data collections available to the broader scientific community.
•Provides disk, tape, and database storage resources.