Unit 5-Introduction To Biological Databases
TYPES OF DATABASES
There are two types of database management systems: flat-file indexing systems
and relational DBMSs.
Originally, databases all used a flat file format, which is a long text file that
contains many entries separated by a delimiter, a special character such as a vertical bar
(|). Within each entry are a number of fields separated by tabs or commas. Except for
the raw values in each field, the entire text file does not contain any hidden instructions
for computers to search for specific information or to create reports based on certain
fields from each record. The text file can be considered a single table. Thus, to search a
flat file for a particular piece of information, a computer has to read through the entire
file, an obviously inefficient process. This is manageable for a small database, but as
database size increases or data types become more complex, retrieving information
from a flat file becomes very difficult. Indeed, searches through such files often
cause crashes of the entire computer system because of the memory-intensive nature of
the operation.
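To make the cost of a flat-file search concrete, here is a minimal Python sketch. It assumes the layout described above (entries separated by a vertical bar, fields separated by commas); the sample records and the function name find_entries are made up for illustration.

```python
# Hypothetical flat file: entries separated by "|", fields separated by commas.
FLAT_FILE = "P1,insulin,human|P2,lysozyme,chicken|P3,insulin,mouse"

def find_entries(text, field_index, value):
    """Linear scan: every entry is split and inspected, so the work grows with file size."""
    matches = []
    for entry in text.split("|"):
        fields = entry.split(",")
        if fields[field_index] == value:
            matches.append(fields)
    return matches

hits = find_entries(FLAT_FILE, 1, "insulin")
print(hits)
```

Nothing in the file tells the program where the matching entries are, so it cannot skip any of them.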
HISTORY OF BIOLOGICAL DATABASES
1. Nucleotide sequences
2. Genomics (information on gene chromosomal location and nomenclature, with
links to sequence databases)
3. Mutation/polymorphism (sequence variations linked or not to genetic diseases)
4. Protein sequences
5. Protein domain/family
6. Proteomics (2D gel, MS)
7. Microarray (high-dimensional data: profiles of thousands of genes depending on
hundreds/thousands of various conditions)
8. Organism-specific
9. 3D structure
10. Metabolism (e.g., metabolic pathways graph data)
11. Bibliography
12. Others
BIOLOGICAL DATABASES: SPECIFIC FEATURES
MODELS OF DATABASES
Hierarchical
The hierarchical data model organizes data in a tree structure. There is a hierarchy
of parent and child data segments. It has only one root record. Each root record may
participate in a relationship with many child records. Each child may itself have many
child records. A parent record owns its child records.
If a parent is deleted, all its child records are automatically deleted. Changing the
structure of the database is very difficult because changes in structure require changes to
the access mechanisms and consequently to the application programs.
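The ownership rule of the hierarchical model can be sketched in a few lines of Python. This is only an illustration of the tree structure and the cascading delete, not the interface of any real hierarchical DBMS; the class name Segment and the record names are hypothetical.

```python
class Segment:
    """One record in a toy hierarchical database (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self            # each child has exactly one owner
        self.children.append(child)
        return child

    def delete(self):
        """Remove this record; all records below it are removed with it."""
        removed = [self.name]
        for child in list(self.children):   # copy: children shrink as we delete
            removed.extend(child.delete())
        if self.parent is not None:
            self.parent.children.remove(self)
        return removed

root = Segment("organism")
gene = root.add_child(Segment("gene"))
gene.add_child(Segment("transcript"))
gene.add_child(Segment("variant"))
removed = gene.delete()
print(removed)
```

Deleting the gene record takes its two child records with it, while the root record is left in place.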
Network
This model was developed in response to the shortcomings of the hierarchical
model. A network database is a collection of record types. Record types are associated
together by links, and there is no constraint on the number and direction of the links that
can be established. There is no root record, and each record can participate in any
number of ownership relationships. In the network model data duplication is removed,
it is possible to set up record instances without having them participate in a link, and
deleting one owner does not necessarily delete all its members. However, the
implementation of this model is very difficult.
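As a rough contrast with the hierarchical case, the following Python sketch models records and owner-to-member links separately. The class NetworkDB and the record names are hypothetical and stand in for no particular product.

```python
from collections import defaultdict

class NetworkDB:
    """Toy network-model store: records plus owner -> member links (illustrative only)."""

    def __init__(self):
        self.records = set()
        self.members = defaultdict(set)    # owner -> set of member records

    def add(self, record):
        self.records.add(record)           # a record may exist with no links at all

    def link(self, owner, member):
        self.members[owner].add(member)    # a member may have several owners

    def delete(self, record):
        self.records.discard(record)
        self.members.pop(record, None)     # drop the links it owned, keep the members
        for owned in self.members.values():
            owned.discard(record)          # drop links that pointed at it

db = NetworkDB()
for r in ("gene_A", "pathway_1", "pathway_2", "paper_X"):
    db.add(r)
db.link("pathway_1", "gene_A")
db.link("pathway_2", "gene_A")             # gene_A is stored once but has two owners
db.delete("pathway_1")
```

After the delete, gene_A is still present and still linked from pathway_2, and paper_X exists without taking part in any link.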
Object-Oriented
Object DBMSs add database functionality to object programming languages.
They bring much more than persistent storage of programming language objects. Object
DBMSs extend the semantics of the C++, Smalltalk and Java object programming
languages to provide full-featured database programming capability, while retaining
native language compatibility. A major benefit of this approach is the unification of the
application and database development into a seamless data model and language
environment. As a result, applications require less code and use more natural data modeling,
and code bases are easier to maintain. Object developers can write complete database
applications with a modest amount of additional effort.
Relational
Entity-Relationship Model
Entity = a "thing", like an object.
Entity set = a set of similar entities, like a class of objects.
Attribute = a property of the entities in an entity set.
Relationships connect two or more entity sets.
Purpose of the E/R Diagram
- The E/R model allows us to sketch database designs.
  It shows the kinds of data and how they connect.
  It does not show how the data changes.
- Designs are pictures called entity-relationship diagrams.
- Later: convert E/R designs to relational DB designs.
In an E/R Diagram
Entity set = rectangle.
Attribute = oval, with a line to the rectangle representing its entity set.
[Diagram: rectangle Beers connected to ovals name and manf]
Entity set Beers has two attributes, name and manf (manufacturer).
Each Beers entity has values for these two attributes, e.g. (Bud, Anheuser-Busch)
A relationship connects two or more entity sets.
It is represented by a diamond, with lines to each of the entity sets involved.
[Diagrams: many-one, one-one, and many-many relationships]
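One common way to carry an E/R design over to a relational design is to give each entity set a table with one column per attribute, and each many-many relationship a table of key pairs. The sketch below uses Python's sqlite3 module. Only Beers and its attributes name and manf come from the text; the Bars entity set and the Sells relationship are hypothetical and are there only to show the pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Entity set Beers: one column per attribute.
    CREATE TABLE Beers (name TEXT PRIMARY KEY, manf TEXT);
    -- Hypothetical second entity set (not in the text).
    CREATE TABLE Bars (name TEXT PRIMARY KEY);
    -- Hypothetical many-many relationship: one row per (bar, beer) pair.
    CREATE TABLE Sells (bar  TEXT REFERENCES Bars(name),
                        beer TEXT REFERENCES Beers(name),
                        PRIMARY KEY (bar, beer));
    INSERT INTO Beers VALUES ('Bud', 'Anheuser-Busch');
    INSERT INTO Bars  VALUES ('Campus Bar');
    INSERT INTO Sells VALUES ('Campus Bar', 'Bud');
""")
beers = conn.execute("SELECT name, manf FROM Beers").fetchall()
sold = conn.execute("SELECT bar, beer FROM Sells").fetchall()
print(beers, sold)
```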
Practice in SQL
(Note/Hint: A good discussion of the details of SQL can be found on the course
website, under Introduction to SQL 1 and 2 and PL/SQL.)
The CREATE TABLE statement is used to define a new table.
CREATE TABLE Students (sid   CHAR(20),
                       name  CHAR(30),
                       login CHAR(20),
                       age   INTEGER,
                       gpa   REAL)

sid    name     login          age  gpa
50000  Dave     dave@cs        19   3.3
53666  Jones    jones@cs       18   3.4
53688  Smith    smith@chem     18   3.2
53650  Smith    smith@ee       19   3.8
53831  Madayan  madayan@music  11   1.8
53832  Guldu    guldu@math     12   2.0
FROM   Students S
WHERE  S.name = 'Smith'
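The CREATE TABLE statement and the Students instance above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the in-memory database and the final lookup of the two students named Smith are choices made here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Students (sid   CHAR(20),
                           name  CHAR(30),
                           login CHAR(20),
                           age   INTEGER,
                           gpa   REAL)
""")
rows = [
    ("50000", "Dave",    "dave@cs",       19, 3.3),
    ("53666", "Jones",   "jones@cs",      18, 3.4),
    ("53688", "Smith",   "smith@chem",    18, 3.2),
    ("53650", "Smith",   "smith@ee",      19, 3.8),
    ("53831", "Madayan", "madayan@music", 11, 1.8),
    ("53832", "Guldu",   "guldu@math",    12, 2.0),
]
# One INSERT per row; the ? placeholders are filled from each tuple.
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
smiths = conn.execute(
    "SELECT sid, login FROM Students WHERE name = 'Smith' ORDER BY sid"
).fetchall()
print(count, smiths)
```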
We can modify the column values in an existing row using the UPDATE command. For example, we
can increment the age and decrement the gpa of the student with sid 53688.
UPDATE Students S
SET    S.age = S.age + 1, S.gpa = S.gpa - 1
WHERE  S.sid = 53688
The WHERE clause is applied first and determines which rows are to be modified. The SET clause
then determines how these rows are to be modified. If the column that is being modified is also used to
determine the new value, the value used in the expression on the right side of the equals sign (=) is the old
value, that is, the value before the modification. To illustrate these points further, consider:
UPDATE Students S
SET    S.gpa = S.gpa - 0.1
WHERE  S.gpa >= 3.3
name     login          age  gpa
Dave     dave@cs        19   3.2
Jones    jones@cs       18   3.3
Smith    smith@chem     18   3.2
Smith    smith@ee       19   3.7
Madayan  madayan@music  11   1.8
Guldu    guldu@math     12   2.0
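The behavior of the second UPDATE can be checked with a short sqlite3 sketch. Two details are adaptations made here rather than part of the text: SQLite does not accept the S. qualifier on the left side of SET, so the column is written unqualified, and ROUND is used to keep the floating-point results at one decimal place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(30), login CHAR(20), age INTEGER, gpa REAL)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("50000", "Dave",    "dave@cs",       19, 3.3),
        ("53666", "Jones",   "jones@cs",      18, 3.4),
        ("53688", "Smith",   "smith@chem",    18, 3.2),
        ("53650", "Smith",   "smith@ee",      19, 3.8),
        ("53831", "Madayan", "madayan@music", 11, 1.8),
        ("53832", "Guldu",   "guldu@math",    12, 2.0),
    ],
)

# WHERE selects rows by their old gpa; SET computes the new gpa from the old one.
conn.execute("UPDATE Students SET gpa = ROUND(gpa - 0.1, 1) WHERE gpa >= ?", (3.3,))

gpas = [g for (g,) in conn.execute("SELECT gpa FROM Students ORDER BY rowid")]
print(gpas)
```

Only the rows whose gpa was at least 3.3 before the statement ran are changed; the row that started at 3.2 is left alone even though other rows end up at 3.2.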
Key Constraints
Constraints are mostly a collection of indexes and triggers that restrict certain actions on
a table. Constraints are not actual entities themselves. There are four types of
constraints:
1. Primary Key Constraints (A primary key uniquely identifies each row of a table. It is
backed by a type of index that will most likely be used as the primary index when a
query is made on the table.)
2. Unique Constraints (Unique constraints may be placed on multiple columns.
They constrain UPDATEs and INSERTs on the table so that the values being
updated or inserted do not match the corresponding values of any other row in
the table.)
3. Check Constraints (A check constraint restricts updates and inserts on the table by
placing a check condition on the selected column. The UPDATE or INSERT is
allowed only if the check condition is satisfied.)
4. Foreign Key (FK) Constraints (A foreign key constraint allows certain fields in
one table to refer to fields in another table. This type of constraint is useful if you
have two tables, one of which has partial information, details on which can be
sought from another table with a matching entry. A foreign key constraint in this
case will prevent the deletion of an entry from the table with detailed information
if there is an entry in the table with partial information that matches it.)
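The four constraint types can be seen rejecting statements in a small sqlite3 sketch. The particular constraints chosen here (UNIQUE on login, age greater than 0) and the helper name violates are illustrative assumptions, as is the reduced set of columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Students (
        sid   CHAR(20) PRIMARY KEY,              -- primary key constraint
        login CHAR(20) UNIQUE,                   -- unique constraint
        age   INTEGER CHECK (age > 0)            -- check constraint
    );
    CREATE TABLE Enrolled (
        cid CHAR(20),
        sid CHAR(20) REFERENCES Students(sid)    -- foreign key constraint
    );
    INSERT INTO Students VALUES ('53666', 'jones@cs', 18);
    INSERT INTO Enrolled VALUES ('His105', '53666');
""")

def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "duplicate primary key": violates("INSERT INTO Students VALUES ('53666', 'x@cs', 20)"),
    "duplicate login":       violates("INSERT INTO Students VALUES ('53667', 'jones@cs', 20)"),
    "failed check":          violates("INSERT INTO Students VALUES ('53668', 'y@cs', -1)"),
    "delete referenced row": violates("DELETE FROM Students WHERE sid = '53666'"),
}
print(results)
```

The last case matches the foreign key description above: the Students row cannot be deleted while an Enrolled row still refers to it.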
Primary Key (sid in Students)

sid    name     login          age  gpa
50000  Dave     dave@cs        19   3.2
53666  Jones    jones@cs       18   3.3
53688  Smith    smith@chem     18   3.2
53650  Smith    smith@ee       19   3.7
53831  Madayan  madayan@music  11   1.8
53832  Guldu    guldu@math     12   2.0

Foreign Key (sid in Enrolled refers to sid in Students)

cid      grade  sid
Chem302  C      53831
Math203  B      53832
CS112    A      53650
His105   B      53666
SELECT *
FROM   Students S
WHERE  S.age < 18

SELECT S.name, S.login
FROM   Students S, Enrolled E
WHERE  S.sid = E.sid AND E.grade = 'A'
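Both queries can be run against the Students and Enrolled instances shown above using sqlite3. In this sketch the column types of Enrolled are assumed, since the text gives no CREATE TABLE statement for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (sid CHAR(20), name CHAR(30), login CHAR(20), age INTEGER, gpa REAL);
    -- Column types for Enrolled are assumed for this sketch.
    CREATE TABLE Enrolled (cid CHAR(20), grade CHAR(2), sid CHAR(20));
    INSERT INTO Students VALUES
        ('50000', 'Dave',    'dave@cs',       19, 3.2),
        ('53666', 'Jones',   'jones@cs',      18, 3.3),
        ('53688', 'Smith',   'smith@chem',    18, 3.2),
        ('53650', 'Smith',   'smith@ee',      19, 3.7),
        ('53831', 'Madayan', 'madayan@music', 11, 1.8),
        ('53832', 'Guldu',   'guldu@math',    12, 2.0);
    INSERT INTO Enrolled VALUES
        ('Chem302', 'C', '53831'),
        ('Math203', 'B', '53832'),
        ('CS112',   'A', '53650'),
        ('His105',  'B', '53666');
""")

# Selection: students younger than 18.
young = conn.execute("SELECT * FROM Students S WHERE S.age < 18").fetchall()

# Join: students who have an A in some course.
top = conn.execute("""
    SELECT S.name, S.login
    FROM   Students S, Enrolled E
    WHERE  S.sid = E.sid AND E.grade = 'A'
""").fetchall()
print(young, top)
```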
sid  bid  day
22   101  10/10/98
22   102  10/10/98
22   103  10/8/98
22   104  10/7/98
31   102  11/10/98
31   103  11/6/98
31   104  11/12/98
64   101  9/5/98
64   102  9/8/98
74   103  9/8/98
Instance R2 of Reserves

bid  bname      color
101  Interlake  blue
102  Interlake  red
103  Clipper    green
104  Marine     red
Instance B1 of Boats
age
45.0
33.0
55.5
25.5
35.0
35.0
16.0
35.0
25.5
63.5
With DISTINCT

sname    age
Dustin   45.0
Brutus   33.0
Lubber   55.5
Andy     25.5
Rusty    35.0
Horatio  35.0
Zorba    16.0
Art      25.5
Bob      63.5
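The effect of DISTINCT can be reproduced with a small sqlite3 sketch. The nine (sname, age) pairs come from the table above; the tenth row, a second (Horatio, 35.0), is an assumption made here so that there is a duplicate to remove, and the reduced Sailors schema is also assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced schema assumed for this sketch: only the two columns being selected.
conn.execute("CREATE TABLE Sailors (sname TEXT, age REAL)")
conn.executemany(
    "INSERT INTO Sailors VALUES (?, ?)",
    [
        ("Dustin", 45.0), ("Brutus", 33.0), ("Lubber", 55.5), ("Andy", 25.5),
        ("Rusty", 35.0), ("Horatio", 35.0), ("Zorba", 16.0),
        ("Horatio", 35.0),               # assumed duplicate row
        ("Art", 25.5), ("Bob", 63.5),
    ],
)

all_rows = conn.execute("SELECT S.sname, S.age FROM Sailors S").fetchall()
distinct_rows = conn.execute("SELECT DISTINCT S.sname, S.age FROM Sailors S").fetchall()
print(len(all_rows), len(distinct_rows))
```

DISTINCT compares whole result rows, so (Rusty, 35.0) and (Horatio, 35.0) both survive even though they share an age.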
3) Find the names of sailors who have reserved boat number 103
SELECT S.sname
FROM   Sailors S, Reserves R
WHERE  S.sid = R.sid AND R.bid = 103
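4) Find the sids of sailors who have reserved a red boat
SELECT R.sid
FROM   Boats B, Reserves R
WHERE  B.bid = R.bid AND B.color = 'red'
5) Find the names of sailors who have reserved a red boat
SELECT S.sname
FROM   Sailors S, Reserves R, Boats B
WHERE  S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
6) Find the colors of boats reserved by Lubber
SELECT B.color
FROM   Sailors S, Reserves R, Boats B
WHERE  S.sid = R.sid AND R.bid = B.bid AND S.sname = 'Lubber'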
7) Find the names of sailors who have reserved at least one boat
SELECT S.sname
FROM   Sailors S, Reserves R
WHERE  S.sid = R.sid
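The join queries can be run against the Reserves and Boats instances given earlier. The text does not show which sid belongs to which sailor, so the Sailors rows in this sqlite3 sketch are assumed for illustration, as are the column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors  (sid INTEGER, sname TEXT, age REAL);
    CREATE TABLE Boats    (bid INTEGER, bname TEXT, color TEXT);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
    -- Assumed sid/name pairing; the section lists names and ages only.
    INSERT INTO Sailors VALUES (22, 'Dustin', 45.0), (29, 'Brutus', 33.0),
        (31, 'Lubber', 55.5), (64, 'Horatio', 35.0), (74, 'Horatio', 35.0);
    INSERT INTO Boats VALUES (101, 'Interlake', 'blue'), (102, 'Interlake', 'red'),
        (103, 'Clipper', 'green'), (104, 'Marine', 'red');
    INSERT INTO Reserves VALUES
        (22, 101, '10/10/98'), (22, 102, '10/10/98'), (22, 103, '10/8/98'),
        (22, 104, '10/7/98'),  (31, 102, '11/10/98'), (31, 103, '11/6/98'),
        (31, 104, '11/12/98'), (64, 101, '9/5/98'),   (64, 102, '9/8/98'),
        (74, 103, '9/8/98');
""")

def column(sql):
    """Run a one-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]

boat_103 = column("""SELECT S.sname FROM Sailors S, Reserves R
                     WHERE S.sid = R.sid AND R.bid = 103""")
red_names = column("""SELECT S.sname FROM Sailors S, Reserves R, Boats B
                      WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'""")
lubber_colors = column("""SELECT B.color FROM Sailors S, Reserves R, Boats B
                          WHERE S.sid = R.sid AND R.bid = B.bid AND S.sname = 'Lubber'""")
print(sorted(boat_103), sorted(set(red_names)), sorted(set(lubber_colors)))
```

Without DISTINCT, a join returns one row per matching combination, so a sailor who reserved two red boats appears twice in red_names.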
Nested Queries
Query 3 above (find the names of sailors who have reserved boat number 103) can also be written as:
SELECT S.sname
FROM   Sailors S
WHERE  S.sid IN ( SELECT R.sid
                  FROM   Reserves R
                  WHERE  R.bid = 103 )
Query 5 above (find the names of sailors who have reserved a red boat) can also be written as:
SELECT S.sname
FROM   Sailors S
WHERE  S.sid IN ( SELECT R.sid
                  FROM   Reserves R
                  WHERE  R.bid IN ( SELECT B.bid
                                    FROM   Boats B
                                    WHERE  B.color = 'red' ) )
Find the names of sailors who have not reserved a red boat:
SELECT S.sname
FROM   Sailors S
WHERE  S.sid NOT IN ( SELECT R.sid
                      FROM   Reserves R
                      WHERE  R.bid IN ( SELECT B.bid
                                        FROM   Boats B
                                        WHERE  B.color = 'red' ) )
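The nested forms can be checked the same way. This sqlite3 sketch uses the Reserves and Boats instances from the text; the Sailors rows, including which sid goes with which name, are assumed for illustration. The outer query of the NOT IN example also selects sid so that the two sailors named Horatio can be told apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors  (sid INTEGER, sname TEXT, age REAL);
    CREATE TABLE Boats    (bid INTEGER, bname TEXT, color TEXT);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
    -- Assumed sid/name pairing; the section lists names and ages only.
    INSERT INTO Sailors VALUES (22, 'Dustin', 45.0), (29, 'Brutus', 33.0),
        (31, 'Lubber', 55.5), (64, 'Horatio', 35.0), (74, 'Horatio', 35.0);
    INSERT INTO Boats VALUES (101, 'Interlake', 'blue'), (102, 'Interlake', 'red'),
        (103, 'Clipper', 'green'), (104, 'Marine', 'red');
    INSERT INTO Reserves VALUES
        (22, 101, '10/10/98'), (22, 102, '10/10/98'), (22, 103, '10/8/98'),
        (22, 104, '10/7/98'),  (31, 102, '11/10/98'), (31, 103, '11/6/98'),
        (31, 104, '11/12/98'), (64, 101, '9/5/98'),   (64, 102, '9/8/98'),
        (74, 103, '9/8/98');
""")

# The inner query produces the set of sids; the outer query tests membership in it.
reserved_103 = conn.execute("""
    SELECT S.sname
    FROM   Sailors S
    WHERE  S.sid IN ( SELECT R.sid FROM Reserves R WHERE R.bid = 103 )
    ORDER BY S.sname
""").fetchall()

no_red = conn.execute("""
    SELECT S.sid, S.sname
    FROM   Sailors S
    WHERE  S.sid NOT IN ( SELECT R.sid
                          FROM   Reserves R
                          WHERE  R.bid IN ( SELECT B.bid
                                            FROM   Boats B
                                            WHERE  B.color = 'red' ) )
    ORDER BY S.sid
""").fetchall()
print(reserved_103, no_red)
```

Unlike the join form, the IN form returns each sailor at most once, because the outer query only asks whether the sid appears in the inner result.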