Dbse
THE DATA
HIERARCHY
A computer system organizes
data in a hierarchy that starts
with the bit, which represents
either a 0 or a 1. Bits can be
grouped to form a byte to
represent one character,
number, or symbol. Bytes can
be grouped to form a field,
and related fields can be
grouped to form a record.
Related records can be
collected to form a file, and
related files can be organized
into a database.
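The hierarchy above can be sketched in a few lines of Python. This is only an illustration: the customer names, cities, and the "loans" file are hypothetical values, not taken from the slides.

```python
# Bit -> byte -> field -> record -> file -> database (illustrative values only).
bits = "01000001"            # eight bits
byte = int(bits, 2)          # grouped into one byte with the value 65
character = chr(byte)        # that byte represents the character "A"

field = "Ann"                # a field: a group of bytes (characters)
record = {"name": field, "city": "Pune"}                      # a record: related fields
customer_file = [record, {"name": "Raj", "city": "Delhi"}]    # a file: related records
database = {"customers": customer_file, "loans": []}          # a database: related files

print(byte, character, len(database["customers"]))
```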
FIGURE 6-1
LOGICAL DATA CONCEPTS: THE MANNER IN WHICH DATA IS VIEWED
BY THE PROGRAMMER OR END USER, I.E. THE WAY THE USER
DESCRIBES REALITY
FIGURE 6-2 The use of a traditional approach to file processing encourages each functional area in a
corporation to develop specialized applications. Each application requires a unique data
file that is likely to be a subset of the master file. These subsets of the master file lead to
data redundancy and inconsistency, processing inflexibility, and wasted storage resources.
LIMITATIONS OF DATA FILE
ENVIRONMENT
Data redundancy: Data is duplicated in several files.
Loan example: each data file contains records about customers' loans, and many of these customers will also be represented in other data files.
Wastes physical storage media
Makes it difficult to obtain a comprehensive view of customers
Increases the cost of entering & maintaining data
Data inconsistency: Actual data values are not synchronized across the various copies of the data.
Example: a financial institution has customers with several loans, and for each loan there is a file containing customer fields (e.g., name, address, email, telephone number).
A change to a customer's address in only one file creates inconsistencies with the address field in the other files.
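A minimal sketch of the address example, assuming two hypothetical loan files that each keep their own copy of the customer's address:

```python
# Two separate loan files, each carrying its own copy of the customer's address.
# File names, customer ID, and addresses are hypothetical.
car_loan_file = [{"customer": "C1", "address": "12 Old Street"}]
home_loan_file = [{"customer": "C1", "address": "12 Old Street"}]

# The address is changed in only one of the files.
car_loan_file[0]["address"] = "99 New Road"

# The two copies now disagree about the same customer.
addresses = {car_loan_file[0]["address"], home_loan_file[0]["address"]}
inconsistent = len(addresses) > 1
print(inconsistent)
```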
Data isolation: File organization creates silos of data, making it extremely difficult to access data from different applications.
E.g., a manager who wants to know which products customers are buying & which customers owe more than $1000 would not be able to get this from a data file system.
To get the result, the manager would have to filter & integrate data manually from multiple files.
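The manual filter-and-integrate step might look like the sketch below. The two files, their field names, and the amounts are hypothetical; the point is only that the matching logic has to be written by hand for every such question.

```python
# Two isolated files maintained by different applications (hypothetical data).
orders_file = [
    {"customer": "C1", "product": "Laptop"},
    {"customer": "C2", "product": "Phone"},
]
accounts_file = [
    {"customer": "C1", "balance_owed": 1500},
    {"customer": "C2", "balance_owed": 200},
]

# Integrate by hand: index one file by customer, then filter the other against it.
owed = {row["customer"]: row["balance_owed"] for row in accounts_file}
result = [
    (order["customer"], order["product"])
    for order in orders_file
    if owed.get(order["customer"], 0) > 1000
]
print(result)
```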
Data security
Lack of data integrity
Data concurrency: one application may update a record while another accesses the same record simultaneously. To prevent concurrency problems, applications & data need to be independent of one another.
In a file environment, applications & data are dependent on each other.
These problems led to the development of databases & DBMSs.
Organizing Data in a Traditional File Environment
Program-data dependence
Lack of flexibility
Poor security
Lack of data sharing and availability
DATABASE MANAGEMENT
SYSTEMS
A DBMS is a program that provides access to databases.
A DBMS permits an organization to centralize data, manage them efficiently & provide access to the stored data by application programs.
DBMSs range in size & capabilities from the simple Microsoft Access to full-featured Oracle & DB2 solutions.
A DBMS acts as an interface between application programs & physical data files.
It provides users with tools to add, delete, maintain, display, print, search, select, sort & update data.
These tools range from easy-to-use natural language interfaces to complex programming languages used for developing sophisticated database applications.
ADVANTAGES & CAPABILITIES OF
DBMS
Permanence
Querying
Concurrency (transactions & locking)
Backup & replication
Rule enforcement
Security
Computation
Change & access logging
Automated optimization
Companies use DBMSs in a broad range of information systems.
Some DBMSs, like Access, can be loaded onto a single user's computer & accessed in an ad hoc manner to support individual decision making.
Others, like IBM's DB2, are located on multiple interconnected mainframe computers to support large-scale transaction processing systems (TPSs) such as order entry & inventory control systems.
DBMSs like Oracle 11g are interconnected throughout an organization's LAN (a private network owned & managed by the organization), giving departments access to corporate data.
A DBMS enables different users to share data & processing resources.
How can a single unified database meet the different requirements of so many users?
How can a single database be structured so that sales personnel can view customer, inventory & production maintenance data while the HR department maintains restricted access to private personnel data?
The Database Approach to Data Management
FIGURE 6-3 A single human resources database provides many different views of data, depending on the
information requirements of the user. Illustrated here are two possible views, one of interest
to a benefits specialist and one of interest to a member of the company's payroll
department.
A DBMS provides two views of data: a PHYSICAL VIEW & a LOGICAL VIEW.
PHYSICAL VIEW: deals with the actual physical arrangement & location of data in the direct access storage devices (DASDs).
Database specialists use the physical view to configure storage & processing resources.
The business user is interested in using the information, not in how it is stored.
The LOGICAL or USER's view of data is the one that is meaningful to the user.
A DBMS can provide many different logical views of the same data.
This feature allows users to see data from a business-related perspective rather than from a technical viewpoint.
The way in which you see the data (logical view) can vary, while the underlying storage of the data (physical view) stays the same.
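One common way to offer several logical views over one stored table is an SQL view. The sketch below uses Python's built-in `sqlite3` module and follows the human resources example; the table, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_id      INTEGER PRIMARY KEY,
    name        TEXT,
    salary      REAL,
    health_plan TEXT
);
INSERT INTO employee VALUES (1, 'Ann', 52000, 'Plan A');
INSERT INTO employee VALUES (2, 'Raj', 61000, 'Plan B');

-- Two logical views defined over the same stored table.
CREATE VIEW benefits_view AS SELECT emp_id, name, health_plan FROM employee;
CREATE VIEW payroll_view  AS SELECT emp_id, name, salary      FROM employee;
""")

# Each user group queries only its own view; the stored data is not duplicated.
benefits_rows = conn.execute("SELECT * FROM benefits_view ORDER BY emp_id").fetchall()
payroll_rows = conn.execute("SELECT * FROM payroll_view ORDER BY emp_id").fetchall()
print(benefits_rows)
print(payroll_rows)
```

The benefits view never exposes salary and the payroll view never exposes the health plan, yet both read from the single `employee` table.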
Relational DBMS
Represents data as two-dimensional tables called relations (also referred to as files)
Each table contains data on an entity and its attributes
Table: grid of columns and rows
Rows (tuples): Records for different entities
Fields (columns): Each represents an attribute of the entity
Key field: Field used to uniquely identify each record
Primary key: The key field chosen as the unique identifier for the records in a table
Foreign key: Primary key of one table used in a second table as a look-up field to identify records from the original table
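A small sketch of primary and foreign keys, using Python's built-in `sqlite3` module and the SUPPLIER and PART entities. The column names and the sample rows are assumptions made for illustration. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""CREATE TABLE supplier (
    supplier_number INTEGER PRIMARY KEY,
    supplier_name   TEXT)""")
conn.execute("""CREATE TABLE part (
    part_number     INTEGER PRIMARY KEY,
    part_name       TEXT,
    supplier_number INTEGER REFERENCES supplier(supplier_number))""")

# Hypothetical sample rows.
conn.execute("INSERT INTO supplier VALUES (8259, 'Hypothetical Supplier Co')")
conn.execute("INSERT INTO part VALUES (137, 'Door latch', 8259)")

# The foreign key acts as a look-up field into the SUPPLIER table.
joined = conn.execute("""
    SELECT part.part_name, supplier.supplier_name
    FROM part JOIN supplier ON part.supplier_number = supplier.supplier_number
""").fetchall()

# A part that points at a supplier that does not exist is rejected.
try:
    conn.execute("INSERT INTO part VALUES (138, 'Hinge', 9999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(joined, rejected)
```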
FIGURE 6-4 A relational database organizes data in the form of two-dimensional tables. Illustrated here
are tables for the entities SUPPLIER and PART showing how they represent each entity and
its attributes. Supplier Number is a primary key for the SUPPLIER table and a foreign key for
the PART table.
TYPES OF DATABASES
Data flow into companies from many sources: clickstream data from the web, e-commerce applications, POS terminals, and filtered data from CRM, supply chain, and ERP applications.
Two basic types of databases:
Centralized databases
Distributed databases
CENTRALIZED DATABASES
Stores all related files in one physical location.
For decades the main database platform consisted of centralized database files on large mainframes, because of the enormous capital & operating costs associated with alternative systems.
Application processing is shared between numerous clients & a server, e.g., DB2 from IBM & Oracle server from Oracle.
Advantages: Multiple processors are applied to the overall task, with back-end & front-end processing done in parallel, so response time & throughput are improved.
Different clients are able to access the same server machine.
Files are kept more consistent with one another: they are physically held at one location, and file changes are made in a supervised & orderly fashion.
Files are not accessible except via the centralized host computer, where they can be protected more easily from unauthorized access or modification.
Drawback: vulnerable to a single point of failure.
Disadvantages: Since all data is stored in one place, a fault at that location may corrupt the whole database.
The server must be able to grow in power & capacity to accommodate more clients; otherwise it will become a bottleneck.
Centralised computing is more complex because proper processing requires close communication between clients & server; hence specialised & expensive tools are necessary.
DISTRIBUTED DATABASE
A distributed database stores complete copies of a database, or portions of it, in more than one location.
Two types:
Replicated
Partitioned
PARTITIONED DATABASE
Partitioned: Separate locations store different parts of the database (each location holds the part that meets its users' local needs).
Partitioned databases provide the response speed of localized files without the need to replicate all changes in multiple locations.
Data in the files can be entered more quickly & kept more accurate by the users immediately responsible for the data.
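A minimal sketch of partitioning, assuming two hypothetical sites ("north" and "south") and a simple directory that records which site holds each customer. Real systems choose partitions by more elaborate rules; this only shows that a change touches a single site.

```python
# Each site stores only the rows its local users need (hypothetical sites and rows).
partitions = {
    "north": {"C1": {"name": "Ann", "city": "Delhi"}},
    "south": {"C2": {"name": "Raj", "city": "Chennai"}},
}
site_of = {"C1": "north", "C2": "south"}  # directory: which site holds which customer


def lookup(customer_id):
    """Return the owning site and the row stored there."""
    site = site_of[customer_id]
    return site, partitions[site][customer_id]


def update_city(customer_id, city):
    """Apply the change at the owning site only; return the sites that were touched."""
    site = site_of[customer_id]
    partitions[site][customer_id]["city"] = city
    return [site]


touched = update_city("C2", "Madurai")
site, row = lookup("C2")
print(touched, site, row)
```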
DISADVANTAGE OF
PARTITIONED DATABASE
Widespread access to potentially sensitive
company data can significantly increase
security problems.
REPLICATED DATABASE
Stores complete copies of the entire database in multiple locations.
This arrangement provides a backup in case of a failure or problem.
Improves response time, as the data is closer to the users.
Expensive to set up & maintain: every replica must be updated as records are added to, modified in & deleted from any of the databases.
Updates may be done at the end of the day or on another schedule as determined by business needs.
If updates are not done, the various databases will hold conflicting data.
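The scheduled update can be sketched as a queue of pending changes that is applied to the replica later. The single replica, the customer ID, and the addresses are hypothetical, and real replication also has to handle changes made at more than one site.

```python
# A primary copy, one replica, and a list of changes waiting for the next sync.
primary = {"C1": "12 Old Street"}
replica = dict(primary)
pending = []


def update(customer_id, address):
    """Change the primary copy now and queue the change for the replica."""
    primary[customer_id] = address
    pending.append((customer_id, address))


def synchronize():
    """Apply all queued changes to the replica, e.g. at the end of the day."""
    for customer_id, address in pending:
        replica[customer_id] = address
    pending.clear()


update("C1", "99 New Road")
conflict_before = primary != replica   # copies disagree until the sync runs
synchronize()
conflict_after = primary != replica    # copies agree again
print(conflict_before, conflict_after)
```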
DISTRIBUTED DATABASE
Distributing databases: Storing a database in more than one place
Advantages: The system continues to function at some reduced level even when a component fails
Speeds up query processing
Reduces communication costs
Easier & more economical to add a local computer
Disadvantages: Complex software is required
The various sites must exchange messages & perform additional calculations to ensure proper coordination among the sites
NORMALISATION
Database normalization is the process of
organizing the fields and tables of a
relational database to minimize redundancy
and dependency.
Normalization usually involves dividing large
tables into smaller (and less redundant) tables
and defining relationships between them.
The objective is to isolate data so that
additions, deletions, and modifications of a
field can be made in just one table and then
propagated through the rest of the database
using the defined relationships.
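A minimal sketch of that idea, assuming a hypothetical flat order table in which the customer's address is repeated on every order row. Splitting it into a customer table and an order table means the address is changed in one place only.

```python
# One large table: the customer's address is repeated on every order (hypothetical data).
flat = [
    {"order_id": 1, "customer_id": "C1", "address": "12 Old Street", "item": "Laptop"},
    {"order_id": 2, "customer_id": "C1", "address": "12 Old Street", "item": "Mouse"},
]

# Divide it into two smaller tables linked by customer_id.
customers = {row["customer_id"]: {"address": row["address"]} for row in flat}
orders = [
    {"order_id": row["order_id"], "customer_id": row["customer_id"], "item": row["item"]}
    for row in flat
]

# The address is now modified in just one table ...
customers["C1"]["address"] = "99 New Road"

# ... and every order sees the new value through the relationship.
resolved = [customers[order["customer_id"]]["address"] for order in orders]
print(resolved)
```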
Rules of normalisation are referred to as
normal forms:
1NF
2NF
3NF
FIRST NORMAL FORM(1NF)
A table is in 1NF if it contains no repeating groups, i.e., no two fields store the same kind of information in a single table.
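As an illustration, a hypothetical student table with fields `phone1` and `phone2` has a repeating group, because both fields store the same kind of information. One way to remove it is to move the phone numbers into their own table, one row per number:

```python
# Repeating group: phone1 and phone2 hold the same kind of information (hypothetical data).
unnormalized = [
    {"student_id": 1, "name": "Ann", "phone1": "111", "phone2": "222"},
]

# 1NF: one table for students, one row per phone number in a separate table.
students = [{"student_id": row["student_id"], "name": row["name"]} for row in unnormalized]
phones = [
    {"student_id": row["student_id"], "phone": row[key]}
    for row in unnormalized
    for key in ("phone1", "phone2")
    if row.get(key)          # skip empty phone slots
]
print(students)
print(phones)
```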
SECOND NORMAL FORM
Depends on the concepts of primary key & functional dependency
A table is in 2NF if it is in 1NF & every non-key attribute is fully functionally dependent on the primary key.
Thus the relation is in 1NF with no repeating groups, & all non-key attributes must depend on the whole key, not just some part of it.
Functionally dependent means that, given a primary key value, the value of any other attribute can be retrieved.
If the student_ID is given, the value of student name can be obtained.
Hence student name is functionally dependent on student_ID.
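The student_ID example can be checked mechanically on sample rows. The sketch below assumes a hypothetical enrolment table whose primary key is the pair (student_id, course_id); the helper `functionally_determines` is an illustrative name. A check on sample data can only show that a dependency is consistent with those rows, not that it holds in general.

```python
def functionally_determines(rows, determinant, dependent):
    """Return True if equal determinant values always come with the same dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in determinant)
        value = row[dependent]
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical enrolment rows; primary key is (student_id, course_id).
enrolment = [
    {"student_id": 1, "course_id": "DB", "student_name": "Ann", "grade": "A"},
    {"student_id": 1, "course_id": "OS", "student_name": "Ann", "grade": "B"},
    {"student_id": 2, "course_id": "DB", "student_name": "Raj", "grade": "B"},
]

# student_name depends on student_id alone, i.e. on only part of the key.
name_depends_on_id = functionally_determines(enrolment, ["student_id"], "student_name")
# grade needs the whole key: student_id alone does not determine it.
grade_depends_on_id = functionally_determines(enrolment, ["student_id"], "grade")
grade_depends_on_key = functionally_determines(enrolment, ["student_id", "course_id"], "grade")
print(name_depends_on_id, grade_depends_on_id, grade_depends_on_key)
```

Because `student_name` depends on only part of the composite key, this table would not be in 2NF; moving the name into a separate student table keyed by `student_id` would remove the partial dependency.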
THIRD NORMAL FORM
A table is said to be in 3NF if all the non-key fields are independent of one another, i.e., no non-key field depends on another non-key field.
3NF removes redundant data by moving out fields that do not depend directly on the primary key.
Equivalently, a table is in 3NF if it is in 2NF & every non-key field is non-transitively dependent on the primary key.
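A minimal sketch of removing a transitive dependency, assuming a hypothetical student table in which `dept_name` depends on `dept_id`, which in turn depends on the key `student_id`:

```python
# student_id -> dept_id -> dept_name: dept_name depends on the key only transitively.
students_2nf = [
    {"student_id": 1, "name": "Ann", "dept_id": "CS", "dept_name": "Computer Science"},
    {"student_id": 2, "name": "Raj", "dept_id": "CS", "dept_name": "Computer Science"},
    {"student_id": 3, "name": "Mia", "dept_id": "EE", "dept_name": "Electrical Engineering"},
]

# 3NF: keep only fields that depend directly on student_id in the student table ...
students = [
    {column: row[column] for column in ("student_id", "name", "dept_id")}
    for row in students_2nf
]
# ... and store each department name once, keyed by dept_id.
departments = {row["dept_id"]: row["dept_name"] for row in students_2nf}
print(students)
print(departments)
```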