Introduction To Database: Edited: Wei-Pang Yang, IM - NDHU
CHAPTER 1
INTRODUCTION
Database-System Applications
Purpose of Database Systems
View of Data
Database Languages
Relational Databases
Database Design
Data Storage and Querying
Transaction Management
Database Architecture
Database Users and Administrators
Edited: Wei-Pang Yang, IM.NDHU Source: Database System Concepts, Silberschatz etc. 2006 1-1
Database System: Introduction
Database Management System (DBMS)
Contains a large body of information
Goal of a DBMS:
provides a way to store and retrieve database information that is both
• convenient and
• efficient.
Functions of DBMS: Management of Data (MOD)
Defining structures for data storage
1.2 Purpose of Database Systems
In the early days, database applications were built on top of file
systems
Drawbacks of using file systems to store data:
Data redundancy and inconsistency
• Multiple file formats, duplication of information in different
files
Difficulty in accessing data
• Need to write a new program to carry out each new task
Data isolation — multiple files and formats
Integrity problems
• Integrity constraints (e.g. account balance > 0) become part
of program code
• Hard to add new constraints or change existing ones
Drawbacks of using file systems (cont.)
Atomicity of updates (indivisibility)
• Failures may leave database in an inconsistent state with
partial updates carried out
• E.g. transfer of funds from one account to another should
either complete or not happen at all
Concurrent access by multiple users
• Concurrent access is needed for performance
• Uncontrolled concurrent accesses can lead to inconsistencies
E.g. two people reading a balance and updating it at the same
time
Security problems
Solution: database systems offer solutions to all of the above problems
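The concurrent-access drawback listed above can be sketched in a few lines of Python. This is a deterministic simulation of one bad interleaving, not real concurrency; the variable names and amounts are made up for illustration.

```python
# Two clerks read the same balance, then each writes back a value
# computed from their own stale copy (an uncontrolled interleaving).
balance = 100

read_a = balance          # clerk A reads 100
read_b = balance          # clerk B reads 100 before A has written

balance = read_a - 50     # A withdraws 50 and writes 50
balance = read_b - 30     # B withdraws 30 and overwrites A's result

lost_update_balance = balance     # A's withdrawal has been lost

# Serial execution: B reads only after A has written.
balance = 100
balance = balance - 50
balance = balance - 30
serial_balance = balance          # both withdrawals are applied

print(lost_update_balance, serial_balance)
```

A DBMS avoids the first outcome by controlling how concurrent updates interleave, rather than leaving it to each application program.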
View of Data -1: Three Levels
An architecture for a database system
View of Data -2: Three Levels
Users (A1, A2, B1, B2, B3): each works in a host language (e.g. C, C++) + DSL
DSL (Data Sub. Language), e.g. SQL
External level: External View A and External View B, each defined by an external schema
External/conceptual mapping A and External/conceptual mapping B
Conceptual level: Conceptual View, defined by the conceptual schema
Conceptual/internal mapping
Internal level: Stored database (Internal View), defined by the storage structure definition (internal schema)
Database management system (DBMS): sits between the users and the stored database, with a Dictionary (e.g. system catalog)
DBA: builds and maintains schemas and mappings
1.3.2 Instances and Schemas
Schema – the logical structure of the database
e.g., the database consists of information about a set of customers
Instances and Schemas (cont.)
Instance – the actual content of the database at a particular point in time
Analogous to the value of a variable
Example of a schema:
create table account
(account-number char(10),
balance integer)
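The schema/instance distinction can be made concrete with a small sketch using Python's built-in sqlite3 module. The sample rows are invented, and the column is spelled `account_number` (an assumption) because a hyphen is not valid in an unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: declared once; it describes the structure, not the content.
conn.execute("create table account (account_number char(10), balance integer)")

def instance(conn):
    """Return the current content (instance) of the account table."""
    return conn.execute(
        "select account_number, balance from account order by account_number"
    ).fetchall()

instance_t0 = instance(conn)                      # empty at first
conn.execute("insert into account values ('A-101', 500)")
conn.execute("insert into account values ('A-215', 700)")
instance_t1 = instance(conn)                      # two rows
conn.execute("update account set balance = 650 where account_number = 'A-101'")
instance_t2 = instance(conn)                      # same schema, new values

# The schema itself did not change while the instance did.
columns = [row[1] for row in conn.execute("pragma table_info(account)")]
print(columns, instance_t0, instance_t1, instance_t2)
```

The three snapshots differ while `columns` stays the same, which is the sense in which an instance is like the value of a variable at a point in time.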
1.3.3 Data Models
A collection of conceptual tools for describing
data (entities, objects)
data relationships
data semantics
data consistency constraints
Categories of Data Models:
Entity-Relationship model
Relational model
Object-oriented model
Semi-structured data models
• Extensible Markup Language (XML)
Older models:
• Network model and
• Hierarchical model
1.4 Database Languages
Data Definition Language (DDL):
Specification notation for defining the database schema
E.g.
create table account
(account-number char(10),
balance integer)
Data Manipulation Language (DML)
To express database queries or updates
E.g.
select account-number
from account
where balance > 1000
SQL (Structured Query Language): a single language for both DDL and DML
1.4.1 Data-Manipulation Language (DML)
Language for accessing and manipulating the data organized
by the appropriate data model
DML is also known as a query language
For retrieval, insertion, deletion, modification (update)
Two classes of languages
Procedural DMLs – user specifies what data is required and
how to get those data
• E.g. … in C
Declarative DMLs (Nonprocedural DMLs) – user specifies
what data is required without specifying how to get those data
• E.g. SQL
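The two classes of DML can be contrasted with a minimal sketch in Python. The table contents and the underscore spelling `account_number` are assumptions for illustration; the procedural half is ordinary program code standing in for a record-at-a-time language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, balance integer)")
rows = [("A-101", 500), ("A-215", 1700), ("A-305", 2350)]
conn.executemany("insert into account values (?, ?)", rows)

# Procedural style: the program spells out HOW to find the data
# (visit every record, test it, collect the matches).
procedural = []
for account_number, balance in rows:
    if balance > 1000:
        procedural.append(account_number)

# Declarative style: the query states WHAT is wanted; the DBMS decides how.
declarative = [
    r[0]
    for r in conn.execute(
        "select account_number from account where balance > 1000 "
        "order by account_number"
    )
]
print(procedural, declarative)
```

Both produce the same answer; only the declarative version leaves the access strategy to the system.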
1.4.2 Data-Definition Language (DDL)
Specification notation for defining the database schema
E.g.
create table branch
(branch-name char(15),
branch-city char(30),
assets integer,
primary key (branch-name),
….
check (assets >= 0))
Define:
• Attribute names
• Data types
• Consistency constraints (integrity constraints)
Domain constraints: e.g. assets are integer type
Assertions: e.g. assets >= 0
Authorization: for different users
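As a hedged sketch of how such declarations are enforced, the branch table can be created in SQLite through Python's sqlite3 module. Underscores replace the hyphens in the column names, the elided part of the slide's definition is simply left out, and the sample rows and the helper `try_insert` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table branch (
        branch_name char(15),
        branch_city char(30),
        assets      integer,
        primary key (branch_name),
        check (assets >= 0)
    )
    """
)
conn.execute("insert into branch values ('Downtown', 'Brooklyn', 9000000)")

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("insert into branch values (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

accepted_valid = try_insert(("Perryridge", "Horseneck", 1700000))
rejected_negative = not try_insert(("Mianus", "Horseneck", -1))      # violates check
rejected_duplicate = not try_insert(("Downtown", "Brooklyn", 100))   # violates primary key
print(accepted_valid, rejected_negative, rejected_duplicate)
```

The constraints live in the schema, so every program that inserts into branch is checked the same way without repeating the rule in application code.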
Data Dictionary and Storage Definition
Data Dictionary:
DDL compiler generates a set of tables stored in a data dictionary
Data dictionary contains metadata (i.e., data about data)
• Database schema
• System tables
• Users
•…
Database system consults the data dictionary before reading or
modifying actual data.
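One concrete data dictionary is SQLite's catalog table `sqlite_master`, which can be queried like any other table. The sketch below uses it only as an example; other systems name and organize their catalogs differently, and the two tables created here are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number char(10), balance integer)")
conn.execute("create table branch (branch_name char(15), assets integer)")

# sqlite_master stores metadata (table names and their DDL), not user data.
catalog = conn.execute(
    "select name, sql from sqlite_master where type = 'table' order by name"
).fetchall()
table_names = [name for name, _ in catalog]

for name, ddl in catalog:
    print(name, "->", ddl)
```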
1.5 Relational Databases
Definition 1: A Relational Database is a database that is perceived
by the users as a collection of time-varying, normalized relations
(tables).
• Perceived by the users: the relational model applies at the view and
logical levels.
• Time-varying: the set of tuples changes with time.
• Normalized: contains no repeating groups (only atomic values).
The relational model represents a database system at a level of
abstraction that is removed from the details of the underlying machine,
like a high-level language.
[Figure: analogy between programming environments (C, PASCAL, PL/1 above assembler above machine) and DBMS environments (Relational DBMS, Relational Data Model).]
1.5.1 Tables
Definition 2: A Relational Database is a database that is perceived by
its users as a collection of tables (and nothing but tables).
1.5.2 Data-Manipulation Language
SQL (Structured Query Language) : widely used
E.g. find the name of the customer with customer-id 192-83-7465
select customer.customer-name
from customer
where customer.customer-id = '192-83-7465'
Output:
customer-name
Johnson
SQL (Structured Query Language)
E.g. find the balances of all accounts held by the customer with
customer-id 192-83-7465
select account.balance
from depositor, account
where depositor.customer-id = '192-83-7465' and
depositor.account-number = account.account-number
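Both example queries can be tried against a tiny in-memory database. This is a sketch only: the rows are invented sample data, the table layouts are pared down to the columns the queries use, and underscores replace the hyphens in identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table customer  (customer_id text primary key, customer_name text);
    create table account   (account_number text primary key, balance integer);
    create table depositor (customer_id text, account_number text);

    insert into customer  values ('192-83-7465', 'Johnson');
    insert into customer  values ('019-28-3746', 'Smith');
    insert into account   values ('A-101', 500);
    insert into account   values ('A-201', 900);
    insert into account   values ('A-215', 700);
    insert into depositor values ('192-83-7465', 'A-101');
    insert into depositor values ('192-83-7465', 'A-201');
    insert into depositor values ('019-28-3746', 'A-215');
    """
)

# Query 1: the name of the customer with a given customer id.
names = conn.execute(
    "select customer.customer_name from customer "
    "where customer.customer_id = ?",
    ("192-83-7465",),
).fetchall()

# Query 2: balances of all accounts held by that customer (a join).
balances = conn.execute(
    "select account.balance from depositor, account "
    "where depositor.customer_id = ? "
    "and depositor.account_number = account.account_number "
    "order by account.balance",
    ("192-83-7465",),
).fetchall()

print(names, balances)
```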
1.5.3 Data-Definition Language
SQL provides DDL to define database schema:
Tables
• E.g.
create table account
(account-number char(10),
balance integer)
Referential Integrity Constraint
create table account
(account-number char(10),
branch-name char(15),
balance integer,
primary key (account-number))
[Figure: sample tables 3. account (deposit accounts) and 4. depositor (depositors), linked by a "references" arrow.]
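A referential integrity constraint can be sketched as follows, assuming that depositor.account_number must refer to an existing account. The depositor definition, the sample rows, and the underscore spellings are assumptions for illustration; note that SQLite enforces foreign keys only after `pragma foreign_keys = on`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")   # off by default in SQLite
conn.executescript(
    """
    create table account (
        account_number char(10),
        branch_name    char(15),
        balance        integer,
        primary key (account_number)
    );
    create table depositor (
        customer_name  char(20),
        account_number char(10),
        foreign key (account_number) references account (account_number)
    );
    insert into account values ('A-101', 'Downtown', 500);
    """
)

# Refers to an existing account: accepted.
conn.execute("insert into depositor values ('Johnson', 'A-101')")

# Refers to an account that does not exist: rejected by the DBMS.
try:
    conn.execute("insert into depositor values ('Smith', 'A-999')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

depositors = conn.execute(
    "select customer_name, account_number from depositor"
).fetchall()
print(dangling_rejected, depositors)
```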
1.5.4 Data Access from Application Programs
Application programs generally access databases through one of:
Language extensions to allow embedded SQL
Application program interface (e.g. ODBC/JDBC) which allows
SQL queries to be sent to a database
ODBC: Open Database Connectivity for C
JDBC: Java Database Connectivity for Java language
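ODBC and JDBC are interfaces for C and Java. As an analogous sketch only, Python's DB-API (here the built-in sqlite3 module) follows the same pattern: open a connection, send SQL text with parameters, fetch the result rows, close the connection. The function name `balances_above` and the data are hypothetical.

```python
import sqlite3

def balances_above(conn, threshold):
    """Send a parameterized SQL query through the API and return account numbers."""
    cursor = conn.cursor()
    cursor.execute(
        "select account_number from account where balance > ? "
        "order by account_number",
        (threshold,),
    )
    result = [row[0] for row in cursor.fetchall()]
    cursor.close()
    return result

conn = sqlite3.connect(":memory:")          # 1. open a connection
conn.execute("create table account (account_number text, balance integer)")
conn.executemany(
    "insert into account values (?, ?)",
    [("A-101", 500), ("A-215", 1700), ("A-305", 2350)],
)
rich_accounts = balances_above(conn, 1000)  # 2. send SQL, 3. fetch rows
conn.close()                                # 4. release the connection
print(rich_accounts)
```

Passing the threshold as a parameter, instead of pasting it into the SQL string, keeps the query text fixed and lets the API handle quoting.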
1.6 Database Design
Database Design - The process of designing the general structure of
the database:
Logical Design
Physical Design
1.6.1 Design Process
Phase I
Specification of user requirement (with domain experts)
Phase II
Conceptual design (ch. 6)
Design tables
Normalization (ch. 7)
Phase III
Specification of functional requirements
Phase IV
Implementation
Logical-design
1.6.2 Database Design for Banking
Banking Database: consists of 6 relations:
1. branch (branch-name, branch-city, assets)
2. customer (customer-name, customer-street, customer-city)
3. account (account-number, branch-name, balance)
4. loan (loan-number, branch-name, amount)
5. depositor (customer-name, account-number)
6. borrower (customer-name, loan-number)
Example: Banking Database
[Figure: sample tables 1. branch (branch offices), 2. customer (depositors and borrowers), 3. depositor (depositors), and account.]
1.6.3 Entity-Relationship Model (ch.6)
Example: Schema in the Entity-Relationship model
[Figure: E-R diagram with the entity sets customer and account (deposit account).]
Entity Relationship Model (cont.)
E-R model of real world
Entities (objects)
• E.g. customers, accounts, bank branches
Relationships between entities
• E.g. Account A-101 is held by customer Johnson
• E.g. Relationship set depositor associates customers with
accounts
Widely used for database design
Database design in the E-R model is usually converted to a design in the
relational model
1.6.4 Normalization
Definition: A Relational Database is a database that is perceived by
its users as a collection of tables (and nothing but tables).
Problem of Normalization
<e.g.>
S1, Smith, 20, London, P1, Nut, Red, 12, London, 300
S1, Smith, 20, London, P2, Bolt, Green, 17, Paris, 200
.
.
S4, Clark, 20, London, P5, Cam, Blue, 12, Paris, 400
Normalization
Option 1: three tables
S (S#, SNAME, STATUS, CITY)   e.g. (S1, ., ., London)
P (P#, ..., ..., ...)
SP (S#, P#, QTY)
or
Option 2: three tables
S' (S#, SNAME, STATUS)        e.g. (S1, Smith, .)
P (P#, ..., ..., ...)
SP' (S#, CITY, P#, QTY)       e.g. (S1, London, P1, 300), (S1, London, P2, 200)
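The effect of the first decomposition can be sketched in plain Python using the three rows shown on the slide. The part column names (PNAME, COLOR, WEIGHT, PCITY) are assumptions, since the slide elides them.

```python
# Flat rows: (S#, SNAME, STATUS, CITY, P#, PNAME, COLOR, WEIGHT, PCITY, QTY)
flat = [
    ("S1", "Smith", 20, "London", "P1", "Nut", "Red", 12, "London", 300),
    ("S1", "Smith", 20, "London", "P2", "Bolt", "Green", 17, "Paris", 200),
    ("S4", "Clark", 20, "London", "P5", "Cam", "Blue", 12, "Paris", 400),
]

# Decompose into S, P and SP; sets drop the repeated supplier and part facts.
S = sorted({row[0:4] for row in flat})
P = sorted({row[4:9] for row in flat})
SP = sorted({(row[0], row[4], row[9]) for row in flat})

# Joining SP with S and P on their keys rebuilds the original rows,
# so no information was lost by splitting the table.
s_by_key = {s[0]: s for s in S}
p_by_key = {p[0]: p for p in P}
rebuilt = sorted(s_by_key[s_no] + p_by_key[p_no] + (qty,) for s_no, p_no, qty in SP)

print(len(S), len(P), len(SP), rebuilt == sorted(flat))
```

In the flat form, supplier S1's name and city are stored once per shipment; after decomposition they are stored once, so a change of city touches a single row.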
1.7 Object-Based and Semistructured Databases
Extend the relational data model
by including object orientation and
constructs to deal with added data types. (video, image, …)
Allow attributes of tuples to have complex types, including
non-atomic values such as nested relations. (repeated data, …)
Preserve relational foundations,
in particular the declarative access to data, while extending
modeling power.
1.7.2 Semistructured Data Models
XML (Extensible Markup Language)
Defined by the World Wide Web Consortium (W3C)
1.8 Data Storage and Querying
Components of Database System
Query Processor
• Helps to simplify access to data
• High-level view
• Users are not burdened unnecessarily with the physical
details
Storage Manager
• Databases require a large amount of space
• Cannot be stored entirely in main memory
• Disk speed is slower
• Minimize the need to move data
between disk and main memory
Goal of a DBMS: provides a way to store and retrieve database
information that is both convenient and efficient
Overall System Structure
[Figure: a query enters the DBMS and passes through the Query Processor (Language Processor, Optimizer, Operation Processor) and the Storage Manager (Access Method, File Manager) to reach the database.]
Storage manager: a program module
that provides the interface between the low-level data stored and the
application programs and queries submitted to the system
Indices: provide fast access to data items that hold particular values
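The role of an index can be observed in SQLite by asking for the query plan before and after creating one. This is a sketch under assumptions: the table, the data, and the index name `idx_account_balance` are invented, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, balance integer)")
conn.executemany(
    "insert into account values (?, ?)",
    [(f"A-{i}", i * 10) for i in range(1000)],
)

query = "select account_number from account where balance = 5000"

def plan(conn, sql):
    """Return SQLite's textual description of how it would run the query."""
    return " ".join(row[-1] for row in conn.execute("explain query plan " + sql))

plan_without_index = plan(conn, query)   # a scan of the whole table
conn.execute("create index idx_account_balance on account (balance)")
plan_with_index = plan(conn, query)      # a search that uses the index

print(plan_without_index)
print(plan_with_index)
```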
Storage Management (cont.)
Components of Storage manager:
Authorization and Integrity Manager
DML Compiler
Translates DML statements into an evaluation plan (or several
evaluation plans) consisting of low-level instructions
Query Optimization: picks the lowest cost evaluation plan
Flow of Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Query Optimizer
Alternative ways of evaluating a given query
Equivalent expressions
Different algorithms for each operation
Cost difference between a good and a bad way of evaluating a query
can be enormous
Need to estimate the cost of operations
Depends critically on statistical information about relations which
the database must maintain
Need to estimate statistics for intermediate results to compute cost
of complex expressions
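Why the choice among equivalent expressions matters can be sketched with two plans for the same query, written in plain Python. The relations, their sizes, and the cost measure (pairs examined by a naive nested-loop join) are all assumptions chosen for illustration; a real optimizer estimates such costs from statistics instead of running both plans.

```python
# Hypothetical relations: customer(c_no, region) and invoice(c_no, amount).
customer = [(i, "N.Y." if i % 10 == 0 else "other") for i in range(100)]
invoice = [(i % 100, 100 * i) for i in range(300)]

# Plan A: join first, then filter the joined rows.
joined = [(c, i) for c in customer for i in invoice if c[0] == i[0]]
plan_a = [(c, i) for c, i in joined if c[1] == "N.Y." and i[1] > 10000]
plan_a_pairs = len(customer) * len(invoice)          # pairs examined by the join

# Plan B: filter each relation first, then join the smaller inputs.
ny_customers = [c for c in customer if c[1] == "N.Y."]
big_invoices = [i for i in invoice if i[1] > 10000]
plan_b = [(c, i) for c in ny_customers for i in big_invoices if c[0] == i[0]]
plan_b_pairs = len(ny_customers) * len(big_invoices)  # pairs examined by the join

same_answer = sorted(plan_a) == sorted(plan_b)
print(plan_a_pairs, plan_b_pairs, same_answer)
```

Both plans return the same rows, but the second examines far fewer pairs, which is the kind of difference the optimizer tries to predict.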
Example: A Simple Query Processing
(Supplementary)
Query in SQL :
SELECT CUSTOMER.NAME
FROM CUSTOMER, INVOICE
WHERE REGION = 'N.Y.' AND
AMOUNT > 10000 AND
CUSTOMER.C# = INVOICE.C#
DBMS: Language Processor
Internal Form : [figure]
DBMS: Operator Processor
Calls to Access Method :
OPEN SCAN on C with region index
GET next tuple
.
.
.
Storage Manager: Access Method (e.g. B-tree; Index; Hashing)
Calls to file system :
GET 10th to 25th bytes from
block #6 of file #5
Storage Manager: File System
database
1.9 Transaction Management
Transaction:
A transaction is a collection of operations that performs a single
logical function in a database application
Atomicity: all or nothing
Failure recovery manager
ensures that the database remains in a consistent (correct) state
despite failures:
• system failures (e.g., power failures and operating system
crashes)
• transaction failures.
Concurrency-control manager
controls the interaction among the concurrent transactions, to
ensure the consistency of the database.
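The all-or-nothing property can be sketched with a funds transfer in SQLite through Python's sqlite3 module. The table, the balances, the `check` constraint used to provoke a failure, and the helper names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account (account_number text primary key, "
    "balance integer check (balance >= 0))"
)
conn.executemany(
    "insert into account values (?, ?)", [("A-101", 500), ("A-215", 700)]
)
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds as one transaction: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "update account set balance = balance + ? where account_number = ?",
                (amount, dst),
            )
            conn.execute(
                "update account set balance = balance - ? where account_number = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a dict keyed by account number."""
    return dict(conn.execute("select account_number, balance from account"))

ok = transfer(conn, "A-101", "A-215", 200)        # both updates succeed
failed = transfer(conn, "A-101", "A-215", 1000)   # debit would go negative
final = balances(conn)
print(ok, failed, final)
```

In the failing transfer the credit to A-215 has already been applied when the debit is rejected; the rollback undoes it, so no partial update survives.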
1.10 Data Mining and Analysis
Data Analysis and Mining
Decision Support Systems
Data Analysis and OLAP (Online Analytical Processing)
Data Warehousing
Data Mining
Decision Support Systems
Decision-support systems
are used to make business decisions,
often based on data collected by on-line transaction
systems.
Examples of business decisions:
What items to stock?
What insurance premium to charge?
To whom to send advertisements?
Examples of data used for making decisions
Retail sales transaction details
Customer profiles (income, age, gender, etc.)
Data Mining (ch.18)
Data mining:
seeks to discover knowledge automatically in the form of statistical
rules and patterns from large databases. E.g. p.23: Young women buy cars.
is the process of semi-automatically analyzing large databases to find
useful patterns
Prediction based on past history
Predict if a credit card applicant poses a good credit risk, based on some
attributes (income, job type, age, ..) and past history
Predict if a pattern of phone calling card usage is likely to be fraudulent
Descriptive Patterns
Associations
• Find books that are often bought by “similar” customers. If a new
such customer buys one such book, suggest the others too. (library)
Associations may be used as a first step in detecting causation
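The book-suggestion example can be sketched as simple co-occurrence counting in Python. The purchase histories, the function name `suggest`, and the support threshold are hypothetical; real association mining uses more refined measures.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: one set of book titles per customer.
baskets = [
    {"Databases", "Operating Systems", "Networks"},
    {"Databases", "Operating Systems"},
    {"Databases", "Compilers"},
    {"Operating Systems", "Networks"},
]

# Count how often each pair of books is bought by the same customer.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

def suggest(book, min_support=2):
    """Suggest books bought together with `book` by at least min_support customers."""
    result = set()
    for (a, b), n in pair_counts.items():
        if n >= min_support and book in (a, b):
            result.add(b if a == book else a)
    return sorted(result)

suggestions = suggest("Databases")
print(suggestions)
```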
1.11 Database Architecture
System Structure of a Database System
Fig. 1.6 (p.25)
Application Structure
Users use the database at the site where it runs
Users use the database through a network
• Client: where remote database users work
• Server: where the database system runs
Application Architectures
[Figure: application architectures, with applications reaching the database through ODBC/JDBC.]
1.12 Database Users and Administrators
[Figure: the three-level architecture (users with host language + DSL, external views, conceptual view, stored database), showing the DBA, who builds and maintains the schemas and mappings.]
1.12.1 Database Users and User Interfaces
Application programmers
interact with system through DML calls
Specialized users
write specialized database applications that do not fit into the
traditional data-processing framework
1.12.2 Database Administrator
Database Administrator:
Coordinates all the activities of the database system;
Routine maintenance
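The Nobel Prize of Computer Science: the Turing Award (Kun-Mao Chao)
The Nobel Prize, a symbol of the highest academic honor, has been awarded
since 1901. Following the will of the Swedish inventor Alfred Nobel, it
covers five fields: physics, chemistry, physiology or medicine, literature,
and peace. A Nobel prize in economics was added in 1969.
Question: why is there no Nobel Prize in mathematics? A popular story claims
that Nobel's wife once had an affair with Mittag-Leffler, an accomplished
Swedish mathematician, and that Nobel therefore decided not to establish a
mathematics prize.
The British mathematician Alan Turing (1912-1954) never received a Nobel
Prize in his lifetime, but the Turing Award, established in memory of his
contributions to the theory of digital computation, is recognized as the
highest honor in computer science.
The Turing Award has been presented since 1966. Its recipients are
master scholars who have had a profound influence on computer science, for
example Cook, for outstanding contributions to computational complexity
theory; Ritchie, creator of the C programming language; Thompson, builder of
the Unix operating system; and Codd, a pioneer of database management systems.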
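The Nobel Prize of Computer Science: the Turing Award (cont.)
In 1936 Turing proposed a hypothetical computing device called the Turing
machine. The machine has a long storage tape divided into infinitely many
cells, each of which is blank or holds a symbol. Attached to the tape is a
read/write head that can move left or right along the cells and read, write,
or erase the cell at each move. A finite-state control uses changes of state,
together with the current position of the head, to decide these moves and
read/write actions.
This simple machine has computing power equivalent to that of today's digital
computers. In other words, anything that today's digital computers can
compute can also be carried out on a Turing machine!
Turing also proposed a method for deciding whether a computer can "think",
which is regarded as a cornerstone of artificial intelligence research.
During World War II Turing developed a machine that could break German
military codes, although the world learned of it only twenty-five years after
the war ended.
He was also an accomplished marathon runner, truly a scientist of many
talents. Sadly, he died in 1954 at the age of only 41.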