Bca Unit 1 Notes
UNIT-1:
DATA:
Data can be defined as a collection of raw facts. Raw
facts refer to a collection of numbers, characters,
images or other outputs.
In other words, data are facts and figures that are not
currently being used in a decision process and take
the form of historical records that are recorded and
filed.
Data is often viewed as the lowest level of abstraction from which information and
knowledge are derived. Data is limitless and present everywhere in the universe. We can
consider data as groups of information that represent the attributes of a variable or set of
variables. For Ex: Names, Telephone numbers and Addresses of your friends.
INFORMATION:
The word information is derived from the Latin word informare, which means "to give form to".
Hence, here we are giving some meaningful form to meaningless data.
It is the collection of processed data gathered through various means of communication. In
other words, information is the processed data on which decisions are taken and actions are
performed thereafter. Information is organized so that it can be meaningful and has some
value to the recipient. The characteristics of information are as follows:
1) Accurate: To be useful, information must be accurate at all levels. Accurate
information provides a reliable and valid representation of raw facts. The cost of
inaccurate or distorted information can be extremely high.
2) Timely: Information is appreciated only if it is available on time. If information
arrives too late, its value is diminished.
3) Complete: Without complete information, a decision maker may get a distorted view
of reality which may lead to huge losses.
4) Precise: Information should be precise, containing all the essential elements of
relevant subject areas. We should not bury important information in the stacks of large
data.
Thus, it provides power to find and evaluate the problems and make decisions effectively and
efficiently.
DATA | INFORMATION
It is raw in nature. | It is processed data.
It is not used in decision making. | Decisions are made on the basis of information.
It gives birth to information. | When absorbed gives birth to knowledge.
They are recorded and filed. | They are retrieved and processed.
They are not organized and are of no significance to business. | They are organized and are of large significance to business.
DATABASE:
A database is an organized collection of logically related data so that it can be easily managed
and updated. Database has some source from which data is derived, some degree of
interaction with events in the real world and an audience that is actively interested in the
contents of the database. A database is a structured collection of data.
E.g.: Dictionary, Student record registers.
Features of database:
1. Shared:
Data in database can be shared among different users.
2. Persistence:
Data exists permanently.
3. Security:
Data is protected from unauthorized access.
4. Non redundancy:
No duplicity of data.
5. Independent:
Data is independent at each level so that changes made at one level do not affect the
other levels.
Defining a database:
Specifying the data types, structure and constraints for data to be stored in a database.
Constructing:
Storing data on some storage device that is controlled by the DBMS.
Manipulating:
Functions like querying, updating and generating reports from database.
Sharing:
Multiple users and programs access database concurrently.
Protection:
System protection from hardware failures and security protection.
Maintenance:
Allowing system to grow with changing requirements.
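The defining, constructing and manipulating functions listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the friend table, its columns and the sample rows are hypothetical and are not part of these notes.

```python
import sqlite3

# In-memory database; the table and rows below are hypothetical examples.
conn = sqlite3.connect(":memory:")

# Defining: specify data types, structure and constraints.
conn.execute(
    "CREATE TABLE friend ("
    " name TEXT NOT NULL,"
    " phone TEXT UNIQUE,"
    " address TEXT)"
)

# Constructing: store the data under the control of the DBMS.
conn.executemany(
    "INSERT INTO friend (name, phone, address) VALUES (?, ?, ?)",
    [("Asha", "5550001", "Delhi"), ("Ravi", "5550002", "Pune")],
)

# Manipulating: update the data, then query it.
conn.execute("UPDATE friend SET address = ? WHERE name = ?", ("Mumbai", "Ravi"))
rows = conn.execute("SELECT name, address FROM friend ORDER BY name").fetchall()
print(rows)
```

Sharing, protection and maintenance are handled by the DBMS itself and are not shown in this sketch.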
CHARACTERISTICS OF DATABASES:
1. Self describing nature of the database system:
The database system contains not only the database itself but also a complete description of
the database structure and constraints. This definition is stored in the DBMS catalog and is
called metadata, as it describes the actual structure of the database.
2. Data isolation:
The structure of the DBMS files is stored in the DBMS catalog separately from the access
programs. This property is also known as program-data independence.
3. Support multiple views of data:
A database generally has many users, each of whom may require a different view of the
database. A view may be defined as a subset of the database that is derived from the database
files but is not stored separately. A multiuser DBMS generally provides this facility of
providing multiple views.
4. Data sharing and multi user transaction processing:
Since a DBMS provides multiple views at the same time, it must include concurrency
control software to ensure that several users trying to update the same data do so in a
controlled manner.
COMPONENTS OF DBMS:
Hardware: The hardware is the actual computer system used for keeping and accessing
the database. It can range from a PC to a network of computers.
Software: This includes DBMS, operating system, network software (if necessary) and
also the application programs.
Data: It is used by the organization and a description of this data is called the schema.
Procedures: These are the instructions and rules that should be applied to the design and
use of the database and DBMS.
NEED OF DATABASES:
Database and Database Management Systems (DBMS) have become essential for
managing our business, government, banks, universities and every other kind of
human endeavor.
They are a critical element of today's software industry to solve the problems of
managing huge amounts of data that are increasingly being stored.
A Database System is a central repository in an organization's information system and
is essential for supporting the organization's functions, maintaining the data for these
functions and helping users interpret the data in decision making.
Databases are used around the world in different sectors:
1.Banking:
For customer information, accounts, loans and banking transactions.
2.Airlines:
For reservations and schedule information.
3.Universities:
For student information, course registrations and grades.
4.Credit card transactions:
For purchases on credit cards and generation of monthly statements.
5.Telecommunications:
For keeping records of calls made, generating monthly bills, maintaining balances on prepaid
calling cards and storing information about the communication networks.
6.Finance:
For storing information about holdings, sales and purchase of financial instruments such as
stocks and bonds.
7.Sales:
For customer, product and purchase information.
8.Manufacturing:
For management of supply chain and for tracking production of items in factories,
inventories of items in warehouses/stores and orders for items.
9.Human Resources:
For information about employees, salaries, payroll taxes and benefits and for generation of
paychecks.
10.Web based services:
For taking web users' feedback, responses, resource sharing etc.
ADVANTAGES OF DBMS:
Data independence:
It provides an abstract view of the data that hides the details of data representation and
storage. The data should be such that the changes made to it at one level should not
affect other levels.
Data Access:
It provides us with a fast and efficient data access. For eg: if a bank officer wants to
know the number of customers whose a/c balance is Rs 1000 or more, he can simply
make a query and he will be provided with it.
Data integrity:
The data values stored in the database must satisfy certain types of consistency
constraints. For eg: the balance of a bank a/c may never fall below Rs 500.
Data security:
Not every type of database user should be allowed to access all the data. For eg: in a
college management system, students should not be allowed to access faculty details.
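The data access and data integrity examples above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the account table, its column names and the sample balances are assumptions made for illustration. Only the Rs 1000 query and the Rs 500 minimum balance come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data integrity: the CHECK constraint keeps every balance at Rs 500 or more.
# Table and column names are hypothetical.
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " holder TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 500))"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "Asha", 1500), (2, "Ravi", 800), (3, "Meena", 1000)],
)

# Data access: number of customers whose balance is Rs 1000 or more.
count = conn.execute(
    "SELECT COUNT(*) FROM account WHERE balance >= 1000"
).fetchone()[0]

# An update that would break the constraint is rejected by the DBMS.
try:
    conn.execute("UPDATE account SET balance = 100 WHERE acc_no = 2")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(count, rejected)
```

The point of the sketch is that the rule is stated once in the schema and enforced by the DBMS, rather than being re-checked in every application program.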
LIMITATIONS OF DBMS:
Although there are many advantages of DBMS, the DBMS may also have some minor
disadvantages. These are:
1.Cost of Hardware & Software:
A processor with high speed of data processing and memory of large size is required to run
the DBMS software. It means that you have to upgrade the hardware used for file-based
system. Similarly, DBMS software is also very costly.
2. Cost of Data Conversion:
When a computer file-based system is replaced with a database system, the data stored into
data file must be converted to database file. It is very difficult and costly method to convert
data of data files into database. You have to hire database and system designers along with
application programmers. Alternatively, you have to take the services of some software
house. So a lot of money has to be paid for developing software.
3. Cost of Staff Training:
Most DBMS are often complex systems so the training for users to use the DBMS is
required. Training is required at all levels, including programming, application development,
and database administration. The organization has to pay a large amount for the training
of staff to run the DBMS.
4. Appointing Technical Staff:
The trained technical persons such as database administrator, application programmers, data
entry operators etc. are required to handle the DBMS. You have to pay handsome salaries to
these persons. Therefore, the system cost increases.
5. Database Damage:
In most of the organizations, all data is integrated into a single database. If database is
damaged due to electric failure or database is corrupted on the storage media, then your
valuable data may be lost forever.
6. Lack of Security:
Anyone can easily access some confidential/important data of other departments.
DBMS SYSTEM
2. Data availability and recovery from failure: The DBA must take steps to ensure that if
the system fails, users can continue to access as much of the uncorrupted data as
possible. The DBA must also work to restore the data to a consistent state. The DBA
is also responsible for implementing procedures to back up the data periodically.
3. Database tuning: Database tuning describes a group of activities used to optimize and
homogenize the performance of a database. The goal is to maximize use of system
resources to perform work as efficiently and rapidly as possible. The needs of the
users are likely to evolve with time, so the DBA is responsible for modifying the
database, in particular the conceptual and physical schemas, to ensure adequate
performance as user requirements change.
2) Database designers: Responsible for designing the database, identifying the data
to be stored, choosing the structures to represent and store this data.
3) End Users: The persons that use the database for querying, updating, generating reports,
etc. The various types of end users are:
Casual end users: These users occasionally access the database and need different
information each time. They learn only a few facilities that they may use
repeatedly. They use a sophisticated database query language to specify their requests
and are typically middle- or high-level managers or other occasional browsers.
Parametric (or naive) end users: Users who constantly query and update the database,
using standard types of queries and updates called canned transactions that have been
carefully programmed and tested. They need to learn very little about the facilities
provided by the DBMS. For example: A user of ATM falls in this category. The user is
instructed through each step of a transaction. The operations performed by this class
of user are very limited. Another such naive user is the end user of the database who
works through a menu-oriented application program.
Sophisticated end users: They include Engineers, scientists, business analysts, and
others who thoroughly familiarize themselves with the facilities of the DBMS so as to
implement their applications to meet their complex requirements. They use the full DBMS
capabilities for implementing complex applications.
6) DBMS system designers and implementers are persons who design and implement the
DBMS modules and interfaces as a software package. A DBMS is a complex software
system that consists of many components or modules, including modules for
implementing the catalog, query language, interface processors, data access,
concurrency control, recovery, and security. The DBMS must interface with other
system software, such as the operating system and compilers for various programming
languages.
7) Tool developers include persons who design and implement tools, that is, the software
packages that facilitate database system design and use, and help improve
performance. Tools are optional packages that are often purchased separately. They
include packages for database design, performance monitoring, natural language or
graphical interfaces, prototyping, simulation, and test data generation. In many cases,
independent software vendors develop and market these tools.
8) Operators and maintenance personnel are the system administration personnel who
are responsible for the actual running and maintenance of the hardware and software
environment for the database system.
DBMS ARCHITECTURE:
A commonly used view of data approach is the three-level architecture suggested by
ANSI/SPARC (American National Standards Institute/Standards Planning and Requirements
Committee).
Objectives:
Under this approach, a database is considered as containing data about an enterprise. The
three levels of the architecture are three different views of the data:
1. External: individual user view.
2. Conceptual: logical user view.
3. Internal: physical or storage view.
The three level database architecture allows a clear separation of the information meaning
(conceptual view) from the external data representation and from the physical data structure
layout. A database system that is able to separate the three different views of data is likely to
be flexible and adaptable. This flexibility and adaptability is data independence that we will
discuss further.
ADVANTAGES OF THREE-TIER SCHEME:
Each user is able to access the same data with a different view of the data as per their
requirements.
User is not concerned about the physical storage details.
Internal structure of the database is unaffected by changes to the physical storage
organization, such as changeover to a new storage device.
DBA is able to change the database storage structure without affecting the user's view.
We now briefly discuss the three different views.
1. External Level or View Level or user view:
The external level is the view that the individual user of the database has. This view is
often a restricted view of the database and the same database may provide a number
of different views for different classes of users. In general, the end users and even the
applications programmers are only interested in a subset of the database. For example,
a department head may only be interested in the departmental finances and student
enrolments but not the library information. The librarian would not be expected to
have any interest in the information about academic staff. The payroll office will have
no interest in student enrolments.
2. Conceptual Level or Logical View: It describes the structure of the whole database for
a community of users. This schema hides the details of physical storage structure and
concentrates on
describing entities, data types, what data is stored in database and the relationships
among the data. This level contains the logical structure of the entire database as seen
by the DBA. The conceptual view represents:
All entities, their attributes, and their relationships.
The constraints on the data
Semantic information about the data
Security & integrity information
E.g. In case of student database entity is Student and attributes for this entity are
Roll No, Name, Course, Address etc.
Data Field | Data Type | Size | Constraint
Roll No | Number | 10 | unique
Name | Text | 15 | Not null
Course | Text | 10 | Not null
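The Student table above can be written down as a table definition. The sketch below uses Python's built-in sqlite3 module. The column names roll_no, name and course and the sample rows are assumptions, and SQLite treats the declared sizes as advisory only, so only the unique and not-null constraints are actually enforced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual-level description of the Student entity.
# Note: SQLite does not enforce the declared lengths (15 and 10).
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER UNIQUE,"
    " name VARCHAR(15) NOT NULL,"
    " course VARCHAR(10) NOT NULL)"
)
conn.execute("INSERT INTO student VALUES (1, 'Rohit', 'BCA')")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert((1, "Asha", "BCA"))  # repeats roll_no 1: violates unique
missing_ok = try_insert((2, None, "BCA"))      # no name: violates not null
valid_ok = try_insert((3, "Meena", "BCA"))     # satisfies every constraint
print(duplicate_ok, missing_ok, valid_ok)
```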
(i) Conceptual/Internal Mapping:
The conceptual schema is related to the internal schema through
conceptual/internal mapping. It defines the correspondence between the
conceptual view and the stored database. It also enables DBMS to find the actual
records or combination of records in physical storage that constitutes a logical
record in the conceptual schema, together with any constraints to be enforced on
the operations for that logical record. In case of any change in the structure of the
stored database, the conceptual/internal mapping is also changed accordingly by
the DBA, so that the conceptual schema can remain invariant and the effects of
changes to the internal schema are isolated below the conceptual level.
(ii) External/Conceptual Mapping:
Each external schema is related to the conceptual schema by external/conceptual
mapping. It defines the correspondence between a particular external view and
conceptual view. A number of external views can exist at the same time, any
number of users can share a given external view and different external views can
overlap. There could be one mapping between conceptual and internal levels and
several mappings between external and conceptual levels.
Thus the Conceptual/Internal Mapping is the key to the physical data independence while the
External/Conceptual Mapping is the key to logical data independence.
DATA INDEPENDENCE:
Data independence can be defined as the capacity to change the schema at one level of a
database system without having to change the schema at the next higher level, thus insulating
application programs from changes in the way data is structured and stored. It is
accomplished by changing the mapping between the two levels. Data Independence is a
major objective of implementing DBMS in an organization. It is the type of data transparency
that matters for a centralized DBMS. The two types of data independence are:
(i) Physical data independence
(ii) Logical data independence
Data independence is accomplished because, when the schema is changed at one level, the
schema at the next higher level remains unchanged; only the mapping between the two levels
changes.
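A rough way to see data independence in practice is to let an application read through a view and then change the levels underneath it. The sketch below uses Python's built-in sqlite3 module. The student table, the student_names view, the added phone column and the index are all hypothetical, and a view plus an index only approximates the external/conceptual and conceptual/internal mappings described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, course TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Rohit', 'BCA')")

# External view used by the application program.
conn.execute("CREATE VIEW student_names AS SELECT roll_no, name FROM student")


def application_query():
    """The application only ever sees the external view."""
    return conn.execute("SELECT roll_no, name FROM student_names").fetchall()


before = application_query()

# Change at the conceptual level: a new attribute is added to the table.
conn.execute("ALTER TABLE student ADD COLUMN phone TEXT")

# Change at the internal level: a new access path (index) is created.
conn.execute("CREATE INDEX idx_student_roll ON student (roll_no)")

after = application_query()
print(before == after)
```

The application query is not rewritten after either change, which is the behaviour that logical and physical data independence are meant to provide.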
Student Schema:
Roll No (number)(4) | Phone no (number)(10) | Address (text)(10)
Course Schema:
Course | Course Id | Department
INSTANCE: Database changes over time when information is inserted or deleted. The
collection of information stored in the database at a particular moment is called an instance of
the database. For eg:
Instance of Student Database:
Rohit | 6789123 | 9897651230
STRUCTURE OF DBMS:
DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete,
update and retrieval) on the database. The components of DBMS perform these
requested operations on the database and provide necessary data to the users. The
various components of DBMS are shown below:
1. DDL Compiler - The Data Description Language compiler processes schema definitions
specified in the DDL. It converts the data definition statements into a set of tables.
2. DML Compiler and Query optimiser - The DML commands such as insert, update, delete and
retrieve from the application program are sent to the DML compiler for compilation into
object code for database access. The query optimiser then optimises the object code to find
the best way to execute the query, and the result is sent to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also
known as the Database Control System.
The main function of the Data Manager is:
Convert operations in the user's queries, coming either directly from the application
programs or from the Query Processor (the combination of DML Compiler and Query
optimiser), from the user's logical view to the physical file system.
Data - names of the tables, names of attributes of each table, length of attributes, and
number of rows in each table.
Relationships between database transactions and data items referenced by them which
is useful in determining which transactions are affected when certain data definitions
are changed.
Data dictionary is used to actually control the data integrity, database operation and accuracy.
It may be used as an important part of the DBMS.
Importance of Data Dictionary
Data Dictionary is necessary in the databases due to following reasons:
It improves the control of DBA over the information system and user's understanding
of use of the system.
It helps in documenting the database design process by storing documentation of the
result of every design phase and design decisions.
It provides great assistance in producing a report of which data elements (i.e. data
values) are used in all the programs.
DATABASE LANGUAGES:
To support a variety of users, DBMS must provide appropriate languages and interfaces for
each category for users to express database queries and updates.
Following languages are used to specify database schemas:
i)
ii)
iii)
iv)
DATA MODELS:
Data modeling is used to represent entities and their relationships in a database. A data
model is a conceptual model for structuring data.
A number of models for data representation have been developed. The models differ
in their method of representing the association amongst entities and attributes.
It is a set of concepts that can be used to describe the structure of a database. Structure
of database includes:
data types
relationships
constraints
(I) Hierarchical Model:
The hierarchical model is used to describe those record structures in which the
various physical records which make up the logical record are tied together in a
sequence which looks like an inverted tree. At the top of the structure is a single
record. Beneath that are one or more records each of which can occur one or more
times. Each of these can in turn have multiple records beneath them. In
diagrammatic form, the top to bottom set of records looks like an inverted tree or a
pyramid of records. The various records in the lower part of the structure are
accessed by first accessing the records above them and then following the chain of
pointers to the records at the next lower levels. The records at any given level are
referred to as the parent records and the records at the next lower level that are
connected to it, or dependent on it are referred to as its children or the child
records. There can be any number of records at any level, and each record can
have any number of children. Each occurrence of the structure normally represents
the collection of data about a single subject. This parent-child relationship can be
repeated through several levels.
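The access pattern described above, where a lower record is reached only by starting at the top record and following pointers downward, can be sketched in Python. The Record class, the path_to function and the college example are hypothetical and only illustrate the idea; they do not reproduce any particular hierarchical DBMS.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A record with exactly one parent (implicit) and any number of children."""
    name: str
    children: list = field(default_factory=list)

    def add_child(self, name):
        child = Record(name)
        self.children.append(child)
        return child


def path_to(root, target, path=()):
    """Walk down from the root and return the chain of records leading to target."""
    path = path + (root.name,)
    if root.name == target:
        return list(path)
    for child in root.children:
        found = path_to(child, target, path)
        if found:
            return found
    return None


# Hypothetical hierarchy: one top record, children below it.
college = Record("College")
department = college.add_child("Computer Science")
department.add_child("Student: Rohit")
college.add_child("Library")

route = path_to(college, "Student: Rohit")
print(route)
```

Because every search starts at the root, a child record cannot be reached without passing through its parent, and a record cannot be attached to two parents in this structure.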
Advantages:
1. Simplicity
2. Data Sharing
3. Data Security
4. Data Independence
5. Data Integrity
6. Efficiency
Disadvantages:
1. Data Relationships are Difficult to Modify
2. Queries Restricted to Traversing the Hierarchy
3. Multiple Parents not Allowed
4. Implementation Complexity
5. Inflexibility
6. Database Management Problems
7. Lack of Structural Independence
8. Implementation Limitation
9. No Standards
(II) Network Model:
The popularity of the network data model coincided with the popularity of the
hierarchical data model. Some data were more naturally modeled with more than
one parent per child. So, the network model permitted the modeling of many-to-many
relationships in data. In 1971, the Conference on Data Systems Languages
(CODASYL) formally defined the network model. The basic data modeling
construct in the network model is the set construct. A set consists of an owner
record type, a set name, and a member record type. A member record type can
have that role in more than one set; hence the multi parent concept is supported.
An owner record type can also be a member or owner in another set. The data
model is a simple network, and link and intersection record types (called junction
records by IDMS) may exist, as well as sets between them. Thus, the complete
network of relationships is represented by several pair wise sets; in each set some
(one) record type is owner (at the tail of the network arrow) and one or more
record types are members (at the head of the relationship arrow). Usually, a set
defines a 1: M relationship, although 1:1 is permitted.
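The set construct can be sketched in Python as a mapping from each owner record to its member records. In the sketch below the NetworkSet class, the set names EMP_WORKS and PROJ_WORKS, and the hours values are hypothetical. A link record belongs to two sets at once, which is how a many-to-many relationship between employees and projects is built out of two 1:M sets.

```python
from collections import defaultdict


class NetworkSet:
    """One set type: each owner record is connected to many member records (1:M)."""

    def __init__(self, name):
        self.name = name
        self.members = defaultdict(list)

    def connect(self, owner, member):
        self.members[owner].append(member)


# Two sets share the same link (intersection) records.
emp_works = NetworkSet("EMP_WORKS")    # owner: employee, member: link record
proj_works = NetworkSet("PROJ_WORKS")  # owner: project, member: link record

# Hypothetical assignments: (employee, project, hours).
assignments = [
    ("Employee 1", "Project 1", 10),
    ("Employee 1", "Project 2", 5),
    ("Employee 2", "Project 1", 8),
]
for employee, project, hours in assignments:
    link = (employee, project, hours)
    emp_works.connect(employee, link)   # the link record is a member here ...
    proj_works.connect(project, link)   # ... and also a member here

projects_of_e1 = [project for _, project, _ in emp_works.members["Employee 1"]]
staff_of_p1 = [employee for employee, _, _ in proj_works.members["Project 1"]]
print(projects_of_e1, staff_of_p1)
```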
Advantages
i) Simplicity
ii) Facilitating more relationship types
iii) Superior data access
iv) Database Integrity
v) Data Independence
vi) Database Standards
[Figure: network linking Employee 1 and Employee 2 with Project 1 and Project 2]
Disadvantages
i) System Complexity
ii) Absence of structural independence
iii) Less User-Friendly
[Figure: example record types. EMPLOYEE (SSN, EDATE, ADD, SEX, SALARY, DNO);
DEPARTMENT (DNO, DNAME, MGRSSN); SUPERVISOR (SUP_SSN); WORKS_ON (SSN, PNO, HOURS)]
Now the network model is as follows:
(III) Relational Model:
The relational model was introduced by Ted Codd of IBM in 1970. The central concept is a
relation, which is actually a set of tuples (rows).
ADVANTAGES:
Simplicity
Structural Independence
Ease of design, implementation, maintenance and uses
Flexible and powerful query capability
No anomalies
DISADVANTAGES:
Hardware Overheads
Easy-to-design capability leading to bad design
Properties of Relational Tables:
The relational database model is based on relational algebra. For example, an
"orders" table might contain (customer-ID, product-code) pairs and a "products" table might
contain (product-code, price) pairs, so to calculate a given customer's bill you would sum the
prices of all products ordered by that customer by joining on the product-code fields of the
two tables.
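The bill calculation described above can be sketched with Python's built-in sqlite3 module. The table layouts follow the (customer-ID, product-code) and (product-code, price) pairs from the text. The sample rows, the column spellings and the bill helper function are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_code TEXT PRIMARY KEY, price INTEGER)")
conn.execute("CREATE TABLE orders (customer_id TEXT, product_code TEXT)")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("P1", 100), ("P2", 250), ("P3", 40)],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("C1", "P1"), ("C1", "P3"), ("C2", "P2")],
)


def bill(customer_id):
    """Sum the prices of all products ordered by one customer."""
    row = conn.execute(
        "SELECT COALESCE(SUM(p.price), 0)"
        " FROM orders AS o"
        " JOIN products AS p ON o.product_code = p.product_code"
        " WHERE o.customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0]


print(bill("C1"), bill("C2"))
```

The query states what is wanted (a join followed by a sum) and leaves the way of reaching the rows to the DBMS, which is the non-procedural style associated with the relational model.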
EMPLOYEE (SSN, EDATE, ADD, SEX, SALARY, DNO)
DEPARTMENT (DNO, DNAME, MGRSSN)
DEPT_LOCATION (DNO, DLOCATION)
PROJECT (PNAME, PNO, PLOCATION, DNO)
WORKS_ON (SSN, PNO, HOURS)
Comparison of data models:
Data Model | Data Element Organization | Relationship Organization | Identity | Access Language | Data Independence | Structural Independence
Hierarchical | Files, Records | Logical proximity in a linear tree | Record-based | Procedural | Yes | No
Network | Files, Records | Intersecting networks | Record-based | Procedural | Yes | No
Relational | Tables | Identifiers of rows in one table are embedded as attribute values in another table | Value-based | Non-procedural | Yes | Yes
ENTITY-RELATIONSHIP MODEL:
ER Model is a popular high level conceptual data model. ER model describes the data as
entity, relationships and attributes. The basic objects that the ER model represents are entity
and attributes. The entity-relationship (ER) data model allows us to describe the data
involved in a real-world enterprise in terms of objects and their relationships and is widely
used to develop an initial database design.
Example:
Person: STUDENT, EMPLOYEE, CLIENT
Object: COUCH, AIRPLANE, MACHINE
Place: CITY, NATIONAL PARK, ROOM, WAREHOUSE
2. Entity Type and Entity Set:
An entity type defines a collection of entities that have same attributes. An entity
instance is a single item in this collection. An entity set is a set of entity instances i.e.
a collection of similar entities.
Example: STUDENT is an entity type; a student with ID number 555-55-5555 is an entity
instance; and a collection of all students is an entity set.
In the above figure there are two entity types named employee and company with the list of
attributes.
Entity set:
The collection of all entities of a particular entity type in the database at any point of time is
called an entity set or extension of the entity type.
3. Attributes:
Attribute names (or simply attributes) are properties of entity types. An attribute is a
property or characteristic of an entity type that is of interest to an organization. Some
attributes of common entity types include the following:
Example:
STUDENT = {Student ID, Name, Address, Phone, Email, DOB}
ORDER = {Order ID, Date of Order, Amount of Order}
ACCOUNT = {Account Number, Account Type, Date Opened, Balance}
CITY = {City Name, State, Population}
Types of Attributes:
(i)
(ii)
(iii)
(iv)
4. Relationships:
It represents an association among two or more entities. E.g. association between
teacher and student.
Relationship set and Relationship Type:
A relationship set is a grouping of all matching relationship instances, and the term
relationship type refers to the relationship between entity types.
5. Degree of a Relationship:
The number of entity sets that participate in a relationship is called the degree of
relationship. The three most common degrees of a relationship in a database are unary
(degree 1), binary (degree 2), and ternary (degree 3).
a) Unary Relationship:
A unary relationship R is an association between two instances of the same entity type.
This type of relationship is called a recursive relationship. For example, Employee
reports to Employee.
b) Binary Relationship:
A binary relationship R is an association between two instances of two different entity
types. For example, in college, a binary relationship exists between a student (STUDENT
entity) and an instructor (FACULTY entity) of a single class; an instructor teaches a student.
c) Ternary Relationship:
A ternary relationship R is an association between three instances of three different
entity types. For example, consider a student using certain equipment for a project. In
this case, the STUDENT, PROJECT, and EQUIPMENT entity types relate to each
other with ternary relationships: a student checks out equipment for a project.
6. Role Name:
The role name signifies the role that a participating entity plays in each relationship
instance and helps to explain what the relationship means. For example, In the
WORKS_FOR relationship type, EMPLOYEE plays the role of employee or worker
and DEPARTMENT plays the role of department or employer.
Constraints on Relationship Types:
Relationship types usually have certain constraints that limit the possible combinations of
entities that may participate in the corresponding relationship set. For example, If the
company has a rule that each employee must work for exactly one department then we would
like to describe the constraints in the schema. Two types of relationship constraints:
1) Cardinality ratio
2) Participation
7. Cardinality of a Relationship:
The term cardinal number refers to the number used in counting. The cardinality of
relationship represents the minimum/maximum number of instances of entity A that
must/can be associated with any instance of entity B.
Types of Cardinality:
a. One-to-One Relationship:
In a one-to-one relationship, at most one instance of entity A can be associated with a
given instance of another entity B and vice versa. Ex:
1. One Person is married to one person only.
2. Manager manages one Department.
3. One teacher teaches one student.
b. One-to-Many Relationship:
In a one-to-many relationship, many instances of entity B can be associated with a given
instance of entity A. However, only one instance of entity A can be associated with a given
instance of entity B. Ex:
1. One Employee works on many projects.
8. Participation Constraints:
The participation constraint specifies whether the existence of an entity depends on it
being related to another entity via the relationship type.
For example, in partial participation some part of the set of employee entities is related
to a department entity via MANAGES, but not necessarily all.
SYMBOLS USED IN E-R DIAGRAM:
I. Transaction number.
II. Transaction date.
III. Transaction amount.
Though each transaction is distinct, different transactions on different accounts could
share the same number. Thus, this entity set does not have a primary key, and so transaction
is a weak entity set.
A member of a strong entity set is a dominant entity. A member of a weak entity set is a
subordinate entity. A weak entity set does not have a primary key, but we need a means of
distinguishing among the entities. The discriminator of a weak entity set is a set of
attributes that allows this distinction to be made.
So the primary key of a weak entity set is formed by taking the primary key of
the strong entity set on which the existence of the weak entity depends, plus the weak
entity set's discriminator.
For Ex:
Consider the following:
In this example, (Loan-no & Payment no) acts as primary key for payment entity set. The
relationship between weak and strong entity set is called an Identifying Relationship.
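The rule for forming the key of a weak entity set can be sketched with Python's built-in sqlite3 module. Loan-no and Payment no come from the example above. The amount and payment_amount columns, the sample rows and the ON DELETE CASCADE choice are assumptions used to show the existence dependency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Strong entity set: has its own primary key.
conn.execute("CREATE TABLE loan (loan_no TEXT PRIMARY KEY, amount INTEGER)")

# Weak entity set: key = key of the strong entity set + discriminator (payment_no).
conn.execute(
    "CREATE TABLE payment ("
    " loan_no TEXT NOT NULL REFERENCES loan(loan_no) ON DELETE CASCADE,"
    " payment_no INTEGER NOT NULL,"
    " payment_amount INTEGER,"
    " PRIMARY KEY (loan_no, payment_no))"
)

conn.execute("INSERT INTO loan VALUES ('L-1', 5000)")
conn.execute("INSERT INTO loan VALUES ('L-2', 9000)")
conn.execute("INSERT INTO payment VALUES ('L-1', 1, 500)")
conn.execute("INSERT INTO payment VALUES ('L-2', 1, 700)")  # same payment_no, other loan

# The same (loan_no, payment_no) pair a second time is rejected.
try:
    conn.execute("INSERT INTO payment VALUES ('L-1', 1, 300)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Existence dependency: deleting the loan removes its payments as well.
conn.execute("DELETE FROM loan WHERE loan_no = 'L-1'")
remaining = conn.execute("SELECT loan_no, payment_no FROM payment").fetchall()
print(duplicate_rejected, remaining)
```

The payment number alone repeats across loans, so it cannot serve as a key; only the combination with the loan number is unique.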
further into SECRETARY, ENGINEER, MANAGER, TECHNICIAN, SALARIED_EMPLOYEE and so on.
The set of entities in each of the latter groupings is a subset of the entities that belong to the
EMPLOYEE entity set, meaning that every entity that is a member of one of these sub
groupings is also an employee. We call each of these sub groupings a subclass of the
EMPLOYEE entity type and the EMPLOYEE entity type is called the super class for each of
these subclasses.
We call the relationship between a subclass and a super class as a super class/ subclass
relationship.
GENERALIZATION, SPECIALIZATION AND AGGREGATION:
1. Specialization:
It is the process of defining a set of subclasses or subsets of a super-class.
The set of subclasses is based upon some distinguishing characteristics of the entity in
the super class.
It is a top-down process of defining super classes and their related subclasses.
Example: {SECRETARY, ENGINEER, TECHNICIAN} is a specialization of
EMPLOYEE based upon job type.
2. Generalization:
The reverse of the specialization process is a generalization.
Generalization refers to the process of identifying some common characteristics of a
collection of entity sets and creating a new entity set that contains entities possessing
these common features.
Several classes with common features are generalized into a super class; original
classes become its subclasses.
Example: CAR, TRUCK generalized into VEHICLE; both CAR and TRUCK become
subclasses of the super class VEHICLE.
We can view {CAR, TRUCK} as a specialization of VEHICLE, or VEHICLE as a generalization
of CAR and TRUCK.
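The super class/subclass idea maps naturally onto class inheritance, so the VEHICLE example can be sketched in Python. This is a rough analogy rather than part of the ER notation itself, and the attribute names (vehicle_id, price, num_passengers, tonnage) and sample values are assumptions made for illustration.

```python
class Vehicle:
    """Super class: holds the features common to every vehicle."""

    def __init__(self, vehicle_id, price):
        self.vehicle_id = vehicle_id
        self.price = price


class Car(Vehicle):
    """Subclass: inherits the common features and adds one specific to cars."""

    def __init__(self, vehicle_id, price, num_passengers):
        super().__init__(vehicle_id, price)
        self.num_passengers = num_passengers


class Truck(Vehicle):
    """Subclass: inherits the common features and adds one specific to trucks."""

    def __init__(self, vehicle_id, price, tonnage):
        super().__init__(vehicle_id, price)
        self.tonnage = tonnage


# Every member of a subclass is also a member of the super class.
fleet = [Car("V1", 9000, 5), Truck("V2", 30000, 12)]
all_are_vehicles = all(isinstance(v, Vehicle) for v in fleet)
```

Reading the hierarchy from Vehicle downwards corresponds to specialization; reading it from Car and Truck upwards corresponds to generalization.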
3. Aggregation:
Aggregation is the abstraction concept for building a higher level object by compiling
information on an object.
One disadvantage of the ER model is that it cannot express relationships among
relationships. Aggregation overcomes this limitation.
Aggregation shows a has-a or is part-of relationship between entity types where one
represents the whole and other represents the part.
There are cases where this concept can be related:
1) The situations in which we aggregate attribute values of an object to form the whole
object.
2) When we represent an aggregation relationship as an ordinary relationship.
We call the relationship between the primitive objects and their aggregate object
IS_A_PART_OF; the inverse is called IS_A_COMPONENT_OF.
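The whole/part reading of aggregation can be sketched with Python dataclasses, where the aggregate object simply holds its parts. The Bicycle, Frame and Wheel classes, their fields and the sample values below are hypothetical and are not taken from the notes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    material: str


@dataclass
class Wheel:
    position: str


@dataclass
class Bicycle:
    """The whole: a higher-level object built from its part objects."""

    model: str
    frame: Frame
    wheels: List[Wheel] = field(default_factory=list)

    def parts(self):
        """Return every object that is a part of this bicycle."""
        return [self.frame, *self.wheels]


bike = Bicycle("Roadster", Frame("steel"), [Wheel("front"), Wheel("rear")])
```

Each Frame and Wheel object is a part of the Bicycle that holds it, and the Bicycle is the aggregate object built from them.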
SIGNIFICANCE OF ER MODEL:
1. An ER model maps well to the relational model i.e. it can be easily transformed into
tables.
2. An ER model can be easily used by the database designer to communicate the design
to the end user.
3. An ER model can be used as a design plan to implement a data model in DBMS
software.
ENTITY v/s ATTRIBUTE:
Sometimes it may not be clear whether a property should be modeled as an attribute or as an
entity. For Ex: consider adding address to the employee entity. Now, one option is to use an
attribute address but it is only appropriate if we need to record only one address per
employee. Another alternative is to create an entity address and to record associations
between employees and addresses using any relationship. This complex alternative is
necessary in two situations:
1. We have to record more than one address for an employee.
2. We want to capture the structure of an address in our ER diagram. For Ex: we might
break down an address into city, state, country and zip code. By representing an
address as an entity with these attributes, we can support queries such as "Find all
employees with an address in New Delhi".
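The address-as-entity alternative can be sketched with Python's built-in sqlite3 module. The table and column names, the helper employees_in_city and the sample rows are made up for illustration; the point is only that a separate address table allows several addresses per employee and a query on the city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);

    -- Address modeled as its own entity, with its structure broken out.
    CREATE TABLE address (
        addr_id  INTEGER PRIMARY KEY,
        emp_id   INTEGER REFERENCES employee(emp_id),
        city     TEXT,
        state    TEXT,
        country  TEXT,
        zip_code TEXT
    );

    -- Made-up sample rows: employee 1 has two addresses.
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO address VALUES
        (10, 1, 'New Delhi', 'Delhi', 'India', '110001'),
        (11, 1, 'Pune', 'Maharashtra', 'India', '411001'),
        (12, 2, 'Mumbai', 'Maharashtra', 'India', '400001');
""")


def employees_in_city(city):
    """Return the names of employees who have at least one address in the city."""
    rows = conn.execute(
        "SELECT DISTINCT e.name FROM employee e "
        "JOIN address a ON a.emp_id = e.emp_id "
        "WHERE a.city = ? ORDER BY e.name",
        (city,),
    )
    return [name for (name,) in rows]
```

With a single address attribute on employee, neither the second address nor the city filter would be straightforward to express.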
EXTRA PART:
Q: WHEN NOT TO USE A DBMS:
Q: DATA ABSTRACTION:
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led
designers to use complex data structures to represent data in the database. Since many
database-system users are not computer trained, developers hide the complexity from users
through several levels of abstraction, to simplify users' interactions with the system:
Physical Level: The lowest level of abstraction describes how the data are actually
stored. The physical level describes complex low-level data structures in detail.
Logical Level: The next-higher level of abstraction describes what data are stored in
the database, and what relationships exist among those data. The logical level thus
describes the entire database in terms of a small number of relatively simple
structures. Although implementation of the simple structures at the logical level may
involve complex physical-level structures, the user of the logical level does not need
to be aware of this complexity. Database administrators, who must decide what
information to keep in the database, use the logical level of abstraction.
View Level: The highest level of abstraction describes only part of the entire
database. Even though the logical level uses simpler structures, complexity remains
because of the variety of information stored in a large database. Many users of the
database system do not need all this information; instead, they need to access only a
part of the database. The view level of abstraction exists to simplify their interaction
with the system. The system may provide many views for the same database.
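The logical and view levels can be sketched with Python's built-in sqlite3 module. The employee table, the employee_directory view and the sample rows are assumptions made for illustration; the physical level does not appear in the code at all, because the database engine decides how the rows are actually stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical level: what data are stored and how they relate.
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT,
        dept   TEXT,
        salary REAL
    );
    INSERT INTO employee VALUES (1, 'Asha', 'Sales', 52000), (2, 'Ravi', 'IT', 61000);

    -- View level: one of possibly many views, exposing only part of the data.
    CREATE VIEW employee_directory AS SELECT name, dept FROM employee;
""")

# A user of the view sees only the columns the view exposes.
view_columns = [
    col[0] for col in conn.execute("SELECT * FROM employee_directory").description
]
directory = conn.execute(
    "SELECT name, dept FROM employee_directory ORDER BY name"
).fetchall()
```

A user working through employee_directory never sees the salary column, and a second view with different columns could be defined over the same table.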