Database Concepts


DATABASE CONCEPTS

1.1 Introduction
A database is a collection of related data with an implicit meaning. By data, we mean known facts
that can be recorded and that have implicit meaning. A database management system (DBMS) is a
collection of interrelated data and a set of programs to access those data. The collection of data,
usually referred to as the database, contains information relevant to an enterprise. The primary goal
of a DBMS is to provide a way to store and retrieve database information that is both convenient and
efficient.
Database systems are designed to manage large bodies of information. Management of data
involves both defining structures for storage of information and providing mechanisms for the
manipulation of information. In addition, the database system must ensure the safety of the
information stored, despite system crashes or attempts at unauthorized access. If data are to be shared
among several users, the system must avoid possible anomalous results. A database has the following
implicit properties:
1. A database represents some aspect of the real world, sometimes called the miniworld or the
universe of discourse (UoD). Changes to the miniworld are reflected in the database.
2. A database is a logically coherent collection of data with some inherent meaning. A random
assortment of data cannot correctly be referred to as a database.
3. A database is designed, built, and populated with data for a specific purpose. It has an
intended group of users and some preconceived applications in which these users are
interested.
A database can be of any size and complexity. For example, a list of names and addresses
may consist of only a few hundred records, each with a simple structure. On the other hand, the
computerized catalog of a large library may contain half a million entries organized under different
categories - by primary author’s last name, by subject, by book title - with each category organized
alphabetically. A database may be generated and maintained manually or it may be computerized. For
example, a library card catalog is a database that may be created and maintained manually. A
computerized database may be created and maintained either by a group of application programs
written specifically for that task or by a database management system.
A DBMS enables users to create and maintain a database. The DBMS is a general-purpose
software system that facilitates the processes of defining, constructing, manipulating, and sharing
databases among various users and applications. Defining a database involves specifying the data
types, structures, and constraints of the data to be stored in the database. The database definition or
descriptive information is also stored by the DBMS in the form of a database catalog or dictionary; it
is called metadata. Constructing the database is the process of storing the data on some storage
medium that is controlled by the DBMS. Manipulating a database includes functions such as querying
the database to retrieve specific data, updating the database to reflect changes in the miniworld, and
generating reports from the data. Sharing a database allows multiple users and programs to access the
database simultaneously.
An application program accesses the database by sending queries or requests for data to the
DBMS. A query typically causes some data to be retrieved; a transaction may cause some data to be
read and some data to be written into the database.
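The defining, constructing, and manipulating steps described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite stands in for a DBMS, and the book table, its columns, and the sample rows are assumptions made for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for the DBMS (assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Defining: specify the data types, structure, and constraints of the data.
conn.execute(
    "CREATE TABLE book ("
    "id INTEGER PRIMARY KEY, title TEXT NOT NULL, author TEXT NOT NULL)"
)

# Constructing: store the data on a medium controlled by the DBMS.
conn.executemany(
    "INSERT INTO book (title, author) VALUES (?, ?)",
    [
        ("Database Basics", "Ade"),
        ("Query Processing", "Bello"),
        ("Data Models", "Ade"),
    ],
)
conn.commit()

# Manipulating: a query retrieves specific data ...
titles_by_ade = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM book WHERE author = ? ORDER BY title", ("Ade",)
    )
]

# ... and an update reflects a change in the miniworld.
conn.execute(
    "UPDATE book SET title = ? WHERE title = ?", ("Data Modelling", "Data Models")
)
conn.commit()

book_count = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]
print(titles_by_ade, book_count)
```

The application code never touches files or storage structures directly; it only sends requests to the DBMS.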
Other important functions provided by the DBMS include protecting the database and
maintaining it over a long period of time. Protection includes system protection against hardware or
software malfunction (or crashes) and security protection against unauthorized or malicious access. A
typical large database may have a life cycle of many years, so the DBMS must be able to maintain the
database system by allowing the system to evolve as requirements change over time. A simplified
database system environment is shown in Figure 1.

Figure 1: Simplified database system environment

Applications of Database System


Databases are used in a wide variety of applications. Some, such as the library catalog system,
were discussed in the introduction above. Some other representative areas are
highlighted below:
1. Enterprise Information such as sales, accounting, human resources, manufacturing, and
online retailers.
Sales: For customer, product, and purchase information.
Accounting: For payments, receipts, account balances, assets and other accounting
information.
Human resources: For information about employees, salaries, payroll taxes, and
benefits, and for generation of pay checks.
Manufacturing: For management of the supply chain and for tracking production of
items in factories, inventories of items in warehouses and stores, and orders for items.
2. Banking and Finance
Banking: For customer information, accounts, loans, and banking transactions.
Credit card transactions: For purchases on credit cards and generation of monthly
statements.
Finance: For storing information about holdings, sales, and purchases of financial
instruments such as stocks and bonds; also for storing real-time market data to enable
online trading by customers and automated trading by the firm.
3. Universities: For student information, course registrations, and grades (in addition to
standard enterprise information such as human resources and accounting).
4. Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner.
5. Telecommunication: For keeping records of calls made, generating monthly bills,
maintaining balances on prepaid calling cards, and storing information about the
communication networks.

1.2 Characteristics of a Database


The database approach has several characteristic features, which are discussed in detail
below:
1. Concurrent use: A database system allows several users to access the database concurrently.
Answering different questions from different users with the same (base) data is a central
aspect of an information system. Such concurrent use of data increases the economy of a
system. An example of concurrent use is the travel database of a large travel agency. The
employees of different branches can access the database concurrently and book journeys for
their clients. Each travel agent sees on his interface if there are still seats available for a
specific journey or if it is already fully booked.
2. Structured and Defined data: A fundamental feature of the database approach is that the
database systems do not only contain the data but also the complete definition and description
of these data. These descriptions are basically details about the extent, the structure, the type
and the format of all data and additionally, the relationship between the data. This kind of
stored data is called metadata i.e. data about data.
3. Separation of data and applications: As described under structured and defined data above, the
structure of a database is described through metadata, which is also stored in the database.
Application software does not need any knowledge about the physical data storage, such as
encoding, format, storage place etc. It only communicates with the management system of a
database (DBMS) via a standardised interface with the help of a standardised language like
SQL. The access to the data and the metadata is entirely done by the DBMS. In this way all
the applications can be totally separated from the data.
4. Data Integrity: Data integrity is a byword for the quality and the reliability of the data
of a database system. In a broader sense, data integrity also includes the protection of
the database from unauthorised access (confidentiality) and unauthorised changes.
Data reflect facts of the real world.
5. Transactions: A transaction is a bundle of actions which are done within a database to bring
it from one consistent state to a new consistent state. When a transaction is atomic, it cannot
be divided up any further. Within a transaction all or none of the actions need to be carried
out. Doing only a part of the actions would lead to an inconsistent database state. An example
of a transaction is the transfer of an amount of money from one bank account to another. The
debit of the money from one account and the credit of it to another account together make up a
consistent transaction. This transaction is also atomic. The debit or the credit alone would
lead to an inconsistent state. After the transaction finishes (debit and credit), the changes to
both accounts become persistent: the sender now has less money in his account, while the
receiver has a higher balance.
6. Data Persistence: Data persistence means that in a DBMS all data is maintained as
long as it is not deleted explicitly. The life span of data needs to be determined
directly or indirectly by the user and must not be dependent on system features.
Additionally, data once stored in a database must not be lost. Changes of a database
which are done by a transaction are persistent. When a transaction is finished even a
system crash cannot put the data in danger.
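The bank transfer used to illustrate transactions in the list above can be sketched as follows. This is a minimal example using Python's sqlite3 module; the account table, the rule that a balance may not go negative, and the transfer function are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "account_number TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # assumed rule: no overdrafts
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 5000), ("B", 2000)])
conn.commit()


def transfer(conn, source, target, amount):
    """Credit target and debit source as one all-or-nothing unit."""
    try:
        # The connection used as a context manager commits if the block
        # succeeds and rolls back every change if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT account_number, balance FROM account"))


first = transfer(conn, "A", "B", 1000)        # succeeds
after_first = balances(conn)
second = transfer(conn, "A", "B", 10000)      # fails: A cannot cover the amount
after_second = balances(conn)
print(first, after_first, second, after_second)
```

In the failed transfer the credit had already been applied when the debit was rejected, yet neither change survives: the database returns to its previous consistent state.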

1.3 Advantages of a DBMS


The following are the advantages of using a DBMS to manage data:
1. Data independence: Application programs should be as independent as possible from details
of data representation and storage. The DBMS can provide an abstract view of the data to
insulate application code from such details.
2. Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and
retrieve data efficiently. This feature is especially important if the data is stored on external
storage devices.
3. Data integrity and security: If data is always accessed through the DBMS, the DBMS can
enforce integrity constraints on the data. For example, before inserting salary information for
an employee, the DBMS can check that the department budget is not exceeded. Also, the
DBMS can enforce access controls that govern what data is visible to different classes of
users.
4. Data administration: When several users share the data, centralizing the administration of
data can offer significant improvements. Experienced professionals who understand the
nature of the data being managed, and how different groups of users use it, can be responsible
for organizing the data representation to minimize redundancy and fine tuning the storage of
the data to make retrieval efficient.
5. Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the data
in such a manner that users can think of the data as being accessed by only one user at a time.
Further, the DBMS protects users from the effects of system failures.
6. Reduced application development time: Clearly, the DBMS supports many important
functions that are common to many applications accessing data stored in the DBMS. This, in
conjunction with the high-level interface to the data, facilitates quick development of
applications. Such applications are also likely to be more robust than applications developed
from scratch because many important tasks are handled by the DBMS instead of being
implemented by the application.

1.4 Disadvantages of a DBMS


The disadvantages of a DBMS are stated as follows:
1. Danger of an overkill: For small and simple applications for single users a database
system is often not advisable.
2. Complexity: A database system creates additional complexity and requirements. The
supply and operation of a database management system with several users and
databases is quite costly and demanding.
3. Qualified Personnel: The professional operation of a database system requires
appropriately trained staff. Without a qualified database administrator nothing will
work for long.
4. Costs: Through the use of a database system new costs are generated for the system
itself but also for additional hardware and the more complex handling of the system.
5. Lower Efficiency: A database system is general-purpose software, which is often less
efficient than specialised software that is produced and optimised for exactly one
problem.
1.5 Data View of DBMS
A database system is a collection of inter-related data and a set of programs that allow users
to access and modify these data. A major purpose of a database system is to provide users with an
abstract view of the data. Data abstraction generally refers to the suppression of details of data
organization and storage, and the highlighting of the essential features for an improved understanding
of data. There are three levels of abstraction namely the physical level, logical level and view level.
Physical level of abstraction is the lowest level of abstraction which describes how the
data are actually stored. The physical level describes complex low-level data structures in
detail.
Logical level of abstraction is the next-higher level of abstraction which describes what
data are stored in the database, and what relationships exist among those data. The logical level
therefore describes the entire database in terms of a small number of relatively simple structures.
Although implementation of the simple structures at the logical level may involve complex physical-
level structures, the user of the logical level does not need to be aware of this complexity. This is
referred to as physical data independence. Database administrators, who must decide what
information to keep in the database, use the logical level of abstraction.
The view level of abstraction is the highest level of abstraction. It describes only part of the
entire database. Even though the logical level uses simpler structures, complexity remains because of
the variety of information stored in a large database. Many users of the database system do not need
all this information; instead, they need to access only a part of the database. The view level of
abstraction exists to simplify their interaction with the system. The system may provide many views
for the same database.
Figure 2 shows the relationship among the three levels of abstraction.

Figure 2: The three levels of abstraction
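As a rough illustration of the view level, the sketch below defines a view over a logical-level table using Python's sqlite3 module. The employee table, the staff_directory view, and the sample data are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full employee table, including salaries.
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee (name, department, salary) VALUES (?, ?, ?)",
    [("Ngozi", "Sales", 250000), ("Tunde", "Accounts", 300000)],
)

# View level: a directory view exposes only part of the database
# (names and departments) and hides the salary column.
conn.execute(
    "CREATE VIEW staff_directory AS SELECT name, department FROM employee"
)

cursor = conn.execute("SELECT * FROM staff_directory ORDER BY name")
view_columns = [column[0] for column in cursor.description]
directory = cursor.fetchall()
print(view_columns, directory)
```

Several such views can be defined over the same tables, each tailored to one group of users.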

1.6 Instances and Schemas


Databases change over time as information is inserted and deleted. The collection of
information stored in the database at a particular moment is called an instance of the database. The
overall design of the database is called the database schema. Schemas are changed infrequently, if at
all.
The concept of database schemas and instances can be understood by analogy to a program
written in a programming language. A database schema corresponds to the variable declarations
(along with associated type definitions) in a program. Each variable has a particular value at a given
instant. The values of the variables in a program at a point in time correspond to an instance of a
database schema.
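The distinction between schema and instance can be observed directly. In the sketch below (Python's sqlite3 module, with an assumed account table), the instance changes as rows are inserted and deleted while the schema stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")


def schema(conn):
    """Column names and declared types: the design of the table."""
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]


def instance(conn):
    """The rows stored at this particular moment."""
    return conn.execute(
        "SELECT account_number, balance FROM account ORDER BY account_number"
    ).fetchall()


schema_before = schema(conn)
instance_before = instance(conn)

conn.execute("INSERT INTO account VALUES ('A-101', 5000)")
conn.execute("INSERT INTO account VALUES ('A-102', 7000)")
conn.execute("DELETE FROM account WHERE account_number = 'A-101'")

schema_after = schema(conn)
instance_after = instance(conn)
print(schema_after, instance_before, instance_after)
```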
Database systems have several schemas, partitioned according to the levels of abstraction.
The physical schema describes the database design at the physical level, while the logical schema
describes the database design at the logical level. A database may also have several schemas at the
view level, sometimes called sub-schemas, that describe the different views of the database.
The logical schema is the most important schema because of its effect on application
programs, since programmers construct applications by using the logical schema. The physical
schema is hidden beneath the logical schema, and can usually be changed easily without affecting
application programs. Application programs are said to exhibit physical data independence if they do
not depend on the physical schema, and thus need not be re-written if the physical schema changes.
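One way to see physical data independence is to change a storage-level detail and check that application code is unaffected. The sketch below uses Python's sqlite3 module and treats adding an index as the physical change; the customer table, the index name, and the query are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [("Amina", "Lagos"), ("Chidi", "Enugu"), ("Bola", "Lagos")],
)

# The application's query is written against the logical schema only.
QUERY = "SELECT name FROM customer WHERE city = ? ORDER BY name"


def customers_in(city):
    return [row[0] for row in conn.execute(QUERY, (city,))]


before = customers_in("Lagos")

# Physical-level change: add an index on city. The logical schema
# (tables and columns) is untouched, so the query needs no rewrite.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")

after = customers_in("Lagos")
print(before, after)
```

The same query text returns the same answer before and after the change; only the way the DBMS finds the rows may differ.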

1.7 Data Models


A data model is a collection of concepts that can be used to describe the structure of a
database. It provides a way to describe the design of a database at the physical, logical, and view
levels. There are different types of data models, which can be classified into four categories:
the relational model, the entity-relationship model, the object-based data model, and the
semi-structured data model.
Relational Model: The relational model uses a collection of tables to represent both data and
the relationships among those data. Each table has multiple columns, and each column has a unique
name. The data are arranged in a relation, which is visually represented as a two-dimensional table.
The data are inserted into the table in the form of tuples, i.e. rows. A tuple is formed by one or more
attributes, which are used as basic building blocks in the formation of various expressions that are
used to derive meaningful information. There can be any number of tuples in the table, but all the
tuples have the same fixed set of attributes, with varying values.
The relational model is implemented in a database as follows: a relation is represented by a table, a
tuple by a row, and an attribute by a column of the table. The attribute name is the name of the
column, such as ‘identifier’, ‘name’, or ‘city’, and the attribute value is the value of that
column in a given row. Constraints are applied to the table and form part of the logical schema.
Attributes (column names) are used to select a particular row/tuple from the table. To speed up the
selection of rows, some fields are defined as unique so that they can be used as indexes, which helps
in finding the required data as quickly as possible.
All the relational algebra operations, such as Select, Intersection, Product, Union, Difference,
Project, Join, Division, Merge etc. can also be performed on the relational database model. Operations
on the relational database model are facilitated with the help of different conditional expressions,
various key attributes, predefined constraints etc.
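Several of the relational algebra operations named above have direct SQL counterparts. The sketch below shows select, project, join, union, and difference using Python's sqlite3 module; the student and enrolment tables and their contents are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (identifier INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE enrolment (identifier INTEGER, course TEXT);
    INSERT INTO student VALUES (1, 'Ada', 'Lagos'), (2, 'Musa', 'Kano'), (3, 'Eze', 'Lagos');
    INSERT INTO enrolment VALUES (1, 'CSC201'), (2, 'CSC201'), (1, 'MTH101');
    """
)


def rows(sql):
    return conn.execute(sql).fetchall()


# Select: keep only the tuples that satisfy a condition.
selection = rows("SELECT * FROM student WHERE city = 'Lagos' ORDER BY identifier")

# Project: keep only some attributes (DISTINCT removes duplicate tuples).
projection = rows("SELECT DISTINCT city FROM student ORDER BY city")

# Join: combine tuples from two relations on a common attribute.
join = rows(
    "SELECT s.name, e.course FROM student AS s "
    "JOIN enrolment AS e ON s.identifier = e.identifier "
    "ORDER BY s.name, e.course"
)

# Union: tuples that appear in either of two compatible relations.
union = rows(
    "SELECT identifier FROM student WHERE city = 'Kano' "
    "UNION SELECT identifier FROM enrolment WHERE course = 'MTH101' "
    "ORDER BY identifier"
)

# Difference: students with no enrolment at all.
difference = rows(
    "SELECT identifier FROM student EXCEPT SELECT identifier FROM enrolment"
)
print(selection, projection, join, union, difference)
```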
Entity-Relationship Model: The entity-relationship (E-R) data model is based on a perception
of a real world that consists of a collection of basic objects, called entities, and of relationships among
these objects. An entity is a “thing” or “object” in the real world that is distinguishable from other
objects. For example, each person is an entity, and bank accounts can be considered as entities.
Entities are described in a database by a set of attributes. For example, the attributes account-
number and balance may describe one particular account in a bank, and they form attributes of the
account entity set. Similarly, attributes such as customer-name, customer-street address and customer-
city may describe a customer entity.
An extra attribute customer-id is used to uniquely identify customers (since it may be possible
to have two customers with the same name, street address, and city). A unique customer identifier
must be assigned to each customer.
A relationship is an association among several entities. For example, a depositor relationship
associates a customer with each account that she has. The set of all entities of the same type and the
set of all relationships of the same type are termed an entity set and relationship set, respectively. The
overall logical structure (schema) of a database can be expressed graphically by an E-R diagram.
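The customer and account entity sets and the depositor relationship set can be sketched as tables. Mapping an E-R design to three tables like this is one common choice, assumed here for illustration; the column names follow the attributes in the text with underscores in place of hyphens, and the sample data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the references below
conn.executescript(
    """
    -- Entity set: customer, identified by customer_id.
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY,
        customer_name TEXT,
        customer_street TEXT,
        customer_city TEXT
    );
    -- Entity set: account.
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        balance INTEGER
    );
    -- Relationship set: depositor associates customers with accounts.
    CREATE TABLE depositor (
        customer_id TEXT REFERENCES customer (customer_id),
        account_number TEXT REFERENCES account (account_number),
        PRIMARY KEY (customer_id, account_number)
    );
    """
)

# Two customers with the same name, street, and city remain distinguishable
# because each has a unique customer_id.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        ("C-1", "Ada Obi", "Main Street", "Ibadan"),
        ("C-2", "Ada Obi", "Main Street", "Ibadan"),
    ],
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A-101", 5000), ("A-102", 7000)])
conn.executemany(
    "INSERT INTO depositor VALUES (?, ?)",
    [("C-1", "A-101"), ("C-1", "A-102"), ("C-2", "A-102")],
)
conn.commit()


def accounts_of(customer_id):
    return [
        row[0]
        for row in conn.execute(
            "SELECT account_number FROM depositor "
            "WHERE customer_id = ? ORDER BY account_number",
            (customer_id,),
        )
    ]


namesakes = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE customer_name = 'Ada Obi'"
).fetchone()[0]
print(namesakes, accounts_of("C-1"), accounts_of("C-2"))
```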
Object-oriented data model: The object-oriented data model is another data model that has
seen increasing attention. The object-oriented model can be seen as extending the E-R model with
notions of encapsulation, methods (functions), and object identity. The object-relational data model
combines the features of the object-oriented data model and the relational data model.
Semi-structured data models: This data model permits the specification of data where
individual data items of the same type may have different sets of attributes. This is in contrast with the
data models mentioned earlier, where every data item of a particular type must have the same set of
attributes. The extensible markup language (XML) is widely used to represent semi-structured data.
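A small XML fragment shows what "different sets of attributes" means in practice. The sketch below parses it with Python's xml.etree.ElementTree module; the element names and values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Two items of the same type (customer) with different sets of attributes:
# the first has a city, the second has a phone number and an email address.
document = """
<customers>
  <customer>
    <name>Ada</name>
    <city>Lagos</city>
  </customer>
  <customer>
    <name>Musa</name>
    <phone>0800-000-0000</phone>
    <email>musa@example.com</email>
  </customer>
</customers>
""".strip()

root = ET.fromstring(document)

# Collect the attribute names present on each customer.
attribute_sets = [
    sorted(child.tag for child in customer) for customer in root.findall("customer")
]
print(attribute_sets)
```

In a relational table both customers would need the same columns, with missing values left empty; here each item simply carries the attributes it has.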
Historically, two other data models, the network data model and the hierarchical data model,
preceded the relational data model. These models were tied closely to the underlying implementation,
and complicated the task of modeling data. As a result, they are not usually used now, except in old
database code that is still in service in some places.

1.8 Database Languages


A database system provides a data definition language to specify the database schema and a
data manipulation language to express database queries and updates. In practice, the data definition
and data manipulation languages are not two separate languages; instead they simply form parts of a
single database language, such as the widely used SQL language.
1.8.1 Data-Definition Language (DDL)
We specify a database schema by a set of definitions expressed by a special language called a
data-definition language (DDL). For instance, the following statement in the SQL language defines
the account table:
create table account (account_number char(10), balance integer)
Execution of the above DDL statement creates the account table. In addition, it updates a
special set of tables called the data dictionary or data directory. A data dictionary contains metadata.
The schema of a table is an example of metadata. A database system consults the data dictionary
before reading or modifying actual data.
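Most systems let the data dictionary itself be queried. As a sketch, SQLite keeps its catalog in a table named sqlite_master and reports column metadata through PRAGMA table_info; other database systems expose their dictionaries differently. The example uses Python's sqlite3 module and writes the column as account_number, since a hyphen is not allowed in an unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Executing the DDL statement creates the table and records it in the catalog.
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")

# The catalog stores the table's name and its defining statement (metadata).
catalog_entry = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' AND name = 'account'"
).fetchone()

# Column-level metadata: each row is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]
print(catalog_entry, columns)
```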
We specify the storage structure and access methods used by the database system by a set of
statements in a special type of DDL called a data storage and definition language. These statements
define the implementation details of the database schemas, which are usually hidden from the users.
The data values stored in the database must satisfy certain consistency constraints. For
example, suppose the balance on an account should not fall below N1000. The DDL provides
facilities to specify such constraints. The database systems check these constraints every time the
database is updated.
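The minimum-balance rule can be written as a CHECK constraint in the DDL. The sketch below uses Python's sqlite3 module; the withdraw function and the sample account are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The constraint is part of the table definition: balance may not fall below 1000.
conn.execute(
    "CREATE TABLE account ("
    "account_number CHAR(10) PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 1000))"
)
conn.execute("INSERT INTO account VALUES ('A-101', 5000)")
conn.commit()


def withdraw(account_number, amount):
    """Return True if the update was accepted, False if the constraint rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, account_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = withdraw("A-101", 3000)   # leaves 2000: accepted
second = withdraw("A-101", 1500)  # would leave 500: rejected by the DBMS
final_balance = conn.execute(
    "SELECT balance FROM account WHERE account_number = 'A-101'"
).fetchone()[0]
print(first, second, final_balance)
```

The check is made by the database system on every update, so no application can bypass it by forgetting to test the balance.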

1.8.2 Data Manipulation Language (DML)


A data-manipulation language (DML) is a language that enables users to access or manipulate
data as organized by the appropriate data model. Data manipulation is:
i. The retrieval of information stored in the database
ii. The insertion of new information into the database
iii. The deletion of information from the database
iv. The modification of information stored in the database
There are basically two types of data manipulation language, procedural and declarative
DMLs. Procedural DMLs require a user to specify what data are needed and how to get those data.
Declarative DMLs (also referred to as non-procedural DMLs) require a user to specify what data are
needed without specifying how to get those data.
Declarative DMLs are usually easier to learn and use than procedural DMLs. However, since
a user does not have to specify how to get the data, the database system has to figure out an efficient
means of accessing data. The DML component of the SQL language is non-procedural.
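The four kinds of data manipulation, and the difference between declarative and procedural retrieval, can be sketched with Python's sqlite3 module. The account table and the balance threshold are assumptions made for the example, and the Python loop only imitates the procedural style; it is not a real procedural DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")

# Insertion of new information.
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-102", 7000), ("A-103", 12000)],
)

# Modification of stored information.
conn.execute("UPDATE account SET balance = balance + 1000 WHERE account_number = 'A-101'")

# Deletion of information.
conn.execute("DELETE FROM account WHERE account_number = 'A-103'")

# Retrieval, declarative style: state WHAT is wanted; the system decides how.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT account_number FROM account WHERE balance > 5000 ORDER BY account_number"
    )
]

# Retrieval, procedural style: spell out HOW to get it, step by step
# (scan every row, test it, collect matches, then sort).
procedural = []
for number, balance in conn.execute("SELECT account_number, balance FROM account"):
    if balance > 5000:
        procedural.append(number)
procedural.sort()

print(declarative, procedural)
```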

Assignment: Write a short note on the data dictionary.

1.9 Database Administrators and Database Users


A primary goal of a database system is to retrieve information from and store new
information in the database. People who work with a database can be categorized as database users or
database administrators.

1.9.1 Database Users and User Interfaces


There are four different types of database-system users, differentiated by the way they expect
to interact with the system. Different types of user interfaces have been designed for the different
types of users.
Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously. For example, a bank teller who needs to
transfer N1000 from account A to account B invokes a program called transfer. This program asks the
teller for the amount of money to be transferred, the account from which the money is to be
transferred, and the account to which the money is to be transferred.
Another example is a user who wishes to find her account balance over the World Wide Web.
Such a user may access a form, where she enters her account number. An application program at the
Web server then retrieves the account balance, using the given account number, and passes this
information back to the user. The typical user interface for naive users is a forms interface, where the
user can fill in appropriate fields of the form. Naive users may also simply read reports generated
from the database.
Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces. Rapid application
development (RAD) tools are tools that enable an application programmer to construct forms and
reports without writing a program. There are also special types of programming languages that
combine imperative control structures (for example, for loops, while loops, and if-then-else
statements) with statements of the data manipulation language. These languages, sometimes called
fourth-generation languages, often include special features to facilitate the generation of forms and
the display of data on the screen. Most major commercial database systems include a fourth-generation
language.
Sophisticated users interact with the system without writing programs. Instead, they form
their requests in a database query language. They submit each such query to a query processor, whose
function is to break down DML statements into instructions that the storage manager understands.
Analysts who submit queries to explore data in the database fall in this category. Online analytical
processing (OLAP) tools simplify analysts’ tasks by letting them view summaries of data in different
ways. For instance, an analyst can see total sales by region (for example, North, South, East, and
West), or by product, or by a combination of region and product (that is, total sales of each product in
each region). The tools also permit the analyst to select specific regions, look at data in more detail
(for example, sales by city within a region) or look at the data in less detail (for example, aggregate
products together by category). Another class of tools for analysts is data mining tools, which help
them find certain kinds of patterns in data.
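The summaries described above (total sales by region, by product, and by both) correspond to grouped aggregate queries. The sketch below is not an OLAP tool; it only shows the underlying idea with Python's sqlite3 module, using an assumed sales table and invented figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Rice", 100),
        ("North", "Beans", 50),
        ("South", "Rice", 70),
        ("South", "Rice", 30),
        ("East", "Beans", 20),
    ],
)

# Total sales by region.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Total sales by product.
by_product = conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()

# Total sales of each product in each region (more detail).
by_region_and_product = conn.execute(
    "SELECT region, product, SUM(amount) FROM sales "
    "GROUP BY region, product ORDER BY region, product"
).fetchall()

print(by_region, by_product, by_region_and_product)
```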
Specialized users are sophisticated users who write specialized database applications that do
not fit into the traditional data-processing framework. Among these applications are computer-aided
design systems, knowledge base and expert systems, systems that store data with complex data types
(for example, graphics data and audio data), and environment-modeling systems.

1.9.2 Database Administrator


One of the main reasons for using a DBMS is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is called a
database administrator (DBA). The functions of a DBA include the following:
i. Schema definition: The DBA creates the original database schema by executing a set
of data definition statements in the DDL.
ii. Storage structure and access-method definition.
iii. Schema and physical organization modification: The DBA carries out changes to the
schema and physical organization to reflect the changing needs of the organization, or
to alter the physical organization to improve performance.
iv. Granting of authorization for data access: By granting different types of
authorization, the database administrator can regulate which parts of the database
various users can access. The authorization information is kept in a special system
structure that the database system consults whenever someone attempts to access the
data in the system.
v. Routine maintenance: Examples of the database administrator’s routine maintenance
activities are: periodically backing up the database, either onto tapes or onto remote
servers, to prevent loss of data in case of disasters such as flooding; ensuring that
enough free disk space is available for normal operations, and upgrading disk space
as required; and monitoring jobs running on the database and ensuring that performance is
not degraded by very expensive tasks submitted by some users.
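As a small illustration of the backup task in the list above, Python's sqlite3 module offers a backup method on connections. In this sketch a second in-memory database stands in for the tape or remote server, which is an assumption made to keep the example self-contained; the account table and its row are also invented.

```python
import sqlite3

# The live database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
source.execute("INSERT INTO account VALUES ('A-101', 5000)")
source.commit()

# The backup destination (a stand-in for tape or a remote server).
backup_copy = sqlite3.connect(":memory:")

# Copy the whole database into the destination.
source.backup(backup_copy)

# Simulate a disaster on the live database.
source.execute("DELETE FROM account")
source.commit()

lost = source.execute("SELECT COUNT(*) FROM account").fetchone()[0]
restored = backup_copy.execute("SELECT account_number, balance FROM account").fetchall()
print(lost, restored)
```

A real backup would write to durable storage in a different location from the live database, so that one disaster cannot destroy both copies.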
