Unit 5 Database and SQL
1. Redundancy in defining and storing data results in wastage of disk storage space.
2. Data definition is actually a part of application programs themselves and hence file
processing software can access only specified fields.
3. The structure of data files is embedded in the application programs itself and so any change
to the structure of a file may require changing all programs that manipulate this file.
5. Data for multiple applications may not be integrated and maintained in a single file.
DBMS : A database management system (DBMS) is a software system that allows users to define,
create and maintain a database and provides controlled access to the data. In other words, a
DBMS is a software package that allows data to be stored, retrieved and manipulated
effectively. The data held in a DBMS can be accessed by multiple application programs and
users, and the DBMS serves as the intermediary between the user and the database.
SQL Server, Oracle and MS Access are some commercially available DBMS packages.
Various types of DBMS are :
1. Hierarchical DBMS
2. Network DBMS
3. Relational DBMS
4. Object Oriented DBMS
5. Distributed DBMS
Database system is a generic term that is typically used to cover the database, the DBMS and
the data model together.
Characteristics of Database system :
Self-describing nature of a database system : A fundamental characteristic of a database system is
that it contains not only the database itself but also a complete definition or description of
the database structure and constraints. This description is stored in the catalog and is called
metadata.
Insulation between programs and data, and data abstraction : In traditional file processing, the
structure of data files is embedded in the application programs, so any change to the structure
of a file may require changing all programs that manipulate this file. A DBMS does not require
such changes in most cases, because the structure of data files is stored in the DBMS catalog,
separately from the accessing programs. This property is called program-data independence.
The characteristic that allows program-data independence is called data abstraction.
Support multiple views : A database provides the multiple views required by different users. A
view may be a subset of the database, or it may contain virtual data that is derived from the
database files but not explicitly stored.
Sharing of Data and Multi User Transaction Processing : Databases allow data to be shared and
allow multiple users to access the database at the same time.
Advantages of Database systems :
1. Database reduces data redundancy to a large extent : Redundancy is duplication of data.
Database systems do not maintain separate copies of the same data. Rather, all data are kept
in one place and all the applications that require the data refer to the centrally maintained
database.
2. Databases can control inconsistencies to a large extent : Duplication of data may lead to
inconsistency, because a record may be updated in one place while another copy of the same
record is not. So by controlling redundancy, inconsistency is also controlled.
3. Databases facilitate sharing of data : Data can be shared between various users, and each
user may use the same piece of data for different purposes.
4. Databases enforce standards : The database management system can ensure that all the data
follow the applicable standards, which aids data interchange or migration between
systems. This also results in uniformity of the entire database as well as its usage.
5. Databases can ensure data security : A DBMS can ensure data security and privacy by
restricting unauthorized access to data.
6. Integrity can be maintained through database : An integrated database unifies several
otherwise distinct data files into one, with any redundancy among those files partially or
wholly removed. The data should not be harmed even if some hardware failure occurs.
7. Database enforces integrity checks : The database management system provides certain
integrity checks to ensure that data values conform to certain specified rules.
8. Databases provide backup and recovery : Hardware failures and various types of accidents
will occur occasionally. The procedures for storing and updating data, defined by the
database, are such that the system can recover from these circumstances without harm to the
data.
Disadvantages of Database systems :
1. Increased costs : Database systems require sophisticated hardware and software and highly
skilled personnel. The cost of maintaining the hardware, software, and personnel required to
operate and manage a database system can be substantial. Training, licensing, and regulation
compliance costs are often overlooked when database systems are implemented.
2. Complexity : The functionality that is expected of a good DBMS makes the
DBMS an extremely complex piece of software. Database designers, developers, database
administrators and end-users must understand this functionality to take full advantage of it.
Failure to understand the system can lead to bad design decisions, which can have serious
consequences for an organization.
3. Size : The complexity and breadth of functionality make the DBMS an extremely large piece
of software, occupying many megabytes of disk space and requiring substantial amounts of
memory to run efficiently.
4. Cost of Data Conversion : When a file-based system is replaced with a database
system, the data stored in data files must be converted into database files. This conversion is
difficult and costly. Database system designers and application programmers have to be hired,
or the services of a software house have to be engaged, so a lot of money has to be paid for
developing the software.
Relational data model : It is the most widely used data model. The relational data model was
proposed by E. F. Codd and has established itself as the primary data
model for commercial data processing applications. All the data items and the relationships
among them are represented in two-dimensional tables called relations. A row in a table
represents a relationship among a set of values. Rows of a relation are generally referred to as
tuples and columns are usually referred to as attributes. The number of tuples in a relation is
called the cardinality of the relation. The number of attributes in a relation determines the
degree of the relation. For example, the Employee table of Fig 1.3 has cardinality 7 and
degree 5. RDBMS ( Relational Database Management System ) is a software package used to
store and retrieve data that is organized in the form of tables.
Advantages of Relational data model :
1. Simple Model : The relational model is simple and does not require complex structures or
queries to process the data. Its architecture is simpler than that of a hierarchical
database, and the database can be designed and accessed with simple SQL queries.
2. Data Accuracy : A relational database can have multiple tables related to each other
through primary and foreign keys, so there are fewer chances of duplicated data fields.
This helps keep the data more accurate than in models that store the same data in
several places.
3. Easy to access Data : Data can be accessed from a relational database without following
any fixed path or pattern. Any data in a table can be retrieved using SQL queries, and
related tables can be combined through joins and conditions to get the required data.
4. Security : Access can be limited so that only specific users may use particular data in the RDBMS.
5. Collaborate : It allows multiple users to access the same database at the same time.
Domain : A domain is a set of atomic values used to model data. By atomic value, we
mean that each value in the domain is indivisible as far as the relational model is concerned. For
example :
The domain of the Marital Status attribute is the set of possibilities {Married, Single,
Divorced}, each of which is atomic.
The domain of Shift is the set of all possible days : {Mon, Tue, Wed…}.
The domain of the Class attribute is the values from 1st to 10th.
The domain of month-of-year accepts January, February and so on up to December as possible values.
So, a domain is the set of acceptable values that a column is allowed to contain.
Base Table : A base table is a table that usually has a permanent memory representation and
description.
View : A view is a kind of table whose contents are taken from other tables ( base tables )
depending on a condition. Views do not contain data of their own: no stored file is created for
the contents of a view, only the definition of the view is stored. For this reason views are
also referred to as virtual tables. So we can say that a view is nothing more than
an SQL statement that is stored in the database with an associated name.
A view can contain all rows of a table or selected rows from a table. A view can be created
from one or many tables, depending on the SQL query written to create it.
CREATE VIEW view_name AS
SELECT * FROM table_name
WHERE condition ;
If a view is not needed any more, we can delete it using the
DROP VIEW statement. We can also update, insert or delete rows through a
view, subject to restrictions that depend on the DBMS and on how the view is defined.
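The idea that a view stores only its definition can be tried out with a small sketch. This one assumes SQLite, driven through Python's built-in sqlite3 module; the view name IT_Staff and the extra employee 'Noor' are made up for illustration and are not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute("CREATE TABLE Employee (Ecode TEXT, Name TEXT, Dept TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [("001", "Rahil", "Bank", 30000), ("003", "Shafat", "IT", 15000), ("007", "Adil", "IT", 35000)],
)

# Only the SELECT statement is stored under the name IT_Staff (hypothetical view name).
conn.execute("CREATE VIEW IT_Staff AS SELECT Ecode, Name, Salary FROM Employee WHERE Dept = 'IT'")
before = conn.execute("SELECT Name FROM IT_Staff ORDER BY Ecode").fetchall()

# A row added to the base table shows up in the view, because the view is re-evaluated on use.
conn.execute("INSERT INTO Employee VALUES ('008', 'Noor', 'IT', 12000)")  # hypothetical row
after = conn.execute("SELECT Name FROM IT_Staff ORDER BY Ecode").fetchall()

conn.execute("DROP VIEW IT_Staff")  # removes the definition; Employee is untouched
remaining = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
```

Here the view lists two names before the insert and three after it, and dropping the view leaves all four rows of the base table in place.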
Keys : A key is a data item that uniquely identifies a record. e.g : Account_No, Emp_No and
Customer_No are used as key fields because each of them identifies exactly one record stored in
a database. Keys also enable tables in a database to be related to each other.
Primary Key : Within a given relation, a set of one or more attributes whose values are
unique within the relation, and which can therefore uniquely identify each tuple, is said to be
the primary key of the relation. The primary key is non-redundant, i.e. it does not have
duplicate values in the same relation.
The primary key must be chosen such that its attributes are never or very rarely changed. e.g
Emp_code is a good choice of primary key, as it does not change as long as the employee is
working in the organization.
In some tables a tuple cannot be uniquely identified by a single field, and a
combination of more than one attribute provides the unique value for each row. In such tables
the group of these attributes is declared as the primary key. A primary key that consists of
more than one attribute is called a composite primary key. The primary key cannot contain
NULL values.
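The rules for primary keys (unique, not NULL, possibly composite) can be checked with a short sketch. It assumes SQLite through Python's sqlite3 module; the Result table with Roll_No and Subject is a hypothetical example of a composite key, not a table from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def is_rejected(sql):
    """Run one statement and report whether the DBMS refused it on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# NOT NULL is written out because SQLite does not imply it for a TEXT primary key.
conn.execute("CREATE TABLE Employee (Emp_code TEXT NOT NULL PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Employee VALUES ('E01', 'Rahil')")
duplicate_rejected = is_rejected("INSERT INTO Employee VALUES ('E01', 'Faizan')")
null_rejected = is_rejected("INSERT INTO Employee VALUES (NULL, 'Shafat')")

# Composite primary key: neither column is unique on its own, but the pair is.
conn.execute(
    "CREATE TABLE Result (Roll_No INTEGER NOT NULL, Subject TEXT NOT NULL, "
    "Marks INTEGER, PRIMARY KEY (Roll_No, Subject))"
)
conn.execute("INSERT INTO Result VALUES (1, 'Maths', 80)")
same_roll_allowed = not is_rejected("INSERT INTO Result VALUES (1, 'Physics', 75)")
same_pair_rejected = is_rejected("INSERT INTO Result VALUES (1, 'Maths', 90)")
```

All four flags come out True: duplicates and NULLs are refused for the key, while a repeated Roll_No is accepted as long as the (Roll_No, Subject) pair is new.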
Candidate Key : Occasionally we may encounter a relation in which more than one
attribute possesses the unique identification property. Each such attribute is a candidate key,
and the database analyst chooses one of them as the primary key for the relation. The
remaining candidate keys are called alternate keys.
There is only one primary key in a table, but there can be multiple candidate keys.
e.g supp# and supp-name are both candidate keys for the suppliers relation, as each of them
contains unique values for each tuple. If supp# is chosen as the primary key, supp-name is an
alternate key.
Foreign Key : A foreign key is used to represent the relationship between two tables. A
non-key attribute of one table which is the primary key of some other table is known as a
foreign key.
Supp_code Supp_Name
The table in which the foreign key attribute exists is called the
foreign table or detail table, and the table that defines the primary key to which the foreign
key of the detail table refers is called the primary table or master table.
In a relational database, the foreign key of a relation may be the primary key of another
relation.
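The master table / detail table relationship can be sketched as follows, assuming SQLite through Python's sqlite3 module. The Supplier table uses the Supp_code and Supp_Name columns named above; the Item table and all the data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

# Master (primary) table: Supp_code is its primary key.
conn.execute("CREATE TABLE Supplier (Supp_code TEXT PRIMARY KEY, Supp_Name TEXT)")
# Detail (foreign) table: Supp_code here is a non-key attribute referring to Supplier.
conn.execute(
    "CREATE TABLE Item (Item_no INTEGER PRIMARY KEY, Item_Name TEXT, "
    "Supp_code TEXT REFERENCES Supplier(Supp_code))"
)
conn.execute("INSERT INTO Supplier VALUES ('S1', 'Alpha Traders')")
conn.execute("INSERT INTO Item VALUES (1, 'Pen', 'S1')")

# A detail row that points at a supplier which does not exist is refused.
try:
    conn.execute("INSERT INTO Item VALUES (2, 'Ink', 'S9')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key is what lets the two tables be related in a query.
pairs = conn.execute(
    "SELECT Item_Name, Supp_Name FROM Item JOIN Supplier ON Item.Supp_code = Supplier.Supp_code"
).fetchall()
```

The PRAGMA line is specific to SQLite; most other DBMS enforce declared foreign keys without any such switch.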
SQL
SQL : SQL, pronounced either "sequel" or "S-Q-L", stands for Structured Query Language.
SQL is a non-procedural language that enables you to create and operate on relational
databases, which are sets of related information stored in tables.
1. Interactive SQL : It is used to operate directly on a database to produce output for the desired
purpose.
2. Embedded SQL : Embedded SQL consists of SQL commands placed inside programs which
are written in another high-level language ( such as COBOL, C or C++ ).
Characteristics of SQL : SQL offers a variety of processing capabilities, the simpler ones of
which can be used by casual users and the more complex ones by skilled programmers.
1. SQL queries are designed to retrieve large numbers of records from a database quickly and
efficiently.
2. SQL provides various types of constraints to manage the data efficiently.
3. An SQL database conforming to set standards can be easily accessed by third party software
and application tools.
4. With the help of SQL, system security and authorization can be controlled and maintained.
5. SQL is a portable language and it is independent of any platform (Operating System, etc).
Also, it can be embedded in other applications as needed.
6. SQL is easy to learn and understand, and answers to complex queries can often be received
in seconds.
Types of SQL commands :
SQL commands can be classified into the following four types :
1. DDL 2. DML 3. DCL 4. TCL
1. DDL ( Data Definition Language ) : A database schema is specified by a set of definitions
which are expressed in a special language called the data definition language. The result of
compiling DDL statements is a set of tables which are stored in a special file called the data
dictionary.
Whenever data is read or modified in the database system, the data dictionary is consulted.
Some of the DDL commands in SQL are :
1. CREATE - to make a new database, table or view
2. DROP - to destroy an existing database, table or view
3. ALTER - to modify an existing database, table or view
2. DML ( Data Manipulation Language ) : After the database schema has been specified and
the database has been created , the data can be manipulated using a set of procedures which are
expressed by a special language called a data manipulation language (DML). By data
manipulation we mean :
i. the retrieval of information stored in the database.
ii. the insertion of new information into the database.
iii. the deletion of information from the database.
iv. the modification of data stored in the database.
DMLs are basically of two types :
(i) Procedural DMLs require a user to specify what data is needed and how to get it.
(ii) Non-Procedural DMLs require a user to specify what data is needed without specifying how
to get it.
3. DCL ( Data Control Language ) : DCL consists of features that determine whether a user is
permitted to perform a particular action. It is used to control access to the database and
contains commands like GRANT, REVOKE etc.
GRANT : It enables system administrators to assign privileges and roles to specific user
accounts so that they can perform specific tasks on the database.
REVOKE : It enables system administrators to withdraw privileges and roles from user
accounts so that they can no longer use the previously assigned permissions on the database.
4. TCL ( Transaction Control Language ) : It is used to deal with transaction operations in the
database. The commands in this category are COMMIT, ROLLBACK, SET TRANSACTION,
SAVEPOINT, etc.
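The effect of COMMIT and ROLLBACK can be seen in a small sketch. It assumes SQLite through Python's sqlite3 module, where conn.commit() and conn.rollback() issue the COMMIT and ROLLBACK commands; the Account table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (Account_No TEXT PRIMARY KEY, Balance INTEGER)")
conn.execute("INSERT INTO Account VALUES ('A1', 500)")
conn.commit()  # COMMIT: the inserted row is now permanent

# This update belongs to a new, still uncommitted transaction.
conn.execute("UPDATE Account SET Balance = Balance - 200 WHERE Account_No = 'A1'")
conn.rollback()  # ROLLBACK: the uncommitted update is undone
balance_after_rollback = conn.execute("SELECT Balance FROM Account").fetchone()[0]

# The same update again, this time made permanent.
conn.execute("UPDATE Account SET Balance = Balance - 200 WHERE Account_No = 'A1'")
conn.commit()
balance_after_commit = conn.execute("SELECT Balance FROM Account").fetchone()[0]
```

The balance is still 500 after the rollback and becomes 300 only once the second update is committed.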
Various Data Types : Each column in a database table is required to have a name and a data type.
The data type, fixed when the table is created, determines what kind of data can be stored in that column.
Fig 1.1
Constraints : SQL constraints are used to specify rules for the data in a table. By using
constraints, we can limit the type of data that can be stored in a particular column
of a table. Constraints can be specified when the table is created with the CREATE TABLE
statement, or after the table is created with the ALTER TABLE statement. The two basic types of
constraints are COLUMN CONSTRAINT and TABLE CONSTRAINT. Column constraints apply
only to individual columns, whereas table constraints can apply to multiple columns.
Some of the constraints used with CREATE TABLE Command :
1. NOT NULL : When a table's column is declared as NOT NULL, the value for that
column cannot be left empty in any of the table's records.
NOTE: NULL does not mean zero. NULL means that no value has been stored in the column,
which is different from storing zero.
2. UNIQUE : Ensures that all values in a column are different. So, duplicate values are not
allowed in the columns to which the UNIQUE constraint is applied.
3. PRIMARY KEY : A primary key is a field which can uniquely identify each row in a table. It
is a combination of the NOT NULL and UNIQUE constraints.
4. DEFAULT : Sets a default value for a column if no value is specified. When a user does not
enter a value for such a column, the defined default value is automatically inserted
in the field.
5. CHECK : This constraint ensures that the values in a column satisfy a specific condition. So
this constraint helps to validate the values of a column against a particular condition.
Syntax for applying constraints :
Below is the syntax to create constraints using CREATE TABLE statement at the time of
creating the table.
CREATE TABLE Employee
(
column1 data_type(size) constraint_name,
column2 data_type(size) constraint_name,
column3 data_type(size) constraint_name,
....
);
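As an illustration of the syntax above, the following sketch declares all five constraints on one table and tries rows that break them. It assumes SQLite through Python's sqlite3 module; the Email column, the default department 'IT' and the rule Salary > 0 are hypothetical choices made only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        Ecode  TEXT    NOT NULL PRIMARY KEY,   -- PRIMARY KEY: unique and not NULL
        Name   TEXT    NOT NULL,               -- NOT NULL: a value is compulsory
        Email  TEXT    UNIQUE,                 -- UNIQUE: no two rows share a value
        Dept   TEXT    DEFAULT 'IT',           -- DEFAULT: used when no value is given
        Salary INTEGER CHECK (Salary > 0)      -- CHECK: value must satisfy the condition
    )
""")

def is_rejected(sql):
    """Run one statement and report whether a constraint refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

conn.execute(
    "INSERT INTO Employee (Ecode, Name, Email, Salary) "
    "VALUES ('001', 'Rahil', 'rahil@example.com', 29000)"
)
default_dept = conn.execute("SELECT Dept FROM Employee WHERE Ecode = '001'").fetchone()[0]

null_name = is_rejected("INSERT INTO Employee (Ecode, Name, Salary) VALUES ('002', NULL, 10000)")
dup_email = is_rejected(
    "INSERT INTO Employee (Ecode, Name, Email, Salary) "
    "VALUES ('003', 'Shafat', 'rahil@example.com', 15000)"
)
bad_salary = is_rejected("INSERT INTO Employee (Ecode, Name, Salary) VALUES ('004', 'Asif', -5)")
```

The first row is accepted and receives Dept 'IT' from the DEFAULT constraint; each of the other three rows is refused by the constraint it violates.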
ALTER : When we want to change the definition of a table after creating it, we can use the
ALTER command. The ALTER command is used to modify a table by adding a new column,
dropping an existing column, changing the size or data type of a column,
or adding or deleting constraints.
The Syntax is :
ALTER TABLE < Table name > [ alter_options] ;
Add option : The ADD keyword is used to add columns or constraints to the table as :
ALTER TABLE Employee ADD Salary Varchar(10) ;
This adds a new column to the already created Employee table ( Fig 1.1 ).
Ecode Name Dept Post Salary
Fig 1.2
Modify : The MODIFY option is used to change the data type or size of a column, or to add or
delete constraints, as :
ALTER TABLE Employee MODIFY Name varchar(30);
This command will change the size of column Name in the above table from 20 to 30. ( MODIFY is
the form used by MySQL and Oracle; some other DBMS use ALTER COLUMN instead. )
Drop Column : To delete a column from a table, use the following syntax :
ALTER TABLE table_name DROP COLUMN column_name;
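The ADD option can be tried with the sketch below, which assumes SQLite through Python's sqlite3 module. Only ADD COLUMN is shown, because SQLite has no MODIFY option; PRAGMA table_info is SQLite's own way of listing the columns of a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Ecode TEXT, Name TEXT, Dept TEXT, Post TEXT)")
conn.execute("INSERT INTO Employee VALUES ('001', 'Rahil', 'Bank', 'Manager')")

# Column names before the change (the name is the second field of each table_info row).
columns_before = [row[1] for row in conn.execute("PRAGMA table_info(Employee)")]

conn.execute("ALTER TABLE Employee ADD COLUMN Salary INTEGER")

columns_after = [row[1] for row in conn.execute("PRAGMA table_info(Employee)")]
# Rows that existed before the change hold NULL (None in Python) in the new column.
existing_salary = conn.execute("SELECT Salary FROM Employee WHERE Ecode = '001'").fetchone()[0]
```

The table definition changes while the existing row is kept, which is the difference between altering a table and dropping and recreating it.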
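DROP : DROP can be used to delete a table from the database, or to remove constraints imposed
on a table. Some DBMS and textbooks require that a table be empty before it is dropped, in
which case we first use the DELETE command to remove all the rows :
e.g DELETE FROM Employee ; and then
DROP TABLE Employee ;
In most current DBMS, DROP TABLE alone removes the table together with all of its rows.
DROP TABLE is different from DELETE : DELETE removes rows but keeps the table definition,
whereas DROP TABLE removes the definition as well.
To remove a constraint, it is named in an ALTER TABLE command ( the exact form varies between DBMS ) :
ALTER TABLE Employee DROP CONSTRAINT constraint_name ;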
DML Commands :
Insert
Update
Delete
Select
INSERT : The INSERT statement is used to insert a row of data into a table. Character values are
enclosed in single quotes; numeric values are written without quotes.
Syntax :
INSERT INTO tablename ( column1 , column2, … )
VALUES ( 'value 1', value 2, … ) ;
e.g
INSERT INTO Employee
( Ecode , Name , Dept, Post , Salary )
VALUES ( 001, 'Rahil', 'Bank', 'Manager', 29000 );
The above statement inserts a row of data into the table Employee.
Ecode Name Dept Post Salary
001 Rahil Bank Manager 29000
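The INSERT statement above can be run as in the following sketch, assuming SQLite through Python's sqlite3 module. Ecode is stored as text here so that the leading zeros of 001 are kept; that choice, and the second partial row, are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Ecode TEXT, Name TEXT, Dept TEXT, Post TEXT, Salary INTEGER)")

# Character values in single quotes, the numeric Salary without quotes.
conn.execute(
    "INSERT INTO Employee (Ecode, Name, Dept, Post, Salary) "
    "VALUES ('001', 'Rahil', 'Bank', 'Manager', 29000)"
)

# Columns left out of the column list receive NULL (None in Python).
# The ? placeholders let the driver supply the values, so no quoting is needed.
conn.execute("INSERT INTO Employee (Ecode, Name) VALUES (?, ?)", ("002", "Faizan"))

rows = conn.execute("SELECT * FROM Employee ORDER BY Ecode").fetchall()
```

After both statements the table holds one complete row and one row whose Dept, Post and Salary are NULL.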
UPDATE : The UPDATE command is used to change records that match specified
criteria. The rows to change are selected by a carefully constructed WHERE clause.
Syntax :
UPDATE tablename
SET column_name = new_value
WHERE condition ;
e.g
UPDATE Employee
SET Salary = 30000
WHERE Ecode = 001 ;
The Salary will be changed accordingly in the first row and we will have :
Ecode Name Dept Post Salary
001 Rahil Bank Manager 30000
002 Faizan Education Clerk 10000
003 Shafat IT Executive 15000
004 Kareeem Bank Clerk 8000
005 Asif IT Class IV 8000
006 Altaf Education Clerk 9000
007 Adil IT Manager 35000
Fig. 1.3
Similarly we can use the UPDATE command to change all the values in an existing row. We can
also change one column in all the rows. Suppose we want to increase the salary of each
employee by 1000; then the command will be
UPDATE Employee
SET Salary = Salary + 1000 ;
So all the values under column Salary will be increased by 1000.
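Both kinds of UPDATE, with and without a WHERE clause, are shown in the sketch below. It assumes SQLite through Python's sqlite3 module and loads the Employee rows of Fig 1.3 with Rahil's salary at its earlier value of 29000; Ecode is stored as text, which is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Ecode TEXT, Name TEXT, Dept TEXT, Post TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", [
    ("001", "Rahil", "Bank", "Manager", 29000),
    ("002", "Faizan", "Education", "Clerk", 10000),
    ("003", "Shafat", "IT", "Executive", 15000),
    ("004", "Kareeem", "Bank", "Clerk", 8000),
    ("005", "Asif", "IT", "Class IV", 8000),
    ("006", "Altaf", "Education", "Clerk", 9000),
    ("007", "Adil", "IT", "Manager", 35000),
])

# With WHERE: only the matching row changes.
one_row = conn.execute("UPDATE Employee SET Salary = 30000 WHERE Ecode = '001'").rowcount
rahil_salary = conn.execute("SELECT Salary FROM Employee WHERE Ecode = '001'").fetchone()[0]

# Without WHERE: every row changes.
all_rows = conn.execute("UPDATE Employee SET Salary = Salary + 1000").rowcount
total_salary = conn.execute("SELECT SUM(Salary) FROM Employee").fetchone()[0]
```

The first statement touches one row and the second touches all seven, which is why an UPDATE without a WHERE clause should be written with care.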
DELETE : The DELETE command is used to delete rows from a table depending upon the
condition in the WHERE clause. ( The WHERE clause is optional; if it is omitted, all rows of
the table are deleted. )
Syntax :
DELETE FROM tablename
WHERE condition ;
e.g
DELETE FROM Employee
WHERE Salary < 9000 ;
This will delete all those rows where the Salary column contains a value less than 9000. The
output will be :
Ecode Name Dept Post Salary
001 Rahil Bank Manager 30000
002 Faizan Education Clerk 10000
003 Shafat IT Executive 15000
006 Altaf Education Clerk 9000
007 Adil IT Manager 35000
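The deletion can be reproduced with the sketch below, assuming SQLite through Python's sqlite3 module and the Employee rows of Fig 1.3. It uses the condition Salary < 9000, which is the condition that yields the five remaining rows listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Ecode TEXT, Name TEXT, Dept TEXT, Post TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", [
    ("001", "Rahil", "Bank", "Manager", 30000),
    ("002", "Faizan", "Education", "Clerk", 10000),
    ("003", "Shafat", "IT", "Executive", 15000),
    ("004", "Kareeem", "Bank", "Clerk", 8000),
    ("005", "Asif", "IT", "Class IV", 8000),
    ("006", "Altaf", "Education", "Clerk", 9000),
    ("007", "Adil", "IT", "Manager", 35000),
])

# With WHERE: only rows that satisfy the condition are removed.
deleted = conn.execute("DELETE FROM Employee WHERE Salary < 9000").rowcount
remaining = [r[0] for r in conn.execute("SELECT Ecode FROM Employee ORDER BY Ecode")]

# Without WHERE: every row is removed, but the (now empty) table still exists.
conn.execute("DELETE FROM Employee")
rows_left = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
```

Note that 006 (Salary 9000) survives the first DELETE because the comparison is strictly less than 9000.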
Note : In a query, the columns can be listed in any order. The order in which the columns are
named in the SELECT list determines the order in which they are displayed.
e.g : SELECT Dept, Salary, Name FROM Employee ; ( from Fig 1.3 )
The Output will be :
Dept Salary Name
Bank 30000 Rahil
Education 10000 Faizan
IT 15000 Shafat
Bank 8000 Kareeem
IT 8000 Asif
Education 9000 Altaf
IT 35000 Adil
DISTINCT : The DISTINCT keyword eliminates duplicate rows from the results of a SELECT
command. The syntax for the command is :
SELECT DISTINCT < column List >
FROM < tablename >;
e.g : To list the departments from the Employee table without repetition, the command will be :
SELECT DISTINCT Dept FROM Employee ; ( From Fig 1.3)
The output will be :
Dept
Bank
Education
IT
DISTINCT, in effect, applies to the entire output row, not to a specific field. If the query
selects multiple fields, DISTINCT eliminates rows where all of the selected fields are identical.
Rows in which some values are the same and some are different will be retained.
ALL : With the ALL keyword, the result retains the duplicate rows in the output. It is the same
as when you specify neither ALL nor DISTINCT.
SELECT ALL Dept FROM Employee ; ( From Fig 1.3)
The output will be :
Dept
Bank
Education
IT
Bank
IT
Education
IT
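The difference between DISTINCT and ALL, and the way DISTINCT treats whole rows, can be checked with this sketch. It assumes SQLite through Python's sqlite3 module and the Employee rows of Fig 1.3; the ORDER BY clauses are added only to make the order of the results predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Ecode TEXT, Name TEXT, Dept TEXT, Post TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", [
    ("001", "Rahil", "Bank", "Manager", 30000),
    ("002", "Faizan", "Education", "Clerk", 10000),
    ("003", "Shafat", "IT", "Executive", 15000),
    ("004", "Kareeem", "Bank", "Clerk", 8000),
    ("005", "Asif", "IT", "Class IV", 8000),
    ("006", "Altaf", "Education", "Clerk", 9000),
    ("007", "Adil", "IT", "Manager", 35000),
])

# One column: each department appears once.
distinct_depts = [r[0] for r in conn.execute("SELECT DISTINCT Dept FROM Employee ORDER BY Dept")]
# ALL keeps the duplicates, one value per row of the table.
all_depts = [r[0] for r in conn.execute("SELECT ALL Dept FROM Employee ORDER BY Ecode")]
# Two columns: a row is dropped only if the whole (Dept, Post) pair repeats.
distinct_pairs = conn.execute("SELECT DISTINCT Dept, Post FROM Employee").fetchall()
```

With two columns selected, only the repeated pair (Education, Clerk) is removed, so six of the seven rows remain.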
WHERE Clause : In real life, tables can contain a very large number of rows. The WHERE clause
specifies the criteria for selecting the rows to be returned. The SELECT statement with a
WHERE clause takes the following general form :
SELECT < Column List >
FROM < tablename >
WHERE Condition ;
The whole table is searched one row at a time, and the rows for which the condition is true
are displayed.
e.g a) SELECT * FROM Employee WHERE Salary > 12000 ; ( From Fig 1.3)
The above query will produce the following output :
Ecode Name Dept Post Salary
001 Rahil Bank Manager 30000
003 Shafat IT Executive 15000
007 Adil IT Manager 35000
b) SELECT * FROM Employee WHERE NOT Post = 'Manager' ; ( From Fig 1.3)
Because of the NOT operator, the output contains all rows except those whose Post is Manager :
Ecode Name Dept Post Salary
002 Faizan Education Clerk 10000
003 Shafat IT Executive 15000
004 Kareeem Bank Clerk 8000
005 Asif IT Class IV 8000
006 Altaf Education Clerk 9000
d) SELECT * FROM Employee WHERE Post IN ('Clerk', 'Class IV') ; ( From Fig 1.3)
The IN operator is used to specify a list of values. The output will display the employees
whose post is Clerk or Class IV :
Ecode Name Dept Post Salary
002 Faizan Education Clerk 10000
004 Kareeem Bank Clerk 8000
005 Asif IT Class IV 8000
006 Altaf Education Clerk 9000
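The three WHERE conditions used above (a comparison, NOT and IN) can be run together in one sketch. It assumes SQLite through Python's sqlite3 module and the Employee rows of Fig 1.3; the helper function ecodes is hypothetical and only collects the Ecode values of the matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Ecode TEXT, Name TEXT, Dept TEXT, Post TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", [
    ("001", "Rahil", "Bank", "Manager", 30000),
    ("002", "Faizan", "Education", "Clerk", 10000),
    ("003", "Shafat", "IT", "Executive", 15000),
    ("004", "Kareeem", "Bank", "Clerk", 8000),
    ("005", "Asif", "IT", "Class IV", 8000),
    ("006", "Altaf", "Education", "Clerk", 9000),
    ("007", "Adil", "IT", "Manager", 35000),
])

def ecodes(condition):
    """Return the Ecode of every row that satisfies the given WHERE condition."""
    sql = "SELECT Ecode FROM Employee WHERE " + condition + " ORDER BY Ecode"
    return [row[0] for row in conn.execute(sql)]

high_paid = ecodes("Salary > 12000")                       # comparison operator
not_managers = ecodes("NOT Post = 'Manager'")              # NOT reverses the condition
clerks_and_class4 = ecodes("Post IN ('Clerk', 'Class IV')")  # IN matches any value in the list
```

The three lists correspond to the three output tables shown above for queries a), b) and d).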
ORDER BY Clause : Without ORDER BY, the SELECT command returns rows in whatever order the
DBMS produces them. We can sort the results of a query in a specified order by using the
ORDER BY clause. The ORDER BY clause allows sorting ( ascending or descending ) of query
results by one or more columns. The default order is ascending. The Syntax is :
SELECT < Column List >
FROM < tablename >
WHERE Condition
ORDER BY < Column Name > ;
f) SELECT Name, Post, Salary FROM Employee ORDER BY Salary DESC ; ( From Fig 1.3)
This will give the output in descending order of Salary :
Name Post Salary
Adil Manager 35000
Rahil Manager 30000
Shafat Executive 15000
Faizan Clerk 10000
Altaf Clerk 9000
Asif Class IV 8000
Kareeem Clerk 8000
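Sorting can be sketched as follows, assuming SQLite through Python's sqlite3 module and the Employee rows of Fig 1.3. The second sort column, Name, is an addition made here so that the two employees with Salary 8000 always come out in the same order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Ecode TEXT, Name TEXT, Dept TEXT, Post TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", [
    ("001", "Rahil", "Bank", "Manager", 30000),
    ("002", "Faizan", "Education", "Clerk", 10000),
    ("003", "Shafat", "IT", "Executive", 15000),
    ("004", "Kareeem", "Bank", "Clerk", 8000),
    ("005", "Asif", "IT", "Class IV", 8000),
    ("006", "Altaf", "Education", "Clerk", 9000),
    ("007", "Adil", "IT", "Manager", 35000),
])

# Highest salary first; rows with equal Salary are ordered by Name (ascending by default).
by_salary_desc = [r[0] for r in conn.execute(
    "SELECT Name, Post, Salary FROM Employee ORDER BY Salary DESC, Name"
)]

# Default (ascending) order on a single column.
lowest_salary = conn.execute("SELECT Salary FROM Employee ORDER BY Salary").fetchone()[0]
```

Sorting by more than one column is how ties in the first column are settled; without the second column the DBMS is free to return tied rows in either order.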
Aggregate Function : Aggregate functions operate on a collection of values but return a
single value per group. As they operate on aggregates of tuples, they are called aggregate
functions.
Function          Description
SUM(column)       Returns the total of the values in the column
AVG(column)       Returns the average of the values in the column
COUNT(column)     Returns the number of rows in which the column is not NULL
COUNT(*)          Returns the number of selected rows, including rows with NULL values
MAX(column)       Returns the highest value of the column
MIN(column)       Returns the lowest value of the column
SUM ( ):
g) SELECT SUM(Salary) FROM Employee ; ( From Fig 1.3)
This will return the sum of Salary in the table Employee as :
SUM
115000
h) SELECT SUM(Salary) FROM Employee WHERE Salary < 10000 ; ( From Fig 1.3)
The output will be as :
SUM
25000
AVG ( ):
i) SELECT AVG(Salary) FROM Employee WHERE Dept = 'IT' ; ( From Fig 1.3)
The output will be as :
AVG
19333.33
MAX ( ):
j) SELECT MAX(Salary) FROM Employee ; ( From Fig 1.3)
The output will be as :
MAX
35000
MIN ( ):
k) SELECT MIN(Salary) FROM Employee WHERE Dept = 'Bank' ; ( From Fig 1.3)
The output will be as :
MIN
8000
COUNT ( ):
l) SELECT COUNT(*) FROM Employee WHERE Dept = 'Bank' ; ( From Fig 1.3)
The output will be as :
COUNT
2
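The aggregate queries g) to l) can be verified with the sketch below, assuming SQLite through Python's sqlite3 module and the Employee rows of Fig 1.3. The helper function single_value is hypothetical; it returns the one value that each aggregate query produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Ecode TEXT, Name TEXT, Dept TEXT, Post TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", [
    ("001", "Rahil", "Bank", "Manager", 30000),
    ("002", "Faizan", "Education", "Clerk", 10000),
    ("003", "Shafat", "IT", "Executive", 15000),
    ("004", "Kareeem", "Bank", "Clerk", 8000),
    ("005", "Asif", "IT", "Class IV", 8000),
    ("006", "Altaf", "Education", "Clerk", 9000),
    ("007", "Adil", "IT", "Manager", 35000),
])

def single_value(sql):
    """Run an aggregate query and return its single result value."""
    return conn.execute(sql).fetchone()[0]

total = single_value("SELECT SUM(Salary) FROM Employee")
total_low = single_value("SELECT SUM(Salary) FROM Employee WHERE Salary < 10000")
avg_it = single_value("SELECT AVG(Salary) FROM Employee WHERE Dept = 'IT'")
highest = single_value("SELECT MAX(Salary) FROM Employee")
lowest_bank = single_value("SELECT MIN(Salary) FROM Employee WHERE Dept = 'Bank'")
count_bank = single_value("SELECT COUNT(*) FROM Employee WHERE Dept = 'Bank'")
```

AVG returns a floating point number, so it is rounded to two decimal places before being compared with the 19333.33 shown above.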
GROUP BY : The GROUP BY clause is used in a SELECT statement to divide the table into
groups. Grouping is done on one or more columns, and it is usually combined with aggregate
functions, which then produce one value for each group.
Syntax :
SELECT <Column List>
FROM < table name >
GROUP BY < Expression >
n) SELECT Dept, SUM(Salary) FROM Employee GROUP BY Dept ; ( From Fig 1.3)
The output will be as :
Dept Sum
Bank 38000
Education 19000
IT 58000
o) SELECT Dept, SUM(Salary) FROM Employee GROUP BY Dept HAVING SUM(Salary) > 20000 ;
( From Fig 1.3)
The HAVING clause keeps only the groups whose sum of Salary is greater than 20000. The output
will be as :
Dept Sum
Bank 38000
IT 58000
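Grouping with and without HAVING can be checked with this sketch, assuming SQLite through Python's sqlite3 module and the Employee rows of Fig 1.3. ORDER BY Dept is added only so that the groups are returned in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Ecode TEXT, Name TEXT, Dept TEXT, Post TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", [
    ("001", "Rahil", "Bank", "Manager", 30000),
    ("002", "Faizan", "Education", "Clerk", 10000),
    ("003", "Shafat", "IT", "Executive", 15000),
    ("004", "Kareeem", "Bank", "Clerk", 8000),
    ("005", "Asif", "IT", "Class IV", 8000),
    ("006", "Altaf", "Education", "Clerk", 9000),
    ("007", "Adil", "IT", "Manager", 35000),
])

# One output row per department, with the aggregate computed inside each group.
per_dept = conn.execute(
    "SELECT Dept, SUM(Salary) FROM Employee GROUP BY Dept ORDER BY Dept"
).fetchall()

# HAVING filters the groups after aggregation (WHERE would filter rows before it).
large_depts = conn.execute(
    "SELECT Dept, SUM(Salary) FROM Employee GROUP BY Dept "
    "HAVING SUM(Salary) > 20000 ORDER BY Dept"
).fetchall()
```

The Education group, whose total is 19000, is present in the first result and filtered out of the second.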