1 - Content - Introduction To Database, Normalization, DDL, DML
1.1 Objective
Introduction to Database
Understand the types of databases and their architecture
Understand the database server and its instances
Understand Normalization
Understand procedure of 'Connecting to Server from Client'
Understand SQL Data Types and concepts on DDL, DML, Constraints, DRL including
aggregate functions, joins and sub-queries
1.2 Content
Information : It is processed data that helps in decision making and planning. When we add meaning
to data, it becomes information.
For example, 300 is the total strength of associates present today at the Ahmadabad ILP center.
Information = Data + context, where the data is 300 and the context is 'total strength of associates present
today at Ahmadabad ILP center'.
- Redundancy
Data existing at multiple places
- Inconsistency
Stale data, corrupted data
- Unavailability
Data doesn't exist
Quality information needs data management. Data management means managing creation, reading,
update & deletion of data.
Database System :
A database is an organized collection of data. Using a database, we can input, store, retrieve and manage
large amounts of data. The data is typically organized so that relevant information is easy to find. For
example, modeling the availability of rooms in hotels in a way that supports finding a hotel with vacancies.
-It has a Database Management System (DBMS) - collection of interrelated data and programs to
access the data.
-It has a data model – the manner in which data is stored.
-Uses a programming language to design and manage data - (Data definition language-DDL, Data
Manipulation language-DML)
-Need a DBMS
-Start up effort required
-Training effort required
A database is a collection of tables, and each table contains rows and columns.
For example, the 'Customers' table below contains 3 rows and 3 columns (CustomerID,
CustomerName, City).
Customers Table
Types of databases include analytic, management, operational, flat-file and hierarchical databases.
Other databases include end-user, network, distributed and relational databases. The databases differ in
how they work and what they are used for.
There are many types of databases that vary by function and data model.
The second way to classify different types of databases is by their 'data model'.
A data model describes the structure of data and how it is accessed. There are several common data
models:
Flat-file. Flat file databases most closely resemble paper files and file cabinets. Flat file
databases, while simple to create and access, contain lots of redundant information.
These redundancies slow down the search process, making the flat-file type of database
inefficient.
Network databases contain simple data that exist with or without links to other data. The
database is separated into records, which may be sliced into columns or fields.
Relational. Relational databases link tables with common "key" fields allowing for
sophisticated relationships between tables. Table links can be "indexed" or stored for
future use, resulting in quicker searches in this type of database.
A database management system (DBMS) is an aggregate of data, hardware, software, and users that
helps an enterprise manage its operational data. A DBMS is responsible for maintaining the integrity and
security of stored data, and for recovering information if the system fails. In a DBMS, we cannot
establish relations between tables. It can follow the network, hierarchical or another data model. In a DBMS,
we need to open each table individually whenever we want to use it.
Advantages of DBMS :
Disadvantages of DBMS :
RDBMS stands for Relational Database Management System. In an RDBMS, we can establish
relations between tables, and it is based on the relational model. In an RDBMS, once we open the database,
all of its tables are accessible.
It is the basis for SQL and for all modern database systems such as MS SQL Server, IBM DB2, Oracle,
MySQL, and Microsoft Access. An RDBMS allows data to be queried on any column of any
table without first creating an index, so relational data is easier to query than
hierarchical data.
Structure Of DataBase
When we use a database, we are not usually concerned with where each piece of data is stored, or what
size it is. We just want to be sure that when we refer to a name, for example, the correct value is
returned.
The physical database describes how the structures in the logical database and the search paths between
them are implemented. The term database means the logical database, unless indicated otherwise.
One or more data files, two or more redo log files, and one or more control files are the components of
the physical database structure of an Oracle database.
It specifies the physical configuration of the database on the storage media. At a physical level, the data
is stored in data files on disk . The data in the data files is stored in operating system blocks. It is the
detailed design of a system that includes modules & the database's hardware & software specifications
of the system.
One characteristic of an RDBMS is the independence of logical data structures such as tables, views,
and indexes from physical storage structures. Because physical and logical structures are separate, you
can manage physical storage of data without affecting access to logical structures. For example,
renaming a database file does not rename the tables stored in it.
•Control files
A control file is a root file that tracks the physical components of the database.
The online redo log is a set of files containing records of changes made to data.
The logical database is the structure of the data and the relationships between different pieces of
information. There is no information about how these structures and relations are implemented.
Tablespaces and database's schema objects are components of logical database structure of Oracle
database.
Once the relationships and dependencies amongst the various pieces of information have been
determined, it is possible to arrange the data into a logical structure which can then be mapped into the
storage objects supported by the database management system. In the case of relational databases the
storage objects are tables which store data in rows and columns.
Oracle Database allocates logical space for all data in the database. The logical units of database space
allocation are data blocks, extents, segments, and tablespaces.
At the finest level of granularity, Oracle Database stores data in data blocks. One logical data block
corresponds to a specific number of bytes of physical disk space, for example, 2 KB. Data blocks are
the smallest units of storage that Oracle Database can use or allocate.
An extent is a set of logically contiguous data blocks allocated for storing a specific type of
information.
A segment is a set of extents allocated for a specific database object, such as a table. For example, the
data for the employees table is stored in its own data segment, whereas each index for employees is
stored in its own index segment. Every database object that consumes storage consists of a single
segment.
An instance of the Database Engine is a copy of the sqlservr.exe executable that runs as an operating
system service. Each instance manages several system databases and one or more user databases. Each
computer can run multiple instances of the Database Engine. Applications connect to the instance in
order to perform work in a database managed by the instance.
A database instance is a set of memory structures that manage database files. A database is a set of
physical files on disk created by the CREATE DATABASE statement. The instance manages its
associated data and serves the users of the database.
Every running Oracle database is associated with at least one Oracle database instance. Because an
instance exists in memory and a database exists on disk, an instance can exist without a database and a
database can exist without an instance.
•Maintaining internal data structures that are accessed by many processes and threads concurrently
•Buffering redo data before writing it to the online redo log files
The SGA is shared by the Oracle processes, which include server processes and background processes,
running on a single computer. The way in which Oracle processes are associated with the SGA varies
according to operating system.
A database instance includes background processes. Server processes, and the process memory
allocated in these processes, also exist in the instance. The instance continues to function when server
processes terminate.
To execute any query, we first have to start the Oracle instance using the STARTUP command.
1.2.4 Normalization
Normalization is the process of organizing the fields and tables of a relational database to minimize
redundancy and dependency.
Its purpose is to free the database from insertion, update and deletion anomalies.
Types of Anomalies
Update Anomaly :
To update the address of a student who occurs twice or more in a table, we have to
update the S_Address column in all of those rows, else the data will become inconsistent.
Table showing Update Anomaly:
Insertion Anomaly :
Suppose that for a new admission we have the student id (S_id), name and address of a student, but the
student has not opted for any subjects yet. We then have to insert NULL for the subject, leading to an
insertion anomaly.
Table showing Insertion Anomaly:
Deletion Anomaly :
If the student with S_id 403 has only one subject and temporarily drops it, deleting that row deletes the
entire student record along with it.
Table showing Deletion Anomaly:
Normal Forms:
Student name 'Nandish' is used twice in the table and subject maths is also repeated.
To reduce the above table to First Normal Form, break it into two different tables.
Student Table:
Subject Table:
In the Student table, the concatenation of subject_id and student_id is the primary key.
Now both the Student table and Subject table are normalized to first normal form.
Customer_Detail Table :
Order_Detail Table :
Sale_Detail Table :
Third Normal Form (3NF)
Every non-prime attribute of a table must depend directly on the primary key. Transitive functional
dependencies should be removed from the table.
Street, city and state depend upon Zip, which in turn depends on the primary key. This indirect dependency
of street, city and state on the primary key through Zip is called a transitive dependency. To apply 3NF,
move street, city and state to a new table, with Zip as its primary key.
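A minimal sketch of this 3NF split, run through Python's built-in sqlite3 module as a stand-in for the Oracle server. The table names (Address, Student_Detail), the column names and the sample rows are assumptions made only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables and rows, for illustration only.
    CREATE TABLE Address (
        Zip    TEXT PRIMARY KEY,
        Street TEXT,
        City   TEXT,
        State  TEXT
    );
    CREATE TABLE Student_Detail (
        Student_id   INTEGER PRIMARY KEY,
        Student_name TEXT,
        Zip          TEXT REFERENCES Address(Zip)
    );
    INSERT INTO Address VALUES ('380001', 'Ashram Road', 'Ahmedabad', 'Gujarat');
    INSERT INTO Student_Detail VALUES (401, 'Asha', '380001');
    INSERT INTO Student_Detail VALUES (402, 'Ravi', '380001');
""")

# The address is stored once; each student row only carries the Zip.
rows = conn.execute("""
    SELECT s.Student_name, a.City, a.State
    FROM Student_Detail s
    JOIN Address a ON a.Zip = s.Zip
    ORDER BY s.Student_id
""").fetchall()
print(rows)
```

Because street, city and state now live in one row per Zip, changing a city name touches a single row instead of every student record.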
When we start the SqlDbx application, it automatically shows the Server Login dialog, in which we select
the correct server type, enter the server name, schema, user and password, and click OK as shown
below.
If the server is found and the client software is installed correctly, a new SQL Editor window opens as shown
below.
We have to make sure that PuTTY is installed on our system. Open cmd from the Start button and change to
the path where PuTTY is installed. Then type the command below to open PuTTY, as shown in the diagrams below:
putty IPAddress
and press the Enter key.
This opens the PuTTY login screen, which asks for an id and password as shown in the diagram below.
Once the id and password are accepted, we get a shell prompt as shown in the diagrams below:
At the shell prompt, type the command below to log in to the database:
sqlplus DbId/DbPassword
and press the Enter key. We get the SQL prompt shown below, where we can write our query and execute
it:
Float, Date and Time, Number, String, Boolean, BLOB, CLOB etc.
Each column in a database table is required to have a name and a data type. SQL developers have to
decide what types of data will be stored inside each and every table column when creating a SQL table.
However, different databases offer different choices for the data type definition.
The following table shows some of the common names of data types between the various database
platforms:
Data type           SQLServer      Oracle              MySQL
boolean             Bit            Byte                N/A
integer             Int            Number              Int, Integer
float               Float, Real    Number              Float
string (fixed)      Char           Char                Char
string (variable)   Varchar        Varchar, Varchar2   Varchar
SQL Concepts
SQL is a standard language for accessing and manipulating databases. It stands for Structured Query
Language. Users interact with database systems through query languages.
DDL : It is the Data Definition Language, which consists of the 'CREATE, DROP, TRUNCATE, ALTER'
commands.
DML : It is the Data Manipulation Language, which consists of the 'INSERT, DELETE, UPDATE' commands.
The CREATE TABLE statement is used to create a table in a database. Tables are organized into rows
and columns; and each table must have a name.
Syntax
The column_name parameters specify the names of the columns of the table. The data_type parameter
specifies what type of data the column can hold (e.g. varchar, integer, date, etc.). The size parameter
specifies the maximum length of the column of the table.
Try it yourself :
Now we want to create a table called "Persons" that contains five columns: PersonID, LastName,
FirstName, Address, and City.
We use the following CREATE TABLE statement:
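As a rough sketch, the "Persons" table can be created and inspected with Python's built-in sqlite3 module standing in for the database server. The varchar(255) sizes are an assumption, and PRAGMA table_info is specific to SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column sizes (255) are assumed for this sketch.
conn.execute("""
    CREATE TABLE Persons (
        PersonID  int,
        LastName  varchar(255),
        FirstName varchar(255),
        Address   varchar(255),
        City      varchar(255)
    )
""")

# PRAGMA table_info lists one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Persons)")]
print(columns)
```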
What if we only want to delete the data inside the table, and not the table itself?
Then, use the TRUNCATE TABLE statement:
The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.
Syntax
To delete a column in a table, use the following syntax (notice that some database systems don't allow
deleting a column):
ALTER TABLE table_name
DROP COLUMN column_name
Try it yourself :
Look at the "Persons" table:
P_Id  LastName   FirstName  Address       City
1     Hansen     Ola        Timoteivn 10  Sandnes
2     Svendson   Tove       Borgvn 23     Sandnes
3     Pettersen  Kari       Storgt 20     Stavanger
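A minimal sketch of adding a column with ALTER TABLE, using Python's sqlite3 module as a stand-in database. The new column name DateOfBirth is a hypothetical choice for this example, and the exact ALTER TABLE syntax can differ between database systems:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (P_Id int, LastName varchar(255), "
    "FirstName varchar(255), Address varchar(255), City varchar(255))"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Hansen", "Ola", "Timoteivn 10", "Sandnes"),
        (2, "Svendson", "Tove", "Borgvn 23", "Sandnes"),
        (3, "Pettersen", "Kari", "Storgt 20", "Stavanger"),
    ],
)

# DateOfBirth is a hypothetical new column chosen for this sketch.
conn.execute("ALTER TABLE Persons ADD DateOfBirth date")

columns = [row[1] for row in conn.execute("PRAGMA table_info(Persons)")]
birth_dates = [row[0] for row in conn.execute("SELECT DateOfBirth FROM Persons")]
print(columns)
print(birth_dates)  # existing rows hold NULL (None) in the new column
```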
SELECT Statement
Syntax
SELECT column_name,column_name
FROM table_name;
and
SELECT * FROM table_name;
WHERE Clause
The WHERE clause is used to extract only those records that fulfill a specified criterion.
Syntax
SELECT column_name,column_name
FROM table_name
WHERE column_name operator value;
The AND operator displays a record if both the first condition AND the second condition are true.
The OR operator displays a record if either the first condition OR the second condition is true.
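The effect of AND and OR in a WHERE clause can be checked with a small sketch. It assumes an in-memory SQLite database (Python's sqlite3 module) and a reduced copy of the "Persons" table used in this chapter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (P_Id int, LastName varchar(255), "
    "FirstName varchar(255), City varchar(255))"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?)",
    [
        (1, "Hansen", "Ola", "Sandnes"),
        (2, "Svendson", "Tove", "Sandnes"),
        (3, "Pettersen", "Kari", "Stavanger"),
    ],
)

# AND: both conditions must hold for a row to be returned.
both = conn.execute(
    "SELECT FirstName FROM Persons "
    "WHERE City = 'Sandnes' AND LastName = 'Svendson'"
).fetchall()

# OR: a row is returned when at least one condition holds.
either = conn.execute(
    "SELECT FirstName FROM Persons "
    "WHERE City = 'Stavanger' OR LastName = 'Svendson'"
).fetchall()

print(both)
print(sorted(either))
```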
ORDER BY Keyword
The ORDER BY keyword is used to sort the result-set by one or more columns.
The ORDER BY keyword sorts the records in ascending order by default. To sort the records in a
descending order, you can use the DESC keyword.
Syntax
SELECT column_name,column_name
FROM table_name
ORDER BY column_name,column_name ASC|DESC;
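A short sketch of ascending versus descending order, assuming an in-memory SQLite database and three last names taken from the "Persons" table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id int, LastName varchar(255))")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?)",
    [(1, "Hansen"), (2, "Svendson"), (3, "Pettersen")],
)

# Ascending is the default when neither ASC nor DESC is given.
ascending = [r[0] for r in conn.execute(
    "SELECT LastName FROM Persons ORDER BY LastName")]
descending = [r[0] for r in conn.execute(
    "SELECT LastName FROM Persons ORDER BY LastName DESC")]

print(ascending)
print(descending)
```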
INSERT Statement
The first form does not specify the column names where the data will be inserted, only their values:
The second form specifies both the column names and the values to be inserted:
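Both INSERT forms can be sketched as follows. The example assumes an in-memory SQLite database and a three-column "Customers" table (CustomerID, CustomerName, City); the inserted values are illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID int, "
    "CustomerName varchar(255), City varchar(255))"
)

# First form: values only, listed in the table's column order.
conn.execute("INSERT INTO Customers VALUES (1, 'Alfreds Futterkiste', 'Berlin')")

# Second form: column names plus values; unlisted columns are left NULL.
conn.execute(
    "INSERT INTO Customers (CustomerID, CustomerName) "
    "VALUES (2, 'Around the Horn')"
)

rows = conn.execute("SELECT * FROM Customers ORDER BY CustomerID").fetchall()
print(rows)
```

The second form is usually safer, because it keeps working if columns are later added to the table.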
UPDATE Customers
SET ContactName='Alfred Schmidt', City='Hamburg'
WHERE CustomerName='Alfreds Futterkiste';
DELETE Statement
It is possible to delete all rows in a table without deleting the table. This means that the table structure,
attributes, and indexes will be intact:
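A minimal sketch of DELETE with and without a WHERE clause, assuming an in-memory SQLite database and a small hypothetical "Customers" table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID int, "
    "CustomerName varchar(255), City varchar(255))"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [(1, "Alfreds Futterkiste", "Berlin"), (2, "Around the Horn", "London")],
)

# With WHERE: only the matching rows are removed.
conn.execute("DELETE FROM Customers WHERE CustomerID = 1")
remaining = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]

# Without WHERE: every row is removed, but the table itself stays.
conn.execute("DELETE FROM Customers")
row_count = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]
columns = [row[1] for row in conn.execute("PRAGMA table_info(Customers)")]

print(remaining, row_count, columns)
```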
LIKE Operator
Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name LIKE pattern;
The following SQL statement selects all customers with a City starting with the letter "s":
The "%" sign is used to define wild cards (missing letters) both before and after the pattern.
The following SQL statement selects all customers with a City ending with the letter "s":
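The two patterns can be sketched with Python's sqlite3 module and a hypothetical list of cities. Note one assumption that matters here: SQLite's LIKE ignores case for ASCII letters by default, whereas Oracle compares case-sensitively, so the prefix pattern below uses an uppercase 'S' to match the capitalised city names on either system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName varchar(255), City varchar(255))")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("A", "Sandnes"), ("B", "Stavanger"), ("C", "Paris"), ("D", "Berlin")],
)

# 'S%' : the value starts with S, followed by anything.
starts_with_s = sorted(r[0] for r in conn.execute(
    "SELECT City FROM Customers WHERE City LIKE 'S%'"))

# '%s' : anything, as long as the value ends with s.
ends_with_s = sorted(r[0] for r in conn.execute(
    "SELECT City FROM Customers WHERE City LIKE '%s'"))

print(starts_with_s)
print(ends_with_s)
```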
IN Operator
Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1,value2,...);
The following SQL statement selects all customers with a City of "Paris" or "London":
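A brief sketch of the IN operator, assuming an in-memory SQLite database and three hypothetical customers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName varchar(255), City varchar(255))")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Paris specialites", "Paris"),
        ("Around the Horn", "London"),
        ("Alfreds Futterkiste", "Berlin"),
    ],
)

# IN is shorthand for: City = 'Paris' OR City = 'London'
in_paris_or_london = sorted(r[0] for r in conn.execute(
    "SELECT CustomerName FROM Customers WHERE City IN ('Paris', 'London')"))
print(in_paris_or_london)
```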
BETWEEN Operator
The BETWEEN operator is used to select values within a range. The values can be numbers, text, or
dates.
Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name BETWEEN value1 AND value2;
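In standard SQL, BETWEEN includes both end values. A small sketch, assuming an in-memory SQLite database and a hypothetical "Products" table with a handful of prices:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName varchar(255), Price int)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [("P1", 9), ("P2", 10), ("P3", 11), ("P4", 19), ("P5", 20), ("P6", 21)],
)

# Both boundaries (10 and 20) are part of the selected range.
prices = [r[0] for r in conn.execute(
    "SELECT Price FROM Products WHERE Price BETWEEN 10 AND 20 ORDER BY Price")]
print(prices)
```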
SQL Joins
An SQL JOIN clause is used to combine rows from two or more tables, based on a common field
between them.
The most common type of join is the SQL INNER JOIN (simple join). An SQL INNER JOIN returns all
rows from multiple tables where the join condition is met, that is, when there is at least one match in
BOTH tables.
The INNER JOIN keyword selects all rows from both tables as long as there is a match between the
columns in both tables.
Syntax
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name=table2.column_name;
or:
SELECT column_name(s)
FROM table1
JOIN table2
ON table1.column_name=table2.column_name;
Note : INNER JOIN is the same as JOIN.
Notice that the "CustomerID" column in the "Orders" table refers to the customer in the "Customers"
table. The relationship between the two tables above is the "CustomerID". Then, if we run the
following SQL statement (that contains an INNER JOIN or NORMAL JOIN):
Try it yourself :
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers
ON Orders.CustomerID=Customers.CustomerID;
Result of Query :
LEFT JOIN:
Returns all rows from the left table, and the matched rows from the right table. The LEFT JOIN
keyword returns all rows from the left table (table1), with the matching rows in the right table (table2).
The result is NULL on the right side when there is no match.
Syntax
SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name=table2.column_name;
or:
SELECT column_name(s)
FROM table1
LEFT OUTER JOIN table2
ON table1.column_name=table2.column_name;
Try it yourself :
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
LEFT JOIN Customers
ON Orders.CustomerID=Customers.CustomerID;
Result of Query :
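A minimal sketch comparing INNER JOIN and LEFT JOIN on the same data. It assumes an in-memory SQLite database; the rows in "Customers" and "Orders" are hypothetical, and order 10309 deliberately refers to a CustomerID that does not exist in "Customers":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID int, CustomerName varchar(255))")
conn.execute("CREATE TABLE Orders (OrderID int, CustomerID int, OrderDate varchar(10))")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Alfreds Futterkiste"), (2, "Ana Trujillo")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(10308, 2, "1996-09-18"), (10309, 37, "1996-09-19")],
)

query = """
    SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
    FROM Orders
    {join} Customers
    ON Orders.CustomerID = Customers.CustomerID
    ORDER BY Orders.OrderID
"""

# INNER JOIN keeps only orders that have a matching customer.
inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()

# LEFT JOIN keeps every order; the customer name is NULL (None) without a match.
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()

print(inner_rows)
print(left_rows)
```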
RIGHT JOIN:
Returns all rows from the right table, and the matched rows from the left table. The RIGHT JOIN
keyword returns all rows from the right table (table2), with the matching rows in the left table (table1).
The result is NULL on the left side when there is no match.
Syntax
SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name=table2.column_name;
or:
SELECT column_name(s)
FROM table1
RIGHT OUTER JOIN table2
ON table1.column_name=table2.column_name;
Try it yourself :
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
RIGHT JOIN Customers
ON Orders.CustomerID=Customers.CustomerID;
Result of Query :
FULL JOIN:
Returns all rows from both tables, whether or not there is a match. The FULL OUTER JOIN keyword
returns all rows from the left table (table1) and from the right table (table2).
The FULL OUTER JOIN keyword combines the result of both LEFT and RIGHT joins.
Syntax
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name=table2.column_name;
Try it yourself :
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
FULL OUTER JOIN Customers
ON Orders.CustomerID=Customers.CustomerID;
Result of Query :
SELF JOIN:
It is a join in which a table is joined with itself, especially when the table has a FOREIGN KEY which
references its own PRIMARY KEY. To join a table with itself means that each row of the table is combined
with itself and with every other row of the table. The self join can be viewed as a join of two copies of
the same table.
Syntax
SELECT t1.column_name, t2.column_name
FROM table t1, table t2
WHERE t1.column_name = t2.column_name;
Try it yourself :
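One possible self join, sketched with Python's sqlite3 module. The "Employees" table, its ManagerID column (which points back at EmpID in the same table) and the sample rows are all hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmpID int, Name varchar(255), ManagerID int)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "Asha", None), (2, "Ravi", 1), (3, "Meena", 1)],
)

# The same table is used twice under two aliases:
# e is the employee copy, m is the manager copy.
pairs = conn.execute("""
    SELECT e.Name, m.Name
    FROM Employees e, Employees m
    WHERE e.ManagerID = m.EmpID
    ORDER BY e.EmpID
""").fetchall()
print(pairs)
```

Asha has no manager (ManagerID is NULL), so she appears only on the manager side of the result.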
SQL constraints are used to specify rules for the data in a table. They help to ensure the data type and
range of the values that go into a database entity.
If a data action violates a constraint, the action is aborted.
Constraints can be specified when the table is created (inside the CREATE TABLE statement) or after
the table is created (inside the ALTER TABLE statement).
UNIQUE - Ensures that each row has a unique value in the column
PRIMARY KEY - A combination of NOT NULL and UNIQUE. Ensures that a column (or
combination of two or more columns) has a unique identity, which helps to find a particular record in
a table more easily and quickly.
FOREIGN KEY - Points to a primary key in another table and ensures referential integrity: the data in
one table must match values in another table
DEFAULT - Specifies a default value for the column when none is given
The NOT NULL constraint enforces a field to always contain a value. This means that you cannot
insert a new record, or update a record without adding a value to this field.
The following SQL enforces the "P_Id" column as PRIMARY KEY and the "LastName" column to
not accept NULL values:
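A minimal sketch of these two constraints, assuming an in-memory SQLite database in place of the course server. Only three columns of "Persons" are used, and the column sizes are assumed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        P_Id      int NOT NULL PRIMARY KEY,
        LastName  varchar(255) NOT NULL,
        FirstName varchar(255)
    )
""")
conn.execute("INSERT INTO Persons VALUES (1, 'Hansen', 'Ola')")

# Leaving out LastName violates NOT NULL, so the insert is rejected.
try:
    conn.execute("INSERT INTO Persons (P_Id, FirstName) VALUES (2, 'Tove')")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]
print(null_rejected, row_count)
```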
UNIQUE Constraint
Note : We may have many UNIQUE constraints per table, but only one PRIMARY KEY constraint per
table.
The following SQL creates a UNIQUE constraint on the "P_Id" column when the "Persons" table is
created:
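As a rough sketch, a column-level UNIQUE constraint can be tried out with Python's sqlite3 module. The exact placement of the UNIQUE keyword varies between database systems, so the statement below is one assumed form rather than the only valid one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        P_Id     int NOT NULL UNIQUE,
        LastName varchar(255) NOT NULL
    )
""")
conn.execute("INSERT INTO Persons VALUES (1, 'Hansen')")

# A second row with the same P_Id violates UNIQUE and is rejected.
try:
    conn.execute("INSERT INTO Persons VALUES (1, 'Svendson')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```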
To create a UNIQUE constraint on the "P_Id" column when the table is already created, use the
following SQL:
or
The PRIMARY KEY constraint uniquely identifies each record in a database table.
Primary keys must contain unique values. A primary key column cannot contain NULL values.
Each table should have a primary key, and each table can have only ONE primary key.
or
Note that the "P_Id" column in the "Orders" table points to the "P_Id" column in the "Persons" table.
The "P_Id" column in the "Persons" table is the PRIMARY KEY in the "Persons" table.
The "P_Id" column in the "Orders" table is a FOREIGN KEY in the "Orders" table.
The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables.
The FOREIGN KEY constraint also prevents invalid data from being inserted into the foreign key
column, because it has to be one of the values contained in the table it points to.
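Both effects of a FOREIGN KEY can be sketched with Python's sqlite3 module. Two assumptions apply: SQLite only enforces foreign keys after PRAGMA foreign_keys = ON (other systems enforce them by default), and the order numbers and rows are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.execute("CREATE TABLE Persons (P_Id int PRIMARY KEY, LastName varchar(255))")
conn.execute("""
    CREATE TABLE Orders (
        O_Id    int PRIMARY KEY,
        OrderNo int,
        P_Id    int,
        FOREIGN KEY (P_Id) REFERENCES Persons(P_Id)
    )
""")
conn.execute("INSERT INTO Persons VALUES (1, 'Hansen')")
conn.execute("INSERT INTO Orders VALUES (1, 77895, 1)")

# Invalid data: there is no person with P_Id 99.
try:
    conn.execute("INSERT INTO Orders VALUES (2, 44678, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Destroying a link: person 1 is still referenced by an order.
try:
    conn.execute("DELETE FROM Persons WHERE P_Id = 1")
    delete_rejected = False
except sqlite3.IntegrityError:
    delete_rejected = True

print(orphan_rejected, delete_rejected)
```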
To create a FOREIGN KEY constraint on the "P_Id" column when the "Orders" table is already
created, use the following SQL:
or
The CHECK constraint is used to limit the value range that can be placed in a column.
If you define a CHECK constraint on a single column it allows only certain values for this column.
If you define a CHECK constraint on a table it can limit the values in certain columns based on values
in other columns in the row.
The following SQL creates a CHECK constraint on the "P_Id" column when the "Persons" table is
created. The CHECK constraint specifies that the column "P_Id" must only include integers greater
than 0.
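A minimal sketch of such a CHECK constraint, assuming an in-memory SQLite database and a two-column version of "Persons":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        P_Id     int NOT NULL CHECK (P_Id > 0),
        LastName varchar(255) NOT NULL
    )
""")
conn.execute("INSERT INTO Persons VALUES (1, 'Hansen')")

# P_Id = 0 fails the condition P_Id > 0, so the insert is rejected.
try:
    conn.execute("INSERT INTO Persons VALUES (0, 'Svendson')")
    check_rejected = False
except sqlite3.IntegrityError:
    check_rejected = True

print(check_rejected)
```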
or
To allow naming of a CHECK constraint, and for defining a CHECK constraint on multiple columns,
use the following SQL syntax:
To create a CHECK constraint on the "P_Id" column when the table is already created, use the
following SQL:
To allow naming of a CHECK constraint, and for defining a CHECK constraint on multiple columns,
use the following SQL syntax:
DEFAULT Constraint
The DEFAULT constraint is used to insert a default value into a column.
The default value will be added to all new records, if no other value is specified.
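A short sketch of the DEFAULT constraint, assuming an in-memory SQLite database. The default value 'Sandnes' for the City column is an illustrative choice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        P_Id     int NOT NULL,
        LastName varchar(255) NOT NULL,
        City     varchar(255) DEFAULT 'Sandnes'
    )
""")

# No City given: the default value is stored.
conn.execute("INSERT INTO Persons (P_Id, LastName) VALUES (1, 'Hansen')")

# City given explicitly: the default is not used.
conn.execute(
    "INSERT INTO Persons (P_Id, LastName, City) "
    "VALUES (2, 'Pettersen', 'Stavanger')"
)

cities = [r[0] for r in conn.execute("SELECT City FROM Persons ORDER BY P_Id")]
print(cities)
```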
SQL Functions
SQL Aggregate Functions : SQL aggregate functions return a single value, calculated from values in a
column. Useful aggregate functions:
AVG() Function
Syntax
SELECT AVG(column_name) FROM table_name;
COUNT() function
Syntax
The COUNT(column_name) function returns the number of values (NULL values will not be counted)
of the specified column:
COUNT(*) Syntax
COUNT(DISTINCT column_name) function returns the number of distinct values of the specified
column:
Syntax
SELECT COUNT(DISTINCT column_name) FROM table_name;
The MAX() function returns the largest value of the selected column.
Syntax
SELECT MAX(column_name) FROM table_name;
MIN() Function
The MIN() function returns the smallest value of the selected column.
Syntax
SELECT MIN(column_name) FROM table_name;
SUM() Function
The following SQL statement finds the sum of all the "Quantity" fields for the "OrderDetails" table:
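The aggregate functions described above can be compared side by side in one sketch. It assumes an in-memory SQLite database and a hypothetical two-column "OrderDetails" table; the last row has a NULL Quantity to show how NULL values are treated:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderDetails (OrderID int, Quantity int)")
conn.executemany(
    "INSERT INTO OrderDetails VALUES (?, ?)",
    [(10248, 12), (10248, 10), (10248, 5),
     (10249, 9), (10249, 40), (10250, None)],
)

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

total = scalar("SELECT SUM(Quantity) FROM OrderDetails")
average = scalar("SELECT AVG(Quantity) FROM OrderDetails")  # NULL is skipped
largest = scalar("SELECT MAX(Quantity) FROM OrderDetails")
smallest = scalar("SELECT MIN(Quantity) FROM OrderDetails")
all_rows = scalar("SELECT COUNT(*) FROM OrderDetails")          # counts every row
non_null = scalar("SELECT COUNT(Quantity) FROM OrderDetails")   # NULL not counted
orders = scalar("SELECT COUNT(DISTINCT OrderID) FROM OrderDetails")

print(total, average, largest, smallest, all_rows, non_null, orders)
```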
GROUP BY Statement
The GROUP BY statement is used in conjunction with the aggregate functions to group the result-set
by one or more columns.
Syntax
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name;
Query Result
OrderID Quantity
10248 27
10249 49
HAVING Clause
The HAVING clause was added to SQL because the WHERE keyword could not be used with
aggregate functions.
Syntax
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value;
Try it yourself and verify the output:
SELECT OrderID,SUM(Quantity) FROM OrderDetails GROUP BY OrderID HAVING
SUM(Quantity) > 40;
Query Result
OrderID Quantity
10249 49
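A minimal sketch of GROUP BY and HAVING that reproduces the two result tables above. It assumes an in-memory SQLite database; the individual quantities are hypothetical values chosen so that the order totals come to 27 and 49:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderDetails (OrderID int, Quantity int)")
conn.executemany(
    "INSERT INTO OrderDetails VALUES (?, ?)",
    [(10248, 12), (10248, 10), (10248, 5), (10249, 9), (10249, 40)],
)

# GROUP BY: one result row per OrderID, with the quantities summed.
grouped = conn.execute("""
    SELECT OrderID, SUM(Quantity)
    FROM OrderDetails
    GROUP BY OrderID
    ORDER BY OrderID
""").fetchall()

# HAVING: filters the groups after aggregation (WHERE filters rows before it).
filtered = conn.execute("""
    SELECT OrderID, SUM(Quantity)
    FROM OrderDetails
    GROUP BY OrderID
    HAVING SUM(Quantity) > 40
""").fetchall()

print(grouped)
print(filtered)
```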
Subquery
A subquery (also called an inner query or nested query) is a query within a query. In Oracle, you can
create subqueries within your SQL statements.
These subqueries can reside in the WHERE clause, the FROM clause, or the SELECT clause.
A subquery is usually added in the
WHERE clause of the SQL statement. Most of the time, a subquery is used when you know how to
search for a value using a SELECT statement, but do not know the exact value in the database.
Subqueries can be used with the following SQL statements, along with comparison operators such as =,
<, >, >=, <= etc.
•SELECT
•INSERT
•UPDATE
•DELETE
Using a subquery, we want to fetch those customers who have placed orders.
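One way to write this is a subquery in the WHERE clause combined with IN. The sketch assumes an in-memory SQLite database, and the "Customers" and "Orders" rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID int, CustomerName varchar(255))")
conn.execute("CREATE TABLE Orders (OrderID int, CustomerID int)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Alfreds Futterkiste"), (2, "Ana Trujillo"), (3, "Antonio Moreno")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(10308, 2), (10365, 3), (10383, 3)],
)

# The inner query lists the CustomerIDs that appear in Orders;
# the outer query keeps only the customers whose ID is in that list.
with_orders = conn.execute("""
    SELECT CustomerName
    FROM Customers
    WHERE CustomerID IN (SELECT CustomerID FROM Orders)
    ORDER BY CustomerID
""").fetchall()
print(with_orders)
```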
1.2.7.1 Database
Data : It is unprocessed raw facts
Information : It is processed data which helps in decision making and planning.
A database is an organized collection of data. Using database, we can input, store, retrieve and
manage the large data.
1.2.7.3 Normalization
It is the process of organizing the fields and tables of a relational database to minimize
redundancy and dependency.
It has three forms : 1NF, 2NF and 3NF
1) Which keyword returns all rows from the right table with the matching rows in the left table and
result is NULL in the left side when there is no match?
a. Normal join
b. left join
c. right join
d. full join
2) If we want to delete all rows from the table permanently, which command will be the most efficient?
a. Delete
b. Truncate
c. Drop
d. All of the above
3) Which is true?
a. False
b. True
a. select
b. Insert
c. Delete
d. Update
6) Which SQL statement selects all customers with a Country containing the pattern "land":
8) In following query 'SELECT * FROM Products WHERE Price BETWEEN 10 AND 20', which
price value will not be selected ?
a. 11
b. 10
c. 19
d. none of the above
a. min()
b. max()
c. count()
d. All of the above
Answers:
1. [c]
2. [b]
3. [c]
4. [b]
5. [a]
6. [c]
7. [a]
8. [d]
9. [d]
10. [d]