Rdbms Mba (It)
Data is nothing but facts and statistics, stored or flowing freely over a network; generally it is raw and unprocessed. For example, when you visit a website, it might store your IP address: that is data. In return, it might add a cookie to your browser marking that you visited the website: that is also data. Your name is data, and so is your age.
Data becomes information when it is processed and turned into something meaningful. For example, if a website analyses the cookie data saved in users' browsers and finds that most of its visitors are men aged 20-25, that finding is information derived from the collected data.
What is a Database?
A Database is a collection of related data, organised so that the data can be easily accessed, managed and updated. A database can be software-based or hardware-based, but it serves one purpose: storing data.
In the early days of computing, data was collected and stored on tapes, which could only be read sequentially: to reach a record, the tape had to be wound past all the data stored before it. Tapes were slow and bulky, and computer scientists soon realised that they needed a better solution to this problem.
Larry Ellison, the co-founder of Oracle, was among the first to realise the need for a software-based Database Management System.
What is DBMS?
A DBMS is software that allows the creation, definition and manipulation of a database, letting users store, process and analyse data easily. A DBMS provides an interface, or a tool, to perform operations such as creating a database, creating tables in it, storing data, updating data and a lot more.
A DBMS also protects and secures the databases, and it keeps the data consistent when multiple users work on it.
Here are some examples of popular DBMS used these days:
MySQL
Oracle
SQL Server
IBM DB2
PostgreSQL
Amazon SimpleDB (cloud based) etc.
1. Data stored in Tables: Data is never stored directly in the database; it is stored in tables created inside the database. A DBMS also allows relationships between tables, which make the data more meaningful and connected. By looking at the tables created in a database, you can easily understand what type of data is stored where.
2. Reduced Redundancy: Hard drives are very cheap today, but when they were expensive, unnecessary repetition of data in a database was a big problem. A DBMS follows Normalisation, which divides the data in such a way that repetition is minimal.
3. Data Consistency: On live data, i.e. data that is being continuously updated and added, maintaining consistency can become a challenge, but a DBMS handles it by itself.
4. Support for Multiple Users and Concurrent Access: A DBMS allows multiple users to work on it (update, insert and delete data) at the same time and still maintains data consistency.
5. Query Language: DBMS provides users with a simple Query language, using which data can be easily fetched, inserted, deleted and
updated in a database.
6. Security: The DBMS also takes care of the security of data, protecting it from unauthorised access. In a typical DBMS, we can create user accounts with different access permissions, and secure our data by restricting what each user can do.
7. Transactions: A DBMS supports transactions, which help maintain data integrity in real-world applications where many operations (for example, from multiple threads) run at the same time.
Advantages of DBMS
Disadvantages of DBMS
Complexity
Apart from open-source systems such as MySQL and PostgreSQL, licensed DBMSs are generally costly.
They are large in size.
Components of DBMS
The database management system can be divided into five major components, they are:
1. Hardware
2. Software
3. Data
4. Procedures
5. Database Access Language
Let's look at a simple diagram to see how they all fit together to form a database management system.
DBMS Components: Hardware
When we say Hardware, we mean the computer, hard disks, I/O channels for data, and any other physical component involved in storing data.
When we run Oracle or MySQL on our personal computer, the computer's hard disk, the keyboard we use to type in commands, and the computer's RAM and ROM all become part of the DBMS hardware.
Users
Database Administrators: The Database Administrator, or DBA, manages the complete database management system. The DBA takes care of the security of the DBMS, its availability, license keys, user accounts and access, etc.
Application Programmer or Software Developer: This user group is involved in developing and designing the parts of the DBMS.
End User: These days, all modern applications, web or mobile, store user data. How do they do it? Applications are programmed to collect user data and store it in DBMS systems running on their servers. End users are the ones who store, retrieve, update and delete data.
3-tier architecture is an extension of the 2-tier architecture. In the 2-tier architecture, we have an application layer which can be accessed programmatically to perform various operations on the DBMS. The application generally understands the Database Access Language and passes end users' requests on to the DBMS.
In 3-tier architecture, an additional Presentation or GUI layer is added, which provides a graphical user interface for the end user to interact with the DBMS.
For the end user, the GUI layer is the database system; the end user has no idea about the application layer or the DBMS behind it.
If you have used MySQL, you may have seen phpMyAdmin, which is a good example of a 3-tier DBMS architecture.
Hierarchical Model
Network Model
Entity-relationship Model
Relational Model
Hierarchical Model
This database model organises data into a tree-like structure, with a single root to which all the other data is linked. The hierarchy starts from the root data and expands like a tree, adding child nodes to the parent nodes.
In this model, a child node has only a single parent node.
This model efficiently describes many real-world relationships, such as the index of a book, recipes, etc.
In the hierarchical model, data is organised into a tree-like structure with one-to-many relationships between different types of data; for example, one department can have many courses, many professors and, of course, many students.
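To make the one-parent rule concrete, here is a minimal Python sketch of a hierarchical store. The `Node` class is our own illustration, not the API of any real hierarchical DBMS, and the department, course and student names are hypothetical sample data.

```python
class Node:
    """One record in a hierarchical (tree-shaped) database."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent          # exactly one parent; only the root has None
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Follow the single-parent links up to the root."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


# Hypothetical records: a department at the root, courses below it,
# and a student below one course.
root = Node("Computer Science")
dbms = Node("DBMS", parent=root)
networks = Node("Networks", parent=root)
alice = Node("Student: Alice", parent=dbms)

print(alice.path())   # ['Computer Science', 'DBMS', 'Student: Alice']
```

Because each node keeps a single `parent`, a student enrolled in both DBMS and Networks would have to be stored as two separate records. The network model relaxes exactly this restriction.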
Network Model
This is an extension of the Hierarchical model. In this model, data is organised more like a graph, and nodes are allowed to have more than one parent node.
Because more relationships can be established in this model, the data is more closely related, which also makes accessing it easier and faster. This database model was used to map many-to-many data relationships.
This was the most widely used database model, before Relational Model was introduced.
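A minimal Python sketch of the graph-like organisation, assuming a hypothetical `NetworkDB` class that stores owner and member links in both directions. Unlike in a tree, a record here can have several parents, which is how a many-to-many relationship is represented. The course and student names are made-up sample data.

```python
from collections import defaultdict


class NetworkDB:
    """Records linked like a graph: a member record may have several owners."""

    def __init__(self):
        self.members = defaultdict(set)   # owner record -> its member records
        self.owners = defaultdict(set)    # member record -> all of its owners (parents)

    def link(self, owner, member):
        self.members[owner].add(member)
        self.owners[member].add(owner)


# Hypothetical many-to-many data: students linked to courses.
db = NetworkDB()
db.link("Course: DBMS", "Student: Alice")
db.link("Course: Networks", "Student: Alice")   # same record, second parent
db.link("Course: DBMS", "Student: Bob")

print(sorted(db.owners["Student: Alice"]))   # ['Course: DBMS', 'Course: Networks']
print(sorted(db.members["Course: DBMS"]))    # ['Student: Alice', 'Student: Bob']
```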
Entity-relationship Model
In this database model, objects of interest are modelled as entities, and their characteristics as attributes.
Different entities are related using relationships.
E-R Models represent these relationships in pictorial form, to make them easier for different stakeholders to understand.
This model is good for designing a database, which can then be turned into tables in the relational model (explained below).
Let's take an example: if we have to design a School Database, then Student will be an entity with attributes name, age, address, etc. As an address is generally complex, it can be another entity with attributes street name, pincode, city, etc., and there will be a relationship between them.
Relationships can also be of different types.
Relational Model
In this model, data is organised in two-dimensional tables and the relationship is maintained by storing a common field.
This model was introduced by E. F. Codd in 1970, and since then it has been the most widely used database model.
The basic structure of data in the relational model is the table. All the information related to a particular type of object is stored in the rows of its table.
In the relational model, tables are also known as relations.
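As an illustration of two tables related through a common field, the sketch below uses Python's built-in `sqlite3` module with an in-memory SQLite database. The `department` and `student` tables, their columns and the `dept_id` link are hypothetical, as are the sample rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT,
        dept_id    INTEGER          -- common field shared with department
    );
    INSERT INTO department VALUES (1, 'Computer Science'), (2, 'Mathematics');
    INSERT INTO student VALUES (101, 'Akon', 1), (102, 'Bkon', 2), (103, 'Ckon', 1);
""")

# The relationship is recovered by matching the common field in a join.
rows = con.execute("""
    SELECT s.name, d.dept_name
    FROM student AS s JOIN department AS d ON s.dept_id = d.dept_id
    ORDER BY s.student_id
""").fetchall()
print(rows)   # [('Akon', 'Computer Science'), ('Bkon', 'Mathematics'), ('Ckon', 'Computer Science')]
```

Neither table stores the other's data; the shared `dept_id` value is all that links a student row to its department row.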
In the coming tutorials, we will learn how to design tables, normalize them to reduce data redundancy, and use Structured Query Language (SQL) to access data from tables.
Basic Concepts of ER Model in DBMS
As we described in the tutorial Database models, Entity-relationship model is a model used for design and representation of relationships between data.
The main data objects are termed Entities, with their details defined as attributes; some of these attributes are important and are used to identify the entity, and different entities are related using relationships.
In short, to understand the ER Model, we must understand entities, attributes, keys and relationships.
Let's take an example to explain everything. For a School Management Software, we will have to
store Student information, Teacher information, Classes, Subjects taught in each class etc.
ER Model: Attributes
If a Student is an Entity, then student's roll no., student's name, student's age, student's gender etc will be its attributes.
An attribute can be of many types. Here are the different types of attributes defined in the ER database model:
1. Simple attribute: The attributes with values that are atomic and cannot be broken down further are simple attributes. For example,
student's age.
2. Composite attribute: A composite attribute is made up of more than one simple attribute. For example, student's address will contain, house
no., street name, pincode etc.
3. Derived attribute: These are attributes that are not stored in the database but are derived from other attributes. For example, the average age of students in a class.
4. Single-valued attribute: As the name suggests, it has a single value, for example, a student's roll no.
5. Multi-valued attribute: It can have multiple values, for example, the phone numbers of a student who has more than one.
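One way to picture these attribute types is to map them onto a Python data model. This is only an analogy under our own assumptions: the `Address` and `Student` classes, their fields and the sample values are hypothetical, and an ER model is a design notation, not code.

```python
from dataclasses import dataclass, field


@dataclass
class Address:                 # composite attribute, built from simple parts
    house_no: str
    street: str
    pincode: str


@dataclass
class Student:
    roll_no: int               # simple, single-valued
    name: str                  # simple, single-valued
    age: int                   # simple, single-valued
    address: Address           # composite
    phones: list = field(default_factory=list)   # multi-valued


def average_age(students):
    """Derived attribute: computed from stored ages, never stored itself."""
    return sum(s.age for s in students) / len(students)


# Hypothetical students.
adam = Student(101, "Adam", 15, Address("12", "MG Road", "110001"), ["9876723452"])
alex = Student(102, "Alex", 17, Address("7", "Park Street", "700016"),
               ["9991165674", "7898756543"])

print(average_age([adam, alex]))   # 16.0
```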
ER Model: Keys
If the attribute roll no. can uniquely identify a student entity, amongst all the students, then the attribute roll no. will be said to be a key.
Following are the types of Keys:
1. Super Key
2. Candidate Key
3. Primary Key
In the next tutorial, we will learn how to create ER diagrams and design databases using ER diagrams.
Components of ER Diagram
Entity, Attributes, Relationships, etc. form the components of an ER Diagram, and there are defined symbols and shapes to represent each of them.
Let's see how we can represent these in our ER Diagram.
Entity
A simple rectangular box represents an Entity.
Weak Entity
A weak Entity is represented using double rectangular boxes. It is generally connected to another entity.
ER Diagram: Relationship
A Relationship describes the relation between entities. A relationship is represented using a diamond (rhombus).
There are three types of relationship that exist between Entities.
1. Binary Relationship
2. Recursive Relationship
3. Ternary Relationship
The above example describes that one student can enroll in only one course, and a course will also have only one student. This is not what you will usually see in real-world relationships.
One to Many Relationship
The example below shows this relationship: one student can opt for many courses, but each course can have only one student. That may sound odd for courses, but it is exactly what a one-to-many relationship from Student to Course means.
Many to Many Relationship
The above diagram represents that one student can enroll in more than one course, and a course can have more than one student enrolled in it.
1. Generalization
2. Specialization
3. Aggregation
Let's understand what they are, and why they were added to the existing ER Model.
Generalization
Generalization is a bottom-up approach in which two lower level entities combine to form a higher level entity. In generalization, the higher level entity
can also combine with other lower level entities to make further higher level entity.
It is like a Superclass and Subclass system; the only difference is the approach, which is bottom-up. Entities are combined to form a more generalised entity; in other words, sub-classes are combined to form a super-class.
For example, Savings and Current account entities can be generalised into an entity named Account, which covers both.
Specialization
Specialization is the opposite of Generalization. It is a top-down approach in which one higher-level entity is broken down into two lower-level entities. In specialization, it is also possible for a higher-level entity to have no lower-level entity sets.
Aggregation
Aggregation is a process in which the relationship between two entities is treated as a single entity.
In the diagram above, the relationship between Center and Course together is acting as an entity, which is in a relationship with another entity, Visitor.
In the real world, if a visitor or a student visits a coaching center, he or she will never enquire only about the center or only about the course; rather, he or she will enquire about both.
Rule zero
This rule states that for a system to qualify as an RDBMS, it must be able to manage databases entirely through its relational capabilities.
1 Adam 34 13000
2 Alex 28 15000
3 Stuart 20 18000
4 Ross 42 19020
1 Adam 34 13000
Attribute Domain
When an attribute is defined in a relation (table), it is defined to hold only a certain type of values, which is known as its Attribute Domain.
For example, the attribute Name will hold the name of the employee for every tuple. If we save the employee's address there, it will violate the Relational database model.
Name
Adam
Alex
Ross
1. Key Constraints
2. Domain Constraints
3. Referential integrity Constraints
Key Constraints
We store data in tables to access it later whenever required. In every table, one or more attributes together are used to fetch data from the table.
The Key Constraint specifies that there should be such an attribute (column) in a relation (table), which can be used to fetch data for any tuple (row).
The key attribute should never be NULL, and no two rows of data should have the same key value.
For example, in the Employee table we can use the attribute ID to fetch data for each of the employee. No value of ID is null and it is unique for every
row, hence it can be our Key attribute.
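A minimal sketch of the key constraint, using Python's `sqlite3` module and an in-memory SQLite database; the `employee` table and the `try_insert` helper are hypothetical. One SQLite-specific detail: unless a column is declared `NOT NULL` (or is an `INTEGER PRIMARY KEY`), SQLite lets a PRIMARY KEY column hold NULL, so `NOT NULL` is written out explicitly here.

```python
import sqlite3


def try_insert(con, row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        con.execute("INSERT INTO employee VALUES (?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


con = sqlite3.connect(":memory:")
# NOT NULL is explicit because SQLite would otherwise accept NULL in this key.
con.execute("CREATE TABLE employee (id INT NOT NULL PRIMARY KEY, name TEXT)")

print(try_insert(con, (1, "Adam")))     # True: new, non-NULL key
print(try_insert(con, (1, "Alex")))     # False: duplicate key value
print(try_insert(con, (None, "Ross")))  # False: NULL key value
```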
Domain Constraint
Domain constraints refer to the rules defined for the values that can be stored for a certain attribute.
As explained above, we cannot store the address of an employee in the column for Name.
Similarly, a rule can require that a mobile number does not exceed 10 digits.
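A domain rule like the mobile-number limit can be written as a `CHECK` constraint. The sketch assumes a hypothetical `employee` table in an in-memory SQLite database, accessed through Python's `sqlite3` module. Outside STRICT tables, SQLite does not strictly enforce declared column types, which is one more reason to spell the rule out.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The CHECK clause encodes the domain rule for the mobile column.
con.execute("""
    CREATE TABLE employee (
        id     INT NOT NULL PRIMARY KEY,
        name   TEXT NOT NULL,
        mobile TEXT CHECK (length(mobile) <= 10)
    )
""")


def insert_employee(emp_id, name, mobile):
    """Return True if the row passes the domain rule, False if it is rejected."""
    try:
        con.execute("INSERT INTO employee VALUES (?, ?, ?)", (emp_id, name, mobile))
        return True
    except sqlite3.IntegrityError:   # reported as "CHECK constraint failed"
        return False


print(insert_employee(1, "Adam", "9876723452"))    # True: 10 digits
print(insert_employee(2, "Alex", "98767234521"))   # False: 11 digits
```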
Referential Integrity Constraint
We will study this in detail later. For now, remember this example: if I say Supriya is my girlfriend, then a girl named Supriya should also exist for that relationship to be present.
If a table references some data in another table, then that table and that data must be present for the referential integrity constraint to hold true.
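In SQL, referential integrity is declared with a foreign key. A minimal sketch with Python's `sqlite3` module and hypothetical `department` and `employee` tables. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been run on the connection.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite leaves FK enforcement off by default
con.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Sales');
""")


def add_employee(emp_id, name, dept_id):
    """Return True if the referenced department exists and the row is stored."""
    try:
        con.execute("INSERT INTO employee VALUES (?, ?, ?)", (emp_id, name, dept_id))
        return True
    except sqlite3.IntegrityError:      # "FOREIGN KEY constraint failed"
        return False


print(add_employee(1, "Adam", 1))   # True: department 1 exists
print(add_employee(2, "Alex", 9))   # False: there is no department 9
```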
Let's take a simple Student table, with fields student_id, name, phone and age.
student_id name phone age
1 Akon 9876723452 17
2 Akon 9991165674 19
3 Bkon 7898756543 18
4 Ckon 8987867898 19
5 Dkon 9990080080 17
Super Key
Super Key is defined as a set of attributes within a table that can uniquely identify each record within a table. Super Key is a superset of Candidate key.
In the table defined above super key would include student_id, (student_id, name), phone etc.
Confused? The first one is pretty simple: student_id is unique for every row of data, hence it can be used to identify each row uniquely.
Next comes (student_id, name). Two students can have the same name, but they cannot have the same student_id, hence this combination can also be a key.
Similarly, phone number for every student will be unique, hence again, phone can also be a key.
So they all are super keys.
Candidate Key
Candidate keys are defined as the minimal set of fields which can uniquely identify each record in a table. It is an attribute or a set of attributes that can
act as a Primary Key for a table to uniquely identify each record in that table. There can be more than one candidate key.
In our example, student_id and phone both are candidate keys for table Student.
A candidate key can never be NULL or empty, and its value must be unique.
There can be more than one candidate key for a table.
A candidate key can be a combination of more than one column (attribute).
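The definitions above can be tried out on the Student sample with a small Python sketch. The helper names are our own. The check works only on the rows it is given, so it reports which column sets happen to be unique in this snapshot. On these five rows, (name, age) also happens to be unique. That is why, in practice, keys are chosen from what the data means, not from one sample; the text above lists only student_id and phone.

```python
from itertools import combinations

COLUMNS = ("student_id", "name", "phone", "age")
ROWS = [
    (1, "Akon", "9876723452", 17),
    (2, "Akon", "9991165674", 19),
    (3, "Bkon", "7898756543", 18),
    (4, "Ckon", "8987867898", 19),
    (5, "Dkon", "9990080080", 17),
]


def is_unique(cols, rows=ROWS):
    """True if no two rows share the same values in the given columns."""
    idx = [COLUMNS.index(c) for c in cols]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)


def super_keys(rows=ROWS):
    """Every column set that is unique in the sample, smallest sets first."""
    return [set(cols)
            for n in range(1, len(COLUMNS) + 1)
            for cols in combinations(COLUMNS, n)
            if is_unique(cols, rows)]


def candidate_keys(rows=ROWS):
    """Minimal super keys: no proper subset is itself a super key."""
    supers = super_keys(rows)
    return [k for k in supers if not any(other < k for other in supers)]


print(candidate_keys())   # [{'student_id'}, {'phone'}, {'name', 'age'}] (set order may vary)
```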
Primary Key
Primary key is a candidate key that is most appropriate to become the main key for any table. It is a key that can uniquely identify each record in a table.
For the table Student we can make the student_id column as the primary key.
Composite Key
A key that consists of two or more attributes that together uniquely identify any record in a table is called a Composite key. The attributes that together form the Composite key are not keys independently or individually.
In the above picture we have a Score table which stores the marks scored by a student in a particular subject.
In this table student_id and subject_id together will form the primary key, hence it is a composite key.
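A composite key is declared by listing several columns in one `PRIMARY KEY` clause. A minimal sketch with Python's `sqlite3` module; the `score` table follows the description above, while the `record` helper is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE score (
        student_id INTEGER NOT NULL,
        subject_id INTEGER NOT NULL,
        marks      INTEGER,
        PRIMARY KEY (student_id, subject_id)   -- composite key
    )
""")


def record(student_id, subject_id, marks):
    """Return True if the (student_id, subject_id) pair is new, else False."""
    try:
        con.execute("INSERT INTO score VALUES (?, ?, ?)", (student_id, subject_id, marks))
        return True
    except sqlite3.IntegrityError:
        return False


print(record(10, 1, 70))   # True
print(record(10, 2, 75))   # True: same student, different subject
print(record(11, 1, 80))   # True: same subject, different student
print(record(10, 1, 90))   # False: the pair (10, 1) already exists
```

Neither column is unique on its own (student 10 appears twice, subject 1 appears twice); only the pair is.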
Non-key Attributes
Non-key attributes are the attributes or fields of a table, other than candidate key attributes/fields in a table.
Non-prime Attributes
Non-prime Attributes are attributes other than the Primary Key attribute(s).
Normalization of Database
Database Normalization is a technique of organizing the data in the database. Normalization is a systematic approach of decomposing tables to
eliminate data redundancy(repetition) and undesirable characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step process that puts
data into tabular form, removing duplicated data from the relation tables.
Normalization is used for mainly two purposes,
In the table above, we have data of 4 Computer Sci. students. As we can see, data for the fields branch, hod(Head of Department) and office_tel is
repeated for the students who are in the same branch in the college, this is Data Redundancy.
Insertion Anomaly
Suppose for a new admission, until and unless a student opts for a branch, data of the student cannot be inserted, or else we will have to set the branch
information as NULL.
Also, if we have to insert data of 100 students of same branch, then the branch information will be repeated for all those 100 students.
These scenarios are nothing but Insertion anomalies.
Update Anomaly
What if Mr. X leaves the college, or is no longer the HOD of the computer science department? In that case, all the student records will have to be updated, and if by mistake we miss any record, it will lead to data inconsistency. This is an update anomaly.
Deletion Anomaly
In our Student table, two different kinds of information are kept together: Student information and Branch information. Hence, at the end of the academic year, if student records are deleted, we will also lose the branch information. This is a deletion anomaly.
Normalization Rule
Normalization rules are divided into the following normal forms:
In the next tutorial, we will discuss the First Normal Form in detail.
To understand what Partial Dependency is and how to normalize a table to the 2nd normal form, jump to the Second Normal Form tutorial.
Third Normal Form (3NF)
A table is said to be in the Third Normal Form when,
Here is the Third Normal Form tutorial. But we suggest that you first study the second normal form and then head over to the third normal form.
To learn about BCNF in detail with a very easy-to-understand example, head to the Boyce-Codd Normal Form tutorial.
Here is the Fourth Normal Form tutorial. But we suggest that you understand the other normal forms before you head over to the fourth normal form.
If tables in a database are not even in the 1st Normal Form, it is considered bad database design.
Our table already satisfies 3 rules out of the 4 rules, as all our column names are unique, we have stored data in the order we wanted to and we have
not inter-mixed different type of data in columns.
But out of the 3 different students in our table, 2 have opted for more than 1 subject, and we have stored the subject names in a single column. As per the 1st Normal Form, each column must contain atomic values.
101 Akon OS
101 Akon CN
102 Bkon C
102 Bkon C++
By doing so, although a few values are repeated, the values in the subject column are now atomic for each record/row.
Using the First Normal Form, data redundancy increases, as the same values will be repeated across multiple rows, but each row as a whole will be unique.
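The conversion above can be sketched in a few lines of Python. The sketch assumes the non-atomic column holds comma-separated subject names; the column name `roll_no` and the `to_first_normal_form` helper are hypothetical, and only the two students shown above are used.

```python
def to_first_normal_form(rows):
    """Split a comma-separated subject column into one atomic value per row."""
    atomic_rows = []
    for roll_no, name, subjects in rows:
        for subject in subjects.split(","):
            atomic_rows.append((roll_no, name, subject.strip()))
    return atomic_rows


unnormalized = [
    (101, "Akon", "OS, CN"),
    (102, "Bkon", "C, C++"),
]

for row in to_first_normal_form(unnormalized):
    print(*row)
# 101 Akon OS
# 101 Akon CN
# 102 Bkon C
# 102 Bkon C++
```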
What is Partial Dependency? Do not worry about it. First let's understand what is Dependency in a table?
What is Dependency?
Let's take an example of a Student table with columns student_id, name, reg_no(registration
number), branch and address(student's home address).
In this table, student_id is the primary key and will be unique for every row, hence we can use student_id to fetch any row of data from this table.
Even for a case, where student names are same, if we know the student_id we can easily fetch the correct record.
Hence we can say a Primary Key for a table is the column or a group of columns(composite key) which can
uniquely identify each record in the table.
I can ask for the branch name of the student with student_id 10, and I can get it. Similarly, if I ask for the name of the student with student_id 10 or 11, I will get it. So all I need is student_id; every other column depends on it, or can be fetched using it.
This is Dependency and we also call it Functional Dependency.
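A functional dependency X → Y says that rows with equal X values always have equal Y values. The Python sketch below checks that rule on a list of rows; `fd_holds` and the sample students are hypothetical. A sample can only disprove a dependency, never prove it: a dependency is a rule about all data the table may ever hold.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in `rows`.

    rows: list of dicts; lhs, rhs: tuples of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # The first time a key appears, remember its value; afterwards it must match.
        if seen.setdefault(key, value) != value:
            return False
    return True


students = [
    {"student_id": 10, "name": "Akon", "branch": "CSE"},
    {"student_id": 11, "name": "Akon", "branch": "ECE"},
    {"student_id": 12, "name": "Bkon", "branch": "CSE"},
]

print(fd_holds(students, ("student_id",), ("name", "branch")))   # True
print(fd_holds(students, ("name",), ("branch",)))                # False: two Akons, two branches
```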
What is Partial Dependency?
Now that we know what dependency is, we are in a better state to understand what partial dependency is.
For a simple table like Student, a single column like student_id can uniquely identify all the records in a table.
But this is not true all the time. So now let's extend our example to see if more than 1 column together can act as
a primary key.
Let's create another table for Subject, which will have subject_id and subject_name fields and subject_id will be the
primary key.
subject_id subject_name
1 Java
2 C++
3 Php
Now we have a Student table with student information and another table Subject for storing subject
information.
Let's create another table Score, to store the marks obtained by students in the respective subjects. We will also
be saving name of the teacher who teaches that subject along with marks.
1 10 1 70 Java Teacher
2 10 2 75 C++ Teacher
3 11 1 80 Java Teacher
In the score table, we are saving the student_id to know which student's marks these are, and the subject_id to know which subject the marks are for.
Together, student_id + subject_id forms a Candidate Key for this table, which can be the Primary key.
Confused about how this combination can be a primary key?
See, if I ask you to get me marks of student with student_id 10, can you get it from this table? No, because you
don't know for which subject. And if I give you subject_id, you would not know for which student. Hence we
need student_id + subject_id to uniquely identify any row.
Now if you look at the Score table, we have a column named teacher which depends only on the subject: for Java it's Java Teacher, for C++ it's C++ Teacher, and so on.
As we just discussed, the primary key for this table is a combination of two columns, student_id & subject_id, but the teacher's name depends only on the subject, hence on subject_id, and has nothing to do with student_id.
This is Partial Dependency, where an attribute in a table depends on only a part of the primary key and not on
the whole key.
And our Score table is now in the second normal form, with no partial dependency.
1 10 1 70
2 10 2 75
3 11 1 80
Quick Recap
1. For a table to be in the Second Normal form, it should be in the First Normal form and it should not have Partial
Dependency.
2. Partial Dependency exists, when for a composite primary key, any attribute in the table depends only on a part
of the primary key and not on the complete primary key.
3. To remove Partial dependency, we can divide the table, remove the attribute which is causing partial
dependency, and move it to some other table where it fits in well.
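Step 3 of the recap can be sketched with Python's `sqlite3` module: the teacher column moves to the subject table, and a join still shows each mark next to its teacher. The exact table layouts are assumptions based on the Score and Subject examples above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- teacher depends only on subject_id, so it moves to the subject table
    CREATE TABLE subject (
        subject_id   INTEGER PRIMARY KEY,
        subject_name TEXT,
        teacher      TEXT
    );
    -- marks depends on the whole key (student_id, subject_id), so it stays here
    CREATE TABLE score (
        student_id INTEGER,
        subject_id INTEGER REFERENCES subject(subject_id),
        marks      INTEGER,
        PRIMARY KEY (student_id, subject_id)
    );
    INSERT INTO subject VALUES (1, 'Java', 'Java Teacher'), (2, 'C++', 'C++ Teacher');
    INSERT INTO score VALUES (10, 1, 70), (10, 2, 75), (11, 1, 80);
""")

JOIN_QUERY = """
    SELECT sc.student_id, su.subject_name, sc.marks, su.teacher
    FROM score AS sc JOIN subject AS su ON su.subject_id = sc.subject_id
    ORDER BY sc.student_id, sc.subject_id
"""

# A join still shows each mark next to the subject's teacher.
print(con.execute(JOIN_QUERY).fetchall())

# Changing a teacher now touches exactly one row instead of every score row.
cur = con.execute("UPDATE subject SET teacher = 'New Java Teacher' WHERE subject_id = 1")
print(cur.rowcount)   # 1
```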
In our last tutorial, we learned about the second normal form and even normalized our Score table into the 2nd Normal Form.
So let's use the same example, where we have 3 tables, Student, Subject and Score.
Student Table
Subject Table
Score Table
1 10 1 70
2 10 2 75
3 11 1 80
In the Score table, we need to store some more information, which is the exam name and total marks, so let's add 2 more columns to the Score table.
1 Workshop 200
2 Mains 70
3 Practicals 30
The second point sounds a bit tricky, right? In simple words, it means that for a dependency A → B, A cannot be a non-prime attribute if B is a prime attribute.
attribute.
103 C# P.Chash
As you can see, we have also added some sample data to the table.
In the table above:
One student can enrol for multiple subjects. For example, student with student_id 101, has opted for subjects - Java & C++
For each subject, a professor is assigned to the student.
And, there can be multiple professors teaching one subject like we have for Java.
student_id p_id
101 1
101 2
and so on...
1 P.Java Java
2 P.Cpp C++
and so on...
And now, this relation satisfies Boyce-Codd Normal Form. In the next tutorial we will learn about the Fourth Normal Form.
1. For a dependency A → B, if for a single value of A multiple values of B exist, then the table may have a multi-valued dependency.
2. Also, a table should have at least 3 columns for it to have a multi-valued dependency.
3. And, for a relation R(A,B,C), if there is a multi-valued dependency between A and B, then B and C should be independent of each other.
If all these conditions are true for any relation (table), it is said to have a multi-valued dependency.
Time for an Example
Below we have a college enrolment table with columns s_id, course and hobby.
s_id course hobby
1 Science Cricket
1 Maths Hockey
2 C# Cricket
2 Php Hockey
As you can see in the table above, student with s_id 1 has opted for two courses, Science and Maths, and has two hobbies, Cricket and Hockey.
You must be thinking what problem this can lead to, right?
Well, the two records for the student with s_id 1 will give rise to two more records, as shown below: this student has two hobbies, so each hobby must be recorded alongside both courses.
s_id course hobby
1 Science Cricket
1 Maths Hockey
1 Science Hockey
1 Maths Cricket
And, in the table above, there is no relationship between the columns course and hobby. They are independent of each other.
So there is a multi-valued dependency, which leads to unnecessary repetition of data and other anomalies as well.
s_id course
1 Science
1 Maths
2 C#
2 Php
s_id hobby
1 Cricket
1 Hockey
2 Cricket
2 Hockey
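To check that the two 4NF tables lose nothing, the Python sketch below joins them back on s_id. The `natural_join` helper is our own, written with plain lists instead of a database. Every course of a student is paired with every hobby of the same student, which is exactly the set of combinations the single table had to spell out row by row.

```python
from itertools import product

course = [(1, "Science"), (1, "Maths"), (2, "C#"), (2, "Php")]
hobby = [(1, "Cricket"), (1, "Hockey"), (2, "Cricket"), (2, "Hockey")]


def natural_join(course_rows, hobby_rows):
    """Rebuild (s_id, course, hobby) by pairing rows that share an s_id."""
    return sorted(
        (s1, c, h)
        for (s1, c), (s2, h) in product(course_rows, hobby_rows)
        if s1 == s2
    )


joined = natural_join(course, hobby)
print(len(joined))   # 8
```

The design benefit grows with the data: a student with m courses and n hobbies needs m × n rows in the combined table, but only m + n rows across the two 4NF tables.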
Introduction to SQL
Structured Query Language (SQL) is a database query language used for storing and managing data in Relational DBMS. SQL was the first commercial language introduced for E. F. Codd's Relational model of database. Today almost all RDBMS (MySQL, Oracle, Informix, Sybase, MS Access) use SQL as the standard database query language. SQL is used to perform all types of data operations in RDBMS.
SQL Command
SQL defines the following ways to manipulate data stored in an RDBMS.
Command Description
Creating a Database
To create a database in an RDBMS, the CREATE command is used. Following is the syntax,
CREATE DATABASE <DB_NAME>;
Creating a Table
The CREATE command can also be used to create tables. When we create a table, we have to specify the details of its columns too: we can specify the names and datatypes of the various columns in the CREATE command itself.
Following is the syntax,
CREATE TABLE <TABLE_NAME> (
column_name1 datatype1,
column_name2 datatype2,
column_name3 datatype3
);
Datatype Use
VARCHAR used for columns which will store variable-length strings made up of characters, such as letters and digits.
CHAR used for columns which will store fixed-length character values, such as a single character (CHAR(1)).
TEXT used for columns which will store text which is generally long in length. For
example, if you create a table for storing profile information of a social networking
website, then for about me section you can have a column of type TEXT.
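A small sketch of a CREATE TABLE statement that uses these datatypes, run through Python's `sqlite3` module. The `profile` table and its columns are hypothetical. SQLite records the declared types, but it does not enforce the length in VARCHAR(n) or CHAR(n), so other databases such as MySQL or Oracle will behave more strictly.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE profile (
        user_id  INT,
        username VARCHAR(50),   -- variable-length string
        gender   CHAR(1),       -- fixed-length, here a single character
        about_me TEXT           -- long free-form text
    )
""")

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
for cid, name, col_type, *_ in con.execute("PRAGMA table_info(profile)"):
    print(name, col_type)
# user_id INT
# username VARCHAR(50)
# gender CHAR(1)
# about_me TEXT
```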
TRUNCATE command
TRUNCATE command removes all the records from a table, but it does not destroy the table's structure. When we use the TRUNCATE command on a table, its auto-increment primary key counter is also reset. Following is its syntax,
TRUNCATE TABLE table_name;
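SQLite, used here through Python's `sqlite3` module, has no TRUNCATE statement; the closest equivalent is a DELETE without a WHERE clause, which empties the table but keeps its structure. The `student` table is hypothetical. The restart of the numbering shown below relies on the key being a plain `INTEGER PRIMARY KEY` without the AUTOINCREMENT keyword.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (s_id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO student (name) VALUES (?)", [("Adam",), ("Alex",), ("Abhi",)])

# An unqualified DELETE removes every row but keeps the table definition.
con.execute("DELETE FROM student")
print(con.execute("SELECT COUNT(*) FROM student").fetchone()[0])   # 0

# The table still exists; with the table empty, the next key starts again at 1.
con.execute("INSERT INTO student (name) VALUES ('Ross')")
print(con.execute("SELECT s_id, name FROM student").fetchall())    # [(1, 'Ross')]
```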
RENAME query
RENAME command is used to set a new name for any existing table. Following is the syntax,
RENAME TABLE old_table_name TO new_table_name;
INSERT command
INSERT command is used to insert data into a table. Following is its general syntax,
INSERT INTO table_name VALUES(data1, data2, ...);
101 Adam 15
Insert value into only specific columns
We can use the INSERT command to insert values for only some specific columns of a row. We can specify the column names along with the values to
be inserted like this,
101 Adam 15
102 Alex
101 Adam 15
102 Alex
103 chris 14
Suppose the column age in our table has a default value of 14.
Also, if you run the below query, it will insert default value into the age column, whatever the default value may be.
101 Adam 15
102 Alex
103 chris 14
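The effect of a column default can be reproduced with Python's `sqlite3` module, assuming a hypothetical `student` table whose age column defaults to 14. SQLite does not accept the DEFAULT keyword inside a VALUES list, so the sketch omits the column instead. An explicit NULL is also stored as NULL; only an omitted column receives the default.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (s_id INTEGER, name TEXT, age INTEGER DEFAULT 14)")

con.execute("INSERT INTO student VALUES (101, 'Adam', 15)")                       # all columns
con.execute("INSERT INTO student (s_id, name, age) VALUES (102, 'Alex', NULL)")   # explicit NULL
con.execute("INSERT INTO student (s_id, name) VALUES (103, 'chris')")             # age omitted

for row in con.execute("SELECT * FROM student ORDER BY s_id"):
    print(row)
# (101, 'Adam', 15)
# (102, 'Alex', None)
# (103, 'chris', 14)
```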
101 Adam 15
102 Alex 18
103 chris 14
In the above statement, if we do not use the WHERE clause, our update query will set age to 18 for all the rows of the table.
101 Adam 15
102 Alex 18
103 Abhi 17
UPDATE Command: Incrementing Integer Value
When we have to update an integer value in a table, we can fetch and update the value in a single statement.
For example, if we have to update the age column of the student table every year for every student, we can simply run the following UPDATE statement:
UPDATE student SET age = age + 1;
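A runnable version of this update, using Python's `sqlite3` module and a hypothetical `student` table loaded with the sample rows shown earlier. Each row's new age is computed from its own old age, with no separate read step. Adding a WHERE clause limits the change to the matching rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (s_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
con.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [(101, "Adam", 15), (102, "Alex", 18), (103, "Abhi", 17)])

# No WHERE clause: every row's age is read and incremented in one statement.
con.execute("UPDATE student SET age = age + 1")
print(con.execute("SELECT s_id, age FROM student ORDER BY s_id").fetchall())
# [(101, 16), (102, 19), (103, 18)]

# With a WHERE clause, only the matching row changes.
con.execute("UPDATE student SET age = age + 1 WHERE s_id = 103")
```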
DELETE command
DELETE command is used to delete data from a table.
Following is its general syntax,
DELETE FROM table_name;
101 Adam 15
102 Alex 18
103 Abhi 17
101 Adam 15
102 Alex 18
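The example above can be reproduced with Python's `sqlite3` module. The `student` table name and its columns are assumptions that match the sample rows: deleting the row with s_id 103 leaves the other two.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (s_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
con.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [(101, "Adam", 15), (102, "Alex", 18), (103, "Abhi", 17)])

# The WHERE clause restricts the delete to one row; without it, every row would go.
con.execute("DELETE FROM student WHERE s_id = ?", (103,))
print(con.execute("SELECT * FROM student ORDER BY s_id").fetchall())
# [(101, 'Adam', 15), (102, 'Alex', 18)]
```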
COMMIT command
COMMIT command is used to permanently save any transaction into the database.
When we use any DML command like INSERT, UPDATE or DELETE, the changes made by these commands are not permanent until the current session is closed; until then, the changes can be rolled back.
To avoid that, we use the COMMIT command to mark the changes as permanent.
Following is commit command's syntax,
COMMIT;
ROLLBACK command
This command restores the database to the last committed state. It is also used with the SAVEPOINT command to jump to a savepoint in an ongoing transaction.
If we have used the UPDATE command to make some changes to the database and realise that those changes were not required, we can use the ROLLBACK command to roll those changes back, provided they were not committed using the COMMIT command.
Following is rollback command's syntax,
ROLLBACK TO savepoint_name;
SAVEPOINT command
SAVEPOINT command is used to temporarily save a transaction so that you can rollback to that point whenever required.
Following is savepoint command's syntax,
SAVEPOINT savepoint_name;
In short, using this command we can name the different states of our data in any table and then rollback to that
state using the ROLLBACK command whenever required.
id name
1 Abhi
2 Adam
4 Alex
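Let's use some SQL queries on the above table and see the results.
INSERT INTO class VALUES(5, 'Rahul');
COMMIT;
UPDATE class SET name = 'Abhijit' WHERE id = 5;
SAVEPOINT A;
INSERT INTO class VALUES(6, 'Chris');
SAVEPOINT B;
INSERT INTO class VALUES(7, 'Bravo');
SAVEPOINT C;
SELECT * FROM class;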
NOTE: SELECT statement is used to show the data stored in the table.
id name
1 Abhi
2 Adam
4 Alex
5 Abhijit
6 Chris
7 Bravo
Now let's use the ROLLBACK command to roll back the state of data to the savepoint B.
ROLLBACK TO B;
id name
1 Abhi
2 Adam
4 Alex
5 Abhijit
6 Chris
Now let's again use the ROLLBACK command to roll back the state of data to the savepoint A.
ROLLBACK TO A;
id name
1 Abhi
2 Adam
4 Alex
5 Abhijit
So now you know how the commands COMMIT, ROLLBACK and SAVEPOINT work.
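The same COMMIT, SAVEPOINT and ROLLBACK TO sequence can be replayed against SQLite from Python. The sketch opens the connection with `isolation_level=None`, so Python's `sqlite3` module issues no implicit BEGIN or COMMIT and the transaction statements run exactly as written. The explicit `BEGIN` statements are our addition; SQLite needs them to group the statements into transactions.

```python
import sqlite3


def snapshot(con):
    """Return the current rows of class, ordered by id."""
    return con.execute("SELECT id, name FROM class ORDER BY id").fetchall()


# isolation_level=None: no implicit transaction handling by the sqlite3 module.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE class (id INTEGER, name TEXT)")
con.executemany("INSERT INTO class VALUES (?, ?)", [(1, "Abhi"), (2, "Adam"), (4, "Alex")])

con.execute("BEGIN")
con.execute("INSERT INTO class VALUES (5, 'Rahul')")
con.execute("COMMIT")

con.execute("BEGIN")
con.execute("UPDATE class SET name = 'Abhijit' WHERE id = 5")
con.execute("SAVEPOINT A")
con.execute("INSERT INTO class VALUES (6, 'Chris')")
con.execute("SAVEPOINT B")
con.execute("INSERT INTO class VALUES (7, 'Bravo')")
con.execute("SAVEPOINT C")
before = snapshot(con)

con.execute("ROLLBACK TO B")     # undoes the Bravo insert
after_b = snapshot(con)

con.execute("ROLLBACK TO A")     # also undoes the Chris insert
after_a = snapshot(con)

con.execute("COMMIT")            # keeps the Abhijit update made before savepoint A
print(after_a)   # [(1, 'Abhi'), (2, 'Adam'), (4, 'Alex'), (5, 'Abhijit')]
```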
System: This includes permissions for creating sessions, tables, etc., and all other types of system privileges.
Object: This includes permissions for any command or query to perform any operation on the database tables.
GRANT: Used to provide any user access privileges or other privileges for the database.
Now we will use the SELECT statement to display data of the table, based on a condition, which we will add to our SELECT query using WHERE clause.
Let's write a simple SQL query to display the record for student with s_id as 101.
SELECT s_id,
name,
age,
address
FROM student WHERE s_id = 101;
Following will be the result of the above query.
SELECT s_id,
name,
age,
address
FROM student WHERE name = 'Adam';
Following will be the result of the above query.
Operator Description
= Equal to
!= Not Equal to
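The two WHERE queries above, plus the != operator, can be tried with a minimal sketch. It assumes SQLite through Python's sqlite3 module and a cut-down student table with sample rows (s_id, name, age); the address column is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (s_id INTEGER, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(101, "Adam", 15), (102, "Alex", 18), (103, "Abhi", 17)])

# = keeps only the rows whose value matches exactly
by_id = conn.execute(
    "SELECT s_id, name, age FROM student WHERE s_id = 101").fetchall()
by_name = conn.execute(
    "SELECT s_id, name, age FROM student WHERE name = 'Adam'").fetchall()

# != keeps every row whose value differs
not_adam = conn.execute(
    "SELECT name FROM student WHERE name != 'Adam' ORDER BY s_id").fetchall()

print(by_id)     # [(101, 'Adam', 15)]
print(by_name)   # [(101, 'Adam', 15)]
print(not_adam)  # [('Alex',), ('Abhi',)]
```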
Wildcard operators
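There are two wildcard operators that are used in LIKE clause,
Percent sign %: represents zero, one or more than one character.
Underscore sign _: represents exactly one character.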
101 Adam 15
102 Alex 18
103 Abhi 17
SELECT * FROM Student WHERE s_name LIKE 'A%';
The above query will return all records where s_name starts with character 'A'.
101 Adam 15
102 Alex 18
103 Abhi 17
Using _ and %
SELECT * FROM Student WHERE s_name LIKE '_d%';
The above query will return all records from Student table where s_name contains 'd' as the second character.
101 Adam 15
Using % only
SELECT * FROM Student WHERE s_name LIKE '%x';
The above query will return all records from Student table where s_name contains 'x' as the last character.
102 Alex 18
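The three LIKE patterns can be checked with a short sketch, assuming SQLite through Python's sqlite3 module and a Student table holding the three sample rows above; the helper `like` is hypothetical. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, which other databases may not be.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (s_id INTEGER, s_name TEXT, age INTEGER)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                 [(101, "Adam", 15), (102, "Alex", 18), (103, "Abhi", 17)])

def like(pattern):
    """Return the s_id values whose s_name matches a LIKE pattern (hypothetical helper)."""
    rows = conn.execute(
        "SELECT s_id FROM Student WHERE s_name LIKE ? ORDER BY s_id", (pattern,))
    return [r[0] for r in rows]

print(like("A%"))   # [101, 102, 103]  starts with 'A'
print(like("_d%"))  # [101]            'd' is the second character
print(like("%x"))   # [102]            'x' is the last character
```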
Syntax of Order By
SELECT column-list|* FROM table-name ORDER BY column-name ASC | DESC;
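A minimal sketch of ORDER BY, assuming SQLite through Python's sqlite3 module and a hypothetical Emp table (the names come from later examples; the ages and salaries are sample values). It shows that ascending order is the default and that a second column can break ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (name TEXT, age INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO Emp VALUES (?, ?, ?)", [
    ("Anu", 22, 9000), ("Shane", 29, 8000), ("Rohan", 34, 6000),
    ("Scott", 44, 10000), ("Tiger", 35, 8000),
])

# No ASC/DESC on salary: ascending order is the default.
asc = [r[0] for r in conn.execute("SELECT name FROM Emp ORDER BY salary, name")]
# DESC reverses the salary order; name still breaks ties in ascending order.
desc = [r[0] for r in conn.execute("SELECT name FROM Emp ORDER BY salary DESC, name")]

print(asc)   # ['Rohan', 'Shane', 'Tiger', 'Anu', 'Scott']
print(desc)  # ['Scott', 'Anu', 'Shane', 'Tiger', 'Rohan']
```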
Using default Order by
Consider the following Emp table,
Here we want to find name and age of employees grouped by their salaries, or in other words, we will be grouping employees based on their salaries.
Hence, as a result, we will get a data set with unique salaries listed, alongside the name and age of the first employee to have that salary. Hope you are
getting the point here!
GROUP BY is used to group rows of data together based on one or more columns.
SQL query for the above requirement will be,
name age
Rohan 34
Shane 29
Anu 22
name salary
Rohan 6000
Shane 8000
Scott 9000
You must remember that the GROUP BY clause comes after the WHERE clause (if there is one) and before the HAVING and ORDER BY clauses.
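A sketch of GROUP BY with an aggregate, assuming SQLite through Python's sqlite3 module and hypothetical Emp rows. Selecting a plain column such as name next to GROUP BY salary, as the example above describes, is not standard SQL: SQLite returns an arbitrary row's value and MySQL rejects it when ONLY_FULL_GROUP_BY is enabled, so this sketch pairs the grouped column with COUNT(*) instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (name TEXT, age INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO Emp VALUES (?, ?, ?)", [
    ("Anu", 22, 9000), ("Shane", 29, 8000), ("Rohan", 34, 6000),
    ("Scott", 44, 10000), ("Tiger", 35, 8000),
])

# One output row per distinct salary, with how many employees earn it.
per_salary = conn.execute(
    "SELECT salary, COUNT(*) FROM Emp GROUP BY salary ORDER BY salary").fetchall()

# Clause order: WHERE filters rows first, then GROUP BY, then ORDER BY.
older_per_salary = conn.execute(
    "SELECT salary, COUNT(*) FROM Emp WHERE age > 25 "
    "GROUP BY salary ORDER BY salary").fetchall()

print(per_salary)        # [(6000, 1), (8000, 2), (9000, 1), (10000, 1)]
print(older_per_salary)  # [(6000, 1), (8000, 2), (10000, 1)]
```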
SQL HAVING Clause
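The HAVING clause is used with SQL queries to give a more precise condition for a statement. It is used to specify conditions on groups in GROUP BY based SQL queries,
just like the WHERE clause is used to specify conditions on rows in a SELECT query.
Syntax for HAVING clause is,
SELECT column_name, function(column_name) FROM table_name WHERE column_name condition GROUP BY column_name HAVING function(column_name) condition;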
Suppose we want to find the customer whose previous_balance sum is more than 3000.
We will use the below SQL query,
SELECT customer, SUM(previous_balance)
FROM sale GROUP BY customer
HAVING SUM(previous_balance) > 3000;
Result will be,
The main objective of the above SQL query was to find out the name of the customer whose previous_balance adds up to more than 3000, based on all
the previous sales made to the customer; hence the result contains the customer Alex, along with that total.
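The difference between HAVING and WHERE can be seen in a small sketch, assuming SQLite through Python's sqlite3 module and a hypothetical sale table (the customer names other than Alex and all balances are sample values). No single sale exceeds 3000, so a WHERE filter finds nothing, while HAVING tests the per-customer total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (customer TEXT, previous_balance INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?)", [
    ("Alex", 1000), ("Alex", 2500), ("Kate", 1500), ("Kate", 900), ("Bob", 2000),
])

# HAVING filters groups after SUM has been computed per customer.
having = conn.execute(
    "SELECT customer, SUM(previous_balance) FROM sale "
    "GROUP BY customer HAVING SUM(previous_balance) > 3000").fetchall()

# WHERE filters individual rows before any grouping happens.
where = conn.execute(
    "SELECT customer FROM sale WHERE previous_balance > 3000").fetchall()

print(having)  # [('Alex', 3500)]
print(where)   # []
```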
DISTINCT keyword
The distinct keyword is used with SELECT statement to retrieve unique values from the table. Distinct removes all the duplicate records while
retrieving records from any table in the database.
Syntax for DISTINCT Keyword
SELECT DISTINCT column-name FROM table-name;
salary
5000
8000
10000
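A minimal DISTINCT sketch, assuming SQLite through Python's sqlite3 module and a hypothetical Emp table whose salaries contain duplicates chosen to reproduce the three values above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Emp VALUES (?, ?)", [
    ("Anu", 5000), ("Shane", 8000), ("Rohan", 10000), ("Scott", 8000), ("Tiger", 5000),
])

# DISTINCT drops duplicate salary values from the result.
distinct_salaries = [r[0] for r in conn.execute(
    "SELECT DISTINCT salary FROM Emp ORDER BY salary")]
total_rows, distinct_count = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT salary) FROM Emp").fetchone()

print(distinct_salaries)           # [5000, 8000, 10000]
print(total_rows, distinct_count)  # 5 3
```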
AND operator
AND operator is used to set multiple conditions with the WHERE clause in SELECT, UPDATE or DELETE SQL queries.
Example of AND operator
Consider the following Emp table
SELECT * FROM Emp WHERE salary < 10000 AND age > 25
The above query will return records where salary is less than 10000 and age greater than 25. Hope you get the concept here. We have used
the AND operator to specify two conditions with WHERE clause.
OR operator
OR operator is also used to combine multiple conditions with WHERE clause. The only difference between AND and OR is their behaviour.
When we use AND to combine two or more than two conditions, records satisfying all the specified conditions will be there in the result.
But in case of OR operator, at least one of the specified conditions must be satisfied by a record for it to be in the result set.
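The AND example above, and an OR query for contrast, can be run in a short sketch. It assumes SQLite through Python's sqlite3 module and hypothetical Emp rows; the OR condition (salary > 9000 OR age < 25) is an illustrative choice, not from the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (name TEXT, age INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO Emp VALUES (?, ?, ?)", [
    ("Anu", 22, 9000), ("Shane", 29, 8000), ("Rohan", 34, 6000),
    ("Scott", 44, 10000), ("Tiger", 35, 8000),
])

# AND: a row must satisfy both conditions.
both = [r[0] for r in conn.execute(
    "SELECT name FROM Emp WHERE salary < 10000 AND age > 25 ORDER BY name")]

# OR: a row needs to satisfy at least one condition.
either = [r[0] for r in conn.execute(
    "SELECT name FROM Emp WHERE salary > 9000 OR age < 25 ORDER BY name")]

print(both)    # ['Rohan', 'Shane', 'Tiger']
print(either)  # ['Anu', 'Scott']
```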
Example of OR operator
Consider the following Emp table
In the above problem statements, the description after the keyword 'all' defines a set which contains some
elements, and the final result contains those units which satisfy these requirements.
Another way how you can identify the usage of division operator is by using the logical implication of if...then.
In context of the above two examples, we can see that the queries mean that,
1. If there is a bank in that particular city, that person must have an account in that bank.
2. If there is a course in the list of courses required to be graduated, that person must have taken that course.
Do not worry if you are not clear with all these new things right away; we will try to explain them as we move on with
this tutorial.
We shall see the second example, mentioned above, in detail.
Table 1: Course_Taken → It consists of the names of Students against the courses that they have taken.
Student_Name Course
Robert Databases
Table 2: Course_Required → It consists of the courses that one is required to take in order to graduate.
Course
Databases
Programming Languages
Unfortunately, there is no direct way by which we can express the division operator. Let's walk through the
steps, to write the query for the division operator.
Create a set of all students that have taken courses. This can be done easily using the following command.
CREATE TABLE AllStudents AS SELECT DISTINCT Student_Name FROM Course_Taken
Student_name
Robert
David
Hannah
Tom
Next, we will create a set of students and the courses they need to graduate. We can express this in the form of
Cartesian Product of AllStudents and Course_Required using the following command.
CREATE table StudentsAndRequired AS
SELECT AllStudents.Student_Name, Course_Required.Course
FROM AllStudents, Course_Required
Student_Name Course
Robert Databases
David Databases
Hannah Databases
Tom Databases
Here, we are taking our first step towards finding the students who cannot graduate. The idea is to simply find the
students who have not taken certain courses that are required for graduation, and hence they won't be able to
graduate. These are simply all those tuples/rows which are present in StudentsAndRequired and not present
in Course_Taken.
CREATE table StudentsAndNotTaken AS
SELECT * FROM StudentsAndRequired WHERE NOT EXISTS
(Select * FROM Course_Taken WHERE StudentsAndRequired.Student_Name =
Course_Taken.Student_Name
AND StudentsAndRequired.Course = Course_Taken.Course)
Student_Name Course
Hannah Databases
Tom Databases
All the students who are present in the table StudentsAndNotTaken are the ones who cannot graduate.
Therefore, we can find the students who cannot graduate as,
CREATE table CannotGraduate AS SELECT DISTINCT Student_Name FROM StudentsAndNotTaken
Student_name
David
Hannah
Tom
5. Find all students who can graduate
The students who can graduate are simply those who are present in AllStudents but not in CannotGraduate.
This can be done by the following query:
CREATE Table CanGraduate AS SELECT * FROM AllStudents
WHERE NOT EXISTS
(SELECT * FROM CannotGraduate WHERE
CannotGraduate.Student_name = AllStudents.Student_name)
Student_name
Robert
We have just seen how these steps lead us to the final answer. Now let us see how to write all these
5 steps in one single query so that we do not have to create so many tables.
SELECT DISTINCT x.Student_Name FROM Course_Taken AS x WHERE NOT
EXISTS(SELECT * FROM Course_Required AS y WHERE NOT
EXISTS(SELECT * FROM Course_Taken AS z
WHERE z.Student_name = x.Student_name
AND z.Course = y.Course ))
Student_name
Robert
This gives us the same result just like the 5 steps above.
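The single division query can be executed in a sketch, assuming SQLite through Python's sqlite3 module. Only one Course_Taken row is shown above, so the other rows are hypothetical values chosen to agree with the intermediate tables (Robert took both required courses; David, Hannah and Tom did not). A counting-based formulation, a different but common way to express division, is included for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Course_Taken (Student_Name TEXT, Course TEXT)")
conn.execute("CREATE TABLE Course_Required (Course TEXT)")
conn.executemany("INSERT INTO Course_Taken VALUES (?, ?)", [
    ("Robert", "Databases"), ("Robert", "Programming Languages"),
    ("David", "Databases"), ("David", "Operating Systems"),
    ("Hannah", "Operating Systems"), ("Tom", "Operating Systems"),
])
conn.executemany("INSERT INTO Course_Required VALUES (?)",
                 [("Databases",), ("Programming Languages",)])

# Double NOT EXISTS: keep x if there is no required course y that x has not taken.
division = """
SELECT DISTINCT x.Student_Name FROM Course_Taken AS x WHERE NOT EXISTS (
    SELECT * FROM Course_Required AS y WHERE NOT EXISTS (
        SELECT * FROM Course_Taken AS z
        WHERE z.Student_Name = x.Student_Name AND z.Course = y.Course))
ORDER BY x.Student_Name
"""

# Counting alternative: a student qualifies if the number of distinct required
# courses they took equals the number of required courses (assumes
# Course_Required has no duplicate rows).
counting = """
SELECT t.Student_Name FROM Course_Taken AS t
JOIN Course_Required AS r ON t.Course = r.Course
GROUP BY t.Student_Name
HAVING COUNT(DISTINCT t.Course) = (SELECT COUNT(*) FROM Course_Required)
ORDER BY t.Student_Name
"""

can_graduate = [r[0] for r in conn.execute(division)]
can_graduate_counting = [r[0] for r in conn.execute(counting)]

print(can_graduate)           # ['Robert']
print(can_graduate_counting)  # ['Robert']
```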
SQL Constraints
SQL Constraints are rules used to limit the type of data that can go into a table, to maintain the accuracy and
integrity of the data inside table.
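Constraints can be divided into the following two types: column level constraints, which limit only the data of one column, and table level constraints, which limit the data of the whole table.
Following are the most commonly used constraints that can be applied to a table,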
NOT NULL
UNIQUE
PRIMARY KEY
FOREIGN KEY
CHECK
DEFAULT
The above query will declare that the s_id field of Student table will not take NULL value.
UNIQUE Constraint
UNIQUE constraint ensures that a field or column will only have unique values. A UNIQUE constraint field
will not have duplicate data. This constraint can be applied at column level or table level.
Here we have a simple CREATE query to create a table, which will have a column s_id with unique values.
CREATE TABLE Student(s_id int NOT NULL UNIQUE, Name varchar(60), Age int);
The above query will declare that the s_id field of Student table will only have unique values and won't take a
NULL value.
The above query specifies that the s_id field of Student table will only have unique values.
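The UNIQUE and NOT NULL behaviour of the CREATE query above can be observed in a sketch, assuming SQLite through Python's sqlite3 module; the helper `accepted` and the sample rows are hypothetical. SQLite reports constraint violations as `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (s_id INTEGER NOT NULL UNIQUE, Name VARCHAR(60), Age INTEGER)")

def accepted(row):
    """Try to insert a row; return True if the constraints allowed it (hypothetical helper)."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(accepted((101, "Adam", 15)))   # True
print(accepted((101, "Alex", 18)))   # False: duplicate s_id breaks UNIQUE
print(accepted((None, "Abhi", 17)))  # False: NULL s_id breaks NOT NULL
print(accepted((102, "Alex", 18)))   # True
```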
Order_Detail Table
10 Order1 101
11 Order2 103
12 Order3 102
In Customer_Detail table, c_id is the primary key, which is set as a foreign key in Order_Detail table. Any value
that is entered in the c_id column of Order_Detail table must be present
in the c_id column of Customer_Detail table, where it is set as the primary key. This prevents invalid data from being inserted
into the c_id column of Order_Detail table.
If you try to insert any incorrect data, DBMS will return error and will not allow you to insert the data.
In this query, c_id in table Order_Detail is made a foreign key, which references the c_id column in
Customer_Detail table.
There are two ways to maintain the integrity of data in the child table when a particular record is deleted in the
main table. When two tables are connected with a foreign key, and certain data in the main table is deleted for
which a record exists in the child table, then we must have some mechanism to save the integrity of data in the
child table.
1. On Delete Cascade : This will remove the record from the child table if that value of the foreign key is deleted from the
main table.
2. On Delete Set Null : This will set the foreign key value to NULL in those records of the child table whose foreign
key value is deleted from the main table.
3. If we don't use any of the above, then we cannot delete data from the main table for which data in the child table
exists. We will get an error if we try to do so.
ERROR : Record in child table exist
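The three behaviours can be compared in a sketch, assuming SQLite through Python's sqlite3 module. The Order_Detail rows follow the table above; the column names order_id and order_name, the customer names, and the helpers `make_db` and `orders` are hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

def make_db(on_delete=""):
    """Build Customer_Detail/Order_Detail with an optional ON DELETE action (hypothetical helper)."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
    conn.execute("CREATE TABLE Customer_Detail (c_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE Order_Detail (order_id INTEGER PRIMARY KEY, order_name TEXT, "
        f"c_id INTEGER REFERENCES Customer_Detail(c_id) {on_delete})")
    conn.executemany("INSERT INTO Customer_Detail VALUES (?, ?)",
                     [(101, "Adam"), (102, "Alex"), (103, "Abhi")])
    conn.executemany("INSERT INTO Order_Detail VALUES (?, ?, ?)",
                     [(10, "Order1", 101), (11, "Order2", 103), (12, "Order3", 102)])
    return conn

def orders(conn):
    """Return (order_id, c_id) pairs of the child table."""
    return conn.execute(
        "SELECT order_id, c_id FROM Order_Detail ORDER BY order_id").fetchall()

plain = make_db()

# An order for a customer that does not exist is rejected.
try:
    plain.execute("INSERT INTO Order_Detail VALUES (13, 'Order4', 999)")
    bad_insert_rejected = False
except sqlite3.IntegrityError:
    bad_insert_rejected = True

# Without an ON DELETE action, deleting a referenced customer fails.
try:
    plain.execute("DELETE FROM Customer_Detail WHERE c_id = 101")
    plain_delete_rejected = False
except sqlite3.IntegrityError:
    plain_delete_rejected = True

# ON DELETE CASCADE removes the matching child rows as well.
cascade = make_db("ON DELETE CASCADE")
cascade.execute("DELETE FROM Customer_Detail WHERE c_id = 101")

# ON DELETE SET NULL keeps the child rows but clears their c_id.
set_null = make_db("ON DELETE SET NULL")
set_null.execute("DELETE FROM Customer_Detail WHERE c_id = 101")

print(bad_insert_rejected, plain_delete_rejected)  # True True
print(orders(cascade))   # [(11, 103), (12, 102)]
print(orders(set_null))  # [(10, None), (11, 103), (12, 102)]
```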
CHECK Constraint
CHECK constraint is used to restrict the value of a column to a range. It performs a check on the values
before storing them into the database. It's like condition checking before saving data into a column.
The above query will restrict the s_id value to be greater than zero.
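A sketch of a CHECK constraint that keeps s_id greater than zero, assuming SQLite through Python's sqlite3 module; the table layout mirrors the earlier Student examples, and the helper `accepted` and sample rows are hypothetical. NOT NULL is kept alongside CHECK because a NULL s_id would make the check evaluate to unknown, which SQL treats as passing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (s_id INTEGER NOT NULL CHECK (s_id > 0), "
    "Name VARCHAR(60), Age INTEGER)")

def accepted(row):
    """Try to insert a row; return True if the CHECK allowed it (hypothetical helper)."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(accepted((101, "Adam", 15)))  # True
print(accepted((0, "Alex", 18)))    # False: 0 is not greater than zero
print(accepted((-5, "Abhi", 17)))   # False
```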
SQL functions can be divided into two categories,
1. Aggregate Functions
2. Scalar Functions
Aggregate Functions
These functions return a single value after performing calculations on a group of values. Following are some
of the frequently used Aggregate functions.
AVG() Function
AVG returns the average of the values in a numeric column.
Its general syntax is,
SELECT AVG(column_name) FROM table_name
Using AVG() function
Consider the following Emp table
avg(salary)
8200
COUNT() Function
COUNT returns the number of rows present in the table, either for all rows or for the rows matching some condition. COUNT(column_name) counts only the non-NULL values of that column.
Its general syntax is,
SELECT COUNT(column_name) FROM table-name
count(name)
Example of COUNT(distinct)
Consider the following Emp table
count(distinct salary)
4
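FIRST() Function
FIRST function returns the first value of the selected column.
Syntax of FIRST function is,
SELECT FIRST(column_name) FROM table-name;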
first(salary)
9000
LAST() Function
LAST function returns the last value of the selected column.
Syntax of LAST function is,
SELECT LAST(column_name) FROM table-name;
Using LAST() function
Consider the following Emp table
last(salary)
8000
MAX() Function
MAX function returns the maximum value from the selected column of the table.
Syntax of MAX function is,
SELECT MAX(column_name) from table-name;
MAX(salary)
10000
MIN() Function
MIN function returns the minimum value from the selected column of the table.
Syntax for MIN function is,
SELECT MIN(column_name) from table-name;
MIN(salary)
6000
SUM() Function
SUM function returns the total sum of the numeric values in the selected column.
Syntax for SUM is,
SELECT SUM(column_name) from table-name;
SUM(salary)
41000
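The aggregate results above can be reproduced in one sketch, assuming SQLite through Python's sqlite3 module and a hypothetical Emp table whose salaries were chosen to match the results shown (average 8200, 4 distinct salaries, maximum 10000, minimum 6000, sum 41000, first 9000, last 8000). FIRST() and LAST() are not standard SQL and SQLite has no such functions, so the sketch emulates them by ordering on rowid, which follows insertion order in this fresh table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Emp VALUES (?, ?)", [
    ("Anu", 9000), ("Shane", 8000), ("Rohan", 6000), ("Scott", 10000), ("Tiger", 8000),
])

avg_, count_, count_distinct, max_, min_, sum_ = conn.execute(
    "SELECT AVG(salary), COUNT(name), COUNT(DISTINCT salary), "
    "MAX(salary), MIN(salary), SUM(salary) FROM Emp").fetchone()

# Emulated FIRST()/LAST(): take one row in insertion (rowid) order.
first = conn.execute("SELECT salary FROM Emp ORDER BY rowid LIMIT 1").fetchone()[0]
last = conn.execute("SELECT salary FROM Emp ORDER BY rowid DESC LIMIT 1").fetchone()[0]

print(avg_, count_, count_distinct, max_, min_, sum_)  # 8200.0 5 4 10000 6000 41000
print(first, last)                                     # 9000 8000
```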
Scalar Functions
Scalar functions return a single value from an input value. Following are some frequently used Scalar Functions
in SQL.
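UCASE() Function
UCASE function is used to convert the values of a string column to uppercase characters.
Syntax of UCASE function is,
SELECT UCASE(column_name) from table-name;
Result is,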
UCASE(name)
ANU
SHANE
ROHAN
SCOTT
TIGER
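LCASE() Function
LCASE function is used to convert the values of a string column to lowercase characters.
Syntax of LCASE function is,
SELECT LCASE(column_name) from table-name;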
LCASE(name)
anu
shane
rohan
scott
tiger
MID() Function
MID function is used to extract substrings from column values of string type in a table.
Syntax for MID function is,
SELECT MID(column_name, start, length) from table-name;
MID(name,2,2)
nu
ha
oh
co
ig
ROUND() Function
ROUND function is used to round a numeric field to the specified number of decimal places; with zero decimals it rounds to the nearest integer. It is used on decimal point
values.
Syntax of Round function is,
SELECT ROUND(column_name, decimals) from table-name;
ROUND(salary)
9001
8001
6000
10000
8000
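The scalar functions can be tried in a sketch, assuming SQLite through Python's sqlite3 module. SQLite has no UCASE, LCASE or MID; it provides UPPER, LOWER and SUBSTR, which is why the sketch uses those (in MySQL, UCASE(), LCASE() and MID() are synonyms for UPPER(), LOWER() and SUBSTRING()). The Emp salaries are hypothetical, with decimal parts chosen so the rounded values match those shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (name TEXT, salary REAL)")
conn.executemany("INSERT INTO Emp VALUES (?, ?)", [
    ("Anu", 9000.67), ("Shane", 8000.6), ("Rohan", 6000.0),
    ("Scott", 10000.0), ("Tiger", 8000.0),
])

# UPPER/LOWER stand in for UCASE/LCASE, SUBSTR(name, 2, 2) for MID(name, 2, 2),
# and ROUND with one argument rounds to 0 decimal places.
rows = conn.execute(
    "SELECT UPPER(name), LOWER(name), SUBSTR(name, 2, 2), ROUND(salary) "
    "FROM Emp ORDER BY rowid").fetchall()

for row in rows:
    print(row)
# ('ANU', 'anu', 'nu', 9001.0)
# ('SHANE', 'shane', 'ha', 8001.0)
# ('ROHAN', 'rohan', 'oh', 6000.0)
# ('SCOTT', 'scott', 'co', 10000.0)
# ('TIGER', 'tiger', 'ig', 8000.0)
```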