Database


A database, sometimes referred to as an electronic database, is an organized collection of logically related data that is stored efficiently so that it can be easily accessed, managed, and updated.
Let's divide this definition into parts to understand it more easily.

Organized Collection
Data should be arranged in such a way that the user can easily process the data when required.

Logically Related Data


Logically related data means that the data should be relevant in some context.

Example: If we build a database for customers, it may include each customer's name, contact number, etc. All of this information is in the context of the customer. However, information such as the number of siblings the customer has is out of context and not logically related to the customer database.

DBMS - Database Management System

The software used to manage a database is called a Database Management System (DBMS). It provides an interface or tool to perform various operations such as:

- creating the database
- manipulating the database
- storing and retrieving data from the database
- deleting data from the database, etc.

Changes to the database must follow certain rules, and these rules are defined in the DBMS itself.
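The operations listed above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration that assumes SQLite as a stand-in for the DBMS; the customer table and its columns are hypothetical, and a server DBMS such as MySQL or PostgreSQL would be reached through its own client library.

```python
import sqlite3

# An in-memory SQLite database stands in for the DBMS (illustrative assumption)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Creating: define a table to hold customer data (hypothetical schema)
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")

# Storing: insert rows using parameter placeholders
cur.executemany(
    "INSERT INTO customer (name, phone) VALUES (?, ?)",
    [("Asha", "555-0101"), ("Ravi", "555-0102")],
)

# Manipulating: update an existing row
cur.execute("UPDATE customer SET phone = ? WHERE name = ?", ("555-0199", "Ravi"))

# Deleting: remove a row
cur.execute("DELETE FROM customer WHERE name = ?", ("Asha",))
conn.commit()

# Retrieving: read back whatever is left
rows = cur.execute("SELECT name, phone FROM customer").fetchall()
print(rows)  # [('Ravi', '555-0199')]
```

Placeholders (`?`) let the DBMS handle quoting of values, which is the usual way to pass user data into SQL statements.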

*A DBMS can limit what data the end-user sees.

*It provides multiple views of the same database depending on the user's access rights.

*It can grant read and write access to the database.

Some popular DBMS software includes MySQL, Oracle, SQLite, PostgreSQL, and MariaDB.

Characteristics of DBMS

Real-World Entity: A DBMS uses real-world entities (objects) to design its architecture. Example: A customer database uses the customer as an entity and the customer's phone number as an attribute.
Relation-based Tables: Using a DBMS, we can form tables based on the relationships between various entities.

Example: In a university database, we can have student and college as entities.

-There is a relationship between student and college, i.e., a student studies in a college.

-Using this, we can form two tables: one for the entity student and another for the entity college.
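The student/college example can be sketched as two tables linked by a foreign key. This assumes SQLite through Python's sqlite3 module; the table names, columns, and rows are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# One table per entity
conn.execute("CREATE TABLE college (college_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE student (
           student_id INTEGER PRIMARY KEY,
           name       TEXT NOT NULL,
           college_id INTEGER REFERENCES college(college_id)  -- "studies in" relationship
       )"""
)

conn.execute("INSERT INTO college VALUES (1, 'City College')")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(10, "Meera", 1), (11, "Karan", 1)],
)

# Combine the two tables through the relationship
enrolments = conn.execute(
    """SELECT s.name, c.name
       FROM student AS s JOIN college AS c ON s.college_id = c.college_id
       ORDER BY s.name"""
).fetchall()
print(enrolments)  # [('Karan', 'City College'), ('Meera', 'City College')]

# A student that points at a non-existent college is rejected
try:
    conn.execute("INSERT INTO student VALUES (12, 'Ghost', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```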

Query Language: A DBMS comes equipped with a query language that allows users to store and retrieve data. We can apply as many filtering conditions as required to get specific results.
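A small sketch of combining several filters in one query, again assuming SQLite via Python's sqlite3 module; the orders table and its data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, city, amount) VALUES (?, ?, ?)",
    [("Asha", "Pune", 250.0), ("Ravi", "Delhi", 90.0),
     ("Meera", "Pune", 40.0), ("Karan", "Pune", 310.0)],
)

# Two filters (city and minimum amount) combined with AND, then sorted
big_pune_orders = conn.execute(
    """SELECT customer, amount FROM orders
       WHERE city = ? AND amount > ?
       ORDER BY amount DESC""",
    ("Pune", 100),
).fetchall()
print(big_pune_orders)  # [('Karan', 310.0), ('Asha', 250.0)]
```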

Multiple Views: It provides multiple views of the same data depending upon the user.

Example: In a university database, the accountant will have a different view of data than a student. The
accountant may have access to the salary of teachers but students won't have that access.
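One common way to give different users different views is an SQL view. The sketch below assumes SQLite via Python's sqlite3 module with a hypothetical teacher table. SQLite has no user accounts, so it only shows the two views; in server DBMSs such as PostgreSQL or MySQL, students would typically be granted privileges on the view rather than on the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (id INTEGER PRIMARY KEY, name TEXT, subject TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO teacher (name, subject, salary) VALUES (?, ?, ?)",
    [("Dr. Rao", "Physics", 90000), ("Ms. Iyer", "History", 75000)],
)

# The student-facing view exposes only non-sensitive columns
conn.execute("CREATE VIEW teacher_public AS SELECT name, subject FROM teacher")

# Accountant: reads the base table, including salaries
accountant_view = conn.execute("SELECT name, salary FROM teacher ORDER BY name").fetchall()

# Student: reads only the view, which has no salary column
student_view = conn.execute("SELECT * FROM teacher_public ORDER BY name").fetchall()
student_columns = [d[0] for d in conn.execute("SELECT * FROM teacher_public").description]
print(student_columns)  # ['name', 'subject']
```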

Multiple Users: A DBMS allows multiple users to access the data at the same time and work on it in parallel.

ACID Properties: Transactions (groups of tasks) in a DBMS follow the ACID properties:

Atomicity: A transaction either happens completely or does not happen at all. If an operation is performed on the data, it must be executed completely or not executed at all; it must not break off midway or execute partially.

Consistency: The database is in a consistent state both before and after the transaction.

Isolation: Concurrent transactions do not affect one another's work.

Durability: Once a transaction has been committed, its changes persist and are not lost because of system failures or other errors.
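Atomicity can be illustrated with a money transfer. This is a sketch assuming SQLite via Python's sqlite3 module, where `with conn:` commits on success and rolls back if an exception escapes; the account table and the `transfer` helper are hypothetical. The credit is applied before the debit on purpose, so a failed debit shows the earlier credit being undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two accounts as one atomic transaction (hypothetical helper)."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed on the debit, so the credit was rolled back too
        return False

ok = transfer(conn, "A", "B", 30)       # succeeds: A=70, B=80
failed = transfer(conn, "A", "B", 500)  # would make A negative, so nothing changes
balances = dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))
print(ok, failed, balances)  # True False {'A': 70, 'B': 80}
```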

Users in a DBMS
-A DBMS provides an interface for many users to access and retrieve data.

-The type of access depends on the user's software capabilities.

Types of users in DBMS on the basis of their software capabilities and expertise:

1. Application Programmers: They write software programs that work with and manage the database.

2. Database Administrator (DBA): Responsible for managing the entire database system.

3. End-Users: The people who use the DBMS software to perform operations such as retrieving, inserting, and deleting data.
Advantages of DBMS

Data Abstraction: It shows only those data to the user which are useful for them and hides the
complexity of data from the end-users.

Controlled Data Redundancy: It prevents the database from storing multiple copies of the same data.

Minimized Data Inconsistency: The DBMS checks that if the same value is stored in two different places, both copies remain identical.

Easy Data Manipulation: In a DBMS the data is centralized, so we can modify it in one place and the change is reflected everywhere that data is used.

Concurrent Access: Multiple users can access the data at the same time.

Backup and Recovery: We can make copies of our data, and data can be recovered after a system failure by applying recovery techniques.
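A minimal backup-and-restore sketch, assuming SQLite through Python's sqlite3 module, whose `Connection.backup()` copies a whole database into another connection. Both databases live in memory here for simplicity; in practice the backup target would be a file on separate storage. The note table is hypothetical.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT)")
source.execute("INSERT INTO note (body) VALUES ('quarterly report')")
source.commit()

# Backup: copy the entire database into a second database
backup_copy = sqlite3.connect(":memory:")
source.backup(backup_copy)

# Simulate a failure that destroys data in the original
source.execute("DROP TABLE note")
source.commit()

# Recovery: copy the backup back over the damaged database
backup_copy.backup(source)
recovered_rows = source.execute("SELECT body FROM note").fetchall()
print(recovered_rows)  # [('quarterly report',)]
```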

Disadvantages of DBMS

Increased Cost: The cost of maintaining software, hardware, and personnel to operate and maintain
the DBMS can be very high.

Increased Complexity: Since most DBMSs combine many different technologies, users need training to work with them, and often only specialized personnel can operate them.

Frequent Updates: New technologies reach the market constantly, so the DBMS has to be kept up to date. These upgrades, and training users and DBAs on the changes, increase costs to the company.

Higher Impact of Failure: Because the database in a DBMS is centralized, the system is more vulnerable. The failure of any component or the corruption of a storage device can bring the whole system to a halt.

Why is database design important?

Database Design: Database design is the process of creating a structured plan for a database that
outlines how data will be stored, organized, and accessed.

-It involves designing the database schema.

-Database design is an important aspect of database management, as it determines the efficiency and
effectiveness of the database in storing and retrieving data.
-A well-designed database can improve the performance and scalability of the database, while a poorly
designed database can lead to problems such as data redundancy, data inconsistency, and slow query
performance.

(Extra)

database schema
What is a Schema?
-The skeleton of the database is formed by its attributes (titles/headings), and this skeleton is called the schema.
-The schema specifies logical constraints such as tables, primary keys, etc.
-A simple schema diagram shows only the attribute names, not their data types; data types are specified in the logical or physical schema.

Details of a Customer

Schema of Customer

Database Schema
-A database schema is a logical representation of data that shows how the data in a database should be stored logically.

-It shows how the data is organized and the relationships between the tables.

-A database schema contains tables, fields, views, and the relationships between keys such as primary keys and foreign keys.
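Most DBMSs store the schema itself and let you query it. The sketch below assumes SQLite via Python's sqlite3 module, where tables are listed in `sqlite_master` and `PRAGMA table_info` / `PRAGMA foreign_key_list` describe columns and foreign keys; the customer and purchase tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customer (
           customer_id INTEGER PRIMARY KEY,
           name        TEXT NOT NULL,
           phone       TEXT
       )"""
)
conn.execute(
    """CREATE TABLE purchase (
           purchase_id INTEGER PRIMARY KEY,
           customer_id INTEGER REFERENCES customer(customer_id),
           amount      REAL
       )"""
)

# Tables recorded in SQLite's catalog
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# Columns: table_info rows are (cid, name, type, notnull, default, pk)
customer_columns = [(r[1], r[2], r[5]) for r in conn.execute("PRAGMA table_info(customer)")]

# Relationships: foreign_key_list rows include (id, seq, table, from, to, ...)
purchase_fks = [(r[2], r[3], r[4]) for r in conn.execute("PRAGMA foreign_key_list(purchase)")]

print(tables)            # ['customer', 'purchase']
print(customer_columns)  # [('customer_id', 'INTEGER', 1), ('name', 'TEXT', 0), ('phone', 'TEXT', 0)]
print(purchase_fks)      # [('customer', 'customer_id', 'customer_id')]
```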

(Process of database design)

The database design process involves creating a structured plan for a database that outlines how data will be stored, organized, and accessed. It typically involves the following steps:

1. Data requirements: Identify the types of data that need to be stored and the relationships between different data elements.

2. Data normalization: Divide the data into smaller, related tables to eliminate redundancy (unneeded duplicate data) and improve data integrity (accuracy and consistency).

3. Data types: Choose appropriate data types for each data element to ensure efficient storage and retrieval of data.

4. Create indexes: Determine which data elements need to be indexed to improve query performance. An index is a data structure that helps the database system locate data more quickly.

5. Test and refine the design: Test the database design to ensure that it meets the needs of the organization and makes efficient use of resources. This might involve running performance tests and adjusting the design as needed.
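The effect of creating an index can be observed in the query plan. This sketch assumes SQLite via Python's sqlite3 module, whose `EXPLAIN QUERY PLAN` output has the plan text in its fourth column; the customer table, its generated rows, and the index name `idx_customer_city` are hypothetical. The exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [(f"customer{i}", f"city{i % 50}") for i in range(1000)],
)

query = "SELECT name FROM customer WHERE city = ?"

def plan(sql):
    """Return SQLite's query-plan text for a statement with one parameter."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("city7",))
    return " | ".join(row[3] for row in rows)

before = plan(query)  # a full table scan, e.g. 'SCAN customer'
conn.execute("CREATE INDEX idx_customer_city ON customer(city)")
after = plan(query)   # a search through the new index
print(before)
print(after)
```

Indexes speed up reads like this one but add work to every insert and update, which is why step 4 asks which elements actually need them.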

What is a Good Database Design?

-A good database design is one that is well-structured, efficient, and flexible.

-It meets the needs of the organization and supports the efficient storage and retrieval of data.

-A good database design is normalized, which means that the data is divided into smaller, related tables and there is minimal redundancy.

Summarized

Normalization helps to eliminate redundancy and improve the integrity and efficiency of the data. A good database design is also efficient, which means that it makes effective use of resources such as storage and processing power and ensures that queries run quickly. It is flexible, which means that it can adapt to changing requirements and can handle new types of data and new relationships between data elements without requiring major redesigns.

Additionally, a good database design is scalable, which means that it can handle increasing amounts of data and queries without degrading performance, and it is secure, with appropriate measures in place to protect the data from unauthorized access and manipulation.

What is Data Normalization?

Data normalization is the process of organizing a database in a way that reduces redundancy and dependency. Redundancy can lead to inconsistencies and errors in the data and can make the database more difficult to update and maintain. By normalizing the data, organizations can reduce redundancy and improve the integrity of the data.

It involves dividing the data into smaller, related tables and establishing relationships between those tables.

There are several levels of data normalization, each with its own set of rules. The most common levels of normalization are:

- First normal form (1NF): In 1NF, data is divided into tables with unique primary keys, and there are no repeating groups of data within a table.

- Second normal form (2NF): In 2NF, data is further normalized by removing partial dependencies on the primary key. This means that non-key attributes are dependent on the entire primary key, rather than just a part of it.

- Third normal form (3NF): In 3NF, data is further normalized by removing transitive dependencies. This means that non-key attributes are dependent only on the primary key, and not on other non-key attributes.

Data normalization is a process that helps to eliminate redundancy and improve the integrity and efficiency of a database. By normalizing the data, organizations can design more effective and efficient databases that are easier to update and maintain.
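A small worked example of removing a transitive dependency, written in plain Python so the decomposition is explicit. In the hypothetical flat table, order_id determines customer_id, and customer_id in turn determines customer_name and city, so customer details repeat on every order. Splitting them into a separate customers table brings the data to 3NF; the `decompose` and `rejoin` helpers are illustrative, and the rejoin checks that the split is lossless.

```python
# Un-normalized: customer details are repeated on every order row (hypothetical data)
orders_flat = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Asha", "city": "Pune",  "amount": 250},
    {"order_id": 2, "customer_id": "C1", "customer_name": "Asha", "city": "Pune",  "amount": 40},
    {"order_id": 3, "customer_id": "C2", "customer_name": "Ravi", "city": "Delhi", "amount": 90},
]

def decompose(rows):
    """Split rows into an orders table and a customers table keyed by customer_id."""
    customers = {}
    orders = []
    for r in rows:
        # Each customer's details are stored once
        customers[r["customer_id"]] = {"customer_name": r["customer_name"], "city": r["city"]}
        # Orders keep only their own attributes plus the foreign key
        orders.append({"order_id": r["order_id"], "customer_id": r["customer_id"], "amount": r["amount"]})
    return orders, customers

def rejoin(orders, customers):
    """Join the two tables on customer_id to rebuild the original rows."""
    return [{**o, **customers[o["customer_id"]]} for o in orders]

orders, customers = decompose(orders_flat)
print(len(customers))                            # 2: one row per customer instead of repeated copies
print(rejoin(orders, customers) == orders_flat)  # True: the decomposition lost nothing
```

After the split, changing Asha's city means updating one customer row instead of every order she placed, which is exactly the update anomaly normalization is meant to prevent.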

Normal Form   Description

1NF    A relation is in 1NF if every attribute contains only atomic values.

2NF    A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.

3NF    A relation is in 3NF if it is in 2NF and no transitive dependency exists.

BCNF   A stronger version of 3NF, known as Boyce-Codd normal form.

4NF    A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.

5NF    A relation is in 5NF if it is in 4NF and does not contain any join dependency; joins must be lossless.

Types of Database Schemas:

A database schema is a visual representation of a database that shows the tables, columns, and relationships between different data elements. There are several types of database schemas, each with its own characteristics and uses:

- Conceptual schema: A conceptual database schema gives a high-level view of what your database will contain and how different pieces of information relate to each other, without going into implementation details, i.e., how everything is stored.

- Logical database schema: Logical schemas flesh out conceptual schemas with more concrete details about the objects they will contain, such as names, tables, views, and integrity constraints.

- Physical database schema: A physical schema is the actual design for a relational database. It includes all the technical and contextual information needed for the schema and is created with a specific physical data system in mind.

- User schema: A user schema is a view of the data that is specific to a particular user or group of users. It represents the data that is relevant to the user and the way the user wants to access and view it.

Why Database Design Matters:

Database design shapes how efficiently a database stores and retrieves data. A good design boosts performance and scalability, while a poor one can cause issues like redundant data, inconsistencies, and slow queries.

Key Reasons:

Data Integrity: Ensures accurate, consistent, and error-free data. Poor designs may lead to redundancy and inconsistencies, eroding trust in query results.

Query Performance: Enhances the speed and ease of data retrieval, which is crucial for frequently accessed or large databases. Poor designs can result in sluggish query performance, reducing organizational efficiency.

Scalability: Allows databases to handle growing data and query volumes without a drop in performance. This is vital for databases expected to expand, ensuring continued support for organizational needs.

Cost Savings: Well-designed databases are more cost-effective, reducing the need for extra hardware and software. Poor designs may require more resources, leading to higher maintenance costs.

In summary, good database design improves performance, maintains data integrity, and saves costs, all of which are essential for effective data management.

Database Maintenance

Good database maintenance is essential for ensuring the accuracy, reliability, and performance of a database. Proper maintenance can help to prevent data loss, corruption, and other issues that can occur over time, and it can also help to optimize the performance of the database.

There are several key benefits to good database maintenance:

Data accuracy: Good database maintenance helps to ensure that the data in the database is accurate and up to date. This is important because errors in the data can lead to incorrect results and decision-making, which can have serious consequences for an organization.

Data integrity: Good database maintenance helps to maintain the integrity of the data, which means that the data is consistent and follows the rules and constraints that have been set for it. This is important because data integrity is essential for the reliability of the database.

Performance: Good database maintenance can help to optimize the performance of the database. This is important because a poorly performing database can result in slow query times and other issues, which can have a negative impact on the efficiency and productivity of an organization.

Data security: Good database maintenance includes measures to protect the data from unauthorized access and manipulation. This is important because data breaches and other security incidents can have serious consequences for an organization, including legal and regulatory penalties, damage to reputation, and financial losses.

There are several key activities that are involved in good database maintenance. These include:

1. Backups: Regular backups of the database are essential to ensure that the data can be restored in case of a disaster or other data loss event.

2. Indexing: Indexing helps to improve the performance of the database by creating structures that allow the database to locate data more quickly.

3. Data cleansing: Data cleansing involves identifying and correcting errors or inconsistencies in the data. This is important because data errors can lead to incorrect results and decision-making.

4. Optimization: Optimization involves identifying and addressing performance issues in the database. This can include activities such as index optimization, query optimization, and hardware optimization.

5. Security: Good database maintenance includes measures to protect the data from unauthorized access and manipulation. This can include activities such as password management, access control, and security audits.

Key activities involved in good database maintenance include backups, indexing, data cleansing, optimization, and security. By investing in good database maintenance, organizations can improve the quality and value of their data, and increase the efficiency and productivity of their operations.

In a Database Management System (DBMS), the concept of files and the file system is crucial for organizing and managing data. Here's an overview:

File:

In the context of a DBMS, a file is a collection of related records.

A file represents a table in a relational database or an entity in other types of databases.

Each record in the file corresponds to a row in a table, and each field in the record corresponds to a column in a table.

File Organization:

Files in a DBMS can be organized in different ways based on the requirements of the application and the efficiency of data retrieval.

Common file organizations include sequential, random, and hashed.

Sequential File Organization:


Records are stored in sequential order based on a primary key or some other field.

Suitable for applications that require sequential processing of records.

It may not be efficient for direct access or searching.

Random (or Direct) File Organization:

Records can be accessed directly without having to read through the preceding records.

Requires an index structure to facilitate direct access based on a key field.

Suitable for applications that require quick retrieval of specific records.

Hashed File Organization:

Uses a hash function to determine the storage location of records.

Provides fast access to records, especially in scenarios where a unique key is used.

Suitable for applications where quick access to specific records is crucial.
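The three file organizations can be compared with a toy simulation in plain Python. It is a sketch under simplifying assumptions: an in-memory list stands in for the file, a dict stands in for the index used by direct access, and a simple modulo function plays the role of the hash function. The records and the `NUM_BUCKETS` value are hypothetical.

```python
# Hypothetical records keyed by a primary key (roll number)
records = [(105, "Meera"), (101, "Asha"), (110, "Karan"), (103, "Ravi")]

# Sequential organization: records stored in key order and read one after another
sequential_file = sorted(records)

def sequential_scan(key):
    """Read records in order until the key is found or passed; also count reads."""
    reads = 0
    for k, name in sequential_file:
        reads += 1
        if k == key:
            return name, reads
        if k > key:
            break  # keys are sorted, so the record cannot appear later
    return None, reads

# Random (direct) organization: an index maps each key to its record position
index = {k: pos for pos, (k, _) in enumerate(sequential_file)}

def direct_lookup(key):
    """Jump straight to the record using the index, without reading earlier records."""
    pos = index.get(key)
    return sequential_file[pos][1] if pos is not None else None

# Hashed organization: a hash function computes the bucket that holds each record
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]
for k, name in records:
    buckets[k % NUM_BUCKETS].append((k, name))

def hashed_lookup(key):
    """Compute the bucket from the key and search only that bucket."""
    for k, name in buckets[key % NUM_BUCKETS]:
        if k == key:
            return name
    return None

print(sequential_scan(110))  # ('Karan', 4): every earlier record was read first
print(direct_lookup(103))    # Ravi
print(hashed_lookup(110))    # Karan
```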

File System:

The file system in a DBMS is a mechanism for organizing and managing files.

It includes components like data dictionary, data catalog, and index files.

The data dictionary contains metadata about the structure of the database, such as information about tables, fields, and relationships.

The data catalog stores information about the data stored in the database, including the location of files and indexes.

Index files are used to speed up data retrieval by providing a quick reference to the location of specific records.

Data Integrity and Security:


The file system in a DBMS also manages data integrity and security.

It enforces constraints to maintain the consistency and accuracy of data.

Access control mechanisms are implemented to ensure that only authorized users can access and modify data.
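Integrity constraints can be declared once in the schema and then enforced by the DBMS on every write. The sketch assumes SQLite via Python's sqlite3 module, which raises `sqlite3.IntegrityError` when a constraint is violated; the employee table and its rules are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE employee (
           emp_id INTEGER PRIMARY KEY,
           email  TEXT NOT NULL UNIQUE,
           age    INTEGER CHECK (age BETWEEN 18 AND 70)
       )"""
)
conn.execute("INSERT INTO employee VALUES (1, 'asha@example.com', 30)")

bad_rows = [
    (2, "asha@example.com", 25),  # duplicate email violates UNIQUE
    (3, None, 40),                # missing email violates NOT NULL
    (4, "ravi@example.com", 12),  # age outside the range violates CHECK
]

rejected = []
for row in bad_rows:
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append((row[0], str(exc)))  # keep the id and SQLite's error message

count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(count)  # 1: only the valid row was stored
```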

Transaction Management:

The file system plays a role in managing transactions, ensuring that multiple operations on the database occur atomically (all or nothing) and maintain consistency.

In modern database systems, the file system is often abstracted away, and databases use sophisticated data structures and algorithms to manage data efficiently. Relational database management systems (RDBMS) like MySQL, PostgreSQL, and Oracle, for example, provide a high-level interface for users and applications to interact with data without dealing directly with file organization details.

Problems with File System Data Management

While file systems were widely used for data management in early database systems, they had several limitations and problems. Here are some of the key issues associated with managing data through a file system rather than a Database Management System (DBMS):

Data Redundancy:

In a file system-based approach, data redundancy is a common problem. The same data may be duplicated in multiple files, leading to inconsistencies and wastage of storage space.


Data Inconsistency:

The decentralized nature of file systems makes it challenging to maintain data consistency. Updates and modifications to data may result in inconsistencies, especially when multiple applications access the same data.

Data Isolation:

Data isolation refers to the situation where each application has its own set of files, and changes made by one application may not be immediately visible to other applications. This lack of data sharing can lead to inefficient use of data and difficulties in maintaining a unified and coherent view of the data.

Difficulty in Access and Retrieval:

Retrieving specific data from a file system can be inefficient, especially when dealing with large datasets. File systems may not provide efficient mechanisms for searching, sorting, and filtering data.

Limited Concurrent Access:

File systems may not handle concurrent access by multiple users or applications well. This can result in issues such as data corruption or the inability to perform certain operations when data is being accessed or modified by others.

Security Concerns:

File systems often lack robust security features. It may be challenging to implement access controls, encryption, and other security measures to protect sensitive data adequately.

Lack of Data Integrity Constraints:

Maintaining data integrity (ensuring that data satisfies certain consistency constraints) can be difficult in a file system. Without the enforcement of constraints, there is a higher risk of introducing errors and inconsistencies in the data.


Scalability Issues:

As the volume of data increases, file systems may struggle to scale efficiently. Performance degradation and increased complexity can be significant challenges in managing large datasets.

Limited Data Relationships:

File systems do not inherently support the establishment and enforcement of relationships between different sets of data. In contrast, relational database management systems excel in managing relationships between tables, ensuring data integrity.

Maintenance Challenges:

Maintenance tasks, such as data backup, recovery, and optimization, can be more complex in a file system-based approach compared to modern database systems.

In response to these challenges, relational database management systems (RDBMS) emerged as a more structured and efficient way to manage data, providing features like data integrity, normalization, and transaction management. RDBMS systems have largely supplanted file systems for data management in modern applications due to their ability to address these issues effectively.
