Chapter 5


Database Administration:

The Complete Guide to Practices and Procedures

Chapter 5
Application Design
Agenda
• Database Application Development & SQL
• Defining Transactions
• Locking
• Batch Processing
• Questions
Database Application Development
and SQL
To properly design an application that relies on databases
for persistent data storage, the system designer at a
minimum will need to understand the following issues:
• How data is stored in a relational database
• How to code SQL statements to access and modify data in
the database
• How SQL differs from traditional programming languages
• How to embed SQL statements into a host programming
language
• How to optimize database access by changing SQL and
indexes
• Programming methods to avoid potential database
processing problems
SQL
• SQL is the de facto standard for accessing relational
databases
• SQL is a high-level language that provides a greater
degree of abstraction than do traditional procedural
languages
• SQL specifies what data is needed, not how to get it. It is a
declarative language: you specify the result you want, and the
database engine decides how to execute the query.
– It does not (indeed it cannot) specify how to retrieve it
SQL: English-like
• SQL can be used to retrieve data easily with
an English-like syntax.
• It is easier to understand this:

• Than it is to understand C, Java, or most typical programming languages.
Set-at-a-Time Processing
• In set-at-a-time processing, operations on data are applied to multiple
rows or an entire set of rows at once.
• Multiple rows can be retrieved, modified, or removed in one
fell swoop by using a single SQL statement
• Every operation performed on a relational database operates
on a table (or set of tables) and results in another table
• This is called relational closure

[Figure: an SQL statement operates on database tables and produces a result set, which is itself a table]
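The two ideas above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original slides: the employee table, its columns, and the 10% raise are all hypothetical.

```python
import sqlite3

# Hypothetical table and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee (id, dept, salary) VALUES (?, ?, ?)",
    [(1, "IT", 1000.0), (2, "IT", 2000.0), (3, "HR", 1500.0)],
)

# Set-at-a-time: one statement modifies every qualifying row.
cur = conn.execute("UPDATE employee SET salary = salary * 1.10 WHERE dept = 'IT'")
rows_changed = cur.rowcount

# Relational closure: the result of a query is itself a table, so it can
# be queried again, here as a derived table in the FROM clause.
it_total = conn.execute(
    "SELECT SUM(salary) FROM (SELECT salary FROM employee WHERE dept = 'IT')"
).fetchone()[0]

print(rows_changed, round(it_total, 2))
```

No loop over rows appears in the program; the DBMS decides how to find and change the qualifying rows.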
Embedding SQL in a Program
• A host language uses SQL to communicate with the database
– COBOL, FORTRAN, Assembler, etc.
– C/C++, Java, PHP, Visual Basic, etc.
• APIs allow applications to interface with databases.
– ODBC, JDBC
• An Integrated Development Environment (IDE) is a complete
software environment that supports all phases of development.
– Eclipse (for Java, C++, Python, etc.), Visual Studio (for C#, VB.NET, C++,
etc.), PyCharm (for Python)
• A code generator is a tool that automates the
generation of code based on certain inputs, such as
templates or models.
SQL Middleware and APIs
• Application programs require an interface for issuing SQL to
access or modify data. The interface is used to embed SQL
statements in a host programming language, such as COBOL,
Java, C, or Visual Basic.
• Standard interfaces enable application programs to access
databases using SQL. There are several popular standard
interfaces or APIs (Application Programming Interfaces) for
database programming, including ODBC, JDBC, SQLJ, and OLE DB.
• ODBC
– ODBC is a call-level interface, or CLI
– Instead of directly embedding SQL in the program, ODBC uses
callable routines:
• to allocate and deallocate resources
• control connections to the database
• execute SQL statements
• obtain diagnostic information
• control transaction termination
• obtain information about the implementation
#include <windows.h>
#include <sql.h>
#include <sqlext.h>
#include <iostream>

int main() {
    SQLHENV hEnv = SQL_NULL_HENV;
    SQLHDBC hDbc = SQL_NULL_HDBC;
    SQLHSTMT hStmt = SQL_NULL_HSTMT;
    SQLRETURN ret;

    // Allocate the environment handle and request ODBC 3.x behavior
    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &hEnv);
    SQLSetEnvAttr(hEnv, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);

    // Allocate the connection handle
    SQLAllocHandle(SQL_HANDLE_DBC, hEnv, &hDbc);

    // Connect to the database (replace DSN name, user, password accordingly).
    // SQLConnect takes the bare data source name, not a "DSN=..." string.
    ret = SQLConnect(hDbc, (SQLCHAR*)"mydsn", SQL_NTS,
                     (SQLCHAR*)"username", SQL_NTS,
                     (SQLCHAR*)"password", SQL_NTS);
    if (!SQL_SUCCEEDED(ret)) {
        std::cerr << "Connection failed" << std::endl;
        SQLFreeHandle(SQL_HANDLE_DBC, hDbc);
        SQLFreeHandle(SQL_HANDLE_ENV, hEnv);
        return 1;
    }

    // Allocate a statement handle and execute an SQL query
    SQLAllocHandle(SQL_HANDLE_STMT, hDbc, &hStmt);
    ret = SQLExecDirect(hStmt, (SQLCHAR*)"SELECT * FROM my_table", SQL_NTS);

    // Fetch each row and display the first column as character data
    SQLCHAR columnData[256];
    SQLLEN indicator;  // receives the data length or SQL_NULL_DATA
    while (SQL_SUCCEEDED(ret) && SQL_SUCCEEDED(SQLFetch(hStmt))) {
        SQLGetData(hStmt, 1, SQL_C_CHAR, columnData, sizeof(columnData), &indicator);
        if (indicator == SQL_NULL_DATA)
            std::cout << "Column Data: NULL" << std::endl;
        else
            std::cout << "Column Data: " << columnData << std::endl;
    }

    // Clean up
    SQLFreeHandle(SQL_HANDLE_STMT, hStmt);
    SQLDisconnect(hDbc);
    SQLFreeHandle(SQL_HANDLE_DBC, hDbc);
    SQLFreeHandle(SQL_HANDLE_ENV, hEnv);
    return 0;
}
SQL Middleware and APIs
• JDBC
– JDBC enables Java to access relational databases.
– Similar to ODBC, JDBC consists of a set of classes
and interfaces that can be used to access relational
data.
– There are several types of JDBC middleware,
including the JDBC-to-ODBC bridge, as well as direct
JDBC connectivity to the relational database.
– Anyone familiar with application programming and
ODBC (or any call-level interface) can get up and
running with JDBC quickly
Drivers
• ODBC and JDBC rely on drivers
– A driver provides an optimized interface for a particular DBMS
implementation
• Programs can make use of the drivers to communicate
with any JDBC- or ODBC-compliant database.
• The drivers enable a standard set of SQL statements in
any Windows application to be translated into
commands recognized by a remote SQL-compliant
database.
• There are multiple types of JDBC drivers
Driver Components

• Driver Manager: It manages the communication
between the application and the ODBC drivers. It
loads the appropriate driver for the specific database
being accessed.
• ODBC Driver: The driver translates the SQL queries
sent by the application into the format understood by
the database.
• Data Source Name (DSN): A DSN provides the
necessary information to connect to a database, such
as the database type, name, and how to access it.
SQL Middleware and APIs

• SQLJ
– SQLJ (SQL for Java) is an extension of the Java programming
language that allows developers to embed SQL statements
directly within Java code
– A precompiler translates the embedded SQL into Java code.
– The Java program is then compiled into bytecodes, and a
database bind operation creates packaged access routines for
the SQL.
– This bytecode is platform-independent, meaning it can run on
any machine with a Java Virtual Machine (JVM). The JVM
interprets or compiles the bytecode into native machine code
for execution.
SQL Middleware and APIs
• OLE DB (Object Linking and Embedding Database)
– OLE DB presents an object-oriented interface for generic
data access.
– COM Architecture: OLE DB is built on the Component Object
Model (COM), which allows for language independence and
supports the creation of reusable software components.
This enables OLE DB to work across different programming
languages.
– OLE DB provides greater flexibility than ODBC because it can
be used to access both relational and nonrelational data.
– OLE DB is conceptually divided into consumers and providers.
• consumers are the applications that need access to the data
• providers are the software components that implement the interface
and thereby provide the data to the consumer.
Application Infrastructure
• Application infrastructure is the combined hardware and software
environment that supports and enables the application.
• The application infrastructure will vary from organization to
organization, and even from application to application within an
organization.
• From a hardware perspective, the application infrastructure
includes the servers, clients, and networking components.
• From a software perspective, things are a bit more difficult to nail
down. Software components of an application infrastructure can
include database servers, application servers, web servers,
transaction managers, and development frameworks.
Application Infrastructure
Mainframe
• IBM z Series hardware
• Running z/OS, DB2, CICS, with application programs written in
COBOL.
• Typically, applications consist of both batch and online workload.
• A modern mainframe infrastructure adds interfaces to
non-mainframe clients, as well as WebSphere Application Server and
Java programs.
• Most new mainframe development uses IDEs to code modern
applications instead of relying upon COBOL programmers.
Distributed
• Most modern, distributed, non-mainframe application
development projects typically rely upon application development
frameworks.
• The two most commonly-used frameworks are Microsoft .NET and
J2EE.
Microsoft .NET
• ... is a set of Microsoft technologies for
connecting people, systems, and devices
• ... allows Internet Servers to expose functions
to any client named as .NET web services
• … enables software to be delivered as a
service over the web
• … is designed to let many different services
and systems interact
Microsoft .NET Framework
The Microsoft .NET framework provides a comprehensive development
platform for the construction, deployment, and management of applications.
The .NET framework provides CLR (common language runtime) and class
library for building components using a common foundation. This offers
benefits to developers such as support for standard practices, extensibility,
and a tightly integrated set of development tools.

The .NET framework consists of multiple major components in addition to
the CLR and class library. From a data perspective, the most important
component is ADO.NET which provides access to data sources, such as a
database management system.

ADO.NET comprises a series of technologies that enable .NET
developers to interact with data in standard, structured, and predominantly
disconnected ways. Applications that use ADO.NET depend on .NET class
libraries provided in DLL files. ADO.NET manages both internal data (created
in memory and used by the program) and external data (in the database).
ADO.NET provides interoperability and maintainability through its use and
support of XML, simplified programmability with a programming model that
uses strongly-typed data, and enhanced performance and scalability.
Microsoft .NET Framework

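[Figure: .NET Framework stack. Languages (Visual C#, Visual Basic, Visual J#, Visual C++, JScript, third party) sit on top of the Microsoft .NET Framework, which comprises ADO.NET, ASP.NET, and user interfaces, the .NET Framework Class Library, and the CLR (Common Language Runtime).]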
Java Alphabet Soup
• J2EE - Java 2 Enterprise Edition
– Standard services and specifications for making
Java highly available, secure, reliable, and
scalable for enterprise adoption
• EJB - Enterprise Java Beans
– Components that contain the business logic for a
J2EE application
Java Alphabet Soup
• The Java 2 Platform, Enterprise Edition (J2EE) is a set of coordinated
specifications and practices that together enable solutions for
developing, deploying, and managing multitier enterprise applications.
The J2EE platform simplifies enterprise applications by basing them on
standardized, modular components. J2EE provides a complete set of
services to those components and handles many details of application
construction without requiring complex programming.

• So J2EE is not exactly a software framework, but a set of specifications,
each of which dictates how various J2EE functions must operate.
Software conforming to the J2EE platform offers advantages such as
"Write Once, Run Anywhere" portability, JDBC API for database access,
CORBA technology for interaction with existing enterprise resources, and
a security model for data protection. Building on this base, the Java 2
Platform, Enterprise Edition adds full support for Enterprise JavaBeans
components, Java Servlets API, JavaServer Pages and XML technology.

Note that there is much more to Java than is covered in this section.
J2EE and Java

[Figure: J2EE tiers. Client tier (standalone Java application, browser with pure HTML or Java applet), web tier (JSP pages, servlets), business tier (Enterprise JavaBeans, business components for Java), EIS tier (database).]
Impact of Java on DBA
• Application tuning
– Must understand Java
• To provide guidance during design reviews
– Is the problem in the SQL or the application
• How can you tune the application if you do not
understand the language (Java)?
– Optimizing SQL is not enough since it may be embedded in
poor application code
– Must understand the SQL techniques used
• JDBC and SQLJ
Java
• ...designed to enable applications to be deployed on any
platform as long as they are written in Java
.NET
• …designed to enable development in multiple languages as
long as the application is deployed on Windows
Other Application Choices
• There are other choices, including
– Ruby on Rails
– Ajax
– PHP
– C/C++
– And so on…

• This is not an exhaustive list…


Object Orientation
• OO programming advantages:
– faster program development time
– reduced maintenance costs
– resulting in a better ROI
• Piecing together reusable objects and
defining new objects based on similar object
classes can dramatically reduce development
time and costs.
OO, SQL and Databases
• OO and relational databases are not
inherently compatible
• The set-based nature of SQL is anathema to
the OO techniques practiced by Java and C++
developers.
• All too often insufficient consideration has
been given to the manner in which data is
accessed, resulting in poor design and faulty
performance
Impedance Mismatch
• When an OO programming language is used to
access a relational database, you must map
objects to relations.
– OO programs deal with objects
– RDBMSs deal with relations (that is, tables)
• Applications will not be object-oriented in the
“true” sense of the word because the data will
not be encapsulated within the method (that
is, the program).
Making OO Programs Work with
Relational Databases
1. Serialization
– Saving data using a flat file representation of the
object. This approach can be slow and difficult to use
across applications.
2. XML
– can be stored natively in many relational database
systems. But XML adds a layer of complexity and
requires an additional programming skillset.
3. Object-Relational Mapping (ORM)
– Most common approach
Object Relational Mapping
• With ORM an object’s attributes are stored in one or
more columns of a relational table. Hibernate is a
popular ORM library for Java; NHibernate is an
adaptation of Hibernate for the .NET framework.
• Both Hibernate and NHibernate provide capabilities
for mapping objects to a relational database by
replacing direct persistence-related database
accesses with high-level object handling functions.
• Another option is Microsoft LINQ, which stands for
Language Integrated Query. LINQ provides a set of
.NET framework and language extensions for
object-relational mapping.
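The mapping idea can be sketched by hand in a few lines. The example below is not Hibernate, NHibernate, or LINQ; it is a minimal hand-rolled illustration in Python with sqlite3, and the Customer class, the customer table, and the save/load helpers are all hypothetical names.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class Customer:
    # Each attribute maps to one column of the customer table.
    id: int
    first_name: str
    last_name: str


def save(conn, customer):
    """Persist an object by writing its attributes to the mapped columns."""
    conn.execute(
        "INSERT INTO customer (id, first_name, last_name) VALUES (?, ?, ?)",
        astuple(customer),
    )


def load(conn, customer_id):
    """Rebuild an object from the row with the given key, or return None."""
    row = conn.execute(
        "SELECT id, first_name, last_name FROM customer WHERE id = ?",
        (customer_id,),
    ).fetchone()
    return Customer(*row) if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
save(conn, Customer(1, "Craig", "Mullins"))
loaded = load(conn, 1)
```

The application code works with Customer objects only; the SQL is confined to the two helper functions, which is the role an ORM library plays at a much larger scale.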
Types of SQL

– A planned SQL request is typically embedded into an application program,
but it might also exist in a query or reporting tool. At any rate, a planned SQL
request is designed and tested for accuracy and efficiency before it is run in
a production system. Contrast this with the characteristics of an unplanned
SQL request. Unplanned SQL, also called ad hoc SQL, is created “on the fly” by
end users during the course of business. Most ad hoc queries are created to
examine data for patterns and trends that impact business. Unplanned, ad
hoc SQL requests can be a significant source of inefficiency and are difficult
to tune. How do you tune requests that are constantly written, rewritten, and
changed?

– Embedded SQL is contained within an application program, whereas
stand-alone SQL is run by itself or within a query, reporting, or OLAP tool.

– A dynamic SQL statement is optimized at run time. Depending on the DBMS,
a dynamic SQL statement may also be changed at run time. Static SQL, on
the other hand, is optimized prior to execution and cannot change without
reprogramming. Favor static SQL to minimize the possibility of SQL injection
attacks.
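Python's sqlite3 module has no static SQL in the DB2 sense, so the sketch below uses the closest analogue: statement text that is fixed in the program, with user input bound through a parameter marker. The account table and the malicious input are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (owner TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [("alice", 100.0), ("bob", 50.0)]
)

malicious = "nobody' OR '1'='1"

# Unsafe: the statement text is built from user input, so the input can
# rewrite the predicate and expose every row.
unsafe_sql = "SELECT owner FROM account WHERE owner = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the statement text is fixed; the input is bound as a plain value
# through the ? parameter marker and cannot change the statement.
safe_rows = conn.execute(
    "SELECT owner FROM account WHERE owner = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

When dynamic SQL is unavoidable, binding values this way keeps the variable part of the statement out of the SQL text.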
SQL Usage Considerations

Situation                                                                 Execution type  Program requirement      Dynamism
Columns and predicates of the SQL statement can change during execution.  Planned         Embedded                 Dynamic
SQL formulation does not change.                                          Planned         Embedded                 Static
Highly concurrent, high-performance transactions.                         Planned         Embedded                 Dynamic or static
Ad hoc one-off queries.                                                   Unplanned       Stand-alone              Dynamic
Repeated analytical queries.                                              Planned         Embedded or stand-alone  Dynamic or static
Quick one-time “fix” programs.                                            Unplanned       Embedded or stand-alone  Dynamic or static

SQL Coding for Performance
• It is important to learn how to code SQL for
performance
• Generally a good idea to rely on the DBMS to
optimize the code
• Let SQL do the work instead of coding it in host
language program
– The less data brought from the DBMS to the program
the better performance will be
• More performance guidelines come later in the
course!
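The guideline "let SQL do the work" can be illustrated with a small comparison. This is a sketch using Python's sqlite3 module; the orders table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 10.0), ("east", 15.0), ("west", 7.0), ("west", 3.0), ("north", 5.0)],
)

# Host-language approach: fetch every row, then filter and sum in the program.
all_rows = conn.execute("SELECT region, amount FROM orders").fetchall()
total_in_program = sum(amount for region, amount in all_rows if region == "east")

# SQL approach: the DBMS filters and aggregates; only one row comes back.
sql_rows = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE region = 'east'"
).fetchall()
total_in_sql = sql_rows[0][0]

print(len(all_rows), len(sql_rows), total_in_program, total_in_sql)
```

Both approaches produce the same total, but the second moves one row from the DBMS to the program instead of the whole table.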
What is XML?
• XML stands for eXtensible Markup Language.
– Like HTML, XML is based on SGML
– HTML uses tags to describe the appearance of data on a page,
whereas XML uses tags to describe the data itself, instead of
its appearance.
– Allows documents to be self-describing, through the
specification of tag sets and the structural relationships between
the tags.
• XML is actually a meta language (a language used to
define other languages).
– These languages are collected in dictionaries called DTDs
(Document Type Definitions).
– The DTD stores definitions of tags for specific industries or
fields of knowledge.
– The DTD for an XML document can be either part of the
document or stored in an external file.
XML Data
• XML uses tags to describe the data itself

<CUSTOMER>
<first_name>Craig</first_name>
<middle_initial>S.</middle_initial>
<last_name>Mullins</last_name>
<company_name>Mullins Consulting, Inc.</company_name>
<street_address>15 Coventry Ct.</street_address>
<city>Sugar Land</city>
<state>TX</state>
<zip_code>77479</zip_code>
<country>USA</country>
</CUSTOMER>

http://www.xml.org
Querying XML
• XQUERY
– FLWOR
• FOR, LET, WHERE, ORDER BY, and RETURN.
– Not just for querying, it also allows for new XML
documents to be constructed
• SQL/XML
– Uses functions to access XML data
• XMLDOCUMENT, XMLELEMENT, XMLCONCAT,
XMLAGG, XMLQUERY, XMLTABLE
Select the customer's first name (XPath):
/CUSTOMER/first_name

Select the customer's full name (first, middle initial, last):
concat(/CUSTOMER/first_name, ' ', /CUSTOMER/middle_initial, ' ', /CUSTOMER/last_name)

List each customer detail as a name: value pair using XQuery:
for $detail in /CUSTOMER/*
return
concat(name($detail), ': ', $detail/text())

Extract address information:
for $address in (/CUSTOMER/street_address, /CUSTOMER/city,
/CUSTOMER/state, /CUSTOMER/zip_code, /CUSTOMER/country)
return $address/text()
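A host program can run similar lookups against the CUSTOMER document. The sketch below uses Python's standard xml.etree.ElementTree module, which supports only a subset of XPath and no XQuery, so the concat and FLWOR steps are done in Python instead. The XML is a shortened copy of the CUSTOMER example.

```python
import xml.etree.ElementTree as ET

xml_text = """<CUSTOMER>
  <first_name>Craig</first_name>
  <middle_initial>S.</middle_initial>
  <last_name>Mullins</last_name>
  <city>Sugar Land</city>
  <state>TX</state>
</CUSTOMER>"""

root = ET.fromstring(xml_text)  # root is the CUSTOMER element

# Counterpart of /CUSTOMER/first_name, relative to the root element.
first_name = root.findtext("first_name")

# Counterpart of the concat(...) expression, done in the host language.
full_name = " ".join(
    root.findtext(tag) for tag in ("first_name", "middle_initial", "last_name")
)

# Counterpart of iterating over /CUSTOMER/* and returning "name: value".
details = [f"{child.tag}: {child.text}" for child in root]

print(first_name, full_name, details)
```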
Defining Transactions
• A transaction is an atomic unit of work with
respect to recovery and consistency.
• When all the steps that make up a specific
transaction have been accomplished, a
COMMIT is issued.
– Issue a ROLLBACK before the COMMIT to undo the transaction's
work
• The DBMS maintains a transaction log
ACID Properties of Transactions
• Defining Transactions
– Atomicity
– Consistency
– Isolation
– Durability
• Unit of Work
– Ensure proper definition and coding
Unit of Work
• A UOW is a series of instructions and
messages that guarantees data integrity.
• Example: bank transaction
– Withdrawal of $20
– The transaction must involve both the
subtraction of $20 from your account and the
delivery of $20 to you
– Only doing one or the other is not a complete unit
of work
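The withdrawal example can be sketched as a single unit of work that either commits both steps or rolls both back. This is a minimal illustration using Python's sqlite3 module; the account and dispensed tables and the withdraw function are hypothetical stand-ins for the bank's real systems.

```python
import sqlite3


def withdraw(conn, account_id, amount):
    """Debit the account and record the cash dispensed as one unit of work."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        balance = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (account_id,)
        ).fetchone()[0]
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "INSERT INTO dispensed (account_id, amount) VALUES (?, ?)",
            (account_id, amount),
        )
        conn.commit()  # both steps become permanent together
        return True
    except Exception:
        conn.rollback()  # neither step survives
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("CREATE TABLE dispensed (account_id INTEGER, amount REAL)")
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100.0)")
conn.commit()

ok = withdraw(conn, 1, 20.0)       # both steps succeed and are committed
failed = withdraw(conn, 1, 500.0)  # overdraft: the debit is rolled back
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
dispensed_count = conn.execute("SELECT COUNT(*) FROM dispensed").fetchone()[0]
```

After the failed attempt the balance still reflects only the first withdrawal, because the debit made inside the failed unit of work was undone.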
TP System Versus DBMS (Stored Procs)
[Figure: a TP system (presentation client, workflow controller, transaction server, relational DBMS (1) and (2), each with disk) compared with a DBMS using stored procedures (presentation client, relational DBMS, disk)]
Application Servers
• An application server combines the features
of a transaction server with additional
functionality to assist in building, managing,
and distributing database applications.
• Examples:
– WebSphere (IBM)
– Zend Server
– Base4 Application Server (open source)
Transactions and Locking
• The DBMS uses a locking mechanism to enable
multiple, concurrent users to access and modify data
in the database.
• By using locks, the DBMS automatically guarantees
the integrity of data. The DBMS locking strategies
permit multiple users from multiple environments to
access and modify data in the database at the same
time.
• Locking Granularity
– Row
– Page (or Block)
– Table
– Table Space
– Database
Level of Lock Granularity
[Figure: access concurrency (high to low) versus granularity of lock (column, row, page, table, tablespace, database)]
Types of Locks
• The following types of locks can be taken on database pages or
rows:
– Shared Lock
• Taken when data is read with no intent to update it.
• If a shared lock has been taken on a row, page, or table, other processes or
users are permitted to read the same data.
– eXclusive Lock
• Taken when data is modified.
• If an exclusive lock has been taken on a row, page, or table, other processes
or users are generally not permitted to read or modify the same data.
– Update Lock
• Taken when data must first be read before it is changed or deleted.
• The update lock indicates that the data may be modified or deleted in the
future.
• If the data is actually modified or deleted, the DBMS will promote the update
lock to an exclusive lock.
Intent Locks
• Intent locks are placed on higher-level
database objects when a user or process
takes locks on the data pages or rows.
– Table or Table Space
• An intent lock stays in place for the life of the
lower-level locks.
Lock Compatibility
Lock Timeouts

Deadlocks
[Figure: two processes accessing Table X. Process A requests row 3 and locks it; Process B requests row 7 and locks it. Process A then requests row 7, and Process B requests row 3.]

Process A is waiting on Process B
Process B is waiting on Process A
Lock Duration
• Lock duration refers to the length of time that
a lock is held by the DBMS.
• Two parameters impact lock duration:
– Isolation level
– Acquire/Release
Isolation Level
• Read uncommitted
– aka dirty read
• Read committed
– aka cursor stability
• Repeatable read
• Serializable
Acquire/Release Specification
• Controls when Intent locks are acquired and
released
– Intent locks can be acquired either immediately
when the transaction is requested or iteratively as
needed while the transaction executes.
– Intent locks can be released when the
transaction completes or when each intent lock is
no longer required for a unit of work.
Lock Escalation
• Lock escalation is the process of increasing the
lock granularity for a process or program.
• Typically controlled by system parameters and
DDL parameters in CREATE statements.
• For example:
– If a threshold is hit for the number of locks being
held by a process (or by the entire DBMS), page locks
(or row locks) can be escalated to table locks.
– Can cause concurrency issues
• If the entire table is locked other processes cannot access
the data
Programming Techniques to Minimize
Locking Problems
• Avoid deadlocks by coding updates in the same
sequence regardless of program
– For example, alphabetical order by table name
• Issue data modification SQL statements as close
to the end of the UOW as possible
– The later in the UOW the update occurs, the
shorter the duration of the lock
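The same-sequence rule can be illustrated outside a DBMS. In the sketch below, Python threading locks stand in for row locks on rows 3 and 7 of a hypothetical Table X; a real DBMS manages its own locks, so this is an analogy rather than database code. Both processes ask for the rows in opposite orders, but each acquires them in ascending order, so neither can hold one lock while waiting on the other.

```python
import threading

# Stand-ins for row locks on a hypothetical Table X.
row_locks = {3: threading.Lock(), 7: threading.Lock()}
completed = []


def update_rows(name, rows):
    # Acquire locks in ascending row order, regardless of the order requested.
    ordered = sorted(rows)
    for r in ordered:
        row_locks[r].acquire()
    try:
        completed.append(name)  # the "update" happens while all locks are held
    finally:
        for r in reversed(ordered):
            row_locks[r].release()


a = threading.Thread(target=update_rows, args=("A", [3, 7]))
b = threading.Thread(target=update_rows, args=("B", [7, 3]))
a.start()
b.start()
a.join(timeout=5)
b.join(timeout=5)
```

If each thread instead locked rows in the order it requested them, the interleaving shown in the deadlock figure would become possible.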
Batch Processing
• Batch Processing
– Where programs are scheduled to run at predetermined
times without any user input
• Batch programmers sometimes tend to treat tables
like flat files. That is NOT a good idea.
– Think relationally instead of file processing
• Plan and implement a COMMIT strategy
in all batch application programs
– Instead of holding locks until the end of the program
– Otherwise you will experience a lot of lock timeouts
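One common form of COMMIT strategy is to commit after every N rows processed. The sketch below assumes that form and uses Python's sqlite3 module; the target table, the run_batch function, and the interval of 100 are hypothetical, and the right interval depends on the workload.

```python
import sqlite3

COMMIT_INTERVAL = 100  # hypothetical value; tune for the workload


def run_batch(conn, rows, commit_interval=COMMIT_INTERVAL):
    """Insert rows, committing periodically so locks are not held to the end."""
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO target (id, payload) VALUES (?, ?)", row)
        pending += 1
        if pending >= commit_interval:
            conn.commit()  # releases locks held by this unit of work
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # commit the final partial group
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, payload TEXT)")
commits = run_batch(conn, [(i, f"row {i}") for i in range(250)])
row_count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```

A smaller interval releases locks sooner at the cost of more commit overhead; a larger one does the opposite.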
Questions
