SQL
Geoff Noel
Databases Overview
Databases come in all different shapes and sizes. They can be flat files of ASCII data (like Access or Q&A) or complex binary tree structures (Oracle or Sybase). In any form, a database is a data store, or a place that holds data.
If a database is simply a collection of data, then what keeps track of changes to this data?
That is the job of the database management system, or DBMS. Some DBMSs are relational; these are called RDBMSs. "Relational" refers to the fact that separate collections of data managed by the RDBMS can be looked at together. The RDBMS is responsible for ensuring the integrity of the database: when things get out of whack, the RDBMS keeps all that data in line.
What Is a Database?
The evolution of relational data storage began in 1970 with the work of Dr. E. F. Codd, who proposed the relational model, a way of organizing data into tables and describing the relationships between pieces of data. In 1985 he followed this with a set of 12 rules that a system must satisfy to be considered relational. Codd's work formed the basis for the development of systems to manage data. Today, Relational Database Management Systems (RDBMS) are the result of Codd's vision.
Data in an RDBMS are stored as rows of distinct information in tables. A structured language is used to query (retrieve), store and change the data. The Structured Query Language (SQL) is an ANSI standard, and all major commercial RDBMS vendors provide mechanisms for issuing SQL commands.
Two-Tier Database Design
The two-tier model appeared with the advent of server technology. Communication-protocol development and extensive use of local and wide area networks allowed the database developer to create an application front end that accessed data through a connection (socket) to the back-end server.
Figure: A two-tier database design, where the client software is connected to the database through a socket connection.
Client programs (which provide a user interface) send SQL requests to the database server. The server returns the appropriate results, and the client is responsible for the formatting and display of the data. Clients still use a vendor-provided library of functions that manage the communication between client and server. Most of these libraries are written in either the C language or Perl.
Commercial database vendors realized the potential for adding intelligence to the database server. They created proprietary techniques that allowed the database designer to develop macro programs for simple data manipulation. These macros, called stored procedures, can cause problems relating to version control and maintenance. Because a stored procedure is an executable program living on the database, it is possible for the stored procedure to attempt to access named columns of a database table after the table has been changed. For example, if a column named id is renamed to cust_id, a stored procedure that still refers to id no longer means what it originally did. Triggers, which are stored procedures executed automatically when some action (such as an insert) happens on a particular table or tables, can compound these difficulties when a query returns data that was not expected. Again, this can be the result of the trigger reading a table column that has been altered.
Version control is an issue. When the vendor updates the client-side libraries, the applications that utilize the database must be recompiled and redistributed.
Vendor libraries deal with low-level data manipulation. Typically, the base library only deals with queries and updates of single rows or columns of data. This can be enhanced on the server side by creating a stored procedure, but the complexity of the system then increases.
All of the intelligence associated with using and manipulating the data is implemented in the client application, creating large client-side runtimes. This drives up the cost of each client set.
A middle tier placed between the clients and the database servers has these properties:
It is multithreaded to manage multiple client connections simultaneously.
It can accept connections from clients over a variety of vendor-neutral protocols (from HTTP to TCP/IP), then hand off the requests to the appropriate vendor-specific database servers, returning the replies to the appropriate clients.
It can be programmed with a set of "business rules" that manage the manipulation of the data. Business rules could include anything from restricting access to certain portions of data to making sure that data is properly formatted before being inserted or updated.
It prevents the client from becoming too heavy by centralizing process-intensive tasks and abstracting data representation to a higher level.
It isolates the client application from the database system and frees a company to switch database systems without having to rework the business rules.
It can asynchronously provide the client with the status of a current data table or row.
Oracle
7.x 8.0.x 8.1.x ( aka 8i) 9.2 10G
NCR Teradata Ingres Informix Sybase MySQL Gupta/Centura -SQLbase DBase Paradox . . . Many others
Various Methods of Connection
ODBC JDBC Native OLE/ADO BCP SQL-NET SQL-LOADER HPL
Tools
I-SQL SQL Worksheet Enterprise Manager Toad DB Artisan Query Analyzer Win SQL O-SQL SQL-Plus
Enterprise Manager DB Artisan MS Query Access Crystal Data Dictionary ERD (Entity Relationship Diagram)
Schema By Owner
What is a Schema?
Pronounced -> skee-ma.
The structure of a database system, described in a formal language supported by the database management system (DBMS). In a relational database (RDBMS), the schema defines the tables, the fields in each table, and the relationships between fields and tables. Schemas are generally stored in a data dictionary. Although a schema is defined in a textual database language, the term is often used to refer to a graphical depiction of the database structure.
Database Objects
Tables, Columns, Data-types, Indexes, Primary key, Foreign key, Views, Table-spaces, Partitions, Constraints, Synonyms
Database actions
Table Basics
A relational database system contains one or more objects called tables. The data or information for the database are stored in these tables. Tables are uniquely identified by their names and are comprised of columns and rows. Columns contain the column name, data type, and any other attributes for the column. Rows contain the records or data for the columns. Here is a sample table called "weather". city, state, high, and low are the columns. The rows contain the data for this table:
Weather
city         state       high  low
Phoenix      Arizona     105   90
Tucson       Arizona     101   92
Flagstaff    Arizona     88    69
San Diego    California  77    60
Albuquerque  New Mexico  80    72
21 Fred Jones 47
32 Bill Smith 23
87 Wendy Jones 32
Bob Stikino 943
A composite key is a primary key consisting of more than one column. In the above example, the combinations (RecordNo,FirstName), (RecordNo,LastName), (RecordNo,FirstName,LastName), and (FirstName,LastName) are all candidate keys. Any combination including Age is not a candidate key because it contains a null. Often, database designers add an extra column to their table designs, a column defined as an integer, which will hold a number. In Microsoft Access, this is an autonumber, in MySQL it's an auto-increment, in Oracle it's a sequence, and in SQL Server it's an identity column. As these names suggest, this integer is automatically assigned by the database, usually incrementally, sometimes using an initial value and increment that you can specify. Some databases allow these numbers to be generated randomly.
The purpose of this type of automatically generated number is to act as the surrogate primary key, usually in those situations similar to the above where candidate keys are multi-column. The awkwardness of a multicolumn candidate key becomes apparent as soon as you define a foreign key on it.
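The contrast between a multi-column candidate key and a surrogate key can be sketched as follows. This is only an illustration, written in Python with the standard sqlite3 module so that it can be run as-is; the table names person and phone are hypothetical, and INTEGER PRIMARY KEY AUTOINCREMENT is SQLite's spelling of the automatically assigned integer described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Surrogate key: the database assigns person_id automatically.
# (first_name, last_name) remains a candidate key, enforced by UNIQUE.
conn.execute("""
    CREATE TABLE person (
        person_id  INTEGER PRIMARY KEY AUTOINCREMENT,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        UNIQUE (first_name, last_name)
    )
""")

# The child table needs only one column to reference a person.
conn.execute("""
    CREATE TABLE phone (
        person_id INTEGER NOT NULL REFERENCES person(person_id),
        number    TEXT NOT NULL
    )
""")

for first, last in [("Fred", "Jones"), ("Bill", "Smith"), ("Wendy", "Jones")]:
    conn.execute(
        "INSERT INTO person (first_name, last_name) VALUES (?, ?)",
        (first, last),
    )

# Collect the automatically generated key values.
ids = [row[0] for row in
       conn.execute("SELECT person_id FROM person ORDER BY person_id")]
print(ids)
```

Without the surrogate key, the phone table would have to repeat both first_name and last_name in order to reference a person, which is the awkwardness mentioned above.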
TestResult 87 73 56 null 92
An example of a column that would take a "not applicable" null is Date Terminated in a human resources database, where the value would be null for all active employees. To test for nulls, you can filter them out in the WHERE clause
SELECT EmployeeID, (DateTerminated - DateHired) AS LengthOfService
FROM EmployeeTable
WHERE DateTerminated IS NOT NULL
which would give results only for terminated employees. If you didn't have the WHERE clause, the above query would return null for every active employee, because any expression involving a null yields a null result. Alternatively, you can use the COALESCE function to supply a non-null value
SELECT EmployeeID, (COALESCE(DateTerminated, GETDATE()) - DateHired) AS LengthOfService
FROM EmployeeTable
where GETDATE() returns today's date and therefore provides an accurate measure for the length of service of active employees. So for terminated employees, DateTerminated is not null, and the calculation is the same as above, while for active employees, DateTerminated is null so COALESCE uses today's date instead.
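The null behavior described above can be tried out with a small sketch. It uses Python's standard sqlite3 module rather than the database the queries above were written for, so two things are assumptions: SQLite's julianday() function stands in for direct date subtraction, and a fixed date string replaces GETDATE() so that the result is repeatable. The two employee rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeTable "
    "(EmployeeID INTEGER, DateHired TEXT, DateTerminated TEXT)"
)
conn.executemany(
    "INSERT INTO EmployeeTable VALUES (?, ?, ?)",
    [
        (1, "2000-01-01", "2000-01-31"),  # terminated employee
        (2, "2000-01-01", None),          # active employee: NULL termination date
    ],
)

# Any expression involving NULL yields NULL, so employee 2 gets None.
plain = conn.execute("""
    SELECT EmployeeID,
           julianday(DateTerminated) - julianday(DateHired) AS LengthOfService
    FROM EmployeeTable
    ORDER BY EmployeeID
""").fetchall()

# COALESCE substitutes a fallback value when DateTerminated is NULL.
today = "2000-03-01"  # hypothetical stand-in for GETDATE()
filled = conn.execute("""
    SELECT EmployeeID,
           julianday(COALESCE(DateTerminated, ?)) - julianday(DateHired)
               AS LengthOfService
    FROM EmployeeTable
    ORDER BY EmployeeID
""", (today,)).fetchall()

print(plain)
print(filled)
```

In the first result the active employee's length of service comes back as None (SQL null); in the second it is measured up to the stand-in date.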
What is SQL?
The name Structured Query Language is something of a misnomer:
SQL isn't (properly) structured
it's more than just queries (e.g. insert, update, delete)
it isn't a real computing language (Very Debatable)
In any case, SQL is a database query language that was adopted as an industry standard in 1986. It has undergone two important revisions, SQL2 (also called SQL-92), and SQL3 (also called SQL-99).
Selecting Data
The select statement is used to query the database and retrieve selected data that match the criteria that you specify. Here is the format of a simple select statement:
select "column1" [,"column2",etc] from "tablename" [where "condition"];
[ ] = optional
The column names that follow the select keyword determine which columns will be returned in the results. You can select as many column names as you'd like, or you can use a "*" to select all columns. The table name that follows the keyword from specifies the table that will be queried to retrieve the desired results. The where clause (optional) specifies which data values or rows will be returned or displayed, based on the criteria described after the keyword where. Conditional selections used in the where clause:
= (equal), > (greater than), < (less than), >= (greater than or equal), <= (less than or equal), <> (not equal to), LIKE (pattern match, described below)
select first, last, city from empinfo;
select last, city, age from empinfo where age > 30;
select first, last, city, state from empinfo where first LIKE 'J%';
select * from empinfo;
select first, last from empinfo where last LIKE '%s';
select first, last, age from empinfo where last LIKE '%illia%';
select * from empinfo where first = 'Eric';
Sample Table: empinfo
first      last      id     age  city        state
John       Jones     99980  45   Payson      Arizona
Mary       Jones     99982  25   Payson      Arizona
Eric       Edwards   88232  32   San Diego   California
Mary Ann   Edwards   88233  32   Phoenix     Arizona
Ginger     Howell    98002  42   Cottonwood  Arizona
Sebastian  Smith     92001  23   Gila Bend   Arizona
Gus        Gray      22322  35   Bagdad      Arizona
Mary Ann   May       32326  52   Tucson      Arizona
Erica      Williams  32327  60   Show Low    Arizona
Leroy      Brown     32380  22   Pinetop     Arizona
Elroy      Cleaver   32382  22   Globe       Arizona
select last, city, age from empinfo where age > 30;
last      city        age
Jones     Payson      45
Edwards   San Diego   32
Edwards   Phoenix     32
Howell    Cottonwood  42
Gray      Bagdad      35
May       Tucson      52
Williams  Show Low    60
select first, last, city, state from empinfo where first LIKE 'J%';
first  last   city    state
John   Jones  Payson  Arizona
select first, last, age from empinfo where last LIKE '%illia%';
first  last      age
Erica  Williams  60
select * from empinfo where first = 'Eric';
first  last     id     age  city       state
Eric   Edwards  88232  32   San Diego  California
The LIKE pattern matching operator can also be used in the conditional selection of the where clause. LIKE is a very powerful operator that allows you to select only rows that are "like" what you specify. The percent sign "%" can be used as a wild card to match any sequence of characters (including none) that might appear before or after the characters specified. For example:
select first, last, city from empinfo where first LIKE 'Er%';
This SQL statement will match any first names that start with 'Er'. Strings must be in single quotes.
Or you can specify,
select first, last from empinfo where last LIKE '%s';
This statement will match any last names that end in an 's'.
select * from empinfo where first = 'Eric';
This will only select rows where the first name equals 'Eric' exactly.
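The three where-clause examples above can be run against the empinfo sample table with the following sketch. It uses Python's standard sqlite3 module as a stand-in database, which is an assumption, not the system these notes were written for. The column names first and last are double-quoted as a precaution, because some SQL dialects treat them as keywords. SQLite's LIKE ignores case for ASCII letters by default, which does not affect these particular results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE empinfo ("first" TEXT, "last" TEXT, id INTEGER, '
    "age INTEGER, city TEXT, state TEXT)"
)
rows = [
    ("John", "Jones", 99980, 45, "Payson", "Arizona"),
    ("Mary", "Jones", 99982, 25, "Payson", "Arizona"),
    ("Eric", "Edwards", 88232, 32, "San Diego", "California"),
    ("Mary Ann", "Edwards", 88233, 32, "Phoenix", "Arizona"),
    ("Ginger", "Howell", 98002, 42, "Cottonwood", "Arizona"),
    ("Sebastian", "Smith", 92001, 23, "Gila Bend", "Arizona"),
    ("Gus", "Gray", 22322, 35, "Bagdad", "Arizona"),
    ("Mary Ann", "May", 32326, 52, "Tucson", "Arizona"),
    ("Erica", "Williams", 32327, 60, "Show Low", "Arizona"),
    ("Leroy", "Brown", 32380, 22, "Pinetop", "Arizona"),
    ("Elroy", "Cleaver", 32382, 22, "Globe", "Arizona"),
]
conn.executemany("INSERT INTO empinfo VALUES (?, ?, ?, ?, ?, ?)", rows)


def run(sql):
    """Execute one query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# 'Er%' : first names that start with "Er"
starts_with_er = run(
    'SELECT "first", "last", city FROM empinfo WHERE "first" LIKE \'Er%\''
)
# '%s' : last names that end in "s"
ends_with_s = run(
    'SELECT "first", "last" FROM empinfo WHERE "last" LIKE \'%s\''
)
# = : exact match only, so Erica is not returned
exactly_eric = run('SELECT * FROM empinfo WHERE "first" = \'Eric\'')

for name, result in [("LIKE 'Er%'", starts_with_er),
                     ("LIKE '%s'", ends_with_s),
                     ("= 'Eric'", exactly_eric)]:
    print(name, result)
```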
TransID and TransAmt do not require fully qualified names because they exist in only one of the tables. You can use fully qualified names for readability if you wish. The Customer table is considered to be the left table because it was called first. Likewise, the Transaction table is the right table. You can use more than two tables, in which case each one is naturally joined to the cumulative result in the order they are listed, unless controlled by other functionality such as join hints or parentheses. You may use WHERE and ORDER BY clauses with any JOIN statement to limit the scope of your results. Note that these clauses are applied to the results of your JOIN statement. SQL Server does not require the semicolon (;), but I use it in the included examples to denote the end of each statement, as would be expected by most other RDBMSs.
Another addition to your SQL toolbox
Although the JOIN statement is often perceived as a complicated concept, you will see that it's a powerful timesaving resource that's relatively easy to understand. Use this functionality to get related information from multiple tables with a single query and to skillfully reference normalized data. Once you've mastered JOINs, you can elegantly maneuver within even the most complex database.
Inner join
In relational databases, a join operation matches records in two tables. The two tables must be joined by at least one common field. That is, the join field is a member of both tables. Typically, a join operation is part of a SELECT query.
select * from A, B where A.x = B.y
The column names (x and y in this example) are often, but not necessarily, the same.
Outer Join
Outer join: a less commonly used variant of the inner join relational database operation. An inner join selects rows from two tables such that the value in one column of the first table also appears in a certain column of the second table. For an outer join, the result also includes all rows from the first operand ("left outer join", "*="), or the second operand ("right outer join", "=*"), or both ("full outer join", "*=*"). A field in a result row will be null if the corresponding input table did not contain a matching row. For example, if we want to list all employees and their employee number, but not all employees have a number, then we could say (in SQL):
SELECT employee.name, empnum.number FROM employee, empnum WHERE employee.id *= empnum.id
The "*=" means "left outer join" and means that all rows from the "employee" table will appear in the result, even if there is no match for their ID in the empnum table. Note that "*=" and "=*" are older, vendor-specific notation; standard SQL expresses the same query with LEFT OUTER JOIN ... ON.
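As a sketch of the employee example in the standard LEFT OUTER JOIN form, the following uses Python's standard sqlite3 module (an assumption; SQLite does not accept the "*=" notation). The two employees and the single employee number are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER, name TEXT);
    CREATE TABLE empnum   (id INTEGER, number TEXT);
    INSERT INTO employee VALUES (1, 'Fred'), (2, 'Wendy');
    INSERT INTO empnum   VALUES (1, 'E-100');   -- Wendy has no number
""")

# Every employee row is kept; number is NULL where empnum has no match.
rows = conn.execute("""
    SELECT employee.name, empnum.number
    FROM employee
    LEFT OUTER JOIN empnum ON employee.id = empnum.id
    ORDER BY employee.id
""").fetchall()

print(rows)
```

The row for the employee without a number is still returned, with None (SQL null) in the number column.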
I really did try to come up with examples where this function (the cross join, which pairs every row of one table with every row of the other) was useful, and they were all very contrived. However, I'm sure someone out there is generating lists of all their products in all possible colors or something similar, or we wouldn't have this wonderful but dangerous feature.
SELECT CustomerName, TransDate FROM Customer INNER JOIN Transaction ON Customer.CustomerID = Transaction.CustomerID;
If a row in the Transaction table contains a CustomerID that's not listed in the Customer table, that row will not be returned as part of the result set. Likewise, if the Customer table has a CustomerID with no corresponding rows in the Transaction table, the row from the Customer table won't be returned.
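The behavior just described can be checked with a small sketch, again using Python's standard sqlite3 module as an assumed stand-in database. The customers and transactions are invented for illustration, and the table name is written as "Transaction" in double quotes because TRANSACTION is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER, CustomerName TEXT);
    CREATE TABLE "Transaction" (
        TransID INTEGER, CustomerID INTEGER, TransDate TEXT, TransAmt REAL
    );
    -- Bob (2) has no transactions; transaction 11 points at a missing customer (3).
    INSERT INTO Customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO "Transaction" VALUES
        (10, 1, '2004-01-05', 25.0),
        (11, 3, '2004-01-06', 40.0);
""")

# INNER JOIN keeps only rows that have a match on both sides.
inner = conn.execute("""
    SELECT CustomerName, TransDate
    FROM Customer
    INNER JOIN "Transaction"
        ON Customer.CustomerID = "Transaction".CustomerID
""").fetchall()

print(inner)
```

Only the matched pair survives: the customer without transactions and the transaction without a customer are both absent from the result.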
SQL Subquery
It is possible to embed a SQL statement within another. When this is done in the WHERE or the HAVING clause, we have a subquery construct. What is a subquery useful for? First, it can be used to join tables. Also, there are cases where the only way to correlate two tables is through a subquery. The syntax is as follows:
SELECT "column_name1" FROM "table_name" WHERE "column_name2" [Comparison Operator] (SELECT "column_name1" FROM "table_name" WHERE [Condition])
[Comparison Operator] could be a comparison operator such as =, >, <, >=, <=. It can also be a text operator such as "LIKE." Let's use the same example as we did to illustrate SQL joins:
Table Store_Information
store_name   Sales  Date
Los Angeles  $1500  Jan-05-1999
San Diego    $250   Jan-07-1999
Los Angeles  $300   Jan-08-1999
Boston       $700   Jan-08-1999
Table Geography
region_name  store_name
East         Boston
East         New York
West         Los Angeles
West         San Diego
and we want to use a subquery to find the sales of all stores in the West region. To do so, we use the following SQL statement:
SELECT SUM(Sales) FROM Store_Information WHERE Store_name IN (SELECT store_name FROM Geography WHERE region_name = 'West')
Result:
SUM(Sales)
2050
In this example, instead of joining the two tables directly and then adding up only the sales amount for stores in the West region, we first use the subquery to find out which stores are in the West region, and then we sum up the sales amount for these stores.
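The subquery can be run against the two example tables with the sketch below, which uses Python's standard sqlite3 module as an assumed stand-in database. Sales is stored as a plain integer without the dollar sign so that it can be summed, and a join-based version of the same query is included for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Store_Information (store_name TEXT, Sales INTEGER, Date TEXT);
    CREATE TABLE Geography (region_name TEXT, store_name TEXT);
    INSERT INTO Store_Information VALUES
        ('Los Angeles', 1500, 'Jan-05-1999'),
        ('San Diego',    250, 'Jan-07-1999'),
        ('Los Angeles',  300, 'Jan-08-1999'),
        ('Boston',       700, 'Jan-08-1999');
    INSERT INTO Geography VALUES
        ('East', 'Boston'),
        ('East', 'New York'),
        ('West', 'Los Angeles'),
        ('West', 'San Diego');
""")

# Subquery: first find the West stores, then sum their sales.
via_subquery = conn.execute("""
    SELECT SUM(Sales)
    FROM Store_Information
    WHERE store_name IN (SELECT store_name
                         FROM Geography
                         WHERE region_name = 'West')
""").fetchone()[0]

# Equivalent join: match the tables directly, then filter on the region.
via_join = conn.execute("""
    SELECT SUM(s.Sales)
    FROM Store_Information AS s
    INNER JOIN Geography AS g ON s.store_name = g.store_name
    WHERE g.region_name = 'West'
""").fetchone()[0]

print(via_subquery, via_join)
```

Both forms add up the three West rows (1500 + 250 + 300) and agree with the result shown above.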
'ALTER'||OBJECT_TYPE||''||OWNER||'.'||OBJECT_NAME||'COMPILE;'
ALTER PROCEDURE MAXDATA.P_REPLACE_PROD compile;
ALTER PROCEDURE MAXDATA.P_SETPROTOTYPE compile;
ALTER PROCEDURE MAXDATA.UPDATE_LV10MAST compile;
ALTER PROCEDURE MAXAPP.P_COMPPROC compile;
Reporting Services was originally not to be available until the Yukon release of SQL Server, but Microsoft decided to release it early because of the customer excitement they heard. Why the excitement? Reporting Services fills a need that many organizations are faced with: the need to build business intelligence and reporting solutions. Until now, developers were required to embed reports into their applications, or organizations were required to purchase expensive and sometimes problematic third-party reporting solutions. Now, Reporting Services offers a complete solution for distributing reports across the enterprise, enabling businesses to make decisions better and faster.
Overview of Reporting Services
Reporting Services is a scalable, secure, robust reporting solution for SQL Server. It supports the complete reporting lifecycle by including tools for report creation, execution, distribution, and management. New users can have Reporting Services installed and new reports published within a matter of hours instead of days or weeks. Reporting Services consists of the following key components:
Report Designer: Supports the report creation phase of the report lifecycle. It is an add-on tool for any edition of Visual Studio .NET 2003, suitable for both programmers and non-programmers.
Report Server: Provides services for execution and distribution of reports.
Report Manager: A Web-based administration tool for managing the Report Server.
Report Designer
Report Designer is a Visual Studio .NET 2003 add-on and is included with Reporting Services (see Figure A). As the name implies, it provides developers and non-developers an intuitive tool to create sophisticated reports. Users get standard reporting functionality such as grouping, sorting, and report formatting. This should be sufficient for most reporting needs. For more advanced reports, the Report Designer has full VB.NET support. Plus, designers can add ActiveX controls to their reports to create rich, live, interactive reports.
One of the more compelling features of Report Designer is the ability to have dynamic, query-based parameters. This eliminates the administrator having to maintain parameter lists for all the reports (e.g., department names, office locations, employee names, etc.). You simply have to create a new dataset and tie the results to the parameter. It even allows cascading parameters. At the heart of the Reporting Services architecture is the Report Definition Language (RDL), which is an XML-based standard for defining reports. RDL is key to the success of Reporting Services because it allows third parties to publish reports to the Report Server. There are already product offerings from independent software vendors (ISVs) today.
Though Reporting Services requires SQL Server as its repository, Report Designer can connect to all types of data sources including OLE DB, ODBC, Oracle, SQL Server, and others. It also has many rendering options such as HTML, Microsoft Excel, PDF, CSV, XML, and others. The list can also be extended by third parties or by using the Reporting Services extension library.
Report Server
The Report Server provides the repository, management, execution, and delivery functions. It is scalable and secure, and can support the most demanding reporting needs. It consists of several subcomponents, including:
Request Handler: Handles all inbound server requests and routes them to the appropriate component.
Scheduling and Delivery Processor: Provides the scheduling and delivering functionality, and it can be extended to deliver reports to other devices such as fax machines or printers.
Report Processor: Provides the execution functionality and it, too, can be extended to render to new output formats such as a Microsoft Word document.
Report Server Database: All the data required by Reporting Services is stored in the Report Server Database, which must be a SQL Server. This includes everything from server settings to report definitions, even to cached data from a report execution.
Report Manager
Report Manager provides administrators an easy-to-use tool for configuring the server and managing the reports. Using Report Manager, administrators can configure security, change server settings, schedule reports for execution, and maintain the structure of the report folders (see Figure B).
Administrators have a variety of options for executing reports, including on-demand, cached reporting with an adjustable expiration period, and flexible report scheduling. All of these options are configurable at the report level through Report Manager. Report Manager supports both push and pull distribution options. For e-mail delivery, Report Manager can include a link to the report, attach the report to the message, or embed reports directly into the message using Web archive. This eliminates one extra step for the reader of the report.
Where you need Reporting Services
Here are a few typical scenarios where Reporting Services will be invaluable:
New application development: Most applications have reporting requirements that are sometimes very complex. With Reporting Services, the business analyst, who usually has a better understanding of what is required, is able to fulfill these requirements. This frees a developer to perform other development tasks and not develop tedious, time-consuming reports. Rarely do I meet a developer who enjoys writing reports. Furthermore, these reports will be easier to maintain and support.
Existing applications: Because of the complexity and time required to embed reports into applications, reports are often created outside of the application using third-party tools and distributed manually or through batch jobs. By using Reporting Services, these applications can easily be extended to include a complete reporting solution, embedded within the application.
Executive dashboard: Executive dashboard is the buzzword for providing executives a comprehensive view of their business, commonly in an Enterprise Portal. Reporting Services includes several key features for an executive dashboard, such as My Reports, My Subscriptions, push/pull delivery, numerous rendering options, and support for Web services.
Though Reporting Services is very powerful, there are a couple of scenarios for which it would not be suitable:
Applications using third-party reporting solutions: Reporting Services supplies only a migration tool for Microsoft Access. If you have a significant investment in another tool and it supports your needs, you'll need to carefully consider the costs of migrating to Reporting Services.
Enterprise reporting solution: Though non-programmers can be very productive with Report Designer, there are reporting tools absent from Report Designer that experienced report creators will miss. For example, Reporting Services does not include a database abstraction layer to hide database details from users. This abstraction is very useful if you want to deploy a reporting solution to a large, semi-technical audience.