MSBI Interview Questions

What is the index structure of clustered and nonclustered indexes?

Clustered index
A clustered index sorts and stores the data rows of the table or view in order based on the
clustered index key. The clustered index is implemented as a B-tree index structure that
supports fast retrieval of the rows, based on their clustered index key values.
CREATE CLUSTERED INDEX MyIndex1 ON MyTable(Column1);
Non clustered index
A nonclustered index can be defined on a table or view with a clustered index or on a heap.
Each index row in the nonclustered index contains the nonclustered key value and a row
locator. This locator points to the data row in the clustered index or heap having the key
value. The rows in the index are stored in the order of the index key values, but the data
rows are not guaranteed to be in any particular order unless a clustered index is created on
the table.
As an example of a non-clustered index, let's say that we have a non-clustered index on the
EmployeeID column. A non-clustered index will store both the value of the
EmployeeID AND a pointer to the row in the Employee table where that value is actually
stored. But a clustered index, on the other hand, will actually store the row data for a
particular EmployeeID so if you are running a query that looks for an EmployeeID of 15,
the data from other columns in the table like EmployeeName, EmployeeAddress, etc. will all
actually be stored in the leaf node of the clustered index itself.
This means that with a non-clustered index extra work is required to follow that pointer to
the row in the table to retrieve any other desired values, as opposed to a clustered index
which can just access the row directly since it is being stored in the same order as the
clustered index itself. So, reading from a clustered index is generally faster than reading
from a non-clustered index.
CREATE NONCLUSTERED INDEX MyIndex2 ON MyTable(Column1, Column2);
What is a filtered index and how is it useful?
Filtered indexes were introduced in SQL Server 2008. A filtered index covers only a portion
of the rows in a table: a filter predicate on the index definition selects which rows are
indexed. Compared with a full-table index, this can improve query performance, reduce index
maintenance costs, and reduce index storage costs.
A filtered index is an optimized nonclustered index, and it is one of the notable performance
improvements in SQL Server 2008.
An index scan operation happens when every row of an index must be examined in order to
find values in columns that the index does not cover. We see that for our query,
the clustered index was scanned to locate rows that match the given criteria for the
UnitPrice column:

Before we create a filtered index, let's create a traditional nonclustered index on
the UnitPrice column of the AdventureWorks2012b database's Sales.SalesOrderDetail
table, and see how that improves our query. We'll compare the performance of a
nonclustered index against the performance of a filtered nonclustered index. The query
engine suggests that a nonclustered index could improve the query by over 92%.
--add nonclustered index to UnitPrice column
CREATE NONCLUSTERED INDEX ncIX_SalesOrderDetail_UnitPrice
ON AdventureWorks2012b.Sales.SalesOrderDetail(UnitPrice)
GO
Now let's compare the query execution for both tables, in one batch:
--find SalesOrderDetailIDs with UnitPrice > $2000 - no index
SELECT SalesOrderDetailID, UnitPrice
FROM AdventureWorks2012.Sales.SalesOrderDetail
WHERE UnitPrice > 2000
GO

--find SalesOrderDetailIDs with UnitPrice > $2000 - using nonclustered index
SELECT SalesOrderDetailID, UnitPrice
FROM AdventureWorks2012b.Sales.SalesOrderDetail
WHERE UnitPrice > 2000
GO

It's clear that introducing a simple nonclustered index improved our query
by more than 90%. Since nonclustered indexes carry the clustered index key (typically the
primary key, if one exists) of a table, our query did not incur a key lookup. A key lookup
is an expensive operation that occurs when a column is included in the SELECT list of a
query but is not covered by the index used to resolve it.
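The walkthrough above stops at the plain nonclustered index. As a minimal sketch of the filtered variant, assuming the same AdventureWorks2012b copy of the database, the filter is written as a WHERE clause on the index definition. The index name and the 1000 cutoff are hypothetical and chosen only for illustration.

```sql
USE AdventureWorks2012b;
GO

-- Hypothetical index name and cutoff; only rows with UnitPrice > 1000 are indexed.
CREATE NONCLUSTERED INDEX fIX_SalesOrderDetail_UnitPrice
ON Sales.SalesOrderDetail (UnitPrice)
WHERE UnitPrice > 1000;
GO

-- The query predicate (UnitPrice > 2000) falls entirely inside the index filter,
-- so the optimizer is free to consider the filtered index for this query.
SELECT SalesOrderDetailID, UnitPrice
FROM Sales.SalesOrderDetail
WHERE UnitPrice > 2000;
GO
```

Because only the rows above the cutoff are stored, the index should be smaller and cheaper to maintain than a full-table index on the same column. Whether it is actually chosen depends on the optimizer and the statistics at hand.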
What is index_id in SQL Server?
Run SELECT * FROM sys.indexes; the result includes an index_id column. Heap tables (ones
with no clustered index) always have one entry with index_id = 0, and tables with
clustered indexes always have an entry with index_id = 1. Entries with index_id > 1
are nonclustered indexes.
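A short sketch of how the index_id values can be inspected for one table. The table name is only an example borrowed from the AdventureWorks queries earlier in this document; any table in the current database would do.

```sql
-- List every sys.indexes entry for one table, with its index_id and type.
SELECT OBJECT_NAME(i.object_id) AS TableName,
       i.name                   AS IndexName,   -- NULL for a heap (index_id = 0)
       i.index_id,
       i.type_desc                              -- HEAP, CLUSTERED, NONCLUSTERED, ...
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'Sales.SalesOrderDetail')
ORDER BY i.index_id;
```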
What is a covering index and how does it improve performance (what is a key
lookup and how do you avoid it)?
Clustered Indexes
A clustered index is an index whose leaf nodes, that is the lowest level of the index, contain
the actual data pages of the underlying table. Hence the index and the table itself are, for all
practical purposes, one and the same. Each table can have only one clustered index. For
more information on clustered indexes, see the Books Online topic "Clustered Index
Structures" (http://msdn.microsoft.com/en-us/library/ms177443.aspx).
When a clustered index is used to resolve a query, SQL Server begins at the root node for
the index and traverses the intermediate nodes until it locates the data page that contains
the row it's seeking.
Many database designs make prolific use of clustered indexes. In fact, it is generally
considered a best practice to include a clustered index on each table; of course that's
painting with a very broad brush and there will most assuredly be exceptions. For more
information about the benefits of clustered indexes, see the SQL Server Best Practices
Article entitled "Comparing Tables Organized with Clustered Indexes versus Heaps" on
TechNet.
Let's consider an example. In Figure 1, the Customers table has a clustered index
defined on the Customer_ID column. When a query is executed that searches by the
Customer_ID column, SQL Server navigates through the clustered index to locate the row in
question and returns the data. This can be seen in the Clustered Index Seek operation in the
query's execution plan.

Figure 1. Clustered index execution plan
The following Transact-SQL statement was used to create the clustered index on the
Customer_ID column.

CREATE CLUSTERED INDEX [ix_Customer_ID] ON [dbo].[Customers]
(
[Customer_ID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Nonclustered indexes
Nonclustered indexes use a similar methodology to store indexed data for tables within SQL
Server. However in a nonclustered index, the lowest level of the index does not contain the
data page of the table. Instead, it contains the information that allows SQL Server to
navigate to the data pages it needs. For tables that have a clustered index, the leaf node of
the nonclustered index contains the clustered index keys. In the previous example, the leaf
node of a nonclustered index on the Customers table would contain the Customer_ID key.
If the underlying table does not have a clustered index (this data structure is known as a
heap), the leaf node of the nonclustered index contains a row locator to the heap data
pages.
In the following example, a nonclustered composite index has been created on the
Customers table as described in the following Transact-SQL code.

CREATE NONCLUSTERED INDEX [ix_Customer_Name] ON [dbo].[Customers]
(
[Last_Name] ASC,
[First_Name] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
In this case when a query that searched by customer last name was executed, the SQL
Server query optimizer chose to use the ix_Customer_Name index to resolve the query.
This can be seen in the Execution Plan in the following figure.

Figure 2. Nonclustered index execution plan
For more information on nonclustered indexes, see the Books Online topic "Nonclustered
Index Structures".
Using Nonclustered indexes
As illustrated in the preceding example, nonclustered indexes may be employed to
provide SQL Server with an efficient way to retrieve data rows. However, under some
circumstances, the overhead associated with nonclustered indexes may be deemed too
great by the query optimizer and SQL Server will resort to a table scan to resolve the
query. To understand how this may happen, let's examine the preceding example in
more detail.
Key Lookups
Looking again at the graphical query execution plan depicted in Figure 2, notice that the
plan not only includes an Index Seek operation that uses
the ix_Customer_Name nonclustered index, it also includes a Key Lookup operation.
SQL Server uses a Key Lookup to retrieve non-key data from the data page when a
nonclustered index is used to resolve the query. That is, once SQL Server has used the
nonclustered index to identify each row that matches the query criteria, it must then
retrieve the column information for those rows from the data pages of the table.
Since the leaf node of the nonclustered index contains only the clustered key value for the
row, SQL Server must navigate through the clustered index to retrieve the remaining column
values for each row of the result set. In this example, SQL Server chose to do this
using a nested loop join type.
This query produced a result of 1,000 rows and nearly 100% of the expense of the query
was directly attributed to the Key Lookup operation. Digging a little deeper into the Key
Lookup operation, we can see why.

Figure 3. Key lookup operation properties
This Key Lookup operation was executed 1000 times, once for each row of the result set.
Resorting to Table Scans
As the number of rows in the result set increases, so does the number of Key lookups. At
some point, the cost associated with the Key Lookup will outweigh any benefit provided
by the non clustered index.
To illustrate this point, let's modify the query so that it retrieves more rows. Figure 4
depicts the new query along with the actual execution plan used to resolve the query.

Figure 4. Resorting to a table scan
The new query searches for a range of customers whose last name is between "Roland"
and "Smith". There are 69,000 of them in our database. From the actual execution plan,
we can see that the query optimizer determined that the overhead cost of performing a
Key Lookup for each of the 69,000 rows was more than simply traversing the entire table
via a table scan. Hence, our ix_Customer_Name index was not used at all during the
query.
Figure 5 shows some additional properties of the table scan.

Figure 5. Properties for the table scan operation
One may be tempted to force SQL Server to resolve the query using the nonclustered
index by supplying a table hint as shown in the following illustration.

Figure 6. Using a table hint to resolve the query
This is almost always a bad idea since the optimizer generally does a good job in choosing
an appropriate execution plan. Additionally, the optimizer bases its decisions on column
statistics; those are likely to change over time. A table hint that works well today may
not work well in the future when the selectivity of the key columns changes.
Figure 7 shows the properties for the Key Lookup when we forced SQL Server to use the
nonclustered ix_Customer_Name index. The Estimated Operator Cost for the Key Lookup
is 57.02 compared to 12.17 for the Clustered Index Scan shown in Figure 5. Forcing SQL
Server to use the index significantly affected performance, and not for the better!

Figure 7. Properties of the Key Lookup for the table hint
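The figures are not reproduced here, so as a rough sketch of what the table hint discussed above looks like in Transact-SQL: the column list and the range values are assumed from the surrounding discussion, not copied from the original screenshot.

```sql
-- Forces the optimizer to use ix_Customer_Name, even when it would prefer a scan.
-- Shown only to illustrate the syntax; the text above advises against doing this.
SELECT Last_Name, First_Name, Email_Address
FROM dbo.Customers WITH (INDEX(ix_Customer_Name))
WHERE Last_Name BETWEEN 'Roland' AND 'Smith';
```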
Covering Indexes
So, if Key Lookups can be detrimental to performance during query resolution for large
result sets, the natural question is: how can we avoid them? To answer that question,
let's consider a query that does not require a Key Lookup.
Let's begin by modifying our query so that it no longer selects
the Email_Address column. Figure 8 illustrates this updated query along with its actual
execution plan.

Figure 8. Reduced query to eliminate the Key Lookup
The new execution plan has been streamlined and only uses
the ix_Customer_Name nonclustered index. Looking at the properties of the operation
provides further evidence of the improvement. The properties are shown in Figure 9.

Figure 9. Reduced query to eliminate the Key Lookup properties
The Estimated Operator Cost went down dramatically, from 12.17 in Figure 5 to 0.22 in
Figure 9. We could also look at the Logical and Physical Read characteristics by
setting STATISTICS IO on; however, for this demonstration it's sufficient to view the
Operator Costs for each operation.
The observed improvement is due to the fact that the nonclustered index contained all of
the required information to resolve the query. No Key Lookups were required. An index
that contains all information required to resolve the query is known as a "Covering
Index"; it completely covers the query.
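For readers who do want the read counts mentioned above, a minimal sketch of the STATISTICS IO approach follows. The query is an assumed reconstruction of the reduced query (name columns only, same range as before), since the original screenshot is not available.

```sql
-- Report logical and physical reads per table in the Messages tab.
SET STATISTICS IO ON;

-- Assumed form of the reduced query: every selected column is in ix_Customer_Name.
SELECT Last_Name, First_Name
FROM dbo.Customers
WHERE Last_Name BETWEEN 'Roland' AND 'Smith';

SET STATISTICS IO OFF;
```

Comparing the logical reads of this query with those of the version that also selects Email_Address gives a second view of the cost that the covering index avoids.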
Using the Clustered Key Column
Recall that if a table has a clustered index, those key columns are automatically part of
the nonclustered index. So, the following query is a covering query by default.

Figure 10. Covering index using the clustered index keys
However, unless your clustered index contains the required columns, which is not the
case in our example, it will be insufficient for covering our query.
Adding Key Columns to the Index
To increase the likelihood that a nonclustered index is a covering index, it is tempting to
begin adding additional columns to the index key. For example, if we regularly query the
customer's middle name and telephone number, we could add those columns to the
ix_Customer_Name index. Or, to continue with our previous example, we could add the
Email_Address column to the index as shown in the following Transact-SQL code.

CREATE NONCLUSTERED INDEX [ix_Customer_Email] ON [dbo].[Customers]
(
[Last_Name] ASC,
[First_Name] ASC,
[Email_Address] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Before doing so, it is important to remember that indexes must be maintained by SQL
Server during data manipulation operations. Too many indexes hurt performance during
write operations. Additionally, the wider the index, that is to say the more bytes that
make up the index keys, the more data pages it will take to store the index.
Furthermore, there are some built-in limitations for indexes. Specifically, indexes are
limited to 16 key columns or 900 bytes, whichever comes first, in both SQL Server 2005
and SQL Server 2008. And some data types cannot be used as index
keys, varchar(max) for instance.
Including Non-Key columns
SQL Server 2005 provided a new feature for nonclustered indexes, the ability to include
additional, non-key columns in the leaf level of the nonclustered indexes. These columns
are technically not part of the index, however they are included in the leaf node of the
index. SQL Server 2005 and SQL Server 2008 allow up to 1023 columns to be included in
the leaf node.
To create a nonclustered index with included columns, use the following Transact-SQL
syntax.

CREATE NONCLUSTERED INDEX [ix_Customer_Email] ON [dbo].[Customers]
(
[Last_Name] ASC,
[First_Name] ASC
)
INCLUDE ([Email_Address])
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Rerunning our query yields an execution plan that makes use of our new index to rapidly
return the result set. The execution plan may be found in Figure 11.

Figure 11. Covering query with included columns
Notice that even though our query selects columns that are not part of the nonclustered
index's key, SQL Server is still able to resolve the query without having to use a Key
Lookup for each row. Since the ix_Customer_Email index includes the Email_Address
column as part of its definition, the index covers the query. The properties of the
Nonclustered Index Seek operator confirm our findings, as depicted in Figure 12.

Figure 12. Execution properties for the covering query with included columns
From this execution plan, we can see that the Estimated Operator Cost decreased from
12.17 in Figure 5 to 0.41. Including an additional non-key column has dramatically
improved the performance of the query.
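Can you create a nonclustered index on a primary key?
Yes. A PRIMARY KEY constraint is backed by a clustered index by default when the table
has no clustered index yet, but it can be declared NONCLUSTERED instead.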
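A minimal sketch of a nonclustered primary key, using a hypothetical table and hypothetical index names that do not appear elsewhere in this document. The clustered index is then free to be placed on a different column.

```sql
-- Hypothetical demo table: the primary key is enforced by a nonclustered index.
CREATE TABLE dbo.DemoOrders
(
    OrderID   int  NOT NULL,
    OrderDate date NOT NULL,
    CONSTRAINT PK_DemoOrders PRIMARY KEY NONCLUSTERED (OrderID)
);
GO

-- The clustered index can now go on another column, here the order date.
CREATE CLUSTERED INDEX cix_DemoOrders_OrderDate
ON dbo.DemoOrders (OrderDate);
GO
```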
Duplicate elimination using a correlated subquery?
--Selecting one row (the highest ID) from each group of duplicates
SELECT * FROM Employee E1
WHERE E1.ID = (SELECT MAX(E2.ID) FROM Employee E2
WHERE E2.FirstName = E1.FirstName AND E2.LastName = E1.LastName
AND E2.Address = E1.Address)
GO
--Deleting duplicates: every row except the highest ID in each group is removed
DELETE Employee
WHERE ID < (SELECT MAX(E2.ID) FROM Employee E2
WHERE E2.FirstName = Employee.FirstName AND E2.LastName = Employee.LastName
AND E2.Address = Employee.Address)
GO
SELECT * FROM Employee
GO
--Same pattern on another table, treating rows with the same Location as duplicates
DELETE Student1Details
WHERE ID < (SELECT MAX(S2.ID) FROM Student1Details S2
WHERE S2.Location = Student1Details.Location)
GO

What will be the output of SELECT COUNT(*) OVER () FROM EMP?
It returns one row for every row in the table, and each row shows the same value: the total
number of rows in the table.
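A small self-contained sketch of this behaviour. The three-row inline EMP set and its EmpName column are made up for illustration, so that no real table is needed.

```sql
-- EMP is an inline three-row set here; the names are illustrative only.
SELECT EmpName,
       COUNT(*) OVER () AS TotalRows   -- empty OVER (): the window is the whole result set
FROM (VALUES ('Ann'), ('Bob'), ('Cid')) AS EMP (EmpName);
-- Returns three rows; TotalRows is 3 on each of them.
```

Unlike a plain SELECT COUNT(*), the windowed form does not collapse the rows, which is why the total can be shown next to each detail row.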
What is the difference between index rebuild and index reorganize?
Index Rebuild: This process drops the existing index and recreates it. REBUILD can run as an
online operation in Enterprise editions (it is offline in other editions) and needs roughly
as much extra working space as the size of the index. It creates a new copy of the index and
then drops the old one, thus getting rid of fragmentation.

USE AdventureWorks;
GO
ALTER INDEX ALL ON Production.Product REBUILD
GO

Index Reorganize: This process physically reorganizes the leaf nodes of the index.
REORGANIZE is an online operation that defragments leaf pages in a clustered or
nonclustered index page by page, using little extra working space.

USE AdventureWorks;
GO
ALTER INDEX ALL ON Production.Product REORGANIZE
GO
Recommendation: An index should be rebuilt when its fragmentation is greater than 30%.
An index should be reorganized when its fragmentation is between 10% and 30%.
When you reorganize an index, SQL Server physically reorders the leaf-level pages to match
the logical order of the leaf nodes. The process uses the existing pages only and does not
allocate new ones, but it does compact the index pages. In addition, reorganization uses
minimal resources and is automatically performed online, without blocking queries or
updates. You should reorganize indexes only if they're lightly fragmented; otherwise, you
should rebuild them.
To reorganize an index, run an ALTER INDEX statement and include the
keyword REORGANIZE, as shown in the following example:
ALTER INDEX PK_StoreContact_CustomerID_ContactID
ON AdventureWorks.Sales.StoreContact REORGANIZE
Notice that I specify the index name and the table. After I run this statement, I then use
the sys.dm_db_index_physical_stats function to retrieve index-related statistics. The results
are shown in Table 4.
IndexName                             PercentFragment  TotalFrags  PagesPerFrag  NumPages
PK_StoreContact_CustomerID_ContactID  20               2           2.5           5

What is index fragmentation and how do you defragment indexes?
When data is inserted into, deleted from, or updated in a SQL Server table, the indexes
defined on that table are automatically updated to reflect those changes. As the indexes are
modified, the information stored in them becomes fragmented, resulting in the information
being scattered across the data files. When this occurs, the logical ordering of the data no
longer matches the physical ordering, which can lead to a deterioration of query
performance.
To fix this problem, indexes must be periodically reorganized or rebuilt (defragmented) so
the physical order of the leaf-level pages matches the logical order of the leaf nodes. This
means that you should analyze your indexes periodically to determine whether they've
become fragmented and the extent of that fragmentation. From there, you can either
reorganize or rebuild the affected indexes, depending on the results of your analysis.
Analyzing Fragmentation
To analyze SQL Server 2005 or 2008 indexes, you use the system
function sys.dm_db_index_physical_stats to determine which indexes are fragmented and
the extent of that fragmentation. You can use the function to analyze all the indexes in an
instance of SQL Server 2005 or 2008, all indexes in a database, all indexes defined on a
table, or a specific index. You can also analyze an index based on the partition number of
the indexed object.
The sys.dm_db_index_physical_stats function takes the following parameters (in the order
specified):
Database ID: A smallint value that represents the ID number of a database. If null
is specified, the function retrieves index-related data from all databases on a SQL
Server instance. If you specify null, you must also specify null for the object ID,
index ID, and partition number.
Object ID: An int value that represents the ID number of a table or view. If null is
specified, the function retrieves index-related data for all tables and views in a
specific database or SQL Server instance. If you specify null, you must also specify
null for the index ID and partition number.
Index ID: An int value that represents the ID number of an index. If null is
specified, the function retrieves index-related data for all indexes defined on the
specified table or view. If you specify null, you must also specify null for the
partition number. Also, if the object ID refers to a heap, use 0 as the index ID.
Partition number: An int value that represents the partition number of an index
or heap. If null is specified, the function retrieves index-related information for all
partitions related to a specific object.
Mode: The scan level used to obtain index-related information. Valid inputs
include NULL, DEFAULT, or one of the following three modes:
LIMITED: Scans the smallest number of pages, which means this is the
fastest mode. The LIMITED mode is equivalent to NULL and DEFAULT.
SAMPLED: Scans 1% of all pages. If an index contains fewer than 10,000
pages, then DETAILED mode is used.
DETAILED: Scans all index pages, which means this is the slowest mode,
but most accurate.
You must specify all five parameters, even if their values are null.
The sys.dm_db_index_physical_stats function returns a number of values that provide
details about the indexes you specify. The topic sys.dm_db_index_physical_stats in SQL
Server Books Online provides details about each of these values. However, several values
are worth noting when analyzing an index:
avg_fragmentation_in_percent: Percentage of the logical index that is
fragmented.
fragment_count: Number of fragments in the leaf level.
avg_fragment_size_in_pages: Average number of pages in a leaf-level fragment.
page_count: Number of index or data pages.
An index always has at least one fragment (fragment_count). The maximum number of
fragments that an index can have is equal to the number of pages (page_count). For
example, an index that is made up of five pages can at the most have five fragments. The
larger the fragment, the less disk I/O that is required. So a five-page index with one
fragment requires less disk I/O than the index with five fragments. Ideally,
the avg_fragmentation_in_percent value should be as close to zero as possible, and
the avg_fragment_size_in_pages should be as high as possible.
Based on your index analysis, you can determine what action to take. Microsoft
recommends that you reorganize your index if the avg_fragmentation_in_percent value is
less than or equal to 30% and rebuild the index if the value is greater than 30%.
(Reorganizing and rebuilding indexes are described in the following sections.)
Keep in mind that these recommendations are guidelines only. A fragmented index
(especially a low percentage) is not always enough of a reason to reorganize or rebuild your
index. If your queries do not regularly involve table scans as a result of singleton lookups,
defragmenting the index might have no effect on performance. In addition, for smaller
indexes with relatively few pages and small amounts of data, you might see little to no
improvement when you defragment the index. FILLFACTOR settings can also affect the
types of improvements you see.
That said, you should still analyze your indexes regularly, and
the sys.dm_db_index_physical_stats function is the best tool to use. So let's take a look at
an example of how to use the function to retrieve index-related statistics. In the following
SELECT statement, I retrieve index data from the AdventureWorks database:
SELECT object_id AS ObjectID,
index_id AS IndexID,
avg_fragmentation_in_percent AS PercentFragment,
fragment_count AS TotalFrags,
avg_fragment_size_in_pages AS PagesPerFrag,
page_count AS NumPages
FROM sys.dm_db_index_physical_stats(DB_ID('AdventureWorks'),
NULL, NULL, NULL , 'DETAILED')
WHERE avg_fragmentation_in_percent > 0
ORDER BY ObjectID, IndexID
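Building on the query above, the fragmentation figures can be turned into a suggested action. This is only a sketch: it assumes the 10% and 30% thresholds quoted earlier in this document, runs against the current database, and uses a made-up SuggestedAction column name. It does not alter any index.

```sql
-- Suggest REBUILD or REORGANIZE per index, using the thresholds quoted in this document.
SELECT OBJECT_NAME(ps.object_id)        AS TableName,
       i.name                           AS IndexName,
       ps.avg_fragmentation_in_percent  AS PercentFragment,
       CASE
           WHEN ps.avg_fragmentation_in_percent > 30  THEN 'REBUILD'
           WHEN ps.avg_fragmentation_in_percent >= 10 THEN 'REORGANIZE'
           ELSE 'NONE'
       END                              AS SuggestedAction
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
INNER JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id  = ps.index_id
WHERE ps.index_id > 0                   -- skip heaps, which have no index to rebuild
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

LIMITED mode is used here because it is the fastest scan level. For small indexes the percentages may matter little, as the guidance above points out.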

Difference between local temporary tables and global temporary tables?
Table variables (DECLARE @t TABLE) are visible only to the connection that creates them, and
are deleted when the batch or stored procedure ends.
Local temporary tables (CREATE TABLE #t) are visible only to the connection that creates
them, and are deleted when the connection is closed.
Global temporary tables (CREATE TABLE ##t) are visible to everyone, and are deleted
when all connections that have referenced them have closed.
Tempdb permanent tables (USE tempdb CREATE TABLE t) are visible to everyone, and
are deleted when the server is restarted.

Local temporary tables are visible only in the current session, and global temporary
tables are visible to all sessions. Global temporary tables are automatically dropped
when the session that created the table ends and all other tasks have stopped
referencing them. The association between a task and a table is maintained only for the
life of a single Transact-SQL statement. This means that a global temporary table is
dropped at the completion of the last Transact-SQL statement that was actively
referencing the table when the creating session ended.
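A brief sketch of the visibility difference described above. The table names are hypothetical, and the second-session behaviour is described in comments because it cannot be shown in a single batch.

```sql
-- Local temporary table: visible only to the session that creates it.
CREATE TABLE #LocalDemo (ID int NOT NULL);

-- Global temporary table: visible to every session while it exists.
CREATE TABLE ##GlobalDemo (ID int NOT NULL);

INSERT INTO #LocalDemo (ID) VALUES (1);
INSERT INTO ##GlobalDemo (ID) VALUES (1);

-- From a second session at this point:
--   SELECT * FROM ##GlobalDemo;   works
--   SELECT * FROM #LocalDemo;     fails with an "Invalid object name" error

-- Explicit cleanup; otherwise both are dropped automatically as described above.
DROP TABLE #LocalDemo;
DROP TABLE ##GlobalDemo;
```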
What are XML indexes?
XML indexes can be created on xml data type columns. They index all tags, values and paths
over the XML instances in the column and benefit query performance. Your application may
benefit from an XML index in the following situations:
Queries on XML columns are common in your workload. XML index maintenance
cost during data modification must be considered.
Your XML values are relatively large and the retrieved parts are relatively small.
Building the index avoids parsing the whole data at run time and benefits index
lookups for efficient query processing.
XML indexes fall into the following categories:
Primary XML index
Secondary XML index
The first index on the xml type column must be the primary XML index. Using the primary
XML index, the following types of secondary indexes are supported: PATH, VALUE, and
PROPERTY. Depending on the type of queries, these secondary indexes might help improve
query performance.
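As a minimal sketch of the two categories, assuming a hypothetical table with an xml column (all object names below are made up): the primary XML index is created first, and a secondary PATH index is then built on top of it. The table needs a clustered primary key before a primary XML index can be created.

```sql
-- Hypothetical table with an xml column and the required clustered primary key.
CREATE TABLE dbo.DemoDocs
(
    DocID   int NOT NULL CONSTRAINT PK_DemoDocs PRIMARY KEY CLUSTERED,
    Payload xml NOT NULL
);
GO

-- The first index on the xml column must be the primary XML index.
CREATE PRIMARY XML INDEX pxml_DemoDocs_Payload
ON dbo.DemoDocs (Payload);
GO

-- Secondary XML index (PATH here; VALUE and PROPERTY use the same form).
CREATE XML INDEX sxml_DemoDocs_Payload_Path
ON dbo.DemoDocs (Payload)
USING XML INDEX pxml_DemoDocs_Payload
FOR PATH;
GO
```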

What is a recursive CTE?
A common table expression (CTE) has the significant advantage of being able to
reference itself, thereby creating a recursive CTE. A recursive CTE is one in which an initial
CTE is repeatedly executed to return subsets of data until the complete result set is
obtained. This makes a CTE more powerful than a derived table: it can be self-referencing,
and it can also be referenced multiple times in the same query.
Building a Recursive CTE
In the following examples, you will learn how to harness the power of a recursive CTE query
by fulfilling a common business requirement, retrieving hierarchical data. By the time the
final query is complete you will be able to easily determine how many levels from the top
executive each employee is.
A recursive CTE requires four elements in order to work properly.
1. Anchor query (runs once and the results seed the Recursive query)
2. Recursive query (runs multiple times and is the criteria for the remaining results)
3. UNION ALL statement to bind the Anchor and Recursive queries together.
4. INNER JOIN statement to bind the Recursive query to the results of the CTE.

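WITH MyCTE
AS ( -- Anchor query: employees with no manager
SELECT EmpID, FirstName, LastName, ManagerID
FROM Employee
WHERE ManagerID IS NULL
UNION ALL
-- Recursive query: employees whose manager is already in MyCTE
SELECT Employee.EmpID, Employee.FirstName, Employee.LastName, Employee.ManagerID
FROM Employee
INNER JOIN MyCTE ON Employee.ManagerID = MyCTE.EmpID
WHERE Employee.ManagerID IS NOT NULL)
SELECT *
FROM MyCTE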




Identify the Anchor and Recursive Query
Anyone who does not have a boss is considered to be at the top level of the company and
everyone who does have a boss either works for the person(s) at the top level (upper
management), or the people that work for them (mid-management through base employees).
For example, a CEO is at the top level and thus has a ManagerID of null. Likewise, everyone
below the CEO will have a ManagerID. This is demonstrated in the two queries below:

The first SELECT statement will become your Anchor query as it will find the employee that
has a ManagerID of null (representing Level 1 of the organization). The second SELECT
statement will become your Recursive query and it will find all employees that do have a
ManagerID (representing Level 2-3 of this organization).
As you can see from the results so far, these queries are unable to give hierarchical data on
which level each employee is at within the organization.
Add the Anchor and Recursive query to a CTE
Begin transforming this entire query into a CTE by placing a UNION ALL statement between
the Anchor and Recursive queries. Now add parentheses around the entire query, indenting
it, moving it down, and adding the declaration WITH EmployeeList AS before the open
parenthesis, and then add SELECT * FROM EmployeeList on the next line after the close
parenthesis.
Your query should now look like the screenshot below:

As you can see, the results from your CTE are exactly the same as the results returned from
running the anchor and Recursive queries simultaneously in the previous example.
Add an expression to track hierarchical level
The Anchor query (aliased as Boss) inside the CTE represents everyone at Level 1 (i.e. Sally
Smith). The Recursive query (aliased as Emp) represents everyone at Levels 2 and 3. In
order to visualize each level in a result set, you will need to add an expression field to each
query.
Add the expression 1 AS EmpLevel to the Anchor query and the expression 2 AS
EmpLevel to the Recursive query. Before executing the entire query, look closely at the
expression field. The EmpLevel expression in the Anchor query will hard-code the numeral
1 (for Sally Smith's level), while the EmpLevel expression in the Recursive query will
hard-code the numeral 2 for everyone else.
Your query should now look like the screenshot below:

The two new expression fields were a helpful step. In fact, they show the correct EmpLevel
information for Sally Smith and for the people at Level 2 (i.e., Adams, Bender, Brown,
Kennson, Lonning and Osako). However, the 2 is just a hard-coded placeholder to help
visualize your next step. Lisa Kendall and several other employees need to be at Level 3.
Ideally you would like to make the expression dynamic by replacing 2 AS EmpLevel with
the expression EmpLevel + 1.
Add a self-referencing INNER JOIN statement
Let's take a moment to recognise why this is not going to work quite so simply. The idea to
increment EmpLevel in the recursive query of the CTE is on the right track. Unfortunately,
the recursive query is trying to reference a field called EmpLevel but can't find one, since it
has only been materialized in the result set of the Anchor query and does not yet exist in the
recursive set.
How can you materialize the EmpLevel field for the recursive query? We can use the CTE for
this! Remember, a recursive CTE requires an INNER JOIN to connect the recursive query to
the CTE itself. Go ahead and write an INNER JOIN statement binding the recursive query
Emp to the CTE EmployeeList AS EL ON Emp.ManagerID = EL.EmpID.
Your query should now look like the screenshot below:
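Since the screenshot is not reproduced here, the following is a minimal sketch of what the finished query might look like once 2 AS EmpLevel is replaced with EL.EmpLevel + 1. The aliases Boss, Emp and EL, the CTE name EmployeeList and the columns EmpID, ManagerID and EmpLevel come from the text; the table variable @Employee, the FirstName and LastName columns, the ID values and some first names are assumptions made up for illustration.

```sql
-- Hypothetical sample data: the real Employee table and its IDs are not shown in the text.
DECLARE @Employee TABLE (
    EmpID     INT PRIMARY KEY,
    FirstName NVARCHAR(30) NOT NULL,
    LastName  NVARCHAR(30) NOT NULL,
    ManagerID INT NULL               -- NULL marks the top of the hierarchy
);

INSERT INTO @Employee (EmpID, FirstName, LastName, ManagerID)
VALUES (1, N'Sally', N'Smith',   NULL),
       (2, N'Alex',  N'Adams',   1),
       (3, N'Barry', N'Brown',   1),
       (4, N'Lisa',  N'Kendall', 2);

WITH EmployeeList AS
(
    -- Anchor query: the boss, hard-coded at level 1
    SELECT Boss.EmpID, Boss.FirstName, Boss.LastName, Boss.ManagerID,
           1 AS EmpLevel
    FROM @Employee AS Boss
    WHERE Boss.ManagerID IS NULL

    UNION ALL

    -- Recursive query: joins back to the CTE, so EL.EmpLevel is available
    SELECT Emp.EmpID, Emp.FirstName, Emp.LastName, Emp.ManagerID,
           EL.EmpLevel + 1
    FROM @Employee AS Emp
    INNER JOIN EmployeeList AS EL
        ON Emp.ManagerID = EL.EmpID
)
SELECT EmpID, FirstName, LastName, ManagerID, EmpLevel
FROM EmployeeList
ORDER BY EmpLevel, EmpID;
```

With this sample data, Sally Smith is returned at level 1, Adams and Brown at level 2, and Lisa Kendall (who reports to Adams in the made-up data) at level 3.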


What are the isolation levels in SQL server?
SQL Server 2008 supports the following isolation levels
Read Uncommitted
Read Committed (The default)
Repeatable Read
Serializable
Snapshot

Dirty Reads:- occur when one transaction reads data written by another, uncommitted,
transaction. The danger with dirty reads is that the other transaction might never commit,
leaving the original transaction with "dirty" data.

Non-repeatable Reads:- occur when one transaction attempts to access the same data twice
and a second transaction modifies the data between the first transaction's read attempts. This
may cause the first transaction to read two different values for the same data, causing the
original read to be non-repeatable.

Phantom Reads:- occur when one transaction accesses a range of data more than once and a
second transaction inserts or deletes rows that fall within that range between the first
transaction's read attempts. This can cause "phantom" rows to appear or disappear from the
first transaction's perspective.

Read Uncommitted:-This is the lowest isolation level there is. Read uncommitted causes no
shared locks to be requested which allows you to read data that is currently being modified in
other transactions. It also allows other transactions to modify data that you are reading.

Read Committed:- This is the default isolation level in SQL Server for every database. It
prevents a transaction from reading data that another transaction has modified but not yet
committed, so it eliminates dirty reads. It does not, however, prevent non-repeatable reads
or phantom reads.

Repeatable Read:- Under this isolation level, shared locks on every row a transaction reads
are held until that transaction completes, so no other transaction can modify the data
between two reads. This eliminates dirty reads and non-repeatable reads (the situation where
the original transaction reads two different values for the same data), but phantom reads can
still occur.

Serializable:- This is the strictest isolation level. A transaction cannot read data that other
transactions have modified but not yet committed, and other transactions cannot modify data
that the current transaction has read until it completes. In addition, the transaction acquires
range locks (read locks for reads, write locks for inserts, deletes and updates) covering the
entire range of records it affects, so other transactions cannot insert rows into that range.
This eliminates dirty reads, non-repeatable reads and phantom reads.

Snapshot:- In this isolation level, a transaction sees only data that was committed before
the start of the transaction. Any modification made by other transactions after that point is
not visible to statements in the currently executing transaction. It is as if each transaction
were given its own snapshot of the data. This is done with row versioning: a separate version
of each modified row is maintained in the tempdb database.
This isolation level eliminates dirty reads, lost updates, non-repeatable reads and phantom
reads.

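The isolation level can be set for the current session by using the following command:-

SET TRANSACTION ISOLATION LEVEL { READ UNCOMMITTED | READ COMMITTED | REPEATABLE READ | SNAPSHOT | SERIALIZABLE }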
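As a small illustration of the command in use, the sketch below switches a session to Repeatable Read, reads the same row twice inside one transaction, and then inspects the session settings. The #Account table and its contents are hypothetical. Because a temporary table is private to the session, this only shows the syntax; to observe blocking you would run a competing UPDATE from a second session against a permanent table. Snapshot isolation is left out because it also needs the ALLOW_SNAPSHOT_ISOLATION database option to be turned on.

```sql
-- Hypothetical demo table (private to this session)
CREATE TABLE #Account (AccountID INT PRIMARY KEY, Balance MONEY NOT NULL);
INSERT INTO #Account (AccountID, Balance) VALUES (1, 100.00);

-- Stays in effect for this session until it is changed again
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRANSACTION;
    -- Shared locks taken by this read are held until COMMIT or ROLLBACK ...
    SELECT Balance FROM #Account WHERE AccountID = 1;
    -- ... so a second read in the same transaction sees the same value
    SELECT Balance FROM #Account WHERE AccountID = 1;
COMMIT TRANSACTION;

-- Lists the session's current SET options, including the isolation level
DBCC USEROPTIONS;

-- Return to the default level and clean up
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
DROP TABLE #Account;
```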

SQL Server database architecture?
The major components of SQL Server are:
1. Relational Engine
2. Storage Engine
3. SQL OS
Now we will discuss and understand each one of them.
1) Relational Engine: Also called the query processor, the Relational Engine includes the
components of SQL Server that determine exactly what your query needs to do and the best
way to do it. It manages the execution of queries as it requests data from the storage engine
and processes the results returned.
Different Tasks of Relational Engine:
1. Query Processing
2. Memory Management
3. Thread and Task Management
4. Buffer Management
5. Distributed Query Processing
2) Storage Engine: The Storage Engine is responsible for storing data on, and retrieving it
from, the storage system (disk, SAN, etc.). To understand more, let's focus on the concepts.
When we talk about any database in SQL Server, two types of files are created at the disk
level: the data file and the log file. The data file physically stores the data in data pages.
Log files, also known as write-ahead logs, store the transactions performed on the database.
Let's look at the data file and the log file in more detail:
Data File: Data File stores data in the form of Data Page (8KB) and these data pages are
logically organized in extents.
Extents: Extents are the basic unit in which space is managed. An extent is a group of eight
contiguous data pages, i.e. 64 KB. Extents can be of two types, Mixed and Uniform. A mixed
extent is shared by multiple objects (up to eight, one per page), so it can hold different
types of pages such as index, system and data pages. A uniform extent, on the other hand, is
dedicated to a single object.
Pages: Some of the page types that can be stored in SQL Server are listed below:
Data Page: It holds the data entered by the user, except data of type
text, ntext, nvarchar(max), varchar(max), varbinary(max), image and xml.
Index: It stores the index entries.
Text/Image: It stores LOB (Large Object) data such as text, ntext, varchar(max),
nvarchar(max), varbinary(max), image and xml data.
GAM & SGAM (Global Allocation Map & Shared Global Allocation Map): They are
used for saving information related to the allocation of extents.
PFS (Page Free Space): Information related to page allocation and unused space
available on pages.
IAM (Index Allocation Map): Information pertaining to extents that are used by a
table or index per allocation unit.
BCM (Bulk Changed Map): Keeps information about the extents changed by bulk-logged
operations since the last BACKUP LOG statement.
DCM (Differential Change Map): Keeps information about the extents that have been
modified since the last BACKUP DATABASE statement, per allocation unit.

Log File: It is also known as the write-ahead log. It stores modifications to the database
(DML and DDL).
Sufficient information is logged to be able to:
Roll back transactions if requested
Recover the database in case of failure
Write Ahead Logging is used to create log entries
Transaction logs are written in chronological order in a circular way
Truncation policy for logs is based on the recovery model
3) SQL OS: This lies between the host machine (Windows OS) and SQL Server. All the
activities performed on the database engine are taken care of by SQL OS. It is a highly
configurable operating system layer with a powerful API (application programming interface),
enabling automatic locality and advanced parallelism. SQL OS provides various operating
system services, such as memory management (which deals with the buffer pool and log buffer)
and deadlock detection (using the blocking and locking structures). Other services include
exception handling and hosting for external components such as the Common Language
Runtime (CLR).
I guess this brief article gives you an idea about the various terminologies used related to
SQL Server Architecture. In future articles we will explore them further.



what is merge and how do you implement scd2 using merge?
https://www.simple-talk.com/sql/learn-sql-server/the-merge-statement-in-sql-server-2008/
You can use a MERGE statement to modify data in a target table based on data in a source
table. The statement joins the target to the source by using a column common to both tables,
such as a primary key. You can then insert, modify, or delete data from the target table, all
in one statement, according to how the rows match up as a result of the join.
Implementing the WHEN MATCHED Clause
The first MERGE clause we'll look at is WHEN MATCHED. You should use this clause
when you want to update or delete rows in the target table that match rows in the source
table. Rows are considered matching when the joined column values are the same.
For example, if the TitleID value in the BookInventory table matches the TitleID value in
the BookOrder table, the rows are considered to match, regardless of the other values in the
matching rows. When rows do match, you can use the WHEN MATCHED clause to modify
data in the target table. Let's look at an example to demonstrate how this works.
In the following MERGE statement, I join the BookInventory table (the target) to the
BookOrder table (the source) and then use a WHEN MATCHED clause to update the
Quantity column in the target table:
MERGE BookInventory bi
USING BookOrder bo
ON bi.TitleID = bo.TitleID
WHEN MATCHED THEN
UPDATE
SET bi.Quantity = bi.Quantity + bo.Quantity;
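You can include up to two WHEN MATCHED clauses in a MERGE statement. If you include two,
the first clause must include the AND keyword followed by a search condition. For instance,
to delete the rows whose combined quantity is 0 and update the rest, add a
second WHEN MATCHED clause to your MERGE statement, as shown in the following
example: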
MERGE BookInventory bi
USING BookOrder bo
ON bi.TitleID = bo.TitleID
WHEN MATCHED AND
bi.Quantity + bo.Quantity = 0 THEN
DELETE
WHEN MATCHED THEN
UPDATE
SET bi.Quantity = bi.Quantity + bo.Quantity;
Implementing the WHEN NOT MATCHED [BY TARGET] Clause
The next clause in the MERGE statement we'll review is WHEN NOT MATCHED [BY
TARGET]. (The BY TARGET keywords are optional.) You should use this clause to insert
new rows into the target table. The rows you insert into the table are those rows in the source
table for which there are no matching rows in the target. For example, the BookOrder table
contains a row for Gone with the Wind. However, the BookInventory table does not contain
this book. The following example demonstrates how to include a WHEN NOT MATCHED
clause in your MERGE statement that adds Gone with the Wind to your target table:
MERGE BookInventory bi
USING BookOrder bo
ON bi.TitleID = bo.TitleID
WHEN MATCHED AND
bi.Quantity + bo.Quantity = 0 THEN
DELETE
WHEN MATCHED THEN
UPDATE
SET bi.Quantity = bi.Quantity + bo.Quantity
WHEN NOT MATCHED BY TARGET THEN
INSERT (TitleID, Title, Quantity)
VALUES (bo.TitleID, bo.Title,bo.Quantity);
Implementing the WHEN NOT MATCHED BY SOURCE Clause
As you'll recall from the discussion about the WHEN MATCHED clause, you can use that
clause to delete rows from the target table. However, it can delete only rows that match a row
in the source table. But suppose you want to delete a row from the target table that does not
match a row in the source table.
For example, one of the rows originally inserted into the BookInventory table is for the
book Catch 22. The Quantity value for that book was never updated because no order was
placed for the book, that is, the book was never added to the BookOrder table. Because there
are no copies of that book in stock, you might decide to delete that book from the target table.
To delete a row that does not match a row in the source table, you must use the WHEN NOT
MATCHED BY SOURCE clause.
NOTE: Like the WHEN MATCHED clause, you can include up to two WHEN NOT
MATCHED BY SOURCE clauses in your MERGE statement. If you include two, the first
clause must include the AND keyword followed by a search condition.
The following example includes a WHEN NOT MATCHED BY SOURCE clause that
specifies that any rows with a quantity of 0 that do not match the source should be deleted:
MERGE BookInventory bi
USING BookOrder bo
ON bi.TitleID = bo.TitleID
WHEN MATCHED AND
bi.Quantity + bo.Quantity = 0 THEN
DELETE
WHEN MATCHED THEN
UPDATE
SET bi.Quantity = bi.Quantity + bo.Quantity
WHEN NOT MATCHED BY TARGET THEN
INSERT (TitleID, Title, Quantity)
VALUES (bo.TitleID, bo.Title,bo.Quantity)
WHEN NOT MATCHED BY SOURCE
AND bi.Quantity = 0 THEN
DELETE;

Implementing the OUTPUT Clause
When SQL Server 2005 was released, it included support for the OUTPUT clause in several
data modification language (DML) statements. The OUTPUT clause is also available in the
MERGE statement. The OUTPUT clause returns a copy of the data that you've inserted into
or deleted from your tables. When used with a MERGE statement, the clause provides you
with a powerful tool for capturing the modified data for archiving, messaging, or application
purposes.
NOTE: To learn more about the OUTPUT clause, see the article Implementing the
OUTPUT Clause in SQL Server 2008 (http://www.simple-talk.com/sql/learn-sql-
server/implementing-the-output-clause-in-sql-server-2008/).
In the following example, I use an OUTPUT clause to pass the outputted data to a variable
named @MergeOutput:
DECLARE @MergeOutput TABLE
(
ActionType NVARCHAR(10),
DelTitleID INT,
InsTitleID INT,
DelTitle NVARCHAR(50),
InsTitle NVARCHAR(50),
DelQuantity INT,
InsQuantity INT
);

MERGE BookInventory bi
USING BookOrder bo
ON bi.TitleID = bo.TitleID
WHEN MATCHED AND
bi.Quantity + bo.Quantity = 0 THEN
DELETE
WHEN MATCHED THEN
UPDATE
SET bi.Quantity = bi.Quantity + bo.Quantity
WHEN NOT MATCHED BY TARGET THEN
INSERT (TitleID, Title, Quantity)
VALUES (bo.TitleID, bo.Title,bo.Quantity)
WHEN NOT MATCHED BY SOURCE
AND bi.Quantity = 0 THEN
DELETE
OUTPUT
$action,
DELETED.TitleID,
INSERTED.TitleID,
DELETED.Title,
INSERTED.Title,
DELETED.Quantity,
INSERTED.Quantity
INTO @MergeOutput;

SELECT * FROM BookInventory;

SELECT * FROM @MergeOutput;
Notice that I first declare the @MergeOutput table variable. In the variable, I include a
column for the action type plus three additional sets of columns. Each set corresponds to one
of the columns in the target table and includes a column that shows the deleted data and one
that shows the inserted data. For example, the DelTitleID and InsTitleID columns correspond
to the deleted and inserted values, respectively, of the TitleID column in the target table.
The OUTPUT clause itself first specifies the built-in $action variable, which returns one of
three nvarchar(10) values: INSERT, UPDATE, or DELETE. The variable is available only
to the MERGE statement. I follow the variable with a set of column prefixes (DELETED and
INSERTED) for each column in the target table. The column prefixes are followed by the
name of the column they're related to. For example, I include DELETED.TitleID and
INSERTED.TitleID for the TitleID column in the target table. After I specify the column
prefixes, I then include an INTO subclause, which specifies that the outputted values should
be saved to the @MergeOutput variable.
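The question also asks about SCD2 (a Type 2 slowly changing dimension), which the walkthrough above does not cover. One commonly used pattern, sketched here under stated assumptions, combines the MERGE and OUTPUT clauses just described: the MERGE inserts brand-new members and expires the current row of changed members, and an outer INSERT consumes the OUTPUT rows whose $action is UPDATE to add the new current version. All table and column names below (#DimCustomer, #StgCustomer, City, IsCurrent and so on) are hypothetical, and only one tracked attribute is compared.

```sql
-- Hypothetical dimension and staging tables; names and columns are illustrative only
CREATE TABLE #DimCustomer (
    CustomerKey INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
    CustomerID  INT          NOT NULL,          -- business key
    City        NVARCHAR(50) NOT NULL,          -- tracked (Type 2) attribute
    StartDate   DATE         NOT NULL,
    EndDate     DATE         NULL,              -- NULL = still current
    IsCurrent   BIT          NOT NULL
);
CREATE TABLE #StgCustomer (CustomerID INT PRIMARY KEY, City NVARCHAR(50) NOT NULL);

INSERT INTO #DimCustomer (CustomerID, City, StartDate, EndDate, IsCurrent)
VALUES (1, N'Pune', '2020-01-01', NULL, 1);
INSERT INTO #StgCustomer (CustomerID, City)
VALUES (1, N'Mumbai'),   -- existing customer whose city changed
       (2, N'Delhi');    -- brand-new customer

DECLARE @Today DATE = CAST(GETDATE() AS DATE);

-- Outer INSERT: adds a new current version for every row the MERGE expired
INSERT INTO #DimCustomer (CustomerID, City, StartDate, EndDate, IsCurrent)
SELECT CustomerID, City, @Today, NULL, 1
FROM (
    MERGE #DimCustomer AS tgt
    USING #StgCustomer AS src
        ON tgt.CustomerID = src.CustomerID
       AND tgt.IsCurrent = 1                    -- compare against current rows only
    WHEN NOT MATCHED BY TARGET THEN             -- new member: insert first version
        INSERT (CustomerID, City, StartDate, EndDate, IsCurrent)
        VALUES (src.CustomerID, src.City, @Today, NULL, 1)
    WHEN MATCHED AND tgt.City <> src.City THEN  -- changed member: expire old version
        UPDATE SET tgt.EndDate = @Today, tgt.IsCurrent = 0
    OUTPUT $action AS MergeAction, src.CustomerID, src.City
) AS changes
WHERE changes.MergeAction = 'UPDATE';

SELECT * FROM #DimCustomer ORDER BY CustomerID, StartDate;

DROP TABLE #StgCustomer;
DROP TABLE #DimCustomer;
```

With the sample rows, customer 1 ends up with an expired Pune row and a current Mumbai row, and customer 2 gets a single current row. Nullable tracked columns would need a NULL-safe comparison instead of the plain <> used here.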
What are SQL server statistics, how they are very helpful to improve the performance?
Statistics are critical metadata used by SQL Server's query optimizer, which influence the
selected execution plan for a query. The optimizer obtains its knowledge of the data, its
distribution, and the number of rows a given query is likely to return from the available
statistics. Based on this knowledge, it decides the optimal access path, making choices such
as whether to scan a table or perform an index seek, use a nested loop join or a hash join, and
so on.
If statistics are out of date, or do not exist, the optimizer can make poor choices and
execution plan quality, and consequently query performance, can suffer. SQL Server can
automatically maintain statistics, periodically refreshing them based on its tracking of data
modifications. However, for some tables, such as those subject to significant changes in
distribution, or those with skewed values, it's possible that SQL Server's automatic statistics
update will be inadequate to maintain consistently high levels of query performance.
If the Auto Update Statistics database option is enabled for the database, SQL
Server will automatically update the statistics, but only after a certain "volume threshold" of
changes to the data.
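When the automatic update is not enough, statistics can be refreshed and inspected by hand. The sketch below shows one way to do this; dbo.StatsDemo and its index are a hypothetical table created only for the demonstration, and the commented-out database name is a placeholder.

```sql
-- Hypothetical demo table and index
CREATE TABLE dbo.StatsDemo (
    OrderID    INT IDENTITY(1,1) PRIMARY KEY,
    CustomerID INT NOT NULL
);
CREATE NONCLUSTERED INDEX IX_StatsDemo_CustomerID ON dbo.StatsDemo (CustomerID);

-- Load some rows with a pseudo-random CustomerID
INSERT INTO dbo.StatsDemo (CustomerID)
SELECT TOP (1000) ABS(CHECKSUM(NEWID())) % 50
FROM sys.all_objects;

-- The automatic behaviour is a per-database option (placeholder database name):
-- ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS ON;

-- Refresh all statistics on the table manually, reading every row
UPDATE STATISTICS dbo.StatsDemo WITH FULLSCAN;

-- When was each statistics object on the table last updated?
SELECT s.name AS StatsName,
       STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.StatsDemo');

-- Header, density vector and histogram that the optimizer reads
DBCC SHOW_STATISTICS ('dbo.StatsDemo', IX_StatsDemo_CustomerID);

DROP TABLE dbo.StatsDemo;
```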
what are logical and physical joins?
The logical operators are what you ask for in the context of the query, the physical
operators are what the optimiser picks to do the join.
The six logical operators are:
Inner Join
Outer Join
Cross Join
Cross Apply (new in SQL 2005)
Semi-Join
Anti Semi-Join
Craig Freedman wrote a long article on the logical join operators: Introduction to Joins.
The semi-joins are the exception, in that they cannot be specified in a query.
Nonetheless, they are present in disguise. They're the logical operators for EXISTS, IN,
NOT EXISTS and NOT IN. They're used when matching is required, but not a complete
join.
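To make the "in disguise" point concrete, here is a small sketch with two hypothetical temporary tables. The EXISTS query asks for a semi-join and the NOT EXISTS query for an anti semi-join; in each case a customer is returned at most once, however many orders match.

```sql
-- Hypothetical demo tables
CREATE TABLE #Customers (CustomerID INT PRIMARY KEY, Name NVARCHAR(50) NOT NULL);
CREATE TABLE #Orders (OrderID INT PRIMARY KEY, CustomerID INT NOT NULL);

INSERT INTO #Customers (CustomerID, Name) VALUES (1, N'Ann'), (2, N'Bob'), (3, N'Cy');
INSERT INTO #Orders (OrderID, CustomerID) VALUES (10, 1), (11, 1), (12, 2);

-- Semi-join: customers that have at least one order (Ann and Bob, once each)
SELECT c.CustomerID, c.Name
FROM #Customers AS c
WHERE EXISTS (SELECT 1 FROM #Orders AS o WHERE o.CustomerID = c.CustomerID);

-- Anti semi-join: customers that have no orders (Cy)
SELECT c.CustomerID, c.Name
FROM #Customers AS c
WHERE NOT EXISTS (SELECT 1 FROM #Orders AS o WHERE o.CustomerID = c.CustomerID);

DROP TABLE #Orders;
DROP TABLE #Customers;
```

An inner join between the same tables would return Ann twice, which is the difference between a complete join and a semi-join.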
The three physical operators are what the optimiser uses to evaluate the logical join.
There are various conditions that affect the physical operator that will be used for a
particular join. The three operators are:
Nested Loop Join
Merge Join
Hash join
Nested Loop
The nested loop join works by looping through all the rows of one input and for each row
looping through all the rows of the other input, looking for matches. The nested loop join
works best when one or both of the input row sets is small. Since the input that is
chosen as the second is read as many times as there are rows in the outer, this join can
get very expensive as the size of the inputs increases.
For more detail on the nested loop, see Craig Freedman's post.
Merge Join
The merge join works by running through the two inputs, comparing rows and outputting
matched rows. Both inputs must be sorted on the joining columns for this join to be
possible. Since both inputs are only read once, this is an efficient join for larger row sets.
This efficiency may be offset by the sorted requirement. If the join column is not indexed
so as to retrieve the data already sorted, then an explicit sort is required.
For more detail on the merge join, see Craig Freedman's post.
Hash Join
The hash join is one of the more expensive join operations, as it requires the creation of
a hash table to do the join. That said, it's the join that's best for large, unsorted inputs.
It is the most memory-intensive of any of the joins.
The hash join first reads one of the inputs and hashes the join column and puts the
resulting hash and the column values into a hash table built up in memory. Then it reads
all the rows in the second input, hashes those and checks the rows in the resulting hash
bucket for the joining rows.
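Normally the optimiser picks the physical operator, but for experimenting you can request one with a join hint and compare the resulting execution plans. The sketch below runs the same logical inner join three ways; the temporary tables are hypothetical, and join hints are meant here as a learning aid rather than a tuning recommendation (they also force the join order as written).

```sql
-- Hypothetical demo tables
CREATE TABLE #JoinCustomers (CustomerID INT PRIMARY KEY, Name NVARCHAR(50) NOT NULL);
CREATE TABLE #JoinOrders (OrderID INT PRIMARY KEY, CustomerID INT NOT NULL);

INSERT INTO #JoinCustomers (CustomerID, Name) VALUES (1, N'Ann'), (2, N'Bob'), (3, N'Cy');
INSERT INTO #JoinOrders (OrderID, CustomerID) VALUES (10, 1), (11, 1), (12, 2);

-- Nested loop join: for each customer row, look for matching order rows
SELECT c.Name, o.OrderID
FROM #JoinCustomers AS c
INNER LOOP JOIN #JoinOrders AS o ON o.CustomerID = c.CustomerID;

-- Merge join: both inputs must arrive sorted on CustomerID
-- (the plan may add a Sort for #JoinOrders, which has no index on that column)
SELECT c.Name, o.OrderID
FROM #JoinCustomers AS c
INNER MERGE JOIN #JoinOrders AS o ON o.CustomerID = c.CustomerID;

-- Hash join: build a hash table on one input, probe it with the other
SELECT c.Name, o.OrderID
FROM #JoinCustomers AS c
INNER HASH JOIN #JoinOrders AS o ON o.CustomerID = c.CustomerID;

DROP TABLE #JoinOrders;
DROP TABLE #JoinCustomers;
```

All three queries return the same three rows; only the join operator shown in the execution plan differs.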
What is the difference between index scan and index seek, what are the types of scans and seeks
optimizer will perform?
Execution plan operations scans and seeks
Another post in my ongoing series on reading execution plans. I
know I'm jumping around a bit. I hope it makes some kind of sense.
I thought I'd quickly go over the seek and scan operations that can
be seen in execution plans. There are 6 main ones. There's a fair bit
that I'm glossing over in this. I'll get into some details at a later
date.
Scans
Table scan. This operation only appears for a heap (table
without a clustered index). The pages of a heap are not linked
to one another, so they are located through the allocation
(IAM) pages recorded in the system tables and then read one by
one. This is generally an expensive operation and should be
avoided wherever possible.
Clustered index scan. Essentially the same operation as for
a table scan, just on a table that has a clustered index. This
operation reads the leaf pages of the clustered index, using
the next and previous page pointers. As with the table scan,
this can be an expensive operation and should be avoided
wherever possible.
Index scan. Reading all the leaf pages of a non-clustered
index using the next and previous page pointers to navigate.
Because non-clustered indexes generally have fewer pages in
the leaf than a clustered index, this operation is usually
cheaper than a clustered index scan
Note that none of the scans use the index's b-tree structure to
locate data. It's a straight read of all of the leaf pages.
Seeks
Clustered index seek. This operation uses the clustered
index's b-tree structure. The seek starts at the root of the tree
and navigates down the levels of the index until it reaches the
leaf page(s) with the desired data. This operation also appears
when a partial scan of the table is done, when the index's tree
is used to locate a page, and the index is scanned from that
point until another point in the table (possibly the end).
Non-clustered index seek. Much the same as the clustered
index seek, just using a non-clustered index.
Key lookup. This appeared as a bookmark lookup in SQL
2000, a clustered index seek in SQL 2005 RTM and SP1 and as
a key lookup in SQL 2005 SP2. This operation occurs when a
seek is done on a non-clustered index to locate one or more
rows, but the non-clustered index does not contain all the
columns necessary for the query. The clustered index key
(which is always included in all non-clustered indexes) is then
used to locate the row in the clustered index, to retrieve the
remaining data.
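The operations above can be observed with a small experiment. The sketch below builds a hypothetical #Emp table with a clustered primary key and one non-clustered index, then runs queries that typically produce each operator when the actual execution plan is displayed in Management Studio. The operator named in each comment is what one would usually expect, not a guarantee, because the optimiser chooses based on cost.

```sql
-- Hypothetical demo table: clustered index on EmpID, non-clustered index on LastName
CREATE TABLE #Emp (
    EmpID    INT          NOT NULL PRIMARY KEY CLUSTERED,
    LastName NVARCHAR(30) NOT NULL,
    City     NVARCHAR(30) NOT NULL
);
CREATE NONCLUSTERED INDEX IX_Emp_LastName ON #Emp (LastName);

-- Generate 5000 rows: EmpID = n, LastName = 'Name' + n
INSERT INTO #Emp (EmpID, LastName, City)
SELECT nums.n, N'Name' + CAST(nums.n AS NVARCHAR(10)), N'City'
FROM (
    SELECT TOP (5000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
) AS nums;

-- Clustered index scan: every row and every column is needed
SELECT EmpID, LastName, City FROM #Emp;

-- Index scan: all rows, but only columns held in the non-clustered index
SELECT LastName FROM #Emp;

-- Clustered index seek: one row located through the clustered b-tree
SELECT EmpID, LastName, City FROM #Emp WHERE EmpID = 42;

-- Non-clustered index seek: EmpID is the clustered key, so the index covers the query
SELECT EmpID, LastName FROM #Emp WHERE LastName = N'Name42';

-- Non-clustered index seek plus key lookup: City is not in the index
SELECT EmpID, LastName, City FROM #Emp WHERE LastName = N'Name42';

DROP TABLE #Emp;
```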
performance tuning tips(hash table,sort table etc.,)?
PERFORMANCE TUNING:-
https://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-
server,-part-2/
Find the queries that constitute a typical workload, using SQL Profiler.
Aggregate, at the stored procedure level, the execution characteristics provided by the trace, in
order to find the stored procedures that are having the most impact on the system.
Use query statistics to pinpoint the queries that are causing the biggest problems, and the query
execution plans to find out which operations are the root cause.
Identify the indexes that can optimize the plans and therefore the overall query execution.
In light of this, I've devised the following plan to identify the indexes, and any other changes, that
are necessary to restore forum response times to the required level:
Capture a Profiler trace for a short period while the server is active. This will identify the queries that
constitute a typical workload.
Import the trace results into a database table and analyse them to find the queries or stored
procedures that are having the most impact on the server.
Run those stored procedures in Management Studio against a test database, and examine the
execution plans and query statistics.
Use the execution plan and statistics to identify queries that need tuning and indexes that need
creating.
Implement the changes and review the effects.
A trace definition describes the trace that you would like to capture. In other words, it defines:
The SQL Server events that you would like to capture
The data columns that you would like to capture for each event
Any filters that you would like to create, in order to omit from the trace any occurrences of the
event in which you are uninterested. For example, it is common to capture only those events that
are generated by a particular user, application or database.
In order to generate the trace definition, I must first create a trace within Profiler that defines all of
the events, columns and filters that I want. For identifying poorly performing stored procedures, the
most important events are:
RPC:Completed (in the stored procedures event category). This event is fired whenever a remote
procedure call completes. An example of a Remote Procedure Call would be a stored procedure
executed from a .NET application where the SqlCommand object's command type is
StoredProcedure.
SQL:BatchCompleted (in the T-SQL event category). This event is fired whenever an ad-hoc SQL
batch completes executing. An example of an ad-hoc SQL batch would be a query run from
Management Studio, or a query executed from a .NET application where the SqlCommand object's
command type is Text.
The most important data columns are TextData, CPU, Reads, Writes and Duration. Other columns,
such as the LoginName, ApplicationName and HostName, may be useful for identifying where the
query comes from, but they are not essential for identifying which queries are performing poorly.
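For the step of the plan that imports the trace results into a database table and analyses them, a rough sketch might look like the following. It assumes the trace was saved to a file; the path C:\Traces\Workload.trc and the table name dbo.TraceResults are placeholders, so the script only runs once a real trace file exists. Grouping on the raw TextData treats calls with different parameter values as different queries, which is a simplification.

```sql
-- Load a saved trace file into a table (the file path is a placeholder)
SELECT *
INTO dbo.TraceResults
FROM sys.fn_trace_gettable(N'C:\Traces\Workload.trc', DEFAULT);

-- Aggregate the key columns per statement to find the heaviest queries
SELECT CAST(TextData AS NVARCHAR(MAX)) AS QueryText,
       COUNT(*)                        AS Executions,
       SUM(CPU)                        AS TotalCPU,
       SUM(Reads)                      AS TotalReads,
       SUM(Writes)                     AS TotalWrites,
       SUM(Duration) / 1000            AS TotalDurationMs  -- trace files store microseconds
FROM dbo.TraceResults
WHERE EventClass IN (10, 12)           -- 10 = RPC:Completed, 12 = SQL:BatchCompleted
GROUP BY CAST(TextData AS NVARCHAR(MAX))
ORDER BY TotalReads DESC;
```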
Tips and Guidelines
As a common practice, every table should have a clustered index. Generally, but not always, the
clustered index should be on a column that monotonically increases, such as an identity column
or some other column where the value is unique. In many cases, the primary key is the ideal
column for a clustered index.
Indexes should be considered on all columns that are frequently used in WHERE, ORDER
BY, GROUP BY, TOP and DISTINCT clauses.
Do not automatically add indexes on a table because it seems like the right thing to do. Only
add indexes if you know that they will be used by the queries run against the table.
For historical (static) tables, create the indexes with a FILLFACTOR of 100 (and PAD_INDEX set
to ON, so that the fill factor also applies to the intermediate index pages) to
ensure there is no wasted space. This reduces disk I/O, helping to boost overall performance.
Queries that return a single row are just as fast using a non-clustered index as a clustered index.
Queries that return a range of rows are generally faster using a clustered index than a
non-clustered index.
Do not add more indexes than necessary to your OLTP tables, to minimize the overhead that occurs
with indexes during data modifications.
Do not add the same index more than once on a table with different names.
Drop all those indexes that are not used by the Query Optimizer, generally. You probably won't
want to add an index to a table under the following conditions:
If the index is not used by the query optimizer. Use the Query Analyzer's "Show Execution Plan"
option to see if your queries against a particular table use an index or not.
If the table is small, most likely indexes will not be used.
If the column(s) to be indexed are very wide.
If the column(s) are defined as TEXT, NTEXT or IMAGE data types.
If the table is rarely queried but insertion, updating is frequent.
The query optimizer needs up-to-date statistics to make smart query optimization
decisions. You will generally want to leave the "Auto Update Statistics" database option on. This
helps to ensure that the optimizer statistics are valid, ensuring that queries are properly
optimized when they are run.
Keep the "width" of your indexes as narrow as possible. This reduces the size of the index and
reduces the number of disk I/O reads required to read the index.
If possible, try to create indexes on columns that have integer values instead of characters.
Integer values use less overhead than character values.
If you have two or more tables that are frequently joined together, then the columns used for
the joins should have an appropriate index. If the columns used for the joins are not naturally
compact, then consider adding surrogate keys to the tables that are compact in order to reduce
the size of the keys. This will decrease I/O during the join process, which increases overall
performance.
When creating indexes, try to make them unique indexes if at all possible. SQL Server can often
search through a unique index faster than a non-unique index. This is because, in a unique index,
each row is unique and once the needed record is found, SQL Server doesn't have to look any
further.
If a particular query against a table is run infrequently and the addition of an index greatly
speeds the performance of the query, but the performance
of INSERTS, UPDATES and DELETES is negatively affected by the addition of the index, consider
creating the index for the table for the duration of when the query is run and then dropping the
index. An example of this is when monthly reports are run at the end of the month on an OLTP
application.
Avoid using FLOAT or REAL data types as primary keys, as they add unnecessary overhead that
can hurt performance.
If you want to boost the performance of a query that includes an AND operator in
the WHERE clause, consider the following:
Of the search criteria in the WHERE clause, at least one of them should be based on a highly
selective column that has an index.
If at least one of the search criteria in the WHERE clause is not highly selective, consider adding
indexes to all of the columns referenced in the WHERE clause.
If none of the columns in the WHERE clause are selective enough to use an index on their own,
consider creating a covering index for this query.
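As an illustration of the last point, the sketch below creates a covering index for a query that filters on two low-selectivity columns. The #SalesOrders table, its columns and the index name are all hypothetical. The index keys carry the WHERE columns and the INCLUDE clause carries the remaining selected column, so the query can be answered from the index alone.

```sql
-- Hypothetical demo table
CREATE TABLE #SalesOrders (
    OrderID INT IDENTITY(1,1) PRIMARY KEY,
    Status  TINYINT NOT NULL,
    Region  TINYINT NOT NULL,
    Amount  MONEY   NOT NULL
);

INSERT INTO #SalesOrders (Status, Region, Amount)
VALUES (1, 2, 10.00), (1, 3, 20.00), (2, 2, 30.00), (1, 2, 40.00);

-- Key columns match the WHERE clause; INCLUDE adds the column that is only selected
CREATE NONCLUSTERED INDEX IX_SalesOrders_Status_Region
    ON #SalesOrders (Status, Region)
    INCLUDE (Amount);

-- Every column the query touches is in the index, so no lookup into the table is needed
SELECT Status, Region, Amount
FROM #SalesOrders
WHERE Status = 1 AND Region = 2;

DROP TABLE #SalesOrders;
```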
The Query Optimizer will always perform a table scan or a clustered index scan on a table if
the WHERE clause in the query contains an OR operator and if any of the referenced columns in
the OR clause are not indexed (or do not have a useful index). Because of this, if you use many
queries with OR clauses, you will want to ensure that each referenced column in the WHERE clause
has an index.
If you have a query that uses ORs and it is not making the best use of indexes, consider
rewriting it as a UNION and then testing performance. Only through testing can you be sure that
one version of your query will be faster than another.
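A minimal sketch of such a rewrite is shown below, using a hypothetical #Staff table with an index on each searched column. Because the select list includes the primary key, UNION (which removes duplicates) returns the same rows as the OR version; whether it is actually faster has to be measured on your own data.

```sql
-- Hypothetical demo table with one index per searched column
CREATE TABLE #Staff (
    EmpID    INT          NOT NULL PRIMARY KEY,
    LastName NVARCHAR(30) NOT NULL,
    City     NVARCHAR(30) NOT NULL
);
CREATE NONCLUSTERED INDEX IX_Staff_LastName ON #Staff (LastName);
CREATE NONCLUSTERED INDEX IX_Staff_City ON #Staff (City);

INSERT INTO #Staff (EmpID, LastName, City)
VALUES (1, N'Smith', N'Pune'), (2, N'Adams', N'Pune'),
       (3, N'Smith', N'Delhi'), (4, N'Brown', N'Agra');

-- Original form: one query with OR
SELECT EmpID, LastName, City
FROM #Staff
WHERE LastName = N'Smith' OR City = N'Pune';

-- Rewritten form: each branch can use its own index; UNION removes the overlap (EmpID 1)
SELECT EmpID, LastName, City FROM #Staff WHERE LastName = N'Smith'
UNION
SELECT EmpID, LastName, City FROM #Staff WHERE City = N'Pune';

DROP TABLE #Staff;
```

Both forms return employees 1, 2 and 3 with this sample data.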
If you use the SOUNDEX function against a table column in a WHERE clause, the Query Optimizer
will ignore any available indexes and perform a table scan.
Queries that include either the DISTINCT or the GROUP BY clauses can be optimized by
including appropriate indexes. Any of the following indexing strategies can be used:
Include a covering, non-clustered index (covering the appropriate columns) of the DISTINCT or
the GROUP BY clauses.
Include a clustered index on the columns in the GROUP BY clause.
Include a clustered index on the columns found in the SELECT clause.
Adding appropriate indexes to queries that include DISTINCT or GROUP BY is most important for
those queries that run often.
Avoid clustered indexes on columns that are already "covered" by non-clustered indexes. A
clustered index on a column that is already "covered" is redundant. Use the clustered index for
columns that can better make use of it.
Ideally a clustered index should be based on a single column (not multiple columns) that is as
narrow as possible. This not only reduces the clustered index's physical size, it also reduces the
physical size of non-clustered indexes and boosts SQL Server's overall performance.
When you create a clustered index, try to create it as a unique clustered index, not a non-
unique clustered index.
Use SET NOCOUNT ON at the beginning of each stored procedure you write. This statement should
be included in every stored procedure, trigger, etc. that you write.
Keep Transact-SQL transactions as short as possible within a stored procedure. This helps to
reduce the number of locks, helping to speed up the overall performance of your SQL Server
application.
If you are creating a stored procedure to run in a database other than the Master database,
don't use the prefix sp_ in its name. This special prefix is reserved for system stored procedures.
Although using this prefix will not prevent a user defined stored procedure from working, what it
can do is to slow down its execution ever so slightly.
Before you are done with your stored procedure code, review it for any unused code,
parameters or variables that you may have forgotten to remove while you were making changes
and remove them. Unused code just adds unnecessary bloat to your stored procedures, although
it will not necessarily negatively affect performance of the stored procedure.
For best performance, all objects that are called within the same stored procedure should be
owned by the same object owner or schema, preferably dbo, and should also be referred to in
the format of object_owner.object_name or schema_owner.object_name.
When you need to execute a string of Transact-SQL, you should use
the sp_executesql stored procedure instead of the EXECUTE statement.
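A short sketch of the difference is shown below. With sp_executesql the value is passed as a typed parameter rather than concatenated into the string, which allows the plan to be reused for other values and avoids injecting user input into the SQL text. The #Books table and the @MinQty parameter are hypothetical.

```sql
-- Hypothetical demo table
CREATE TABLE #Books (TitleID INT PRIMARY KEY, Title NVARCHAR(50) NOT NULL, Quantity INT NOT NULL);
INSERT INTO #Books (TitleID, Title, Quantity)
VALUES (1, N'Catch 22', 0), (2, N'Gone with the Wind', 8);

-- The statement text contains a parameter placeholder, not the value itself
DECLARE @sql NVARCHAR(MAX) =
    N'SELECT TitleID, Title, Quantity FROM #Books WHERE Quantity >= @MinQty;';

-- Second argument declares the parameters; the remaining arguments supply their values
EXEC sys.sp_executesql @sql, N'@MinQty INT', @MinQty = 5;

DROP TABLE #Books;
```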
If you use input parameters in your stored procedures, you should validate all of them at the
beginning of your stored procedure. This way, if there is a validation problem and the client
application needs to be notified of the problem, it happens before any stored procedure
processing takes place, preventing wasted effort and boosting performance.
When calling a stored procedure from your application, it is important that you call it using its
qualified name, for example:
exec dbo.myProc
...instead of:
exec myProc
If you think a stored procedure will return only a single value and not a record set, consider
returning the single value as an output parameter.
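A minimal sketch of the output-parameter approach, assuming a hypothetical dbo.tasks table and an invented procedure name (GO is the batch separator used by tools such as Management Studio, not a Transact-SQL statement):

```sql
-- Hypothetical procedure that hands back one value through an OUTPUT parameter.
CREATE PROCEDURE dbo.usp_GetTaskCount
    @task_count INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    -- Assigning to the variable returns no record set to the caller.
    SELECT @task_count = COUNT(*) FROM dbo.tasks;
END;
GO

-- Caller side: the OUTPUT keyword is required here as well.
DECLARE @n INT;
EXEC dbo.usp_GetTaskCount @task_count = @n OUTPUT;
PRINT @n;
```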
Use stored procedures instead of views. They offer better performance.
Don't include code, variable or parameters that don't do anything.
Don't be afraid to make liberal use of in-line and block comments in your Transact-SQL
code. They will not affect the performance of your application and they will enhance your
productivity when you have to come back to the code and try to modify it.
If possible, avoid using SQL Server cursors. They generally use a lot of SQL Server resources and
reduce the performance and scalability of your applications.
If you have the choice of using a join or a sub-query to perform the same task within a query,
generally the join is faster. This is not always the case, however, and you may want to test the
query using both methods to determine which is faster for your particular application.
If your application requires you to create temporary tables for global or per-connection
use, consider the possibility of creating indexes for these temporary tables. While
most temporary tables probably won't need -- or even use -- an index, some larger temporary
tables can benefit from them. A properly designed index on a temporary table can be as great a
benefit as a properly designed index on a standard database table.
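As an illustration, the sketch below indexes a per-connection temporary table. The table, column and index names are hypothetical, and whether the index pays off depends on the table's size and how it is queried:

```sql
-- Hypothetical per-connection temporary table.
CREATE TABLE #recent_tasks (
    task_id   INT          NOT NULL,
    task_name VARCHAR(100) NOT NULL
);

INSERT INTO #recent_tasks (task_id, task_name)
SELECT task_id, task_name
FROM dbo.tasks                      -- assumed source table
WHERE task_id BETWEEN 1000 AND 5000;

-- Create the index after the load so the rows are sorted once.
CREATE CLUSTERED INDEX IX_recent_tasks_task_id
    ON #recent_tasks (task_id);

-- ... queries that join or filter on task_id go here ...

DROP TABLE #recent_tasks;
```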
Consider using a derived table instead of a temporary table. A derived table is the
result of using a SELECT statement in the FROM clause of an existing SELECT statement. By using
derived tables instead of temporary tables, you can reduce I/O and often boost your application's
performance.
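A derived table might look like the following sketch, which assumes two hypothetical tables, dbo.tasks and dbo.task_comments:

```sql
-- The inner SELECT in the FROM clause is the derived table (aliased as c).
-- No temporary table is created or written for the intermediate result.
SELECT t.task_id, t.task_name, c.comment_count
FROM dbo.tasks AS t
INNER JOIN (
    SELECT task_id, COUNT(*) AS comment_count
    FROM dbo.task_comments
    GROUP BY task_id
) AS c
    ON c.task_id = t.task_id;
```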
For better performance, if you need a temporary table in your Transact-SQL code, consider
using a table variable instead of creating a conventional temporary table.
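A table variable is declared like any other variable. The sketch below assumes a hypothetical dbo.tasks source table; table variables tend to suit small row counts, so it is worth testing both forms on realistic data:

```sql
-- Hypothetical table variable; it is scoped to the current batch or procedure.
DECLARE @recent_tasks TABLE (
    task_id   INT          NOT NULL PRIMARY KEY,
    task_name VARCHAR(100) NOT NULL
);

INSERT INTO @recent_tasks (task_id, task_name)
SELECT task_id, task_name
FROM dbo.tasks
WHERE task_id BETWEEN 1000 AND 1100;

SELECT task_id, task_name
FROM @recent_tasks
ORDER BY task_id;
```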
Don't repeatedly reuse the same function to calculate the same result over and over within your
Transact-SQL code.
If you use BULK INSERT to import data into SQL Server, then use the TABLOCK hint along with
it. This will prevent SQL Server from running out of locks during very large imports and will also
boost performance due to the reduction of lock contention.
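A sketch of a BULK INSERT with the TABLOCK hint follows. The target table, file path and terminators are placeholders that would need to match the actual import file:

```sql
-- Hypothetical target table and file path.
BULK INSERT dbo.tasks
FROM 'C:\import\tasks.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    TABLOCK          -- take one table-level lock instead of many row or page locks
);
```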
Always specify the narrowest columns you can. The narrower the column, the less
data SQL Server has to store and the faster SQL Server is able to read and write data. In addition,
if any sorts need to be performed on the column, the narrower the column, the faster the sort will
be.
If you need to store large strings of data and they are less than 8000 characters, use
a VARCHAR data type instead of a TEXT data type. TEXT data types have extra overhead that drags
down performance.
Don't use the NVARCHAR or NCHAR data types unless you need to store 16-bit character
(Unicode) data. They take up twice as much space as VARCHAR or CHAR data types, increasing
server I/O and wasting unnecessary space in your buffer cache.
If the text data in a column varies greatly in length, use a VARCHAR data type instead of
a CHAR data type. The space saved by using VARCHAR over CHAR on variable-length
columns can greatly reduce the I/O reads and the cache memory needed to hold the data, improving
overall SQL Server performance.
If a column's data does not vary widely in length, consider using a fixed-length CHAR field
instead of a VARCHAR. While it may take up a little more space to store the data, processing
fixed-length columns is faster in SQL Server than processing variable-length columns.
If you have a column that is designed to hold only numbers, use a numeric data type such
as INTEGER instead of a VARCHAR or CHAR data type. Numeric data types generally require less
space to hold the same numeric value than does a character data type. This helps to reduce the
size of the columns and can boost performance when the columns are searched (WHERE clause),
joined to another column or sorted.
If you use the CONVERT function to convert a value to a variable length data type such
as VARCHAR, always specify the length of the variable data type. If you do not, SQL Server
assumes a default length of 30. Ideally, you should specify the shortest length to accomplish the
required task. This helps to reduce memory use and SQL Server resources.
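The effect of the default length can be seen in a small sketch (the variable and sample text are invented for illustration):

```sql
-- A sample string that is longer than 30 characters.
DECLARE @s VARCHAR(50) = 'This string is longer than thirty characters.';

SELECT
    CONVERT(VARCHAR, @s)     AS default_length,   -- silently cut to 30 characters
    CONVERT(VARCHAR(50), @s) AS explicit_length;  -- full value preserved
```

Stating the length makes the intent visible and avoids this silent truncation.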
Avoid using the new BIGINT data type unless you really need its additional storage capacity.
The BIGINT data type uses 8 bytes of memory, versus 4 bytes for the INT data type.
Don't use the DATETIME data type as a primary key. From a performance perspective, it is more
efficient to use a data type that uses less space. For example, the DATETIME data type uses 8
bytes of space, while the INT data type only takes up 4 bytes. The less space used, the smaller
the table and index, and the less I/O overhead that is required to access the primary key.
If you are creating a column that you know will be subject to many sorts, consider making the
column integer-based and not character-based. This is because SQL Server can sort integer data
much faster than character data.
Carefully evaluate whether your SELECT query needs the DISTINCT clause or not. Some
developers automatically add this clause to every one of their SELECT statements, even when it is
not necessary. This is a bad habit that should be stopped.
When you need to use the SELECT INTO option, keep in mind that it can lock system tables,
preventing other users from accessing the data they need while the data is being inserted. In
order to prevent or minimize the problems caused by locked tables, try to schedule the use
of SELECT INTO when your SQL Server is less busy. In addition, try to keep the amount of data
inserted to a minimum. In some cases, it may be better to perform several smaller SELECT INTOs
instead of performing one large SELECT INTO.
If you need to verify the existence of a record in a table, don't use SELECT COUNT(*) in your
Transact-SQL code to identify it. This is very inefficient and wastes server resources. Instead, use
the Transact-SQL IF EXISTS to determine if the record in question exists, which is much more
efficient.
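The two approaches can be compared in a short sketch, assuming a hypothetical dbo.tasks table:

```sql
-- Less efficient: counts every matching row just to compare the total with zero.
IF (SELECT COUNT(*) FROM dbo.tasks WHERE task_name = 'backup') > 0
    PRINT 'Found (COUNT version).';

-- Preferred: EXISTS can stop as soon as the first matching row is found.
IF EXISTS (SELECT 1 FROM dbo.tasks WHERE task_name = 'backup')
    PRINT 'Found (EXISTS version).';
```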
By default, some developers -- especially those who have not worked with SQL Server before --
routinely include code similar to this in their WHERE clauses when they make string comparisons:
SELECT column_name FROM table_name
WHERE LOWER (column_name) = 'name'
In other words, these developers are making the assumption that the data in SQL Server is case-
sensitive, which it generally is not. If your SQL Server database is not configured to be case-
sensitive, you don't need to use LOWER or UPPER to force the case of text to be equal for a
comparison to be performed. Just leave these functions out of your code. This will speed up the
performance of your query, as any use of text functions in a WHERE clause hurts performance.
However, what if your database has been configured to be case-sensitive? Should you then use
the LOWER and UPPER functions to ensure that comparisons work properly? No. The
above example is still poor coding. If you have to deal with ensuring case is consistent for proper
comparisons, use the technique described below, along with appropriate indexes on the column
in question:
SELECT column_name FROM table_name
WHERE column_name = 'NAME' or column_name = 'name'
This code will run much faster than the first example.
If you currently have a query that uses NOT IN, which offers poor performance because the SQL
Server optimizer has to use a nested table scan to perform this activity, instead try to use one of
the following options, all of which offer better performance:
Use EXISTS or NOT EXISTS
Use IN
Perform a LEFT OUTER JOIN and check for a NULL condition
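Two of these alternatives are sketched below for a query that finds tasks with no comments. Both tables (dbo.tasks and dbo.task_comments) are hypothetical:

```sql
-- Option 1: NOT EXISTS with a correlated subquery.
SELECT t.task_id, t.task_name
FROM dbo.tasks AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.task_comments AS c
    WHERE c.task_id = t.task_id
);

-- Option 2: LEFT OUTER JOIN, keeping only rows where no match was found.
SELECT t.task_id, t.task_name
FROM dbo.tasks AS t
LEFT OUTER JOIN dbo.task_comments AS c
    ON c.task_id = t.task_id
WHERE c.task_id IS NULL;
```

Comparing the execution plans of both forms against the original NOT IN query is the safest way to see which one wins on a given schema.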
When you have a choice of using the IN or the EXISTS clause in your Transact-SQL, you will
generally want to use the EXISTS clause, as it is usually more efficient and performs faster.
If you find that SQL Server uses a TABLE SCAN instead of an INDEX SEEK when you use
an IN/OR clause as part of your WHERE clause, even when those columns are covered by an index,
consider using an index hint to force the Query Optimizer to use the index.
If you use LIKE in your WHERE clause, try to use one or more leading characters in the clause, if
possible. For example, use:
LIKE 'm%' instead of LIKE '%m'
If your application needs to retrieve summary data often, but you don't want to have the
overhead of calculating it on the fly every time it is needed, consider using a trigger that updates
summary values after each transaction into a summary table.
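One way such a trigger could look is sketched below. The dbo.orders and dbo.customer_order_summary tables and their columns are hypothetical, and the sketch assumes a summary row already exists for each customer and handles inserts only:

```sql
-- Hypothetical trigger that keeps a summary table current on every insert.
CREATE TRIGGER dbo.trg_orders_summary
ON dbo.orders
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Aggregate the inserted rows first so multi-row inserts are handled
    -- correctly, then add the totals to the matching summary rows.
    UPDATE s
    SET order_count  = s.order_count + i.cnt,
        total_amount = s.total_amount + i.amt
    FROM dbo.customer_order_summary AS s
    INNER JOIN (
        SELECT customer_id, COUNT(*) AS cnt, SUM(amount) AS amt
        FROM inserted
        GROUP BY customer_id
    ) AS i
        ON i.customer_id = s.customer_id;
END;
```

A complete version would also need to cover updates and deletes, and the trigger adds work to every insert, so this trades write cost for cheaper reads.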
When you have a choice of using the IN or the BETWEEN clauses in your Transact-SQL, you will
generally want to use the BETWEEN clause, as it is much more efficient. For example...
SELECT task_id, task_name
FROM tasks
WHERE task_id in (1000, 1001, 1002, 1003, 1004)
...is much less efficient than this:
SELECT task_id, task_name
FROM tasks
WHERE task_id BETWEEN 1000 and 1004
If possible, try to avoid using the SUBSTRING function in your WHERE clauses. Depending on
how it is constructed, using the SUBSTRING function can force a table scan instead of allowing
the optimizer to use an index (assuming there is one). If the substring you are searching for does
not include the first character of the column you are searching for, then a table scan is
performed.
If possible, you should avoid using the SUBSTRING function and use the LIKE condition instead
for better performance. Instead of doing this:
WHERE SUBSTRING(task_name,1,1) = 'b'
Try using this instead:
WHERE task_name LIKE 'b%'
Avoid using optimizer hints in your queries. This is because it is generally very hard to
out-guess the Query Optimizer. Optimizer hints are special keywords that you include with your
query to force how the Query Optimizer runs. If you decide to include a hint in a query, this
forces the Query Optimizer to become static, preventing the Query Optimizer from dynamically
adapting to the current environment for the given query. More often than not, this hurts -- not
helps -- performance.
If you have a WHERE clause that includes expressions connected by two or more AND operators,
SQL Server may evaluate them from left to right in the order they are written, although the
Query Optimizer is free to reorder them. This assumes that
no parentheses have been used to change the order of execution. Because of this, you may want
to consider one of the following when using AND:
Locate the least likely true AND expression first.
If both parts of an AND expression are equally likely of being false, put the least
complex AND expression first.
You may want to consider using Query Analyzer or Management Studio to look at the execution
plans of your queries to see which is best for your situation.
Don't use ORDER BY in your SELECT statements unless you really need to, as it adds a lot of
extra overhead. For example, perhaps it may be more efficient to sort the data at the client than
at the server.
Whenever SQL Server has to perform a sorting operation, additional resources have to be used
to perform this task. Sorting often occurs when any of the following Transact-SQL statements are
executed:
ORDER BY
GROUP BY
SELECT DISTINCT
UNION
If you have to sort by a particular column often, consider creating a clustered index on that column.
This is because the data is already presorted for you and SQL Server is smart enough not to
re-sort the data.
If your WHERE clause includes an IN operator along with a list of values to be tested in the
query, order the list of values so that the most frequently found ones are placed at the start of
the list and the less frequently found ones are placed at the end of the list. This can speed up
performance because the IN operator returns true as soon as any of the values in the list produces a
match. The sooner the match is made, the faster the query completes.
If your application performs many wildcard (LIKE %) text searches
on CHAR or VARCHAR columns, consider using SQL Server's full-text search option. The Search
Service can significantly speed up wildcard searches of text stored in a database.
The GROUP BY clause can be used with or without an aggregate function. However, if you want
optimum performance, don't use the GROUP BY clause without an aggregate function. This is
because you can accomplish the same end result by using the DISTINCT option instead, and it is
faster. For example, you could write your query two different ways:
SELECT task_id
FROM tasks
WHERE task_id BETWEEN 10 AND 20
GROUP BY task_id
...or:
SELECT DISTINCT task_id
FROM tasks
WHERE task_id BETWEEN 10 AND 20
It is important to design applications that keep transactions as short as possible. This reduces
locking and increases application concurrency, which helps to boost performance.
In order to reduce network traffic between the client or middle-tier and SQL Server -- and also
to boost your SQL Server-based application's performance -- only the data needed by the client
or middle-tier should be returned by SQL Server. In other words, don't return more data (both
rows and columns) from SQL Server than you need to the client or middle-tier and then further
reduce the data to the data you really need at the client or middle-tier. This wastes SQL Server
resources and network bandwidth.
