SQL Server Clustered Index Design For Performance


SQL Server clustered index design for performance

Clustered indexes in SQL Server are a critical consideration in the overall architecture of the database. They are often overlooked, misunderstood or, if the database is small, considered unimportant. This article points out the importance of clustered indexes for overall system performance and maintenance as your database grows. I will briefly cover how SQL Server clustered indexes are stored on disk, why they should always increase over time and why it is best that clustered indexes be static. I'll also touch on many-to-many tables, why they are used and how clustered indexes make these tables more efficient. Finally, we will look at the partitioned table concept introduced in SQL Server 2005 and examine how partitioned tables affect clustered indexes. This will help you make the right decisions from the very start.

By default, SQL Server creates the clustered index to match the primary key defined on a table. However, you can create a clustered index on any column and then define a primary key on a separate column or columns. In that case, the primary key is created as a unique non-clustered index. Typically a clustered index matches the primary key, but not necessarily, so be careful. Given the variety of situations that can arise, I'll discuss the clustered indexes themselves and, for now, ignore whether you choose to make them primary keys.

Clustered indexes actually hold the row data for SQL Server, so wherever your clustered indexes are stored is also where your data is stored. A clustered index is organized into ranges of data. For example, values 1 to 10 may be stored in one range and 90 to 110 in another. Since clustered indexes are stored as ranges, if you need to search an audit log by date range, it is more efficient to base the clustered index on the date column used to return those ranges. Non-clustered indexes work better for specific value searches, e.g. "date = DateValue," rather than range searches, e.g. "date between date1 and date2."

Ever-increasing values for clustered indexes

Clustered indexes should be based on columns whose values constantly increase over time. In my earlier audit log example, the date values would be constantly increasing, and older dates would not be inserted into the table. This would be an "ever-increasing" column. Another good example of an ever-increasing value is an identity column, since, by design, it constantly increases.

Why am I spending so much time discussing ever-increasing values for clustered indexes? The most important attributes of clustered indexes are that they are ever-increasing and static in nature. The reason ever-increasing is so important has to do with the range architecture I outlined earlier. If the values are not ever-increasing, SQL Server has to allocate space within existing ranges for those records rather than placing them in new ranges at the end of the index. Once a range fills up and a value comes in that fits within it, SQL Server makes room in the index by doing a page split. Internally, SQL Server takes the full page and splits it into two separate pages. Each of the two pages then has substantially more free space, but the split itself takes significantly more resources to process than a simple insert.

You can prepare for this eventuality by setting a fill factor of 70% or so, which gives you 30% free space for incoming values. The problem with this approach is that you continually have to "reindex" the clustered index so it maintains a free space percentage of 30%. Reindexing the clustered index also causes heavy I/O load, since it has to move the actual data itself, and any non-clustered indexes that have to be rebuilt along with it add greatly to maintenance time.

If the clustered index is ever-increasing, you will not have to rebuild the clustered index; you can set a 100% fill factor on it, and at that point you will only need to reindex the less-intensive non-clustered indexes as time progresses, resulting in more uptime. Ever-increasing values only add entries to the end of the index and build new ranges when necessary. Logical fragmentation will not exist, since new values are continually added to the end of the index and the fill factor will be 100%. The higher the fill factor, the more rows are stored on each page, so queries need less I/O, RAM and CPU. The smaller the data types you pick for the clustered index, the faster the joins/queries will be.
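As a rough illustration of the point above, here is a minimal T-SQL sketch of an audit table clustered on an identity column with a 100% fill factor. The table, column and index names are hypothetical and not taken from the article; the commented-out statement shows what a periodic rebuild at a 70% fill factor could look like for a key that is not ever-increasing.

```sql
-- Hypothetical audit table; all object names are illustrative assumptions.
CREATE TABLE dbo.AuditLog
(
    AuditLogID int IDENTITY(1,1) NOT NULL,  -- ever-increasing and never updated
    EventDate  datetime          NOT NULL,
    Detail     varchar(400)      NULL
);

-- New rows always land at the end of the index, so pages can be packed full.
CREATE UNIQUE CLUSTERED INDEX CIX_AuditLog_AuditLogID
    ON dbo.AuditLog (AuditLogID)
    WITH (FILLFACTOR = 100);

-- For contrast: a clustering key that receives inserts in the middle of its
-- range needs free space on each page, re-established by periodic rebuilds.
-- ALTER INDEX CIX_SomeIndex ON dbo.SomeTable REBUILD WITH (FILLFACTOR = 70);
```

Declaring the index as UNIQUE is an assumption that fits an identity column; it keeps the key as narrow as possible.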
Also, since every non-clustered index contains the clustered index key, the smaller the clustered index key is, the smaller the non-clustered indexes will be.

The best data types for clustered indexes are generally narrow: typically a smallint, int, bigint or datetime. When a datetime value is used as the clustering key, it is normally the only column in the index and holds ever-increasing date values that are often queried as ranges.

Generally, you should avoid compound (multiple-column) clustered indexes except in two situations: many-to-many tables, and SQL Server 2005 partitioned tables that have the partitioning column included as part of the clustered index to allow for index alignment.

Many-to-many tables and clustered indexes

Many-to-many tables are used for their extremely fast join capabilities and their ability to allow quick re-association of records from one owning record to another. Consider the following structure:

Customer
    CustomerID (bigint identity)
    Name
    Fieldn+

CustomerOrder
    CustomerID
    OrderID

Orders
    OrderID (bigint identity)
    Date
    Fieldn+

The clustered indexes in this structure would be CustomerID on Customer and OrderID on Orders. The compound clustered index on CustomerOrder would be CustomerID/OrderID. Here are the benefits of this structure:

The joins are all based on clustered indexes (much faster than joins to non-clustered indexes).

Moving an order to another customer only involves an update to the CustomerOrder table, which is very narrow and has only one clustered index. Therefore, it reduces the blocking that would occur if you had to update a wider table such as Orders.

Use of a many-to-many table eliminates the need for some non-clustered indexes on the wider tables such as Customer/Orders. Hence, it reduces the maintenance time on the large tables.
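The three-table structure described above could be declared roughly as follows. This is only a sketch: the constraint names, the data type of Name and the column name OrderDate (standing in for the Date column) are assumptions, and the Fieldn+ columns are omitted.

```sql
CREATE TABLE dbo.Customer
(
    CustomerID bigint IDENTITY(1,1) NOT NULL,
    Name       nvarchar(100)        NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerID)
);

CREATE TABLE dbo.Orders
(
    OrderID   bigint IDENTITY(1,1) NOT NULL,
    OrderDate datetime             NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
);

-- Narrow many-to-many table with a single compound clustered index.
CREATE TABLE dbo.CustomerOrder
(
    CustomerID bigint NOT NULL REFERENCES dbo.Customer (CustomerID),
    OrderID    bigint NOT NULL REFERENCES dbo.Orders (OrderID),
    CONSTRAINT PK_CustomerOrder PRIMARY KEY CLUSTERED (CustomerID, OrderID)
);

-- Moving order 42 from customer 1 to customer 2 touches only the narrow table.
UPDATE dbo.CustomerOrder
SET    CustomerID = 2
WHERE  CustomerID = 1 AND OrderID = 42;
```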

One negative result of this approach is the fragmentation that occurs on the CustomerOrder table. However, that should not be a big issue, since the table is relatively narrow, has only two columns with narrow data types and has only one clustered index. The elimination of the non-clustered indexes that would be needed on the Orders table if it contained CustomerID more than makes up for this cost.

Clustered indexes and partitioned tables in SQL Server 2005

Partitioned tables in SQL Server 2005 are tables that appear to be a single table on the surface, but behind the scenes -- at the storage subsystem level -- they are actually multiple partitions that can be spread across many filegroups. The table partitions are spread across the filegroups based on the values in a single column. Partitioning tables in this manner causes several side effects. I will just cover the basics here, to give you some understanding of the factors involved. I recommend that you study partitioned tables before attempting to implement them.

You can create a clustered index in this environment based on only one column. But if that one column is not the column the table is partitioned on, then the clustered index is said to be non-aligned. If a clustered index is non-aligned, then any snapping in/out (or merging) of partitions will require you to drop the clustered index along with the non-clustered indexes and rebuild them from scratch. This is necessary because SQL Server cannot tell which portions of the clustered/non-clustered indexes belong to which table partitions. Needless to say, this will certainly cause system downtime.

The clustered index on a partitioned table should always contain the regular clustering column, which is ever-increasing and static, as well as the column that is used for partitioning the table. If the clustered index includes the partitioning column, then SQL Server knows which portion of the clustered/non-clustered indexes belongs to which partition, and the clustered index is "aligned." Partitions can then be snapped in/out (and merged) without rebuilding the clustered/non-clustered indexes, causing no downtime for the system. Inserts, updates and deletes will also work faster, because those operations only have to consider the indexes that reside on their particular partition.

Summary

SQL Server clustered indexes are an important part of database architecture, and I hope you've learned enough from this article to know why you need to plan for them carefully from the very start. It is vital for the future health of your database that clustered indexes be narrow, static and ever-increasing. Clustered indexes can help you achieve faster join times and faster insert, update and delete (IUD) operations, and minimize blocking as the system becomes busy.

Finally, we covered how partitioned tables in SQL Server 2005 affect your choices for the clustered index, what it means to "align" the clustered index with the partitions, and why clustered indexes have to be aligned in order for the partitioned table concept to work as intended. Keep watching for tips on non-clustered indexes (part two) coming in February and optimal index maintenance (part three) in March.
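To make the idea of an aligned clustered index from the partitioned tables section more concrete, here is a minimal sketch in SQL Server 2005 syntax. The partition function, scheme, table and boundary dates are all hypothetical, and every partition is mapped to the PRIMARY filegroup purely to keep the example short.

```sql
-- Yearly boundaries on a datetime column (illustrative values).
CREATE PARTITION FUNCTION pfOrderYear (datetime)
AS RANGE RIGHT FOR VALUES ('20070101', '20080101');

-- Map every partition to one filegroup for simplicity.
CREATE PARTITION SCHEME psOrderYear
AS PARTITION pfOrderYear ALL TO ([PRIMARY]);

CREATE TABLE dbo.OrderHistory
(
    OrderID   bigint IDENTITY(1,1) NOT NULL,
    OrderDate datetime             NOT NULL,
    Amount    money                NOT NULL
) ON psOrderYear (OrderDate);

-- Aligned clustered index: the ever-increasing key plus the partitioning
-- column, created on the same partition scheme as the table.
CREATE UNIQUE CLUSTERED INDEX CIX_OrderHistory
    ON dbo.OrderHistory (OrderID, OrderDate)
    ON psOrderYear (OrderDate);
```

Because the index is declared unique, SQL Server requires the partitioning column to be part of its key, which is the same compound-key arrangement the article recommends.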

Designing SQL Server non-clustered indexes for query optimization


Non-clustered indexes are bookmarks that allow SQL Server to find shortcuts to the data you're searching for. Non-clustered indexes are important because they allow you to focus queries on a specific subset of the data instead of scanning the entire table. We'll address this critical topic by first hitting the basics, such as how clustered indexes interact with non-clustered indexes, how to pick fields, when to use compound indexes and how statistics influence non-clustered indexes.

The basics of non-clustered indexes in SQL Server

A non-clustered index consists of the chosen fields and the clustered index value. If the clustered index is not defined as unique, then SQL Server will use the clustered index value plus a uniqueness value. Always define your clustered indexes as unique -- if they are in fact unique -- because it will result in a smaller clustered/non-clustered index size. If your unique clustered index consists of an int and you create a non-clustered index on a year column (defined as smallint), then your non-clustered index will contain an int and a smallint for every row in the table. The size increases according to the data types chosen. So the smaller the clustered/non-clustered index data types are, the smaller the resulting index will be and the less time its maintenance will take.

Choosing fields for non-clustered indexes

The first rule is to never include the clustered index key fields in the non-clustered index. The clustered key is already carried in every non-clustered index, so adding it again gains nothing. The only time it makes sense to include any clustered index key in a non-clustered index is when the clustered index is a compound index and the query is referencing the second, third or higher field in the compound index. Assume you have the following table:

    ID (identity, clustered unique)
    DateFrom
    DateTo
    Amt
    DateInserted
    Description

Now assume you always run queries such as:

Example 1:
    Select * From tbl [t] where t.datefrom = '12/12/2006' and t.DateTo = '12/31/2006' and t.DateInserted = '12/01/2006'

At this point it makes sense to have a non-clustered index defined on DateFrom, DateTo and DateInserted, since that will always give the most unique results. Now assume you run multiple queries such as:

Example 2:
    Select * From tbl [t] where t.datefrom = '12/12/2006' and t.DateInserted = '12/01/2006'
    Select * From tbl [t] where t.datefrom = '12/12/2006'
    Select * From tbl [t] where t.DateTo = '12/31/2006'
    Select * From tbl [t] where t.DateInserted = '12/01/2006'
    Select * From tbl [t] where t.DateTo = '12/31/2006' and t.DateInserted = '12/01/2006'
    Select * From tbl [t] where t.id = 5 and t.DateTo = '12/31/2006' and t.DateInserted = '12/01/2006'

Many people, at this point, would be tempted to create the following non-clustered indexes:

1. DateFrom
2. DateTo
3. DateInserted
4. DateTo and DateInserted
5. DateFrom and DateInserted
6. ID, DateTo and DateInserted

You probably expect the index size to increase dramatically at this point, since you are storing DateFrom in two separate locations, DateTo in three locations and DateInserted in four locations. On top of this, you've stored the clustered index key in seven locations. This approach increases I/O for insert, update and delete operations (also known as IUD operations). Updates to the records must be written first to the clustered index data row. Then every affected non-clustered index has to be updated as well.

You should routinely ask yourself these questions: Is the cost of additional I/O for IUD operations and maintenance worth the improved query time? Will the additional I/O and increased maintenance time outweigh any performance boost I get on the queries? What will give me the most unique results with the least overhead possible? In this case, the best solution would be three non-clustered indexes as follows:

1. DateFrom
2. DateTo
3. DateInserted
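A minimal sketch of that three-index layout is shown below. The data types and index names are assumptions, since the article gives only the column names of tbl.

```sql
-- Hypothetical definition of the example table; types are assumed.
CREATE TABLE dbo.tbl
(
    ID           int IDENTITY(1,1) NOT NULL,
    DateFrom     datetime          NOT NULL,
    DateTo       datetime          NOT NULL,
    Amt          money             NOT NULL,
    DateInserted datetime          NOT NULL,
    Description  varchar(200)      NULL
);

CREATE UNIQUE CLUSTERED INDEX CIX_tbl_ID ON dbo.tbl (ID);

-- One narrow non-clustered index per search column. Each one implicitly
-- carries the clustered key (ID), so ID is not listed in any of them.
CREATE NONCLUSTERED INDEX IX_tbl_DateFrom     ON dbo.tbl (DateFrom);
CREATE NONCLUSTERED INDEX IX_tbl_DateTo       ON dbo.tbl (DateTo);
CREATE NONCLUSTERED INDEX IX_tbl_DateInserted ON dbo.tbl (DateInserted);
```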

Each field in this scenario is stored only once, except for the clustered index key, which is stored in all three non-clustered indexes. As a result, the index size is much smaller and will require less I/O and less maintenance. SQL Server will query each of the non-clustered indexes, depending on the criteria chosen, and then hash the results together. While this is not as efficient as Example 1, it is much more efficient than defining the six separate non-clustered indexes. Real-world queries will more often match Example 2 than Example 1.

SQL Server statistics

Statistics tell SQL Server how many rows most likely match a given value. They give SQL Server an idea of how "unique" a value is, information it then uses to determine whether to use an index. By default, SQL Server automatically updates statistics whenever it thinks approximately 20% of the records have changed. In SQL Server 2000, this is done synchronously with the IUD operation, delaying the completion of the IUD operation while the rows are sampled. In SQL Server 2005, you can have it sample either synchronously with the IUD operation or asynchronously after the IUD operation is done. The latter approach is better and will cause less blocking because locks will be released sooner.

I recommend turning off the database setting "Auto Update Statistics," because it will increase your server load at the worst times. Instead of letting SQL Server automatically keep statistics up to date, create a job that calls the command "update statistics" and runs during your slowest time. You can pick your own sampling ratio depending on how accurate you want the statistics to be.

The distribution histogram in the statistics is only kept on the first column in any non-clustered index. What does this mean for compound non-clustered indexes? It means SQL Server will use the first field to determine whether an index should be used. Even if the second field in the compound index will match 50% of the rows, the field still needs to be used to return the results (see Example 3). Now, if the non-clustered index were split into two non-clustered indexes, SQL Server might choose to use index 1, but not index 2. This is because the statistics on index 2 may show that it will not benefit the query (see Example 4).

Example 3
Assume you have a compound, non-clustered index defined on DateFrom and Amt. The histogram would only be kept on the DateFrom field within the index, and SQL Server would have to seek (or scan) across both DateFrom and Amt. Since SQL Server has to traverse more data, the query will be slower.

Example 4
Assume you have two non-clustered indexes: The first is defined on DateFrom and the second is defined on Amt. Statistics would be kept on both fields because they are separate indexes. SQL Server will examine the statistics on DateFrom and decide to use that index. It will then examine the Amt column and may decide -- based on the statistics -- that the index is not unique enough and should be ignored. At this point, SQL Server would only need to traverse the DateFrom field, rather than both DateFrom and Amt, resulting in a faster query.

By using non-clustered indexes in SQL Server, you'll be able to focus queries on a data subset. Use the guidelines described in this tip to determine whether it's best to create multiple non-clustered indexes or a compound non-clustered index. Also keep in mind the role of statistics and how they impact non-clustered indexes: Statistics affect the choice between using multiple non-clustered indexes and a compound non-clustered index in SQL Server.
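The statistics options discussed in this tip map onto a handful of T-SQL statements, sketched below. MyDatabase, dbo.tbl and IX_tbl_DateFrom are placeholder names, and the 50 percent sampling ratio is an arbitrary example rather than a recommendation from the article.

```sql
-- Option A (SQL Server 2005): keep automatic updates but make them asynchronous.
ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS_ASYNC ON;

-- Option B (the article's recommendation): switch automatic updates off ...
ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS OFF;

-- ... and refresh statistics from a scheduled job during a quiet period,
-- choosing the sampling ratio yourself.
UPDATE STATISTICS dbo.tbl WITH SAMPLE 50 PERCENT;

-- A single index can be refreshed with a full scan when accuracy matters most.
UPDATE STATISTICS dbo.tbl IX_tbl_DateFrom WITH FULLSCAN;

-- Inspect what SQL Server knows about the leading column of an index.
DBCC SHOW_STATISTICS ('dbo.tbl', IX_tbl_DateFrom);
```

Options A and B are alternatives; the asynchronous setting only has an effect while automatic updates remain switched on.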

How to maintain SQL Server indexes for query optimization


Maintaining SQL Server indexes is an uncommon practice. If a query stops using indexes, oftentimes a new non-clustered index is created that simply holds a different combination of columns, or even the same columns. Why SQL Server is ignoring the existing indexes is rarely analyzed in detail. Let's take a look at how clustered and non-clustered indexes are selected and why the query optimizer might choose a table scan instead of a non-clustered index. In this tip, you'll learn how page splits, fragmented indexes, table partitions and statistics updates affect the use of indexes. Ultimately, you'll find out how to maintain SQL Server indexes so that the query optimizer uses them and so that they are searched quickly.

Index selection

Clustered indexes are by far the easiest to understand in the area of index selection. Clustered indexes are basically keys that reference each row uniquely. Even if you define a clustered index and do not declare it as unique, SQL Server still makes the clustered index unique behind the scenes by adding a 4-byte "uniqueifier" to it. The additional "uniqueifier" increases the width of the clustered index, which causes increased maintenance time and slower searches. Since clustered indexes are the key that identifies each row, they are used in every query.

When we start talking about non-clustered indexes, things get confusing. Queries can ignore non-clustered indexes for the following reasons:

1. High fragmentation
If an index is fragmented over 40%, the optimizer will probably ignore the index because it's more costly to search a fragmented index than to perform a table scan.

2. Uniqueness
If the optimizer determines that a non-clustered index is not very unique, it may decide that a table scan is faster than trying to use the non-clustered index. For example: If a query references a bit column (where bit = 1) and the statistics on the column say that 75% of the rows are 1, then the optimizer will probably decide a table scan will get the results faster than scanning over a non-clustered index.

3. Outdated statistics
If the statistics on a column are out of date, then SQL Server can misjudge the benefit of a non-clustered index. Automatically updating statistics doesn't just slow down your data modification scripts; over time, the automatically maintained statistics also drift out of sync with the real distribution of the rows. Occasionally it's a good idea to run sp_updatestats or UPDATE STATISTICS.

4. Function usage
SQL Server is unable to use an index if a function is applied to the indexed column in the criteria. If you're referencing a non-clustered index column, but you're using a function such as convert(varchar, Col1_Year) = 2004, then SQL Server cannot use the index on Col1_Year.

5. Wrong columns
If a non-clustered index is defined on (col1, col2, col3) and your query has a where clause such as "where col2 = 'somevalue'", that index won't be used. A non-clustered index can only be used if the first column in the index is referenced within the where clause. A where clause such as "where col3 = 'someval'" would not use the index, but a where clause like "where col1 = 'someval'" or "where col1 = 'someval' and col3 = 'someval2'" would pick up the index. The index would not use col3 for its seek, since that column does not come immediately after col1 in the index definition. If you want a seek to occur on col3 in situations such as this, it is best to define two separate non-clustered indexes, one on col1 and the other on col3.

Page splits

To store data, SQL Server uses pages, which are 8 KB data blocks.
The percentage of each page that is filled with data when an index is built or rebuilt is called the fill factor; the higher the fill factor, the fuller each 8 KB page is. A higher fill factor means fewer pages are required, resulting in less IO/CPU/RAM usage. At this point, you might want to set all your indexes to 100% fill factor; however, here is the gotcha: Once the pages fill up and a value comes in that fits within a filled-up index range, SQL Server will make room in the index by doing a "page split." In essence, SQL Server takes the full page and splits it into two separate pages, each of which then has substantially more free space. You can account for this issue by setting a fill factor of 70% or so. This allows 30% free space for incoming values. The problem with this approach is that you continually have to "re-index" the index so that it maintains a free space percentage of 30%.

Clustered index maintenance

Clustered indexes that are static and "ever-increasing" should have a fill factor of 100%. Since the values are always increasing, pages will just be added to the end of the index and virtually no fragmentation will occur. For a more detailed explanation, see part 1 of this series, SQL Server clustered index design for performance. This index category does not need to be re-indexed because it doesn't fragment.

Clustered indexes that are not static or not "ever-increasing" will experience fragmentation and page splits as the data rows move around within the data pages. The indexes in this category have to be re-indexed in order to keep fragmentation low and allow queries to use the index efficiently. When you re-index these clustered indexes, you have to decide what the fill factor should be. Normally this is 70% to 80%, giving you 20% to 30% empty space for new records coming into the page. The optimal settings for your environment will depend on how often records shift around, how many records are inserted and how often re-indexing occurs. The goal is to set a fill factor low enough so that by the time you reach your next maintenance cycle, the pages are around 95% full but not yet splitting, which happens when they hit the 100% limit.

Non-clustered index maintenance

Non-clustered indexes will always have data shifting around the pages. It's not quite as big an issue as it is with clustered indexes -- the actual row data shifts with clustered indexes, whereas only row pointers shift with non-clustered indexes. That said, the same rules apply to non-clustered indexes as far as fill factors go. Again, the goal is to set a fill factor low enough so that by the time you reach your next maintenance cycle, the pages are only around 95% full. Non-clustered indexes will always fragment, so you must constantly monitor and maintain them to keep the fragmentation under control.

Partitioned table index considerations

Partitioned tables allow data to be segregated into different partitions, depending on the data in a column. Many tables are partitioned based on date ranges. Let's say your order table is partitioned into years. Assuming the clustered index is aligned (see part 1 of this series), you could re-index the non-clustered indexes for, say, year 2000 at 100% fill factor, since that data, technically, won't be shifting around. In this scenario, the year 2008 partition may have a fill factor of 70% on non-clustered indexes to allow for data shifts, but the year 2000 partition will not have any shifts and can be re-indexed at 100% fill factor so you optimize index seeks. The same concept applies to clustered indexes that are not static or not ever-increasing. Clustered indexes with shifting data might be set to 70% fill factor for the year 2008 partition and 100% fill factor for the year 2000 partition.

SQL Server statistics

Statistics are maintained on columns and indexes, and they help SQL Server determine how "unique" some value may be. For example, if statistics say a value will match approximately 80% of the rows, SQL Server will do a table scan instead. If statistics say a value will probably match around 10% of the rows, then the query optimizer will opt for a seek to minimize database impact. SQL Server statistics can be maintained automatically or you can update them manually. Since re-indexing changes the statistics results, I recommend that after re-indexing, you manually run sp_updatestats or the T-SQL UPDATE STATISTICS command. The distribution histogram is only maintained on the first column of any compound index, so the "uniqueness" of other columns in the index cannot be determined in the same way.

Summary

Index maintenance is critical to ensure that queries continue to benefit from index use and to reduce IO/RAM/CPU, which reduces blocking as well. Run your queries with the option "show execution plan" turned on. If the query is not using your index, then check the following:

1. Run dbcc showcontig ('tablename') to see if the table is fragmented.
2. Check your where clause to see if it references the first column in the index.
3. Ensure that your where clause does not apply a function to the first column of the index.
4. Update the statistics just in case they are out of date. If the table is fragmented, then run this step after re-indexing.
5. Make sure the criteria you are using are unique enough that SQL Server will see a benefit in using the index to search the data.
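Several items on this checklist can be sketched in T-SQL as follows. This is a hedged illustration only: dbo.tbl and its DateFrom column are placeholder names, the 80% fill factor is an arbitrary value from the range discussed earlier, and sys.dm_db_index_physical_stats is used here as the SQL Server 2005 counterpart of dbcc showcontig.

```sql
-- Assumption: dbo.tbl exists and has an index whose first column is DateFrom.
DECLARE @dbid int, @objid int;
SET @dbid  = DB_ID();
SET @objid = OBJECT_ID('dbo.tbl');

-- Step 1: fragmentation per index.
SELECT i.name AS indexname,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(@dbid, @objid, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id;

-- Step 3: a function wrapped around the indexed column prevents a seek ...
--   SELECT ID FROM dbo.tbl WHERE YEAR(DateFrom) = 2006;
-- ... whereas an equivalent range predicate on the bare column permits one.
SELECT ID
FROM dbo.tbl
WHERE DateFrom >= '20060101' AND DateFrom < '20070101';

-- Step 4: re-index first, then refresh the statistics.
ALTER INDEX ALL ON dbo.tbl REBUILD WITH (FILLFACTOR = 80);
UPDATE STATISTICS dbo.tbl WITH FULLSCAN;
```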

(1) How can I find out whether my indexes are useful? How are they used?

First, we will determine whether indexes are useful. DDL is used to create objects (such as indexes) and update the catalog. Creating an index does not constitute use of the index, so the index will not be reflected in the index DMVs until it is actually used. When an index is used by a Select, Insert, Update, or Delete, its use is captured by sys.dm_db_index_usage_stats. If you have run a representative workload, all useful indexes will have been recorded in sys.dm_db_index_usage_stats. Thus, any index not found in sys.dm_db_index_usage_stats is unused by the workload (since the last restart of SQL Server). Unused indexes can be found as follows:

(2) Do I have any tables or indexes that are not used (or rarely used)?

    ------ unused tables & indexes. Tables have index_ids of either 0 = Heap table or 1 = Clustered Index
    ------ run this in the context of the database named in db_id()
    Declare @dbid int
    Select @dbid = db_id('Northwind')
    Select objectname = object_name(i.object_id), indexname = i.name, i.index_id
    from sys.indexes i, sys.objects o
    where objectproperty(o.object_id,'IsUserTable') = 1
      and i.index_id NOT IN (select s.index_id
                             from sys.dm_db_index_usage_stats s
                             where s.object_id = i.object_id
                               and i.index_id = s.index_id
                               and database_id = @dbid)
      and o.object_id = i.object_id
    order by objectname, i.index_id, indexname asc

Rarely used indexes will appear in sys.dm_db_index_usage_stats just like heavily used indexes. To find rarely used indexes, you look at columns such as user_seeks, user_scans, user_lookups, and user_updates.

    --- rarely used indexes appear first
    declare @dbid int
    select @dbid = db_id()
    select objectname = object_name(s.object_id), s.object_id, indexname = i.name, i.index_id
         , user_seeks, user_scans, user_lookups, user_updates
    from sys.dm_db_index_usage_stats s, sys.indexes i
    where database_id = @dbid
      and objectproperty(s.object_id,'IsUserTable') = 1
      and i.object_id = s.object_id
      and i.index_id = s.index_id
    order by (user_seeks + user_scans + user_lookups + user_updates) asc

(3) What is the cost of index maintenance vs. its benefit?

If a table is heavily updated and also has indexes that are rarely used, the cost of maintaining the indexes could exceed the benefits. To compare the cost and benefit, you can use the table-valued function sys.dm_db_index_operational_stats as follows:

    --- sys.dm_db_index_operational_stats
    declare @dbid int
    select @dbid = db_id()
    select objectname = object_name(s.object_id), indexname = i.name, i.index_id
         , reads = range_scan_count + singleton_lookup_count
         , 'leaf_writes' = leaf_insert_count + leaf_update_count + leaf_delete_count
         , 'leaf_page_splits' = leaf_allocation_count
         , 'nonleaf_writes' = nonleaf_insert_count + nonleaf_update_count + nonleaf_delete_count
         , 'nonleaf_page_splits' = nonleaf_allocation_count
    from sys.dm_db_index_operational_stats (@dbid,NULL,NULL,NULL) s, sys.indexes i
    where objectproperty(s.object_id,'IsUserTable') = 1
      and i.object_id = s.object_id
      and i.index_id = s.index_id
    order by reads desc, leaf_writes, nonleaf_writes

    --- sys.dm_db_index_usage_stats
    select objectname = object_name(s.object_id), indexname = i.name, i.index_id
         , reads = user_seeks + user_scans + user_lookups
         , writes = user_updates
    from sys.dm_db_index_usage_stats s, sys.indexes i
    where objectproperty(s.object_id,'IsUserTable') = 1
      and s.object_id = i.object_id
      and i.index_id = s.index_id
      and s.database_id = @dbid
    order by reads desc
    go

The difference between sys.dm_db_index_usage_stats and sys.dm_db_index_operational_stats is as follows. Sys.dm_db_index_usage_stats counts each access as 1, whereas sys.dm_db_index_operational_stats counts depending on the operation, pages or rows.

(4) Do I have hot spots & index contention?

Index contention (e.g. waits for locks) can be seen in sys.dm_db_index_operational_stats. Columns such as row_lock_count, row_lock_wait_count, row_lock_wait_in_ms, page_lock_count, page_lock_wait_count, page_lock_wait_in_ms, page_latch_wait_count, page_latch_wait_in_ms, pageio_latch_wait_count and pageio_latch_wait_in_ms detail lock and latch contention in terms of waits. You can determine the average blocking and lock waits by comparing waits to counts as follows:

    declare @dbid int
    select @dbid = db_id()
    Select dbid = database_id, objectname = object_name(s.object_id)
         , indexname = i.name, i.index_id --, partition_number
         , row_lock_count, row_lock_wait_count
         , [block %] = cast (100.0 * row_lock_wait_count / (1 + row_lock_count) as numeric(15,2))
         , row_lock_wait_in_ms
         , [avg row lock waits in ms] = cast (1.0 * row_lock_wait_in_ms / (1 + row_lock_wait_count) as numeric(15,2))

from sys.dm_db_index_operational_stats (@dbid, NULL, NULL, NULL) s, sys.indexes i where objectproperty(s.object_id,'IsUserTable') = 1 and i.object_id = s.object_id and i.index_id = s.index_id order by row_lock_wait_count desc The following report shows blocks in the [Order Details] table, index OrdersOrder_Details. While blocks occur less than 2 percent of the time, when they do occur, the average block time is 15.7 seconds. It would be important to track this down using the SQL Profiler Blocked Process Report. You can set the Blocked Process Threshold to 15 using sp_configure Blocked Process Threshold,15. Afterwards, you can run a trace to capture blocks over 15 seconds. The Profiler trace will include the blocked and blocking process. The advantage of tracing for long blocks is the blocked and blocking details can be saved in the trace file and can be analyzed long after the block disappears. Historically, you can see the common causes of blocks. In this case the blocked process is the stored procedure NewCustOrder. The blocking process is the stored procedure UpdCustOrderShippedDate. The caveat with Profiler Trace of Blocked Process Report is that in the case of stored procedures, you cannot see the actual statement within the stored procedure that is blocked. You do however, get the stmtstart and stmtend offset that does identify the statement blocked inside the stored procedure NewCustOrder. 
Using the above blocked process report, you could extract the blocked statement out of the NewCustOrder stored procedure by providing the sqlhandle, stmtstart and stmtend as follows:

declare @sql_handle varbinary(64), @stmtstart int, @stmtend int
Select @sql_handle = 0x3000050005d9f67ea8425301059700000100000000000000
Select @stmtstart = 920, @stmtend = 1064

-- offsets are zero-based byte positions in Unicode text: halve them and add 1 for substring
select substring(qt.text, s.statement_start_offset/2 + 1,
           (case when s.statement_end_offset = -1
                 then len(convert(nvarchar(max), qt.text)) * 2
                 else s.statement_end_offset end - s.statement_start_offset)/2 + 1) as "blocked statement"
    ,s.statement_start_offset
    ,s.statement_end_offset

    ,batch=qt.text
    ,qt.dbid
    ,qt.objectid
    ,s.execution_count
    ,s.total_worker_time
    ,s.total_elapsed_time
    ,s.total_logical_reads
    ,s.total_physical_reads
    ,s.total_logical_writes
from sys.dm_exec_query_stats s
cross apply sys.dm_exec_sql_text(s.sql_handle) as qt
where s.sql_handle = @sql_handle
  and s.statement_start_offset = @stmtstart
  and s.statement_end_offset = @stmtend

You can capture the actual blocked statement of a stored procedure in real time (as it is occurring) using the following:

create proc sp_block_info
as
select t1.resource_type as [lock type]
    ,db_name(resource_database_id) as [database]
    ,t1.resource_associated_entity_id as [blk object]
    ,t1.request_mode as [lock req]                  --lock requested
    ,t1.request_session_id as [waiter sid]          --spid of waiter
    ,t2.wait_duration_ms as [wait time]
    ,(select text from sys.dm_exec_requests as r    --- get sql for waiter
        cross apply sys.dm_exec_sql_text(r.sql_handle)
        where r.session_id = t1.request_session_id) as waiter_batch
    -- offsets are zero-based byte positions in Unicode text: halve them and add 1 for substring
    ,(select substring(qt.text, r.statement_start_offset/2 + 1,
            (case when r.statement_end_offset = -1
                  then len(convert(nvarchar(max), qt.text)) * 2
                  else r.statement_end_offset end - r.statement_start_offset)/2 + 1)
        from sys.dm_exec_requests as r
        cross apply sys.dm_exec_sql_text(r.sql_handle) as qt
        where r.session_id = t1.request_session_id) as waiter_stmt    --statement blocked
    ,t2.blocking_session_id as [blocker sid]        -- spid of blocker
    ,(select text from sys.sysprocesses as p        --- get sql for blocker
        cross apply sys.dm_exec_sql_text(p.sql_handle)
        where p.spid = t2.blocking_session_id) as blocker_stmt

from sys.dm_tran_locks as t1, sys.dm_os_waiting_tasks as t2
where t1.lock_owner_address = t2.resource_address
go

exec sp_block_info

(5) Could I benefit from more (or fewer) indexes?

Remembering that indexes involve both a maintenance cost and a read benefit, the overall index cost benefit can be determined by comparing reads and writes. Reading an index allows us to avoid table scans; however, indexes do require maintenance to be kept up to date. While it is easy to identify the fringe cases where indexes are never or only rarely used, in the final analysis index cost benefit is somewhat subjective, because the number of reads and writes is highly dependent on the workload and its frequency. In addition, qualitative factors can outweigh the raw number of reads and writes, for example a highly important monthly management report or quarterly VP report for which the maintenance cost is of secondary concern.

Writes of all indexes are performed for inserts, but there are no associated reads (unless there are referential constraints). Besides select statements, reads are performed for updates and deletes, and writes are performed if rows qualify. OLTP workloads have lots of small transactions, frequently combining select, insert, update and delete operations. Data warehouse activity is typically separated into batch windows having a high concentration of write activity, followed by an on-line window of read activity.

SQL Statement   Read   Write
Select          Yes    No
Insert          No     Yes, all indexes
Update          Yes    Yes, if row qualifies
Delete          Yes    Yes, if row qualifies
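The read versus write comparison described above can be sketched as a single query. This is only an illustration: the [writes per read] column and the choice to look at nonclustered indexes alone are assumptions made for the example, not part of the original scripts. An index that has not been touched since the counters were last reset has no row in sys.dm_db_index_usage_stats, which is why the query starts from sys.indexes and uses a left join.

```sql
declare @dbid int
select @dbid = db_id()

select objectname = object_name(i.object_id)
    , indexname = i.name
    , i.index_id
    , reads = isnull(s.user_seeks + s.user_scans + s.user_lookups, 0)
    , writes = isnull(s.user_updates, 0)
    -- 1 is added to the divisor to avoid division by zero, as in the earlier queries
    , [writes per read] = cast (1.0 * isnull(s.user_updates, 0)
          / (1 + isnull(s.user_seeks + s.user_scans + s.user_lookups, 0)) as numeric(15,2))
from sys.indexes i
left join sys.dm_db_index_usage_stats s
    on s.object_id = i.object_id
   and s.index_id = i.index_id
   and s.database_id = @dbid
where objectproperty(i.object_id,'IsUserTable') = 1
  and i.index_id > 1    -- skip heaps (0) and clustered indexes (1)
order by [writes per read] desc, writes desc
```

Indexes at the top of this list are maintained often but read rarely, which makes them candidates for review rather than automatic removal; a rarely run but important report may still depend on them.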

In general, you want to keep indexes to a functional minimum in a high transaction OLTP environment due to high transaction throughput combined with the cost of index maintenance and the potential for blocking. In contrast, for a data warehouse you pay for index maintenance once, during the batch window when updates occur. Thus, data warehouses tend to have more indexes to benefit their read-intensive on-line users.

In conclusion, an important new feature of SQL Server 2005 is Dynamic Management Views (DMVs). DMVs provide a level of transparency that was not available in SQL Server 2000 and can be used for diagnostics, memory and process tuning, and monitoring. DMVs can be useful in answering practical questions about index usage, the cost benefit of indexes, and index hot spots. Finally, DMVs can be queried with SELECT statements but are not persisted to disk. Thus they reflect only the server state information accumulated since the last SQL Server restart.
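Because the counters cover only the period since the last restart, it helps to know how long that period is before drawing conclusions from them. One way to estimate it, sketched here on the assumption that the creation date of tempdb is a good enough proxy (tempdb is rebuilt every time the instance starts), is:

```sql
-- tempdb is recreated at every instance start, so its creation date
-- approximates the start of the period covered by the DMV counters
select counters_since = create_date
    , hours_covered = datediff(hour, create_date, getdate())
from sys.databases
where name = 'tempdb'
```

The column aliases are illustrative. Read and write totals taken a few hours after a restart say much less about index cost benefit than totals gathered over a full business cycle.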
