Monday, March 13, 2017

Indices

Most of the time I think it should be spelled "indexes" but okay whatever -- throughout this blog I'll follow my spellcheck and call them indices. There's so much to say about them, all of it fun. What makes indexing so much fun? You can secretly do it in the background (well, you can if you're on enterprise version). You can't screw up too much adding an index. There are all sorts of tools to help spot those that are missing. You get to look at SQL plans to figure out which indices the optimizer missed. They gradually fragment, thus providing you another monitoring opportunity. You get to play around with various fill parameters. You can reorganize them while online. Finally adding indices vastly improves queries and makes you a hero with the developers.

Having said all that, you do have to be fairly judicious with the time you allocate to these puppies. In a RAD environment most of your work will be building and evaluating them, but in a mature legacy system you will likely just have reorganizations or rebuilds scheduled and you won't pay them much mind. In any case there's enough to say about indices that it will stretch into a couple of blog posts, so herewith the details of evaluating, building, and disabling them.

To start, it makes a lot of sense to impose a naming standard at your shop. Nobody likes to see that index DTA_0013x_abcde_482 is fragmented. What the heck does that index do? In my shop we use names like ix_myTable_col1col2col3. To keep the names a useful length we might abbreviate the table or column names.

What indices do you need to build? This query will give you a good clue:
SELECT        mid.statement, round(migs.user_seeks * migs.avg_total_user_cost * migs.avg_user_impact / 100, 0) as imPactor,
    migs.unique_compiles, migs.user_seeks, migs.user_scans, migs.last_user_seek, migs.last_user_scan,
    migs.avg_total_user_cost, migs.avg_user_impact, mig.index_group_handle, mig.index_handle,
              mid.object_id, mid.equality_columns, mid.inequality_columns,
              mid.included_columns
FROM            sys.dm_db_missing_index_group_stats AS migs INNER JOIN
                         sys.dm_db_missing_index_groups AS mig ON migs.group_handle = mig.index_group_handle INNER JOIN
                         sys.dm_db_missing_index_details AS mid ON mig.index_handle = mid.index_handle
WHERE        (mid.database_id = DB_ID())
and migs.last_user_seek > GETDATE() - 14
ORDER BY 2 DESC
You need to select the database you are analyzing or include a Use statement and then run this for each dB you are inspecting. Should you build every index in this resultset? No, you need to use some good judgement, start at the top, and think about what you're doing. Certainly indices toward the top of the list with a couple key columns and a smattering of included columns make sense. On the other indices you need to use your better judgement.

Why not build them all? Well, indices take IO and CPU power to maintain. They immensely speed up SELECT queries, however they slow down the UPDATEs and INSERTs. Hence if you walk into a shop with a legacy database, you should run this query to see how the existing indices are actually being used:
SELECT sys.objects.name AS object_name, sys.indexes.name AS index_name,
Case when (is_unique = 1 and is_primary_key = 1) then 'PK UNIQUE '
when is_unique = 1  then 'UNIQUE ' else '' end + sys.indexes.type_desc type_desc,
c.index_columns AS index_columns_key, s.used_page_count * 8   AS IndexSizeKB,
sys.dm_db_index_usage_stats.user_seeks, sys.dm_db_index_usage_stats.user_scans,
sys.dm_db_index_usage_stats.user_lookups, sys.dm_db_index_usage_stats.user_updates,
sys.dm_db_index_usage_stats.last_user_seek, sys.dm_db_index_usage_stats.last_user_scan,
sys.dm_db_index_usage_stats.last_user_lookup, sys.dm_db_index_usage_stats.last_user_update
FROM sys.objects JOIN sys.indexes ON sys.indexes.object_id=sys.objects.object_id
JOIN (SELECT distinct
object_id, index_id, stuff((SELECT ','+col_name(object_id,column_id ) as 'data()'
FROM sys.index_columns t2
WHERE t1.object_id =t2.object_id and t1.index_id = t2.index_id
FOR XML PATH ('')),1,1,'') as 'index_columns'
FROM sys.index_columns t1
) c ON c.index_id = sys.indexes.index_id AND c.object_id = sys.indexes.object_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats
ON sys.indexes.index_id=sys.dm_db_index_usage_stats.index_id
AND sys.indexes.object_id=sys.dm_db_index_usage_stats.object_id
JOIN sys.dm_db_partition_stats  AS s
ON s.[object_id] = sys.indexes.object_id AND s.index_id = sys.indexes.index_id
WHERE sys.objects.type='u'                 --Only UserTables will be selected
AND sys.indexes.type_desc  <> 'HEAP'       --Only indexes will appear, not HEAP tables
AND sys.indexes.type_desc like '%NONCLUSTER%'
ORDER BY 6 desc
Again precede it with a "use" statement or select the database to run it against. The results of this query are useful for two different purposes, but you have to understand that the context of these results are statistics only since the last SQL restart. It's best therefore to review this after your instance has been up and running happily for a few weeks.

First scroll down toward the bottom of the result set. Indices that have zeros or nulls in all the columns of seeks, scans, and lookups aren't being used by the plan optimizer. If they have something in the updates column, then you are incurring the extra overhead of maintaining them without gaining any advantage from their use. Before you disable them however, do a courtesy check to see if they are perhaps badly fragmented -- sometimes this is why they aren't being used in the first place.

And that is the second value of this resultset: those indices shown toward the top of the list are the prime movers, the high optimizers of your database. You should check them regularly to see if they should be rebuilt or reorganized.