Monday, December 14, 2015

Defrag It

I mentioned in earlier posts both the joys of indices and some strategies for maintaining them so they don't become too badly fragmented. Those high-level guidelines generally catch and resolve the major issues, but occasionally you need a deeper inspection to make sure you aren't overlooking any mid-size tables that could be causing a few queries some consternation. This method is more resource intensive, so I only run it every couple of months or so.

You will first need to create a table to receive the results of the fragmentation analysis:
CREATE TABLE [dbo].[sampledIndexStats](
    [TableName] [nvarchar](128) NULL,
    [IndexName] [nvarchar](128) NULL,
    [Frag] [decimal](5, 2) NULL
) ON [PRIMARY]
GO
Next, run this:
INSERT INTO dbo.sampledIndexStats
SELECT
    OBJECT_NAME(i.[object_id]),
    i.name,
    s.avg_fragmentation_in_percent
-- 'SAMPLED' mode reads a sample of the leaf pages rather than scanning every page
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS s
INNER JOIN sys.indexes AS i
    ON i.[object_id] = s.[object_id]
    AND i.index_id = s.index_id;
This takes a while to run, as dm_db_index_physical_stats needs to physically sample the index trees to see how out-of-sequence the leaf levels are. Once it's done, though, you can join the resulting sampledIndexStats table into the query that shows overall index usage to determine whether any of the mid-sized, heavily used indices suffer from fragmentation.
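For example, here's a minimal sketch of that join, assuming the usage side boils down to sys.dm_db_index_usage_stats; the 20% fragmentation threshold is just an illustrative starting point:
SELECT f.TableName, f.IndexName, f.Frag,
       u.user_seeks, u.user_scans, u.user_updates
FROM dbo.sampledIndexStats f
JOIN sys.indexes i
    ON i.name = f.IndexName
    AND OBJECT_NAME(i.object_id) = f.TableName
JOIN sys.dm_db_index_usage_stats u
    ON u.object_id = i.object_id
    AND u.index_id = i.index_id
    AND u.database_id = DB_ID()  -- usage stats are instance-wide; scope to this database
WHERE f.Frag > 20                -- illustrative fragmentation threshold
ORDER BY u.user_seeks + u.user_scans DESC;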

You're welcome.

Thursday, November 19, 2015

Blended Storage

One of the appreciable unseen tasks of a DBA is to manage the file allocations and physical location of the database storage files. I say this is unseen because in most shops, when a new project starts up, some developer right-clicks and selects "new database," fills in the rudimentary info for the primary .mdf and .ldf files, and off they go building tables.

That works fine to start and is very simple to conceptualize, back up, and restore, but as the database grows this monolithic approach becomes really tough to manage and optimize. So right off the bat, most of my production databases end up with at least two log file allocations and a couple of secondary .ndf files for row data, along the lines of the sketch below. This makes management easier for log swapping and incremental backups.
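A minimal sketch of that starting layout, assuming a database named MyDB; the logical file names, paths, and sizes are all hypothetical placeholders:
-- Hypothetical names, paths, and sizes; adjust to your environment.
ALTER DATABASE MyDB ADD FILE
    (NAME = MyDB_data2, FILENAME = 'E:\SQLData\MyDB_data2.ndf', SIZE = 10GB);
ALTER DATABASE MyDB ADD FILE
    (NAME = MyDB_data3, FILENAME = 'E:\SQLData\MyDB_data3.ndf', SIZE = 10GB);
ALTER DATABASE MyDB ADD LOG FILE
    (NAME = MyDB_log2, FILENAME = 'L:\SQLLogs\MyDB_log2.ldf', SIZE = 4GB);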

A database doesn't lodge itself in a static physical environment, though. Management is always upgrading to new physical infrastructure, and space keeps getting allocated or reclaimed. After a while you'll likely find yourself in a blended storage environment, with a variety of attached network storage devices spanning a range of IOPS and performance.
What's the best way to manage such a dynamic environment, so that you get fast IO on your database without constantly shuffling complete databases to the latest and greatest device (until all the new devices fill up, natch)?

To start, tempdb files and log files should be on your fastest possible devices. tempdb is often the most active database in an instance, and log files are write-through (they aren't cached), so they can become a bottleneck on a slower device. When your company purchases the latest and greatest, these should be the first things you move. Moving tempdb requires a SQL Server service restart, naturally. You can move the log files dynamically, by just allocating files on the new device, turning off autogrowth on the old files, and then shrinking the old log files at your convenience. A sketch of both moves follows.
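Here's a rough sketch of both moves, again with hypothetical names and paths; tempdev and templog are the default tempdb logical file names, and the tempdb change only takes effect at the next service restart:
-- Repoint tempdb files to the fast device (effective after restart).
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'F:\FastSSD\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'F:\FastSSD\templog.ldf');

-- Swap a user database's log onto the fast device without a restart.
ALTER DATABASE MyDB ADD LOG FILE
    (NAME = MyDB_log2, FILENAME = 'F:\FastSSD\MyDB_log2.ldf', SIZE = 4GB);
ALTER DATABASE MyDB MODIFY FILE (NAME = MyDB_log, FILEGROWTH = 0);  -- stop the old file growing
DBCC SHRINKFILE (MyDB_log, 1);  -- shrink the old file at your convenience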

Moving row data files to a new device can be drudgery, but handling the data files for the indices is simpler than those for the tables. I like to create a folder on the second-fastest storage device and call it HSI, for High Speed Indices. Then I add a filegroup to my databases called HSI and allocate files in that storage folder. Getting the indices moved is as simple as rebuilding them onto the new HSI filegroup (there's a sketch at the end of this post). You may not really need all of them over there, however, so use a query like this to see what you should rebuild:

SELECT o.name AS object_name,
       i.name AS index_name,
       c.index_columns AS index_columns_key,
       ps.used_page_count * 8 AS IndexSizeKB,
       u.user_seeks, u.user_scans, u.user_updates,
       f.name AS StorageLoc
FROM sys.objects o
JOIN sys.indexes i
    ON i.object_id = o.object_id
JOIN (SELECT DISTINCT object_id, index_id,
             STUFF((SELECT ',' + COL_NAME(t2.object_id, t2.column_id) AS 'data()'
                    FROM sys.index_columns t2
                    WHERE t1.object_id = t2.object_id AND t1.index_id = t2.index_id
                    FOR XML PATH ('')), 1, 1, '') AS index_columns
      FROM sys.index_columns t1) c
    ON c.object_id = i.object_id AND c.index_id = i.index_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats u
    ON u.object_id = i.object_id
    AND u.index_id = i.index_id
    AND u.database_id = DB_ID()  -- usage stats are instance-wide; scope to this database
JOIN sys.dm_db_partition_stats ps
    ON ps.object_id = i.object_id AND ps.index_id = i.index_id
JOIN sys.filegroups f
    ON f.data_space_id = i.data_space_id
WHERE o.type = 'U'
  AND i.type_desc LIKE '%NONCLUSTER%'  -- nonclustered only; this also excludes heaps
ORDER BY u.user_scans DESC;
Select the large indices that get the most seeks and scans. Finally, you can move some of the data files over to a filegroup you create that points to another folder on the second-fastest device (I typically call that folder HSdata). I covered how to move data files between filegroups in another post, but to recap briefly: it does require a quiet maintenance window, as it takes considerably more effort.
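To close out the index side, here's a minimal sketch of the HSI filegroup setup and a rebuild onto it; the database name, path, and index names are hypothetical, and DROP_EXISTING keeps the move to a single rebuild operation:
-- Hypothetical names and paths.
ALTER DATABASE MyDB ADD FILEGROUP HSI;
ALTER DATABASE MyDB ADD FILE
    (NAME = MyDB_hsi1, FILENAME = 'G:\HSI\MyDB_hsi1.ndf', SIZE = 10GB)
    TO FILEGROUP HSI;

-- Rebuild a candidate index onto the new filegroup in one operation.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    WITH (DROP_EXISTING = ON)
    ON HSI;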