Monday, December 14, 2015

Defrag It

I mentioned in earlier posts both the joys of indices and some strategies for maintaining them so they don't become too badly fragmented. Although those high-level guidelines generally catch and resolve the major issues, occasionally you need a deeper inspection to make sure you aren't overlooking any mid-size tables that could be causing a few queries some consternation. This method is more resource intensive, so I only run it every couple of months or so.

You will first need to create a table to receive the results of the fragmentation analysis:
CREATE TABLE [dbo].[sampledIndexStats](
      [TableName] [nvarchar](128) NULL, [IndexName] [nvarchar](128) NULL,
      [Frag] [decimal](5, 2) NULL
);
Next, run this:
INSERT INTO dbo.sampledIndexStats
SELECT
  OBJECT_NAME(i.[object_id]), i.name, s.avg_fragmentation_in_percent
FROM
  sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS s
INNER JOIN sys.indexes AS i
ON i.[object_id] = s.[object_id]
AND i.index_id = s.index_id;
This takes a while to run, as dm_db_index_physical_stats needs to physically sample the index trees to see how out-of-sequence the leaf levels are. Once it's done, though, you can join the resulting sampledIndexStats table into the query that shows overall index usage to determine if any of the mid-sized heavily used indices suffer from fragmentation.
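
As a rough sketch of that join (not necessarily the usage query from the earlier post), something like the following pairs each sampled index with its seek, scan, and update counts. The 30 percent cutoff is an arbitrary example value, and matching on table name alone assumes you don't have same-named tables in different schemas.

```sql
-- Sketch: pair sampled fragmentation with usage counts for the current database.
-- Assumes dbo.sampledIndexStats was filled by the INSERT above.
SELECT f.TableName,
       f.IndexName,
       f.Frag,
       u.user_seeks,
       u.user_scans,
       u.user_updates
FROM dbo.sampledIndexStats AS f
INNER JOIN sys.indexes AS i
    ON OBJECT_NAME(i.[object_id]) = f.TableName
   AND i.name = f.IndexName
LEFT OUTER JOIN sys.dm_db_index_usage_stats AS u
    ON u.[object_id] = i.[object_id]
   AND u.index_id = i.index_id
   AND u.database_id = DB_ID()
WHERE f.Frag > 30.0          -- example threshold, adjust to taste
ORDER BY u.user_seeks + u.user_scans DESC;
```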

You're welcome.

Thursday, November 19, 2015

Blended Storage

One of the appreciable unseen tasks of a DBA is to manage the file allocations and physical location of the database storage files. I say this is unseen because in most shops, when a new project starts up, some Developer right-clicks and selects "new database," fills in the rudimentary info for the primary .mdf and .ldf file, and off they go building tables.

That works fine to start and is very simple to conceptualize, back up, and restore, but ultimately as the database grows this monolithic approach is really tough to manage and optimize. Right off the bat, then, most of my production databases end up with at least two log file allocations and a couple of secondary .ndf files for row data. This makes the management easier for log swapping and incremental backups.

A database doesn't lodge itself in a static physical environment though. Managers are always upgrading to new physical infrastructure and space keeps getting allocated or reclaimed. After a while you'll likely find yourself in a blended storage environment, with a variety of attached network storage devices spanning a range of IOPS and performance.
What's the best way to manage such a dynamic environment, so that you get fast IO on your database without constantly shuffling complete databases to the latest and greatest device (until all the new devices fill up, natch)?

To start, tempDB files and log files should be on your fastest possible devices. tempDB is often the most active database in an instance, and log files are "write through" (they aren't cached) so they can become a bottleneck on a slower device. When your company purchases the latest and greatest, these should be the first things you move. Moving tempDB requires a SQL service restart, naturally. You can move the log files dynamically, by just allocating files to the new device, turning off autogrowth on the old files, and then shrinking the old log files at your convenience.
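
A minimal sketch of that log shuffle might look like the following. The database name SalesDB, the logical file names, the F: path, and all of the sizes are hypothetical placeholders, so substitute your own.

```sql
-- Hypothetical names throughout: SalesDB, SalesDB_log, SalesDB_log2, F:\SQLLogs.
-- 1. Allocate a new log file on the faster device.
ALTER DATABASE SalesDB
ADD LOG FILE (NAME = SalesDB_log2,
              FILENAME = 'F:\SQLLogs\SalesDB_log2.ldf',
              SIZE = 4GB, FILEGROWTH = 512MB);

-- 2. Turn off autogrowth on the old log file.
ALTER DATABASE SalesDB
MODIFY FILE (NAME = SalesDB_log, FILEGROWTH = 0);

-- 3. Later, at your convenience, shrink the old file (target size in MB).
USE SalesDB;
DBCC SHRINKFILE (SalesDB_log, 1024);
```

The shrink only reclaims space past the active portion of the log, so it may take a log backup or two before the old file actually gets smaller.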

Moving row data files to a new device can be drudgery, but handling the data files for the indices is simpler than those for the tables. I like to create a folder on the second fastest storage device and call it HSI, for High Speed Indices. Then I add a filegroup to my databases called HSI, and allocate files on the aforementioned storage folder. Getting the indices moved is as simple as rebuilding them onto the new HSI filegroup. Maybe you don't really need all of them over there however, so use a query like this to see what you should rebuild:

SELECT sys.objects.name AS object_name, sys.indexes.name AS index_name,
c.index_columns AS index_columns_key, s.used_page_count * 8   AS IndexSizeKB,
sys.dm_db_index_usage_stats.user_seeks, sys.dm_db_index_usage_stats.user_scans,
sys.dm_db_index_usage_stats.user_updates, f.name AS StorageLoc
FROM sys.objects JOIN sys.indexes ON sys.indexes.object_id=sys.objects.object_id
JOIN (SELECT distinct object_id, index_id,
stuff((SELECT ','+col_name(object_id,column_id ) as 'data()'
FROM sys.index_columns t2
WHERE t1.object_id =t2.object_id and t1.index_id = t2.index_id
FOR XML PATH ('')),1,1,'') as 'index_columns'
FROM sys.index_columns t1
) c ON c.index_id = sys.indexes.index_id AND c.object_id = sys.indexes.object_id
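LEFT OUTER JOIN sys.dm_db_index_usage_stats
ON sys.indexes.index_id=sys.dm_db_index_usage_stats.index_id
AND sys.indexes.object_id=sys.dm_db_index_usage_stats.object_id
AND sys.dm_db_index_usage_stats.database_id = DB_ID() -- usage stats cover every database on the instance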
JOIN sys.dm_db_partition_stats  AS s
ON s.[object_id] = sys.indexes.object_id AND s.index_id = sys.indexes.index_id
JOIN sys.filegroups f ON sys.indexes.data_space_id = f.data_space_id
WHERE sys.objects.type='U' AND sys.indexes.type_desc  <> 'HEAP'
AND sys.indexes.type_desc like '%NONCLUSTER%'
ORDER BY 6 desc
Select the large indices that get scanned and seeked most frequently. Finally you can move some of the data files over to a filegroup you create that points to another folder on the second fastest device (I typically call that folder HSdata). I covered how to move data files between filegroups in another post, but to recap briefly it does require a quiet maintenance window, as this takes considerably more effort.
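
For the index half of that, here is a sketch of creating the HSI filegroup and rebuilding one index onto it. The database, file, path, table, and index names are all hypothetical. The index definition has to be restated because ALTER INDEX ... REBUILD cannot change the filegroup; CREATE INDEX with DROP_EXISTING can.

```sql
-- Hypothetical names: SalesDB, G:\HSI, dbo.Orders, IX_Orders_CustomerID.
ALTER DATABASE SalesDB ADD FILEGROUP HSI;

ALTER DATABASE SalesDB
ADD FILE (NAME = SalesDB_HSI1,
          FILENAME = 'G:\HSI\SalesDB_HSI1.ndf',
          SIZE = 10GB, FILEGROWTH = 1GB)
TO FILEGROUP HSI;

-- Rebuild an existing nonclustered index onto the new filegroup.
-- The index name and key columns must match the index being replaced.
USE SalesDB;
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
WITH (DROP_EXISTING = ON)
ON HSI;
```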

Monday, October 12, 2015

Monitors Are Up!

I spend a good ten percent of every day watching monitoring tools. I have a handful of Dynamic Management View (DMV) queries and some stored procs that I like to run (I covered those in this previous post) but I also use three exceptionally helpful third-party tools. Two of them are free and one definitely is not. Still though I don't know what I'd do without any of them.

A happy SQL needs lots of disk space: tempDB, log files, databases, backups... there always seems to be -more- data growing its tentacles (never -less- oddly enough). As long as you're managing the storage allocation on the SQL boxes frankly you might as well be managing it across your entire shop, so you need a tool like Free Disk Management Software, FDMS. You need to be included in the Administrator group on each server; after that the setup allows you to select which disks you're actually interested in monitoring. Use the Options to set more appropriate bar-split color levels, and slow down the sampling to once every half hour. If you sort the display by free-space (low to high) then the problems obviously aggregate toward the top.

I've read several interesting posts on using the built-in Windows Performance Monitor to measure key SQL metrics, but to get serious you'll want to download this terrific free tool, Magnalis PerfPro, that wraps PerfMon in customizable MDI-mode windows with various display options. First though definitely perform an online search of "using PerfMon to monitor SQL" so you'll know which counters to add.

Finally, the semi-expensive monitoring tool that is worth every penny you spend: Redgate Monitor. This takes time to install and tweak properly, but with its excellent historical recordkeeping, variable instance-specified alerts, and flexibly filtered notification-clearing this tool will help you out with both your immediate concerns as well as your deep research.

Tuesday, September 15, 2015

Instance Optimization

Not too long after cataloging your shop's databases, it's time for some instance tuning. It's sad how many places have their operational guy (or yikes, a developer) install SQL straight off the CD onto a server box, without a second thought as to sizing or tuning. Then they load up a bunch of databases and wonder why performance is so poor.

First set the instance memory. Left at its default, SQL will keep taking memory as its workload grows until it holds nearly everything on the box, so cap each instance explicitly. If you have multiple instances on the same server you should plan for each of them having their own memory space. After adding up the gigabytes used by your SQL instances you still need to allow some gigs for the operating system -- here is a useful guideline for what to allow.

The next option to set is the maximum degree of parallelism (so called max DOP). I like to set this to less than half the CPUs on the box, usually considerably less. Look at it this way: if you have multiple long-running queries (running bulk updates or aggregations) at the same time, each one will take up the max DOP number of processors until you reach the total count, after which new queries will signal-wait until a previous one finishes or gets swapped out. On an 8 CPU box 2 or 3 max DOP is usually plenty.
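
Both the memory cap and max DOP can be set with sp_configure. The values below (26 GB and a max DOP of 2) are example numbers only, picked for an imaginary 8 CPU, 32 GB single-instance box; work out your own from the guidelines above.

```sql
-- Both options are "advanced", so expose them first.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

-- Example value only: cap this instance at 26 GB (the setting is in MB).
EXEC sys.sp_configure N'max server memory (MB)', 26624;

-- Example value only: limit any single query to 2 parallel workers.
EXEC sys.sp_configure N'max degree of parallelism', 2;
RECONFIGURE;
```

Both settings take effect without a service restart.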

Next I like to check that the "Instant File Initialization" option is turned on from the operating system. Unless you're running some high-security dB on crappy attached storage, you should always have this turned on.
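
On newer builds (SQL Server 2016 SP1 and later, as far as I know) you can check this from inside SQL with the query sketched below. On older builds the column doesn't exist, and you instead confirm that the service account holds the "Perform volume maintenance tasks" right in the local security policy.

```sql
-- Shows Y or N per service; only the database engine row is meaningful here.
-- Assumes a build where this column is available.
SELECT servicename,
       service_account,
       instant_file_initialization_enabled
FROM sys.dm_server_services;
```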

Lastly you'll need to allocate sufficient space and enough files for tempDB. I've heard folks say that you should create a tempDB file for each CPU, but I prefer to start with 3 data files, and then monitor it for stalls (see this post about monitoring file IO); after that increase the count as necessary. How you size them depends greatly on how the instance is used: instances where large BI aggregations run need substantial tempDB space. I'll have more to say about sizing tempDB also in a later post.
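
Adding the extra tempDB data files is a pair of ALTER DATABASE statements. The logical names, the T: path, and the sizes here are hypothetical; the usual advice is to keep all of the tempDB data files the same size so the engine spreads its allocations evenly across them.

```sql
-- Hypothetical names, path, and sizes; this brings tempDB to 3 data files.
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2,
          FILENAME = 'T:\TempDB\tempdev2.ndf',
          SIZE = 8GB, FILEGROWTH = 1GB);

ALTER DATABASE tempdb
ADD FILE (NAME = tempdev3,
          FILENAME = 'T:\TempDB\tempdev3.ndf',
          SIZE = 8GB, FILEGROWTH = 1GB);
```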

After an instance has been up and running for a while run this query that shows total waits:
WITH [Waits] AS
    (SELECT
        [wait_type],
        [wait_time_ms] / 1000.0 AS [WaitS],
        ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
        [signal_wait_time_ms] / 1000.0 AS [SignalS],
        [waiting_tasks_count] AS [WaitCount],
        100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage],
        ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum]
    FROM sys.dm_os_wait_stats
    WHERE [wait_type] NOT IN (
        N'BROKER_TASK_STOP',                N'BROKER_TO_FLUSH',
        N'CHKPT',                           N'CLR_AUTO_EVENT',
        N'CLR_MANUAL_EVENT',                N'CLR_SEMAPHORE',
        N'EXECSYNC',                        N'FSAGENT',
        N'HADR_TIMER_TASK',                 N'HADR_WORK_QUEUE',
        N'KSOURCE_WAKEUP',                  N'LAZYWRITER_SLEEP',
        N'LOGMGR_QUEUE',                    N'ONDEMAND_TASK_QUEUE',
        N'SERVER_IDLE_CHECK',               N'SLEEP_BPOOL_FLUSH',
        N'SLEEP_DBSTARTUP',                 N'SLEEP_DCOMSTARTUP',
        N'SLEEP_SYSTEMTASK',                N'SLEEP_TASK',
        N'WAITFOR',                         N'WAITFOR_TASKSHUTDOWN',
        N'XE_DISPATCHER_WAIT',              N'XE_TIMER_EVENT')
    AND [waiting_tasks_count] > 0
    )
SELECT
    MAX ([W1].[wait_type]) AS [WaitType],
    CAST (MAX ([W1].[WaitS]) AS DECIMAL (16,2)) AS [Wait_S],
    CAST (MAX ([W1].[ResourceS]) AS DECIMAL (16,2)) AS [Resource_S],
    CAST (MAX ([W1].[SignalS]) AS DECIMAL (16,2)) AS [Signal_S],
    MAX ([W1].[WaitCount]) AS [WaitCount],
    CAST (MAX ([W1].[Percentage]) AS DECIMAL (5,2)) AS [Percentage],
    CAST ((MAX ([W1].[WaitS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgWait_S],
    CAST ((MAX ([W1].[ResourceS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgRes_S],
    CAST ((MAX ([W1].[SignalS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgSig_S]
FROM [Waits] AS [W1]
INNER JOIN [Waits] AS [W2]
    ON [W2].[RowNum] <= [W1].[RowNum]
GROUP BY [W1].[RowNum]
HAVING SUM ([W2].[Percentage]) - MAX ([W1].[Percentage]) < 95; -- percentage threshold
Take a look at the top one or two waits on your instance and then do a bit of online research to discover the cause, and then fix it! There, you've now done the bulk of your instance tuning.

Thursday, August 6, 2015

A Reasonable DR

I racked my brains over Disaster Recovery concepts for several months before I reached a viable plan. The key was when the light bulb went on that you must plan for two distinct types of disasters (call them major and catastrophic). You need to plan for both, but quite possibly you can be a little more lenient on yourself for the catastrophes. I'm not going to get too far into the weeds with my specific implementation as yours will likely be different. How you recover from each disaster really depends on the perceptions of your customers (what is their range of allowable excuses) and your employer's ownership contingencies (what is their insurance like and what are their long-term objectives). Things that you read in the "trades" about recovery time objective and recovery point objective (RTO and RPO) are still relevant, but they are different under each of the scenarios.

A "major" is when the halon failed and a fire in your server room melted your NAS. Maybe some other equipment survived, and all the hardware outside the server room is just fine. Under this scenario both your customers and your management will extend you a fair amount of sympathy but they still expect some kind of reasonable recovery efforts. It varies widely by circumstance, but in my case they would expect that after a couple of days our systems would be up and running again, and at worst we might have to enter a half day's data over.

A "catastrophic" is when Hurricane Jeff rolls in from the Gulf, your whole building is flooded, half the state is without power for two weeks, trees are down, roads are impassable, and the National Guard is called out to maintain order and provide drinking water. In my case I suspect management could actually be called in to decide whether to walk away from the business by declaring a total insurance loss. If they did decide to continue, they could probably convince our customers that a month-long gap in service was totally justified (of course we'd waive a couple months of bills out of courtesy).

You have to plan for recovery from a major disaster with the resources at your near immediate disposal. It helps to have both onsite and offsite copies of your software and your data; onsite copies are typically faster to access and should be counted on as the primary recovery source. Offsite resources might be compromised by the disaster (such as if the fire took down your switch hardware servicing your cloud provider). Offsite resources however may provide some resilience should you not end up with a complete set of onsite backups after the disaster.

You have to plan for catastrophic recovery from solely the offsite resources. You may need to procure new hardware in a new location and your old location might be permanently inaccessible.

Once you've got these concepts firmly in mind and know for each the RTO and RPO, you can work backward to determine your backup strategy. It doesn't make sense to back up your log files locally more often than your major disaster RPO requires, or to perform offsite database log copies more frequently than your catastrophic RPO requires.
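
To make that concrete, here is a sketch assuming a hypothetical database SalesDB with a local backup folder D:\SQLBackups. The first statement is the log backup you would schedule at your major-disaster RPO interval; the second reads the backup history in msdb so you can check that the newest log backup really is younger than that RPO.

```sql
-- Hypothetical database name and path. Schedule this at the RPO interval.
BACKUP LOG SalesDB
TO DISK = N'D:\SQLBackups\SalesDB_log.trn'
WITH CHECKSUM;

-- When was the most recent log backup for each database?
SELECT database_name,
       MAX(backup_finish_date) AS last_log_backup
FROM msdb.dbo.backupset
WHERE type = 'L'              -- L = transaction log backup
GROUP BY database_name
ORDER BY last_log_backup;
```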

Planning to meet the Recovery Time Objective is rather more nuanced, and likely takes actual practice to discover. How long does it take you to restore a copy of each of your databases? How long to restore the subsequent differentials and log backups? Until you try it you won't know for sure. Also consider that your charges are usually blessed with running on the fastest IO hardware; in a disaster or catastrophe you may well be restoring to considerably slower boxes.

There now, that was fun, wasn't it!

Wednesday, July 8, 2015

Physical Design

Likely everyone who makes it to the position of a database administrator has been through the ropes of serving as an elementary developer. By now you already know how to design tables, views, relationships between tables, stored procs, on and on. Early in your administrator career though you will run into the underworkings of the database engine, and recognize that it is time to face straight up and address the physical design of your databases. As much as poorly designed normalization can hose your system's queries, the underlying physical design can wreak just as much havoc.

Physical design covers several related aspects of adjustments to the databases, but deep down it's all about disk IO across the various files. Adding server memory nowadays is a fast and inexpensive affair, so the limiting factor on operational performance in most database systems is the IO throughput. Fast IO can be achieved numerous ways, but once you've established a baseline for the physical hardware and made sure you don't have any bottlenecks in disk controllers or bandwidth to your NAS, then what remains is spreading out the active database files across various drives and controllers to maximize the IO parallelism.

I use a couple different management queries to identify places where files should be split. These two queries below tell you where your usage is heaviest on your SQL instance:
WITH FileIO AS  -- the CTE name is illustrative
(
SELECT
 DB_NAME(VFS.database_id) AS DatabaseName
,CASE WHEN smf.type = 1 THEN 'LOG_FILE' ELSE 'DATA_FILE' END AS DatabaseFile_Type
,SUM(VFS.num_of_bytes_written) AS IO_Write, SUM(VFS.num_of_bytes_read) AS IO_Read
,SUM(VFS.num_of_bytes_read + VFS.num_of_bytes_written) AS Total_IO
,SUM(VFS.io_stall) AS IO_STALL
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS VFS
JOIN sys.master_files AS smf   ON VFS.database_id = smf.database_id
 AND VFS.file_id = smf.file_id
GROUP BY  DB_NAME(VFS.database_id), smf.type
)
SELECT
 DatabaseName, DatabaseFile_Type
,CAST(1.0 * IO_Read/ (1024 * 1024) AS DECIMAL(12, 2)) AS IO_Read_MB
,CAST(1.0 * IO_Write/ (1024 * 1024) AS DECIMAL(12, 2)) AS IO_Write_MB
,CAST(1. * Total_IO / (1024 * 1024) AS DECIMAL(12, 2)) AS IO_TOTAL_MB
,CAST(IO_STALL / 1000. AS DECIMAL(12, 2)) AS IO_STALL_Seconds
-- each row's share of all stall time on the instance
,CAST(100. * IO_STALL / NULLIF(SUM(IO_STALL) OVER(), 0) AS DECIMAL(10, 2)) AS IO_STALL_Pct
FROM FileIO
ORDER BY IO_STALL_Seconds DESC;

select db_name(mf.database_id) as database_name, mf.physical_name,
left(mf.physical_name, 1) as drive_letter,
vfs.num_of_writes, vfs.num_of_bytes_written, vfs.io_stall_write_ms,
mf.type_desc, vfs.num_of_reads, vfs.num_of_bytes_read, vfs.io_stall_read_ms,
vfs.io_stall, vfs.size_on_disk_bytes
from sys.master_files mf
join sys.dm_io_virtual_file_stats(NULL, NULL) vfs
on mf.database_id=vfs.database_id and mf.file_id=vfs.file_id
order by vfs.num_of_bytes_written desc
The files that pop out from this with the highest stall percentages should likely be split and spread. Of course you need to pay special attention to tempDB as it gets shared across all of the applications running on each particular instance (I'll have more to say about allocating tempDB in a later post).

Once you've split up and created multiple files for your heavily used logs and databases, it's time to dig into some deeper level detail. This query below shows you table IO activity on a particular database; you should select your database connection (or add a USE statement) to see the details of one database at a time.
SELECT TableName = object_name(s.object_id),
       Reads = SUM(user_seeks + user_scans + user_lookups), Writes =  SUM(user_updates)
FROM sys.dm_db_index_usage_stats AS s
INNER JOIN sys.indexes AS i ON s.object_id = i.object_id
AND i.index_id = s.index_id
WHERE objectproperty(s.object_id,'IsUserTable') = 1
AND s.database_id = DB_ID()
GROUP BY object_name(s.object_id)
This will show you which specific tables are being heavily read and written to. Not infrequently I like to take the heavyweights and move them to their own filegroup.
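
One common way to do that move, sketched here with hypothetical names, is to rebuild the table's clustered index onto the target filegroup, since the clustered index is the table's row data. The restated definition must match the existing index's name and keys, the filegroup (called HotTables here) must already exist with at least one file, and this is best done in a quiet window as it rewrites the whole table.

```sql
-- Hypothetical names: dbo.Orders, PK_Orders, OrderID, filegroup HotTables.
-- Rebuilding the clustered index with DROP_EXISTING relocates the row data.
CREATE UNIQUE CLUSTERED INDEX PK_Orders
ON dbo.Orders (OrderID)
WITH (DROP_EXISTING = ON)
ON HotTables;
```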

You can do more IO tuning down at the query level, but that's a topic for another post.

Monday, June 22, 2015

The Best Tool

DBAs have all sorts of scripts and tools for managing SQL Server databases. In day to day operations you are down in the weeds with query plans and performance statistics and various monitors. You pop up a level and daily check the database backups and size of the log files. Every once in a while you lift on up to the "instance" level and check out the server logs or disk throughput.

Every other month or so I escape all the way out to the top and review once again what has turned out to be my best tool for managing the shop overall: an Excel spreadsheet. You see, new databases seem to materialize on various instances all of the time, and developers even have a habit of spinning up new (poorly configured) instances all by themselves as well. So this spreadsheet helps me keep track of the whole mess and figure out where my DBA love needs to focus next.

Let me go into some detail about what the layout and each column signify, and why they are important. Each instance has its own section; in the header adjacent to the instance name I show the SQL version, the processor count and speed, and the memory available on the server. Next to that is its backup schedule with an indication of frequency and level of manual intervention. This row of information comes in handy both when the CIO wants to suggest a different configuration on the VM boxes, and when normal backups run out of space and I need to review when they /should/ have happened.

Next comes a subheader row with column labels, and then a row (or more) for each database on that instance. The column next to the database name has the size of the data storage in KB. This is useful as every few months when I return to update the spreadsheet, I can readily tell which databases are most actively growing. Next comes the recovery model, Simple or Full (more about this in a later post).

Next comes the location of the mdf database file; if the database has secondary filegroups then I show the location of the secondary files below this (with as many rows as necessary to cover all the secondaries). Then the location of the log file and the location of the backups. Note that to make the document more legible I abbreviate the often long path references with a double-letter pseudo disk designation, which is described in a legend at the bottom of each section.

Listing the file locations serves a couple of useful purposes. When you are first assembling this spreadsheet the discovery and documentation process helps you avoid the high risk of having all the files on a single physical drive (a notoriously dangerous arrangement). Later, when reviewing disk latency, you may find the spreadsheet useful in avoiding disk bottlenecks. Finally in the dreaded clutches of disaster recovery, you will know where your lifesaving logs and backups are located.

Next comes a column I have labeled Write Coupling. This and the next column, Software Volatility, relate to how you design your recovery models. For each database I provide a soft subjective rating from very low to very high of how tightly the dB writes are coupled to intricate processes, and how much the underlying update software is changing (more on this in the later post about recovery models).

Next follows a column labeled SME, for subject matter expert. Typically one of the lead developers at your company knows the most about this particular database, and if you need to migrate tables or take the database offline for maintenance then this is the person with whom you will get clearance. When you are first cobbling together this spreadsheet it will be smart to hold a few meetings with the developers to nail this down.

Finally comes a column for Notes that mostly alerts me to tasks and technical debt that need to be worked down, or any special circumstances that excuse a database from normal maintenance processes.
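
Several of the factual columns above (size, recovery model, and file locations) can be pulled from the catalog views rather than typed in by hand. This is only a sketch of one way to gather them, run once per instance; the subjective columns still have to come from you and your developers.

```sql
-- One row per database file on the instance.
-- sys.master_files reports size in 8 KB pages, hence the multiplication.
SELECT d.name AS database_name,
       d.recovery_model_desc,
       mf.type_desc,                       -- ROWS or LOG
       mf.name AS logical_name,
       mf.physical_name,
       CAST(mf.size AS bigint) * 8 AS size_kb
FROM sys.databases AS d
INNER JOIN sys.master_files AS mf
    ON mf.database_id = d.database_id
ORDER BY d.name, mf.type, mf.file_id;
```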

You can find all sorts of costly tools to assist you in managing an array of databases, but far and away the most useful and cost effective is this simple spreadsheet.