Monday, October 16, 2017

IO Stalls and Latency

One of the more useful optimizations you can run on a database instance if you are unfortunate enough to be on old fashioned spinning media (as opposed to SSDs) is to split up files to reduce the amount of stalls. These can be either database or log files, for either user or system databases, reads or writes. In an earlier post I gave a couple of queries you could run to get a quick gross overview, but now let's get further into the weeds and fine tune some of those slower files in detail. Run these queries:
select db_name(mf.database_id) as database_name, mf.physical_name,
(vfs.num_of_bytes_read / vfs.io_stall_read_ms) as BytesPerStall,
mf.type_desc, vfs.num_of_reads, vfs.num_of_bytes_read, vfs.io_stall_read_ms,
vfs.io_stall, vfs.size_on_disk_bytes
from sys.master_files mf
join sys.dm_io_virtual_file_stats(NULL, NULL) vfs
on mf.database_id=vfs.database_id and mf.file_id=vfs.file_id
where vfs.io_stall_read_ms > 0
order by 3

select db_name(mf.database_id) as database_name, mf.physical_name,
(vfs.num_of_bytes_written / vfs.io_stall_write_ms) as BytesPerStall,
mf.type_desc, vfs.num_of_writes, vfs.num_of_bytes_written, vfs.io_stall_write_ms,
vfs.io_stall, vfs.size_on_disk_bytes
from sys.master_files mf
join sys.dm_io_virtual_file_stats(NULL, NULL) vfs
on mf.database_id=vfs.database_id and mf.file_id=vfs.file_id
where vfs.io_stall_write_ms > 0
order by 3
This returns two resultsets, the first is problem stalls on reads, the second is problem stalls on writes. Do pay attention to the column showing number of bytes read and written; likely it's a waste of time to optimize the smaller files. Now that you know generally who your problem children are, you have a couple of matters of recourse. If the stalls are happening in a log file, then it's time to increase the number of log files or move them to a faster device. If the latency is in a table (a file of type "rows") then you need to dig a little deeper; run this query attached to the database that concerns you.
SELECT TableName = object_name(s.object_id),
       Reads = SUM(user_seeks + user_scans + user_lookups), Writes =  SUM(user_updates)
FROM sys.dm_db_index_usage_stats AS s
INNER JOIN sys.indexes AS i ON s.object_id = i.object_id
AND i.index_id = s.index_id
WHERE objectproperty(s.object_id,'IsUserTable') = 1
AND s.database_id = DB_ID()
GROUP BY object_name(s.object_id)
ORDER BY writes DESC
Of course change the Order By clause for whether you are researching the stalls on reads or on the writes. Likely the top couple of tables in this returned set should be spun off to their own filegroup with multiple files (or possibly their clustered index needs reorganizing -- check their fragmentation).