Monday, May 23, 2016

DMVs for Current Activity

Aside from the three external tools I mentioned in this previous post, I also keep a pocketful of queries and stored procs that I run to periodically monitor performance. I'm afraid I can't take credit for any of these: some I've picked up from reading various SQL books or online forums, others were written by quite skilled folks and you can download them from their website. Often times I modify them slightly to give me a better understanding of what I'm actually looking for or to hone them to show a more concentrated set of issues.

I have a separate group of DMV queries I use more for database tuning and analysis, but those in this post are the ones I'm constantly running to address alerted and critical real-time issues.

For a really fast (and low overhead) look at what is happening right now on an instance I use this variant of who2:

CREATE TABLE #sp_who2 (SPID INT,Status VARCHAR(255),
      Login  VARCHAR(255),HostName  VARCHAR(255),
      BlkBy  VARCHAR(255),DBName  VARCHAR(255),
      Command VARCHAR(255),CPUTime INT,
      DiskIO INT,LastBatch VARCHAR(255),
      ProgramName VARCHAR(255),SPID2 INT,
      REQUESTID INT)
     
INSERT INTO #sp_who2 EXEC sp_who2

SELECT     distinct SPID, Status, Login, HostName, BlkBy, DBName, Command, LastBatch, ProgramName
FROM        #sp_who2
where Command not in ('AWAITING COMMAND', 'TASK MANAGER', 'BRKR TASK')
and DBName is not null
and SPID <> @@SPID

drop table #sp_who2
The best thing about the results of this query is a quick glance at the BlkBy column tells me right off the bat who is blocking whom. If I get more concerned and don't mind the overhead of incurring a little extra CPU I run these two:

exec sp_whoisactive

select r.session_id
   ,r.status
   ,substring(qt.text,r.statement_start_offset/2,
   (case when r.statement_end_offset = -1
   then len(convert(nvarchar(max), qt.text)) * 2
   else r.statement_end_offset end - r.statement_start_offset)/2)
   as query_text, qt.dbid, qt.objectid, r.cpu_time, r.total_elapsed_time
   , r.reads,r.writes, r.logical_reads, r.scheduler_id
 from sys.dm_exec_requests as r
 cross apply sys.dm_exec_sql_text(sql_handle) as qt
 inner join sys.dm_exec_sessions as es on r.session_id = es.session_id
 where es.is_user_process = 1
 order by r.cpu_time desc
The first of this set, WhoIsActive, is an invaluable freeware stored proc by Adam Machanic: a quick click on the XML query column pops up a new window showing you the precise SQL that is executing. By all means if you find this utility useful I'd recommended making a donation to this guy on his website.

If my various external monitors seem to suggest IO bottlenecks then I like to run this query:

SELECT        DB_NAME(dbid) AS ConnectedToDB, hostname, program_name,
    loginame, cpu, physical_io, memusage, login_time, last_batch, status
FROM            sysprocesses
WHERE        (status <> 'background') AND (program_name NOT LIKE 'SQLAgent%') AND (cpu > 0)
ORDER BY physical_io DESC
GO
This shows the currently connected IO hogs. You can then see if they have any long running queries with the WhoIsActive.  If instead they are running millions of little queries with an ETL tool or some other application, then you will need to run sp_sqltrace, another work of art by a serious SQL developer that you can download for free.

Finally if the CPU is getting pegged chances are either it's tons of recompiles, or a missing index.  This will show  the recompiles:

SELECT        sys.dm_exec_cached_plans.refcounts, sys.dm_exec_cached_plans.usecounts, sys.dm_exec_cached_plans.cacheobjtype, sys.dm_exec_cached_plans.objtype,
                          left(sys.dm_exec_sql_text.text, 50) as SQLpartial, sys.dm_exec_sql_text.text as SQLstmt
into #tempCompiles                         
FROM            sys.dm_exec_cached_plans CROSS APPLY sys.dm_exec_sql_text(sys.dm_exec_cached_plans.plan_handle)
WHERE sys.dm_exec_sql_text.text LIKE '%SELECT %'
AND sys.dm_exec_sql_text.text NOT LIKE '%backup%'
and usecounts = 1

SELECT        SUM(refcounts) AS howMany, SQLpartial
into #tempCompileTotals
FROM            #tempCompiles
GROUP BY SQLpartial

SELECT        SUM(CT.howMany) AS howManyCompiles, MIN(CD.SQLstmt) AS sqlExample
FROM            #tempCompiles AS CD INNER JOIN
                         #tempCompileTotals AS CT ON CD.SQLpartial = CT.SQLpartial
GROUP BY CD.SQLpartial
ORDER BY howManyCompiles DESC

DROP table #tempCompiles
DROP table #tempCompileTotals

GO
If you have bad recompiles then the best bet is to chat with the responsible developer and have them create less dynamic queries, perhaps by storing the feeding parameters in static tables. As a last resort you can set the Forced Parameterization option on the instance, but this has some other compensating drawbacks.

Here's the query I like to use for missing indices:

SELECT        mid.statement, round(migs.user_seeks * migs.avg_total_user_cost * migs.avg_user_impact / 100, 0) as imPactor,
        migs.unique_compiles, migs.user_seeks, migs.user_scans, migs.last_user_seek, migs.last_user_scan,
        migs.avg_total_user_cost, migs.avg_user_impact, mig.index_group_handle, mig.index_handle,
                         mid.object_id, mid.equality_columns, mid.inequality_columns,
                         mid.included_columns
FROM            sys.dm_db_missing_index_group_stats AS migs INNER JOIN
                         sys.dm_db_missing_index_groups AS mig ON migs.group_handle = mig.index_group_handle INNER JOIN
                         sys.dm_db_missing_index_details AS mid ON mig.index_handle = mid.index_handle
WHERE        (mid.database_id = DB_ID())
and migs.last_user_seek > GETDATE() - 14
ORDER BY 2 DESC

You don't have to build all the missing indices, but after some use you'll be able to eye those that stand out above a critical threshold. Also note that if this suggests tons of "included" columns then the issue on your table is larger than just a missing index -- likely you need to split the table widthwise or check if most of the columns should be marked "sparse."

Friday, May 6, 2016

IO Stalls and Latency

One of the more useful optimizations you can run on a database instance if you are unfortunate enough to be on old fashioned spinning media (as opposed to SSDs) is to split up files to reduce the amount of stalls. These can be either database or log files, for either user or system databases, reads or writes. In an earlier post I gave a couple of queries you could run to get a quick gross overview, but now let's get further into the weeds and fine tune some of those slower files in detail. Run these queries:
select db_name(mf.database_id) as database_name, mf.physical_name,
(vfs.num_of_bytes_read / vfs.io_stall_read_ms) as BytesPerStall,
mf.type_desc, vfs.num_of_reads, vfs.num_of_bytes_read, vfs.io_stall_read_ms,
vfs.io_stall, vfs.size_on_disk_bytes
from sys.master_files mf
join sys.dm_io_virtual_file_stats(NULL, NULL) vfs
on mf.database_id=vfs.database_id and mf.file_id=vfs.file_id
where vfs.io_stall_read_ms > 0
order by 3

select db_name(mf.database_id) as database_name, mf.physical_name,
(vfs.num_of_bytes_written / vfs.io_stall_write_ms) as BytesPerStall,
mf.type_desc, vfs.num_of_writes, vfs.num_of_bytes_written, vfs.io_stall_write_ms,
vfs.io_stall, vfs.size_on_disk_bytes
from sys.master_files mf
join sys.dm_io_virtual_file_stats(NULL, NULL) vfs
on mf.database_id=vfs.database_id and mf.file_id=vfs.file_id
where vfs.io_stall_write_ms > 0
order by 3
This returns two resultsets, the first is problem stalls on reads, the second is problem stalls on writes. Do pay attention to the column showing number of bytes read and written; likely it's a waste of time to optimize the smaller files. Now that you know generally who your problem children are, you have a couple of matters of recourse. If the stalls are happening in a log file, then it's time to increase the number of log files or move them to a faster device. If the latency is in a table (a file of type "rows") then you need to dig a little deeper; run this query attached to the database that concerns you.
SELECT TableName = object_name(s.object_id),
       Reads = SUM(user_seeks + user_scans + user_lookups), Writes =  SUM(user_updates)
FROM sys.dm_db_index_usage_stats AS s
INNER JOIN sys.indexes AS i ON s.object_id = i.object_id
AND i.index_id = s.index_id
WHERE objectproperty(s.object_id,'IsUserTable') = 1
AND s.database_id = DB_ID()
GROUP BY object_name(s.object_id)
ORDER BY writes DESC
Of course change the Order By clause for whether you are researching the stalls on reads or on the writes. Likely the top couple of tables in this returned set should be spun off to their own filegroup with multiple files (or possibly their clustered index needs reorganizing -- check their fragmentation).