Thursday, July 20, 2017

Fixing a Slow Query

I've written a full category of posts on how to optimize your database instance, indices, and storage in general. Quite frequently, though, the DBA finds themselves responding to a specific performance concern: one particular query is running especially slowly. It used to run fine, but now it is timing out. I can't even begin to tell you how frequently this happens. The usual cause is one of two things: blocking, or an obsolete plan cache.

Blocking is quite easy to ferret out; this filtered capture of sp_who2 will rapidly show you any issues:

CREATE TABLE #sp_who2 (
    SPID INT,
    Status VARCHAR(255),
    Login VARCHAR(255),
    HostName VARCHAR(255),
    BlkBy VARCHAR(255),
    DBName VARCHAR(255),
    Command VARCHAR(255),
    CPUTime INT,
    DiskIO INT,
    LastBatch VARCHAR(255),
    ProgramName VARCHAR(255),
    SPID2 INT,
    REQUESTID INT
)

INSERT INTO #sp_who2 EXEC sp_who2

SELECT DISTINCT SPID, Status, Login, HostName, BlkBy, DBName, Command, LastBatch, ProgramName
FROM #sp_who2
WHERE Command NOT IN ('AWAITING COMMAND', 'TASK MANAGER', 'BRKR TASK')
  AND DBName IS NOT NULL
  AND SPID <> @@SPID

DROP TABLE #sp_who2

Naturally your favorite instance monitor (such as Red Gate) will also show blocking. Why would blocking suddenly start happening? Typically a developer writes a new long-running stored proc (or a query that pulls all of the data across several large tables) and doesn't use the NOLOCK hint. NOLOCK isn't always the appropriate prescription, but it is a reasonable choice for reporting when it is clear no production updates will be running at the same time. (NOLOCK is equivalent to the READ UNCOMMITTED isolation level.)
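To illustrate that equivalence, here is a minimal sketch; the dbo.Orders table and its columns are hypothetical, not from any system in this post:

-- Hint form: this single read ignores locks held by other transactions.
SELECT OrderID, Status
FROM dbo.Orders WITH (NOLOCK);

-- Session form: every subsequent read in this session behaves the same way.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT OrderID, Status
FROM dbo.Orders;

Either way, the trade-off is identical: the read will not block behind writers, but it can return uncommitted (dirty) rows.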

When a long-running stored proc is blocking, you will likely need to get a developer involved for troubleshooting. Frequently it makes sense to break the work into batches (of, say, 10,000 records each) and loop, committing each smaller parcel separately.
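The batching pattern can be sketched as follows; the table name and filter here are hypothetical, purely to show the loop shape:

-- Delete in 10,000-row chunks so each statement commits quickly and
-- holds its locks only briefly, rather than one giant blocking transaction.
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (10000)
    FROM dbo.StagingRows
    WHERE ProcessedDate < DATEADD(MONTH, -6, GETDATE());

    SET @rows = @@ROWCOUNT;  -- loop until no rows remain to delete
END

The same shape works for batched UPDATEs; the key point is that each iteration is its own short transaction.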

By far the most frequent cause of long-running timeouts, however, relates to how developers create code. When they start on a project the source tables are small, so when they run the query the first few times the plan engine says, "Hey, I can full scan this." Many of the joins are small enough to be cached completely in memory. Once the tables have been up and running for a few months and have accumulated several hundred thousand rows, the old way of filtering and joining is naturally too slow for the engine.

I do read of DBAs who take the simple step of clearing the plan cache, but I prefer a more nuanced approach. Most of the time what works best is actual inspection of the SQL statement, pulling it apart to deduce which new indices would help. The objective is to create an environment where the engine only has to scan or seek indices without going to the base tables. Here's a perfect example... the developer said it used to run in a couple of seconds originally but now exceeds the 30-second default .NET timeout:

SELECT DISTINCT tv.VIN_ID AS 'VIN', d.DealerCode AS 'DealerCode',
    mc.[Segment] AS 'Segment', tv.UserLogin AS 'UserLogin', tv.Destination
FROM [TXN3_VIN] tv WITH (NOLOCK)
INNER JOIN [MarketingList] mc ON mc.VIN = tv.VIN_ID AND mc.MonthID = tv.MonthID
INNER JOIN dealer d ON d.DealerID = tv.DealerID
WHERE tv.Destination = @dest AND tv.SubmitDateTime = @timeStamp
    AND tv.Campaign = 1
ORDER BY tv.[VIN_ID], mc.[Segment]

When I looked at the query plan, the execution count estimates on the index seeks were rather small, in the low hundreds. When I checked the base tables, though, they had hundreds of thousands of rows. The query designer didn't alert on any missing indices, but clearly something was awry here. Time for a deeper inspection.
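Beyond the graphical plan, SQL Server's missing-index DMVs sometimes surface suggestions the query designer does not. This is a standard diagnostic query; treat the output as hints to weigh critically, not as indices to create blindly:

SELECT TOP (10)
       mid.statement AS TableName,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_user_impact
FROM sys.dm_db_missing_index_details mid
JOIN sys.dm_db_missing_index_groups mig
  ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats migs
  ON migs.group_handle = mig.index_group_handle
ORDER BY migs.avg_user_impact * migs.user_seeks DESC;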

First, you have to think like the SQL engine. The most restrictive part of the selection process in this query is the WHERE clause, so if you can create an index using the columns of the WHERE clause, that will certainly help.

CREATE NONCLUSTERED INDEX [ix_TXN3_subDestCamp] ON [dbo].[TXN3_VIN]
    ([SubmitDateTime] ASC,
    [Destination] ASC,
    [Campaign] ASC)
INCLUDE ([UserLogin], [VIN_ID], [MonthID])
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
    IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
    ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 85) ON [PRIMARY]

Notice I included the columns UserLogin, VIN_ID, and MonthID: VIN_ID and MonthID are there because the engine will need them when it joins to the MarketingList table, and UserLogin is there because the output results require it. Now that that part is fixed, consider the join. The engine will need these columns from the MarketingList table to join:

CREATE NONCLUSTERED INDEX [ix_MktgList_VinMonth] ON [dbo].[MarketingList]
    ([VIN] ASC,
    [MonthID] ASC)
INCLUDE ([Segment])
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
    IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
    ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 85) ON [PRIMARY]

On this index I included the Segment column, because the output results will require it.

After building these two indices, the query executed in less than half a second. Problem solved.
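If you want to confirm after the fact that new indices like these are actually being seeked rather than scanned, the index usage DMV is a quick check (the index names below are the two created above):

SELECT i.name, ius.user_seeks, ius.user_scans, ius.user_lookups
FROM sys.dm_db_index_usage_stats ius
JOIN sys.indexes i
  ON i.object_id = ius.object_id AND i.index_id = ius.index_id
WHERE ius.database_id = DB_ID()
  AND i.name IN ('ix_TXN3_subDestCamp', 'ix_MktgList_VinMonth');

High user_seeks with low user_scans is what you want to see here.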
