Monday, April 17, 2017

Grokking Query Plans

In an earlier post I gave you some good clues to identify the problematic queries on your instance. After you've finished the initial review in the Designer to resolve bad joins and poorly formed filters, you should click the button "Show Execution Plan". Right off the bat you may see a red warning of a missing index. If the projected savings are high then you should stop and build that index first, before you continue examining the plan.

There are a few hints you should know about when reviewing an execution plan. First off check the width (the thickness) of the arrows joining the elements. Thicker paths indicate more voluminous records are traversing. Now examine the percentage shown underneath each element. They are approximate but indicate how much processing is dedicated to each step. Work from the bottom right corner of the graph to the upper left corner (for some reason the engine displays the actual dataflow sequence right to left, bottom to top).

Generally "scans" are way worse than "seeks," and operations on tables will be slower than the same on an index. The exception is a clustered index scan, which in actuality is a regular table scan (that has a clustered index).

To look at it strictly from the engine's viewpoint: the elapsed time for access depends upon the number of rows it has to read, multiplied by the quantity of times it needs to read them, multiplied by the number of pages it needs to read (determined by the width of the sum of the columns it needs to access and then write to tempdb). From that viewpoint you want to minimize read and write lengths, counts, and widths.

Hover your mouse over one of the query plan boxes and it will show you the estimated number of rows it expects to read, and how often it thinks it need to repeat this operation (number of executions). Table scans (and clustered index scans) of wide tables with many rows should be avoided.

Even with all its smarts the plan engine is only about half as bright as you. The optimizer won't always suggest a helpful index even if you could easily create one. If you examine the criteria in the where clause and the way tables are joined, you can almost always come up with an effective index encapsulating what the engine needs for its filter and joins (thus turning scans into seeks). Furthermore if the where clause has a range of selection (for example "> getdate() - 7") you can build a filtered index that will be even faster.

The primary objective is to narrow down the length and width of your resultset as early into the dataflow as possible, to reduce the stress of joins further up the line. Many times you can do this by building an included subquery, or by splitting out the keys of the targeted records into a temp table to begin with in a first step. In other words, you want to delay adding in the fat wide rows until the end of the dataflow, when you have already applied the necessary filters to determine the count of rows you will be returning.

Spend a bit of occasional love reviewing the worst queries and if the fix requires more than just adding an index, pass your recommendations back to the development staff so they can learn and improve while going forward.