Thursday, December 14, 2017

Protecting Key Tables

In most production systems you've got a half dozen to a couple dozen tables that contain configuration or system-wide critical information: things like state abbreviation tables, printer routing tables, user rights tables, hierarchies, workflow parameters, flag mapping key-value pair descriptions, et cetera. Generally you want to lock these tables down pretty tightly, so that nobody can accidentally update or delete a row without some broader awareness of what is happening. The way to do this is with a Database Role. At first the concept may seem confusing, so I'll try to tease out the particulars in a way that slowly reveals its sense.

You already are familiar with the Instance level Server Roles, things like sysadmin and public. When you click on the Security folder off of the instance level, and then the folder under that called Server Roles, you can see all of the Instance level roles.

You also have seen the Database level Roles, for example after you add a new user login and then set up their access to the databases. You are familiar with db_datareader, db_datawriter, db_ddladmin, db_owner, et cetera. If you want to see these roles from the top down (rather than from the Logins mapping level) then click on a database, expand the Security folder under that database, and then click on the Roles folder, and then double-click the Database Roles folder.

This is where you add a Role to protect the tables within a database. Right-click and select New Database Role. Give it a name (I like to use the name KeyTablesReadOnly). In a moment we'll revisit the remainder of this setup, but first a short digression. The point I'm making here is that data-protection roles belong at the database level of Roles, not at the Instance level.

How do you identify what tables to protect? A good clue on a fairly mature production database is to sort the tables in the Object Explorer Details pane by the column titled Data Space Used. Those that are small in size (and non-zero) likely contain the key configuration and mapping information. If you're using something like "Toad" you can select and paste these table names into a spreadsheet (which comes in handy shortly).

You probably will need to confirm these table names with the Development Manager, so send that spreadsheet off for approval before proceeding. When you get confirmation, go back and double-click on that KeyTablesReadOnly database Role. The properties pane has two areas: "General" is for assigning the role to specific users, and "Securables" is for setting the protection level on objects within the database. On the tables that you want to protect, check the Deny column next to Delete, Insert, and Update, and then go ahead and add all the users (or leave a couple of users with full rights if they specifically are responsible for maintaining these tables).

If somebody needs to do an update, then the fast way to adjust them is to go to the Instance security folder, click on Logins, then their User mapping, highlight the database affected, and then remove them from the KeyTablesReadOnly role.
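The SSMS click-path above can also be scripted. Here's a minimal T-SQL sketch; the table and user names are stand-ins for your own:

```sql
-- Create the role in the target database
CREATE ROLE KeyTablesReadOnly;

-- Deny destructive statements on each key table to the role
DENY DELETE, INSERT, UPDATE ON dbo.StateAbbreviations TO KeyTablesReadOnly;
DENY DELETE, INSERT, UPDATE ON dbo.PrinterRouting     TO KeyTablesReadOnly;

-- Put the ordinary users into the role
ALTER ROLE KeyTablesReadOnly ADD MEMBER SomeUser;

-- When SomeUser legitimately needs to do an update, drop them back out
ALTER ROLE KeyTablesReadOnly DROP MEMBER SomeUser;
```

The DENY wins over any GRANT the user picks up elsewhere, which is exactly the behavior you want for these key tables.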

Tuesday, November 14, 2017

Memory Used

In an earlier post I mentioned some general guidelines for optimizing each instance of the SQL engine. Once you've got the memory set up, however, it is quite interesting to observe how the engine is actually using it. Run these two queries; they both show pretty much the same thing, but you may find one more useful than the other.
SELECT (CASE WHEN ([is_modified] = 1) THEN 'Dirty' ELSE 'Clean' END) AS 'Page State',
(CASE WHEN ([database_id] = 32767) THEN 'Resource Database' ELSE DB_NAME (database_id) END) AS 'Database Name',
COUNT (*) AS 'Page Count'
FROM sys.dm_os_buffer_descriptors
GROUP BY [database_id], [is_modified]
ORDER BY [Page Count], [is_modified];

SELECT count(*) AS cached_pages_count, (count(*) * 8)/1024 AS MBytes, db_name(database_id) AS database_name
FROM sys.dm_os_buffer_descriptors
GROUP BY db_name(database_id) ,database_id
ORDER BY cached_pages_count DESC;
The amount of memory used naturally varies depending upon which queries are running at any particular time, but by monitoring this when you are under "memory pressure" you can easily identify the culprit database. That can then guide you in deciding whether it makes sense to reallocate that database to a different SQL instance, or perhaps devote a deeper inspection to the queries running against it.

To that end, you can run this script:
SELECT session_id, text, requested_memory_kb,
    granted_memory_kb, used_memory_kb
FROM sys.dm_exec_query_memory_grants MG
CROSS APPLY sys.dm_exec_sql_text(sql_handle) 

That will show you the current allocation for any running queries. If you need greater detail, down at the table and index level, you can run this script:
select count(*) as cached_pages_count, as objectname, as indexname, obj.index_id as indexid
from sys.dm_os_buffer_descriptors as bd
    inner join
    (
SELECT     p.object_id AS objectid, object_name(p.object_id) AS name, p.index_id, au.allocation_unit_id
FROM       sys.allocation_units au INNER JOIN
sys.partitions p ON au.container_id = p.hobt_id AND (au.type = 1 OR au.type = 3)      
        union all
SELECT     p.object_id AS objectid, object_name(p.object_id) AS name, p.index_id, au.allocation_unit_id
FROM       sys.allocation_units au INNER JOIN
sys.partitions p ON au.container_id = p.partition_id AND au.type = 2
    ) as obj
on bd.allocation_unit_id = obj.allocation_unit_id
left outer join sys.indexes ind on  obj.objectid = ind.object_id and  obj.index_id = ind.index_id
where bd.database_id = db_id() and bd.page_type in ('data_page', 'index_page')
group by,, obj.index_id
order by cached_pages_count desc

Monday, October 16, 2017

IO Stalls and Latency

One of the more useful optimizations you can run on a database instance if you are unfortunate enough to be on old fashioned spinning media (as opposed to SSDs) is to split up files to reduce the amount of stalls. These can be either database or log files, for either user or system databases, reads or writes. In an earlier post I gave a couple of queries you could run to get a quick gross overview, but now let's get further into the weeds and fine tune some of those slower files in detail. Run these queries:
select db_name(mf.database_id) as database_name, mf.physical_name,
(vfs.num_of_bytes_read / vfs.io_stall_read_ms) as BytesPerStall,
mf.type_desc, vfs.num_of_reads, vfs.num_of_bytes_read, vfs.io_stall_read_ms,
vfs.io_stall, vfs.size_on_disk_bytes
from sys.master_files mf
join sys.dm_io_virtual_file_stats(NULL, NULL) vfs
on mf.database_id=vfs.database_id and mf.file_id=vfs.file_id
where vfs.io_stall_read_ms > 0
order by 3

select db_name(mf.database_id) as database_name, mf.physical_name,
(vfs.num_of_bytes_written / vfs.io_stall_write_ms) as BytesPerStall,
mf.type_desc, vfs.num_of_writes, vfs.num_of_bytes_written, vfs.io_stall_write_ms,
vfs.io_stall, vfs.size_on_disk_bytes
from sys.master_files mf
join sys.dm_io_virtual_file_stats(NULL, NULL) vfs
on mf.database_id=vfs.database_id and mf.file_id=vfs.file_id
where vfs.io_stall_write_ms > 0
order by 3
This returns two resultsets: the first shows problem stalls on reads, the second problem stalls on writes. Do pay attention to the columns showing the number of bytes read and written; it's likely a waste of time to optimize the smaller files. Now that you know generally who your problem children are, you have a couple of avenues of recourse. If the stalls are happening in a log file, then it's time to increase the number of log files or move them to a faster device. If the latency is in a table (a file of type "ROWS") then you need to dig a little deeper; run this query attached to the database that concerns you.
SELECT TableName = object_name(s.object_id),
       Reads = SUM(user_seeks + user_scans + user_lookups), Writes =  SUM(user_updates)
FROM sys.dm_db_index_usage_stats AS s
INNER JOIN sys.indexes AS i ON s.object_id = i.object_id
AND i.index_id = s.index_id
WHERE objectproperty(s.object_id,'IsUserTable') = 1
AND s.database_id = DB_ID()
GROUP BY object_name(s.object_id)
ORDER BY Reads DESC
Of course change the Order By clause for whether you are researching the stalls on reads or on the writes. Likely the top couple of tables in this returned set should be spun off to their own filegroup with multiple files (or possibly their clustered index needs reorganizing -- check their fragmentation).
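Spinning a hot table off to its own filegroup can also be scripted. This is a sketch with illustrative database, path, and table names; it assumes the clustered index is a plain index rather than one backing a PRIMARY KEY constraint (a PK needs to be dropped and re-created instead):

```sql
-- Add a filegroup with a couple of files, ideally on faster devices
ALTER DATABASE MyDb ADD FILEGROUP HotTables;
ALTER DATABASE MyDb ADD FILE
    (NAME = HotTables1, FILENAME = 'E:\Data\MyDb_Hot1.ndf', SIZE = 4GB),
    (NAME = HotTables2, FILENAME = 'F:\Data\MyDb_Hot2.ndf', SIZE = 4GB)
TO FILEGROUP HotTables;

-- Rebuilding the clustered index onto the new filegroup moves the table there
CREATE CLUSTERED INDEX ix_BusyTable_Clust ON dbo.BusyTable (BusyTableID)
WITH (DROP_EXISTING = ON) ON HotTables;
```

With multiple files in the filegroup, the engine stripes allocations across them, which spreads the IO load.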

Wednesday, September 13, 2017

Cascade Blockage

Sometimes the answer to why a SQL process gets blocked relates to overall network dataflow. In these cases you may not be able to tease out the issue until some event stresses your server past a critical threshold.  A real world example will help to elucidate this.

We fulfill our eMail touchpoints through a service hosted by Rackspace. They communicate clicks, bounces, and opens back to a webservice on one of our servers; that webservice connects to a couple of databases on internal SQL servers. This is a fairly typical modern client-vendor arrangement. The webservice constantly runs around 90 transactions per second through the SQL database, a perfectly reasonable load.

One day a developer opened up a query that locked the eMail transaction table for a couple of minutes. The webservice began throwing timeout messages to our internal notifications queue. Even after the developer closed their session the timeout notifications continued from the webservice for a couple more hours.

When I pulled up sp_who2 there were hundreds of connections to SQL from the webservice, where normally during the day I might glimpse one or two. After a couple of hours the problem abated. When I awoke the next morning my subconscious had pieced together the problem, along with this nice metaphor.

Suppose you're driving along the 101 freeway in light traffic and way up ahead you see some taillights glisten. In anticipation you take your foot off the gas and let the car slow down on its own. Nearby traffic does the same, and so there is a brief slowing that passes quickly as you resume your normal speed. Next scenario: as you're driving along the 101 in light traffic the fog starts to thicken, so you slow down. Suddenly, up ahead, taillights! You switch to the brake, slow down, and then once you're past the distraction resume your slightly slower speed. This time a persistent restricted slowing lingers at the original location.

Now take this scenario: on the 101 in very heavy traffic, the fog thickening. The car in front of you slows, you brake, the car behind you doesn't have much room so they brake hard, and the car behind that hits them. Accident. Now that traffic has stopped a chain reaction of accidents keeps trickling back. One brief constriction, and now the blockage will be there for many hours.

When I examined the queries that were executing from the webservice I noticed they did a Select top(1) * of the row in the event table matching the key for the transaction. Unfortunately one of the fields in that row was an XML copy of the actual eMail we sent, usually around 10 MB in size (you know, graphics and all). At the typical rate of 90 transactions per second that works out to roughly 900 MB, or about 7 gigabits, per second: just about the right size to squeeze through the 10 gigabit switch between the two servers.

As soon as somebody fired off another large query on the same table though, the switch temporarily got overwhelmed, and transactions that would clear in real time suddenly were starting to queue up. Once multiple transactions were running against the same table the extra load began slowing their execution. A constriction in heavy traffic.

The solution was to change the webservice to use a view that excluded the XML column. But frankly, if the problem had never happened, I would never have specifically looked for it, as it was just below the radar.
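The shape of that fix, with invented table and column names (the real schema wasn't shown in this post), would look something like this:

```sql
-- A view that leaves out the fat XML column; the webservice reads this
-- instead of the base table, so only the narrow columns cross the wire
CREATE VIEW dbo.vEmailEventSlim
AS
SELECT EventID, TransactionKey, EventType, EventDateTime
FROM   dbo.EmailEvent;   -- everything except the ~10 MB EmailBodyXml column
```

The base table is untouched; the XML stays available for the occasional query that genuinely needs it.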

Tuesday, August 15, 2017

All Them Logs

Oh yes, those dear transaction logs: can't live with them, can't live without them. Understanding how transaction logs work and how to manage them is the key to a happy DBA life. The transaction log is basically a copy of all SQL statements run against your database (except for things like bulk inserts in Bulk Logged mode, or inserts made when traceflag 610 is on). In a disaster, on a database set to Full Recovery, you can obtain a fully functioning database by first restoring the last complete backup and then restoring any subsequent log backups. You may of course still have to manually re-run any bulk imports executed after the last dB backup, but that's seldom an issue.
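The restore sequence just described looks like this in T-SQL (database name and backup paths are illustrative):

```sql
-- Restore the last full backup, leaving the database in a restoring state
RESTORE DATABASE MyDb FROM DISK = 'G:\Backup\MyDb_Full.bak' WITH NORECOVERY;

-- Roll each subsequent log backup forward, in order
RESTORE LOG MyDb FROM DISK = 'G:\Backup\MyDb_Log1.trn' WITH NORECOVERY;
RESTORE LOG MyDb FROM DISK = 'G:\Backup\MyDb_Log2.trn'
    WITH RECOVERY, STOPAT = '2017-08-15 09:30:00';   -- optional point-in-time
```

Only the final RESTORE carries WITH RECOVERY; that's what brings the database back online.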

The nice thing about logs is they back up really fast (they are after all just a bunch of SQL). I can back up a log for a day's worth of activity on a heavy production database in about 15 minutes. The nasty thing about logs is they restore rather slowly (as the engine has to re-execute all that SQL); my 15-minute backup takes a good hour and a half to restore. The other nasty thing about log files is, if you let them autogrow, you need to manage their size by shrinking them (and note that, unlike data files, log files never benefit from Instant File Initialization, so every growth incurs a zero-initialization stall). If the device hosting your log files for a database runs out of space, then that database won't execute any more transactions. No bueno.

Hence it's a good idea to add a customized alert in RedGate Monitor for when your log file expands beyond a certain size. Also I use a little trick to "flip" between different copies of the log file: right-click on your database, choose Properties, and then Files. Add a couple of files of type "log" but leave their size small and turn off autogrowth. Now when your primary log file gets too large, turn autogrowth on for one of the small files, and off on your primary. After the next log backup you should be able to shrink the primary log file down to nearly nothing (well, if there aren't long pending transactions still out there). In this manner you keep occasionally swapping between active log files as their size dictates.
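The flip can be done in T-SQL instead of the Properties dialog. A sketch, with assumed logical file names:

```sql
-- Let the small spare take over growth duty; freeze the bloated primary
ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_LogSpare, FILEGROWTH = 512MB);
ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_Log, FILEGROWTH = 0);

-- After the next log backup, shrink the now-passive primary log file
BACKUP LOG MyDb TO DISK = 'G:\Backup\MyDb_Log.trn';
DBCC SHRINKFILE (MyDb_Log, 64);   -- target size in MB
```

FILEGROWTH = 0 is what "turn off autogrowth" does under the covers.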

Another nifty trick is to allocate an extra spare log file to a different physical device (but leave it small without growth). That way in an emergency where your disk drive hosting the logs fills up, you can just click into properties and turn autogrowth on your spare log file. Once that's done, backup and shrink the primary logs.

Of course if you don't want to hassle with the logs, don't need point-in-time recovery, and don't need them for log-shipping to a synchronized database, just leave your database on Simple recovery. You can then shrink the logs at your own leisure (usually after a full dB backup) and you won't have to worry about backing up the logs themselves.
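For completeness, the Simple-recovery route is a two-liner (names again illustrative):

```sql
-- Switch to Simple recovery; the log now truncates itself on checkpoint
ALTER DATABASE MyDb SET RECOVERY SIMPLE;

-- At your leisure, say after a full dB backup, reclaim the disk space
DBCC SHRINKFILE (MyDb_Log, 64);   -- target size in MB
```

Just remember that once you flip to Simple you forfeit point-in-time restores until you switch back to Full and take a fresh full backup.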

Thursday, July 20, 2017

Fixing a Slow Query

I've written a full category of posts on how to generally optimize your database instance, indices, and storage. Quite frequently though the DBA finds themselves responding to a specific performance concern: this one query is running especially slowly now. It used to run fine, but now it is timing out. I can't even begin to tell you how frequently this happens. The usual cause is one of two things: blocking, or an obsolete plan cache.

Blocking is quite easy to ferret out; this filterable wrapper around sp_who2 will rapidly show you any issues:

CREATE TABLE #sp_who2 (SPID INT,Status VARCHAR(255),
      Login  VARCHAR(255),HostName  VARCHAR(255),
      BlkBy  VARCHAR(255),DBName  VARCHAR(255),
      Command VARCHAR(255),CPUTime INT,
      DiskIO INT,LastBatch VARCHAR(255),
      ProgramName VARCHAR(255),SPID2 INT,
      REQUESTID INT)

INSERT INTO #sp_who2 EXEC sp_who2

SELECT     distinct SPID, Status, Login, HostName, BlkBy, DBName, Command, LastBatch, ProgramName
FROM        #sp_who2
where Command not in ('AWAITING COMMAND', 'TASK MANAGER', 'BRKR TASK') and
DBName is not null
and SPID <> @@SPID

drop table #sp_who2

Naturally your favorite instance monitor (such as RedGate) will also show blocking. Why would blocking suddenly start happening? Typically a developer writes a new long-running stored proc (or creates a query that pulls data across several large tables) and doesn't use the Nolock hint. Nolock isn't always the appropriate prescription, but it is a reasonable choice for reporting queries that run clear of the production update cycles. (Nolock is essentially a read uncommitted.)
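For illustration, a typical reporting query with the hint applied (table names invented). The tradeoff is that this reads at READ UNCOMMITTED, so dirty or double-counted rows are possible:

```sql
-- Reporting query that reads around writers rather than blocking on them
SELECT c.CustomerID, SUM(o.Amount) AS Total
FROM   dbo.Orders o WITH (NOLOCK)
       JOIN dbo.Customer c WITH (NOLOCK) ON c.CustomerID = o.CustomerID
GROUP BY c.CustomerID;
```

The hint goes on each table reference, not once per query.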

When a long-running stored proc is blocking you likely will need to get a developer involved for troubleshooting. Frequently it makes sense to create batches (of say 10000 records each) and loop over the commits in smaller parcels.
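The batching pattern looks something like this sketch (table and column names assumed):

```sql
-- Process in parcels of 10000 so each commit releases its locks quickly,
-- instead of one giant transaction that blocks everyone for minutes
DECLARE @done bit = 0;
WHILE @done = 0
BEGIN
    BEGIN TRAN;
    UPDATE TOP (10000) dbo.BigTable
    SET    Processed = 1
    WHERE  Processed = 0;
    IF @@ROWCOUNT = 0 SET @done = 1;   -- nothing left to touch
    COMMIT TRAN;
END
```

Other sessions get a chance to sneak in between batches, which is the whole point.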

By far the most frequent cause of long-running timeouts, however, relates to how developers create code. When they start on a project the source tables are small, so when they run the query the first few times the plan engine says Hey, I can full scan this. Many of the joins are small enough to be cached completely in memory. Once the tables have been up and running a few months and have accumulated several hundred thousand rows, the old way of filtering and joining is naturally too slow for the engine.

I do read of DBAs who take the simple step of clearing the plan cache, but I prefer a more nuanced approach. Most of the time what works best is some actual inspection of the SQL statement, pulling it apart to deduce what new indices would help. The objective is to create an environment where the engine only has to scan or seek indices without going to the base tables.  Here's a perfect example... the developer said it used to run in a couple of seconds originally but now exceeds the 30-second default .net timeout:

   SELECT DISTINCT tv.VIN_ID AS 'VIN', d.DealerCode AS 'DealerCode',
        mc.[Segment] AS 'Segment', tv.UserLogin AS 'UserLogin', tv.Destination
    FROM [TouchpointVins] tv   -- (the FROM clause was lost from the original post; table name assumed)
    INNER JOIN [MarketingList] mc ON mc.VIN = tv.VIN_ID AND mc.MonthID = tv.MonthID
    INNER JOIN dealer d on d.DealerID = tv.DealerID
    WHERE tv.Destination = @dest  AND tv.SubmitDateTime = @timeStamp
        and tv.Campaign = 1
    ORDER BY tv.[VIN_ID], mc.[Segment]

When I looked at the query plan the execution count estimates on the index seeks were rather small, in the low hundreds. When I checked the base tables though they had hundreds of thousands of rows. The query designer didn't alert to any missing indices, but clearly something was awry here. Time for a deeper inspection.

First, you have to think like you're the SQL engine. The most restrictive part of the selection process on this query is the where clause, so if you can create an index using the columns of the where clause that will certainly help.

CREATE NONCLUSTERED INDEX [ix_SubmitDestCampaign] ON [TouchpointVins]   -- (index and table names assumed)
    ([SubmitDateTime] ASC,
    [Destination] ASC,
    [Campaign] ASC)
INCLUDE ( [UserLogin], [VIN_ID], [MonthID])

Notice I included the columns UserLogin, VIN_ID, and MonthID; VIN_ID and MonthID are there because the engine will need them when it joins to the MarketingList table. UserLogin is there because the output results will require it. Now that that part is fixed, consider the join. The engine will need these columns from the MarketingList table to join:

CREATE NONCLUSTERED INDEX [ix_MktgList_VinMonth] ON [dbo].[MarketingList]
    ([VIN] ASC,
    [MonthID] ASC)
INCLUDE ([Segment])

On this index I included the Segment column, because the output results will require it.

After building these two indices, the query executed in less than half a second. Problem solved.

Thursday, June 15, 2017

Reorgs, Rebuilds, and Fill

In an earlier post I chatted about the joy of creating indices; now we get a bit deeper into the topic to explore how you maintain the things. Then we'll circle back to discuss how you should have built them in the first place, so as to minimize your maintenance (there's nothing like 20-20 hindsight, eh?).

Let me preface the post with a disclaimer: some DBAs feel the best way to handle index maintenance is to use the wizard to create a maintenance plan for overnight or weekend defragments. I call this the lazy man's way of index maintenance, but it does have its place under many circumstances. If your employees use your database mainly during weekday working hours and it is quiescent at night (or on the weekend), and if it's not gigantic (say 100 gigabytes or less), this automatic maintenance is a smart idea and works well enough. Otherwise though I have concerns that it is a bit too easy, and hides the sins of bad index design in the first place.

For large databases that never sleep you should take care to spend some time reorganizing indices online. If you are running Enterprise Edition you can even rebuild indices online. Certainly not all of them, however. First, learn to target the indices that need the most love and attention; I have several ways to do this. In the Object Explorer Details pane of SSMS, after clicking on Tables, include the column for index size and sort the tables descending. Double-click the top table, double-click the index icon, and again include the column for index size. Sort descending. Fortunately SSMS remembers all this column setup, so now that you've done it, it will always be at the ready.

Every week or so page through the topmost indices, right-click on Properties, then click Fragmentation. Anything in the 30 to 40 percent range or above should be defragmented. How do you do this? You have two options, reorganize or rebuild, but more about that shortly.
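Both operations are a one-liner once you know the index name (the names here follow the illustrative ix_table_columns convention):

```sql
-- Option one: reorganize; online-friendly and safely interruptible
ALTER INDEX ix_myTable_col1col2 ON dbo.myTable REORGANIZE;

-- Option two: rebuild; required when changing the fill factor
-- (ONLINE = ON is Enterprise Edition only)
ALTER INDEX ix_myTable_col1col2 ON dbo.myTable
    REBUILD WITH (FILLFACTOR = 90, ONLINE = ON);
```

On Standard Edition drop the ONLINE option and schedule the rebuild for a quiet window.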
Another good way to find indices to target is by running this query (the same one as in my earlier post on indices). The top rows in the resultset are the heaviest used so they need a bit tighter maintenance planning.
SELECT AS object_name, AS index_name,
Case when (is_unique = 1 and is_primary_key = 1) then 'PK UNIQUE '
when is_unique = 1  then 'UNIQUE ' else '' end + sys.indexes.type_desc type_desc,
c.index_columns AS index_columns_key, s.used_page_count * 8   AS IndexSizeKB,
sys.dm_db_index_usage_stats.user_seeks, sys.dm_db_index_usage_stats.user_scans,
sys.dm_db_index_usage_stats.user_lookups, sys.dm_db_index_usage_stats.user_updates,
sys.dm_db_index_usage_stats.last_user_seek, sys.dm_db_index_usage_stats.last_user_scan,
sys.dm_db_index_usage_stats.last_user_lookup, sys.dm_db_index_usage_stats.last_user_update
FROM sys.objects JOIN sys.indexes ON sys.indexes.object_id=sys.objects.object_id
JOIN (SELECT distinct
object_id, index_id, stuff((SELECT ','+col_name(object_id,column_id ) as 'data()'
FROM sys.index_columns t2
WHERE t1.object_id =t2.object_id and t1.index_id = t2.index_id
FOR XML PATH ('')),1,1,'') as 'index_columns'
FROM sys.index_columns t1
) c ON c.index_id = sys.indexes.index_id AND c.object_id = sys.indexes.object_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats
ON sys.indexes.index_id=sys.dm_db_index_usage_stats.index_id
AND sys.indexes.object_id=sys.dm_db_index_usage_stats.object_id
JOIN sys.dm_db_partition_stats  AS s
ON s.[object_id] = sys.indexes.object_id AND s.index_id = sys.indexes.index_id
WHERE sys.objects.type='u'                 --Only UserTables will be selected
AND sys.indexes.type_desc  <> 'HEAP'       --Only indexes will appear, not HEAP tables
AND sys.indexes.type_desc like '%NONCLUSTER%'
ORDER BY 6 desc
Now that you're ready to fix these puppies, you can reorganize or rebuild. A reorganize unshuffles the b-tree by using the free space you left by allowing a fill factor (ahhh, you didn't allow a fill factor? More on that in a moment). The nice thing about a reorganize is you can interrupt it any time without incurring any rollback and it doesn't interfere much with existing SQL activity. Option two is a rebuild. Unless you have enterprise edition though, a rebuild will interfere or deadlock other queries and may escalate to a table lock. You do however have to do a rebuild if you are changing the fill factor (so back to that again).

Now you see the value of a fill factor: it leaves extra space for inter-branch growth of the b-tree, so your indices don't fragment as quickly. The extra allocated space also provides the means for running the online reorganization. How much fill should you allow? I start by setting the fill factor to 90 (in other words 10 percent vacant space). Then when I am checking for fragmentation I may adjust the fill lower, depending upon the page fullness the index properties report. Usually I midpoint the difference, so for example if my present fill factor is 90 and the fragmented index shows 70 percent page fullness, I'll schedule a rebuild to set the fill factor down to 80.

Keeping your indices happy will make your developers and your managers happy, and it's a pleasant enough diversion for a daytime job.

Tuesday, May 16, 2017

Mid-Sized Databases

Most of my problem indices happen on mid-size databases (MSDs), in the range of 100 GB or so. Small databases rebuild their indices quickly; giant databases tend either to get reloaded and reindexed offline or have limited exposure to random query plans. The MSDs though can be sticklers: high random access, high required availability, and a large quantity of fairly large indices.

Several factors contribute to MSD index maintenance woes.  For one, MSDs often run on the Standard Edition of SQL, so you can't rebuild the indices with the "online" option. Secondly MSDs are large enough that rebuilding an index takes a fair amount of time.  Finally the MSD indices often get built by developers who forget to include a fill factor.

Obviously at your first opportunity you will schedule some jobs to rebuild the fragmented indices with an adequate fill factor. More importantly though, you should find some time to meet with the developers and encourage them to use a fill factor (when appropriate) on indices where the referenced fields are clearly loading with out-of-sequence data. They'll be clueless what factor to use, so it's easiest to specify a safe fixed fill, say 85 or 90 percent; you can then adjust it lower if you find an index fragmenting exceedingly fast.
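The fill factor can be baked in when the index is first created, which is the habit to encourage (index and table names here are illustrative):

```sql
-- Specifying the fill factor at creation time leaves 10 percent of each
-- leaf page vacant for out-of-sequence inserts
CREATE NONCLUSTERED INDEX ix_Orders_CustDate
    ON dbo.Orders (CustomerID, OrderDate)
    WITH (FILLFACTOR = 90);
```

Without the WITH clause the server default applies, which is 100 (completely full pages) unless someone has changed the instance-wide setting.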

Once you've got your fill allocated, the next challenge is finding the time to run index reorganization on a regular basis. Although it won't lock tables or block other spids, a reorg does take a fair amount of resources. If you've got a nice chunk of quiet time on the weekend you can schedule a big reorg job then, naturally, but if your system is always active then you might like this little trick.

First build a table within your MSD to hold the reorganization SQL statements; I use something like this:
CREATE TABLE [IndexReorgScripts](
[ReorgSQL] [varchar](350) NOT NULL,
[RunDayNum1] [smallint] NULL,
[RunDayNum2] [smallint] NULL) 
Now from a connection to your MSD, run this query:
INSERT INTO IndexReorgScripts
SELECT 'ALTER INDEX ' + [name] + ' ON [dbo].' + OBJECT_NAME(OBJECT_ID) + ' REORGANIZE;', NULL, NULL
FROM sys.indexes
where type_desc = 'NONCLUSTERED'
and fill_factor > 0
and is_disabled = 0
This will build all the statements to reorganize the SQL indices. The two RunDayNum columns are for you to set to the days of the month you want each index reorganized; I made two columns because I reorganize my indices on average twice a month. Try to assign a broad range of days across the various tables.

Finally, schedule a SQL job to run this script every evening:
DECLARE @ReorgStatement varchar(350)
DECLARE @ThisDayOfMonth AS smallint
SET @ThisDayOfMonth = DAY(GETDATE())
DECLARE IndicesToReorg CURSOR FOR
  SELECT [ReorgSQL]
  FROM [IndexReorgScripts]
  WHERE ( [RunDayNum1] = @ThisDayOfMonth
       OR [RunDayNum2] = @ThisDayOfMonth )
OPEN IndicesToReorg;
FETCH NEXT FROM IndicesToReorg into @ReorgStatement
WHILE @@FETCH_STATUS = 0
BEGIN
   EXEC (@ReorgStatement)
   FETCH NEXT FROM IndicesToReorg into @ReorgStatement
END
CLOSE IndicesToReorg;
DEALLOCATE IndicesToReorg;
Now you're all set to check in every morning and find that another handful of indices got reorganized! The method isn't foolproof; you will occasionally need to update the table with additional indices that folks have added, or remove those that got disabled. Still though, quite a lifesaver.

Monday, April 17, 2017

Grokking Query Plans

In an earlier post I gave you some good clues to identify the problematic queries on your instance. After you've finished the initial review in the Designer to resolve bad joins and poorly formed filters, you should click the button "Show Execution Plan". Right off the bat you may see a red warning of a missing index. If the projected savings are high then you should stop and build that index first, before you continue examining the plan.

There are a few things you should know when reviewing an execution plan. First off, check the width (the thickness) of the arrows joining the elements; thicker arrows indicate more rows flowing through. Now examine the percentage shown underneath each element. These are approximate, but they indicate how much processing is dedicated to each step. Work from the bottom right corner of the graph to the upper left corner (for some reason the engine displays the actual dataflow sequence right to left, bottom to top).

Generally "scans" are way worse than "seeks," and operations on tables will be slower than the same operations on an index. The exception is a clustered index scan, which in actuality is a regular table scan (the table itself is stored as its clustered index).

To look at it strictly from the engine's viewpoint: the elapsed time for access depends upon the number of rows it has to read, multiplied by the quantity of times it needs to read them, multiplied by the number of pages it needs to read (determined by the width of the sum of the columns it needs to access and then write to tempdb). From that viewpoint you want to minimize read and write lengths, counts, and widths.

Hover your mouse over one of the query plan boxes and it will show you the estimated number of rows it expects to read, and how often it thinks it needs to repeat this operation (the number of executions). Table scans (and clustered index scans) of wide tables with many rows should be avoided.

Even with all its smarts the plan engine is only about half as bright as you. The optimizer won't always suggest a helpful index even if you could easily create one. If you examine the criteria in the where clause and the way tables are joined, you can almost always come up with an effective index encapsulating what the engine needs for its filter and joins (thus turning scans into seeks). Furthermore if the where clause has a range of selection (for example "> getdate() - 7") you can build a filtered index that will be even faster.
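A filtered index along those lines might look like the sketch below (names invented). One caveat: the filter predicate must be deterministic, so you cannot write "> getdate() - 7" directly; you use a constant date and re-create the index periodically:

```sql
-- Filtered index covering only the recent rows the report actually touches;
-- far smaller than a full index on the same columns
CREATE NONCLUSTERED INDEX ix_Orders_Recent
    ON dbo.Orders (OrderDate, CustomerID)
    WHERE OrderDate > '20170701';
```

The optimizer will only pick it up when the query's own predicate provably falls inside the filter, so keep the two in sync.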

The primary objective is to narrow down the length and width of your resultset as early in the dataflow as possible, to reduce the stress of joins further up the line. Many times you can do this by building a subquery, or by splitting out the keys of the targeted records into a temp table as a first step. In other words, you want to delay adding in the fat wide rows until the end of the dataflow, when you have already applied the necessary filters to determine the count of rows you will be returning.
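The temp-table-of-keys pattern looks like this in outline (table names assumed):

```sql
-- Step one: boil the filters down to a narrow list of surviving keys
SELECT o.OrderID
INTO   #Keys
FROM   dbo.Orders o
WHERE  o.OrderDate > '20170701' AND o.Status = 'Open';

-- Step two: join the fat wide rows back in only for those survivors
SELECT o.*, c.CustomerName
FROM   #Keys k
       JOIN dbo.Orders o   ON o.OrderID = k.OrderID
       JOIN dbo.Customer c ON c.CustomerID = o.CustomerID;
```

The first step is a cheap seek over a narrow index; the expensive wide reads in step two happen only for the small set of rows you actually return.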

Spend a bit of occasional love reviewing the worst queries and if the fix requires more than just adding an index, pass your recommendations back to the development staff so they can learn and improve while going forward.

Monday, March 13, 2017

Indices
Most of the time I think it should be spelled "indexes" but okay, whatever; throughout this blog I'll follow my spellcheck and call them indices. There's so much to say about them, all of it fun. What makes indexing so much fun? You can secretly do it in the background (well, you can if you're on Enterprise Edition). You can't screw up too much adding an index. There are all sorts of tools to help spot those that are missing. You get to look at SQL plans to figure out which indices the optimizer missed. They gradually fragment, thus providing you another monitoring opportunity. You get to play around with various fill parameters. You can reorganize them while online. Finally, adding indices vastly improves queries and makes you a hero with the developers.

Having said all that, you do have to be fairly judicious with the time you allocate to these puppies. In a RAD environment most of your work will be building and evaluating them, but in a mature legacy system you will likely just have reorganizations or rebuilds scheduled and you won't pay them much mind. In any case there's enough to say about indices that it will stretch into a couple of blog posts, so herewith the details of evaluating, building, and disabling them.

To start, it makes a lot of sense to impose a naming standard at your shop. Nobody likes to see that index DTA_0013x_abcde_482 is fragmented. What the heck does that index do? In my shop we use names like ix_myTable_col1col2col3. To keep the names a useful length we might abbreviate the table or column names.
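For example, under this convention an index on a hypothetical dbo.Orders table (the table and column names here are invented) might be created like so:

```sql
-- ix_<table>_<columns>, abbreviated to keep the name a useful length
CREATE NONCLUSTERED INDEX ix_Orders_CustIDOrderDate
    ON dbo.Orders (CustomerID, OrderDate);
```

Now when ix_Orders_CustIDOrderDate shows up in a fragmentation report, you know exactly what it covers without looking anything up.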

What indices do you need to build? This query will give you a good clue:
SELECT mid.statement,
    round(migs.user_seeks * migs.avg_total_user_cost * migs.avg_user_impact / 100, 0) AS imPactor,
    migs.unique_compiles, migs.user_seeks, migs.user_scans, migs.last_user_seek, migs.last_user_scan,
    migs.avg_total_user_cost, migs.avg_user_impact, mig.index_group_handle, mig.index_handle,
    mid.object_id, mid.equality_columns, mid.inequality_columns, mid.included_columns
FROM sys.dm_db_missing_index_group_stats AS migs
    INNER JOIN sys.dm_db_missing_index_groups AS mig ON migs.group_handle = mig.index_group_handle
    INNER JOIN sys.dm_db_missing_index_details AS mid ON mig.index_handle = mid.index_handle
WHERE mid.database_id = DB_ID()
    AND migs.last_user_seek > GETDATE() - 14
ORDER BY imPactor DESC
You need to select the database you are analyzing or include a Use statement, and then run this for each database you are inspecting. Should you build every index in this resultset? No, you need to use some good judgement: start at the top and think about what you're doing. Certainly indices toward the top of the list with a couple of key columns and a smattering of included columns make sense. For the rest, use your better judgement.
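As a hedged sketch of turning one row of that output into DDL (the table and columns here are invented placeholders): the equality_columns become the leading keys, inequality_columns follow them, and included_columns go into the INCLUDE clause.

```sql
-- equality_columns -> leading keys, inequality_columns -> trailing keys,
-- included_columns -> INCLUDE list. All names below are hypothetical.
CREATE NONCLUSTERED INDEX ix_Orders_StatusShipDate
    ON dbo.Orders (OrderStatus, ShipDate)
    INCLUDE (CustomerID, TotalAmount);
```

Resist the urge to simply script out every suggestion verbatim; the optimizer often proposes several near-duplicate indices that you can consolidate into one.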

Why not build them all? Well, indices take IO and CPU power to maintain. They immensely speed up SELECT queries, but they slow down the UPDATEs and INSERTs. Hence if you walk into a shop with a legacy database, you should run this query to see how the existing indices are actually being used:
SELECT sys.objects.name AS object_name, sys.indexes.name AS index_name,
CASE WHEN (is_unique = 1 AND is_primary_key = 1) THEN 'PK UNIQUE '
     WHEN is_unique = 1 THEN 'UNIQUE ' ELSE '' END + sys.indexes.type_desc AS type_desc,
c.index_columns AS index_columns_key, s.used_page_count * 8 AS IndexSizeKB,
sys.dm_db_index_usage_stats.user_seeks, sys.dm_db_index_usage_stats.user_scans,
sys.dm_db_index_usage_stats.user_lookups, sys.dm_db_index_usage_stats.user_updates,
sys.dm_db_index_usage_stats.last_user_seek, sys.dm_db_index_usage_stats.last_user_scan,
sys.dm_db_index_usage_stats.last_user_lookup, sys.dm_db_index_usage_stats.last_user_update
FROM sys.objects
JOIN sys.indexes ON sys.indexes.object_id = sys.objects.object_id
JOIN (SELECT DISTINCT
    object_id, index_id, stuff((SELECT ',' + col_name(object_id, column_id) AS 'data()'
    FROM sys.index_columns t2
    WHERE t1.object_id = t2.object_id AND t1.index_id = t2.index_id
    FOR XML PATH ('')), 1, 1, '') AS 'index_columns'
    FROM sys.index_columns t1
) c ON c.index_id = sys.indexes.index_id AND c.object_id = sys.indexes.object_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats
    ON sys.indexes.index_id = sys.dm_db_index_usage_stats.index_id
    AND sys.indexes.object_id = sys.dm_db_index_usage_stats.object_id
    AND sys.dm_db_index_usage_stats.database_id = DB_ID()
JOIN sys.dm_db_partition_stats AS s
    ON s.[object_id] = sys.indexes.object_id AND s.index_id = sys.indexes.index_id
WHERE sys.objects.type = 'U'                  --Only user tables will be selected
AND sys.indexes.type_desc <> 'HEAP'           --Only indexes will appear, not HEAP tables
AND sys.indexes.type_desc LIKE '%NONCLUSTER%'
ORDER BY 6 DESC                               --Sort by user_seeks
Again, precede it with a "use" statement or select the database to run it against. The results of this query are useful for two different purposes, but you have to understand that these results reflect statistics gathered only since the last SQL restart. It's best therefore to review them after your instance has been up and running happily for a few weeks.

First scroll down toward the bottom of the result set. Indices that have zeros or nulls in all the columns of seeks, scans, and lookups aren't being used by the plan optimizer. If they have something in the updates column, then you are incurring the extra overhead of maintaining them without gaining any advantage from their use. Before you disable them however, do a courtesy check to see if they are perhaps badly fragmented -- sometimes this is why they aren't being used in the first place.
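That courtesy check, and the disable itself, might look something like this sketch (dbo.MyTable and ix_myTable_col1 are placeholder names):

```sql
-- Check fragmentation before condemning an apparently unused index.
SELECT avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MyTable'),
                                    NULL, NULL, 'LIMITED');

-- If it really is dead weight, disable it. Unlike a DROP, a disabled
-- index keeps its definition and can be revived later with a REBUILD.
ALTER INDEX ix_myTable_col1 ON dbo.MyTable DISABLE;
```

Disabling rather than dropping gives you an easy escape hatch if a quarterly report suddenly starts crawling a month later.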

And that is the second value of this resultset: the indices shown toward the top of the list are the prime movers, the most heavily used workhorses of your database. You should check them regularly to see if they should be rebuilt or reorganized.
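A common rule of thumb (a guideline, not gospel) is to reorganize between roughly 5 and 30 percent fragmentation and rebuild above that; the index and table names below are placeholders:

```sql
-- Lighter touch, always online, resumable if interrupted:
ALTER INDEX ix_myTable_col1col2 ON dbo.MyTable REORGANIZE;

-- Full rebuild; ONLINE = ON requires Enterprise edition:
ALTER INDEX ix_myTable_col1col2 ON dbo.MyTable REBUILD WITH (ONLINE = ON);
```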

Saturday, February 11, 2017

Avoiding Phishing

Like all DBAs I suppose I'm irrationally paranoid: with the private records of millions of customers at my fingertips I am acutely aware of any security threats. Besides the usual best practices of securing the firewall access and installing antivirus software on the clients, a DBA has to be particularly careful to avoid getting phished. An outside miscreant can approach you a couple of ways, by phone or by eMail.

Usually I won't even click on an eMail unless it's from somebody I know and isn't obviously spam. You probably shouldn't open an eMail from someone outside your organization if you weren't previously expecting it. Sometimes though the eMail will appear to be a legitimate communication from somebody you already do business with. Even then, I take special precautions. Never ever click on a link within such an eMail, and don't download images. Rather, save the eMail as an HTML file to disk, and then open it in a simple text editor. Now, after verifying the link address corresponds to what you are expecting, manually copy and paste the relevant parts of the link into a browser. Be especially careful of intentionally misspelled domains designed to spoof the legitimate address.

Avoiding phishing eMails tends to be less taxing than fending off a phishing phone call. A lot of this is due to the immediacy of talking to somebody who is expecting a verbal response. I have a twofold defense to this: first, be the one who controls the path of the conversation. Second, work from a prepared script. Don't answer any questions unless the information you are providing helps move the conversation in the direction you wish it to resolve. Use courteous deferral and the handy excuse of "sorry, but our vendor security operating procedures require me to first find out...".

I could post the exact script that I've found helpful here, but having such a document in the public domain would defeat its purpose as it encourages "social engineering" to bypass its safeguards. Rather, I will describe its contents in general terms, so you can write your own prepackaged talk track.

The primary purpose of your script should be to positively validate the identity of the caller. You will need to gather enough information about the caller to independently verify who they are and who they work for. Besides their full name, other useful facts to collect are:
  • work eMail address
  • private (not work) cell phone number
  • name of employer
  • city where their employer is located
  • department they work for
  • name of their supervisor
  • how they obtained your phone number
I'm sure you can think of other things to ask. Any reputable vendor account manager should readily be able to provide this information without hesitation. If the caller questions why you need this information explain that your company policies require positive identification of vendors. Whether you record your phone call or not is up to you, but regardless I like saying "this call may be recorded for training or quality assurance purposes." You'd be surprised how many scammers immediately hang up after that little tidbit.

Once you've obtained the above information you may ask the caller how you can help them. At this point you don't want to provide any useful information, but you do want to gauge the nature and the scope of their inquiry. Reply to most of the inquiries with "I can get that for you." Then you need to perform a "backflip": ask the caller to hold on, put the phone down briefly, take a slow breath to calm and center yourself, then tell the caller you will need to call them back as something urgent has arisen. Ask for the best time to call, thank them for calling, and hang up.

Now is when the actual work starts. If you are indeed planning to return this call you will need to go through the effort to validate who they are and who they work for. Depending on how suspicious you are, some good avenues to explore include:
  • call the main switchboard of their employer and ask for them by name
  • find their profile on LinkedIn to verify it shows the same employer
  • use a web service to text a verification code to their cell phone
  • call their supervisor and ask for verification of employment
  • send a message to their eMail address asking for a reply, then verify the routing information in the reply's header
And there's always Google. This discourages most casual phishing, but always be alert to anything suspicious that might compromise your security, and never provide an outsider with any information that isn't absolutely necessary for them to work effectively with you.

Tuesday, January 17, 2017

That Old Cleanup

Once you're consolidating information into your database tracking worksheet, one thing you'll quickly discover is that your shop has muckloads of old data lying about. Some are completely obsolete databases, some are cranky old former incarnations of databases that are now active elsewhere, some are duplicate test or QA versions, and some are ghosts of playthings that former employees forgot to delete. If you examine the directories where databases are housed you will even find remnant mdf and ldf files that somebody detached (but never deleted). Inside of existing active reporting databases you will find large obsolete tables, former versions of tables, and old historical data in the actively referenced tables.

Cleanup and archiving is a royal mess, but it's the first step in organizing your storage and then reallocating filegroups appropriately. Here are some general principles and guidelines to help resolve this headache.

The first principle is to err on the side of caution. Before you permanently remove any database take it offline for a couple of weeks to verify that no unanticipated processes are affected. If you have low-cost compressible storage available (say on an old file server) then zip the zombie databases there before you delete them.
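Parking a suspect database offline is a one-liner; MyOldDatabase below is a placeholder name:

```sql
-- Take the zombie offline; nothing can touch it, but nothing is deleted yet.
ALTER DATABASE MyOldDatabase SET OFFLINE WITH ROLLBACK IMMEDIATE;

-- A couple of quiet weeks later, bring it back only long enough to
-- back it up (or zip the files off to cheap storage), then drop it:
-- ALTER DATABASE MyOldDatabase SET ONLINE;
-- DROP DATABASE MyOldDatabase;
```

If some forgotten month-end job screams during the waiting period, setting the database back online is instant and painless.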

If you encounter a database with lots of old date-versioned tables, the best trick I've found is to download a tool along the lines of TOAD. With their database explorer you can easily highlight rows, ctrl-c for copy, and then paste the resulting table names into an Excel spreadsheet. Sort out the tables you've targeted for deletion and run it past the staff to make sure they are indeed obsolete. Then you can prepend "drop table" in front of each row in Excel and copy and paste the rows back into a SQL editing window. Presto. (Keep a current backup just in case).

Archiving historical data from live tables is the most work. Depending upon the version of SQL you are running (Standard or Enterprise) you may tackle this with archiving or partitioning, but in any case the last few steps typically require weekend or off-hours work to take the database down. In the Standard environment my general approach is first to archive, then to shrink.

You can run the first step, archiving, while everyone is blissfully plugging away at work. You may choose to keep the archive in a new table in a separate database, or you may export it to a text file for zipping to your low cost compressible storage. To avoid having too severe an impact on production I like performing the archival copy (or export) with SSIS, running on a view that selects perhaps a half million rows at a time.
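Such a batching view might look like the following sketch; the table, columns, and cutoff date (dbo.OrderHistory, OrderID, OrderDate) are invented placeholders:

```sql
-- One archival batch at a time; point the SSIS source at this view,
-- archive the rows it returns, delete them, and repeat until empty.
CREATE VIEW dbo.vw_OrderHistoryArchiveBatch
AS
SELECT TOP (500000) *
FROM dbo.OrderHistory
WHERE OrderDate < '2015-01-01'
ORDER BY OrderID;
```

Keeping the batch size modest holds locking and log growth to a level production can live with.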

Once you've got your archive you're ready for the more consumptive task (on the weekend with production stopped). Identify whereabouts on the clustered index you will be "cutting" the table. SELECT TOP (0) from the old table INTO a new working table. Make sure the new target table has the correct clustered index defined and any auto-increment Identity column with a seed. Now set IDENTITY_INSERT on for the new working target and use an INSERT ... SELECT with a TABLOCK hint to load the data you are keeping into the new working table. Script out the original table's nonclustered indices for a later step, remove any parent-child table constraints, and then you can drop the original table. Is your heart beating in your ears yet? Now rename the new work table back to the original table name and rebuild the nonclustered indices. If you'd like, you can recreate the parent-child referential integrity. There, that wasn't so bad now, was it?
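The whole weekend "cut" can be sketched as follows; dbo.BigTable, its columns, and the cutoff date are all hypothetical placeholders, and the index and constraint scripting is elided:

```sql
-- 1. Empty shell with the same column definitions (SELECT INTO carries
--    the IDENTITY property over when selecting straight from the table).
SELECT TOP (0) * INTO dbo.BigTable_work FROM dbo.BigTable;
-- ...create the clustered index on dbo.BigTable_work here...

-- 2. Copy only the rows you are keeping.
SET IDENTITY_INSERT dbo.BigTable_work ON;
INSERT INTO dbo.BigTable_work WITH (TABLOCK) (BigTableID, CreatedDate, Payload)
SELECT BigTableID, CreatedDate, Payload
FROM dbo.BigTable
WHERE CreatedDate >= '2016-01-01';
SET IDENTITY_INSERT dbo.BigTable_work OFF;

-- 3. Script out nonclustered indices and drop FK constraints first, then:
DROP TABLE dbo.BigTable;
EXEC sp_rename 'dbo.BigTable_work', 'BigTable';
-- 4. ...rebuild the nonclustered indices and referential integrity.
```

Rehearse this on a restored copy first; the DROP in step 3 is the point of no return.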

Of course if you're on Enterprise edition you can use partitioned tables.  So much easier.