Friday, May 15, 2020

Azure Query Wizard

I don't know about you, but I sort of like Microsoft Azure. It's pretty easy to set up a database with as much processing power as you might need. Unfortunately, Microsoft limits the tool set you might otherwise use to create queries, so I put together a quick little wizard here to help you build more complicated queries.

Unlike other wizards, these screens lack a back button; instead, just close a window and it will return you to the previous screen. If you are using this on Azure, enter only the first part of the fully qualified server name (e.g., if Azure calls your server jeffc.database.windows.net, just type jeffc for the server name). You can also use this wizard with regular SQL Server; if you usually connect with Windows Authentication (Active Directory), just leave the login name and password blank.

This Windows software requires the .NET Framework 4.0 runtime (standard on Windows 8 and later). For questions or comments, please e-mail me at jeffdchapman@hotmail.com.

This shareware is licensed under the GPL; the source code is on GitHub, here.

Monday, October 15, 2018

Instance Optimization

Not long after cataloging your shop's databases, it's time for some instance tuning. It's sad how many places have their operations guy (or, yikes, a developer) install SQL Server straight off the CD onto a server without a second thought about tuning. Then they load up a bunch of databases and wonder why performance is so poor.

First, set the instance memory. When SQL Server spins up, it tries to grab all of the memory you allocated to it right off the bat. If you have multiple instances on the same server, plan for each of them to have its own memory space. After adding up the gigabytes used by your SQL instances, you still need to leave some gigabytes for the operating system -- here is a useful guideline for what to allow.
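
If you'd rather script it than click through the SSMS GUI, the setting is a plain sp_configure call. A minimal sketch, assuming a cap of 6000 MB (a placeholder figure; size it for your own box):

EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
-- cap the instance's memory at 6000 MB (placeholder value)
EXEC sys.sp_configure N'max server memory (MB)', 6000;
RECONFIGURE;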

The next option to set is the maximum degree of parallelism (MAXDOP). I like to set this to less than half the CPUs on the box, usually considerably less. Look at it this way: if multiple long-running queries (bulk updates or aggregations) run at the same time, each one will take up MAXDOP processors until you reach the total CPU count, after which new queries will signal-wait until a previous one finishes or gets swapped out. On an 8-CPU box, a MAXDOP of 2 or 3 is usually plenty.
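
This is another sp_configure setting. A sketch using the MAXDOP of 2 suggested above:

-- assumes 'show advanced options' is already enabled, as in the previous example
EXEC sys.sp_configure N'max degree of parallelism', 2;
RECONFIGURE;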

Next (for SQL Server 2012 or lower) I like to check that "Instant File Initialization" is turned on at the operating-system level, by granting the SQL service account the Perform Volume Maintenance Tasks privilege. Unless you're running some high-security database on lousy attached storage, you should always have this turned on.
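
On newer builds (SQL Server 2016 SP1 and later, if memory serves) you can confirm the setting from a DMV instead of digging through the Windows security policy:

-- instant_file_initialization_enabled returns Y or N per service
SELECT servicename, instant_file_initialization_enabled
FROM sys.dm_server_services;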

Lastly, you'll need to allocate sufficient space and a sufficient number of data files for tempDB. I've heard folks say you should create one tempDB file per CPU, but I prefer to start with three data files and then monitor for stalls (see this post about monitoring file IO), increasing the count as necessary. How you size them depends greatly on how the instance is used: instances where large BI aggregations run need substantial tempDB space. See this post, where I have more to say about sizing tempDB.
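
A minimal sketch of the file setup, assuming placeholder sizes, logical names, and a T: drive; keep the files equally sized so the round-robin allocation stays balanced:

USE master;
-- resize the primary tempDB data file (tempdev is the default logical name)
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 4096MB, FILEGROWTH = 512MB);
-- add two more data files at the same size
ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = N'T:\TempDB\tempdev2.ndf', SIZE = 4096MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev3, FILENAME = N'T:\TempDB\tempdev3.ndf', SIZE = 4096MB, FILEGROWTH = 512MB);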

After an instance has been up and running for a while, run this query, which shows cumulative waits:
WITH [Waits] AS
    (SELECT
        [wait_type],
        [wait_time_ms] / 1000.0 AS [WaitS],
        ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
        [signal_wait_time_ms] / 1000.0 AS [SignalS],
        [waiting_tasks_count] AS [WaitCount],
        100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage],
        ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum]
    FROM sys.dm_os_wait_stats
    WHERE [wait_type] NOT IN (
        N'BROKER_EVENTHANDLER',             N'BROKER_RECEIVE_WAITFOR',
        N'BROKER_TASK_STOP',                N'BROKER_TO_FLUSH',
        N'BROKER_TRANSMITTER',              N'CHECKPOINT_QUEUE',
        N'CHKPT',                           N'CLR_AUTO_EVENT',
        N'CLR_MANUAL_EVENT',                N'CLR_SEMAPHORE',
        N'DBMIRROR_DBM_EVENT',              N'DBMIRROR_EVENTS_QUEUE',
        N'DBMIRROR_WORKER_QUEUE',           N'DBMIRRORING_CMD',
        N'DIRTY_PAGE_POLL',                 N'DISPATCHER_QUEUE_SEMAPHORE',
        N'EXECSYNC',                        N'FSAGENT',
        N'FT_IFTS_SCHEDULER_IDLE_WAIT',     N'FT_IFTSHC_MUTEX',
        N'HADR_CLUSAPI_CALL',               N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
        N'HADR_LOGCAPTURE_WAIT',            N'HADR_NOTIFICATION_DEQUEUE',
        N'HADR_TIMER_TASK',                 N'HADR_WORK_QUEUE',
        N'KSOURCE_WAKEUP',                  N'LAZYWRITER_SLEEP',
        N'LOGMGR_QUEUE',                    N'ONDEMAND_TASK_QUEUE',
        N'PWAIT_ALL_COMPONENTS_INITIALIZED',
        N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',
        N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
        N'REQUEST_FOR_DEADLOCK_SEARCH',     N'RESOURCE_QUEUE',
        N'SERVER_IDLE_CHECK',               N'SLEEP_BPOOL_FLUSH',
        N'SLEEP_DBSTARTUP',                 N'SLEEP_DCOMSTARTUP',
        N'SLEEP_MASTERDBREADY',             N'SLEEP_MASTERMDREADY',
        N'SLEEP_MASTERUPGRADED',            N'SLEEP_MSDBSTARTUP',
        N'SLEEP_SYSTEMTASK',                N'SLEEP_TASK',
        N'SLEEP_TEMPDBSTARTUP',             N'SNI_HTTP_ACCEPT',
        N'SP_SERVER_DIAGNOSTICS_SLEEP',     N'SQLTRACE_BUFFER_FLUSH',
        N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
        N'SQLTRACE_WAIT_ENTRIES',           N'WAIT_FOR_RESULTS',
        N'WAITFOR',                         N'WAITFOR_TASKSHUTDOWN',
        N'WAIT_XTP_HOST_WAIT',              N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG',
        N'WAIT_XTP_CKPT_CLOSE',             N'XE_DISPATCHER_JOIN',
        N'XE_DISPATCHER_WAIT',              N'XE_TIMER_EVENT')
    AND [waiting_tasks_count] > 0
 )
SELECT
    MAX ([W1].[wait_type]) AS [WaitType],
    CAST (MAX ([W1].[WaitS]) AS DECIMAL (16,2)) AS [Wait_S],
    CAST (MAX ([W1].[ResourceS]) AS DECIMAL (16,2)) AS [Resource_S],
    CAST (MAX ([W1].[SignalS]) AS DECIMAL (16,2)) AS [Signal_S],
    MAX ([W1].[WaitCount]) AS [WaitCount],
    CAST (MAX ([W1].[Percentage]) AS DECIMAL (5,2)) AS [Percentage],
    CAST ((MAX ([W1].[WaitS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgWait_S],
    CAST ((MAX ([W1].[ResourceS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgRes_S],
    CAST ((MAX ([W1].[SignalS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgSig_S]
FROM [Waits] AS [W1]
INNER JOIN [Waits] AS [W2]
    ON [W2].[RowNum] <= [W1].[RowNum]
GROUP BY [W1].[RowNum]
HAVING SUM ([W2].[Percentage]) - MAX ([W1].[Percentage]) < 95; -- percentage threshold
GO
Take a look at the top one or two waits on your instance, do a bit of online research to discover the cause, and then fix it! There, you've now done the bulk of your instance tuning.

Tuesday, July 17, 2018

Monitors Are Up!

I spend a good ten percent of every day watching monitoring tools. I have a handful of Dynamic Management View (DMV) queries and some stored procs that I like to run (I covered those in this previous post), but I also use three exceptionally helpful third-party tools. Two of them are free and one definitely is not. Still, I don't know what I'd do without any of them.

A happy SQL Server needs lots of disk space: tempDB, log files, databases, backups... there always seems to be -more- data growing its tentacles (never -less-, oddly enough). As long as you're managing the storage allocation on the SQL boxes, frankly, you might as well manage it across your entire shop, so you need a tool like Free Disk Management Software (FDMS). Your account needs to be in the Administrators group on each server; after that, the setup lets you select which disks you actually want to monitor. Use the Options to set more appropriate bar-split color levels, and slow the sampling to once every half hour. If you sort the display by free space (low to high), the problem disks obviously aggregate toward the top.
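
If you just need a quick point-in-time look from inside SQL Server itself, a DMV query along these lines covers the basics (a sketch; it requires VIEW SERVER STATE and only sees volumes that host database files):

SELECT DISTINCT
    vs.volume_mount_point,
    vs.total_bytes / 1073741824 AS total_gb,
    vs.available_bytes / 1073741824 AS free_gb
FROM sys.master_files AS mf
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs
ORDER BY free_gb;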

I've read several interesting posts on using the built-in Windows Performance Monitor to measure key SQL metrics, but to get serious you'll want to download this terrific free tool, Magnalis PerfPro, which wraps PerfMon in customizable MDI-mode windows with various display options. First, though, definitely do an online search for "using PerfMon to monitor SQL" so you'll know which counters to add.

Finally, the semi-expensive monitoring tool that is worth every penny you spend: Redgate Monitor. It takes time to install and tweak properly, but with its excellent historical recordkeeping, per-instance configurable alerts, and flexibly filtered notification clearing, this tool will help you with both your immediate concerns and your deep research.

Thursday, May 17, 2018

A Reasonable DR

I wracked my brain over Disaster Recovery concepts for several months before I reached a viable plan. The light bulb moment was realizing that you must plan for two distinct types of disasters (call them major and catastrophic). You need to plan for both, but you can quite possibly be a little more lenient on yourself for the catastrophes. I'm not going to get too far into the weeds with my specific implementation, as yours will likely be different. How you recover from each disaster really depends on the perceptions of your customers (what is their range of allowable excuses?) and your employer's ownership contingencies (what is their insurance like, and what are their long-term objectives?). The things you read in the "trades" about recovery time objective and recovery point objective (RTO and RPO) are still relevant, but they differ under each of the two scenarios.

A "major" is when the halon failed and a fire in your server room melted your NAS. Maybe some other equipment survived, and all the hardware outside the server room is just fine. Under this scenario both your customers and your management will extend you a fair amount of sympathy but they still expect some kind of reasonable recovery efforts. It widely varies by circumstance, but in my case they would expect that after a couple of days our systems would be up and running again, and at worse we might have to enter a half day's data over.

A "catastrophic" is when Hurricane Jeff rolls in from the gulf, your whole building is flooded, half the state is without power for two weeks, trees are down, roads are impassible, and the national guard is called out to maintain order and provide drinking water. In my case I suspect management could actually be called in to decide whether to walk away from the business by declaring a total insurance loss. If they did decide to continue, they could probably convince our customers that a month long gap in service was totally justified (of course we'd waive a couple months of bills out of courtesy).

You have to plan for recovery from a major disaster with the resources at your near-immediate disposal. It helps to have both onsite and offsite copies of your software and your data; onsite copies are typically faster to access and should be counted on as the primary recovery source. Offsite resources might be compromised by the disaster (say, if the fire took down the switch hardware servicing your cloud provider), but they may provide some resilience should you not end up with a complete set of onsite backups after the disaster.

You have to plan for catastrophic recovery from the offsite resources alone. You may need to procure new hardware in a new location, and your old location might be permanently inaccessible.

Once you've got these concepts firmly in mind and know the RTO and RPO for each, you can work backward to determine your backup strategy. There's little point taking local log backups more often than your major-disaster RPO requires, or shipping log backups offsite more often than your catastrophic RPO requires.
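
As a sketch (the database name and path are placeholders): schedule something like this in an Agent job at your major-disaster RPO interval, and let the offsite copy run at the catastrophic RPO interval:

-- local log backup, run at the major-disaster RPO cadence
BACKUP LOG SalesDB
TO DISK = N'D:\Backup\SalesDB_log.trn'
WITH CHECKSUM;
-- the offsite copy job (robocopy, cloud sync, etc.) then only has to keep pace with the catastrophic RPO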

Planning to meet the Recovery Time Objective is rather more nuanced, and likely takes actual practice to discover. How long does it take to restore a copy of each of your databases? How long to restore the subsequent differentials and log backups? Until you try it, you won't know for sure. Also consider that your charges are usually blessed with running on the fastest IO hardware; in a disaster or catastrophe you may well be restoring to considerably slower boxes.
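
A practice restore is easy to time. A sketch, with placeholder database, file, and logical names; STATS prints progress so you can gauge the IO rate:

RESTORE DATABASE SalesDB_DrTest
FROM DISK = N'D:\Backup\SalesDB_full.bak'
WITH MOVE N'SalesDB' TO N'E:\Data\SalesDB_DrTest.mdf',
     MOVE N'SalesDB_log' TO N'F:\Log\SalesDB_DrTest.ldf',
     NORECOVERY, STATS = 10;
RESTORE LOG SalesDB_DrTest
FROM DISK = N'D:\Backup\SalesDB_log.trn'
WITH RECOVERY, STATS = 10;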

There now, that was fun, wasn't it!

Thursday, April 12, 2018

Architecture

When you're the DBA in a shop undergoing vigorous software development, one of the knottier challenges is specifying which production SQL instances to order and how to allocate your databases across those instances. These decisions fall under the general rubric of "data architecture". Several design aspects interplay, pulling your decision-making in conflicting directions. The big four are:

- cost
- execution efficiency
- ease of ongoing instance management
- security

You won't be able to provide a clear-cut recommendation, nor is there a process that magically defines a "best" answer. The parameters also change over time, as new projects create new databases and advances in hardware shift the optimal choices. You need to weigh the pros and cons of all the options and come up with a "gut feel" configuration that provides the best overall balance.

Let's delve into each of the important factors to arrive at a happy medium. The process plays out like a game of ping-pong, where you smack the instance count back and forth between high and low until you settle into a reasonable compromise.

1) Cost: Fairly early on you need to plan for the full operational workload, with the features of the database engine that you will require. "Enterprise" versions of SQL Server, although considerably more expensive, often provide the crucial functionality that a large implementation will require, such as data compression, advanced encryption, and a more thorough utilization of modern hardware and cloud-based blended resources.

Some of your databases may not require this complete functionality, however: many vendor-supplied packages possess less severe operational requirements and work fine on a "sandboxed" engine. You can often host "reporting" databases on simpler engines as well.

For future flexibility it's helpful to have a minimum of three instances, one of which is Enterprise. Normally you would purchase more, depending on the additional factors discussed below. So we've started by ping-ponging the low end to three. Yes, you could throw hundreds of databases on each instance, but that certainly isn't optimal.

2) Execution efficiency: The multiple databases that reside on an instance naturally compete for resources: CPU threads, IO throughput, disk space, and tempDB are all shared, which can lead to contention. Each individual database would prefer to have the instance all to itself, but following that route quickly becomes prohibitively expensive.

Besides, outside of small vendor-supplied systems, most software needs to read, update, and move data across various company databases; keeping databases with heavy non-batch cross-references on the same server makes those operations faster because it reduces network traffic.

Therefore, to establish an upper bound on the number of instances, group your databases by non-batched cross-reference activity. You can also group databases onto an instance by other shared attributes, such as common departmental access or parallel functionality (e-mail-related databases clump together, reporting databases clump together, etc.). Avoid putting large databases on the same server, though, if you anticipate high contention for CPU threads, IO throughput, disk space, or tempDB.

3) Ease of ongoing instance management: the last section's calculation likely gave you a dozen or so instances (or more). Now it's time to think about the operational overhead of maintaining them all. For one thing, you will occasionally need to take an instance down to apply a service pack or patch rollup. You will also build and maintain the automated backup and index-optimization plans on each server. Some time in the future you may even consider upgrading each engine to a higher release version. And once in a while you'll want to review the SQL error logs on each box.

Clearly, then, "ease of management" pushes you toward a smaller, further-consolidated set of instances. Helpful guides for further shrinking the count include whether workloads are similar (reporting, batch, or OLTP) and whether the large jobs you anticipate can be scheduled to avoid stepping on one another (and on your scheduled maintenance plans).

4) Security: Now that you've narrowed down and consolidated your instances as much as possible, one final slight expansion might be appropriate to accommodate specialized security considerations. For example, most businesses are especially sensitive about protecting their financial data from prying eyes, so it is often a sensible choice to host the finance ERP on its own server.

In some settings the customer data may warrant exceptional auditing, protection, or encryption safeguards, so it's not unusual for that to be segmented off to the side as well. Externally supported vendor-supplied systems may also be best positioned on their own server, not only to more clearly separate update and maintenance responsibilities but also to delineate a security environment for the VARs with offsite access.

It may feel vaguely unsatisfying to arrive heuristically at a solution for such an important decision. Yet after you've ordered, set up, and configured the instances and allocated the databases, you have a brief window of a year or so in which you still have the flexibility to shift things around (before the effort to reconfigure becomes too overwhelming). Once lots of new production software gets implemented with connection strings pointing to various locations, the architecture gets fairly tightly "baked in".