All Your SQL Server: Backup-Recovery

Showing posts with label Backup-Recovery. Show all posts

Thursday, May 17, 2018

A Reasonable DR

I wracked my brains over Disaster Recovery concepts for several months before I reached a viable plan. The key was when the light bulb went on that you must plan for two distinct types of disasters (call them major and catastrophic). You need to plan for both, but quite possibly you can be a little more lenient on yourself for the catastrophies. I'm not going to get too far into the weeds with my specific implementation as your's will likely be different. How you recover from each disaster really depends on the perceptions of your customers (what is their range of allowable excuses) and your employer's ownership contingencies (what is their insurance like and what are their long-term objectives). Things that you read in the "trades" about recovery time objective and recovery point objective (RTO and RPO) are still relevant, but they are different under each of the scenarios.

A "major" is when the halon failed and a fire in your server room melted your NAS. Maybe some other equipment survived, and all the hardware outside the server room is just fine. Under this scenario both your customers and your management will extend you a fair amount of sympathy but they still expect some kind of reasonable recovery efforts. It widely varies by circumstance, but in my case they would expect that after a couple of days our systems would be up and running again, and at worse we might have to enter a half day's data over.

A "catastrophic" is when Hurricane Jeff rolls in from the gulf, your whole building is flooded, half the state is without power for two weeks, trees are down, roads are impassible, and the national guard is called out to maintain order and provide drinking water. In my case I suspect management could actually be called in to decide whether to walk away from the business by declaring a total insurance loss. If they did decide to continue, they could probably convince our customers that a month long gap in service was totally justified (of course we'd waive a couple months of bills out of courtesy).

You have to plan for recovery from a major disaster with the resources at your near immediate disposal. It helps to have both onsite and offsite copies of your software and your data; onsite copies are typically faster to access and should be counted on as the primary recovery source. Offsite resources might be compromised by the disaster (such as if the fire took down your switch hardware servicing your cloud provider). Offsite resources however may provide some resilience should you not end up with a complete set of onsite backups after the disaster.

You have to plan for catastrophic recovery from solely the offsite resources. You may need to procure new hardware in a new location and your old location might be permanently inaccessible.

Once you've got these concepts firmly in mind and know for each the RTO and RPO, you can work backward to determine your backup strategy. It doesn't make sense to backup your log files locally on a frequency greater than your major disaster RPO, and to perform offsite database log copies more frequently than your catastrophic RPO.

Planning to meet the Recovery Time Objective is rather more nuanced, and likely takes actual practice to discover. How long does it take you to restore a copy of each of your databases? How long to restore the subsequent differentials and log backups? Until you try it you won't know for sure. Also consider that your charges are usually blessed with running on the fastest IO hardware; in a disaster or catastrophe you may well be restoring to considerably slower boxes.

There now, that was fun, wasn't it!

Tuesday, August 15, 2017

All Them Logs

Oh yes those dear transaction logs: can't live with them, can't live without them. Understanding how transaction logs work and how to manage them is the key to having a happy DBA life. The transaction log is basically a copy of all SQL statements run against your database, (except for things like bulk inserts in Bulk Logged mode, or inserts made when traceflag 610 is on). In a disaster on a database set on Full Recovery you can obtain a fully functioning database by first restoring the last complete backup, and then restoring any subsequent log backups. You may still of course have to manually run any bulk imports executed after the last dB backup, but that's seldom an issue.

The nice thing about logs is they backup really fast (they are after all just a bunch of SQL). I can back up a log for a day's worth of activity on a heavy production database in about 15 minutes. The nasty things about logs is they restore rather slowly (as the engine has to re-execute all that SQL). My 15 minute backup takes a good hour and a half to restore. The other nasty thing about log files is, if you go with Instant File Initialization and autogrow, then you need to manage their size by shrinking them. If the device hosting your log files for a database runs out of space, then that database won't execute any more transactions. No bueno.

Hence it's a good idea to add a customized alert to RedGate monitor when your log file expands beyond a certain size. Also I use a little trick to "flip" between different copies of the log file: right click on your database, choose properties, and then files. Add a couple files of type "log" but leave their size small and turn off autogrowth. Now when your primary log file gets too large, turn autogrowth on one of the small files, and off on your primary. After the next log backup you should be able to shrink the primary log file down to nearly nothing (well if there aren't long pending transactions still out there). In this manner you keep occasionally swapping between active log files as their size dictates.

Another nifty trick is to allocate an extra spare log file to a different physical device (but leave it small without growth). That way in an emergency where your disk drive hosting the logs fills up, you can just click into properties and turn autogrowth on your spare log file. Once that's done, backup and shrink the primary logs.

Of course if you don't want to hassle with the logs, don't need point-in-time recovery, and don't need them for log-shipping to a synchronized database, just leave your database on Simple recovery. You can then shrink the logs at your own leisure (usually after a full dB backup) and you won't have to worry about backing up the logs themselves.

Wednesday, September 14, 2016

Recovery Full or Simple

Once you've got your databases identified and organized into a spreadsheet and accomplished the preliminary cleanup, the next step to getting your act together is to determine the recovery model on each database. Well, maybe some basic instance tuning, but we can leave that for another blog post.

You only get two choices for the recovery model, full or simple, but I still can't believe how much difficulty folks have when choosing between the two. One thing you'll notice right off the bat is software developers are your worst resource for intelligently making this choice. You need to ping them for critical information to measure certain volatility features, but after that the choice really should be up to you. You're the one that will be managing disk space, backups, and restores.

There are only two key questions you need to ask yourself to make a final determination on what recovery model to choose for each database: 1) would it kill me if I was asked to restore this database to a precisely specific point in time and could only get it to the closest day? 2) Am I willing to manage log file backups and shrinks on this puppy?

If your answer to both questions is yes, then you need the full recovery model. Otherwise, use the simple model. But getting to the answer of these two questions is really the bulk of what I'd like to chat about.

Despite all the confusing things you will read about the recovery model, the gist of the difference between full and simple is the ability to perform "point in time" recovery. On the simple model you can restore whatever backup you want to your database, then any differential you ran, and you'll get the state of the database naturally from when you made that last differential backup. On the full model, you can additionally restore all or portions of the log file backup(s) to reflect the transactions that happened after your last differential.

Therein lies the rub: on simple recovery your log files don't grow much, but on full they can get quite large. On databases set to full recovery you will need to periodically shrink the log files, and of course you need to add them to your maintenance plan to back them up as well.

This is where that spreadsheet of databases is helpful. If a production database exhibits a highly combined level of Write Coupling and Software Volatility then you'll likely need to bite the bullet and set it on full recovery model. Very seldom will I set a development, QA, or staging database to this model though. A better way to manage high software volatility on a development database is to make sure your developers are using good quality source control. If a production database is written to at random times by all sorts of batch and OLTP systems or your developers are constantly rolling out production upgrades, then you'd best stick with full recovery.

Monday, June 6, 2016

Backup Restore

More than anything, a DBA provides a company a solid underpinning confidence that you are "protecting" the databases. Between periodic maintenance chores and the scheduled backups, you are supposed to make sure that come hell or high water the company's data will always be safe, reliable, and recoverable. In fact you can pretty safely assume that should you under any circumstances "lose" a database, you will also lose your job.

Setting up and then keeping track of the backups is a large part of this chore. Once you've determined whether to set each dB on Full or Simple recovery, it's time to create a maintenance plan on each instance. For some peculiar reason known only to Microsoft, I've only been able to consistently do this RDP'd into the server with SSMS fired up connected to (local). I like to keep my backup jobs separate from all the other maintenance, so when I run the maintenance Wizard I select just the three options of Log Backups, Database Backups, and Backup Cleanup. That's not the order in which the checkboxes appear, but when you arrange them, sequence them in that order.

Needless to say, you don't place your backup files on the same device where you host your log or mdf files. One lesser known useful fact however is, although you can't select it from clicking on treeviews, you can manually type in a path to a UNC share (\\myBackupSrv\sqlBackkups, or even \\10.10.44.44\myStuff) ahead of your backup file name. You can even address remote shares on the cloud this way.

Once your plans are in place, you need to verify they are working properly. Things get more complicated once you backup to a cloud vendor: not only do you have your cloud backup routines but for safety you should keep a local backup copy occasionally as well. Also if you are hosting gigantic databases supporting OLTP then chances are it makes more sense to run daily differential backups instead of full backups (with maybe a weekly or monthly full depending on how much the dB grows and changes). After a while therefore you can end up with a fairly complicated blended backup strategy.

If you're like me then, you'll want to open up your SSMS every morning and examine the Explorer Details window. Add the column "Last Backup" and sort it, and right away you know what needs some attention. Also you will find this query helpful:

USE msdb
GO
SELECT        b.backup_set_id, b.name, b.user_name, b.database_creation_date, b.backup_start_date, b.backup_finish_date, b.type, b.backup_size, b.database_name,
                         b.server_name, b.machine_name, b.recovery_model, b.is_snapshot, b.is_readonly,
                         b.is_single_user, b.is_damaged, f.physical_drive, f.physical_name, f.state,
                         f.state_desc, f.is_present, b.is_copy_only, m.physical_device_name
FROM            backupset AS b INNER JOIN
                         backupfile AS f ON f.backup_set_id = b.backup_set_id INNER JOIN
                         backupmediafamily AS m ON b.media_set_id = m.media_set_id
WHERE        (f.physical_name LIKE '%.mdf%') AND (b.backup_start_date > GETDATE() - 14)
ORDER BY 1 DESC
GO

Often I'll add into the where-clause a specific b.database_name if I'm looking specifically for the backups for just one database. Note the results of this query show not only the dB backups but also the log, differential, and VM snapshots. It's useful to recognize that when you do a Restore through the user-interface, these are the tables the system references to reach the correct restore file suggestions.

In some environments, as when you have another team moving reporting databases around, it's helpful to see the inverse condition (when were the restores performed?):

USE msdb
GO
SELECT DBRestored = destination_database_name ,
        RestoreDate = restore_date ,
        SourceDB = b.database_name ,
        SourceFile = physical_name ,
        BackupDate = backup_start_date
FROM    RestoreHistory h
        INNER JOIN BackupSet b ON h.backup_set_id = b.backup_set_id
        INNER JOIN BackupFile f ON f.backup_set_id = b.backup_set_id
where physical_name like '%.mdf%'
ORDER BY RestoreDate desc
GO

What's great about this query is it shows the name of what backupset was used to restore each database and the date that backup got created.

Naturally (or maybe I should phrase that as "unfortunately") if you yourself are asked to do a restore, a good percentage of time it will be under duress with the boss breathing down your neck. Hence I always keep this template by my side to pull up in times of emergency:

RESTORE DATABASE [ThisBizBI]
FROM DISK = N'\\bkupStore\sql backups\DEV-ThisBizDB\ThisBizBI\ThisBizBI_backup_2014_08_21_040010_8176618.bak'
WITH FILE = 1, MOVE N'ThisBizBI' TO N'E:\SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\ThisBizBI.mdf',
MOVE N'ThisBizBI_log' TO N'E:\SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\ThisBizBI_log.ldf',
NOUNLOAD, STATS = 10

GO

\\*------------------------ *\\

ALTER DATABASE [ThisBizBIDW] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO

RESTORE DATABASE [ThisBizBIDW]
FROM DISK = N'\\bkupStore\sql backups\OLTP2SRV\ThisBizBIDW\ThisBizBIDW_backup_2014_09_08_040001_0041150.bak'
WITH FILE = 1, NOUNLOAD, REPLACE,
STANDBY = N'E:\XMPIEDB\MSSQL11.XMPIE\MSSQL\DATA\ThisBizBIDW_sby.BAK',
STATS = 10
GO

ALTER DATABASE [ThisBizBIDW] SET MULTI_USER WITH ROLLBACK IMMEDIATE
GO

\\*------------------------ *\\

RESTORE LOG [ThisBizOSDEV_Restore]
FROM DISK = N'\\bkupStore\sql backups\OLTP2SRV\ThisBizOSDev\ThisBizOSDev_backup_2015_05_10_010007_0679267.trn'
WITH FILE = 1, NOUNLOAD,
STANDBY = N'E:\XMPIEDB\MSSQL11.XMPIE\MSSQL\DATA\ThisBizOSDev_Restore_sby2.BAK',
STATS = 10
GO

This template shows three different variations of a restore; I seldom use any one of them straight vanilla but may modify or copy a line or two from one to the other to achieve the results that I desire. The first is a restore of one database over another, the second example is a restore of a live production dB (kicking off the current users in the process) and leaving the database in "standby" mode, and the third example is a logfile restore. If you haven't yet, do read up on restoring a database with NoRecovery or Standby -- these are the options to use if you need to restore more than a single backup file to achieve currency.