I don’t know if any of you have ever had to recover a master database before, but I had to recently, and it’s one of those bits of knowledge that’s good to have, but you don’t ever want to have to use it. You know, kind of like car insurance. You know you need it, but you don’t ever want to have a reason to use it.
So I’ll tell you about my experience – might not work for you – but it did for me, so I thought I’d share the experience.
It all started with a patch cycle, a server that wouldn’t come online, and we enter our little story with me staring at this entry in the SQL error log:
Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online
Now I’ve had to restore a master database before. It’s not fun. By the time you get to the point of actually having to do this, chances are you’ve done a bunch of troubleshooting, you’re already frazzled, the pressure is on, and clear thinking is both critically important, which, because of the pressure, and as elusive as a Sasquatch in Singapore.. Rebuilding a master database was just not something I wanted to do because of what I’d lose. Restoring it seemed like a better idea – so I did some thinking, and remembered something I’d learned in a PSS call a long time ago, in a galaxy far, far away…
It involved restoring the master database, but doing it in a way that didn’t involve using anything on the command line – it was just straight SQL.
I’m sure you’re thinking, “But… SQL won’t start!”
You are correct, young grasshopper, and have been listening along well…
But let’s not be afraid to think outside the box, shall we?
This was a clustered server, and it would start, but wouldn’t stay running no matter what I did. Looking at that error log gave me an idea, and the good thing was that I did have backups.
So here – I’ll give you the big picture in a nutshell, then we’ll go to details:
- The thing that SQL wants when it fires up is an MDF and LDF file from the master database.
- Given that, somehow, I needed to get not only an MDF and LDF file for the master database, but one from my toasted server.
- Once I could get them to that server, it ought to start up, because we know the backup’s good, right?
So let’s give this a shot.
Since I had the backups, I restored the latest one *as a user database* onto a test box – giving it a different name and putting the files in a location not standard for that box (that way the test box wouldn’t be confused by me trying to put two master.MDF files in the same directory).
I made sure the Test server was the same release/patch level/build number as the production server so I wouldn’t run into any inadvertent upgrades I wasn’t planning on doing.
So on my Test server (I wasn’t going to play with production, production was broke, I didn’t want to break it more) –
I created a folder with a name I could remember in a location I wouldn’t forget
For example, we’ll make a directory called C:\EMERGENCY_MASTER_BACKUP
I then copied the production backup into that directory, just to keep everything all in one place.
Now, remember when I was talking about restoring the master database as a user database? This is where this happened. I did a restore command on the test server – but did it a little differently, just so I could get those MDF and LDF files to exist, and to have them to use back on my production server.
Given the directory we made up above, I wrote a restore command something like this:
Restore database masterback
From disk = ‘c:\Emergency_master_backup\<whatever the name of the backup was>
Move ‘master’ to ‘c:\emergency_master_backup\master.MDF’,
Move ‘mast_log’ to ‘c:\emergency_master_backup\mastllog.LDF’
Note: this is a very standard restore command, only you’re using a master database backup, and you’re fooling SQL into thinking it’s a user database by changing the name. That’s it.
Examples of this code and further information on restores can be found here: http://msdn.microsoft.com/en-us/library/ms186858.aspx
I hit control E, and in a few seconds, had a user database sitting there on my box, looking for all the world like a master database, except it was a user database.
So what I wanted, remember, was the MDF and LDF files, and I couldn’t use them if SQL had its hooks in them, so I detached the database. That way I could copy them back over to my production box.
With great fanfare, I copied them over to the broken production server – and remember? All SQL wants is an MDF and an LDF file it can read. Given that those were the only two files that were screwed up, I renamed the existing ones on the server so SQL wouldn’t find them, (I think I called them *.MDFold – and *.LDFold – just so SQL didn’t recognize them) and then copied the ones I’d just detached from the test server into the same directory the screwed up files were in on the production server.
Understand, SQL was very stopped at this point.
What I’d done is taken the broken MDF and LDF files for the master database and replaced them with functional ones.
Next bit was to go into cluster administrator and see if I could start SQL.
I right clicked on the instance name, selected ‘Bring Online’ – and to my great relief, it fired up in about a second.
The next day I had to interview someone, and presented him with this situation. He gave me the absolute textbook answers – I was truly impressed with the depth of his knowledge, and feel that the textbook depth of his answers will be very, very useful when the times comes, but the experience of a few (in this case, Microsoft sanctioned from that PSS case those many years ago) shortcuts learned over the years helped me get the server up and running more quickly than it would have taken for me to dust off all the command line master database recovery commands that I’m not nearly as familiar with.