
Tag Archives: Software

T-SQL Tuesday: My favorite Bug


Heyo –

Haven’t written a T-SQL Tuesday post in a long time – but this one sounded like fun, so here we go.

[Image: T-SQL Tuesday logo]

A number of years ago, when SQL 2005 came out – I have to admit, I didn’t like it at all.  At the time, there were no trackpads in our department, and a lot of the simple keyboard functionality I’d gotten used to from SQL 7 and SQL 2000 was just gone.  We joked that you could take down an entire database department by walking around to people’s desks at lunchtime and stealing their mice.

It had a lot of bugs in it.  It took me just over 2 minutes after install to find the first one, and I started reporting them.  Daily.  This got to the point where I was sending them so often that one day when I had essentially a catalog of workarounds and didn’t send a bug report, the PM over there sent me a note asking if I was okay.  He was worried because I hadn’t sent in a bug report that day – and I’d been so consistent that – well, it just wasn’t right for him to start his day without the requisite gripe from Tom.

But the biggest gripe of all was how slow the RTM version of 2005 was.  I didn’t understand how an app could be that slow, but it was, and one day I decided to just report the bug the official way, not the usual way, so I went out to the Connect thing, and, being in a less than grumpy mood that day, filed a bug.   Performance related.  Because I just needed something to do.

And darned if it didn’t eventually get closed.

As fixed.

I’m gonna guess you have to turn your speakers up just a bit if you still have a SQL 2005 RTM box around – but it’s fixed… Just ask Buck.  🙂

https://connect.microsoft.com/SQLServer/feedback/details/253524/feature-request-usability-performance-related

 

 


Posted on January 10, 2017 in Uncategorized

 


DPM… and “Why are my SQL Transaction Logs filling up?” (the answer… really)


Here at Avanade we run a lot of pre-release software from Microsoft – so we can work the bugs out, so to speak, and deliver better solutions to our customers.

One of the packages we use is the System Center suite – including, in this case, the Data Protection Manager (DPM) component.  It’s been released, in-production software for some time now, and we run all our backups with it.

The thing that is totally weird for me as a DBA is that all of the backup jobs I had on the boxes got disabled, and DPM took over. It was like driving a car with an automatic transmission for the first time. “What do you mean I don’t have to do backups?”

If you have things set up properly (which is a deeper subject than this blog post is meant for) – it’s close to hands off – which, come to think of it, is still weird.

So, over the time we’ve had it, I’ve found that there are several things to be aware of in its implementation.  This is not an all-inclusive list of surprises – just a couple of the things I found out after researching them myself and finding a lot of bad information out there.

The information below is from my own experience, from conversations with Microsoft, and from research I’ve done.  That said, my goal is to help keep you out of the DPM weeds by helping you

  1. Understand how DPM handles SQL backups in the various recovery models you can have (full, simple, etc.)
  2. Understand where it needs drive space (this can be an absolutely evil ‘gotcha’ if you’re not careful)
  3. Recognize the edge cases.
  4. Understand how items 2 and 3 can intertwine, get you into deep doo-doo, and what you want to do to stay out of it.

So, ready?

Oh – assumptions:

You’ve got DPM installed, and for the most part, configured.  It’s working, but you have transaction log drives filling up on some of your servers, and it’s not really clear why.

Wanna know why?

Here’s the answer:

It’s because the UI is very unclear, because the documentation is unclear (there was a hint of it on page 83), and because the things that would be obvious to a DBA simply aren’t mentioned.

So, having said that – let’s explain a little.

After a few years of running it – and we flog it to within an inch of its life – I’ve come to, if not make friends with it, then at least give it grudging respect.

But you have to know what you’re doing.

So first off: Settings.

You organize your backups in DPM in what are called protection groups.

Think of it as a glorified schedule, with lots of properties you can adjust.

Now when you’re creating the protection group, you can create it to back up file servers, Exchange servers, and SQL servers.  We’ll talk about SQL servers only here.

So when you back up a SQL box, you might have some databases (let’s say the system ones, Master, Model, and MSDB) in simple recovery mode, and the user databases in full recovery.

What you’d do is create two protection groups.

One for the system databases, for simple recovery.

And one for the user databases, for full recovery.
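If you’re not sure which bucket a given database falls into, a quick look at the standard catalog view tells you each database’s recovery model (this is plain SQL, nothing DPM-specific):

```sql
-- Which protection group does each database belong in?
SELECT name, recovery_model_desc   -- FULL, SIMPLE, or BULK_LOGGED
FROM sys.databases
ORDER BY recovery_model_desc, name;
```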

And this is where the first gotcha comes into play.

See, when you create that protection group, you’re going to come across a tab in its creation that gives you this choice… It’s like The Matrix… Blue pill? Red pill?

And, when we were setting it up, we scoured the documentation to try to figure out how to set it up for databases that were in full recovery mode as well as doing mirroring.

And we kept running into problems with full transaction logs.

It turned out I wasn’t alone in this.

I googled the problem…

I binged the problem..

Heck, I even duckduckgo’ed the problem.

Everyone, it seemed, had the same question.

And the answers were surprisingly varied.

And most, honestly, were wrong.  (Note: that’s not to sound arrogant, this was a tough one to crack, so a lot of folks were coming up with creative ways to try to work around this issue)

Some folks were doing what we’d done initially *just* to keep the systems running. (manual backup, flip to simple recovery, shrink the logs, flip back to full recovery) – yes, we absolutely shredded the recovery chain, just shredded it – but the data in that application was very, very transient, so keeping it up and functioning was more important than keeping the data forever.

So while we were frantically – okay, diligently – searching for answers, managing the problem, we were also looking for a cure to the problem, because there was no possible way this could be an enterprise level application if it was behaving so badly… Right? There had to be some mistake, some setting we (and everyone else in those searches above) weren’t seeing, and it finally ended up in a support call.

My suspicion was that the transaction logs weren’t being backed up at all, even though that’s what we thought we were setting.
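One way to test a suspicion like that is to look at the backup history SQL Server itself records in msdb – this is a plain msdb query, and the database name below is a placeholder:

```sql
-- Recent backups for one database: type 'D' = full, 'L' = log.
-- If no 'L' rows show up, the transaction log isn't being backed up.
SELECT TOP (10) database_name, [type], backup_finish_date
FROM msdb.dbo.backupset
WHERE database_name = N'YourAppDb'   -- placeholder name
ORDER BY backup_finish_date DESC;
```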

I’d been around software enough to know that clicking on an insignificant little button can have catastrophic results if that’s not the button you meant to push.

And this was one of them.

See, the databases that were in full recovery (namely those in that app that had the mirroring) were our problem children.  Databases in simple recovery weren’t.

It made me wonder why.

And one day, I was on the phone about another DPM issue, (for another post) and I asked the question, “So what exactly is happening if I click on this button versus that one? Because my suspicion is that the tlog is not getting backed up at all.”

And then I asked the more crucial question for me, and likely for you who are reading this:  “What code is being executed behind these two options?”

And the fellow over at Microsoft researched it for me and came back with this:

“In DPM, when we setup synchronization for SQL protection, those Syncs are log backups. DPM achieves that by running the following SQL Query

BACKUP LOG [<SQL_DB>] TO DISK = N'<Database_location_folder>\DPM_SQL_PROTECT\<Server_Name>\<Instance>\<DB_Name>_log.ldf\Backup\Current.log'

Recovery Points are Express Full backups. If we setup a group to run sync just before recovery points, DPM will create a snapshot from the replica and then create a Full Database backup (sync).

BACKUP DATABASE [<SQL_DB>] TO VIRTUAL_DEVICE='<device_name>'

WITH SNAPSHOT,BUFFERCOUNT=1,BLOCKSIZE=1024

In this case we will never have log backup thus no log truncation should be expected.”

What does this mean?

It means that if you have a database in full recovery, you will want to put it in a protection group that is set to schedule the backup every X minutes/hours like this:

In DPM, click on “Protection” tab (lower left), then find the protection group.

Right click on it and choose ‘Modify’ as below.

[Screenshot: GEEQL_DPM_Modify_Protection_Group – right-click the protection group and choose ‘Modify’]

Expand the protection group and pick the server you’re trying to set up backups for.  You’ll do some configuring and click Next a few times, but below is the deceptively simple thing you have to watch.  This dialogue box – which will have the protection group name up at the very top (where I’ve got ‘GROUP NAME HERE’ stenciled in) – can bite you in the heinie if you’re not careful.  So, given what I’ve written above, can you tell whether this is backing up databases in full or simple recovery mode?

[Screenshot: GEEQL_DPM_Backup_Full_Recovery – the protection group dialogue, set to back up every 1 hour(s)]

See how it’s backing up every 1 hour(s) up there?

That means the code it’s running in the background is this:

BACKUP LOG [<SQL_DB>] TO DISK = N'<Database_location_folder>\DPM_SQL_PROTECT\<Server_Name>\<Instance>\<DB_Name>_log.ldf\Backup\Current.log'

We’ll get into more detail in a bit, but this means you won’t have full transaction logs.  This is the setting you want for the protection group backing up your databases in full recovery mode (including the ones that are mirrored or in Availability Groups).  The other option is to back up “Just before a recovery point” – which, if you’re thinking in terms of SQL and transaction logs, really doesn’t make a lot of sense.  We went through the documentation at one point, and I think we were about 83 pages in before it gave any indication of what it *might* be doing here – and even then it wasn’t clear.  But now we know.  So what you’d want in this protection group is a bunch of databases in full recovery mode.  You might want to create different protection groups for different servers or different schedules – that’s all up to you.  The crux is: if it’s a database in full recovery mode, this is how you want to set it up, by backing up every X minutes/hours… Making sense?

Okay, let’s take a look at the other option…

[Screenshot: GEEQL_DPM_Backup_Simple_Recovery – the protection group dialogue, set to back up “Just before a recovery point”]

If you have a database in simple recovery, you’ll want to put it in a protection group that does backups just before the recovery point – and that’s what the screenshot above shows.  When you click that radio button, the code it runs in the background (if you’re backing up SQL databases) is this:

BACKUP DATABASE [<SQL_DB>] TO VIRTUAL_DEVICE='<device_name>'

WITH SNAPSHOT,BUFFERCOUNT=1,BLOCKSIZE=1024

And you should be set.

You can change the frequency of the express full backups by clicking on the ‘modify’ button in the dialogue above, and you’ll have quite a few options there.

Understand, you have several different buckets to put your databases in.

  1. Simple recovery (see above)
  2. Full recovery (see above)
  3. Whatever frequency you need for your systems (from above)
  4. Whatever schedule you need for your systems (from above)

Believe it or not, that’s it.

Put the right things in the right place, and DPM is close to a ‘set it and forget it’ kind of a deal.

However…

…there are some Gotchas and some fine print.  This is stuff I’ve found, and your mileage may vary – but just be aware of the below:

  • If you put a db that’s in simple recovery into the protection group meant for databases in full recovery, you’ll likely get errors, with DPM complaining that it can’t back up the log of a database in simple recovery mode. Since you manually put that db into that protection group, it will be your job to get it out and into the right one.  That will make the alerts go away.
  • If you put a db that’s in full recovery mode into the protection group meant for simple, you’ll fill up your transaction logs, fill up your disks, your backups will fail, and you may, depending on a few factors, hork up your database pretty badly. (This is what most people complain about.) Since you (or someone on your team) likely put the db in the wrong protection group, putting it in the right one is the first thing to do – and that will solve the disk space issue. Having enough space on your log drives is critical at this point, because DPM will start making copies of your transaction logs as part of its backup process and will need the room (as in, half of your transaction log drive).  More details below.
  • I’ve found a couple of Corollaries to go with this:
    • Corollary 1: DPM creates a folder on your transaction log drive called “DPM_PROTECT” – it stores copies of the transaction logs in there.  Those are the ‘backups’.
      • You have a choice between compressing backups and encrypting them…
      • If you encrypt them, they’re full sized, even if they’re empty.
      • So if you have transaction logs filling up 50% of your t-log drive – guess what’s filling up the other half?  (DPM’s t-log backups).  That DPM_PROTECT folder is a staging folder and is sometimes full, sometimes not (making it devilishly hard to monitor for), but you need to be aware that if that folder fills up half the drive, you’re running very close to disaster, and that’s when you have to start getting creative in your problem solving (see ‘examples’ below)
    • Corollary 2: DPM can be configured to put the DPM_PROTECT folder on a separate drive, which may suit your needs, and is a topic that will have to be discussed in a separate post, but if you run your transaction log drives pretty full, and have cheaper storage available, this might be an option for you to consider.  We don’t have ours architected that way, so it’s an option I’ve not tried.
  • Examples of things that can go very wrong (like the disaster mentioned above)
    • If you are working with a clustered SQL server and are getting alerts because your log file can’t grow, chances are your transaction log drive (or mountpoint) is full – full of both transaction logs and DPM’s staged backups of those logs. To fix this, you will either need to
      • Extend the drive/make it bigger  (assuming that’s an option for you) and then restart your DPM backups.
        • Note: DPM will likely want to run a ‘validation’ at this point, which will take some time.  My recommendation is to let it do that – but there’s a huge “it depends” associated with this one.  Sometimes, depending on how long things were broken before you could get to them, you might find yourself essentially taking that database out of DPM’s protection group and starting over.  That breaks the recovery chain, but it can be faster than letting DPM validate what it thinks your latest backup is against your existing one (where you likely broke the recovery chain with the manual log backups anyway)… Like I said, it depends.
      • (Not advised, but if you’ve run out of options) back up the database once, then back up the log manually and repeatedly (via SQL, not DPM, because you’re trying to empty the drive that DPM has filled up) until you can shrink the transaction log enough that DPM has room on the drive to actually stage a copy of the log file for backup.
        • Once you’ve done that, remember, you’ll have fixed one issue but created another, namely, your recovery chain ends in DPM where you started doing manual SQL backups.  Now you have backups in DPM, and a full + a bunch of log backups in SQL.  Make sure you have a full set of backups in case things go wrong.
    • You’re working with a mirrored server or a server that’s part of an availability group and the databases are in the wrong protection group (simple instead of full recovery)…. You’ve got transaction logs filling up from both the replication that’s involved in this, and transaction logs filling up because they’re not being backed up… It gets ugly.  I ran into this (and wrote about our resolution to it here) where we had issues with log file growth, high numbers of virtual log files, and an availability group with multiple geographically dispersed secondaries… It was, um… “fun…” <ahem>
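For reference, the “not advised, but you’ve run out of options” routine from the bullets above might look like the sketch below.  Every name and path here is a placeholder, and remember: once you do this, your recovery chain in DPM is broken until you start over.

```sql
-- Emergency only: free log space outside DPM. 'YourAppDb' and the
-- E:\Backup paths are placeholders - substitute your own.
BACKUP DATABASE [YourAppDb] TO DISK = N'E:\Backup\YourAppDb_full.bak' WITH INIT;

BACKUP LOG [YourAppDb] TO DISK = N'E:\Backup\YourAppDb_log_01.trn';
-- Repeat the log backup (with new file names) until this shows 'NOTHING':
SELECT name, log_reuse_wait_desc FROM sys.databases WHERE name = N'YourAppDb';

-- Then shrink the log file (use its logical name from sys.database_files):
DBCC SHRINKFILE (N'YourAppDb_log', 1024);  -- target size in MB
```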

So…  All this to say something very, very simple: pick the right protection group, know what’s going on behind the curtain of what DPM is doing, and honestly, you should be good.  If you understand which radio button to select when you’re configuring the protection group, you as a DBA are about 90% of the way there.  Make sure your transaction log drives are twice as big as you think they should be (or configure DPM to store its staging folder elsewhere), because chances are you’ll be using half of the transaction log drive for the logs themselves, and the other half for temporary storage of the backups of those logs.

Know what your protection groups will do for you… Know the gotchas, and DPM will, strangely enough, be your friend…

Take care out there – and keep your electrons happy.

 

Posted on June 21, 2016 in Uncategorized

 


Problem Solved: Lync Reporting issues


A while back we had a pretty significant series of issues with our Lync implementation that took some time to resolve. Now that it’s fixed, I thought I’d write up one of them to help others who’ve run into the same problem, so it can be fixed once and for all.

Note: these notes come from my own personal experience with a dash of a case we had open with MSFT at the time.

Lync, for those of you who aren’t familiar with it, is Microsoft’s corporate instant-messaging suite – it brings voice (both individual calls and conferencing), sharing, and IM into one package. If there was ever a program that’s changed how I work, this is it.  Calls, conferences, desktop sharing, all from the same app.  Truly well done.

That’s on the front end.

On the backend, however, it gets a little more involved – and for those running the app, there’s a lot of reporting that will allow you to monitor things like call quality, what the system is doing, and if there is trouble in paradise, as it were, you can use this reporting to start narrowing down the issues you’re running into, and that will help you both troubleshoot and resolve them.

And that’s the part that was giving us trouble.

It’d work for a bit, then give us weird errors, and the folks trying to troubleshoot global call/connection issues were completely blocked. So they were frustrated, and we needed to figure out how to fix it – not just this once, but fix it period.

The problem:

Lync reporting, when we got to the Monitoring Dashboard report, would often render an error like this:

What you’ll get if the stored procedures haven’t been run

If you can’t read the image, there are two error messages:

“Report processing stopped because too many rows in summary tables are missing in the call detail recording (CDR) database.  To resolve this issue, run dbo.RtcGenerateSummaryTables on the LcsCDR database.”

“Report processing stopped because too many rows in summary tables are missing in the Quality of Experience (QoE) database.  To resolve this issue, run dbo.RtcGenerateSummaryTables on the QoEMetrics database.”

You can confirm I’m not the only one who ran into this error by using your favorite search engine – Google, Bing, DuckDuckGo, whatever (the results are pretty similar).

Bottom line – the error is the same: Some level of activity that’s been happening on the server is not reflected in the troubleshooting reports you want – and the error message has both good and bad parts to it.

The good: As far as error messages go, this one is surprisingly clear. Basically do what it says, and, depending on how much it has to summarize, it takes a few minutes and sums up a bunch of information that is amazingly useful in the reports our Lync team needs.

The bad: It exposes names of objects that someone might not actually want exposed – though the people who see this are often the people who need to, so it’s a bit of a two-edged sword.  The other problem is that there’s nothing to indicate how often the procedure mentioned in the message needs to run. I figured (as did many others who’ve run into this) that some process called this set of stored procedures at some defined interval, but from what I saw in my research and experienced myself, that wasn’t happening for the folks hitting this error. On a hunch, while I was on a call with MSFT about a related issue, I asked – and it turns out the stored procedure referenced in the screenshot above needs to be run at least daily.

Well that’s not hard…

So my suggestions below are based on the following assumptions:

That you’ve got SQL installed on your server with SQL Agent running – this was something that seemed to be the culprit in a lot of the issues in the links above.

We depend on SQL Agent to run our automation, so it was running but the process/job/scheduler to run the needed code wasn’t there at all.  The below instructions fix that.

So I created a SQL job, and scheduled it to run once daily. Since the stored procedures are in different databases, I just wrote the execution commands (below) fully qualified, so you could do the same.

I also created some output files on the jobs just to be sure I could monitor what the jobs were doing for the first little bit, and guess what?

It worked.

Problem solved.

So – if you’re experiencing the issues described above and don’t have a job scheduled on your Lync SQL Server, do this:

  1. Create a SQL job on the SQL server that has your QoEMetrics and LcsCDR databases on it.
  2. Give the job a name that makes sense and fits your job naming conventions.
    1. I named mine: Lync Reporting Monitoring Dashboard Summary Numbers
    2. Put something like the below in the description field so that once the braincells you’re now using to solve this problem have been overwritten, you’ll still have a clue as to why you did this:
      1. LYNC – Error appears in reporting when tables haven’t been populated frequently enough. This code was missing and it is confirmed that it needed to be run daily with MSFT. It is scheduled as such. Questions? Contact <you or your SQL operations team>
    3. Click on ‘Steps’
    4. Click ‘New’
    5. Add one step – you’ll have to name it… I named mine: RTC Generate Summary Tables – because that’s what it does.
    6. I leave the database set to master and paste the following code in.

-- this code should be run daily per MSFT

EXEC      QoEMetrics.dbo.RtcGenerateSummaryTables

GO

EXEC      LcsCDR.dbo.RtcGenerateSummaryTables

GO

  3. If you want an output file, find your log drive and paste this in there (editing the <drivepath> below as appropriate):
    1. <drivepath>\MSSQL\Log\LyncPopulateSummaryTables_$(ESCAPE_SQUOTE(JOBID))_$(ESCAPE_SQUOTE(STEPID))_$(ESCAPE_SQUOTE(STRTDT))_$(ESCAPE_SQUOTE(STRTTM)).txt (that’s a little snippet from some of the best code I’ve seen at http://ola.hallengren.com)
  4. I scheduled mine to run at some hour when I’m asleep – and it now takes about 12 seconds to run daily.  You may want to adjust the schedule as needed for your environment.
  5. Do that, and you should be able to forget about that report, and your Lync team will know it’s there every day – whenever they need it.
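If you’d rather script the job than click through the GUI, the same setup can be sketched with the standard SQL Agent procedures in msdb.  The job name, schedule name, and 3:00 AM start time below are just examples – adjust for your environment:

```sql
-- Sketch: create the daily summary-tables job programmatically.
USE msdb;
GO

EXEC dbo.sp_add_job
    @job_name = N'Lync Reporting Monitoring Dashboard Summary Numbers',
    @description = N'LYNC - RtcGenerateSummaryTables must run daily per MSFT.';

EXEC dbo.sp_add_jobstep
    @job_name  = N'Lync Reporting Monitoring Dashboard Summary Numbers',
    @step_name = N'RTC Generate Summary Tables',
    @subsystem = N'TSQL',
    @database_name = N'master',
    @command = N'EXEC QoEMetrics.dbo.RtcGenerateSummaryTables;
EXEC LcsCDR.dbo.RtcGenerateSummaryTables;';

EXEC dbo.sp_add_schedule
    @schedule_name = N'Daily - 3am',      -- example schedule
    @freq_type = 4,                       -- daily
    @freq_interval = 1,                   -- every 1 day
    @active_start_time = 030000;          -- 3:00:00 AM

EXEC dbo.sp_attach_schedule
    @job_name = N'Lync Reporting Monitoring Dashboard Summary Numbers',
    @schedule_name = N'Daily - 3am';

EXEC dbo.sp_add_jobserver
    @job_name = N'Lync Reporting Monitoring Dashboard Summary Numbers';
```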

Take care – and good luck.

Tom

 

Posted on September 15, 2014 in Uncategorized

 


Error 42d, SQL, and which kind of “Software”?


The other day we had a clustered server freak out on us.  On a four node cluster, one instance appeared to be randomly failing over constantly, never staying in one spot long enough to really do anything.  It was maddening, it was frustrating, and it was really hard to even catch up to it before it moved to another node. It also housed the database for a fairly important application.

Eventually I got the instance locked down long enough to get it to just sit still, and was able to look at the logs.

The weird thing was, while I was intently looking at the logs, I also found my mind wandering off in directions that I wouldn’t have expected – at least at that time in the morning, and with the pressure on so much – I mean, the server was just plain down and wouldn’t start.  Something was really weird, there was more than a little urgency to get things fixed, and yet my mind was wandering into places it just shouldn’t have been going right then.

Okay, Focus…

Let’s see what happened – and there again… I’m looking at problems software is causing, and ironically, it was – oh, how to put this delicately – another kind of ‘software’ that had my mind wandering.

And then I saw it – and I saw what my subconscious mind was trying to tell me… That the strange error I was seeing was a 42d.

Hmm… SQL would start, appear to run, I’d see a few lines in the errorlog – then boom, it’d failover, the one constant was that whole distracting 42d thing…

Hmm… I did some checking, and found that 42d is hex for 1069 – the Windows error code for a service logon failure.
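To spell the arithmetic out: 0x42D is 1069 in decimal, which is Windows error ERROR_SERVICE_LOGON_FAILED (“The service did not start due to a logon failure”).  You can check the conversion right in SQL, and `net helpmsg 1069` at a command prompt will print the message text:

```sql
-- 0x0000042D interpreted as an integer = 1069 (ERROR_SERVICE_LOGON_FAILED)
SELECT CAST(0x0000042D AS int) AS error_code;
```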

What, so SQL can’t log on to the network itself?

No, wait, the Service Account can’t log on.

That’s a network account – and on a hunch I checked to see if for some reason that account was locked out – and sure enough, it was.

My colleagues over in AD land unlocked the account, I fired up SQL, and all was happy.

So…

Obviously, your mileage may vary – you might be facing a totally different issue.  But if you find yourself thinking about software that has nothing to do with computers, or a double-barreled slingshot, about the time you’re trying to troubleshoot a server that’s down, take a look at the SQL errorlog and check with your friends over in AD (that would be ‘Active Directory’) land to see if that service account is locked out.

If it is, either unlock it or have your AD people unlock it, and you should be all set.

So remember, 42d can apply to two different types of software… It’ll be up to you to know which one to use where.

 

Posted on March 20, 2013 in Uncategorized

 


Cool


When it comes to software – the word “cool” has been used so often that… I just can’t do it.

There are a few applications that are mind bending in how useful they are, how they connect dots that were never connectable before, but for the most part, I cannot bring myself to utter the word “cool” about most of the software out there.

I’ll tell you why – and as always it begins with a short story.

Back in – well, a few years back… my sister and I worked for a little software company that had been started by a couple of college dropouts and was being run in a small town just east of Seattle.

She and I had both also managed to drag our parents out of what we thought was the stone age and had gotten them a computer, and we spent hours, days, weeks going over and over and over with them what an “icon” was, the concept of “clicking and dragging”, and the realization that “just because you didn’t see it, doesn’t mean it’s gone.”  I’m sure some of you out there have worked with your parents in this way.  Doing “support” calls, leaning back in your chair, eyes closed, one hand holding the phone, the other holding your head, the bottle of aspirin nearby, trying to project this voice of calm and reason to a parent who simply doesn’t “get it”.

Now – understand – this is not to slam parents or their generation.  Oh, Lordy – That’s the last thing I’d want to do.  The thing that’s difficult is translating what you know and take for granted into something that’s so unfamiliar to them (if they can’t see it and feel it, it must not be there, right?) to something they can relate to and understand – without having them go over the edge in frustration.  I’ve had a couple of calls with my mom that lasted two hours (almost as much time as it would take to drive down there, do it, and drive back) – and they were often things we in the IT world would call ridiculously simple – copying and pasting, for example, was one thing my mom’s had trouble with.  But try explaining it to someone without using computer terms – to someone who grew up in another country, another culture, who’s never actually copied and pasted – and it becomes a bit of a challenge, and you have to make sure the concept itself is clear before you try to explain the mechanics of it.  She’s got a device that allows her to browse the Web from her TV as well as that computer we got her – and of course the commands for one can’t be consistent with the commands for the other – so that makes it hard to assume that something will just simply work as you expect it to.  It can get frustrating – on both sides.

But a little note – nay – reminder for anyone out there who gets frustrated at explaining something technical to your parents – well, I can’t speak for your parents specifically, so I’ll speak generally:

These are the folks who quite literally dealt with your crap.  These are the folks who changed your diapers, who changed work schedules to take you to or be at your school events growing up, who fixed thousands of sack lunches, who helped with last minute school projects, who listened to your dreams as you became a teenager, and did what they could to keep food on the table, and a roof over your head… And they smiled that proud, gut wrenchingly bittersweet smile parents get as they watched your car – or bus – or whatever – leave that one time that neither of you really knew would be the last time you’d ‘live at home’ – A little bit of a support call to help them out is the least I can do.   I do not for one moment regret the time spent on the calls I get from my mom on computer problems.

After about a year and a half, mom and dad knew how to use the computer, the “support calls” were fewer, and they were actually able to write letters and stories and the like.  It was kind of a neat feeling – almost like the shoe was on the other foot – me watching them “grow up” – so to speak, instead of them watching me.

But one day, at work, when a developer wanted to show me something new he’d written – I just about flipped.

See, he’d written this snippet of code, that did something… and to this day I can’t remember what it was – but he ran it, showed me what it did, and then said, with eyes just beaming with excitement, “Tom, isn’t that COOOOOOOOL?”

I understand that feeling.  I’ve written little snippets of code that solve problems in rather ingenious ways – but even as I sit here, writing this, I still remember that sinking feeling I had in the pit of my stomach when I heard him say that…

Given where I worked, and given what I knew of software development, I knew that this little “cool” thing would make it into an operating system at some point, and soon.

And it would fundamentally change how my parents had to work.  And it would mean more telephone calls, to explain something that someone had changed.

Not because it made something better.

Not because it made it easier.

Not even because it made it faster.

But because it made something “cool”.

See, there seems to be a misunderstanding of the purpose of writing software.

My take is that you write software to use it.  And if you use it at work, it should make your work faster, more efficient, more streamlined.  If you use it at home, it should solve problems and make it take less time to do things (think putting together a photo album, balancing your checkbook, or trying to find out where your 401K went).

Here’s the deal: Software Should Simply Work.

Period.

Make it do what it’s supposed to do first.

If developers write code with the understanding that it is to be used by people, and that it should make their lives easier – then they have it nailed.

If developers write code only to show off their prowess, or their ingeniousness to other developers, then they’ve forgotten who their audience is – and that misses the point entirely.

So… what’s ‘cool’?

An application that solves a problem and does it well.

That’s cool.

An application that is easy to learn and intuitive for a first time user, and at the same time, just surprises you with deeper and deeper little snippets of usefulness that solve a problem you didn’t even know you had, just as you realize you have it.

That’s cool.

Oh – and last but most certainly not least: An application that works consistently enough like the old version of the same application so users don’t have to relearn everything they’ve learned to be as efficient with the new version as they were with the old.

THAT’s cool.

I don’t want shiny sparkly crap in my software.  I don’t want context sensitive menus where I have to know what I’m looking for in order to find it (hint: the whole ribbon thing did nothing to increase productivity for a long honking time.)

And there you go.

Software should be written for the user, and with the user in mind.  Anything that gets in the way of that, no matter how glitzy, no matter how sparkly and shiny – misses the point altogether.

—–

…and there is a post script to this story…

Last year, my mom was working on her Christmas letter – in Microsoft Word, and she wanted to send it to someone to do some proof reading.

Before, she’d had to go through a web interface on her computer to get to email – and sending the text of a document involved copying all the text in the document, opening the web email interface, creating a new outgoing mail, putting in the address, tabbing a couple of times, typing in the subject, one more tab, then a paste.  It worked, but it was pretty convoluted.

Then I got Outlook Express configured on her box, so now when she wants to send a document, it’s a matter of going up to the file menu in Word, choosing send, typing in the name, and then hitting the send button.

The first time she did that she was floored.  “That’s so EASY!” and then, remembering this story, she changed that just a bit…

“That’s so COOOOL!!!”

And she’s been rubbing it in ever since.

So even though the people who made Outlook Express 6 work with Word 2003 on Windows XP are probably long moved on from those projects, a note of thanks to the architects, devs, testers, PM’s, and managers who shepherded those things all the way through to RTM:  Ya done good.

 

Posted on November 4, 2011 in Uncategorized

 
