What My Dog Taught Me About Software Engineering

http://www.evidscience.com

Any knock at my door sends my dog tearing through the house, barking like crazy. Unfortunately, any knock-like sound does the same (I’m clumsy, so you can imagine the issue…). And so, over time, her useful warnings withered first into an annoyance and then shrank away into background noise.

And yet, to her, each knock is important. (“The mail is here! DEFCON 1!”) But she lacks sensitivity to context, which makes her alerts almost useless. By bark alone, how can I tell if it’s UPS (worthy of alert) or simply someone dropping off the recycling (not so much)? If barks happen all the time, for any reason, it’s like they don’t happen at all.

This is a lesson I had to learn in software management as well. When you lead a team, and everyone is deeply committed to the product, any small setback can feel like a three-alarm fire. But it’s not. That bug might seem crippling to you, but to your users it might just be annoying (or better yet, they might not even notice). So fix it and get on with life. But… some setbacks are major: the critical daily data dump broke? Here, sound the alarm.

And so, I had to learn “bug triage” to know when an issue really requires barking full blast, because if you don’t do this, eventually your team will move you to background noise.

Essentially, bugs come in two flavors: ones that will cost your company money (potentially lots of it), and ones that won’t. Knowing which is which is bug triage. This is an oversimplification, of course (for instance, some bugs may drain morale, which isn’t quantified in dollars but can certainly cost your company), but the gist is that some bugs are alarm-worthy, and the rest should just become tasks for your next dev cycle. There is no perfect prescription for predicting what might happen to your product, but in general, I work through a few triage steps. And remember, you have a limited number of barks that will inspire urgency, so err on the side of discretion.

More concretely, here are my “bug triage” steps:

  • Step 1: Inject some “cool” into your thinking. Your first instinct will be to pull your hair and make loud noises, or to jump straight into fixing the situation. Instead, take a deep breath, sip your coffee, and remember that everyone makes mistakes. The key here is to try to detach a bit emotionally.
  • Step 2: Consider the worst-case outcomes. Think through how this issue affects the business (not just the technology). Will you lose a major account (bad), or can you simply email your customers with an explanation to smooth things over (less bad)? If your worst case is really bad, this is a critical bug. The outcome of this step is to quantify what could happen if the wheels really fall off the bus.
  • Step 3: Consider the frequency of occurrence. Maybe the issue happened, but only under a strange confluence of circumstances, and you don’t expect it to actually affect anyone (less bad). On the other hand, maybe the common case is actually the worst case, and all of your major customers are going to be very angry (worst case). Here, your goal is to understand how pervasive the issue might be, and if that aligns with the worst-case outcomes, it means many of your customers will be feeling the pain. That’s bad.
  • Step 4: Choose to bark (or not). If the previous steps lead you to believe the issue is both pervasive and very bad, bark like crazy! Raise the alarms, fire off the texts and emails, and get cracking to fix the issue. In any other scenario, think deeply about whether it’s worth risking the credibility of your alerts. This is a crucial trade-off between your team’s long-term effectiveness and short-term fixes. And your company probably aims to be around longer than your bug.
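
For the programmers in the audience, Steps 2 through 4 can be sketched as a tiny decision function. This is only an illustration, not a tool I actually run: the names (Severity, Reach, Bug, triage) are made up for this post, the two-level scales are a deliberate simplification, and sending the in-between cases to a "judgment call" bucket is my assumption about how to encode "think deeply about whether it's worth it."

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    """Step 2: the worst-case business outcome (hypothetical two-level scale)."""
    MINOR = 1  # e.g. an explanatory email smooths things over
    MAJOR = 2  # e.g. you could lose a major account


class Reach(Enum):
    """Step 3: how often the issue actually bites (hypothetical two-level scale)."""
    RARE = 1       # only under a strange confluence of circumstances
    PERVASIVE = 2  # the common case; many customers feel the pain


@dataclass
class Bug:
    title: str
    worst_case: Severity
    reach: Reach


def triage(bug: Bug) -> str:
    """Step 4: bark only when the issue is both very bad and pervasive."""
    bad = bug.worst_case is Severity.MAJOR
    common = bug.reach is Reach.PERVASIVE
    if bad and common:
        return "bark"
    if bad or common:
        # Assumption: mixed cases get a human decision, not an automatic alarm.
        return "judgment call"
    return "next dev cycle"


if __name__ == "__main__":
    print(triage(Bug("daily data dump broke", Severity.MAJOR, Reach.PERVASIVE)))
    print(triage(Bug("misaligned icon on one page", Severity.MINOR, Reach.RARE)))
```

Step 1 is intentionally missing from the code. No function can sip your coffee for you.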

I have been the barker and the barkee, and I can tell you that in both cases, more bark does not equal more hustle. So make sure your barks count; only so many of them will get a response. And pet your dog. It’s not her fault (she can’t read this blog).