RIM's Excuse For BlackBerry Outage Finally Emerges

from the too-little-too-late? dept

Research In Motion has delivered an explanation of what caused the BlackBerry outage earlier this week -- sort of. It says an insufficiently tested software upgrade set off a series of errors at its network operations center, which processes all the emails for BlackBerry devices in North America, and then its "failover process", which is supposed to switch things to a backup system, didn't work properly. The company says that it has plenty of capacity and resources to deal with its volume of messages and growing user base, and that it will better test its upgrades in the future. However, that explanation -- and the long time it took to come out -- doesn't wash with some observers, who say there are enough holes in the story that it doesn't add up. In particular, RIM's contention that it was upgrading its software on a Tuesday night, rather than over a weekend, has raised some red flags. And if a scheduled upgrade was behind the problem, shouldn't that have been immediately obvious to the company, and shouldn't its PR team have spread the news quickly? The real damage from this episode won't be the outage itself, but rather the fallout from how RIM deals with it. On that front, things already aren't looking so good.

Reader Comments


  1. Joel Coehoorn, 20 Apr 2007 @ 7:10pm

    I imagine the problem was an automatically applied update to a windows server on Patch Tuesday, the day each month on which Microsoft releases new updates.

  2. Anonymous Coward, 20 Apr 2007 @ 8:04pm

    It doesn't matter.

    No matter what you do, occasional screwups will happen. I'm an anal retentive software developer who believes in testing first and foremost, yet there are things which can get past testing and QA simply because real world stress is different than your testing process can anticipate in many cases. Edge case combinations of issues are the bane of all software/hardware developers because they can not properly test for such things up front in all cases.

    It is very possible that they released a minor patch to fix something and that caused a cascade failure when released into the wild. An unexpected and untestable side effect of the "minor" patch screwed up many other things. This is a very common issue when it comes to "minor" items that blow up in your face.

    I'm not trying to say RIM doesn't have to properly answer "WHY" this happened, I'm just stating that at this "point" they could still be doing in-depth data analysis and simply posting the "general" result of what they have found so far. I've done live patches in the past and while I've never had a cascade failure I've always known it was possible. (100K'ish subscriber level, not anywhere near the level of RIM.) I know that some day, some time, in some way I will miss some small detail and cause a cascade failure; it WILL happen.

    So, understanding that there is never going to be 100% uptime, an outage of this type is deplorable yet a reality of large systems engineering. RIM "could" be a bit more upfront about what has gone wrong but on the other hand they very well could be scratching their heads over just what "really" caused the problem.

    Now, personally, understanding such things, I would prefer that a company is up front about the reason for the down time. As a technically inclined person, and more importantly, one of the folks who would be asked to justify usage of XYZ system over others, I would not want this sort of generic response which doesn't make a lot of sense to be common. I would want straight answers to the problems and what they are doing to fix them, that's more important than denying there is a problem.

    KB

  3. Ju1c3, 20 Apr 2007 @ 8:21pm

    I am glad to hear that it was their problem, and not something I did. I am a network consultant, so I inevitably got the question about the BlackBerrys a few times. Had my head scratching there for a few...

  4. Bobshaker, 20 Apr 2007 @ 10:12pm

    Don't own a Blackberry, don't intend to, don't care. yay.

  5. Anonymous Coward, 20 Apr 2007 @ 10:49pm

    anal retentive?

    you do know that anal retentive means "full of crap"?

  6. Anonymous Coward, 20 Apr 2007 @ 11:27pm

    I'd like to know why companies with a blackberry enterprise server still have to have their emails routed through RIM. There's no reason why the blackberrys can't use their normal data connection and just sync with the BES.

  7. Fred Flint, 21 Apr 2007 @ 6:51am

    Just Saving a Few Bucks

    Like most large Canadian companies (like banks), I'm sure RIM regularly fires their experienced and competent staff because such people are expensive.

    I'm also sure RIM hires students fresh from school, supposedly because they are most up-to-date on the technology and of course, they work real, real cheap.

    Of course, as soon as the students start figuring out what the hell they're doing, they want more money and of course, they get fired and a new flock of fresh-faced students gets hired.

    This is a dirty little I.S. secret that's been true for many, many years.

    From years of observation, my best guess about the outage is that it was caused by some new blockhead student who didn't know a bit from a byte but decided to "fix" something anyway.

  8. Peter, 21 Apr 2007 @ 9:42am

    re: post # 6 & 7

    Do some more research on how this whole Blackberry thing works. BES (Blackberry Enterprise Server) is just the facility to allow connectivity to your local (behind the firewall) resources.

    As for post 7... RIM is, in my experience, one of the more reliable technology companies out there. I don't manage any other systems that are as reliable and low maintenance as theirs.

    And no, I don't work for them or have any particular investment. Just a very happy customer.

  9. pickford, 21 Apr 2007 @ 11:44am

    RIM and BB are fighting an uphill battle. There are devices that do exactly what BB does, only better. I am a network admin and we have more BB issues than we do Treo issues. Personally, I would rather have a WM OS on my Treo 700 and use Exchange's built-in ActiveSync than install a buggy, costly middle man like BES.

  10. Anonymous Coward, 21 Apr 2007 @ 5:31pm

    Re:

    I've never seen a WM device push emails as fast as a BB. Are you talking about using MS exchange server? Or the provider's email service? Either way, the BES has only failed me this one time in four years and I was still able to access my email thru the web using Opera-mini on my device so I didn't miss out on much.

  11. James, 22 Apr 2007 @ 3:56am

    Hmmmm

    Looks like some SonyEQ system admins went to work for RIM.

  12. Ryan, 22 Apr 2007 @ 5:03am

    Looks to me that they should focus on why the fail-over plan did not work as well. The issues happen, but the fail-over should take over.

  13. Fred Flint, 22 Apr 2007 @ 6:01am

    Re: re: post # 6 & 7

    "As for post 7... RIM is, in my experience, one of the more reliable technology companies out there. I don't manage any other systems that are as reliable and low maintenance as theirs."

    Well, your argument convinced me.

    I guess RIM doesn't "downsize" the well-paid, experienced staff and hire a bunch of inexpensive bozos, fresh from college.

    I guess their system didn't go down unexpectedly for no honestly explained reason.

    I guess someone actually did provide for a reliable back-up system and someone actually did institute effective change controls and other basic I.S. common sense activities but, dammit, they just didn't work.

    I hate devastating arguments like yours. They make me feel so uninformed and inexperienced.

  14. Nick Rao, 22 Apr 2007 @ 6:02am

    Blackberry Outage

    Deploying software updates in the middle of the week would be classified as "worst in class". Inadequate testing is another example of an organization that does not have a solid process for developing and testing code. If these folks are the gate keepers of critical corporate communications worldwide, then corporate clients must demand information on plans to address the systemic issues, not just an "oops, we'll do better next time". OBTW, these types of process problems take months, if not years, to resolve.

  15. Anonymous Coward, 22 Apr 2007 @ 8:34am

    Re: Hmmmm

    "Looks like some SonyEQ system admins went to work for RIM."

    that's what i was thinking. haha

  16. Peter, 22 Apr 2007 @ 7:29pm

    Re: Re: re: post # 6 & 7

    Fred... Apologies for apparently stating my opinion as fact. My intent wasn't to make anyone feel uninformed or inexperienced, but simply to offer a counter point to the inevitable corporate bashing that occurs here. You obviously have some insight into the inner workings of RIM that none of the other posters here could hope to match. I defer herewith to your superiority.

    For clarification, in referring to "any other systems" I realize I was not being accurate. There are other systems that are as reliable and low maintenance... they are, however, few and far between.

  17. Scribble, 23 Apr 2007 @ 8:03am

    Don't Rush QA!

    Okay, folks - you want it FAST or you want it GOOD! You can't have it both ways! This looks to me like an example of "QA's holding up the release again". Don't blame us when you release the patch before we're done testing it.

  18. pickford, 23 Apr 2007 @ 12:16pm

    @April 21st, 5:31PM - I am talking about M$ Exchange. I was told when I got my WM based Treo that it would not push as fast as BES, wrongo. Recently on a business trip with the director (who uses BB) a mass email was sent out and our devices alerted us of it at the same exact time. Also, while we were out there, an automated process I have set up in case of BES failing went off. So I received a message letting me know that the BES service had gone down and failed to re-start. I then, on my Treo, in the car, logged into my Exchange server and got the service running.

    Factor into that the fact that RIM charges for upgrades to BES, while Windows updates ActiveSync free of charge.

    Also, syncing with a desktop is MUCH easier and error free with AS than with Desktop Manager.

  19. Fred Flint, 24 Apr 2007 @ 7:33am

    Re: Re: Re: re: post # 6 & 7

    Peter,

    I appreciate your response and I will admit I sometimes go berserk when I experience the cavalier attitude of corporations and other large business entities when it comes to Information Systems.

    There are some pretty simple, well-known procedures to follow that will limit unscheduled downtime to something like 0.1 percent. That used to be the target for most mainframe shops and they met the target regularly - or they got fired.

    Not so, these days. For instance, my cable ISP seems to simply turn off their service, accidentally or on purpose, any time they feel like it; no warning, no apologies, no refund. It happens a lot. Hooray for monopolies!

    It is unfortunate when arrogance, greed and stupidity cause senior management to sacrifice dedication and professionalism on the altar of The Bottom Line.

    Worse, they usually adversely affect The Bottom Line, then blame it on the I.S. staff.

  20. MO, 12 Feb 2008 @ 7:24am

    RIM Architecture

    Folks: Any system that goes down for 10 hours (April '07) and then 6 hours (yesterday) is poorly architected. For all the BBY fanatics, remember there are added layers of complexity (mail -> mailserver -> BES -> RIM -> PDA) and that is not sound, despite the professed advantages. The "Evil Empire" got it right with ActiveSync (mail -> mailserver -> PDA). When all the Crackberry addicts were wondering what the hell was going on I was getting my e-mail just fine. And then there's the aborted failover attempt... guess they didn't give that process its due dilly, eh? If RIM had any clue they would fire whoever was responsible for the "failed upgrade" (yeah, right) and lame-ass failover.
