Does Your Datacenter Have An SLA?

from the prove-it dept

I have great concerns about whether mission-critical applications are having their SLAs met in datacenters, whether they are hosted in-house, supported by a third party, or run under any other form of datacenter-based hosting. First, consider the alternative: the server sits in a room next to your expert developers. Sure, it's probably a SOX violation, but I can tell you this much: that server will not go down often, and if it does, you can be sure that it will be restored as fast as humanly possible. That's the advantage of having an expert babysit your system. If you have two experts in different geographic locations, each babysitting a server in case the other goes down, then you have about the best support possible. For large systems, however, this arrangement may not be practical.

But how do you know that a datacenter-hosted app has this type of support? First, you need to know for sure what the SLA spells out in terms of support and monitoring. Look for this in your SLA:

"If your app encounters event W, person X will do Y about that specific event within Z amount of time"

I guarantee that anything less specific than that, or anything equally specific that isn't in writing in the SLA, will not be honored. Vague responses equal no responses, because why would the datacenter host open itself up to liability by initiating a response that wasn't specified in writing? A commitment has to name a specific, measurable response and a responsible party; only then can the datacenter host be held accountable for any failure to respond as specified.
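One way to apply that test is to write each SLA commitment down as a record with the four parts from the quoted clause (event W, person X, action Y, time Z) and flag whatever is left blank. The sketch below is only an illustration; the class name, field names, and sample clauses are made up for this example and do not come from any real SLA.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class SlaClause:
    """One response commitment, split into the W/X/Y/Z parts (hypothetical model)."""
    event: Optional[str]             # W: the event that triggers a response
    responder: Optional[str]         # X: the person or role responsible
    action: Optional[str]            # Y: what the responder will do
    time_limit: Optional[timedelta]  # Z: how quickly it must be done


def missing_parts(clause: SlaClause) -> list[str]:
    """Return the names of the parts the clause leaves unspecified."""
    missing = []
    if not clause.event:
        missing.append("event")
    if not clause.responder:
        missing.append("responder")
    if not clause.action:
        missing.append("action")
    # A time limit that is absent or not positive is treated as unspecified.
    if clause.time_limit is None or clause.time_limit <= timedelta(0):
        missing.append("time_limit")
    return missing


# Illustrative clauses only.
vague = SlaClause("server down", None, "investigate", None)
specific = SlaClause("database unreachable", "on-call DBA",
                     "fail over to standby", timedelta(minutes=15))
print(missing_parts(vague))
print(missing_parts(specific))
```

Any clause that comes back with missing parts is one to push back on before signing.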

So assume you have an acceptable SLA in place, and you know what they're supposed to do. How can you be sure they'll actually do the things they say they'll do? You need to know before you can count on your apps for something mission-critical. So while the mission-critical app is still running somewhere else (i.e. being babysat by an expert), set out to prove that the support can respond by staging various types of failures. You could tell the host about the staged failures, but then they'll know, and will definitely staff and respond appropriately. I would stage failures and not tell the host that they are a test. After all, from the host's perspective, any failure is a failure. Measure the response closely and check whether the SLA was honored as expected. Any failure to honor it, for any reason, is a strong indication that the host is not prepared to honor the SLA, which could eventually cost you your mission-critical app.
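Measuring the response can be as simple as logging when each staged failure began and when the host carried out the agreed action, then comparing the gap to the time limit in the SLA. Here is a minimal sketch of that bookkeeping; the `Drill` record, the function names, and the sample timings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Drill:
    """One staged failure and the host's observed response (hypothetical record)."""
    name: str
    started: datetime              # when the failure was induced
    responded: Optional[datetime]  # when the agreed action was taken; None if never
    time_limit: timedelta          # the response time promised in the SLA


def sla_honored(drill: Drill) -> bool:
    """True only if a response occurred within the promised time limit."""
    if drill.responded is None:
        return False
    return drill.responded - drill.started <= drill.time_limit


def failed_drills(drills: list[Drill]) -> list[str]:
    """Names of the drills where the SLA was not honored, in the order given."""
    return [d.name for d in drills if not sla_honored(d)]


# Made-up timings for three staged failures.
t0 = datetime(2009, 12, 1, 9, 0)
drills = [
    Drill("disk full", t0, t0 + timedelta(minutes=10), timedelta(minutes=15)),
    Drill("process crash", t0, t0 + timedelta(minutes=40), timedelta(minutes=15)),
    Drill("network cut", t0, None, timedelta(minutes=30)),
]
print(failed_drills(drills))
```

A drill with no response at all counts as a miss, which matches the rule that any failure to honor the SLA, for any reason, is a warning sign.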

Do not let a complicated roll-over setup or automated monitoring persuade you that the datacenter can respond to any event with seamless coverage for your mission-critical app. An inexperienced datacenter admin simply hitting the wrong button can send any app to Davy Jones' locker in a big hurry. If you truly want mission-critical backup performance, ask yourself what would happen if the datacenter were completely unresponsive. For example, what if it were hit by a hurricane and completely wiped out? How soon could you be back up and running, and at what capacity? If you can't answer that, you'd better find an answer before some unpredictable event knocks out the one server running everything.



Reader Comments



  • mermaldad, 8 Dec 2009 @ 5:36am

    Define your acronyms

    Nice article, but please define your acronyms:

    SLA = Service Level Agreement
    SOX = Sarbanes Oxley Act


    • moore850, 10 Dec 2009 @ 11:43am

      Re: Define your acronyms

      Thanks. In the future, I'll tack definitions on the bottom of the article.


  • Music Search Engine, 19 Mar 2010 @ 6:15am

    That's a good point about the SLA. However, even with an SLA, there are outstanding concerns about data privacy: not only making sure that the customer's data is not leaked and that government regulations on that topic are met, but also making sure that corporate data (beyond that on the customer) is not inadvertently leaked, especially when the cloud provider runs a multi-tenant cloud.
    This is of course ignoring the ability customers need to change cloud providers. It appears the market is setting up to be, at least for the moment, several non-interoperable islands.


