"Five nines" sounds impressive in a sales pitch, but what does 99.999% uptime actually mean? And more importantly, what SLA should you commit to for your own service? The answer depends on your architecture, your budget, and how much downtime your users will tolerate.
The nines of availability
Uptime is measured in "nines" — the number of 9s in your availability percentage. Each additional nine reduces allowed downtime by a factor of 10:
| SLA | Nines | Downtime/month | Downtime/year |
|---|---|---|---|
| 99% | Two nines | 7h 18m | 3d 15h 39m |
| 99.5% | Two and a half | 3h 39m | 1d 19h 49m |
| 99.9% | Three nines | 43m 49s | 8h 45m 56s |
| 99.95% | Three and a half | 21m 55s | 4h 22m 58s |
| 99.99% | Four nines | 4m 23s | 52m 35s |
| 99.999% | Five nines | 26s | 5m 15s |
Notice the jump between three nines and four nines: you go from about 43 minutes of allowed monthly downtime to just over 4 minutes. That's the difference between "we can absorb a short outage during a deploy" and "we need zero-downtime deployments with automated failover."
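The figures in the table fall out of a single multiplication. Here is a minimal Python sketch, assuming an average year of 365.2425 days and a month of one twelfth of that (a convention that appears to reproduce the table; `downtime_budget` is an illustrative name, not a library function):

```python
from datetime import timedelta

# Assumption: average Gregorian year; a "month" is one twelfth of it.
# Other period definitions (30-day month, 365-day year) shift the result by seconds.
SECONDS_PER_YEAR = 365.2425 * 24 * 3600
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12


def downtime_budget(sla_percent: float, period_seconds: float) -> timedelta:
    """Return the downtime an SLA allows over a period of the given length."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    allowed_seconds = period_seconds * (1 - sla_percent / 100)
    return timedelta(seconds=allowed_seconds)


for sla in (99.0, 99.9, 99.99, 99.999):
    monthly = downtime_budget(sla, SECONDS_PER_MONTH)
    yearly = downtime_budget(sla, SECONDS_PER_YEAR)
    print(f"{sla}%: {monthly} per month, {yearly} per year")
```

Running the same function against a week or a quarter is a quick way to see how an SLA translates into the reporting period your contract actually uses.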
What each tier actually requires
99% — The baseline
Two nines gives you over 7 hours of downtime per month. This is achievable with a single server and manual deployments. Most hobby projects and internal tools live here. If your deploy process takes the service offline for a few minutes and you do it a couple of times a week, you're in this range: two 5-minute outages a week add up to roughly 43 minutes a month, which alone would use up an entire three-nines budget.
99.9% — The standard for production SaaS
Three nines is the most common SLA for production web services. You get about 43 minutes of downtime per month. This requires automated deployments, health checks, and basic redundancy. A load balancer with two app servers and a managed database gets you here.
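To see why a second app server helps so much, here is a rough sketch of the arithmetic for redundant instances. It assumes failures are independent and that any single healthy instance can carry the traffic; the 99% per-server figure is a made-up illustration, and `redundant_availability` is a hypothetical helper name:

```python
def redundant_availability(instance_availability: float, instances: int) -> float:
    """Availability of a pool where one healthy instance is enough.

    Assumes independent failures. Shared deploys, shared racks, or a bad
    config push can take out every instance at once and break this assumption.
    """
    if instances < 1:
        raise ValueError("need at least one instance")
    if not 0 <= instance_availability <= 1:
        raise ValueError("availability must be a fraction between 0 and 1")
    # The pool is down only when every instance is down at the same time.
    return 1 - (1 - instance_availability) ** instances


single = redundant_availability(0.99, 1)
pair = redundant_availability(0.99, 2)
print(f"one server: {single:.4%}, two servers: {pair:.4%}")
```

Treat the result as an upper bound: the load balancer and the database sit in front of and behind the pool, so their availability still limits the service as a whole.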
99.99% — Serious infrastructure
Four nines means under 5 minutes of downtime per month. You need multi-region failover, automated incident response, zero-downtime deployments, and a mature on-call rotation. Database failover must be automatic. Deployments must be blue-green or canary. Every dependency must have a fallback.
99.999% — The gold standard
Five nines allows 26 seconds of monthly downtime. This is what payment processors, core banking systems, and emergency services target. It requires active-active multi-region architecture, circuit breakers on every dependency, and automated remediation. Most companies don't need this and shouldn't promise it.
How to choose your SLA
Don't promise more than you can deliver. Start by measuring your actual uptime over the past 3 to 6 months. If you're currently at 99.7%, don't commit to 99.99%. Commit to a target you already meet, such as 99.5%, and raise it as your measured uptime improves.
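If you keep an incident log, measuring past uptime is simple arithmetic. The sketch below assumes you can list the duration of each outage in the window; the three outages are invented for illustration, and `measured_availability` is a hypothetical helper:

```python
from datetime import timedelta
from typing import Sequence


def measured_availability(window: timedelta, outages: Sequence[timedelta]) -> float:
    """Return availability in percent over a window, given outage durations."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    downtime = sum(outages, timedelta(0))
    if downtime > window:
        raise ValueError("total downtime exceeds the window")
    # Dividing one timedelta by another yields a plain float ratio.
    return 100 * (1 - downtime / window)


# Hypothetical incident log for the last 90 days.
outages = [timedelta(hours=4), timedelta(hours=2), timedelta(minutes=29)]
availability = measured_availability(timedelta(days=90), outages)
print(f"measured availability: {availability:.3f}%")
```

With these made-up numbers the service lands just under 99.7%, which is the kind of figure to compare against any SLA you are considering.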
Consider your dependencies
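Your SLA can't be higher than your weakest dependency, and for dependencies that every request needs it will be slightly lower, because availabilities multiply. If your cloud provider guarantees 99.99% and your payment processor guarantees 99.9%, the best you can expect end to end is about 99.89% (0.9999 × 0.999), assuming independent failures and before counting any downtime of your own. Map your dependencies and their SLAs before setting your own.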
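The dependency math is worth running on your own map. This is a minimal sketch, assuming every listed dependency is required for a request to succeed and that failures are independent (`serial_availability` is an illustrative name):

```python
import math
from typing import Iterable


def serial_availability(dependency_availabilities: Iterable[float]) -> float:
    """Availability of a request path that needs every dependency to be up.

    Assumes independent failures; values are fractions such as 0.999.
    """
    return math.prod(dependency_availabilities)


# Cloud provider at 99.99%, payment processor at 99.9%.
composite = serial_availability([0.9999, 0.999])
print(f"{composite:.4%}")  # 99.8900%
```

Each additional hard dependency pulls the ceiling down a little further, which is one reason to give non-critical dependencies a fallback instead of putting them on the request path.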
Factor in planned maintenance
Some SLA definitions exclude planned maintenance windows. Others count all downtime. Be explicit about which definition you use. A 99.9% SLA that excludes a weekly 30-minute maintenance window is very different from one that includes it: that window alone adds up to roughly 130 minutes a month, about three times the 43-minute budget.
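To put numbers on the two definitions, the sketch below computes the best availability you could report if planned maintenance counts as downtime and nothing else ever fails. The function names are illustrative:

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080


def max_availability_with_maintenance(window_minutes_per_week: float) -> float:
    """Best-case availability (percent) when planned maintenance counts as downtime."""
    if not 0 <= window_minutes_per_week <= MINUTES_PER_WEEK:
        raise ValueError("window must fit inside a week")
    return 100 * (1 - window_minutes_per_week / MINUTES_PER_WEEK)


def weekly_budget_minutes(sla_percent: float) -> float:
    """Downtime an SLA allows per week, in minutes."""
    return MINUTES_PER_WEEK * (1 - sla_percent / 100)


ceiling = max_availability_with_maintenance(30)
budget = weekly_budget_minutes(99.9)
print(f"ceiling with a 30-minute weekly window: {ceiling:.2f}%")  # 99.70%
print(f"weekly budget at 99.9%: {budget:.2f} minutes")
```

A 99.9% SLA leaves about 10 minutes a week, so a 30-minute window only fits if the contract explicitly excludes it.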
Measuring uptime accurately
You can't manage what you don't measure. External monitoring is the source of truth for uptime because it measures availability from your users' perspective, not from inside your infrastructure.
A good monitoring setup checks every 10-30 seconds, from multiple regions, with assertions beyond just "returns 200." Track response time percentiles, not just averages — your p99 tells you more about user experience than your mean.
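As a rough illustration of what to compute from those checks, the sketch below derives availability and latency percentiles from a batch of probe results. The `CheckResult` shape and the sample data are assumptions made for the example, not the output format of any particular monitoring tool, and the percentile uses the simple nearest-rank method:

```python
import math
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CheckResult:
    ok: bool           # every assertion passed, not just "returns 200"
    latency_ms: float  # time until the check completed or timed out


def availability_percent(results: Sequence[CheckResult]) -> float:
    """Share of checks that passed, in percent."""
    if not results:
        raise ValueError("no check results")
    return 100 * sum(r.ok for r in results) / len(results)


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical batch: 97 fast checks, 2 slow ones, 1 timeout.
results = (
    [CheckResult(True, 120.0)] * 97
    + [CheckResult(True, 2500.0)] * 2
    + [CheckResult(False, 30000.0)]
)
latencies = [r.latency_ms for r in results if r.ok]

print(f"availability: {availability_percent(results):.1f}%")
print(f"mean latency: {statistics.mean(latencies):.0f} ms")
print(f"p99 latency:  {percentile(latencies, 99):.0f} ms")
```

In this made-up batch the mean stays under 200 ms while the p99 is 2500 ms, which is the gap between "looks fine on the dashboard" and what the slowest users actually experience.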
When SLA breaches happen
SLA breaches happen. What matters is how you handle them. Have a clear policy for service credits or compensation. More importantly, publish a postmortem explaining what happened and what you're doing to prevent it. Transparency after a breach builds more trust than a perfect uptime number.
The goal of an SLA isn't to promise perfection — it's to set clear expectations and demonstrate accountability when things go wrong.