CDN Resilience Worksheet

Practical steps after the Cloudflare outage — without the expensive overreaction

What Happened: On November 18, 2025, a routine database change at Cloudflare caused their Bot Management feature file to exceed a size limit. Their core proxy crashed globally. X, ChatGPT, Spotify, Discord, and thousands of other services went down for up to 6 hours.

This worksheet focuses on CDN resilience specifically — practical steps you can take without expensive multi-CDN setups. It's a starting point, not a complete resilience strategy.

Let's address the hot takes first.

After every major outage, the same advice floods LinkedIn: "You HAVE to go multi-CDN now!" It's the same after an AWS region goes down: "You HAVE to go multi-region!" Or multi-cloud.

These are expensive, complex decisions. Multi-CDN adds 30-50% cost and significant operational overhead — keeping configs in sync, managing certificates across providers, testing failover. For most companies, it's overkill.

You can build meaningful CDN resilience with much lower investment. That's what this worksheet covers.

1. Know What You're Running Through Your CDN

Before you can improve resilience, you need to know what breaks when your CDN goes down.

Checklist:

Common things you might lose if you bypass your CDN:

Domain | What It Serves | What You Lose If Bypassed | Can Bypass?
www.example.com | Main website | WAF, caching, DDoS protection | Yes, but exposed to attacks + slow
api.example.com | Customer API | Rate limiting, origin IP hidden | Yes, if origin can handle load
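
If you'd rather keep this inventory somewhere you can query it during an incident, the same table can live as data. This is only a sketch: the `CdnDependency` type, the `INVENTORY` list, and the `domains_losing` helper are made-up names for illustration, and the rows are the two example domains from the table, not real services.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CdnDependency:
    """One row of the inventory table (hypothetical structure)."""
    domain: str
    serves: str
    lost_if_bypassed: Tuple[str, ...]
    bypass_note: str


# Example rows copied from the table above; replace with your own domains.
INVENTORY = [
    CdnDependency(
        domain="www.example.com",
        serves="Main website",
        lost_if_bypassed=("WAF", "caching", "DDoS protection"),
        bypass_note="Yes, but exposed to attacks + slow",
    ),
    CdnDependency(
        domain="api.example.com",
        serves="Customer API",
        lost_if_bypassed=("rate limiting", "origin IP hidden"),
        bypass_note="Yes, if origin can handle load",
    ),
]


def domains_losing(feature: str, inventory: List[CdnDependency]) -> List[str]:
    """Return the domains that lose `feature` when the CDN is bypassed."""
    wanted = feature.lower()
    return [
        dep.domain
        for dep in inventory
        if any(wanted == lost.lower() for lost in dep.lost_if_bypassed)
    ]
```

Asking `domains_losing("WAF", INVENTORY)` during an outage tells you which domains would be unprotected if you bypass, which is the question you need answered before anyone touches DNS.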

2. Don't Get Locked Out of Your Own CDN

During the Cloudflare outage, the dashboard login used Turnstile (their CAPTCHA). Turnstile was down. So the GUI login was completely blocked — no amount of clicking would help.

The API was intermittently failing, not completely down. This is a crucial difference: API calls can be retried automatically. IaC tools like Terraform can be re-applied until the change goes through, and you can script "keep trying until it works." With GUI login blocked, you had zero chance. With API access, you at least had a fighting chance.

Even if the API is flaky, submitting changes programmatically beats clicking a login button that's completely broken.
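
Here is roughly what "keep trying until it works" can look like. It's a minimal sketch, not a Cloudflare client: `retry_until_success` is a hypothetical helper, and `action` stands in for whatever call you would make against your CDN's API (it is an assumption, not a real endpoint). The delays use capped exponential backoff with jitter so you don't hammer an API that is already struggling.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_success(
    action: Callable[[], T],
    max_attempts: int = 20,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` until it succeeds or `max_attempts` is reached.

    `action` is a placeholder for your own API call (an assumption here).
    Between attempts, wait with capped exponential backoff plus jitter.
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return action()
        except Exception as error:  # in real code, catch only transient errors
            last_error = error
            if attempt == max_attempts - 1:
                break  # no point sleeping after the final attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

Wrap each emergency change in a function and pass it in, for example `retry_until_success(lambda: apply_emergency_change())`, where `apply_emergency_change` is your own code. The `sleep` parameter exists so you can test the script without actually waiting.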

Checklist:

Example emergency actions you should be able to attempt via API/IaC:

3. Know When It's Them, Not You

Teams lost 20+ minutes during the Cloudflare outage debating whether the problem was their code or their CDN. Set up monitoring that answers this instantly.

Checklist:

The test: If your origin, checked directly rather than through the CDN, returns 200 OK but users see 5xx errors, the problem is almost certainly your CDN or another vendor in the path, not your code.
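
The same test can be written down as a rule your monitoring applies for you. This sketch assumes you already have two probes: one that hits the origin directly (bypassing the CDN) and one that hits the public URL the way users do. The `Verdict` enum and `classify` function are hypothetical names, and the probes themselves are not shown.

```python
from enum import Enum


class Verdict(Enum):
    HEALTHY = "healthy"
    CDN_OR_VENDOR = "cdn_or_vendor"
    ORIGIN = "origin"


def is_server_error(status: int) -> bool:
    """Treat any 5xx response as a failure."""
    return status >= 500


def classify(origin_status: int, edge_status: int) -> Verdict:
    """Decide where to look first, given two HTTP status codes.

    origin_status: result of probing the origin directly (assumed probe).
    edge_status:   result of probing the public URL through the CDN.
    """
    if is_server_error(origin_status):
        # The origin itself is failing, so start with your own stack.
        return Verdict.ORIGIN
    if is_server_error(edge_status):
        # Origin is fine but users see errors: suspect the CDN or vendor.
        return Verdict.CDN_OR_VENDOR
    return Verdict.HEALTHY
```

Run the origin probe from outside your CDN's network path, otherwise both probes fail together and the rule tells you nothing.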

4. Have a CDN Outage Plan

Not a generic incident runbook. A specific plan for "my CDN is down."

Your plan should answer:

Key decision to make in advance:

If your CDN goes down, do you bypass it and accept the security exposure, or do you wait for recovery?

This depends on your traffic, threat model, and how long you can tolerate downtime. Decide now, not during the incident.

5. Build a Cheap Static Failover

If full multi-CDN isn't justified, at minimum have a way to communicate during an outage — or a bypass ready to deploy.

Real example: During this outage, Resend (an email API company) built a CloudFront bypass while Cloudflare was down. They didn't end up deploying it because Cloudflare recovered first, but they now have a runbook to switch to the fallback within 60 seconds. That's the right approach — have it ready, decide in the moment whether to use it.

Checklist:

You won't be fully operational, but you'll be communicative. That matters more than most teams realize.
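
Being communicative can be as simple as a single static HTML file hosted somewhere that does not depend on your primary CDN. The sketch below only renders that file. `render_status_page` is a hypothetical helper, and where you host the output and how you point traffic at it are left as assumptions for your own runbook.

```python
from datetime import datetime, timezone
from html import escape


def render_status_page(service: str, message: str, updated: datetime) -> str:
    """Build a minimal, dependency-free status page as an HTML string.

    `updated` should be a timezone-aware datetime; it is shown in UTC.
    """
    stamp = updated.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return (
        "<!doctype html>\n"
        f"<title>{escape(service)} status</title>\n"
        f"<h1>{escape(service)} is currently degraded</h1>\n"
        f"<p>{escape(message)}</p>\n"
        f"<p>Last updated: {stamp}</p>\n"
    )
```

Keep the page free of external scripts, fonts, and images, so it still loads when the vendors you normally rely on are the ones that are down.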

6. When Multi-CDN Actually Makes Sense

Multi-CDN isn't always overkill. But it's a serious investment. Be honest about whether you need it — and whether you can actually pull it off.

Why multi-CDN is harder than it sounds

It's not just "add another CDN." Here's what you're signing up for:

Multi-CDN probably makes sense if:

Multi-CDN is probably overkill if:

Check your competitors: Did they stay up during the Cloudflare outage? If yes, they might have multi-CDN and you're behind. If no, the market accepted the risk.

A Reality Check

When giants like Cloudflare go down, there's often not much you can do to fix it. You wait.

But preparation isn't about preventing their outages — it's about staying in control. Knowing what's affected. Communicating clearly. Taking what actions you can instead of sitting blind.

The steps in this worksheet won't prevent CDN outages. But they'll help you respond better when they happen — without the expensive overreaction.

This Worksheet Covers CDN Only

Many companies use Cloudflare for much more than CDN — DNS, Workers, KV, Access, WAF, bot management. Each adds complexity and different failure modes.

A proper resilience assessment looks at your full architecture, dependencies, and business context. That's a different conversation.

Visit incidentist.io