For a blog with a title that’s a French quotation turned into a programming joke, I don’t actually post much about computers. Get ready for the exception that proves the rule :)
Cloudflare recently published their postmortem of the impact of “Cloudbleed,” the major security bug that impacted sites that used Cloudflare (and that’s lots of sites). Professionally, I’ve done work along the same lines as what’s described here, written about a dozen reports like this, and read and edited hundreds of similar reports over the last few years. I really liked this writeup, and if this is the sort of thing that interests you, or you want to know how worried you should be about Cloudbleed, I recommend checking it out. Here’s what I like about this particular report:
It’s written for a specific audience. From reading it, it’s clear that the audience should be somewhat familiar with how the internet works but that a deep level of expertise in any one area is not necessary. They cover the basics of what caused the bug and why it became much worse recently but they don’t go into super technical details – that is for another report.
Clearly structured hypothesis testing. So many postmortems don’t have this, and it is always my favorite part. They lay out their two concerns and then they tell you how they could use their logged data to determine, to the best of their ability, whether or not either of those things happened. That’s followed by the results of the analyses and then by additional analyses using a second data source. I came away from reading this convinced that they: 1) asked the right questions, 2) did an analysis that would actually answer those questions, and 3) got answers that are supported by two independent data sources. B e a u t i f u l.
It takes responsibility without focusing on it. At no point does Cloudflare try to make it seem like this is not their fault, and they emphasize that even though it seems like the bug was not exploited before it was fixed, that doesn’t make it less bad or themselves less culpable. But this report is not an exercise in self-flagellation either – they take responsibility and then tell us what we came there to learn.
This is quite well done and I’m impressed by the transparency, clarity, and thoroughness of the post. If I ever teach another class on hypothesis testing at work, you can bet this is going in there as an example.
And remember, kids: broken HTML is never a good idea.