The Perfect Storm—What We Learn From Talking About Disasters
Failures are inevitable. Once in a great while, they become epic disasters. Do you remember the time your cloud provider lost an entire region? What about the time your teams couldn't check in any of their code, or the time your favorite social networking site was down for half a day? And yes, even the time your alerting provider couldn't send you alerts for over a day? Disasters like these erode customer trust, and learning how to respond appropriately is critical if you expect to earn it back. George Miranda explains PagerDuty's decision to treat managing communications during a crisis as just another type of incident the company may encounter. He explains how PagerDuty applied many of the DevOps principles learned from managing technical incidents to other parts of the organization. PagerDuty's Marketing, Support, Sales, and even Executive Leadership teams now each have trained on-call responders. In learning how to work together to make this happen, PagerDuty started to see Agile and DevOps principles adopted by the less technical parts of the company. He examines the role of technical responders during a massive outage, explores what happens during major outages, and shares surprising results along the way. You'll gain a step-by-step framework you can use to establish your own business continuity plans, along with tips and lessons for getting a process like this deployed in your own organization.