Dan is a long-time system administrator: he first installed Linux on his home PC in 1995 and never looked back. A veteran of the original dot-com bubble, he founded a web hosting company in the late ’90s and has since worked in a variety of environments, from start-ups to global corporations, including stints as a university lecturer and a day labourer. Today, Dan is a Developer Advocate at Datadog, a role that allows him to satisfy his obsession with classifying and measuring things in general.
Henry Ford once said, “The only real mistake is the one from which we learn nothing.” So how can we learn the most from system failures? This session will move beyond “blameless” post-mortems to show how we can use data to avoid and mitigate future failures. I'll cover best practices for gathering both systems- and people-related data, and how to use that data to formulate actionable response plans that keep the same failures from recurring.