Daniel Lebrero | Devoxx

Daniel Lebrero

From IG

Daniel Lebrero is a technical architect with more than 15 years of software development experience. He works at the fintech company IG, where he is involved in architecting the web platform, the analytics website and the big data solution. A long-time Java practitioner, he now also loves ().

Blog: http://danlebrero.com

Cloud, Containers & Infrastructure

Automating resilience testing with Docker and Property Based testing


If you have read Michael Nygard's "Release It!", you know that a downstream service going down is just one of the many failure modes that your new service will need to handle in production, and it is not the worst one.

To make your system resilient, you also have to worry about networks going slow, disks getting full or third party services disappearing.

How can we test those scenarios and make them part of our build pipeline?

But most outages are due to unforeseen interactions and unexpected circumstances. So how can you write a test for a case that you don't know you need to test?

In this session, you will learn how to use Docker and property-based testing to automate the resilience testing of your system.

Speaker Q&A

Who should attend your session?

Anybody who has been called at 3am because their server stopped responding.

What are the 'next steps' for an attendee to take after attending your session?

First, read the “Release It!” book (https://pragprog.com/book/mnee/release-it). Then check the Chaos Engineering principles (http://principlesofchaos.org/). Finally, learn more about property-based testing; I highly recommend https://www.youtube.com/watch?v=zi0rHwfiX1Q

What is your favourite British food or drink?

Haggis and Guinness!

What is resilience testing?

Resilience testing assesses how a system behaves when one of its components misbehaves, and whether the system is able to recover once that component is fixed.

A misbehaving component can be, for example, a server going down, a network getting saturated or a hard disk breaking.
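
As a rough illustration of that definition, here is a minimal sketch of a resilience property checked against randomly generated fault sequences. It is an in-memory simulation only, not the Docker-based setup from the session: `FlakyDependency`, `Service`, the fault names and the "cached" fallback are all hypothetical names invented for this example.

```python
import random


class FlakyDependency:
    """Hypothetical stand-in for a downstream component that can break and be fixed."""

    def __init__(self):
        self.fault = None  # None (healthy), "down" or "slow"

    def call(self, timeout_ms):
        if self.fault == "down":
            raise ConnectionError("dependency is down")
        if self.fault == "slow":
            raise TimeoutError(f"no reply within {timeout_ms} ms")
        return "fresh"


class Service:
    """Hypothetical system under test: falls back to a cached value on failure."""

    def __init__(self, dependency):
        self.dependency = dependency

    def handle(self):
        try:
            return self.dependency.call(timeout_ms=100)
        except (ConnectionError, TimeoutError):
            return "cached"


def random_fault_plan(rng, length):
    """Generate a random sequence of faults to inject (None means healthy)."""
    return [rng.choice([None, "down", "slow"]) for _ in range(length)]


def check_resilience(seed, length=20):
    """Property: the service always answers, and recovers once the fault is gone."""
    rng = random.Random(seed)
    dependency = FlakyDependency()
    service = Service(dependency)
    for fault in random_fault_plan(rng, length):
        dependency.fault = fault
        expected = "fresh" if fault is None else "cached"
        if service.handle() != expected:
            return False
    dependency.fault = None  # fix the component
    return service.handle() == "fresh"


def run_property(trials=100):
    """Check the property against many randomly generated fault plans."""
    return all(check_resilience(seed) for seed in range(trials))
```

Calling `run_property()` exercises 100 seeded fault plans; because each plan is derived from its seed, a failing case can be replayed with `check_resilience(seed)`. In a real pipeline the faults would be injected into running components rather than set on an in-memory object.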

Why is it important?

The last Amazon S3 outage proved that there is no such thing as a 100% reliable system.

How is your system going to behave when that happens?