We have built a distributed system. We have tested it. We can just launch it into production and assume it works! Right?

Sadly, no.

Once our service is running, it is akin to Schrödinger's cat. The service is neither working nor broken while it's running if we aren't monitoring it. In our analogy, the cat is our service, the place it's running (the cloud, your laptop, etc.) is the box, and the users are our radioactive isotope. For us to know whether our "cat" is alive or dead, we need to look at it according to Schrödinger. If we assume we care about our "cat," we can either open the box and look at it, or find some other proof the "cat" is alive (listen for meowing, purring, smell something, etc). Sorry, this metaphor is getting tired. Hopefully you get the idea: we want to know if our cat is alive, and we need a way to verify that at any point in time.


This page is a preview of Reliable Webservers with Go

Start a new discussion. All notification go to the author.