Distributed key-value server
We have built a server and tested it! Now that we have a single binary that does what we want, we need to think about replicating it. Why? Well, let us chat a bit about failure.
If we are talking about hardware failure, it happens pretty frequently. "DRAM Errors in the Wild: A Large-Scale Field Study" claims that 8% of DIMMs surface errors each year. "Failure Trends in a Large Disk Drive Population" claims that 2% of all drives fail in their first three months. That means if you had a hundred computers, each with a single drive and a single DIMM, you would expect about two of them to suffer data loss from disk failure in their first three months, and about eight to have issues running because of bad RAM over a year. Real machines usually carry several of each, so the per-machine odds are worse.
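To make that arithmetic concrete, here is a minimal sketch in Python. It takes the two rates quoted above at face value and assumes (this is our simplification, not something either paper states) that each machine has exactly one drive and one DIMM and that failures are independent. The function names are made up for illustration.

```python
# Back-of-the-envelope failure math for a small fleet.
# Assumptions (ours, for illustration): one drive and one DIMM per
# machine, and failures are independent of each other.

def expected_failures(machines: int, rate: float) -> float:
    """Average number of machines hit, given a per-machine failure rate."""
    return machines * rate


def prob_at_least_one(machines: int, rate: float) -> float:
    """Chance that at least one machine in the fleet is hit."""
    return 1.0 - (1.0 - rate) ** machines


FLEET_SIZE = 100
DISK_RATE_FIRST_3_MONTHS = 0.02  # 2% of drives, from the text above
DIMM_RATE_PER_YEAR = 0.08        # 8% of DIMMs, from the text above

print("expected disk failures:", expected_failures(FLEET_SIZE, DISK_RATE_FIRST_3_MONTHS))
print("expected DIMM failures:", expected_failures(FLEET_SIZE, DIMM_RATE_PER_YEAR))
print("P(at least one disk failure):", prob_at_least_one(FLEET_SIZE, DISK_RATE_FIRST_3_MONTHS))
print("P(at least one DIMM failure):", prob_at_least_one(FLEET_SIZE, DIMM_RATE_PER_YEAR))
```

The expected counts are just the fleet size times the rate. The second function is the more sobering one: under these assumptions, a hundred-machine fleet is quite likely to see at least one disk failure in its first three months, which is why a single copy of the data is not enough.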
If we are talking about human failure, a large proportion of software failures come from human error. Configuration bugs, process bugs, code bugs, and other human mistakes cause all sorts of issues. We talk a bit about how to avoid some of this in Chapter 6.