There is a lot to be said about alerting, so we will keep this brief. Alerting means notifying you about problems in your system. The most basic kind of alert sets a threshold on a metric, with a rule like "when metric X is above Y for Z minutes, alert me".
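To make the threshold rule concrete, here is a minimal sketch in Python. It is not how any particular monitoring platform evaluates alerts; the function name `should_alert`, the `(timestamp, value)` sample format, and the parameter names are all assumptions made for illustration. It assumes samples are sorted oldest to newest.

```python
from typing import Sequence, Tuple

# Hypothetical sample format: (unix timestamp in seconds, metric value).
Sample = Tuple[float, float]


def should_alert(samples: Sequence[Sample], threshold: float, for_seconds: float) -> bool:
    """Return True if the metric has stayed above `threshold` for at least
    `for_seconds`, measured back from the most recent sample.

    Assumes `samples` is sorted by timestamp, oldest first.
    """
    if not samples:
        return False

    latest_ts = samples[-1][0]
    breach_start = None

    # Walk backwards from the newest sample and find where the current
    # unbroken run of above-threshold values began.
    for ts, value in reversed(samples):
        if value <= threshold:
            break
        breach_start = ts

    if breach_start is None:
        # The newest sample is not above the threshold, so nothing to alert on.
        return False

    return latest_ts - breach_start >= for_seconds


# Example: CPU usage sampled once a minute; alert when above 90 for 3 minutes.
cpu = [(0, 50.0), (60, 95.0), (120, 96.0), (180, 97.0), (240, 98.0), (300, 99.0)]
print(should_alert(cpu, threshold=90.0, for_seconds=180))  # True
```

Requiring the breach to last for some duration, rather than firing on a single sample, is what keeps a brief spike from paging anyone.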

Every monitoring platform has its own way of defining alerts, so we won't cover platform-specific syntax here. A lot has also been written on how to design good alerts.

Some of that writing can be found at:


