Two types of monitoring
Observability is a popular term for the overarching work of making sure you can tell what is going on with your service. In the world of observability, there are two ways to observe your system: black box monitoring and white box monitoring. Black box monitoring is the idea that you have no control over the system, and so you examine it from the outside. White box monitoring is the idea that you have full control of the system, and you can instrument internal pieces with metrics, logs and traces to understand how things are working.
Black box monitoring
As we said, black box monitoring is examining from the outside. It is often also called synthetic monitoring or behavioral testing. We are going to generate a synthetic event, send it through our system and see what happens. This differs from the integration testing that we discussed in Chapter 3 because we are sending our synthetic event to our production environment, instead of a test environment. That being said, this is essentially testing, because we have a predefined synthetic input, and we will have a test that will validate the output.
Synthetic events for web systems tend to be HTTP requests, which means the easiest synthetic event generator in this case is the command line application curl. Curl has been around since 1997, and is an essential tool. The command flags I use most frequently are -s, -v and -L. Together, they let you watch how a request works.
-s is for silent, and it hides any progress bars.
-v is for verbose, and it outputs the headers sent and received to STDERR.
-L is for location, and it has curl follow any HTTP redirects that are returned.
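So if your server is running locally on port 8080, you could call curl -svL localhost:8080 2>&1 > /dev/null | grep HTTP every second and watch the responses come in. For reference, 2>&1 sends the headers that -v writes to STDERR into the pipe, > /dev/null then discards the response body that curl would otherwise output, and | grep HTTP will only show us output lines that include the string HTTP. The order matters: written the other way around, as > /dev/null 2>&1, both streams go to /dev/null and grep receives nothing.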
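The effect of redirection order is easy to check without a server. The sketch below uses a hypothetical stand-in function, fake_curl, which imitates curl -sv by writing a response body to stdout and a status line to stderr. It is not part of curl and exists only to make the two orderings comparable.

```shell
# fake_curl is a hypothetical stand-in for `curl -sv`: it writes a body to
# stdout and a header line to stderr, so no server is needed.
fake_curl() {
  echo '<html>body</html>'
  echo '< HTTP/1.1 200 OK' >&2
}

# stderr is pointed at the pipe first, then stdout is discarded,
# so grep sees the header line.
headers_kept=$(fake_curl 2>&1 > /dev/null | grep HTTP)

# stdout is discarded first, then stderr follows it to /dev/null,
# so grep sees nothing (|| true keeps the empty match from counting as a failure).
headers_lost=$(fake_curl > /dev/null 2>&1 | grep HTTP || true)

echo "kept: ${headers_kept}"
echo "lost: ${headers_lost}"
```

The first variable ends up holding the status line and the second ends up empty, which is why the 2>&1 has to come before the > /dev/null when the goal is to filter curl's headers.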
Running curl by hand every second would, of course, get boring. So instead you could add the watch command, which runs commands every n seconds. The -d flag tells watch to highlight any differences in the output of the command and the -n 1 flag means run the command every second. So every second we will see the changes between the current run of the command and the previous run. We remove the 2>&1 redirection and the | grep HTTP filter from above because redirecting stderr into stdout (that's what 2>&1 does) messes with watch.
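As a rough sketch of the watch invocation described above: the exact command is an assumption, and so is the wrapper name watch_probe, which is only there so the example can be defined without starting a loop that never ends.

```shell
# watch_probe is a hypothetical helper name, not part of watch or curl.
# It re-runs curl against the given address every second (-n 1) and
# highlights whatever changed since the previous run (-d).
# The response body is discarded; only curl's verbose output remains.
watch_probe() {
  watch -d -n 1 "curl -svL $1 > /dev/null"
}
```

Calling watch_probe localhost:8080 in a terminal should redraw curl's verbose output every second until you press Ctrl-C. This relies on watch displaying the command's stderr as well as its stdout, which is how the procps version behaves as far as I know.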
But this still requires someone to stare at command output, which is not ideal. Instead let us use something prebuilt and automated which alerts when things change.
Cloudprober is an open-source project from Google that focuses on regularly running checks against targets to determine if they are healthy, and then returning that data to your preferred monitoring service.
Cloudprober can output its results to a variety of monitoring programs, and can send requests using a variety of protocols. Each "probe" is defined as a combination of type, interval, targets, and validators. To define a new probe:
choose a type, such as HTTP
define a set of targets, like localhost:8080
decide the intervals - how frequently you want to run your test (for example every 1000 ms)
write a validation that determines if your request was good (for example, response status code equals 200)
You can read the documentation for all of the different functionality, but the idea here is instead of doing something manually, we have a service that regularly sends requests and validates the responses.
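To make the four choices above concrete, here is a minimal sketch of a probe definition, assuming the text-format config file that Cloudprober reads. The probe name local_http and the validator name status_code_200 are made up for illustration, and field names can differ between Cloudprober versions, so check them against the documentation before relying on this.

```textproto
# Sketch only: names are illustrative and fields should be checked
# against the Cloudprober version you run.
probe {
  name: "local_http"        # hypothetical probe name
  type: HTTP                # the probe type
  targets {
    host_names: "localhost" # the target host
  }
  http_probe {
    port: 8080              # the port the local server listens on
  }
  interval_msec: 1000       # run the check every 1000 ms
  timeout_msec: 500         # must not exceed the interval
  validator {
    name: "status_code_200" # hypothetical validator name
    http_validator {
      success_status_codes: "200"
    }
  }
}
```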
A basic Cloudprober config that sends a request to http://localhost:8080 every second won't alert or upload the data it records anywhere. We'll talk in a bit about picking a monitoring service, in the section on white box monitoring, but if you were to pick Google Cloud Monitoring, you'd add a few extra lines to your config to send the results there.
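As a hedged sketch of that addition: Cloudprober calls its exporters surfacers, and Cloud Monitoring is, to my knowledge, enabled through the one still named after Stackdriver, the product's former name. Treat the exact block as an assumption and confirm it against the Cloudprober surfacer documentation.

```textproto
# Assumed surfacer block; verify the type name for your Cloudprober version.
surfacer {
  type: STACKDRIVER
}
```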
This will add a lot of metrics with the prefix custom.googleapis.com/cloudprober/http/ to Cloud Monitoring in the project you have installed this service in, or in the project you have set as the default.
Hosted service options
Running health checks on a schedule is a common feature that many companies offer for free or cheaply. Cloudprober lets you run all sorts of arbitrary checks, but if you just need HTTP-based checks, then there are many options!
Pingdom - An old standby that has been around forever. They used to have a very generous free plan. These days they charge, but have more features.
Google Cloud Monitoring Uptime Checks - Free for Google Cloud Platform customers, but limits checks to once a minute.
Apex Ping - A small company, but with a comprehensive feature suite similar to Pingdom's.
Datadog Uptime Checks - Datadog is a large monitoring company offering configurable synthetic uptime checks.
and many more!