Tracing adds an ID to a request so you can see how it moves through a system. Most tracing systems follow one of two standards: the W3C's Trace Context and OpenTracing (since merged into OpenTelemetry). These were preceded by two influential systems, Google's Dapper and Twitter's Zipkin.
The idea is that if you attach IDs to your requests, you can see how data moves through a distributed system. For example, a user makes a request to / which goes through a load balancer, then hits an application server, and then hits a Postgres database. Being able to see the parameters the user sent to / and how they affected the query to the database is incredibly useful for debugging. The goal is not to watch how an individual user interacts with the system (many would consider that a privacy violation and unethical), but to group classes of requests together and see how they behave. You do this by sampling requests and obfuscating any information that could be used to identify a user.
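To make the ID propagation concrete, here is a minimal sketch of how a W3C Trace Context `traceparent` header can be generated and passed along between hops. It uses only the Python standard library, not a tracing SDK, and the function names are invented for illustration:

```python
import secrets

def new_traceparent() -> str:
    """Build a W3C Trace Context traceparent header: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 32 hex chars, shared by the whole request
    span_id = secrets.token_hex(8)    # 16 hex chars, unique to this hop
    return f"00-{trace_id}-{span_id}-01"  # trailing "01" = sampled flag set

def child_traceparent(parent: str) -> str:
    """A downstream service keeps the trace ID but mints a new span ID."""
    version, trace_id, _parent_span, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"

# The load balancer creates the header; each hop forwards it with a new span ID.
incoming = new_traceparent()
downstream = child_traceparent(incoming)
assert incoming.split("-")[1] == downstream.split("-")[1]  # same trace ID end to end
```

Because every hop shares the trace ID, the backend can stitch the load balancer, application server, and database spans back into one request.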
How to implement tracing
OpenTelemetry gives us tracing information as well as metric information. Many cloud providers also have tracing products that work with the common tracing standards.
To do tracing, you'll need two things: client instrumentation and a backend. The backend stores the traces and is what cloud providers sell. The instrumentation is mostly done by you, though managed products like load balancers often come with tracing instrumentation built in.
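To make the instrumentation/backend split concrete, here is a toy sketch in plain Python. It is not a real tracing SDK and all names are invented: the "instrumentation" is a context manager that times spans, and the "backend" is just an in-memory list standing in for a trace store:

```python
import time
import uuid
from contextlib import contextmanager

BACKEND = []      # stand-in for a real trace store (Cloud Trace, Jaeger, ...)
_span_stack = []  # tracks the currently open span so children can link to parents

@contextmanager
def span(name):
    """Record one timed span; nested calls become child spans of the same trace."""
    record = {
        "name": name,
        "trace_id": _span_stack[-1]["trace_id"] if _span_stack else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": _span_stack[-1]["span_id"] if _span_stack else None,
        "start": time.time(),
    }
    _span_stack.append(record)
    try:
        yield record
    finally:
        _span_stack.pop()
        record["duration"] = time.time() - record["start"]
        BACKEND.append(record)  # "export" the finished span to the backend

# Instrumented request handler: the DB call shows up as a child span.
with span("GET /"):
    with span("postgres.query"):
        pass  # run the query here
```

A real SDK like OpenTelemetry does the same job with proper context propagation and batched exports, but the shape is the same: wrap interesting work in spans, ship finished spans to a backend.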
Google Cloud Platform has a tracing product called Cloud Trace, and adds trace information to all requests that pass through their L7 Load Balancer as well as a few of their other hosting products.
Amazon Web Services has a tracing product called X-Ray, and adds trace information to a variety of their products.
Microsoft Azure has a tracing product called Application Insights.
Other companies also sell hosted tracing products, including Datadog and Honeycomb.
You can also host your own tracing backend with Jaeger.
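For local experiments, Jaeger ships an "all-in-one" image that bundles the collector, storage, and web UI in one container. A typical invocation looks like this (ports shown are Jaeger's defaults; check the Jaeger docs for your version):

```shell
# Run the Jaeger all-in-one container locally:
# 16686 serves the web UI, 4318 accepts OTLP traces over HTTP.
docker run --rm -p 16686:16686 -p 4318:4318 jaegertracing/all-in-one:latest
```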