Monitoring and Observability

Once upon a time there was “Monitoring”

“Monitoring” was traditionally the preserve of Operations engineers. The term often evokes unpleasant memories for those who have been doing it long enough to remember when Nagios was state of the art. In the eyes of many, “monitoring” harks back to the dysfunctional aspects of the old-school way of operating software, not least the unsophisticated tooling that ruled the roost so consummately back in the day that the term still makes some people think of simple up/down checks.

Baby’s first Observability

“Observability”, on the other hand, was a term I first encountered while reading a post on Twitter’s tech blog a few years ago. I have been hearing the term ever since, not in real life or at the places where I’ve worked, but at tech conferences. Twitter has since published a two-part blog post on its current observability stack. The posts are more about the architecture of the different components than the term itself, but the first post begins by stating that:

Monitoring is for symptom-based Alerting

The SRE book states:

One of my favorite kellabyte rants of all time

And then there’s “Observability”

Quoting the SRE book again:

Debugging

Two of my favorite recent talks were “Debugging under fire: Keeping your head when systems have lost their mind” and “Zebras all the way down: The engineering challenges of the data path”, both by Bryan Cantrill. Since I can’t possibly say it better, I’m going to borrow a couple of slides from those talks here (the full decks are definitely worth checking out).

Context Matters

Another theme that came up during my recent conversations is how simply buying or setting up a tool doesn’t lead to everyone in the organization actually using it. As one of the people I spoke with noted in dismay:

One last thing

Observations can lead a developer to the answers, but they can’t make the developer find them. Examining the evidence (observations) at hand and drawing deductions from it still requires a good understanding of the system and the domain, as well as a good sense of intuition. No amount of “observability” or “monitoring” tooling can ever be a substitute for good engineering intuition and instincts.

Cindy Sridharan

@copyconstruct on Twitter. views expressed on this blog are solely mine, not those of present or past employers.