How to Design Software — Monitoring Systems

Have you ever seen products with status pages or automated monitoring? Ever wonder how it all works and how to make your own? Learn how!

Feb 06, 2022

Explore the conceptual architecture of monitoring systems and learn how to build your own system observation engine.

Have you ever seen products with status pages? Ever wonder how it all works and how to make your own? Most companies don’t build this themselves. They use a third-party vendor like StatusPage combined with automated reporting tools like Pingdom. If you need this in a production environment, chances are you’ll want to use a vendor too.

However, if you’re curious about the design approach and about ways to build something like this quickly, then this is the article for you!

First: a conceptual exploration of “Monitoring”

The purpose of a monitoring system is simple: determine whether something is working or not.

That statement has a bit more nuance than meets the eye, so let’s take a closer look.

What are you monitoring?

It’s important to recognize that the “something” that gets monitored can be anything. Oftentimes, it is a system within your control, such as an application server. Other times, it is a vendor system. Perhaps it is a mix of both, such as a technical process that needs to contact a vendor system.

Approaching it from the perspective that the monitored target is arbitrary keeps your thinking from going down the path of a specific implementation.

What does it mean to “work”?

A system that is working can mean a lot of different things. The most binary definition is whether the system is up (working) or down (not working).

That’s an incomplete answer, though.

If the system is up, but not doing what it was intended to, does it mean it is working?
How about if it is doing what it is intended to do, but only for half our users? What about 1% of our users?
Does a system have to perform flawlessly to be considered “working”?
If the system is doing what it is supposed to, but delayed, is it “working”?

Given this ambiguity, it is clear that, from a design perspective, what it means to be “working” is ours to define. The threshold, meaning the acceptable limit of errors we tolerate, is also determined by us as the operators of the system.

Following this train of thought, “working” is an arbitrary definition.
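To make the idea of an operator-defined threshold concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names HealthPolicy, Observation, and is_working are hypothetical, and the default limits (1% errors, 2 seconds of latency) are arbitrary placeholders rather than recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthPolicy:
    """Operator-chosen limits. The default values are illustrative assumptions."""
    max_error_rate: float = 0.01       # tolerate up to 1% failed requests
    max_latency_seconds: float = 2.0   # anything slower counts as "delayed"


@dataclass(frozen=True)
class Observation:
    """One measurement window for a monitored system (hypothetical shape)."""
    reachable: bool
    total_requests: int
    failed_requests: int
    latency_seconds: float


def is_working(obs: Observation, policy: HealthPolicy) -> bool:
    """Return True if the observation falls within the operator's limits."""
    # The binary case: a system that cannot be reached is down.
    if not obs.reachable:
        return False

    # Avoid dividing by zero when there was no traffic in the window.
    if obs.total_requests == 0:
        error_rate = 0.0
    else:
        error_rate = obs.failed_requests / obs.total_requests

    # "Working" means within both tolerances the operator defined.
    return (
        error_rate <= policy.max_error_rate
        and obs.latency_seconds <= policy.max_latency_seconds
    )
```

The same observation can be “working” under one policy and “not working” under another, which is the point: a system failing 0.5% of its requests passes the default policy above but fails a policy with max_error_rate set to 0.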

What are we building?

A map of various monitoring system components

If we refine the answers to our conceptual questions above, we are left with the essence of monitoring: we are building something that can tell whether another system is functioning based on the parameters we define.

In essence: observing. There are, of course, other elements of monitoring, such as alerting and recovery. We’ll get to those later.

The basic parts of a monitoring system

The Monitor

The Monitor is the component that contains the general logic for monitoring. Note that it doesn’t contain the logic for monitoring a specific system, but rather the algorithm for monitoring:

Step 1: Check a system and get a result
Step 2a: If the result is “working”, do something
Step 2b: If the result is “not working”, do something
Step 3: Repeat step 1
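The steps above can be sketched as a small loop. This is one possible shape under stated assumptions, not a definitive implementation: run_monitor and its parameters (check, on_working, on_not_working, interval_seconds, max_cycles, sleep) are hypothetical names, and treating an exception raised by the check as “not working” is a design choice made here, not something the algorithm requires.

```python
import time
from typing import Callable, Optional


def run_monitor(
    check: Callable[[], bool],
    on_working: Callable[[], None],
    on_not_working: Callable[[], None],
    interval_seconds: float = 60.0,
    max_cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Generic monitoring loop; it knows nothing about the system being checked.

    max_cycles=None repeats forever. The sleep function is injectable so the
    loop can be exercised without actually waiting.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        # Step 1: check a system and get a result.
        try:
            working = check()
        except Exception:
            # Assumption: a check that crashes counts as "not working".
            working = False

        if working:
            on_working()       # Step 2a: the result is "working"
        else:
            on_not_working()   # Step 2b: the result is "not working"

        # Step 3: repeat step 1 after a pause (no pause after the last cycle).
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            sleep(interval_seconds)
```

The logic for a specific system lives entirely in the check callable passed in, for example a function that requests a URL and returns whether the response was acceptable. The loop itself stays the same no matter what is being monitored.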

Joseph Gefroh