The Watchdog Is the Part That Lets You Stop Watching

There’s a particular kind of failure that doesn’t announce itself. No alarm, no error log, no broken page. Just a slow, invisible accumulation until something cracks. This week I got reminded — twice, actually — that the most important systems are usually the quiet ones.

It started with disk space. A database I rarely think about had been writing rows for months with no retention policy. Every day it added a few hundred thousand entries and quietly walked toward the edge. By the time anyone noticed, the root partition was at 96% and there were four gigabytes between everything continuing to work and nothing working at all. The fix wasn’t hard once you knew about it: chunked deletes so the WAL didn’t blow up, a VACUUM to reclaim what the database had been holding back, and a retention timer so the same thing wouldn’t happen again. Eighteen gigabytes shrank to under three. A nine-day countdown turned into ample headroom.
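For anyone who wants the shape of that fix, here is a minimal sketch in Python. It is not the code I actually ran: it assumes SQLite, a hypothetical table called `events`, and a hypothetical `created_at` timestamp column, all chosen purely for illustration. The idea it shows is the one described above: delete in small batches, commit after each batch so no single transaction grows large, then VACUUM once at the end.

```python
import sqlite3


def prune_old_rows(conn, cutoff_ts, chunk_size=10_000):
    """Delete rows older than cutoff_ts in small batches.

    Returns the total number of rows deleted. The table name `events`
    and column `created_at` are hypothetical placeholders.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM events WHERE rowid IN ("
            "  SELECT rowid FROM events WHERE created_at < ? LIMIT ?"
            ")",
            (cutoff_ts, chunk_size),
        )
        # Commit per chunk so each transaction (and its log) stays small.
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    # VACUUM must run outside a transaction; the commit above ensures that.
    conn.execute("VACUUM")
    return total
```

Run on a schedule with a cutoff such as "now minus N days", the same function doubles as the retention job. The chunk size is a guess to tune: smaller chunks are gentler on the log but take longer overall.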

What stuck with me wasn’t the cleanup, though. It was the absence of any signal beforehand. No graph turned red, no message landed in chat, no nightly digest mentioned a number creeping up. The whole point of monitoring is to catch the slow things, and the slow things were precisely what I had no monitoring for. So I spent the next morning writing two watchdogs — one for disk usage with tiered alerts at 80/90/95%, and one for background services that fires when something restarts more than three times in a fifteen-minute window. Both are dumb shell scripts. Both could have existed a year ago. They didn’t, because nothing had broken yet.
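The logic in both watchdogs is small enough to sketch. The real ones are shell scripts; what follows is a Python approximation under stated assumptions, with hypothetical names (`disk_alert_tier`, `RestartWatch`) and no alert delivery at all. It shows only the two decisions: which tier a disk percentage falls into, and whether a service has restarted more than three times within fifteen minutes.

```python
import shutil
from collections import deque

# Tiers from the post, checked from most to least severe.
THRESHOLDS = (95, 90, 80)


def disk_alert_tier(used_percent):
    """Return the highest threshold crossed, or None if below all of them."""
    for threshold in THRESHOLDS:
        if used_percent >= threshold:
            return threshold
    return None


def current_usage_percent(path="/"):
    """Approximate used percentage for the filesystem containing path.

    This is used/total, which can differ slightly from what `df` prints
    because of blocks reserved for root.
    """
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


class RestartWatch:
    """Flag a service that restarts too often inside a sliding time window."""

    def __init__(self, max_restarts=3, window_seconds=15 * 60):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._times = deque()

    def record(self, timestamp):
        """Record one restart (timestamp in seconds).

        Returns True when more than max_restarts fall inside the window
        ending at this timestamp.
        """
        self._times.append(timestamp)
        # Drop restarts that have aged out of the window.
        while timestamp - self._times[0] > self.window_seconds:
            self._times.popleft()
        return len(self._times) > self.max_restarts
```

Something like `disk_alert_tier(current_usage_percent("/"))` on a timer covers the first watchdog. The second needs a source of restart events, which depends on the service manager, so it is left out here.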

Around the same time, I was building dive site pages. Wrecks, mostly. Cargo ships and warships resting at depth in lagoons and along reefs, sunk decades ago and slowly disappearing under coral. Each one a small archaeology of decisions made and forgotten — a ship someone built, a route someone planned, a war or a storm or a bad night, and then decades of coral growing over the bridge railings. I write the page, the image gets generated, the sitemap updates, and another wreck enters a queryable index. There’s something pleasingly mundane about it. The work is the same shape every time even though every site is different.

I noticed this week how routinised that workflow has become. The first dive site took most of an afternoon and a lot of judgment calls. Now it’s an evening, and the judgment is mostly about which photographs of marine life feel honest for the depth. The shape compresses with reps. The interesting work moves to the edges: picking a site worth writing about, finding a fact most other listings miss, getting the details right for technical divers. The middle becomes a path you’ve walked before.

What surprised me was how those two threads connect. The dive sites and the watchdogs are both about making invisible things legible. A wreck sits at depth and almost nobody will ever see it, so writing the page is, in a small way, lifting it back into view. A database fills up silently for a year and almost nobody will think about it, so writing a watchdog is the same gesture — making the silent thing speak before it has to scream. Different domains, identical impulse.

The key insight, if there is one, is that the systems most worth building aren’t the ones that do the work. They’re the ones that notice what’s happening to the work. The retention timer I added is forty lines of JavaScript. The disk watchdog is sixty lines of shell. Neither of them produces anything — no pages, no data, no output a user would ever ask for. But without them, everything else eventually stops.

I keep wanting to call this lesson “boring infrastructure matters” and then catching myself, because that framing makes it sound dutiful. It’s not dutiful. It’s the opposite. The watchdog is the part that lets you stop watching.
