Uptime monitoring for SRE and on-call teams

Every alert your monitoring sends should pass one test: is this worth waking someone up for? Too many tools fail it — they page on a single network blip from a single region. Status Harbor confirms outages across independent regions before it pages, so the alerts you get are the ones worth getting.

The 3 a.m. pager test

Alert fatigue does not start with too many incidents. It starts with too many alerts that were not incidents — a check that flapped once, a regional route that hiccuped, a CDN edge that had a bad thirty seconds. The page fires, the responder wakes up, opens the laptop, and the service is already fine.

Do that a few times and the rotation learns the wrong lesson: alerts can wait. The next page — the real one — gets the same delayed response as the noise. The fix is not a better notification channel. It is sending fewer, truer alerts, so every page passes the test: this was worth waking someone up for.

That is a property of how the check decides something is down — not of how the alert is delivered.

How multi-region confirmation kills false positives

A check from one region answers a narrow question: "can this one network path reach the service?" A transit glitch, a BGP route flap or a single bad CDN POP all answer that question with "no" — and a single-region monitor reports an outage that none of your users in other regions ever experienced.

Requiring agreement changes the maths. Short network blips are mostly independent: a glitch on the path from one region rarely coincides with a glitch on the path from another at the same instant. Ask three or four independent regions to agree before opening an incident, and the single-path blip, the dominant source of false pages, can no longer page anyone on its own. A false page would need several blips at the same moment, and independent rare events rarely line up.
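To put rough numbers on that argument, here is a minimal sketch in Python. It assumes each region's probe fails spuriously with the same probability and fully independently, which real networks only approximate. The blip rate of 1 in 1,000 checks is an illustrative assumption, not a measured Status Harbor figure, and the function name is hypothetical.

```python
from math import comb


def false_page_probability(p: float, regions: int, quorum: int) -> float:
    """Chance that at least `quorum` of `regions` probes fail spuriously
    at the same instant, assuming each fails independently with probability p."""
    return sum(
        comb(regions, k) * p**k * (1 - p) ** (regions - k)
        for k in range(quorum, regions + 1)
    )


# Assumed, illustrative blip rate: 1 spurious failure per 1,000 checks.
p = 0.001

single = false_page_probability(p, regions=1, quorum=1)
three_of_nine = false_page_probability(p, regions=9, quorum=3)

print(f"single region:  {single:.2e}")
print(f"3 of 9 regions: {three_of_nine:.2e}")
```

Under these assumptions the quorum case is several orders of magnitude less likely than the single-region case. Correlated failures, such as a shared upstream provider, would narrow that gap.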

A real outage behaves differently. The service is down for everyone, so every region fails together and the alert fires as soon as the required regions have confirmed. Status Harbor probes from 9 regions across 6 continents; you choose how many must agree. The result is a monitor that stays quiet for blips and pages fast for outages.
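The agreement rule itself can be sketched in a few lines of Python. This is an illustration of the idea, not Status Harbor's implementation: the `ProbeResult` type, the region labels and the `should_open_incident` function are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    region: str  # hypothetical region label
    ok: bool     # True if the probe reached the service


def should_open_incident(results: list[ProbeResult], quorum: int) -> bool:
    """Open an incident only when at least `quorum` distinct regions failed."""
    failing_regions = {r.region for r in results if not r.ok}
    return len(failing_regions) >= quorum


# One region sees a blip; the others are healthy.
blip = [
    ProbeResult("eu-west", ok=False),
    ProbeResult("us-east", ok=True),
    ProbeResult("ap-south", ok=True),
]

# A real outage: every region fails together.
outage = [ProbeResult(r, ok=False) for r in ("eu-west", "us-east", "ap-south")]

print(should_open_incident(blip, quorum=3))
print(should_open_incident(outage, quorum=3))
```

Counting distinct regions rather than raw failures means repeated failures from one bad path never satisfy the quorum by themselves.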

What an alert looks like

An alert is only useful if the responder can act on it without opening the dashboard first. A Status Harbor alert is compact and paste-ready — the monitor, the endpoint, the cause and when it started:

🚨 [Monitor] INCIDENT: api.example.com
https://api.example.com
Cause: upstream returned 503 for GET /healthz
Started: 2026-05-19 02:58:12 UTC

Status Harbor

By the time this lands, the outage has already been confirmed by multiple regions — that filtering happens before the alert fires, not in the message. So an alert arriving at all is the signal that it is real and worth acting on. It reaches you on whichever channel the monitor is routed to — Slack, Telegram, email or a webhook.
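If you forward alerts through your own webhook handler, you may want to render the same compact layout yourself. The sketch below reproduces the message shown above from four fields. The function name and parameters are hypothetical, and the layout is copied from the example rather than from a documented format.

```python
from datetime import datetime, timezone


def format_alert(monitor: str, url: str, cause: str, started: datetime) -> str:
    """Render an alert in the compact layout shown above.

    `started` must be timezone-aware; it is converted to UTC for display.
    """
    stamp = started.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    return "\n".join(
        [
            f"🚨 [Monitor] INCIDENT: {monitor}",
            url,
            f"Cause: {cause}",
            f"Started: {stamp}",
            "",
            "Status Harbor",
        ]
    )


message = format_alert(
    monitor="api.example.com",
    url="https://api.example.com",
    cause="upstream returned 503 for GET /healthz",
    started=datetime(2026, 5, 19, 2, 58, 12, tzinfo=timezone.utc),
)
print(message)
```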

The right alert to the right human

Confirmation cuts false positives; routing makes sure the true ones land in the right place. Each monitor routes to its own channel — Slack, Telegram, email or a webhook. Production pages the on-call channel. Staging, internal tools and dev environments go somewhere quieter that nobody is paged from.

That separation is unglamorous and it is one of the highest-leverage things an on-call team can do. A staging deploy that fails at midnight should never wake the person holding the production pager.
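As a way to reason about that separation, here is a hypothetical routing table in Python. The environment names, channel strings and the `Route` type are illustrative assumptions, not Status Harbor configuration syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    channel: str        # hypothetical destination label
    pages_on_call: bool  # True only if a human is woken up from this channel


# Illustrative mapping from environment to destination.
ROUTES = {
    "production": Route(channel="slack:#on-call", pages_on_call=True),
    "staging": Route(channel="slack:#staging-alerts", pages_on_call=False),
    "internal": Route(channel="email:tools-team@example.com", pages_on_call=False),
}


def route_for(environment: str) -> Route:
    """Look up the route for an environment.

    Unknown environments raise instead of silently falling back,
    so a misconfigured monitor is noticed rather than muted.
    """
    try:
        return ROUTES[environment]
    except KeyError:
        raise ValueError(f"no route configured for {environment!r}") from None


print(route_for("production"))
print(route_for("staging"))
```

Raising on an unknown environment is a deliberate choice in this sketch: a quiet default would risk hiding a production monitor that was labelled incorrectly.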

Incident grouping so the postmortem writes itself

During an outage, a per-check alerting model floods the channel — one message a minute until someone mutes it, which is exactly when a second unrelated failure gets missed. Status Harbor groups consecutive failed checks for a monitor into a single incident and updates it in place.

The incident carries a timeline: every failed probe, every recovery probe and the region that observed each, with timestamps. When checks recover it closes itself. The postmortem is read off that timeline instead of being reconstructed from channel scrollback the next morning.
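The grouping behaviour can be illustrated with a short Python sketch. It assumes a simplified rule in which the first failed probe opens an incident and the first successful probe afterwards closes it. The product's actual open and close conditions may differ, and the `Probe` and `Incident` types are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Probe:
    at: datetime
    region: str  # hypothetical region label
    ok: bool


@dataclass
class Incident:
    started: datetime
    timeline: list = field(default_factory=list)  # probes in time order
    closed: Optional[datetime] = None


def group_incidents(probes: list) -> list:
    """Fold a stream of probes into incidents, one per run of failures."""
    incidents = []
    current = None
    for probe in sorted(probes, key=lambda p: p.at):
        if not probe.ok:
            if current is None:
                # First failure after a healthy period opens a new incident.
                current = Incident(started=probe.at)
                incidents.append(current)
            current.timeline.append(probe)
        elif current is not None:
            # Assumed rule: the first recovery probe closes the incident.
            current.timeline.append(probe)
            current.closed = probe.at
            current = None
    return incidents


probes = [
    Probe(datetime(2026, 5, 19, 2, 57), "eu-west", ok=True),
    Probe(datetime(2026, 5, 19, 2, 58), "eu-west", ok=False),
    Probe(datetime(2026, 5, 19, 2, 59), "us-east", ok=False),
    Probe(datetime(2026, 5, 19, 3, 0), "eu-west", ok=True),
    Probe(datetime(2026, 5, 19, 3, 30), "us-east", ok=False),
]

incidents = group_incidents(probes)
for incident in incidents:
    print(incident.started, incident.closed, len(incident.timeline))
```

Two failures and one recovery collapse into a single closed incident with a three-entry timeline, and the later failure opens a second incident that is still open.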

Frequently asked questions

How does multi-region confirmation reduce false positives?

A single-region check cannot tell a real outage apart from a transit blip on one network path. Requiring several independent regions to agree before an alert fires changes the odds: a brief routing glitch usually hits one path, not three or four at once, so a quorum requirement filters those out while a real outage still trips every region. You trade a few seconds of detection time for a large drop in noise.

Will multi-region confirmation slow down real alerts?

Slightly, and on purpose. Detection time is one check interval plus the time to confirm from the other regions in the quorum. With 1-minute checks that is roughly 60 to 90 seconds end to end on Slack and Telegram. A real outage trips every region inside that window, so confirmation costs seconds and removes the majority of false pages.

Can different monitors page different people?

Yes. Each monitor routes to its own channel. Production goes to the on-call channel; staging and internal tools go somewhere quieter. A staging failure never pages the production rotation, which is one of the simplest and most effective ways to cut alert fatigue.

What does a Status Harbor alert contain?

The monitor name, which regions observed the failure, the response code or connection error, a humanized description of what went wrong and the timestamp. It is formatted to paste straight into an incident thread, so the responder starts with context instead of opening the dashboard to reconstruct it.

How does incident grouping help on-call?

Consecutive failed checks for the same monitor are grouped into one incident rather than one alert per check. The incident keeps a timeline of every failed probe, every recovery probe and the region that observed each, and closes automatically when checks recover. The postmortem reads off that timeline instead of being reconstructed from channel scrollback.

Page on outages, not on noise

Free plan, 5 monitors, multi-region confirmation included. No credit card.

Start monitoring free
