Troubleshooting
The dashboard derives Lighthouse status from the heartbeat: a Lighthouse is online if it has sent a heartbeat within the last 60 seconds, and offline otherwise.
When a Lighthouse is offline, you'll see a `lighthouse.offline` incident open against it. Recovery fires `lighthouse.recovered` the moment a heartbeat lands again.
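The rule above can be sketched as a small shell function. This is only an illustration of the documented 60-second window, not the dashboard's actual code: the function name `lighthouse_status` is hypothetical, and treating a heartbeat exactly 60 seconds old as still online is an assumption.

```shell
# lighthouse_status <last_heartbeat_epoch> <now_epoch>
# Hypothetical helper that mirrors the documented rule.
lighthouse_status() {
  last_heartbeat=$1
  now=$2
  threshold=60  # seconds, per the rule above

  # Assumption: an age of exactly 60s still counts as online.
  if [ $((now - last_heartbeat)) -le "$threshold" ]; then
    echo online
  else
    echo offline
  fi
}
```

For example, `lighthouse_status 1000 1030` prints `online`, while `lighthouse_status 1000 1061` prints `offline`.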
What happens to bound monitors
Every monitor bound to an offline Lighthouse is treated as down for the duration of the outage:
- The monitor's status badge flips to `down`.
- An ongoing `lighthouse_offline` incident opens against the monitor (visible on the monitor's incident list).
- The monitor's 24h / 7d / 30d uptime % drops by the offline duration; those windows are no longer running checks, so counting them as up would lie.
Notification spam is suppressed by design: you get the single `lighthouse.offline` alert, not one alert per affected monitor.
On recovery, the per-monitor incidents resolve alongside the
lighthouse-level one and the agent re-syncs current state.
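To get a feel for how much an outage moves the uptime figures described above, here is a rough worked example. It assumes the monitor was otherwise up for the whole window and that the entire outage falls inside it; `uptime_pct` is a hypothetical helper, and the dashboard's exact rounding may differ.

```shell
# uptime_pct <window_seconds> <offline_seconds>
# Hypothetical helper: share of the window that was not offline, in percent.
uptime_pct() {
  awk -v w="$1" -v o="$2" 'BEGIN { printf "%.2f\n", (w - o) / w * 100 }'
}

# A 30-minute (1800 s) Lighthouse outage, seen through each window:
uptime_pct 86400 1800     # 24h window -> 97.92
uptime_pct 604800 1800    # 7d window  -> 99.70
uptime_pct 2592000 1800   # 30d window -> 99.93
```

The same outage costs roughly two points on the 24h figure but well under a tenth of a point on the 30d figure.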
Agent is offline
Check, in order:
- Is the process running? systemd: `systemctl status lighthouse`. Docker: `docker ps | grep lighthouse`. Kubernetes: `kubectl get pods -n status-harbor`.
- Outbound HTTPS to `lighthouse.statusharbor.io`? From the host: `curl -v https://lighthouse.statusharbor.io`. Anything that proxies / inspects TLS will trip up the agent.
- Token still valid? If you deleted the Lighthouse in the dashboard, the agent's token is revoked and the agent will exit on its next heartbeat. Re-create and re-deploy.
- Logs. The agent logs JSON to stdout. systemd: `journalctl -u lighthouse -f`. Docker: `docker logs -f lighthouse`. Kubernetes: `kubectl logs -f -n status-harbor -l app=lighthouse`. Set `LIGHTHOUSE_LOG_LEVEL=debug` for verbose output (with check inputs/outputs redacted).
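The first two checks in the list above can be wrapped in a small script that runs them in order and stops at the first failure. This is a minimal sketch: the function names are hypothetical, and only the commands themselves come from the list. Token validity and logs still need a manual look.

```shell
# check_process <systemd|docker|kubernetes>
# Runs the process check from the list above for the given install path.
check_process() {
  case "${1:-}" in
    systemd)    systemctl status lighthouse ;;
    docker)     docker ps | grep lighthouse ;;
    # kubectl exits 0 even when no pods match, so read its output as well.
    kubernetes) kubectl get pods -n status-harbor ;;
    *)          echo "usage: check_process <systemd|docker|kubernetes>" >&2
                return 2 ;;
  esac
}

# check_egress: outbound HTTPS to the Lighthouse endpoint.
# curl exits non-zero on DNS, connect, or TLS verification failures.
check_egress() {
  curl -v --max-time 10 https://lighthouse.statusharbor.io
}

# triage <install_path>: run the checks in order, stop at the first failure.
triage() {
  check_process "${1:-}" || { echo "process check failed" >&2; return 1; }
  check_egress >/dev/null 2>&1 || { echo "egress check failed" >&2; return 1; }
  echo "process and egress OK; check token validity and logs next"
}
```

Run it as `triage systemd`, `triage docker`, or `triage kubernetes` on the machine that hosts the agent.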
Checks aren't running
If the Lighthouse is online but a specific check isn't showing results:
- The monitor must be bound to this Lighthouse, not another.
- The agent fetches its check list on every heartbeat — give it up to the heartbeat interval before assuming a config change isn't applied.
- Outbound network from the agent to the target must work. Test from inside the host or pod with `curl` / `nc` / `dig`.
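One way to run the three tools above in sequence is sketched below. `probe_target` is a hypothetical helper, not part of the agent, and it assumes the monitored target is a hostname that speaks HTTPS on the given port (443 by default).

```shell
# probe_target <host> [port]
# Hypothetical helper: DNS, then TCP, then HTTPS, stopping at the first failure.
probe_target() {
  host=${1:-}
  port=${2:-443}
  if [ -z "$host" ]; then
    echo "usage: probe_target <host> [port]" >&2
    return 2
  fi

  # DNS: dig exits 0 even for NXDOMAIN, so an empty answer is the failure signal.
  addrs=$(dig +short "$host")
  if [ -z "$addrs" ]; then
    echo "dns: no answer for $host" >&2
    return 1
  fi
  echo "dns: $addrs"

  # TCP: -z only tests the connection, -w 5 gives up after five seconds.
  if ! nc -z -w 5 "$host" "$port"; then
    echo "tcp: cannot connect to $host:$port" >&2
    return 1
  fi
  echo "tcp: $host:$port reachable"

  # HTTPS: print only the status code; curl fails on TLS or timeout errors.
  curl -sS -o /dev/null -w 'https: HTTP %{http_code}\n' --max-time 10 \
    "https://$host:$port/"
}
```

Run it from the same host or pod as the agent, for example `probe_target example.com 443`, so the result reflects the agent's network path and not your workstation's.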
Pause vs delete
- Pause keeps the Lighthouse and its monitors in place but stops the agent from running checks. Use this for maintenance windows. Toggle it from the Lighthouse detail page in the dashboard, or via the Terraform `paused` attribute.
- Delete is irreversible. It revokes the token and deletes bound monitors and incidents; the agent process exits on its next call.
Upgrading
The dashboard shows an Update available badge when the agent version is older than the latest GitHub release. The Upgrade modal prints the right command per install path. The agent self-restarts under systemd / Docker / Helm — no manual reboot needed.
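The "older than the latest release" comparison behind the badge can be approximated locally. The sketch below is not the dashboard's logic: `is_older` is a hypothetical helper, the version numbers are made up, and it relies on `sort -V` (version sort), which GNU coreutils and recent BSD `sort` provide.

```shell
# is_older <current_version> <latest_version>
# Succeeds (exit 0) only when current sorts strictly before latest.
is_older() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Made-up versions for illustration; a plain string compare would get this wrong.
if is_older 1.4.2 1.10.0; then
  echo "update available"
fi
```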