Troubleshooting

The dashboard derives Lighthouse status from the heartbeat: a Lighthouse is online if it has sent a heartbeat within the last 60 seconds, and offline otherwise.
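As a rough illustration of that rule, the shell function below classifies a Lighthouse from two Unix timestamps. The function name, its arguments, and the choice to treat a heartbeat exactly 60 seconds old as still online are assumptions made for this sketch, not the dashboard's actual implementation.

```shell
# Hypothetical helper, not part of the agent or dashboard.
# $1 = time of the last heartbeat (epoch seconds)
# $2 = current time (epoch seconds)
# $3 = freshness window in seconds (defaults to the 60s described above)
lighthouse_status() {
  if [ $(( $2 - $1 )) -le "${3:-60}" ]; then
    echo online
  else
    echo offline
  fi
}
```

For example, lighthouse_status "$last_seen" "$(date +%s)" prints online or offline, where last_seen is a placeholder for a timestamp you have noted from the dashboard or the agent logs.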

When a Lighthouse is offline, you'll see a lighthouse.offline incident open against it. Recovery fires lighthouse.recovered as soon as a heartbeat lands again.

What happens to bound monitors

Every monitor bound to an offline Lighthouse is treated as down for the duration of the outage:

  • The monitor's status badge flips to down.
  • An ongoing lighthouse_offline incident opens against the monitor (visible on the monitor's incident list).
  • The monitor's 24h / 7d / 30d uptime % drops by the offline duration. No checks run during the outage, so counting that time as up would misreport availability.
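To get a feel for the size of the uptime drop, here is a small worked calculation. It assumes the monitor was otherwise up for the whole window and that uptime is simply (window - offline) / window. The dashboard's exact formula may differ, and uptime_pct is a hypothetical helper.

```shell
# Hypothetical helper: uptime percentage for a window with a given offline duration.
# $1 = window length in seconds, $2 = offline seconds inside that window
uptime_pct() {
  awk -v window="$1" -v offline="$2" \
    'BEGIN { printf "%.2f\n", (window - offline) / window * 100 }'
}

# A one-hour Lighthouse outage seen through each dashboard window:
# uptime_pct 86400   3600   # 24h -> 95.83
# uptime_pct 604800  3600   # 7d  -> 99.40
# uptime_pct 2592000 3600   # 30d -> 99.86
```

Under these assumptions, the same outage is far more visible in the 24h figure than in the 30d one, which is worth keeping in mind when reading the three numbers side by side.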

Notification spam is suppressed by design: you get the single lighthouse.offline alert, not one alert per affected monitor. On recovery, the per-monitor incidents resolve alongside the lighthouse-level one and the agent re-syncs current state.

Agent is offline

Check, in order:

  1. Is the process running? systemd: systemctl status lighthouse. Docker: docker ps | grep lighthouse. Kubernetes: kubectl get pods -n status-harbor.
  2. Outbound HTTPS to lighthouse.statusharbor.io? From the host: curl -v https://lighthouse.statusharbor.io. Anything that proxies / inspects TLS will trip up the agent.
  3. Token still valid? If you deleted the Lighthouse in the dashboard, the agent's token is revoked and the agent will exit on its next heartbeat. Re-create and re-deploy.
  4. Logs. The agent logs JSON to stdout. systemd: journalctl -u lighthouse -f. Docker: docker logs -f lighthouse. Kubernetes: kubectl logs -f -n status-harbor -l app=lighthouse. Set LIGHTHOUSE_LOG_LEVEL=debug for verbose output (with check inputs/outputs redacted).
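The checklist above can be strung together into one pass. The sketch below covers a systemd install only. The function name lighthouse_triage, the 10-second curl timeout, and the 50-line log tail are choices made for this example. Swap in the Docker or Kubernetes commands from the list for other install paths.

```shell
# Sketch of the checklist for a systemd install; lighthouse_triage is a
# hypothetical name, not a command shipped with the agent.
lighthouse_triage() {
  # 1. Is the process running? systemctl exits non-zero when the unit is not active.
  systemctl status lighthouse --no-pager \
    || echo "step 1: lighthouse service is not active"

  # 2. Can the host reach the endpoint over HTTPS? curl exits non-zero on
  #    DNS, connection, or TLS failures.
  curl -sS -o /dev/null --max-time 10 https://lighthouse.statusharbor.io \
    || echo "step 2: cannot reach lighthouse.statusharbor.io over HTTPS"

  # 3. Token validity cannot be checked from the host; confirm in the
  #    dashboard that the Lighthouse still exists.

  # 4. Show the most recent agent log lines.
  journalctl -u lighthouse -n 50 --no-pager
}
```

Run lighthouse_triage on the affected host. If step 2 fails while other HTTPS traffic works, a TLS-inspecting proxy is a likely suspect.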

Checks aren't running

If the Lighthouse is online but a specific check isn't showing results:

  • The monitor must be bound to this Lighthouse, not another.
  • The agent fetches its check list on every heartbeat — give it up to the heartbeat interval before assuming a config change isn't applied.
  • Outbound network from the agent to the target must work. Test from inside the host or pod with curl / nc / dig.
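For the last point, a minimal probe might look like the sketch below, run from the host or pod where the agent lives. probe_target is a hypothetical helper, the 5-second timeout is arbitrary, and the -z flag assumes a netcat variant that supports connect-only mode.

```shell
# Hypothetical helper: probe a check target from the agent's host.
# $1 = target hostname, $2 = TCP port (defaults to 443)
probe_target() {
  host=$1
  port=${2:-443}

  # DNS: does the name resolve from here? dig +short prints nothing on failure.
  dig +short "$host" | grep -q . || { echo "dns: $host did not resolve"; return 1; }

  # TCP: can we open a connection? (-z: connect only, -w 5: 5s timeout)
  nc -z -w 5 "$host" "$port" || { echo "tcp: $host:$port unreachable"; return 1; }

  echo "ok: $host:$port reachable"
}
```

For HTTP monitors, follow a successful probe with curl -v against the monitored URL to see the TLS handshake and response headers as the agent's host sees them.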

Pause vs delete

  • Pause keeps the Lighthouse and its monitors in place but stops the agent from running checks. Use this for maintenance windows. Toggle it from the Lighthouse detail page in the dashboard, or via the Terraform paused attribute.
  • Delete is irreversible. It revokes the token and deletes bound monitors and incidents; the agent process then exits on its next call.

Upgrading

The dashboard shows an Update available badge when the agent version is older than the latest GitHub release. The Upgrade modal prints the right command per install path. The agent self-restarts under systemd / Docker / Helm — no manual reboot needed.