Alert Management
UDS Core deploys Alertmanager as a part of the kube-prometheus-stack Helm chart. Alertmanager is responsible for handling alerts sent by Prometheus/Loki and managing notification routing and silencing.
Configuring Alertmanager
It is recommended to configure Alertmanager to send alerts to a location that is actively monitored by your team. Common options include email, Slack, Mattermost, Microsoft Teams, or a paging service like PagerDuty or OpsGenie.
You can configure Alertmanager by providing bundle overrides. The example below shows how to configure Alertmanager to send critical and warning alert notifications to a Slack channel:
```yaml
packages:
  - name: uds-core
    repository: ghcr.io/defenseunicorns/packages/uds/core
    ref: x.x.x
    overrides:
      kube-prometheus-stack:
        uds-prometheus-config:
          values:
            # Open up network egress to Slack API for Alertmanager
            - path: additionalNetworkAllow
              value:
                - direction: Egress
                  selector:
                    app.kubernetes.io/name: alertmanager
                  ports:
                    - 443
                  remoteHost: api.slack.com
                  remoteProtocol: TLS
                  description: "Allow egress Alertmanager to Slack API"
        kube-prometheus-stack:
          values:
            # Setup Alertmanager receivers
            # These are the destinations that alerts can be sent to
            # See: https://prometheus.io/docs/alerting/latest/configuration/#general-receiver-related-settings
            - path: alertmanager.config.receivers
              value:
                - name: slack
                  slack_configs:
                    - api_url: <YOUR_SLACK_WEBHOOK_SECRET_URL> # e.g. "https://hooks.slack.com/services/XXX/YYY/ZZZ"
                      channel: <YOUR_SLACK_CHANNEL> # e.g. "#alerts"
                      send_resolved: true
                - name: empty # Default receiver to catch any alerts that don't match a route
            # Setup Alertmanager routing
            # This defines how alerts are grouped and routed to receivers
            # See: https://prometheus.io/docs/alerting/latest/configuration/#route-related-settings
            - path: alertmanager.config.route
              value:
                group_by: ["alertname", "job"] # group by alertname and job
                receiver: empty # Default receiver if no routes match
                # Routes contains a route chain for matching alerts to receivers
                routes:
                  # Send always firing Watchdog alerts to the empty receiver to avoid noise
                  # (you could also point this to a Dead Man's Snitch like service to detect if Alertmanager is down)
                  - matchers:
                      - alertname = Watchdog
                    receiver: empty
                  # Send critical and warning alerts to Slack
                  - matchers:
                      - severity =~ "warning|critical"
                    receiver: slack
```

You can find more information on configuring Alertmanager in the official Alertmanager documentation (https://prometheus.io/docs/alerting/latest/configuration/).
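If you want to sanity-check a routing tree like the one above before deploying it, Alertmanager's CLI (amtool) can evaluate routes against a configuration file. A minimal sketch, assuming amtool is installed and you have saved a rendered copy of the Alertmanager config to a local file; the alertmanager.yaml file name and the label values are illustrative:

```bash
# Print the routing tree defined in the rendered config
amtool config routes show --config.file=alertmanager.yaml

# Check which receiver a hypothetical alert would be routed to;
# with the routes above, severity=critical should match the "slack" receiver
amtool config routes test --config.file=alertmanager.yaml severity=critical
```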
Viewing Alertmanager Alerts
By default, UDS Core configures Alertmanager as a data source in Grafana. This means you can view and manage Alertmanager alerts by navigating to the Alerting section in the Grafana UI.
To view alerts, go to Alerting -> Alert rules in the left-hand menu. Here you can see a list of all alerts. You can filter alerts by data source (Prometheus or Loki), severity, and status (firing or resolved). You can also click on an individual alert to see more details, including the alert expression, labels, and annotations.
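For scripting or debugging, you can also query Alertmanager's HTTP API directly instead of using the Grafana UI. A minimal sketch, assuming kubectl access to the cluster; the service name and monitoring namespace below are typical for a kube-prometheus-stack deployment but may differ in your environment, so verify them with kubectl get svc first:

```bash
# Port-forward the Alertmanager service to localhost
kubectl -n monitoring port-forward svc/kube-prometheus-stack-alertmanager 9093:9093

# In a second terminal: list currently firing alerts via the Alertmanager v2 API
# (jq is optional, used here only to print alert names)
curl -s http://localhost:9093/api/v2/alerts | jq -r '.[].labels.alertname'
```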
Alert Silencing
Sometimes you may want to temporarily mute or silence certain alerts, for example during maintenance windows or when investigating an issue. You can do this by creating a silence in Alertmanager via the Grafana UI.
To create a silence, go to Alerting -> Silences (ensure Choose Alertmanager is set to Alertmanager and not Grafana) in the left-hand menu and click the New Silence button. Here you can specify the matchers for the alerts you want to silence, the duration of the silence, and an optional comment. This silence will be applied to Alertmanager via the Grafana Alertmanager data source.
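Silences can also be managed outside of Grafana using amtool against the Alertmanager API. A minimal sketch, assuming the port-forward from the previous example is still running; the alert name, duration, and comment are illustrative:

```bash
# Silence a hypothetical alert for two hours
amtool silence add alertname=KubeCPUOvercommit \
  --alertmanager.url=http://localhost:9093 \
  --duration=2h \
  --comment="Scheduled maintenance window"

# List active silences, then expire one by ID when maintenance is done
amtool silence query --alertmanager.url=http://localhost:9093
amtool silence expire <SILENCE_ID> --alertmanager.url=http://localhost:9093
```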