Uptime Monitoring
Overview
UDS Core provides uptime monitoring at two levels:
- Endpoint probes — HTTP/HTTPS probing of application endpoints via Blackbox Exporter, configured through the UDS Package CR
- Core component uptime — recording rules that track the availability of UDS Core’s own infrastructure components (Prometheus, Alertmanager, Keycloak, Loki, etc.)
Endpoint probes are user-configurable. Core component uptime is included out of the box and requires no configuration.
Endpoint Probes
To enable uptime monitoring for an exposed service, configure the uptime.checks section within your Package CR’s expose entries. Prometheus Probes are automatically created based on your configuration.
Basic Example
```yaml
apiVersion: uds.dev/v1alpha1
kind: Package
metadata:
  name: my-app
  namespace: my-app
spec:
  network:
    expose:
      # monitors: https://myapp.uds.dev/
      - service: my-app
        host: myapp
        gateway: tenant
        port: 8080
        uptime:
          checks:
            paths:
              - /
```
This creates a Prometheus Probe that monitors https://myapp.uds.dev/ using the default http_2xx module, which issues HTTP GET requests at a regular interval and checks for a successful (2xx) response.
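To make the mapping concrete, here is a minimal sketch (not the UDS Operator's implementation) of which URLs the expose entries on this page end up probing and what the default http_2xx module counts as success. The gateway-to-domain mapping (tenant to uds.dev, admin to admin.uds.dev) is an assumption read off the "# monitors:" comments in these examples; ExposeEntry, probe_targets, and http_2xx_success are hypothetical names.

```python
from dataclasses import dataclass, field

# Assumption: gateway -> domain mapping taken from the example comments on this
# page; a real cluster uses whatever domains UDS Core was configured with.
GATEWAY_DOMAINS = {"tenant": "uds.dev", "admin": "admin.uds.dev"}


@dataclass
class ExposeEntry:
    """Hypothetical mirror of the expose fields used in these examples."""
    service: str
    host: str
    gateway: str
    port: int  # service port; it does not appear in the probed URL
    uptime_paths: list[str] = field(default_factory=list)  # uptime.checks.paths


def probe_targets(entries: list[ExposeEntry]) -> list[str]:
    """URLs the example comments say are monitored, one per configured path."""
    targets = []
    for entry in entries:
        fqdn = f"{entry.host}.{GATEWAY_DOMAINS[entry.gateway]}"
        # Entries without uptime paths are skipped: monitoring is opt-in.
        for path in entry.uptime_paths:
            targets.append(f"https://{fqdn}{path}")
    return targets


def http_2xx_success(final_status_code: int) -> bool:
    """Simplified http_2xx rule: the final HTTP response must be 2xx.

    Connection errors, TLS errors, and timeouts also fail a real probe;
    they are not modeled here.
    """
    return 200 <= final_status_code < 300


# The three entries from the Multiple Endpoints example.
multiple_endpoints = [
    ExposeEntry("frontend", "app", "tenant", 3000, ["/healthz"]),
    ExposeEntry("api", "api", "tenant", 8080, ["/health", "/ready"]),
    ExposeEntry("admin", "app", "admin", 8080, ["/"]),
]
print(probe_targets(multiple_endpoints))
```

The printed list matches the URLs in the "# monitors:" comment of the Multiple Endpoints example.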
Custom Paths
Monitor specific health endpoints:
```yaml
spec:
  network:
    expose:
      # monitors: https://myapp.uds.dev/health and https://myapp.uds.dev/ready
      - service: my-app
        host: myapp
        gateway: tenant
        port: 8080
        uptime:
          checks:
            paths:
              - /health
              - /ready
```
Multiple Endpoints
Monitor multiple services in a single package:
```yaml
spec:
  network:
    expose:
      # monitors: https://app.uds.dev/healthz, https://api.uds.dev/health, https://api.uds.dev/ready, https://app.admin.uds.dev/
      - service: frontend
        host: app
        gateway: tenant
        port: 3000
        uptime:
          checks:
            paths:
              - /healthz
      - service: api
        host: api
        gateway: tenant
        port: 8080
        uptime:
          checks:
            paths:
              - /health
              - /ready
      - service: admin
        host: app
        gateway: admin
        port: 8080
        uptime:
          checks:
            paths:
              - /
```
Authservice-Protected Applications
For applications protected by Authservice, add uptime.checks to the expose entry as normal. The UDS Operator detects the enableAuthserviceSelector on the matching SSO entry and automatically:
- Creates a Keycloak service account client (<clientId>-probe) with an audience mapper scoped to the application’s SSO client
- Configures the Blackbox Exporter with an OAuth2 module that obtains a token via client credentials before probing
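The client-credentials exchange that the OAuth2 module performs follows the standard OAuth2 flow (RFC 6749, section 4.4), sketched below as an illustration rather than the exporter's actual code. Only the <clientId>-probe naming comes from this page; the token URL (host sso.uds.dev, realm uds) is an assumption based on Keycloak's usual OIDC endpoint layout, and client_credentials_request, bearer_header, and the secret value are hypothetical. The sketch builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Keycloak's standard OIDC token endpoint layout. The host and
# realm ("sso.uds.dev", "uds") are illustrative, not read from a cluster.
TOKEN_URL = "https://sso.uds.dev/realms/uds/protocol/openid-connect/token"


def probe_client_id(sso_client_id: str) -> str:
    """Name of the service account client described above: <clientId>-probe."""
    return f"{sso_client_id}-probe"


def client_credentials_request(token_url: str, client_id: str, client_secret: str) -> Request:
    """Build, but do not send, an OAuth2 client-credentials token request."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,  # credentials sent in the form body
    }).encode()
    return Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def bearer_header(access_token: str) -> dict[str, str]:
    """Header a probe request would carry once a token has been obtained."""
    return {"Authorization": f"Bearer {access_token}"}


# Request for the probe client of "uds-my-app" (the SSO client in the example below).
req = client_credentials_request(TOKEN_URL, probe_client_id("uds-my-app"), "example-secret")
```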
No additional configuration is required beyond adding uptime.checks.paths:
```yaml
apiVersion: uds.dev/v1alpha1
kind: Package
metadata:
  name: my-app
  namespace: my-app
spec:
  sso:
    - name: My App
      clientId: uds-my-app
      redirectUris:
        - "https://myapp.uds.dev/login"
      enableAuthserviceSelector:
        app: my-app
  network:
    expose:
      - service: my-app
        host: myapp
        gateway: tenant
        port: 8080
        uptime:
          checks:
            paths:
              - /healthz
```
The operator matches the expose entry to the SSO entry via the redirect URI origin (https://myapp.uds.dev) and configures the probe to authenticate transparently through Authservice.
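The origin comparison can be pictured with the hypothetical helpers below, assuming the expose entry's FQDN (myapp.uds.dev here) is already known. This is a sketch of the matching rule as described on this page, not the operator's code; origin and matching_sso_client are illustrative names, and the sketch does not normalize default ports such as :443.

```python
from typing import Optional
from urllib.parse import urlsplit


def origin(url: str) -> str:
    """scheme://host[:port] part of a URL; urlsplit lowercases scheme and hostname."""
    parts = urlsplit(url)
    port = f":{parts.port}" if parts.port is not None else ""
    return f"{parts.scheme}://{parts.hostname or ''}{port}"


def matching_sso_client(expose_fqdn: str, sso_entries: list[dict]) -> Optional[str]:
    """clientId of the Authservice-enabled SSO entry whose redirect URI
    origin equals https://<expose_fqdn>, or None if there is no match."""
    target = f"https://{expose_fqdn}".lower()
    for entry in sso_entries:
        if not entry.get("enableAuthserviceSelector"):
            continue  # not Authservice-protected: no OAuth2 probe setup
        if any(origin(uri) == target for uri in entry.get("redirectUris", [])):
            return entry["clientId"]
    return None


# The sso entry from the example above, as a plain dict.
sso = [{
    "name": "My App",
    "clientId": "uds-my-app",
    "redirectUris": ["https://myapp.uds.dev/login"],
    "enableAuthserviceSelector": {"app": "my-app"},
}]
print(matching_sso_client("myapp.uds.dev", sso))
```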
Multiple Expose Entries for Same FQDN
Uptime monitoring is opt-in: only expose entries that define uptime.checks.paths are monitored. If you have multiple expose entries for the same FQDN, only one of them can have uptime checks configured:
```yaml
spec:
  network:
    expose:
      - service: my-app
        host: myapp
        gateway: tenant
        port: 8080
        uptime:
          checks:
            paths:
              - /
      - service: my-app
        host: myapp
        gateway: tenant
        port: 8443
        description: secondary-port # no uptime configuration (not monitored)
```
Included Metrics and Recording Rules
UDS Core ships recording rules that track the availability of core infrastructure components. These produce uds:<component>:up metrics (1 = available, 0 = unavailable) for components including Prometheus, Alertmanager, Blackbox Exporter, Keycloak, Loki, Grafana, Istio, and others. The uds:access:up metric represents Keycloak endpoint reachability, serving as the overall access health indicator. No user configuration is needed.
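Because every recording rule follows the uds:<component>:up naming pattern, one PromQL selector can fetch them all. The sketch below assumes you can reach the Prometheus HTTP API (GET /api/v1/query); PROMETHEUS_URL is a placeholder, and instant_query_url and component_status are hypothetical helpers that build the query URL and turn the JSON response into a component-to-availability map without making a network call.

```python
import json
from urllib.parse import urlencode

# Placeholder address; point this at your own Prometheus instance.
PROMETHEUS_URL = "http://prometheus.example:9090"

# Matches every recording-rule series named uds:<component>:up.
UDS_UP_SELECTOR = '{__name__=~"uds:.+:up"}'


def instant_query_url(base_url: str, query: str) -> str:
    """URL for a Prometheus HTTP API instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


def component_status(response_body: str) -> dict[str, bool]:
    """Map component name -> available (value 1) from an instant-query response.

    Assumes one series per component; if several share a name, the last wins.
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    status = {}
    for series in payload["data"]["result"]:
        name = series["metric"]["__name__"]           # e.g. "uds:keycloak:up"
        component = name[len("uds:"):-len(":up")]     # e.g. "keycloak"
        _, value = series["value"]                    # [timestamp, "value as string"]
        status[component] = float(value) == 1.0
    return status


url = instant_query_url(PROMETHEUS_URL, UDS_UP_SELECTOR)
```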
Endpoint probes produce standard Blackbox Exporter metrics:
| Metric | Description |
|---|---|
| probe_success | Whether the probe succeeded (1) or failed (0) |
| probe_duration_seconds | Total probe duration, in seconds |
| probe_http_status_code | HTTP status code of the response |
| probe_ssl_earliest_cert_expiry | Expiration time of the earliest-expiring TLS certificate, as a Unix timestamp (seconds) |
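Since probe_ssl_earliest_cert_expiry is a Unix timestamp in seconds, the time remaining is its difference from the current time; in PromQL that is (probe_ssl_earliest_cert_expiry - time()) / 86400 for days. The hypothetical Python helpers below do the same arithmetic on a value you have already fetched; the 14-day warning threshold is illustrative, not a UDS Core default.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86_400
WARN_DAYS = 14  # illustrative threshold, not a UDS Core setting


def days_until_expiry(earliest_cert_expiry: float, now: Optional[float] = None) -> float:
    """Days until the certificate expires; negative once it has expired."""
    if now is None:
        now = time.time()
    return (earliest_cert_expiry - now) / SECONDS_PER_DAY


def expiring_soon(earliest_cert_expiry: float, now: Optional[float] = None,
                  warn_days: float = WARN_DAYS) -> bool:
    """True when fewer than warn_days remain (including already expired)."""
    return days_until_expiry(earliest_cert_expiry, now) < warn_days
```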
Example Queries
```promql
# Check all probes and their success status
probe_success

# Check if a specific endpoint is up
probe_success{instance="https://myapp.uds.dev/health"}

# Check core component availability
uds:keycloak:up
```
Grafana Dashboards
UDS Core includes two uptime dashboards:
- UDS / Monitoring / Core Uptime — displays the availability status, uptime percentage, and component status for UDS Core infrastructure components
- UDS / Monitoring / Probe Uptime — displays probe uptime status timeline, percentage uptime, and TLS certificate expiration dates for all monitored endpoints
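An uptime percentage is the share of successful probe_success samples in a time window. One common PromQL form is avg_over_time(probe_success[24h]) * 100; the dashboards' exact queries are not reproduced here. The hypothetical helper below performs the same calculation over a list of samples, which makes the weighting explicit: every scrape counts equally, and missing scrapes are simply absent rather than counted as down.

```python
from collections.abc import Sequence


def uptime_percent(samples: Sequence[float]) -> float:
    """Percentage of probe_success samples equal to 1 in a window.

    Mirrors avg_over_time(probe_success[<window>]) * 100: each sample is
    weighted equally; scrapes that never happened are not counted.
    """
    if not samples:
        raise ValueError("no probe_success samples in the window")
    return 100.0 * sum(samples) / len(samples)
```

For example, a window with one failed sample out of four reports 75.0.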