Recording Rules
Recording Rules
Recording Rules allow precalculating expensive queries and saving results as new metrics for faster access.
Basics
- Purpose: Precalculation of frequently used or expensive PromQL expressions
- Benefits: Faster dashboards and alerts, reduced server load
- Format: YAML files, reloaded with
SIGHUPsignal - Configuration:
rule_filesfield in Prometheus config
Syntax
File structure:
groups:
- name: <group_name>
interval: <duration> # optional
rules:
- record: <metric_name>
expr: <promql_expression>
labels: # optional
<label>: <value>
Example:
groups:
- name: cpu_rules
interval: 30s
rules:
- record: instance:cpu_usage:rate5m
expr: 1 - avg by (instance) (rate(cpu_seconds_total{mode="idle"}[5m]))
- record: job:cpu_usage:rate5m
expr: avg by (job) (instance:cpu_usage:rate5m)
Key Group Parameters
- name: Unique group name
- interval: Evaluation frequency (default
global.evaluation_interval) - limit: Maximum number of series/alerts (0 = no limit)
- query_offset: Query time offset (for distributed systems)
- labels: Global labels for all rules in group
Tools
Syntax validation:
promtool check rules /path/to/rules.yml
- Status 0: File valid
- Status 1: Syntax errors
Best practices
Naming convention: level:metric:operations
- level:
instance,job,cluster - metric: Base metric name
- operations:
rate5m,sum,ratio
Good name examples:
instance:http_requests:rate5mjob:memory_usage:avgcluster:cpu_utilization:ratio
Hierarchical aggregation:
# Level 1: Instance
- record: instance:requests:rate5m
expr: sum by (instance) (rate(http_requests_total[5m]))
# Level 2: Service
- record: service:requests:rate5m
expr: sum by (service) (instance:requests:rate5m)
# Level 3: Global
- record: global:requests:rate5m
expr: sum(service:requests:rate5m)
Use cases
- Dashboards: Fast loading of complex graphs
- Alerts: Accelerated condition evaluations
- SLA/SLI: Precompiled quality metrics
- Capacity planning: Aggregated resource metrics
Monitoring
- Metric:
rule_group_iterations_missed_total- missed evaluations - Problem: Group doesn’t finish before next evaluation → skips iterations
- Effect: Gaps in recording rule data