Service Discovery
Service discovery is a core feature of Prometheus that enables automatic detection and monitoring of targets in dynamic environments. Instead of manually listing every target, Prometheus can automatically discover what to scrape.
Why Service Discovery?
Traditional static configuration problems:
- Manual updates when services change
- Error-prone IP address management
- Doesn’t scale in dynamic cloud environments
- No automatic adaptation to infrastructure changes
Service discovery benefits:
- Automatic target detection - no manual intervention
- Dynamic adaptation - handles scaling, deployments, failures
- Metadata enrichment - automatic labels from infrastructure
- Reduced configuration - DRY principle for monitoring
Service Discovery Mechanisms
```yaml
scrape_configs:
  # Static configuration - targets hard-coded in the main file
  - job_name: 'static-nodes'
    static_configs:
      - targets:
          - 'node1.example.com:9100'
          - 'node2.example.com:9100'
          - '192.168.1.10:9100'
        labels:
          environment: 'production'
          datacenter: 'dc1'

  # File-based discovery - targets listed in external files
  - job_name: 'file-sd'
    file_sd_configs:
      - files:
          - 'targets/*.json'
          - 'targets/*.yaml'
        refresh_interval: 30s

  # Kubernetes discovery - role can also be pod, service, endpoints, or ingress
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      # Rewrite the address to point at the kubelet port
      - source_labels: [__address__]
        regex: '([^:]+)(?::\d+)?'
        target_label: __address__
        replacement: '${1}:10250'
      # Add the node name as a label
      - source_labels: [__meta_kubernetes_node_name]
        target_label: node

  # Cloud discovery - similar mechanisms exist for AWS, Azure, and GCP
  - job_name: 'aws-ec2'
    ec2_sd_configs:
      - region: us-east-1
        access_key: YOUR_ACCESS_KEY
        secret_key: YOUR_SECRET_KEY
        port: 9100
        filters:
          - name: tag:Environment
            values: [production]
          - name: instance-state-name
            values: [running]
```
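The file_sd mechanism above reads target lists from external JSON or YAML files, which can be generated by any external tool. A minimal sketch of such a file (the hostnames and labels are illustrative):

```json
[
  {
    "targets": ["web1.example.com:9100", "web2.example.com:9100"],
    "labels": {
      "environment": "staging",
      "team": "web"
    }
  }
]
```

Prometheus re-reads these files on the configured `refresh_interval` and also watches them for changes, so no restart is needed when targets are added or removed.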
Conventional pod annotations for Kubernetes SD (a convention honored by relabeling rules, not enforced by Prometheus itself):
```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
```
For the full list of supported mechanisms, search for `_sd_config` in the Prometheus configuration documentation.
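To show how the annotation convention above is actually consumed, here is a sketch of relabeling rules that honor all three annotations; the rule for the path annotation rewrites the reserved `__metrics_path__` label:

```yaml
relabel_configs:
  # Keep only pods that opt in via the scrape annotation
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  # Override the metrics path when the path annotation is set
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    regex: (.+)
    target_label: __metrics_path__
  # Rewrite the port when the port annotation is set
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
```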
Relabeling
Relabeling is crucial for service discovery - it transforms discovered metadata into useful labels.
Relabeling Actions
- replace (default) - replace target label with source
- keep - keep only targets matching regex
- drop - drop targets matching regex
- labelmap - map label names via regex
- labeldrop - drop labels matching regex
- labelkeep - keep only labels matching regex
- hashmod - modulo of hash for sharding
Common Relabeling Patterns
Pattern 1: Filter targets by annotation
```yaml
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
  action: keep
  regex: true
```
Pattern 2: Extract port from annotation
```yaml
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
  action: replace
  regex: ([^:]+)(?::\d+)?;(\d+)
  replacement: $1:$2
  target_label: __address__
```
Pattern 3: Create custom labels from tags
```yaml
- source_labels: [__meta_ec2_tag_Name]
  target_label: instance_name
- source_labels: [__meta_ec2_tag_Environment]
  target_label: environment
```
Pattern 4: Drop unwanted labels
```yaml
- regex: '__meta_kubernetes_pod_label_pod_template_hash'
  action: labeldrop
```
Pattern 5: Map all labels with prefix
```yaml
- regex: '__meta_kubernetes_pod_label_(.+)'
  action: labelmap
  replacement: k8s_label_$1
```
Pattern 6: Sharding across multiple Prometheus instances
```yaml
- source_labels: [__address__]
  modulus: 4
  target_label: __tmp_hash
  action: hashmod
- source_labels: [__tmp_hash]
  regex: ^1$   # this Prometheus instance handles shard 1
  action: keep
```
Reserved Labels
Labels starting with __ are temporary and dropped after relabeling:
- `__address__` - target address (host:port)
- `__scheme__` - http or https
- `__metrics_path__` - metrics endpoint path
- `__param_<name>` - URL parameters
- `__meta_*` - metadata from service discovery
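Setting these reserved labels during relabeling changes how a target is scraped. A common use is probing endpoints through the blackbox exporter, where the discovered address becomes a URL parameter instead of the scrape address. A sketch (the exporter address `blackbox-exporter:9115` is an assumption):

```yaml
scrape_configs:
  - job_name: 'blackbox-http'
    metrics_path: /probe        # sets __metrics_path__
    params:
      module: [http_2xx]        # becomes __param_module
    static_configs:
      - targets:
          - 'https://example.com'
    relabel_configs:
      # Pass the original target as the ?target= URL parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL visible as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Actually scrape the blackbox exporter (address is an assumption)
      - target_label: __address__
        replacement: blackbox-exporter:9115
```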
Best Practices
1. Use annotations/tags for opt-in monitoring:
```yaml
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
```
2. Add meaningful labels:
```yaml
- source_labels: [__meta_kubernetes_namespace]
  target_label: namespace
- source_labels: [__meta_kubernetes_pod_label_app]
  target_label: app
```
3. Avoid high-cardinality labels:
- Don't use the pod UID as a label value (every restart creates a new series)
- Do use the pod name and namespace instead
4. Use separate scrape configs for different roles:
- Separate configs for pods, services, nodes
- Different intervals for different target types
- Specific relabeling per target type
5. Test relabeling rules:
```
# Check discovered targets in the Prometheus UI
http://prometheus:9090/targets
# Check service discovery state
http://prometheus:9090/service-discovery
```
6. Monitor service discovery:
```
# Targets discovered but down
up == 0
# Number of targets per job
count by (job) (up)
# SD refresh failures
prometheus_sd_refresh_failures_total
```
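These queries can also back alerting rules so that dead targets are noticed automatically. A minimal sketch of such a rule file (group and alert names are illustrative):

```yaml
groups:
  - name: service-discovery
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Target {{ $labels.instance }} of job {{ $labels.job }} is down"
```

The `for: 5m` clause avoids paging on brief scrape hiccups by requiring the target to stay down for five minutes before the alert fires.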