Service Discovery

Service discovery is a core feature of Prometheus that enables automatic detection and monitoring of targets in dynamic environments. Instead of manually listing every target, Prometheus can automatically discover what to scrape.

Why Service Discovery?

Traditional static configuration problems:

  • Manual updates when services change
  • Error-prone IP address management
  • Doesn’t scale in dynamic cloud environments
  • No automatic adaptation to infrastructure changes

Service discovery benefits:

  • Automatic target detection - no manual intervention
  • Dynamic adaptation - handles scaling, deployments, failures
  • Metadata enrichment - automatic labels from infrastructure
  • Reduced configuration - DRY principle for monitoring

Service Discovery Mechanisms

scrape_configs:
  - job_name: 'static-nodes' # Targets hard-coded in the main config file
    static_configs:
      - targets:
          - 'node1.example.com:9100'
          - 'node2.example.com:9100'
          - '192.168.1.10:9100'
        labels:
          environment: 'production'
          datacenter: 'dc1'
  - job_name: 'file-sd'   # Targets read from external files, reloaded on change
    file_sd_configs:
      - files:
          - 'targets/*.json'
          - 'targets/*.yaml'
        refresh_interval: 30s
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node # Other roles: pod, service, endpoints, ingress

    relabel_configs:
      # Use internal IP as instance label
      - source_labels: [__address__]
        regex: '([^:]+)(?::\d+)?'
        target_label: __address__
        replacement: '${1}:10250'

      # Add node name
      - source_labels: [__meta_kubernetes_node_name] # Kubernetes labels
        target_label: node

  - job_name: 'aws-ec2'     # Cloud SD; similar mechanisms exist for Azure, GCP, and others
    ec2_sd_configs:
      - region: us-east-1
        access_key: YOUR_ACCESS_KEY
        secret_key: YOUR_SECRET_KEY
        port: 9100
        filters:
          - name: tag:Environment
            values: [production]
          - name: instance-state-name
            values: [running]
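The file-sd job above reads its target lists from external files. A minimal targets file (the filename targets/nodes.json is illustrative) looks like:

```json
[
  {
    "targets": ["node3.example.com:9100", "node4.example.com:9100"],
    "labels": {
      "environment": "staging",
      "datacenter": "dc2"
    }
  }
]
```

Prometheus re-reads these files on change (and on the refresh_interval), so external tooling can update targets without touching the main configuration.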

Pod annotations conventionally used with Kubernetes SD (they are enforced by your relabeling rules, not by Prometheus itself):

apiVersion: v1
kind: Pod
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"

The full list of supported mechanisms is in the Prometheus configuration documentation; search for _sd_config there.

Relabeling

Relabeling is crucial for service discovery - it transforms discovered metadata into useful labels.

Relabeling Actions

  1. replace (default) - replace target label with source
  2. keep - keep only targets matching regex
  3. drop - drop targets matching regex
  4. labelmap - map label names via regex
  5. labeldrop - drop labels matching regex
  6. labelkeep - keep only labels matching regex
  7. hashmod - modulo of hash for sharding

Common Relabeling Patterns

Pattern 1: Filter targets by annotation

- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
  action: keep
  regex: true

Pattern 2: Extract port from annotation

- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
  action: replace
  regex: ([^:]+)(?::\d+)?;(\d+)
  replacement: $1:$2
  target_label: __address__
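Before matching, Prometheus joins the source label values with ;, then applies the anchored regex. A quick Python sketch of how Pattern 2's regex rewrites the address (label values are illustrative):

```python
import re

# __address__ and the prometheus.io/port annotation, joined with ';'
joined = "10.0.0.5:9100;8080"

# Prometheus fully anchors relabeling regexes, hence fullmatch here.
m = re.fullmatch(r"([^:]+)(?::\d+)?;(\d+)", joined)

# $1 is the host (any existing port stripped), $2 is the annotated port.
new_address = f"{m.group(1)}:{m.group(2)}"
print(new_address)  # 10.0.0.5:8080
```

The optional non-capturing group (?::\d+)? is what lets the rule work whether or not the discovered address already carries a port.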

Pattern 3: Create custom labels from tags

- source_labels: [__meta_ec2_tag_Name]
  target_label: instance_name

- source_labels: [__meta_ec2_tag_Environment]
  target_label: environment

Pattern 4: Drop unwanted labels

- regex: '__meta_kubernetes_pod_label_pod_template_hash'
  action: labeldrop

Pattern 5: Map all labels with prefix

- regex: '__meta_kubernetes_pod_label_(.+)'
  action: labelmap
  replacement: k8s_label_$1
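labelmap copies every label whose name matches the regex to a new name built from the replacement; the original __meta_* labels are then dropped after relabeling. A Python sketch of Pattern 5 (label values are illustrative):

```python
import re

labels = {
    "__meta_kubernetes_pod_label_app": "web",
    "__meta_kubernetes_pod_label_tier": "frontend",
    "instance": "10.0.0.5:8080",
}

pattern = re.compile(r"__meta_kubernetes_pod_label_(.+)")

# labelmap: for each matching label NAME, emit a renamed copy.
mapped = dict(labels)
for name, value in labels.items():
    m = pattern.fullmatch(name)
    if m:
        mapped[f"k8s_label_{m.group(1)}"] = value

print(mapped["k8s_label_app"])   # web
print(mapped["k8s_label_tier"])  # frontend
```

Note that labelmap matches label names, not values, which is why the rule has no source_labels.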

Pattern 6: Sharding across multiple Prometheus instances

- source_labels: [__address__]
  modulus: 4
  target_label: __tmp_hash
  action: hashmod

- source_labels: [__tmp_hash]
  regex: ^1$  # This Prometheus handles shard 1
  action: keep
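Conceptually, hashmod assigns each target to a shard deterministically from its source label values, so every Prometheus instance in the fleet computes the same shard for the same target. A Python approximation (Prometheus internally uses an MD5-derived 64-bit sum, so exact shard numbers may differ from this sketch):

```python
import hashlib

def shard_of(address: str, modulus: int) -> int:
    # Hash the joined source label values and take the modulus,
    # mirroring the hashmod action. This MD5-based sketch is an
    # approximation, not byte-for-byte identical to Prometheus.
    digest = hashlib.md5(address.encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus

targets = ["node1:9100", "node2:9100", "node3:9100", "node4:9100"]
for target in targets:
    print(target, "-> shard", shard_of(target, 4))
```

Each of the 4 Prometheus instances runs the same hashmod rule but keeps a different shard number, so the target set is partitioned with no coordination between them.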

Reserved Labels

Labels starting with __ are temporary and dropped after relabeling:

  • __address__ - target address (host:port)
  • __scheme__ - http or https
  • __metrics_path__ - metrics endpoint path
  • __param_<name> - URL parameters
  • __meta_* - metadata from service discovery

Best Practices

1. Use annotations/tags for opt-in monitoring:

prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"

2. Add meaningful labels:

- source_labels: [__meta_kubernetes_namespace]
  target_label: namespace

- source_labels: [__meta_kubernetes_pod_label_app]
  target_label: app

3. Avoid high-cardinality labels:

# DON'T use pod UID as label
# DO use pod name/namespace

4. Use separate scrape configs for different roles:

  • Separate configs for pods, services, nodes
  • Different intervals for different target types
  • Specific relabeling per target type

5. Test relabeling rules:

# Check discovered targets in Prometheus UI
http://prometheus:9090/targets

# Check service discovery state
http://prometheus:9090/service-discovery

6. Monitor service discovery:

# Targets discovered but down
up == 0

# Number of targets per job
count by (job) (up)

# SD refresh failures
prometheus_sd_refresh_failures_total
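The queries above can be wired into alerting rules so discovery problems page someone. A sketch (group and alert names are illustrative):

```yaml
groups:
  - name: service-discovery
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Target {{ $labels.instance }} in job {{ $labels.job }} is down"
```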
