Push Gateway

Why not use it?

Prometheus recommends using Pushgateway only in limited cases. Blindly substituting Pushgateway for Prometheus’s standard pull model has several pitfalls:

Main problems:

  • Single Point of Failure: Monitoring multiple instances through a single Pushgateway makes it a single point of failure and potential bottleneck
  • Loss of automatic health monitoring: You lose automatic instance health monitoring through the up metric (generated at each scrape)
  • Metric lifecycle problem: Pushgateway never forgets series pushed to it and will expose them to Prometheus forever unless manually deleted via API

Particularly problematic:

When multiple instances of a job differentiate their metrics in Pushgateway through an instance label or similar, an instance’s metrics remain in Pushgateway even after the instance is renamed or removed. This is because the lifecycle of Pushgateway as a metric cache is fundamentally separate from the lifecycle of the processes that push metrics to it.

When to use

Usually the only justified use case is capturing the results of service-level batch jobs, that is, batch jobs that are not semantically related to a specific machine or job instance (e.g., a batch job deleting users for the entire service).

Alternatives to push gateway

Firewall/NAT problem:

If an incoming firewall or NAT prevents scraping metrics from targets, consider:

  • Moving the Prometheus server behind the network barrier
  • Running Prometheus servers in the same network as monitored instances
  • Using PushProx, which allows Prometheus to traverse a firewall or NAT

For batch jobs related to a machine (e.g., automatic security updates, running configuration management clients):

  • Use Node Exporter’s textfile collector instead of Pushgateway
  • This ensures proper lifecycle for metrics associated with a specific machine

How to use correctly

  • Set honor_labels: true in the Prometheus scrape config for Pushgateway. Otherwise Prometheus overwrites the pushed job and instance labels with those of the Pushgateway scrape target and keeps the pushed values only as exported_job and exported_instance.
  • Pushing: send an HTTP PUT to http://pushgateway.example.org:9091/metrics/job/some_job/instance/some_instance
    • some_job - job name. It is overwritten during scraping unless honor_labels: true is set
    • some_instance - instance name
  • Deleting (needed because a metric lives in the gateway until it is removed): send an HTTP DELETE to
    • For an instance: http://pushgateway.example.org:9091/metrics/job/some_job/instance/some_instance
    • For a job: http://pushgateway.example.org:9091/metrics/job/some_job (this removes only the group keyed by job alone; groups that also carry an instance label are not affected)
  • Remember that Pushgateway exposes the metrics of all groups together, so metrics pushed to different groups must not conflict with each other.
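
The push and delete calls above could be sketched in Python as follows. This is a minimal sketch using only the standard library; the function names are hypothetical, the host is the placeholder used in these notes, and label values are assumed to be simple (no slashes, not empty). The requests are only built here, sending them is left to the caller.

```python
import urllib.request

BASE_URL = "http://pushgateway.example.org:9091"  # placeholder host from the notes


def group_url(job, instance=None, base_url=BASE_URL):
    """Return the URL of the metric group keyed by job and, optionally, instance."""
    url = f"{base_url}/metrics/job/{job}"
    if instance is not None:
        url += f"/instance/{instance}"
    return url


def push_request(job, instance, exposition_text):
    """Build a PUT request whose body is metrics in the text exposition format."""
    return urllib.request.Request(
        group_url(job, instance),
        data=exposition_text.encode("utf-8"),
        method="PUT",
    )


def delete_request(job, instance=None):
    """Build a DELETE request for the group identified by the URL."""
    return urllib.request.Request(group_url(job, instance), method="DELETE")


request = push_request("some_job", "some_instance", "some_metric 3.14\n")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=5)
```

Building the request separately from sending it keeps the URL scheme easy to inspect before anything reaches the gateway.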

Timestamp

  • The time at which a metric is pushed to Pushgateway is not the timestamp Prometheus stores. Prometheus uses the time of its own scrape.
  • For every group, the gateway adds push_time_seconds and push_failure_time_seconds, the Unix times of the last successful and the last failed push
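
Because the scrape timestamp says nothing about when the data was pushed, push_time_seconds is the value to look at when checking for stale groups. The following is a rough sketch under stated assumptions: the helper name is hypothetical, the input is the text the gateway exposes for scraping, and the regular expression is a simplification that does not handle label values containing a closing brace.

```python
import re
import time

# Matches lines such as: push_time_seconds{instance="a",job="b"} 1.7e+09
_PUSH_TIME = re.compile(r'^push_time_seconds\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')


def stale_groups(exposition_text, max_age_seconds, now=None):
    """Return the label strings of groups whose last successful push is too old."""
    now = time.time() if now is None else now
    stale = []
    for line in exposition_text.splitlines():
        match = _PUSH_TIME.match(line.strip())
        if match and now - float(match.group("value")) > max_age_seconds:
            stale.append(match.group("labels"))
    return stale
```

In a real setup the same check is usually expressed as an alert on push_time_seconds in Prometheus rather than in client code.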

Encoding

Problem 1:

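We want to set the labels job="directory_cleaner",path="/var/tmp". Putting /var/tmp directly into the URL won’t work:

/metrics/job/directory_cleaner/path//var/tmp

The slashes inside the value are read as path separators, so the path label ends up with an empty value. Instead, suffix the label name with @base64 and base64-encode the value (URL-safe alphabet, padding optional):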

/metrics/job/directory_cleaner/path@base64/L3Zhci90bXA

With curl:

echo 'some_metric{foo="bar"} 3.14' | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/directory_cleaner/path@base64/$(echo -n '/var/tmp' | base64url)

Problem 2

We want to set 2 labels:

job="example",first_label="",second_label="foobar"

Which is:

 /metrics/job/example/first_label//second_label/foobar

This won’t work for the same reason as above: the empty value produces //. Encode the empty value as base64 consisting of a single padding character, =:

/metrics/job/example/first_label@base64/=/second_label/foobar

Problem 3

Label:

job="titan",name="Προμηθεύς"

Can be percent-encoded:

/metrics/job/titan/name/%CE%A0%CF%81%CE%BF%CE%BC%CE%B7%CE%B8%CE%B5%CF%8D%CF%82

or base64-encoded:

/metrics/job/titan/name@base64/zqDPgc6_zrzOt864zrXPjc-C
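
The three cases above (a slash in the value, an empty value, non-ASCII characters) can be folded into one small path builder. This is a sketch, not the encoding logic of any client library: the function names are hypothetical, and the choice to percent-encode everything that has no slash is just one of the two options shown for Problem 3.

```python
import base64
from urllib.parse import quote


def encode_label(name, value):
    """Return the '/name/value' URL segment for one grouping label."""
    if value == "":
        # Empty value: base64 with a single padding character avoids '//'.
        return f"/{name}@base64/="
    if "/" in value:
        # Slashes would be read as path separators, so base64-encode the value.
        raw = base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")
        return f"/{name}@base64/{raw.rstrip('=')}"
    # Everything else is percent-encoded as UTF-8.
    return f"/{name}/{quote(value, safe='')}"


def grouping_path(job, labels):
    """Build the path for a job plus further grouping labels (dict order is kept)."""
    return "/metrics" + encode_label("job", job) + "".join(
        encode_label(name, value) for name, value in labels.items()
    )


print(grouping_path("directory_cleaner", {"path": "/var/tmp"}))
# /metrics/job/directory_cleaner/path@base64/L3Zhci90bXA
```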

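Problem 4 (UTF-8 label names)

  • The flag --push.enable-utf8-names is required
  • The label name must be prefixed with U__
  • Every character other than a letter, digit, or underscore is replaced by its Unicode code point in hex, surrounded by underscores, e.g. _1F60A_
  • Every existing _ gets an additional _, so _ becomes __
  • A label name that itself starts with U__ must be encoded as well: U__ becomes U___55_____ (the U__ prefix, then _55_ for the letter U, then __ for each of the two underscores)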
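
The rules above can be sketched as a small encoder. This is an illustration of the rules as listed in these notes, not the implementation used by Pushgateway or any client library; the function name is hypothetical, and uppercase hex is chosen only to match the _1F60A_ example.

```python
def encode_utf8_label_name(name):
    """Encode a label name following the U__ rules listed above (simplified)."""
    # A name that already starts with U__ needs its leading U escaped too.
    escape_first = name.startswith("U__")
    out = ["U__"]
    for index, char in enumerate(name):
        if char == "_":
            out.append("__")  # existing underscores are doubled
        elif char.isascii() and char.isalnum() and not (escape_first and index == 0):
            out.append(char)  # ASCII letters and digits pass through unchanged
        else:
            out.append(f"_{ord(char):X}_")  # code point in hex between underscores
    return "".join(out)


print(encode_utf8_label_name("my.label"))  # U__my_2E_label
print(encode_utf8_label_name("U__"))       # U___55_____
```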

Methods/API

PUT

  • pushing a group of metrics
  • format is either protobuf or text
  • Responses:
    • 200 - success
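    • 400 - bad request: the request is malformed or the pushed metrics conflict with existing ones. The reason is returned in the response body.
    • 202 - returned only if the flag --push.disable-consistency-check is set.
      • Metrics are then not checked on push, so an inconsistent push is accepted and makes later scrapes fail.
  • In rare cases the gateway already holds an inconsistent set of metrics. It then rejects new pushes as inconsistent, even though the culprit was pushed earlier, until the offending metrics are deleted.
  • By default, pushed metrics are not persisted across restarts; the flag --persistence.file enables persistence to a file.
  • A PUT with an empty body deletes all metrics of the group (the group is defined by the URL) but, unlike DELETE, still updates push_time_seconds.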

POST

  • works the same as PUT, but only metrics with the same name as the newly pushed ones are replaced (within the same group)
    • So a POST with an empty body only updates push_time_seconds. All previously pushed metrics remain unchanged.
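
The difference between PUT and POST can be pictured with a toy in-memory model. This is only an illustration of the replace semantics: it treats a group as a plain dict from metric name to value, which ignores label sets, metric types, and the push_time_seconds bookkeeping, and the function names are hypothetical.

```python
def apply_put(group, pushed):
    """PUT: the pushed metrics replace everything that was in the group."""
    return dict(pushed)


def apply_post(group, pushed):
    """POST: only metrics with the same name are replaced; the rest stay."""
    merged = dict(group)
    merged.update(pushed)
    return merged


group = {"jobs_processed": 10, "last_duration_seconds": 2.5}
print(apply_put(group, {"jobs_processed": 11}))
# {'jobs_processed': 11}
print(apply_post(group, {"jobs_processed": 11}))
# {'jobs_processed': 11, 'last_duration_seconds': 2.5}
```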

DELETE

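  • Deletes all metrics of the group defined by the URL
  • Request body is empty
  • Response on success is always 202.
  • Deletion doesn’t happen immediately. The request is only queued, and its order relative to PUT and POST requests is preserved
    • So there’s no guarantee that it will actually be carried out (e.g., if the server crashes first).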

Admin API

  • Disabled by default
  • Enable through --web.enable-admin-api
  • URL: /api/<API_VERSION>/admin/<HANDLER>
  • E.g., to delete all metrics: curl -X PUT http://pushgateway.example.org:9091/api/v1/admin/wipe

Query API

  • URL: /api/<API_VERSION>/<HANDLER>
  • Methods:
    • status - build information, command-line flags, and start time of the gateway
    • metrics - the pushed metric families

Management API

  • Methods
    • GET /-/healthy - Returns code 200 when Pushgateway is healthy.
    • GET /-/ready - Returns code 200 when Pushgateway is ready to handle traffic.
    • PUT /-/quit - Triggers graceful shutdown of Pushgateway. Disabled by default; can be enabled with the flag --web.enable-lifecycle.
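
A health or readiness probe against these endpoints might look like the sketch below. It uses only the standard library; the function name and the injectable opener parameter are assumptions made for illustration (the latter so the function can be exercised without a running gateway).

```python
import urllib.error
import urllib.request


def probe(base_url, endpoint, opener=urllib.request.urlopen, timeout=5):
    """Return True if GET base_url + endpoint answers with status 200."""
    try:
        with opener(f"{base_url}{endpoint}", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection failures and error statuses (HTTPError) count as "not OK".
        return False


# Example call against the placeholder host used in these notes:
# probe("http://pushgateway.example.org:9091", "/-/ready")
```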
