Push Gateway
Why not use it?
Prometheus recommends using Pushgateway only in limited cases. Using Pushgateway indiscriminately in place of Prometheus's standard pull model has several pitfalls:
Main problems:
- Single Point of Failure: Monitoring multiple instances through a single Pushgateway makes it a single point of failure and potential bottleneck
- Loss of automatic health monitoring: You lose the automatic instance health monitoring provided by the up metric (generated at each scrape)
- Metric lifecycle problem: Pushgateway never forgets series pushed to it and will expose them to Prometheus forever unless they are deleted manually via the API
Particularly problematic:
When multiple job instances differentiate their metrics in Pushgateway through an instance label or similar, metrics for instances will remain in Pushgateway even if the original instance is renamed or deleted. This is because the lifecycle of Pushgateway as a metric cache is fundamentally separated from the lifecycle of processes that push metrics to it.
When to use
Usually, the only justified use case is capturing the outcome of a service-level batch job, i.e. a batch job that is not semantically tied to a specific machine or job instance (e.g., a batch job that deletes users across the entire service).
Alternatives to push gateway
Firewall/NAT problem:
If an incoming firewall or NAT prevents scraping metrics from targets, consider:
- Moving the Prometheus server behind the network barrier
- Running Prometheus servers in the same network as monitored instances
- Using PushProx, which allows Prometheus to traverse a firewall or NAT
Machine-related batch jobs:
For batch jobs related to a machine (e.g., automatic security updates, running configuration management clients):
- Use Node Exporter’s textfile collector instead of Pushgateway
- This ensures proper lifecycle for metrics associated with a specific machine
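A minimal sketch of the textfile-collector approach, assuming node_exporter runs with --collector.textfile.directory pointing at the directory passed in and only reads files ending in .prom. The function name and the metric names are hypothetical. The file is written under a temporary name and then renamed, so node_exporter never reads a half-written file.

```python
import os
import tempfile
from typing import Mapping


def write_textfile_metrics(directory: str, name: str, metrics: Mapping[str, float]) -> str:
    """Atomically write gauges in the Prometheus text format to <directory>/<name>.prom."""
    lines = []
    for metric, value in sorted(metrics.items()):
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {value}")
    body = "\n".join(lines) + "\n"  # the text format ends with a line feed

    # Write to a temporary file in the same directory (the .tmp suffix keeps it out of
    # the *.prom files the collector reads), then rename it over the target.
    # os.replace is atomic on POSIX when both paths are on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(body)
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; let node_exporter read it
        target = os.path.join(directory, f"{name}.prom")
        os.replace(tmp_path, target)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return target
```

For example, a security-update script could call write_textfile_metrics(directory, "security_updates", {"security_updates_last_run_timestamp_seconds": time.time()}) after each run, so the metric lives and dies with that machine.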
How to use correctly
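- Set honor_labels: true in the Prometheus scrape config for the Pushgateway! Otherwise Prometheus attaches the job and instance labels of the Pushgateway scrape target to the pushed metrics instead of the ones they were pushed with (the pushed values are kept only as exported_job and exported_instance).
- Adding: send an HTTP PUT to http://pushgateway.example.org:9091/metrics/job/some_job/instance/some_instance, where:
  - some_job is the job name. This name will be overwritten during scraping unless honor_labels: true is enabled.
  - some_instance is the instance name.
- Deletion (needed because metrics live in the gateway forever): send an HTTP DELETE to
  - for an instance: http://pushgateway.example.org:9091/metrics/job/some_job/instance/some_instance
  - for a job: http://pushgateway.example.org:9091/metrics/job/some_job
    This deletes only the group whose grouping key is exactly {job="some_job"}; groups with additional labels, such as the instance group above, are not affected.
- Remember that Pushgateway exposes the metrics of all groups together in a single scrape, so metrics pushed to different groups must not conflict with each other.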
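The push and delete calls above, sketched with Python's standard library. The gateway address is the placeholder from the text, and the helper names are hypothetical. The functions only build urllib.request.Request objects; passing one to urllib.request.urlopen sends it. Label values are inserted verbatim, so a value containing / or other special characters would first need the encoding described under Encoding.

```python
import urllib.request

GATEWAY = "http://pushgateway.example.org:9091"  # placeholder address from the text


def group_url(job: str, **labels: str) -> str:
    """URL of the group identified by job plus any additional grouping labels."""
    path = f"/metrics/job/{job}"
    for name, value in labels.items():
        path += f"/{name}/{value}"  # values are inserted verbatim (no encoding)
    return GATEWAY + path


def push_request(body: str, job: str, **labels: str) -> urllib.request.Request:
    """PUT: replace all metrics of the group with body (Prometheus text format)."""
    return urllib.request.Request(
        group_url(job, **labels),
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/plain; version=0.0.4"},
        method="PUT",
    )


def delete_request(job: str, **labels: str) -> urllib.request.Request:
    """DELETE: remove exactly this group; groups with extra labels are untouched."""
    return urllib.request.Request(group_url(job, **labels), method="DELETE")


req = push_request('some_metric{foo="bar"} 3.14\n', "some_job", instance="some_instance")
# urllib.request.urlopen(req) would send it; a 400 response raises urllib.error.HTTPError.
```

On a 400, the body of the HTTPError carries the gateway's explanation of the conflict.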
Timestamp
- The timestamp of a metric sent to Pushgateway is not the timestamp Prometheus stores: Prometheus uses the time of the scrape.
- The gateway adds push_time_seconds and push_failure_time_seconds metrics to each group (the time of the last successful and of the last failed push).
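A small sketch of what push_time_seconds is useful for: spotting a batch job that has stopped pushing. It parses the text exposition served by the gateway (e.g. at http://pushgateway.example.org:9091/metrics) and returns how long ago each group was last pushed. The function name is hypothetical, and the regex is a simplification that assumes no } inside label values; in practice this check usually lives in a Prometheus alerting rule instead.

```python
import re
import time
from typing import Dict, Optional

# Matches lines such as: push_time_seconds{instance="",job="some_job"} 1.7e+09
# Simplification: assumes label values never contain "}".
_PUSH_TIME = re.compile(r"^push_time_seconds(\{[^}]*\})?\s+(\S+)$")


def push_ages(exposition: str, now: Optional[float] = None) -> Dict[str, float]:
    """Seconds since the last successful push, keyed by each group's label set."""
    now = time.time() if now is None else now
    ages: Dict[str, float] = {}
    for line in exposition.splitlines():
        match = _PUSH_TIME.match(line.strip())
        if match:
            labels = match.group(1) or "{}"
            ages[labels] = now - float(match.group(2))
    return ages
```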
Encoding
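Problem 1
We want to set the labels job="directory_cleaner",path="/var/tmp". Using /var/tmp as is won't work, because in
/metrics/job/directory_cleaner/path//var/tmp
the slash in the value is read as a path separator, so path will be treated as having an empty value. Instead, base64url-encode the value and append @base64 to the label name: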
/metrics/job/directory_cleaner/path@base64/L3Zhci90bXA
With curl:
echo 'some_metric{foo="bar"} 3.14' | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/directory_cleaner/path@base64/$(echo -n '/var/tmp' | base64url)
Problem 2
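We want to set two labels in addition to job:
job="example",first_label="",second_label="foobar"
A naive path would be:
/metrics/job/example/first_label//second_label/foobar
This won't work for the same reason as above. The empty value has to be base64-encoded instead, and the base64 encoding of an empty string is written as a single = (padding only):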
/metrics/job/example/first_label@base64/=/second_label/foobar
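Problems 1 and 2 reduce to one rule: a value that is empty or contains / cannot be a plain path segment, so it is sent base64url-encoded with @base64 appended to the label name, and an empty value becomes a lone =. A minimal sketch of a grouping-path builder following that rule (the function name is hypothetical). It strips base64 padding to match the example URLs above and does not handle other reserved characters such as ?, # or %.

```python
import base64
from typing import Sequence, Tuple


def grouping_path(job: str, labels: Sequence[Tuple[str, str]] = ()) -> str:
    """Build /metrics/job/<job>/<name>/<value>/... for a Pushgateway group."""
    segments = ["metrics"]
    for name, value in [("job", job), *labels]:
        if value == "" or "/" in value:
            # Not expressible as a plain path segment: use base64url, padding stripped.
            encoded = base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii").rstrip("=")
            # The encoding of "" is empty, so write the lone padding character instead.
            segments += [f"{name}@base64", encoded or "="]
        else:
            segments += [name, value]
    return "/" + "/".join(segments)


print(grouping_path("example", [("first_label", ""), ("second_label", "foobar")]))
# prints /metrics/job/example/first_label@base64/=/second_label/foobar
```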
Problem 3
Label:
job="titan",name="Προμηθεύς"
Can be:
/metrics/job/titan/name/%CE%A0%CF%81%CE%BF%CE%BC%CE%B7%CE%B8%CE%B5%CF%8D%CF%82
or
/metrics/job/titan/name@base64/zqDPgc6_zrzOt864zrXPjc-C
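Both encodings of Problem 3 can be reproduced with the Python standard library: urllib.parse.quote percent-encodes the UTF-8 bytes, and base64.urlsafe_b64encode produces the base64url form (padding stripped, as in the example). The helper names are hypothetical; the Greek name is spelled with escapes so the exact code points are unambiguous.

```python
import base64
from urllib.parse import quote


def percent_segment(value: str) -> str:
    """Percent-encode the UTF-8 bytes of a label value; safe='' also encodes '/'."""
    return quote(value, safe="")


def base64_segment(value: str) -> str:
    """base64url-encode the UTF-8 bytes of a label value, without padding."""
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii").rstrip("=")


name = "\u03a0\u03c1\u03bf\u03bc\u03b7\u03b8\u03b5\u03cd\u03c2"  # Προμηθεύς
print("/metrics/job/titan/name/" + percent_segment(name))
# prints /metrics/job/titan/name/%CE%A0%CF%81%CE%BF%CE%BC%CE%B7%CE%B8%CE%B5%CF%8D%CF%82
print("/metrics/job/titan/name@base64/" + base64_segment(name))
# prints /metrics/job/titan/name@base64/zqDPgc6_zrzOt864zrXPjc-C
```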
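Problem 4 (UTF-8 label names)
- The flag --push.enable-utf8-names is required.
- The label name must be prefixed with U__.
- Special characters must be written as their hex Unicode code point, surrounded by _. So 😊 becomes _1F60A_.
- An existing _ must be doubled. So _ becomes __.
- Putting these together: U___55_____ decodes to the label name U__ (the prefix U__, then _55_ for code point 0x55, the letter U, then ____ for two literal underscores).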
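A sketch of the U__ escaping rules above, with a decoder to check the U___55_____ example. Assumptions beyond the text: ASCII letters and digits are left as they are, every other character except _ is escaped, and hex digits are written in upper case as in _1F60A_. The function names are hypothetical and this is not Pushgateway's own code.

```python
def escape_label_name(name: str) -> str:
    """Escape a UTF-8 label name with the U__ scheme (assumed character rules)."""
    out = ["U__"]
    for ch in name:
        if ch == "_":
            out.append("__")  # literal underscore is doubled
        elif ch.isascii() and ch.isalnum():
            out.append(ch)  # assumed: ASCII letters and digits stay as they are
        else:
            out.append(f"_{ord(ch):X}_")  # hex code point between underscores
    return "".join(out)


def unescape_label_name(escaped: str) -> str:
    """Decode a U__-escaped name; also accepts escapes of ordinary letters."""
    if not escaped.startswith("U__"):
        return escaped  # not an escaped name
    s, out, i = escaped[3:], [], 0
    while i < len(s):
        if s[i] != "_":
            out.append(s[i])
            i += 1
        elif s[i + 1 : i + 2] == "_":
            out.append("_")  # "__" is one literal underscore
            i += 2
        else:
            end = s.index("_", i + 1)  # raises ValueError on malformed input
            out.append(chr(int(s[i + 1 : end], 16)))
            i = end + 1
    return "".join(out)


print(unescape_label_name("U___55_____"))  # prints U__
```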
Methods/API
PUT
- pushing a group of metrics
- format is either protobuf or text
- Responses:
- 200 - success
- 400 - bad request, metric conflict. Reason is returned in response.
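- 202 - returned only if the flag --push.disable-consistency-check is set.
  - The pushed metrics are then queued without any consistency check; if they turn out to be inconsistent, scrapes of the gateway will fail.
- It may happen that the gateway ends up holding inconsistent metrics. It will then reject further pushes as well, until the offending metrics are deleted.
- By default, Pushgateway is not persistent: pushed metrics are lost on restart unless a persistence file is configured with --persistence.file.
- PUT with an empty body deletes all metrics of the group (the group is defined by the URL).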
POST
- works the same as PUT, but only metrics with the same names as the newly pushed ones are replaced
  - So a POST with an empty body only updates push_time_seconds. All other metrics of the group remain unchanged.
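A toy in-memory model of the difference between PUT and POST for one group (not Pushgateway's implementation): a group is represented as a mapping from metric name to that metric's sample lines. push_time_seconds bookkeeping and consistency checks are left out.

```python
from typing import Dict, List

# Toy model of one group: metric name -> sample lines for that metric.
Group = Dict[str, List[str]]


def apply_put(_current: Group, pushed: Group) -> Group:
    """PUT: the previous contents are discarded and replaced by the push."""
    return {name: list(lines) for name, lines in pushed.items()}


def apply_post(current: Group, pushed: Group) -> Group:
    """POST: only metrics whose names appear in the push are replaced."""
    merged = {name: list(lines) for name, lines in current.items()}
    merged.update({name: list(lines) for name, lines in pushed.items()})
    return merged


group = {"a": ["a 1"], "b": ["b 2"]}
print(apply_put(group, {"a": ["a 5"]}))   # prints {'a': ['a 5']}
print(apply_post(group, {"a": ["a 5"]}))  # prints {'a': ['a 5'], 'b': ['b 2']}
```

An empty push shows the difference most clearly: apply_put(group, {}) empties the group, as PUT with an empty body does, while apply_post(group, {}) leaves it unchanged.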
DELETE
- Deletes all metrics of the group defined by the URL
- The request body must be empty
- The response is always 202.
- Deletion doesn't happen immediately (unlike POST and PUT, which are applied before the response is sent): the request is only queued.
  - So there's no guarantee that it will actually be executed.
Admin Api
- Disabled by default
- Enable it with --web.enable-admin-api
- URL: /api/<API_VERSION>/admin/<HANDLER>
- E.g., to delete all metrics:
curl -X PUT http://pushgateway.example.org:9091/api/v1/admin/wipe
Query API
- URL: /api/<API_VERSION>/<HANDLER>
- Handlers:
  - status - gateway info
  - metrics - the pushed metrics
Management API
- Endpoints:
  - GET /-/healthy - returns code 200 when Pushgateway is healthy.
  - GET /-/ready - returns code 200 when Pushgateway is ready to handle traffic.
  - PUT /-/quit - triggers a graceful shutdown of Pushgateway. Disabled by default; it can be enabled with the flag --web.enable-lifecycle.
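A readiness probe built on the management API, e.g. for a deployment script that should wait before the first push. The function name and its parameters are hypothetical; only GET /-/ready is used.

```python
import http.client
import time
import urllib.request


def wait_until_ready(base_url: str, timeout_s: float = 30.0, interval_s: float = 1.0) -> bool:
    """Poll GET <base_url>/-/ready until it returns 200 or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/-/ready", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout, or an HTTP error status (HTTPError is an OSError).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A script would call, for instance, wait_until_ready("http://pushgateway.example.org:9091") and abort if it returns False.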