Overview
Grafana Pyroscope
Source: https://grafana.com/oss/pyroscope/
Grafana Pyroscope is an open-source continuous profiling platform that enables you to understand resource usage (CPU, memory, etc.) at the code level in production environments with minimal overhead.
Key Features
- Continuous profiling: always-on profiling in production, not just during debugging sessions
- Multiple profile types: CPU, heap/memory, goroutines, mutex contention, block/I/O, off-CPU
- Multi-language support: Go, Java, Python, .NET, Node.js, Rust, C++ (via eBPF)
- Flame graph visualization: intuitive visualization of where time and resources are spent
- Diff flame graphs: compare profiles between deployments to detect regressions
- Grafana integration: native datasource with drill-down from metrics and traces to profiles
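The idea behind a diff flame graph can be sketched in a few lines. This is a minimal illustration, not Pyroscope's implementation: it assumes each profile has already been reduced to sample counts keyed by a folded stack string, and the function name `diff_profiles` and the sample data are hypothetical.

```python
from __future__ import annotations


def diff_profiles(baseline: dict[str, int], candidate: dict[str, int]) -> dict[str, float]:
    """Return the change in each stack's share of total samples (candidate minus baseline)."""
    base_total = sum(baseline.values()) or 1  # avoid division by zero on empty profiles
    cand_total = sum(candidate.values()) or 1
    stacks = set(baseline) | set(candidate)
    return {
        stack: candidate.get(stack, 0) / cand_total - baseline.get(stack, 0) / base_total
        for stack in stacks
    }


# Hypothetical sample counts keyed by folded stack ("root;caller;callee").
before = {"main;handle;parse": 40, "main;handle;render": 60}
after = {"main;handle;parse": 70, "main;handle;render": 30}

delta = diff_profiles(before, after)
# Stacks whose share of samples grew, largest increase first.
regressions = sorted((s for s, d in delta.items() if d > 0), key=lambda s: -delta[s])
```

Comparing shares rather than raw counts keeps the result meaningful when the two deployments collected a different number of samples.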
Data Flow
┌─────────────────────────────────────────────────────┐
│                    Data Sources                     │
│                                                     │
│  ┌───────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │   eBPF    │  │ Pyroscope SDK│  │ JFR / CORECLR│  │
│  │   (all    │  │ (Go, Python, │  │ (Java, .NET) │  │
│  │ languages)│  │   Node.js)   │  │              │  │
│  └─────┬─────┘  └──────┬───────┘  └──────┬───────┘  │
│        │               │                 │          │
│        └───────────────┼─────────────────┘          │
│                        ▼                            │
│  ┌────────────────────────────────────────────────┐ │
│  │                 Grafana Alloy                  │ │
│  │ pyroscope.ebpf   → eBPF-based CPU profiling    │ │
│  │ pyroscope.scrape → SDK-based profile scraping  │ │
│  │ pyroscope.write  → forward to Pyroscope        │ │
│  └─────────────────────┬──────────────────────────┘ │
│                        ▼                            │
│  ┌────────────────────────────────────────────────┐ │
│  │            Pyroscope Server (:4040)            │ │
│  │     Storage: Azure Blob / S3 / local disk      │ │
│  └─────────────────────┬──────────────────────────┘ │
│                        ▼                            │
│  ┌────────────────────────────────────────────────┐ │
│  │                Grafana (:3000)                 │ │
│  │  Flame graphs, diff views, trace correlation   │ │
│  └────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
Internal Architecture
Pyroscope has a microservices-based architecture. All components are compiled into a single binary, and the -target parameter controls which component(s) the process runs as. This allows the same binary to operate as a monolith or as individual microservices.
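As a rough illustration of the single-binary idea, the sketch below maps a -target value to the set of components one process would run. It is not Pyroscope's actual startup code: the function name `resolve_target` is hypothetical, and accepting a comma-separated list of components is an assumption made for the example.

```python
from __future__ import annotations

# Component names taken from the architecture described in this section.
COMPONENTS = (
    "distributor", "ingester", "compactor", "query-frontend",
    "query-scheduler", "querier", "store-gateway",
)


def resolve_target(target: str = "all") -> list[str]:
    """Map a -target value to the list of components one process would run."""
    if target == "all":
        return list(COMPONENTS)
    # Assumption for this sketch: several components may be given, comma-separated.
    selected = [name.strip() for name in target.split(",")]
    unknown = [name for name in selected if name not in COMPONENTS]
    if unknown:
        raise ValueError(f"unknown target(s): {', '.join(unknown)}")
    return selected
```

Calling `resolve_target()` with no argument corresponds to the monolith, while `resolve_target("ingester")` corresponds to one microservice.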
Components
| Component | Role | Stateful |
|---|---|---|
| Distributor | Receives incoming profiles from clients and routes them to ingesters | No |
| Ingester | Writes profiles to local disk, periodically flushes blocks to long-term storage | Yes |
| Compactor | Merges blocks from multiple ingesters, removes duplicate samples, reduces storage | No |
| Query-frontend | Receives queries, accelerates execution (splitting, caching), dispatches to query-scheduler | No |
| Query-scheduler | Maintains a per-tenant query queue, ensures fair scheduling | No |
| Querier | Pulls queries from scheduler, fetches data from ingesters (recent) and store-gateways (historical) | No |
| Store-gateway | Provides access to blocks in long-term object storage | No |
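
The query-scheduler's per-tenant fairness can be pictured as a round-robin over tenant queues. The sketch below is one simple way to get that behavior, assuming strict round-robin with one query per turn. The class name `FairScheduler` and the tenant names are hypothetical, and the real scheduler may use a different policy.

```python
from __future__ import annotations

from collections import OrderedDict, deque


class FairScheduler:
    """Round-robin across tenants so one busy tenant cannot starve the others."""

    def __init__(self) -> None:
        self._queues: OrderedDict[str, deque[str]] = OrderedDict()

    def enqueue(self, tenant: str, query: str) -> None:
        self._queues.setdefault(tenant, deque()).append(query)

    def dequeue(self) -> tuple[str, str] | None:
        """Pop the next query from the tenant at the front of the rotation."""
        if not self._queues:
            return None
        tenant, queue = next(iter(self._queues.items()))
        query = queue.popleft()
        del self._queues[tenant]
        if queue:
            self._queues[tenant] = queue  # move the tenant to the back of the rotation
        return tenant, query


scheduler = FairScheduler()
for q in ("a1", "a2", "a3"):
    scheduler.enqueue("tenant-a", q)
scheduler.enqueue("tenant-b", "b1")
order = [scheduler.dequeue() for _ in range(4)]
```

Even though tenant-a queued three queries first, tenant-b's single query is served second instead of last.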
The Write Path
         Profiles from clients
                   │
                   ▼
           ┌───────────────┐
           │  Distributor  │
           └───────┬───────┘
                   │ routes by tenant + series
                   ▼
      ┌────────────┼────────────┐
      ▼            ▼            ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Ingester │ │ Ingester │ │ Ingester │
│ (replica)│ │ (replica)│ │ (replica)│
└─────┬────┘ └─────┬────┘ └─────┬────┘
      │            │            │
      │ flush blocks to storage │
      ▼            ▼            ▼
┌──────────────────────────────────────┐
│       Long-Term Object Storage       │
│   (S3 / Azure Blob / GCS / local)    │
└──────────────────┬───────────────────┘
                   │
                   ▼
           ┌───────────────┐
           │   Compactor   │
           │ merge blocks, │
           │  deduplicate  │
           └───────────────┘
- Distributor receives push requests and routes each profile series to ingesters
- Each series is replicated to 3 ingesters by default
- Ingesters append profiles to a per-tenant database on local disk
- In-memory profiles are periodically flushed to disk as blocks
- Blocks are uploaded to long-term object storage
- Compactor merges blocks from multiple ingesters into single blocks and removes duplicate samples
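The routing and replication steps above can be sketched as follows. This is a minimal illustration, not the distributor's real code: it assumes a consistent-hash ring, which this section does not spell out, and the names `Ring`, `replicas`, and `tokens_per_ingester` are hypothetical.

```python
from __future__ import annotations

import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash so routing is deterministic across processes."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, ingesters: list[str], tokens_per_ingester: int = 64) -> None:
        # Each ingester owns several tokens spread around the ring.
        self._ring = sorted(
            (_hash(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, tenant: str, series: str, replication_factor: int = 3) -> list[str]:
        """Walk clockwise from the series hash, collecting distinct ingesters."""
        start = bisect.bisect_left(self._tokens, _hash(f"{tenant}/{series}"))
        owners: list[str] = []
        for offset in range(len(self._ring)):
            _, name = self._ring[(start + offset) % len(self._ring)]
            if name not in owners:
                owners.append(name)
            if len(owners) == replication_factor:
                break
        return owners
```

With five ingesters, `Ring([...]).replicas("tenant-a", "cpu{service=\"api\"}")` returns three distinct ingester names, and the same tenant and series always map to the same three.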
The Read Path
     Query from Grafana
             │
             ▼
    ┌──────────────────┐
    │  Query-frontend  │
    │  (split, cache)  │
    └────────┬─────────┘
             │
             ▼
    ┌──────────────────┐
    │ Query-scheduler  │
    │   (per-tenant    │
    │  fair queuing)   │
    └────────┬─────────┘
             │
             ▼
    ┌──────────────────┐
    │     Querier      │
    └───┬──────────┬───┘
        │          │
 recent │          │ historical
  data  │          │ data
        ▼          ▼
  ┌──────────┐ ┌────────────────┐
  │ Ingesters│ │ Store-gateway  │
  │ (memory) │ │ (object store) │
  └──────────┘ └────────────────┘
- Query-frontend receives the query, splits it by time range, checks the cache
- Query-scheduler queues the sub-queries with fair per-tenant scheduling
- Querier picks up work and fetches data from:
  - Ingesters: for recent, in-memory data
  - Store-gateways: for historical data in object storage
- Results are merged and returned to Grafana
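The split-then-fetch flow above can be sketched in two small functions. This is only an illustration: the split interval and the `ingester_window` cutoff are hypothetical parameters chosen for the example, not documented Pyroscope defaults, and the function names are invented.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def split_by_interval(start: datetime, end: datetime, interval: timedelta):
    """Split [start, end) into consecutive sub-ranges of at most `interval`."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    ranges = []
    cursor = start
    while cursor < end:
        upper = min(cursor + interval, end)
        ranges.append((cursor, upper))
        cursor = upper
    return ranges


def sources_for(sub_start: datetime, sub_end: datetime, now: datetime,
                ingester_window: timedelta) -> list[str]:
    """Pick backends for one sub-query based on how recent its range is."""
    cutoff = now - ingester_window  # data newer than this may still be on ingesters
    sources = []
    if sub_end > cutoff:
        sources.append("ingester")
    if sub_start < cutoff:
        sources.append("store-gateway")
    return sources


now = datetime(2024, 1, 1, 12, 0)
sub_queries = split_by_interval(now - timedelta(hours=6), now, timedelta(hours=2))
plan = [sources_for(s, e, now, timedelta(hours=3)) for s, e in sub_queries]
```

A sub-query that straddles the cutoff is sent to both backends, which is why the querier has to merge results before returning them.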
Deployment Modes
| Mode | -target | Description | Use Case |
|---|---|---|---|
| Monolithic | all (default) | All components in a single process | Development, small workloads, quick start |
| Microservices | per component (e.g. ingester) | Each component runs as a separate process | Production: independent scaling, isolated failure domains |
💡 In this workshop, we deploy Pyroscope in monolithic mode (-target=all) as a single replica, which is sufficient for a training environment.
Long-Term Storage
Pyroscope stores each tenant's profiles in on-disk blocks containing an index, metadata, and Parquet tables. Blocks are uploaded to object storage for durability.
| Backend | Use Case |
|---|---|
| Amazon S3 | Production (AWS) |
| Azure Blob Storage | Production (Azure; used in our setup) |
| Google Cloud Storage | Production (GCP) |
| OpenStack Swift | Production (OpenStack) |
| Local filesystem | Development, single-node only |
Collection Methods
| Method | Languages | Overhead | Code Changes | Description |
|---|---|---|---|---|
| eBPF | All | < 1% | None | Kernel-level sampling via Grafana Alloy |
| Pyroscope SDK | Go, Python, Java, .NET, Node.js | 1-5% | Minimal | In-process profiler with richer data |
| JFR | Java, Kotlin | 1-3% | None (agent) | Java Flight Recorder integration |
| CORECLR Profiler | .NET | 1-3% | None (agent) | .NET CLR profiling |
| Pyroscope scrape | Go (pprof) | < 1% | Annotation only | Pull-based via pod annotations |
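
For the SDK row, the "minimal" code change usually amounts to one configuration call at startup. The sketch below assumes the Python SDK (the pyroscope-io package) and a server reachable at http://localhost:4040. The application name, the `env` tag, and the helper names `profiler_settings` and `start_profiling` are hypothetical, and the import is guarded so the snippet still loads where the SDK is not installed.

```python
from __future__ import annotations

try:
    import pyroscope  # provided by the pyroscope-io package, if installed
except ImportError:
    pyroscope = None


def profiler_settings(app_name: str, server: str = "http://localhost:4040") -> dict:
    """Build keyword arguments for pyroscope.configure()."""
    return {
        "application_name": app_name,
        "server_address": server,  # assumption: server reachable on the default port
        "tags": {"env": "workshop"},  # hypothetical static label
    }


def start_profiling(app_name: str) -> bool:
    """Start the in-process profiler; return False when the SDK is not installed."""
    if pyroscope is None:
        return False
    pyroscope.configure(**profiler_settings(app_name))
    return True
```

In an application, `start_profiling("checkout-service")` would be called once, as early as possible in the process lifetime.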
Profile Types
| Type | What It Measures | When to Use |
|---|---|---|
| CPU | Time spent executing code | High CPU usage, slow endpoints |
| Heap (Alloc) | Currently allocated memory | Memory leaks, high RAM |
| Goroutine / Thread | Active threads/goroutines | Goroutine leaks, deadlocks |
| Mutex / Lock | Time waiting for locks | Lock contention |
| Block / I/O | Time blocked on I/O | Slow network/disk ops |
| Off-CPU | Time when thread is not on CPU | I/O waits, scheduling |
When to Use Pyroscope
- ✅ Performance optimization: find hot spots in production code
- ✅ Memory leak diagnosis: heap profile shows what holds memory
- ✅ Regression analysis: diff flame graphs before and after deployment
- ✅ Cloud cost reduction: identify inefficient code → smaller instances
- ✅ Latency debugging: trace shows slow span, profile shows why
- ❌ Does not replace traces or metrics: it's a complementary signal