Overview

Grafana Pyroscope

Source: https://grafana.com/oss/pyroscope/

Grafana Pyroscope is an open-source continuous profiling platform that enables you to understand resource usage (CPU, memory, etc.) at the code level in production environments with minimal overhead.

Key Features

  • Continuous profiling: always-on profiling in production, not just during debugging sessions
  • Multiple profile types: CPU, heap/memory, goroutines, mutex contention, block/I/O, off-CPU
  • Multi-language support: Go, Java, Python, .NET, Node.js, Rust, C++ (via eBPF)
  • Flame graph visualization: intuitive visualization of where time and resources are spent
  • Diff flame graphs: compare profiles between deployments to detect regressions
  • Grafana integration: native datasource with drill-down from metrics and traces to profiles
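
To make the diff flame graph idea concrete, here is a minimal sketch of comparing two profiles. It assumes profiles are available in the common "folded stacks" text form (`frame;frame;frame count`); the sample data and all function names are hypothetical and are not part of Pyroscope's API.

```python
from collections import Counter

def parse_folded(text: str) -> Counter:
    """Parse 'frame;frame;frame count' lines into a Counter keyed by stack."""
    samples = Counter()
    for line in text.strip().splitlines():
        stack, _, count = line.rpartition(" ")
        samples[stack] += int(count)
    return samples

def self_time_by_function(samples: Counter) -> Counter:
    """Attribute each stack's samples to its leaf (innermost) frame."""
    totals = Counter()
    for stack, count in samples.items():
        totals[stack.split(";")[-1]] += count
    return totals

def diff_profiles(before: Counter, after: Counter) -> dict:
    """Change in each function's share of samples, in percentage points."""
    b, a = self_time_by_function(before), self_time_by_function(after)
    b_total, a_total = sum(b.values()) or 1, sum(a.values()) or 1
    return {
        fn: round(100 * a[fn] / a_total - 100 * b[fn] / b_total, 1)
        for fn in sorted(set(b) | set(a))
    }

# Hypothetical profiles taken before and after a deployment.
BEFORE = """
main;handle;parse 30
main;handle;render 70
"""
AFTER = """
main;handle;parse 60
main;handle;render 40
"""

if __name__ == "__main__":
    deltas = diff_profiles(parse_folded(BEFORE), parse_folded(AFTER))
    for fn, delta in deltas.items():
        print(f"{fn:10s} {delta:+.1f} pp")
```

A function whose share grows after a deployment (here `parse`) is a candidate regression; a diff flame graph shows the same comparison visually across the whole call tree.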

Data Flow

┌──────────────────────────────────────────────────────┐
│                     Data Sources                     │
│                                                      │
│  ┌────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │ eBPF       │  │ Pyroscope SDK│  │ JFR / CORECLR│  │
│  │ (all       │  │ (Go, Python, │  │ (Java, .NET) │  │
│  │ languages) │  │  Node.js)    │  │              │  │
│  └─────┬──────┘  └──────┬───────┘  └──────┬───────┘  │
│        │                │                 │          │
│        └────────────────┼─────────────────┘          │
│                         ▼                            │
│  ┌────────────────────────────────────────────────┐  │
│  │                 Grafana Alloy                  │  │
│  │  pyroscope.ebpf   → eBPF-based CPU profiling   │  │
│  │  pyroscope.scrape → SDK-based profile scraping │  │
│  │  pyroscope.write  → forward to Pyroscope       │  │
│  └──────────────────────┬─────────────────────────┘  │
│                         ▼                            │
│  ┌────────────────────────────────────────────────┐  │
│  │            Pyroscope Server (:4040)            │  │
│  │  Storage: Azure Blob / S3 / local disk         │  │
│  └──────────────────────┬─────────────────────────┘  │
│                         ▼                            │
│  ┌────────────────────────────────────────────────┐  │
│  │                Grafana (:3000)                 │  │
│  │  Flame graphs, diff views, trace correlation   │  │
│  └────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────┘

Internal Architecture

Source: https://grafana.com/docs/pyroscope/latest/reference-pyroscope-architecture/about-grafana-pyroscope-architecture/

Pyroscope has a microservices-based architecture. All components are compiled into a single binary, and the -target parameter controls which component(s) the process runs as. This allows the same binary to operate as a monolith or as individual microservices.
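
The single-binary idea can be illustrated with a toy launcher. This is only a sketch of the pattern, not Pyroscope's code: the component list mirrors the names used in this document, and accepting a comma-separated list of targets is an assumption of the sketch.

```python
import argparse

# Hypothetical component registry; names mirror the components in this document.
COMPONENTS = [
    "distributor", "ingester", "compactor", "query-frontend",
    "query-scheduler", "querier", "store-gateway",
]

def resolve_targets(target: str) -> list:
    """Expand a -target value into the list of components to start."""
    requested = [t.strip() for t in target.split(",") if t.strip()]
    if "all" in requested:
        return list(COMPONENTS)  # monolithic mode: everything in one process
    unknown = [t for t in requested if t not in COMPONENTS]
    if unknown:
        raise ValueError("unknown target(s): " + ", ".join(unknown))
    return requested

def main(argv) -> list:
    """Parse the command line and return the components this process would run."""
    parser = argparse.ArgumentParser(prog="toy-profiler")
    parser.add_argument("-target", default="all",
                        help="component(s) to run in this process")
    args = parser.parse_args(argv)
    return resolve_targets(args.target)

if __name__ == "__main__":
    print(main([]))                      # default: all components
    print(main(["-target=ingester"]))    # one microservice
```

The same executable therefore serves both deployment styles; only the flag value differs between a monolith and an individual microservice.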

Components

| Component | Role | Stateful |
|---|---|---|
| Distributor | Receives incoming profiles from clients and routes them to ingesters | No |
| Ingester | Writes profiles to local disk, periodically flushes blocks to long-term storage | Yes |
| Compactor | Merges blocks from multiple ingesters, removes duplicate samples, reduces storage | No |
| Query-frontend | Receives queries, accelerates execution (splitting, caching), dispatches to query-scheduler | No |
| Query-scheduler | Maintains a per-tenant query queue, ensures fair scheduling | No |
| Querier | Pulls queries from scheduler, fetches data from ingesters (recent) and store-gateways (historical) | No |
| Store-gateway | Provides access to blocks in long-term object storage | Yes |
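
The per-tenant queue kept by the query-scheduler can be pictured as a round-robin over tenants. The class below is a simplified illustration with hypothetical names; the real scheduler's data structures and policies may differ.

```python
from collections import OrderedDict, deque

class FairQueue:
    """Round-robin across tenants so one busy tenant cannot starve the others."""

    def __init__(self):
        self._queues = OrderedDict()  # tenant -> deque of pending queries

    def enqueue(self, tenant, query):
        self._queues.setdefault(tenant, deque()).append(query)

    def dequeue(self):
        """Return (tenant, query) for the next tenant in turn, or None if empty."""
        if not self._queues:
            return None
        tenant, queue = next(iter(self._queues.items()))
        query = queue.popleft()
        del self._queues[tenant]
        if queue:
            self._queues[tenant] = queue  # tenant goes to the back of the line
        return tenant, query

if __name__ == "__main__":
    fq = FairQueue()
    for q in ("a1", "a2", "a3"):
        fq.enqueue("tenant-a", q)
    fq.enqueue("tenant-b", "b1")
    while (item := fq.dequeue()) is not None:
        print(item)
```

Even though `tenant-a` queued three queries first, `tenant-b` is served after a single `tenant-a` query rather than after all three.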

The Write Path

                    Profiles from clients
                              │
                              ▼
                      ┌───────────────┐
                      │  Distributor  │
                      └───────┬───────┘
                              │  routes by tenant + series
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
        ┌───────────┐   ┌───────────┐   ┌───────────┐
        │ Ingester  │   │ Ingester  │   │ Ingester  │
        │ (replica) │   │ (replica) │   │ (replica) │
        └─────┬─────┘   └─────┬─────┘   └─────┬─────┘
              │               │               │
              │    flush blocks to storage    │
              ▼               ▼               ▼
        ┌───────────────────────────────────────────┐
        │         Long-Term Object Storage          │
        │      (S3 / Azure Blob / GCS / local)      │
        └─────────────────────┬─────────────────────┘
                              │
                              ▼
                      ┌───────────────┐
                      │   Compactor   │
                      │ merge blocks, │
                      │ deduplicate   │
                      └───────────────┘

  1. Distributor receives push requests and routes each profile series to ingesters
  2. Each series is replicated to 3 ingesters by default
  3. Ingesters append profiles to a per-tenant database on local disk
  4. In-memory profiles are periodically flushed to disk as blocks
  5. Blocks are uploaded to long-term object storage
  6. Compactor merges blocks from multiple ingesters into single blocks and removes duplicate samples
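
The steps above can be sketched as a small simulation: a series key is hashed to choose replicas, each replica stores the sample, and compaction later merges the per-ingester data and drops the replicated copies. The hashing scheme, ingester names, and tuple layout are assumptions made for illustration and do not describe Pyroscope's actual ring or block format.

```python
import hashlib

REPLICATION_FACTOR = 3  # matches the default stated above
INGESTERS = ["ingester-0", "ingester-1", "ingester-2", "ingester-3"]  # hypothetical

def series_key(tenant: str, labels: dict) -> str:
    """Build a stable key from the tenant and the sorted label set."""
    return tenant + "/" + ",".join(f"{k}={v}" for k, v in sorted(labels.items()))

def pick_ingesters(key: str, ingesters=INGESTERS, rf=REPLICATION_FACTOR) -> list:
    """Hash the key to a start position and take the next rf ingesters."""
    start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(ingesters)
    count = min(rf, len(ingesters))
    return [ingesters[(start + i) % len(ingesters)] for i in range(count)]

def push(blocks: dict, tenant: str, labels: dict, timestamp: int, value: int) -> None:
    """Distributor step: write the sample to every replica for its series."""
    key = series_key(tenant, labels)
    for ingester in pick_ingesters(key):
        blocks.setdefault(ingester, []).append((key, timestamp, value))

def compact(blocks: dict) -> list:
    """Compactor step: merge all ingester blocks and drop duplicate samples."""
    merged = set()
    for samples in blocks.values():
        merged.update(samples)
    return sorted(merged)

if __name__ == "__main__":
    blocks = {}
    push(blocks, "team-a", {"service": "api", "type": "cpu"}, 100, 42)
    push(blocks, "team-a", {"service": "web", "type": "cpu"}, 100, 7)
    print(sum(len(s) for s in blocks.values()), "stored samples before compaction")
    print(len(compact(blocks)), "samples after compaction")
```

Replication multiplies what the ingesters hold, which is why the compactor's deduplication step matters for storage cost.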

The Read Path

                    Query from Grafana
                             │
                             ▼
                   ┌───────────────────┐
                   │  Query-frontend   │
                   │  (split, cache)   │
                   └─────────┬─────────┘
                             │
                             ▼
                   ┌───────────────────┐
                   │  Query-scheduler  │
                   │  (per-tenant      │
                   │   fair queuing)   │
                   └─────────┬─────────┘
                             │
                             ▼
                   ┌───────────────────┐
                   │      Querier      │
                   └────┬─────────┬────┘
                        │         │
              recent    │         │  historical
              data      │         │  data
                        ▼         ▼
                  ┌───────────┐ ┌────────────────┐
                  │ Ingesters │ │ Store-gateway  │
                  │ (memory)  │ │ (object store) │
                  └───────────┘ └────────────────┘

  1. Query-frontend receives the query, splits it by time range, checks the cache
  2. Query-scheduler queues the sub-queries with fair per-tenant scheduling
  3. Querier picks up work and fetches data from:
    • Ingesters: for recent, in-memory data
    • Store-gateways: for historical data in object storage
  4. Results are merged and returned to Grafana
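
A rough sketch of the read path described above: split the time range, choose a data source per sub-range, and merge the results. The `recent_cutoff` parameter and the `(timestamp, stack, value)` tuples are hypothetical simplifications; caching and scheduling are left out.

```python
from collections import Counter

def split_by_interval(start: int, end: int, step: int) -> list:
    """Split [start, end) into consecutive sub-ranges of at most `step` seconds."""
    return [(t, min(t + step, end)) for t in range(start, end, step)]

def run_query(start, end, step, recent_cutoff, ingesters, store_gateway) -> dict:
    """Fetch each sub-range from the relevant source(s) and merge per stack.

    Data newer than recent_cutoff is read from ingesters, older data from the
    store-gateway. A sub-range that straddles the cutoff reads from both, so
    samples are keyed by (timestamp, stack) to avoid counting them twice.
    """
    seen = {}
    for lo, hi in split_by_interval(start, end, step):
        sources = []
        if hi > recent_cutoff:
            sources.append(ingesters)
        if lo < recent_cutoff:
            sources.append(store_gateway)
        for source in sources:
            for ts, stack, value in source:
                if lo <= ts < hi:
                    seen[(ts, stack)] = value
    totals = Counter()
    for (_, stack), value in seen.items():
        totals[stack] += value
    return dict(totals)

if __name__ == "__main__":
    # Hypothetical samples; the one at t=70 exists in both sources.
    store_gateway = [(10, "main;a", 5), (70, "main;b", 2)]
    ingesters = [(70, "main;b", 2), (110, "main;a", 1)]
    print(run_query(0, 120, 60, 90, ingesters, store_gateway))
```

Splitting by time lets sub-queries run in parallel and be cached independently, while the merge step hides from the caller which backend served each sample.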

Deployment Modes

| Mode | -target | Description | Use Case |
|---|---|---|---|
| Monolithic | all (default) | All components in a single process | Development, small workloads, quick start |
| Microservices | per component (e.g. ingester) | Each component runs as a separate process | Production: independent scaling, isolated failure domains |

💡 In this workshop we deploy Pyroscope in monolithic mode (-target=all) as a single replica, which is sufficient for a training environment.

Long-Term Storage

Pyroscope stores each tenant’s profiles in on-disk blocks containing an index, metadata, and Parquet tables. Blocks are uploaded to object storage for durability.

| Backend | Use Case |
|---|---|
| Amazon S3 | Production (AWS) |
| Azure Blob Storage | Production (Azure; used in our setup) |
| Google Cloud Storage | Production (GCP) |
| OpenStack Swift | Production (OpenStack) |
| Local filesystem | Development, single-node only |

Collection Methods

| Method | Languages | Overhead | Code Changes | Description |
|---|---|---|---|---|
| eBPF | All | < 1% | None | Kernel-level sampling via Grafana Alloy |
| Pyroscope SDK | Go, Python, Java, .NET, Node.js | 1-5% | Minimal | In-process profiler with richer data |
| JFR | Java, Kotlin | 1-3% | None (agent) | Java Flight Recorder integration |
| CORECLR Profiler | .NET | 1-3% | None (agent) | .NET CLR profiling |
| Pyroscope scrape | Go (pprof) | < 1% | Annotation only | Pull-based via pod annotations |

Profile Types

| Type | What It Measures | When to Use |
|---|---|---|
| CPU | Time spent executing code | High CPU usage, slow endpoints |
| Heap (in-use / alloc) | Memory currently in use, or memory allocated over time | Memory leaks, high RAM |
| Goroutine / Thread | Active threads/goroutines | Goroutine leaks, deadlocks |
| Mutex / Lock | Time waiting for locks | Lock contention |
| Block / I/O | Time blocked on I/O | Slow network/disk ops |
| Off-CPU | Time when a thread is not on CPU | I/O waits, scheduling |

When to Use Pyroscope

  • ✅ Performance optimization: find hot spots in production code
  • ✅ Memory leak diagnosis: heap profile shows what holds memory
  • ✅ Regression analysis: diff flame graphs before and after deployment
  • ✅ Cloud cost reduction: identify inefficient code → smaller instances
  • ✅ Latency debugging: trace shows slow span, profile shows why
  • ❌ Does not replace traces or metrics; it’s a complementary signal
