Profiles (eBPF)
🔬 4th Pillar of Observability - Profiles (eBPF)
What are they for?
Understanding what exactly your code does at runtime: CPU, memory, allocations, locks
🔍 What are profiles?
Profiles are the fourth telemetry signal in OpenTelemetry (alongside logs, metrics, and traces), providing detailed information about how an application uses system resources in real time.
Key characteristics:
- Show which functions/lines of code consume CPU, memory, or other resources
- Enable continuous profiling: always-on collection of profiling data in production (not just during debugging)
- In OpenTelemetry: the profiles data model is stable (since 2024); SDK implementations are in progress
What a profile contains:
Stack trace sample (CPU profile):
```
main.handleRequest()             → 45% CPU
├── db.QueryContext()            → 30% CPU
├── net/http.(*conn).readRequest → 10% CPU
└── json.Marshal()               → 5% CPU
```
Each sample contains:
- Stack trace: the full function call path
- Value: how much of the resource was consumed (CPU cycles, memory bytes, allocation count)
- Labels: context (service name, environment, etc.)
- Timestamp: when the sample was collected
🚀 eBPF: The Foundation of Modern Profiling
eBPF (extended Berkeley Packet Filter) is a Linux kernel technology that allows running sandboxed programs in kernel space without modifying the kernel source code or loading modules.
Why eBPF is crucial for profiling:
| Aspect | Traditional Profiling | eBPF Profiling |
|---|---|---|
| Overhead | 5-20% (e.g., Java Flight Recorder) | < 1% |
| Code changes required | Yes (agent/library) | No (operates at kernel level) |
| Languages | Language-specific | Any language (observes syscalls) |
| Security | Agent in process | Sandboxed in kernel |
| Visibility | User-space only | User-space + kernel-space |
How eBPF profiling works:
```
┌───────────────────────────────────────────────┐
│                  User Space                   │
│                                               │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐    │
│  │ Service A │ │ Service B │ │ Service C │    │
│  └───────────┘ └───────────┘ └───────────┘    │
│                                               │
├───────────────────────────────────────────────┤
│                 Kernel Space                  │
│                                               │
│  ┌─────────────────────────────────────────┐  │
│  │        eBPF Programs (sandboxed)        │  │
│  │                                         │  │
│  │  • perf_event → collects stack traces   │  │
│  │  • kprobe → intercepts syscalls         │  │
│  │  • uprobe → hooks user-space functions  │  │
│  └────────────────────┬────────────────────┘  │
│                       │                       │
│           eBPF Maps (ring buffer)             │
└───────────────────────┼───────────────────────┘
                        │
                        ▼
           ┌─────────────────────────┐
           │     Profiling Agent     │
           │    (Pyroscope/Parca)    │
           │  • aggregation          │
           │  • symbolization        │
           │  • export to backend    │
           └─────────────────────────┘
```
Key eBPF mechanisms:
- `perf_event` → periodic stack trace sampling (e.g., every 10 ms) → CPU profile
- `kprobe`/`kretprobe` → hooking kernel functions (e.g., memory allocations, I/O operations)
- `uprobe`/`uretprobe` → hooking user-space functions without restarting the process
- eBPF Maps → shared kernel ↔ user-space memory for passing samples
🔥 Flame Graphs
Profiles are most commonly visualized as flame graphs, where each box is a function and its width is proportional to the share of samples that contain it.
How to read a flame graph:
- Wide blocks at the top: functions that themselves consume many resources (hot spots)
- Wide blocks at the bottom: functions that call expensive subtrees
- Flame graph comparison (diff): what changed between deployment versions
📊 Profile Types
| Profile Type | What it measures | Unit | When to use |
|---|---|---|---|
| CPU | Time spent executing code | nanoseconds / cycles | High CPU usage, slow endpoints |
| Heap (Alloc) | Currently allocated memory | bytes | Memory leaks, high RAM usage |
| Goroutine / Thread | Number of active threads/goroutines | count | Goroutine leaks, deadlocks |
| Mutex / Lock | Time spent waiting for locks | nanoseconds | Contention, slow concurrency |
| Block / I/O | Time blocked on I/O operations | nanoseconds | Slow network/disk operations |
| Off-CPU | Time when thread is NOT on CPU | nanoseconds | Waiting for I/O, scheduling |
🛠️ Continuous Profiling Tools
Grafana Pyroscope (recommended in Grafana Stack)
```alloy
// Example Pyroscope configuration with Grafana Alloy (eBPF)
pyroscope.ebpf "instance" {
  forward_to     = [pyroscope.write.endpoint.receiver]
  targets_only   = false
  default_target = {"service_name" = "unspecified"}
  demangle       = "none"
  sample_rate    = 97 // Hz - samples per second (a prime rate avoids lockstep with periodic work)
}

pyroscope.write "endpoint" {
  endpoint {
    url = "http://pyroscope:4040"
  }
}
```
Parca (open-source, CNCF sandbox)
```yaml
# parca-agent as a DaemonSet in Kubernetes
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: parca-agent
spec:
  selector:
    matchLabels:
      app: parca-agent
  template:
    metadata:
      labels:
        app: parca-agent
    spec:
      hostPID: true                # agent must see host processes
      containers:
        - name: parca-agent
          image: ghcr.io/parca-dev/parca-agent
          securityContext:
            privileged: true       # required for eBPF
          env:
            - name: NODE_NAME      # referenced by $(NODE_NAME) below
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          args:
            - /bin/parca-agent
            - --node=$(NODE_NAME)
            - --store-address=parca-server:7070
```
OpenTelemetry Profiling (in development)
OTel Profiling Pipeline:
```
Application / eBPF Agent
           │
           ▼
      OTel Collector
  (profilesreceiver - new receiver for profiles)
           │
           ▼
Backend (Pyroscope / Parca / Datadog / Elastic)
```
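A hedged sketch of what the Collector configuration for such a pipeline could look like (assumptions: profiles support is still experimental and gated behind a Collector feature flag, the backend speaks OTLP, and exact field names may differ between Collector versions):

```yaml
receivers:
  otlp:                      # receives profiles over OTLP (experimental)
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    endpoint: backend:4317   # hypothetical profiles-capable backend

service:
  pipelines:
    profiles:                # experimental signal type
      receivers: [otlp]
      exporters: [otlp]
```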
🔗 Correlating Profiles with Other Signals
The greatest value of profiles emerges when they are correlated with other signals:
```
📊 Metric: CPU usage spike → 95%
   │
   ├── 🔗 Trace: GET /api/reports (span: 12.5s)
   │        │
   │        ├── 🔬 Profile: json.Marshal() → 78% CPU in this span
   │        └── Conclusion: serialization of a large object
   │
   └── 🪵 Log: "Report generation completed" (duration: 12.5s)
```
How it works in practice:
- Span → Profile: OpenTelemetry links `span_id` with profile samples, so you can click on a slow span and see which functions are slowing it down
- Metric → Profile: Grafana allows navigating from a metrics dashboard to a flame graph from the same time period
- Profile → Log: the flame graph points to a function; the logs show what happened inside it
⚡ When to use profiles?
- ✅ Performance optimization: finding hot spots in production code
- ✅ Memory leak diagnosis: the heap profile shows what's holding memory
- ✅ Regression analysis: comparing profiles before and after a deployment (diff flame graph)
- ✅ Cloud cost reduction: identifying inefficient code → smaller instances
- ✅ Latency debugging: when a trace shows a slow span, the profile shows why
- ❌ Does not replace traces or metrics: it's a complementary signal
💡 Tip: Start with a CPU profile using eBPF (zero code changes, < 1% overhead), then add heap/goroutine profiles for specific problems.
🏛️ Four Pillars Together
Logs → What happened
Metrics → How often / how long
Traces → Where and why
Profiles → Why so slow / what consumes resources
➡️ Together they provide a complete picture of system behavior