Flame Graphs
Reading Flame Graphs
Flame graphs are the primary visualization for profile data. Understanding how to read them is essential for effective profiling.
Structure
┌──────────────────────────────────────────────────────────┐
│ main.handleRequest()                                     │ ← root (widest)
├──────────────────────────────┬───────────────────────────┤
│ db.QueryContext()            │ json.Marshal()            │
├──────────────┬───────────────┤                           │
│ net.Read()   │ sql.Prepare() │                           │
└──────────────┴───────────────┴───────────────────────────┘
               ↑ top (self time)
- Width = proportion of total samples (time/resources)
- Y-axis = call stack depth (root at the bottom in the classic layout; icicle layouts like the diagram above draw the root at the top)
- Color = typically indicates package/module (varies by tool)
How to Read
- Wide blocks at the top → functions that themselves consume many resources (hot spots)
- Wide blocks at the bottom → functions that call expensive subtrees
- Narrow blocks → functions that contribute little to total resource usage
- Self time vs total time: A function may have high total time (it calls expensive children) but low self time (it doesn't do much work itself)
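The self/total distinction shows up even in a toy Go program (function names here are invented for illustration):

```go
package main

import "fmt"

// hotChild does the real work; in a CPU profile, nearly all samples
// land here, so it carries the self time.
func hotChild(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// parent has high *total* time (its entire subtree is hotChild) but
// negligible *self* time: it does almost no work of its own.
func parent(n int) int {
	return hotChild(n)
}

func main() {
	fmt.Println(parent(10)) // prints 24
}
```

In a flame graph of this program, the parent frame would be just as wide as hotChild, but the samples themselves sit in the hotChild block above it.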
Key Patterns
CPU Hot Spot
main.ServeHTTP()                 ← 100% total, 2% self
└── handler.ProcessRequest()     ← 95% total, 5% self
    ├── json.Marshal()           ← 60% total, 60% self  ← HOT SPOT
    └── db.Query()               ← 30% total, 3% self
        └── net.Read()           ← 27% total, 27% self
Action: Optimize json.Marshal() β perhaps use a faster serializer or reduce payload size.
Memory Leak Pattern
heap profile (growing over time):
main.handleRequest()
└── cache.Store()
    └── make([]byte, largeSize)   ← allocations never freed
Action: Check if the cache has eviction logic.
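A minimal size-bounded (LRU) version of such a cache, sketched with container/list; the names mirror the example above but the implementation is hypothetical:

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element holds.
type entry struct {
	key string
	val []byte
}

// Cache is a size-bounded LRU cache: once maxEntries is reached, the
// least-recently-used entry is evicted, so old allocations become
// garbage-collectable instead of accumulating forever.
type Cache struct {
	maxEntries int
	ll         *list.List               // front = most recently used
	items      map[string]*list.Element // key -> element in ll
}

func NewCache(maxEntries int) *Cache {
	return &Cache{
		maxEntries: maxEntries,
		ll:         list.New(),
		items:      make(map[string]*list.Element),
	}
}

func (c *Cache) Store(key string, val []byte) {
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el)
		el.Value.(*entry).val = val
		return
	}
	c.items[key] = c.ll.PushFront(&entry{key, val})
	if c.ll.Len() > c.maxEntries {
		oldest := c.ll.Back()
		c.ll.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key) // backing []byte is now collectable
	}
}

func (c *Cache) Get(key string) ([]byte, bool) {
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el)
		return el.Value.(*entry).val, true
	}
	return nil, false
}

func main() {
	c := NewCache(2)
	c.Store("a", make([]byte, 1024))
	c.Store("b", make([]byte, 1024))
	c.Store("c", make([]byte, 1024)) // evicts "a"
	_, ok := c.Get("a")
	fmt.Println(ok) // prints false
}
```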
Lock Contention
mutex profile:
main.handleRequest()
└── cache.(*Cache).Get()      ← shared cache with single lock
    └── sync.(*Mutex).Lock()  ← 80% of mutex wait time
Action: Use a sharded cache or sync.RWMutex.
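A sharded cache along those lines might look like the following sketch (shard count, names, and value types are illustrative):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const numShards = 16

// shard pairs one map with its own RWMutex.
type shard struct {
	mu sync.RWMutex
	m  map[string][]byte
}

// ShardedCache spreads keys across independent shards, so goroutines
// touching different keys no longer contend on a single lock.
type ShardedCache struct {
	shards [numShards]*shard
}

func NewShardedCache() *ShardedCache {
	c := &ShardedCache{}
	for i := range c.shards {
		c.shards[i] = &shard{m: make(map[string][]byte)}
	}
	return c
}

// shardFor hashes the key to pick its shard deterministically.
func (c *ShardedCache) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return c.shards[h.Sum32()%numShards]
}

func (c *ShardedCache) Set(key string, val []byte) {
	s := c.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = val
}

func (c *ShardedCache) Get(key string) ([]byte, bool) {
	s := c.shardFor(key)
	s.mu.RLock() // RWMutex: concurrent readers don't block each other
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

func main() {
	c := NewShardedCache()
	c.Set("user:1", []byte("alice"))
	v, _ := c.Get("user:1")
	fmt.Println(string(v)) // prints alice
}
```

Sharding and sync.RWMutex are complementary: sharding reduces contention between writers, while the RWMutex lets reads within one shard proceed in parallel.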
Diff Flame Graphs
Compare two profiles (e.g., before and after a deployment) to find regressions:
- Red = functions that got slower (more samples in the new profile)
- Green/Blue = functions that got faster (fewer samples)
- Grey = unchanged
This is essential for:
- Detecting performance regressions after deployments
- Validating optimization efforts
- Understanding the impact of code changes
Tips
- Start with CPU profiles (most common performance issues)
- Look for unexpectedly wide blocks; they indicate where time is actually spent
- Compare profiles before and after changes using diff view
- Use time range selection in Grafana to focus on specific incidents
- Filter by service name to isolate individual services
- eBPF profiles include both kernel and user-space frames; kernel time (e.g., sys_write) often reveals I/O bottlenecks