Traces
Introduction

What are they for?
Traces show the details of a single request in the context of its entire flow.
What are traces?
A trace records the path of a single request through a distributed system, showing the complete execution path from start to finish.
Basic Concepts
Trace
Complete flow of a single request through all services
Span
A single operation within a trace (e.g., HTTP call, database query)
Parent-Child Relationship
Spans form a hierarchy: a parent span contains its child spans
Anatomy of a Trace
Example: E-commerce checkout
Trace: checkout-flow-12345
└── HTTP Request [Frontend → API Gateway] (200ms)
    ├── Authentication [API Gateway → Auth Service] (50ms)
    ├── Cart Validation [API Gateway → Cart Service] (80ms)
    │   └── Database Query [Cart → PostgreSQL] (20ms)
    ├── Payment Processing [API Gateway → Payment Service] (150ms)
    │   ├── Bank API Call [Payment → External Bank] (120ms)
    │   └── Email Notification [Payment → Email Service] (30ms)
    └── Order Creation [API Gateway → Order Service] (90ms)
        └── Database Insert [Order → PostgreSQL] (15ms)
Span hierarchy:
Root Span: checkout-request
├── Child: auth-validation
├── Child: cart-validation
│   └── Child: cart-db-query
├── Child: payment-processing
│   ├── Child: bank-api-call
│   └── Child: email-notification
└── Child: order-creation
    └── Child: order-db-insert
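The hierarchy above is usually stored as a flat list of spans, each pointing at its parent. The following is a minimal sketch of rebuilding the tree from such a list; the `Span` type and the helper names are hypothetical and not part of any tracing library.

```go
package main

import (
	"fmt"
	"strings"
)

// Span is a simplified, hypothetical span record (not the OpenTelemetry type).
type Span struct {
	TraceID      string
	SpanID       string
	ParentSpanID string // empty for the root span
	Name         string
}

// childrenOf groups spans by the ID of their parent.
func childrenOf(spans []Span) map[string][]Span {
	byParent := make(map[string][]Span)
	for _, s := range spans {
		byParent[s.ParentSpanID] = append(byParent[s.ParentSpanID], s)
	}
	return byParent
}

// render writes the subtree below parentID, indenting two spaces per level.
func render(sb *strings.Builder, byParent map[string][]Span, parentID string, depth int) {
	for _, s := range byParent[parentID] {
		sb.WriteString(strings.Repeat("  ", depth) + s.Name + "\n")
		render(sb, byParent, s.SpanID, depth+1)
	}
}

// Tree renders the span hierarchy of one trace as indented text.
func Tree(spans []Span) string {
	var sb strings.Builder
	render(&sb, childrenOf(spans), "", 0)
	return sb.String()
}

func main() {
	spans := []Span{
		{"t1", "a", "", "checkout-request"},
		{"t1", "b", "a", "auth-validation"},
		{"t1", "c", "a", "cart-validation"},
		{"t1", "d", "c", "cart-db-query"},
	}
	fmt.Print(Tree(spans))
}
```

Because each span only stores its parent's ID, spans can arrive from different services in any order and still be assembled into one tree by the backend.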
Relationships Between Spans
Spans in traces can be related in two ways:
- Parent-Child (hierarchy)
- Span Links (loose association)
Parent-Child
A Parent-Child relationship means that the child span is part of the parent's operation: it is invoked by the parent and executes in its context.
Root Span: HTTP GET /checkout ← Parent
├── Child: validate-cart ← depends on parent
│   └── Child: db-query ← nested child
└── Child: process-payment ← depends on parent
Characteristics:
- Child inherits TraceID from parent
- Child has a ParentSpanID pointing to the parent
- Child's duration fits within the parent's duration
- They form a call tree (hierarchy)
Code example (Go):
// Parent span
ctx, parentSpan := tracer.Start(ctx, "checkout")
defer parentSpan.End()
// Child span: automatically linked to the parent through ctx
ctx, childSpan := tracer.Start(ctx, "validate-cart")
defer childSpan.End()
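The characteristics listed above can be expressed as a small check. This is a sketch on a hypothetical `Span` record, not something a tracer enforces; in particular, real SDKs do not reject a child that outlives its parent, which can happen with asynchronous work.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Span is a simplified, hypothetical span record (not the OpenTelemetry type).
type Span struct {
	TraceID, SpanID, ParentSpanID string
	Start, End                    time.Time
}

// at returns a timestamp ms milliseconds after a fixed reference instant.
func at(ms int) time.Time {
	base := time.Date(2025, 10, 27, 10, 15, 30, 0, time.UTC)
	return base.Add(time.Duration(ms) * time.Millisecond)
}

// CheckChild returns an error naming the first characteristic that child violates.
func CheckChild(parent, child Span) error {
	switch {
	case child.TraceID != parent.TraceID:
		return errors.New("child does not share the parent's TraceID")
	case child.ParentSpanID != parent.SpanID:
		return errors.New("child's ParentSpanID does not point to the parent")
	case child.Start.Before(parent.Start) || child.End.After(parent.End):
		return errors.New("child's duration does not fit within the parent's")
	}
	return nil
}

func main() {
	parent := Span{"t1", "p", "", at(0), at(200)}
	child := Span{"t1", "c", "p", at(10), at(90)}
	fmt.Println(CheckChild(parent, child) == nil)
}
```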
Span Links
Span Links connect spans that are logically related but don't have a parent-child relationship: they may belong to different traces or different branches of the same trace.
Trace A: order-placed
└── Span: publish-to-queue ──────────┐
                                     │ Link
Trace B: order-processing            │
└── Span: consume-from-queue ◄───────┘
Typical use cases:
- Batch processing: one span processes multiple messages, each from a different trace
- Async messaging: consumer links to the producer's span (different traces)
- Fan-in: operation dependent on multiple previous operations from different traces
- Retries: new attempt links to the original span
Code example (Go):
// Consumer links to the producer's span from a different trace
link := trace.Link{
	SpanContext: producerSpanContext,
	Attributes: []attribute.KeyValue{
		attribute.String("messaging.operation", "process"),
	},
}
// Links are attached when the span is started
ctx, span := tracer.Start(ctx, "process-order",
	trace.WithLinks(link),
)
defer span.End()
Comparison:
| Aspect | Parent-Child | Span Links |
|---|---|---|
| Relationship | Hierarchical (tree) | Loose (graph) |
| TraceID | Same | Can be different |
| Time | Child fits within parent | No time constraints |
| Context | Propagated automatically | Added manually |
| Use case | Synchronous calls | Async, batch, fan-in |
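To make the "tree vs. graph" distinction in the table concrete, here is a minimal sketch of the batch-processing case: one span with no parent but several links, each pointing into a different producer trace. All types and names are hypothetical stand-ins, not the OpenTelemetry API.

```go
package main

import "fmt"

// SpanContext identifies a span inside some trace (simplified, hypothetical model).
type SpanContext struct{ TraceID, SpanID string }

// Span has at most one parent but any number of links.
type Span struct {
	Name    string
	Context SpanContext
	Parent  *SpanContext // nil for a root span
	Links   []SpanContext
}

// NewBatchSpan creates a root span in its own trace and links it to the
// producer span of every message in the batch.
func NewBatchSpan(ctx SpanContext, producers []SpanContext) Span {
	return Span{
		Name:    "process-batch",
		Context: ctx,
		Links:   append([]SpanContext(nil), producers...),
	}
}

// LinkedTraces returns the distinct trace IDs reachable through the span's links,
// in first-seen order.
func LinkedTraces(s Span) []string {
	seen := map[string]bool{}
	var ids []string
	for _, l := range s.Links {
		if !seen[l.TraceID] {
			seen[l.TraceID] = true
			ids = append(ids, l.TraceID)
		}
	}
	return ids
}

func main() {
	producers := []SpanContext{{"trace-a", "s1"}, {"trace-b", "s2"}, {"trace-a", "s3"}}
	batch := NewBatchSpan(SpanContext{"trace-c", "s9"}, producers)
	fmt.Println(batch.Parent == nil, LinkedTraces(batch))
}
```

A parent pointer could only name one of the three producers, which is why batch consumers typically use links instead.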
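Benefits of Traces
Identifying Bottlenecks
Total request: 200ms
├── Authentication: 50ms (25%)
├── Cart validation: 80ms (40%) ← BOTTLENECK!
├── Payment: 150ms (75%) ← BOTTLENECK!
└── Order creation: 90ms (45%)
Percentages are relative to the 200ms total. They add up to more than 100%, so the steps in this example must overlap in time.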
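The percentages in this breakdown are each step's duration divided by the total request time. A minimal sketch of that arithmetic, using the durations from the example; the `Step` type and function names are illustrative assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// Step is one direct child span of the request (hypothetical type).
type Step struct {
	Name     string
	Duration time.Duration
}

// Share returns a step's duration as a percentage of the total request time.
func Share(step, total time.Duration) float64 {
	if total <= 0 {
		return 0
	}
	return 100 * float64(step) / float64(total)
}

// Slowest returns the step with the longest duration (zero Step for empty input).
func Slowest(steps []Step) Step {
	var slowest Step
	for _, s := range steps {
		if s.Duration > slowest.Duration {
			slowest = s
		}
	}
	return slowest
}

func main() {
	total := 200 * time.Millisecond
	steps := []Step{
		{"Authentication", 50 * time.Millisecond},
		{"Cart validation", 80 * time.Millisecond},
		{"Payment", 150 * time.Millisecond},
		{"Order creation", 90 * time.Millisecond},
	}
	for _, s := range steps {
		fmt.Printf("%-16s %6v (%.0f%%)\n", s.Name, s.Duration, Share(s.Duration, total))
	}
	fmt.Println("slowest:", Slowest(steps).Name)
}
```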
Debugging Errors
Trace ID: abc123 - Payment Failed
├── Authentication: SUCCESS (45ms)
├── Cart validation: SUCCESS (75ms)
├── Payment processing: ERROR (timeout after 30s)
│   ├── Bank API: TIMEOUT (30s) ← ROOT CAUSE
│   └── Email: SKIPPED
└── Order creation: SKIPPED
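Reading the trace above amounts to following the failed spans downward until no failed child remains. A minimal sketch of that walk, assuming a hypothetical in-memory `Span` tree in which skipped spans simply count as not failed:

```go
package main

import "fmt"

// Span is a simplified, hypothetical span node with a failure flag.
type Span struct {
	Name     string
	Failed   bool
	Children []*Span
}

// RootCause follows failed spans downward and returns the deepest failed span
// on the first failing path, or nil if s itself did not fail.
func RootCause(s *Span) *Span {
	if s == nil || !s.Failed {
		return nil
	}
	for _, c := range s.Children {
		if rc := RootCause(c); rc != nil {
			return rc
		}
	}
	return s
}

func main() {
	trace := &Span{"checkout", true, []*Span{
		{"authentication", false, nil},
		{"cart-validation", false, nil},
		{"payment-processing", true, []*Span{
			{"bank-api", true, nil},
			{"email", false, nil},
		}},
		{"order-creation", false, nil},
	}}
	fmt.Println(RootCause(trace).Name)
}
```

The heuristic "deepest failed span" is a simplification; it works for the timeout example here but does not replace looking at the span's recorded error details.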
Performance Monitoring
- Latency percentiles (P50, P95, P99)
- Error rates per service
- Dependency mapping - which services talk to which
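Latency percentiles are computed from the durations of many spans of the same operation. Below is a minimal sketch using the nearest-rank method on latencies given in milliseconds; tracing backends may use other estimators (for example histograms), so treat this as an illustration of the idea only.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Percentile returns the p-th percentile (0 < p <= 100) of the given latencies
// using the nearest-rank method. It returns 0 for an empty input.
func Percentile(latenciesMs []float64, p float64) float64 {
	if len(latenciesMs) == 0 {
		return 0
	}
	sorted := append([]float64(nil), latenciesMs...) // do not modify the caller's slice
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	latencies := []float64{12, 15, 11, 240, 14, 13, 16, 18, 900, 17}
	for _, p := range []float64{50, 95, 99} {
		fmt.Printf("P%.0f = %.0fms\n", p, Percentile(latencies, p))
	}
}
```

With the sample data, the median stays low while the high percentiles are dominated by the two slow requests, which is why P95/P99 are tracked alongside P50.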
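Trace Standards
OpenTelemetry (OTel) - Current Standard
JSON format (simplified for illustration; the OTLP wire format names some fields differently, e.g. name and startTimeUnixNano; duration is in nanoseconds):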
{
  "traceId": "a1b2c3d4e5f6789012345678abcdef90",
  "spanId": "1234567890abcdef",
  "parentSpanId": "fedcba0987654321",
  "operationName": "payment-processing",
  "startTime": "2025-10-27T10:15:30.123456Z",
  "endTime": "2025-10-27T10:15:30.273456Z",
  "duration": 150000000,
  "status": {
    "code": "OK",
    "message": ""
  },
  "attributes": {
    "service.name": "payment-service",
    "service.version": "1.2.3",
    "http.method": "POST",
    "http.url": "/api/payment",
    "http.status_code": 200,
    "user.id": "user123",
    "payment.amount": 49.99,
    "payment.currency": "USD"
  },
  "events": [
    {
      "time": "2025-10-27T10:15:30.150000Z",
      "name": "bank.api.call.start",
      "attributes": {
        "bank.provider": "stripe"
      }
    }
  ]
}
OTel Span Structure:
- TraceID - unique identifier for the entire trace
- SpanID - unique identifier for the span
- ParentSpanID - parent's ID (creates hierarchy)
- Name (shown as operationName in the JSON above) - human-readable name of the operation
- StartTime/EndTime - start and end timestamps
- Attributes - key-value metadata
- Events - points in time with additional data
- Status - success/error
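As a sanity check on these fields, the following sketch decodes the simplified JSON shape shown earlier and verifies that duration equals EndTime minus StartTime in nanoseconds. The struct mirrors only that illustrative shape; it is an assumption for this example and not the OTLP schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Span mirrors a subset of the simplified JSON example (hypothetical shape).
type Span struct {
	TraceID      string    `json:"traceId"`
	SpanID       string    `json:"spanId"`
	ParentSpanID string    `json:"parentSpanId"`
	Name         string    `json:"operationName"`
	StartTime    time.Time `json:"startTime"`
	EndTime      time.Time `json:"endTime"`
	Duration     int64     `json:"duration"` // nanoseconds
}

// Decode parses one span and checks that duration matches the timestamps.
func Decode(data []byte) (Span, error) {
	var s Span
	if err := json.Unmarshal(data, &s); err != nil {
		return Span{}, err
	}
	if got := s.EndTime.Sub(s.StartTime).Nanoseconds(); got != s.Duration {
		return Span{}, fmt.Errorf("duration %d does not match endTime-startTime (%d)", s.Duration, got)
	}
	return s, nil
}

func main() {
	raw := []byte(`{
		"traceId": "a1b2c3d4e5f6789012345678abcdef90",
		"spanId": "1234567890abcdef",
		"parentSpanId": "fedcba0987654321",
		"operationName": "payment-processing",
		"startTime": "2025-10-27T10:15:30.123456Z",
		"endTime": "2025-10-27T10:15:30.273456Z",
		"duration": 150000000
	}`)
	s, err := Decode(raw)
	fmt.Println(s.Name, s.EndTime.Sub(s.StartTime), err)
}
```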
Context Propagation
W3C Trace Context (standard)
# HTTP Headers
traceparent: 00-a1b2c3d4e5f6789012345678abcdef90-1234567890abcdef-01
tracestate: vendor1=value1,vendor2=value2
traceparent structure:
00-[trace-id]-[parent-span-id]-[trace-flags]
│  │          │                │
│  │          │                └── Flags (01 = sampled)
│  │          └── Parent Span ID (16 hex chars)
│  └── Trace ID (32 hex chars)
└── Version (00)
Code example (Go):
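// HTTP client: sending context (built by hand for illustration;
// in practice a propagator such as propagation.TraceContext{} does this)
req.Header.Set("traceparent",
	fmt.Sprintf("00-%s-%s-01", traceID, spanID))

// HTTP server: receiving context
traceParent := r.Header.Get("traceparent")
parts := strings.Split(traceParent, "-")
if len(parts) != 4 {
	// Missing or malformed header: do not index into parts
	return
}
traceID := parts[1]
parentSpanID := parts[2]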
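A receiver should validate the header before trusting it. The sketch below parses and re-serializes a version-00 traceparent using only the standard library; it follows the field layout described above, rejects all-zero IDs, and deliberately does not handle future versions or tracestate. The type and function names are assumptions for this example.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// TraceParent holds the fields of a version-00 traceparent header.
type TraceParent struct {
	TraceID  string
	ParentID string
	Sampled  bool
}

// isLowerHex reports whether s consists of exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// ParseTraceParent validates and splits a version-00 traceparent value.
func ParseTraceParent(header string) (TraceParent, error) {
	parts := strings.Split(header, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceParent{}, errors.New("traceparent: expected 4 fields with version 00")
	}
	if !isLowerHex(parts[1], 32) || !isLowerHex(parts[2], 16) || !isLowerHex(parts[3], 2) {
		return TraceParent{}, errors.New("traceparent: malformed field")
	}
	if parts[1] == strings.Repeat("0", 32) || parts[2] == strings.Repeat("0", 16) {
		return TraceParent{}, errors.New("traceparent: all-zero ID is invalid")
	}
	flags, _ := strconv.ParseUint(parts[3], 16, 8) // cannot fail: validated above
	return TraceParent{parts[1], parts[2], flags&1 == 1}, nil
}

// String serializes the value back into header form.
func (tp TraceParent) String() string {
	flags := "00"
	if tp.Sampled {
		flags = "01"
	}
	return "00-" + tp.TraceID + "-" + tp.ParentID + "-" + flags
}

func main() {
	tp, err := ParseTraceParent("00-a1b2c3d4e5f6789012345678abcdef90-1234567890abcdef-01")
	fmt.Println(tp.TraceID, tp.ParentID, tp.Sampled, err)
}
```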
Sampling Strategies
Sampling types:
# Head-based sampling (Jaeger-style; schematic, the key names in Jaeger's own strategies file differ)
samplingStrategies:
  defaultStrategy:
    type: probabilistic
    param: 0.1 # 10% sampling
  perServiceStrategies:
    - service: "critical-service"
      type: ratelimiting
      maxTracesPerSecond: 100
    - service: "high-volume-service"
      type: probabilistic
      param: 0.01 # 1% sampling
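With head-based probabilistic sampling, the keep/drop decision is made once, when the trace starts, and is commonly derived from the trace ID so that every service reaches the same verdict. The sketch below shows one way to do that; the exact bit selection and threshold here are assumptions for illustration, not the algorithm of any particular SDK.

```go
package main

import (
	"fmt"
	"strconv"
)

// SampleByRatio decides deterministically from the trace ID whether a trace
// is kept. ratio is the fraction of traces to keep, between 0 and 1.
func SampleByRatio(traceID string, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 || len(traceID) < 16 {
		return false
	}
	// Interpret the first 16 hex characters as a 64-bit number.
	x, err := strconv.ParseUint(traceID[:16], 16, 64)
	if err != nil {
		return false
	}
	// Compare a 63-bit value against ratio * 2^63.
	bound := uint64(ratio * (1 << 63))
	return x>>1 < bound
}

func main() {
	kept := 0
	for i := 0; i < 10000; i++ {
		// Spread the hypothetical IDs over the 64-bit space (multiplication wraps around).
		id := fmt.Sprintf("%016x%016x", uint64(i)*0x9E3779B97F4A7C15, i)
		if SampleByRatio(id, 0.1) {
			kept++
		}
	}
	fmt.Printf("kept %d of 10000 traces\n", kept)
}
```

Because the decision depends only on the trace ID and the ratio, a trace is either recorded by all participating services or by none, which avoids partial traces.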
Tail-based sampling (OTel Collector):
# Sampling after seeing complete trace
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      # Sample all errors
      - name: error-policy
        type: status_code
        status_code: {status_codes: [ERROR]}
      # Sample slow requests
      - name: latency-policy
        type: latency
        latency: {threshold_ms: 1000}
      # Sample 1% of normal traffic
      - name: probabilistic-policy
        type: probabilistic
        probabilistic: {sampling_percentage: 1}
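Conceptually, the three policies above are evaluated against the finished trace. The following is a minimal sketch of that evaluation, assuming a trace is kept when any policy matches; the types, the FNV-based probabilistic check, and the function names are illustrative assumptions and not the Collector's implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"time"
)

// Trace is a hypothetical summary of a complete trace.
type Trace struct {
	ID       string
	HasError bool
	Duration time.Duration
}

// Policy reports whether a trace should be kept.
type Policy func(Trace) bool

// ErrorPolicy keeps traces that contain an error.
func ErrorPolicy() Policy {
	return func(t Trace) bool { return t.HasError }
}

// LatencyPolicy keeps traces slower than threshold.
func LatencyPolicy(threshold time.Duration) Policy {
	return func(t Trace) bool { return t.Duration > threshold }
}

// ProbabilisticPolicy keeps roughly the given percentage (0 to 100) of traces,
// decided deterministically from a hash of the trace ID.
func ProbabilisticPolicy(percentage float64) Policy {
	return func(t Trace) bool {
		h := fnv.New32a()
		h.Write([]byte(t.ID))
		return float64(h.Sum32()%10000) < percentage*100
	}
}

// Keep returns true if at least one policy matches the trace.
func Keep(t Trace, policies ...Policy) bool {
	for _, p := range policies {
		if p(t) {
			return true
		}
	}
	return false
}

func main() {
	policies := []Policy{ErrorPolicy(), LatencyPolicy(time.Second), ProbabilisticPolicy(1)}
	traces := []Trace{
		{"t-error", true, 40 * time.Millisecond},
		{"t-slow", false, 3 * time.Second},
		{"t-normal", false, 40 * time.Millisecond},
	}
	for _, t := range traces {
		fmt.Println(t.ID, Keep(t, policies...))
	}
}
```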
Advantages:
- Decision made after seeing the complete trace: can filter by status, latency, attributes
- Traces that match an error or latency policy are always kept, so important traces are not lost to random sampling
- Ability to define multiple policies (errors, slow requests, % of normal traffic)
- Better than head-based in environments where important traces are rare
Disadvantages:
- Requires buffering traces in the Collector until the decision is made (decision_wait): higher memory usage
- All spans of a given trace must reach the same Collector: requires a load balancer with routing by trace_id
- Greater infrastructure complexity (dedicated Collector tier for sampling)
- Export delay: traces wait for decision_wait before being sent further
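Routing by trace_id means that the choice of Collector depends only on the trace ID, so all spans of one trace land in the same place. A minimal sketch using hash-modulo selection follows; the collector names are hypothetical, and production load balancers usually prefer consistent hashing so that adding or removing a Collector moves fewer traces.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// CollectorFor picks the Collector responsible for a trace. Every span that
// carries the same trace ID is routed to the same instance.
func CollectorFor(traceID string, collectors []string) string {
	if len(collectors) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(traceID))
	return collectors[h.Sum32()%uint32(len(collectors))]
}

func main() {
	collectors := []string{"collector-0", "collector-1", "collector-2"} // hypothetical names
	for _, id := range []string{"a1b2c3d4e5f6789012345678abcdef90", "00f067aa0ba902b7e4b8c1d2a3f49586"} {
		fmt.Println(id, "->", CollectorFor(id, collectors))
	}
}
```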
Browser Tracing
OpenTelemetry enables browser instrumentation, allowing you to trace the entire request from user click to database response.
How does it work?
Browser (frontend)                 Backend
┌─────────────────────┐            ┌──────────────────┐
│ User click          │            │                  │
│ ├── Span: onClick   │ HTTP + W3C │                  │
│ └── Span: fetch ────┼───────────►│ Span: /api/order │
│     (traceparent)   │            │ └── Span: db     │
└─────────────────────┘            └──────────────────┘
A single trace connects frontend and backend thanks to W3C Trace Context propagation.
What can be traced in the browser?
| Signal | Description |
|---|---|
| Document Load | Page load time (DNS, TCP, TTFB, DOM) |
| HTTP/Fetch requests | XHR and Fetch with automatic traceparent propagation |
| User Interactions | Clicks, navigations, form submissions |
| Web Vitals | LCP, FID (superseded by INP in 2024), CLS: Core Web Vitals as spans/metrics |
| Errors & Exceptions | Unhandled JS errors, promise rejections |
| Custom spans | Custom instrumentation of business logic |
Configuration (JavaScript)
import { WebTracerProvider } from '@opentelemetry/sdk-trace-web';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { ZoneContextManager } from '@opentelemetry/context-zone';
import { registerInstrumentations } from '@opentelemetry/instrumentation';
import { getWebAutoInstrumentations } from '@opentelemetry/auto-instrumentations-web';
import { Resource } from '@opentelemetry/resources';

// Written against the 1.x JS SDK. In 2.x, Resource and addSpanProcessor were
// replaced by resourceFromAttributes() and the spanProcessors constructor option.
const provider = new WebTracerProvider({
  // The resource must be a Resource instance, not a plain object
  resource: new Resource({
    'service.name': 'frontend-app',
    'service.version': '1.0.0',
  }),
});

// Export traces to OTel Collector
provider.addSpanProcessor(
  new BatchSpanProcessor(
    new OTLPTraceExporter({
      url: 'https://otel-collector.example.com/v1/traces',
    })
  )
);

// ZoneContextManager ensures proper context propagation
// in asynchronous browser code (setTimeout, fetch, Promise)
provider.register({
  contextManager: new ZoneContextManager(),
});

// Auto-instrumentation: document load, fetch, user interactions
registerInstrumentations({
  instrumentations: [
    getWebAutoInstrumentations({
      '@opentelemetry/instrumentation-document-load': {},
      '@opentelemetry/instrumentation-fetch': {
        // Cross-origin URLs that should receive the traceparent header
        propagateTraceHeaderCorsUrls: [/api\.example\.com/],
      },
      '@opentelemetry/instrumentation-user-interaction': {},
    }),
  ],
});
Custom span in the browser
import { SpanStatusCode } from '@opentelemetry/api';

const tracer = provider.getTracer('frontend-app');

function addToCart(productId) {
  const span = tracer.startSpan('add-to-cart', {
    attributes: {
      'product.id': productId,
      'component': 'cart',
    },
  });
  try {
    // business logic...
    span.setStatus({ code: SpanStatusCode.OK });
  } catch (error) {
    // Record the failure on the span, then let the caller handle it
    span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
    span.recordException(error);
    throw error;
  } finally {
    // Always end the span, otherwise it is never exported
    span.end();
  }
}
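CORS and Propagation
For the traceparent header to be sent to a backend on a different domain, the backend must allow this header in CORS:
Access-Control-Allow-Headers: traceparent, tracestate
Without this, the CORS preflight fails and the browser blocks the request. In addition, the fetch instrumentation only adds the header to cross-origin requests whose URL matches propagateTraceHeaderCorsUrls. If either piece is missing, the trace is broken at the frontend → backend boundary.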
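On a Go backend, the CORS side can be handled by a small net/http middleware. The sketch below answers preflight requests and allows the two trace headers; the origin, the allowed methods, and the helper names are assumptions for this example, and a real service would also list any other request headers the frontend sends (for example Content-Type).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// AllowTraceHeaders wraps next so that browsers on allowedOrigin may send
// the W3C trace headers with cross-origin requests.
func AllowTraceHeaders(allowedOrigin string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Access-Control-Allow-Origin", allowedOrigin)
		w.Header().Set("Access-Control-Allow-Headers", "traceparent, tracestate")
		if r.Method == http.MethodOptions {
			// Preflight request: answer it here, do not call the real handler.
			w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// echoTraceParent is a stand-in backend handler that returns the header it received.
func echoTraceParent() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Header.Get("traceparent"))
	})
}

// do sends one in-memory request through h and returns the recorded response.
func do(h http.Handler, method, traceparent string) *httptest.ResponseRecorder {
	req := httptest.NewRequest(method, "/api/order", nil)
	if traceparent != "" {
		req.Header.Set("traceparent", traceparent)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec
}

func main() {
	h := AllowTraceHeaders("https://shop.example.com", echoTraceParent())
	pre := do(h, http.MethodOptions, "")
	fmt.Println(pre.Code, pre.Header().Get("Access-Control-Allow-Headers"))
	get := do(h, http.MethodGet, "00-a1b2c3d4e5f6789012345678abcdef90-1234567890abcdef-01")
	fmt.Println(get.Code, get.Body.String())
}
```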
Export Architecture
Browser → OTel Collector → Backend (Tempo/Jaeger)
        ↑
  OTLP/HTTP (not gRPC!)
Note: Browsers cannot speak native gRPC, so the exporter must use OTLP/HTTP (/v1/traces). The OTel Collector should expose an HTTP endpoint (default port 4318).