OpenTelemetry Architecture

🏗️ Main OTel Components

  1. SDK & API – libraries in applications
  2. Instrumentation Libraries – ready-made integrations
  3. OpenTelemetry Collector – central data processing point

🗺️ Observability Architecture in Our Solution

flowchart TB
    subgraph sources ["Data Sources"]
        app["Applications\n(instrumented with OTel SDK)"]

        subgraph k8s ["Kubernetes"]
            ksm["kube-state-metrics"]
            node["node-exporter"]
            kubelet["kubelet"]
        end

        subgraph exporters ["Exporters"]
            pg["PostgreSQL Exporter\n:9187"]
            redis["Redis Exporter\n:9121"]
        end
    end

    subgraph alloy_box ["Grafana Alloy — Unified Collection Layer"]
        direction LR
        otel_recv["OTLP Receiver\n:4317 gRPC / :4318 HTTP"]
        scraper["Prometheus Scraper\nServiceMonitors\nPod Annotations"]
        log_scrape["Log Scraper\nPod stdout/stderr"]
    end

    %% Applications → Alloy (OTLP)
    app -- "OTLP\n(traces, metrics, logs)" --> otel_recv

    %% Kubernetes metrics → Alloy (scrape)
    k8s -- "scrape metrics" --> scraper
    exporters -- "scrape metrics" --> scraper

    %% Alloy scrapes logs from pods
    app -. "stdout/stderr\n(pod logs)" .-> log_scrape

    prometheus["Prometheus:9090"]
    loki["Loki"]
    tempo["Tempo:4317"]
    pyroscope["Pyroscope:4040"]

    %% Alloy → backends
    otel_recv -- "metrics" --> prometheus
    scraper -- "metrics\n(remote write)" --> prometheus
    otel_recv -- "logs (OTLP)" --> loki
    log_scrape -- "logs" --> loki
    otel_recv -- "traces (OTLP)" --> tempo
    otel_recv -- "profiles" --> pyroscope

    %% Metrics from Traces
    tempo -- "Span metrics generator\n(RED metrics: rate, errors, duration)" --> prometheus

    mimir["Mimir\n(long-term metrics)"]

    prometheus -- "remote write" --> mimir

    mimir --> grafana
    grafana["Grafana :3000"]

    prometheus --> grafana
    loki --> grafana
    tempo --> grafana
    pyroscope --> grafana

    %% Styling
    style alloy_box fill:#f59e0b,stroke:#d97706,color:#000

    style sources fill:#e5e7eb,stroke:#9ca3af,color:#000

    style grafana fill:#10b981,stroke:#059669,color:#fff
    style prometheus fill:#3b82f6,stroke:#2563eb,color:#fff
    style loki fill:#3b82f6,stroke:#2563eb,color:#fff
    style tempo fill:#3b82f6,stroke:#2563eb,color:#fff
    style pyroscope fill:#3b82f6,stroke:#2563eb,color:#fff
    style mimir fill:#3b82f6,stroke:#2563eb,color:#fff

Key Flows

1. Metrics — Alloy scrapes metrics from both applications (ServiceMonitors, pod annotations) and Kubernetes components (kube-state-metrics, node-exporter, kubelet, PostgreSQL/Redis exporters) and sends them to Prometheus via remote write.

2. Logs — Alloy collects logs through two channels: it receives application logs via OTLP (structured, with trace context) and scrapes pod stdout/stderr (classic container logs). Both streams go to Loki.

3. Metrics from Traces — Tempo automatically generates RED metrics (Rate, Errors, Duration) from spans using the built-in span metrics generator and sends them back to Prometheus. This allows creating alerts and dashboards based on traces without manual metric instrumentation.
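As a rough sketch, enabling the span-metrics generator in Tempo looks like the fragment below. Field names follow recent Tempo releases and may differ in older versions; the Prometheus remote-write URL is an assumption matching the diagram:

```yaml
# tempo.yaml (fragment) - enable the metrics generator
metrics_generator:
  processor:
    span_metrics: {}          # emits rate/error/duration series per service and span
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write   # assumed Prometheus endpoint

# the generator must also be switched on per tenant
overrides:
  defaults:
    metrics_generator:
      processors: [span-metrics]
```

With this in place, queries like `rate(traces_spanmetrics_calls_total[5m])` become available in Prometheus without any application-side metric code.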

🧱 OpenTelemetry Collector

The Collector is a service that sits between applications and observability backends.

Application → OTel SDK → OTel Collector → Grafana / Tempo / Prometheus / Loki

⚙️ Collector Architecture


The Collector consists of three key parts:

| Component  | Role                                             |
| ---------- | ------------------------------------------------ |
| Receivers  | receive data (OTLP, Jaeger, Prometheus, Zipkin)  |
| Processors | process data (batch, sampling, transformations)  |
| Exporters  | send data to backends                            |

🧭 Collector Deployment Modes

| Mode         | Description                                            | Example Use Case                   |
| ------------ | ------------------------------------------------------ | ---------------------------------- |
| Agent mode   | collector running locally on a node                    | collecting data from a single host |
| Gateway mode | central collector gathering data from multiple sources | scaled environments                |
| Hybrid mode  | combination of both approaches                         | large distributed systems          |
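In hybrid mode, per-node agents typically forward everything to the central gateway over OTLP. A minimal sketch of the agent-side pipeline (the gateway service address is an assumption):

```yaml
# agent collector (e.g. DaemonSet) - forward all traces to the central gateway
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"

processors:
  batch: {}

exporters:
  otlp:
    endpoint: "otel-gateway.observability.svc.cluster.local:4317"  # assumed gateway address

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

The gateway then applies heavier processing (tail sampling, transformations) once, centrally, before exporting to the backends.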

🔄 Grafana Alloy vs OpenTelemetry Collector

Grafana Alloy (formerly Grafana Agent) and OpenTelemetry Collector serve a similar role — they collect, process, and forward telemetry data. However, they differ in philosophy and ecosystem.

| Feature                   | OpenTelemetry Collector                               | Grafana Alloy                                          |
| ------------------------- | ----------------------------------------------------- | ------------------------------------------------------ |
| Project                   | CNCF (vendor-neutral)                                 | Grafana Labs (open source)                             |
| Configuration             | YAML (pipelines: receivers → processors → exporters)  | River (HCL-like, declarative, with typing)             |
| Signals                   | Traces, Metrics, Logs                                 | Traces, Metrics, Logs, Profiles (Pyroscope)            |
| Prometheus Scraping       | Yes (prometheus receiver)                             | Native — full compatibility with prometheus.scrape     |
| Grafana Stack Integration | Requires exporter configuration                       | Out-of-the-box (Loki, Tempo, Mimir, Pyroscope)         |
| Debug UI                  | None (CLI/logs)                                       | Built-in UI with component graph (localhost:12345)     |
| Clustering                | None (requires external load balancer)                | Built-in — automatic target sharding between instances |
| Distributions             | otelcol-core, otelcol-contrib, custom builder (ocb)   | Single binary — all components in one                  |
| Config conversion         | —                                                     | alloy convert — automatic migration from OTel Collector, Prometheus, Promtail |
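The alloy convert migration takes an existing OTel Collector (or Prometheus/Promtail) configuration and emits the River equivalent; roughly (file names are illustrative):

```shell
# Convert an OTel Collector YAML config to River (Alloy syntax)
alloy convert --source-format=otelcol --output=config.alloy otel-config.yaml
```

This is a good starting point for migration, though converted configs are worth reviewing by hand before production use.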

When to choose OTel Collector?

  • Multi-vendor environment — data goes to different backends (Datadog, Jaeger, Elastic, Grafana)
  • You want to stay with the CNCF standard without vendor lock-in
  • You need custom builds with selected components (ocb)

When to choose Grafana Alloy?

  • Stack based on Grafana (Loki, Tempo, Mimir, Pyroscope)
  • You need profiling (native Pyroscope integration)
  • You want clustering and automatic target sharding without external tools
  • You prefer declarative configuration (River) over YAML pipelines

Example — the same pipeline in both tools

OpenTelemetry Collector (YAML):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"

processors:
  batch:
    timeout: 5s

exporters:
  otlp:
    endpoint: "tempo:4317"
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]

Grafana Alloy (River):

otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }
  output {
    traces = [otelcol.processor.batch.default.input]
  }
}

otelcol.processor.batch "default" {
  timeout = "5s"
  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}

otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls {
      insecure = true
    }
  }
}
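Alloy's native Prometheus and Loki support (the "unified collection layer" in the diagram) follows the same component-wiring pattern. A sketch, assuming in-cluster Prometheus and Loki endpoints:

```river
// Discover pods via the Kubernetes API
discovery.kubernetes "pods" {
  role = "pod"
}

// Scrape discovered pods and remote-write to Prometheus (assumed URL)
prometheus.scrape "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus:9090/api/v1/write"
  }
}

// Tail pod stdout/stderr and push to Loki (assumed URL)
loki.source.kubernetes "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
```

Note how one discovery component feeds both the metrics and the logs pipeline; this reuse is a key difference from running Prometheus and Promtail separately.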

4️⃣ Instrumentation Methods

🎯 Instrumentation Goal

Collect telemetry data from applications, automatically or manually, in order to get a complete picture of system behavior.

🧰 1. Application Instrumentation (Manual)

  • You add code to the application (@WithSpan, Tracer.startSpan(), etc.)
  • Most of the heavy lifting is done by libraries.
  • ✅ Advantages:
    • full control
    • precise data
  • ❌ Disadvantages:
    • time-consuming
    • requires code maintenance

Example for .NET:

# Install NuGet packages
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Instrumentation.AspNetCore
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol

// Program.cs
using OpenTelemetry.Exporter;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddOtlpExporter(options =>
        {
            // 4318 is the OTLP/HTTP port; the exporter defaults to gRPC (4317),
            // so the protocol must be set explicitly when targeting 4318
            options.Endpoint = new Uri("http://otel-collector:4318");
            options.Protocol = OtlpExportProtocol.HttpProtobuf;
        }));

var app = builder.Build();

⚙️ 2. Auto-Instrumentation

  • Language agent that automatically tracks calls (e.g., HTTP, DB, Kafka)
  • ✅ Advantages:
    • quick start
    • no code changes
  • ❌ Disadvantages:
    • limited flexibility
    • framework-dependent

Example for Java:

# 1. Download OpenTelemetry Java Agent
wget -O opentelemetry-javaagent.jar \
  https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar

# 2. Run application with agent
java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.service.name=my-java-app \
  -Dotel.exporter.otlp.endpoint=http://otel-collector:4318 \
  -Dotel.exporter.otlp.protocol=http/protobuf \
  -jar my-application.jar

Java auto-instrumentation covers:

  • HTTP clients/servers (OkHttp, Apache HttpClient, Spring WebMVC)
  • Database drivers (JDBC, MongoDB, Redis)
  • Messaging (Kafka, RabbitMQ, JMS)
  • Frameworks (Spring Boot, Quarkus, Micronaut)
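All of these integrations propagate context between services via the W3C traceparent header (version-traceid-spanid-flags). A plain-JDK sketch of generating and parsing one, with no OTel dependency, just to show the wire format (class and method names are illustrative):

```java
import java.util.concurrent.ThreadLocalRandom;

public class TraceParentDemo {
    // Build a W3C traceparent header: 00-<32 hex trace id>-<16 hex span id>-<flags>
    static String newTraceParent() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        String traceId = String.format("%016x%016x", rnd.nextLong(), rnd.nextLong());
        String spanId  = String.format("%016x", rnd.nextLong());
        return "00-" + traceId + "-" + spanId + "-01"; // 01 = sampled
    }

    // Extract the trace id (field 2 of 4)
    static String traceId(String traceparent) {
        return traceparent.split("-")[1];
    }

    public static void main(String[] args) {
        String header = newTraceParent();
        System.out.println("traceparent: " + header);
        System.out.println("trace id:    " + traceId(header));
    }
}
```

The agent injects exactly this header into outgoing HTTP requests and reads it on incoming ones, which is what stitches spans from different services into one trace.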

Example for .NET:

# 1. Install OpenTelemetry .NET Automatic Instrumentation
# Download and install from GitHub releases
wget -O otel-dotnet-auto-install.sh \
  https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/releases/latest/download/otel-dotnet-auto-install.sh
chmod +x otel-dotnet-auto-install.sh
./otel-dotnet-auto-install.sh

# 2. Set environment variables
export CORECLR_ENABLE_PROFILING=1
export CORECLR_PROFILER={918728DD-259F-4A6A-AC2B-B85E1B658318}
export CORECLR_PROFILER_PATH=/opt/opentelemetry/OpenTelemetry.AutoInstrumentation.Native.so
export DOTNET_ADDITIONAL_DEPS=/opt/opentelemetry/AdditionalDeps
export DOTNET_SHARED_STORE=/opt/opentelemetry/store
export DOTNET_STARTUP_HOOKS=/opt/opentelemetry/net/OpenTelemetry.AutoInstrumentation.StartupHook.dll
export OTEL_DOTNET_AUTO_HOME=/opt/opentelemetry

# 3. Configure OpenTelemetry
export OTEL_SERVICE_NAME=my-dotnet-app
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

# 4. Run the application
dotnet MyApplication.dll

.NET auto-instrumentation covers:

  • HTTP clients/servers (HttpClient, ASP.NET Core)
  • Database providers (Entity Framework, SqlClient, MongoDB)
  • Messaging (Azure Service Bus, RabbitMQ, Kafka)
  • gRPC clients/servers

🧱 3. OBI – OpenTelemetry eBPF Instrumentation

  • Ready-made binary or sidecar that intercepts telemetry data at the kernel level (eBPF), with no code changes
  • ✅ Advantages:
    • ideal for legacy systems
    • quick deployment
  • ❌ Disadvantages:
    • less control
    • harder debugging

🧩 Integration with Grafana Stack

  • Prometheus / Mimir → metrics
  • Loki → logs
  • Tempo → traces
  • Grafana → visualization and data correlation

📦 Collector in Kubernetes

  • Deployed as:
    • DaemonSet – one agent per node
    • Sidecar – alongside the application
    • Deployment – in gateway mode
  • Configuration in YAML (receivers, processors, exporters)
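With the OpenTelemetry Operator installed (an assumption), the deployment mode is a single field on the OpenTelemetryCollector resource. A minimal agent-mode sketch:

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-agent
spec:
  mode: daemonset          # or: sidecar / deployment (gateway)
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: "0.0.0.0:4317"
    processors:
      batch: {}
    exporters:
      debug: {}            # stands in for a real backend exporter
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
```

Switching between the three deployment patterns from the list above is then just a change to spec.mode.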

📚 Additional Resources
