Why OpenTelemetry Matters
In a microservices architecture, a single user request might touch 10-20 services. When that request fails or is slow, identifying which service is responsible — and exactly why — is the central challenge of operations. Logs tell you what happened in one service. Distributed tracing tells you the full story across all services, with timing.
OpenTelemetry (OTel) is the CNCF standard for collecting observability data: traces, metrics, and logs. It replaces a fragmented ecosystem of vendor-specific agents (Jaeger client, StatsD, Fluentd) with a single, vendor-neutral SDK and wire protocol (OTLP). You instrument once, send anywhere.
OpenTelemetry Architecture
Your Services (Python/Node/Go)
  |-- OTel SDK (auto-instrumentation)
  |     |-- Traces (spans)
  |     |-- Metrics (counters, gauges, histograms)
  |     |-- Logs (structured, with trace_id correlation)
  |
  +--> OTel Collector (sidecar or daemonset)
        |-- Receivers: OTLP gRPC/HTTP, Jaeger, Zipkin, Prometheus
        |-- Processors: batch, sampling, attribute transformation
        |-- Exporters:
              |-- Traces  --> Jaeger / Tempo / Honeycomb
              |-- Metrics --> Prometheus / Datadog
              |-- Logs    --> Loki / Elasticsearch
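What stitches the per-service spans into one trace is context propagation: every outbound request carries a W3C `traceparent` header, and the next service continues the same trace from it. The header format is standardized; as a minimal stdlib sketch (the helper names here are illustrative, not part of any OTel API):

```python
import re
import secrets

def make_traceparent(trace_id=None, span_id=None):
    """Build a W3C traceparent header: version-traceid-spanid-flags."""
    trace_id = trace_id or secrets.token_hex(16)  # 32 hex chars
    span_id = span_id or secrets.token_hex(8)     # 16 hex chars
    return f"00-{trace_id}-{span_id}-01"          # flags 01 = sampled

TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def parse_traceparent(header):
    """Extract trace context from an incoming traceparent header."""
    m = TRACEPARENT_RE.match(header)
    if not m:
        return None
    return {"trace_id": m.group(1), "span_id": m.group(2), "sampled": m.group(3) == "01"}
```

A downstream service keeps the `trace_id`, records the incoming `span_id` as its parent, and generates a fresh `span_id` for its own work; the OTel SDKs do all of this automatically.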
Auto-Instrumentation for Python (FastAPI)
# Install OTel Python packages
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install # Auto-installs framework instrumentation
# This installs:
# opentelemetry-instrumentation-fastapi
# opentelemetry-instrumentation-sqlalchemy
# opentelemetry-instrumentation-redis
# opentelemetry-instrumentation-httpx
# opentelemetry-instrumentation-celery
# ... (all detected frameworks)
# app/telemetry.py
from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes
def setup_telemetry(service_name: str, service_version: str = "1.0.0"):
    resource = Resource.create({
        ResourceAttributes.SERVICE_NAME: service_name,
        ResourceAttributes.SERVICE_VERSION: service_version,
        ResourceAttributes.DEPLOYMENT_ENVIRONMENT: "production",
    })

    # Traces
    otlp_trace_exporter = OTLPSpanExporter(
        endpoint="http://otel-collector:4317",  # gRPC
        insecure=True,
    )
    tracer_provider = TracerProvider(resource=resource)
    tracer_provider.add_span_processor(
        BatchSpanProcessor(
            otlp_trace_exporter,
            max_queue_size=2048,
            max_export_batch_size=512,
            export_timeout_millis=30000,
        )
    )
    trace.set_tracer_provider(tracer_provider)

    # Metrics
    otlp_metric_exporter = OTLPMetricExporter(
        endpoint="http://otel-collector:4317",
        insecure=True,
    )
    metric_reader = PeriodicExportingMetricReader(
        otlp_metric_exporter,
        export_interval_millis=10000,  # 10-second intervals
    )
    meter_provider = MeterProvider(resource=resource, metric_readers=[metric_reader])
    metrics.set_meter_provider(meter_provider)
# main.py
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
from opentelemetry.instrumentation.redis import RedisInstrumentor
from app.telemetry import setup_telemetry
setup_telemetry(service_name="order-service")
app = FastAPI()
# Auto-instrument FastAPI (captures all routes, request/response, status codes)
FastAPIInstrumentor.instrument_app(app)
# Auto-instrument SQLAlchemy (captures DB statements; enable_commenter injects
# trace context into SQL comments so queries can be correlated with traces)
SQLAlchemyInstrumentor().instrument(enable_commenter=True)
# Auto-instrument Redis
RedisInstrumentor().instrument()
# Custom spans for business logic
from opentelemetry import trace
from opentelemetry import metrics
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)
# Custom metrics
order_counter = meter.create_counter(
    "orders.created",
    description="Total orders created",
    unit="1",
)
order_value = meter.create_histogram(
    "orders.value",
    description="Order value in USD",
    unit="USD",
)
payment_duration = meter.create_histogram(
    "payment.processing.duration",
    description="Payment gateway latency",
    unit="ms",
)
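To build intuition for what the SDK does with each `create_histogram` reading: the default aggregation is an explicit-bucket histogram, where every recorded value increments the first bucket whose upper bound is at or above it, alongside a running sum and count. A toy model (not the OTel SDK's actual implementation):

```python
import bisect

class ToyHistogram:
    """Toy explicit-bucket histogram: per-bucket counts plus sum and count."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)           # bucket upper bounds
        self.counts = [0] * (len(bounds) + 1)  # +1 for the overflow bucket
        self.total = 0.0
        self.count = 0

    def record(self, value):
        # bisect_left finds the first bound >= value, i.e. the bucket it falls in
        idx = bisect.bisect_left(self.bounds, value)
        self.counts[idx] += 1
        self.total += value
        self.count += 1

h = ToyHistogram(bounds=[50, 100, 250, 500, 1000])  # latency buckets in ms
for latency in [12, 87, 87, 430, 2100]:
    h.record(latency)
```

Only the bucket counts, sum, and count are exported, which is why histogram metrics stay cheap regardless of traffic volume: the wire cost is fixed by the number of buckets, not the number of requests.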
import time

async def process_order(order_data: dict):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_data["id"])
        span.set_attribute("order.total", order_data["total"])
        span.set_attribute("customer.id", order_data["customer_id"])

        # Nested span for payment
        with tracer.start_as_current_span("payment.charge") as payment_span:
            payment_span.set_attribute("payment.method", order_data["payment_method"])
            payment_span.set_attribute("payment.amount", order_data["total"])

            start = time.monotonic()  # monotonic clock for durations
            result = await charge_payment(order_data)
            duration_ms = (time.monotonic() - start) * 1000

            payment_span.set_attribute("payment.status", result["status"])
            payment_duration.record(duration_ms, {"method": order_data["payment_method"]})

        # Record business metrics
        order_counter.add(1, {
            "payment_method": order_data["payment_method"],
            "region": order_data["region"],
        })
        order_value.record(order_data["total"], {
            "currency": order_data["currency"],
        })
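The `start_as_current_span` nesting above works through context-local state: a new span records whatever span is current as its parent, becomes current for the duration of the `with` block, then restores its parent on exit. A toy model of that mechanism using `contextvars` (illustrative names; this is not the SDK, which additionally assigns ids and timestamps):

```python
import contextvars
from contextlib import contextmanager

_current_span = contextvars.ContextVar("current_span", default=None)
finished = []  # collected spans, standing in for an in-memory exporter

@contextmanager
def start_span(name):
    parent = _current_span.get()
    span = {"name": name, "parent": parent["name"] if parent else None}
    token = _current_span.set(span)
    try:
        yield span
    finally:
        _current_span.reset(token)  # restore the parent as "current"
        finished.append(span)       # inner spans finish (and export) first

with start_span("process_order"):
    with start_span("payment.charge"):
        pass
```

Because `contextvars` is async-aware, this parent/child bookkeeping survives `await` boundaries, which is how the real SDK keeps concurrent requests from polluting each other's traces.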
Auto-Instrumentation for Node.js
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-grpc @opentelemetry/exporter-metrics-otlp-grpc
// instrumentation.js — load BEFORE your app
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-grpc');
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');
const { Resource } = require('@opentelemetry/resources');
const { SEMRESATTRS_SERVICE_NAME } = require('@opentelemetry/semantic-conventions');

const sdk = new NodeSDK({
  resource: new Resource({
    [SEMRESATTRS_SERVICE_NAME]: 'api-gateway',
    'service.version': process.env.SERVICE_VERSION || '1.0.0',
    'deployment.environment': process.env.NODE_ENV || 'production',
  }),
  traceExporter: new OTLPTraceExporter({
    url: 'grpc://otel-collector:4317',
  }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: 'grpc://otel-collector:4317',
    }),
    exportIntervalMillis: 10000,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-http': {
        // Filter out health-check noise
        ignoreIncomingRequestHook: (req) => req.url === '/health',
      },
      '@opentelemetry/instrumentation-fs': {
        enabled: false, // disable noisy fs instrumentation
      },
    }),
  ],
});

sdk.start();
process.on('SIGTERM', () => sdk.shutdown());

// package.json
// "scripts": {
//   "start": "node --require ./instrumentation.js server.js"
// }
OTel Collector Configuration
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # Scrape Prometheus metrics from services that don't use OTLP
  prometheus:
    config:
      scrape_configs:
        - job_name: 'legacy-service'
          static_configs:
            - targets: ['legacy-service:9090']

processors:
  # Batch for efficiency
  batch:
    send_batch_size: 1000
    timeout: 10s
    send_batch_max_size: 2000

  # Memory limit to prevent OOM
  memory_limiter:
    check_interval: 1s
    limit_percentage: 75
    spike_limit_percentage: 30

  # Tail-based sampling: only keep interesting traces
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    policies:
      - name: errors-policy
        type: status_code
        status_code:
          status_codes: [ERROR]   # always keep error traces
      - name: slow-traces
        type: latency
        latency:
          threshold_ms: 1000      # keep traces > 1 second
      - name: probabilistic-sample
        type: probabilistic
        probabilistic:
          sampling_percentage: 5  # 5% of healthy, fast traces
      # Always sample the payment service (critical path)
      - name: always-sample-payment
        type: string_attribute
        string_attribute:
          key: "service.name"
          values: ["payment-service"]

  # Enrich spans with Kubernetes metadata
  k8sattributes:
    auth_type: serviceAccount
    passthrough: false
    filter:
      node_from_env_var: KUBE_NODE_NAME
    extract:
      metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.namespace.name
        - k8s.node.name

  # Transform attributes
  transform:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Redact PII from HTTP URLs
          - replace_pattern(attributes["http.url"], "token=[^&]+", "token=REDACTED")
          - replace_pattern(attributes["http.url"], "password=[^&]+", "password=REDACTED")

exporters:
  # Traces to Jaeger
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

  # Metrics to Prometheus (pull-based)
  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: otel

  # Logs to Loki
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
    labels:
      attributes:
        service.name: service_name
        severity: severity

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, tail_sampling, batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loki]
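The tail-sampling policies above are evaluated once a trace is complete (after `decision_wait`), and a trace is kept if any policy votes to sample it. A pure-Python sketch of that OR-of-policies decision, simplified from the collector's behavior (the real processor supports many more policy types):

```python
import random

def keep_trace(trace, sample_pct=5.0):
    """Mimic the collector's tail-sampling decision for one finished trace."""
    spans = trace["spans"]
    # errors-policy: any span ended with ERROR status
    if any(s.get("status") == "ERROR" for s in spans):
        return True
    # slow-traces: total duration over the latency threshold
    if trace["duration_ms"] > 1000:
        return True
    # always-sample-payment: critical-path service
    if any(s.get("service.name") == "payment-service" for s in spans):
        return True
    # probabilistic-sample: keep a small fraction of healthy, fast traces
    return random.random() * 100 < sample_pct

# An error trace is always kept; a healthy fast trace only probabilistically
error_trace = {"duration_ms": 40, "spans": [{"status": "ERROR"}]}
fast_trace = {"duration_ms": 40, "spans": [{"status": "OK"}]}
```

The key property: the expensive traces (errors, slow paths, critical services) are kept deterministically, so the probabilistic knob only trades off how much "everything is fine" evidence you retain.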
Kubernetes Deployment
# otel-collector-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      serviceAccountName: otel-collector
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:0.107.0
          args:
            - "--config=/conf/otel-collector-config.yaml"
          ports:
            - containerPort: 4317  # OTLP gRPC
            - containerPort: 4318  # OTLP HTTP
            - containerPort: 8889  # Prometheus metrics
          env:
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          resources:
            requests:
              cpu: 200m
              memory: 400Mi
            limits:
              cpu: 1000m
              memory: 1Gi
          volumeMounts:
            - name: config
              mountPath: /conf
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "namespaces"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch"]
# Note: the k8sattributes processor also needs a ClusterRoleBinding that binds
# this ClusterRole to the otel-collector ServiceAccount.
Correlating Traces with Logs
# Python: inject trace context into log records
import logging

from opentelemetry import trace

class TraceContextFilter(logging.Filter):
    """Injects trace_id and span_id into every log record."""

    def filter(self, record):
        span = trace.get_current_span()
        if span.is_recording():
            ctx = span.get_span_context()
            record.trace_id = format(ctx.trace_id, "032x")
            record.span_id = format(ctx.span_id, "016x")
        else:
            record.trace_id = "0" * 32  # W3C trace ids are 32 hex chars
            record.span_id = "0" * 16
        return True

# Attach the filter so every stdlib log record carries trace context
logging.getLogger().addFilter(TraceContextFilter())

# Configure structured logging with trace context
import structlog

structlog.configure(
    processors=[
        structlog.stdlib.add_log_level,
        structlog.stdlib.add_logger_name,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)
logger = structlog.get_logger()
# When you log, trace_id is automatically included:
# {"timestamp": "2026-03-28T10:00:00Z", "level": "info",
# "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
# "span_id": "00f067aa0ba902b7",
# "event": "order created", "order_id": "ORD-123"}
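The injection mechanism itself is plain stdlib `logging`; here is a self-contained demo (with hard-coded ids standing in for the real span context, so it runs without the OTel SDK) showing how a `logging.Filter` gets fields into the formatted output:

```python
import io
import json
import logging

class DemoTraceFilter(logging.Filter):
    """Stands in for TraceContextFilter: injects fixed ids for the demo."""
    def filter(self, record):
        record.trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
        record.span_id = "00f067aa0ba902b7"
        return True

buf = io.StringIO()
handler = logging.StreamHandler(buf)
# Fields added by the filter are available to the formatter like any other
handler.setFormatter(logging.Formatter(
    '{"level": "%(levelname)s", "trace_id": "%(trace_id)s", '
    '"span_id": "%(span_id)s", "event": "%(message)s"}'
))
logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.addFilter(DemoTraceFilter())
logger.setLevel(logging.INFO)

logger.info("order created")
line = json.loads(buf.getvalue())
```

Every log line now carries the ids a log backend can index, which is exactly what the Grafana derived-field configuration below keys on.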
# Grafana: Link traces to logs using trace_id
# In Grafana datasource configuration for Loki:
# grafana/provisioning/datasources/loki.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    url: http://loki:3100
    jsonData:
      derivedFields:
        - datasourceUid: jaeger
          matcherRegex: '"trace_id":"(\w+)"'
          name: TraceID
          url: '${__value.raw}'
# Click trace_id in logs to jump to Jaeger trace!
Grafana Dashboard for OTel Data
{
  "panels": [
    {
      "title": "Request Rate by Service",
      "type": "timeseries",
      "targets": [{
        "expr": "sum(rate(http_server_request_duration_seconds_count[5m])) by (service_name)",
        "legendFormat": "{{service_name}}"
      }]
    },
    {
      "title": "P99 Latency by Service",
      "type": "timeseries",
      "targets": [{
        "expr": "histogram_quantile(0.99, sum(rate(http_server_request_duration_seconds_bucket[5m])) by (service_name, le))",
        "legendFormat": "{{service_name}} p99"
      }]
    },
    {
      "title": "Error Rate by Service",
      "type": "stat",
      "targets": [{
        "expr": "sum(rate(http_server_request_duration_seconds_count{http_response_status_code=~'5..'}[5m])) by (service_name) / sum(rate(http_server_request_duration_seconds_count[5m])) by (service_name)",
        "legendFormat": "{{service_name}}"
      }]
    }
  ]
}
Production Sampling Strategy
Sampling Strategy Decision Tree:

Traffic: 10,000 req/s
- At 100% sampling: 864M traces/day (~8.6B spans at ~10 spans per trace) = expensive storage

Recommended approach:

1. Head-based sampling (at the SDK level):
   - Development: 100%
   - Staging: 50%
   - Production: 10% default
2. Tail-based sampling (at the Collector level):
   - Always keep: all errors (HTTP 5xx, exceptions)
   - Always keep: all slow traces (> 1 second)
   - Always keep: all payment/checkout traces
   - Keep 5% of normal, healthy, fast traces
   - Result: ~15-20% overall — captures all the interesting data
3. Storage estimate with tail sampling:
   - 10,000 req/s × 20% = 2,000 traces/s
   - Average 10 spans/trace = 20,000 spans/s
   - At 1KB per span: 20MB/s ≈ 1.7TB/day — still high
4. Add span cardinality control:
   - Limit custom attributes to bounded values
   - Don't trace health checks
   - Don't store raw request/response bodies
   - Result: ~200-500 bytes/span ≈ 350-860GB/day ✓
5. Retention policy:
   - Keep the last 7 days in Tempo/Jaeger
   - Archive to S3 for 30 days (cheap cold storage)
   - Delete after 30 days
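The arithmetic in the estimate above is easy to parameterize for your own traffic. A small helper (illustrative, with this section's assumptions as defaults):

```python
def daily_trace_storage_gb(
    req_per_s,
    keep_fraction=0.20,    # effective keep rate after tail sampling
    spans_per_trace=10,
    bytes_per_span=1024,
):
    """Estimated trace storage per day, in GB (1 GB = 1e9 bytes)."""
    spans_per_s = req_per_s * keep_fraction * spans_per_trace
    return spans_per_s * bytes_per_span * 86_400 / 1e9

# 10,000 req/s at 1KB/span: roughly 1.7TB/day
baseline = daily_trace_storage_gb(10_000)
# With cardinality control (~250 bytes/span): roughly 430GB/day
trimmed = daily_trace_storage_gb(10_000, bytes_per_span=250)
```

Running the numbers before rollout is worth the five minutes: storage cost, not instrumentation effort, is usually what forces sampling decisions later.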
Alerting on Trace Data
# Prometheus rules for OTel metrics
groups:
  - name: otel.rules
    rules:
      - alert: ServiceErrorRateHigh
        expr: |
          (
            sum(rate(http_server_request_duration_seconds_count{http_response_status_code=~"5.."}[5m])) by (service_name)
            /
            sum(rate(http_server_request_duration_seconds_count[5m])) by (service_name)
          ) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.service_name }} error rate > 5%"
          runbook: "https://wiki/runbooks/service-error-rate"
      - alert: ServiceLatencyHigh
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_server_request_duration_seconds_bucket[5m])) by (service_name, le)
          ) > 2.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.service_name }} P99 latency > 2s"
      - alert: OTelCollectorDropping
        expr: |
          rate(otelcol_processor_dropped_spans_total[5m]) > 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "OTel Collector dropping spans — check memory limits"
Conclusion
OpenTelemetry transforms microservices debugging from guesswork to forensics. With traces, you can reconstruct the exact path of any request across every service, database call, and cache hit. With correlated logs, you jump from a trace span directly to the relevant log lines. With metrics derived from trace data, you never need to maintain separate instrumentation for latency histograms and error rates.
The investment is the initial instrumentation work — typically 1-2 days per service for auto-instrumentation, plus custom spans for critical business logic. The return is permanent: every incident investigation that previously took hours of log searching becomes minutes of trace navigation. For a team running 10+ microservices, OpenTelemetry pays for itself on the first major production incident.
Daniel Park
AI/ML Engineer focused on practical applications of machine learning in DevOps and cloud operations.