Observability in 2026 has converged around OpenTelemetry (OTel). Every major observability vendor — Datadog, New Relic, Grafana, Honeycomb, Dynatrace, Splunk — now supports OpenTelemetry natively. The CNCF reports that OpenTelemetry is the second most active project in its portfolio, behind only Kubernetes. The old world of vendor-specific agents, proprietary SDKs, and lock-in is ending. The new world is a single standard for collecting telemetry data (traces, metrics, and logs) that works with any backend.
This guide covers practical OpenTelemetry implementation: auto-instrumentation for quick wins, manual instrumentation for custom spans, the Collector pipeline for data processing, and the patterns that separate a useful observability setup from a noisy one.
The Three Pillars: Traces, Metrics, and Logs
OpenTelemetry unifies the three pillars of observability under a single SDK and data model:
Traces show the journey of a request through your distributed system. A trace consists of spans — each span represents a unit of work (an HTTP request, a database query, a message queue operation). Spans have a parent-child relationship that shows how work propagates across services. Traces answer: "Why was this specific request slow?"
Metrics are numerical measurements aggregated over time: request count, latency percentiles, error rate, CPU utilization, queue depth. Metrics answer: "What's the overall health of this service right now?"
Logs are timestamped records of discrete events. In the OTel model, logs are correlated with traces — each log entry can include a trace ID and span ID, making it possible to see all logs associated with a specific request. Logs answer: "What happened during this specific operation?"
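The parent-child structure of a trace can be sketched with a toy data model. Nothing below uses the OTel SDK — the `Span` shape and `findSlowestLeaf` helper are illustrative names — but it shows how spans nest and why the slowest leaf span is usually the first suspect for a slow request:

```typescript
// Toy model of a trace: spans linked by parentSpanId (illustrative, not the OTel API).
interface Span {
  spanId: string;
  parentSpanId?: string; // undefined for the root span
  name: string;
  durationMs: number;
}

// Given all spans of one trace, find the slowest leaf span —
// the usual answer to "why was this specific request slow?"
function findSlowestLeaf(spans: Span[]): Span {
  const parents = new Set(spans.map((s) => s.parentSpanId).filter(Boolean));
  const leaves = spans.filter((s) => !parents.has(s.spanId));
  return leaves.reduce((a, b) => (b.durationMs > a.durationMs ? b : a));
}

const exampleTrace: Span[] = [
  { spanId: 'a1', name: 'GET /checkout', durationMs: 480 },
  { spanId: 'b2', parentSpanId: 'a1', name: 'SELECT orders', durationMs: 40 },
  { spanId: 'c3', parentSpanId: 'a1', name: 'POST /payments', durationMs: 410 },
];

console.log(findSlowestLeaf(exampleTrace).name); // → POST /payments
```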
Auto-Instrumentation: 5-Minute Setup
The fastest way to get value from OpenTelemetry is auto-instrumentation. OTel provides agents that automatically instrument popular frameworks, HTTP libraries, database drivers, and message queue clients — without any code changes.
```shell
# Node.js auto-instrumentation
npm install @opentelemetry/auto-instrumentations-node @opentelemetry/sdk-node
npm install @opentelemetry/exporter-trace-otlp-http @opentelemetry/exporter-metrics-otlp-http
```
```typescript
// tracing.ts — load this file before your application code
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';

const sdk = new NodeSDK({
  serviceName: 'backend-api',
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4318/v1/traces',
  }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: 'http://otel-collector:4318/v1/metrics',
    }),
    exportIntervalMillis: 30000,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      // Customize which instrumentations to enable
      '@opentelemetry/instrumentation-http': { enabled: true },
      '@opentelemetry/instrumentation-express': { enabled: true },
      '@opentelemetry/instrumentation-pg': { enabled: true },
      '@opentelemetry/instrumentation-redis': { enabled: true },
      '@opentelemetry/instrumentation-grpc': { enabled: true },
    }),
  ],
});

sdk.start();

// Start your application with the SDK preloaded
// (compile to JS first, or run through ts-node):
//   node --require ./tracing.js app.js
```
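One detail setup guides often skip: flushing buffered telemetry on exit. `NodeSDK.shutdown()` returns a promise that flushes pending exports, and you should bound it so a dead Collector can't hang process exit. The `Flushable` interface and `shutdownGracefully` helper below are our own names (not OTel APIs), written against the shape of `NodeSDK` so the timeout logic is testable in isolation:

```typescript
// Anything with an async shutdown() — NodeSDK from @opentelemetry/sdk-node fits this shape.
interface Flushable {
  shutdown(): Promise<void>;
}

// Flush telemetry on exit, but never block process shutdown longer than timeoutMs.
// Returns true if the flush completed, false if it timed out.
async function shutdownGracefully(sdk: Flushable, timeoutMs = 5000): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  const ok = await Promise.race([sdk.shutdown().then(() => true), timeout]);
  clearTimeout(timer!);
  return ok;
}

// Wire it to process signals in tracing.ts:
// process.on('SIGTERM', async () => {
//   await shutdownGracefully(sdk);
//   process.exit(0);
// });
```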
```shell
# Python auto-instrumentation (even easier)
pip install opentelemetry-distro opentelemetry-exporter-otlp

# Auto-detect and install instrumentations for installed packages
opentelemetry-bootstrap --action=install

# Run your application with auto-instrumentation
# (OTEL_EXPORTER_OTLP_PROTOCOL selects OTLP/HTTP to match port 4318;
#  the exporter defaults to gRPC on 4317)
OTEL_SERVICE_NAME=my-python-api \
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318 \
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf \
  opentelemetry-instrument python app.py

# That's it — HTTP requests, database queries, and framework operations
# are automatically traced without any code changes.
```
Manual Instrumentation: Adding Custom Context
Auto-instrumentation gives you infrastructure-level visibility (HTTP requests, database queries), but it doesn't know about your business logic. Manual instrumentation adds custom spans and attributes that make traces meaningful to your team:
```typescript
import { trace, SpanStatusCode, metrics } from '@opentelemetry/api';

const tracer = trace.getTracer('order-service');
const meter = metrics.getMeter('order-service');

// Custom metrics
const orderCounter = meter.createCounter('orders.created', {
  description: 'Number of orders created',
});

const orderValueHistogram = meter.createHistogram('orders.value', {
  description: 'Order value in USD',
  unit: 'USD',
});

async function processOrder(userId: string, items: OrderItem[]) {
  // Create a custom span for business logic
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      // Add business context as span attributes
      span.setAttribute('order.user_id', userId);
      span.setAttribute('order.item_count', items.length);
      span.setAttribute('order.total_value', calculateTotal(items));

      // Nested span for payment processing
      const paymentResult = await tracer.startActiveSpan(
        'processPayment',
        async (paymentSpan) => {
          try {
            paymentSpan.setAttribute('payment.method', 'stripe');
            paymentSpan.setAttribute('payment.amount', calculateTotal(items));

            const result = await stripe.charges.create({
              amount: calculateTotal(items) * 100,
              currency: 'usd',
              source: userId,
            });

            paymentSpan.setAttribute('payment.charge_id', result.id);
            paymentSpan.setStatus({ code: SpanStatusCode.OK });
            return result;
          } finally {
            // End the nested span even if the charge throws
            paymentSpan.end();
          }
        }
      );

      // Record metrics
      orderCounter.add(1, {
        'order.region': getUserRegion(userId),
        'order.payment_method': 'stripe',
      });
      orderValueHistogram.record(calculateTotal(items), {
        'order.region': getUserRegion(userId),
      });

      span.setStatus({ code: SpanStatusCode.OK });
      return { orderId: generateId(), payment: paymentResult };
    } catch (error) {
      span.recordException(error as Error);
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: (error as Error).message,
      });
      throw error;
    } finally {
      span.end();
    }
  });
}
```
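Spans only join into one trace across service boundaries because the trace context travels on the wire. OTel's HTTP instrumentation handles this automatically via the W3C `traceparent` header; the hand-rolled parser below (illustrative only — in production, use the SDK's built-in propagators) shows what that header actually carries:

```typescript
// W3C Trace Context: traceparent = version-traceId-parentSpanId-flags
// e.g. 00-4bf92f3577b34da6a238ee1e6d1f1bc9-00f067aa0ba902b7-01
interface TraceContext {
  traceId: string;  // 32 lowercase hex chars
  spanId: string;   // 16 lowercase hex chars
  sampled: boolean; // lowest bit of the flags byte
}

function parseTraceparent(header: string): TraceContext | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  return { traceId: m[2], spanId: m[3], sampled: (parseInt(m[4], 16) & 1) === 1 };
}

function buildTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? '01' : '00'}`;
}

const ctx = parseTraceparent('00-4bf92f3577b34da6a238ee1e6d1f1bc9-00f067aa0ba902b7-01');
console.log(ctx?.sampled); // → true
```

Because the downstream service extracts this header and uses its `spanId` as the parent of its own root span, the two services' spans stitch together into a single trace.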
The OpenTelemetry Collector: Your Data Pipeline
The OTel Collector is a vendor-agnostic proxy that receives, processes, and exports telemetry data. Instead of sending data directly from your applications to a backend, you send to the Collector, which handles batching, filtering, sampling, and routing to one or more backends.
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  # Batch spans for efficient export
  batch:
    timeout: 5s
    send_batch_size: 512

  # Add resource attributes to all telemetry
  resource:
    attributes:
      - key: environment
        value: production
        action: insert
      - key: team
        value: backend
        action: insert

  # Tail-based sampling: keep all error and slow traces,
  # 10% of everything else
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors-always
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-traces
        type: latency
        latency: { threshold_ms: 1000 }
      - name: probabilistic-sample
        type: probabilistic
        probabilistic: { sampling_percentage: 10 }

  # Filter out noisy health check spans
  filter:
    spans:
      exclude:
        match_type: strict
        attributes:
          - key: http.target
            value: /health

exporters:
  # Export to Grafana Tempo (traces)
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true

  # Export to Prometheus (metrics)
  prometheus:
    endpoint: 0.0.0.0:8889

  # Export to Loki (logs)
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [filter, tail_sampling, resource, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [loki]
```
Best Practices for Production OpenTelemetry
Start with auto-instrumentation, then add manual spans. Get basic visibility in 30 minutes with auto-instrumentation, then add custom spans for the business logic that matters most. You'll know what to instrument based on the questions you can't answer with auto-instrumentation alone.
Use semantic conventions. OTel defines standard attribute names for common concepts: http.method, db.system, messaging.system, etc. Using these conventions makes your data consistent across services and enables built-in dashboards and alerts in observability backends.
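As a quick contrast, here is what conventional versus ad-hoc naming looks like on a span. The `@opentelemetry/semantic-conventions` package exports these keys as constants; literal strings are used below (with a hypothetical `conventionalHttpAttributes` helper) to keep the sketch dependency-free:

```typescript
// Build span attributes using OTel semantic-convention names.
// Backends and prebuilt dashboards recognize these exact keys.
function conventionalHttpAttributes(method: string, statusCode: number, dbSystem: string) {
  return {
    'http.method': method,           // not "httpVerb" or "verb"
    'http.status_code': statusCode,  // not "responseCode"
    'db.system': dbSystem,           // not "database" or "dbType"
  };
}

// Usage on a span:
// span.setAttributes(conventionalHttpAttributes('GET', 200, 'postgresql'));
```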
Sample intelligently. At scale, 100% trace collection is expensive. Use tail-based sampling in the Collector: keep 100% of error traces and slow traces, sample 5-10% of everything else. This ensures you never miss an interesting trace while controlling costs.
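Tail-based sampling runs in the Collector, but its head-based cousin is worth a sketch: because the decision derives from the trace ID alone, every service reaches the same verdict for the same trace without coordination. This is a simplified version of the ratio-sampler idea, not the SDK's exact algorithm:

```typescript
// Head-based ratio sampling: derive a deterministic value in [0, 1)
// from the trace ID, so every service samples the same traces.
function shouldSample(traceId: string, ratio: number): boolean {
  // Interpret the upper 8 hex chars (32 bits) of the trace ID as the random value.
  const value = parseInt(traceId.slice(0, 8), 16) / 0x100000000;
  return value < ratio;
}

console.log(shouldSample('00000000aaaaaaaaaaaaaaaaaaaaaaaa', 0.1)); // → true
console.log(shouldSample('ffffffffaaaaaaaaaaaaaaaaaaaaaaaa', 0.1)); // → false
```

In the OTel JS SDK the equivalent is `ParentBasedSampler` wrapping `TraceIdRatioBasedSampler`; tail sampling in the Collector then refines what head sampling let through.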
Correlate traces with logs. Configure your logging library to include trace ID and span ID in every log entry. This lets you jump from a slow trace to the exact logs that explain what happened.
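Concretely, correlation just means every log line carries the active trace context. With a real logger you would do this in a pino `mixin` or a winston format that reads `trace.getActiveSpan()?.spanContext()`; the dependency-free sketch below (our own `logWithTrace` helper, not a library API) shows the shape of the output:

```typescript
interface ActiveContext {
  traceId: string;
  spanId: string;
}

// Emit a structured log line enriched with the current trace context —
// the same fields a pino mixin or winston format would inject.
function logWithTrace(ctx: ActiveContext | undefined, message: string): string {
  return JSON.stringify({
    msg: message,
    trace_id: ctx?.traceId,
    span_id: ctx?.spanId,
  });
}

const line = logWithTrace(
  { traceId: '4bf92f3577b34da6a238ee1e6d1f1bc9', spanId: '00f067aa0ba902b7' },
  'payment charged',
);
// line now carries trace_id and span_id alongside the message,
// so a backend can pivot from a slow span straight to its logs.
```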
ZeonEdge implements OpenTelemetry-based observability stacks using the Grafana ecosystem (Tempo, Prometheus, Loki, Grafana). Learn about our observability services.
Alex Thompson
CEO & Cloud Architecture Expert at ZeonEdge with 15+ years building enterprise infrastructure.