Observability in 2026 has converged around OpenTelemetry (OTel). Every major observability vendor — Datadog, New Relic, Grafana, Honeycomb, Dynatrace, Splunk — now supports OpenTelemetry natively. The CNCF reports that OpenTelemetry is the second most active project in its portfolio, behind only Kubernetes. The old world of vendor-specific agents, proprietary SDKs, and lock-in is ending. The new world is a single standard for collecting telemetry data (traces, metrics, and logs) that works with any backend.
This guide covers practical OpenTelemetry implementation: auto-instrumentation for quick wins, manual instrumentation for custom spans, the Collector pipeline for data processing, and the patterns that separate a useful observability setup from a noisy one.
The Three Pillars: Traces, Metrics, and Logs
OpenTelemetry unifies the three pillars of observability under a single SDK and data model:
Traces show the journey of a request through your distributed system. A trace consists of spans — each span represents a unit of work (an HTTP request, a database query, a message queue operation). Spans have a parent-child relationship that shows how work propagates across services. Traces answer: "Why was this specific request slow?"
Metrics are numerical measurements aggregated over time: request count, latency percentiles, error rate, CPU utilization, queue depth. Metrics answer: "What's the overall health of this service right now?"
Logs are timestamped records of discrete events. In the OTel model, logs are correlated with traces — each log entry can include a trace ID and span ID, making it possible to see all logs associated with a specific request. Logs answer: "What happened during this specific operation?"
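The parent-child structure of a trace can be sketched with a toy data model. Nothing below uses the OTel SDK — the `Span` shape and `findSlowestLeaf` helper are illustrative names — but it shows how spans nest and why the slowest leaf span is usually the first suspect for a slow request:

```typescript
// Toy model of a trace: spans linked by parentSpanId (illustrative, not the OTel API).
interface Span {
  spanId: string;
  parentSpanId?: string; // undefined for the root span
  name: string;
  durationMs: number;
}

// Given all spans of one trace, find the slowest leaf span —
// the usual answer to "why was this specific request slow?"
function findSlowestLeaf(spans: Span[]): Span {
  const parents = new Set(spans.map((s) => s.parentSpanId).filter(Boolean));
  const leaves = spans.filter((s) => !parents.has(s.spanId));
  return leaves.reduce((a, b) => (b.durationMs > a.durationMs ? b : a));
}

const exampleTrace: Span[] = [
  { spanId: 'a1', name: 'GET /checkout', durationMs: 480 },
  { spanId: 'b2', parentSpanId: 'a1', name: 'SELECT orders', durationMs: 40 },
  { spanId: 'c3', parentSpanId: 'a1', name: 'POST /payments', durationMs: 410 },
];

console.log(findSlowestLeaf(exampleTrace).name); // → POST /payments
```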
Auto-Instrumentation: 5-Minute Setup
The fastest way to get value from OpenTelemetry is auto-instrumentation. OTel provides agents that automatically instrument popular frameworks, HTTP libraries, database drivers, and message queue clients — without any code changes.
```shell
# Node.js auto-instrumentation
npm install @opentelemetry/auto-instrumentations-node @opentelemetry/sdk-node
npm install @opentelemetry/exporter-trace-otlp-http @opentelemetry/exporter-metrics-otlp-http
```
```typescript
// tracing.ts — load this file before your application code
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';

const sdk = new NodeSDK({
  serviceName: 'backend-api',
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4318/v1/traces',
  }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: 'http://otel-collector:4318/v1/metrics',
    }),
    exportIntervalMillis: 30000,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      // Customize which instrumentations to enable
      '@opentelemetry/instrumentation-http': { enabled: true },
      '@opentelemetry/instrumentation-express': { enabled: true },
      '@opentelemetry/instrumentation-pg': { enabled: true },
      '@opentelemetry/instrumentation-redis': { enabled: true },
      '@opentelemetry/instrumentation-grpc': { enabled: true },
    }),
  ],
});

sdk.start();

// Start your application with the SDK preloaded
// (compile to JS first, or run through ts-node):
//   node --require ./tracing.js app.js
```
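One detail setup guides often skip: flushing buffered telemetry on exit. `NodeSDK.shutdown()` returns a promise that flushes pending exports, and you should bound it so a dead Collector can't hang process exit. The `Flushable` interface and `shutdownGracefully` helper below are our own names (not OTel APIs), written against the shape of `NodeSDK` so the timeout logic is testable in isolation:

```typescript
// Anything with an async shutdown() — NodeSDK from @opentelemetry/sdk-node fits this shape.
interface Flushable {
  shutdown(): Promise<void>;
}

// Flush telemetry on exit, but never block process shutdown longer than timeoutMs.
// Returns true if the flush completed, false if it timed out.
async function shutdownGracefully(sdk: Flushable, timeoutMs = 5000): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  const ok = await Promise.race([sdk.shutdown().then(() => true), timeout]);
  clearTimeout(timer!);
  return ok;
}

// Wire it to process signals in tracing.ts:
// process.on('SIGTERM', async () => {
//   await shutdownGracefully(sdk);
//   process.exit(0);
// });
```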
```shell
# Python auto-instrumentation (even easier)
pip install opentelemetry-distro opentelemetry-exporter-otlp

# Auto-detect and install instrumentations for installed packages
opentelemetry-bootstrap --action=install

# Run your application with auto-instrumentation
# (OTEL_EXPORTER_OTLP_PROTOCOL selects OTLP/HTTP to match port 4318;
#  the exporter defaults to gRPC on 4317)
OTEL_SERVICE_NAME=my-python-api \
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318 \
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf \
  opentelemetry-instrument python app.py

# That's it — HTTP requests, database queries, and framework operations
# are automatically traced without any code changes.
```
Manual Instrumentation: Adding Custom Context
Auto-instrumentation gives you infrastructure-level visibility (HTTP requests, database queries), but it doesn't know about your business logic. Manual instrumentation adds custom spans and attributes that make traces meaningful to your team:
```typescript
import { trace, SpanStatusCode, metrics } from '@opentelemetry/api';

const tracer = trace.getTracer('order-service');
const meter = metrics.getMeter('order-service');

// Custom metrics
const orderCounter = meter.createCounter('orders.created', {
  description: 'Number of orders created',
});

const orderValueHistogram = meter.createHistogram('orders.value', {
  description: 'Order value in USD',
  unit: 'USD',
});

async function processOrder(userId: string, items: OrderItem[]) {
  // Create a custom span for business logic
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      // Add business context as span attributes
      span.setAttribute('order.user_id', userId);
      span.setAttribute('order.item_count', items.length);
      span.setAttribute('order.total_value', calculateTotal(items));

      // Nested span for payment processing
      const paymentResult = await tracer.startActiveSpan(
        'processPayment',
        async (paymentSpan) => {
          try {
            paymentSpan.setAttribute('payment.method', 'stripe');
            paymentSpan.setAttribute('payment.amount', calculateTotal(items));

            const result = await stripe.charges.create({
              amount: calculateTotal(items) * 100,
              currency: 'usd',
              source: userId,
            });

            paymentSpan.setAttribute('payment.charge_id', result.id);
            paymentSpan.setStatus({ code: SpanStatusCode.OK });
            return result;
          } finally {
            // End the nested span even if the charge throws
            paymentSpan.end();
          }
        }
      );

      // Record metrics
      orderCounter.add(1, {
        'order.region': getUserRegion(userId),
        'order.payment_method': 'stripe',
      });
      orderValueHistogram.record(calculateTotal(items), {
        'order.region': getUserRegion(userId),
      });

      span.setStatus({ code: SpanStatusCode.OK });
      return { orderId: generateId(), payment: paymentResult };
    } catch (error) {
      span.recordException(error as Error);
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: (error as Error).message,
      });
      throw error;
    } finally {
      span.end();
    }
  });
}
```
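Spans only join into one trace across service boundaries because the trace context travels on the wire. OTel's HTTP instrumentation handles this automatically via the W3C `traceparent` header; the hand-rolled parser below (illustrative only — in production, use the SDK's built-in propagators) shows what that header actually carries:

```typescript
// W3C Trace Context: traceparent = version-traceId-parentSpanId-flags
// e.g. 00-4bf92f3577b34da6a238ee1e6d1f1bc9-00f067aa0ba902b7-01
interface TraceContext {
  traceId: string;  // 32 lowercase hex chars
  spanId: string;   // 16 lowercase hex chars
  sampled: boolean; // lowest bit of the flags byte
}

function parseTraceparent(header: string): TraceContext | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  return { traceId: m[2], spanId: m[3], sampled: (parseInt(m[4], 16) & 1) === 1 };
}

function buildTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? '01' : '00'}`;
}

const ctx = parseTraceparent('00-4bf92f3577b34da6a238ee1e6d1f1bc9-00f067aa0ba902b7-01');
console.log(ctx?.sampled); // → true
```

Because the downstream service extracts this header and uses its `spanId` as the parent of its own root span, the two services' spans stitch together into a single trace.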
The OpenTelemetry Collector: Your Data Pipeline
The OTel Collector is a vendor-agnostic proxy that receives, processes, and exports telemetry data. Instead of sending data directly from your applications to a backend, you send to the Collector, which handles batching, filtering, sampling, and routing to one or more backends.
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  # Batch spans for efficient export
  batch:
    timeout: 5s
    send_batch_size: 512

  # Add resource attributes to all telemetry
  resource:
    attributes:
      - key: environment
        value: production
        action: insert
      - key: team
        value: backend
        action: insert

  # Tail-based sampling: keep all error and slow traces,
  # 10% of everything else
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors-always
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-traces
        type: latency
        latency: { threshold_ms: 1000 }
      - name: probabilistic-sample
        type: probabilistic
        probabilistic: { sampling_percentage: 10 }

  # Filter out noisy health check spans
  filter:
    spans:
      exclude:
        match_type: strict
        attributes:
          - key: http.target
            value: /health

exporters:
  # Export to Grafana Tempo (traces)
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true

  # Export to Prometheus (metrics)
  prometheus:
    endpoint: 0.0.0.0:8889

  # Export to Loki (logs)
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [filter, tail_sampling, resource, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [loki]
```
Best Practices for Production OpenTelemetry
Start with auto-instrumentation, then add manual spans. Get basic visibility in 30 minutes with auto-instrumentation, then add custom spans for the business logic that matters most. You'll know what to instrument based on the questions you can't answer with auto-instrumentation alone.
Use semantic conventions. OTel defines standard attribute names for common concepts: http.method, db.system, messaging.system, etc. Using these conventions makes your data consistent across services and enables built-in dashboards and alerts in observability backends.
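As a quick contrast, here is what conventional versus ad-hoc naming looks like on a span. The `@opentelemetry/semantic-conventions` package exports these keys as constants; literal strings are used below (with a hypothetical `conventionalHttpAttributes` helper) to keep the sketch dependency-free:

```typescript
// Build span attributes using OTel semantic-convention names.
// Backends and prebuilt dashboards recognize these exact keys.
function conventionalHttpAttributes(method: string, statusCode: number, dbSystem: string) {
  return {
    'http.method': method,           // not "httpVerb" or "verb"
    'http.status_code': statusCode,  // not "responseCode"
    'db.system': dbSystem,           // not "database" or "dbType"
  };
}

// Usage on a span:
// span.setAttributes(conventionalHttpAttributes('GET', 200, 'postgresql'));
```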
Sample intelligently. At scale, 100% trace collection is expensive. Use tail-based sampling in the Collector: keep 100% of error traces and slow traces, sample 5-10% of everything else. This ensures you never miss an interesting trace while controlling costs.
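Tail-based sampling runs in the Collector, but its head-based cousin is worth a sketch: because the decision derives from the trace ID alone, every service reaches the same verdict for the same trace without coordination. This is a simplified version of the ratio-sampler idea, not the SDK's exact algorithm:

```typescript
// Head-based ratio sampling: derive a deterministic value in [0, 1)
// from the trace ID, so every service samples the same traces.
function shouldSample(traceId: string, ratio: number): boolean {
  // Interpret the upper 8 hex chars (32 bits) of the trace ID as the random value.
  const value = parseInt(traceId.slice(0, 8), 16) / 0x100000000;
  return value < ratio;
}

console.log(shouldSample('00000000aaaaaaaaaaaaaaaaaaaaaaaa', 0.1)); // → true
console.log(shouldSample('ffffffffaaaaaaaaaaaaaaaaaaaaaaaa', 0.1)); // → false
```

In the OTel JS SDK the equivalent is `ParentBasedSampler` wrapping `TraceIdRatioBasedSampler`; tail sampling in the Collector then refines what head sampling let through.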
Correlate traces with logs. Configure your logging library to include trace ID and span ID in every log entry. This lets you jump from a slow trace to the exact logs that explain what happened.
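Concretely, correlation just means every log line carries the active trace context. With a real logger you would do this in a pino `mixin` or a winston format that reads `trace.getActiveSpan()?.spanContext()`; the dependency-free sketch below (our own `logWithTrace` helper, not a library API) shows the shape of the output:

```typescript
interface ActiveContext {
  traceId: string;
  spanId: string;
}

// Emit a structured log line enriched with the current trace context —
// the same fields a pino mixin or winston format would inject.
function logWithTrace(ctx: ActiveContext | undefined, message: string): string {
  return JSON.stringify({
    msg: message,
    trace_id: ctx?.traceId,
    span_id: ctx?.spanId,
  });
}

const line = logWithTrace(
  { traceId: '4bf92f3577b34da6a238ee1e6d1f1bc9', spanId: '00f067aa0ba902b7' },
  'payment charged',
);
// line now carries trace_id and span_id alongside the message,
// so a backend can pivot from a slow span straight to its logs.
```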
ZeonEdge implements OpenTelemetry-based observability stacks using the Grafana ecosystem (Tempo, Prometheus, Loki, Grafana). Learn about our observability services.
Alex Thompson
CEO & Cloud Architecture Expert at ZeonEdge with 15+ years building enterprise infrastructure.