Observability
Snakeway provides built‑in observability primitives intended for production deployments of a high‑performance edge proxy. The observability subsystem is designed to expose operational signals while keeping overhead predictable and minimal in the request hot path.
The system focuses on three goals:
- structured logs suitable for machine ingestion
- distributed tracing using OpenTelemetry
- a separation between control plane instrumentation and data plane execution
The implementation favors explicit initialization and deterministic startup ordering to avoid hidden runtime behavior.
Summary
The observability subsystem in Snakeway is intentionally minimal in the request hot path while still exposing the signals required to operate the proxy in production environments. Control plane components manage exporter lifecycles and asynchronous telemetry tasks, while data plane code emits structured events and tracing spans with minimal overhead.
Design Principles
Structured Logging
Snakeway uses the tracing ecosystem for structured logging. Logs are
emitted as JSON events with flattened fields to make ingestion by log
aggregation systems straightforward.
Logs are intended to represent operational state transitions such as:
- startup and shutdown events
- configuration reload activity
- certificate automation lifecycle
- device pipeline execution outcomes
- upstream proxy interactions
The logging system supports two output modes:
- standard output for container environments
- rolling file logs when SNAKEWAY_LOG_DIR is configured
File logging uses a non‑blocking writer to ensure disk IO does not affect the request path.
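The general shape of a non-blocking writer can be sketched with the standard library alone. This is an illustration of the technique, not Snakeway's implementation: the names `NonBlockingWriter` and `spawn_writer` are hypothetical, and dropping lines when the queue is full is an assumed policy.

```rust
use std::io::Write;
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

/// Hypothetical handle used by the request path to submit log lines.
pub struct NonBlockingWriter {
    tx: SyncSender<String>,
}

impl NonBlockingWriter {
    /// Queues a line without waiting on IO. Returns false when the line is
    /// dropped because the queue is full or the worker has exited.
    pub fn write_line(&self, line: &str) -> bool {
        self.tx.try_send(line.to_owned()).is_ok()
    }
}

/// Spawns a worker thread that owns `sink` and performs all blocking writes.
/// The worker exits and returns the sink once every writer handle is dropped.
pub fn spawn_writer<W: Write + Send + 'static>(
    mut sink: W,
    capacity: usize,
) -> (NonBlockingWriter, JoinHandle<W>) {
    let (tx, rx) = sync_channel::<String>(capacity);
    let worker = thread::spawn(move || {
        // Iteration ends when all senders have been dropped.
        for line in rx {
            // IO errors are ignored in this sketch; a real writer would report them.
            let _ = writeln!(sink, "{}", line);
        }
        let _ = sink.flush();
        sink
    });
    (NonBlockingWriter { tx }, worker)
}
```

The request path only pays for a bounded channel send. The worker thread absorbs disk latency, and a full queue sheds log lines instead of stalling requests.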
Distributed Tracing
Distributed tracing is implemented through OpenTelemetry and the
tracing-opentelemetry integration layer.
The OpenTelemetry pipeline is configured through the Snakeway runtime configuration and includes:
- OTLP exporter endpoint
- service metadata such as service.name and service.version
- sampling strategy
Tracing spans originate from the tracing instrumentation already used
for structured logging. The subscriber bridges these spans into the
OpenTelemetry exporter.
The exporter uses a batch processor that runs on the control plane runtime, so exporting traces does not block request processing.
Runtime Metadata
Snakeway attaches resource metadata to every exported trace. The metadata follows OpenTelemetry semantic conventions and includes service attributes such as:
- service name
- service version
- instance identifier derived from the system hostname
This information allows telemetry backends to distinguish between instances in multi‑node deployments.
Sampling
Snakeway uses a parent-based sampling model. When an incoming request
carries a sampled W3C Trace Context, the proxy always honors that
decision and samples the request. When no parent context is present,
the sampling_ratio setting determines what fraction of root traces
are sampled using a deterministic trace-ID-ratio algorithm.
The default sampling_ratio of 1.0 samples all root traces. Setting
it to a lower value (e.g., 0.1 for 10%) reduces trace volume in
high-traffic deployments while preserving complete traces for every
sampled request.
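The decision logic described above can be sketched as a pair of pure functions. This is a simplified illustration, not Snakeway's code: the function names are hypothetical, the threshold comparison mirrors the common trace-ID-ratio approach of comparing the low bits of the trace ID against a scaled ratio, and dropping requests whose parent is explicitly unsampled is an assumption based on the usual parent-based behavior.

```rust
/// Sampling decision for a new request span.
/// `parent_sampled` is `Some(flag)` when a W3C trace context arrived with the
/// request, and `None` for root traces.
pub fn should_sample(parent_sampled: Option<bool>, trace_id: u128, sampling_ratio: f64) -> bool {
    match parent_sampled {
        // Honor the upstream decision (assumed to apply to unsampled parents too).
        Some(decision) => decision,
        // Root trace: fall back to the ratio-based decision.
        None => ratio_sample(trace_id, sampling_ratio),
    }
}

/// Deterministic trace-ID-ratio check: the same trace ID and ratio always
/// produce the same decision, so every span of a trace agrees.
pub fn ratio_sample(trace_id: u128, sampling_ratio: f64) -> bool {
    if sampling_ratio >= 1.0 {
        return true;
    }
    if sampling_ratio <= 0.0 {
        return false;
    }
    // Take the low 64 bits of the trace ID and reduce them to 63 bits.
    let value = (trace_id as u64) >> 1;
    // Scale the ratio into the same 63-bit range.
    let threshold = (sampling_ratio * (1u64 << 63) as f64) as u64;
    value < threshold
}
```

Because the decision depends only on the trace ID, no coordination between proxy instances is needed for them to agree on which root traces are kept.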
Request Instrumentation
A root request span is created for every proxied request inside the
Pingora request_filter hook. The span carries the following fields:
- http.method
- http.host
- http.path
- client.ip
- request.id
- listener
- route
When the incoming request includes W3C Trace Context headers
(traceparent / tracestate), the request span is automatically
parented to the upstream trace. The same trace context is injected into
the request sent to the upstream service, so Snakeway appears as an
intermediate span in a distributed trace.
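To make the propagation step concrete, the following sketch parses and re-emits a version 00 traceparent header using only the standard library. It is an illustration under stated assumptions: `TraceParent`, `parse_traceparent`, and `format_traceparent` are hypothetical names, and a production proxy would normally rely on an OpenTelemetry propagator rather than hand-written parsing.

```rust
/// Fields of a W3C `traceparent` header (version 00 only).
#[derive(Debug, PartialEq)]
pub struct TraceParent {
    pub trace_id: u128,
    pub parent_id: u64,
    pub sampled: bool,
}

/// Parses `00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>`.
/// Returns None for malformed input or all-zero identifiers.
pub fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    let (trace_id, parent_id, flags) = (parts[1], parts[2], parts[3]);
    if trace_id.len() != 32 || parent_id.len() != 16 || flags.len() != 2 {
        return None;
    }
    // The header format only permits lowercase hexadecimal digits.
    let is_lower_hex = |s: &str| s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if !is_lower_hex(trace_id) || !is_lower_hex(parent_id) || !is_lower_hex(flags) {
        return None;
    }
    let trace_id = u128::from_str_radix(trace_id, 16).ok()?;
    let parent_id = u64::from_str_radix(parent_id, 16).ok()?;
    let flags = u8::from_str_radix(flags, 16).ok()?;
    if trace_id == 0 || parent_id == 0 {
        return None;
    }
    Some(TraceParent { trace_id, parent_id, sampled: flags & 0x01 == 0x01 })
}

/// Renders the header value to inject into the upstream request.
pub fn format_traceparent(tp: &TraceParent) -> String {
    format!("00-{:032x}-{:016x}-{:02x}", tp.trace_id, tp.parent_id, tp.sampled as u8)
}
```

When forwarding, an intermediary keeps the incoming `trace_id` and replaces `parent_id` with the ID of its own span, which is what makes it appear as an intermediate hop in the trace.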
Child spans are created for each major phase of request processing:
- routing -- on-request device pipeline execution and route matching
- upstream_selection -- traffic decision, upstream peer creation, and circuit breaker admission
- upstream_request -- before-proxy device pipeline, header mutation, and trace context injection
- upstream_response -- after-proxy device pipeline and response status mutation
- response -- on-response device pipeline and upstream outcome determination
Log Export
When OpenTelemetry is enabled, log events emitted through the tracing
framework are also exported to the configured OTLP endpoint. The
opentelemetry-appender-tracing bridge converts tracing events into
OpenTelemetry log records and sends them via a batch processor on the
control plane runtime.
An internal filter suppresses noisy crates (pingora, tonic, h2,
reqwest) so that only application-level events are exported.
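A target-based filter of this kind can be sketched as a small predicate. The matching rule here is an assumption for illustration only: it compares the first path segment of the event target against the suppressed crate names and also treats underscore-suffixed family crates (for example `pingora_core`) as suppressed. The names `SUPPRESSED_CRATES` and `is_exported` are hypothetical.

```rust
/// Crate names whose events are not exported (taken from the list above).
const SUPPRESSED_CRATES: [&str; 4] = ["pingora", "tonic", "h2", "reqwest"];

/// Returns true when an event with this target should be exported.
/// `target` is a module path such as "snakeway::proxy" or "h2::codec".
pub fn is_exported(target: &str) -> bool {
    // The crate name is the first `::`-separated segment of the target.
    let root = target.split("::").next().unwrap_or(target);
    !SUPPRESSED_CRATES.iter().any(|name| {
        // Match the crate itself, or a family crate named `<name>_...`.
        root == *name || (root.starts_with(*name) && root[name.len()..].starts_with('_'))
    })
}
```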
Metrics
When OpenTelemetry is enabled, Snakeway exports the following metric instruments via the OTLP exporter:
| Metric | Type | Attributes |
|---|---|---|
| snakeway.http.requests | Counter | method, status, service, route |
| snakeway.http.request.duration | Histogram (ms) | service, upstream |
| snakeway.http.errors | Counter | service, upstream, error.type |
| snakeway.upstream.active_requests | Gauge | service, upstream |
| snakeway.upstream.health | Gauge (0/1) | service, upstream |
| snakeway.circuit_breaker.state | Gauge (0/1/2) | service, upstream |
All per-request metrics are recorded in the Pingora logging hook,
which runs last and has access to the complete request context including
service, upstream, outcome, and timing.
Circuit breaker state values:
- 0 = closed (healthy)
- 1 = open (tripped)
- 2 = half-open (recovery testing)
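The numeric encoding can be expressed as a small enum, which keeps the mapping in one place for both the code that records the gauge and any tooling that decodes it. This is a sketch; the type and method names are hypothetical and not taken from the Snakeway codebase.

```rust
/// Circuit breaker states as reported by the snakeway.circuit_breaker.state gauge.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

impl CircuitState {
    /// Value recorded on the gauge for this state.
    pub fn gauge_value(self) -> i64 {
        match self {
            CircuitState::Closed => 0,
            CircuitState::Open => 1,
            CircuitState::HalfOpen => 2,
        }
    }

    /// Decodes a gauge reading; unknown values yield None.
    pub fn from_gauge(value: i64) -> Option<Self> {
        match value {
            0 => Some(CircuitState::Closed),
            1 => Some(CircuitState::Open),
            2 => Some(CircuitState::HalfOpen),
            _ => None,
        }
    }
}
```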
Shutdown Behavior
The OpenTelemetry tracer, logger, and meter providers are stored globally so that exporters can flush pending data during shutdown. This ensures that traces and logs generated during the final moments of process execution are not lost.
Graceful shutdown hooks trigger provider shutdown before the process exits.
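The pattern of keeping providers globally reachable so that a shutdown hook can flush them can be sketched as follows. This is a standard-library illustration, not the actual provider types: each provider is stood in for by a closure that flushes it, and `register_provider` and `shutdown_providers` are hypothetical names.

```rust
use std::sync::Mutex;

/// A stand-in for "flush and shut down this provider".
type ShutdownHook = Box<dyn FnOnce() + Send>;

/// Process-wide registry, filled during explicit initialization.
static SHUTDOWN_HOOKS: Mutex<Vec<ShutdownHook>> = Mutex::new(Vec::new());

/// Registers a provider's flush routine at startup.
pub fn register_provider(hook: impl FnOnce() + Send + 'static) {
    SHUTDOWN_HOOKS.lock().unwrap().push(Box::new(hook));
}

/// Runs every registered flush routine once and returns how many ran.
/// Intended to be called from the graceful shutdown path before exit.
pub fn shutdown_providers() -> usize {
    // Take the hooks out first so the lock is not held while flushing.
    let hooks: Vec<ShutdownHook> = std::mem::take(&mut *SHUTDOWN_HOOKS.lock().unwrap());
    let count = hooks.len();
    for hook in hooks {
        hook();
    }
    count
}
```

Draining the registry makes a second call a no-op, so the shutdown path can be invoked from more than one place without flushing twice.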