Latency at Orbit
We publish our voice-agent latency openly, broken down by stage, because customer experience starts here. Every number on this page maps 1:1 to the Prometheus histogram emitted by our voice gateway.
| Metric | Target | Current | Notes |
|---|---|---|---|
| p50 | 1100 ms | 1255 ms | Target assumes a steady-state turn, post-streaming chunker. Current measured in the synthetic test env, 2026-05-13. |
| p95 | 1500 ms | 1650 ms | Current measured in the synthetic test env, 2026-05-13. |
Per-stage budget (E1–E5)
Every turn passes through five measurable stages. The total is the sum minus parallel overlap (LLM TTFT and TTS first byte run concurrently once the streaming chunker is engaged).
| Stage | What it measures | Target | Current |
|---|---|---|---|
| Endpointing (E1) | Time from the user finishing their utterance to Deepgram's `UtteranceEnd` event. Driven primarily by the `DEVOTEL_VOICE_UTTERANCE_END_MS` setting (default 600 ms). | 600 ms | 600 ms |
| STT final (E2) | Time from Deepgram's `UtteranceEnd` to the final `Transcript.is_final = true` event with the full transcript. Includes Deepgram's nova-2 / nova-3 model post-processing. | 50 ms | 45 ms |
| LLM TTFT (E3) | Time-to-first-token from Anthropic Claude Sonnet via the agent-runtime `/chat/stream` endpoint. With a prompt-cache hit (warm session) we typically see 300–400 ms; a cold turn-1 hits 600–1500 ms but is masked by the W10-G pre-warm IIFE. | 400 ms | 380 ms |
| First chunk → TTS (E4) | Time from receiving the first streamed LLM chunk to handing it to the TTS pipeline. Pure orchestration latency; bounded by the sentence-chunker's boundary detection. | 100 ms | 90 ms |
| TTS first byte (E5) | Time from TTS handoff to the first audio byte arriving from the Cartesia / ElevenLabs failover pair. Sub-150 ms with Cartesia Sonic; the ElevenLabs Flash v2.5 fallback adds ~50 ms. | 150 ms | 140 ms |
Sum of targets: 1300 ms (parallelism collapses ~200 ms — net p50 target 1100 ms).
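To make that arithmetic concrete, here is a minimal sketch of how the net p50 target falls out of the stage budget. The values are the targets from the table, and the ~200 ms overlap constant is the figure stated above; nothing here is measured data.

```ts
// Stage targets from the table above, in milliseconds.
const targetMs = { E1: 600, E2: 50, E3: 400, E4: 100, E5: 150 };

// Serial sum: what a turn would cost if every stage ran back-to-back.
const serialSum = Object.values(targetMs).reduce((a, b) => a + b, 0); // 1300 ms

// Once the streaming chunker engages, LLM streaming (E3) and TTS first
// byte (E5) overlap; the text above puts the collapsed portion at ~200 ms.
const parallelOverlap = 200;

const netP50Target = serialSum - parallelOverlap;
console.log(netP50Target); // 1100 ms, matching the published p50 target
```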
Methodology
Every voice turn emits a Prometheus histogram `voice_agent_turn_stage_duration_ms` with labels `stage` / `tenant_org_id` / `model`. The pre-launch numbers on this page come from a 200-turn synthetic test against the same wire path production uses: Deepgram nova-3 STT, Anthropic Claude Sonnet 4.6 via the agent-runtime `/chat/stream` endpoint, Cartesia Sonic TTS with the ElevenLabs Flash v2.5 failover armed, and LiveKit Cloud media routing.
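For reference, a histogram with this name and label set could be declared as follows using the Node `prom-client` library. The bucket edges below are illustrative assumptions chosen to bracket the stage targets, not the gateway's actual configuration.

```ts
import { Histogram } from "prom-client";

// Same metric name and labels as described above.
const stageDuration = new Histogram({
  name: "voice_agent_turn_stage_duration_ms",
  help: "Per-stage voice turn latency in milliseconds",
  labelNames: ["stage", "tenant_org_id", "model"],
  // Assumed bucket boundaries for illustration only.
  buckets: [25, 50, 100, 200, 400, 600, 900, 1300, 2000, 3000],
});

// Example observation for the LLM TTFT stage (E3) of one turn.
stageDuration.labels("E3", "org_123", "claude-sonnet-4.6").observe(380);
```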
The histogram is fail-open: if the metrics module is absent (boot-check harness, local dev) the timing accumulator silently becomes a no-op, so a misconfigured probe never takes the voice path down. Verification path documented at `apps/voice-gateway/src/lib/voice-metrics.ts`.
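The fail-open pattern can be pictured with the sketch below. This is not the code in `voice-metrics.ts`; the `observeStage` export and its signature are assumptions used purely to illustrate the no-op fallback.

```ts
type StageObserver = (
  stage: string,
  tenantOrgId: string,
  model: string,
  ms: number
) => void;

// Default to a no-op so the voice path never depends on metrics being loaded.
let observeStage: StageObserver = () => {};

// Hypothetical module path; in the real repo the implementation lives at
// apps/voice-gateway/src/lib/voice-metrics.ts.
import("./voice-metrics")
  .then((m) => {
    observeStage = m.observeStage;
  })
  .catch(() => {
    // Metrics module absent (boot-check harness, local dev): stay a no-op.
    // A misconfigured probe degrades observability, never the call itself.
  });

// Callers time stages unconditionally; the observer may silently be a no-op.
observeStage("E5", "org_123", "claude-sonnet-4.6", 140);
```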
When production telemetry begins emitting, this page refreshes automatically from a scheduled probe against Google Cloud Monitoring. Until then the numbers carry the "synthetic test env" footnote above.
How we compare
Public competitor numbers, attributed inline. Where independent measurement differs from a vendor's marketed claim, we show the independent number.
- Orbit (this page) · Independent · Synthetic test environment, 2026-05-13 · Pre-launch; live histogram refresh after the first 1000 prod calls.
- Independent · tested.media, March 2026 · Marketed ~600 ms; independent p50 680 ms / p95 920 ms. Best in the AI-voice-native cohort.
- Independent · tested.media, March 2026 · Marketed <500 ms; independent p50 720 ms / p95 1050 ms. The flexibility tax shows in tail latency.
- Independent · tested.media, March 2026 · Marketed 400 ms historically (no longer published); independent p50 850 ms / p95 1180 ms.
- Marketed · livekit.io public claim · Claims sub-200 ms with their own global media-server mesh; no independent number published.
- Marketed · elevenlabs.io Turbo tier · Turbo v2 ~400 ms marketed (gpt-4o-mini + Flash v2).
- Marketed · Developer reports, 2026-Q1 · Native speech-to-speech; <500 ms reported in the early-access community. No published p95.
What moves the number
Real-world latency is sensitive to factors we don't control from a single benchmark page:
- Region. Today we serve voice from `europe-west1`. A US-East caller eats ~100 ms RTT on every hop. Multi-region (us-east-1 + ap-south-1) is on the 90-day roadmap.
- Model selection. Claude Sonnet 4.6 is the production default. Faster classifier-tier models (Haiku, gpt-5-mini) shave 100–200 ms off the LLM TTFT stage for simple intents; tunable per-agent in the dashboard.
- Prompt length. Prompt caching (Anthropic `extended-cache-ttl-2025-04-11` beta) keeps the system prompt + tool registry pinned for the session. Cold-turn pre-warm via `DEVOTEL_VOICE_PREWARM_ENABLED=true` heats the cache before turn 1.
- Tool calls. Each LLM-initiated tool call adds round-trip latency. Async tool support (W11 backlog) lets long-running tools resolve in the background without blocking the next turn.
- Speculative-on-partial. We pre-fire the LLM on a stable interim transcript so the model is already streaming when STT emits final; see the sketch after this list. Disable with `DEVOTEL_VOICE_SPECULATIVE=false` if you want a deterministic cold floor.
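As a rough illustration of speculative-on-partial, here is a minimal sketch. `startLlmStream` and its endpoint URL are hypothetical stand-ins for the agent-runtime `/chat/stream` call, and the cancel-on-change policy is one assumed way to approximate "stable interim"; this is not the production logic.

```ts
const SPECULATIVE = process.env.DEVOTEL_VOICE_SPECULATIVE !== "false";

// Hypothetical stand-in for streaming a reply from the agent runtime.
async function startLlmStream(transcript: string, signal: AbortSignal): Promise<void> {
  try {
    await fetch("https://agent-runtime.internal/chat/stream", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ transcript }),
      signal,
    });
  } catch {
    // An aborted speculation is expected; swallow the AbortError.
  }
}

let inflight: { transcript: string; abort: AbortController } | null = null;

function fire(transcript: string): void {
  inflight?.abort.abort(); // discard any stale speculative stream
  const abort = new AbortController();
  inflight = { transcript, abort };
  void startLlmStream(transcript, abort.signal);
}

// STT interim event: pre-fire the LLM so tokens are already streaming
// by the time the final transcript lands.
function onInterim(transcript: string): void {
  if (!SPECULATIVE || inflight?.transcript === transcript) return;
  fire(transcript);
}

// STT final event: keep the in-flight stream when the speculation held,
// otherwise restart on the final text.
function onFinal(transcript: string): void {
  if (inflight?.transcript === transcript) return; // hit: reuse the stream
  fire(transcript);                                // miss: cold restart
}
```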
Test our voice agent yourself
Spin up a sandbox tenant, attach a knowledge base, place a test call — the dashboard renders the per-turn stage histogram inline so you can verify these numbers on your own traffic.