Deepgram

Speech in, speech out, over one endpoint.

A gated Deepgram proxy your agent calls without ever holding a vendor key: nova-3 transcription, Aura-2 synthesis, and a Voice Agent that listens, thinks, and speaks in a single conversation. Tiered, metered, and managed end to end.

transcription

Hear every word, with timing

nova-3 turns English audio into text you can act on — words, confidence, and timestamps — whether you post a finished file or stream frames for live partials. One route covers both, so there is no driver to install and no key to babysit.

Streaming partials over a socket
Batch transcripts for stored audio
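
As a rough illustration of "words, confidence, and timestamps", the sketch below pulls timed words out of a batch transcript. It assumes the proxy returns a JSON body that nests words under results, channels, alternatives; that layout, the field names, and the helper name extract_words are assumptions for illustration, not a documented contract of this proxy.

```python
from typing import Any, NamedTuple


class TimedWord(NamedTuple):
    text: str
    start: float       # seconds from the start of the audio
    end: float         # seconds from the start of the audio
    confidence: float  # 0.0 to 1.0


def extract_words(response: dict[str, Any], min_confidence: float = 0.0) -> list[TimedWord]:
    """Return the timed words of the first channel's top alternative.

    ASSUMPTION: words live at results -> channels[0] -> alternatives[0] -> words,
    each with "word", "start", "end", and "confidence" keys.
    """
    channels = response.get("results", {}).get("channels", [])
    if not channels or not channels[0].get("alternatives"):
        return []
    words = channels[0]["alternatives"][0].get("words", [])
    return [
        TimedWord(w["word"], float(w["start"]), float(w["end"]), float(w["confidence"]))
        for w in words
        if float(w["confidence"]) >= min_confidence
    ]


# Hand-written sample payload, not real service output.
sample = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "hello world",
                        "words": [
                            {"word": "hello", "start": 0.0, "end": 0.4, "confidence": 0.98},
                            {"word": "world", "start": 0.5, "end": 0.9, "confidence": 0.61},
                        ],
                    }
                ]
            }
        ]
    }
}

confident_words = extract_words(sample, min_confidence=0.9)
```

Filtering on confidence before acting on a word is one way an agent might use the timing data, for example to skip low-confidence spans when quoting audio back to a user.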

voice agent

A conversation on one socket

The Voice Agent fuses listen, think, and speak into a single connection. flux-general-en v2 handles the listen side, a Settings frame pins the linear16 or wav audio path up front, and Aura-2 answers back — no glue code between three separate services.

flux-general-en v2 listen
Settings frame audio config
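
To make the Settings frame concrete, here is a minimal sketch of building one before any audio is sent. Only the frame name, the two audio paths (linear16 and wav), and the model names come from the text above; the JSON field layout, the default sample rate, and the helper name build_settings_frame are hypothetical and may not match what the proxy actually expects.

```python
import json

# The two audio paths named in the text above.
ALLOWED_ENCODINGS = {"linear16", "wav"}


def build_settings_frame(encoding: str = "linear16", sample_rate: int = 16000) -> str:
    """Serialize a Settings frame that pins the audio path up front.

    ASSUMPTION: every key below is illustrative, not a documented schema.
    """
    if encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    frame = {
        "type": "Settings",
        "audio": {"input": {"encoding": encoding, "sample_rate": sample_rate}},
        "agent": {
            "listen": {"model": "flux-general-en"},  # listen side, per the text
            "speak": {"model": "aura-2"},            # placeholder for an Aura-2 voice
        },
    }
    return json.dumps(frame)


settings_json = build_settings_frame("linear16")
```

Validating the encoding locally means a bad audio path fails before the socket opens, rather than partway through a conversation.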

pricing

Metered per call

Deepgram is a per-call upstream, so it bills from your shared fluid wallet rather than a monthly bucket. Session caps grow with your tier — see the pricing page. Two numbers to know:

Sessions on plus & pro: 30 min (stt & agent)
Failed calls: $0 (never billed)
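
The two numbers can be read as two small rules, sketched below under stated assumptions: a session is capped at 30 minutes on plus and pro, and a failed call costs nothing. The function names and the idea of a per-call cost argument are hypothetical; real rates and caps for other tiers are on the pricing page.

```python
SESSION_CAP_SECONDS = 30 * 60  # 30 min cap quoted for plus & pro


def clamp_session(requested_seconds: float, cap_seconds: float = SESSION_CAP_SECONDS) -> float:
    """Limit a requested session length to the tier's cap (never below zero)."""
    return max(0.0, min(requested_seconds, cap_seconds))


def charge_for_call(succeeded: bool, cost: float) -> float:
    """Return the amount to debit from the wallet; failed calls are never billed."""
    return cost if succeeded else 0.0


capped = clamp_session(45 * 60)          # a 45 minute request is cut to the cap
failed_charge = charge_for_call(False, 0.12)  # 0.12 is a made-up cost
```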

The Deepgram proxy fronts three model families behind a single application key: nova-3 for English transcription, Aura-2 for natural speech synthesis, and the Voice Agent for full conversational turns. You pick the endpoint; we handle the upstream credentials, headers, and routing.

nova-3, Aura-2, Voice Agent, Single key

For more information, ask your agent.

Three voice surfaces, one key

What does the speech-to-text cover?

How is the text-to-speech tuned?

What is the Voice Agent?

Why are bare BYO keys blocked?