How it works

A 4-stage pipeline.
Sub-second latency.

MOD Translate is a single live pipeline that captures audio, transcribes it, translates it, and delivers it back to listeners — all in less time than it takes you to draw breath between sentences.

Capture

Audio enters from your broadcaster device

A phone, a laptop, or our hardware appliance sends audio in 250 ms chunks over a persistent WebSocket. Each chunk is archived to object storage as it arrives, so the audio of every service can be recovered.

MediaRecorder API · WebSocket transport · R2 object storage
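
A minimal sketch of the capture step, assuming a browser broadcaster. `RecorderLike`, `SocketLike`, and `streamChunks` are hypothetical names written for this illustration, shaped after the browser's `MediaRecorder` and `WebSocket`; the queue-while-disconnected behavior is an assumption, not a description of the production client.

```typescript
const CHUNK_MS = 250; // chunk interval stated in the Capture stage

// Hypothetical minimal shapes, modeled on MediaRecorder and WebSocket,
// so the sketch can also run with fakes outside a browser.
interface RecorderLike<T> {
  ondataavailable: ((event: { data: T }) => void) | null;
  start(timesliceMs?: number): void;
}

interface SocketLike<T> {
  readyState: number; // 1 means OPEN in the browser WebSocket API
  send(data: T): void;
}

// Forwards every recorded chunk to the socket in order. Chunks produced
// while the socket is not open wait in a queue until flush() is called
// again (assumed behavior for this sketch).
function streamChunks<T>(
  recorder: RecorderLike<T>,
  socket: SocketLike<T>
): { flush(): void; pending(): number } {
  const queue: T[] = [];
  const flush = (): void => {
    while (queue.length > 0 && socket.readyState === 1) {
      socket.send(queue.shift() as T);
    }
  };
  recorder.ondataavailable = (event) => {
    queue.push(event.data);
    flush();
  };
  recorder.start(CHUNK_MS); // ask the recorder for a chunk every 250 ms
  return { flush, pending: () => queue.length };
}
```

In a browser, the recorder would typically come from `new MediaRecorder(stream)` and the socket from `new WebSocket(url)`, with `flush()` called from the socket's `open` handler after a reconnect.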

Transcribe

Streaming speech-to-text in the speaker's language

We run two ASR engines in parallel, Deepgram Nova-3 for low latency and AssemblyAI Universal-Streaming for accuracy, and switch to the other automatically if either one lags. Your custom glossary terms are injected at this stage.

Deepgram Nova-3 · AssemblyAI Universal-Streaming · Custom glossary terms
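
One way to picture the automatic fallback, as a sketch only: each engine reports when it last produced a result, and the pipeline reads from the preferred engine unless it has fallen too far behind. `EngineState`, `selectEngine`, and the `maxLagMs` threshold are hypothetical; the actual lag criteria are not stated here.

```typescript
// Hypothetical snapshot of one ASR engine's most recent output.
interface EngineState {
  name: string;
  lastResultAtMs: number; // timestamp of the latest partial or final result
  text: string;
}

// Prefer the primary engine while it is keeping up; otherwise use the
// backup. If both are lagging, use whichever reported most recently.
function selectEngine(
  primary: EngineState,
  backup: EngineState,
  nowMs: number,
  maxLagMs: number // assumed threshold, chosen by the caller
): EngineState {
  const primaryLag = nowMs - primary.lastResultAtMs;
  const backupLag = nowMs - backup.lastResultAtMs;
  if (primaryLag <= maxLagMs) return primary;
  if (backupLag <= maxLagMs) return backup;
  return primaryLag <= backupLag ? primary : backup;
}
```

Because both engines keep running, switching is just a matter of which stream is read next; nothing has to be restarted.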

Translate

Sentence-by-sentence translation, not word-by-word

A sentence buffer collects partial transcripts, triggers translation at natural sentence boundaries, and revises lines in flight when later context corrects an earlier phrase. Your glossary maps proper nouns and theological terms to your preferred renderings.

Sentence-level batching · GPT translation · DeepL glossary
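
A simplified sketch of a sentence buffer. It treats `.`, `!`, or `?` followed by whitespace as a sentence boundary, which is an assumption for illustration; real boundary detection would need to handle abbreviations, numbers, and other scripts, and the in-flight revision step is not shown. `SentenceBuffer` is a hypothetical name.

```typescript
// Accumulates partial transcript fragments and releases whole sentences.
class SentenceBuffer {
  private pending = "";

  // Adds a fragment and returns any sentences completed by it.
  push(fragment: string): string[] {
    this.pending += fragment;
    const done: string[] = [];
    const boundary = /[.!?]+\s+/; // naive boundary: punctuation, then whitespace
    let match = boundary.exec(this.pending);
    while (match !== null) {
      const end = match.index + match[0].length;
      done.push(this.pending.slice(0, end).trim());
      this.pending = this.pending.slice(end);
      match = boundary.exec(this.pending);
    }
    return done;
  }

  // Releases whatever is left, for example when the speaker pauses.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest.length > 0 ? [rest] : [];
  }
}
```

Each string returned by `push` or `flush` would be handed to the translation step as one unit, which is what keeps the output sentence-by-sentence rather than word-by-word.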

Deliver

To listeners on any device, in any language

Translated text streams over Server-Sent Events to listener phones. Optional neural TTS audio streams in parallel. Each language is its own channel — adding listeners is free; adding languages is one click.

SSE streaming · Cloudflare Durable Objects · Edge cache, 300+ POPs
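
For reference, a Server-Sent Events message is plain text: optional `id:` and `event:` fields, one or more `data:` lines, and a blank line to end the message. The sketch below formats one caption that way. The `caption` event name and the JSON payload shape are assumptions for illustration, not the documented wire format.

```typescript
// Formats one translated line as a Server-Sent Events message.
// JSON.stringify escapes newlines, so the payload fits on one data line.
function formatCaptionEvent(id: number, lang: string, text: string): string {
  const payload = JSON.stringify({ lang, text });
  return `id: ${id}\nevent: caption\ndata: ${payload}\n\n`;
}
```

On the listener side, a browser would typically read such a stream with `EventSource` and `addEventListener("caption", ...)`.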

Under the hood

Built on the Cloudflare edge.

Every component runs as close to your listeners as physically possible — because translation latency is the cost of feeling left out.

◈

One Durable Object per service

Each live session gets its own coordinator — managing the broadcaster, listener pool, and translation state. State stays sticky so reconnects are seamless.

⌽

Audio archived to R2

Every chunk is written to a per-session prefix. After the service, we stitch them into a single file. Recordings stay yours; we never train on them.
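
A sketch of how a per-session prefix can make stitching straightforward: if chunk keys carry a zero-padded sequence number, sorting the keys as strings restores recording order. The key layout, the `.webm` extension, and both function names are hypothetical.

```typescript
// Builds an object key under a per-session prefix (assumed layout).
function chunkKey(sessionId: string, seq: number): string {
  const padded = String(seq).padStart(8, "0"); // keeps string order equal to numeric order
  return `sessions/${sessionId}/chunks/${padded}.webm`;
}

// Returns the keys in the order the chunks should be concatenated.
function stitchOrder(keys: string[]): string[] {
  return [...keys].sort();
}
```

Without the padding, a key ending in `10` would sort before one ending in `9`, and the stitched file would play out of order.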

⬚

Transcripts in D1

Translated lines land in D1, a relational store you can query, with word-level timestamps, language codes, and speaker attribution. Export anytime as JSON or CSV.

⬢

Glossaries you control

Add proper nouns, theological terms, and language-specific overrides. Glossaries apply at both transcription and translation, so 'Yahweh' never becomes 'Jehovah' downstream.
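
To illustrate the translation-side half of this, here is a minimal sketch of a post-translation override pass that swaps an unwanted rendering for the preferred one. `applyOverrides` and the override table are hypothetical, and the `\b` word boundary used here only behaves well for Latin-script terms.

```typescript
// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replaces each unwanted whole-word rendering with the preferred term.
// Keys are renderings to avoid; values are the terms the host wants.
function applyOverrides(text: string, overrides: Record<string, string>): string {
  let out = text;
  for (const unwanted of Object.keys(overrides)) {
    const preferred = overrides[unwanted];
    if (preferred === undefined) continue;
    const pattern = new RegExp(`\\b${escapeRegExp(unwanted)}\\b`, "g");
    out = out.replace(pattern, () => preferred); // function form avoids "$" substitution patterns
  }
  return out;
}
```

For example, `applyOverrides("Jehovah is my shepherd", { Jehovah: "Yahweh" })` yields `"Yahweh is my shepherd"`.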

⏱

Pacing presets

Quick (700 ms target), Standard (1.1 s), or Careful (1.6 s). Trade latency for fluency depending on whether you’re translating energetic preaching or deliberate teaching.
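
One possible reading of the presets, sketched under a stated assumption: the target is treated as the longest time buffered speech may wait before it is sent for translation, even if no sentence boundary has arrived. That interpretation, and the names `PACING_TARGET_MS` and `shouldTranslateNow`, are hypothetical; only the three target values come from the text above.

```typescript
type PacingPreset = "quick" | "standard" | "careful";

// Target values from the preset descriptions, in milliseconds.
const PACING_TARGET_MS: Record<PacingPreset, number> = {
  quick: 700,
  standard: 1100,
  careful: 1600,
};

// Translate at a sentence boundary, or once the wait reaches the target.
function shouldTranslateNow(
  preset: PacingPreset,
  waitedMs: number,
  atSentenceBoundary: boolean
): boolean {
  return atSentenceBoundary || waitedMs >= PACING_TARGET_MS[preset];
}
```

Under this reading, a longer target gives the buffer more time to reach a natural boundary, which is where the extra fluency would come from.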

◑

Failover, not fingers-crossed

If a vendor falters mid-service, we hot-swap to the backup in under two seconds. Your service never pauses for our infrastructure problems.

What you ship

The deliverables a host actually cares about.

During the service

✦ Listener QR + URL
✦ Captions in 30+ languages
✦ Optional voice in earbuds
✦ Decision Moment cards
✦ Live moderator console

After the service

✦ Full audio recording
✦ Transcript in every language
✦ Listener analytics by language
✦ Decision response ledger
✦ Shareable replay link

Every week

✦ Glossary refinements
✦ Accuracy reports
✦ Service health digest
✦ New languages on request
✦ Direct line to our team

Live in < 1 second

Make every word land.

Run a free live test in your own service. We will help you set it up in 20 minutes.

Book a demo · See how it works
