Audio Transcription

Use voice input with speech-to-text in OpenRails Chat

Overview

OpenRails integrates a speech-to-text engine for audio transcription, allowing you to use voice input in conversations. The engine can run locally for privacy or remotely through a cloud API for convenience. Transcribed text is automatically inserted into the message input field, where you can review and edit it before sending.

Transcription Modes

Mode	Description	Requirements
Local	Runs the speech-to-text engine on your server hardware. Audio never leaves your network.	Transcription model downloaded locally, sufficient GPU/CPU resources
Remote	Sends audio to a cloud speech-to-text API for transcription.	Valid OpenAI API key configured in LLM Keys
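To make the Requirements column concrete, here is a minimal sketch of how a deployment might validate its chosen mode. The `TranscriptionConfig` shape, its field names, and `checkRequirements` are hypothetical and not OpenRails settings. The check only confirms that a model path or key is present; it cannot tell whether a key is valid or whether the hardware is sufficient.

```typescript
// Hypothetical configuration shape; field names are illustrative only.
type TranscriptionMode = "local" | "remote";

interface TranscriptionConfig {
  mode: TranscriptionMode;
  localModelPath?: string; // location of the downloaded model (Local mode)
  openaiApiKey?: string;   // key taken from LLM Keys (Remote mode)
}

// Returns unmet requirements; an empty array means the mode looks usable.
// Presence checks only: key validity and GPU/CPU capacity are not verified.
function checkRequirements(config: TranscriptionConfig): string[] {
  const problems: string[] = [];
  if (config.mode === "local") {
    if (!config.localModelPath) {
      problems.push("Local mode needs a transcription model downloaded on the server.");
    }
  } else if (!config.openaiApiKey || config.openaiApiKey.trim() === "") {
    problems.push("Remote mode needs an OpenAI API key configured in LLM Keys.");
  }
  return problems;
}

// Only Remote mode sends audio outside your network.
function audioLeavesNetwork(config: TranscriptionConfig): boolean {
  return config.mode === "remote";
}
```

For example, `checkRequirements({ mode: "remote" })` reports the missing key, while a Local configuration with a model path returns an empty array.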

Using Voice Input

Open a Conversation

Navigate to a bot and open the chat interface.

Click the Microphone Button

Click the microphone icon next to the message input field. Your browser will request microphone permission if not already granted.

Speak Your Message

Speak clearly into your microphone. A visual indicator shows that recording is in progress.

Stop Recording

Click the microphone button again or press Escape to stop recording. The recorded audio is then sent to the transcription engine.

Review and Send

The transcribed text appears in the message input field. Review the transcription, make any edits, and press Enter to send.
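The steps above form a small cycle: start, stop, transcribe, review, send. A minimal sketch of that cycle as a state machine follows; the state and event names are hypothetical and not taken from the OpenRails client. Note that a finished transcript moves to a review state rather than being sent immediately.

```typescript
// Hypothetical states and events for the voice-input control.
type VoiceState =
  | { kind: "idle" }
  | { kind: "recording" }
  | { kind: "transcribing" }
  | { kind: "review"; text: string };

type VoiceEvent =
  | { type: "micClick" }
  | { type: "escape" }
  | { type: "transcribed"; text: string }
  | { type: "edit"; text: string }
  | { type: "send" };

function nextState(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (state.kind) {
    case "idle":
      // Clicking the microphone starts recording.
      return event.type === "micClick" ? { kind: "recording" } : state;
    case "recording":
      // A second click or Escape stops recording and submits the audio.
      return event.type === "micClick" || event.type === "escape"
        ? { kind: "transcribing" }
        : state;
    case "transcribing":
      // The transcript lands in the input field; nothing is sent yet.
      return event.type === "transcribed" ? { kind: "review", text: event.text } : state;
    case "review":
      // The user may edit the text; Enter sends it and resets the control.
      if (event.type === "edit") return { kind: "review", text: event.text };
      return event.type === "send" ? { kind: "idle" } : state;
  }
}
```

Events that do not apply in the current state (for example, Escape while idle) leave the state unchanged.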

Supported Audio Formats

The following audio formats are supported when transcribing uploaded files:

MP3 — MPEG Audio Layer III
WAV — Waveform Audio File Format
M4A — MPEG-4 Audio
WEBM — WebM audio (used by browser recording)
OGG — Ogg Vorbis
FLAC — Free Lossless Audio Codec
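A client could reject unsupported files before uploading them. This sketch checks only the file extension against the formats listed above; `isSupportedAudioFile` is a hypothetical helper, and a real upload path may also inspect the MIME type or file contents.

```typescript
// Extensions from the supported-formats list; compared case-insensitively.
const SUPPORTED_AUDIO_EXTENSIONS = new Set(["mp3", "wav", "m4a", "webm", "ogg", "flac"]);

function isSupportedAudioFile(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  // No dot, a leading-dot name like ".mp3", or a trailing dot: no usable extension.
  if (dot <= 0 || dot === fileName.length - 1) return false;
  return SUPPORTED_AUDIO_EXTENSIONS.has(fileName.slice(dot + 1).toLowerCase());
}
```

With this check, `meeting.M4A` is accepted and `notes.txt` is rejected.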

Audio Document Transcription

Beyond chat voice input, the speech-to-text engine is also used in the document ingestion pipeline to transcribe uploaded audio and video files. When you upload an audio file to a data lake, it is automatically transcribed, and the transcript is chunked and indexed like any other document.
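The chunking step can be illustrated with a word-window splitter that overlaps consecutive chunks, so a sentence cut at one boundary still appears whole in a neighboring chunk. This is a generic sketch: `chunkTranscript` and its default sizes are assumptions, and OpenRails may chunk by tokens, sentences, or timestamps instead.

```typescript
// Splits a transcript into overlapping windows of words.
// Default sizes are illustrative, not OpenRails settings.
function chunkTranscript(transcript: string, chunkSize = 200, overlap = 40): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const words = transcript.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // this window reached the end
  }
  return chunks;
}
```

For instance, a seven-word transcript with `chunkSize = 4` and `overlap = 1` yields two chunks that share the fourth word; each chunk would then be indexed like a chunk from any other document.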

Tip: For best transcription accuracy, use a good-quality microphone and speak in a quiet environment. The transcription engine supports multiple languages and automatically detects the spoken language.

Important: When using Remote mode, audio data is sent to a cloud API. For sensitive or regulated environments, use Local mode to keep all audio data on-premises.
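To show what leaves your network in Remote mode, here is a sketch that builds, without sending, a multipart request to OpenAI's public audio transcription endpoint. That OpenRails uses this endpoint and the `whisper-1` model is an assumption, and `buildTranscriptionRequest` is a hypothetical helper. It relies on the `Request`, `FormData`, and `Blob` globals available in browsers and Node.js 18+.

```typescript
// OpenAI's audio transcription endpoint (assumed target of Remote mode).
const OPENAI_TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions";

// Builds the request only; nothing is sent until it is passed to fetch().
function buildTranscriptionRequest(apiKey: string, audio: Blob, fileName: string): Request {
  const form = new FormData();
  form.append("file", audio, fileName); // the raw audio that leaves your network
  form.append("model", "whisper-1");    // assumed model name
  // No "language" field is set, so the service detects the spoken language.
  return new Request(OPENAI_TRANSCRIPTIONS_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
}
```

Sending it with `await fetch(req)` returns JSON whose `text` field holds the transcript. In Local mode, no request of this kind is made.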

Next Steps

Starting a Conversation — Learn the basics of chat in OpenRails
Using RAG Context — Combine voice input with RAG-enhanced responses
Uploading Documents — Upload audio files for transcription and indexing