AI Chat

Multi-LLM intelligent chat with RAG context, audio transcription, session management, and embeddable widgets


Overview

OpenRails AI Chat is the primary user-facing interface of the platform. It provides a rich, real-time conversational experience powered by multiple large language models, enhanced with retrieval-augmented generation (RAG) from your organization's knowledge base. Users can chat with AI that has full context of ingested documents, receive source-cited answers, transcribe audio, and interact through an embeddable widget on external sites.

Key Value: Unlike single-vendor chatbots, OpenRails AI Chat lets you route conversations to the best model for each task — use local self-hosted models for sensitive data, OpenAI for creative tasks, and Anthropic for analytical work — all within the same interface.


Key Capabilities

Multi-LLM Routing

Support for multiple LLM providers including local self-hosted models, OpenAI (GPT-4o, GPT-4), Anthropic (Claude), and Google (Gemini). Switch models per conversation or per project. No code changes required when switching providers.
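A provider abstraction of this kind can be sketched as follows. All interface, class, and method names here are illustrative stand-ins, not the actual OpenRails API:

```typescript
// Hypothetical sketch of a unified LLM provider interface. The router picks a
// provider per conversation or project; application code never changes when a
// new provider is registered.
interface LLMProvider {
  readonly name: string;
  complete(prompt: string): Promise<string>;
}

class LocalProvider implements LLMProvider {
  readonly name = "local";
  async complete(prompt: string): Promise<string> {
    return `[local] echo: ${prompt}`; // stand-in for a self-hosted model call
  }
}

class OpenAIProvider implements LLMProvider {
  readonly name = "openai";
  async complete(prompt: string): Promise<string> {
    return `[openai] echo: ${prompt}`; // stand-in for an OpenAI API call
  }
}

class LLMRouter {
  private providers = new Map<string, LLMProvider>();

  register(p: LLMProvider): void {
    this.providers.set(p.name, p);
  }

  // Model selection happens here (per conversation or per project),
  // so swapping providers requires no changes elsewhere.
  route(name: string): LLMProvider {
    const p = this.providers.get(name);
    if (!p) throw new Error(`unknown provider: ${name}`);
    return p;
  }
}
```

The value of the pattern is that sensitive-data conversations can be pinned to the local provider while others route to hosted models, with the choice made at configuration time rather than in code.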

RAG-Powered Context

Every response can be enriched with context from ingested documents. Dual retrieval using vector search and knowledge graph ensures accurate, relationship-aware answers with source citations.
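The merge step of dual retrieval can be sketched like this (types and function names are hypothetical, and the scoring is simplified): results from vector search and the knowledge graph are de-duplicated, ranked, and each passage keeps its source for citation.

```typescript
// Illustrative merge of dual-retrieval results. Each passage carries a
// `source` field that becomes the citation shown alongside the answer.
interface Passage {
  id: string;
  text: string;
  source: string;
  score: number;
}

function mergeRetrieval(
  vectorHits: Passage[],
  graphHits: Passage[],
  topK = 3,
): Passage[] {
  const seen = new Map<string, Passage>();
  for (const p of [...vectorHits, ...graphHits]) {
    const prev = seen.get(p.id);
    if (!prev || p.score > prev.score) seen.set(p.id, p); // keep best score per passage
  }
  return [...seen.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

The merged passages are prepended to the prompt as context, which is what makes the answers both relationship-aware (graph hits) and semantically relevant (vector hits).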

Audio Transcription

Built-in speech-to-text integration for audio transcription. Users can send voice messages that are automatically transcribed and processed by the AI. Supports multiple languages and audio formats.
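The pre-processing step can be pictured as follows. This is a sketch under assumptions: the `transcribe` stub stands in for whatever speech-to-text engine the deployment uses, and the supported-format list mirrors the WAV/MP3/M4A formats mentioned in this sheet.

```typescript
// Hypothetical pre-processing step: voice messages are transcribed before
// entering the normal chat pipeline; text messages pass through unchanged.
const SUPPORTED_AUDIO = new Set(["wav", "mp3", "m4a"]);

function isSupportedAudio(filename: string): boolean {
  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
  return SUPPORTED_AUDIO.has(ext);
}

async function handleMessage(msg: { text?: string; audioFile?: string }): Promise<string> {
  if (msg.audioFile) {
    if (!isSupportedAudio(msg.audioFile)) throw new Error("unsupported audio format");
    return transcribe(msg.audioFile);
  }
  return msg.text ?? "";
}

async function transcribe(file: string): Promise<string> {
  return `[transcript of ${file}]`; // placeholder for a real STT engine call
}
```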

Session Management

Persistent conversation history with search, bookmarking, and export. Conversations are organized by project and can be shared with team members while respecting access controls.

Widget Embedding

Deploy AI chat on any website with a single script tag. JWT-authenticated, domain-whitelisted, and fully customizable with your brand colors and avatar. Full details on the Widget Embedding feature sheet.

Real-Time Streaming

Token-by-token streaming via WebSocket for instant responsiveness. Users see answers as they are generated, with progress indicators and the ability to stop generation mid-stream.
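The stop-mid-stream behavior can be modeled as an async iterator plus an `AbortSignal`. This is an illustrative sketch, not the OpenRails client code; the real transport is a WebSocket, abstracted here as a token source.

```typescript
// Sketch of token-by-token streaming with mid-stream cancellation.
// `streamTokens` stands in for tokens arriving over a WebSocket.
async function* streamTokens(
  tokens: string[],
  signal: AbortSignal,
): AsyncGenerator<string> {
  for (const t of tokens) {
    if (signal.aborted) return; // user pressed "stop generation"
    yield t;
  }
}

// Consume the stream, stopping after `stopAfter` tokens to demonstrate
// the cancellation path.
async function collect(tokens: string[], stopAfter: number): Promise<string> {
  const ctrl = new AbortController();
  let out = "";
  let n = 0;
  for await (const t of streamTokens(tokens, ctrl.signal)) {
    out += t; // in a UI, each token would be appended to the visible answer
    if (++n >= stopAfter) ctrl.abort();
  }
  return out;
}
```

In the UI this is what makes answers appear as they are generated: each yielded token is appended to the visible response, and aborting the signal ends generation immediately.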


Technical Highlights

| Feature | Implementation | Details |
| --- | --- | --- |
| LLM Abstraction | Unified provider interface | Swap models without application changes; per-project model configuration |
| Streaming | WebSocket token streaming | Token-by-token delivery with progress indicators and mid-stream stop |
| RAG Pipeline | Vector + Knowledge Graph | Dual retrieval for accurate, relevant answers |
| Audio | Speech-to-Text Engine | Multi-language transcription; WAV/MP3/M4A support |
| Context Window | Automatic management | Smart truncation and summarization for long conversations |
| File Attachments | Inline processing | Attach documents directly in chat for instant analysis |

Use Cases


Internal Help Desk

Employees ask questions against company documentation and get cited answers instantly


Customer Support

Embed the widget on support pages for AI-powered self-service with escalation to agents


Training Assistant

New employees interact with a knowledge-rich AI trained on company procedures and policies


Legal Research

Lawyers query case law and contracts with semantic search and relationship-aware retrieval


Data Analysis

Analysts ask questions about ingested reports and receive summarized insights with sources


Multilingual Support

Leverage multi-language LLMs and speech-to-text transcription for global team collaboration


Chat Architecture Flow

User Input → Context Assembly (RAG retrieval + conversation history) → LLM Router (model selection) → Streaming Response → Source Citations

Optional: Audio transcription (speech-to-text) before context assembly | PII de-identification before LLM call
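The flow above can be sketched as composed stages. Every function here is a toy stand-in (including the PII scrub, which only masks an SSN-shaped pattern); the point is the ordering of the pipeline, not the implementations.

```typescript
// End-to-end sketch of the chat flow as a sequence of stages applied in order:
// PII de-identification, then context assembly, then routing to a model.
type Stage = (input: string) => string;

// Toy de-identification: mask SSN-shaped numbers before the LLM call.
const deidentifyPII: Stage = (s) =>
  s.replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[REDACTED]");

// Stand-in for RAG retrieval + conversation history assembly.
const assembleContext: Stage = (s) => `context+history | ${s}`;

// Stand-in for the LLM router selecting a model.
const routeToModel: Stage = (s) => `model(local) <- ${s}`;

function chatPipeline(userInput: string): string {
  return [deidentifyPII, assembleContext, routeToModel].reduce(
    (acc, stage) => stage(acc),
    userInput,
  );
}
```

Because de-identification runs first, nothing downstream (context assembly, the model, or the streamed response) ever sees the raw sensitive value.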

Related Feature Sheets