How RAG Collections Work

Understanding how OpenRails processes and retrieves your documents

Overview

When you create a data lake and upload documents, OpenRails automatically handles the heavy lifting — parsing, chunking, embedding, and indexing your content so it can be searched and referenced by your chatbots and agents. You don't need to configure individual collections manually.

What Happens When You Upload Documents

Documents Are Parsed

Each file is processed based on its format — text is extracted from PDFs, Word documents, and presentations. Images are OCR'd. Audio and video are transcribed.
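As a rough illustration of format-based processing, the sketch below routes a file to one of the three kinds of handling described above. The extension lists and the function name `processing_route` are assumptions made for this example; the formats OpenRails actually supports are not listed here.

```python
from pathlib import Path

# Hypothetical extension groups, chosen only for illustration.
EXTRACT_TEXT = {".pdf", ".docx", ".pptx"}   # PDFs, Word documents, presentations
OCR = {".png", ".jpg", ".jpeg"}             # images
TRANSCRIBE = {".mp3", ".wav", ".mp4"}       # audio and video


def processing_route(filename: str) -> str:
    """Return the kind of processing a file would get, based on its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in EXTRACT_TEXT:
        return "extract_text"
    if suffix in OCR:
        return "ocr"
    if suffix in TRANSCRIBE:
        return "transcribe"
    return "unsupported"
```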

Content Is Chunked

The extracted text is split into smaller segments sized for search and retrieval. The platform chunks content automatically, based on the content type.
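To make the idea of chunking concrete, here is a minimal sketch that splits text into overlapping windows of words. This is one common approach, not necessarily the strategy OpenRails uses; the function name `chunk_words` and the `max_words` and `overlap` values are assumptions for the example.

```python
def chunk_words(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (illustrative only)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap  # how far each new window advances
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which is why many chunkers repeat a little text between neighbors.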

Chunks Are Embedded and Indexed

Each chunk is converted into an embedding (a numeric vector that captures its meaning) and indexed in the knowledge base. This index is what lets your chatbots and agents find relevant information when answering questions.

Dual Retrieval

OpenRails uses two complementary search methods to find the most relevant content:

Semantic Search

Finds content that matches the meaning of the question, even when the exact words don't appear in the document.
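The sketch below shows the general idea behind semantic search: chunks and the question are compared as vectors, and the closest chunks are returned. The three-number vectors, the chunk names, and the helper `top_k` are invented for illustration; a real system would get its vectors from an embedding model, and the model and index OpenRails uses are not described here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks most similar to the query vector."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in index.items()]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Hand-made toy vectors standing in for real embeddings.
index = {
    "vacation-policy": [0.9, 0.1, 0.0],
    "expense-report": [0.1, 0.9, 0.1],
    "api-rate-limits": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]  # stands in for "How much time off do I get?"
results = top_k(query, index)
```

The question never uses the word "vacation", yet its vector points in nearly the same direction as the vacation policy chunk, so that chunk ranks first.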

Knowledge Graph

Understands relationships between people, organizations, concepts, and events — answering questions that require connecting information across documents.
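As a minimal sketch of how a knowledge graph can connect information across documents, the example below stores relationships as triples and follows them hop by hop. The entities, relations, and file names are made up, and `find_path` is a plain breadth-first search written for this example, not the OpenRails implementation.

```python
from collections import defaultdict, deque

# (subject, relation, object, source document); all values are invented.
TRIPLES = [
    ("Alice", "works_for", "Acme", "org-chart.pdf"),
    ("Acme", "acquired", "Birch Labs", "press-release.docx"),
    ("Birch Labs", "develops", "Widget X", "product-sheet.pdf"),
]


def build_graph(triples):
    """Index triples by subject so outgoing relationships are easy to follow."""
    graph = defaultdict(list)
    for subject, relation, obj, doc in triples:
        graph[subject].append((relation, obj, doc))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the hops from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, obj, doc in graph.get(node, []):
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, path + [(node, relation, obj, doc)]))
    return None


graph = build_graph(TRIPLES)
path = find_path(graph, "Alice", "Widget X")
```

Here a question such as "How is Alice connected to Widget X?" is answered by chaining three facts, each of which came from a different document.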

Both methods run automatically when a user asks a question. Results are combined and filtered by the user's security tier before being passed to the AI model.
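The combine-then-filter step can be pictured with the sketch below. It rests on several assumptions that this page does not confirm: tiers are modeled as integers where a higher number means more restricted, duplicate chunks keep their best score, and results are ordered by score. The names `Hit` and `combine_and_filter` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    chunk_id: str
    score: float
    tier: int  # assumption: higher number = more restricted


def combine_and_filter(semantic: list[Hit], graph: list[Hit], user_tier: int) -> list[Hit]:
    """Merge both result lists, then drop anything above the user's tier."""
    best: dict[str, Hit] = {}
    for hit in [*semantic, *graph]:
        current = best.get(hit.chunk_id)
        if current is None or hit.score > current.score:
            best[hit.chunk_id] = hit  # keep the higher-scoring duplicate
    allowed = [hit for hit in best.values() if hit.tier <= user_tier]
    return sorted(allowed, key=lambda hit: hit.score, reverse=True)


semantic_hits = [Hit("a", 0.9, 1), Hit("b", 0.7, 3)]
graph_hits = [Hit("a", 0.6, 1), Hit("c", 0.8, 2)]
visible = combine_and_filter(semantic_hits, graph_hits, user_tier=2)
```

Because filtering happens before anything reaches the AI model, chunk "b" never appears in the prompt for a tier 2 user, so the model cannot quote or paraphrase it.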

Best Practices

Organize by topic: Create separate data lakes for different knowledge domains (e.g., HR policies, product documentation, customer support). This keeps retrieval focused and relevant.
Keep documents current: When source documents are updated, re-upload them to the data lake. The platform will re-process and re-index the content automatically.
Use security tiers: Assign appropriate security tiers to data lakes containing sensitive content. This ensures that restricted documents are never surfaced to unauthorized users, even through AI responses.

Next Steps