Uploading Documents

Upload files to OpenRails with support for batch processing, chunked uploads, and multiple formats.

Overview

OpenRails supports uploading a wide range of document types, including PDF, DOCX, PPTX, images, video, and audio files. Uploaded documents are processed automatically by the ingestion pipeline, where they are parsed, chunked, embedded, and indexed for retrieval-augmented generation (RAG).

Supported Formats

Category  | Formats                                         | Processing
Documents | PDF, DOCX, PPTX, TXT, CSV, JSON, HTML, Markdown | Text extraction and chunking
Images    | PNG, JPG, JPEG, TIFF, BMP, GIF                  | OCR for text extraction
Video     | MP4, AVI, MOV, MKV, WEBM                        | Speech-to-text transcription
Audio     | MP3, WAV, M4A, OGG, FLAC, WEBM                  | Speech-to-text transcription
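If you want to check files locally before uploading, the table above can be expressed as a simple lookup. The following is a minimal sketch, not part of OpenRails: the function name categories_for is hypothetical, and the "md" and "markdown" extensions are an assumption for the Markdown entry. Because WEBM is listed under both Video and Audio, the lookup returns every matching category rather than guessing.

```python
from pathlib import Path

# Extension-to-category map transcribed from the Supported Formats table.
# "md" and "markdown" are assumed extensions for the "Markdown" entry.
FORMATS = {
    "documents": {"pdf", "docx", "pptx", "txt", "csv", "json", "html", "md", "markdown"},
    "images": {"png", "jpg", "jpeg", "tiff", "bmp", "gif"},
    "video": {"mp4", "avi", "mov", "mkv", "webm"},
    "audio": {"mp3", "wav", "m4a", "ogg", "flac", "webm"},
}

# Processing applied to each category, as stated in the table.
PROCESSING = {
    "documents": "text extraction and chunking",
    "images": "OCR for text extraction",
    "video": "speech-to-text transcription",
    "audio": "speech-to-text transcription",
}


def categories_for(filename):
    """Return every category whose format list contains the file's extension."""
    ext = Path(filename).suffix.lower().lstrip(".")
    return [category for category, exts in FORMATS.items() if ext in exts]


if __name__ == "__main__":
    for name in ["report.PDF", "clip.webm", "archive.zip"]:
        cats = categories_for(name)
        print(name, "->", [(c, PROCESSING[c]) for c in cats] or "unsupported")
```

An empty result means the extension does not appear in the table, so the file is probably worth converting before upload.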

Upload Documents

Navigate to a Data Lake

From the sidebar, go to Data Lakes and select the data lake where you want to upload documents.

Click "Upload Files"

Click the Upload Files button or drag and drop files directly onto the upload area.

Select Files

Choose one or more files from your file system. You can upload multiple files per request.

Configure Batch Settings

For large uploads, configure batch processing:

  • Batch Size: the number of files in each batch (configurable)
  • Max Batch Size: the maximum total size of the files in each batch (configurable)
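To illustrate how the two settings interact, here is a rough sketch of grouping files so that each batch respects both a file-count limit and a total-size limit. It is an assumption about the general technique, not a description of how OpenRails batches internally; plan_batches, batch_size, and max_batch_bytes are hypothetical names.

```python
from typing import Iterable, Iterator, List, Tuple


def plan_batches(
    files: Iterable[Tuple[str, int]],
    batch_size: int,
    max_batch_bytes: int,
) -> Iterator[List[str]]:
    """Group (name, size_in_bytes) pairs into batches.

    A batch is closed as soon as adding the next file would exceed either
    the file-count limit or the total-size limit. A single file larger than
    max_batch_bytes is placed in a batch of its own.
    """
    batch: List[str] = []
    batch_bytes = 0
    for name, size in files:
        too_many = len(batch) >= batch_size
        too_big = batch_bytes + size > max_batch_bytes
        if batch and (too_many or too_big):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(name)
        batch_bytes += size
    if batch:
        yield batch


if __name__ == "__main__":
    files = [("a.pdf", 40), ("b.pdf", 40), ("c.pdf", 40), ("d.pdf", 10)]
    for i, batch in enumerate(plan_batches(files, batch_size=2, max_batch_bytes=100), 1):
        print(f"batch {i}: {batch}")
```

The sketch keeps files in their original order; sorting by size first would pack batches more tightly but changes the upload order.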

Monitor Upload Progress

The upload progress bar shows the status of each file. Large files use chunked uploads automatically, splitting the file into smaller segments for reliable transfer.
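Chunking happens automatically in the UI, but the idea can be sketched in a few lines. The example below only shows the general technique of splitting a byte stream into fixed-size segments with their offsets; the function name iter_chunks and the chunk size are illustrative assumptions, and the actual segment size and transfer protocol used by OpenRails are not specified here.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) segments of at most chunk_size bytes each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while True:
        data = stream.read(chunk_size)
        if not data:  # end of stream
            break
        yield offset, data
        offset += len(data)


if __name__ == "__main__":
    # In-memory stand-in for a large file opened with open(path, "rb").
    payload = io.BytesIO(b"abcdefghij")
    for offset, data in iter_chunks(payload, chunk_size=4):
        print(offset, data)
```

Tracking the offset of each segment is what makes a failed segment retryable on its own, without restarting the whole transfer.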

Verify Ingestion

After upload completes, files enter the ingestion pipeline. Check the Documents tab in your data lake to see ingestion status for each file.
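If you track ingestion from a script rather than the Documents tab, the check amounts to a polling loop. The sketch below is hypothetical throughout: this page does not document a status API, so get_status is a callable you would supply yourself, and the status strings "completed" and "failed" are placeholder assumptions rather than actual OpenRails values.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def wait_for_ingestion(
    file_ids: Iterable[str],
    get_status: Callable[[str], str],  # hypothetical: you provide the lookup
    terminal: Tuple[str, ...] = ("completed", "failed"),  # assumed status names
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, str]:
    """Poll each file until it reaches a terminal status or the timeout expires.

    Files still pending at the deadline are reported as "timeout".
    """
    pending = set(file_ids)
    results: Dict[str, str] = {}
    deadline = clock() + timeout
    while pending:
        for file_id in sorted(pending):
            status = get_status(file_id)
            if status in terminal:
                results[file_id] = status
        pending.difference_update(results)
        if not pending or clock() >= deadline:
            break
        sleep(interval)
    for file_id in pending:
        results[file_id] = "timeout"
    return results
```

A generous timeout is advisable for video and audio, since transcription can take much longer than text extraction.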

Upload Limits

Upload limits are configurable per deployment. Large files are chunked automatically for reliable transfer. For large collections, split the upload into multiple batches to improve processing throughput.

Tip: For best results, upload well-formatted documents with clear headings and structure. This improves chunking quality and RAG retrieval accuracy.

Important: Large video and audio files can take significant time to transcribe. Monitor the ingestion status in the data lake to track progress.

Next Steps