Embedding Harness

Upload regulatory documents, compliance frameworks, or any structured text and get back RAG-ready JSONL chunks with full normalisation, section-aware chunking, and provenance metadata.

Extract

Parse PDF, DOCX, YAML, JSON, Markdown and plain text into clean UTF-8.

Normalise

OCR cleanup, heading promotion, paragraph splitting, and markdown standardisation.
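
As a rough illustration of what the OCR cleanup step might involve, the sketch below folds ligatures, rejoins words hyphenated across line breaks, and collapses stray whitespace. The function name `normalise` and the specific rules are assumptions for illustration, not the pipeline's actual implementation.

```python
import re
import unicodedata


def normalise(text):
    """Hypothetical OCR cleanup pass; the rules here are illustrative assumptions."""
    # NFKC folds compatibility characters, e.g. the ligature "ﬁ" becomes "fi".
    text = unicodedata.normalize("NFKC", text)
    # Use a single newline convention.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs, then trim spaces around newlines.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r" ?\n ?", "\n", text)
    # Keep at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Note that the hyphen rule also merges genuine compounds that happen to break at a line end, so a real cleanup pass would likely need a dictionary check or similar safeguard.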

Chunk

Section-aware splitting with configurable word targets, heading depth, and preamble.
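
A minimal sketch of section-aware splitting, assuming the input is normalised Markdown with `#`-style headings. The names `chunk_markdown`, `target_words`, and `max_depth` are hypothetical stand-ins for the configurable options mentioned above, and treating text before the first heading as a preamble with an empty heading path is also an assumption.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.+)$")


@dataclass
class Chunk:
    heading_path: tuple  # headings leading to this chunk, outermost first
    text: str


def pack_paragraphs(paragraphs, target_words):
    """Greedily group paragraphs so each group stays near target_words."""
    groups, current, count = [], [], 0
    for para in paragraphs:
        words = len(para.split())
        if current and count + words > target_words:
            groups.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        groups.append("\n\n".join(current))
    return groups


def chunk_markdown(md, target_words=200, max_depth=2):
    """Split at headings up to max_depth, then pack paragraphs per section."""
    chunks, path, body = [], [], []

    def flush():
        raw = re.split(r"\n\s*\n", "\n".join(body))
        paragraphs = [p.strip() for p in raw if p.strip()]
        for text in pack_paragraphs(paragraphs, target_words):
            chunks.append(Chunk(tuple(path), text))
        body.clear()

    for line in md.splitlines():
        match = HEADING.match(line)
        if match and len(match.group(1)) <= max_depth:
            flush()  # close the previous section before starting a new one
            depth = len(match.group(1))
            del path[depth - 1:]  # drop headings at this depth or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)  # deeper headings stay in the body text
    flush()
    return chunks
```

A single paragraph longer than `target_words` is kept whole in this sketch; chunks never cross a section boundary, which is what keeps each one attributable to a single heading path.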

Export

Download JSONL chunks or normalised Markdown. Results are also pushed to S3.
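
To make the JSONL output concrete, here is a hedged sketch of how chunks could be serialised with provenance metadata, one JSON object per line. The field names (`id`, `source`, `chunk_index`, `sha256`, `text`) and the function `to_jsonl` are assumptions for illustration; the actual export schema may differ.

```python
import hashlib
import json


def to_jsonl(chunks, source_name):
    """Serialise chunk texts as JSONL; the record schema is hypothetical."""
    lines = []
    for index, text in enumerate(chunks):
        record = {
            "id": f"{source_name}#{index}",
            "source": source_name,
            "chunk_index": index,
            # A content hash lets consumers detect changed or duplicate chunks.
            "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
            "text": text,
        }
        # ensure_ascii=False keeps non-ASCII text readable in the output.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)
```

Each line parses independently with `json.loads`, so a downstream loader can stream the file without reading it all into memory.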

How it works
1. Upload files or paste text
2. Pipeline processes automatically
3. Download RAG-ready JSONL

Supported formats: PDF, DOCX, YAML, JSON, Markdown, Plain Text

Connect with your API key

Paste the API key provided by your Ontic admin to unlock the pipeline.


Document Ingestion

Upload files or paste structured text to produce RAG-ready JSONL chunks.

Upload Files

Drag and drop or browse for documents to process.

Paste Text

Alternatively, paste structured text, markdown, or raw content.

Pipeline Options

Results

Step | Duration | Detail