rag-agent

Local-first hybrid RAG engine

rag-agent is a single-process Go service that ingests your documents, indexes them for hybrid BM25 and vector retrieval, and generates grounded answers through local or self-hosted LLM endpoints.

Try it live

Ask a question against a real rag-agent instance. First see ranked evidence from /retrieve, then a grounded answer streamed from /search.

Evidence

Ranked excerpts from GET /retrieve (no LLM).

Answer

Grounded response streamed from GET /search.
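
As a rough illustration of the evidence step, the sketch below builds a GET /retrieve request URL in Go. The page only names the endpoint, so the query parameter names (q, top_k), the host, and the port are assumptions made for this example and may not match the real API.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// buildRetrieveURL builds a GET /retrieve URL for a rag-agent instance.
// ASSUMPTION: the parameter names "q" and "top_k" are hypothetical;
// the source only states that /retrieve exists and is called with GET.
func buildRetrieveURL(base, question string, topK int) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/retrieve"

	params := url.Values{}
	params.Set("q", question)
	params.Set("top_k", strconv.Itoa(topK))
	u.RawQuery = params.Encode() // keys are emitted in sorted order

	return u.String(), nil
}

func main() {
	// ASSUMPTION: localhost:8080 is a placeholder address, not a documented default.
	u, err := buildRetrieveURL("http://localhost:8080", "notice period", 8)
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

The resulting URL can be passed to http.Get from the standard library. Because /retrieve involves no LLM, the response should contain only ranked excerpts.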

Capabilities

Hybrid BM25 + vector retrieval with tunable fusion (bm25_k, vector_k, top_k)
/retrieve endpoint for auditable, citation-ready evidence excerpts
/search endpoint for grounded LLM-generated answers
Markdown and HTML ingestion with structure-aware chunking
Ollama, LM Studio, and any OpenAI-compatible LLM endpoint
Built-in eval: Recall@k and MRR against gold sets
Optional 9P file-tree API for scriptable Unix workflows
Pluggable lexical engines: Bleve, Tantivy, or in-memory BM25
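
To make the hybrid fusion capability concrete, here is a minimal Go sketch of reciprocal rank fusion (RRF), one common way to merge a BM25 ranking with a vector ranking. The page does not say which fusion method rag-agent uses, so RRF, the function name fuseRRF, and the constant 60 are assumptions for illustration only. The topK argument plays the role of top_k.

```go
package main

import (
	"fmt"
	"sort"
)

// fuseRRF merges two ranked lists of document IDs with reciprocal rank fusion.
// Each document scores the sum of 1/(k+rank) over the lists it appears in.
// ASSUMPTION: this is a generic technique, not rag-agent's confirmed method.
// IDs are assumed to be unique within each list.
func fuseRRF(bm25, vector []string, k float64, topK int) []string {
	scores := map[string]float64{}
	for _, list := range [][]string{bm25, vector} {
		for i, id := range list {
			scores[id] += 1.0 / (k + float64(i+1)) // ranks start at 1
		}
	}

	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break by ID
	})

	// Truncate to the requested number of results.
	if topK >= 0 && len(ids) > topK {
		ids = ids[:topK]
	}
	return ids
}

func main() {
	bm25 := []string{"a", "b", "c"}   // toy lexical ranking
	vector := []string{"b", "d", "a"} // toy vector ranking
	fmt.Println(fuseRRF(bm25, vector, 60, 3))
}
```

Documents that rank well in both lists rise to the top, which is the usual reason for combining lexical and vector retrieval.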

Benchmark

Recall@8 1.000 · MRR 0.875 on the public gold set (BM25-only baseline, reproducible with eval/ fixtures).
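
For readers unfamiliar with these metrics, the Go sketch below computes Recall@k and reciprocal rank using their standard definitions. It is not taken from the rag-agent eval code. The function names and the toy data are hypothetical and do not reproduce the benchmark above.

```go
package main

import "fmt"

// recallAtK returns the fraction of relevant IDs found in the first k results.
// Ranked IDs are assumed to be unique.
func recallAtK(ranked []string, relevant map[string]bool, k int) float64 {
	if len(relevant) == 0 || k <= 0 {
		return 0
	}
	if k > len(ranked) {
		k = len(ranked)
	}
	hits := 0
	for _, id := range ranked[:k] {
		if relevant[id] {
			hits++
		}
	}
	return float64(hits) / float64(len(relevant))
}

// reciprocalRank returns 1/rank of the first relevant result, or 0 if none.
func reciprocalRank(ranked []string, relevant map[string]bool) float64 {
	for i, id := range ranked {
		if relevant[id] {
			return 1.0 / float64(i+1)
		}
	}
	return 0
}

func main() {
	// Toy gold set with two queries (hypothetical data).
	rankings := [][]string{{"d1", "d2", "d3"}, {"d4", "d5", "d6"}}
	gold := []map[string]bool{{"d1": true}, {"d5": true}}

	var recallSum, rrSum float64
	for i := range rankings {
		recallSum += recallAtK(rankings[i], gold[i], 8)
		rrSum += reciprocalRank(rankings[i], gold[i])
	}
	n := float64(len(rankings))
	// MRR is the mean of the per-query reciprocal ranks.
	fmt.Printf("Recall@8 %.3f, MRR %.3f\n", recallSum/n, rrSum/n)
}
```

A Recall@8 of 1.000 means every gold excerpt appeared within the top 8 results. An MRR below 1 means the first relevant excerpt was not always ranked first.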

Use cases

Internal legal knowledge shelf for policies and contracts
Controlled enterprise wiki assistant for engineering docs
On-prem retrieval layer embedded into an existing product

Who it is for

Legal and compliance teams in France/EU that cannot use cloud AI vendors
Platform and backend teams that need a local RAG sidecar
System integrators delivering sovereign AI deployments

Pilot

2–4 weeks · one corpus