Engine
rag-agent
Local-first hybrid RAG engine
rag-agent is a single-process Go service that ingests your documents, indexes them with hybrid BM25 and vector retrieval, and generates grounded answers using local or self-hosted LLM endpoints.
Capabilities
- Hybrid BM25 + vector retrieval with tunable fusion (bm25_k, vector_k, top_k)
- /retrieve endpoint for auditable, citation-ready evidence excerpts
- /search endpoint for grounded LLM-generated answers
- Markdown and HTML ingestion with structure-aware chunking
- Ollama, LM Studio, and any OpenAI-compatible LLM endpoint
- Built-in eval — Recall@k and MRR against gold sets
- Optional 9P file-tree API for scriptable Unix workflows
- Pluggable lexical engines: Bleve, Tantivy, or in-memory BM25
Benchmark
1.000 Recall@8
0.875 MRR
Recall@8 1.000 · MRR 0.875 on the public gold set (BM25-only baseline, reproducible with eval/ fixtures).
Use cases
- Internal legal knowledge shelf for policies and contracts
- Controlled enterprise wiki assistant for engineering docs
- On-prem retrieval layer embedded into an existing product
Who it is for
- Legal and compliance teams in France/EU that cannot use cloud AI vendors
- Platform and backend teams that need a local RAG sidecar
- System integrators delivering sovereign AI deployments
Pilot
2–4 weeks · one corpus