Engine

rag-agent

Local-first hybrid RAG engine

rag-agent is a single-process Go service that ingests your documents, indexes them for hybrid BM25 and vector retrieval, and generates grounded answers using local or self-hosted LLM endpoints.

  • Hybrid BM25 + vector retrieval with tunable fusion (bm25_k, vector_k, top_k)
  • /retrieve endpoint for auditable, citation-ready evidence excerpts
  • /search endpoint for grounded LLM-generated answers
  • Markdown and HTML ingestion with structure-aware chunking
  • Ollama, LM Studio, and any OpenAI-compatible LLM endpoint
  • Built-in eval: Recall@k and MRR against gold sets
  • Optional 9P file-tree API for scriptable Unix workflows
  • Pluggable lexical engines: Bleve, Tantivy, or in-memory BM25

Recall@8 1.000 · MRR 0.875 on the public gold set (BM25-only baseline, reproducible with eval/ fixtures).

  • Internal legal knowledge shelf for policies and contracts
  • Controlled enterprise wiki assistant for engineering docs
  • On-prem retrieval layer embedded into an existing product

  • Legal and compliance teams in France/EU that cannot use cloud AI vendors
  • Platform and backend teams that need a local RAG sidecar
  • System integrators delivering sovereign AI deployments

Pilot

2–4 weeks · one corpus

Start a pilot