How it works

Reasoning, not retrieval.

Most tools answer questions about your data by retrieving chunks of it, or by looping a model over your files one tool call at a time. RLM does something different: the model writes a program, runs it in a sandbox over your data, and only the answer comes back.

vs. coding agents

Agents loop. RLM writes code.

Coding agents (Claude Code, Codex, Pi) run a tool-call loop: read a file, grep, run a command, with each result streamed back into the model's context window. That's powerful for editing a codebase, but the context window is the ceiling: the agent can't reason over more data than fits in it.

RLM writes a program that is the loop. It filters and aggregates in the sandbox and fans out sub-calls (llm_query) over slices—so the data never has to fit in one context window.

Tool-loop agent (Claude Code · Codex · Pi)

LLM → read_file, grep, bash → every result returns to the context window

Context window:
file1.md … (4k tokens)
grep results … (2k)
file2.md … (6k)
file3.md … ⚠ window full

Data larger than the context window can't be reasoned over, because it never fits.
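The ceiling can be seen with a toy model of the loop. This is only a sketch: the 16k window and the 8k size of file3.md are hypothetical values chosen for illustration, and `run_tool_loop` is an invented helper, not part of any agent's API.

```python
# Toy model of a tool-call loop: every tool result is appended to the context.
# CONTEXT_WINDOW_TOKENS and the size of file3.md are assumptions, not real limits.
CONTEXT_WINDOW_TOKENS = 16_000

tool_results = [
    ("file1.md", 4_000),
    ("grep results", 2_000),
    ("file2.md", 6_000),
    ("file3.md", 8_000),  # hypothetical size
]


def run_tool_loop(results, window):
    """Append results to the context until the next one would overflow it."""
    used = 0
    loaded = []
    for name, tokens in results:
        if used + tokens > window:
            # This result does not fit, so the loop cannot reason over it.
            return loaded, used, name
        used += tokens
        loaded.append(name)
    return loaded, used, None


loaded, used, overflow = run_tool_loop(tool_results, CONTEXT_WINDOW_TOKENS)
print(f"loaded={loaded} used={used} overflow={overflow}")
```

Under these assumed numbers the first three results fit and the fourth is the one that overflows, matching the picture above.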

RLM (ModelRelay)

LLM writes a program ↓

# runs in sandbox
rows = query("…")
hits = grep(docs, q)
parts = llm_batch(hits)
answer["ready"] = summarize(parts)

only the answer returns

Answer

Intermediate data stayed in the sandbox—out of the context window.

Scales past the context window: the code does the reduction.
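A minimal sketch of that reduction, under stated assumptions: `grep`, `slices`, and `answer_question` are hypothetical helpers written for this example, and `stub_llm_query` is a deterministic stand-in for a model sub-call, not the real `llm_query` signature.

```python
from typing import Callable, Iterator, List


def grep(docs: List[str], needle: str) -> List[str]:
    """Filter in the sandbox: keep only documents that mention the needle."""
    return [d for d in docs if needle.lower() in d.lower()]


def slices(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices small enough for one sub-call."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def stub_llm_query(prompt: str) -> str:
    """Hypothetical stand-in for a model sub-call; reports how many lines it saw."""
    return f"{len(prompt.splitlines())} timeout reports"


def answer_question(
    docs: List[str],
    needle: str,
    slice_size: int,
    llm_query: Callable[[str], str],
) -> dict:
    hits = grep(docs, needle)  # filtering happens in code, not in the context
    # Fan out: one sub-call per slice, so no single call sees all the data.
    parts = [llm_query("\n".join(batch)) for batch in slices(hits, slice_size)]
    return {"hits": len(hits), "sub_calls": len(parts), "parts": parts}


# 30 toy documents, every third one mentions a timeout.
docs = [f"ticket {i}: {'timeout' if i % 3 == 0 else 'ok'}" for i in range(30)]
result = answer_question(docs, "timeout", slice_size=4, llm_query=stub_llm_query)
print(result)
```

Only `result` would travel back to the caller; the 30 documents and the filtered hits stay inside the program.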

vs. RAG & search

No retrieval pipeline.

Search and RAG tools (vector DBs, hybrid search like qmd) answer by retrieving: chunk everything, embed it, rank the top matches, and hope the answer is in there. You build and maintain that pipeline, and it only ever returns snippets.

RLM skips it. The model queries and reads your data directly, follows links and structure, and reasons across what it finds—no chunking, no embeddings, no reranking to maintain.

RAG / search pipeline

Chunk every document

Embed → vector DB

Retrieve top-k matches

Re-rank

Stuff into prompt

Returns snippets—not answers. You own the pipeline.

RLM

Inspect schema / files

Query & read what matters

Reason across results

Recurse with sub-calls

Returns an answer. Nothing to build or maintain.
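The first three RLM steps might look roughly like the following when the data is a SQL database. This is an illustrative sketch using Python's built-in sqlite3 module; the `orders` table and its rows are invented for the example, and the final reasoning step is plain code where a model sub-call could go.

```python
import sqlite3

# Invented sample data, standing in for a connected database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, total REAL);
    INSERT INTO orders (region, total) VALUES
        ('north', 120.0), ('north', 80.0), ('south', 50.0);
""")

# Step 1: inspect the schema instead of assuming it.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
columns = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]

# Step 2: query only what matters; the aggregation runs in the database.
rows = conn.execute(
    "SELECT region, SUM(total) FROM orders "
    "GROUP BY region ORDER BY SUM(total) DESC"
).fetchall()

# Step 3: reason across the reduced result and produce an answer.
top_region, top_total = rows[0]
answer = f"{top_region} leads with {top_total:.2f} in orders"
print(tables, columns, answer)
conn.close()
```

No chunking or embedding step appears anywhere: the program reads the structure, asks a targeted question, and returns a sentence rather than a list of snippets.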

At a glance

Where RLM fits.

RLM isn't a coding agent, a search engine, or a BI tool. It's a runtime you embed to reason over your data—across any model, in your own boundary.

	Coding agents	RAG / search	Text-to-SQL	RLM
Examples	Claude Code, Codex, Pi	qmd, vector DBs	Cortex, Genie	ModelRelay
Job	Act on a codebase	Retrieve snippets	One-shot query	Reason & answer
How	Tool-call loop	Chunk + embed + rank	NL → SQL → rows	Writes & runs code
Bigger than context?	No—window-bound	Only top-k	N/A	Yes—code reduces it
Models	Single lab	Varies	Warehouse-locked	Any lab
Shape	App you use	Library / index	BI tool	Embeddable runtime

These tools do different jobs more than they compete. RLM can even call a search index or run SQL as one tool inside its loop.

Reason over your data.

Connect a database or a folder of docs and ask anything.
