PageIndex and Claude Code: Vectorless RAG That Reasons Over Long Documents

Retrieval-Augmented Generation has become the default architecture for grounding language models in external knowledge, and the idea is sound: rather than baking every fact into model weights, retrieve relevant text at query time and hand it to the model as context. The problem is that most implementations retrieve poorly, because vector similarity search finds text that looks like the query, not text that actually answers it. For short factual questions over clean corpora that distinction barely matters. For long professional documents – SEC filings, technical specifications, hardware datasheets, regulatory submissions – it matters enormously, and it is where most production RAG systems quietly fall apart.

PageIndex takes a different approach. There is no vector database and no chunking. Instead, an LLM reads the document and builds a hierarchical tree index (essentially a table of contents with summaries); at query time, a second model reasons over that tree to find the right pages. I recently forked the project to add native integration with Claude Code via the Model Context Protocol and to make Claude the default model for both indexing and retrieval, which is what prompted this post.

Similarity is not relevance

The standard RAG pipeline cuts documents into fixed-size chunks – typically 512 to 1024 tokens – embeds them into a vector space, and retrieves chunks whose embeddings are nearest to the query embedding. There are two structural problems with this.

First, chunking is arbitrary. A chunk boundary may fall in the middle of a paragraph, splitting its context across two chunks that are retrieved independently. A section that answers the query may span four pages; the retriever returns one chunk from the middle of it. The document’s natural structure – headings, subsections, numbered clauses – is discarded.
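
A minimal sketch of the boundary problem, assuming whitespace-separated words as a stand-in for tokens. The `chunk_fixed` helper and the sample text are illustrative only and do not come from PageIndex or any particular RAG library:

```python
def chunk_fixed(text: str, size: int) -> list[str]:
    """Split text into chunks of `size` words, ignoring sentence and section boundaries."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


# Illustrative document: two section headings around one key sentence.
doc = (
    "Section 4. Capital allocation. "
    "The board has adopted a policy of returning 50% of free cash flow to shareholders. "
    "Section 5. Liquidity."
)

chunks = chunk_fixed(doc, size=8)
for n, chunk in enumerate(chunks):
    print(n, repr(chunk))
```

With an eight-word chunk size, the key sentence is cut across all three chunks, so no single retrieved chunk contains the full statement or the heading it belongs to.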

Second, embedding similarity measures how closely two pieces of text resemble each other in the model’s learned feature space, not whether one piece of text answers the question posed by the other. A query about “capital allocation policy” will score highly against every passage that mentions capital and allocation, including unrelated ones, and may miss the one passage that says “the board has adopted a policy of returning 50% of free cash flow to shareholders”, because that passage shares almost no vocabulary with the query and so may not land near it in embedding space.
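
The failure mode can be illustrated with a toy example. This sketch uses bag-of-words count vectors as a crude stand-in for learned embeddings (real embedding models capture far more semantics than word overlap), and the "lookalike" passage is invented for the illustration:

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lower-cased word counts; a toy substitute for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9%]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = "capital allocation policy"
passages = {
    # Shares surface vocabulary with the query but does not answer it (invented).
    "lookalike": "Capital expenditure and the allocation of office space are reviewed annually.",
    # Answers the query but shares only one word with it.
    "answer": "The board has adopted a policy of returning 50% of free cash flow to shareholders.",
}

scores = {name: cosine(bow(query), bow(text)) for name, text in passages.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

Under this toy measure the lookalike passage outranks the one that actually states the policy, which is the gap between similarity and relevance described above.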

PageIndex treats retrieval as a reasoning problem. The indexing step uses an LLM to read the document and produce a hierarchical tree structure – titles, page ranges, and summaries for each section, nested to reflect the document’s own organisation. Here is a fragment of what that output looks like for a Federal Reserve report:

{
  "title": "Financial Stability",
  "node_id": "0006",
  "start_index": 21,
  "end_index": 22,
  "summary": "The Federal Reserve's framework for monitoring systemic risk...",
  "nodes": [
    {
      "title": "Monitoring Financial Vulnerabilities",
      "node_id": "0007",
      "start_index": 22,
      "end_index": 28,
      "summary": "The Federal Reserve's monitoring of leverage, liquidity, and asset valuations..."
    },
    {
      "title": "Domestic and International Cooperation and Coordination",
      "node_id": "0008",
      "start_index": 28,
      "end_index": 31,
      "summary": "In 2023, the Federal Reserve collaborated with domestic and international..."
    }
  ]
}

The retrieval step gives an agent this tree – without any page text, just titles, page ranges, and summaries – and asks it to reason about which sections are relevant. The agent then fetches only those pages. No vectors, no approximate search; just an LLM reading a structured table of contents and deciding where to look.
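
As a rough sketch of what "reading the table of contents" involves, the tree fragment above can be rendered as a compact outline and the page ranges of chosen nodes looked up afterwards. The helper names (`walk`, `render_outline`, `pages_for`) are hypothetical and not part of the PageIndex API; the selection itself would be made by the model, so it is hard-coded here:

```python
from typing import Iterator

# The tree fragment shown above, with its summaries left truncated as in the source.
tree = {
    "title": "Financial Stability",
    "node_id": "0006",
    "start_index": 21,
    "end_index": 22,
    "summary": "The Federal Reserve's framework for monitoring systemic risk...",
    "nodes": [
        {
            "title": "Monitoring Financial Vulnerabilities",
            "node_id": "0007",
            "start_index": 22,
            "end_index": 28,
            "summary": "The Federal Reserve's monitoring of leverage, liquidity, and asset valuations...",
        },
        {
            "title": "Domestic and International Cooperation and Coordination",
            "node_id": "0008",
            "start_index": 28,
            "end_index": 31,
            "summary": "In 2023, the Federal Reserve collaborated with domestic and international...",
        },
    ],
}


def walk(node: dict, depth: int = 0) -> Iterator[tuple[int, dict]]:
    """Yield (depth, node) pairs in document order."""
    yield depth, node
    for child in node.get("nodes", []):
        yield from walk(child, depth + 1)


def render_outline(root: dict) -> str:
    """Render titles, page ranges, and summaries as an indented outline (no page text)."""
    lines = []
    for depth, node in walk(root):
        lines.append(
            f"{'  ' * depth}[{node['node_id']}] {node['title']} "
            f"(pp. {node['start_index']}-{node['end_index']}): {node['summary']}"
        )
    return "\n".join(lines)


def pages_for(root: dict, node_ids: set[str]) -> list[tuple[int, int]]:
    """Look up the page ranges of the nodes the model selected."""
    return [(n["start_index"], n["end_index"]) for _, n in walk(root) if n["node_id"] in node_ids]


print(render_outline(tree))
print(pages_for(tree, {"0007"}))  # stand-in for the model choosing node 0007
```

The outline is what the model would reason over; only the page ranges returned by the lookup are fetched afterwards.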

The benchmark result is striking. Mafin 2.5, a financial document QA system built on PageIndex, achieved 98.7% accuracy on FinanceBench, a benchmark of questions over SEC filings and earnings disclosures. Traditional vector RAG systems typically score in the 70-80% range on the same benchmark. The gap comes almost entirely from retrieval precision: vector search retrieves the wrong chunks; tree-based reasoning retrieves the right pages.

The Claude Code integration

PageIndex has always supported multiple LLM providers via LiteLLM. My fork adds two things to the upstream project: an MCP server (pageindex/mcp_server.py) that exposes the indexing and retrieval API as tools Claude Code can call directly in a session, and a switch to Anthropic Claude as the default provider – claude-sonnet-4-6 for indexing (where throughput matters) and claude-opus-4-6 for retrieval (where reasoning depth matters).

Setup is minimal. Add your API key and a workspace path to .env:

ANTHROPIC_API_KEY=your_anthropic_key_here
PAGEINDEX_WORKSPACE=/Users/yourname/.pageindex/workspace

Pre-index your documents before the session – indexing is LLM-driven and takes 30 to 120 seconds per PDF, so it is better done in advance:

python3 run_pageindex.py --dir_path /path/to/your/documents/

The workspace persists across sessions as JSON files, so you only pay the indexing cost once per document. Start Claude Code from the PageIndex directory and the MCP server registers automatically via .claude/settings.json:

claude

Five tools are now available to Claude inside the session:

index_document – index a PDF or Markdown file on demand, returns a doc_id
list_documents – show everything in the workspace with descriptions and page counts
get_document – fetch metadata for a specific document
get_document_structure – retrieve the full tree hierarchy without page text (token-efficient)
get_page_content – fetch text for specific page ranges ("5-7", "3,8", "12")
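
The page-range strings accepted by get_page_content could be expanded along the following lines. This is a hypothetical parser written for illustration, not the fork's actual code, and the handling of whitespace and descending ranges is an assumption:

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page spec such as "5-7", "3,8" or "12" into a sorted list of page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    return sorted(pages)


for spec in ("5-7", "3,8", "12"):
    print(spec, "->", parse_pages(spec))
```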

The intended workflow is: Claude calls get_document_structure to receive the tree, reasons over the titles and summaries to identify relevant sections, then calls get_page_content with a tight page range to retrieve only the content it needs. No chunk retrieval, no approximate search – the model decides where to look and fetches exactly those pages.
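
One way to keep the page range tight when several sections are selected is to merge their ranges before making a single get_page_content call. The `to_page_spec` helper below is a hypothetical sketch; in particular, combining ranges and single pages in one string (such as "5-7,10-12") is an assumption, since the examples above only show each form on its own:

```python
def to_page_spec(ranges: list[tuple[int, int]]) -> str:
    """Merge overlapping or adjacent (start, end) page ranges into a compact spec string."""
    merged: list[list[int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return ",".join(str(s) if s == e else f"{s}-{e}" for s, e in merged)


# Page ranges of two selected sections from the tree fragment shown earlier.
selected = [(28, 31), (22, 28)]
print(to_page_spec(selected))
```

Because the two sections share page 28, they collapse into one contiguous range rather than two overlapping fetches.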

A concrete example from the README: you ask Claude “What are the API specifications for Sharesies stock purchases?” Claude calls list_documents, finds the relevant spec, calls get_document_structure, navigates to the section covering stock purchases, then calls get_page_content with the three-page range that covers it. The answer arrives with a page reference, not a chunk hash.

Where this matters

Conversational AI. Chatbots backed by vector RAG frequently hallucinate when the answer spans multiple sections, because the retriever fetches fragments that are each individually plausible but collectively incomplete. Tree-based retrieval fetches contiguous page ranges with natural section boundaries intact. The model receives a complete, coherent excerpt and can cite it precisely. Users get answers with page references they can verify – “see pages 22-28 of the Financial Stability section” – rather than answers sourced from anonymous chunks.

Research. Academic papers, regulatory filings, and technical reports are structured documents. Their authors wrote introductions, methods sections, results, and appendices for a reason. A vector retriever ignores that structure; a tree-based retriever exploits it. When you are working through a stack of PDFs trying to understand a topic, PageIndex lets Claude navigate each document the way a human expert would: check the table of contents, skim relevant sections, read the pages that matter. The 98.7% FinanceBench result demonstrates this on a domain with complex, deeply structured documents where precision is non-negotiable.

Coding. Technical documentation – language specifications, API references, RFC documents, hardware datasheets – is often hundreds of pages long and highly structured. Chunking these documents loses the structure that makes them navigable. With PageIndex running as an MCP server inside Claude Code, you can ask Claude to look up a specific clause of a specification mid-session. Claude navigates the index, fetches the relevant pages, and answers in context – without you having to find the right pages yourself or upload anything to an external service. The workspace is local, the index persists, and the retrieval is traceable.

How it compares

	Vector RAG	PageIndex
Storage	Vector database	JSON tree files
Retrieval	Nearest-neighbour similarity	LLM reasoning over hierarchy
Document structure	Artificial chunks	Natural sections
Explainability	Opaque (no traceable path)	Traceable (page and section refs)
Token usage	Full chunks returned	Tight page ranges on demand
FinanceBench accuracy	~70-80%	98.7%

The table understates one advantage: the absence of a vector database simplifies deployment considerably. There is no embedding model to maintain, no vector index to rebuild when documents are updated, and no separate service to run. The workspace is a directory of JSON files. For a self-hosted setup or a coding assistant that needs to reference a fixed set of technical documents, that simplicity is worth something.

Getting started

The upstream project is at github.com/VectifyAI/PageIndex. The README covers Claude Code setup in full, including .env configuration and the pre-indexing workflow. The examples/ directory has a complete agentic RAG demo using the OpenAI Agents SDK if you prefer a programmatic integration over the Claude Code MCP approach.

Vectorless, reasoning-based retrieval suits tool-using AI agents better than vector similarity search, not because similarity search is wrong in principle but because agents already reason, and retrieval expressed as reasoning is something they can do natively. PageIndex makes that available without a vector database or chunking step, and now without leaving your Claude Code session.