🦙
LlamaIndex + Stompy

Indexes that remember what they indexed

Because your index knows your docs, but not your project

The Problem

LlamaIndex is incredible at finding relevant documents. Your RAG pipeline retrieves with precision.

But retrieval isn't understanding.

Your index knows what's in your docs. It doesn't know what you've been building, what decisions you've made, or what you tried last week that didn't work. Every query exists in isolation.

It's like having a librarian with perfect recall of every book—and zero memory of what you've been researching.

How Stompy Helps

Stompy adds project context to your retrieval. Your LlamaIndex queries gain awareness of:

- What you're building and why
- Previous query patterns and their outcomes
- Project-specific terminology and conventions
- Decisions made based on retrieved information

Your RAG pipeline doesn't just find relevant docs—it finds them in the context of your ongoing work.

Integration Walkthrough

1. Install llama-index-tools-mcp

LlamaIndex provides an official MCP tools package for connecting to any MCP server.

pip install llama-index-tools-mcp

2. Connect Stompy via SSE transport

Use SSEMCPClient to connect to Stompy with bearer-token auth, then wrap the client in McpToolSpec to convert Stompy's tools into LlamaIndex FunctionTools.

from llama_index.tools.mcp import SSEMCPClient, McpToolSpec
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
import os

# Connect to Stompy via SSE with a bearer token
mcp_client = SSEMCPClient(
    url="https://mcp.stompy.ai/sse",
    headers={"Authorization": f"Bearer {os.environ['STOMPY_TOKEN']}"},
)
mcp_tool_spec = McpToolSpec(client=mcp_client)

# Get Stompy's memory tools (await requires an async context; see below)
tools = await mcp_tool_spec.to_tool_list_async()

# Create an agent with persistent memory
agent = FunctionAgent(
    llm=OpenAI(model="gpt-4o"),
    tools=tools,
    system_prompt="You are a helpful assistant with persistent memory.",
)
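
Both to_tool_list_async and agent.run are coroutines, so in a plain script the setup above has to live inside an async entrypoint. A minimal sketch of that wrapper; the printed tool listing is just a sanity check, and the example query is illustrative:

import asyncio
import os

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.tools.mcp import SSEMCPClient, McpToolSpec


async def main() -> None:
    client = SSEMCPClient(
        url="https://mcp.stompy.ai/sse",
        headers={"Authorization": f"Bearer {os.environ['STOMPY_TOKEN']}"},
    )
    tools = await McpToolSpec(client=client).to_tool_list_async()

    # Sanity check: list the memory tools Stompy exposes
    for tool in tools:
        print(tool.metadata.name)

    agent = FunctionAgent(
        llm=OpenAI(model="gpt-4o"),
        tools=tools,
        system_prompt="You are a helpful assistant with persistent memory.",
    )
    response = await agent.run("What did we decide about auth token caching?")
    print(response)


asyncio.run(main())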

3. Agent saves project patterns with lock_context

When your agent learns project-specific patterns, it saves them. These persist across sessions and inform future RAG queries.

# Session 1: Establish caching strategy
response = await agent.run(
    "Let's use Redis for auth token caching"
)

# Agent calls lock_context:
#   lock_context(topic="caching_strategy",
#                content="Redis for hot data. 15min TTL for auth tokens.
#                         Connection pool in src/cache/redis.py.
#                         Using redis-py with connection pooling.",
#                priority="important")
# → Creates v1.0
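
You can also pin a decision yourself rather than waiting for the agent to do it, by invoking the same tool from the tool list returned in step 2. A sketch only: the tool name and argument names mirror the comments above, and exactly how Stompy's schema accepts them is an assumption.

# Inside the same async context as above: pin the decision explicitly
# instead of waiting for the agent to call the tool itself.
lock_context_tool = next(
    t for t in tools if t.metadata.name == "lock_context"  # assumed tool name
)
result = await lock_context_tool.acall(
    topic="caching_strategy",
    content=(
        "Redis for hot data. 15min TTL for auth tokens. "
        "Connection pool in src/cache/redis.py."
    ),
    priority="important",
)
print(result)  # ToolOutput with Stompy's response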

4. RAG + project memory = magic

Future queries combine document retrieval with project context. Your agent knows both what the docs say AND what you've decided.

# Session 12: Agent remembers project patterns
response = await agent.run(
    "Add caching to the rate limiter"
)

# Agent calls: recall_context("caching_strategy") → v1.0
# Agent also retrieves rate limiting docs from your index
# Agent combines both:
#   "Adding Redis caching to rate limiter using your existing
#    connection pool in src/cache/redis.py. Same 15min TTL
#    pattern as auth tokens. Here's the implementation..."
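
For that combination to happen, the agent needs your index exposed as a tool alongside Stompy's memory tools. A sketch of one way to wire it, assuming an existing document folder at ./docs and reusing tools, FunctionAgent, and OpenAI from step 2; the tool name and description are illustrative:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.tools import QueryEngineTool

# Your existing RAG index over the project documentation
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

rag_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="project_docs",
    description="Retrieves relevant passages from the project documentation.",
)

# One agent, two kinds of knowledge: document retrieval + project memory
agent = FunctionAgent(
    llm=OpenAI(model="gpt-4o"),
    tools=[*tools, rag_tool],
    system_prompt="You are a helpful assistant with persistent memory.",
)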

What You Get

  • Automatic session handovers bridge RAG sessions
  • Semantic search (embeddings) combines with your existing RAG pipeline
  • Delta evaluation prevents knowledge duplication
  • Priority system ensures critical rules always surface
  • Conflict detection catches contradictory information

Ready to give LlamaIndex a memory?

Join the waitlist and be the first to know when Stompy is ready. Your LlamaIndex projects will never forget again.