Indexes that remember what they indexed
Because your index knows your docs, but not your project
The Problem
LlamaIndex is incredible at finding relevant documents. Your RAG pipeline retrieves with precision.
But retrieval isn't understanding.
Your index knows what's in your docs. It doesn't know what you've been building, what decisions you've made, or what you tried last week that didn't work. Every query exists in isolation.
It's like having a librarian with perfect recall of every book—and zero memory of what you've been researching.
How Stompy Helps
Stompy adds project context to your retrieval. Your LlamaIndex queries gain awareness of:
- What you're building and why
- Previous query patterns and their outcomes
- Project-specific terminology and conventions
- Decisions made based on retrieved information
Your RAG pipeline doesn't just find relevant docs—it finds them in the context of your ongoing work.
Integration Walkthrough
Install llama-index-tools-mcp
LlamaIndex provides an official MCP tools package for connecting to any MCP server.
pip install llama-index-tools-mcp
Connect Stompy via SSE transport
Use SSEMCPClient to connect to Stompy with a bearer token. McpToolSpec converts the server's tools into LlamaIndex FunctionTools.
```python
from llama_index.tools.mcp import SSEMCPClient, McpToolSpec
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
import os

# Connect to Stompy via SSE
mcp_client = SSEMCPClient(
    url="https://mcp.stompy.ai/sse",
    headers={"Authorization": f"Bearer {os.environ['STOMPY_TOKEN']}"},
)
mcp_tool_spec = McpToolSpec(client=mcp_client)

# Get Stompy's memory tools
tools = await mcp_tool_spec.to_tool_list_async()

# Create agent with persistent memory
agent = FunctionAgent(
    llm=OpenAI(model="gpt-4o"),
    tools=tools,
    system_prompt="You are a helpful assistant with persistent memory.",
)
```
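The snippet above uses top-level `await`, which works in a notebook or async REPL. If you're running a plain Python script, wrap the setup in an async entry point instead. A minimal sketch of that wrapper, using the same setup (the query string is illustrative):

```python
import asyncio
import os

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.tools.mcp import McpToolSpec, SSEMCPClient


async def main():
    # Same setup as above, inside an async function
    mcp_client = SSEMCPClient(
        url="https://mcp.stompy.ai/sse",
        headers={"Authorization": f"Bearer {os.environ['STOMPY_TOKEN']}"},
    )
    tools = await McpToolSpec(client=mcp_client).to_tool_list_async()

    agent = FunctionAgent(
        llm=OpenAI(model="gpt-4o"),
        tools=tools,
        system_prompt="You are a helpful assistant with persistent memory.",
    )

    # Illustrative query
    response = await agent.run("What did we decide about auth token caching?")
    print(response)


asyncio.run(main())
```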
Agent saves project patterns with lock_context
When your agent learns project-specific patterns, it saves them. These persist across sessions and inform future RAG queries.
```python
# Session 1: Establish caching strategy
response = await agent.run("Let's use Redis for auth token caching")

# Agent calls lock_context:
# lock_context(topic="caching_strategy",
#              content="Redis for hot data. 15min TTL for auth tokens.
#                       Connection pool in src/cache/redis.py.
#                       Using redis-py with connection pooling.",
#              priority="important")
# → Creates v1.0
```
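You don't have to rely on the LLM to decide when to save. If you want to lock a decision deterministically, you can call the tool directly from the tool list. A minimal sketch, assuming the Stompy server exposes a `lock_context` tool with the parameters shown above and that `tools` is the list returned by McpToolSpec:

```python
# Look up Stompy's lock_context tool by name (assumed tool name)
lock_context = next(t for t in tools if t.metadata.name == "lock_context")

# Save the caching decision explicitly, without going through the agent
await lock_context.acall(
    topic="caching_strategy",
    content=(
        "Redis for hot data. 15min TTL for auth tokens. "
        "Connection pool in src/cache/redis.py."
    ),
    priority="important",
)
```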
RAG + project memory = magic
Future queries combine document retrieval with project context. Your agent knows both what the docs say AND what you've decided.
```python
# Session 12: Agent remembers project patterns
response = await agent.run("Add caching to the rate limiter")

# Agent calls: recall_context("caching_strategy") → v1.0
# Agent also retrieves rate limiting docs from your index
# Agent combines both:
# "Adding Redis caching to rate limiter using your existing
#  connection pool in src/cache/redis.py. Same 15min TTL
#  pattern as auth tokens. Here's the implementation..."
```
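To wire your document index into the same agent, you can register it as a QueryEngineTool next to Stompy's memory tools, so a single run can both search your docs and recall project decisions. A minimal sketch, assuming a local `./docs` directory and the `mcp_tool_spec`, `FunctionAgent`, and `OpenAI` setup from above (the tool name and description are illustrative):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.tools import QueryEngineTool

# Build (or load) your existing document index
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# Expose the index as a tool alongside Stompy's memory tools
rag_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="project_docs",
    description="Searches the project's documentation and design notes.",
)

stompy_tools = await mcp_tool_spec.to_tool_list_async()

agent = FunctionAgent(
    llm=OpenAI(model="gpt-4o"),
    tools=[rag_tool, *stompy_tools],
    system_prompt=(
        "You are a helpful assistant. Use project_docs for documentation "
        "lookups and the memory tools to recall project decisions."
    ),
)
```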
What You Get
- Automatic session handovers bridge RAG sessions
- Semantic search (embeddings) combines with your existing RAG pipeline
- Delta evaluation prevents knowledge duplication
- Priority system ensures critical rules always surface
- Conflict detection catches contradictory information
Ready to give LlamaIndex a memory?
Join the waitlist and be the first to know when Stompy is ready. Your LlamaIndex projects will never forget again.