🎨
Chroma+Stompy

Open-source vectors with persistent context

The Problem

Chroma makes embedding storage delightfully simple. Open-source, runs anywhere, zero configuration headaches. pip install chromadb and you're storing vectors in minutes. It's the SQLite of embedding databases.

But simplicity doesn't mean your memory problems are solved.

Chroma stores your document embeddings beautifully. It retrieves similar chunks with ease. What it doesn't do is remember that you queried "authentication best practices" yesterday and decided to go with session tokens instead of JWTs. Every query to Chroma is a fresh start—perfect isolation, zero context.

You're prototyping a RAG application. The iteration cycle is fast: tweak the chunking, re-embed, query, evaluate. But between sessions, all that experimental knowledge evaporates. Which chunking strategies worked? What queries returned garbage? What decisions did you make based on retrieved docs?

Chroma remembers your documents. Nothing remembers your journey.

For a tool that's supposed to make AI development simple, there's nothing simple about re-discovering the same insights every session.

How Stompy Helps

Stompy adds persistent project memory to Chroma's delightful simplicity.

Your open-source RAG workflow gains continuity:

- **Session-spanning memory**: The experiments you ran yesterday, the decisions you made, the dead ends you discovered—all searchable
- **Development context**: Remember why you chose 512-token chunks, why certain queries need metadata filters, why the summarization approach didn't work
- **Prototype to production**: As your Chroma setup evolves from notebook to service, your project knowledge evolves with it
- **Local-first compatibility**: Both Chroma and Stompy work in local development; both scale to production

The combination preserves what made Chroma attractive in the first place: simplicity. Stompy adds memory without adding complexity. Two pip installs, and your RAG prototype has both document search and project context.

Simple tools with serious memory.

Integration Walkthrough

1. Set up Chroma + Stompy

Both tools install easily and work locally. Chroma for document vectors, Stompy for project memory.

import chromadb
import httpx
import os

# Chroma: simple local vector storage
chroma_client = chromadb.Client()  # or PersistentClient for disk storage
collection = chroma_client.get_or_create_collection(
    name="project_docs",
    metadata={"hnsw:space": "cosine"}
)

# Stompy: simple cloud memory
STOMPY_TOKEN = os.environ["STOMPY_TOKEN"]

async def stompy_recall(topic: str) -> str:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://mcp.stompy.ai/sse",
            headers={"Authorization": f"Bearer {STOMPY_TOKEN}"},
            json={"tool": "recall_context", "topic": topic}
        )
        data = response.json()
        return data.get("content", "")
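To make the `{"hnsw:space": "cosine"}` setting concrete: in this space, result distances are reported as 1 minus cosine similarity (per hnswlib's convention), so 0.0 means identical direction and values near 1.0 mean unrelated vectors. A minimal pure-Python sketch of that metric:

```python
import math

def cosine_distance(a, b):
    """Cosine distance as used in a 'cosine' space: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 1.0
```

This is why the distance thresholds discussed later in this walkthrough fall between 0 and 1.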
2. Context-aware RAG queries

Query Chroma for similar documents, then combine with project context from Stompy for truly informed responses.

async def contextual_rag(user_query: str):
    # Get relevant documents from Chroma
    results = collection.query(
        query_texts=[user_query],
        n_results=5,
        include=["documents", "metadatas", "distances"]
    )

    # Get project context from Stompy
    project_context = await stompy_recall("architecture_decisions")
    tech_constraints = await stompy_recall("tech_stack")

    # Build informed prompt
    retrieved_docs = "\n---\n".join(results["documents"][0])
    prompt = f"""You are helping with a project. Here's the context:

PROJECT DECISIONS:
{project_context}

TECHNICAL CONSTRAINTS:
{tech_constraints}

RETRIEVED DOCUMENTATION:
{retrieved_docs}

USER QUESTION: {user_query}

Answer considering our specific project context:"""

    return await call_llm(prompt)  # call_llm: your LLM client of choice
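One detail worth noting: Chroma nests query results per query text (one inner list for each entry in `query_texts`), which is why the code above indexes `results["documents"][0]`. A small mock of that shape, with hypothetical document strings:

```python
# Mock of the structure collection.query returns for a single query text.
# Outer list: one entry per query text; inner list: the matched chunks.
mock_results = {
    "documents": [[
        "Use session tokens for server-rendered apps.",
        "JWTs suit stateless APIs but complicate revocation.",
    ]],
    "distances": [[0.21, 0.34]],
}

# [0] selects the results for the first (and only) query text.
retrieved_docs = "\n---\n".join(mock_results["documents"][0])
print(retrieved_docs)
```

Forgetting that extra level of nesting is a common first-session stumble with Chroma's query API.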
3. Track RAG development insights

As you iterate on your Chroma setup, save what works so you don't rediscover the same lessons.
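One lesson worth capturing is the chunking strategy itself. A minimal sliding-window chunker matching the 512-token / 50-token-overlap decision mentioned earlier (using a plain list as a stand-in for real tokenizer output, an assumption for illustration):

```python
def chunk_tokens(tokens, size=512, overlap=50):
    """Split a token sequence into fixed-size windows with overlap."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

tokens = list(range(1000))  # stand-in for 1000 tokens
chunks = chunk_tokens(tokens)
print(len(chunks))      # -> 3
print(len(chunks[0]))   # -> 512
```

The overlap means each chunk's tail reappears at the head of the next one, so answers that straddle a boundary are still retrievable from at least one chunk.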

async def save_rag_insight(insight_type: str, content: str):
    """Save development insights for future sessions."""
    async with httpx.AsyncClient() as client:
        await client.post(
            "https://mcp.stompy.ai/sse",
            headers={"Authorization": f"Bearer {STOMPY_TOKEN}"},
            json={
                "tool": "lock_context",
                "topic": f"rag_development_{insight_type}",
                "content": content,
                "tags": "rag,development,chroma"
            }
        )

# Examples of what to save (run inside an async context):
await save_rag_insight("chunking", """
Chunking Strategy Decision:
- 512 tokens with 50 token overlap works best for our docs
- Larger chunks lost specificity for code questions
- Sentence-based splitting broke code blocks badly
""")

await save_rag_insight("retrieval", """
Retrieval Tuning:
- n_results=5 is sweet spot (3 too few, 10 adds noise)
- Metadata filter on 'doc_type' helps for API questions
- Distance threshold 0.7 filters low-quality matches
""")

What You Get

  • Zero-friction setup: Both Chroma and Stompy prioritize developer experience and easy installation
  • Local + cloud: Develop with Chroma locally, sync project knowledge via Stompy—best of both worlds
  • Development memory: Remember chunking experiments, retrieval tuning, and architectural decisions
  • Prototype-friendly: Fast iteration without losing context between sessions
  • Production-ready: Both tools scale from notebooks to services without architecture changes

Ready to give Chroma a memory?

Join the waitlist and be the first to know when Stompy is ready. Your Chroma projects will never forget again.