🌲 Pinecone + Stompy

Vector search meets project memory

Pinecone for documents, Stompy for projects

The Problem

Pinecone is vector search perfected. Your embeddings live in a database optimized for similarity. Retrieval is fast, accurate, and scalable. You've built the RAG pipeline of your dreams.

But retrieval isn't understanding.

Your Pinecone index knows that document A is semantically similar to query B. It doesn't know that you're building an authentication system, that you decided on JWT last week, or that the retrieved docs about OAuth might contradict decisions you've already made. Every query exists in isolation—brilliant at finding needles, oblivious to why you're looking.

You've spent hours configuring your RAG pipeline. Perfect chunking strategy, optimized embeddings, metadata filtering dialed in. Then your teammate asks the same question you answered yesterday, and the pipeline has no idea you already solved this problem.

The documents remember everything. The pipeline remembers nothing.

Vector search finds what's relevant to your query. It can't find what's relevant to your project. That's a fundamentally different kind of memory—and Pinecone wasn't designed to provide it.

How Stompy Helps

Stompy adds the project memory layer your Pinecone pipeline is missing.

Your RAG setup gains a second brain:

  • Project context in every query: Before retrieving, your agent knows the architecture decisions, the tech stack choices, and the "we tried X and it didn't work" history
  • Query pattern memory: Remember which documents were useful for which problems, building institutional knowledge about your own corpus
  • Decision persistence: When you make choices based on retrieved docs, those decisions become searchable context for future queries
  • Contradiction detection: Stompy warns when retrieved content conflicts with established project decisions

Pinecone finds documents by semantic similarity. Stompy adds "similar to our project's needs"—a layer of relevance that pure vector search can't provide.

Your RAG pipeline becomes not just retrieval-augmented, but context-augmented.

Integration Walkthrough

Step 1: Set up dual-memory architecture

Connect both Pinecone (document vectors) and Stompy (project memory) to your RAG pipeline.

from pinecone import Pinecone
import httpx
import os

# Document memory: Pinecone for vector similarity
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("documentation")

# Project memory: Stompy for decisions and context
async def get_project_context(topic: str) -> str:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://mcp.stompy.ai/sse",
            headers={"Authorization": f"Bearer {os.environ['STOMPY_TOKEN']}"},
            json={"tool": "recall_context", "topic": topic},
        )
        return response.json().get("content", "")
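A quick smoke test, as a sketch: drive the async helper with asyncio and confirm project memory is reachable (this assumes STOMPY_TOKEN is already set in your environment).

import asyncio

# Should print whatever context you've already stored, or an empty string
print(asyncio.run(get_project_context("architecture_decisions")))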

Step 2: Context-aware retrieval

Before querying Pinecone, load relevant project context. This lets your LLM interpret retrieved docs through the lens of your specific situation.

async def context_aware_rag(user_query: str) -> str:
    # Step 1: Get project context from Stompy
    project_context = await get_project_context("architecture_decisions")
    tech_stack = await get_project_context("tech_stack")

    # Step 2: Query Pinecone for relevant documents
    query_embedding = embed(user_query)  # Your embedding function
    pinecone_results = index.query(
        vector=query_embedding,
        top_k=5,
        include_metadata=True,
    )

    # Step 3: Build a context-aware prompt
    retrieved_docs = "\n".join(r.metadata["text"] for r in pinecone_results.matches)
    prompt = f"""Project Context:
{project_context}

Tech Stack: {tech_stack}

Retrieved Documentation:
{retrieved_docs}

Question: {user_query}

Answer considering our specific project context:"""
    return call_llm(prompt)
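The walkthrough leaves embed and call_llm up to you. As one possible sketch, here is what they might look like with OpenAI's embeddings and chat APIs; the model names are assumptions, and any embedding model works as long as it matches the dimensions of your Pinecone index.

from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    # One option among many; must be the same model family used to build the index
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

def call_llm(prompt: str) -> str:
    # Swap in whatever LLM your pipeline already uses
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content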

Step 3: Save decisions to project memory

When your RAG pipeline helps you make decisions, save them to Stompy so future queries benefit.

from datetime import datetime

async def save_decision(topic: str, decision: str, rationale: str):
    """Save decisions made based on retrieved documents."""
    async with httpx.AsyncClient() as client:
        await client.post(
            "https://mcp.stompy.ai/sse",
            headers={"Authorization": f"Bearer {os.environ['STOMPY_TOKEN']}"},
            json={
                "tool": "lock_context",
                "topic": topic,
                "content": f"""Decision: {decision}
Rationale: {rationale}
Based on retrieved documentation from Pinecone index.
Date: {datetime.now().isoformat()}""",
                "tags": "decisions,rag,architecture",
            },
        )

# Example: after RAG helps decide on a caching strategy
# (call this from within your async pipeline)
await save_decision(
    topic="caching_strategy",
    decision="Use Redis with 15-minute TTL for API responses",
    rationale="Retrieved docs showed Redis outperforms Memcached for our read patterns",
)
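Contradiction detection isn't shown end to end above. One way it could work, sketched as a hypothetical prompt-based check that reuses the helpers from Steps 1 and 2 (the flag_contradictions name and prompt wording are assumptions, not Stompy's API):

async def flag_contradictions(retrieved_docs: str) -> str:
    # Hypothetical sketch: compare retrieved content against locked decisions
    decisions = await get_project_context("architecture_decisions")
    prompt = f"""Established project decisions:
{decisions}

Retrieved documentation:
{retrieved_docs}

List any retrieved recommendations that conflict with the decisions above.
If there are no conflicts, answer "No conflicts"."""
    return call_llm(prompt)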

What You Get

  • Context-aware retrieval: Documents are interpreted through your project's lens, not in isolation
  • Decision memory: Choices made from retrieved docs become searchable context for future questions
  • Query pattern learning: Remember which docs were useful for which problems over time (see the sketch after this list)
  • Contradiction detection: Get warned when retrieved content conflicts with established decisions
  • Two complementary memory systems: Pinecone for document similarity, Stompy for project relevance
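Query pattern learning isn't shown in the walkthrough either; a minimal sketch, reusing save_decision from Step 3 with a hypothetical topic naming scheme, might record which documents answered which question:

async def record_query_pattern(user_query: str, doc_ids: list[str]):
    # Hypothetical sketch: remember which docs were useful for which question
    await save_decision(
        topic="query_patterns",
        decision=f"Docs {', '.join(doc_ids)} answered: {user_query}",
        rationale="Recorded so future retrievals can prefer proven sources",
    )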

Ready to give Pinecone a memory?

Join the waitlist and be the first to know when Stompy is ready. Your Pinecone projects will never forget again.