Facebook AI vectors with persistent memory

Industry-standard vectors, persistent context

The Problem

FAISS is where serious similarity search begins. Facebook AI's library has been battle-tested at scales most databases can only dream of. GPU acceleration, multiple index types, quantization options—it's the foundational technology powering production systems at companies you've heard of.

But FAISS is a library, not a memory system.

Your FAISS index can search millions of vectors in milliseconds. It's a brilliantly optimized data structure that knows exactly one thing: which vectors are near which other vectors. That's it. It doesn't know about your project. It doesn't remember your queries. It doesn't understand context.

You've carefully chosen your index type—IVF for speed, HNSW for accuracy, PQ for memory efficiency. You've tuned nprobe, you've benchmarked recall@k. Your similarity search is as good as it gets.

And every query exists in complete isolation.

FAISS finds the nearest neighbors. It can't tell you if those neighbors are actually useful given what you're trying to build. It can't filter by "documents we've already decided are relevant" or "approaches that didn't work." The gold standard for similarity search has bronze-level contextual awareness.

Fast isn't smart. Nearest isn't relevant.

How Stompy Helps

Stompy adds the contextual layer that FAISS wasn't designed to provide.

Your FAISS-powered system gains project intelligence:

- **Pre-query context**: Know what you're looking for and why before searching, so results are immediately interpretable
- **Post-query memory**: Track which retrieved vectors led to good decisions, building wisdom over time
- **Index-agnostic**: Whether you're using IVF, HNSW, or flat indexes, Stompy adds the same contextual layer
- **Library-friendly**: FAISS is a library you integrate; Stompy adds memory without changing how you use FAISS

The combination respects FAISS's design philosophy: do one thing extremely well. FAISS handles similarity search at production scale. Stompy handles project memory. Together, they create contextually-aware retrieval without compromising on performance.

The gold standard for vectors, with gold-standard memory.

Integration Walkthrough

Set up FAISS with Stompy memory layer

Keep your optimized FAISS index, add Stompy for the context FAISS can't provide.

import json
import os
from datetime import datetime

import faiss
import httpx
import numpy as np

# FAISS: Your optimized vector index
# (You've already tuned this for your use case)
index = faiss.read_index("production_index.faiss")

# Optional: GPU acceleration
# gpu_res = faiss.StandardGpuResources()
# index = faiss.index_cpu_to_gpu(gpu_res, 0, index)

# Document store (FAISS only stores vectors, not documents)
with open("documents.json") as f:
    documents = json.load(f)

# Stompy: Project context layer
async def get_project_context(query: str) -> dict:
    """Semantic search for relevant project context."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://mcp.stompy.ai/sse",
            headers={"Authorization": f"Bearer {os.environ['STOMPY_TOKEN']}"},
            json={"tool": "context_search", "query": query, "limit": 3}
        )
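        # Surface auth or server errors instead of parsing an error body as context
        response.raise_for_status()
        return response.json()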
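
Because the context lookup is a network call, it can be useful to keep it from blocking or breaking the FAISS path. The following is a minimal sketch, not part of Stompy's API: `context_or_empty` and `timeout_s` are hypothetical names, and the empty `{"contexts": []}` fallback assumes the response shape used in this walkthrough.

```python
import asyncio


async def context_or_empty(fetch, query, timeout_s=2.0):
    """Await fetch(query); fall back to an empty context on failure or timeout.

    `fetch` is any async callable taking the query string, for example
    the get_project_context function defined above.
    """
    try:
        return await asyncio.wait_for(fetch(query), timeout=timeout_s)
    except Exception:
        # Deliberately broad: in this sketch any failure of the context
        # lookup (timeout, HTTP error, bad JSON) degrades to "no context"
        # so the similarity search itself can still return results.
        return {"contexts": []}
```

Inside an async function you would call it as `project_context = await context_or_empty(get_project_context, query_text)`. Whether silent degradation is acceptable depends on your application; logging the exception before falling back is a reasonable variation.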

Context-enriched similarity search

Run FAISS for vector similarity, then combine with project context for informed responses.

async def smart_faiss_search(
    query_text: str,
    k: int = 10,
    nprobe: int = 64
):
    """FAISS similarity search with project context."""

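    # Embed the query. embed() is your own embedding function (not defined
    # here); it must return a 1-D NumPy array matching the index dimension.
    query_vector = embed(query_text).reshape(1, -1).astype('float32')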

    # Configure search parameters (if using IVF)
    if hasattr(index, 'nprobe'):
        index.nprobe = nprobe

    # FAISS similarity search
    distances, indices = index.search(query_vector, k)

    # Retrieve actual documents. FAISS pads results with -1 when fewer than
    # k neighbors are found, so drop those and keep distances aligned.
    hits = [(int(i), float(d)) for i, d in zip(indices[0], distances[0]) if i >= 0]
    retrieved_docs = [documents[i] for i, _ in hits]

    # Get project context from Stompy
    project_context = await get_project_context(query_text)
    relevant_contexts = project_context.get("contexts", [])

    # Build context-aware response
    return {
        "query": query_text,
        "documents": retrieved_docs,
        "distances": [d for _, d in hits],
        "project_context": relevant_contexts,
        "prompt": build_contextual_prompt(query_text, retrieved_docs, relevant_contexts)
    }

def build_contextual_prompt(query, docs, context):
    return f"""Project Context:
{context}

Retrieved Documents (by similarity):
{docs}

Question: {query}

Answer considering both the retrieved documents and our project's context:"""
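
The prompt template above interpolates Python lists directly, so the model sees their raw `repr`. One possible refinement is to render each list as numbered plain text first. This is a sketch under assumptions: `render_numbered` and `max_chars` are hypothetical names, and the items are assumed to be strings or JSON-serializable objects.

```python
import json


def render_numbered(items, max_chars=400):
    """Render a list of documents or contexts as numbered plain-text lines."""
    lines = []
    for n, item in enumerate(items, start=1):
        # Strings are used as-is; anything else is serialized to JSON.
        if isinstance(item, str):
            text = item
        else:
            text = json.dumps(item, ensure_ascii=False, default=str)
        # Truncate long entries so one document cannot dominate the prompt.
        if len(text) > max_chars:
            text = text[:max_chars] + "..."
        lines.append(f"[{n}] {text}")
    return "\n".join(lines) if lines else "(none)"
```

With this helper, `build_contextual_prompt` could pass `render_numbered(context)` and `render_numbered(docs)` into the template instead of the lists themselves. The numbering also gives the model a stable way to cite which retrieved document it relied on.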

Track retrieval quality over time

Save insights about which FAISS results were actually useful, building retrieval wisdom.

async def log_retrieval_feedback(
    query: str,
    useful_indices: list[int],
    not_useful_indices: list[int],
    notes: str = ""
):
    """Track which retrievals were useful for continuous improvement."""
    async with httpx.AsyncClient() as client:
        await client.post(
            "https://mcp.stompy.ai/sse",
            headers={"Authorization": f"Bearer {os.environ['STOMPY_TOKEN']}"},
            json={
                "tool": "lock_context",
                "topic": "retrieval_feedback",
                "content": f"""Retrieval Quality Feedback:
Query: "{query}"
Useful results: indices {useful_indices}
Not useful: indices {not_useful_indices}
Notes: {notes}

Timestamp: {datetime.now().isoformat()}

This feedback helps identify patterns in what makes retrieval useful for our project.""",
                "tags": "faiss,feedback,retrieval-quality"
            }
        )

# Example: After using search results
# (top-level await only works in notebooks and async REPLs; scripts need asyncio.run)
import asyncio

asyncio.run(log_retrieval_feedback(
    query="authentication middleware patterns",
    useful_indices=[0, 2, 4],  # First, third, fifth results were helpful
    not_useful_indices=[1, 3],  # Second, fourth were off-topic
    notes="Results about session auth were more relevant than JWT for our use case"
))
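
Logged feedback only pays off once it influences later searches. As one illustration, the sketch below keeps a local, in-memory tally and uses it to reorder FAISS results. It is an assumption about how feedback might be consumed, not a Stompy feature: `FeedbackTally`, `record`, and `rerank` are hypothetical names. Note that the feedback above refers to result positions, so the tally maps positions back to document ids.

```python
from collections import Counter


class FeedbackTally:
    """In-memory net usefulness score per document id (hypothetical helper)."""

    def __init__(self):
        self.score = Counter()

    def record(self, retrieved_ids, useful_positions, not_useful_positions):
        """Translate result positions into document ids and update scores."""
        for pos in useful_positions:
            self.score[retrieved_ids[pos]] += 1
        for pos in not_useful_positions:
            self.score[retrieved_ids[pos]] -= 1

    def rerank(self, retrieved_ids):
        """Order ids by net score, highest first.

        sorted() is stable, so ids with equal scores keep the order
        FAISS returned them in (nearest first).
        """
        return sorted(retrieved_ids, key=lambda doc_id: -self.score[doc_id])
```

A usage example: after `tally.record(ids, [0, 2, 4], [1, 3])`, a later call to `tally.rerank(new_ids)` moves previously helpful documents ahead of unrated ones and pushes previously unhelpful ones to the end. A per-query or per-topic tally would be a natural extension, since a document that is off-topic for one question may be useful for another.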

What You Get

Library-level performance: FAISS's optimized C++ core is untouched—Stompy adds context on top
Index-agnostic: Works with IVF, HNSW, PQ, flat indexes—any FAISS configuration
Retrieval learning: Track which results are useful over time, building institutional wisdom
GPU-compatible: If you're using FAISS GPU acceleration, Stompy doesn't interfere
Production-proven: Both FAISS and Stompy are built for real-world scale and reliability

Ready to give FAISS a memory?

Join the waitlist and be the first to know when Stompy is ready. Your FAISS projects will never forget again.
