🚀 Qdrant + Stompy

High-performance vectors with high-performance memory

Fast vectors, persistent context

The Problem

Qdrant is performance obsessed. Written in Rust, optimized from the ground up, deployed at companies that measure latency in microseconds. Your vector searches return before you finish blinking.

But fast retrieval without context is just fast confusion.

Qdrant finds the five most similar documents in 2ms. It has no idea why you're asking. It doesn't know that three of those documents describe approaches you already rejected, or that the top result contradicts a decision you made yesterday. The speed is impressive; the contextual awareness is nil.

You've optimized your Qdrant deployment for production. Quantization tuned, HNSW parameters dialed in, filtering indexes configured. You're serving a thousand queries per second with sub-10ms p99 latency.

And every single query starts from zero context.

Your search is fast enough to answer questions in real-time. But "fast" doesn't mean "smart." Without project context, you're just efficiently retrieving potentially irrelevant documents. Speed without meaning is just high-performance confusion.

How Stompy Helps

Stompy adds the context layer that transforms fast search into smart search.

Your high-performance pipeline gains situational awareness:

- **Millisecond context recall**: Stompy's retrieval is fast enough to match Qdrant's speed—context doesn't become a bottleneck
- **Pre-filter intelligence**: Know which types of results are useful before you even query, reducing wasted retrieval
- **Decision velocity**: When you're serving thousands of queries per second, project context helps each one be meaningful
- **Performance-conscious architecture**: Stompy adds memory without adding latency that matters
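As a minimal sketch of the pre-filter idea, recalled context can be turned into a set of tags to exclude before results ever reach your application. The `rejected:` line convention and the `approach` payload field here are hypothetical illustrations, not part of Stompy's or Qdrant's actual schema:

```python
def rejected_tags(project_context: str) -> set[str]:
    """Parse a hypothetical 'rejected: <tag>' convention out of
    recalled project context, yielding tags to filter out of results."""
    tags = set()
    for line in project_context.splitlines():
        line = line.strip().lower()
        if line.startswith("rejected:"):
            tags.add(line.split(":", 1)[1].strip())
    return tags

def prefilter(docs: list[dict], project_context: str) -> list[dict]:
    """Drop documents whose 'approach' tag was already rejected."""
    skip = rejected_tags(project_context)
    return [d for d in docs if d.get("approach") not in skip]

context = "Decisions so far:\nrejected: approach_a\nrejected: approach_b"
docs = [{"id": 1, "approach": "approach_a"}, {"id": 2, "approach": "approach_c"}]
print(prefilter(docs, context))  # only the approach_c document survives
```

The same exclusion set could equally be compiled into a Qdrant payload filter so rejected documents are skipped server-side rather than post-filtered.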

The combination is powerful: Qdrant's Rust-speed vector search with Stompy's project-aware context. Your production RAG pipeline doesn't just return fast results—it returns fast, relevant, context-informed results.

Production speed with production memory.

Integration Walkthrough

Step 1: Set up high-performance dual memory

Connect Qdrant for lightning-fast vector search and Stompy for project context—both optimized for production.

from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue
import httpx
import os

# Qdrant: high-performance vector search
qdrant = QdrantClient(
    host="localhost",
    port=6333,
    prefer_grpc=True,  # gRPC for maximum speed
)

# Stompy: fast context retrieval
async def get_context_fast(topic: str) -> str:
    """Retrieve project context with minimal latency."""
    async with httpx.AsyncClient(timeout=5.0) as client:
        response = await client.post(
            "https://mcp.stompy.ai/sse",
            headers={"Authorization": f"Bearer {os.environ['STOMPY_TOKEN']}"},
            json={"tool": "recall_context", "topic": topic, "preview_only": True},
        )
        return response.json().get("content", "")
Step 2: Context-informed vector search

Use project context to inform what you're searching for and how to interpret results.

import asyncio

async def smart_search(user_query: str, use_context: bool = True):
    """High-performance search with optional project context."""

    def qdrant_search():
        # qdrant_client's search() is synchronous; run it in a worker
        # thread so it can truly overlap with the async Stompy call.
        return qdrant.search(
            collection_name="documents",
            query_vector=embed(user_query),  # embed() is your embedding function
            limit=10,
            with_payload=True,
            score_threshold=0.7,
        )

    async def stompy_context():
        if not use_context:
            return ""
        return await get_context_fast("project_architecture")

    # Execute both retrievals in parallel for minimum latency
    search_results, project_context = await asyncio.gather(
        asyncio.to_thread(qdrant_search),
        stompy_context(),
    )

    # Build context-aware response
    docs = [hit.payload["text"] for hit in search_results]
    return {
        "documents": docs,
        "project_context": project_context,
        "query": user_query,
    }
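The latency win from `asyncio.gather` is that the two retrievals overlap: total wait is roughly the slower of the two, not their sum. A self-contained sketch with simulated delays (the 50 ms figures are illustrative, not measured Qdrant or Stompy numbers):

```python
import asyncio
import time

async def fake_vector_search() -> str:
    await asyncio.sleep(0.05)  # stand-in for a Qdrant round trip
    return "docs"

async def fake_context_recall() -> str:
    await asyncio.sleep(0.05)  # stand-in for a Stompy recall
    return "context"

async def main() -> float:
    start = time.perf_counter()
    await asyncio.gather(fake_vector_search(), fake_context_recall())
    return time.perf_counter() - start

elapsed = asyncio.run(main())
# Both 50 ms waits overlap, so the total is close to 50 ms, not 100 ms.
print(f"{elapsed * 1000:.0f} ms")
```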
Step 3: Track high-volume query patterns

When serving thousands of queries, save aggregate insights about what your users are actually looking for.

from collections import Counter
from datetime import datetime
import asyncio

query_patterns = Counter()

async def track_and_search(user_query: str):
    """Track query patterns while searching."""
    # Categorize the query (categorize_query is your own logic)
    category = categorize_query(user_query)
    query_patterns[category] += 1

    # Periodically save aggregated insights to Stompy
    if sum(query_patterns.values()) % 1000 == 0:
        asyncio.create_task(save_query_insights())

    return await smart_search(user_query)

async def save_query_insights():
    """Save aggregated query insights."""
    async with httpx.AsyncClient() as client:
        await client.post(
            "https://mcp.stompy.ai/sse",
            headers={"Authorization": f"Bearer {os.environ['STOMPY_TOKEN']}"},
            json={
                "tool": "lock_context",
                "topic": "query_patterns_analysis",
                "content": f"""Query Pattern Analysis:
Top categories: {query_patterns.most_common(10)}
Total queries: {sum(query_patterns.values())}
Timestamp: {datetime.now().isoformat()}
Insights: Most users ask about {query_patterns.most_common(1)[0][0]}""",
                "tags": "analytics,qdrant,patterns",
            },
        )
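`categorize_query` above is left to you. One minimal keyword-based sketch, where the categories and keywords are made up purely for illustration:

```python
# Hypothetical category buckets; checked in order, first match wins.
CATEGORIES = {
    "troubleshooting": ("error", "fail", "crash", "fix"),
    "how_to": ("how do i", "how to", "setup", "install"),
    "pricing": ("price", "cost", "billing"),
}

def categorize_query(user_query: str) -> str:
    """Map a query to a coarse category via keyword matching."""
    q = user_query.lower()
    for category, keywords in CATEGORIES.items():
        if any(kw in q for kw in keywords):
            return category
    return "other"

print(categorize_query("How to install the client?"))       # how_to
print(categorize_query("Search keeps returning an error"))  # troubleshooting
```

In production you would likely swap this for an embedding-based classifier, but even crude buckets make the aggregated insights meaningful.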

What You Get

  • Parallel retrieval: Fetch Qdrant results and Stompy context simultaneously—no added latency
  • Production-grade: Both systems are built for high-throughput, low-latency workloads
  • Pattern learning: Aggregate insights from high-volume queries to improve over time
  • Rust + cloud: Qdrant's Rust performance with Stompy's managed memory infrastructure
  • Scale-ready: Architecture that works at 10 QPS or 10,000 QPS

Ready to give Qdrant a memory?

Join the waitlist and be the first to know when Stompy is ready. Your Qdrant projects will never forget again.