High-performance vectors with high-performance memory
Fast vectors, persistent context
The Problem
Qdrant is performance obsessed. Written in Rust, optimized from the ground up, deployed at companies that measure latency in microseconds. Your vector searches return before you finish blinking.
But fast retrieval without context is just fast confusion.
Qdrant finds the five most similar documents in 2ms. It has no idea why you're asking. It doesn't know that three of those documents describe approaches you already rejected, or that the top result contradicts a decision you made yesterday. The speed is impressive; the contextual awareness is nil.
You've optimized your Qdrant deployment for production. Quantization tuned, HNSW parameters dialed in, filtering indexes configured. You're serving a thousand queries per second with sub-10ms p99 latency.
And every single query starts from zero context.
Your search is fast enough to answer questions in real time. But "fast" doesn't mean "smart." Without project context, you're just efficiently retrieving potentially irrelevant documents. Speed without meaning is just high-performance confusion.
How Stompy Helps
Stompy adds the context layer that transforms fast search into smart search.
Your high-performance pipeline gains situational awareness:

- **Millisecond context recall**: Stompy's retrieval is fast enough to match Qdrant's speed—context doesn't become a bottleneck
- **Pre-filter intelligence**: Know which types of results are useful before you even query, reducing wasted retrieval
- **Decision velocity**: When you're serving thousands of queries per second, project context helps each one be meaningful
- **Performance-conscious architecture**: Stompy adds memory without adding latency that matters
The combination is powerful: Qdrant's Rust-speed vector search with Stompy's project-aware context. Your production RAG pipeline doesn't just return fast results—it returns fast, relevant, context-informed results.
Production speed with production memory.
Integration Walkthrough
Set up high-performance dual memory
Connect Qdrant for lightning-fast vector search and Stompy for project context—both optimized for production.
```python
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue
import httpx
import os

# Qdrant: high-performance vector search
qdrant = QdrantClient(
    host="localhost",
    port=6333,
    prefer_grpc=True,  # gRPC for maximum speed
)

# Stompy: fast context retrieval
async def get_context_fast(topic: str) -> str:
    """Retrieve project context with minimal latency."""
    async with httpx.AsyncClient(timeout=5.0) as client:
        response = await client.post(
            "https://mcp.stompy.ai/sse",
            headers={"Authorization": f"Bearer {os.environ['STOMPY_TOKEN']}"},
            json={"tool": "recall_context", "topic": topic, "preview_only": True},
        )
        return response.json().get("content", "")
```
Context-informed vector search
Use project context to inform what you're searching for and how to interpret results.
```python
import asyncio

async def smart_search(user_query: str, use_context: bool = True):
    """High-performance search with optional project context."""

    async def qdrant_search():
        # The standard Qdrant client is synchronous; run it in a worker
        # thread so it overlaps with the context fetch instead of
        # blocking the event loop.
        return await asyncio.to_thread(
            qdrant.search,
            collection_name="documents",
            query_vector=embed(user_query),  # embed(): your embedding function
            limit=10,
            with_payload=True,
            score_threshold=0.7,
        )

    async def stompy_context():
        if not use_context:
            return ""
        return await get_context_fast("project_architecture")

    # Execute both retrievals in parallel for minimum latency
    search_results, project_context = await asyncio.gather(
        qdrant_search(),
        stompy_context(),
    )

    # Build the context-aware response
    docs = [hit.payload["text"] for hit in search_results]
    return {
        "documents": docs,
        "project_context": project_context,
        "query": user_query,
    }
```
Track high-volume query patterns
When serving thousands of queries, save aggregate insights about what your users are actually looking for.
```python
from collections import Counter
from datetime import datetime
import asyncio

query_patterns = Counter()

async def track_and_search(user_query: str):
    """Track query patterns while searching."""
    category = categorize_query(user_query)  # your categorization logic
    query_patterns[category] += 1

    # Periodically save aggregated insights to Stompy
    if sum(query_patterns.values()) % 1000 == 0:
        asyncio.create_task(save_query_insights())

    return await smart_search(user_query)

async def save_query_insights():
    """Save aggregated query insights."""
    async with httpx.AsyncClient() as client:
        await client.post(
            "https://mcp.stompy.ai/sse",
            headers={"Authorization": f"Bearer {os.environ['STOMPY_TOKEN']}"},
            json={
                "tool": "lock_context",
                "topic": "query_patterns_analysis",
                "content": f"""Query Pattern Analysis:
Top categories: {query_patterns.most_common(10)}
Total queries: {sum(query_patterns.values())}
Timestamp: {datetime.now().isoformat()}
Insights: Most users ask about {query_patterns.most_common(1)[0][0]}""",
                "tags": "analytics,qdrant,patterns",
            },
        )
```
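The `categorize_query` helper is left to you. As a minimal sketch, a keyword-based categorizer might look like this—the category names and keyword lists here are purely illustrative, not part of Qdrant's or Stompy's API:

```python
# Hypothetical keyword-based categorizer. Categories and keywords are
# illustrative examples, not part of any library's API.
CATEGORY_KEYWORDS = {
    "deployment": ["deploy", "kubernetes", "docker", "helm"],
    "performance": ["latency", "slow", "speed", "qps", "throughput"],
    "search_quality": ["relevance", "ranking", "recall", "precision"],
}

def categorize_query(user_query: str) -> str:
    """Return the first category whose keywords appear in the query."""
    lowered = user_query.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"
```

In production you might swap this for an embedding-based classifier, but even a crude bucketing like this is enough to surface aggregate patterns worth saving.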
What You Get
- Parallel retrieval: Fetch Qdrant results and Stompy context simultaneously—no added latency
- Production-grade: Both systems are built for high-throughput, low-latency workloads
- Pattern learning: Aggregate insights from high-volume queries to improve over time
- Rust + cloud: Qdrant's Rust performance with Stompy's managed memory infrastructure
- Scale-ready: Architecture that works at 10 QPS or 10,000 QPS
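The "no added latency" claim rests on concurrency: when both retrievals run in parallel, total wall time is roughly the slower of the two legs, not their sum. A minimal sketch with simulated delays (the 10ms/8ms figures are made up for illustration):

```python
import asyncio
import time

async def fake_qdrant_search() -> str:
    await asyncio.sleep(0.010)  # simulated 10ms vector search
    return "documents"

async def fake_stompy_context() -> str:
    await asyncio.sleep(0.008)  # simulated 8ms context recall
    return "context"

async def main() -> float:
    start = time.perf_counter()
    await asyncio.gather(fake_qdrant_search(), fake_stompy_context())
    return time.perf_counter() - start

elapsed = asyncio.run(main())
# Wall time tracks the slower leg (~10ms), not the 18ms sum.
```

This is why the context fetch should always be issued alongside the vector search rather than before it.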
Ready to give Qdrant a memory?
Join the waitlist and be the first to know when Stompy is ready. Your Qdrant projects will never forget again.