GPT models with persistent memory
GPT power with persistent context
The Problem
The OpenAI API is the most widely deployed AI API in the world. GPT-4o, GPT-4 Turbo, o1, function calling, embeddings—if you're building with AI, odds are you've written code that calls OpenAI at some point.
But the API is designed for statelessness.
Every API call is independent. The conversation history you include in the messages array is the only context the model sees. Close your application, restart your server, wait five minutes—GPT has absolutely no memory of previous interactions.
You've built an AI-powered feature. It works great within a session. Users are impressed. Then they come back tomorrow and discover the AI has complete amnesia. "Remember when we discussed..." No, GPT does not remember. GPT will never remember. That's not how the API works.
Your application is responsible for managing conversation history. Most teams max out at "store the last N messages in a session." Project context, past decisions, learned patterns? Not in the messages array. Lost.
The world's most popular AI API has the world's shortest memory span.
How Stompy Helps
Stompy gives your OpenAI integration the memory layer it's missing.
Your GPT-powered applications gain true persistence:

- **Beyond conversation history**: Not just recent messages, but project decisions, user preferences, and institutional knowledge
- **Cross-session continuity**: Users pick up where they left off, whether it's been minutes or months
- **Function calling enhanced**: Your GPT functions can retrieve and store context as part of their execution
- **Cost-efficient context**: Load only relevant context via semantic search, not your entire conversation history
The integration pattern is simple: before calling OpenAI, fetch relevant context from Stompy. Include it in the system message or as context in user messages. Your GPT responses become contextually aware without changing your existing OpenAI code.
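In miniature, the pattern looks like this (a sketch using the `search_stompy_context` helper defined in the walkthrough below):

```python
# 1. Fetch relevant memory from Stompy
context_snippets = await search_stompy_context(user_message)

# 2. Inject it as context, 3. then call OpenAI exactly as you already do
messages = [
    {"role": "system", "content": "\n".join(context_snippets)},
    {"role": "user", "content": user_message},
]
response = await openai_client.chat.completions.create(
    model="gpt-4o", messages=messages,
)
```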
The API OpenAI provides, with the memory they don't.
Integration Walkthrough
Create a Stompy-enhanced OpenAI client
Build a wrapper that automatically enriches OpenAI calls with project context.
```python
from openai import AsyncOpenAI
import httpx
import os

# Initialize OpenAI client
openai_client = AsyncOpenAI()

# Stompy context helpers
STOMPY_URL = "https://mcp.stompy.ai/sse"
STOMPY_HEADERS = {"Authorization": f"Bearer {os.environ['STOMPY_TOKEN']}"}

async def fetch_stompy_context(topic: str) -> str:
    """Retrieve specific context by topic."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            STOMPY_URL,
            headers=STOMPY_HEADERS,
            json={"tool": "recall_context", "topic": topic},
        )
        return response.json().get("content", "")

async def search_stompy_context(query: str, limit: int = 3) -> list[str]:
    """Semantic search for relevant context."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            STOMPY_URL,
            headers=STOMPY_HEADERS,
            json={"tool": "context_search", "query": query, "limit": limit},
        )
        contexts = response.json().get("contexts", [])
        return [c["content"] for c in contexts]
```
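Assuming a valid `STOMPY_TOKEN` in the environment, the helpers can be exercised directly, for example:

```python
# Inside an async context: pull a specific topic, then run a semantic search
rules = await fetch_stompy_context("project_rules")
related = await search_stompy_context("How should auth tokens be stored?")
```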
Context-aware chat completions
Enrich your chat completion calls with project context fetched from Stompy.
```python
import asyncio

async def smart_gpt_call(
    user_message: str,
    conversation_history: list[dict] | None = None,
    model: str = "gpt-4o",
) -> str:
    """Chat completion with persistent project context."""
    # Fetch context in parallel
    project_rules, tech_context, relevant_context = await asyncio.gather(
        fetch_stompy_context("project_rules"),
        fetch_stompy_context("tech_stack"),
        search_stompy_context(user_message, limit=3),
    )

    # Build enriched system prompt
    system_content = f"""You are an AI assistant with access to project context.

PROJECT RULES:
{project_rules or 'No specific rules defined.'}

TECHNICAL CONTEXT:
{tech_context or 'No technical context available.'}

RELEVANT PREVIOUS CONTEXT:
{chr(10).join(relevant_context) if relevant_context else 'No relevant previous context.'}

Use this context to provide responses consistent with established patterns."""

    # Build messages array
    messages = [{"role": "system", "content": system_content}]
    if conversation_history:
        messages.extend(conversation_history)
    messages.append({"role": "user", "content": user_message})

    # Make the OpenAI call
    response = await openai_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.7,
    )
    return response.choices[0].message.content
```
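A fresh session needs no replayed history; for instance:

```python
# Days later, in a brand-new session with no conversation_history:
reply = await smart_gpt_call("What database did we settle on, and why?")
# The answer draws on context persisted in Stompy, not on replayed messages
print(reply)
```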
Save decisions from GPT conversations
When GPT helps make decisions, persist them to Stompy for future reference.
```python
async def save_gpt_insight(
    topic: str,
    content: str,
    tags: str = "openai,decisions",
):
    """Persist insights from GPT conversations."""
    async with httpx.AsyncClient() as client:
        await client.post(
            STOMPY_URL,
            headers=STOMPY_HEADERS,
            json={
                "tool": "lock_context",
                "topic": topic,
                "content": content,
                "tags": tags,
            },
        )

# Example usage: After GPT helps decide on database approach
await save_gpt_insight(
    topic="database_architecture",
    content="""Database Decision (with GPT-4o analysis):

Choice: PostgreSQL with pgvector for RAG

Rationale:
- Team has Postgres expertise
- pgvector avoids separate vector DB
- ACID compliance important for our use case
- Cost-effective for our scale

Alternatives considered:
- Pinecone (rejected: additional cost and complexity)
- MongoDB Atlas (rejected: weaker transaction support)""",
    tags="database,architecture,decisions",
)
```
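In a later session, the saved decision comes back through the recall helper from the first step:

```python
# Retrieve the persisted decision by its topic key
decision = await fetch_stompy_context("database_architecture")
```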
What You Get
- Universal compatibility: Works with GPT-4o, GPT-4 Turbo, o1, and any future OpenAI models
- Zero breaking changes: Add context enrichment to existing OpenAI code without refactoring
- Cost-efficient: Semantic search loads only relevant context, not entire conversation history
- Function calling ready: Stompy context can be passed to function calls for smarter tool use (see the sketch after this list)
- Multi-user support: Different users or projects get different context automatically
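Here's a minimal sketch of that function-calling path, reusing `fetch_stompy_context` from the walkthrough. The tool name `recall_project_context` is illustrative, not part of the Stompy API:

```python
import json

# Expose a Stompy lookup as a GPT tool (tool name is illustrative)
stompy_tools = [{
    "type": "function",
    "function": {
        "name": "recall_project_context",
        "description": "Retrieve stored project context by topic.",
        "parameters": {
            "type": "object",
            "properties": {
                "topic": {"type": "string", "description": "e.g. 'project_rules'"},
            },
            "required": ["topic"],
        },
    },
}]

async def gpt_with_stompy_tools(messages: list[dict]) -> str:
    """One round of tool use: GPT asks for context, Stompy supplies it."""
    response = await openai_client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=stompy_tools,
    )
    message = response.choices[0].message
    if message.tool_calls:
        messages.append(message)  # keep the assistant turn that requested the tools
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            # Route the tool call to Stompy and hand the result back to GPT
            result = await fetch_stompy_context(args["topic"])
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result or "No context stored for that topic.",
            })
        response = await openai_client.chat.completions.create(
            model="gpt-4o", messages=messages,
        )
        message = response.choices[0].message
    return message.content
```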
Ready to give the OpenAI API a memory?
Join the waitlist and be the first to know when Stompy is ready. Your OpenAI API projects will never forget again.