OpenAI API + Stompy

GPT models with persistent memory

The Problem

The OpenAI API is the most widely deployed AI API in the world. GPT-4o, GPT-4 Turbo, o1, function calling, embeddings—if you're building with AI, odds are you've written code that calls OpenAI at some point.

But the API is designed for statelessness.

Every API call is independent. The conversation history you include in the messages array is the only context the model sees. Close your application, restart your server, wait five minutes—GPT has absolutely no memory of previous interactions.

You've built an AI-powered feature. It works great within a session. Users are impressed. Then they come back tomorrow and discover the AI has complete amnesia. "Remember when we discussed..." No, GPT does not remember. GPT will never remember. That's not how the API works.

Your application is responsible for managing conversation history. Most teams max out at "store the last N messages in a session." Project context, past decisions, learned patterns? Not in the messages array. Lost.
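
The usual workaround looks something like this sketch (a minimal in-memory session store; the names and window size are illustrative):

```python
# Naive "last N messages" memory: anything outside the window
# is silently dropped, and a restart wipes everything.
MAX_MESSAGES = 20
session_history: list[dict] = []

def build_messages(user_message: str) -> list[dict]:
    session_history.append({"role": "user", "content": user_message})
    # The model only ever sees this sliding window
    return session_history[-MAX_MESSAGES:]
```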

The world's most popular AI API has the world's shortest memory span.

How Stompy Helps

Stompy gives your OpenAI integration the memory layer it's missing.

Your GPT-powered applications gain true persistence:

- **Beyond conversation history**: Not just recent messages, but project decisions, user preferences, and institutional knowledge
- **Cross-session continuity**: Users pick up where they left off, whether it's been minutes or months
- **Function calling enhanced**: Your GPT functions can retrieve and store context as part of their execution
- **Cost-efficient context**: Load only relevant context via semantic search, not your entire conversation history

The integration pattern is simple: before calling OpenAI, fetch relevant context from Stompy. Include it in the system message or as context in user messages. Your GPT responses become contextually aware without changing your existing OpenAI code.

The API OpenAI provides, with the memory they don't.

Integration Walkthrough

1. Create a Stompy-enhanced OpenAI client

Build a wrapper that automatically enriches OpenAI calls with project context.

```python
from openai import AsyncOpenAI
import httpx
import os
from typing import Optional

# Initialize OpenAI client
openai_client = AsyncOpenAI()

# Stompy context helpers
STOMPY_URL = "https://mcp.stompy.ai/sse"
STOMPY_HEADERS = {"Authorization": f"Bearer {os.environ['STOMPY_TOKEN']}"}

async def fetch_stompy_context(topic: str) -> str:
    """Retrieve specific context by topic."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            STOMPY_URL,
            headers=STOMPY_HEADERS,
            json={"tool": "recall_context", "topic": topic},
        )
        return response.json().get("content", "")

async def search_stompy_context(query: str, limit: int = 3) -> list[str]:
    """Semantic search for relevant context."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            STOMPY_URL,
            headers=STOMPY_HEADERS,
            json={"tool": "context_search", "query": query, "limit": limit},
        )
        contexts = response.json().get("contexts", [])
        return [c["content"] for c in contexts]
```
2. Context-aware chat completions

Enrich your chat completion calls with project context fetched from Stompy.

```python
import asyncio

async def smart_gpt_call(
    user_message: str,
    conversation_history: Optional[list[dict]] = None,
    model: str = "gpt-4o",
) -> str:
    """Chat completion with persistent project context."""
    # Fetch all context in parallel
    project_rules, tech_context, relevant_context = await asyncio.gather(
        fetch_stompy_context("project_rules"),
        fetch_stompy_context("tech_stack"),
        search_stompy_context(user_message, limit=3),
    )

    # Build enriched system prompt
    system_content = f"""You are an AI assistant with access to project context.

PROJECT RULES:
{project_rules or 'No specific rules defined.'}

TECHNICAL CONTEXT:
{tech_context or 'No technical context available.'}

RELEVANT PREVIOUS CONTEXT:
{chr(10).join(relevant_context) if relevant_context else 'No relevant previous context.'}

Use this context to provide responses consistent with established patterns."""

    # Build messages array
    messages = [{"role": "system", "content": system_content}]
    if conversation_history:
        messages.extend(conversation_history)
    messages.append({"role": "user", "content": user_message})

    # Make the OpenAI call
    response = await openai_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.7,
    )
    return response.choices[0].message.content
```
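
For example, a hypothetical call from an async request handler (the history shown is illustrative):

```python
# Hypothetical usage: conversation_history comes from your own session store
history = [
    {"role": "user", "content": "Which database did we settle on?"},
    {"role": "assistant", "content": "PostgreSQL with pgvector."},
]

answer = await smart_gpt_call(
    "Draft a migration plan for that decision.",
    conversation_history=history,
)
print(answer)
```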
3. Save decisions from GPT conversations

When GPT helps make decisions, persist them to Stompy for future reference.

```python
async def save_gpt_insight(
    topic: str,
    content: str,
    tags: str = "openai,decisions",
):
    """Persist insights from GPT conversations."""
    async with httpx.AsyncClient() as client:
        await client.post(
            STOMPY_URL,
            headers=STOMPY_HEADERS,
            json={
                "tool": "lock_context",
                "topic": topic,
                "content": content,
                "tags": tags,
            },
        )

# Example usage (inside an async context): after GPT helps
# decide on a database approach
await save_gpt_insight(
    topic="database_architecture",
    content="""Database Decision (with GPT-4o analysis):

Choice: PostgreSQL with pgvector for RAG

Rationale:
- Team has Postgres expertise
- pgvector avoids separate vector DB
- ACID compliance important for our use case
- Cost-effective for our scale

Alternatives considered:
- Pinecone (rejected: additional cost and complexity)
- MongoDB Atlas (rejected: weaker transaction support)""",
    tags="database,architecture,decisions",
)
```

What You Get

  • Universal compatibility: Works with GPT-4o, GPT-4 Turbo, o1, and any future OpenAI models
  • Zero breaking changes: Add context enrichment to existing OpenAI code without refactoring
  • Cost-efficient: Semantic search loads only relevant context, not entire conversation history
  • Function calling ready: Stompy context can be passed to function calls for smarter tool use (see the sketch after this list)
  • Multi-user support: Different users or projects get different context automatically
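
To make the function-calling point concrete, here is a minimal sketch that exposes Stompy recall as an OpenAI tool. The tool name and schema are illustrative, not a fixed Stompy API; it reuses the `fetch_stompy_context` helper from step 1:

```python
import json

# Hypothetical tool definition: lets GPT request persistent context by topic
tools = [{
    "type": "function",
    "function": {
        "name": "recall_project_context",
        "description": "Look up persistent project context by topic.",
        "parameters": {
            "type": "object",
            "properties": {"topic": {"type": "string"}},
            "required": ["topic"],
        },
    },
}]

async def run_with_tools(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    response = await openai_client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=tools
    )
    msg = response.choices[0].message
    if msg.tool_calls:
        # Resolve each tool call by pulling the requested topic from Stompy
        messages.append(msg)
        for call in msg.tool_calls:
            topic = json.loads(call.function.arguments)["topic"]
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": await fetch_stompy_context(topic),
            })
        response = await openai_client.chat.completions.create(
            model="gpt-4o", messages=messages
        )
    return response.choices[0].message.content
```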

Ready to give the OpenAI API a memory?

Join the waitlist and be the first to know when Stompy is ready. Your OpenAI API projects will never forget again.