Replicate + Stompy

Run any model with persistent context

The Problem

Replicate is the ultimate AI model marketplace. Run anything: Llama 3, Stable Diffusion XL, Whisper, CodeLlama, custom fine-tunes, experimental research models—if someone trained it, Replicate probably hosts it. One API to run any model, with automatic scaling and pay-per-second billing.

But model diversity doesn't come with shared memory.

Here's the limitation: You're building an application that uses multiple models. Llama for text generation, SDXL for images, Whisper for transcription, a custom fine-tune for your specific domain. Each model call is completely independent. Your text model doesn't know what images were generated. Your image model doesn't know the conversation context. Your fine-tuned model starts fresh every time.

Replicate solved the infrastructure problem—run any model without managing GPUs. But they can't solve the coordination problem—how do multiple models share context about your project, your users, your accumulated knowledge?
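
To make the gap concrete, here's a minimal sketch using the Replicate Python client: two back-to-back calls that share nothing, no matter how related the work is.

```python
import replicate

# Two independent calls: each model runs in complete isolation.
description = replicate.run(
    "meta/meta-llama-3-70b-instruct",
    input={"prompt": "Describe our product's dashboard"},
)

# The image model never sees the description, the conversation,
# or anything else the text model produced.
image = replicate.run(
    "stability-ai/sdxl",
    input={"prompt": "A product dashboard"},
)
```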

When you're building complex AI applications, you need more than model diversity. You need memory that spans across models.

Any model should have any memory.

How Stompy Helps

Stompy gives your Replicate ecosystem unified, persistent memory.

Your diverse model portfolio gains shared intelligence:

- **Cross-model context**: Text generation models know what images were created. Image models know the conversation history. Every model shares the same project memory.
- **Experiment tracking**: Custom fine-tunes and experimental models log their outputs for future reference. Compare results across model versions.
- **User continuity**: User preferences and history are available to every model type, for personalization that spans modalities.
- **Project knowledge**: Accumulated learnings from months of model interactions, accessible to any new model you add.

Build applications where Llama generates descriptions, SDXL creates visuals, and your custom model does domain-specific processing—all sharing the same persistent context.

Run any model, remember everything.

Integration Walkthrough

1. Create a memory-enabled Replicate client

Build a wrapper that gives any Replicate model access to persistent project context.

```python
import asyncio
import os
from typing import Any

import httpx
import replicate


class StompyReplicate:
    def __init__(self):
        self.stompy_url = "https://mcp.stompy.ai/sse"
        self.headers = {"Authorization": f"Bearer {os.environ['STOMPY_TOKEN']}"}

    async def get_context(self, topics: list[str]) -> str:
        """Retrieve context from Stompy for any model."""
        async with httpx.AsyncClient() as client:
            # Fetch all topics concurrently
            responses = await asyncio.gather(*[
                client.post(self.stompy_url, headers=self.headers,
                            json={"tool": "recall_context", "topic": t})
                for t in topics
            ])
            return "\n---\n".join(
                r.json().get("content", "") for r in responses if r.is_success
            )

    async def search_context(self, query: str, limit: int = 5) -> list[dict]:
        """Semantic search for relevant context."""
        async with httpx.AsyncClient() as client:
            response = await client.post(
                self.stompy_url, headers=self.headers,
                json={"tool": "context_search", "query": query, "limit": limit},
            )
            return response.json().get("results", [])


client = StompyReplicate()
```
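
A quick way to sanity-check the wrapper before wiring it into any models. This is a sketch; the `architecture` topic and the search query are placeholders for whatever you've already stored in Stompy:

```python
async def smoke_test():
    # Pull a stored topic and run a semantic search side by side.
    # "architecture" is a placeholder topic name, not a built-in.
    context = await client.get_context(["architecture"])
    hits = await client.search_context("auth design", limit=3)
    print(context or "(no context stored yet)")
    print(f"{len(hits)} search results")

asyncio.run(smoke_test())
```
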
2. Run any model with project context

Give Llama, SDXL, Whisper, or any custom model access to persistent memory.

```python
async def run_with_memory(
    model: str,
    input_params: dict,
    context_topics: list[str] | None = None,
    search_query: str | None = None,
) -> Any:
    """Run any Replicate model with Stompy context."""
    # Gather relevant context from explicit topics and/or semantic search
    context_parts = []
    if context_topics:
        context_parts.append(await client.get_context(context_topics))
    if search_query:
        results = await client.search_context(search_query)
        context_parts.extend(r["content"] for r in results)
    context = "\n---\n".join(context_parts)

    # Inject context based on model type
    if "llama" in model.lower() or "instruct" in model.lower():
        # Language models: add project memory to the system prompt
        if context:
            input_params["system_prompt"] = (
                f"You have access to project memory:\n{context}\n"
                "Use this context to inform your responses."
            )
    elif "stable-diffusion" in model.lower() or "sdxl" in model.lower():
        # Image models: prepend style context to the prompt
        style_context = await client.get_context(["brand_guidelines", "visual_style"])
        if style_context and "prompt" in input_params:
            input_params["prompt"] = f"{style_context}. {input_params['prompt']}"

    # replicate.run is synchronous and blocks until the model finishes
    return replicate.run(model, input=input_params)


# Examples (run these inside an async function, e.g. via asyncio.run):

# Text generation with context
response = await run_with_memory(
    "meta/meta-llama-3-70b-instruct",
    {"prompt": "Explain our authentication system"},
    context_topics=["architecture", "auth_design"],
)

# Image generation with brand context
image = await run_with_memory(
    "stability-ai/sdxl",
    {"prompt": "A modern dashboard interface"},
    context_topics=["brand_guidelines"],
)
```
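
The dispatch above handles language and image models; audio follows the same pattern. Here's a hedged sketch of a Whisper-style branch you might add. The `initial_prompt` input and the `domain_vocabulary` topic are assumptions, so check your model's input schema first:

```python
async def add_audio_context(model: str, input_params: dict) -> dict:
    """Sketch: bias transcription with domain vocabulary from Stompy.

    "initial_prompt" mirrors Whisper's own transcribe option; whether a
    given Replicate Whisper deployment exposes it is an assumption.
    """
    if "whisper" in model.lower():
        vocab = await client.get_context(["domain_vocabulary"])
        if vocab:
            # Whisper prompt windows are small; keep the hint short.
            input_params["initial_prompt"] = vocab[:800]
    return input_params
```
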
3. Track experiments and save outputs

Log model outputs for future reference and cross-model learning.

```python
import json
from datetime import datetime


async def run_and_track(
    model: str,
    input_params: dict,
    experiment_name: str,
    context_topics: list[str] | None = None,
) -> dict:
    """Run a model and track the result in Stompy for future reference."""
    output = await run_with_memory(model, input_params, context_topics)

    # Truncate large outputs before logging
    output_summary = str(output)[:2000]

    # Log the experiment
    async with httpx.AsyncClient() as http:
        await http.post(
            "https://mcp.stompy.ai/sse",
            headers={"Authorization": f"Bearer {os.environ['STOMPY_TOKEN']}"},
            json={
                "tool": "lock_context",
                "topic": f"experiment_{experiment_name}",
                "content": f"""Model: {model}
Input: {json.dumps(input_params, indent=2)}
Output: {output_summary}
Timestamp: {datetime.now().isoformat()}""",
                "tags": f"experiment,model:{model.split('/')[-1]}",
                "priority": "reference",
            },
        )
    return {"output": output, "experiment_logged": True}


# Track fine-tune experiments
result = await run_and_track(
    "your-username/custom-fine-tune:version",
    {"prompt": "Test input for domain-specific task"},
    experiment_name="finetune_v3_test_01",
)

# Later: compare experiments
experiments = await client.search_context("experiment finetune", limit=10)
```
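
Putting the steps together: here is a minimal sketch of the cross-model pipeline described earlier, where Llama drafts a description, SDXL renders it, and both runs land in shared memory. Topic and experiment names are placeholders.

```python
async def illustrate_feature(feature: str) -> dict:
    # Step 1: Llama drafts a visual description, informed by project context.
    text_run = await run_and_track(
        "meta/meta-llama-3-70b-instruct",
        {"prompt": f"Write a one-sentence visual description of: {feature}"},
        experiment_name=f"describe_{feature}",
        context_topics=["architecture"],
    )
    # Language models on Replicate typically return a list of tokens.
    description = "".join(text_run["output"])

    # Step 2: SDXL renders the description; run_with_memory layers
    # brand guidelines on top, and run_and_track logs the result.
    image_run = await run_and_track(
        "stability-ai/sdxl",
        {"prompt": description},
        experiment_name=f"visualize_{feature}",
    )
    return {"description": description, "image": image_run["output"]}

# asyncio.run(illustrate_feature("authentication flow"))
```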

What You Get

  • Universal memory: Same context layer works with any Replicate model—language, image, audio, video, custom
  • Cross-modal context: Text models know about generated images. Image models know conversation history. Full coordination.
  • Experiment tracking: Log outputs from custom fine-tunes and research models for comparison and iteration
  • Brand consistency: Image generation models automatically inherit brand guidelines and visual style context
  • Future-proof: As new models appear on Replicate, they instantly inherit your existing project memory

Ready to give Replicate a memory?

Join the waitlist and be the first to know when Stompy is ready. Your Replicate projects will never forget again.