The Memory Plugin: Giving AI Agents a Brain That Persists

Daita's Memory Plugin equips AI agents with durable, intelligent memory featuring hybrid semantic search, importance scoring, and automatic curation across both local and hosted environments.

Daita Team

February 16, 2026

One of the most fundamental limitations of today's AI agents is that they forget everything the moment a session ends. Each new conversation begins from a blank slate with no knowledge of past decisions, no awareness of user preferences, no memory of what was tried and failed before. For simple, one-shot tasks this is acceptable. But for agents operating over days, weeks, or months on complex, evolving projects, statelessness is a serious architectural problem.

Daita's Memory Plugin solves this. It gives agents a durable, queryable, intelligent memory layer that persists across sessions, surfaces relevant context automatically, and gets smarter over time through an automated curation pipeline. In this post, we'll walk through the storage architecture, the scoring system, the curation lifecycle, and how the plugin behaves in local development versus hosted production.


The Core Problem: Why Agents Forget

Before getting into how the Memory Plugin works, it's worth being precise about why stateless agents are problematic for extended usage periods.

Every LLM has a context window, a finite buffer of tokens it can consider at once. Within a single session, an agent can "remember" by carrying information forward in the prompt. But once the session ends, that context is gone. The next session starts cold.

This creates several concrete failure modes:

  • Repeated mistakes: An agent that learned a constraint ("don't use endpoint X, it's deprecated") forgets it next session and has to rediscover it the hard way.
  • Redundant work: An agent that spent time researching a topic will repeat the research when the same topic comes up again.
  • Lost preferences: User preferences and style choices need to be re-established every time.
  • No learning curve: The agent never improves. It doesn't accumulate institutional knowledge.

The naive fix (stuffing everything into the context window) doesn't scale. Token limits are real, costs are real, and injecting irrelevant history into every prompt adds noise and degrades output quality.

What's needed is something more discriminating: a system that stores information durably, retrieves only what's relevant to the current task, and maintains itself over time.


Architecture Overview

The Memory Plugin operates across three distinct layers that work together to form a complete memory lifecycle:

┌─────────────────────────────────────────────────────────────┐
│                        Agent Session                        │
│                                                             │
│  on_before_run() ──→ Auto-inject relevant memories          │
│                       into system prompt                    │
│                                                             │
│  During execution:                                          │
│  • remember(content, importance, category)                  │
│  • recall(query, limit, score_threshold)                    │
│  • update_memory(query, new_content)                        │
│  • read_memory() / list_memories()                          │
│                                                             │
│  on_agent_stop() ──→ Trigger curation pipeline              │
└─────────────────────────────────────────────────────────────┘
            │                           │
            ▼                           ▼
   ┌─────────────────┐        ┌──────────────────────┐
   │   Vector Store  │        │  Daily Activity Log  │
   │  (embeddings +  │        │  (raw session notes, │
   │   metadata)     │        │   timestamped)       │
   └─────────────────┘        └──────────────────────┘
            │                           │
            └────────────┬──────────────┘
                         ▼
               ┌──────────────────────┐
               │   MEMORY.md          │
               │  (local dev only:    │
               │   curated, clean     │
               │   long-term summary) │
               └──────────────────────┘

Two storage artifacts, two purposes:

  1. Vector Store: Immediate, queryable memory. Every remember() call lands here. Used for semantic search during recall.
  2. Daily Activity Log: A timestamped markdown log that records what happened during agent sessions. This feeds the curation pipeline.

In local development, a third artifact is also maintained: MEMORY.md, a curated, human-readable long-term summary regenerated from the vector store after each curation run. In the hosted environment, memories live in the vector database directly and are queried there rather than through a flat file.

Because memory is implemented as a plugin, it integrates directly into the agent's lifecycle. Context injection happens automatically before the first LLM turn, and curation fires when the agent stops; no manual wiring is required. The same plugin code runs against a local SQLite store during development or a managed vector database in production, with no changes required from the developer.
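To make the lifecycle concrete, here is a minimal sketch of how hook-based injection and curation could be wired together. The SketchAgent and SketchMemoryPlugin classes are illustrative stand-ins, not the actual Daita SDK API; only the hook names (on_before_run, on_agent_stop) come from the description above.

```python
# Illustrative sketch only: these classes are NOT the Daita SDK API.
# They show the lifecycle shape: inject memories before the run,
# trigger curation when the agent stops.

class SketchMemoryPlugin:
    def __init__(self):
        self.events = []  # records which hooks fired, for inspection

    def on_before_run(self, prompt: str) -> str:
        # A real plugin would search the store and inject relevant memories.
        self.events.append("inject")
        return f"## Relevant Memory\n- (none yet)\n\n{prompt}"

    def on_agent_stop(self):
        # A real plugin would kick off the curation pipeline here.
        self.events.append("curate")


class SketchAgent:
    def __init__(self):
        self.plugins = []

    def add_plugin(self, plugin):
        self.plugins.append(plugin)

    def run(self, prompt: str) -> str:
        for p in self.plugins:
            prompt = p.on_before_run(prompt)
        return prompt  # a real agent would call the LLM here

    def stop(self):
        for p in self.plugins:
            p.on_agent_stop()
```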


Two Environments: Local and Hosted

The Memory Plugin ships with support for two deployment contexts. The same agent code works in both; the backend switches automatically based on the runtime environment.

Local Development

In local mode, the memory system runs entirely on your machine with no external dependencies.

  • Vector storage: SQLite database with cosine similarity search
  • File storage: Local filesystem under .daita/memory/
  • Scope: Project-scoped by default (each project has isolated memory), with opt-in global scope shared across all projects

.daita/
└── memory/
    └── workspaces/
        └── {workspace_name}/
            ├── vectors.db       # SQLite vector store
            ├── MEMORY.md        # Curated long-term summary
            └── logs/
                ├── 2026-02-15.md
                └── 2026-02-16.md

This setup is self-contained, deterministic, and easy to inspect. You can open MEMORY.md or the daily logs at any time to see exactly what the agent knows. It's the ideal environment for development, testing, and single-user workflows.
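As a rough illustration of how a local backend like this can work, here is a self-contained sketch of a SQLite-backed store with brute-force cosine-similarity search. The schema and helper names are assumptions for illustration, not the plugin's actual internals.

```python
import json
import math
import sqlite3

# Sketch of a local vector store: embeddings serialized as JSON in SQLite,
# searched by brute-force cosine similarity. Schema and names are
# illustrative assumptions, not the Memory Plugin's real internals.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

conn = sqlite3.connect(":memory:")  # real backend would use vectors.db
conn.execute("CREATE TABLE memories (content TEXT, embedding TEXT)")

def store(content, embedding):
    conn.execute("INSERT INTO memories VALUES (?, ?)",
                 (content, json.dumps(embedding)))

def search(query_embedding, limit=5):
    rows = conn.execute("SELECT content, embedding FROM memories").fetchall()
    scored = [(cosine(query_embedding, json.loads(e)), c) for c, e in rows]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:limit]
```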

Hosted / Cloud

In production (deployed via the daita push CLI command), the backend upgrades to managed infrastructure:

  • Vector storage: A managed vector store with nearest-neighbor search; memories are stored and queried here directly
  • Daily log storage: Cloud object storage for append-only session activity logs, which feed the curation pipeline
  • Scope: Both project-scoped and global memory are supported
  • Concurrency: Database transactions ensure concurrent writes from multiple invocations are safe

The hosted backend provides O(log N) recall performance that scales as memory grows. It also supports workspace sharing (multiple agents can read from and write to the same memory store), enabling multi-agent systems where agents build on each other's knowledge.

# Two agents sharing a workspace
agent1.add_plugin(MemoryPlugin(workspace="research_team"))
agent2.add_plugin(MemoryPlugin(workspace="research_team"))

When agents share a workspace, attribution is tracked in the daily logs so the curator knows which agent recorded what.


Memory Tools: The Agent's Interface

From the agent's perspective, memory is accessed through a small, focused set of tools that the LLM can invoke directly.

remember(content, importance, category)

Stores a piece of information immediately in the vector store.

# Storing a critical architectural decision
await agent.remember(
    content="Rate limiting is set to 100 requests/min per user using a token bucket algorithm.",
    importance=0.9,
    category="decision"
)

# Storing a user preference
await agent.remember(
    content="User prefers concise email replies under 200 words.",
    importance=0.6,
    category="preference"
)

Before storing, the system performs a deduplication check: if a near-identical memory already exists, the new write is skipped silently. This prevents the same fact from being stored dozens of times across repeated agent runs.
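The deduplication check can be pictured as a similarity gate applied before the write. The 0.95 threshold and function names below are assumptions for illustration; only the skip-on-near-duplicate behavior comes from the text.

```python
import math

# Sketch of a pre-write deduplication gate: skip the write if any existing
# embedding is nearly identical to the new one. The 0.95 threshold is an
# assumption, not the plugin's documented value.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def should_store(new_emb, existing_embs, threshold=0.95):
    """Return False (skip silently) if a near-identical memory exists."""
    return all(cosine(new_emb, e) < threshold for e in existing_embs)
```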

recall(query, limit, score_threshold)

Searches the vector store using hybrid semantic + keyword search.

results = await agent.recall(
    query="What did we decide about authentication?",
    limit=5,
    score_threshold=0.6
)

Results come back ranked by relevance, with each result including the content, a relevance score, importance score, category, and when the memory was created. Agents can also filter by importance range, which is useful when you only want to surface high-stakes information.
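The importance-range filter can be pictured as a simple post-processing step over recall results. The helper below is illustrative, not the plugin's API; only the result fields named above are assumed present.

```python
# Illustrative helper, not the plugin's API: narrow recall results to an
# importance range, e.g. to surface only high-stakes memories.

def filter_by_importance(results, min_importance=0.0, max_importance=1.0):
    return [r for r in results
            if min_importance <= r["importance"] <= max_importance]
```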

update_memory(query, new_content)

Replaces an existing memory. Useful when a fact has changed: a deadline moved, a decision was reversed, a bug was fixed.

await agent.update_memory(
    query="database migration deadline",
    new_content="Database migration deadline moved to 2026-03-15 due to schema review.",
    importance=0.9
)

read_memory() and list_memories()

Provides direct access to today's activity log and, in local development, the MEMORY.md summary. Useful when an agent wants a broad view of its accumulated knowledge rather than a targeted search.


Automatic Memory Injection

One of the most powerful features of the Memory Plugin is that agents don't need to explicitly recall memories before every task. The plugin handles this automatically.

Before the first LLM turn in every agent run, on_before_run() fires. It performs a semantic search against the current prompt and injects the most relevant memories directly into the system prompt:

System: You are a helpful engineering assistant.

## Relevant Memory
- [decision] Rate limiting: 100 req/min per user (token bucket) (importance: 0.9)
- [contact] Sarah Chen (lead engineer) prefers async communication (importance: 0.7)
- [project] Auth flow redesign complete, 40% reduction in login latency (importance: 0.8)
- [preference] User prefers concise responses under 200 words (importance: 0.6)

User: Review the new authentication middleware and flag any issues.

The agent arrives at its first turn already equipped with the context it needs. No boilerplate, no manual recall calls, no risk of the agent forgetting to check its memory.

If the semantic search returns nothing (which can happen for broad, generic prompts like "check inbox"), the system falls back to injecting the top memories by importance score, ensuring the agent always has something useful to work with.
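Putting the two paths together, the selection logic might look like the following sketch. The helper names, the top-N default, and the rendering format are illustrative assumptions; the search-then-fallback behavior comes from the text.

```python
# Sketch of injection selection: prefer semantic hits, otherwise fall back
# to the top memories by importance. Names and defaults are illustrative.

def select_for_injection(semantic_hits, all_memories, top_n=4):
    if semantic_hits:
        return semantic_hits
    ranked = sorted(all_memories, key=lambda m: m["importance"], reverse=True)
    return ranked[:top_n]

def render_block(memories):
    """Render selected memories as a system-prompt section."""
    lines = ["## Relevant Memory"]
    for m in memories:
        lines.append(f"- [{m['category']}] {m['content']} "
                     f"(importance: {m['importance']})")
    return "\n".join(lines)
```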


Importance Scoring

Not all memories are created equal. The importance score (a float from 0.0 to 1.0) is the system's primary signal for prioritizing what to surface, how strongly to weight it in search results, and when to prune it.

Range     | Level     | Typical Use Cases
----------|-----------|------------------------------------------------------------------
0.9 – 1.0 | Critical  | Security issues, hard deadlines, major architectural decisions, system failures
0.7 – 0.8 | Important | Key facts, significant events, completed milestones
0.5 – 0.6 | Useful    | General context, background information, confirmed preferences
0.3 – 0.4 | Low       | Minor notes, speculative findings, exploratory observations
< 0.3     | Minimal   | Routine queries, noise (candidates for pruning)

Importance is set by the agent at write time and can be updated later. The plugin also exposes higher-level methods for common importance operations:

# Mark memories matching a query as high importance
await memory_plugin.mark_important(query="production database", importance=0.95)

# Pin a memory — immunity from pruning, maximum importance
await memory_plugin.pin(query="API key rotation schedule")

# Remove memories that are no longer relevant
await memory_plugin.forget(query="deprecated v1 API endpoints")

Pinning is the strongest signal you can give the system. A pinned memory is set to maximum importance, excluded from all pruning, and exempt from temporal decay; it scores at full weight in search results regardless of how old it is. Use it for facts that should never be forgotten: hard architectural constraints, standing user preferences, or anything that would be costly to rediscover.

How Importance Affects Search

Importance isn't just metadata; it actively influences recall results through a scoring adjustment applied after the raw relevance score is computed:

adjusted_score = (raw_relevance_score + importance_boost) × temporal_decay

Where:

  • importance_boost scales from −0.1 to +0.1 based on importance (0.5 is neutral)
  • temporal_decay starts at 1.0 and decreases with age, flooring at 0.7

This means a highly important memory that scores 0.7 semantically can outrank a low-importance memory that scores 0.75 semantically. It also means old memories never completely disappear from results: the 0.7 floor ensures that a critical decision from a year ago is still surfaced when relevant.

Pinned memories are exempt from temporal decay entirely. They score at full weight regardless of age.
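Under those rules, the adjustment can be sketched as a single function. The linear boost mapping (neutral at 0.5, spanning −0.1 to +0.1), the 0.7 decay floor, and the pinning exemption come from the text; the per-day decay rate is an assumption, since only the 1.0 starting value and the floor are stated.

```python
# Sketch of the adjusted-score formula:
#   adjusted = (raw_relevance + importance_boost) * temporal_decay
# The 0.001/day decay rate is an assumption; only the 1.0 start and the
# 0.7 floor are stated in the text.

def adjusted_score(raw_relevance, importance, age_days, pinned=False):
    boost = (importance - 0.5) * 0.2      # maps [0, 1] -> [-0.1, +0.1]
    if pinned:
        decay = 1.0                       # pinned: exempt from decay
    else:
        decay = max(0.7, 1.0 - 0.001 * age_days)
    return (raw_relevance + boost) * decay
```

With these numbers, an importance-0.9 memory at raw relevance 0.7 scores 0.78, beating an importance-0.2 memory at raw relevance 0.75, which scores 0.69.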


Hybrid Search: Semantic + Keyword

The recall mechanism combines two complementary search strategies into a single hybrid score.

Semantic Search (60% weight)

Each memory is embedded and stored as a vector. At recall time, the query is embedded and cosine similarity is computed against all stored vectors. This captures meaning: a query about "rate limiting" will match a memory about "throttling API calls" even if the words don't overlap.

Keyword Search (40% weight)

BM25 (Best Matching 25) is applied in parallel. BM25 is a probabilistic ranking algorithm that handles term frequency and document-length normalization, making it well suited to the short, dense text that characterizes individual memories. Keyword search excels where semantic search struggles: exact version numbers, proper nouns, specific identifiers, and technical terms that embeddings can blur together.

Scoring Bonuses

Two additional bonuses are applied on top of the weighted combination:

  • Exact phrase match (+0.15): If two or more query words appear verbatim in the memory content, relevance gets a significant boost. This rewards memories that are a direct hit.
  • Consensus bonus (+0.10): If both the semantic component and the keyword component score above 0.5 independently, a bonus is added. Memories that rank well by both measures are more reliably relevant.

This hybrid approach outperforms either strategy alone: semantic search catches conceptual matches that keyword search misses, while keyword search catches exact matches that semantic search can dilute.
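The weighted blend plus bonuses can be sketched as one scoring function. The 60/40 weights and both bonus values come from the text; the naive whitespace tokenization (and treating "two or more query words verbatim" as a set-overlap check) is a simplifying assumption.

```python
# Sketch of hybrid scoring: 60/40 blend of semantic and BM25 scores, plus
# the exact-match (+0.15) and consensus (+0.10) bonuses described above.
# Whitespace tokenization is a simplifying assumption.

def hybrid_score(semantic, keyword, query, content):
    score = 0.6 * semantic + 0.4 * keyword
    query_words = set(query.lower().split())
    content_words = set(content.lower().split())
    if len(query_words & content_words) >= 2:
        score += 0.15   # two or more query words appear verbatim
    if semantic > 0.5 and keyword > 0.5:
        score += 0.10   # both components independently rank it well
    return score
```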


The Curation Pipeline

Raw memories stored by agents during a session are valuable but imperfect. Agents may store redundant facts, outdated information, or observations that are only temporarily relevant. Left unchecked, the memory store grows noisy and bloated.

The curation pipeline runs automatically when an agent session ends (on_agent_stop). Its job is to distill the raw activity logs into clean, durable, long-term knowledge.

Step 1: Read Recent Activity

The curator reads the daily log(s) from the most recent session(s). These logs contain timestamped, agent-attributed notes: a ground-truth record of what happened.

Step 2: Extract Facts via LLM

A fast, low-cost LLM reads the activity logs and extracts durable facts. It is explicitly instructed on what to include and what to ignore:

Include:

  • Decisions, preferences, and goals
  • Key domain knowledge and discoveries
  • Important relationships or contacts
  • Project status, deadlines, and commitments
  • Recurring patterns or insights
  • Critical action items

Exclude:

  • Routine queries with no lasting value
  • Time-sensitive information (e.g., "meeting in 2 hours")
  • Small talk
  • Information already well represented in memory

Each extracted fact comes with a suggested category and importance score from the LLM.

Step 3: Classify Each Fact

For every extracted fact, the curator checks the existing memory store. Three outcomes are possible:

New fact: No similar memory exists. Store it.

Duplicate: A near-identical memory already exists (very high similarity). Skip it.

Gray zone (0.7–0.9 similarity): The new fact is related to an existing memory but not identical. A secondary LLM call decides: does the new fact supersede the old one, or does it add genuinely new information?

A fact supersedes when it:

  • Resolves or closes an issue described in the old memory
  • Corrects inaccurate information
  • Represents a later status update that makes the old one misleading

Otherwise, both are kept. Importantly, the system defaults to keeping both on any LLM error, avoiding accidental data loss.
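The classification rules reduce to a small decision function. The 0.9 duplicate cutoff, the 0.7 gray-zone floor, and the keep-both fail-safe come from the text; the return labels and the callable interface standing in for the secondary LLM call are illustrative.

```python
# Sketch of per-fact classification. supersedes_check stands in for the
# secondary LLM call; on any error the system keeps both memories.

def classify_fact(similarity, supersedes_check):
    if similarity >= 0.9:
        return "skip_duplicate"       # near-identical memory exists
    if similarity < 0.7:
        return "store_new"            # genuinely new fact
    # Gray zone (0.7-0.9): ask whether the new fact supersedes the old one.
    try:
        return "supersede_old" if supersedes_check() else "store_new"
    except Exception:
        return "store_new"            # default to keeping both on LLM error
```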

Step 4: Consolidate Long-Term Memory

After facts are classified and stored, the curator consolidates the memory store. In local development, this means regenerating MEMORY.md from scratch, pulling all current memories from the vector store, grouping them by category, and writing a clean, organized summary. In the hosted environment, memories are persisted directly in the vector database and queried there (no flat file summary is maintained).

Regenerating rather than appending is critical for local mode. An append-only file would accumulate redundant, contradictory, and outdated entries over time. By regenerating from the deduplicated vector store, the file always reflects the current state of knowledge.
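Regeneration can be pictured as a pure function from the current memory set to a fresh document: group by category, order by importance, emit a clean summary. The grouping and ordering follow the description above; the exact layout of the real MEMORY.md file is an assumption.

```python
from collections import defaultdict

# Sketch of regenerating a MEMORY.md-style summary from scratch. The real
# file's exact layout is an assumption; grouping by category and ranking
# by importance follow the description in the text.

def render_memory_md(memories):
    groups = defaultdict(list)
    for m in memories:
        groups[m.get("category") or "General"].append(m)
    lines = ["# Long-Term Memory", ""]
    for category in sorted(groups):
        lines.append(f"## {category}")
        ranked = sorted(groups[category],
                        key=lambda m: m["importance"], reverse=True)
        lines.extend(f"- {m['content']}" for m in ranked)
        lines.append("")
    return "\n".join(lines)
```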

Step 5: Prune Low-Value Memories

Finally, the curator prunes memories that have aged out of relevance:

  • Pinned: Never pruned
  • Age > 90 days AND importance < 0.3: Pruned
  • Age > 60 days AND never accessed AND importance < 0.5: Pruned early

Access counts matter here. A memory that's been recalled frequently is kept longer, even if its importance score is modest. The system rewards utility.
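The pruning rules reduce to a simple predicate, sketched here exactly as the three rules are stated (parameter names are illustrative):

```python
# Sketch of the pruning rules: pinned memories survive unconditionally;
# old low-importance or never-accessed memories are removed.

def should_prune(age_days, importance, access_count, pinned=False):
    if pinned:
        return False
    if age_days > 90 and importance < 0.3:
        return True
    if age_days > 60 and access_count == 0 and importance < 0.5:
        return True
    return False
```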

The full curation result is returned with metrics:

result = await memory_plugin.curate()
# CurationResult(
#   facts_extracted=12,
#   facts_added=8,
#   memories_updated=2,
#   memories_pruned=5,
#   existing_memories=147,
#   cost_usd=0.0003
# )

Memory Categories

The curator and agents organize memories into categories that enable category-based filtering in recall and, in local development, make the MEMORY.md summary navigable.

Category   | Purpose
-----------|---------------------------------------------------------
security   | Security issues, access control decisions, risk items
decision   | Architectural choices, strategic directions, commitments
knowledge  | Domain facts, definitions, discovered patterns
contact    | People, teams, communication preferences
project    | Status updates, deadlines, milestones
event      | Significant occurrences, incidents, outcomes
preference | User preferences, stylistic choices, habitual patterns

Categories are optional (memories without one fall under "General"), but they significantly improve organization at scale and enable precise filtering in recall.


Configuring the Plugin

The plugin is designed to work out of the box with sensible defaults, with escape hatches for advanced configuration.

from daita import Agent
from daita.plugins import MemoryPlugin

# Default — project-scoped, auto-curates when the agent stops
agent = Agent(name="My Agent", model="gpt-4o-mini")
agent.add_plugin(MemoryPlugin())

await agent.start()
await agent.run("Your task here...")
await agent.stop()  # curation runs here automatically

# Shared workspace for multi-agent systems
memory = MemoryPlugin(workspace="research_team")
agent1.add_plugin(memory)
agent2.add_plugin(memory)

# Global scope — shared across all projects in the org
agent.add_plugin(MemoryPlugin(scope="global"))

# Manual curation control with a custom curation model
memory = MemoryPlugin(
    auto_curate="manual",
    curation_model="gpt-4o",
)
agent.add_plugin(memory)

# Trigger curation explicitly when ready
result = await memory.curate()

Why This Matters for Agent Systems

Memory is not just a convenience feature. For agents operating in production on real-world workflows, persistent memory changes the character of what's possible.

Agents that learn from mistakes. When an agent discovers that a particular approach doesn't work (a flaky API, a schema quirk, a constraint in the system), it stores that finding. Future runs don't repeat the error.

Agents with institutional knowledge. Over time, agents accumulate the kind of knowledge that human employees build over months: which systems are reliable, what the team's conventions are, how decisions were made and why. This knowledge compounds.

Agents that respect user preferences. Communication style, formatting preferences, workflow preferences: these get stored once and respected automatically in every subsequent interaction.

Multi-agent collaboration. When multiple agents share a workspace, they can build on each other's findings. A research agent's discoveries can directly inform a writing agent's output, without any explicit handoff code.

Reduced onboarding cost. When a new agent is introduced to an existing project, it can query the accumulated memory store and get up to speed on the project's history, conventions, and current state without the human needing to re-explain everything.


Conclusion

The Memory Plugin transforms AI agents from stateless request handlers into systems that genuinely accumulate knowledge over time. The combination of immediate vector backed storage, hybrid semantic + keyword search, importance weighted ranking, temporal decay, and an automated curation pipeline produces a memory system that is both powerful and self-maintaining.

Agents that use it arrive at every session equipped with the context they've earned. They store what matters, surface what's relevant, and shed what's no longer useful, automatically.

The plugin ships as part of the Daita SDK and works in both local and hosted environments with zero configuration changes required between environments.

For future iterations of this system, the Daita team will be experimenting with new data-security methods so that developers can sleep soundly knowing that their data, their customers' data, and any other critical information is securely managed.


Production AI agent framework with zero-config observability and managed cloud deployment.

© 2026 Daita Corp. All rights reserved.