Daita's Memory Plugin equips AI agents with durable, intelligent memory featuring hybrid semantic search, importance scoring, and automatic curation across both local and hosted environments.
Daita Team
February 16, 2026
One of the most fundamental limitations of today's AI agents is that they forget everything the moment a session ends. Each new conversation begins from a blank slate with no knowledge of past decisions, no awareness of user preferences, no memory of what was tried and failed before. For simple, "one shot" tasks this is acceptable. But for agents operating over days, weeks, or months on complex, evolving projects, statelessness is a serious architectural problem.
Our Daita "Memory Plugin" solves this. It gives agents a durable, queryable, intelligent memory layer that persists across sessions, surfaces relevant context automatically, and gets smarter over time through an automated curation pipeline. In this post we'll walk through the storage architecture, the scoring system, the curation lifecycle, and how it behaves in both local development and hosted production environments.
Before getting into how the Memory Plugin works, it's worth being precise about why stateless agents are problematic for extended usage periods.
Every LLM has a context window, a finite buffer of tokens it can consider at once. Within a single session, an agent can "remember" by carrying information forward in the prompt. But once the session ends, that context is gone. The next session starts cold.
This creates several concrete failure modes:

- Decisions get re-litigated because past rationale is lost.
- User preferences must be restated in every session.
- Mistakes are repeated because failed approaches were never recorded.
- Long-running projects lose continuity between runs.
The naive fix, stuffing everything into the context window, doesn't scale. Token limits are real, costs are real, and injecting irrelevant history into every prompt adds noise and degrades output quality.
What's needed is something more discriminating: a system that stores information durably, retrieves only what's relevant to the current task, and maintains itself over time.
The Memory Plugin operates across three distinct layers that work together to form a complete memory lifecycle:
┌─────────────────────────────────────────────────────────────┐
│ Agent Session │
│ │
│ on_before_run() ──→ Auto-inject relevant memories │
│ into system prompt │
│ │
│ During execution: │
│ • remember(content, importance, category) │
│ • recall(query, limit, score_threshold) │
│ • update_memory(query, new_content) │
│ • read_memory() / list_memories() │
│ │
│ on_agent_stop() ──→ Trigger curation pipeline │
└─────────────────────────────────────────────────────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌──────────────────────┐
│ Vector Store │ │ Daily Activity Log │
│ (embeddings + │ │ (raw session notes, │
│ metadata) │ │ timestamped) │
└─────────────────┘ └──────────────────────┘
│ │
└────────────┬──────────────┘
▼
┌──────────────────────┐
│ MEMORY.md │
│ (local dev only: │
│ curated, clean │
│ long-term summary) │
└──────────────────────┘
Two storage artifacts, two purposes:

- Vector store: every remember() call lands here. Used for semantic search during recall.
- Daily activity log: raw, timestamped session notes recording what happened during each run.

In local development, a third artifact is also maintained: MEMORY.md, a curated, human-readable long-term summary regenerated from the vector store after each curation run. In the hosted environment, memories live in the vector database directly and are queried there rather than through a flat file.
Because memory is implemented as a plugin, it integrates directly into the agent's lifecycle: context injection happens automatically before the first LLM turn, and curation fires when the agent stops, with no manual wiring required. The same plugin code runs against a local SQLite store during development or a managed vector database in production, with no changes required from the developer.
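As a rough illustration of that lifecycle, here is a minimal sketch of how the two hooks could be wired. The class and its internals are hypothetical, not the real MemoryPlugin implementation:

```python
# Hypothetical sketch of the two lifecycle hooks (not the real MemoryPlugin
# internals): inject context before the run, curate after the agent stops.

class MemoryPluginSketch:
    def __init__(self):
        self.memories = []   # stands in for the vector store
        self.daily_log = []  # stands in for the timestamped activity log

    def on_before_run(self, prompt: str) -> str:
        # Crude relevance check for illustration: any shared word is a match.
        words = prompt.lower().split()
        relevant = [m for m in self.memories if any(w in m.lower() for w in words)]
        if not relevant:
            return prompt
        header = "\n".join(f"- {m}" for m in relevant)
        return f"## Relevant Memory\n{header}\n\n{prompt}"

    def on_agent_stop(self):
        # "Curation": fold the session's raw notes into durable memories.
        for note in self.daily_log:
            if note not in self.memories:
                self.memories.append(note)
        self.daily_log.clear()
```

The point of the sketch is the shape of the contract: the developer never calls either hook directly; the agent runtime does.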
The Memory Plugin ships with support for two deployment contexts. The same agent code works in both; the backend switches automatically based on the runtime environment.
In local mode, the memory system runs entirely on your machine with no external dependencies.
.daita/
└── memory/
└── workspaces/
└── {workspace_name}/
├── vectors.db # SQLite vector store
├── MEMORY.md # Curated long-term summary
└── logs/
├── 2026-02-15.md
└── 2026-02-16.md
This setup is self-contained, deterministic, and easy to inspect. You can open MEMORY.md or the daily logs at any time to see exactly what the agent knows. It's the ideal environment for development, testing, and single-user workflows.
In production (deployed with the daita push CLI command), the backend upgrades to managed infrastructure:
The hosted backend provides O(log N) recall performance that scales as memory grows. It also supports workspace sharing: multiple agents can read from and write to the same memory store, enabling multi-agent systems where agents build on each other's knowledge.
# Two agents sharing a workspace
agent1.add_plugin(MemoryPlugin(workspace="research_team"))
agent2.add_plugin(MemoryPlugin(workspace="research_team"))
When agents share a workspace, attribution is tracked in the daily logs so the curator knows which agent recorded what.
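A minimal sketch of what an attributed log entry might look like. The record format and field names are assumptions for illustration, not Daita's actual schema:

```python
# Hypothetical shape of an attributed daily-log entry; field names and record
# format are illustrative assumptions, not Daita's actual schema.
from datetime import datetime, timezone

def log_entry(agent_name: str, note: str) -> dict:
    return {
        "agent": agent_name,  # attribution, so the curator knows who wrote it
        "note": note,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```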
From the agent's perspective, memory is accessed through a small, focused set of tools that the LLM can invoke directly.
remember(content, importance, category)
Stores a piece of information immediately in the vector store.
# Storing a critical architectural decision
await agent.remember(
content="Rate limiting is set to 100 requests/min per user using a token bucket algorithm.",
importance=0.9,
category="decision"
)
# Storing a user preference
await agent.remember(
content="User prefers concise email replies under 200 words.",
importance=0.6,
category="preference"
)
Before storing, the system performs a deduplication check: if a near-identical memory already exists, the new write is skipped silently. This prevents the same fact from being stored dozens of times across repeated agent runs.
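The dedup check can be sketched as a similarity test against existing embeddings. The 0.95 threshold and the function names here are assumptions for illustration, not Daita's actual values:

```python
# Sketch of a write-time deduplication check over stored embeddings.
# The threshold is an illustrative assumption, not Daita's actual value.
import math

DEDUP_THRESHOLD = 0.95  # assumed: "near-identical" above this similarity

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def store_if_new(store, embedding, content):
    """Skip the write silently when a near-identical memory already exists."""
    for existing_emb, _ in store:
        if cosine(existing_emb, embedding) > DEDUP_THRESHOLD:
            return False  # duplicate: skipped
    store.append((embedding, content))
    return True
```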
recall(query, limit, score_threshold)
Searches the vector store using hybrid semantic + keyword search.
results = await agent.recall(
query="What did we decide about authentication?",
limit=5,
score_threshold=0.6
)
Results come back ranked by relevance, with each result including the content, a relevance score, an importance score, a category, and when the memory was created. Agents can also filter by importance range, which is useful when you only want to surface high-stakes information.
update_memory(query, new_content)
Replaces an existing memory. Useful when a fact has changed: a deadline moved, a decision was reversed, a bug was fixed.
await agent.update_memory(
query="database migration deadline",
new_content="Database migration deadline moved to 2026-03-15 due to schema review.",
importance=0.9
)
read_memory() and list_memories()
Provide direct access to today's activity log and, in local development, the MEMORY.md summary. Useful when an agent wants a broad view of its accumulated knowledge rather than a targeted search.
One of the most powerful features of the Memory Plugin is that agents don't need to explicitly recall memories before every task. The plugin handles this automatically.
Before the first LLM turn in every agent run, on_before_run() fires. It performs a semantic search against the current prompt and injects the most relevant memories directly into the system prompt:
System: You are a helpful engineering assistant.
## Relevant Memory
- [decision] Rate limiting: 100 req/min per user (token bucket) (importance: 0.9)
- [contact] Sarah Chen (lead engineer) prefers async communication (importance: 0.7)
- [project] Auth flow redesign complete, 40% reduction in login latency (importance: 0.8)
- [preference] User prefers concise responses under 200 words (importance: 0.6)
User: Review the new authentication middleware and flag any issues.
The agent arrives at its first turn already equipped with the context it needs. No boilerplate, no manual recall calls, no risk of the agent forgetting to check its memory.
If the semantic search returns nothing (which can happen for broad, generic prompts like "check inbox"), the system falls back to injecting the top memories by importance score, ensuring the agent always has something useful to work with.
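That fallback logic can be sketched as follows; the function name and the top_k default are illustrative:

```python
# Sketch of the injection fallback: use semantic hits when available,
# otherwise the top memories by importance (names and top_k are illustrative).

def select_memories_for_injection(semantic_hits, all_memories, top_k=3):
    if semantic_hits:
        return semantic_hits
    # Fallback: broad or generic prompts still get the highest-stakes context.
    return sorted(all_memories, key=lambda m: m["importance"], reverse=True)[:top_k]
```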
Not all memories are created equal. The importance score (a float from 0.0 to 1.0) is the system's primary signal for prioritizing what to surface, how strongly to weight it in search results, and when to prune it.
| Range | Level | Typical Use Cases |
|---|---|---|
| 0.9 – 1.0 | Critical | Security issues, hard deadlines, major architectural decisions, system failures |
| 0.7 – 0.8 | Important | Key facts, significant events, completed milestones |
| 0.5 – 0.6 | Useful | General context, background information, confirmed preferences |
| 0.3 – 0.4 | Low | Minor notes, speculative findings, exploratory observations |
| < 0.3 | Minimal | Routine queries, noise (candidates for pruning) |
Importance is set by the agent at write time and can be updated later. The plugin also exposes higher-level methods for common importance operations:
# Mark memories matching a query as high importance
await memory_plugin.mark_important(query="production database", importance=0.95)
# Pin a memory — immunity from pruning, maximum importance
await memory_plugin.pin(query="API key rotation schedule")
# Remove memories that are no longer relevant
await memory_plugin.forget(query="deprecated v1 API endpoints")
Pinning is the strongest signal you can give the system. A pinned memory is set to maximum importance, excluded from all pruning, and exempt from temporal decay; it scores at full weight in search results regardless of how old it is. Use it for facts that should never be forgotten: hard architectural constraints, standing user preferences, or anything that would be costly to rediscover.
Importance isn't just metadata; it actively influences recall results through a scoring adjustment applied after the raw relevance score is computed:
adjusted_score = (raw_relevance_score + importance_boost) × temporal_decay
Where:
- importance_boost scales from −0.1 to +0.1 based on importance (0.5 is neutral)
- temporal_decay starts at 1.0 and decreases with age, flooring at 0.7

This means a highly important memory that scores 0.7 semantically can outrank a low-importance memory that scores 0.75 semantically. It also means old memories never completely disappear from results; the 0.7 floor ensures that a critical decision from a year ago is still surfaced when relevant.
Pinned memories are exempt from temporal decay entirely. They score at full weight regardless of age.
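The adjustment formula can be expressed directly in code. Only the ±0.1 boost range, the 0.5 neutral point, the 0.7 decay floor, and the pin exemption come from the text; the per-day decay rate is an assumption for illustration:

```python
# The adjusted-score formula as code. The decay rate per day is an assumed
# value for illustration; the boost range, neutral point, floor, and pin
# exemption follow the description in the text.

def importance_boost(importance: float) -> float:
    # 0.5 is neutral; boost scales linearly from -0.1 to +0.1
    return (importance - 0.5) * 0.2

def temporal_decay(age_days: float, pinned: bool = False, rate: float = 0.001) -> float:
    if pinned:
        return 1.0  # pinned memories are exempt from decay entirely
    return max(0.7, 1.0 - rate * age_days)  # floors at 0.7

def adjusted_score(raw, importance, age_days, pinned=False):
    return (raw + importance_boost(importance)) * temporal_decay(age_days, pinned)
```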
The recall mechanism combines two complementary search strategies into a single hybrid score.
Each memory is embedded and stored as a vector. At recall time, the query is embedded and cosine similarity is computed against all stored vectors. This captures meaning: a query about "rate limiting" will match a memory about "throttling API calls" even if the words don't overlap.
BM25 (Best Matching 25) is applied in parallel. BM25 is a probabilistic ranking algorithm that handles term frequency and document length normalization, well suited for the short, dense text that characterizes individual memories. Keyword search excels where semantic search struggles: exact version numbers, proper nouns, specific identifiers, and technical terms that embeddings can blur together.
Two additional bonuses are applied on top of the weighted combination:
This hybrid approach outperforms either strategy alone: semantic search catches conceptual matches that keyword search misses, while keyword search catches exact matches that semantic search can dilute.
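A minimal sketch of how the two signals might be combined. The weights and the crude keyword scorer are illustrative stand-ins; real BM25 also weighs term frequency and document length:

```python
# Illustrative hybrid combination: a weighted sum of the semantic score and a
# 0-1 normalized keyword score. Weights are assumed, not Daita's actual values.

SEMANTIC_WEIGHT = 0.7
KEYWORD_WEIGHT = 0.3

def keyword_score(query: str, doc: str) -> float:
    """Crude stand-in for BM25: fraction of query terms present in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_score(semantic: float, keyword: float) -> float:
    """Combine cosine similarity and a normalized keyword score."""
    return SEMANTIC_WEIGHT * semantic + KEYWORD_WEIGHT * keyword
```

Note how the keyword side catches exact identifiers ("v2.3.1") that embeddings can blur together.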
Raw memories stored by agents during a session are valuable but imperfect. Agents may store redundant facts, outdated information, or observations that are only temporarily relevant. Left unchecked, the memory store grows noisy and bloated.
The curation pipeline runs automatically when an agent session ends (on_agent_stop). Its job is to distill the raw activity logs into clean, durable, long-term knowledge.
The curator reads the daily log(s) from the most recent session(s). These logs contain timestamped, agent-attributed notes: a ground-truth record of what happened.
A fast, low-cost LLM reads the activity logs and extracts durable facts. It's explicitly instructed on what to include and what to ignore:
Include: decisions and their rationale, confirmed user preferences, durable facts about systems and people, and significant outcomes.
Exclude: transient task status, routine queries, and speculative observations that were never confirmed.
Each extracted fact comes with a suggested category and importance score from the LLM.
For every extracted fact, the curator checks the existing memory store. Three outcomes are possible:
New fact: No similar memory exists. Store it.
Duplicate: A near-identical memory already exists (very high similarity). Skip it.
Gray zone (0.7–0.9 similarity): The new fact is related to an existing memory but not identical. A secondary LLM call decides: does the new fact supersede the old one, or does it add genuinely new information?
A fact supersedes when it updates a changed value (such as a moved deadline), reverses a prior decision, or corrects an error in the existing memory.
Otherwise, both are kept. Importantly, the system defaults to keeping both on any LLM error, avoiding accidental data loss.
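The three-way classification can be sketched as follows, using the similarity thresholds quoted above; the default-to-keep-both behavior mirrors the error handling described in the text:

```python
# Sketch of the curator's three-way classification, using the 0.7-0.9
# gray-zone similarity range quoted in the text.

DUPLICATE_THRESHOLD = 0.9
RELATED_THRESHOLD = 0.7

def classify_fact(best_similarity: float, llm_says_supersedes=None) -> str:
    if best_similarity >= DUPLICATE_THRESHOLD:
        return "skip"  # near-identical: duplicate
    if best_similarity >= RELATED_THRESHOLD:
        # Gray zone: a secondary LLM call decides. On any LLM error (None),
        # default to keeping both to avoid accidental data loss.
        if llm_says_supersedes is True:
            return "supersede"
        return "keep_both"
    return "store_new"  # no similar memory exists
```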
After facts are classified and stored, the curator consolidates the memory store. In local development, this means regenerating MEMORY.md from scratch, pulling all current memories from the vector store, grouping them by category, and writing a clean, organized summary. In the hosted environment, memories are persisted directly in the vector database and queried there (no flat file summary is maintained).
Regenerating rather than appending is critical for local mode. An append-only file would accumulate redundant, contradictory, and outdated entries over time. By regenerating from the deduplicated vector store, the file always reflects the current state of knowledge.
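A sketch of that regeneration step, assuming memories are dicts with content, importance, and category fields; the output format is illustrative, not the real MEMORY.md layout:

```python
# Sketch of regenerating MEMORY.md from scratch: pull all memories, group by
# category, write a clean summary. The layout here is illustrative only.
from collections import defaultdict

def render_memory_md(memories):
    by_cat = defaultdict(list)
    for m in memories:
        by_cat[m.get("category") or "General"].append(m)
    lines = ["# MEMORY"]
    for cat in sorted(by_cat, key=str.lower):
        lines.append(f"\n## {cat.title()}")
        # Highest-importance entries first within each category
        for m in sorted(by_cat[cat], key=lambda x: -x["importance"]):
            lines.append(f"- {m['content']} (importance: {m['importance']})")
    return "\n".join(lines)
```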
Finally, the curator prunes memories that have aged out of relevance: low-importance entries that are old and rarely recalled are removed, while pinned memories are always exempt.
Access counts matter here. A memory that's been recalled frequently is kept longer, even if its importance score is modest. The system rewards utility.
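A pruning predicate along these lines might look like the following; every threshold here is an illustrative assumption:

```python
# Sketch of a utility-rewarding prune predicate. All thresholds are
# illustrative assumptions, not Daita's actual values.

def should_prune(importance, age_days, access_count, pinned=False):
    if pinned:
        return False  # pinned memories are never pruned
    if access_count >= 5:
        return False  # frequently recalled: kept longer despite modest importance
    return importance < 0.3 and age_days > 30
```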
The full curation result is returned with metrics:
result = await memory_plugin.curate()
# CurationResult(
# facts_extracted=12,
# facts_added=8,
# memories_updated=2,
# memories_pruned=5,
# existing_memories=147,
# cost_usd=0.0003
# )
The curator and agents organize memories into categories that enable category-based filtering in recall and, in local development, make the MEMORY.md summary navigable.
| Category | Purpose |
|---|---|
| security | Security issues, access control decisions, risk items |
| decision | Architectural choices, strategic directions, commitments |
| knowledge | Domain facts, definitions, discovered patterns |
| contact | People, teams, communication preferences |
| project | Status updates, deadlines, milestones |
| event | Significant occurrences, incidents, outcomes |
| preference | User preferences, stylistic choices, habitual patterns |
Categories are optional (memories without one fall under "General"), but they significantly improve organization at scale and enable precise filtering in recall.
The plugin is designed to work out of the box with sensible defaults, with escape hatches for advanced configuration.
from daita import Agent
from daita.plugins import MemoryPlugin
# Default — project-scoped, auto-curates when the agent stops
agent = Agent(name="My Agent", model="gpt-4o-mini")
agent.add_plugin(MemoryPlugin())
await agent.start()
await agent.run("Your task here...")
await agent.stop() # curation runs here automatically
# Shared workspace for multi-agent systems
memory = MemoryPlugin(workspace="research_team")
agent1.add_plugin(memory)
agent2.add_plugin(memory)
# Global scope — shared across all projects in the org
agent.add_plugin(MemoryPlugin(scope="global"))
# Manual curation control with a custom curation model
memory = MemoryPlugin(
auto_curate="manual",
curation_model="gpt-4o",
)
agent.add_plugin(memory)
# Trigger curation explicitly when ready
result = await memory.curate()
Memory is not just a convenience feature. For agents operating in production on real-world workflows, persistent memory changes the character of what's possible.
Agents that learn from mistakes. When an agent discovers that a particular approach doesn't work (a flaky API, a schema quirk, a constraint in the system), it stores that finding. Future runs don't repeat the error.
Agents with institutional knowledge. Over time, agents accumulate the kind of knowledge that human employees build over months: which systems are reliable, what the team's conventions are, how decisions were made and why. This knowledge compounds.
Agents that respect user preferences. Communication style, formatting choices, workflow habits: these get stored once and respected automatically in every subsequent interaction.
Multi-agent collaboration. When multiple agents share a workspace, they can build on each other's findings. A research agent's discoveries can directly inform a writing agent's output, without any explicit handoff code.
Reduced onboarding cost. When a new agent is introduced to an existing project, it can query the accumulated memory store and get up to speed on the project's history, conventions, and current state without the human needing to re-explain everything.
The Memory Plugin transforms AI agents from stateless request handlers into systems that genuinely accumulate knowledge over time. The combination of immediate vector-backed storage, hybrid semantic + keyword search, importance-weighted ranking, temporal decay, and an automated curation pipeline produces a memory system that is both powerful and self-maintaining.
Agents that use it arrive at every session equipped with the context they've earned. They store what matters, surface what's relevant, and shed what's no longer useful, automatically.
The plugin ships as part of the Daita SDK and works in both local and hosted environments with zero configuration changes required between environments.
For future iterations of this system, the Daita team will be experimenting with new data security methods so that developers can sleep soundly knowing their data, their customers' data, and any other critical information is securely managed.
© 2026 Daita Corp. All rights reserved.