Agentic memory model

GRAIL's memory mode isn't "RAG over agent conversations". It's a deliberate cognitive model of how an agent should remember: with typed observations, clear provenance, temporal decay, and a consolidation routine you control.

The four design decisions

1. The agent writes directly — no intermediate LLM

In KB mode, an LLM reads your documents and guesses which entities to extract. It can be wrong, miss things, or duplicate.

In memory mode, the agent explicitly declares the entities and relationships when it calls add_observation(). There is no extraction step that can go wrong, because the agent already knows what it meant.

await mp.add_observation(
    title="...",
    content="...",
    entities=[
        {"name": "Acme", "type": "ORGANIZATION"},
        {"name": "Postgres", "type": "TECHNOLOGY"},
    ],
    relationships=[
        {"source": "Acme", "target": "Postgres", "relationship_type": "CHOSE"},
    ],
)

This is cheaper and more precise than the KB flow: writing an observation costs zero LLM calls, and because the agent declares the entities itself, quality is bounded by the agent rather than by an extractor.
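
Because the agent declares everything itself, a cheap sanity check before writing can catch slips such as a relationship that points at an entity the agent forgot to declare. The helper below is a minimal sketch: `undeclared_endpoints` is a hypothetical name, not part of GRAIL, and whether GRAIL enforces this rule on its own is not stated here.

```python
def undeclared_endpoints(entities, relationships):
    """Return relationship endpoints that are missing from the entities list."""
    declared = {entity["name"] for entity in entities}
    missing = []
    for rel in relationships:
        for endpoint in (rel["source"], rel["target"]):
            if endpoint not in declared and endpoint not in missing:
                missing.append(endpoint)
    return missing


entities = [
    {"name": "Acme", "type": "ORGANIZATION"},
    {"name": "Postgres", "type": "TECHNOLOGY"},
]
relationships = [
    {"source": "Acme", "target": "Postgres", "relationship_type": "CHOSE"},
    # "REJECTED" and "DynamoDB" are illustrative values only.
    {"source": "Acme", "target": "DynamoDB", "relationship_type": "REJECTED"},
]

print(undeclared_endpoints(entities, relationships))  # → ['DynamoDB']
```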

2. Folders as communities

The structure of memories/<category>/ is the community structure. No need to run Leiden every time you write:

memories/
├── work/clients/acme/        ← "acme" community
├── work/clients/zorp/        ← "zorp" community
├── personal/health/          ← "health" community
└── learning/python/asyncio/  ← "asyncio" community

This is called folders-as-communities. Each new observation joins its community automatically, with no re-clustering. In addition, one entity can live in multiple folders (multi-membership).
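
The idea can be sketched in a few lines of plain Python. This is an illustration of the concept, not GRAIL's implementation: it assumes the community label is the last folder of the category path (as in the tree above), and `community_of` and `build_membership` are hypothetical names.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def community_of(category: str) -> str:
    """Return the community label for a category path (its last folder)."""
    return PurePosixPath(category).name


def build_membership(observations):
    """Map each entity name to the set of communities it appears in."""
    membership = defaultdict(set)
    for obs in observations:
        community = community_of(obs["category"])
        for entity in obs["entities"]:
            membership[entity].add(community)
    return dict(membership)


observations = [
    {"category": "work/clients/acme", "entities": ["Acme", "Postgres"]},
    {"category": "work/clients/zorp", "entities": ["Zorp", "Postgres"]},
    {"category": "learning/python/asyncio", "entities": ["asyncio"]},
]

membership = build_membership(observations)
print(sorted(membership["Postgres"]))  # → ['acme', 'zorp']
```

Writing a new observation only adds one entry to the mapping, which is why no global re-clustering pass is needed.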

3. Provenance and time, always

Every observation carries structured metadata in its YAML frontmatter:

---
id: 01HXY...                      # unique ULID
title: "Acme picked Postgres..."
observed_at: 2026-06-02T14:23:00Z
category: work/clients/acme
tags: [decision, architecture]
confidence: 0.95
source: "architecture review meeting"
entities: [...]                    # those that apply to this observation
relationships: [...]
---

(markdown body with the content)

That metadata is what makes recall work. No LLM, no embedding, just filters over structured columns.
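
To make the file layout concrete, here is a minimal sketch that separates the frontmatter from the markdown body using only the standard library. It assumes the layout shown above (an opening `---` line and a closing `---` line). `split_frontmatter` and `scalar_field` are hypothetical helpers, and `scalar_field` is a deliberately naive reader for simple `key: value` lines, not a YAML parser; real code should use a proper YAML library.

```python
def split_frontmatter(text: str):
    """Split a memory file into (frontmatter, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text  # no frontmatter present
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            header = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).strip()
            return header, body
    raise ValueError("frontmatter is not closed by a '---' line")


def scalar_field(header: str, key: str):
    """Read a simple top-level 'key: value' line (naive, illustration only)."""
    for line in header.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            # Drop a trailing comment and surrounding quotes.
            return value.split("#", 1)[0].strip().strip('"')
    return None


sample = """---
observed_at: 2026-06-02T14:23:00Z
category: work/clients/acme
confidence: 0.95
---

Acme picked Postgres.
"""

header, body = split_frontmatter(sample)
print(scalar_field(header, "category"))           # → work/clients/acme
print(float(scalar_field(header, "confidence")))  # → 0.95
print(body)                                       # → Acme picked Postgres.
```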

4. Consolidation with agent consent

As memory grows, patterns that deserve reorganisation appear: aliases to merge, folders to split, emergent communities to discover.

GRAIL doesn't mutate your graph without permission. The consolidate() function runs the analysis and emits proposals — the agent reviews and accepts or rejects.

grail consolidate ./my-memory
# Generates output/proposals/<timestamp>.json

grail proposals list ./my-memory
# Lists pending proposals

grail proposals apply ./my-memory --accept <id>
# Applies a specific proposal

This "propose → review → apply" loop is the opposite of frameworks that mutate automatically with every call. It gives you auditable control over your own graph.
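
The separation between analysis and mutation can be sketched as follows. This is a toy model of the pattern under stated assumptions, not GRAIL's code: the `Proposal` fields, the `merge_alias` kind, and the case-insensitive alias rule are all hypothetical, and the real proposal JSON schema may look different.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    id: str
    kind: str              # hypothetical kind, e.g. "merge_alias"
    payload: dict
    status: str = "pending"


@dataclass
class Graph:
    aliases: dict = field(default_factory=dict)  # alias -> canonical name


def propose(entity_names):
    """Analyse only: suggest merging names that differ just by letter case."""
    seen = {}
    proposals = []
    for name in entity_names:
        key = name.lower()
        if key in seen and seen[key] != name:
            proposals.append(Proposal(
                id=f"p{len(proposals) + 1}",
                kind="merge_alias",
                payload={"alias": name, "canonical": seen[key]},
            ))
        else:
            seen.setdefault(key, name)
    return proposals


def apply(graph, proposal):
    """Mutate the graph only for a proposal that was explicitly accepted."""
    if proposal.status != "accepted":
        raise ValueError(f"proposal {proposal.id} has not been accepted")
    if proposal.kind == "merge_alias":
        graph.aliases[proposal.payload["alias"]] = proposal.payload["canonical"]


graph = Graph()
proposals = propose(["Postgres", "postgres", "Acme"])
assert graph.aliases == {}          # analysis alone changed nothing

proposals[0].status = "accepted"    # the review step
apply(graph, proposals[0])
print(graph.aliases)                # → {'postgres': 'Postgres'}
```

The point of the design is visible in the guard inside `apply`: nothing reaches the graph unless a reviewer has flipped the status first.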

Recall: zero LLM, maximum control

recall is the memory-exclusive search mode. It filters observations by structured columns:

result = await mp.recall(
    mode="recall",
    since="7d",                          # last 7 days
    category="work/clients/acme/**",     # folder glob
    tags=["decision"],                   # must have this tag
    entity_names=["Postgres"],           # must mention this entity
    min_confidence=0.8,                  # min confidence
)

Important properties:

No LLM call. It's a pandas filter over parquet.
No query embedding. You don't need embeddings configured.
Instant for memories up to tens of thousands of observations.
Exact: no approximate similarity, just equality/comparison over columns.
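
To show what "just equality/comparison over columns" means, here is a plain-Python stand-in for the filter, using a list of dicts instead of pandas over parquet so that it runs with the standard library alone. The semantics are assumptions for illustration: every listed tag and entity must be present, a trailing `/**` matches the folder and everything below it, and `since` accepts only `d` (days) and `h` (hours). `recall_filter` and its helpers are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def parse_since(since: str, now: datetime) -> datetime:
    """Turn a string such as '7d' or '12h' into a cutoff timestamp."""
    units = {"d": "days", "h": "hours"}
    amount, unit = int(since[:-1]), since[-1]
    return now - timedelta(**{units[unit]: amount})


def in_category(category: str, pattern: str) -> bool:
    """Treat a trailing '/**' as 'this folder and everything below it'."""
    if pattern.endswith("/**"):
        root = pattern[:-3]
        return category == root or category.startswith(root + "/")
    return category == pattern


def recall_filter(rows, now, since=None, category=None, tags=(),
                  entity_names=(), min_confidence=0.0):
    """Keep rows that pass every structured filter; no LLM, no embeddings."""
    cutoff = parse_since(since, now) if since else None
    return [
        row for row in rows
        if (cutoff is None or row["observed_at"] >= cutoff)
        and (category is None or in_category(row["category"], category))
        and set(tags) <= set(row["tags"])
        and set(entity_names) <= set(row["entities"])
        and row["confidence"] >= min_confidence
    ]


utc = timezone.utc
now = datetime(2026, 6, 5, tzinfo=utc)
rows = [
    {"id": "a", "observed_at": datetime(2026, 6, 2, 14, 23, tzinfo=utc),
     "category": "work/clients/acme", "tags": ["decision", "architecture"],
     "entities": ["Acme", "Postgres"], "confidence": 0.95},
    {"id": "b", "observed_at": datetime(2026, 5, 1, tzinfo=utc),  # too old
     "category": "work/clients/acme/infra", "tags": ["decision"],
     "entities": ["Postgres"], "confidence": 0.9},
    {"id": "c", "observed_at": datetime(2026, 6, 4, tzinfo=utc),  # other folder
     "category": "work/clients/zorp", "tags": ["decision"],
     "entities": ["Postgres"], "confidence": 0.99},
]

hits = recall_filter(rows, now, since="7d", category="work/clients/acme/**",
                     tags=["decision"], entity_names=["Postgres"],
                     min_confidence=0.8)
print([row["id"] for row in hits])  # → ['a']
```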

You can also apply the same filters as modifiers on other modes:

# Cascade scoped to recent Acme observations
result = await mp.recall(
    "Why did Acme rule out DynamoDB?",
    mode="cascade",
    since="30d",
    category="work/clients/acme/**",
)

The natural cycle of agentic memory

                ┌─→ add_observation   ←──┐
                │   (agent writes)       │
                │                        │
        many times                       │
                │                        │
                └─→ recall / cascade  ───┤
                    (agent reads)        │
                                         │
                                         │
                every N observations     │
                                         │
                consolidate              │
                (proposals)              │
                                         │
                proposals apply  ────────┘
                (agent accepts what's useful)

The first two steps (writing and reading) are cheap and frequent. Consolidation is rare (weekly or monthly) and deliberate.
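
The "every N observations" trigger in the diagram can be as simple as a counter kept by the agent. The class below is a minimal sketch with hypothetical names; it decides only when consolidation is due and leaves the actual call to the agent.

```python
class ConsolidationSchedule:
    """Track writes and report when a consolidation pass is due."""

    def __init__(self, every: int):
        if every <= 0:
            raise ValueError("every must be a positive number of observations")
        self.every = every
        self.writes_since_last = 0

    def record_write(self) -> bool:
        """Count one add_observation call; return True when a pass is due."""
        self.writes_since_last += 1
        return self.writes_since_last >= self.every

    def mark_consolidated(self) -> None:
        """Reset the counter after proposals were generated and reviewed."""
        self.writes_since_last = 0


schedule = ConsolidationSchedule(every=3)
due = [schedule.record_write() for _ in range(3)]
print(due)  # → [False, False, True]

schedule.mark_consolidated()
print(schedule.record_write())  # → False
```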

When NOT to use memory mode

If your agent only needs ephemeral context (current session, then gone), an in-memory buffer is simpler.
If you want automatic memory without control (everything the agent says gets indexed), GRAIL isn't the simplest option; Letta or mem0 are more opinionated.
If your "observations" are actually documents that already exist (PDFs, papers), use KB mode.

GRAIL shines when you want a deliberate, auditable, structured memory the agent can query with surgical precision.

Next step

Agentic memory quickstart — implement the full cycle in 5 minutes.
Skill quickstart — wire this to Claude Code, Codex, or OpenCode.
Search modes — how to combine recall with cascade for hybrid queries.
