A map of the long fight between AI systems and memory — from the markdown Post-it notes Claude writes about you, all the way up to agentic, multimodal graph‑RAG. Where you stand, and what gets you one rung higher.
Open by naming the real subject: this is not a "how to set up RAG" tutorial — it's a deconstruction of the fight between AI systems and memory. The 7 levels are a map so you can locate where you stand and what gets you to the next rung. Audience is AI pros, so we move fast through levels 1–4 and spend the time on 5–7. Note the deep-dive videos for each system are linked rather than re-explained here.
Getting AI to reliably and accurately answer questions about past conversations, or about giant troves of documents, is a problem we've been chasing for years. The reflex answer is RAG. Before you reach for it, deconstruct the problem — and notice where you actually are.
Reliably, accurately recall past conversations — or query a trove of documents that won't fit in context.
Reach for RAG — retrieval‑augmented generation — and assume that's the destination.
Maps the 7 levels of the problem so you can self‑locate — and climb only as far as you need.
ⓘ Most people start at Level 1 — and that's where they stay. The arrow only goes one way; the trick is climbing it deliberately.
The recurring problem: making AI reliably and accurately answer questions about past conversations, or about giant document troves. The default reflex is RAG. But before reaching for it, deconstruct the problem — most people start at Level 1 (do nothing intentional) and stay there. This deck is a roadmap so the audience can self-locate. Set expectation: at every level we cover the same four things — what to expect, the skill to master, the trap, and how to advance.
ⓘ Try them in ascending order. If markdown files are enough, stop there. If they're not, try Obsidian — then LightRAG — then multimodal. Nobody can tell you where your line in the sand is.
Three tiers. Tier I (1–2): memory native to Claude Code — automemory, then CLAUDE.md. Tier II (3–4): a structured markdown architecture, then outside tools like Obsidian. Tier III (5–7): real RAG — naive vector RAG, graph RAG, then agentic/multimodal. The arrow only goes one way, but you should climb only as far as the problem requires. Reinforce the "try in ascending order" advice — and that nobody can tell you exactly where the RAG line is.
People stopped clearing — they keep one chat alive forever so it never "forgets." Now nobody manages their own context.
Effectiveness drops and token / usage burn climbs — 800k of context costs far more per turn than 80k.
We always trade "ingest more context" against "don't bloat it." That tension never goes away.
Context rot: the more you fill a session's context window, the worse the model gets. Real example from the talk — at ~256k of a 1M window (a quarter full) quality was ≈92%; by the end ≈78%. The 1M window made things worse because people stopped clearing — they keep one chat alive forever to avoid "losing" the conversation. Two costs: effectiveness drops, and token/usage burn spikes (800k context costs far more per turn than 80k). Keep this in mind for every level — we're always trading "ingest more context" against "don't bloat it."
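A quick back-of-envelope makes the burn concrete. The price below is a placeholder, not any provider's actual rate; the point is just that input cost scales linearly with whatever context you re-send on every turn.

```python
# Hypothetical input price: $3 per million tokens. Swap in your model's real rate.
PRICE_PER_MTOK = 3.00

def input_cost(context_tokens: int, price_per_mtok: float = PRICE_PER_MTOK) -> float:
    """Cost of re-sending the whole context window on one turn."""
    return context_tokens / 1_000_000 * price_per_mtok

print(f"80k context:  ${input_cost(80_000):.2f} per turn")   # $0.24
print(f"800k context: ${input_cost(800_000):.2f} per turn")  # $2.40, ten times as much, every turn
```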
What the level actually is — and how you know you're standing on it.
The one capability you have to get right before you climb.
The comfortable mistake that keeps people stuck here for years.
The concrete move that unlocks the next rung.
Climb in order. Stop the moment it's good enough. Skipping rungs is how you end up "scammed into a RAG system you didn't need."
How to read the next ten slides. For each level: (1) what to expect — what the level actually is and how you know you're on it; (2) the skill to master before you move on; (3) the trap that keeps people stuck; (4) the concrete move that unlocks the next rung. Don't skip levels — try them in ascending order and stop when it's good enough. Skipping is how people get sold RAG systems they don't need.
The markdown Post-it notes Claude Code writes about you — on its own, from vibes, not from instruction.
On by default. Claude writes markdown files about you and the project into a hidden memory/ folder — purely from its own intuition.
You've never set up anything intentional for memory — you just rely on a bloated context window to "remember."
Zero control. Claude decides what to carry — and shoehorns it in like ChatGPT bringing up things you don't care about.
Accept automemory isn't enough. Take an active role: know which files exist, and start editing them yourself.
Automemory is on by default — Claude Code writes markdown files about you and the project from its own intuition. Look under ~/.claude/projects/<hash>/memory/: a MEMORY.md index plus topic notes ("wants 100k subs by 2026"). Cute, occasionally useful, mostly noise — like ChatGPT shoehorning old context. You're here if you've never set up anything intentional and just lean on a bloated context window. The trap: you have no control over what Claude considers. The skill: realize automemory isn't enough and take an active role. Advance: explicit memory — know the files exist, edit them yourself.
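For orientation, the folder tends to look something like this. Only MEMORY.md and the path shape come from what Claude actually writes; the hash and the topic-note filenames below are made up:

```
~/.claude/projects/<hash>/memory/
├── MEMORY.md              # the index Claude maintains on its own
├── user-preferences.md    # hypothetical topic note, e.g. "wants 100k subs by 2026"
└── project-context.md     # hypothetical topic note about the repo
```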
One file of rules & conventions, read before basically every task. Feels like a godsend — until it's a bloated rulebook.
Auto‑created; you edit it (or refresh with /init). Claude consults it before nearly every task — so it follows it well.
You've discovered CLAUDE.md and now stuff everything you ever want remembered into it.
A bloated rulebook. Studies on AGENTS.md / CLAUDE.md show these files can reduce model effectiveness when overloaded.
Stop making CLAUDE.md do everything. Turn it into an index that points elsewhere.
CLAUDE.md feels like a godsend: one place for rules, conventions, and things to always remember — read before basically every task, so Claude follows it well. Auto-created; edit it, or refresh with /init. The trap: the very thing that makes it powerful — injected into every prompt — makes it dangerous when bloated. Studies on AGENTS.md / CLAUDE.md show these files can reduce model effectiveness. Less is more; context pollution and context rot are real. Skill: write high-signal project context — if it isn't relevant to virtually every prompt, it doesn't belong. Advance: stop making CLAUDE.md do everything — turn it into an index.
CLAUDE.md stops doing everything. It becomes an index that points to purpose‑built files.
Many small files, each for one job — like GSD‑style orchestration: project.md, requirements.md, roadmap.md, state.md.
A clear path Claude walks on demand — not everything injected always. That's the antidote to context rot.
Still doesn't scale to thousands of documents — or to relationships across documents.
Pull memory out of the repo into a dedicated knowledge tool.
CLAUDE.md becomes a pointer, not the whole brain. Split memory by purpose — GSD-style orchestration is the canonical example: project.md (the north star), requirements.md (what we're building), roadmap.md (done · now · next), state.md (where we are right now). CLAUDE.md just routes Claude to the right file. This fights context rot — clear paths and on-demand loading instead of injecting everything always. Skill: design a memory architecture. Trap: it still doesn't scale to thousands of docs or to cross-document relationships. Advance: pull memory out of the repo into a dedicated tool.
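A sketch of what the index version of CLAUDE.md can look like. The four files are the GSD-style ones above; the `memory/` folder name and the exact wording are illustrative, not a required layout.

```markdown
# CLAUDE.md  (index, not encyclopedia)

Read only the file the current task needs:

- North star / why this project exists → memory/project.md
- What we're building, scope & constraints → memory/requirements.md
- Done · now · next → memory/roadmap.md
- Where we are right now → memory/state.md  (update after every session)
```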
A knowledge base in its own app, usually Obsidian: linked notes, a graph view, Claude reading and writing into it.
A real note‑graph outside the repo. Great when you treat the KB as a rulebook and need specific notes pulled.
Links are manual & arbitrary — [[brackets]] you or Claude added — not derived from content. It only looks like graph‑RAG.
"Obsidian enough" vs. "need RAG"? No clean answer — depends on document count and the kind of questions. Experiment.
When you need relationships across thousands of docs that never mention each other.
Level 4: outside tools — a knowledge vault, typically Obsidian. Linked notes, a graph view, Claude reads and writes the vault. Good when you treat the KB as a rulebook and need specific notes pulled. But the links are manual and somewhat arbitrary — [[brackets]] you or Claude added, not derived from content; Obsidian's graph only looks like graph-RAG. The grey zone: when is Obsidian enough vs. when do you need RAG? No clean answer — depends on document count and the kind of questions you ask. You have to experiment. Advance: when you need relationships across thousands of docs that never mention each other.
Three stages: embed the chunks → store them as vectors → retrieve the nearest ones to augment the answer.
Real RAG begins here — embeddings, vector DBs, and how data flows in and out.
Vectors are points placed by meaning; "fruit" clusters here, "ships" there. Repeat for thousands of docs → your knowledge base.
Question → vector → nearest‑neighbour → pull top‑k chunks → the LLM augments its answer with them.
Not "WWII battleships" — proprietary data, at scale, the model never saw in training.
Level 5: real RAG. Three stages. (1) Embed: a document isn't ingested whole — it's chunked, and each chunk goes through an embedding model into a vector. (2) Store: vectors are points in a high-dimensional space (think 3-D for intuition), placed by semantic meaning — "bananas / apples / pears" cluster here, "ships / boats" there. Repeat for thousands of docs → that's your knowledge base. (3) Retrieve: your question becomes a vector too; find the nearest vectors, pull the top-k chunks into the model, and it generates an answer augmented by them — retrieval-augmented generation. The sell isn't "WWII battleships" — it's proprietary data, at scale, that the model never saw in training.
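A minimal sketch of the three stages in plain Python. The `embed()` function here is a stand-in that just hashes words into a vector so the file runs anywhere; in practice you would swap in a real embedding model, and the "store" would be a vector database rather than a list.

```python
import math
from collections import Counter

def embed(text: str, dims: int = 64) -> list[float]:
    """Stand-in embedding: hash words into a fixed-size, unit-length vector.
    Replace with a real embedding model in practice."""
    vec = [0.0] * dims
    for word, count in Counter(text.lower().split()).items():
        vec[hash(word) % dims] += count
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # vectors are already normalised

# 1) Embed: chunk the documents, turn each chunk into a vector.
chunks = [
    "Bananas, apples and pears are stocked in the fruit aisle.",
    "The container ship docked after a two-week crossing.",
    "Pear imports arrive by refrigerated cargo ship each spring.",
]
# 2) Store: here just a list of (chunk, vector) pairs; normally a vector DB.
store = [(c, embed(c)) for c in chunks]

# 3) Retrieve: embed the question, take the top-k nearest chunks,
#    and hand them to the LLM as extra context (the "augmented" part).
question = "How do pears get here?"
q_vec = embed(question)
top_k = sorted(store, key=lambda item: cosine(q_vec, item[1]), reverse=True)[:2]
context = "\n".join(chunk for chunk, _ in top_k)
print(context)  # would be prepended to the LLM prompt
```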
Arbitrary chunking, siloed vectors, no relationships — congratulations, you've built an over‑complicated Ctrl‑F.
You built an over‑complicated Ctrl‑F — and you can get sold one, dressed up as "my Pinecone / Supabase RAG."
If you don't get chunking & embeddings, you can't make good calls about graph RAG. Understand the foundation; don't deploy it.
If it's just a rulebook lookup, Obsidian — or even naive RAG — is probably enough.
When the question is about relationships between docs that never mention each other → graph RAG.
Naive RAG falls apart fast. Chunking is arbitrary — by tokens? with overlap? does the doc even chunk sensibly? Chunk 3 might reference chunk 1, but retrieval grabs 3 without 1 — the context that makes 3 meaningful is missing; often you need the whole document. And vectors live in silos — you can't ask about relationships ("how do boats relate to bananas?"). Rerankers help a little. Real-world hit rate of unsophisticated vector RAG can be ~25% — you're nearly better off guessing. Trap: you've built an over-complicated Ctrl+F — and you can get sold one. Level 5 is about understanding the foundation, not deploying it. Before you say "I need RAG" — do you? If it's just a rulebook lookup, Obsidian's probably enough.
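To make the "chunk 3 references chunk 1" failure concrete, here is what a naive fixed-size chunker does to a short document. The document and sizes are invented and deliberately tiny so the effect is visible in a few lines.

```python
def chunk(text: str, size: int = 12, overlap: int = 3) -> list[str]:
    """Naive fixed-size chunking by word count, with a small overlap."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

doc = (
    "The Aurora contract was signed in March under strict confidentiality. "
    "Payment terms were revised twice during negotiation. "
    "It expires at the end of the year unless both parties renew it."
)

for i, c in enumerate(chunk(doc), 1):
    print(f"chunk {i}: {c}")

# Chunk 3 reads "expires at the end of the year unless both parties renew it."
# Retrieved on its own, nothing says WHAT is expiring; "the Aurora contract"
# lives back in chunk 1, which nearest-neighbour search has no reason to fetch.
```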
Everything connected — entities and typed relationships extracted from the content itself, not brackets you typed.
A hybrid vector + graph store. LightRAG is the lightest open‑source option; Microsoft GraphRAG is the heavyweight — and not cheap.
Obsidian and naive RAG don't cut it — you need entities, relationships, and queries that traverse them.
On LightRAG's own benchmarks, graph beats naive across the board — often 100%+ jumps. Their numbers, so grain of salt; the direction is real.
It's text‑only. Scanned PDFs? Images? Video? That pushes you to Level 7.
When you genuinely need relationships across many docs that don't reference each other — graph RAG. Everything is connected: entities and typed relationships, extracted during ingestion from the actual content — not brackets you typed. LightRAG is the lightest-weight open-source option; Microsoft's GraphRAG is the heavyweight (and not cheap). On LightRAG's own benchmarks, graph beats naive across the board — often 100%+ jumps (e.g. ~32→68, ~24→76). Their numbers, so grain of salt — but the direction is real. You're here when Obsidian and naive RAG don't cut it and you need hybrid vector + graph queries over entities and relationships. Trap: it's text-only — what about scanned PDFs, images, video?
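To see why a graph answers questions naive retrieval can't, here is a toy illustration of the idea (not LightRAG's API): two documents that never mention each other get linked through a shared extracted entity, and a traversal surfaces the relationship. The documents, entities and relation names are all invented.

```python
from collections import defaultdict

# Toy knowledge graph: (subject, relation, object) triples as entity
# extraction might produce them from two unrelated documents.
triples = [
    # from a shipping report
    ("MV Esperanza", "is_a", "cargo ship"),
    ("MV Esperanza", "docks_at", "Port of Rotterdam"),
    # from a produce-import memo that never mentions the ship
    ("Port of Rotterdam", "imports", "bananas"),
    ("bananas", "is_a", "fruit"),
]

graph = defaultdict(list)
for s, rel, o in triples:
    graph[s].append((rel, o))
    graph[o].append((f"inverse_{rel}", s))  # store both directions for traversal

def paths_between(start: str, goal: str, max_hops: int = 3):
    """Breadth-first search for relation paths linking two entities."""
    frontier = [(start, [])]
    for _ in range(max_hops):
        next_frontier = []
        for node, path in frontier:
            for rel, neighbour in graph[node]:
                new_path = path + [(node, rel, neighbour)]
                if neighbour == goal:
                    yield new_path
                else:
                    next_frontier.append((neighbour, new_path))
        frontier = next_frontier

# "How do ships relate to bananas?" is unanswerable chunk-by-chunk,
# but a two-hop traversal connects them through the shared port entity.
for path in paths_between("MV Esperanza", "bananas"):
    print(" -> ".join(f"{s} [{r}] {o}" for s, r, o in path))
```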
Two themes: ingest images, scanned PDFs and video — and a router agent that picks where to look. The devil is the pipeline.
Multimodal ingestion (RAG‑Anything, Gemini Embedding for video) + an agent that routes each question — graph DB? SQL? vault? CLAUDE.md?
You must index images, tables and video — and a top‑of‑funnel agent decides the path. You're stacking all the levels.
Forcing yourself here when you don't need it. Honestly — most people are fine with Obsidian, and most don't need RAG at all.
Solo operator? RAG‑Anything + LightRAG — open source, lightweight, no lock‑in. Avoid the systems you can't walk away from.
The bleeding edge (≈ April 2026): two themes. (1) Multimodal ingestion — RAG-Anything pulls images and scanned PDFs into a LightRAG-style graph; Gemini Embedding can embed video itself. A transcript isn't enough. (2) The devil is in the pipeline — in a real agentic system, the vast majority of the infrastructure is data ingestion and syncing: parsing, embedding, cleaning, dedupe, versioning, access control — only a sliver is "retrieval." Plus a top-of-funnel router agent that decides per query: graph-RAG DB? Postgres/SQL? Obsidian vault? CLAUDE.md? A mature memory architecture stacks all the levels. Trap: forcing yourself here when you don't need it — most people are fine with Obsidian, most don't need RAG at all. If you do need multimodal: RAG-Anything + LightRAG — open source, lightweight, no lock-in.
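The router idea fits in thirty-odd lines. Everything here is hypothetical scaffolding: the store names, the keyword `classify` heuristic and the stub search functions are invented for illustration; a real system would put an LLM behind `classify` and real clients behind each store.

```python
from typing import Callable

# Hypothetical backends; in a real system these would wrap a graph-RAG DB,
# a SQL database, an Obsidian vault, and the repo's CLAUDE.md respectively.
def search_graph_rag(q: str) -> str: return f"[graph-rag] results for: {q}"
def run_sql(q: str) -> str:          return f"[sql] rows matching: {q}"
def search_vault(q: str) -> str:     return f"[vault] notes about: {q}"
def read_claude_md(q: str) -> str:   return "[claude.md] project conventions"

ROUTES: dict[str, Callable[[str], str]] = {
    "relationships": search_graph_rag,   # cross-document, entity-level questions
    "numbers":       run_sql,            # aggregates, counts, exact records
    "notes":         search_vault,       # "what did I write down about X"
    "conventions":   read_claude_md,     # how this project does things
}

def classify(question: str) -> str:
    """Stand-in for an LLM routing call: returns one of the ROUTES keys."""
    q = question.lower()
    if any(w in q for w in ("how many", "average", "total")):
        return "numbers"
    if "relate" in q or "connection" in q:
        return "relationships"
    if "convention" in q or "style" in q:
        return "conventions"
    return "notes"

def answer(question: str) -> str:
    return ROUTES[classify(question)](question)

print(answer("How do our suppliers relate to the delayed shipments?"))
print(answer("How many invoices were overdue in March?"))
```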
| Level | Name | What it is | You're here when… |
|---|---|---|---|
| 1 | Automemory | Markdown notes Claude writes about you, unprompted | You've never set up anything intentional for memory. |
| 2 | CLAUDE.md | One rules file, read before nearly every task | You stuff everything you want remembered into it. |
| 3 | Memory architecture | Many purpose-built files; CLAUDE.md becomes an index | You've split memory into project / requirements / roadmap / state. |
| 4 | Outside tools — a vault | Obsidian-style linked notes & graph view | You need specific notes pulled, but links are still hand-made. |
| 5 | Naive RAG | Chunk → embed → vector DB → nearest-neighbour retrieval | You need scale, but only chunk-level lookups — and it's brittle. |
| 6 | Graph RAG | Entities + typed relationships; hybrid vector + graph (LightRAG) | You need relationships across docs that never mention each other. |
| 7 | Agentic & multimodal | Images / PDFs / video ingestion + a router agent over every source | You're stacking all the levels — and the pipeline is the hard part. |
ⓘ Find your row → look one row down for what to do next. Tiers: 1–2 native · 3–4 outside / structured · 5–7 real RAG. Most people live in rows 2–4.
Recap — the whole ladder on one slide. Use this to self-locate: find the row that matches your setup today, then look one row down for what to do next. Note the tiers: native (1–2), structured / outside (3–4), real RAG (5–7). Most of the audience lives in rows 2–4 — and that's fine.
Markdown files → Obsidian → LightRAG → RAG‑Anything + LightRAG. Stop the moment it's good enough.
The RAG‑vs‑long‑context tradeoff keeps shrinking. No video will tell you where it is — experiment.
Open‑source, lightweight tools win for exploration — and keep context rot in mind the whole way up.
Open source · lightweight · no money or weeks sunk to find out it doesn't fit · easy to walk away from.
Three takeaways. One: climb in order — markdown files → Obsidian → LightRAG → RAG-Anything + LightRAG; stop when it's good enough. Two: nobody can tell you where your line in the sand is — the RAG-vs-long-context tradeoff keeps shrinking; you have to experiment. Three: avoid lock-in — open-source, lightweight tools win for exploration. And keep context rot in mind the whole way. That's the map — go find your rung.