Memory · RAG · Claude Code — a field roadmap

The 7 Levels of Claude Code & RAG

A map of the long fight between AI systems and memory — from the markdown Post-it notes Claude writes about you, all the way up to agentic, multimodal graph‑RAG. Where you stand, and what gets you one rung higher.

7 levels · 3 tiers · native memory → real RAG · context rot, throughout
Adapted & restructured from “The 7 Levels of Claude Code & RAG” — Chase AI  ·  youtu.be/kQu5pWKS8GA
Jentrix · Claude Code & RAG — the 7‑level roadmap 01 / 15

Open by naming the real subject: this is not a "how to set up RAG" tutorial — it's a deconstruction of the fight between AI systems and memory. The 7 levels are a map so you can locate where you stand and what gets you to the next rung. Audience is AI pros, so we move fast through levels 1–4 and spend the time on 5–7. Note the deep-dive videos for each system are linked rather than re-explained here.

00 — The real problem

It was never about RAG. It's about AI systems and memory.

Getting AI to reliably and accurately answer questions about past conversations, or about giant troves of documents, is a problem we've been chasing for years. The reflex answer is RAG. Before you reach for it, deconstruct the problem — and notice where you actually are.

01  The recurring problem

Reliably, accurately recall past conversations — or query a trove of documents that won't fit in context.

02  The reflex answer

Reach for RAG — retrieval‑augmented generation — and assume that's the destination.

03  What this deck does

Maps the 7 levels of the problem so you can self‑locate — and climb only as far as you need.

ⓘ  Most people start at Level 1 — and that's where they stay. The arrow only goes one way; the trick is climbing it deliberately.

Jentrix · Claude Code & RAG — framing 02 / 15

The recurring problem: making AI reliably and accurately answer about past conversations, or about giant document troves. The default reflex is RAG. But before reaching for it, deconstruct the problem — most people start at Level 1 (do nothing intentional) and stay there. This deck is a roadmap so the audience can self-locate. Set expectation: at every level we cover the same four things — what to expect, the skill to master, the trap, and how to advance.

The roadmap — 7 levels, 3 tiers

The arrow goes one way. Climb only as far as your problem needs.

TIER I — NATIVE MEMORY  ·  TIER II — OUTSIDE TOOLS & STRUCTURE  ·  TIER III — REAL RAG

ⓘ  Try them in ascending order. If markdown files are enough, stop there. If they're not, try Obsidian — then LightRAG — then multimodal. Nobody can tell you where your line in the sand is.

Jentrix · Claude Code & RAG — the ladder 03 / 15

Three tiers. Tier I (1–2): memory native to Claude Code — automemory, then CLAUDE.md. Tier II (3–4): a structured markdown architecture, then outside tools like Obsidian. Tier III (5–7): real RAG — naive vector RAG, graph RAG, then agentic/multimodal. The arrow only goes one way, but you should climb only as far as the problem requires. Reinforce the "try in ascending order" advice — and that nobody can tell you exactly where the RAG line is.

Throughout — the constant tension

Context rot: the more you fill the window, the worse it gets.

EFFECTIVE QUALITY  vs  CONTEXT FILLED  ·  1M‑token window
[Chart] Quality (79–100%) vs tokens of context used (0 → ~256k → ~550k → ~1M): "fresh chat" ceiling, ≈92% at a quarter full, ≈78% at the end — and usage burn spikes.
Real run, from the talk: a 1M‑token window at 256k tokens (~¼ full) measured ≈92% — by the end, ≈78%. Same model, same session, worse answers.
The 1M window backfired

People stopped clearing — they keep one chat alive forever so it never "forgets." Now nobody manages their own context.

Two costs, not one

Effectiveness drops and token / usage burn climbs — 800k of context costs far more per turn than 80k.

Hold this through every level

We always trade "ingest more context" against "don't bloat it." That tension never goes away.

Jentrix · Claude Code & RAG — the cross‑cutting tension 04 / 15

Context rot: the more you fill a session's context window, the worse the model gets. Real example from the talk — at ~256k of a 1M window (a quarter full) quality was ≈92%; by the end ≈78%. The 1M window made things worse because people stopped clearing — they keep one chat alive forever to avoid "losing" the conversation. Two costs: effectiveness drops, and token/usage burn spikes (800k context costs far more per turn than 80k). Keep this in mind for every level — we're always trading "ingest more context" against "don't bloat it."
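To make the second cost concrete, a back-of-the-envelope sketch. The $3-per-million-input-tokens rate is an assumption for illustration, not any model's real pricing; input cost scales roughly linearly with the context you carry, so a never-cleared chat re-pays for its whole history on every turn:

```python
# Back-of-the-envelope: per-turn input cost of a bloated context window.
# PRICE_PER_MTOK is an assumed, illustrative rate, not any model's real pricing.
PRICE_PER_MTOK = 3.00  # USD per 1M input tokens (hypothetical)

for context_tokens in (80_000, 256_000, 800_000):
    cost = context_tokens / 1_000_000 * PRICE_PER_MTOK
    print(f"{context_tokens:>9,} tokens carried -> ${cost:.2f} per turn")

#    80,000 tokens carried -> $0.24 per turn
#   256,000 tokens carried -> $0.77 per turn
#   800,000 tokens carried -> $2.40 per turn
```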

How to read the next ten slides

Every level, the same four questions.

1 · What to expect

What the level actually is — and how you know you're standing on it.

2 · The skill to master

The one capability you have to get right before you climb.

3 · The trap

The comfortable mistake that keeps people stuck here for years.

4 · How you advance

The concrete move that unlocks the next rung.

Climb in order. Stop the moment it's good enough. Skipping rungs is how you end up "scammed into a RAG system you didn't need."

Jentrix · Claude Code & RAG — the level template 05 / 15

How to read the next ten slides. For each level: (1) what to expect — what the level actually is and how you know you're on it; (2) the skill to master before you move on; (3) the trap that keeps people stuck; (4) the concrete move that unlocks the next rung. Don't skip levels — try them in ascending order and stop when it's good enough. Skipping is how people get sold RAG systems they don't need.

Tier I · Native memory  ———  Level 01
01

Automemory

The markdown Post-it notes Claude Code writes about you — on its own, from vibes, not from instruction.

What to expect

On by default. Claude writes markdown files about you and the project into a hidden memory/ folder — purely from its own intuition.

You're here when

You've never set up anything intentional for memory — you just rely on a bloated context window to "remember."

The trap

Zero control. Claude decides what to carry — and shoehorns it in like ChatGPT bringing up things you don't care about.

How you advance

Accept automemory isn't enough. Take an active role: know which files exist, and start editing them yourself.

~/.claude/projects/<hash>/memory/  ·  written by Claude, unprompted
~/.claude/projects/
  38620f90…/
    memory/
      MEMORY.md — index of the notes below
      youtube-growth.md — "wants 100k subs by 2026"
      revenue-goals.md
      references.md — …4 of them, and counting
POST-IT NOTES — Claude wrote these. On its own. From vibes — no instruction from you.
Cute, occasionally handy — mostly noise. Like a chatbot shoehorning old context: "I get it, you remember — I don't care."
Jentrix · Claude Code & RAG — Level 01 · automemory 06 / 15

Automemory is on by default — Claude Code writes markdown files about you and the project from its own intuition. Look under ~/.claude/projects/<hash>/memory/: a MEMORY.md index plus topic notes ("wants 100k subs by 2026"). Cute, occasionally useful, mostly noise — like ChatGPT shoehorning old context. You're here if you've never set up anything intentional and just lean on a bloated context window. The trap: you have no control over what Claude considers. The skill: realize automemory isn't enough and take an active role. Advance: explicit memory — know the files exist, edit them yourself.
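If you want to see these notes for yourself, here is a minimal sketch that lists each memory folder and the first line of every note. It assumes the layout shown on the slide; the exact path and file format may differ across Claude Code versions:

```python
# List automemory notes -- a minimal sketch, assuming the folder layout
# shown on the slide (~/.claude/projects/<hash>/memory/); the exact
# location may differ across Claude Code versions.
from pathlib import Path

for memory_dir in Path.home().glob(".claude/projects/*/memory"):
    print(f"\n{memory_dir}")
    for note in sorted(memory_dir.glob("*.md")):
        first = note.read_text(encoding="utf-8", errors="ignore").splitlines()[:1]
        print(f"  {note.name:<24} {first[0] if first else '(empty)'}")
```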

Tier I · Native memory  ———  Level 02
02

CLAUDE.md

One file of rules & conventions, read before basically every task. Feels like a godsend — until it's a bloated rulebook.

What to expect

Auto‑created; you edit it (or refresh with /init). Claude consults it before nearly every task — so it follows it well.

You're here when

You've discovered CLAUDE.md and now stuff everything you ever want remembered into it.

The trap

A bloated rulebook. Studies on AGENTS.md / CLAUDE.md show these files can reduce model effectiveness when overloaded.

How you advance

Stop making CLAUDE.md do everything. Turn it into an index that points elsewhere.

CLAUDE.md  ·  injected into ~every prompt — signal, or noise?
CLAUDE.md: ## About me · ## File-system layout · ## Conventions to follow · ## Remember: my YouTube goals… · ## Remember: prefers tabs… · ## Remember: that one bug from… · ## Remember: … · ## Remember: … · …and on, and on.
Prepended to EVERY PROMPT — relevant signal, or just more noise? Not relevant ~always → it doesn't belong here. [Chart] BLOAT → EFFECTIVENESS: lean vs bloated.
Less is more. Context pollution is real. If a line isn't relevant to virtually every prompt, it doesn't belong in CLAUDE.md.
Jentrix · Claude Code & RAG — Level 02 · CLAUDE.md 07 / 15

CLAUDE.md feels like a godsend: one place for rules, conventions, and things to always remember — read before basically every task, so Claude follows it well. Auto-created; edit it, or refresh with /init. The trap: the very thing that makes it powerful — injected into every prompt — makes it dangerous when bloated. Studies on AGENTS.md / CLAUDE.md show these files can reduce model effectiveness. Less is more; context pollution and context rot are real. Skill: write high-signal project context — if it isn't relevant to virtually every prompt, it doesn't belong. Advance: stop making CLAUDE.md do everything — turn it into an index.

Tier II · Outside tools & structure  ———  Level 03
03

A memory architecture

CLAUDE.md stops doing everything. It becomes an index that points to purpose‑built files.

What to expect

Many small files, each for one job — like GSD‑style orchestration: project.md, requirements.md, roadmap.md, state.md.

Why it works

A clear path Claude walks on demand — not everything injected always. That's the antidote to context rot.

The trap

Still doesn't scale to thousands of documents — or to relationships across documents.

How you advance

Pull memory out of the repo into a dedicated knowledge tool.

CLAUDE.md  →  an index, not the whole brain
CLAUDE.md — index · router only
  project.md — the north star · why we're here
  requirements.md — what we're building
  roadmap.md — done · now · next
  state.md — where we are right now
Break memory, context and conventions into chunks with a clear path — and you're fighting context rot, not feeding it.
Jentrix · Claude Code & RAG — Level 03 · architecture 08 / 15

CLAUDE.md becomes a pointer, not the whole brain. Split memory by purpose — GSD-style orchestration is the canonical example: project.md (the north star), requirements.md (what we're building), roadmap.md (done · now · next), state.md (where we are right now). CLAUDE.md just routes Claude to the right file. This fights context rot — clear paths and on-demand loading instead of injecting everything always. Skill: design a memory architecture. Trap: it still doesn't scale to thousands of docs or to cross-document relationships. Advance: pull memory out of the repo into a dedicated tool.
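What the router version of CLAUDE.md might look like, as an illustrative sketch rather than a canonical template; the four file names follow the GSD-style split above:

```
# CLAUDE.md (index · router only)

Read the file the task needs; do not load everything at once.

- Why this project exists (the north star)    → project.md
- What we're building (scope, requirements)   → requirements.md
- What's done, in flight, and next            → roadmap.md
- Where we are right now                      → state.md
```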

Tier II · Outside tools & structure  ———  Level 04
04

Outside tools — a vault

A knowledge base in its own app, usually Obsidian: linked notes, a graph view, Claude reading and writing into it.

What to expect

A real note‑graph outside the repo. Great when you treat the KB as a rulebook and need specific notes pulled.

The catch

Links are manual & arbitrary — [[brackets]] you or Claude added, not derived from content. It only looks like graph‑RAG.

The grey zone

"Obsidian enough" vs. "need RAG"? No clean answer — depends on document count and the kind of questions. Experiment.

How you advance

When you need relationships across thousands of docs that never mention each other.

VAULT GRAPH  ·  a query pulls the notes you linked — nothing it didn't
[Graph] Manual link — arbitrary. Nodes: pricing · contracts · renewals · onboarding · support SLAs · refunds · EU region. Query: "renewal terms?"
It resembles a knowledge graph — but the connections are yours, not the content's. That's the ceiling.
Jentrix · Claude Code & RAG — Level 04 · the vault 09 / 15

Level 4: outside tools — a knowledge vault, typically Obsidian. Linked notes, a graph view, Claude reads and writes the vault. Good when you treat the KB as a rulebook and need specific notes pulled. But the links are manual and somewhat arbitrary — [[brackets]] you or Claude added, not derived from content; Obsidian's graph only looks like graph-RAG. The grey zone: when is Obsidian enough vs. when do you need RAG? No clean answer — depends on document count and the kind of questions you ask. You have to experiment. Advance: when you need relationships across thousands of docs that never mention each other.
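A hypothetical vault note makes the ceiling visible: every [[link]] below exists because someone typed it, and none of the edges carries a type.

```
# renewals.md  (hypothetical vault note)

Renewal terms are set per [[contracts]] and priced off the
current [[pricing]] tier. Customers in the [[EU region]] get
the extended notice period — see [[support SLAs]].

<!-- Each [[link]] exists because someone typed it. The graph view
     draws an edge, but the edge has no type and no meaning:
     "renewals -> contracts" could be "governed by", "supersedes",
     or nothing at all. Retrieval can only follow what you linked. -->
```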

Tier III · Real RAG  ———  Level 05 · I / II
05

Naive RAG — the fundamentals

Three stages: embed the chunks → store them as vectors → retrieve the nearest ones to augment the answer.

① EMBED  ·  ② STORE  ·  ③ RETRIEVE  —  the journey of one document, then one question
① EMBED — one document ("WWII battleships") → chunk 1 · chunk 2 · chunk 3 → EMBEDDING MODEL: chunk → vector (0.52, 5.12, 9.31, …hundreds more).
② STORE — VECTOR DATABASE, a semantic space: fruit · apples · pears · bananas cluster together; ships · boats · battleships (chunks 1·2·3) cluster elsewhere.
③ RETRIEVE — your question → embed → its own vector → nearest‑neighbour · top‑k → CLAUDE CODE (LLM · Opus): retrieved chunks + training data → answer, augmented + cited.
What to expect

Real RAG begins here — embeddings, vector DBs, and how data flows in and out.

The mental model

Vectors are points placed by meaning; "fruit" clusters here, "ships" there. Repeat for thousands of docs → your knowledge base.

The retrieval move

Question → vector → nearest‑neighbour → pull top‑k chunks → the LLM augments its answer with them.

The real sell

Not "WWII battleships" — proprietary data, at scale, the model never saw in training.

Jentrix · Claude Code & RAG — Level 05 · how RAG works 10 / 15

Level 5: real RAG. Three stages. (1) Embed: a document isn't ingested whole — it's chunked, and each chunk goes through an embedding model into a vector. (2) Store: vectors are points in a high-dimensional space (think 3-D for intuition), placed by semantic meaning — "bananas / apples / pears" cluster here, "ships / boats" there. Repeat for thousands of docs → that's your knowledge base. (3) Retrieve: your question becomes a vector too; find the nearest vectors, pull the top-k chunks into the model, and it generates an answer augmented by them — retrieval-augmented generation. The sell isn't "WWII battleships" — it's proprietary data, at scale, that the model never saw in training.
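The whole pipeline fits on a page. A toy sketch of the three stages: the "embedding model" here is a character-trigram hash, a deterministic stand-in so the example runs offline; a real system would call an actual embedding model and a real vector DB:

```python
# Naive RAG in one page -- a toy sketch. The "embedding model" is a
# character-trigram hash (offline stand-in); swap in a real embedding
# model and vector DB for anything serious.
import math

DIMS = 256

def embed(text: str) -> list[float]:
    """Toy embedding: hash character trigrams into a fixed-size unit vector."""
    vec = [0.0] * DIMS
    t = text.lower()
    for i in range(len(t) - 2):
        vec[hash(t[i:i + 3]) % DIMS] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # both already unit-norm

# (1) EMBED: chunk the documents, turn each chunk into a vector.
chunks = [
    "Battleships of WWII carried 14- to 18-inch main guns.",
    "Bananas, apples and pears ripen faster in a paper bag.",
    "Iowa-class battleships escorted carrier groups in the Pacific.",
]
# (2) STORE: the "vector DB" is just (vector, chunk) pairs here.
store = [(embed(c), c) for c in chunks]

# (3) RETRIEVE: embed the question, take the nearest top-k chunks.
def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(store, key=lambda vc: cosine(q, vc[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

for hit in retrieve("what guns did WWII battleships have?"):
    print(hit)  # the battleship chunks should outrank the fruit chunk
```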

Tier III · Real RAG  ———  Level 05 · II / II
05

…and why naive RAG breaks

Arbitrary chunking, siloed vectors, no relationships — congratulations, you've built an over‑complicated Ctrl‑F.

FAILURE MODES  ·  why "grab a few chunks" doesn't hold up
A · CHUNKS LOSE CONTEXT — chunk 3 ← retrieved. Chunk 3 only makes sense with chunk 1, but retrieval grabbed 3 without 1. Often you need the whole document.
B · VECTORS LIVE IN SILOS — boats · bananas: "how do boats relate to bananas?" — you can't ask. Rerankers help a little. Not enough.
C · REAL‑WORLD HIT RATE — a coin flip is 50%; unsophisticated vector RAG ≈ 25% — "almost better guessing".
The trap

You built an over‑complicated Ctrl‑F — and you can get sold one, dressed up as "my Pinecone / Supabase RAG."

So why learn it?

If you don't get chunking & embeddings, you can't make good calls about graph RAG. Understand the foundation; don't deploy it.

Ask first: do you?

If it's just a rulebook lookup, Obsidian — or even naive RAG — is probably enough.

When you actually need more

When the question is about relationships between docs that never mention each other → graph RAG.

Jentrix · Claude Code & RAG — Level 05 · where it breaks 11 / 15

Naive RAG falls apart fast. Chunking is arbitrary — by tokens? with overlap? does the doc even chunk sensibly? Chunk 3 might reference chunk 1, but retrieval grabs 3 without 1 — the context that makes 3 meaningful is missing; often you need the whole document. And vectors live in silos — you can't ask about relationships ("how do boats relate to bananas?"). Rerankers help a little. Real-world hit rate of unsophisticated vector RAG can be ~25% — you're nearly better guessing. Trap: you've built an over-complicated Ctrl+F — and you can get sold one. Level 5 is about understanding the foundation, not deploying it. Before you say "I need RAG" — do you? If it's just a rulebook lookup, Obsidian's probably enough.
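Failure mode A is easy to reproduce. A minimal sketch of fixed-size chunking with overlap; both knobs are arbitrary, and the same document yields a different number of chunks each time you turn them:

```python
# Fixed-size chunking with overlap -- the arbitrariness lives in two knobs.
def chunk(words: list[str], size: int, overlap: int) -> list[str]:
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

doc = ("The Yamato was the heaviest battleship ever built. "
       "Her armour belt was 410 mm thick. "
       "She was sunk in April 1945.").split()

for size, overlap in ((12, 0), (12, 4), (8, 2)):
    print(f"size={size} overlap={overlap} -> {len(chunk(doc, size, overlap))} chunks")

# A chunk containing only "She was sunk in April 1945" is meaningless
# without the chunk that names the ship -- and nearest-neighbour
# retrieval has no idea those two chunks belong together.
```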

Tier III · Real RAG  ———  Level 06
06

Graph RAG

Everything connected — entities and typed relationships extracted from the content itself, not brackets you typed.

What to expect

A hybrid vector + graph store. LightRAG is the lightest open‑source option; Microsoft GraphRAG is the heavyweight — and not cheap.

You're here when

Obsidian and naive RAG don't cut it — you need entities, relationships, and queries that traverse them.

The win

On LightRAG's own benchmarks, graph beats naive across the board — often 100%+ jumps. Their numbers, so grain of salt; the direction is real.

The trap → what's next

It's text‑only. Scanned PDFs? Images? Video? That pushes you to Level 7.

NAIVE RAG vs LIGHTRAG  ·  win‑rate %, "comprehensiveness" — LightRAG repo, ~6–8 mo old
Win-rate % | naive vector RAG | LightRAG (graph)
Agriculture | 31.6 | 68.4
CS | 32.5 | 67.5
Legal | 24 | 76
Mixed | 25 | 75
LightRAG also reports beating Microsoft GraphRAG — their numbers; treat directionally.
EXTRACTED — TYPED: Acme Corp —signed→ Q3 contract —governs→ EU region.  OBSIDIAN — UNTYPED: [[link]] — no meaning.
The relationship — "signed", "governs", "supersedes" — is mined from the content. That's the gap with Obsidian's bare brackets.
Jentrix · Claude Code & RAG — Level 06 · graph RAG 12 / 15

When you genuinely need relationships across many docs that don't reference each other — graph RAG. Everything is connected: entities and typed relationships, extracted by an embedding pipeline from the actual content — not brackets you typed. LightRAG is the lightest-weight open-source option; Microsoft's GraphRAG is the heavyweight (and not cheap). On LightRAG's own benchmarks, graph beats naive across the board — often 100%+ jumps (e.g. ~32→68, ~24→76). Their numbers, so grain of salt — but the direction is real. You're here when Obsidian and naive RAG don't cut it and you need hybrid vector + graph queries over entities and relationships. Traps: it's text-only — what about scanned PDFs, images, video?
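In code, the jump to graph RAG is small even though the machinery isn't. This sketch follows the shape of LightRAG's early README; the API has been moving fast (newer releases are async and require explicit storage initialization), so treat it as directional and check the repo before copying:

```python
# Graph RAG with LightRAG -- a sketch following the shape of its early
# README. The API moves fast; verify against the repo
# (github.com/HKUDS/LightRAG) before copying.
from lightrag import LightRAG, QueryParam
from lightrag.llm import gpt_4o_mini_complete  # any supported LLM func

rag = LightRAG(
    working_dir="./my_kb",                 # where the graph + vectors live
    llm_model_func=gpt_4o_mini_complete,   # used for entity/relation extraction
)

# Ingest: LightRAG chunks, embeds, AND extracts typed entities/relations.
with open("contracts.txt", encoding="utf-8") as f:
    rag.insert(f.read())

# Query modes: "naive" (pure vector), "local", "global", "hybrid" (vector+graph).
print(rag.query(
    "How do the Q3 contracts relate to the EU region?",
    param=QueryParam(mode="hybrid"),
))
```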

Tier III · Real RAG — the edge  ———  Level 07
07

Agentic & multimodal RAG

Two themes: ingest images, scanned PDFs and video — and a router agent that picks where to look. The devil is the pipeline.

ROUTER  ·  pick the source per question   |   INGESTION & SYNC  ·  where most of the system actually lives
ROUTER — team chat: a question → ROUTER AGENT ("which path answers this?") → GRAPH‑RAG DB · LightRAG / Postgres · SQL / Obsidian vault / CLAUDE.md + repo .md ← this time. A mature memory architecture stacks all the levels — CLAUDE.md + repo md + a vault + a graph DB + SQL, with an agent on top deciding which to hit.
INGESTION & SYNC — SOURCES (Drive · scanned PDFs · images · video) → PARSE (RAG‑Anything) → EMBED (Gemini Embedding · text + video) → CLEAN · DEDUPE · VERSION (access control · sync) → GRAPH STORE (the knowledge base) → retrieval ← "the RAG" is this little box; ≈ 90% of the real system is ingestion & keeping it in sync.
What to expect

Multimodal ingestion (RAG‑Anything, Gemini Embedding for video) + an agent that routes each question — graph DB? SQL? vault? CLAUDE.md?

You're here when

You must index images, tables and video — and a top‑of‑funnel agent decides the path. You're stacking all the levels.

The trap

Forcing yourself here when you don't need it. Honestly — most people are fine with Obsidian, and most don't need RAG at all.

If you do need it

Solo operator? RAG‑Anything + LightRAG — open source, lightweight, no lock‑in. Avoid the systems you can't walk away from.

Jentrix · Claude Code & RAG — Level 07 · agentic & multimodal 13 / 15

The bleeding edge (≈ April 2026): two themes. (1) Multimodal ingestion — RAG-Anything pulls images and scanned PDFs into a LightRAG-style graph; Gemini Embedding can embed video itself. A transcript isn't enough. (2) The devil is in the pipeline — in a real agentic system, the vast majority of the infrastructure is data ingestion and syncing: parsing, embedding, cleaning, dedupe, versioning, access control — only a sliver is "retrieval." Plus a top-of-funnel router agent that decides per query: graph-RAG DB? Postgres/SQL? Obsidian vault? CLAUDE.md? A mature memory architecture stacks all the levels. Trap: forcing yourself here when you don't need it — most people are fine with Obsidian, most don't need RAG at all. If you do need multimodal: RAG-Anything + LightRAG — open source, lightweight, no lock-in.
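A toy of the top-of-funnel router. Every backend function here is hypothetical, and real systems usually make the routing decision with an LLM call rather than keyword rules, but the shape is the same: one agent, several sources, one pick per question:

```python
# Top-of-funnel router -- an illustrative sketch, not a product.
# All backend functions are hypothetical stubs; a real system would
# usually route via an LLM call, not keyword rules.
from typing import Callable

def query_graph_rag(q: str) -> str:  return f"[graph-RAG] {q}"
def query_sql(q: str) -> str:        return f"[Postgres]  {q}"
def query_vault(q: str) -> str:      return f"[Obsidian]  {q}"
def query_repo_md(q: str) -> str:    return f"[CLAUDE.md] {q}"

ROUTES: list[tuple[tuple[str, ...], Callable[[str], str]]] = [
    (("relate", "connection", "between"), query_graph_rag),  # relationships
    (("how many", "count", "average"),    query_sql),        # aggregates
    (("note", "meeting", "decision"),     query_vault),      # rulebook lookups
]

def route(question: str) -> str:
    q = question.lower()
    for keywords, backend in ROUTES:
        if any(k in q for k in keywords):
            return backend(question)
    return query_repo_md(question)  # default: project conventions

print(route("How do the Q3 contracts relate to the EU region?"))
print(route("How many renewals closed last quarter?"))
```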

The map — at a glance

The 7 levels, one table.

Level | Name | What it is | You're here when…
1 | Automemory | Markdown notes Claude writes about you, unprompted | You've never set up anything intentional for memory.
2 | CLAUDE.md | One rules file, read before nearly every task | You stuff everything you want remembered into it.
3 | Memory architecture | Many purpose-built files; CLAUDE.md becomes an index | You've split memory into project / requirements / roadmap / state.
4 | Outside tools — a vault | Obsidian-style linked notes & graph view | You need specific notes pulled, but links are still hand-made.
5 | Naive RAG | Chunk → embed → vector DB → nearest-neighbour retrieval | You need scale, but only chunk-level lookups — and it's brittle.
6 | Graph RAG | Entities + typed relationships; hybrid vector + graph (LightRAG) | You need relationships across docs that never mention each other.
7 | Agentic & multimodal | Images / PDFs / video ingestion + a router agent over every source | You're stacking all the levels — and the pipeline is the hard part.

ⓘ  Find your row → look one row down for what to do next. Tiers: 1–2 native · 3–4 outside / structured · 5–7 real RAG. Most people live in rows 2–4.

Jentrix · Claude Code & RAG — at a glance 14 / 15

Recap — the whole ladder on one slide. Use this to self-locate: find the row that matches your setup today, then look one row down for what to do next. Note the tiers: native (1–2), structured / outside (3–4), real RAG (5–7). Most of the audience lives in rows 2–4 — and that's fine.

Where you stand

Three things to take with you.

01  Climb in order

Markdown files → Obsidian → LightRAG → RAG‑Anything + LightRAG. Stop the moment it's good enough.

02  Nobody knows your line

The RAG‑vs‑long‑context tradeoff keeps shrinking. No video will tell you where it is — experiment.

03  Avoid lock‑in

Open‑source, lightweight tools win for exploration — and keep context rot in mind the whole way up.

Recommended stack — solo operator who needs multimodal
RAG‑Anything + LightRAG

Open source · lightweight · no money or weeks sunk to find out it doesn't fit · easy to walk away from.

The whole climb
Want a hand finding your rung?
30 minutes — we map whether a harness pays off, with a written recommendation either way.
Start with one process
Deck styled with the Jentrix design system · Source: “The 7 Levels of Claude Code & RAG” — Chase AI · youtu.be/kQu5pWKS8GA · content distilled & restructured
Jentrix · Claude Code & RAG — go find your rung 15 / 15

Three takeaways. One: climb in order — markdown files → Obsidian → LightRAG → RAG-Anything + LightRAG; stop when it's good enough. Two: nobody can tell you where your line in the sand is — the RAG-vs-long-context tradeoff keeps shrinking; you have to experiment. Three: avoid lock-in — open-source, lightweight tools win for exploration. And keep context rot in mind the whole way. That's the map — go find your rung.
