What Karpathy's LLM Wiki Is and How to Use It

A simple-English guide to Andrej Karpathy's LLM Wiki idea. Explains what it is, how to set one up, why it is different from RAG, useful real-world examples, and where it works well or breaks down.

Version 1.0.0. Updated 04/25/2026, 08:00 PM EST.

Every few months, somebody in AI gives a name to something people were already circling around. Andrej Karpathy's LLM Wiki is one of those ideas.

It is not a product. It is not an app. It is not a magical new model.

It is a way of working with AI.

The basic idea is simple: instead of asking an LLM to rediscover everything from a pile of documents every single time you have a question, you let the LLM build and maintain a persistent wiki for you. That wiki lives in normal markdown files. It grows over time. It gets cleaned up, cross-linked, and improved as you add more sources.

In Karpathy's own framing, Obsidian is the IDE, the LLM is the programmer, and the wiki is the codebase.

That is the whole idea in one sentence.

What It Is in Simple English

Most people use AI with documents in one of two ways.

The first is the ordinary chat approach: upload some files, ask a question, get an answer, and then move on. The second is a RAG-style approach: the system searches through your documents, pulls back relevant chunks, and answers from those.

That works. But it has an obvious weakness: the system starts from scratch every time.

Karpathy's LLM Wiki idea changes that.

Instead of searching raw files fresh on every question, the AI slowly builds a layer in the middle:

  • summaries
  • concept pages
  • people pages
  • comparison notes
  • timelines
  • question pages
  • cross-links between all of them

So over time, you stop talking directly to a messy pile of raw material. You talk to a maintained knowledge base that the AI has already organised.

That is why people say it can feel like a "second brain", though one ought to be careful with such grand language. It is better to think of it as a compounding research workspace.

How It Is Different from RAG

This is the central distinction.

With traditional RAG:

  • your source documents stay mostly raw
  • the system retrieves snippets at question time
  • the answer is created on the fly
  • little or no knowledge is preserved between questions

With an LLM Wiki:

  • the raw source documents still exist and stay untouched
  • but the AI creates a structured layer on top of them
  • knowledge is gradually accumulated instead of rediscovered
  • good answers can be saved back into the wiki for later use

So the value is not merely retrieval. The value is accumulation.

Karpathy's gist makes this point very clearly: normal document chat is fine, but it does not really compound. The wiki does.

The Three-Part Setup

The cleanest version of the idea has three layers.

1. Raw sources

This is where the original material lives.

Examples:

  • articles
  • PDFs
  • transcripts
  • notes
  • screenshots
  • reports
  • research papers

These are the source of truth. The AI can read them, but should not casually rewrite them.

2. The wiki

This is the AI-maintained layer.

These are normal markdown files containing things like:

  • summaries
  • definitions
  • topic pages
  • entity pages
  • comparisons
  • saved answers to useful questions

This layer is where the value compounds.

3. The schema or instructions

This is the set of rules telling the AI how to behave.

For example:

  • where files should go
  • how pages should be named
  • what frontmatter to use
  • how links should be created
  • how new sources should be ingested
  • how to perform a maintenance or "lint" pass

That is why this works well with coding agents and markdown-based note systems. The structure is explicit.
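
To make "explicit structure" concrete, here is a minimal Python sketch of two rules a schema might pin down: how a page title becomes a file name, and what frontmatter each page carries. The field names (title, kind, sources, updated) and the helpers PageMeta, slugify, and render_frontmatter are hypothetical, chosen for illustration; Karpathy's gist does not prescribe them.

```python
import json
import re
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PageMeta:
    """Hypothetical frontmatter fields; a real SCHEMA.md may choose others."""
    title: str
    kind: str  # e.g. "concept", "entity", "comparison", "query"
    sources: list[str] = field(default_factory=list)
    updated: str = field(default_factory=lambda: date.today().isoformat())


def slugify(title: str) -> str:
    """Turn a page title into a lowercase, hyphen-separated file stem."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def render_frontmatter(meta: PageMeta) -> str:
    """Render the metadata as a YAML-style frontmatter block."""
    lines = [
        "---",
        f"title: {json.dumps(meta.title)}",  # quoted so a colon in the title is safe
        f"kind: {meta.kind}",
        f"updated: {meta.updated}",
        "sources:",
    ]
    lines += [f"  - {source}" for source in meta.sources]
    lines.append("---")
    return "\n".join(lines)
```

With these helpers, a page titled "RAG vs. Wiki Memory" would be saved as rag-vs-wiki-memory.md. The point is less the code than the fact that the rule is written down once and applied the same way every time.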

A Simple Way to Set One Up

You do not need a fancy stack to begin.

A very basic setup might be:

wiki/
  raw/
    articles/
    papers/
    transcripts/
  concepts/
  entities/
  comparisons/
  queries/
  index.md
  log.md
  SCHEMA.md
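
If you would rather create that layout with a script than by hand, a small Python sketch is enough. The folder and file names are copied from the layout above; the starter text written into each file and the function name scaffold are placeholders of my own.

```python
from pathlib import Path

# Folder names copied from the layout above.
FOLDERS = [
    "raw/articles",
    "raw/papers",
    "raw/transcripts",
    "concepts",
    "entities",
    "comparisons",
    "queries",
]

# Starter contents are placeholders, not part of the original idea.
FILES = {
    "index.md": "# Index\n",
    "log.md": "# Log\n",
    "SCHEMA.md": "# Schema\n",
}


def scaffold(root: Path) -> None:
    """Create the wiki folders and top-level files without overwriting anything."""
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name, starter in FILES.items():
        path = root / name
        if not path.exists():  # never clobber an existing index, log, or schema
            path.write_text(starter, encoding="utf-8")
```

Calling scaffold(Path("wiki")) creates the tree in the current directory. Running it a second time is harmless, because existing files are left alone.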

Then you do three things.

Step 1: Pick a topic or domain

Do not begin by dumping your entire digital life into it. That is absurd.

Start with one area where compounding knowledge is genuinely useful.

Good examples:

  • AI research
  • a business market you track
  • fitness and health notes
  • course material
  • books and essays
  • product research
  • customer or competitor intelligence

Step 2: Add a few high-quality sources

Karpathy's idea works best when the raw material is worth keeping.

So instead of dumping 800 random bookmarks into the folder, start with perhaps:

  • 10 strong essays
  • 5 papers
  • a few transcripts
  • a handful of notes you already know matter

Quality first. Quantity later.

Step 3: Let the AI build pages from those sources

Ask the AI to:

  • summarise each source
  • extract key concepts
  • create pages for important people, tools, or ideas
  • link related topics together
  • update index.md
  • append to log.md

That is already enough to make the system useful.
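
A coding agent can read the files itself, but if you are pasting text into a model by hand it helps to assemble the request the same way each time. The sketch below builds one instruction from SCHEMA.md, index.md, and the new source. The task wording mirrors the list above; the function name build_ingest_prompt and the prompt layout are assumptions, and the source file is assumed to be plain text stored somewhere under the wiki root.

```python
from pathlib import Path

# The task list mirrors the bullets above; the exact wording is illustrative.
INGEST_TASKS = [
    "Summarise the source.",
    "Extract key concepts.",
    "Create pages for important people, tools, or ideas.",
    "Link related topics together.",
    "Update index.md.",
    "Append an entry to log.md.",
]


def build_ingest_prompt(root: Path, source: Path) -> str:
    """Assemble one instruction from SCHEMA.md, index.md, and the new source."""
    schema = (root / "SCHEMA.md").read_text(encoding="utf-8")
    index = (root / "index.md").read_text(encoding="utf-8")
    source_text = source.read_text(encoding="utf-8")
    source_name = source.relative_to(root).as_posix()  # raises if source is outside root
    tasks = "\n".join(f"{n}. {task}" for n, task in enumerate(INGEST_TASKS, start=1))
    return (
        f"Follow these wiki rules:\n{schema}\n\n"
        f"Current index:\n{index}\n\n"
        f"New source ({source_name}):\n{source_text}\n\n"
        f"Tasks:\n{tasks}\n"
    )
```

Including the current index is a deliberate choice: it lets the model see which pages already exist, so it can extend them instead of creating near-duplicates.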

Why Obsidian Keeps Coming Up

Obsidian is not mandatory, but it is an obvious fit.

Why?

Because the whole thing is just markdown files.

Obsidian gives you:

  • easy browsing
  • wikilinks
  • graph view
  • frontmatter support
  • plugins like Dataview
  • a clean visual way to inspect what the AI is building

A lot of the YouTube explanations of Karpathy's wiki focus on this because it makes the concept feel concrete. You can literally watch the knowledge base grow.

So when people say, "Obsidian is the IDE," they mean that Obsidian is the human-facing workspace, while the AI is doing the tedious file maintenance in the background.

What the Actual Workflow Looks Like

In practice, the workflow usually comes down to three operations.

Ingest

You add a new source.

The AI reads it, writes a summary, updates relevant pages, creates new ones if needed, adds links, and logs what changed.

One good source might update ten existing pages. That is the compounding effect.
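
The indexing and logging half of an ingest is mechanical, so it can also be done by a plain script instead of the model. This is a minimal sketch under a few assumptions: the function names rebuild_index and append_log are hypothetical, index.md is treated as a flat list of path-style wikilinks, and log.md is append-only with one dated line per change.

```python
from datetime import date
from pathlib import Path

# Top-level bookkeeping files that should not list themselves in the index.
TOP_LEVEL = {"index.md", "log.md", "SCHEMA.md"}


def rebuild_index(root: Path) -> None:
    """Rewrite index.md as a list of wikilinks to every page outside raw/."""
    links = []
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root)
        if rel.parts[0] == "raw" or rel.as_posix() in TOP_LEVEL:
            continue
        links.append(f"- [[{rel.with_suffix('').as_posix()}]]")
    (root / "index.md").write_text("\n".join(["# Index", ""] + links) + "\n", encoding="utf-8")


def append_log(root: Path, message: str) -> None:
    """Append one dated line to log.md; earlier entries are never rewritten."""
    with (root / "log.md").open("a", encoding="utf-8") as log:
        log.write(f"- {date.today().isoformat()}: {message}\n")
```

After an ingest you might call rebuild_index(root) and then append_log(root, "ingested raw/articles/example.md, updated 10 pages"), where the message text is whatever you want to remember later.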

Query

You ask the system a real question.

Not merely, "What does this article say?" but things like:

  • What changed in this debate over time?
  • Which researchers disagree on this topic?
  • What are the recurring themes across these sources?
  • What blind spots keep appearing?

And if the answer is useful, you file it back into the wiki.
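
Filing an answer back can be as simple as writing one more markdown file. The sketch below assumes saved answers live in a queries/ folder, as in the layout shown earlier; the function name save_answer, the frontmatter fields, and the "Related pages" section are illustrative choices, not something the gist specifies.

```python
import re
from datetime import date
from pathlib import Path


def save_answer(root: Path, question: str, answer: str, cited_pages: list[str]) -> Path:
    """Write a question and its answer to queries/<slug>.md and return the path."""
    slug = re.sub(r"[^a-z0-9]+", "-", question.lower()).strip("-")
    links = "\n".join(f"- [[{page}]]" for page in cited_pages)
    body = (
        f"---\nkind: query\nasked: {date.today().isoformat()}\n---\n\n"
        f"{question}\n\n{answer}\n\nRelated pages:\n{links}\n"
    )
    path = root / "queries" / f"{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body, encoding="utf-8")
    return path
```

Linking the saved answer to the pages it drew on matters more than the exact format: those links are what let a later lint pass or a later question find it again.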

Lint

This is the maintenance pass.

The AI checks for things like:

  • duplicate pages
  • broken links
  • missing tags
  • orphaned notes
  • stale information
  • contradictions

This is one of the more interesting parts of the whole idea. Humans are usually bad at this sort of housekeeping because it is boring. LLMs are often perfectly fine at it.
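
Some of these checks do not need an LLM at all. As a rough sketch, the Python below finds broken wikilinks, orphaned pages, and duplicate page names using nothing but the files on disk. It assumes Obsidian-style [[target]] links that resolve either by full path or by bare file name; the function name lint and the report keys are hypothetical. Stale information and contradictions are judgment calls, so those stay with the model and with you.

```python
import re
from collections import defaultdict
from pathlib import Path

# Captures the target of [[target]], [[target|alias]], and [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")
# Bookkeeping pages: their links do not rescue a page from being an orphan.
HUBS = {"index", "log", "SCHEMA"}


def lint(root: Path) -> dict[str, list[str]]:
    """Report broken links, orphaned pages, and duplicate page names outside raw/."""
    pages = {}
    for path in root.rglob("*.md"):
        rel = path.relative_to(root)
        if rel.parts[0] != "raw":
            pages[rel.with_suffix("").as_posix()] = path.read_text(encoding="utf-8")

    # Group pages by bare file name so [[agent-memory]] can resolve without a folder.
    by_stem = defaultdict(list)
    for name in sorted(pages):
        by_stem[name.rsplit("/", 1)[-1]].append(name)

    linked, broken = set(), []
    for name, text in pages.items():
        for raw_target in WIKILINK.findall(text):
            target = raw_target.strip()
            matches = [target] if target in pages else by_stem.get(target, [])
            if not matches:
                broken.append(f"{name} -> {target}")
            elif name not in HUBS and matches[0] != name:
                linked.add(matches[0])

    return {
        "broken_links": sorted(broken),
        "orphans": sorted(n for n in pages if n not in linked and n not in HUBS),
        "duplicate_names": sorted(s for s, names in by_stem.items() if len(names) > 1),
    }
```

One known limitation of this sketch: embedded attachments such as ![[diagram.png]] match the same pattern and will be reported as broken links unless you filter them out.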

Cool Use Cases

This is where the idea becomes more than clever note-taking.

Personal research

If you are reading heavily in one field, the wiki becomes a place where ideas accumulate instead of disappearing into separate chats.

Book notes that actually stay useful

Instead of isolated chapter summaries, you get linked pages for themes, people, events, and arguments across multiple books.

Business intelligence

A team or founder can ingest interviews, market reports, customer calls, and competitor material into one structured body of knowledge.

Learning difficult topics

For technical domains, the wiki can slowly become your own explanation layer in plain English.

Building your own AI memory system

This is one reason the idea took off in the agent crowd. It is a simple way to give an agent persistent, editable, inspectable memory using normal files.

A Concrete Example

Suppose you are researching open-source AI agents.

Your raw sources might include:

  • GitHub READMEs
  • launch blog posts
  • YouTube transcripts
  • your own notes
  • benchmark writeups

The wiki might then grow pages like:

  • agents-overview.md
  • hermes-agent.md
  • openclaw.md
  • agent-memory.md
  • rag-vs-wiki-memory.md
  • tool-use-patterns.md
  • queries/which-agent-is-best-for-research.md

Now when you ask a new question, the AI is not starting with chaos. It is starting with your already-structured knowledge base.

That is much better.

Why People Find It Exciting

The appeal is not merely that it stores notes.

The exciting part is the division of labour.

The human does the parts humans are good at:

  • choosing what matters
  • judging source quality
  • asking good questions
  • deciding what is worth keeping
  • making sense of the big picture

The AI does the parts machines are surprisingly good at:

  • summarising
  • indexing
  • cross-linking
  • formatting
  • updating multiple files
  • keeping structure consistent
  • doing repetitive maintenance without complaining

Karpathy's gist makes this point well: the tedious part of keeping a knowledge base is not always the thinking. It is the bookkeeping.

And bookkeeping is exactly the sort of thing you should be happy to outsource.

Where It Breaks Down

Now for the less breathless part.

This pattern is not perfect.

It is not magic truth

If the AI writes a bad synthesis into the wiki, the error can become persistent. A wrong answer in a chat disappears. A wrong answer in a maintained wiki can harden into fake knowledge unless you review it.

It works best with curation

If you feed it junk, you will still get junk, just better organised.

Personal use is easier than team use

One of the more useful counterpoints from the YouTube material is that this pattern is much cleaner for an individual than for a large team. Once many people or many agents are writing to shared memory, versioning and conflict resolution become more serious problems.

It is not always better than RAG at huge scale

For a personal wiki or a focused research base, the idea is excellent. For massive enterprise-scale corpora, traditional retrieval systems often still make more sense.

So no, it has not "replaced RAG" in some universal sense. That is YouTube title nonsense.

It has, however, offered a better pattern for many small-to-medium knowledge systems.

A Good Beginner Version

If you want to try this without overcomplicating it, do this:

  1. Make a wiki folder.
  2. Add 5 to 10 genuinely important sources.
  3. Write a simple SCHEMA.md telling the AI how to organise pages.
  4. Use markdown files only.
  5. Ask the AI to ingest one source at a time.
  6. Keep index.md and log.md up to date.
  7. Run periodic cleanup passes.

That is enough to understand the idea properly.

You do not need to build an "AI operating system" by Thursday.

Final Thought

Karpathy's LLM Wiki is one of those ideas that feels obvious the moment you hear it.

Of course you would want the model to maintain a persistent layer of organised knowledge instead of improvising from raw files every time.

Of course markdown files are a sensible format.

Of course a graphable, editable, versionable wiki is better than a pile of forgotten chats.

The idea is not powerful because it is complicated. It is powerful because it is simple.

It gives AI a place to remember.

And once an AI system can remember in a structured, inspectable way, it becomes much more useful.


Sources

  • Andrej Karpathy, LLM Wiki gist: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
  • Karpathy's LLM Wiki Explained — The Idea File That's Replacing RAG — AI Simplified
  • How to Build a Personal LLM Knowledge Base (Karpathy’s Method) — Upgraded
  • Karpathy's LLM Wiki: What It Means & How to Build One — Onchain AI Garage
  • Karpathy's LLM Wiki Doesn't Work for Teams. Here's What Does. — SenseLab
  • Andrej Karpathy Just 10x’d Everyone’s Claude Code — Nate Herk | AI Automation
  • Karpathy's LLM Wiki - Full Beginner Setup Guide — Teacher's Tech