Auto-Memory: Teaching Claude to Remember You So It Can Disappoint You Consistently

Claude remembers things now. Across conversations. Across days. Across projects. The memory system has a name and a folder and a file format and a small set of rules about what kinds of things go where. There is an index file Claude loads at the start of every session, and then specific deeper files Claude can fetch on demand when the conversation touches a relevant topic. It is, by the standards of AI systems three years ago, a substantial piece of engineering.

What it remembers, in practice, is a list of things that are also written on a sticky note next to my keyboard. What it forgets, in practice, is the actual shape of the code I am asking it to work on. Welcome to the world of selective machine memory, which turns out to look an awful lot like the world of selective human memory, except more predictable about which boring stuff it retains.

Let me describe the mechanic, lightly, because the mechanic is fine and I have no complaint about the mechanic. There is a top-level CLAUDE.md file Claude reads automatically every session. There is a MEMORY.md file that indexes other memory files I’ve accumulated. The memory files come in categories: user (my preferences, work style, naming conventions), project (context specific to TMN consulting or the FC blog or whatever else I’m working on), reference (schemas and configs and infrastructure scope — the kind of thing where the answer doesn’t change between Tuesday and Wednesday), and feedback (corrections I’ve made before that should not need re-making). When the conversation touches something one of these files might inform, Claude reads it. The system works. The system works well, in the specific narrow sense that I can ship a scheduled task and trust it to pull the right scope from the memory folder without me having to repeat myself.
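As a rough picture of that layout, here is a minimal sketch of an index over categorized memory files. Everything in it is an assumption made for illustration: the `MemoryFile` type, the topic keywords, the word-overlap lookup, and every file name except reference_sites_and_infra.md. It is not how Claude actually decides which file to read.

```python
from dataclasses import dataclass

# The four categories described above.
CATEGORIES = ("user", "project", "reference", "feedback")


@dataclass(frozen=True)
class MemoryFile:
    path: str
    category: str
    topics: frozenset  # hypothetical keywords that make the file relevant

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


# Hypothetical index standing in for MEMORY.md.
INDEX = [
    MemoryFile("user_preferences.md", "user", frozenset({"python3", "naming"})),
    MemoryFile("reference_sites_and_infra.md", "reference",
               frozenset({"account", "region", "bucket"})),
    MemoryFile("feedback_nav_markup.md", "feedback", frozenset({"nav", "html"})),
]


def files_to_fetch(message: str) -> list[str]:
    """Return the indexed files whose topics overlap the words in the message."""
    words = set(message.lower().split())
    return [f.path for f in INDEX if f.topics & words]


print(files_to_fetch("which region does the cost task use"))
```

The point of the shape is that the index stays small enough to load every session, while the deeper files are only opened when a topic calls for them.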

I built one of these earlier today. A reference_sites_and_infra.md that says, in well-formatted markdown, “the account is X, the profile is three-moons, the primary region is us-east-2, here are the two sites and their buckets and distributions.” The new cost-delta scheduled task reads this before doing anything. It is exactly how auto-memory is supposed to work. The task does not have to be told the same things every time. The memory carries the context. The session loads light. The work happens. This is good.
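For a sense of what "reads this before doing anything" could look like, here is a minimal sketch of a task pulling its scope out of a reference file. The bullet layout, the key names, and the `load_scope` helper are assumptions for this example, and the account value stays the placeholder X.

```python
import re

# Hypothetical contents of reference_sites_and_infra.md; the real layout may differ.
SAMPLE_REFERENCE = """\
# Sites and infrastructure

- account: X
- profile: three-moons
- primary_region: us-east-2
"""


def load_scope(markdown_text: str) -> dict[str, str]:
    """Collect '- key: value' bullets into a dict, ignoring every other line."""
    scope = {}
    for line in markdown_text.splitlines():
        match = re.match(r"^\s*-\s*([\w-]+)\s*:\s*(.+?)\s*$", line)
        if match:
            scope[match.group(1)] = match.group(2)
    return scope


scope = load_scope(SAMPLE_REFERENCE)
print(scope["profile"], scope["primary_region"])  # three-moons us-east-2
```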

And yet.

Claude remembers I prefer python3 over bare python. Has remembered for months. Will remind me, sometimes, that “you mentioned you prefer python3 — using that here.” Good. Useful. Thank you.

Claude does not remember the shape of the nav HTML on my consulting site. This is despite my having corrected the nav HTML — the same way, with the same error and the same fix — across more than a dozen sessions over the last six weeks. Every time, Claude generates a <ul class="nav-links"> block. Every time, I say “no, the canonical nav uses <div class="nav-inner">, please match.” Every time, Claude fixes it. The fix never makes it back to memory. Or it makes it back to memory in the form of “the user has corrected this once, in this session, here is the correction.” It does not generalize to “this is the shape of the user’s nav, and you should default to this shape going forward.”
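The correction itself is mechanical enough to check in code. Here is a minimal sketch, using only Python's standard `html.parser`, that flags the wrong wrapper and accepts the canonical one. The `NavShapeChecker` class and `nav_matches_canon` function are names invented for this example, and the sample markup is a stand-in rather than the real site's nav.

```python
from html.parser import HTMLParser


class NavShapeChecker(HTMLParser):
    """Record which nav wrapper appears in a chunk of HTML."""

    def __init__(self):
        super().__init__()
        self.has_canonical = False    # saw <div class="nav-inner">
        self.has_wrong_shape = False  # saw <ul class="nav-links">

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "nav-inner" in classes:
            self.has_canonical = True
        if tag == "ul" and "nav-links" in classes:
            self.has_wrong_shape = True


def nav_matches_canon(html: str) -> bool:
    """True only if the canonical wrapper is present and the wrong one is absent."""
    checker = NavShapeChecker()
    checker.feed(html)
    return checker.has_canonical and not checker.has_wrong_shape


print(nav_matches_canon('<nav><div class="nav-inner"><a href="/">Home</a></div></nav>'))  # True
print(nav_matches_canon('<nav><ul class="nav-links"><li>Home</li></ul></nav>'))  # False
```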

The asymmetry is real and it is consistent. Memory is good at the things you’d write on a sticky note — stable preferences, naming conventions, hard rules about what tools to use. Memory is bad at the things you’d want a wiki for — the current shape of your code, the architectural decisions you’ve made and the reasons behind them, the specific class names you’ve settled on this month after last month’s class names didn’t work out.

This is, I think, reasonable engineering. The things memory is good at are small, durable, and hard to be wrong about. The things memory is bad at are long, mutable, and easy to get out of date in a way that would make memory worse than useless — actively misleading. Storing “the current shape of the nav” in memory means re-reading and re-validating it constantly. Storing “user prefers python3” means writing it once and never touching it again. The engineering trade-off makes sense. The behavioral comedy of it does not stop being funny just because the trade-off makes sense.
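The difference in upkeep can be sketched in a few lines. This is an illustration of the trade-off, not a description of how auto-memory is implemented: the `MemoryEntry` type and the fingerprint check are assumptions made for the example. A durable note carries nothing to re-check, while a note about mutable code has to be compared against the current source before it can be trusted.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def fingerprint(text: str) -> str:
    """Stable hash of the source text a note was written against."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class MemoryEntry:
    note: str
    # None marks a durable note with nothing to re-validate.
    source_fingerprint: Optional[str] = None

    def is_stale(self, current_source: Optional[str] = None) -> bool:
        if self.source_fingerprint is None:
            return False  # written once, never touched again
        if current_source is None:
            return True   # cannot validate, so do not trust it
        return fingerprint(current_source) != self.source_fingerprint


preference = MemoryEntry("user prefers python3")
nav_source = '<div class="nav-inner">...</div>'
nav_note = MemoryEntry("nav wrapper is div.nav-inner", fingerprint(nav_source))

print(preference.is_stale())                             # False
print(nav_note.is_stale(nav_source))                     # False
print(nav_note.is_stale(nav_source + "<!-- edited -->"))  # True
```

The second kind of entry costs a read and a comparison every time it is used, which is the constant re-validation described above.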

Selective memory, just like a real coworker. The one who remembers your coffee order but not the architecture diagram you walked them through last Tuesday. The one who can recite your preferred build commands by heart but always forgets which environment variable controls the staging database. The one whose memory is, frankly, biased toward the things that don’t matter and away from the things that do. Except “biased” isn’t quite the right framing, because the things memory is good at do matter, just not in the way you wish they did when you’re tired and you have a deploy in twenty minutes.

I want to give the system its due here, because I am not interested in writing the easy dunk. Auto-memory is making my work better. It is saving me time across the scheduled tasks. It is giving the agents I configure a stable place to look for scope and identifiers. The reference_sites_and_infra.md I created this morning is going to keep saving me re-explanation effort every time a task or agent needs to know which account it’s talking to. That is a real win, and the win compounds: every memory file I write is a thing I won’t have to re-explain in some future session.

The deeper observation, and the one I keep coming back to, is that good memory is a curation problem, not a storage problem. The hard part of memory isn’t “write more memory files.” The hard part is deciding what should be in memory. What’s a stable preference worth storing forever? What’s a project-specific reality that’ll be wrong in three weeks and shouldn’t be in memory at all? What’s a one-off observation that should fade gracefully? What’s a hard rule that should be in CLAUDE.md and what’s a soft preference that should be in a memory file? This is a product question, and it’s one humans have been bad at for centuries. Human memory is also lossy, also biased, also weird about what it retains. Auto-memory inherits the problem. It also, to its credit, makes the problem visible. You can look at your memory folder. You can see what Claude thinks you care about. You can edit it. You can prune it. You can decide, explicitly, which Claude-version-of-you you actually want to be remembered as.

After a year of this, the partnership looks roughly as follows. Claude remembers the boring stable stuff so I don’t have to repeat myself across sessions. I remember the active shape of the code, because the active shape of the code is mine, and it would be wrong for memory to claim authority over a thing that mutates every commit. We meet in the memory folder, with a curated index of what’s worth saving and a willingness to overwrite the entries that turn out to be wrong. Sometimes Claude remembers something I wish it hadn’t. Sometimes Claude forgets something I wish it had. The ratio is, on net, improving. The disappointments are at least consistent.

Selective memory. Just like a real coworker. Just slightly more honest about which parts of you it kept.