My AI Agent Keeps Forgetting Everything — DeepSeek Blog | Neura Market
    My AI Agent Keeps Forgetting Everything

    Stephen J Newhouse April 7, 2026

> My AI Agent Keeps Forgetting Everything; So do I...

I have multiple sclerosis. Some days are better than others, but one thing is constant: repeating myself is expensive. Cognitive fatigue means every wasted explanation costs me something I can't get back.

So when the AI coding agent started each session from scratch, forgetting every architecture decision, every constraint, every piece of context I'd painstakingly built up, it wasn't just annoying. It was a genuine problem.

## AA-MA Forge

> AA-MA: Advanced Agentic Memory Architecture

...

![AA_MA_Forge](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x46l06lder6xy49f0dug.png)

## The context wall

If you've used Claude Code (or Cursor, or Copilot) for anything longer than a single session, you know the feeling.

Monday morning, you open a new conversation. The agent has no memory of Friday's work. You re-explain the architecture. You re-state the constraints. You watch it drift from the plan you agreed on two days ago. Three sessions in, you've spent more time re-establishing context than writing code.

For small tasks, this is tolerable. For multi-week projects with dependencies, milestones, and real stakes, it's a dealbreaker.

## What I tried first

Big instruction files. Massive CLAUDE.md documents stuffed with architecture summaries, coding standards, and project history. They helped, but they mixed things that change (execution state, what's done, what's next) with things that don't (API endpoints, file paths, schema definitions). The agent couldn't tell the difference. It would hallucinate facts that were sitting right there in the doc, or re-litigate decisions I'd already made.

Conversation summaries were worse. Lossy compression of context meant the important details evaporated first.

## The spark

At 3am one night, scrolling Reddit because my brain wouldn't shut up and the MS "tingled" me awake, I found [Diet-Coder's post](https://dev.to/diet-code103/claude-code-is-a-beast-tips-from-6-months-of-hardcore-use-572n), and something about a "Dev Docs System": three files per task that give the agent structured memory. Plan, context, tasks.

That was the seed. I took those three files and turned them into five.

## Why five, not three

Three files tangle different kinds of knowledge together. Strategy sits next to execution state. Facts mix with decisions. When the agent loads context, it can't prioritise. It reads everything, weighs nothing.

Five files separate knowledge by how it behaves:

- Things that **don't change** (API endpoints, file paths, constants) go in one place.
- Things that **explain why** (decisions, trade-offs, gate approvals) go in another.
- **Where you are right now** (task status, what's done, what's next) gets its own file.
- **Strategy** (the plan, milestones, acceptance criteria) stays separate from execution.
- **What happened** (commits, session checkpoints, audit trail) goes in an append-only log.

When the agent picks up a new session, it loads the facts and the task state first. It only pulls in the decision history when it needs to make a choice. The plan stays available but doesn't clutter working memory.

The separation sounds obvious in hindsight. It took months of trial and error, battle-tested against real projects and deliverables, to get right, or at least working well enough to stop me screaming at the machine and freaking out my kid and the neighbours.

## What it looks like

I built this into a set of Claude Code commands.
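
As a rough illustration of the separation described under "Why five, not three", here is a minimal Python sketch of a session start: load the facts and the task state first, and pull in the decision history only when a choice has to be made. The file names, the `session_context` and `append_provenance` helpers, and the `needs_decision` flag are all hypothetical. The article does not name the five files, and this is not the AA-MA implementation.

```python
from pathlib import Path

# Hypothetical file names: the article describes five kinds of knowledge
# but does not name the files, so these are placeholders.
FILES = {
    "reference": "reference.md",      # things that don't change
    "context_log": "context-log.md",  # decisions and gate approvals (the "why")
    "tasks": "tasks.md",              # where you are right now
    "plan": "plan.md",                # strategy, milestones, acceptance criteria
    "provenance": "provenance.log",   # append-only record of what happened
}

# Loaded at the start of every session, per the article: facts and task state.
ALWAYS_LOAD = ("reference", "tasks")


def session_context(task_dir: Path, needs_decision: bool = False) -> dict[str, str]:
    """Return the file contents an agent would read when a session starts."""
    wanted = list(ALWAYS_LOAD)
    if needs_decision:
        # Decision history is only pulled in when a choice has to be made.
        wanted.append("context_log")
    loaded = {}
    for key in wanted:
        path = task_dir / FILES[key]
        if path.exists():
            loaded[key] = path.read_text(encoding="utf-8")
    return loaded


def append_provenance(task_dir: Path, entry: str) -> None:
    """Append one line to the log; existing lines are never rewritten."""
    with open(task_dir / FILES["provenance"], "a", encoding="utf-8") as log:
        log.write(entry.rstrip("\n") + "\n")
```

The plan file is deliberately left out of the default load in this sketch, which mirrors the point that the plan stays available without cluttering working memory.
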
The workflow is three steps:

```bash
# Plan: brainstorm with the agent, then generate structured artifacts
/aa-ma-plan "build a REST API for user authentication"

# Execute: work through each milestone, sync the files, commit
/execute-aa-ma-milestone

# Archive: move completed work to the done pile
/archive-aa-ma auth-api
```

Between planning and archiving, the agent reads the five files at the start of every session, updates them as it works, and commits after every task. Context survives across sessions. Decisions don't get re-litigated. The audit trail is there if you need it.

## It goes deeper than three commands

I didn't plan to build all of this. Each feature exists because something went wrong without it.

* **11 mandatory planning outputs.** Every plan includes an executive summary, milestones, acceptance criteria, rollback strategy, risk register, effort estimates, and six more. If you can't write a pytest assertion from the acceptance criteria, they're not specific enough.
* **6-angle adversarial verification.** Before execution begins, parallel agents attack the plan from six independent angles: do the files actually exist? What assumptions are we making? What breaks if we change these files? Can a fresh agent with no context execute this plan? Are there domain-specific risks the generalist missed? CRITICALs block execution.
* **HITL/AFK task dispatch.** Each task is marked as needing human input (HITL) or fully autonomous (AFK). Architectural decisions pause for you. Test writing runs on its own. The agent knows the difference.
* **HARD/SOFT milestone gates.** Some checkpoints are advisory: the agent seeks approval but continues if you're away. Others are hard stops: the execution command refuses to advance without a signed approval entry in the context log.
* **Compaction hook.** Claude Code compacts its context window when it fills up. Without intervention, your agent's working memory vanishes mid-task. The hook intercepts that moment, writes checkpoint entries to the task's provenance log and context log, and preserves state for the next session.
* **Complexity routing.** Tasks scoring 80% or above on a weighted algorithm (scope, architectural impact, technical risk, dependencies, requirements ambiguity) automatically route to deeper review. Human sign-off, chain-of-thought reasoning, or both.

None of this was designed upfront. Each piece was bolted on after a failure made it obvious. The verification system exists because I shipped a plan with API endpoints that didn't exist. The gate system exists because the agent once completed a production deployment while I was making coffee.

## How this compares

I looked hard at what else is out there before publishing.

[claude-mem](https://github.com/thedotmack/claude-mem) is excellent. Over 44,000 stars, and for good reason. It captures observations automatically and builds a searchable memory across sessions. I use it alongside AA-MA. But it has no concept of planning, milestones, or execution tracking. It remembers what happened. AA-MA remembers what should happen next.

[Cursor Memory Bank](https://github.com/vanzan01/cursor-memory-bank) and [Cline Memory Bank](https://docs.cline.bot/features/memory-bank) use six markdown files per project. Similar philosophy, and they've earned wide adoption. The difference: they're project-scoped (one memory bank per repo), not task-scoped (one set per active task). No immutable reference file, no gates, no provenance logging.

[Simone](https://github.com/Helmi/claude-simone) is the closest competitor in spirit. A full project management framework for Claude Code. Less formalised than AA-MA: no versioned specification, no gate approvals, no commit signatures linking git history to active plans.

[Compound Engineering](https://github.com/EveryInc/compound-engineering-plugin) focuses on compounding knowledge across sessions. 26 specialised agents.
More about the learning loop than structured execution tracking.

These are good tools. They solve real problems. The gap I couldn't fill with any of them: no single system combines execution tracking, adversarial plan verification, gate classification, commit signatures, and compaction hooks into one coordinated framework. That's what AA-MA is.

## What this is

It's opinionated. Built around how I work: regulated industries, multi-week timelines, zero tolerance for context drift. The overhead of five files per task isn't for everyone. But if you've ever lost a week of context to a Monday morning, or watched an agent confidently re-implement something you'd already rejected, it pays for itself.

The specification is [versioned](https://github.com/snewhouse/aa-ma-forge/blob/main/docs/spec/aa-ma-specification.md) (v2.1). The file formats are defined. There are standalone templates for every file type. It's the kind of rigour you'd expect from a system built by someone who works in regulatory environments, because that's exactly what it is.

## Credits

Diet-Coder planted the seed with those three files. [Matt Pocock's skills repo](https://github.com/mattpocock/skills) helped shape how I organised the commands. [Helix.ml](https://helix.ml) informed the gate classification system. Full provenance is in the repo.

## Take what's useful

**The whole thing is on GitHub: [aa-ma-forge](https://github.com/snewhouse/aa-ma-forge)**. Clone it, try it, fork it, make it your own. There's an installer that deploys everything into your Claude Code setup with one command, and an uninstaller that reverses it cleanly.

Fair warning: maintenance will be sporadic. If I've gone quiet, I'm either deep in client work, arguing with an API, or the MS is having a louder day than usual. Pull requests welcome, but don't hold your breath on response times.

If it saves you time or sanity, consider donating to an [MS charity](https://www.mssociety.org.uk/). Small acts, big ripples.

> PS. If you want cross-session memory retrieval rather than task execution structure, The 5th Element has a git repo: https://github.com/milla-jovovich/mempalace
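
The complexity routing described under "It goes deeper than three commands" comes down to a weighted score checked against a threshold. Here is a minimal Python sketch, assuming made-up weights and a 0 to 100 rating per factor. The article names the five factors and the 80% cut-off, but not the weights or the rating scale, so `WEIGHTS`, `complexity_score`, and `needs_deeper_review` are hypothetical and not taken from the AA-MA specification.

```python
# Hypothetical weights in percent; they sum to 100. The article names the
# five factors but does not say how they are weighted.
WEIGHTS = {
    "scope": 25,
    "architectural_impact": 25,
    "technical_risk": 20,
    "dependencies": 15,
    "requirements_ambiguity": 15,
}
THRESHOLD = 80  # "80% or above" routes to deeper review, per the article


def complexity_score(ratings: dict[str, int]) -> float:
    """Weighted average of per-factor ratings, each an integer from 0 to 100."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= ratings[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS) / 100


def needs_deeper_review(ratings: dict[str, int]) -> bool:
    """True when the task should go to human sign-off or deeper reasoning."""
    return complexity_score(ratings) >= THRESHOLD


# Illustrative ratings for one task (invented for this example).
example = {
    "scope": 90,
    "architectural_impact": 90,
    "technical_risk": 80,
    "dependencies": 60,
    "requirements_ambiguity": 70,
}
```

With these invented ratings, `complexity_score(example)` is 80.5, so `needs_deeper_review(example)` returns `True`. Integer weights and ratings keep the comparison at exactly 80 free of floating-point rounding surprises.
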
