AI Systems vs. Context Rot

Building Persistent Memory and Guardrails for AI Coding Assistants

Boyan Balev, Software Engineer

Every AI coding session starts from zero.

You explain your project structure. Again. You describe the coding patterns you’ve established. Again. You remind it about that database migration issue from last week. Again. And when it inevitably makes the same mistake you corrected three sessions ago, you wonder why you’re paying for an “intelligent” assistant.

  • 15 min rebuilding context at every session start
  • 40% of each session lost to overhead
  • 3x repetition of mistakes you already corrected
  • 0 built-in memory between sessions

The pattern is predictable. You start a session, explain everything, get productive work done, end the session. Tomorrow, the slate is wiped clean. Your context is gone. Your lessons evaporated.

This isn’t a limitation of the AI itself. It’s a limitation of how we use it. The AI doesn’t remember because we haven’t given it anything to remember from.


Where This Comes From

Before diving in, here’s the context for these ideas.

This approach draws from two key sources:

Ralph: The AI Agent Whisperer by Geoffrey Huntley explores how to effectively work with AI agents, including the importance of structured context and clear task definitions.

Effective Harnesses for Long-Running Agents from Anthropic’s engineering blog discusses building systems that help AI agents work effectively over extended periods, with insights on context management, verification, and error recovery.

The Agentic Context System is a practical implementation of these ideas, tailored for everyday AI-assisted development.


The Problem: Context Rot

To understand why external files work, you need to understand what happens inside an AI conversation.

Every AI model has a context window: a fixed amount of text it can process at once. For Claude, this is around 200,000 tokens. That sounds like a lot, but in a coding session full of file contents, error messages, and back-and-forth discussion, you burn through it fast.
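
To put a number on "fast", here's a back-of-envelope sketch. It assumes the common heuristic of roughly four characters per token; the figures are illustrative, not measurements:

# Rough token math for a single coding session (illustrative).
# Assumes ~4 characters per token; real tokenizer counts vary.
CHARS_PER_TOKEN = 4

file_chars = 500 * 60                       # a 500-line file at ~60 chars per line
file_tokens = file_chars // CHARS_PER_TOKEN
print(file_tokens)                          # ~7,500 tokens for one file

discussion_tokens = 50_000                  # prompts, diffs, error output, replies
session_tokens = 10 * file_tokens + discussion_tokens
print(session_tokens)                       # ~125,000 of a ~200,000-token window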

Here’s what happens in practice:

Early in a session: You explain the architecture. The AI “knows” that your app uses a specific database pattern, that certain files shouldn’t be touched, that tests need to pass before committing.

Mid-session: The context window fills. Older messages get compressed or dropped to make room for new ones. That architecture explanation from the start? It’s fuzzy now, or gone entirely.

Late in a session: The AI makes a decision that contradicts something you established earlier. It’s not being stubborn. It literally doesn’t have access to that information anymore.

This is context rot: the gradual degradation of shared understanding as conversations grow longer than the AI’s working memory.

External files don't have this problem. The longer your project runs, the more valuable they become. Lessons learned three months ago stay accessible. Architecture decisions from the first week remain clear. Nothing degrades.


The Solution: Externalize Everything

The fix is straightforward: write it down.

Instead of keeping context in your head (or hoping the AI magically retains it), you store it in files. Plain text files that the AI reads at session start and updates at session end. No database. No API. No subscription. Just files.

This is the Agentic Context System: a structured set of files that give any AI coding assistant persistent memory across sessions.

Metric                  | Without Context System           | With Context System
Session start time      | 10-15 min explaining context     | 30 sec reading files
Coding consistency      | Varies wildly session to session | Follows established patterns
Progress tracking       | Mental notes, Slack messages     | Documented in progress.txt
Mistake repetition      | Same errors every week           | Lessons learned persist
Onboarding new AI tools | Start from scratch               | Same context works everywhere

The Seven Files

The entire system consists of seven files organized into two categories: Context Files that the AI reads at session start, and Task Files that track active work.

your-project/
├── CLAUDE.md                 # Technical specifications (project bible)
├── CURRENT_TASK.md           # Current task being worked on
├── feature_list.json         # Master registry of all features
├── progress.txt              # Append-only session log
└── .claude/
    ├── lessons-learned.md    # Accumulated wisdom from sessions
    └── static/
        ├── rules.md          # Critical operating rules
        └── checklists.md     # Verification procedures

Context Files (Read at Session Start)

CLAUDE.md          | Project Bible           | Technical specs, architecture, coding standards
rules.md           | Operating Rules         | Critical rules that must never be violated
lessons-learned.md | Accumulated Wisdom      | Patterns and mistakes from previous sessions
checklists.md      | Verification Procedures | Step-by-step verification for tasks and sessions

Task Files (Updated During Sessions)

feature_list.json | Feature Registry | All features with acceptance criteria and status
CURRENT_TASK.md   | Active Work      | The task being worked on right now
progress.txt      | Session Log      | Append-only audit trail of all sessions

Each file has a specific purpose. None are optional. Together, they give the AI everything it needs to pick up exactly where you left off.
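
A quick sanity check at session start is to confirm all seven files exist. This is an illustrative sketch, not part of the template; the paths match the layout above:

# check_context.py - confirm the seven context files are in place (illustrative)
from pathlib import Path

REQUIRED = [
    "CLAUDE.md",
    "CURRENT_TASK.md",
    "feature_list.json",
    "progress.txt",
    ".claude/lessons-learned.md",
    ".claude/static/rules.md",
    ".claude/static/checklists.md",
]

missing = [path for path in REQUIRED if not Path(path).exists()]
if missing:
    raise SystemExit(f"Missing context files: {', '.join(missing)}")
print("All seven context files present.")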


CLAUDE.md: The Project Bible

The CLAUDE.md file is the most important file in the system. It’s the technical specification for your project, the source of truth that the AI references for every decision.

Here’s how a CLAUDE.md might look for a todo CLI project. The file starts with a project overview:

# CLAUDE.md - Todo CLI Technical Reference

## PROJECT OVERVIEW

A command-line todo list application in Python.

- **Project Name:** todo-cli
- **Tech Stack:** Python 3.11+, Click (CLI framework), SQLite (storage)

### Core Principles

1. **Simple First**: Start with basic features, add complexity later
2. **Test Everything**: Every feature has tests before it's complete
3. **User-Friendly**: Clear error messages, helpful --help text

Next, the architecture section documents the project structure:

## ARCHITECTURE

### Directory Structure

todo-cli/
├── src/
│   └── todo/
│       ├── __init__.py
│       ├── cli.py          # Click commands
│       ├── database.py     # SQLite operations
│       └── models.py       # Data classes
├── tests/
│   ├── test_cli.py
│   └── test_database.py
├── pyproject.toml
└── README.md

Finally, coding standards and common commands:

## CODING STANDARDS

- Use type hints everywhere
- Format with `black`
- Lint with `ruff`
- Test with `pytest`

## COMMANDS

# Development
pip install -e ".[dev]"    # Install in dev mode
pytest                      # Run tests
black .                     # Format code
ruff check .               # Lint code

What to include in CLAUDE.md:

  • Project overview and tech stack
  • Directory structure with explanations
  • Coding standards and conventions
  • Common commands for development

The file should answer one question: “What do I need to know to work on this project?”


feature_list.json: The Master Registry

The feature_list.json file is the authoritative list of everything your project should do. Each feature has a unique ID, acceptance criteria, and a pass/fail status.

{
  "project": {
    "name": "todo-cli",
    "version": "0.1.0",
    "description": "Command-line todo list application"
  },
  "phases": [
    {
      "id": "SETUP",
      "name": "Project Setup",
      "description": "Initialize the Python project",
      "features": [
        {
          "id": "SETUP-001",
          "phase": "SETUP",
          "name": "Initialize Python Project",
          "description": "Create pyproject.toml and basic structure",
          "dependencies": [],
          "acceptanceCriteria": [
            "pyproject.toml exists with project metadata",
            "src/todo/__init__.py exists",
            "pytest runs (even with 0 tests)",
            "black and ruff are configured"
          ],
          "passes": false
        }
      ]
    },
    {
      "id": "CORE",
      "name": "Core Features",
      "description": "Basic todo functionality",
      "features": [
        {
          "id": "CORE-001",
          "phase": "CORE",
          "name": "Database Layer",
          "description": "SQLite database for storing todos",
          "dependencies": ["SETUP-001"],
          "acceptanceCriteria": [
            "Can create a new database file",
            "Can add a todo item",
            "Can list all todo items",
            "Can mark a todo as complete",
            "Can delete a todo",
            "All operations have unit tests"
          ],
          "passes": false
        }
      ]
    }
  ]
}

The workflow is simple:

  1. Define features with acceptance criteria upfront
  2. Work on one feature at a time
  3. When ALL criteria are met, set "passes": true
  4. Move to the next feature

The passes field is binary. Not “mostly done.” Not “80% complete.” Either all criteria pass, or the feature isn’t done.
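
Because the registry is plain JSON, simple tooling around it comes cheap. For example, a short script can report the next incomplete feature; this is a sketch that assumes the schema shown above:

# next_feature.py - print the first feature that doesn't pass yet (illustrative)
import json

with open("feature_list.json") as f:
    data = json.load(f)

for phase in data["phases"]:
    for feature in phase["features"]:
        if not feature["passes"]:
            print(f"Next up: {feature['id']} - {feature['name']}")
            for criterion in feature["acceptanceCriteria"]:
                print(f"  - {criterion}")
            raise SystemExit(0)

print("All features pass.")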


CURRENT_TASK.md: Real-Time Progress

The CURRENT_TASK.md file tracks the active task. It’s updated continuously as work progresses.

┌─────────────────────────────────────────────────────────────────────┐
│                        TASK LIFECYCLE                                │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   NOT_STARTED ────────▶ IN_PROGRESS ────────▶ COMPLETED             │
│        │                     │                    │                  │
│        │                     │                    ▼                  │
│        │                     │              Next Task                │
│        │                     │                                       │
│        │                     ▼                                       │
│        │               BLOCKED ──────▶ Human Input ──────▶ Resume   │
│        │                                                             │
└─────────────────────────────────────────────────────────────────────┘

Here’s a template for CURRENT_TASK.md:

# CORE-001: Database Layer

**Status:** IN_PROGRESS | **Previous:** SETUP-001

## Acceptance Criteria

- [x] Can create a new database file
- [x] Can add a todo item
- [ ] Can list all todo items
- [ ] Can mark a todo as complete
- [ ] Can delete a todo
- [ ] All operations have unit tests

## Progress

- 2024-01-15: Created database.py with init and add functions
- 2024-01-15: Added unit tests for init and add
- Currently working on list functionality

## Blockers

None currently.

## Next

- Implement list_todos() function
- Add mark_complete() and delete() functions
- Complete unit test coverage

The file gets updated as work progresses. Each session picks up where the last one ended.
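
Since the checkboxes are plain Markdown, progress can also be read programmatically. A minimal sketch, assuming the `- [x]` / `- [ ]` syntax used above:

# task_progress.py - count completed acceptance criteria (illustrative)
from pathlib import Path

lines = Path("CURRENT_TASK.md").read_text().splitlines()
done = sum(1 for line in lines if line.strip().startswith("- [x]"))
open_items = sum(1 for line in lines if line.strip().startswith("- [ ]"))
print(f"{done}/{done + open_items} acceptance criteria complete")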


The Supporting Files

Three additional files round out the system:

rules.md

Critical rules that must never be violated: one task at a time, commit frequently, never break tests, leave codebase clean. These are the non-negotiables.

checklists.md

Step-by-step verification procedures. Feature completion checklist. Session end checklist. The AI runs these before marking anything done.

lessons-learned.md

Accumulated wisdom from past sessions. That SQLite connection issue. That import that always breaks. Patterns that work. Mistakes to avoid.

rules.md excerpt:

## CRITICAL RULES - NEVER VIOLATE

1. **ONE TASK AT A TIME**
   - Never work on more than one feature per session
   - Complete the current task before starting another
   - If blocked, mark as BLOCKED rather than switching tasks

2. **COMMIT FREQUENTLY**
   - Commit after every meaningful change
   - Small, focused commits are better than large ones
   - Never end a session with uncommitted changes

3. **LEAVE CODEBASE CLEAN**
   - `git status` must show "nothing to commit" at session end
   - All tests must pass
   - No linting errors

lessons-learned.md example:

## Database

1. ALWAYS USE CONTEXT MANAGERS FOR SQLITE
   - Bad: `conn = sqlite3.connect(db); conn.execute(...)`
   - Good: `with sqlite3.connect(db) as conn: conn.execute(...)`
   - Discovered in CORE-001 when database locked after crash

The lessons-learned file grows over time. Each mistake becomes a permanent memory. Because the AI reads the file at session start, it is far less likely to repeat an error that has already been documented.
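
As a concrete illustration of that SQLite lesson, here's the recommended pattern in context. The function name and table are assumptions for the example, not code from the template:

# Illustrative sketch of the lesson above. The `with` block commits the
# transaction on success and rolls back if an exception is raised.
import sqlite3

def add_todo(db_path: str, title: str) -> None:
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS todos "
            "(id INTEGER PRIMARY KEY, title TEXT NOT NULL, done INTEGER DEFAULT 0)"
        )
        conn.execute("INSERT INTO todos (title) VALUES (?)", (title,))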


The Verification Feedback Loop

Here’s a common failure mode: the AI writes code, says “done,” and moves on. Later you discover it didn’t actually work. Tests weren’t run. Edge cases weren’t handled. The AI claimed completion without verification.

The fix is explicit verification requirements. The checklists.md file contains step-by-step procedures that the AI must run before marking anything complete.

## Feature Completion Checklist

Before marking any feature as `passes: true`:

1. [ ] All acceptance criteria have passing tests
2. [ ] `pytest` passes with no failures
3. [ ] `ruff check .` shows no errors
4. [ ] `mypy .` shows no type errors
5. [ ] Code is committed with meaningful message
6. [ ] CURRENT_TASK.md updated with completion status

This creates a feedback loop: Code → Test → Fail → Fix → Test → Pass. The checklist forces the AI to close the loop before claiming completion.
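
One way to make the loop hard to skip is to wrap the checklist commands in a script that fails loudly. A minimal sketch using this project's commands (it assumes pytest, ruff, and mypy are installed):

# verify.py - run the verification commands, stop at the first failure (sketch)
import subprocess
import sys

CHECKS = [
    ["pytest"],
    ["ruff", "check", "."],
    ["mypy", "."],
]

for cmd in CHECKS:
    print(f"Running: {' '.join(cmd)}")
    if subprocess.run(cmd).returncode != 0:
        sys.exit(f"Verification failed at: {' '.join(cmd)}")

print("All checks passed.")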

Visual Verification for UI Work

For frontend work, test suites alone aren’t enough. A test might pass while the UI looks broken. Modern tools can help: screenshot-based testing with Playwright or Puppeteer, visual regression tools, and even AI agents that can take screenshots and verify layouts.

If you’re doing UI work, consider adding visual verification steps to your checklists:

## UI Feature Checklist (addition to standard checklist)

1. [ ] Component renders without console errors
2. [ ] Responsive at mobile/tablet/desktop breakpoints
3. [ ] Visual comparison against design spec

The tooling here is evolving quickly. Vision-capable AI agents can now take screenshots and verify that UI matches expectations. This is a capability worth keeping an eye on as these tools mature.
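
As an example of what a screenshot step can look like with Playwright's Python API, here's a minimal sketch; the URL, viewport, and output path are placeholders for whatever your checklist specifies:

# screenshot.py - capture a page for visual review (sketch; values are placeholders)
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page(viewport={"width": 1280, "height": 800})
    page.goto("http://localhost:3000")      # placeholder dev-server URL
    page.screenshot(path="home.png", full_page=True)
    browser.close()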


Planning and Acceptance Criteria

The feature_list.json file embodies a specific approach to planning: define success before starting work.

Each feature has acceptance criteria: specific, testable statements about what “done” means. This isn’t just documentation; it shapes how the AI works.

Without upfront criteria:

  • “Implement the database layer”
  • AI decides what to build
  • Scope creep, over-engineering, or missing features
  • Ambiguous “is this done?” debates

With upfront criteria:

  • “Can create a new database file” - testable
  • “Can add a todo item” - testable
  • “All operations have unit tests” - verifiable
  • Binary pass/fail, no ambiguity

The temptation is to skip this planning and just tell the AI what to build. For simple tasks, that works. For anything spanning multiple sessions, you’ll spend more time debugging scope confusion than you saved by skipping planning.

The discipline is simple: before the AI writes any code for a feature, the acceptance criteria must exist. Before the feature is marked complete, every criterion must pass. No exceptions.
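
In practice, each criterion maps almost one-to-one to a test. For instance, "Can add a todo item" might become the following; add_todo and list_todos are hypothetical signatures for this sketch, not template code:

# test_database.py excerpt - one acceptance criterion, one test (illustrative)
from todo.database import add_todo, list_todos

def test_can_add_a_todo_item(tmp_path):
    db = tmp_path / "todos.db"
    add_todo(str(db), "write the report")
    titles = [todo.title for todo in list_todos(str(db))]
    assert "write the report" in titles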


The Workflow

The Agentic Context System follows a simple loop:

┌─────────────────────────────────────────────────────────────────┐
│                     THE AGENTIC LOOP                             │
│                                                                  │
│   ┌─────────┐      ┌─────────┐      ┌─────────┐      ┌───────┐  │
│   │  READ   │ ───▶ │  WORK   │ ───▶ │ VERIFY  │ ───▶ │UPDATE │  │
│   │ Context │      │ On Task │      │ Quality │      │ Files │  │
│   └─────────┘      └─────────┘      └─────────┘      └───────┘  │
│        ▲                                                  │      │
│        └──────────────────────────────────────────────────┘      │
│                          (repeat)                                │
└─────────────────────────────────────────────────────────────────┘

Session Start:

  1. AI reads CLAUDE.md, CURRENT_TASK.md, rules.md, lessons-learned.md
  2. AI understands where the project is and what needs to be done
  3. AI begins working on the current task

During Session:

  1. AI makes small, incremental changes
  2. AI commits after each meaningful change
  3. AI updates CURRENT_TASK.md with progress

Session End:

  1. AI runs verification checklist (tests, linting, type checking)
  2. AI commits all changes
  3. AI updates CURRENT_TASK.md status
  4. AI updates feature_list.json if task complete
  5. AI appends session summary to progress.txt
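
The progress.txt update at step 5 can be as simple as appending one timestamped block per session. The exact format is up to you; this sketch is one possible shape, not a prescribed one:

# log_session.py - append a session summary to progress.txt (format is illustrative)
from datetime import date

summary = (
    f"## Session {date.today().isoformat()}\n"
    "Task: CORE-001 Database Layer\n"
    "Done: implemented list_todos(), mark_complete(), delete(); all tests passing\n"
    "Next: start CORE-002\n"
)

with open("progress.txt", "a") as f:
    f.write(summary + "\n")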

A Real Session Walkthrough

Here’s how a session typically flows. We’re building a todo CLI and working on the database layer.

Starting state: CURRENT_TASK.md shows 2 of 6 acceptance criteria complete for CORE-001.

You say: “Read my project files and continue the current task.”

What the AI does:

  1. Reads context: CLAUDE.md (Python/Click/SQLite stack), CURRENT_TASK.md (2 done, 4 remaining), rules.md (commit frequently, test everything), lessons-learned.md (use context managers for SQLite)

  2. Implements remaining criteria: list_todos(), mark_complete(), delete(), with tests for each

  3. Verifies: runs pytest (all pass), runs ruff (no errors)

  4. Updates files: marks all checkboxes in CURRENT_TASK.md, sets "passes": true in feature_list.json, appends summary to progress.txt

Ending state: Task complete. All tests passing. Next session can start CORE-002.

The key observation: the AI used the context manager pattern from lessons-learned.md without being told. It read the file, saw the note from a previous session, and applied the learning. That’s the point of the system: accumulated knowledge that persists across sessions.


Getting Started

Setting up the Agentic Context System takes about 5 minutes:

# Clone the template
git clone https://github.com/balevdev/agentic-context-system.git my-project
cd my-project

# Initialize git (if starting fresh)
git init
git add .
git commit -m "chore: initialize agentic context system"

# Customize for your project
# 1. Edit CLAUDE.md with your project specs
# 2. Edit feature_list.json with your features
# 3. Edit .claude/static/checklists.md with your test commands
# 4. Start your first session

Customization Guide

The template includes placeholder commands. Replace them with your stack-specific tools:

Stack              | Test Command  | Type Check / Lint
Python             | pytest        | mypy .
Node.js/TypeScript | npm test      | tsc --noEmit
Go                 | go test ./... | go vet ./...
Rust               | cargo test    | cargo check

Update .claude/static/checklists.md with your specific commands. The AI will use these during verification.

Project Type Customization

Web App: Add API endpoints to CLAUDE.md, component patterns, state management approach.

CLI Tool: Add command structure, argument parsing patterns, help text conventions.

Library: Add public API documentation, versioning strategy, backward compatibility rules.

Microservice: Add service boundaries, communication patterns, deployment configuration.


Why Files?

Why not a database, or an API, or a purpose-built tool?

Files are the lowest common denominator in software development. Any tool can read them. Any human can understand them. They diff cleanly, merge reasonably, and survive tool changes. When you switch from Claude to GPT or from Cursor to VS Code, the files remain exactly the same.


Conclusion

AI coding assistants start every session from zero because there's nothing for them to remember from. The Agentic Context System provides that missing memory: seven files that externalize your project's architecture, coding standards, progress, and accumulated lessons.

The AI reads these files at session start and updates them at session end. Context persists. Progress accumulates. Mistakes become documented lessons that prevent future repetition.

Next Steps

If you work with AI coding assistants on projects spanning multiple sessions, set up the Agentic Context System. Clone the template, customize the files for your project, and start your next session with context instead of explanations.

github.com/balevdev/agentic-context-system