AI Systems vs. Context Rot

Building Persistent Memory and Guardrails for AI Coding Assistants

Boyan Balev, Software Engineer

Every AI coding session starts from zero.

You explain your project structure. Again. You describe the coding patterns you’ve established. Again. You remind it about that database migration issue from last week. Again. And when it inevitably makes the same mistake you corrected three sessions ago, you wonder why you’re paying for an “intelligent” assistant.

  • 15 min rebuilding context at every session start
  • 40% of each session lost to overhead
  • 3x repetition of mistakes you already corrected
  • 0 built-in memory between sessions

The pattern is predictable. You start a session, explain everything, get productive work done, end the session. Tomorrow, the slate is wiped clean. Your context is gone. Your lessons evaporated.

This isn’t a limitation of the AI itself. It’s a limitation of how we use it. The AI doesn’t remember because we haven’t given it anything to remember from.


Where This Comes From

Before diving in, here’s the context for these ideas.

This approach draws from two key sources:

Ralph: The AI Agent Whisperer by Geoffrey Huntley explores how to effectively work with AI agents, including the importance of structured context and clear task definitions.

Effective Harnesses for Long-Running Agents from Anthropic’s engineering blog discusses building systems that help AI agents work effectively over extended periods, with insights on context management, verification, and error recovery.

The Agentic Context System is a practical implementation of these ideas, tailored for everyday AI-assisted development.


The Problem: Context Rot

To understand why external files work, you need to understand what happens inside an AI conversation.

Every AI model has a context window: a fixed amount of text it can process at once. For Claude, this is around 200,000 tokens. That sounds like a lot, but in a coding session full of file contents, error messages, and back-and-forth discussion, you burn through it fast.
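
To put a number on "fast", here's a back-of-envelope sketch. It assumes the common heuristic of roughly four characters per token; the figures are illustrative, not measurements:

# Rough token math for a single coding session (illustrative).
# Assumes ~4 characters per token; real tokenizer counts vary.
CHARS_PER_TOKEN = 4

file_chars = 500 * 60                       # a 500-line file at ~60 chars per line
file_tokens = file_chars // CHARS_PER_TOKEN
print(file_tokens)                          # ~7,500 tokens for one file

discussion_tokens = 50_000                  # prompts, diffs, error output, replies
session_tokens = 10 * file_tokens + discussion_tokens
print(session_tokens)                       # ~125,000 of a ~200,000-token window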

Here’s what happens in practice:

Early in a session: You explain the architecture. The AI “knows” that your app uses a specific database pattern, that certain files shouldn’t be touched, that tests need to pass before committing.

Mid-session: The context window fills. Older messages get compressed or dropped to make room for new ones. That architecture explanation from the start? It’s fuzzy now, or gone entirely.

Late in a session: The AI makes a decision that contradicts something you established earlier. It’s not being stubborn. It literally doesn’t have access to that information anymore.

This is context rot: the gradual degradation of shared understanding as conversations grow longer than the AI’s working memory.

External files don't have this problem. The longer your project runs, the more valuable they become. Lessons learned three months ago stay accessible. Architecture decisions from the first week remain clear. Nothing degrades.


The Solution: Externalize Everything

The fix is straightforward: write it down.

Instead of keeping context in your head (or hoping the AI magically retains it), you store it in files. Plain text files that the AI reads at session start and updates at session end. No database. No API. No subscription. Just files.

This is the Agentic Context System: a structured set of files that give any AI coding assistant persistent memory across sessions.

Metric                  | Without Context System           | With Context System
Session start time      | 10-15 min explaining context     | 30 sec reading files
Coding consistency      | Varies wildly session to session | Follows established patterns
Progress tracking       | Mental notes, Slack messages     | Documented in progress.txt
Mistake repetition      | Same errors every week           | Lessons learned persist
Onboarding new AI tools | Start from scratch               | Same context works everywhere

The Seven Files

The entire system consists of seven files organized into two categories: Context Files that the AI reads at session start, and Task Files that track active work.

your-project/
├── CLAUDE.md                 # Technical specifications (project bible)
├── CURRENT_TASK.md           # Current task being worked on
├── feature_list.json         # Master registry of all features
├── progress.txt              # Append-only session log
└── .claude/
    ├── lessons-learned.md    # Accumulated wisdom from sessions
    └── static/
        ├── rules.md          # Critical operating rules
        └── checklists.md     # Verification procedures

Context Files (Read at Session Start)

CLAUDE.md          | Project Bible           | Technical specs, architecture, coding standards
rules.md           | Operating Rules         | Critical rules that must never be violated
lessons-learned.md | Accumulated Wisdom      | Patterns and mistakes from previous sessions
checklists.md      | Verification Procedures | Step-by-step verification for tasks and sessions

Task Files (Updated During Sessions)

feature_list.json | Feature Registry | All features with acceptance criteria and status
CURRENT_TASK.md   | Active Work      | The task being worked on right now
progress.txt      | Session Log      | Append-only audit trail of all sessions

Each file has a specific purpose. None are optional. Together, they give the AI everything it needs to pick up exactly where you left off.
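
A quick sanity check at session start is to confirm all seven files exist. This is an illustrative sketch, not part of the template; the paths match the layout above:

# check_context.py - confirm the seven context files are in place (illustrative)
from pathlib import Path

REQUIRED = [
    "CLAUDE.md",
    "CURRENT_TASK.md",
    "feature_list.json",
    "progress.txt",
    ".claude/lessons-learned.md",
    ".claude/static/rules.md",
    ".claude/static/checklists.md",
]

missing = [path for path in REQUIRED if not Path(path).exists()]
if missing:
    raise SystemExit(f"Missing context files: {', '.join(missing)}")
print("All seven context files present.")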


CLAUDE.md: The Project Bible

The CLAUDE.md file is the most important file in the system. It’s the technical specification for your project, the source of truth that the AI references for every decision.

Here’s how a CLAUDE.md might look for a todo CLI project. The file starts with a project overview:

# CLAUDE.md - Todo CLI Technical Reference

## PROJECT OVERVIEW

A command-line todo list application in Python.

- **Project Name:** todo-cli
- **Tech Stack:** Python 3.11+, Click (CLI framework), SQLite (storage)

### Core Principles

1. **Simple First**: Start with basic features, add complexity later
2. **Test Everything**: Every feature has tests before it's complete
3. **User-Friendly**: Clear error messages, helpful --help text

Next, the architecture section documents the project structure:

## ARCHITECTURE

### Directory Structure

todo-cli/
├── src/
│   └── todo/
│       ├── __init__.py
│       ├── cli.py          # Click commands
│       ├── database.py     # SQLite operations
│       └── models.py       # Data classes
├── tests/
│   ├── test_cli.py
│   └── test_database.py
├── pyproject.toml
└── README.md

Finally, coding standards and common commands:

## CODING STANDARDS

- Use type hints everywhere
- Format with `black`
- Lint with `ruff`
- Test with `pytest`

## COMMANDS

# Development
pip install -e ".[dev]"    # Install in dev mode
pytest                      # Run tests
black .                     # Format code
ruff check .               # Lint code

What to include in CLAUDE.md:

  • Project overview and tech stack
  • Directory structure with explanations
  • Coding standards and conventions
  • Common commands for development

The file should answer one question: “What do I need to know to work on this project?”


feature_list.json: The Master Registry

The feature_list.json file is the authoritative list of everything your project should do. Each feature has a unique ID, acceptance criteria, and a pass/fail status.

{
  "project": {
    "name": "todo-cli",
    "version": "0.1.0",
    "description": "Command-line todo list application"
  },
  "phases": [
    {
      "id": "SETUP",
      "name": "Project Setup",
      "description": "Initialize the Python project",
      "features": [
        {
          "id": "SETUP-001",
          "phase": "SETUP",
          "name": "Initialize Python Project",
          "description": "Create pyproject.toml and basic structure",
          "dependencies": [],
          "acceptanceCriteria": [
            "pyproject.toml exists with project metadata",
            "src/todo/__init__.py exists",
            "pytest runs (even with 0 tests)",
            "black and ruff are configured"
          ],
          "passes": false
        }
      ]
    },
    {
      "id": "CORE",
      "name": "Core Features",
      "description": "Basic todo functionality",
      "features": [
        {
          "id": "CORE-001",
          "phase": "CORE",
          "name": "Database Layer",
          "description": "SQLite database for storing todos",
          "dependencies": ["SETUP-001"],
          "acceptanceCriteria": [
            "Can create a new database file",
            "Can add a todo item",
            "Can list all todo items",
            "Can mark a todo as complete",
            "Can delete a todo",
            "All operations have unit tests"
          ],
          "passes": false
        }
      ]
    }
  ]
}

The workflow is simple:

  1. Define features with acceptance criteria upfront
  2. Work on one feature at a time
  3. When ALL criteria are met, set "passes": true
  4. Move to the next feature

The passes field is binary. Not “mostly done.” Not “80% complete.” Either all criteria pass, or the feature isn’t done.
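
Because the registry is plain JSON, simple tooling around it comes cheap. For example, a short script can report the next incomplete feature; this is a sketch that assumes the schema shown above:

# next_feature.py - print the first feature that doesn't pass yet (illustrative)
import json

with open("feature_list.json") as f:
    data = json.load(f)

for phase in data["phases"]:
    for feature in phase["features"]:
        if not feature["passes"]:
            print(f"Next up: {feature['id']} - {feature['name']}")
            for criterion in feature["acceptanceCriteria"]:
                print(f"  - {criterion}")
            raise SystemExit(0)

print("All features pass.")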


CURRENT_TASK.md: Real-Time Progress

The CURRENT_TASK.md file tracks the active task. It’s updated continuously as work progresses.

┌─────────────────────────────────────────────────────────────────────┐
│                        TASK LIFECYCLE                                │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   NOT_STARTED ────────▶ IN_PROGRESS ────────▶ COMPLETED             │
│        │                     │                    │                  │
│        │                     │                    ▼                  │
│        │                     │              Next Task                │
│        │                     │                                       │
│        │                     ▼                                       │
│        │               BLOCKED ──────▶ Human Input ──────▶ Resume   │
│        │                                                             │
└─────────────────────────────────────────────────────────────────────┘

Here’s a template for CURRENT_TASK.md:

# CORE-001: Database Layer

**Status:** IN_PROGRESS | **Previous:** SETUP-001

## Acceptance Criteria

- [x] Can create a new database file
- [x] Can add a todo item
- [ ] Can list all todo items
- [ ] Can mark a todo as complete
- [ ] Can delete a todo
- [ ] All operations have unit tests

## Progress

- 2024-01-15: Created database.py with init and add functions
- 2024-01-15: Added unit tests for init and add
- Currently working on list functionality

## Blockers

None currently.

## Next

- Implement list_todos() function
- Add mark_complete() and delete() functions
- Complete unit test coverage

The file gets updated as work progresses. Each session picks up where the last one ended.
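
Since the checkboxes are plain Markdown, progress can also be read programmatically. A minimal sketch, assuming the `- [x]` / `- [ ]` syntax used above:

# task_progress.py - count completed acceptance criteria (illustrative)
from pathlib import Path

lines = Path("CURRENT_TASK.md").read_text().splitlines()
done = sum(1 for line in lines if line.strip().startswith("- [x]"))
open_items = sum(1 for line in lines if line.strip().startswith("- [ ]"))
print(f"{done}/{done + open_items} acceptance criteria complete")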


The Supporting Files

Three additional files round out the system:

rules.md

Critical rules that must never be violated: one task at a time, commit frequently, never break tests, leave codebase clean. These are the non-negotiables.

checklists.md

Step-by-step verification procedures. Feature completion checklist. Session end checklist. The AI runs these before marking anything done.

lessons-learned.md

Accumulated wisdom from past sessions. That SQLite connection issue. That import that always breaks. Patterns that work. Mistakes to avoid.

rules.md excerpt:

## CRITICAL RULES - NEVER VIOLATE

1. **ONE TASK AT A TIME**
   - Never work on more than one feature per session
   - Complete the current task before starting another
   - If blocked, mark as BLOCKED rather than switching tasks

2. **COMMIT FREQUENTLY**
   - Commit after every meaningful change
   - Small, focused commits are better than large ones
   - Never end a session with uncommitted changes

3. **LEAVE CODEBASE CLEAN**
   - `git status` must show "nothing to commit" at session end
   - All tests must pass
   - No linting errors

lessons-learned.md example:

## Database

1. ALWAYS USE CONTEXT MANAGERS FOR SQLITE
   - Bad: `conn = sqlite3.connect(db); conn.execute(...)`
   - Good: `with sqlite3.connect(db) as conn: conn.execute(...)`
   - Discovered in CORE-001 when database locked after crash

The lessons-learned file grows over time. Each mistake becomes a permanent memory. Because the AI reads the file at session start, it is far less likely to repeat an error that has already been documented.
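
As a concrete illustration of that SQLite lesson, here's the recommended pattern in context. The function name and table are assumptions for the example, not code from the template:

# Illustrative sketch of the lesson above. The `with` block commits the
# transaction on success and rolls back if an exception is raised.
import sqlite3

def add_todo(db_path: str, title: str) -> None:
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS todos "
            "(id INTEGER PRIMARY KEY, title TEXT NOT NULL, done INTEGER DEFAULT 0)"
        )
        conn.execute("INSERT INTO todos (title) VALUES (?)", (title,))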


The Verification Feedback Loop

Here’s a common failure mode: the AI writes code, says “done,” and moves on. Later you discover it didn’t actually work. Tests weren’t run. Edge cases weren’t handled. The AI claimed completion without verification.

The fix is explicit verification requirements. The checklists.md file contains step-by-step procedures that the AI must run before marking anything complete.

## Feature Completion Checklist

Before marking any feature as `passes: true`:

1. [ ] All acceptance criteria have passing tests
2. [ ] `pytest` passes with no failures
3. [ ] `ruff check .` shows no errors
4. [ ] `mypy .` shows no type errors
5. [ ] Code is committed with meaningful message
6. [ ] CURRENT_TASK.md updated with completion status

This creates a feedback loop: Code → Test → Fail → Fix → Test → Pass. The checklist forces the AI to close the loop before claiming completion.
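
One way to make the loop hard to skip is to wrap the checklist commands in a script that fails loudly. A minimal sketch using this project's commands (it assumes pytest, ruff, and mypy are installed):

# verify.py - run the verification commands, stop at the first failure (sketch)
import subprocess
import sys

CHECKS = [
    ["pytest"],
    ["ruff", "check", "."],
    ["mypy", "."],
]

for cmd in CHECKS:
    print(f"Running: {' '.join(cmd)}")
    if subprocess.run(cmd).returncode != 0:
        sys.exit(f"Verification failed at: {' '.join(cmd)}")

print("All checks passed.")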

Visual Verification for UI Work

For frontend work, test suites alone aren’t enough. A test might pass while the UI looks broken. Modern tools can help: screenshot-based testing with Playwright or Puppeteer, visual regression tools, and even AI agents that can take screenshots and verify layouts.

If you’re doing UI work, consider adding visual verification steps to your checklists:

## UI Feature Checklist (addition to standard checklist)

1. [ ] Component renders without console errors
2. [ ] Responsive at mobile/tablet/desktop breakpoints
3. [ ] Visual comparison against design spec

The tooling here is evolving quickly. Vision-capable AI agents can now take screenshots and verify that UI matches expectations. This is a capability worth keeping an eye on as these tools mature.
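
As an example of what a screenshot step can look like with Playwright's Python API, here's a minimal sketch; the URL, viewport, and output path are placeholders for whatever your checklist specifies:

# screenshot.py - capture a page for visual review (sketch; values are placeholders)
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page(viewport={"width": 1280, "height": 800})
    page.goto("http://localhost:3000")      # placeholder dev-server URL
    page.screenshot(path="home.png", full_page=True)
    browser.close()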


Planning and Acceptance Criteria

The feature_list.json file embodies a specific approach to planning: define success before starting work.

Each feature has acceptance criteria: specific, testable statements about what “done” means. This isn’t just documentation; it shapes how the AI works.

Without upfront criteria:

  • “Implement the database layer”
  • AI decides what to build
  • Scope creep, over-engineering, or missing features
  • Ambiguous “is this done?” debates

With upfront criteria:

  • “Can create a new database file” - testable
  • “Can add a todo item” - testable
  • “All operations have unit tests” - verifiable
  • Binary pass/fail, no ambiguity

The temptation is to skip this planning and just tell the AI what to build. For simple tasks, that works. For anything spanning multiple sessions, you’ll spend more time debugging scope confusion than you saved by skipping planning.

The discipline is simple: before the AI writes any code for a feature, the acceptance criteria must exist. Before the feature is marked complete, every criterion must pass. No exceptions.
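
In practice, each criterion maps almost one-to-one to a test. For instance, "Can add a todo item" might become the following; add_todo and list_todos are hypothetical signatures for this sketch, not template code:

# test_database.py excerpt - one acceptance criterion, one test (illustrative)
from todo.database import add_todo, list_todos

def test_can_add_a_todo_item(tmp_path):
    db = tmp_path / "todos.db"
    add_todo(str(db), "write the report")
    titles = [todo.title for todo in list_todos(str(db))]
    assert "write the report" in titles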


The Workflow

The Agentic Context System follows a simple loop:

┌─────────────────────────────────────────────────────────────────┐
│                     THE AGENTIC LOOP                             │
│                                                                  │
│   ┌─────────┐      ┌─────────┐      ┌─────────┐      ┌───────┐  │
│   │  READ   │ ───▶ │  WORK   │ ───▶ │ VERIFY  │ ───▶ │UPDATE │  │
│   │ Context │      │ On Task │      │ Quality │      │ Files │  │
│   └─────────┘      └─────────┘      └─────────┘      └───────┘  │
│        ▲                                                  │      │
│        └──────────────────────────────────────────────────┘      │
│                          (repeat)                                │
└─────────────────────────────────────────────────────────────────┘

Session Start:

  1. AI reads CLAUDE.md, CURRENT_TASK.md, rules.md, lessons-learned.md
  2. AI understands where the project is and what needs to be done
  3. AI begins working on the current task

During Session:

  1. AI makes small, incremental changes
  2. AI commits after each meaningful change
  3. AI updates CURRENT_TASK.md with progress

Session End:

  1. AI runs verification checklist (tests, linting, type checking)
  2. AI commits all changes
  3. AI updates CURRENT_TASK.md status
  4. AI updates feature_list.json if task complete
  5. AI appends session summary to progress.txt
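
The progress.txt update at step 5 can be as simple as appending one timestamped block per session. The exact format is up to you; this sketch is one possible shape, not a prescribed one:

# log_session.py - append a session summary to progress.txt (format is illustrative)
from datetime import date

summary = (
    f"## Session {date.today().isoformat()}\n"
    "Task: CORE-001 Database Layer\n"
    "Done: implemented list_todos(), mark_complete(), delete(); all tests passing\n"
    "Next: start CORE-002\n"
)

with open("progress.txt", "a") as f:
    f.write(summary + "\n")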

A Real Session Walkthrough

Here’s how a session typically flows. We’re building a todo CLI and working on the database layer.

Starting state: CURRENT_TASK.md shows 2 of 6 acceptance criteria complete for CORE-001.

You say: “Read my project files and continue the current task.”

What the AI does:

  1. Reads context: CLAUDE.md (Python/Click/SQLite stack), CURRENT_TASK.md (2 done, 4 remaining), rules.md (commit frequently, test everything), lessons-learned.md (use context managers for SQLite)

  2. Implements remaining criteria: list_todos(), mark_complete(), delete(), with tests for each

  3. Verifies: runs pytest (all pass), runs ruff (no errors)

  4. Updates files: marks all checkboxes in CURRENT_TASK.md, sets "passes": true in feature_list.json, appends summary to progress.txt

Ending state: Task complete. All tests passing. Next session can start CORE-002.

The key observation: the AI used the context manager pattern from lessons-learned.md without being told. It read the file, saw the note from a previous session, and applied the learning. That’s the point of the system: accumulated knowledge that persists across sessions.


Getting Started

Setting up the Agentic Context System takes about 5 minutes:

# Clone the template
git clone https://github.com/balevdev/agentic-context-system.git my-project
cd my-project

# Initialize git (if starting fresh)
git init
git add .
git commit -m "chore: initialize agentic context system"

# Customize for your project
# 1. Edit CLAUDE.md with your project specs
# 2. Edit feature_list.json with your features
# 3. Edit .claude/static/checklists.md with your test commands
# 4. Start your first session

Customization Guide

The template includes placeholder commands. Replace them with your stack-specific tools:

Stack              | Test Command  | Type Check / Lint
Python             | pytest        | mypy .
Node.js/TypeScript | npm test      | tsc --noEmit
Go                 | go test ./... | go vet ./...
Rust               | cargo test    | cargo check

Update .claude/static/checklists.md with your specific commands. The AI will use these during verification.

Project Type Customization

Web App: Add API endpoints to CLAUDE.md, component patterns, state management approach.

CLI Tool: Add command structure, argument parsing patterns, help text conventions.

Library: Add public API documentation, versioning strategy, backward compatibility rules.

Microservice: Add service boundaries, communication patterns, deployment configuration.


Why Files?

Why not a database, or an API, or a purpose-built tool?

Files are the lowest common denominator in software development. Any tool can read them. Any human can understand them. They diff cleanly, merge reasonably, and survive tool changes. When you switch from Claude to GPT or from Cursor to VS Code, the files remain exactly the same.


Conclusion

AI coding assistants start every session from zero because there's nothing for them to remember from. The Agentic Context System provides that missing memory: seven files that externalize your project's architecture, coding standards, progress, and accumulated lessons.

The AI reads these files at session start and updates them at session end. Context persists. Progress accumulates. Mistakes become documented lessons that prevent future repetition.

Next Steps

If you work with AI coding assistants on projects spanning multiple sessions, set up the Agentic Context System. Clone the template, customize the files for your project, and start your next session with context instead of explanations.

github.com/balevdev/agentic-context-system