Every AI coding session starts from zero.
You explain your project structure. Again. You describe the coding patterns you’ve established. Again. You remind it about that database migration issue from last week. Again. And when it inevitably makes the same mistake you corrected three sessions ago, you wonder why you’re paying for an “intelligent” assistant.
The pattern is predictable. You start a session, explain everything, get productive work done, end the session. Tomorrow, the slate is wiped clean. Your context is gone. Your lessons evaporated.
This isn’t a limitation of the AI itself. It’s a limitation of how we use it. The AI doesn’t remember because we haven’t given it anything to remember from.
Where This Comes From
Before diving in, here’s the context for these ideas.
This approach draws from two key sources:
Ralph: The AI Agent Whisperer by Geoffrey Huntley explores how to effectively work with AI agents, including the importance of structured context and clear task definitions.
Effective Harnesses for Long-Running Agents from Anthropic’s engineering blog discusses building systems that help AI agents work effectively over extended periods, with insights on context management, verification, and error recovery.
The Agentic Context System is a practical implementation of these ideas, tailored for everyday AI-assisted development.
The Problem: Context Rot
To understand why external files work, you need to understand what happens inside an AI conversation.
Every AI model has a context window, a fixed amount of text it can process at once. For Claude, this is around 200,000 tokens. Sounds like a lot, but in a coding session with file contents, error messages, and back-and-forth discussion, you burn through it fast.
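To put rough numbers on it (assuming the common approximation of about four characters per token): a 300-line source file is on the order of a few thousand tokens, so a dozen pasted files plus verbose test output and a long back-and-forth can consume a large share of the window before the session is half over.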
Here’s what happens in practice:
Early in a session: You explain the architecture. The AI “knows” that your app uses a specific database pattern, that certain files shouldn’t be touched, that tests need to pass before committing.
Mid-session: The context window fills. Older messages get compressed or dropped to make room for new ones. That architecture explanation from the start? It’s fuzzy now, or gone entirely.
Late in a session: The AI makes a decision that contradicts something you established earlier. It’s not being stubborn. It literally doesn’t have access to that information anymore.
This is context rot: the gradual degradation of shared understanding as conversations grow longer than the AI’s working memory.
External files don't have this problem. The longer your project runs, the more valuable they become: lessons learned three months ago stay accessible, architecture decisions from the first week remain clear, nothing degrades.
The Solution: Externalize Everything
The fix is straightforward: write it down.
Instead of keeping context in your head (or hoping the AI magically retains it), you store it in files. Plain text files that the AI reads at session start and updates at session end. No database. No API. No subscription. Just files.
This is the Agentic Context System: a structured set of files that give any AI coding assistant persistent memory across sessions.
| Metric | Without Context System | With Context System |
|---|---|---|
| Session start time | 10-15 min explaining context | 30 sec reading files |
| Coding consistency | Varies wildly session to session | Follows established patterns |
| Progress tracking | Mental notes, Slack messages | Documented in progress.txt |
| Mistake repetition | Same errors every week | Lessons learned persist |
| Onboarding new AI tools | Start from scratch | Same context works everywhere |
The Seven Files
The entire system consists of seven files organized into two categories: Context Files that the AI reads at session start, and Task Files that track active work.
your-project/
├── CLAUDE.md # Technical specifications (project bible)
├── CURRENT_TASK.md # Current task being worked on
├── feature_list.json # Master registry of all features
├── progress.txt # Append-only session log
└── .claude/
├── lessons-learned.md # Accumulated wisdom from sessions
└── static/
├── rules.md # Critical operating rules
└── checklists.md # Verification procedures
Context Files (Read at Session Start)
- CLAUDE.md: technical specifications and project conventions
- .claude/static/rules.md: critical operating rules
- .claude/static/checklists.md: verification procedures
- .claude/lessons-learned.md: accumulated wisdom from past sessions
Task Files (Updated During Sessions)
- CURRENT_TASK.md: the task currently in progress
- feature_list.json: master feature registry with pass/fail status
- progress.txt: append-only log of session summaries
Each file has a specific purpose. None are optional. Together, they give the AI everything it needs to pick up exactly where you left off.
CLAUDE.md: The Project Bible
The CLAUDE.md file is the most important file in the system. It’s the technical specification for your project, the source of truth that the AI references for every decision.
Here’s how a CLAUDE.md might look for a todo CLI project. The file starts with a project overview:
# CLAUDE.md - Todo CLI Technical Reference
## PROJECT OVERVIEW
A command-line todo list application in Python.
- **Project Name:** todo-cli
- **Tech Stack:** Python 3.11+, Click (CLI framework), SQLite (storage)
### Core Principles
1. **Simple First**: Start with basic features, add complexity later
2. **Test Everything**: Every feature has tests before it's complete
3. **User-Friendly**: Clear error messages, helpful --help text
Next, the architecture section documents the project structure:
## ARCHITECTURE
### Directory Structure
todo-cli/
├── src/
│ └── todo/
│ ├── __init__.py
│ ├── cli.py # Click commands
│ ├── database.py # SQLite operations
│ └── models.py # Data classes
├── tests/
│ ├── test_cli.py
│ └── test_database.py
├── pyproject.toml
└── README.md
Finally, coding standards and common commands:
## CODING STANDARDS
- Use type hints everywhere
- Format with `black`
- Lint with `ruff`
- Test with `pytest`
## COMMANDS
# Development
pip install -e ".[dev]" # Install in dev mode
pytest # Run tests
black . # Format code
ruff check . # Lint code
What to include in CLAUDE.md:
- Project overview and tech stack
- Directory structure with explanations
- Coding standards and conventions
- Common commands for development
The file should answer one question: “What do I need to know to work on this project?”
feature_list.json: The Master Registry
The feature_list.json file is the authoritative list of everything your project should do. Each feature has a unique ID, acceptance criteria, and a pass/fail status.
{
"project": {
"name": "todo-cli",
"version": "0.1.0",
"description": "Command-line todo list application"
},
"phases": [
{
"id": "SETUP",
"name": "Project Setup",
"description": "Initialize the Python project",
"features": [
{
"id": "SETUP-001",
"phase": "SETUP",
"name": "Initialize Python Project",
"description": "Create pyproject.toml and basic structure",
"dependencies": [],
"acceptanceCriteria": [
"pyproject.toml exists with project metadata",
"src/todo/__init__.py exists",
"pytest runs (even with 0 tests)",
"black and ruff are configured"
],
"passes": false
}
]
},
{
"id": "CORE",
"name": "Core Features",
"description": "Basic todo functionality",
"features": [
{
"id": "CORE-001",
"phase": "CORE",
"name": "Database Layer",
"description": "SQLite database for storing todos",
"dependencies": ["SETUP-001"],
"acceptanceCriteria": [
"Can create a new database file",
"Can add a todo item",
"Can list all todo items",
"Can mark a todo as complete",
"Can delete a todo",
"All operations have unit tests"
],
"passes": false
}
]
}
]
}
The workflow is simple:
- Define features with acceptance criteria upfront
- Work on one feature at a time
- When ALL criteria are met, set `"passes": true`
- Move to the next feature
The `passes` field is binary. Not “mostly done.” Not “80% complete.” Either all criteria pass, or the feature isn’t done.
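Because feature_list.json is plain JSON, it’s also easy to script against. Here’s a minimal sketch (not part of the template; the helper is hypothetical) that prints the next feature whose `passes` flag is still false and whose dependencies have all passed:
import json

def next_open_feature(path="feature_list.json"):
    """Return the first feature that hasn't passed and whose dependencies all have."""
    with open(path) as f:
        data = json.load(f)

    features = [feat for phase in data["phases"] for feat in phase["features"]]
    passed = {feat["id"] for feat in features if feat["passes"]}

    for feat in features:
        if not feat["passes"] and all(dep in passed for dep in feat["dependencies"]):
            return feat
    return None

if __name__ == "__main__":
    feature = next_open_feature()
    if feature:
        print(f"Next up: {feature['id']} - {feature['name']}")
    else:
        print("All features pass.")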
CURRENT_TASK.md: Real-Time Progress
The CURRENT_TASK.md file tracks the active task. It’s updated continuously as work progresses.
┌─────────────────────────────────────────────────────────────────────┐
│ TASK LIFECYCLE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ NOT_STARTED ────────▶ IN_PROGRESS ────────▶ COMPLETED │
│ │ │ │ │
│ │ │ ▼ │
│ │ │ Next Task │
│ │ │ │
│ │ ▼ │
│ │ BLOCKED ──────▶ Human Input ──────▶ Resume │
│ │ │
└─────────────────────────────────────────────────────────────────────┘
Here’s a template for CURRENT_TASK.md:
# CORE-001: Database Layer
**Status:** IN_PROGRESS | **Previous:** SETUP-001
## Acceptance Criteria
- [x] Can create a new database file
- [x] Can add a todo item
- [ ] Can list all todo items
- [ ] Can mark a todo as complete
- [ ] Can delete a todo
- [ ] All operations have unit tests
## Progress
- 2024-01-15: Created database.py with init and add functions
- 2024-01-15: Added unit tests for init and add
- Currently working on list functionality
## Blockers
None currently.
## Next
- Implement list_todos() function
- Add mark_complete() and delete() functions
- Complete unit test coverage
The file gets updated as work progresses. Each session picks up where the last one ended.
The Supporting Files
Three additional files round out the system:
rules.md
Critical rules that must never be violated: one task at a time, commit frequently, never break tests, leave codebase clean. These are the non-negotiables.
checklists.md
Step-by-step verification procedures. Feature completion checklist. Session end checklist. The AI runs these before marking anything done.
lessons-learned.md
Accumulated wisdom from past sessions. That SQLite connection issue. That import that always breaks. Patterns that work. Mistakes to avoid.
rules.md excerpt:
## CRITICAL RULES - NEVER VIOLATE
1. **ONE TASK AT A TIME**
- Never work on more than one feature per session
- Complete the current task before starting another
- If blocked, mark as BLOCKED rather than switching tasks
2. **COMMIT FREQUENTLY**
- Commit after every meaningful change
- Small, focused commits are better than large ones
- Never end a session with uncommitted changes
3. **LEAVE CODEBASE CLEAN**
- `git status` must show "nothing to commit" at session end
- All tests must pass
- No linting errors
lessons-learned.md example:
## Database
1. ALWAYS USE CONTEXT MANAGERS FOR SQLITE
- Bad: `conn = sqlite3.connect(db); conn.execute(...)`
- Good: `with sqlite3.connect(db) as conn: conn.execute(...)`
- Discovered in CORE-001 when database locked after crash
The lessons-learned file grows over time. Each mistake becomes a permanent memory. The AI never makes the same error twice because it reads the file at session start.
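Applied to the todo CLI, that lesson shows up directly in the code the AI writes next. A sketch of what database.py might contain (the table schema and function names here are assumptions, not taken from the template):
import sqlite3

def init_db(db_path="todos.db"):
    # The context manager commits on success and rolls back on error,
    # which is the pattern recorded in lessons-learned.md.
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS todos ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "title TEXT NOT NULL, "
            "done INTEGER NOT NULL DEFAULT 0)"
        )

def add_todo(title, db_path="todos.db"):
    with sqlite3.connect(db_path) as conn:
        conn.execute("INSERT INTO todos (title) VALUES (?)", (title,))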
The Verification Feedback Loop
Here’s a common failure mode: the AI writes code, says “done,” and moves on. Later you discover it didn’t actually work. Tests weren’t run. Edge cases weren’t handled. The AI claimed completion without verification.
The fix is explicit verification requirements. The checklists.md file contains step-by-step procedures that the AI must run before marking anything complete.
## Feature Completion Checklist
Before marking any feature as `passes: true`:
1. [ ] All acceptance criteria have passing tests
2. [ ] `pytest` passes with no failures
3. [ ] `ruff check .` shows no errors
4. [ ] `mypy .` shows no type errors
5. [ ] Code is committed with meaningful message
6. [ ] CURRENT_TASK.md updated with completion status
This creates a feedback loop: Code → Test → Fail → Fix → Test → Pass. The checklist forces the AI to close the loop before claiming completion.
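For the todo CLI, each acceptance criterion maps onto a test the checklist can actually run. A sketch of what those tests might look like, assuming the module layout from CLAUDE.md (init_db and add_todo are assumed names; list_todos and mark_complete are the functions named earlier):
from todo import database  # src/todo/database.py per the CLAUDE.md structure

def test_add_todo(tmp_path):
    db = tmp_path / "test.db"
    database.init_db(db)
    database.add_todo("write tests", db)
    assert len(database.list_todos(db)) == 1

def test_mark_complete(tmp_path):
    db = tmp_path / "test.db"
    database.init_db(db)
    database.add_todo("write tests", db)
    todo_id = database.list_todos(db)[0].id
    database.mark_complete(todo_id, db)
    assert database.list_todos(db)[0].done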
Visual Verification for UI Work
For frontend work, test suites alone aren’t enough. A test might pass while the UI looks broken. Modern tools can help: screenshot-based testing with Playwright or Puppeteer, visual regression tools, and even AI agents that can take screenshots and verify layouts.
If you’re doing UI work, consider adding visual verification steps to your checklists:
## UI Feature Checklist (addition to standard checklist)
1. [ ] Component renders without console errors
2. [ ] Responsive at mobile/tablet/desktop breakpoints
3. [ ] Visual comparison against design spec
The tooling here is evolving quickly. Vision-capable AI agents can now take screenshots and verify that UI matches expectations. This is a capability worth keeping an eye on as these tools mature.
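As one concrete example, here’s a hedged sketch using Playwright’s Python API that captures screenshots at a few breakpoints for a human (or a vision-capable agent) to review. The URL, breakpoint sizes, and output names are placeholders:
from playwright.sync_api import sync_playwright

# Placeholder breakpoints and dev-server URL; adjust for your project.
BREAKPOINTS = {"mobile": (375, 812), "tablet": (768, 1024), "desktop": (1440, 900)}
URL = "http://localhost:3000"

with sync_playwright() as p:
    browser = p.chromium.launch()
    for name, (width, height) in BREAKPOINTS.items():
        page = browser.new_page(viewport={"width": width, "height": height})
        page.goto(URL)
        page.screenshot(path=f"ui-{name}.png", full_page=True)
        page.close()
    browser.close()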
Planning and Acceptance Criteria
The feature_list.json file embodies a specific approach to planning: define success before starting work.
Each feature has acceptance criteria: specific, testable statements about what “done” means. This isn’t just documentation; it shapes how the AI works.
Without upfront criteria:
- “Implement the database layer”
- AI decides what to build
- Scope creep, over-engineering, or missing features
- Ambiguous “is this done?” debates
With upfront criteria:
- “Can create a new database file” - testable
- “Can add a todo item” - testable
- “All operations have unit tests” - verifiable
- Binary pass/fail, no ambiguity
The temptation is to skip this planning and just tell the AI what to build. For simple tasks, that works. For anything spanning multiple sessions, you’ll spend more time debugging scope confusion than you saved by skipping planning.
The discipline is simple: before the AI writes any code for a feature, the acceptance criteria must exist. Before the feature is marked complete, every criterion must pass. No exceptions.
The Workflow
The Agentic Context System follows a simple loop:
┌─────────────────────────────────────────────────────────────────┐
│ THE AGENTIC LOOP │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌───────┐ │
│ │ READ │ ───▶ │ WORK │ ───▶ │ VERIFY │ ───▶ │UPDATE │ │
│ │ Context │ │ On Task │ │ Quality │ │ Files │ │
│ └─────────┘ └─────────┘ └─────────┘ └───────┘ │
│ ▲ │ │
│ └──────────────────────────────────────────────────┘ │
│ (repeat) │
└─────────────────────────────────────────────────────────────────┘
Session Start:
- AI reads CLAUDE.md, CURRENT_TASK.md, rules.md, lessons-learned.md
- AI understands where the project is and what needs to be done
- AI begins working on the current task
During Session:
- AI makes small, incremental changes
- AI commits after each meaningful change
- AI updates CURRENT_TASK.md with progress
Session End:
- AI runs verification checklist (tests, linting, type checking)
- AI commits all changes
- AI updates CURRENT_TASK.md status
- AI updates feature_list.json if task complete
- AI appends session summary to progress.txt
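progress.txt isn’t shown elsewhere in this post, so here’s a sketch of what an appended entry might look like; the exact format is up to you, as long as entries stay append-only and dated:
=== 2024-01-15 | CORE-001: Database Layer ===
Status: IN_PROGRESS -> COMPLETED
Done:   list_todos(), mark_complete(), delete() implemented with unit tests; pytest and ruff clean
Next:   Start CORE-002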
A Real Session Walkthrough
Here’s how a session typically flows. We’re building a todo CLI and working on the database layer.
Starting state: CURRENT_TASK.md shows 2 of 6 acceptance criteria complete for CORE-001.
You say: “Read my project files and continue the current task.”
What the AI does:
- Reads context: CLAUDE.md (Python/Click/SQLite stack), CURRENT_TASK.md (2 done, 4 remaining), rules.md (commit frequently, test everything), lessons-learned.md (use context managers for SQLite)
- Implements remaining criteria: list_todos(), mark_complete(), delete(), with tests for each
- Verifies: runs pytest (all pass), runs ruff (no errors)
- Updates files: marks all checkboxes in CURRENT_TASK.md, sets `"passes": true` in feature_list.json, appends summary to progress.txt
Ending state: Task complete. All tests passing. Next session can start CORE-002.
The key observation: the AI used the context manager pattern from lessons-learned.md without being told. It read the file, saw the note from a previous session, and applied the learning. That’s the point of the system: accumulated knowledge that persists across sessions.
Getting Started
Setting up the Agentic Context System takes about 5 minutes:
# Clone the template
git clone https://github.com/balevdev/agentic-context-system.git my-project
cd my-project
# Initialize git (if starting fresh)
git init
git add .
git commit -m "chore: initialize agentic context system"
# Customize for your project
# 1. Edit CLAUDE.md with your project specs
# 2. Edit feature_list.json with your features
# 3. Edit .claude/static/checklists.md with your test commands
# 4. Start your first session
Customization Guide
The template includes placeholder commands. Replace them with your stack-specific tools:
| Stack | Test Command | Type Check | Lint |
|---|---|---|---|
| Python | pytest | mypy . | ruff check . |
| Node.js/TypeScript | npm test | tsc --noEmit | eslint . |
| Go | go test ./... | go vet ./... | golangci-lint run |
| Rust | cargo test | cargo check | cargo clippy |
Update .claude/static/checklists.md with your specific commands. The AI will use these during verification.
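For example, a Node.js/TypeScript project might rewrite the feature completion checklist along these lines (tool choices are illustrative, not prescribed by the template):
## Feature Completion Checklist (Node.js/TypeScript)
Before marking any feature as `passes: true`:
1. [ ] All acceptance criteria have passing tests
2. [ ] `npm test` passes with no failures
3. [ ] `tsc --noEmit` shows no type errors
4. [ ] `eslint .` shows no errors
5. [ ] Code is committed with a meaningful message
6. [ ] CURRENT_TASK.md updated with completion status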
Project Type Customization
Web App: Add API endpoints to CLAUDE.md, component patterns, state management approach.
CLI Tool: Add command structure, argument parsing patterns, help text conventions.
Library: Add public API documentation, versioning strategy, backward compatibility rules.
Microservice: Add service boundaries, communication patterns, deployment configuration.
Why Files?
Why not a database, or an API, or a purpose-built tool?
Files are the lowest common denominator in software development. Any tool can read them. Any human can understand them. They diff cleanly, merge reasonably, and survive tool changes. When you switch from Claude to GPT or from Cursor to VS Code, the files remain exactly the same.
Conclusion
AI coding assistants start every session from zero because there’s nothing to remember from. The Agentic Context System provides that something: seven files that externalize your project’s architecture, coding standards, progress, and accumulated lessons.
The AI reads these files at session start and updates them at session end. Context persists. Progress accumulates. Mistakes become documented lessons that prevent future repetition.
If you work with AI coding assistants on projects spanning multiple sessions, set up the Agentic Context System. Clone the template, customize the files for your project, and start your next session with context instead of explanations.