You’ve felt it. Every Claude Code user has.
Hour one: Claude is sharp. It catches edge cases you didn’t think of. It follows your conventions perfectly. It asks smart questions before diving in.
Hour three: Claude repeats a mistake you corrected an hour ago. It contradicts a decision you made together. It starts hedging on things it was confident about. The responses get longer but less precise.
Nothing changed. Same model. Same project. Same you.
What changed is the context window — and most developers have no idea what’s actually happening inside it.
This isn’t a vague “AI gets confused” problem. It has a precise technical cause, a measurable threshold, and a set of fixes that keep Claude sharp across sessions of any length. Professionals who’ve read the surface-level guides still miss most of what’s in this post.
Let’s go deep.
🧠 The Science First — “Lost in the Middle”
Here’s what actually happens when your session runs long.
Claude’s context window is not like RAM where every memory address has equal access speed. It’s more like human attention — strongest at the beginning and end of what it’s reading, weakest in the middle.
This is called the “Lost in the Middle” problem, documented in a Stanford research paper and directly applicable to Claude Code sessions.
When your session is fresh: your instructions, your CLAUDE.md rules, and your current task are all near the top of the context — where attention is strongest.
After two hours of back-and-forth: all of that is buried in the middle under thousands of tokens of file reads, bash outputs, failed attempts, and corrected mistakes. Claude can technically “see” it — but its attention weights on that buried content are a fraction of what they were when it was fresh.
This is why correcting Claude twice in a session makes things worse, not better. Each correction adds more tokens. The original correct instruction sinks deeper into the middle. Claude’s attention on it weakens. It starts drifting back toward the behavior you corrected.
The rule the official docs state explicitly: if you’ve corrected Claude more than twice on the same issue in one session, the context is cluttered. A clean session with a better prompt will almost always outperform a long session with accumulated corrections.
Understanding this changes how you work. You stop fighting the drift. You manage the context before it becomes a problem.
📐 Your Real Token Budget (Not What You Think)
Claude Code has a 200,000 token context window. That number is real — and almost completely misleading.
Here’s what actually consumes tokens before you type a single character:
| Component | Token Cost |
| --- | --- |
| Claude Code system prompt | ~7,000 tokens |
| Tool definitions (built-in tools) | ~8,000 tokens |
| CLAUDE.md (if 200 lines) | ~2,500 tokens |
| MCP server schemas (per server) | ~2,000–8,000 tokens each |
| Memory files (if using Memory MCP) | ~3,000–10,000 tokens |
Run /context in any Claude Code session right now. You'll see the real breakdown.
A clean session with no MCP servers: roughly 160,000–170,000 tokens for actual work.
Add 5 MCP servers: drops to 120,000–130,000 tokens.
Add 9 MCP servers (like the ones from Day 3 of this series): you can lose 50,000+ tokens to tool schemas before you’ve written a word.
But here’s the number that actually matters:
Performance starts degrading around 147,000 tokens — not 200,000.
That’s not speculation. That’s what Anthropic’s own engineering data shows, and it’s why auto-compaction now triggers at 64–75% capacity, not 90%. They built in a completion buffer. The model needs headroom to finish its current task before compaction fires. If you wait until 90%, it’s already too late — you’ve been working in degraded performance for the last hour.
🧾 Practical implication: Your real working budget in a typical session with a few MCP servers is around 100,000–120,000 tokens. After that, you’re already in the degradation zone.
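You can sanity-check numbers like these yourself. Here's a minimal sketch using the common rule of thumb that one token is roughly four characters of English text; this is an approximation, not Claude's actual tokenizer:

```shell
#!/bin/bash
# Rough token estimate for any file, using the ~4 characters-per-token heuristic.
# Approximation only -- not Claude's real tokenizer.
estimate_tokens() {
  local chars
  chars=$(wc -c < "$1")
  echo $(( chars / 4 ))
}

# A 200-line CLAUDE.md is often around 10 KB, which lands near the
# ~2,500-token figure quoted above
printf 'x%.0s' $(seq 1 10000) > /tmp/sample-claude.md
estimate_tokens /tmp/sample-claude.md   # → 2500
```

Run it against your actual CLAUDE.md for a quick read on how much budget it consumes before you even open /context.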
🔧 The Four Commands — What They Actually Do
Most guides mention these. Almost none explain the full picture.
/context — Your Diagnostic Tool
/context
Shows you: total tokens used, tokens remaining, percentage filled, and a breakdown by category (system, conversation, files, tools).
Run this at the start of every session to see your actual working budget. Run it mid-session to catch problems before they affect output quality.
What professionals do: set a mental alarm at 60% fill. When /context shows 60% — act. Don't wait for auto-compaction to save you.
/clear — The Scalpel
/clear
Wipes the entire conversation history. Claude re-reads your CLAUDE.md and still has access to all your files — but every exchange, failed attempt, and corrected mistake is gone.
When to use it:
- Moving to an unrelated task after finishing one
- After correcting Claude twice on the same issue
- Any time a session feels “stale” or Claude seems confused
- Between your morning work and afternoon work
What most developers get wrong: they treat /clear as a last resort when Claude is obviously confused. Use it proactively, between tasks, like closing browser tabs before starting new work. A clean 160K token budget beats a polluted 80K token budget every time.
What survives /clear: your CLAUDE.md rules, your files on disk, your slash commands, your hooks. Everything important is preserved. The only thing that dies is the accumulated noise.
/compact — The Surgeon
/compact
/compact Focus on the authentication changes only
/compact preserve the list of modified files and the decisions we made on error handling
Instead of wiping everything, compact runs a summarization pass — compressing a 70,000-token conversation down to roughly 4,000 tokens while preserving intent, decisions, and direction.
The math: a typical compaction reduces context by 60–70%. You go from 140,000 tokens to roughly 42,000–56,000. Session continues without starting over.
The cost: compaction is lossy. Specific file contents, exact error messages, granular implementation details don’t survive. You keep the “vibes” — the decisions, the direction, the constraints established. You lose the specifics.
The professional move: use /compact with explicit instructions about what to preserve.
/compact preserve: all modified file names, the decision to use Drizzle over Prisma,
the test failure pattern we identified in the auth module,
and the naming convention we established for API routes
The instructions you give to /compact are injected into the summary prompt. They become the compaction's priorities. Claude keeps what you told it to keep.
Default to /clear. Reach for /compact when continuity matters more than a clean slate.
/rewind — The Time Machine
# Press Esc twice, or:
/rewind
Opens a visual menu of conversation checkpoints — every message in the session, selectable. You can:
- Restore to a checkpoint — roll back to any point in the session, as if everything after it never happened
- Summarize from a checkpoint — compact only the messages from that point forward, leaving earlier context intact
When to use it: when Claude went down a wrong path 20 messages ago and you need to undo — without losing everything before that point. It’s surgical in a way that /clear and /compact can't be.
Most developers don’t know this exists. The ones who do use it constantly for debugging sessions that go off-rails.
🏗️ The Architecture That Keeps Claude Sharp — Session Handoffs
Here’s the technique that almost nobody in the community has written about clearly.
The problem with long projects: you can’t do everything in one session. And every time you start a new session, Claude starts from zero. Your CLAUDE.md covers the permanent rules — but it doesn’t know what you did yesterday, what decisions you made, or where you left off.
The fix: a session handoff file.
At the end of every session, before you close the terminal, run this:
Summarize this session into .claude/session-handoff.md:
- Every file we modified (with a one-line description of what changed)
- Every architectural decision we made and why
- The current state of in-progress work
- The exact next step to continue tomorrow
- Any gotchas or constraints we discovered
Claude writes .claude/session-handoff.md with exactly that structure. You add this to your CLAUDE.md:
## Session Continuity
At the start of each session, read .claude/session-handoff.md if it exists.
It contains the state from the previous session. Start there.
At the end of each session, update this file with current state.
Next session: Claude reads the handoff file before you say anything. It knows exactly where you left off. Zero re-explaining. Zero context lost.
The handoff file is updated every session. It’s the only file in your project that’s designed to be overwritten. Previous session state becomes irrelevant — only the current handoff matters.
🧾 Don’t add it to git. Add .claude/session-handoff.md to .gitignore. It's volatile state, not project knowledge.
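Based on the prompt above, the resulting handoff file might look something like this. The contents are illustrative: file names, decisions, and tasks are placeholders, not part of any real project.

```markdown
# Session Handoff (state as of last session)

## Files modified
- src/auth/login.ts: added token refresh on 401 responses
- src/db/schema.ts: new `sessions` table

## Decisions
- Chose Drizzle over Prisma (lighter migrations, better TS inference)

## In-progress work
- Refresh-token rotation: happy path done, failure path untested

## Next step
- Write the failing-refresh test, then wire up the retry logic

## Gotchas
- The auth middleware swallows errors; log before rethrowing
```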
⚡ The MCP Problem Nobody Talks About
From Day 3 of this series, you installed 9 MCP servers. They’re incredible. They’re also quietly eating your context budget from the moment each session opens.
Every MCP server has a schema — a JSON description of every tool it exposes. Claude Code loads all these schemas into context at session start. That’s your 30,000–50,000 token startup cost.
The fix: disable MCP servers you’re not using in a given session.
# Inside Claude Code:
/mcp
# Select the servers you don't need for this task → disable them
Or in Claude Code v2.0.10+:
@filesystem disable
@playwright disable
Disabled MCP servers don’t load their schemas. If you’re working on pure TypeScript refactoring, you don’t need Playwright or PostgreSQL running. Disabling them recovers 10,000–20,000 tokens immediately.
Add this to your workflow: at the start of each session, ask yourself which MCP servers this task actually needs. Disable the rest. Re-enable them only when needed.
This is one of the highest-ROI context optimizations that isn’t in any standard guide.
📊 The Token Appetite of Different Actions
Not all actions cost the same. This is where professionals optimize and beginners don’t.
| Action | Approximate Token Cost |
| --- | --- |
| Reading a 400-line file entirely | ~3,000 tokens |
| Targeted grep for a specific function | ~200 tokens |
| MCP tool response (search result) | ~2,000–5,000 tokens |
| One round of conversation (Q+A) | ~500–2,000 tokens |
| Auto-compact summary | ~4,000 tokens (replaces 70,000+) |
| Your CLAUDE.md (200 lines) | ~2,500 tokens |
The implication: vague prompts are expensive. “Look at the codebase and find the bug” makes Claude read 10 files — 30,000 tokens. “Read src/auth/login.ts lines 45–90 and find why the token isn’t being set” costs 800 tokens.
Specificity is not just good practice — it’s a 40x token efficiency multiplier.
Add this to your CLAUDE.md:
## Context Efficiency
- Read specific files, not directories
- Use grep for function lookups, not full file reads
- Specify line ranges when reading large files: Read file.ts:50-120
- Before reading any file, check if you already have its content in context
Claude will apply these rules automatically. Every session. No reminding.
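To see why the multiplier is so large, here's a self-contained sketch using the same rough ~4 characters-per-token approximation. The file names and contents are synthetic stand-ins created just for the demonstration:

```shell
#!/bin/bash
# Contrast a "read everything" prompt with a targeted one.
# Numbers use the rough ~4 chars-per-token heuristic; files are synthetic.
mkdir -p /tmp/ctx-demo
for i in $(seq 1 10); do
  # ten ~400-line stand-in "source files"
  seq 1 400 | sed "s/^/export const symbol_${i}_/" > "/tmp/ctx-demo/file${i}.ts"
done

# Vague prompt: all ten files get read into context
vague=0
for f in /tmp/ctx-demo/*.ts; do
  vague=$(( vague + $(wc -c < "$f") / 4 ))
done

# Specific prompt: a 46-line slice of one file
specific=$(( $(sed -n '45,90p' /tmp/ctx-demo/file3.ts | wc -c) / 4 ))

echo "vague:    ~${vague} tokens"
echo "specific: ~${specific} tokens"
```

The exact numbers depend on line lengths, but the gap is consistently one to two orders of magnitude in favor of the targeted read.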
🔄 The 30-Minute Sprint System
Here’s the workflow that SFEIR Institute measured at 85% performance retention across 4-hour sessions:
Work in 30-minute sprints.
Each sprint has one objective. One feature. One bug. One module.
At the end of each sprint:
/compact preserve: [specific decisions and files from this sprint]
Then start the next sprint in a compacted context.
Why does this work? A 30-minute session typically consumes 50,000–80,000 tokens — well below the 147,000 threshold where degradation begins. You compact before you hit the wall. You start each sprint fresh enough to stay in the high-performance zone.
Compare:
- Marathon session (4 hours, no management): performance at hour 3 is 40–60% of hour 1
- Sprint system (4 hours, 8 compactions): performance stays at 80–85% throughout
The compaction overhead (about 60 seconds per compact) costs you 8 minutes across a 4-hour session. You get 2+ hours of effective Claude performance back.
🪝 The PreCompact Hook — Save What Matters Before Compaction Fires
From Day 7 (Hooks), you know hooks run at specific moments. There’s one hook most developers haven’t used yet: PreCompact.
PreCompact fires immediately before any compaction — manual or auto. You get one last chance to write critical information to a file before the conversation gets summarized.
.claude/hooks/preserve-context.sh:
#!/bin/bash
# Runs before every compaction (manual or auto)

# The hook input arrives on stdin as a JSON payload
INPUT=$(cat)

# Ask Claude to extract and save the most critical context
claude -p "From this conversation, extract and save to .claude/session-handoff.md:
1. All files modified
2. All decisions made, with their reasoning
3. Current task state and next step
4. Any open questions or blockers
Format it as markdown. Be specific." <<< "$INPUT"

echo "Context preserved to .claude/session-handoff.md"
Wire it in .claude/settings.json:
{
  "hooks": {
    "PreCompact": [
      {
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/preserve-context.sh"
          }
        ]
      }
    ]
  }
}
Now every compaction — even auto-compaction that fires without warning — first saves critical context to your handoff file. You never lose a decision to a compaction again.
📋 The Complete Context Management Playbook
Put it all together into a workflow:
Session Start:
/context # Check your real token budget
# Disable MCP servers you don't need for this task
# Read session-handoff.md (happens automatically if in CLAUDE.md)
During Session (every 30 minutes):
/context # Check fill percentage
# At 60%: manual compact with instructions
/compact preserve [specific context that matters]
Task Complete, Moving to Unrelated Work:
/clear # Clean slate beats polluted context
Session End:
# Ask Claude to update session-handoff.md
# Close terminal
Claude Keeps Repeating a Mistake:
/clear   # Don't fight the drift — reset
# Start with a better, more specific prompt
Went Down a Wrong Path:
/rewind  # Roll back to the good checkpoint
# Or: summarize from that checkpoint forward
🎯 The One Mindset Shift That Changes Everything
Most developers treat the context window as a limitation — a bucket that fills up and causes problems.
The professionals treat it as a resource to actively manage — like disk space or memory. They make intentional decisions about what goes in, monitor how full it is, and act before it becomes a problem.
The difference:
- Reactive developer: notices Claude getting worse → panics → tries more detailed prompts → gets more frustrated
- Active developer: checks /context at 60% → compacts with specific instructions → maintains peak performance
You have a 200,000 token context window. Used well, it’s genuinely enormous. Used without management, it fills with noise in 30 minutes and spends the rest of the session working against you.
Now you know exactly how to use it well.
One question before you go:
Which context problem has hit you hardest? Claude forgetting decisions mid-session, repeated mistakes you already corrected, or losing work when compaction fires unexpectedly? Drop it below, and I’ll write the specific recipe for the top answer.
If you want to see what I’m building between posts — I share quick findings and experiments on LinkedIn. See you there.