If you use Claude Code or OpenAI Codex heavily, you’ve probably run into the same problem I did: hitting usage limits faster than expected. Autonomous coding loops are powerful, but they can burn through tokens — and your rolling usage limits — quickly, especially on larger repositories.
I found a simple strategy that helped me in three ways:
- Significantly reduce token usage
- Make AI development much more structured
- Enable cheap hybrid workflows with multiple models
The approach revolves around two ideas:
- a task-based workflow
- a small .ai/ workspace inside the repository
The Problem With Default AI Coding Workflows
Most AI coding agents operate like this:
- analyze repository
- implement code
- run tests
- fix errors
- repeat until done
This creates long reasoning loops where the model repeatedly sends large amounts of context back and forth.
On medium or large projects this means:
- repeated repo scans
- repeated tool outputs
- repeated reasoning loops
All of which consume tokens rapidly.
The Solution: Task-Based AI Development
Instead of letting the agent solve the entire feature autonomously, break the process into small controlled steps.
The workflow becomes:
- Plan the feature
- Generate structured tasks
- Execute tasks one at a time
- Review changes
The model is no longer running endless loops; it performs short, focused operations.
This alone dramatically reduces token usage.
The .ai Workspace
Inside the repository I add a small workspace:
```
.ai/
  plan.md
  tasks/
  tasks-done/
  repo-map.md
  context.md
```
Each file has a specific purpose.
plan.md
High-level feature planning.
tasks/
Individual implementation tasks.
Example:
.ai/tasks/01-auth-service.md
.ai/tasks/02-oauth-provider.md
.ai/tasks/03-login-ui.md
Each task contains:
- files to modify
- expected code structure
- verification steps
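As a concrete illustration, here is what one of those task files might contain, written via a heredoc so the creation can be scripted. The feature, file paths, and npm commands are hypothetical placeholders, not part of the original workflow:

```shell
# Create a sample task file (sketch; the paths and commands inside the
# heredoc are hypothetical, just illustrating the three sections).
mkdir -p .ai/tasks
cat > .ai/tasks/01-auth-service.md <<'EOF'
# Task 01: Auth service

## Files to modify
- src/auth/service.ts (new)
- src/index.ts (register the service)

## Expected code structure
- AuthService class exposing login() and logout()

## Verification steps
- npm test -- auth
- npm run lint
EOF
```

Keeping each task this small is what lets the agent implement it without re-deriving the whole plan.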
tasks-done/
Finished task files are moved here, so tasks/ always contains only pending work.
repo-map.md
A lightweight overview of the project structure. Instead of scanning the entire repository repeatedly, the model can rely on this map to understand the codebase.
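Setting up the workspace is a one-time step. A minimal sketch that creates the directories and seeds repo-map.md with a shallow directory listing (adjust the `find` depth and filters to your project):

```shell
# Create the .ai/ workspace and seed a first-pass repo map.
mkdir -p .ai/tasks .ai/tasks-done
touch .ai/plan.md .ai/context.md

# Seed repo-map.md with a two-level directory listing as a starting
# point; the agent (or you) can refine it with descriptions later.
{
  echo "# Repository map"
  find . -maxdepth 2 -type d -not -path '*/.git*' -not -path './.ai*' | sort
} > .ai/repo-map.md
```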
Adding Workflow Files (CLAUDE.md & AGENTS.md)
To make this work, you need to give your agent a set of instructions. Claude Code automatically reads a file called CLAUDE.md in the project root. Similarly, OpenAI Codex natively reads an open standard file called AGENTS.md.
In these files, you define how the agent should behave.
For example:
Planner:
- create implementation plan
- write plan to .ai/plan.md
- generate tasks
Coder:
- implement one task at a time
- modify only files required for that task
Tester:
- run tests and lint checks
Reviewer:
- review the diff and improve code quality
The key rule is simple:
Never implement the entire feature in a single loop. Always work task-by-task.
Of course, you have to tell Claude or Codex to work that way. So when you ask for a feature, follow the request with a prompt like this:
Based on my request, create or update .ai/plan.md and create task files in .ai/tasks/. Do not implement anything yet. Stop after the plan and tasks are created. Use a timestamp prefix “YYYYMMDD-HHMM-” to ensure unique filenames.
Afterwards, just feed it tasks:
Implement the tasks from .ai/tasks/ sequentially. Only modify files required for the current task. Complete one task, run the verification steps, then stop.
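The "feed it tasks" step can itself be scripted. A minimal driver-loop sketch, with the agent invocation stubbed out as `echo` so the control flow is clear; swap in your real CLI (e.g. Claude Code's non-interactive print mode, `claude -p "<prompt>"`):

```shell
# Hand the agent exactly one task per invocation, then archive it.
run_agent() { echo "AGENT: $1"; }   # stub; replace with your agent CLI

mkdir -p .ai/tasks .ai/tasks-done
for task in .ai/tasks/*.md; do
  [ -e "$task" ] || continue        # glob matched nothing: no pending tasks
  run_agent "Implement only the task in $task. Modify only the files it lists, run its verification steps, then stop."
  mv "$task" .ai/tasks-done/        # archive the finished task
done
```

Because tasks/ holds only pending work and tasks-done/ holds the archive, the loop is naturally resumable across sessions.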
I published a small bootstrap script that sets up this universal workflow (generating both CLAUDE.md and AGENTS.md) automatically in any repository: https://github.com/nils-fl/AgenticCodingInit
Why This Saves Tokens
Token usage usually explodes during execution loops, not during planning.
Typical distribution:
- Planning → small
- Execution loops → very large
- Review → small
By forcing the model to:
- implement one task at a time
- avoid scanning the full repo
- stop after each step
we eliminate most of the expensive loops.
A Surprising Side Effect: Better Structure
Another benefit is that development becomes far more organized.
Instead of chaotic agent behavior, the AI behaves like a small team:
planner → coder → tester → reviewer
The .ai/tasks files also serve as a lightweight project plan.
Even when switching contexts or sessions, you always know:
- what task is next
- what files should change
- how to verify the result
Hybrid Workflows With Multiple Models
Because this structure relies on standardized markdown files rather than a single tool’s internal memory, it opens the door to something powerful: multi-agent and multi-model development workflows.
You can seamlessly switch between Anthropic and OpenAI depending on the task. For example:
- Terminal 1 (Claude Code): Planning, complex architectural reasoning, and code review.
- Terminal 2 (OpenAI Codex): Brute-force execution, test writing, or handling a concurrent background task.
To take this even further, you can pair tools like LiteLLM with OpenRouter. This setup acts as a universal proxy, letting your agents route requests to dozens of different models (such as DeepSeek, Gemini, or Meta’s Llama) through a standard OpenAI-compatible API. By pointing your CLI tools at the proxy, you can dynamically choose the cheapest, fastest, or most capable model for each specific sub-task in your .ai/tasks/ folder, sidestepping vendor lock-in and per-provider rate limits. And because the tasks live as plain text in .ai/tasks/, any agent or model can read and work on the exact same plan, so you can always use the best (or most cost-effective) model for each job, reducing both token usage and rate-limit bottlenecks.
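How the redirection works in practice depends on your tools, but for anything that speaks the OpenAI-compatible API it typically comes down to two environment variables. A sketch, assuming a LiteLLM proxy running locally on its default port 4000; whether a given CLI honors OPENAI_BASE_URL is something to verify for that specific tool:

```shell
# Point OpenAI-compatible clients at a local LiteLLM proxy instead of
# the vendor's API. The key below is a placeholder, not a real secret.
export OPENAI_BASE_URL="http://localhost:4000"
export OPENAI_API_KEY="sk-your-litellm-master-key"
```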
The Result
With this setup:
- Claude Code and Codex usage becomes far more predictable
- token consumption drops significantly
- development becomes more structured
- hybrid multi-model workflows become easy
Instead of a single AI agent running uncontrolled loops, you effectively get a small AI development team working through clearly defined tasks.
And the best part: the entire setup takes only a few files and a simple repository convention. If you’re using Claude Code or OpenAI Codex heavily, it’s worth trying this approach. A small change in workflow can make a surprisingly big difference in both cost and productivity.