If you use Claude Code or OpenAI Codex heavily, you’ve probably run into the same problem I did: hitting usage limits faster than expected. Autonomous coding loops are powerful, but they can burn through tokens — and your rolling usage limits — quickly, especially on larger repositories.
I found a simple strategy that helped me in three ways:
- Significantly reduce token usage
- Make AI development much more structured
- Enable cheap hybrid workflows with multiple models
The approach revolves around two ideas:
- a task-based workflow
- a small .ai/ workspace inside the repository
The Problem With Default AI Coding Workflows
Most AI coding agents operate like this:
- analyze repository
- implement code
- run tests
- fix errors
- repeat until done
This creates long reasoning loops where the model repeatedly sends large amounts of context back and forth.
On medium or large projects this means:
- repeated repo scans
- repeated tool outputs
- repeated reasoning loops
All of which consume tokens rapidly.
The Solution: Task-Based AI Development
Instead of letting the agent solve the entire feature autonomously, break the process into small controlled steps.
The workflow becomes:
- Plan the feature
- Generate structured tasks
- Execute tasks one at a time
- Review changes
The model is no longer running endless loops; it performs short, focused operations.
This alone dramatically reduces token usage.
The .ai Workspace
Inside the repository I add a small workspace:
```
.ai/
  plan.md
  tasks/
  tasks-done/
  repo-map.md
  context.md
```
Each file has a specific purpose.
plan.md
High-level feature planning.
tasks/
Individual implementation tasks.
Example:
.ai/tasks/01-auth-service.md
.ai/tasks/02-oauth-provider.md
.ai/tasks/03-login-ui.md
Each task contains:
- files to modify
- expected code structure
- verification steps
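As a concrete illustration, here is what one of those task files might contain, written via a heredoc so the creation can be scripted. The feature, file paths, and npm commands are hypothetical placeholders, not part of the original workflow:

```shell
# Create a sample task file (sketch; the paths and commands inside the
# heredoc are hypothetical, just illustrating the three sections).
mkdir -p .ai/tasks
cat > .ai/tasks/01-auth-service.md <<'EOF'
# Task 01: Auth service

## Files to modify
- src/auth/service.ts (new)
- src/index.ts (register the service)

## Expected code structure
- AuthService class exposing login() and logout()

## Verification steps
- npm test -- auth
- npm run lint
EOF
```

Keeping each task this small is what lets the agent implement it without re-deriving the whole plan.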
tasks-done/
Finished task files are moved here, so tasks/ always contains only pending work.
repo-map.md
A lightweight overview of the project structure. Instead of scanning the entire repository repeatedly, the model can rely on this map to understand the codebase.
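Setting up the workspace is a one-time step. A minimal sketch that creates the directories and seeds repo-map.md with a shallow directory listing (adjust the `find` depth and filters to your project):

```shell
# Create the .ai/ workspace and seed a first-pass repo map.
mkdir -p .ai/tasks .ai/tasks-done
touch .ai/plan.md .ai/context.md

# Seed repo-map.md with a two-level directory listing as a starting
# point; the agent (or you) can refine it with descriptions later.
{
  echo "# Repository map"
  find . -maxdepth 2 -type d -not -path '*/.git*' -not -path './.ai*' | sort
} > .ai/repo-map.md
```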
Adding Workflow Files (CLAUDE.md & AGENTS.md)
To make this work, you need to give your agent a set of instructions. Claude Code automatically reads a file called CLAUDE.md in the project root. Similarly, OpenAI Codex natively reads an open standard file called AGENTS.md.
In these files, you define how the agent should behave.
For example:
Planner:
- create implementation plan
- write plan to .ai/plan.md
- generate tasks
Coder:
- implement one task at a time
- modify only files required for that task
Tester:
- run tests and lint checks
Reviewer:
- review the diff and improve code quality
The key rule is simple:
Never implement the entire feature in a single loop. Always work task-by-task.
Of course, you have to tell Claude or Codex to work that way. So when you ask for a feature, follow the request with a prompt like this:
Based on my request, create or update .ai/plan.md and create task files in .ai/tasks/. Do not implement anything yet. Stop after the plan and tasks are created. Use a timestamp prefix “YYYYMMDD-HHMM-” to ensure unique filenames.
Afterwards, just feed it tasks:
Implement the tasks from .ai/tasks/ sequentially. Only modify files required for the current task. Complete one task, run the verification steps, then stop.
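The "feed it tasks" step can itself be scripted. A minimal driver-loop sketch, with the agent invocation stubbed out as `echo` so the control flow is clear; swap in your real CLI (e.g. Claude Code's non-interactive print mode, `claude -p "<prompt>"`):

```shell
# Hand the agent exactly one task per invocation, then archive it.
run_agent() { echo "AGENT: $1"; }   # stub; replace with your agent CLI

mkdir -p .ai/tasks .ai/tasks-done
for task in .ai/tasks/*.md; do
  [ -e "$task" ] || continue        # glob matched nothing: no pending tasks
  run_agent "Implement only the task in $task. Modify only the files it lists, run its verification steps, then stop."
  mv "$task" .ai/tasks-done/        # archive the finished task
done
```

Because tasks/ holds only pending work and tasks-done/ holds the archive, the loop is naturally resumable across sessions.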
I published a small bootstrap script that sets up this universal workflow (generating both CLAUDE.md and AGENTS.md) automatically in any repository: https://github.com/nils-fl/AgenticCodingInit
Why This Saves Tokens
Token usage usually explodes during execution loops, not during planning.
Typical distribution:
- Planning → small
- Execution loops → very large
- Review → small
By forcing the model to:
- implement one task at a time
- avoid scanning the full repo
- stop after each step
we eliminate most of the expensive loops.
A Surprising Side Effect: Better Structure
Another benefit is that development becomes far more organized.
Instead of chaotic agent behavior, the AI behaves like a small team:
planner → coder → tester → reviewer
The .ai/tasks files also serve as a lightweight project plan.
Even when switching contexts or sessions, you always know:
- what task is next
- what files should change
- how to verify the result
Hybrid Workflows With Multiple Models
Because this structure relies on standardized markdown files rather than a single tool’s internal memory, it opens the door to something powerful: multi-agent and multi-model development workflows.
You can seamlessly switch between Anthropic and OpenAI depending on the task. For example:
- Terminal 1 (Claude Code): Planning, complex architectural reasoning, and code review.
- Terminal 2 (OpenAI Codex): Brute-force execution, test writing, or handling a concurrent background task.
To take this even further, you can pair tools like LiteLLM with OpenRouter. This setup acts as a universal proxy, letting your agents route requests to dozens of different models (such as DeepSeek, Gemini, or Meta’s Llama) through a standard OpenAI-compatible API. By pointing your CLI tools at the proxy, you can dynamically choose the cheapest, fastest, or most capable model for each specific sub-task in your .ai/tasks/ folder, sidestepping vendor lock-in and per-provider rate limits. And because the tasks live as plain text in .ai/tasks/, any agent or model can read and work on the exact same plan, so you can always use the best (or most cost-effective) model for each job, reducing both token usage and rate-limit bottlenecks.
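How the redirection works in practice depends on your tools, but for anything that speaks the OpenAI-compatible API it typically comes down to two environment variables. A sketch, assuming a LiteLLM proxy running locally on its default port 4000; whether a given CLI honors OPENAI_BASE_URL is something to verify for that specific tool:

```shell
# Point OpenAI-compatible clients at a local LiteLLM proxy instead of
# the vendor's API. The key below is a placeholder, not a real secret.
export OPENAI_BASE_URL="http://localhost:4000"
export OPENAI_API_KEY="sk-your-litellm-master-key"
```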
The Result
With this setup:
- Claude Code and Codex usage becomes far more predictable
- token consumption drops significantly
- development becomes more structured
- hybrid multi-model workflows become easy
Instead of a single AI agent running uncontrolled loops, you effectively get a small AI development team working through clearly defined tasks.
And the best part: the entire setup takes only a few files and a simple repository convention. If you’re using Claude Code or OpenAI Codex heavily, it’s worth trying this approach. A small change in workflow can make a surprisingly big difference in both cost and productivity.