
What is Context Length in LLMs?


An LLM’s context window can be thought of as the equivalent of its working memory. It determines how long a conversation the model can carry on without forgetting details from earlier in the exchange, and it sets the maximum size of documents or code samples the model can process at once.
Dave Bergmann, Senior Writer, AI Models, IBM
Image: Token window in LLMs determines how much context is available for generation.

What is Context Length?

Imagine you're trying to explain a complex idea to someone, but you can only use a limited number of words. In the world of Large Language Models (LLMs), this limitation is known as context length. It's the maximum number of tokens (think of tokens as pieces of words or characters) that a model can process in a single input.
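
To make tokens concrete, here is a minimal sketch using tiktoken, OpenAI's open-source tokenizer library (the sample string and the choice of the cl100k_base encoding are illustrations, not a requirement):

```python
# A minimal sketch (pip install tiktoken) showing how a string breaks into tokens.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

text = "Context length is measured in tokens, not words."
token_ids = enc.encode(text)

print(len(token_ids))                        # token count for this string
print([enc.decode([t]) for t in token_ids])  # the individual token pieces
```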

For example, the original GPT-3 had a context length of 2,048 tokens, GPT-3.5 models doubled that to 4,096, and newer models like GPT-4 Turbo can handle up to 128,000 tokens. This means GPT-4 Turbo can consider much more information at once: paragraphs, entire documents, or even multi-page conversations.
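Because limits vary so much between models, it helps to count tokens before sending a prompt. A minimal sketch of such a check (the limits dictionary and the 500-token reply reserve are illustrative assumptions, not an official API; always confirm limits in your provider's docs):

```python
import tiktoken

# Illustrative limits matching the figures quoted above.
CONTEXT_LIMITS = {"gpt-3.5": 4_096, "gpt-4-turbo": 128_000}

def fits_in_context(prompt: str, limit: int, reserve_for_reply: int = 500) -> bool:
    """True if the prompt, plus room for the model's reply, fits in the window."""
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(prompt)) + reserve_for_reply <= limit

long_doc = "word " * 50_000  # stand-in for a long document (~50k tokens)
print(fits_in_context(long_doc, CONTEXT_LIMITS["gpt-3.5"]))      # False
print(fits_in_context(long_doc, CONTEXT_LIMITS["gpt-4-turbo"]))  # True
```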

Why Does Context Length Matter?

The context length determines how much the model can remember or 'see' at once. More context means:

  • Fewer hallucinations, since more of the relevant source material fits in the prompt
  • Better long-range coherence
  • Richer grounding for complex tasks (summarization, question answering, etc.)

But there are limits. Models don't 'remember' beyond this window — once tokens fall outside it, that information is no longer available.
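
Chat applications typically handle this by trimming the oldest turns first, mimicking how tokens fall out of the window. A rough sketch, assuming a plain list of message strings and a fixed token budget:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def trim_history(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages that fit in the budget; older ones fall away,
    just as they would fall out of a model's context window."""
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):      # walk from newest to oldest
        n_tokens = len(enc.encode(msg))
        if total + n_tokens > max_tokens:
            break                       # this message and everything older is dropped
        kept.append(msg)
        total += n_tokens
    return list(reversed(kept))         # restore chronological order
```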

Challenges:

  • Memory & Latency: Attention compute and memory grow with sequence length, so long contexts increase inference cost and latency.
  • Training Exposure: Most models weren't trained on 100k-token sequences. Even if they accept long inputs, they might not use them effectively.
  • Lossy Condensing: Tools like summarization or retrieval are often needed to squeeze relevant info into the window (see the sketch after this list).
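
A common version of that workaround is to split a long document into overlapping token-sized chunks, then summarize or retrieve only the relevant ones. A minimal sketch, with the chunk size and overlap chosen arbitrarily for illustration:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def chunk_by_tokens(text: str, chunk_size: int = 1_000, overlap: int = 100) -> list[str]:
    """Split text into overlapping token windows. Each chunk can then be
    summarized or embedded for retrieval, so that only the relevant pieces
    need to fit into the model's context window."""
    token_ids = enc.encode(text)
    step = chunk_size - overlap
    return [
        enc.decode(token_ids[start:start + chunk_size])
        for start in range(0, len(token_ids), step)
    ]
```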

Real-World Use Cases:

  • Long-form Q&A or tutoring across multi-turn dialogue
  • Enterprise document analysis (contracts, compliance, legal discovery)
  • Coding assistants dealing with entire files or repos

Understanding context length is key to building LLM applications that are grounded, reliable, and able to scale with complex inputs.

FAQ

What is a token in LLMs?
A token is a chunk of text — usually a few characters or part of a word — that the model processes. 'ChatGPT' is one word, but might be two or three tokens.
Why do context limits exist?
Transformer-based models process all tokens at once, so the longer the input, the more memory and compute are needed. That’s why there's a cap.
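A back-of-envelope illustration of that growth: vanilla self-attention scores every token against every other token, so the work grows quadratically with input length (real systems optimize this heavily, but the scaling pressure remains):

```python
# The attention matrix alone has n * n entries per head, per layer.
for n in (4_096, 32_768, 128_000):
    print(f"{n:>7} tokens -> {n * n:>18,} attention entries per head, per layer")
```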
What happens when LLM context length is exceeded?
Most models will truncate the beginning of the input or return an error. In either case, part of your prompt or data is ignored.
How is LLM context length different from memory?
Context length is short-term — it’s what the model sees in the current input. Memory (like in AI agents) means remembering information across sessions or conversations.
