An LLM’s context window can be thought of as its working memory. It determines how long a conversation the model can carry on without forgetting details from earlier in the exchange, and it sets the maximum size of the documents or code samples it can process at once.
Token window in LLMs determines how much context is available for generation.
What is Context Length?
Imagine you're trying to explain a complex idea to someone, but you can only use a limited number of words. In the world of Large Language Models (LLMs), this limitation is known as context length. It's the maximum number of tokens (think of tokens as pieces of words or characters) that a model can process in a single input.
For example, GPT-3 has a context length of 4,096 tokens, while newer models like GPT-4 Turbo can handle up to 128,000 tokens. This means GPT-4 Turbo can consider much more information at once — long documents, large code files, or multi-turn conversations spanning many pages.
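To make those numbers concrete, here is a minimal sketch of counting tokens with the tiktoken library and checking the count against a window size. The library choice, the encoding name, and the 128,000-token budget (taken from the GPT-4 Turbo figure above) are illustrative assumptions; use whatever tokenizer matches your model.

```python
# Sketch: measuring how much of a context window a piece of text consumes.
# Assumes `pip install tiktoken`; the encoding name and limit are examples, not prescriptions.
import tiktoken

CONTEXT_LIMIT = 128_000  # example budget: GPT-4 Turbo's advertised window

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Return how many tokens `text` occupies under the given encoding."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

document = "Imagine you're trying to explain a complex idea to someone..."
n_tokens = count_tokens(document)
print(f"{n_tokens} tokens used, {CONTEXT_LIMIT - n_tokens} left in the window")
```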
The context length determines how much the model can remember or 'see' at once. More context means the model can follow longer conversations without losing earlier details, work through larger documents or code samples in a single pass, and draw on more supporting material when generating an answer.
But there are limits. Models don't 'remember' beyond this window — once tokens fall outside it, that information is no longer available.
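In practice, applications deal with this limit by trimming what they send. Below is a rough sketch of one common approach: dropping the oldest turns of a conversation until the remainder fits the window. The message format, the 4,096-token budget, and the use of tiktoken for counting are assumptions for illustration, not a specific provider's API.

```python
# Sketch: keeping a running conversation inside a fixed context window
# by discarding the oldest messages first.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

def trim_history(messages: list[dict], max_tokens: int = 4_096) -> list[dict]:
    """Drop messages from the start of the conversation until the total fits the window."""
    def total_tokens(msgs: list[dict]) -> int:
        return sum(len(encoding.encode(m["content"])) for m in msgs)

    trimmed = list(messages)
    while trimmed and total_tokens(trimmed) > max_tokens:
        trimmed.pop(0)  # the oldest turn falls outside the window and is forgotten
    return trimmed
```

Whatever is trimmed away is simply gone from the model's point of view, which is why long-running assistants often pair truncation with summaries of earlier turns.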
Understanding context length is key to building LLM applications that are grounded, reliable, and able to scale with complex inputs.