What is Temperature in LLMs?
Temperature is a setting that controls how random a language model's output is. Low temperature makes the model predictable and focused; high temperature makes it more creative and varied. It's one of the most useful knobs when working with LLMs.
How It Works:
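- The model produces a raw score (logit) for each possible next token
- Temperature divides those scores before they are converted to probabilities and a token is sampled
- Low temperature (near 0): The distribution sharpens, so the most likely token is almost always picked
- High temperature (above 1): The distribution flattens, so less likely tokens get a real chance; a temperature of exactly 1 leaves the model's distribution unchanged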
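The effect is easier to see with numbers. The sketch below is a minimal illustration, not any particular library's implementation: it assumes the model's raw scores (logits) are divided by the temperature and then passed through a softmax. The function name `softmax_with_temperature` and the example scores are made up for illustration, and treating temperature 0 as "always pick the top token" is a common convention assumed here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by temperature."""
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy decoding,
        # putting all probability on the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it shows the top token's share shrinking as the temperature rises, which is the "less likely tokens get a real chance" behavior described above.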
Choosing a Value:
- 0 – 0.3: Factual answers, code, extraction, classification
- 0.4 – 0.7: Balanced, general-purpose responses
- 0.8 – 1.2: Brainstorming, creative writing, varied ideas
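If you set temperature in code, one way to keep these choices consistent is a small lookup. This is only a sketch: the task labels, the `TEMPERATURE_RANGES` table, and the `suggest_temperature` helper are hypothetical names, and starting from the midpoint of each range is an assumption rather than a rule.

```python
# Hypothetical task labels mapped to the (low, high) ranges listed above.
TEMPERATURE_RANGES = {
    "factual": (0.0, 0.3),   # factual answers, code, extraction, classification
    "general": (0.4, 0.7),   # balanced, general-purpose responses
    "creative": (0.8, 1.2),  # brainstorming, creative writing, varied ideas
}


def suggest_temperature(task):
    """Return a starting temperature: the midpoint of the task's range."""
    low, high = TEMPERATURE_RANGES[task]
    return round((low + high) / 2, 2)


for task in TEMPERATURE_RANGES:
    print(task, suggest_temperature(task))
```

Treat the result as a starting point and adjust after looking at real outputs.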
Related Settings:
- Top-p (nucleus sampling): Limit choices to the smallest set of most likely tokens whose probabilities add up to at least p
- Top-k: Only consider the k most likely tokens
- Max tokens: Cap the length of the response
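Top-k and top-p can be sketched as filters applied to the probability list before sampling. The code below is an illustrative assumption of how such filters work, not a specific library's API: `top_k_filter` and `top_p_filter` are hypothetical names, and both renormalize the surviving probabilities so they sum to 1. Max tokens is left out because it only caps how many tokens are generated.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept = [probs[i] if i in keep else 0.0 for i in range(len(probs))]
    total = sum(kept)
    return [q / total for q in kept]


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break  # the nucleus is complete once the threshold is reached
    kept = [probs[i] if i in keep else 0.0 for i in range(len(probs))]
    total = sum(kept)
    return [q / total for q in kept]


# Hypothetical next-token probabilities.
probs = [0.5, 0.3, 0.15, 0.05]
print(top_k_filter(probs, 2))
print(top_p_filter(probs, 0.9))
```

Note the difference in behavior: top-k always keeps a fixed number of tokens, while top-p keeps more tokens when the model is uncertain and fewer when one token dominates.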
FAQ
Should I always use temperature 0 for accuracy?
For deterministic, factual tasks, low temperature helps. But temperature 0 doesn't guarantee correctness; it only reduces randomness, so a wrong answer simply becomes a more repeatable one. Grounding and verification still matter.
What's the difference between temperature and top-p?
Temperature reshapes the whole probability distribution, while top-p truncates it to the most probable tokens. They can be used together, but tuning one at a time is usually clearer.