What is Temperature in LLMs?

Temperature is a setting that controls how random or deterministic a language model's output is, trading off creativity against predictability.

Low temperature makes the model predictable and focused; high temperature makes it more creative and varied. It's one of the most useful knobs when working with LLMs.

How It Works:

  1. The model produces a raw score (logit) for each possible next token
  2. Temperature divides those scores before they are converted to probabilities and a token is sampled
  3. Low temperature (~0): Almost always pick the most likely token
  4. High temperature (~1+): Give less likely tokens a real chance
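
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation: the function names and the three example scores are made up, and it assumes the common formulation in which raw scores are divided by the temperature before a softmax turns them into probabilities.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Divide raw scores by the temperature, then normalize to probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; handle 0 as greedy argmax")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng=random):
    """Return a token index; temperature 0 falls back to the most likely token."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Running it shows the top token taking nearly all the probability at 0.2 and the three tokens moving closer together at 2.0. Treating temperature 0 as a plain argmax is a choice made here to avoid dividing by zero.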

Choosing a Value:

  • 0 – 0.3: Factual answers, code, extraction, classification
  • 0.4 – 0.7: Balanced, general-purpose responses
  • 0.8 – 1.2: Brainstorming, creative writing, varied ideas

Related Settings:

  • Top-p (nucleus sampling): Limit choices to the smallest set of tokens whose probabilities add up to p
  • Top-k: Only consider the k most likely tokens
  • Max tokens: Cap the length of the response
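
To make the two sampling filters concrete, here is a small sketch that applies top-k and top-p to a list of probabilities. The helper names and the example distribution are hypothetical, and real inference libraries may differ in details such as tie-breaking or whether the token that crosses the threshold is kept (this sketch keeps it). Max tokens is left out because it limits output length rather than changing how a token is chosen.

```python
def _renormalize(probs, keep):
    """Zero out every index not in `keep` and rescale the rest to sum to 1."""
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, order[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose probabilities reach p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)

# Hypothetical next-token probabilities for four candidates.
probs = [0.5, 0.3, 0.15, 0.05]
print(top_k_filter(probs, 2))
print(top_p_filter(probs, 0.9))
```

With this distribution, top-k with k=2 keeps only the first two tokens, while top-p with p=0.9 also keeps the third because the first two sum to less than 0.9.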

FAQ

Should I always use temperature 0 for accuracy?

For deterministic, factual tasks, low temperature helps. But temperature 0 doesn't guarantee correctness; it only reduces randomness, so the model can still give the same wrong answer every time. Grounding and verification still matter.

What's the difference between temperature and top-p?

Temperature reshapes the whole probability distribution, while top-p truncates it to the most probable tokens. They can be used together, but tuning one at a time is usually clearer.
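
The contrast can be seen side by side in a short sketch. It assumes temperature is applied first and top-p second, which is a common ordering but not one the text above specifies; the function names and scores are invented for illustration.

```python
import math

def apply_temperature(logits, temperature):
    """Reshape the distribution: every token keeps a nonzero probability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def apply_top_p(probs, p):
    """Truncate the distribution: drop the tail beyond cumulative probability p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

# Hypothetical scores for four candidate tokens.
logits = [3.0, 2.0, 1.0, 0.0]
baseline = apply_temperature(logits, 1.0)
reshaped = apply_temperature(logits, 1.5)   # flatter, but nothing removed
truncated = apply_top_p(baseline, 0.9)      # same shape, tail removed
combined = apply_top_p(reshaped, 0.9)       # both settings together

for name, dist in [("baseline", baseline), ("reshaped", reshaped),
                   ("truncated", truncated), ("combined", combined)]:
    print(name, [round(x, 3) for x in dist])
```

Because the two settings interact (a flatter distribution can change which tokens survive the top-p cut), changing one while holding the other fixed makes the effect of each easier to read.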
