What is Temperature in LLMs?

Temperature is a setting that controls how random a language model's output is. Low temperature makes the model predictable and focused; high temperature makes it more creative and varied. It's one of the most useful knobs when working with LLMs.

How It Works:

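The model produces a score (logit) for each possible next token, and a softmax turns those scores into probabilities
Temperature divides the scores before the softmax, sharpening or flattening the distribution before a token is sampled
Low temperature (near 0): Almost always pick the most likely token
High temperature (1 and above): Give less likely tokens a real chance; a value of exactly 1 leaves the model's distribution unchanged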
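
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation: it assumes the usual formulation in which the model's raw scores (logits) are divided by the temperature before a softmax, and it treats temperature 0 as plain greedy selection. The function names and the example scores are hypothetical.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores (logits) into probabilities after temperature scaling."""
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy decoding,
        # putting all probability on the top-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng=random):
    """Pick a token index at random, weighted by the scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Running the loop shows the top token's share shrinking as the temperature rises, which is the "less likely tokens get a real chance" effect described above.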

Choosing a Value:

0 – 0.3: Factual answers, code, extraction, classification
0.4 – 0.7: Balanced, general-purpose responses
0.8 – 1.2: Brainstorming, creative writing, varied ideas

Related Settings:

Top-p (nucleus sampling): Limit choices to the smallest set of tokens whose probabilities add up to p
Top-k: Only consider the k most likely tokens
Max tokens: Cap the length of the response
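
The two sampling filters in this list can be sketched as follows. This is a simplified illustration that works on an already-computed probability list; real inference libraries typically apply the same idea to logits and may differ in tie-breaking and edge cases. The function names and example probabilities are hypothetical. Max tokens is left out because it limits output length rather than changing which token is sampled.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose probabilities add up to at least p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    filtered = [q if i in kept else 0.0 for i, q in enumerate(probs)]
    total = sum(filtered)
    return [q / total for q in filtered]

# Hypothetical next-token probabilities, most likely first.
probs = [0.5, 0.3, 0.15, 0.05]
print(top_k_filter(probs, 2))    # only the two most likely tokens survive
print(top_p_filter(probs, 0.9))  # the least likely token is dropped
```

Both filters remove the tail of the distribution and renormalize what is left, so a token is then sampled only from the surviving candidates.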

FAQ

Should I always use temperature 0 for accuracy?

For factual tasks where you want repeatable output, low temperature helps. But temperature 0 doesn't guarantee correctness; it only reduces randomness, so the model will repeat a wrong answer just as consistently as a right one. Grounding and verification still matter.

What's the difference between temperature and top-p?

Temperature reshapes the whole probability distribution, while top-p truncates it to the most probable tokens. They can be used together, but tuning one at a time is usually clearer.
