AI Crash Course: Tokens, Prediction and Temperature

We often describe AI models as “thinking,” but what is actually happening when an AI model “thinks”? When it’s drafting a response to us, how does it know what to say?


This content originally appeared on Telerik Blogs and was authored by Kathryn Grayson Nanz

We often describe AI models as “thinking,” but what is actually happening when an AI model “thinks”? When it’s drafting a response to us, how does it know what to say?

One of the most tempting (and common) misunderstandings related to AI models is the perception that they “think”—or have awareness of any kind, for that matter.

This is primarily a language problem: we (meaning humans) like to use the words and experiences that we’re most familiar with as a shorthand to communicate complex ideas. After all, how many times have you seen a webpage slowly loading and heard someone say “hang on, it’s thinking about it”? We “wake” computers up from being in “sleep” mode, we initiate network “handshakes,” we get annoyed with memory-”hungry” programs.

In the same way, we often describe AI models as “thinking,” sometimes even including the directive to “take as much time as you need to think about this” when prompting them! But what is actually happening when an AI model “thinks”? When it’s drafting a response to us, how does it know what to say?

The short answer is that AI models (especially text-focused LLMs, which we’ll use as the example for the rest of this article) are highly advanced token prediction machines. They use neural networks (a type of machine learning algorithm) to identify patterns across large contexts. Based on decades of research about how sentences are structured in a given language (like the prevalence of various words, and the statistical likelihood that one specific word will follow another), modern AI models are able to combine tokens into words, and then words into sentences.

Predictive Language Models

For the long answer … we actually have to start all the way back in the 1940s. Cryptography and cypher-breaking technology was developing at a breakneck pace in an attempt to intercept and decrypt enemy communications during WWII. If you could recognize and crack even one or two letters in an enciphered communication, these new predictive methods could be used to help determine what the other letters were likely to be.

For example, in English “E” is the most commonly used letter, and “T” and “H” are often used together. If we know that one letter in a word is “T,” we can calculate the likeliness that the next letter will be “H” (spoiler alert: it’s pretty high). This same probability calculation can be extended from letters to words, from words to phrases, and from phrases to sentences. If you’re interested in the true deep dive, you can still read the 1950 paper published about these learnings: “Prediction and Entropy of Printed English” (which, by the way, is where those earlier facts about “E,” “T” and “H” come from). If you want the overview, watch The Imitation Game (actually, just watch The Imitation Game anyway; it’s a great movie).

Fast-forward to today: computers have offered us ways to analyze huge amounts of language data in ways that were simply not available in the 1950s. Our knowledge on this topic and our ability to predict content has only gotten better over the last 70+ years.

When we’re training large language models (LLMs), most of what we’re doing is giving them these huge samples of language—which, in turn, allows them to leverage these predictive models to more accurately identify and generate specific word, phrase and sentence combinations. You can think of it like the predictive text on your smartphone, but with the dial turned up to 1000 because it’s not just looking at samples of how you text, it’s looking at millions of samples demonstrating various ways that humans have communicated in a given language over hundreds of years.

Tokens

However, it would be a bit of a misrepresentation to say that LLMs are “thinking” in words. In fact, LLMs process language via tokens which can be (but aren’t always) entire words. Tokens are the smallest units that a given language can be broken down into by a model.

If you’re familiar with design systems, you might have heard of design tokens. Design tokens are the smallest values in a design system: hex colors, font sizes, opacity percentages and so on. In the same way, language tokens can be thought of as the smallest pieces that words can be broken down into. This is commonly aligned with prefixes, suffixes, root words, possessives, contractions, etc., but can also include units that aren’t necessarily based on human language structure.

This is done for both flexibility and efficiency: for example, if you can train an English-based model to recognize “draw” and “ing,” then you don’t have to explicitly teach it “drawing.” The same idea can be extended to things like “has” or “should” + “n’t” and “make” or “teach” + “er.” This can also help it make “educated guesses” at user input words that weren’t included in its training material. So if a user says they’re “regoogling” something, the LLM can identify the prefix “re-”, the name “Google” and the suffix “-ing” and cobble together something reasonably close to a working definition.

Because of the intrinsic role they play in AI functionality, tokens have become one of the primary ways we measure various AI models. Tokens are used to measure the data that models are trained on (total tokens seen during training), how much a model can process at a given time (known as the context window), and—as you already know if you’re a developer building apps that integrate with popular foundation models—API usage (both input and output) for the purposes of monetization.

Temperature

Adjusting these predictive computations that determine which tokens are most likely to follow other tokens is also part of how we can shape the model’s responses. The temperature of an AI model refers to how often the model will choose tokens that are less statistically likely.

A model with a low temperature is more conservative; when selecting the next word in its predictive text chain, it will choose options that have a higher percentage of occurrence. For instance, a low temperature model would be far more likely to say “My favorite food is pizza” than “My favorite food is tteokbokki,” assuming it was trained on data where “pizza” followed the words “My favorite food is” 70% of the time and “tteokbokki” only followed 15% of the time. Increasing the temperature of the model increases the percentage of times the model will choose the less-popular token by flattening the probability distribution; lowering the temperature sharpens the distribution, making less-common responses less likely.

To be clear, these are made up statistics for the purpose of illustration—if we aren’t training a model ourselves, we cannot know what the actual percentage of occurrence is for these kinds of things (unless the people doing the training offer to share that information, which is rare).

A model with a low temperature is more predictable, whereas a model with a high temperature will be more novel—but also more prone to mistakes. As IBM says: “A high temperature value can make model outputs seem more creative but it's more accurate to think of them as being less determined by the training data.”

Ultimately, the temperature of the model should be determined based on its purpose and acceptable room for error. If you’re using an AI model in a professional application to answer questions about a company’s products, you probably want a very low temperature; the tolerance for error in that situation is low, and you don’t want the AI to offer less-common results. However, if you’re using a model personally to help you brainstorm D&D campaign ideas, a higher temperature could offer you less common suggestions (plus, you’re probably less bothered in this situation by results that don’t make sense).

Regardless of temperature, however, it’s important to acknowledge that if content is included in the training data, there’s some chance (no matter how low) that it will be selected for inclusion in a model’s response. Even with a very low temperature model, there’s still a non-zero chance that it will choose the less popular answer. Why not just always set models at the most conservative temperature? Mostly because, at that point, we could just program a set of dedicated responses—most users of LLMs (and generative AI models, in general) want the “intelligence” that comes with not getting exactly the same answer every time. After all, LLMs aren’t retrieving sentences from training data via a lookup-table; their primary benefit is in their ability to generate new sequences token-by-token based on what they’ve “learned.”

Bias

Finally, it’s worth noting that this also plays into how bias occurs in AI systems. To return to the food example we used when discussing temperature: it’s entirely possible for us to curate a dataset in which “tteokbokki” occurs more often than “pizza” and then train a model on that. In that case, if we were to ask the model about the food most people like the best, it would be more likely to say “tteokbokki” even though that’s (probably) not reflective of the general population.

Obviously, this is less of a concerning issue if we’re just talking about food—but more concerning for issues related to sex, gender, race, disability and more. If a model is trained on data where doctors are more often referred to with he/him pronouns, it will in turn be more likely to return content identifying doctors as male. If slurs or hate speech are included in significant percentages, that content will be returned by the model at a level reflective of its training data (unless actively mitigated, as described below). This can be further reinforced by feedback and responses from users that are referenced by the model as context or in post-training.

As you might imagine, this is a common issue for models trained on information scraped from the internet: from chat logs, message boards, forums and more. It is possible to counteract this by excluding harmful content from the training data or by including data that intentionally balances occurrences of specific content (i.e., including the phrases “She is a doctor.” and “They are a doctor.” at equal percentages to “He is a doctor.”). It can also (sometimes) be filtered on the output side, by building in checks for specific words and prompting the model to re-create the response if it includes forbidden content. However, this must be an intentional choice implemented by those responsible for creating the training data and maintaining the model.


This content originally appeared on Telerik Blogs and was authored by Kathryn Grayson Nanz


Print Share Comment Cite Upload Translate Updates
APA

Kathryn Grayson Nanz | Sciencx (2026-02-26T00:24:03+00:00) AI Crash Course: Tokens, Prediction and Temperature. Retrieved from https://www.scien.cx/2026/02/26/ai-crash-course-tokens-prediction-and-temperature/

MLA
" » AI Crash Course: Tokens, Prediction and Temperature." Kathryn Grayson Nanz | Sciencx - Thursday February 26, 2026, https://www.scien.cx/2026/02/26/ai-crash-course-tokens-prediction-and-temperature/
HARVARD
Kathryn Grayson Nanz | Sciencx Thursday February 26, 2026 » AI Crash Course: Tokens, Prediction and Temperature., viewed ,<https://www.scien.cx/2026/02/26/ai-crash-course-tokens-prediction-and-temperature/>
VANCOUVER
Kathryn Grayson Nanz | Sciencx - » AI Crash Course: Tokens, Prediction and Temperature. [Internet]. [Accessed ]. Available from: https://www.scien.cx/2026/02/26/ai-crash-course-tokens-prediction-and-temperature/
CHICAGO
" » AI Crash Course: Tokens, Prediction and Temperature." Kathryn Grayson Nanz | Sciencx - Accessed . https://www.scien.cx/2026/02/26/ai-crash-course-tokens-prediction-and-temperature/
IEEE
" » AI Crash Course: Tokens, Prediction and Temperature." Kathryn Grayson Nanz | Sciencx [Online]. Available: https://www.scien.cx/2026/02/26/ai-crash-course-tokens-prediction-and-temperature/. [Accessed: ]
rf:citation
» AI Crash Course: Tokens, Prediction and Temperature | Kathryn Grayson Nanz | Sciencx | https://www.scien.cx/2026/02/26/ai-crash-course-tokens-prediction-and-temperature/ |

Please log in to upload a file.




There are no updates yet.
Click the Upload button above to add an update.

You must be logged in to translate posts. Please log in or register.