How AI works · part 1 of 4

What these tools actually are

A prediction engine, and a product built around it. The distinction carries everything that follows.

Most explanations of how these tools work are either too simplistic or written for engineers. This section explains just enough to use them well — no maths, no jargon — then stops.

If you understand this much, you will make better decisions than most people using these tools, including many who build with them.

The prediction engine

A large language model (an "LLM", the engine inside ChatGPT, Claude, Gemini and the rest) is a program that does one thing: it predicts the next word. You give it some text, and it guesses what is most likely to come next, then the next, then the next.

Strictly speaking, it predicts tokens, not words. A token is usually a word or part of a word. That sounds like a technicality, but it is one reason a model can stumble on spelling, exact word counts, citations, section numbers and number formats: it works with these chunks rather than with individual letters and digits, so it does not read text quite the way you do.
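For readers who like to see the idea in code, here is a toy sketch of splitting words into tokens. The vocabulary and the longest-match rule are assumptions made up for illustration; real tokenisers are built differently and use far larger vocabularies.

```python
# Hypothetical mini-vocabulary, invented for this example only.
VOCAB = {"straw", "berry", "un", "believ", "able"}

def tokenize(word, vocab=VOCAB):
    """Split a word into the longest known pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, then shorter ones.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("strawberry"))    # ['straw', 'berry']
print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
```

In this sketch "strawberry" arrives as two chunks rather than ten letters, which hints at why a question about the letters inside a word can trip a model up.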

The model learned to predict by reading an enormous amount of text (much of the public internet, a great many books, and more), adjusting billions of internal settings until it became very good at the prediction game. Nobody programmed in the rules of grammar or the facts of the world. It absorbed patterns from the text until it could continue almost any passage plausibly.
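The prediction game can be sketched in a few lines. This is a deliberately tiny stand-in, not how a real model is built: it counts which word followed which in a made-up snippet of text, then always picks the most frequent follower. A real model uses billions of learned settings over tokens and usually picks with some randomness, but the loop of "predict, append, predict again" has the same shape.

```python
from collections import Counter, defaultdict

def train(text):
    """Count, for each word, which words followed it in the training text."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def continue_text(follows, prompt, max_words=5):
    """Extend the prompt one word at a time with the likeliest next word."""
    words = prompt.split()
    if not words:
        return ""
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # nothing in the training text ever followed this word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Invented training text, for illustration only.
corpus = ("the court held that the contract was void . "
          "the court held that the claim failed .")
model = train(corpus)
print(continue_text(model, "the court", max_words=3))
# the court held that the
```

Nothing in the sketch knows what a court or a contract is. It continues the passage plausibly because of the patterns it counted, and for no other reason.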

The model and the product

That prediction engine is the core. But the chatbot you actually use is a product built around the model, and the product adds layers (a "wrapper") that the raw model does not have: instructions, safety rules, and tools such as web search, file upload, memory and connectors. Keep that distinction in mind through everything that follows: the model predicts text; the product around it adds the rest.
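One way to picture the wrapper is as a step that assembles extra text around your message before the model sees any of it. The sketch below is an assumption for illustration: the function name, the bracketed labels and the wording of the instructions are all hypothetical, and real products do this in their own ways.

```python
def build_model_input(user_message, saved_memory=(), search_results=()):
    """Assemble the text a hypothetical product hands to the model."""
    # Hypothetical product instructions; real ones are longer and vary.
    parts = ["[instructions] Be helpful. Follow the safety rules."]
    for note in saved_memory:
        parts.append(f"[memory] {note}")
    for result in search_results:
        parts.append(f"[web search] {result}")
    parts.append(f"[user] {user_message}")
    return "\n".join(parts)

print(build_model_input(
    "Summarise this clause.",
    saved_memory=["Prefers short answers."],
))
# [instructions] Be helpful. Follow the safety rules.
# [memory] Prefers short answers.
# [user] Summarise this clause.
```

The model then predicts a continuation of the whole assembled text. Everything above the last line came from the product, not from you and not from the model.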

The distinction matters. When people worry about what happens to the text they type in — whether it is stored, reviewed, or used to train future models — those are questions about the product and its terms, and they differ from plan to plan. When people are surprised that a model invents a citation, that is the engine doing exactly what it is built to do.

Knowing which layer you are dealing with tells you which question to ask.

What they can and cannot see

The model itself does not remember you just because you once chatted with it. But the product around it may add memory, chat-history search, project context or saved instructions. Check the product's settings before assuming a new chat is private, blank, or disconnected from your earlier work.

Within a single conversation the model can only "see" a limited amount of text at once: its "context window", which is finite and measured in tokens. When a conversation or document is too large to fit, the product may drop older text, summarise it, or keep only the pieces it judges relevant. That helps, but it is lossy: the model can still forget, miss or distort earlier material, which is why a long chat can start to contradict itself.
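The "drop older text" strategy can be sketched as follows. This is one simple possibility under stated assumptions: it counts words as a crude stand-in for tokens, and the function name and the budget of 13 are invented for the example. Real products may summarise or select text instead.

```python
def fit_to_window(messages, max_tokens):
    """Keep the most recent messages that fit within the token budget."""
    kept = []
    used = 0
    # Walk backwards from the newest message, stopping when the budget runs out.
    for message in reversed(messages):
        cost = len(message.split())  # assumption: one word counts as one token
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore oldest-to-newest order
    return kept

chat = [
    "My client's deadline is 3 March.",
    "Draft a short letter to the other side.",
    "Now make the tone firmer.",
]
print(fit_to_window(chat, max_tokens=13))
# ['Draft a short letter to the other side.', 'Now make the tone firmer.']
```

The first message, the one with the deadline, no longer fits and silently drops out. From that point the model cannot see it, however important it was.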

The most common mistake is to assume the model is aware of you, your matter, or the world as it changes. Unless a tool fetches something more, the model sees the text in front of it, in this conversation, and nothing else.

Why their knowledge has a cut-off

A deployed model version has a training cut-off: its internal settings reflect training data up to some point, which is why a model can be brilliant on settled matters and blank on last week's news. Providers can release newer versions, route you to a different model, or add live tools. But during an ordinary chat the model is not quietly absorbing new facts about the world into its settings.

Some tools now add a live-search step that fetches current information into the context window. That can make an answer more current, but it does not make it automatically correct: the same need to check the source still applies.

Where this comes from

This page is a plain-language summary. The technical detail sits in the providers' own documentation: for example, Google's introduction to large language models and Anthropic's documentation on the context window.

Last reviewed 10 June 2026