What is the difference between generative AI, large language models, and AI agents?

They describe three layers of the same stack. Generative AI is the broad category of systems built to create something new rather than classify or score existing data. A large language model is one type of generative AI that produces text by predicting the next token. An AI agent is what you get when you put a language model in a control loop, give it tools, and let it work toward a goal across multiple steps. Mixing the terms up leads to confused expectations about what a system can actually do.

What is generative AI?

Generative AI covers any AI system whose main job is to create new content rather than answer questions about data that already exists. That output might be text, an image, audio, code, or a protein structure. It differs from discriminative systems, which look at an input and sort it into a bucket, such as spam detection or fraud scoring. A search engine or a recommendation engine is not generative on its own, even when it uses generative AI to summarize results.

Are generative AI and large language models the same thing?

No. Generative AI is the broad category, and a large language model is one kind of generative AI that works with text. The category also includes diffusion models and generative adversarial networks for images, video, and audio, code generation systems, and scientific models that generate molecules or materials. What ties them together is that they create. What sets them apart is how they are built and what they are trained on.

Why do large language models hallucinate?

A language model was trained to produce text that looks like its training data, rather than text that is verified to be true. Truth and plausibility overlap a great deal when the training data is good, but they come apart in exactly the situations where accuracy matters most. So a model can state something false with complete confidence and perfect grammar. This is a property of how the systems work, which is why production uses feed models fresh, grounded information instead of trusting recall.

What is the difference between a chatbot and an AI agent?

A chatbot answers and an agent acts. Ask a chatbot for the weather in Paris and it will tell you it has no real-time data. Ask an agent with a weather tool the same question and it will call the API and return the forecast. The difference is the control loop and the tools. An agent can call external tools and work toward a goal across several steps of reasoning, while a plain chatbot only generates text from the context it was given.

An AI agent is a language model placed in a control loop and given two new powers: it can call external tools, and it can work toward a goal across multiple steps. A few pieces make it work in practice, including a model doing the reasoning, a set of described tools the model can call, some form of memory, a controller that runs the loop and enforces limits, and an environment the agent perceives and affects. The model on its own is a text machine. The loop and the tools are what let it do things in the world.

What does the reason, act, observe loop mean for an AI agent?

It describes the cycle at the center of an agent. You give the model a goal. It reasons about what to do next, then acts by asking to call a tool such as a web search, a database query, or a function in your code. The tool runs and the result comes back as new context, which the model observes. It then reasons again and either calls another tool or decides the goal is done. That loop is the difference between a system that answers and one that gets work done.

What is a training cutoff in a language model?

A training cutoff is the date where a model's knowledge of the world stops, set by when its training data ends. A model trained on data through a given month knows nothing about events after it. This is why a model cannot reliably answer questions about current events unless you supply fresh information at the moment you ask. It is one of the main reasons production systems connect models to live data sources.

AI 101: Generative AI, Agents, and LLMs

Q: What is a large language model?

A large language model is a statistical model trained to predict the next token in a piece of text, where a token is a small chunk such as a word, part of a word, or a punctuation mark. Given the tokens that came before, it assigns probabilities to every possible next token and picks one, then repeats. Scaled up to hundreds of billions of parameters and trained on a large body of text, that simple objective produces a system that can explain, translate, write in different styles, and hold a conversation.

Q: What is a context window in a large language model?

The context window is the limit on how much text a model can pay attention to at once. Everything the model knows about your specific problem has to fit inside it, including your instructions, the conversation so far, any documents you want it to consider, and the response it is producing. Context windows have grown from a few thousand tokens in early models to millions in current ones, but they remain finite. Filling them has real costs in both speed and money.

If you've read anything about AI in the past two years, you've probably seen three terms used as if they meant the same thing: generative AI, large language models, and agents. They don't. They describe three different layers of the same stack, and mixing them up leads to a lot of confused conversations, inflated expectations, and the occasional expensive mistake.

So let's sort them out. By the end of this piece, you should be able to define each one clearly, explain how they relate to each other, and spot why the difference actually matters.

Generative AI: the broadest term

Start at the top. Generative AI is the biggest umbrella of the three. It covers any AI system whose main job is to create something new, rather than classify, score, or predict things about data that already exists.

That distinction is worth slowing down on. For most of the history of applied machine learning, AI systems were built to answer questions about things you showed them. Is this email spam? How likely is this customer to cancel? Which products should we recommend? Those are discriminative tasks. The system looks at an input and puts it in a bucket.

Generative systems do something different. They produce something that wasn't there before. A paragraph of text. A photograph of a person who doesn't exist. A short tune. A block of code. A protein structure. The model has learned enough about the patterns in its training data to make new examples that plausibly belong to the same world.

This is why generative AI is a category, not a specific technology. It includes:

Large language models, which generate text.
Diffusion models and generative adversarial networks (GANs), which generate images, video, and audio.
Code generation systems, which are really just specialized text generation.
Scientific models that generate protein structures, molecules, or materials.

What ties them together is what comes out the other end: they create. What sets them apart is how they're built and what they're trained on.

It's also worth knowing what generative AI isn't. A search engine isn't generative on its own, even if it now uses generative AI to summarize results. A recommendation engine isn't either. It picks from things that already exist. A fraud-detection model isn't generative. A lot of the AI running quietly in production today is still discriminative, and that isn't going away.

Large language models: the engine behind most generative text

Within generative AI, the subcategory that gets the most attention right now is the large language model, or LLM. When most people say “AI” today, they usually mean an LLM, whether they know it or not.

At the most basic level, an LLM is a statistical model trained to predict the next token in a piece of text. A token is a small chunk of text: usually a word, part of a word, or a punctuation mark. Given the tokens that came before, the model assigns probabilities to every possible next token and picks one. Repeat that a few thousand times, and you get coherent paragraphs.

That description is accurate but also a little misleading. It's a bit like describing a movie as “a series of still pictures shown quickly.” True, but it leaves out why the thing is interesting. What makes LLMs remarkable is that this simple objective, scaled up to hundreds of billions of parameters (the internal dials the model learns during training) and fed a huge chunk of everything humans have ever written down, produces a system that can seem to reason, explain, write in different styles, translate, and hold a conversation.

A few properties of LLMs are worth knowing, even if you never plan to build one.

Context window. An LLM can only pay attention to so much text at once. That limit is called the context window. Everything the model “knows” about your specific problem has to fit inside it: your instructions, the conversation so far, any documents you want it to consider, and the response it's working on. Context windows have grown a lot in a short time, from a few thousand tokens in early models to millions in current ones, but they're still finite, and filling them up has real costs in both speed and money.

Training cutoff. An LLM's knowledge of the world stops at whatever date its training data stops. If the model was trained on data through, say, last April, it doesn't know anything about what happened in May. This is why LLMs can't reliably answer questions about current events unless you feed them fresh information at the moment you ask.

Non-determinism. Ask an LLM the same question twice and you'll often get two different answers. This is partly because the model samples from a probability distribution, and partly because of how modern hardware runs these computations in parallel. If you need reproducible outputs, you have to plan for this.

Hallucination. LLMs will, with total confidence and perfect grammar, tell you things that aren't true. This isn't really a bug. The model was trained to produce text that looks like its training data, not text that is true. Truth and plausibility overlap a lot when the training data is good, but they're not the same thing, and they come apart in exactly the situations where it matters most.

The LLM landscape splits roughly into closed models, built and operated by particular companies and used through APIs (application programming interfaces, the standard way software talks to other software over the internet), and open models, where the company releases the model itself for anyone to download and run. That download is a giant file of numbers, called the model's weights, which are basically the memory of everything the model learned during training. With the weights, you can run the model yourself, assuming you have the hardware for it. The closed group includes most of the frontier models you've heard about, meaning the most capable systems available at any given moment. The open group has come a long way and now includes models that get close to the frontier, if you've got the infrastructure to run them.

Agents: what happens when a model gets tools and a goal

Flowchart of an agent control loop. The controller assembles the prompt, sends it to the model, and the model returns a response. A 'Tool call?' decision follows: if no, the final answer is returned to the user; if yes, the controller parses the tool call, the tool produces a result, the controller appends the result, and the loop repeats.

An LLM on its own is a text machine. Text goes in, text comes out. By itself, it can't look something up, send an email, run a calculation, query your database, or do anything out in the world. That's a real limitation, because the world isn't made of text alone.

An agent is what you get when you drop an LLM into a control loop and give it two new powers: it can call external tools, and it can work toward a goal across multiple steps of reasoning.

The loop at the center of an agent usually gets described in three beats: reason, act, observe. You give the model a goal. It reasons about what to do next. It asks to call a tool, which might be a web search, a database query, a function in your code, or another model. The tool runs. The result comes back to the model as new context. The model looks at the result, reasons again, and either calls another tool or decides the goal is done.

That loop is the difference between a chatbot and an agent. A chatbot answers. An agent does. Ask a chatbot “what's the weather in Paris?” and it'll tell you it doesn't have real-time data. Ask an agent with a weather tool the same thing, and it'll call the API and give you the actual forecast. From a capability standpoint, that gap is enormous.

A few pieces make an agent work in practice:

A model doing the reasoning.
A set of tools the model can call, each one described in a format the model understands.
Some kind of memory, which might be as simple as the conversation history or as involved as a vector database of past interactions.
A controller (or planner) that runs the loop, enforces limits, and decides when the agent is done.
An environment the agent perceives and affects through its tools.

Diagram of how an agent works in practice. A central green hexagon labeled Model takes a Request as input and produces a Solution as output. The Model connects bidirectionally to four surrounding components: Tools (retrieval, computation, action, and handoff tools), Memory (session, long-term, and episodic memory), Controller, and Environment.

None of those pieces is new. What's new is putting them together with a model capable enough to act as the reasoning core. The results can do things that would have seemed like science fiction five years ago. They can also fail in ways that are surprising, embarrassing, and sometimes costly.

The mental model: layers, with one exception

Diagram showing the relationship between Generative AI, Large Language Models, and Agents. Generative AI is shown as a large beige circle containing a smaller purple circle representing the Large Language Model. An Agent is depicted as a dashed green box that wraps around the LLM, containing Tools, Memory, and Control loop components.

Three terms get used almost interchangeably: generative AI, large language models, and agents. They are related, though one of them fits differently from the other two.

Two of them do nest cleanly. Generative AI is the broad class of systems that produce new content. A large language model is one of them, the kind that produces text. The smaller sits wholly inside the larger.

The agent is where the picture changes. An agent is a system built around a model. Take a language model, give it tools it can call, a way to read the results, some memory, and a loop that keeps running until a goal is met, and you have an agent. The model decides what to say or do next. The tools, the memory, and the loop carry those decisions out. The generating happens only in the model. The agent is larger than the model it runs on, and it holds the model inside it.

The layer you are dealing with decides what you should expect and what you should ask. If you are looking at a generative image tool, questions about context windows and tool calls do not apply. If you are looking at a writing assistant running on a language model, asking about agent orchestration is jumping ahead. If you are looking at an actual agent, asking only about the model leaves most of the system out, since the tools, the memory, and the loop weigh as much as the model does.

Why the difference matters

Here's a concrete example. A vendor offers to build your company “an AI.” What are you actually buying?

If what they deliver is a generative model fine-tuned on your documents, you've got a text tool that can draft emails or summaries. Useful, but limited. It can't look anything up, can't verify its own claims, can't take any action for you. If what they deliver is an LLM with retrieval, you've got a system that can answer questions grounded in your data. Better, but still reactive. If what they deliver is an agent, you've got something that can plan, call tools, update records, and act in the world. A lot more powerful. Also a lot more capable of causing trouble when it's wrong.

Three layers, three risk profiles, three cost structures, three engineering challenges, three sets of expectations. Treating them as one undifferentiated thing, which most of the public conversation still does, leads to over-investing in simple problems and under-investing in the hard ones.

Where to go next

This article has been all about vocabulary. It's meant to give you the words you need to follow more technical conversations without getting lost. The next step, if you're going to actually work with any of these systems, is to look more closely at agents specifically: how they're built, what patterns have emerged for building them well, and why going from a working demo to a production deployment is a bigger leap than most teams expect.