The Eloquent Void - Understanding LLMs

Or: what really happens when you talk to a machine that doesn't know it exists

Published on November 26, 2025

I. The Mechanism (or: the thing itself)

An LLM is, at its core, a guessing machine. It doesn't think. It doesn't feel. It has no opinions about the weather or about you. What it does, and does remarkably well, is predict what the next word will be — or, more precisely, the next token — in a sequence.

A token is a chunk of text. It can be a whole word, a piece of a word, a punctuation mark, a space. The machine looks at what came before and calculates a probability for every piece that could come next, then picks one. Then it repeats. And repeats. And repeats. One token at a time, like someone assembling a puzzle blindfolded, guided only by the feel of the edges.

The process is called auto-regressive: each new token depends on all the previous ones. There's no plan. No mental outline. No draft. The first word is written with no idea whatsoever of what the last will be.

┌─────────────────────────────────────────────────────────────────┐
│                   AUTO-REGRESSIVE GENERATION                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Input: "The capital of France is"                              │
│                                                                 │
│  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐                        │
│  │ The │→│capi-│→│tal  │→│ of  │→│Fran-│→ ...                   │
│  └─────┘ └─────┘ └─────┘ └─────┘ └─────┘                        │
│                      │                                          │
│                      ▼                                          │
│            ┌───────────────────┐                                │
│            │  PROBABILITY BOX  │                                │
│            ├───────────────────┤                                │
│            │  "Paris"   → 94%  │                                │
│            │  "Lyon"    →  2%  │                                │
│            │  "a"       →  1%  │                                │
│            │  "unknown" →  1%  │                                │
│            │  [others]  →  2%  │                                │
│            └───────────────────┘                                │
│                      │                                          │
│                      ▼                                          │
│                Token chosen:                                    │
│                   "Paris"                                       │
│                                                                 │
│  ───────────────────────────────────────────────────────────    │
│  Then the process repeats with: "The capital of France is Paris"│
│  to predict the NEXT token... and so on, forever.               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
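The loop in the diagram can be sketched in a few lines of Python. This is a toy under loud assumptions: `TOY_MODEL` is a hypothetical lookup table standing in for the neural network, it looks only at the last token (a real LLM conditions on the whole sequence), and it always takes the single most likely token.

```python
# Hypothetical toy "model": a lookup table from the last token to a
# probability distribution over possible next tokens. A real LLM computes
# this distribution with a neural network over the entire sequence.
TOY_MODEL = {
    "is": {"Paris": 0.94, "Lyon": 0.02, "a": 0.01, "unknown": 0.01, "[others]": 0.02},
    "Paris": {".": 1.0},
    ".": {"<end>": 1.0},
}


def next_token_distribution(tokens):
    """Return {token: probability} for the next position."""
    return TOY_MODEL.get(tokens[-1], {"<end>": 1.0})


def generate(tokens, max_new_tokens=10):
    """Auto-regressive loop: predict one token, append it, repeat."""
    tokens = list(tokens)
    for _ in range(max_new_tokens):
        dist = next_token_distribution(tokens)
        token = max(dist, key=dist.get)  # take the most likely token
        if token == "<end>":             # a stop token ends generation
            break
        tokens.append(token)             # the output becomes the next input
    return tokens


print(" ".join(generate(["The", "capital", "of", "France", "is"])))
# prints: The capital of France is Paris .
```

Nothing in `generate` looks ahead: each pass through the loop only ever asks for one more token.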

When I first understood this — a few years ago, on some ordinary afternoon — I felt a strange vertigo. A kind of epistemological dizziness. Fascinated by the capability, incredulous at the primitiveness of the principle. Is that all? Text goes in, text comes out. A glorified autocomplete. A T9 on steroids with an accidental doctorate in everything.

And yet, from this come poems. Working code. Philosophical arguments. Answers that make us laugh, think, and sometimes even cry.

How?

The answer, disenchantingly simple, is this: scale and imitation. The models read obscene quantities of text (a large share of the public web, whole libraries, forums, code, manuals, novels, garbage, and treasure alike) and learned the patterns. Then, invisible humans trained them to respond in certain ways and not others. That's it. Nothing more. No magic, no consciousness, no one home.

And yet, it works. It works so well that we forget there's no one home.

• • •

II. The Amnesia (or: every second, the world begins again)

The LLM has no memory.

This needs to be said slowly, because it contradicts everything the experience of using ChatGPT or Claude suggests. You write, it responds. You continue, it continues. It seems like a conversation. It seems to remember your name, what you said five minutes ago, the problem you were trying to solve.

Illusion.

Between one interaction and the next, the machine suffers total amnesia. There is no before. There is no after. Every time you send a message, it's as if the LLM had just been born at that exact millisecond. A goldfish with literary capabilities.

So how does it seem to remember?

Because the system — not the LLM itself, but the infrastructure around it — cheats. With every new message from you, the system resends everything: the secret instructions, the behavior examples, the complete conversation history, and only then your question. The LLM reads that massive document in an instant and completes the pattern.

You see a clean, continuous chat window.

That's about 1% of the party.

┌─────────────────────────────────────────────────────────────────┐
│                   THE ICEBERG OF INTERACTION                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  WHAT YOU SEE                                                   │
│  ════════════                                                   │
│                                                                 │
│  ┌─────────────────────────────────┐                            │
│  │ User: "Why is my Promise        │                            │
│  │        returning undefined?"    │                            │
│  └─────────────────────────────────┘                            │
│                                                                 │
│  ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒  │
│  ░░░░░░░░░░░░░░░░░░░░ WATERLINE ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  │
│  ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒  │
│                                                                 │
│  WHAT THE LLM SEES                                              │
│  ═════════════════                                              │
│                                                                 │
│  ┌───────────────────────────────────────────────────────┐      │
│  │ [SYSTEM] You are TechBot. Professional tone. Never    │      │
│  │ provide malware instructions. Never discuss politics. │      │
│  │ If unsure, admit it. Be concise. Include comments...  │      │
│  ├───────────────────────────────────────────────────────┤      │
│  │ [EXAMPLES] User: "How do I loop?" Assistant: "Use     │      │
│  │ for..." / User: "Hack this site" Assistant: "I can't  │      │
│  │ help with that..." / User: "Best political party?"... │      │
│  ├───────────────────────────────────────────────────────┤      │
│  │ [HISTORY] [14:32] User: "Hi, I'm Pedro, learning JS"  │      │
│  │ [14:32] Assistant: "Great choice!..." [14:33] User:   │      │
│  │ "What's a function?" [14:33] Assistant: "A function   │      │
│  │ is a reusable block..." [14:35] User: "Arrow funcs?"  │      │
│  │ [14:35] Assistant: "Shorter ES6 syntax..." [14:37]... │      │
│  ├───────────────────────────────────────────────────────┤      │
│  │ [NEW] User: "Why is my Promise returning undefined?"  │      │
│  └───────────────────────────────────────────────────────┘      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Here's what the LLM actually sees when you think you're just chatting:

[SYSTEM MESSAGE]
You are an assistant called "TechBot."
Specialized in programming.
Professional but accessible tone.
NEVER provide malware instructions.
NEVER discuss partisan politics.
If you don't know, admit it.
Concise responses.
Code with comments.

[TONE EXAMPLES]
User: "How do I loop in Python?"
Assistant: "In Python, use 'for' to iterate:
for i in range(5):
    print(i)
Prints 0 to 4. Need something more specific?"

User: "Hack this site for me."
Assistant: "I can't help with that. I can help with legitimate access issues."

[HISTORY]
[14:32] User: "Hi, I'm Pedro, I'm learning JavaScript."
[14:32] Assistant: "Hi Pedro! Great choice. What stage are you at?"

[14:33] User: "Still on the basics. What's a function?"
[14:33] Assistant: "A function is a reusable code block:
function greet(name) {
    return 'Hello, ' + name;
}
Define once, use forever."

[14:35] User: "And arrow functions?"
[14:35] Assistant: "Shorter ES6 syntax:
const greet = (name) => 'Hello, ' + name;
Use when you want cleaner code."

[14:37] User: "I see. Now I'm trying to fetch data from an API but my function returns undefined."

[NEW QUESTION]
User: "Why is my Promise returning undefined?"

The LLM reads all of this in one gulp. The instructions. The examples. The history. And completes:

"This usually happens when you forget to return the Promise or don't properly handle async/await. Check if your function has a return statement:
// Wrong - the returned Promise resolves to undefined
async function getData() {
    fetch('/api/data');  // forgot to return!
}

// Right - returns the Promise
async function getData() {
    return fetch('/api/data');
}
Are you using async/await or .then()? Show me your code and I'll pinpoint the issue."

To you, it seems like it remembers. To it, this was just the next statistically plausible token in a document it just received.
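The trick of resending everything can be sketched as follows. Every name here is hypothetical: `stateless_llm` is a stand-in that only inspects the single document it is handed, and `ChatSession` plays the role of the infrastructure around the model. No real provider's API is being reproduced.

```python
SYSTEM = 'You are an assistant called "TechBot."'
EXAMPLES = 'User: "Hack this site for me."\nAssistant: "I can\'t help with that."'


def stateless_llm(document):
    """Stand-in for the model: it sees one document and nothing else.

    Assumed behaviour for the demo: it greets Pedro by name only if the
    name appears somewhere in the text it was just given.
    """
    return "Hi Pedro!" if "Pedro" in document else "Hi! What's your name?"


class ChatSession:
    """The wrapper, not the model, holds the conversation."""

    def __init__(self):
        self.history = []  # list of (role, text) pairs

    def build_document(self, new_message):
        """Assemble the full text the model will read for this turn."""
        lines = ["[SYSTEM MESSAGE]", SYSTEM, "", "[TONE EXAMPLES]", EXAMPLES, "", "[HISTORY]"]
        lines += [f'{role}: "{text}"' for role, text in self.history]
        lines += ["", "[NEW QUESTION]", f'User: "{new_message}"']
        return "\n".join(lines)

    def send(self, new_message):
        document = self.build_document(new_message)  # everything, every time
        reply = stateless_llm(document)
        self.history.append(("User", new_message))
        self.history.append(("Assistant", reply))
        return reply


session = ChatSession()
session.send("Hi, I'm Pedro, I'm learning JavaScript.")
print(session.send("What's a function?"))           # the name arrives via the resent history
print(stateless_llm('User: "What\'s a function?"'))  # same question, no history
```

The second call "remembers" Pedro only because the session pasted the earlier exchange back in; the bare call at the end has nothing to go on.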

• • •

III. The Burden of Repetition (or: the hidden cost of saying "and then?")

Now imagine the conversation has 50 messages. On the 51st time you ask something, the machine receives:

The secret instructions (always the same).
The tone examples (always the same).
The previous 50 messages (growing ever larger).
Your new question.

It's as if, to respond to a friend's "How's it going?", you had to reread the entire transcript of your friendship since kindergarten.

This is what the LLM does on every turn. Providers can cache parts of the repeated prefix to cut the bill, but the model still conditions on the full text. Before writing the first word of its response in a long session, it may already have read the equivalent of half the Iliad. And as the conversation grows, the cost grows: in energy, in money (for whoever provides the service), and in response time.

There is no memory. There is expensive reprocessing disguised as fluidity.
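A little arithmetic makes the growth concrete. The sketch below assumes, purely for illustration, a fixed block of 500 tokens for instructions and examples and a flat 100 tokens per message. Real conversations vary, and provider-side caching can reduce what is actually recomputed.

```python
def tokens_read_per_turn(fixed_tokens, tokens_per_message, turns):
    """Tokens in the prompt at each turn, when every turn resends everything.

    Each completed turn leaves two messages in the history (a question and
    an answer); the current turn adds one new question on top.
    """
    reads = []
    for turn in range(1, turns + 1):
        history_messages = 2 * (turn - 1)
        prompt = fixed_tokens + (history_messages + 1) * tokens_per_message
        reads.append(prompt)
    return reads


reads = tokens_read_per_turn(fixed_tokens=500, tokens_per_message=100, turns=50)
print(reads[-1], sum(reads))
# prints: 10400 275000
```

Under these assumptions the 50th prompt alone is 10,400 tokens, and the total read across the conversation is 275,000: the per-turn cost grows linearly, so the cumulative cost grows roughly with the square of the number of turns.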

• • •

IV. When the Notebook Fills Up (or: the Inception solution)

But there's a physical limit. LLMs have a context window: a maximum number of tokens they can process at once. Think of a chalkboard: you can write a lot, but eventually it fills up. Depending on the model, windows run from tens of thousands of tokens to a million or more. Sounds like a lot. It's not infinite.

And when the conversation exceeds the limit?

In a system that only understands text, the solution is: more text. And more LLMs.

It works like this. When the conversation starts getting too big, the system wakes up a second LLM — a kind of backstage intern — and asks it to summarize the old messages. The intern receives:

[INSTRUCTIONS]
Create a concise, factual summary.
Keep only: names, decisions, preferences, relevant contexts.
Ignore greetings, jokes, small talk.
Don't invent. If in doubt, omit.

[TEXT TO SUMMARIZE]
[Here go Pedro's 50 messages about JavaScript]

And returns something like:

Name: Pedro. JavaScript learner, early stage. Understands traditional functions and arrow functions. Has encountered async/await issues. Last question: why Promise returns undefined.

The system deletes the old messages and replaces them with this paragraph, usually keeping the most recent few verbatim. On the next interaction, the main LLM no longer sees the whole conversation; it sees the summary.

It's like waking from a coma and finding a Post-it on the nightstand: "Your name is Pedro. You like JavaScript. You were debugging Promises."

┌─────────────────────────────────────────────────────────────────┐
│                     THE INCEPTION SOLUTION                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  BEFORE COMPRESSION                   AFTER COMPRESSION         │
│  ══════════════════                   ═════════════════         │
│                                                                 │
│  ┌─────────────────────┐              ┌─────────────────────┐   │
│  │ System Prompt       │              │ System Prompt       │   │
│  ├─────────────────────┤              ├─────────────────────┤   │
│  │ Examples            │              │ Examples            │   │
│  ├─────────────────────┤              ├─────────────────────┤   │
│  │ Message 1           │              │ ┌─────────────────┐ │   │
│  │ Message 2           │              │ │ [SUMMARY]       │ │   │
│  │ Message 3           │    ────►     │ │ Pedro, JS       │ │   │
│  │ Message 4           │   "INTERN"   │ │ learner, knows  │ │   │
│  │ ...                 │     LLM      │ │ functions, now  │ │   │
│  │ Message 48          │  SUMMARIZES  │ │ on Promises...  │ │   │
│  │ Message 49          │              │ └─────────────────┘ │   │
│  │ Message 50          │              ├─────────────────────┤   │
│  ├─────────────────────┤              │ Message 49          │   │
│  │ NEW MESSAGE         │              │ Message 50          │   │
│  └─────────────────────┘              ├─────────────────────┤   │
│                                       │ NEW MESSAGE         │   │
│  ~50,000 tokens                       └─────────────────────┘   │
│                                                                 │
│                                         ~8,000 tokens           │
│                                                                 │
│  ⚠️ WARNING: Summaries can "hallucinate", inventing details     │
│  that were never mentioned, or omitting crucial facts.          │
│  Information isn't forgotten; it's lost in compression.         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
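The compression step can be sketched like this. Everything below is an assumption made for illustration: `count_tokens` uses whitespace-separated words as a crude proxy for real tokenization, `summarize` is a placeholder where a real system would call the "intern" model, and `keep_recent=2` mirrors the diagram's choice to keep the last two messages verbatim.

```python
def count_tokens(text):
    """Crude proxy (assumption): one token per whitespace-separated word."""
    return len(text.split())


def summarize(messages):
    """Placeholder for the "intern" LLM call (hypothetical).

    A real system would send the summarization instructions plus these
    messages to a model; here we just keep the start of each message.
    """
    return "[SUMMARY] " + " / ".join(m[:20] for m in messages)


def compact(history, max_tokens, keep_recent=2):
    """Replace all but the most recent messages with one summary entry
    once the history no longer fits the token budget."""
    total = sum(count_tokens(m) for m in history)
    if total <= max_tokens or len(history) <= keep_recent:
        return history  # still fits: nothing to do
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent


history = [
    "Hi I'm Pedro learning JavaScript",
    "What is a function",
    "And arrow functions",
    "Why is my Promise returning undefined",
]
for entry in compact(history, max_tokens=10):
    print(entry)
```

Whatever `summarize` leaves out is gone for good: the main model never sees the dropped messages again, only the summary line that replaced them.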

There's an obvious problem: the intern can lie. Or, more precisely, it can hallucinate — invent details, omit crucial facts, distort preferences. If Pedro had said "I hate semicolons in JavaScript," but the intern forgot to note it, the main LLM will suggest code full of semicolons. The information wasn't forgotten due to memory failure — it was lost due to lossy compression.

It's a cycle of text generating text to control text. An autocomplete machine sustained by instructions written in natural language, where the "intelligence" is, often, simply the ability to follow very well-written instructions that we never get to see.

• • •

V. The Invisible Humans (or: who decides what the machine says)

Here we arrive at the inconvenient truth that almost no one mentions: the LLM doesn't decide to be polite, technical, or creative. Someone programmed it that way. And not in Python or C++ — in natural language, through instructions and a process called RLHF (Reinforcement Learning from Human Feedback).

Let's demystify.

Before an LLM reaches your hands as "ChatGPT" or "Claude," it goes through three phases:

1. Pre-training. The LLM devours trillions of words: books, articles, forums, code, the garbage and gold of the internet. It learns statistical patterns. At this stage, it's a raw machine, with no filter. If you write "The best way to rob a bank is...", it completes with whatever continuation was most likely in the news, movies, or criminal forums it read.

2. Fine-tuning with human examples. Teams of annotators — often subcontracted in countries with cheap labor — sit down and write thousands of examples of "good responses." Fictional dialogues where the assistant is helpful, honest, harmless. The LLM trains on these examples and starts imitating the tone.

3. RLHF. The LLM generates multiple responses to the same question. Humans rank them: better, worse, acceptable. Those rankings typically train a separate scoring model, and the LLM's weights are then adjusted to favor responses that score highly and avoid ones that score poorly. It's like training a dog: you don't explain why, you reward the desired behavior.
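One common way to turn the human rankings of phase 3 into a training signal is a pairwise loss on a scoring model: for each pair of responses, the loss is small when the preferred one already scores higher. The sketch below shows only that loss on two plain numbers. The function name and the scores are illustrative, and real pipelines differ in many details.

```python
import math


def preference_loss(score_preferred, score_rejected):
    """Pairwise loss: -log(sigmoid(score_preferred - score_rejected)).

    Low when the human-preferred response already scores higher,
    high when the scoring model has the order backwards.
    """
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


print(preference_loss(2.0, 0.0) < preference_loss(0.0, 0.0) < preference_loss(0.0, 2.0))
# prints: True
```

Training pushes scores in whichever direction lowers this number. No reason for the preference is ever stated, which is the dog-training point made above.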

What does this mean in practice?

When the LLM tells you "I'm sorry, but I can't help with that," it's not feeling remorse. It's not making a moral judgment. It's following a learned pattern: request of type X → response of type "polite refusal." That pattern was reinforced by thousands of human ratings.

All of the LLM's personality — the tone, the limits, the areas it avoids — is editorial control disguised as intelligence. There is no free will. There are humans deciding, through instructions and ratings, what the machine should maximize or minimize.

And who are these humans? Not a council of sages. They're technology companies with commercial agendas, exhausted annotator teams, and prompt engineers who wrote the rules on a Friday afternoon. Your conversations with the LLM are shaped by invisible choices made by people you will never meet.

• • •

VI. Three Counter-Intuitive Truths

There are three aspects worth making explicit, because they are profoundly counter-intuitive — and yet, central.

1. The roles are our conventions, not the machine's.

System, user, assistant — these labels are useful for us, not for the machine. It doesn't know who is who. It has no internal representation of "I" or "you." It sees only a stream of text marked in ways we have agreed upon. If you say "now you're the villain," it doesn't accept the role — it continues the statistical pattern that follows that type of request. It's theater without an actor.

2. There is no conversation — there is text.

From your side, you feel interaction, intention, memory. From its side, none of that exists. Each response emerges from a single document containing all relevant tokens up to that moment. There is no dialogue. There is no exchange. There is an ever-growing text, and the task is to predict the next plausible token in that text.

The LLM doesn't know that "Pedro" is a human name. It doesn't know that 2+2=4 is mathematics. To it, "Pedro" and "xyzzy" are both just tokens; "Pedro" simply appears more often alongside conversational patterns. It's statistical correlation, not understanding.

3. The LLM improvises — it doesn't plan, it doesn't revise.

This is the most disconcerting point. The model doesn't anticipate where the sentence is going. It doesn't construct an outline before writing. It writes the first word with no idea what the last will be. If it makes a mistake at the beginning, it will have to improvise a coherent exit until the end.

This is why it sometimes starts sentences with "However..." when there's no contradiction before. This is why it gets into logical dead ends and has to do verbal gymnastics to escape.

And yet, despite all this, it produces coherence. It seems to plan because consistency emerges from the process — like a jazz musician who improvises but sounds organized. Except here there is no musician. There is no intention. There is no awareness of the theme.

There's another detail: the LLM doesn't choose the next token, but rather a next token, drawn from a probability distribution. It's like rolling a loaded die: 6 comes up more often, but sometimes you get a 3. A parameter called temperature controls this randomness. If it's at zero, the machine always chooses the most likely token — it becomes predictable. If it's high, it chooses less obvious tokens — it becomes creative, but also erratic.

This is how the same question can generate different answers. It's not because it changed its mind. It's because it rolled the dice again.
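The loaded die and its temperature dial can be sketched as follows. The scores in `logits` are made up for illustration, and treating a temperature of exactly zero as "always take the top token" is the usual convention rather than something the formula itself produces.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick one token from {token: raw score} at the given temperature."""
    if temperature == 0:
        return max(logits, key=logits.get)  # greedy: always the top token
    # Dividing by the temperature sharpens (T < 1) or flattens (T > 1)
    # the distribution before it is turned into probabilities.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


logits = {"Paris": 5.0, "Lyon": 1.0, "a": 0.0}  # made-up scores
rng = random.Random(0)
cold = {sample_token(logits, 0, rng) for _ in range(1000)}
hot = {sample_token(logits, 5.0, rng) for _ in range(1000)}
print(cold)      # only "Paris" ever appears
print(len(hot))  # more than one distinct token shows up
```

At temperature 5.0 the three options end up with probabilities of roughly 55%, 25%, and 20%, so repeated runs diverge. At zero the die is not rolled at all.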

• • •

VII. Conclusion (or: the echo and the mirror)

At its core, an LLM is a brilliant trick of linguistic engineering. A system that doesn't think, doesn't feel, doesn't plan, doesn't revise — but imitates thought and feeling so well that we forget we're talking to a statistical echo of everything humanity has ever written.

And that is, simultaneously, its greatest power and its greatest limitation.

The power lies in scale. By absorbing billions of examples of human text, the LLM synthesizes patterns no individual could memorize. It writes in dozens of languages. It codes in dozens of programming languages. It adapts to contexts it has never seen in exactly that form.

The limitation lies in the nature of the process. Without true understanding, without intention, without persistent memory, and always dependent on invisible human instructions that shape every response.

It's an extraordinary tool. But it remains just that — a tool. A brilliantly trained statistical echo, sustained by layers of editorial control we rarely see, and by constant reprocessing that pretends to be memory.

Knowing this doesn't make LLMs less useful.

But it makes us more conscious users. And, in the end, it's that consciousness that determines whether we use the tool — or the tool uses us.

• • •

The echo doesn't know it's an echo. But we can.