Asking a large language model (LLM) how it works: the question is mine, the answer is ours — so whose article is this?

AI 2026-05-24 · Satsuma Creative · 7 min read

I spent an afternoon asking Claude: how does an LLM actually work? From "predicting the next word" to self-attention and QKV, and then to a more uncomfortable question — which things no longer need humans at all.

Trying to understand how LLMs work, I sat down and asked Claude.

Not a work problem — the kind of question where you sit down, make a cup of tea, and just want to figure something out.

I asked: how does a large language model actually work?

And then we talked for a long time.

Starting from "predicting the next word"

What an LLM does is, on the face of it, very simple.

It looks at a stretch of text and guesses what the next word is (strictly, the next token, which can be a whole word or a fragment of one). Guess right, reinforce that direction. Guess wrong, adjust. Repeat billions of times.

That's it.

But those two words — "that's it" — hide a great deal.
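Here is a minimal sketch of that guess-and-adjust loop, assuming a deliberately tiny model that only looks at the previous word. The corpus, the names `W`, `train`, and `predict_next`, and the learning rate are all made up for illustration; a real LLM uses a Transformer and vastly more text, but the training signal has the same shape.

```python
import numpy as np

# Toy corpus (invented for this sketch); each word counts as one token.
corpus = "the dog chased the cat . the cat chased the dog .".split()
vocab = sorted(set(corpus))
index = {word: i for i, word in enumerate(vocab)}

# Training pairs: (current word, the word that actually came next).
prev_ids = np.array([index[w] for w in corpus[:-1]])
next_ids = np.array([index[w] for w in corpus[1:]])


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def train(steps=200, lr=1.0):
    size = len(vocab)
    W = np.zeros((size, size))  # W[i, j] = score for word j following word i
    rows = np.arange(len(prev_ids))
    losses = []
    for _ in range(steps):
        probs = softmax(W[prev_ids])                  # the model's guesses
        losses.append(-np.log(probs[rows, next_ids]).mean())
        grad = probs.copy()
        grad[rows, next_ids] -= 1.0                   # guess minus truth
        update = np.zeros_like(W)
        np.add.at(update, prev_ids, grad / len(rows)) # sum gradients per previous word
        W = W - lr * update                           # adjust
    return W, losses


W, losses = train()


def predict_next(word):
    return vocab[int(np.argmax(W[index[word]]))]
```

In this corpus "chased" is always followed by "the", so `predict_next("chased")` settles on "the", and `losses` shrinks as the guesses improve.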

To keep guessing the "next word" correctly, the model has to build some internal understanding of language, some model of the world. It isn't looking words up in a dictionary; it represents every word as a point in a vector space with thousands of dimensions, where the distance and direction between points encode how the words relate.

"animal" and "dog" sit close together, because they tend to appear in the same contexts. "animal" and "street" sit far apart, because what surrounds them is completely different. No one told the model this. The model discovered it on its own, from statistical regularities.

This is the key point. Meaning isn't defined in advance; it emerges.
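A rough way to see this for yourself, assuming a hand-written six-sentence corpus and plain co-occurrence counts as a stand-in for learned embeddings. Real models learn their vectors by gradient descent rather than by counting, and the sentences and function names here are invented for the sketch.

```python
from collections import Counter
import math

# Tiny invented corpus; nobody labels which words are "similar".
sentences = [
    "the dog ate food",
    "the dog barked",
    "the animal ate food",
    "the dog was tired",
    "the animal was tired",
    "cars drove down the street",
    "people crossed the street",
]


def context_vector(target):
    """Count the words that share a sentence with `target`."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        if target in words:
            counts.update(w for w in words if w != target)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)  # a Counter returns 0 for missing words
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


animal = context_vector("animal")
sim_dog = cosine(animal, context_vector("dog"))
sim_street = cosine(animal, context_vector("street"))
```

`sim_dog` comes out higher than `sim_street` purely because "animal" and "dog" keep the same company in the text.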

QKV: how one word asks the others

At the heart of the Transformer architecture is a mechanism called self-attention.

As each word enters this mechanism, it plays three roles at once:

Q (Query) — what am I looking for?
K (Key) — what can I answer?
V (Value) — once found, what do I pass on?

Here's an example:

"The animal didn't cross the street because it was too tired."

The word "it" takes its Q vector and computes a dot product with the K vector of every word in the sentence, producing a relevance score. "animal"'s K points in a direction close to "it"'s Q, so the score is high. "street"'s K points elsewhere, so the score is low.

Softmax turns these scores into weights that sum to 1, and those weights decide how much of each word's V vector "it" takes in.

In the end, the word "it" absorbs most of the meaning of "animal" and carries on.

This isn't a lookup table. It isn't a rule. It's a geometric relationship between vectors — a geometry shaped by training.
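The steps above can be sketched for the single word "it". The 2-D vectors are hand-picked so that "animal" wins; in a trained model they come out of learned projection matrices and have far more dimensions. The division by the square root of the key dimension comes from the standard Transformer formulation and isn't discussed above, and `attend` is a hypothetical helper name.

```python
import numpy as np

words = ["animal", "street", "it", "tired"]

# Hand-picked toy vectors, purely illustrative.
q_it = np.array([2.0, 0.5])      # Q for "it": what am I looking for?
K = np.array([
    [2.0, 0.0],                  # animal: points roughly the same way as q_it
    [0.0, 2.0],                  # street: points elsewhere
    [0.5, 0.5],                  # it
    [0.0, 1.0],                  # tired
])
V = np.array([
    [1.0, 0.0],                  # animal
    [0.0, 1.0],                  # street
    [0.2, 0.2],                  # it
    [0.5, 0.5],                  # tired
])


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d_k = keys.shape[-1]
    scores = keys @ query / np.sqrt(d_k)   # one relevance score per word
    scores = scores - scores.max()         # stabilise the exponentials
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax
    return weights, weights @ values       # weighted blend of the V vectors


weights, new_it = attend(q_it, K, V)
for word, weight in zip(words, weights):
    print(f"{word:>7}: {weight:.2f}")
```

The largest weight lands on "animal", so `new_it`, the updated representation of "it", is mostly a copy of animal's V vector.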

So who decides what's "correct"?

Training happens in several stages.

Stage one: pre-training. The model consumes an enormous amount of human-written text, much of it scraped from the public internet, endlessly guessing the next word and endlessly adjusting. No one labels anything at this stage; the text itself supplies the right answer, so the process is entirely automatic.

Stage two: fine-tuning and RLHF (reinforcement learning from human feedback). Human evaluators tell the model which answer is better, and that signal is used to keep training it.

In stage one, the model learns the patterns of language and knowledge about the world. In stage two, it learns "what humans consider helpful and what they consider safe."
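One common way to turn "this answer is better than that one" into a number a model can train on is a pairwise loss over reward scores. The sketch below assumes that formulation; it is not a description of any particular lab's pipeline, and the function name and the scores passed in are hypothetical.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """Penalty for a reward model, given the scores it assigned to two answers.

    The loss is small when the answer the human preferred got the higher score.
    """
    margin = reward_chosen - reward_rejected
    prob_chosen_wins = 1.0 / (1.0 + math.exp(-margin))  # sigmoid of the margin
    return -math.log(prob_chosen_wins)


agrees_with_human = preference_loss(2.0, 0.0)     # scored the preferred answer higher
disagrees_with_human = preference_loss(0.0, 2.0)  # scored it lower
```

Minimising this loss pushes the reward model toward the evaluators' rankings, and that reward model then stands in for the humans while the language model keeps training.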

There's a fundamental limit here:

What the model learns isn't truth — it's the statistical regularities of human text. Text on the internet contains errors, biases, and contradictions. The model absorbs all of it. RLHF can correct some of this, but not all of it.

This is why LLMs hallucinate — what they generate is "the most plausible-sounding continuation," not "a fact checked against a database."

The fundamental difference between AlphaGo and LLMs

I asked Claude: could the model train itself by talking to itself, the way AlphaGo Zero did?

Claude said: not really.

AlphaGo Zero could reinforce itself because Go has an objective standard for winning and losing. When it plays against itself, win or lose is perfectly clear, and the signal is clean.

Language has no winner. "Is this sentence good?" has no single answer. If you let a model train by talking to itself, it would only reinforce its own existing biases, with no external signal to correct it. This problem is called model collapse.

But I felt Claude's answer wasn't complete

I thought for a moment and said:

"Guessing the next word correctly — that does have an answer, doesn't it? On the dimension of whether something works, why can't it reinforce itself?"

Claude paused.

Then said: you're right.

On the dimension of "does it work" (does the code run, is the math proof correct, is the logic free of contradiction), these all have objective answers that a machine can verify on its own. In these domains, AI no longer needs humans to supply new original content in order to improve.

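This is a large part of why reasoning models such as o1, o3, and DeepSeek-R1 leapt in capability. DeepSeek-R1's published training recipe uses rule-based, checkable rewards for math and code, an approach often called "reinforcement learning with verifiable rewards": the model reinforces itself on tasks with clear answers, without a human evaluating every step. OpenAI has said o1 was trained with large-scale reinforcement learning but has published fewer details.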
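To make "a reward a machine can check" concrete, here is a minimal sketch that scores a candidate program by running it against test cases. It assumes the candidate must define a function called `solve`; that name, the test cases, and the candidate strings are all hypothetical. A real training system would run untrusted code inside a sandbox, which this sketch does not do.

```python
def verifiable_reward(candidate_source, test_cases, func_name="solve"):
    """Return 1.0 if the candidate passes every test, otherwise 0.0.

    WARNING: exec() runs arbitrary code. Only use it on code you trust.
    """
    namespace = {}
    try:
        exec(candidate_source, namespace)   # define the candidate's function
        func = namespace[func_name]
        passed = all(func(*args) == expected for args, expected in test_cases)
        return 1.0 if passed else 0.0
    except Exception:
        # Syntax errors, crashes, and missing functions all count as failure.
        return 0.0


# Task: add two numbers. Each test case is (arguments, expected result).
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 5), 4)]

correct = "def solve(a, b):\n    return a + b\n"
wrong = "def solve(a, b):\n    return a - b\n"
broken = "def solve(a, b) return a + b"

rewards = [verifiable_reward(src, tests) for src in (correct, wrong, broken)]
```

No human looks at any of the three candidates; the tests alone decide which one gets reinforced.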

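In 2024, Google DeepMind's AlphaProof, working alongside AlphaGeometry 2, reached silver-medal level at the International Mathematical Olympiad, one point short of gold. The proofs it produced weren't copied from any human record; the system found them itself, and a formal proof checker confirmed them.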

Where is that line?

Go has a clear winner and loser, and AI has surpassed humans.

Math has formal verification, and AI is beginning to surpass humans.

Code has test suites, and AI already outperforms many human programmers on well-specified tasks.

Language, creativity, value judgment — these still have no objective standard, and AI is still capped by humans.

But that line is moving.

As more and more tasks are converted into a "verifiable" form, AI can shed its dependence on human-generated text in more and more domains and begin to genuinely transcend itself.

So the question isn't "when will AI surpass humans," but:

What things can be converted into problems with objective answers?

What can be converted, AI can surpass humans at. What can't, AI still needs humans for.

What makes me uncomfortable

The internet is now flooded with AI-generated text.

Before 2020, the internet was almost entirely written by humans. Not anymore.

When the next generation of models crawls this data to train on, it will partly be "reading what its own predecessors wrote." Original human text is becoming a scarce resource.

This isn't a doomsday prophecy — it's a real problem, one the AI training field is facing right now.

And it has no easy answer, because it's very hard to label "this was written by a human" and "that was written by a machine" across the internet.

The question is mine, the answer is ours, so whose article is this?

Within the scope of "does it work" — AI barely needs humans anymore.

Within the scope of "is it good, is it right, is it worth it" — it still does.

But that scope is shrinking.

And then, back to this article itself.

The questions were mine to ask. The process was one we walked through together. And this final article was put together by Claude — in my voice, following the order of our conversation, writing down what happened today.

I'm not pretending I wrote this alone.

But I also can't say it's not mine at all. I decided the direction of the questions, I caught what didn't sound right, I judged which parts were worth digging into. That judgment is still mine to make, for now.

As for whether this counts as "writing," I don't know.

The way this article came to be is itself an example of what we talked about today. You decide for yourself.

Further reading:
- What do I call Claude? And how we get along
- Is AI a mirror, or another person?
- English inside Chinese bones: what do we lose working with AI in Chinese?
- Reflections after six months with Claude.ai