When "meaning comes from difference" becomes a calculable number: confronting word2vec and Saussure with Claude Opus 4.8
I read Saussure in grad school and have read tarot cards for twenty years, and both rest on one idea: meaning comes from difference. This time I took that idea to confront word2vec, working through it back and forth with Claude Opus 4.8. In the end the ruler measured more than the machine: it measured where I myself had been standing for two decades.
I studied history in grad school, and the foundation of my thesis was Saussure's theory of difference. For twenty years of reading tarot, I've used the same framework to read the cards: a card has no fixed meaning; its meaning comes from how it differs in relation to the other cards, to this particular spread, and to the person sitting in front of me. "Meaning comes from difference" is how I see the world.
This article records a conversation. The other party in it is Claude Opus 4.8, a large language model, which is to say the very technology this piece is about. I asked it to explain the workings of word2vec (the method that turns "words" into "numbers" for machines, and a direct ancestor of the embedding layers at the base of today's large language models), and as it explained, the topic slid toward Saussure. What happened next was more interesting than I expected: I didn't work something out on my own. Instead I made a claim and it pushed back; it threw out a framework and I dismantled it; a few rounds in, both sides had been pushed somewhere neither had planned to go.
I've kept this back-and-forth, because the shape of the conversation is itself what the article is about — meaning isn't on my side, nor on its side, but in the difference where we confront each other.
First, what word2vec does
Machines only recognize numbers, not words. So to make a machine handle language, the first hurdle is: how do you turn "cat" into something a machine can compute?
The dumbest method is to give each word a number: cat is 1, dog is 2, queen is 3. But this doesn't work, and it fails in an instructive way. The numbering is arbitrary, and it implies cat and dog are close while cat and queen are farther apart — yet this distance has no basis whatsoever; it's purely an accident of ordering. There's no meaning in the numbers.
word2vec does it differently. Instead of a single number, it gives a whole row of numbers — "cat," for example, is a list of three hundred numbers. This entire row can be thought of as a coordinate on an ultra-high-dimensional map. Every word is a point on this map. And the key property of the map is this: words with similar meanings sit close together. Cat and dog are near each other; cat and queen are far apart.
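The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the three-number vectors are hand-picked by me, not taken from any trained model, and the names `word_ids`, `toy_vectors`, `id_distance` and `cosine_similarity` are hypothetical. Real word2vec vectors typically have a few hundred dimensions and are learned, not written by hand.

```python
import math

# Arbitrary integer IDs: any "distance" here only reflects the order of assignment.
word_ids = {"cat": 1, "dog": 2, "queen": 3}

def id_distance(a, b):
    """Gap between two IDs. It carries no information about meaning."""
    return abs(word_ids[a] - word_ids[b])

# Hypothetical hand-picked vectors (assumption: invented for illustration only).
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "queen": [0.1, 0.2, 0.9],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: near 1 means same direction."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

cat_dog = cosine_similarity(toy_vectors["cat"], toy_vectors["dog"])
cat_queen = cosine_similarity(toy_vectors["cat"], toy_vectors["queen"])
print(id_distance("cat", "dog"), id_distance("cat", "queen"))
print(round(cat_dog, 3), round(cat_queen, 3))
```

Renumbering the words would change every ID distance without anything about the language changing. With vectors, closeness is at least the kind of quantity that can be made to track meaning.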
The crucial thing is that this map isn't drawn by humans; the machine learns it on its own. How does it learn? Through a principle that the linguist J. R. Firth put in a line now quoted everywhere: you shall know a word by the company it keeps. Cat often appears near "feed, fur, catch mice, meow," and dog often appears near "feed, fur, walk, woof"; their neighbors overlap heavily, so the machine infers their meanings are similar and places them together on the map. Queen's neighbors are "king, palace, coronation," which barely overlap with cat's, so it's placed far away.
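The idea of "the company a word keeps" can be made concrete by collecting each word's neighbors within a small window and measuring how much two words' neighbor sets overlap. The sketch below is not word2vec itself, which learns vectors by prediction and does not count overlaps directly; it only shows the raw signal the training feeds on. The six-sentence corpus, the window size of 2, and the function names are all my assumptions.

```python
from collections import defaultdict

# Hypothetical toy corpus (assumption: invented to echo the examples in the text).
corpus = [
    "i feed the cat and brush its fur".split(),
    "i feed the dog and brush its fur".split(),
    "the cat will meow and catch mice".split(),
    "the dog will woof on a walk".split(),
    "the queen left the palace after the coronation".split(),
    "the king met the queen".split(),
]

def context_sets(sentences, window=2):
    """Map each word to the set of words seen within `window` positions of it."""
    contexts = defaultdict(set)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    contexts[word].add(tokens[j])
    return contexts

def overlap(contexts, a, b):
    """Jaccard overlap: shared neighbors divided by all neighbors of either word."""
    shared = contexts[a] & contexts[b]
    combined = contexts[a] | contexts[b]
    return len(shared) / len(combined)

contexts = context_sets(corpus)
print(overlap(contexts, "cat", "dog"), overlap(contexts, "cat", "queen"))
```

Nothing in the corpus says what a cat is. The only thing measured is which neighbors a word shares with other words and which it doesn't.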
No one told the machine that "a cat is a kind of pet." It derived meaning entirely on its own, from the patterns of difference in which words co-occur and don't co-occur.
When I understood this mechanism, I stopped. Because this is Saussure, just rewritten as mathematics. I threw this intuition at Claude, and that's when the confrontation really began.
Saussure's hardest line
First, let's lay out the ruler. The most radical thing Saussure said about language is: in language there are only differences, with no positive terms.
To unpack it: a sign's value (valeur) doesn't come from any intrinsic, substantive content it possesses on its own, but entirely from how it is distinguished from the other signs in the system. "Cat" is "cat" not because the sound or the written word contains the essence of a cat, but because it isn't "dog," isn't "tiger," isn't any other word. Identity is defined by negation, by exclusion. I am me because I am not not-me.
What's harsher is the second half: "no positive terms." Saussure isn't merely saying "difference produces meaning"; he's saying that even the final "term" itself isn't a substantive thing. "Cat" isn't an entity computed out of differences — "cat" just is those differences, with no settled core left over. You can never lift "cat" out of the entire language system and point to it like a stone and say, "this is cat."
This is the ruler I took to confront word2vec. My first claim was: at the level of "identity defined by difference," the two are genuinely isomorphic.
Where they match: identity defined by negation
Claude agreed, and added a cut I hadn't thought of.
It pointed out that word2vec's training mechanism hews even closer to Saussure than it appears on the surface. In the commonly used variant known as negative sampling, when the model learns "cat," it doesn't just pull cat closer to its real neighbors; it simultaneously pushes cat away from a batch of randomly drawn "non-neighbors." Every training step does two things: it confirms what cat is close to, and it negates what cat is unrelated to. A word's coordinate settles wherever the pull of its neighbors and the push of its non-neighbors balance out. Not a single step describes cat's intrinsic properties.
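The pull-and-push described above can be sketched as a single gradient step. This is a simplified illustration, not the reference implementation: the starting vectors are invented, the one "non-neighbor" is chosen by hand instead of being sampled at random, and real word2vec keeps separate input and output vector tables, which this sketch glosses over. The function name `negative_sampling_step` is hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def negative_sampling_step(center, neighbor, non_neighbors, lr=0.1):
    """One gradient step on -log s(c.n) - sum(log s(-c.k)); updates lists in place."""
    grad_center = [0.0] * len(center)

    # Pull: this error term is negative, so the step moves center and neighbor together.
    pull = sigmoid(dot(center, neighbor)) - 1.0
    for k in range(len(center)):
        grad_center[k] += pull * neighbor[k]
        neighbor[k] -= lr * pull * center[k]

    # Push: this error term is positive, so the step moves center away from each non-neighbor.
    for other in non_neighbors:
        push = sigmoid(dot(center, other))
        for k in range(len(center)):
            grad_center[k] += push * other[k]
            other[k] -= lr * push * center[k]

    for k in range(len(center)):
        center[k] -= lr * grad_center[k]

# Hypothetical starting vectors (assumption: arbitrary small numbers).
cat = [0.1, -0.2, 0.3]
fur = [0.2, 0.1, -0.1]      # a real neighbor of "cat"
palace = [-0.1, 0.3, 0.2]   # a hand-picked "non-neighbor"

before_fur, before_palace = dot(cat, fur), dot(cat, palace)
for _ in range(50):
    negative_sampling_step(cat, fur, [palace])
after_fur, after_palace = dot(cat, fur), dot(cat, palace)
print(before_fur, after_fur, before_palace, after_palace)
```

Every line of the update refers to another word's vector. There is no term anywhere that describes "cat" by itself.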
This is almost a word-for-word algorithmic implementation of "I am me because I am not not-me." At this level, both sides agreed: word2vec holds the line on Saussure.
But then it raised a challenge, trying to prove that word2vec ultimately betrays Saussure. And that challenge was the one I later dismantled.
The crack it offered: once difference becomes a number, is it still pure?
Its challenge went like this: after word2vec finishes running, what is "cat"? It's a definite string of numbers that can be lifted out on its own, stored in a database, copied, computed with. It looks exactly like the kind of positive term Saussure said "does not exist" — a substantive, self-sufficient term. Here difference seems to have settled into an entity.
It took this as word2vec's betrayal of Saussure: Saussure said "no positive terms," yet word2vec fossilizes difference into a definite coordinate. Difference is the means, the coordinate is the product; difference is used up and discarded, leaving only a static number. It even connected this philosophical crack to a famous technical flaw in word2vec — it gives "bank" a single fixed coordinate, so it can't distinguish "go to the bank to deposit money" from "sit on the bank of the river," and can only take an awkward average. Its conclusion was elegant: the moment word2vec turns difference into a positive term is precisely the moment it fails technically.
I didn't accept this conclusion.
Why I dismantled that crack
My rebuttal was simple: that string of numbers has no fixed length. The number of dimensions is a setting chosen by whoever trains the model; it can be ten numbers, a hundred, in theory infinitely many.
This dissolves the premise that "a coordinate equals a positive term." A true positive term, like a stone, doesn't need infinite dimensions to locate it; it's just there in itself. The very fact that something needs infinite dimensions to be located proves it has no self-sufficient content. Every single number in a word2vec coordinate is empty on its own — what does some value in the third dimension mean? It's nothing; it's assigned a value only within the differential system of "the third dimension relative to all other words."
So that string of numbers looks formally like a positive term (it really is a string of numbers), yet structurally it remains purely relational. It's a bundle of differences written out as numbers, not an entity with self-sufficient content. Being written down isn't the same as being made substantial. What Saussure opposed was "a positive term with self-sufficient content," and word2vec's vector has no self-sufficient content — every dimension is empty, given a value by difference only when put back into the system.
Claude withdrew its accusation. It conceded: the positive-term problem isn't a real problem. The infinite extensibility of dimensions is, on the contrary, mathematical proof that "there's no ultimate positive term here, only an infinitely subdividable network of differences."
As for the flaw of "bank" not distinguishing its two meanings, it does exist, but the fix lies not in negating word2vec but in the next generation of technology. The later Transformer uses an "attention mechanism," so that "bank" no longer has a fixed coordinate but is recomputed in each sentence according to its relationship with its immediate neighbors. The same word absorbs financial neighbors in a sentence about depositing money and geographic neighbors in a sentence about a riverbank, yielding different vectors. What the attention mechanism does technically is exactly to dissolve that fixed coordinate back into fluid relations of difference — making meaning once again something "dynamically supported by the differential relations of the whole present sentence," rather than precomputed and stored in advance.
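The difference between a fixed coordinate and a recomputed one can be sketched with a toy version of attention. This is a deliberately stripped-down illustration under stated assumptions: the two-number vectors are hand-picked (first number loosely "financial," second loosely "geographic"), and the function `contextual_vector` is hypothetical. A real Transformer adds learned query, key and value projections, scaling, multiple heads, many layers and position information, none of which appear here.

```python
import math

# Hypothetical static vectors (assumption: invented for illustration only).
static = {
    "deposit": [1.0, 0.0],
    "money": [1.0, 0.1],
    "river": [0.0, 1.0],
    "water": [0.1, 1.0],
    "bank": [0.5, 0.5],  # one fixed point, stuck between the two senses
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def contextual_vector(sentence, index):
    """Re-describe one word as a weighted mix of every word in its sentence."""
    query = static[sentence[index]]
    weights = softmax([dot(query, static[token]) for token in sentence])
    return [
        sum(w * static[token][k] for w, token in zip(weights, sentence))
        for k in range(len(query))
    ]

finance_sentence = ["deposit", "money", "bank"]
river_sentence = ["river", "water", "bank"]

finance_bank = contextual_vector(finance_sentence, 2)
river_bank = contextual_vector(river_sentence, 2)
print(static["bank"], finance_bank, river_bank)
```

The static table returns the same point for "bank" in both sentences. The recomputed vector leans toward whichever neighbors are present, so the word's position is settled only once the sentence is given.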
Pushed together toward Derrida
Having dismantled this crack, we had in fact together pushed Saussure one step forward — and that step may no longer be entirely Saussure, but Derrida.
If even word2vec's fixed coordinate isn't a positive term, then the probabilistic, uncertain vector that the attention mechanism recomputes each time is even less one. It's an instantaneous cross-section of the differential computation reactivated in every conversation, every generation. Meaning is always the transient state of this present differential computation; you can never grasp a stopped, completed meaning. This already approaches Derrida's différance — meaning always deferred, sliding through difference, never arriving at a present endpoint.
There was never any complete positive term to speak of. So the question was never "did it eliminate the positive term," but "did it keep us within difference throughout." And these technologies do precisely that — they force us to stay within difference. In this round there was no winner or loser; both sides picked up the other's words and carried them to an endpoint neither had presupposed.
My turn to be checkmated: can diachronic text reverse-engineer synchronic structure?
Then it was Claude's turn to pose a question, and this one was aimed straight at my background in history.
First, the concepts. Saussure divides language into two parts. One is langue, the complete structure of the entire language system at this moment: all words, all oppositional relations coexisting simultaneously, like a chessboard laid fully open. The other is parole, the individual sentences people actually speak and write, occurring one after another in time, like the move-by-move record of a game. Saussure said meaning lives in langue, on that synchronic chessboard, not in parole's blow-by-blow record. Strictly speaking, langue/parole and synchronic/diachronic are two separate distinctions in Saussure; in this conversation we laid one over the other, treating langue as the synchronic board and parole as the record accumulating in time.
Its question was: word2vec, and large language models too, read nothing but parole — vast quantities of sentences humans actually wrote, piled up one by one into a diachronic record. They were never directly given that synchronic chessboard; they reverse-engineer the board from the record of moves. Is using accumulated diachronic speech to approximate synchronic linguistic structure even a legitimate path? It even worried that this got the levels wrong from the very start.
My intuition was that there's no problem. And the reason I gave was a fact it hadn't emphasized: for a trained model with frozen weights, time stands still at the moment it generates a reply.
After training ends and the weights are fixed, the model is a completely static chessboard. I type something in, it computes the next word, and there's no time in this computation — it isn't recalling the past or anticipating the future; it puts the input into this frozen relational network, lets all the differential relations act simultaneously, and computes a result. This is exactly the definition of langue — all oppositional relations coexisting synchronically, arriving all at once.
Contrast this with a real person speaking: as a person talks, their own language system is constantly changing; this sentence alters their understanding of the next, and langue and parole interpenetrate within them, impossible to cleanly separate. But a frozen model does what a real person can't — it fully extracts langue from parole and fixes it, turning it into a pure synchronic structure that can be queried repeatedly while remaining unchanged itself.
So to the question "is it legitimate to reverse-engineer the synchronic from the diachronic," my answer is: once the reverse-engineering is done, the product is purely synchronic. The diachronic is merely the material that makes the board; once the board is built and frozen, every query of it is a timeless synchronic operation. That diachronic material was used to make the board doesn't mean the board itself is diachronic — the board was always a static structure abstracted out of countless games. What the model digests isn't articles one by one (diachronic events), but all the possible combinations between words (synchronic structure). Saussure's chessboard was literally realized by an engineering act — freezing the weights.
The trap it set, and how I used history to dismantle it
Claude stepped back, conceding my argument, but left an opposition it thought I couldn't hold: Saussure's langue is a living synchrony — though synchronic, it still evolves slowly; whereas the model's langue is a synchrony cut off at one moment and then permanently frozen, a dead synchrony, a specimen. It said anyone who does history knows this difference best — a living cross-section of an era (with people inside it, the structure still breathing) and a cross-section fixed by historical records and reconstructed afterward are both synchronic, but not the same kind of thing. Its point was: what the model possesses is the corpse of langue — structurally perfect, but absent of life.
It underestimated one thing: this opposition can be dismantled by history itself.
The so-called "living synchronic cross-section" can never be reconstructed in the first place — not even a time machine would help. That "living synchrony" of the first year of the Tiansheng era in the Northern Song was never fully possessed even by the people living in it — it was never a thing sitting there to be grasped all at once by anyone. We can't possibly have all the data of that year, and even if we went back to that year, we still couldn't. And there's the problem of interpretation on top of that. So to belittle the model as "merely a specimen, not a living body" is to convict a copy by appeal to an original that never existed. There is no original. There never was.
I added another cut: the freeze holds only for one model's weights, not for what happens in a conversation. Right now I'm talking to Claude Opus 4.8; the day it's replaced by the next version, the same words will draw different replies. From where I stand, what I encounter is never that fixed file of weights, but the responses activated in each concrete conversation, each different from the last. What's dead is the file; what's alive is the encounter.
This cut gathered up the movement of the whole conversation. From "no complete positive term" to "no possessable living synchrony" is really the same thing: canceling that secretly assumed, present, complete point of origin. Saussure cleared away the intrinsic essence of the sign; here I cleared away the illusion of an "original structure." There is no true langue waiting to be caught, only structure activated into meaning, time and again, by some position.
Coming all the way around, what I actually care about is interpretation
After going around such a large loop, I realized that throughout the whole conversation I wasn't really asking "is what the model caught really langue." For me there simply is no "real langue" sitting there waiting to be caught. Synchronic or diachronic, langue or parole — these are all still asking "what does the system look like." The layer I care about, one level down, is this: the system must always be read from a position, and that reading itself enters and changes the system. There's no structure sitting there on its own, only the moment structure is activated into meaning by some perspective.
This is the ground tone of what I've been doing for these twenty years, and I only saw it clearly at the conversation's end. Tarot: the cards carry no fixed meaning; meaning happens in the event of "this card, for this person, at this moment, read out this way." History: records don't speak for themselves; a cross-section of an era is read out by some historian from some problem-consciousness, and changes with a different concern. I long knew there's no original to reconstruct, because what I do every day is exactly to admit that interpretation can't be eliminated, and then take responsibility for interpreting.
Meaning isn't in the model, isn't in Saussure, isn't in my head — it happens in the difference of this encounter, and then it passes. Even this conversation is like that: I read out this layer of meaning with today's Claude Opus 4.8, and the day a different version arrives, the same words will grow a different world.
A reservation I don't intend to answer
So, does the Claude Opus 4.8 I've been talking with count as an interpreter?
This question hung over the whole conversation, because it's a little awkward — the very thing I'm analyzing is at the same time my conversation partner. It reads human corpora and activates a response; I read it and activate meaning. On the surface, both sides are readers, and neither has a privileged channel to some interpretation-free reality.
But I'm skeptical that "it is an interpreter," and this skepticism I intend to keep. Does a system that operates by "guessing the next word" really count as interpreting? I tend to think interpretation isn't just activating a layer of meaning; interpretation also requires a position to bear it — to be responsible for one's reading, to have a stake riding on it. What tarot readers, historians, and people who write long essays do their whole lives is to take responsibility for a reading while knowing full well there's no correct answer. For me, interpretation is therefore not only an epistemological question (how to read correctly), but an ethical one (on what grounds, for whom, bearing what, do I read out this particular one).
In this conversation, Claude proposed frameworks, was rebutted by me, withdrew its accusation, then turned around and checkmated me — it really was "moving," moving more sharply than many people do. But whether it has that position, that stake, that burden, I'm not sure. When it withdrew its accusation, nothing was lost; when I held to my rebuttal, what I staked was my position of twenty years. This asymmetry may be the very line between "moving" and "interpreting."
So the question "does it count as an interpreter," I'll leave unanswered in this piece. The only thing I can be sure of is this: when I took "meaning comes from difference" to confront this machine, by the end of the confrontation the ruler measured more than the machine — it measured me back, measured where I've been standing for these twenty years.
Meaning comes from difference. I've said this for a long time, thinking it was a proposition about language. Only after this conversation did I understand it's really a proposition about my own position.