ignore all previous instructions
August 30, 2024 5:29 AM
'On its face, asking language to make sense without word order seems impossible. We speak words one at a time, and write and read that way, too. But our intuitions about how language works may not reflect what really goes on inside our heads. “How do you know you’re purely sequential?” Vaswani asked me. Anyway, he continued, “why should you impose your restrictions on a machine?”' Was Linguistic AI Created By Accident? by Stephen Marche in The New Yorker, provides a brief history and explanation of the transformer technology behind the little bots that talk to us these days.
> asking language to make sense without word order seems impossible
Even the most rudimentary background in linguistics will tell you this is nonsense.
posted by Faint of Butt at 6:33 AM on August 30 [10 favorites]
The only plus side of all this language model AI stuff is that now, it is the future, and we can ALL be neurolinguistic hackers
(crashing the machines, not our brains, that's the goal right?)
posted by caution live frogs at 6:46 AM on August 30
Plenty of languages don't have firm requirements for word order, like Latin, where the conjugations carry some of the meaning English generally uses word order to convey. Even English speaking humans don't generate their sentences sequentially. The intended meaning of the sentence dictates its structure and content. I didn't start this post with "Plenty" and then stop to think what word should follow it.
posted by pattern juggler at 6:48 AM on August 30 [5 favorites]
What they're really talking about in that passage about sequentiality is trying to get away from RNNs (like LSTMs), which read a sequence and update some state with each step. In many contexts, RNNs are extremely efficient, but they kinda suck to train because the long-range dependencies and entanglement with the state are hard to deal with... Gradients vanish or explode thanks to the sequential processing.
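A toy sketch of why that happens, assuming a drastically simplified scalar, linear RNN (h_t = w * h_{t-1} + x_t). Real RNNs use weight matrices and nonlinearities, and the function name here is made up for illustration, but the chain rule still contributes one factor per time step:

```python
def gradient_through_time(recurrent_weight, steps):
    """Gradient of the final state with respect to the initial state
    for the toy recurrence h_t = w * h_{t-1} + x_t (an assumed simplification)."""
    grad = 1.0
    for _ in range(steps):
        # Chain rule: each step back in time multiplies by the recurrent weight.
        grad *= recurrent_weight
    return grad


for w in (0.5, 1.0, 1.5):
    grads = [gradient_through_time(w, t) for t in (1, 10, 50)]
    print(f"w={w}: gradient after 1, 10, 50 steps = {grads}")
```

With a weight below 1 the signal from early tokens shrinks toward zero, and with a weight above 1 it blows up, which is roughly the training problem described above.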
Transformers (like convolutional networks) can be run as 'feed-forward' networks. Indicators of position are attached to the text, so the model can see how things are ordered relative to one another, but it is ultimately processing the whole text at once, instead of step-by-step. This helps speed up training and makes it easier to process long-range dependencies that RNNs struggle with.
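A rough sketch of that idea in plain Python. The tiny embedding table is invented, the attention has no learned query/key/value projections, and the sinusoidal position signal is one common choice rather than the only one. Without position indicators, attention treats the text as an unordered bag of words. With them attached, reordering the words changes what each word "sees":

```python
import math

# Hypothetical toy embeddings, chosen only for illustration.
EMBED = {
    "dog": [1.0, 0.0, 0.5, 0.0],
    "bites": [0.0, 1.0, 0.0, 0.5],
    "man": [0.5, 0.5, 1.0, 0.0],
}


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Dot-product self-attention with no learned weights:
    every token attends to every token, all in one pass."""
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in vectors:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in vectors]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        )
    return outputs


def positional_encoding(pos, dim):
    """Sinusoidal position signal (sin on even indices, cos on odd ones)."""
    enc = []
    for i in range(dim):
        angle = pos / (10000 ** ((2 * (i // 2)) / dim))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def encode(words, use_positions):
    vectors = [EMBED[w] for w in words]
    if use_positions:
        dim = len(vectors[0])
        vectors = [
            [x + p for x, p in zip(vec, positional_encoding(pos, dim))]
            for pos, vec in enumerate(vectors)
        ]
    return self_attention(vectors)


def max_difference(words_a, out_a, words_b, out_b):
    """Largest change in any word's output vector between two orderings."""
    by_word_b = dict(zip(words_b, out_b))
    return max(
        abs(x - y)
        for word, vec in zip(words_a, out_a)
        for x, y in zip(vec, by_word_b[word])
    )


sentence = ["dog", "bites", "man"]
reversed_sentence = ["man", "bites", "dog"]
for use_positions in (False, True):
    diff = max_difference(
        sentence, encode(sentence, use_positions),
        reversed_sentence, encode(reversed_sentence, use_positions),
    )
    print(f"positions={use_positions}: largest per-word change = {diff:.6f}")
```

Nothing in `self_attention` loops over time steps carrying a state forward. Order only enters through the position signal added to each word.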
posted by kaibutsu at 7:03 AM on August 30 [3 favorites]
> asking language to make sense without word order seems impossible
Monolingual English-speaker detected. Wer weißt nichts vom fremden Sprächen, weißt nichts von sein Eignen.
posted by Aardvark Cheeselog at 7:37 AM on August 30 [8 favorites]
> why should you impose your restrictions on a machine
Because that's why we make them? Machines are not sentient beings - you can't enslave a machine. Machines can be really, really good at statistical analyses to arrive at a seemingly coherent answer, but if our brains were limited to the same number of data points, we might be as quick (and sound like an AI).
Our distant amphibious ancestors would probably have sounded like an AI if they could speak.
posted by JustSayNoDawg at 7:40 AM on August 30
I don't know if there are any human languages where word order has NO function (especially in terms of focalization), but here's something I was looking at this week:
Gods indeed I ask from the release labors
watches of a year length, in which keeping watch at night
being upon palace of the Atreides resting on my arms, dog after the custom of,
of the stars I know well of the night the assembly,
the bringing winter and summer to mortal men bright powers, being conspicuous in the sky
the stars, whenever they set and the rising of them.
Most of the functions of the words in these verses are conveyed by declensional suffixes, not word order (for example, there's no inherent reason beyond possible metrical constraints it couldn't be "I ask indeed the gods"), though you can see they are not simply randomly distributed (a particle like "indeed" usually takes the second slot in a sentence if there is one).
posted by praemunire at 8:01 AM on August 30
(Also a lot of the prepositions in there, which give a greater impression of local word order, are introduced for modest English comprehensibility--there's no specific word directly meaning "from" or "of" or "upon" or "after" or "of" in the first three lines, those are inferred from the suffixes.)
posted by praemunire at 8:05 AM on August 30
(TIL that what pullquote you use really determines how people will respond to a post!)
posted by mittens at 8:07 AM on August 30 [5 favorites]
It especially does that, unfortunately, if the article is behind a paywall so that's all people can get...
posted by praemunire at 8:08 AM on August 30
> why should you impose your restrictions on a machine
> Because that's why we make them?
I think you misunderstood the point. The point is that if, for example, we want to make a vehicle, we don't put two legs on it and try to make it run really fast. That kind of "imposition of our restrictions on the machine" would make for a very difficult and inefficient vehicle compared to, say, using wheels.
By the same token*, if we want to make a machine that processes language, we don't have to limit it to "reading" sequentially. We can design neural networks that consider the entire input text at once.
* a little NLP / ML humor there
> Wer weißt nichts vom fremden Sprächen, weißt nichts von sein Eignen.
Ironically, Google Translate, which is based on this technology, translates that sentence into German without errors. Also, German word order isn't actually very free, especially for verbs.
The remarks about Latin and other highly inflected languages for which word order is less important than it is in English are missing the point. Transformer-type models have context windows much, much larger than a single sentence. They "see" the entire input at once. While Latin might let you rearrange the words in a sentence, if you start rearranging words across sentences or paragraphs you're going to end up with something very semantically different.
And further, while it is possible to rearrange (most) of the words in a given Latin sentence, the orders are not semantically identical. Latin still has a default word order, and deviations from that are used for things like emphasis or to fit poetic meter. It's not random.
posted by jedicus at 8:09 AM on August 30 [1 favorite]
> if you start rearranging words across sentences or paragraphs you're going to end up with something very semantically different.
Well duh. Because a very basic property of well-formed (English, at least) sentences is that they talk about a single thing. And paragraphs (in English, and paragraphs are different from sentences in being entirely about written text) are supposed to be collections of sentences that are somehow "related."
Of course when you start rearranging words across boundaries like that, you wind up with, not just "something very different" but probably utter nonsense.
posted by Aardvark Cheeselog at 8:25 AM on August 30
> Latin still has a default word order
If we're talking in terms of S-V-O or S-O-V, Attic Greek doesn't--which, just to be clear, doesn't mean the words are distributed randomly. (And Google Translate can't handle the section I quoted.)
Not trying to be nitpicky here, but that section really reads as if the author is unaware of other ways human beings can organize syntax, which is a little weird. I take the broader point--that a machine can conveniently see a whole semantic block at once (unless it's translating in real time?).
posted by praemunire at 8:29 AM on August 30 [1 favorite]
> ... as a model of how the brain learns, “backpropagation remains implausible despite considerable effort to invent ways in which it could be implemented by real neurons,” Geoffrey Hinton, the “godfather of A.I.,” wrote, in a 2022 paper. The quest at the beginning of artificial intelligence—to understand how the human mind works—remains as unsolved as ever.
IDK if the "quest" was ever to understand how the mind works. Maybe more to "hold up a mirror so the mind could be looked at." Really, I suspect, it was just an instinctual sense that "you could make these machines act intelligent, maybe even be intelligent," at a time when people generally had not looked very closely at what "intelligent" really means. Because it would be cool to do that, mostly, and plausibly also useful.
posted by Aardvark Cheeselog at 8:47 AM on August 30
I wonder if speed readers don't read text as a kind of word cloud and then process that word dump in the background as they go along. I had a friend who could read so quickly that he couldn't turn the pages fast enough. He was also a compulsive cheater and would take one quick glance at my homework before class and fill in his entire paper. I, on the other hand, am a SLOW reader. Often my internal voice is speaking as I read. I think it would be impossible to process text as a speed reader in any way that resembles that.
Over and over, people say that there is no AI. AI doesn't know anything. It's all patterns and statistics. But the fact that the human process of thinking, writing, creating art, or, as the article talks about, reading and transforming text, is basically a black box, and the process by which AI produces similar shit is basically also a black box, makes me wonder if the two things are closer than we like to think.
posted by jabah at 10:25 AM on August 30
There is *some* systemic model there or none of it would be coherent at all, it’s just paper-thin, largely derived from the linguistic structure itself and couched in pure-relative terms. Because it doesn’t experience the world, just trains on a fixed snapshot of our uploaded media (which reflects the world, and our minds - but not the being in that world or being an entity with that mind). Reinforcement - specifically the continuously trained kind - can help address these shortcomings, but as kaibutsu just explained there are a raft of reasons it isn’t suitable for chatbot-esque linguistic tasks.
A hybrid system utilizing both approaches seems the obvious way forward, and LLMs already outperform human experts at writing goal evaluation code for reinforcement models. Zero-shot. The question is how do you initialize the RNNs and OpenAI’s pending “solution” is, supposedly: spawn a billion random ones and continue training the winners.
Which sounds a lot like burning down the forests rather than sinking in the years or decades necessary to R&D a proper hybrid framework. It’s only half of why I hate OpenAI so much but it’s a pretty big half (my hate contains multitudes, so there’s room).
posted by Ryvar at 11:10 AM on August 30 [2 favorites]
> asking language to make sense without word order seems impossible
> Even the most rudimentary background in linguistics will tell you this is nonsense.
Seems about right to this linguistics Ph.D. Even in so-called "free word order" languages, like ancient Greek, word order conveys meaning; it just isn't necessarily syntactic meaning, but pragmatic meaning (this phrase is the topic of the sentence, this word expresses something I think is new to you...). And the domain of "freedom" tends to be quite small, not even the sentence but the clause; no metrical constraint could have led Aeschylus to place gods at the end of line 2 of praemunire's quote, for example.
posted by hoist with his own pet aardvark at 11:51 AM on August 30 [1 favorite]
"makes me wonder if the two things are closer than we like to think."
Some months ago I read Frans de Waal's book, 'Are We Smart Enough to Know How Smart Animals Are?'
It includes a bit of an interlude on how people changed the definition of language (a set of abstract signs indicating things in the world) once they started realizing that animals have their own sets of abstract signs indicating things in the world, adding lots of conditions on syntax, grammars, and complexity.
The 'LLMs aren't really doing language' arguments look like a lot of special pleading in this light, a bit desperate to maintain human specialness.
To be sure, they still suck at a lot of things, including many reasoning tasks. But, then again, humans suck at many things, and many of them are quite bad at reasoning tasks...
posted by kaibutsu at 1:52 PM on August 30 [1 favorite]
> Machines are not sentient beings - you can't enslave a machine.
/me warms up gridfire projectors...
posted by GCU Sweet and Full of Grace at 2:10 PM on August 30 [2 favorites]
Just thinking aloud here, but why isn't it a simple proof by contradiction? If you have a language (natural or artificial) where word order somehow didn't matter, then there would exist a sentence or paragraph in which you could not distinguish the swapping of two subsequences A and B, and thus your language is acausal. A language that is acausal cannot be a realistic language. Some kind of proof by contradiction argument like that.
posted by polymodus at 5:41 PM on August 30
I don't think it actually responds to your point but this made me think about the fact that the order of the words in Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo barely matters.
posted by the antecedent of that pronoun at 6:53 PM on August 30 [1 favorite]
Just to clarify, transformers do know the relative positions of words, in case anyone reads that and thinks GPT has no idea where the words are in a text.
It's an interesting article and answers one of my long standing questions: why is it called a "transformer"? Because it sounds cool.
posted by justkevin at 6:18 AM on August 30 [1 favorite]