Large Language Models: a useful summary
August 7, 2023 3:48 PM   Subscribe

Weird World of LLMs is a concise, true, and very funny rundown of the actual technology behind the latest Internet Gold Rush. (from Simon Willison)
posted by panglos (21 comments total) 48 users marked this as a favorite
 
The How They're Trained section is a great explanation of the Plagiarism As A Service monicker.
posted by panglos at 4:08 PM on August 7, 2023 [1 favorite]


Nice to see this presentation on the Blue! I am not on Mastodon, but I added Simon Willison's account to my RSS reader, and following his work has been interesting: https://fedi.simonwillison.net/@simon
posted by icebergs at 4:41 PM on August 7, 2023 [2 favorites]


Very interesting, thanks for posting this!
posted by dg at 7:22 PM on August 7, 2023


Of all the GPT4 generation models, I've been the most amazed, (hair standing up on my arms, spooked) by Pi. The level of EQ and relatability was more than uncanny. https://pi.ai/talk?scr
posted by Dag Maggot at 8:38 PM on August 7, 2023 [4 favorites]


Good article! Informative and not sensationalistic. I wonder how many other LLMs are trained on pirated, copyrighted material given that the companies are so cagey about their training data.
posted by TheophileEscargot at 11:26 PM on August 7, 2023


Thanks Dag Maggot!
posted by jragon at 11:38 PM on August 7, 2023


Simon is here on Metafilter.
posted by vacapinta at 12:23 AM on August 8, 2023 [3 favorites]


Great reporting, great explainer, thanks for posting.

We know so little about the capabilities of the mysterious program on the thumb drive. The press sensationalized the wrong side of things.

Fascinating to learn about prompt injection!! I'll not be hooking an ai to my accounts any time soon - or if I do, it will be a quarantine.

I want to learn to harness the capabilities to help me write code - but I don't know how to write code. It sounds like a tool that helps people take shortcuts, but requires someone who can grade the output.

I see that when you train on a dataset - your own blogs, Shakespeare, etc. - it doesn't do so good. Randomness that sounds right, without having value.

I am shocked to learn that the words are merely tokens, and the computer doesn't consider their meaning, only their frequency of combination. But, is that different than me? My words aren't numbers, but they are squeeks and hoots and grunts that, when strung together, have meaning. As I read this section, I swung between "it's fake" to "I'm fake".
posted by rebent at 5:34 AM on August 8, 2023 [6 favorites]


I am shocked to learn that the words are merely tokens, and the computer doesn't consider their meaning, only their frequency of combination.

But that’s the whole point, right? LLMs aren’t considering meaning and have no idea of… well, anything. There is no thinking going on, just number processing and our tendency toward anthropomorphism. Nothing an LLM spits out has any meaning beyond what the human observer assigns it.

I realize this is very basic, but it has to be repeated a lot.
posted by GenjiandProust at 6:04 AM on August 8, 2023 [16 favorites]


"Money laundering for copyrighted data" is such a perfect description.
posted by mediareport at 6:18 AM on August 8, 2023 [5 favorites]


Re pi.ai, I had quite a plausible conversation with it re "Andor". Then I played it at chess, and it crapped out quite quickly - I guess they haven't hooked it up with Fritz or Lichess yet.
posted by domdib at 6:22 AM on August 8, 2023


LLMs aren’t considering meaning and have no idea of… well, anything. There is no thinking going on, just number processing and our tendency toward anthropomorphism. Nothing an LLM spits out has any meaning beyond what the human observer assigns it.

Yeah, but that applies to a lot of people as well.
posted by AlSweigart at 6:32 AM on August 8, 2023 [4 favorites]


There were more “this is a terrible error that has to be recognized by domain expertise” there than would be compatible with my enthusiasm for the tool. Excellent to have them laid right out though.

Possibly I’m unenthusiastic about LLMs because generating plausible strings out of undigested stuff is relatively easy for me (check my comment history, try the veal) and I don’t trust the results. There’s a tone of voice somewhere between platitude and parrot that I hear in myself and others that makes me suspicious.
posted by clew at 6:47 AM on August 8, 2023 [3 favorites]


I want to learn to harness the capabilities to help me write code - but I don't know how to write code. It sounds like a tool that helps people take shortcuts, but requires someone who can grade the output.

The grade is running the code.

LLMs are really about perfect for learning code. You can ask it any dumb question you’d like and it is never annoyed. Back many years ago when we were young and the Internet was new and shiny, I had so many people I could ask (and could ask me) the dumb questions and I learned so much more. Now, even finding an available audience around to answer a coding question is quote the challenge. The embarrassment of asking the dumbest questions I technically could look up but would take a lot of time? Ugh. No.

But chat got helped me write an iOS app in a few hours. True, have no idea what code meant what- but having only done web dev. It gave me the start in setting up Xcode to make an app, and the framework to think about. It got me started when I had dragged my feet on learning iOS app dev FOREVER.

One of the questions I had was along the lines of “I have an idea for an app which is X. What would be some ideas of a basic version of that app to accomplish the main goal of Y? What should I think about when trying to build this app?” And then drilled down with even dumber questions until I was happy with the test idea. Then I asked it how to do it, from setting up xcode and the app project, to the specific code.

It got a few things wrong, I got errors, I told it that and it corrected itself. I also had a few things that it described I couldn’t find that I asked for clarification on a lot sooner than I would have with a person.

So really, if you’re looking to learn code, it’s like the perfect programmer you can bug with ANY question. It has infinite patience. And when it’s wrong, you figure it out quickly, which helps with learning anyway as you correct the error. Seriously, go for it.
posted by [insert clever name here] at 6:50 AM on August 8, 2023 [4 favorites]


I guess they haven't hooked it up with Fritz or Lichess yet.

That's part of what the presentation is trying to say. It's fancy autocomplete. Is has no means of searching ahead of the next token unless it knows what you are going to say.

If an LLM, right now, was going to be able to play chess it would have needed to be trained on every possible board with every possible move and then somehow learn how to weight and predict some outcomes over others. Prediction isn't what an LLM does otherwise we'd all be trying to put stock market data into it oh wait
posted by JoeZydeco at 7:22 AM on August 8, 2023 [4 favorites]


MetaFilter: squeeks and hoots and grunts
posted by elkevelvet at 9:51 AM on August 8, 2023 [2 favorites]


That experience learning to code mirrors my experience using ChatGPT to improve my writing. As a domain expert, I don't need help with the factual part, but it's nice to be able to generate variations on how to present information that I can use for inspiration. Even when it starts hallucinating, it's helping me identify some information that I probably need to include.

This is funny because the big fear seems to be about LLMs ruining education, but it seems like it's already helping people learn in new ways. Yesterday I read "Here Comes the Second Year of AI College" in the Atlantic, and it hit the nail on the head:

“It’s not going to be the big, destructive force that we think it’s going to be anytime soon. Also, higher education will be completely unrecognizable in 15 years because of this technology. We just don’t really know how.”
posted by betaray at 10:58 AM on August 8, 2023


I am shocked to learn that the words are merely tokens, and the computer doesn't consider their meaning, only their frequency of combination.
Meaning and frequency of combination are closely connected. There was a paper published back in 2013 called Word2Vec where researchers started off with a table of word frequencies and used it to generate a representation of semantic relationships between words. Basically all you need do is train a neural network with a single hidden layer that takes a token as an input and outputs the probability of each other token in its vocabulary appearing with x positions of it in the text it has been trained on. After it has been trained, the weights of the hidden layer for each token form a vector that can be used to measure semantic similarities and relationships between words. You can use it for answering questions like 'man is to king as female is to _?" just by doing vector addition: female + (king - man) = queen (approximately). That's just with a single layer, modern LLMs have dozens. So, the fact that these things are trained on word frequencies doesn't mean they don't have internal representations of the meaning of words.
posted by L.P. Hatecraft at 4:31 PM on August 8, 2023 [5 favorites]


From HN, earlier today: What happened in this GPT-3 conversation?

It's like watching Dave Poole reboot HAL.
posted by JoeZydeco at 4:44 PM on August 8, 2023 [1 favorite]


It's got bugs.

Speaking of tokens, try asking it about JSBracketAccess.
posted by betaray at 9:33 PM on August 8, 2023


Also, higher education will be completely unrecognizable in 15 years because of this technology. We just don’t really know how.

It will be more expensive.
You think licenses for these tools are going to be cheap once the monetization starts in earnest?
posted by thatwhichfalls at 12:19 PM on August 10, 2023


« Older Clavicytherium construction   |   "Violence is not funny." Newer »


This thread has been archived and is closed to new comments