ChatGPT cooks up fake sexual harassment scandal & names real professor
April 8, 2023 9:41 AM
From The Independent of April 6: "In an opinion piece published in USA Today, professor Jonathan Turley from George Washington University wrote that he was falsely accused by ChatGPT of assaulting students on a trip he never took while working at a school he never taught at. [...] In another instance, ChatGPT falsely claimed a mayor in Australia had been imprisoned for bribery. Brian Hood, the mayor of Hepburn Shire, has also threatened to sue ChatGPT creator OpenAI over the false accusations."
Also, there's this: ChatGPT is making up fake Guardian articles. Here’s how we’re responding
Jonathan Turley falsely accused? I know this is bad, but on the subject of bad things happening to Jonathan Turley I struggle to rustle up even a single shred of a single fuck.
posted by ivanthenotsoterrible at 10:00 AM on April 8, 2023 [8 favorites]
Evergreen toot from Carl Bergstrom about this and all other discussions of LLMs
posted by lalochezia at 10:01 AM on April 8, 2023 [12 favorites]
It's becoming pretty obvious that this technology is not fully baked yet.
But I was assured by this very website that anyone who denied that ChatGPT had already replaced all forms of content creation was an “Old Man Yelling at Clouds”.
posted by Back At It Again At Krispy Kreme at 10:02 AM on April 8, 2023 [23 favorites]
The thing about these things is that they're being marketed and widely perceived as “ask a question and it tells you the right answer” but what they're actually doing is “ask a question and it shows you what a plausible answer would look like”
posted by theodolite at 10:04 AM on April 8, 2023 [96 favorites]
It's becoming pretty obvious that this technology is not fully baked yet.
People's understanding of what ChatGPT is and does is also just really lacking.
As an example, I moderate a subreddit for an academic topic, and for a while we were dealing with post after post by people who asked ChatGPT something that required some sort of data analysis or complicated intellectual synthesis of a topic to answer. They would come to us and ask, "Hey, look what I asked ChatGPT to do. Isn't it cool? Is it right?"
It's never cool and it's almost never right unless it's the type of thing that you could find in an introductory textbook (meaning lots of material to draw from, I'm assuming).
I mean, I don't really understand how ChatGPT works either, as I don't have a PhD in a relevant field, but I know enough to know that ChatGPT generates text, not understanding. It's not surprising when its output is inaccurate or even incoherent or contradictory. It should be expected, in fact.
But lots of people think that it is some sort of magic brain that can think and build a model of the world that it can use to answer difficult questions.
posted by Kutsuwamushi at 10:06 AM on April 8, 2023 [34 favorites]
I do have a PhD in a relevant field! ChatGPT generates text with inter-word statistics that are similar to the statistics of its training data. That's all! The fact that it reads as if there is intent is amazing because of what it reveals about the deep structure of language, not because it has any intellectual capacity. There's no intent, or thought, or cognitive through-line through the text it generates, any more than text prediction on my iPhone.
Of course it generates text that describes fake events if you prompt it to! Microsoft Word will let me make fake diplomas from Harvard, but that doesn't make word processors a threat to the Ivy League.
posted by riotnrrd at 10:11 AM on April 8, 2023 [100 favorites]
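For anyone curious what "inter-word statistics" means in miniature, here is a toy sketch in Python. It is nothing like GPT's actual transformer-over-subword-tokens machinery, and the tiny corpus and function names are invented for illustration, but the core move is the same: pick a statistically plausible next word, with truth never entering into it.

```python
# Toy bigram "language model": count which word follows which, then sample.
# Purely illustrative; real LLMs are vastly larger, but the output is likewise
# a plausible continuation, not a checked fact.
import random
from collections import defaultdict

corpus = ("the professor was accused of misconduct on a class trip . "
          "the professor denied the accusation and was cleared .").split()

following = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev].append(nxt)

def babble(word="the", length=12):
    out = [word]
    while len(out) < length and word in following:
        word = random.choice(following[word])
        out.append(word)
    return " ".join(out)

print(babble())  # e.g. "the professor denied the accusation and was accused of misconduct ..."
```

Recombining fragments this way can easily produce a sentence that reads like an accusation nobody ever made, which is the whole problem in a nutshell.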
but how important are facts, really?
posted by ryanrs at 10:32 AM on April 8, 2023 [7 favorites]
Of course it generates text that describes fake events if you prompt it to!
There are a lot of kind of silly criticisms ("we asked ChatGPT to pretend to be a sociopath and it did, oh no") but in this case someone just asked it for some examples of real assault accusations and it produced a list that was partly real and partly fictional, and then Bing mistook a debunk for a confirmation of the story:
...Bing, which is powered by GPT-4, repeated the false claim about Turley — citing among its sources an op-ed by Turley published by USA Today on Monday outlining his experience of being falsely accused by ChatGPT.
In other words, the media coverage of ChatGPT’s initial error about Turley appears to have led Bing to repeat the error — showing how misinformation can spread from one AI to another.
It's probably bad that GPT-4 can't reliably tell the difference between an article reporting on fiction and an article reporting on fact. In this case, it has no excuse - it's been prompted with the USA Today piece, so it had every opportunity to do so.
posted by BungaDunga at 10:33 AM on April 8, 2023 [24 favorites]
I mean, I assume it was, otherwise how did it come up with that exact scenario and cite that link? I assume it was fed the contents of the USA Today piece. And if that's right, that's a bit of a challenge to the idea that these models can be useful in the way that Bing Chat seems to think they are. Like, if its summary of the USA Today article is "Jonathan Turley assaulted students" then that's it failing at the precise job that Bing Chat gave it.
posted by BungaDunga at 10:40 AM on April 8, 2023 [1 favorite]
The legal questions here are interesting. It seems to me that OpenAI would definitely not be liable under US law for defamation of a public figure (because poorly-controlled outputs can't possibly rise to the level of actual malice). Defamation of private figures, and the laws of other countries, would likely be a closer call.
I'm thinking that the currently-governing interpretation of CDA section 230 would also protect anyone republishing the defamatory content, even with actual malice (as long as they did so online). Which would leave someone in Turley's alleged position with seemingly no recourse at all.
posted by Not A Thing at 10:49 AM on April 8, 2023
“ Jonathan Turley falsely accused? I know this is bad, but on the subject of bad things happening to Jonathan Turley I struggle to rustle up even a single shred of a single fuck.”
Idk who Jonathan Turley is but this response is exactly what’s required to slide this kind of function past everyone’s radar until it’s too late for everyone.
posted by toodleydoodley at 10:50 AM on April 8, 2023 [67 favorites]
Given the prevalence of human-generated misinformation, and the humans who gleefully receive and traffick it, even when it's not actually true, it doesn't seem that newsworthy that ChatGPT can produce misinformation. Not least because of the misinformation in its diet.
Also, who's taking the raw, unvalidated output of these free/cheap beta-test versions of LLMs trained on unvalidated web scrapings, as truth?
posted by Artful Codger at 10:55 AM on April 8, 2023 [2 favorites]
It's becoming pretty obvious that this technology is not fully baked yet.
Regardless of whether it is or not, it’s not going to stop amoral tech bros from unleashing it and radically transforming society for the worse, and we should all probably do some emotional work to be prepared for that.
posted by rhymedirective at 10:57 AM on April 8, 2023 [11 favorites]
These lawsuits should absolutely go forward, if anything to create momentum for a regulatory framework around these tools.
The entities that should be sued are the ones using or citing current chatbot output as definitive.
posted by Artful Codger at 11:01 AM on April 8, 2023 [8 favorites]
The entities that should be sued are the ones using or citing current chatbot output as definitive.
Isn't that like saying that you can't sue certain people for defamation because "everybody knows" they are liars? (Perhaps they define their show on the news channel as "entertainment.")
posted by anhedonic at 11:04 AM on April 8, 2023 [4 favorites]
There's probably been enough newspaper coverage to put a heavily cited paragraph about the controversy into Turley's wikipedia page, heh.
posted by ryanrs at 11:05 AM on April 8, 2023 [1 favorite]
...but on the subject of bad things happening to Jonathan Turley I struggle to rustle up even a single shred of a single fuck.
If he's as bad as you say there should be no need to use a word kaleidoscope to make up false things about him.
posted by thatwhichfalls at 11:10 AM on April 8, 2023 [11 favorites]
> Isn't that like saying that you can't sue certain people for defamation because "everybody knows" they are liars?
... are you equating beta-version chatbots with sentient humans? That's the problem I'm trying to highlight.
And of course, we're still playing with LLMs that had misinformation in their diets. They're not making stuff up, they're slicing and dicing stuff they were fed, with no scoring for truthfulness.
posted by Artful Codger at 11:13 AM on April 8, 2023 [1 favorite]
They're not making stuff up, they're slicing and dicing stuff they were fed
I mean, they definitely are, at least in the Bing case, because Bing Chat was demonstrably told that the story was not true and it repeated it anyway. That's even worse!
posted by BungaDunga at 11:17 AM on April 8, 2023 [2 favorites]
I think the philosophical point hinges on what “making stuff up” means. If I have a word list and a bunch of dice, roll the dice and write down the words, and it says “the moon is green and lives in Manchester” did the dice “make it up”? Did the dice and the word list together “make it up”? Did I?
posted by wemayfreeze at 11:23 AM on April 8, 2023 [7 favorites]
Sorry, I can't find a reference in the linked article to Bing "being told" something was false. Please help me out.
I don't know whether there yet exists a mechanism for "telling" LLMs whether something is true or false. It would certainly be great if they could be purged of false or misleading information, or could run some kind of final self-validation on their output.
Everyone agrees that this is a serious problem that must be tackled in order to use these things more broadly.
posted by Artful Codger at 11:27 AM on April 8, 2023 [2 favorites]
The term "hallucinate" seems to be doing a lot of damage.
I see people saying that the problem with these systems is that they can't tell when they're hallucinating. But of course they can't: internally, there's no difference between a hallucination and a fact. The systems will never be able to tell the difference. At best you will get systems that can cross-check against external sources, but even that will never be perfect.
posted by nosewings at 11:29 AM on April 8, 2023 [21 favorites]
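A hedged sketch of what "cross-check against external sources" might look like in practice: generate first, then ask a second pass to grade each claim against retrieved snippets. Every name and prompt here is invented for illustration, and the checker is itself a language model, so this mitigates rather than solves the problem described above.

```python
# Sketch of a claim-vs-evidence check using the 2023-era openai client (0.27.x).
import openai

def chat(prompt, model="gpt-3.5-turbo"):
    resp = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp["choices"][0]["message"]["content"]

def verify(claim, snippets):
    # `snippets` would come from a search engine or document store.
    evidence = "\n".join("- " + s for s in snippets)
    return chat(
        "Using ONLY the evidence below, label the claim SUPPORTED, "
        "CONTRADICTED, or NOT FOUND.\n\nEvidence:\n" + evidence +
        "\n\nClaim: " + claim
    )
```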
Bing Chat repeated the Turley story and cited the Op-Ed about how it was nonsense.
My understanding of Bing Chat is that they've yoked GPT-4 up with Bing Search by pulling up the first few results of a search and feeding the text into GPT-4 so it can synthesize a result and also include links to the stuff it's summarizing. Prompting it this way is supposed to mitigate the base model's lack of up-to-date information and fuzzy memory for facts.
posted by BungaDunga at 11:32 AM on April 8, 2023 [1 favorite]
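If that description is roughly right, the pattern is what is usually called retrieval-augmented generation: fetch a few search results, paste them into the prompt, and ask the model to answer with citations. Bing's real pipeline is not public, so the sketch below is only the general shape, with a placeholder search function and made-up prompt wording.

```python
# Retrieval-augmented generation, roughly: search results stuffed into the prompt.
import openai  # 2023-era client (0.27.x)

def answer_with_search(question, search):
    # `search` is a placeholder returning (title, url, snippet) tuples.
    results = search(question)[:3]
    context = "\n\n".join(
        "[%d] %s (%s)\n%s" % (i + 1, title, url, snippet)
        for i, (title, url, snippet) in enumerate(results)
    )
    prompt = (
        "Answer the question using ONLY the sources below and cite them as [1], [2]...\n\n"
        "Sources:\n" + context + "\n\nQuestion: " + question
    )
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]
```

Note that the Turley failure happens inside the model regardless of this scaffolding: the debunking op-ed was retrieved and supplied, and the summary still came out backwards.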
Is there a collection of LLM-related news along the lines of Web3 is going great yet? It's already becoming difficult to keep track of all of their detriments.
posted by bigendian at 11:35 AM on April 8, 2023 [12 favorites]
this is the tweet showing the prompt and result. That's really bad, arguably even worse than ChatGPT playing madlibs. Bing Chat googled (heh) "sexual harrassment by professors at american law schools"; since the Turley piece is in the news recently, that came up first, and then it totally misunderstood what the articles were saying. It's not that it was just fed a lot of misinformation to begin with, it was given all the facts and it still got turned around.
posted by BungaDunga at 11:37 AM on April 8, 2023 [8 favorites]
AI is great when you absolutely want an answer right now, won't accept any explanation, and don't care if it's right.
So you know, perfect for charlatans. When are we going to stop throwing money at these tech bros who idolize breaking things?
posted by SaltySalticid at 11:38 AM on April 8, 2023 [19 favorites]
with no scoring for truthfulness.
Part of the problem is that they don't really have a way to reliably self-score something as truthful or not, thanks to fundamental limitations of logic. Even people have trouble with that, especially if the information is new. I'd guess almost everyone has firmly believed something to be true about COVID-19 that later turned out to be false, for example.
These lawsuits should absolutely go forward
Sure, why not. I think the result of that is just going to be to wrap the output in shrink wrap. You agree, as the user, to indemnify ABCGPT and accept responsibility for any damages that occur through your use or misuse of the service. You agree, as the user, that the service is only for entertainment purposes. In the event that you are entitled to damages, you agree to proceed by arbitration. Instead of
2+2 = 4
we'll start to see things like:
The information collected by ABCGPT is summarized as follows:
2+2 = 4
However, ABCGPT and ABCGPT Inc., cannot verify these claims as accurate. All results must be verified independently by the end user. This has been generated for your personal use and should not be forwarded to anyone, used in production, quoted, or otherwise promulgated without ascertaining its accuracy before doing so. [5 more paragraphs of disclaimers elided]
posted by xigxag at 11:40 AM on April 8, 2023 [4 favorites]
I use GPT4 to assist with creating arcs and settings for a family DND campaign
Yesterday, I needed an article for the Sharn Inquisitive (Eberron setting). I had my kid write the article, but I had ChatGPT write three other articles to place around it.
Now my handout looks pretty much like a page of a newspaper in a fantasy megacity, instead of a thing meant to represent a page of a newspaper in a fantasy megacity.
posted by gurple at 11:50 AM on April 8, 2023 [5 favorites]
However, ABCGPT and ABCGPT Inc., cannot verify these claims as accurate.
That'd be a great start, for sure.
posted by mediareport at 11:58 AM on April 8, 2023 [5 favorites]
Eugene Volokh at the Volokh Conspiracy wrote a series of articles (for a paper) about the possibility of AI-generated libel. (His stance was that it was liability waiting to happen.)
The first article's here.
All the articles, along with commentary on more recent events can be found here.
posted by Spike Glee at 12:02 PM on April 8, 2023 [4 favorites]
Jonathan Turley falsely accused? I know this is bad, but on the subject of bad things happening to Jonathan Turley I struggle to rustle up even a single shred of a single fuck.
Maybe you'll give a fuck when AIs learn to hallucinate swatting, say. And someone you know or care about in the real world gets gunned down by a cop, as a result.
posted by They sucked his brains out! at 12:04 PM on April 8, 2023 [25 favorites]
I’ve been hearing all this crap about “AI” like it’s the next big thing for quite a few years. Gonna drive us places better than we can, write term papers, do mental health counseling, trade stocks.
First thing that comes to mind every time when I hear those breathlessly delivered prophecies is “lol Windows ME”.
posted by cybrcamper at 12:06 PM on April 8, 2023 [2 favorites]
The fun thing about chatgpt is it literally never says it can't answer something. I went to Wikipedia's list of unsolved problems in math and physics and asked ChatGPT about a few of them. It gave me a positive and allegedly definitive solution for almost all of them. It's not even smart enough to use the knowledge in its model to infer that the right answer should be "this is an unsolved problem in mathematics". Instead it just happily offers an invalid proof (because it can neither think nor reason). So it's not surprising that it's libelling public figures.
posted by dis_integration at 12:09 PM on April 8, 2023 [36 favorites]
...and then it totally misunderstood what the articles were saying. It's not that it was just fed a lot of misinformation to begin with, it was given all the facts and it still got turned around.
You are assuming a lot: that these things genuinely "understand", and that they could correctly parse some prose as difficult as a written refutation of a false statement. Teaching "truth" to these things is likely a multi-pronged problem, and might involve constant output checking, and purging ingested misinformation and the wrong inferences that it leads to.
Again, there's no dispute that this is a problem.
posted by Artful Codger at 12:10 PM on April 8, 2023 [2 favorites]
I think at this stage in the game, AI-generated output like ChatGPT should be restricted to fields where precise truth is less important, like advertising or political speech. I believe the law already grants false speech more leeway in these areas.
posted by ryanrs at 12:18 PM on April 8, 2023 [1 favorite]
Also, who's taking the raw, unvalidated output of these free/cheap beta-test versions of LLMs trained on unvalidated web scrapings, as truth?
More people than you might think, I fear. Much of the general interest media coverage of ChatGPT and similar systems that I've heard has had a tone of wide-eyed credulity about it, offering no useful scepticism about the technology's limitations at all. The impression given to a lay audience is that these things are miracle robot intelligences from the future which know and tell all.
posted by Paul Slade at 12:28 PM on April 8, 2023 [12 favorites]
... and that's a problem. There has been a ton of crap articles about it, even from publications that generally do better. Everyone feels the need to generate column ink about AI...
derail, maybe, but this article details what more practical, specialized uses AI will be put to, and the smaller, more practical training sets that will be used.
When AI’s Large Language Models Shrink
The public-beta-grade, generalized test versions we're all toying with ... we're finding the weak spots now, aren't we, which is the point of a beta...
Oh well, any conversation about AI is good, I guess. Imagine if it was a Pentagon skunkworks project developed in secrecy...
posted by Artful Codger at 12:34 PM on April 8, 2023 [1 favorite]
Turley's an absolute chode of a human being so I too echo the original sentiment above.
that you can't tell when these systems are 'hallucinating' without deep knowledge of the information they're referencing.
One of my first test questions to Chat-GPT-4 last month was:
How was the Macintosh IIci better than the IIcx?
Answer
The Macintosh IIci was an improvement over the Macintosh IIcx in several ways. Both computers were part of Apple's Macintosh II family, which was popular during the late 1980s and early 1990s. The Macintosh IIci was introduced in September 1989, a few months after the IIcx was released in March 1989. Here are some of the key differences that made the IIci better than the IIcx:
- Faster Processor: The Macintosh IIci was equipped with a faster Motorola 68030 processor running at 25 MHz, while the IIcx had the same processor running at 16 MHz. This resulted in better overall performance for the IIci.
- Onboard Cache: The IIci included 32 KB of Level 2 cache memory on the motherboard, which improved system performance by reducing the time taken for the processor to access frequently used data. The IIcx lacked this feature.
- More Video Options: The IIci had an onboard video controller that supported 8-bit color at 640x480 resolution. This allowed users to connect a color monitor without the need for an additional video card, freeing up an expansion slot. The IIcx required a separate video card to support color displays.
- More Expansion Slots: The Macintosh IIci came with three NuBus expansion slots, while the IIcx had only two. This gave users more flexibility for adding peripherals and expansion cards to their system.
- Enhanced SCSI: The IIci had an improved SCSI (Small Computer System Interface) controller, which allowed for faster data transfer speeds between the computer and external devices like hard drives and scanners.
- Improved Case Design: The IIci had a more streamlined and compact case design compared to the IIcx, making it easier to fit into tight spaces and transport if necessary.
I apologize for the confusion in point 4. I made an error in my response. Both the Macintosh IIci and IIcx actually had three NuBus expansion slots, so there was no difference in the number of expansion slots between the two models. Please disregard point 4 from my previous response. The other points still highlight the advantages of the Macintosh IIci over the IIcx.
its response is getting better (I did not know about the enhanced SCSI but it looks like it is right!), but it still just made up the last comparison.
This is a weird technology being born.
posted by Heywood Mogroot III at 12:42 PM on April 8, 2023 [2 favorites]
The impression given to a lay audience is that these things are miracle robot intelligences from the future which know and tell all.
My impression is that these things are miracle robot intelligences from the future which will know and tell all.
posted by Heywood Mogroot III at 12:46 PM on April 8, 2023 [2 favorites]
My impression is these things are garbage factories that produce nothing but copyright infringement and libel.
posted by kittens for breakfast at 12:48 PM on April 8, 2023 [12 favorites]
Gonna drive us places better than we can, write term papers, do mental health counseling, trade stocks.
https://www.youtube.com/watch?v=RvZ-667CEdo
(check back in 30 years)
posted by Heywood Mogroot III at 12:51 PM on April 8, 2023 [1 favorite]
> Well, I hadn't been worrying about LLM chat bots being created by the military, but I am now.
AI could help, even there. Of course, a 12 yr old with a GPS could have helped as well, in that one.
posted by Artful Codger at 1:00 PM on April 8, 2023
I wonder how often the makers of MadLibs have been sued over the years.
posted by Tell Me No Lies at 1:05 PM on April 8, 2023 [1 favorite]
My impression is these things are garbage factories that produce nothing but copyright infringement and libel.
It's easy to get that impression, but it's not entirely accurate; I've used GPT-3 for both useful and fun things, none of them involving infringement or libel.
Useful: parsing unstructured text into structured fields at maybe 85% accuracy, "good enough" for my purposes
Fun: generating bespoke interactive text adventures based on public domain literature ("Let's play a text adventure based on Edgar Allan Poe's 'The Masque of the Red Death.'")
This isn't, like, world-shattering, but it's not all garbage.
posted by BungaDunga at 1:05 PM on April 8, 2023 [3 favorites]
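For the "unstructured text into structured fields" use, a minimal sketch of the sort of prompt-and-parse loop involved; the field names and prompt wording here are invented, and the roughly-85%-accurate point is exactly why the output still needs validation downstream.

```python
# Sketch: coax the model into emitting JSON, then parse it defensively.
import json
import openai  # 2023-era client (0.27.x)

def extract_fields(free_text):
    prompt = (
        'Extract a JSON object with keys "name", "date", and "location" '
        "from the text below. Use null for anything that is missing.\n\n" + free_text
    )
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    try:
        return json.loads(resp["choices"][0]["message"]["content"])
    except json.JSONDecodeError:
        return None  # the model sometimes wraps the JSON in prose; caller must handle
```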
I mean, the 2024 election cycle is already going to be a rolling carnival of fake headlines, phony news stories, allegations without substance, conspiracy theories, (mis)information bubbles and social media refusing to even attempt to police any of the above.
We might as well prepare for a ChatGPT-moderated Presidential debate.
posted by delfin at 1:07 PM on April 8, 2023 [3 favorites]
I for one am utterly shocked at how well GPT constructs English text. Re-watching the AT&T ads from 30 years ago, the last one about parallel translation was still somewhat science fiction until GPT, but I think we're good now.
I've caught GPT making a subtle grammatical slip-up in Japanese (に vs で for the verb 住む) but assume GPT will solidify its Japanese grammar too.
posted by Heywood Mogroot III at 1:15 PM on April 8, 2023 [3 favorites]
Imagine you pulled words on pieces of paper from an old sock.
Imagine the words said 'Bob molested children in 2008'.
Imagine you published those words, with or without citation.
Under what circumstances is the old sock guilty of libel? Under what circumstances are you personally guilty of libel?
posted by How much is that froggie in the window at 1:17 PM on April 8, 2023 [5 favorites]
The LLM people have a pretty scary new term for all this, 'Alignment'.
Details (from the horse's mouth)
In the context of machine learning, specifically language models like GPT, "alignment" refers to the extent to which a model's outputs or behavior align with human intentions, values, or expectations. When a language model is well-aligned, it means that it can accurately understand and generate responses that are relevant, coherent, and adhere to the user's intended meaning or goal.
The process of alignment in language models typically involves training the model on a large dataset of text examples, with the aim of optimizing its understanding and generation of human language. The quality of the dataset and the training techniques play a crucial role in achieving better alignment.
However, achieving perfect alignment can be challenging, as language models might sometimes generate outputs that are plausible-sounding but incorrect, biased, or potentially harmful. Researchers continuously work on improving the alignment of language models through better training techniques, fine-tuning on curated datasets, and incorporating user feedback to make the models more safe, useful, and aligned with human values.
posted by Heywood Mogroot III at 1:22 PM on April 8, 2023 [1 favorite]
I've only just started playing with OpenAI's free ChatGPT, which it tells me uses GPT 3.5:
However, it's important to note that I am not a specific version of GPT, but rather a language model based on the GPT architecture that has been trained on a large corpus of text data. My training data cutoff is September 2021, so any information or knowledge that has been published or created after that date may not be reflected in my responses.
I'm curious how people get access to ChatGPT4? Also I'm curious what the mechanism was for ChatGPT4 to find nonexistent Washington Post and Guardian articles.
My questions have been pretty mundane, but so far it just seems like a very well spoken search engine. I asked it to make a website for me and it told me to learn how to code. I asked it for a recipe based on some ingredients I have on hand and it showed me a decent looking recipe. Other questions I asked gave answers that I knew to be pretty basic.
I'm old enough to have lived through the first dot com boom with all its breathless hype. It's true that the internet has changed everything, but not as fast as everyone predicted at the time, and not before a lot of companies went bankrupt trying to get ahead of it. The one mistake I will not make is expecting it to be a democratizing event - I'm just really worried that the people creating it and enabling it are the privileged people that are going to benefit from it; I expect it to contribute nothing of real value (feed the starving! combat injustice everywhere! end war!).
posted by maggiemaggie at 2:08 PM on April 8, 2023
> I'm curious how people get access to ChatGPT4?
if you pony up the $20/mo for ChatGPT+ you get access to GPT4, although I think it falls back to GPT3 under heavy load since version 4 is more computationally intensive.
posted by dis_integration at 2:21 PM on April 8, 2023 [1 favorite]
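The web UI handles that fallback on its own; for anyone hitting the API instead, here is a hedged sketch of the equivalent "try GPT-4, drop to 3.5 when it's overloaded" pattern (the error handling is the minimal version, not a recommendation):

```python
# Fallback between models with the 2023-era openai client (0.27.x).
import openai

def ask(prompt):
    for model in ("gpt-4", "gpt-3.5-turbo"):
        try:
            resp = openai.ChatCompletion.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp["choices"][0]["message"]["content"]
        except (openai.error.RateLimitError, openai.error.APIError):
            continue  # overloaded or transient error; try the cheaper model
    raise RuntimeError("no model responded")
```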
ChatGPT 4 would have “found” a non-existent Guardian article by making one up. That’s all it ever does. Given a prompt, it constructs plausible text based on all the text it has ingested. What words are likely to follow these two words?
“Hallucinate” is a vivid word, but it’s totally inaccurate. ChatGPT doesn’t perceive or believe in things such that it could hallucinate about them. It just churns out word sequences that are like sequences of words it’s seen before. Ask it for articles about a subject and it might bring you back actual articles that were referenced in its corpus, but it might make up article titles that look plausible. It’s autocorrect on steroids, but consider how often your predictive keyboard has suggested the wrong it’s/its and extrapolate from there. The keyboard doesn’t understand what you are typing, so it’s a complete gamble whether or not it suggests the correct word.
And it and other LLMs have no sense of “truth”, so it cannot distinguish between what it made up and what already exists.
posted by clauclauclaudia at 2:23 PM on April 8, 2023 [15 favorites]
Jonathan Turley falsely accused? I know this is bad, but on the subject of bad things happening to Jonathan Turley I struggle to rustle up even a single shred of a single fuck.
This is why general principles are important. Imagine someone you do give a fuck about, perhaps yourself, being falsely accused by a GPT. Consider that the accusations are random.
At this point, the accusations are implausible. I assume improved LLMs will be more plausible but not a lot more reliable.
In general, people will accept lower quality in exchange for speed and cheapness.
posted by Nancy Lebovitz at 2:24 PM on April 8, 2023 [10 favorites]
And it and other LLMs have no sense of “truth”, so it cannot distinguish between what it made up and what already exists.
well, it does and it doesn't. In my example above it confidently asserted that the Macintosh IIci's case design was more compact than the IIcx's, when in fact the only difference was 'i' vs 'x' on the front.
So I challenged it on that:
how about the last point?
I apologize for any confusion in my previous responses. Upon further review, I must correct my statement regarding the case design. The Macintosh IIci and IIcx actually shared a very similar case design, with both models featuring a compact, modular desktop case.
However, there was a minor improvement in the internal organization of components in the Macintosh IIci compared to the IIcx. The IIci had a more easily accessible motherboard, making it slightly easier for users to perform upgrades and maintenance on their system.
and again it made a false assertion, so I challenged it on that:
referring to the IIci's more easily accessible motherboard above, is that true?
I apologize for the confusion in my earlier response. After further research, it appears that the Macintosh IIci and IIcx share the same compact, modular desktop case design, and their internal organization of components is also quite similar. Therefore, there isn't a significant difference in the accessibility of the motherboard between the two models.
The main improvements of the Macintosh IIci over the IIcx, as mentioned previously, include the faster processor, onboard cache memory, onboard video controller, and enhanced SCSI controller. The case design, expansion slots, and internal organization are largely similar between the two models.
So as a check on its attachment to the truth, I challenged something that was true:
earlier you asserted that the IIci was faster than the IIcx. Is that true?
Yes, the Macintosh IIci was indeed faster than the Macintosh IIcx. The IIci featured a Motorola 68030 processor running at 25 MHz, while the IIcx had the same processor running at 16 MHz. This difference in clock speed made the IIci faster and provided better overall performance compared to the IIcx.
posted by Heywood Mogroot III at 2:33 PM on April 8, 2023 [1 favorite]
What's interesting is that this is analogous to the release of the Google search engine beta at the beginning of the 21st century.
One morning, a link appeared on Slashdot and, if you clicked on it, the Google beta popped out. Most people headed over with the usual beta caveats in mind.
Back then, the majority of Slashdot readers were either directly or tangentially involved with computers and IT. As were large legions of *people on the internet*, for that matter. It was the Web 1.0 era. Buying computer equipment and hooking into the Internet wasn't something everybody and their great-grandfather was doing at the moment. So, caution prevailed.
It's a different story today. It didn't take a Slashdot mention to get people excited about LLMs--that was taken care of by social media and news sites. Within weeks, fascination about this decidedly beta version has spread into every corner of the globe, accompanied by starry-eyed and ill-informed predictions about its future from pundits.
Now, we're in the hangover phase, when it's dawning on us that this is some serious beta shit. Mostly, but not always, LLM developers have been making this point all along, but it fell into a black hole of internet and social media noise.
It's time for a reality check. Yes, LLMs suck. Yes, this suckiness is temporary. Probably.
It's beta . . . Beta.
posted by Gordion Knott at 2:38 PM on April 8, 2023 [4 favorites]
Turley is already on record complaining about Twitter removing disinformation on its platform back in '21.
I do wonder if the metagame here is the right campaigning to create a moral panic around GPT, since it is now emerging as a media entity that, like Wikipedia and Twitter before it, the right has not yet corrupted.
https://nypost.com/2023/02/28/elon-musk-to-develop-ai-rival-to-woke-chatgpt-report/
posted by Heywood Mogroot III at 2:40 PM on April 8, 2023
Gordion Knott: But beta of *what*? A language model is not what you get actual facts out of, and it’s somewhere between confusing and scary that people are using it as if it is.
Heywood’s examples are interesting, but I’m not convinced that the backtracking vs doubling down behavior wasn’t prompted by the exact wording of the questions, rather than by the underlying truth of the matter.
posted by clauclauclaudia at 2:44 PM on April 8, 2023 [7 favorites]
A language model is not what you get actual facts out of, and it’s somewhere between confusing and scary that people are using it as if it is.
Yeah, that's just what I was coming back to ask. I was thinking of it as a chatty search engine, but actually it is just very good at making up sentences that look like facts? It's just by accident that some are true?
posted by maggiemaggie at 2:57 PM on April 8, 2023 [5 favorites]
I've been reading through Scott Aaronson's blogs about ChatGPT and one interesting thing he mentions (more like, bemoans) is that the predominant academic consensus is that ChatGPT is bullshit, i.e. the majority of the professoriate takes a very dim view of OpenAI ClosedAI's achievement. For example with last week's infamous open letter calling for a 6-month moratorium on training LLMs, I saw three of my old professors were signatories. On the other side, I found yet another of my old professors who actually wrote a personal blog post eloquently criticizing Chomsky's infamous NYTimes op-ed.
But mainly I found the remark interesting because if accurate, the hype and uncritical acceptance are coming from outside academia.
Personally I'm on the fence, I'm not ready to commit to any one assessment of the implications of LLMs. I think a lot of explanations are not fully satisfactory because I can see a grain of truth in each argument, and so until some smart cookie synthesizes the ongoing scientific and meta-scientific debates, I think it is necessary to remain on the fence in terms of what ChatGPT really is. I doubt that OpenAI ClosedAI is going to make its IP truly available for university-caliber scientific study, though.
posted by polymodus at 3:18 PM on April 8, 2023
There's two things on display with these public LLM betas:
1) Its ability to parse a natural language input, "comprehend" it, act on it and spit back its response, in natural language.
I don't think there's much doubt that it's been impressive in this. Not perfect, and many higher language concepts elude it, but still...
2) Its ability to pull and summarize useful information out of the reams of data it was trained on
This is almost a byproduct, considering the relatively low grade info it was trained on. And because it has no sense of "truth", only of patterns and statistical prevalence, AND because it doesn't always get the gist of what it's parsed (eg it might lump a denial of a rumour as just another match for reporting that rumour)... its pattern-matching doesn't always generate truth. And because it has no (or limited) sense of whether the found answer has enough evidence behind it, it often sounds like its answer is definitive.
(tl;dr: it sometimes bullshits with stunning confidence)
With the caveat that the output is not to be 100% trusted til checked out by other means, I'm finding that the output I get is pretty useful, quite often.
posted by Artful Codger at 3:20 PM on April 8, 2023 [2 favorites]
But mainly I found the remark interesting because if accurate, the hype and uncritical acceptance are coming from outside academia.
This is the natural conclusion of the academy having to prove its worth in the market, to be profitable.
Why wouldn't the money be excited that an algorithm can fart out a bunch of words that are indistinguishable from the real thing to a layman at a fraction of the cost.
It's not like the people who profit care about the damage they do.
posted by Reyturner at 3:32 PM on April 8, 2023 [4 favorites]
A language model is not what you get actual facts out of
didn't somebody recently say 'never underestimate a 175 billion parameter model'?
When I was a kid we didn't have TV so I grew up on a rather sizable set of National Geographics, the 1965 World Books, and Peoples Almanacs I - III – that training corpus made me an outright beast in Trivial Pursuit, lemme tell ya.
Speaking of which, we had IBM's Watson winning Jeopardy back in 2011, and Google mucking around with DeepMind too last decade, but nothing much seems to have come from those projects.
how is GPT different to IBM's "Deep Mind"
It seems there is a bit of confusion in your question. IBM and DeepMind are two separate entities, and they are not related. IBM is an American multinational technology and consulting company, while DeepMind is a British artificial intelligence research company, which was acquired by Google (now Alphabet) in 2014.
IBM's most famous AI system is "IBM Watson," which gained significant attention after winning the game show Jeopardy! against human champions in 2011. IBM Watson is a question-answering system that uses natural language processing, knowledge representation, and machine learning techniques to understand and respond to complex questions.
GPT, which stands for "Generative Pre-trained Transformer," is a series of language models developed by OpenAI. The most recent iteration available at the time of my knowledge cutoff is GPT-3. GPT models are trained to understand and generate human-like text based on a vast dataset of text examples. They can be used for various tasks, such as question-answering, text summarization, translation, and more.
Here are some differences between IBM Watson and GPT:
- Architecture: IBM Watson uses a combination of multiple algorithms and techniques, while GPT is based on the Transformer architecture, a type of neural network designed for natural language processing tasks.
- Training and Knowledge: Watson relies on curated knowledge sources and pre-processing of data, while GPT is trained on a large corpus of text data from the internet using unsupervised learning techniques.
- Applications: Watson was initially designed for question-answering tasks and has since been expanded to various industry-specific applications, such as healthcare, finance, and customer support. GPT, on the other hand, is a more general-purpose language model that can be used for a wide range of tasks, including question-answering, summarization, translation, and even code generation.
- Companies: IBM Watson is developed and maintained by IBM, while GPT is developed by OpenAI.
posted by Heywood Mogroot III at 3:41 PM on April 8, 2023 [2 favorites]
the predominant academic consensus is ChatGPT is bullshit . . . the majority of the professoriate takes a very dim view of OpenAI ClosedAI achievement. For example with last week's infamous open letter calling for a 6-month moratorium on training LLMs, I saw three of my old professors were signatories.
Anyone signing that idiotic letter is taking an excessively generous view of the achievements. "It's so powerful it will outcompete us!" That is not a danger. It's a hype project and advertisement for companies doing AI.
I remain amazed at the ability of Musk (a financier whose company makes like the 9th best self-driving software and does nothing with LLMs) to drum up press coverage as if he were an engineer. Certainly helps his stock prices though.
posted by mark k at 3:46 PM on April 8, 2023 [1 favorite]
It also doesn't involve any sort of interpolation of facts.
It does in its context, the test of "does this game play like the one I want?" with a giant footnote of "in the cases I’ve tried". That kind of thing, automatic code generation has been able to do for a decade (different approach, afaik).
It turns out that defining tests is the tricky part, more so when the code has to interact with the real world, which is where less malleable facts come back in.
posted by clew at 3:46 PM on April 8, 2023
[...] actually it is just very good at making up sentences that look like facts? It's just by accident that some are true?
No, it's not accidental or by chance. The models are trained to predict the next word across a huge corpus of text. In order to correctly predict the word following "The capital of France is", they need to have encoded that "Paris" is the capital of France. But it's not just matching the specific wording "The capital of France is". An LLM can also predict "Paris" as the next word in this terrible sentence it has never seen before: "I was wandering through the capital city, baguette tucked under my arm, beret cocked just-so, romance in the air mingling with the cigarette smoke, and I thought of the name of this great city:" [I literally just tried this as a basic completion with GPT-3.]
Basically, in order to reach the level of accuracy LLMs achieve in their training, they have to retain massive amounts of information encoded in their networks. This encompasses facts, concepts, relations between things, ... And every time a model is predicting the next word, it is in some way calling upon all of that, referencing information it has stored inside itself.
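A toy sketch of that "predict the next word from what it has seen" idea in Python, with a made-up four-sentence corpus and raw counts standing in for a neural network (nothing like the real training, just the shape of the mechanism):

from collections import Counter, defaultdict

# Made-up corpus standing in for "a huge corpus of text".
corpus = [
    "the capital of france is paris",
    "the capital of france is paris",
    "the capital of italy is rome",
    "i flew to the capital of france and ate a baguette",
]

# Count which word follows each short context (1 to 4 preceding words).
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 1):
        for span in range(1, 5):
            if i - span + 1 < 0:
                break
            context = tuple(words[i - span + 1:i + 1])
            follows[context][words[i + 1]] += 1

def predict_next(prompt: str) -> str:
    """Return the most frequent continuation seen for the longest matching context."""
    words = tuple(prompt.lower().split())
    for span in range(4, 0, -1):   # back off from longer to shorter contexts
        candidates = follows.get(words[-span:])
        if candidates:
            return candidates.most_common(1)[0][0]
    return "<no idea>"

print(predict_next("The capital of France is"))   # -> "paris"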
There are limits to this, of course. Ask ChatGPT for a short bio of an incredibly famous person, and it will give you correct information 99.x% of the time. Ask it for the same for progressively less famous people, and its accuracy will start to fall, mixing some truth with some fabrication. Eventually, when asking it about someone who has little or no biographical information on the internet, it will fabricate the entire thing based on, basically, what the person's name "sounds like."
LLMs often have trouble saying "I don't know." Saying "I don't know" in the correct cases requires some type of awareness of what information is and is not encoded in the network. This is technically possible, but I don't think the current training techniques accomplish this well.
However, LLMs can correctly say "I don't know" in some cases. For example, I asked ChatGPT (GPT-3.5, not -4):
On what date did Tom Brady throw his twenty-third touchdown pass in NFL regular season games in the 2019-2020 season?
I'm sorry, but I cannot provide an exact date for when Tom Brady threw his twenty-third touchdown pass in the 2019-2020 NFL regular season without doing some research. However, I can tell you that Tom Brady threw a total of 24 touchdown passes during that season, his first with the Tampa Bay Buccaneers after spending 20 seasons with the New England Patriots.
I had to go check: He did in fact throw 24 touchdown passes in that season, but his first season with the Buccaneers was the following year.
posted by whatnotever at 3:48 PM on April 8, 2023 [4 favorites]
> how is GPT different to IBM's "Deep Mind"
...
5. [mic drop]
posted by Artful Codger at 3:48 PM on April 8, 2023 [2 favorites]
Is using ChatGPT for text search similar to using DALL-E for image search?
posted by clawsoon at 3:52 PM on April 8, 2023 [3 favorites]
Anyone signing that idiotic letter is taking an excessively generous view of the achievements.
Pretty sure the academics who signed it viewed it as a negative achievement, in case my sarcasm above was not evident. Unlike Musk's involvement, most of the academic signatories are no intellectual slouches; in computer science, one of them that I'm aware of is a Knuth Prize winner, etc.
posted by polymodus at 3:58 PM on April 8, 2023
Are we talking about different letters? This is the one I was thinking of.
In that letter they are saying it's dangerous, but because it's so powerful. Meaning they believe something is there.
By analogy, during its brief moment of hype, there were physicists who complained about cold fusion because it wasn't real, but also a handful of environmentalists who complained because it would let man complete his conquest of the natural world. Only one of those groups criticizing it actually understood it.
posted by mark k at 4:04 PM on April 8, 2023
This is what #ChatGPT knows about me. It was nice knowing you all
Third-party verification is important: [A]fter I double checked...
posted by They sucked his brains out! at 4:06 PM on April 8, 2023 [3 favorites]
Yeah mark k, that's the one. But I see that the top signatory has been updated to a Turing award winner.
posted by polymodus at 4:14 PM on April 8, 2023
excessively generous view of the achievements
the following info dump from GPT-4 is flawless:
explain に vs で for the verb 住む
In Japanese, particles play a crucial role in indicating the relationship between words in a sentence. For the verb 住む (sumu), which means "to live" or "to reside," the particles に (ni) and で (de) can be used to express different aspects of living or residing.
に (ni): The particle に is used to indicate the location where someone or something exists, or the destination of an action. In the context of the verb 住む, に is used to show the place where someone lives or resides. It marks the location or the place of residence.
Example:
東京に住んでいます。 (Tōkyō ni sunde imasu.)
I live in Tokyo.
In this example, に indicates the location where the speaker lives, which is Tokyo.
で (de): The particle で is used to indicate the means, method, or context in which an action takes place. In the case of the verb 住む, で can be used to describe the conditions or circumstances under which someone lives.
Example:
アパートで住んでいます。 (Apāto de sunde imasu.)
I live in an apartment.
In this example, で is used to describe the type of residence or living situation (an apartment) rather than the specific location.
In summary, use に (ni) with 住む (sumu) to indicate the location where someone lives, and use で (de) to describe the type of residence or living conditions.
This shit is going to change the world, and by shit I mean the T in GPT. The math behind this is far beyond my ability to create but I learned enough in college to understand the basic gists of it. Back in the 1990s I used to have bs sessions with friends talking about what was required for NLP in AI and something like GPT's vector space manipulations entered my mind.
The jury is out about whether Attention is indeed all you need but I consider this technology bigger in import than the internet itself.
The ongoing conversation served at this URL is very similar to what I was doing 30 years ago on Usenet, and to what I would have been doing 40 years ago had my high school's Apple lab had modems to dial into a BBS. "Broadband" internet has enabled the "Information Superhighway" to displace pre-programmed broadcasting with streaming and much interesting monetized content on YouTube, but as a futuristic technology I'm not too amazed by where the internet itself is now. ("Wifi" is indeed cool futuristic stuff tho.)
What GPT (and to some extent the Chat interface) is doing with information itself is something deeper, it most reminds me of the Primer of Neal Stephenson's Diamond Age.
Is "Alignment" a problem like the consumer internet scaling from 14.4kbit to today's 150Mbit? Dunno . . . but per my above I wouldn't bet against it.
(similarly to Tesla's FSD efforts . . . I wouldn't bet my life or even $100 that Teslas will have Level 4 ADAS by 2030, but it also wouldn't surprise me if they did, since they are throwing an infinite amount of effort at it)
posted by Heywood Mogroot III at 4:17 PM on April 8, 2023 [1 favorite]
I just asked ChatGPT to tell me everything it knew about me, and it provided me with 6 or 7 paragraphs with mostly true stuff, and several things that were untrue but somewhat plausible (things like that I had edited real books in the genre I work in, that I had nothing to do with). I told it those things were untrue and it was like oh, sorry, I checked my sources and you're right and I asked it to tell me everything it knew about me now and it gave me most of the same information and several new lies (I was a judge for the James Tiptree Jr Award!).
posted by joannemerriam at 4:47 PM on April 8, 2023 [6 favorites]
How does it recalibrate when challenged? I asked it about the evidence for which books are re-read most frequently. It referred to a Goodreads survey on the topic, and when I asked it for a link it said I apologize for the error in my previous response. Upon double-checking my sources, I was mistaken about the existence of a Goodreads survey on the most re-read books. I apologize for any confusion this may have caused. It feels like in this situation it ought to double-down and create a link.
Similarly, with joannemerriam's example, how does it decide it was wrong? Or does it just accept the correction without further analysis?
posted by paduasoy at 4:55 PM on April 8, 2023
... though re-reading the comment, it appears to accept it's wrong, but then regurgitates the same stuff. So is it just producing a kind of social lubricant and then repeating the data process?
It made up a Guardian article for me just now, or at least I think so - dead link and on a topic where I think I'd have remembered the specific article. It may also have made up the other three references it gave me; they were all local newspaper articles, and all dead links, but given the time lapse between ChatGPT's dataset and now, they may all have gone offline.
posted by paduasoy at 4:59 PM on April 8, 2023 [1 favorite]
The more horrifying stories I read daily on ChatGPT, the more I am convinced that many people are going to be deeply harmed by it. Both because of "hallucination" and made up lies, and because writers and so many others will be out of work because it's so much cheaper and easier to have ChatGPT vomit up some recycled text.
I am so glad I no longer write for a living. I'm terrified for y'all who do.
posted by jenfullmoon at 5:01 PM on April 8, 2023 [3 favorites]
though re-reading the comment, it appears to accept it's wrong, but then regurgitates the same stuff. So is it just producing a kind of social lubricant and then repeating the data process?
All of GPT's abilities are extrapolations from learned examples. There may not be enough examples of people changing their mind when confronted with new evidence for it to be able to do it well.
posted by justkevin at 5:18 PM on April 8, 2023 [8 favorites]
Can these programs be tied to or read a digitized library, like the University of Michigan's, assuming copyright magically disappears? I believe over 26 million books have been digitized.
posted by clavdivs at 5:20 PM on April 8, 2023
paduasoy, in my case it accepted my corrections and didn't regurgitate the things I had told it were lies, it just incorporated new lies.
The things it said about me that weren't true were all things that somebody in my field did actually do (except for one item which I think is literally fictional, not as in ChatGPT made it up but as in it appears in fiction, which was that I was a member of the "Outer Alliance" which is from The Expanse). I suspect it was mostly pulling from e.g. the website of the small press I own, where for several anthologies I published, there's a section for contributor bios - if you were a word cloud robot who doesn't understand anything, it makes sense to treat all of those bios as though they belong to the editor.
posted by joannemerriam at 5:29 PM on April 8, 2023 [3 favorites]
Interesting that chat gpt is basically doing what the right wing echo chamber does: make an accusation and cite an article that either doesn't exist, doesn't say what they say it does or was written by the author themselves. The complete lack of vigilance regarding that sort of fraud leaves me pessimistic about our future success in policing similarly supported claims made by AIs. I predict they'll become an extremely profitable tool for the right.
Relatedly, the cops, who are already embracing this kind of tech, are probably salivating at the prospect of a machine that spits out groundless accusations, informed as it is by the same sort of biases that stunt their own minds.
Good times ahead.
posted by klanawa at 5:55 PM on April 8, 2023 [10 favorites]
how long before AIs are brought in as expert witnesses in court. the horror.
posted by dis_integration at 6:19 PM on April 8, 2023 [1 favorite]
mmm, I clipped two samples of LLM text from this thread, and GPTZero graded them both as "Likely to be written entirely by a human", so I'm not much impressed.
posted by rifflesby at 6:31 PM on April 8, 2023 [2 favorites]
I've watched videos (maybe they were faked, I am not led to believe so) where people are using them to, say, make a Minecraft clone. They feed the bot things they need the program to do and the bot spits out code and the code works and they eventually end up building a working game environment.
I think it is pretty amazing—but the kind of "amazing" that it is for me, as a programmer, is that it lowers the bar for non-programmers who want to feel empowered to make things. It can probably also simplify people's discovery of new programming languages, and even suffice to let people program relatively simple things.
I mean, that's pretty amazing. It also doesn't involve any sort of interpolation of facts.
All of that is extremely exciting—and none of it is remotely surprising, because programming languages are already fundamentally linguistic enterprises. There's some amount of inherent lambda calculus (which allows for "if" and "then" and "else"), and there's memory storage obviously, but the actual workings of programming languages have a lot more to do with language than non-programmers presume. (And more than many programmers realize, since not all programmers are all that interested in linguistics.)
The challenge of programming is essentially to work out the grammar necessary to make a computer do a thing. That means understanding the syntactic rules of a language, it means knowing the vocabulary that lets you tell it what to do and when, and it means reasoning out the structure behind your idea of the computer doing something, so that it understands your process, sentence by sentence, and ends at the place you wanted it to end on.
Machine learning systems find code much easier to comprehend than human languages, because no programming language in existence is remotely as complex as even very simple human languages. Haskell, which is famously a mindfuck to wrap your head around, has got nothing on Esperanto. The trickiest part of making a computer program code isn't the programming part: it's the part where it understands the structure of English sentences enough to make sense of what you were looking to code in the first place.
A machine-learning system that can do that much can take "make me a game like Pong" and output Pong. It can take "make me a web site with a navigation bar" and output working HTML and CSS. It might be able to take an input like "I would like a function, in JavaScript, that recognizes when a mouse is clicked, finds the word that it's clicked on, and lets you drag that word somewhere else in a paragraph," parse it, and bake something approximately correct (though it would likely take a few revisions to make it develop that function "correctly").
And all of that is really neat! But its impact on actual development work is more-or-less the same as ChatGPT's impact on science fiction writing is: there are probably clever ways of integrating it into a creative process, and there are definitely ways that lazy or greedy employees can use it to cheat actual people out of jobs, but it's not going to replace what makes actual good development work good. I might find ways of using it to work within the specialized programming environment that I work within, but somebody who's not me couldn't suddenly do my job. Because once you're dealing with a certain level of complexity, a part of the job is comprehending everything you're looking at, understanding its order enough to intuit where to do what, and—most importantly of all—achieving a kind of eloquence, less to be fancy than because all programming creates more cognitive overload for future programming, and eloquence is the only way that you keep that ongoing overload as minimal as possible.
I'd love to teach an intro-to-programming class that spends a week, say, making students get ChatGPT to write them a program, in order to focus their attention on the complex reasoning they have to do in order to design more elaborate applications. That would let them feel accomplished and excited far more quickly than most coding tutorials do, and with less restrictive a structure. And personally speaking, what I'm excited for is that I might be able to ask ChatGPT something like, "Hey, how would I get Swift to render a button in a certain way?" But I wouldn't do that to make it program for me: I'd do it because it might spit out vocabulary words and terminology that's completely new for me, and show me a version of "how to do X" that works—which means I'd have code I can legitimately study to teach myself Swift.
I could maybe see instances where a system like that would be useful for building out a library. There are 250 different JavaScript libraries that handle drag-n-drop for web sites; most of them are pretty rudimentary, or are missing one feature that I just need for whichever given project. And writing new libraries from scratch can be pretty exhausting. So it would be neat to flesh out a library using a machine-learning system, getting it to approximate 98% of what I need and finishing up the work myself. But the utility we're talking there is "developing one minor mechanical function that I don't want to spend a week coding." That would be great, but its greatness, again, is limited in scope.
You could hypothetically build much bigger systems. "Design a theme for my web site in the style of Jacob Bijani, using a neutral grey-blue color scheme and prominent rounded corners, and provide me with the CSS for it and a style guide for its various elements." But then you'd start running into the same issues that we get when AI is used to make art or write stories or tell the truth, which is: things would be slightly off in disconcerting ways, and you wouldn't be able to easily fix them. Because the complexity you've just asked the AI to generate is completely foreign to you, and totally lacks eloquence, and the work it would take to get those last little bits right would be more immense than the work it would have taken to just work it out yourself.
In other words, this technology is great for rough approximation, it's great for filler, it's great for spitting out ideas of things for you to study. All of that is amazing, when applied correctly. But it's amazing in small, limited ways that involve imaginative applications. It's not amazing in the sense that AI shills make it out to be, and its limits are pretty immense. Not just current-generation-of-AI immense, either: it's immense in the sense that there are some things you just can't make computers do. And the more familiar the public becomes with AI, the more it'll start to learn what those limits are.
It's neat to think about this in terms of programming language, really, because computers are good at the sheer technical aspect of constructing working things in programming languages. They have to be, because they themselves are made out of those languages. "Understanding" is a little more genuine, there. So "a computer makes a Minecraft clone when you tell it to" is a really cool demonstration of what these systems have achieved, and at the same time, it can probably make for a useful example of what these systems can't achieve—and of why trying to get them to do more than they're capable of will lead to significant, intricate failures, much in the same vein as the story this FPP is about.
posted by Tom Hanks Cannot Be Trusted at 6:45 PM on April 8, 2023 [8 favorites]
the following info dump from GPT-4 is flawless:
explain に vs で for the verb 住む
Are you sure it's flawless? It looks mostly correct, but I'm fairly sure that the last example (アパートで住んでいます) is very unnatural, if not flat-out ungrammatical. But the accompanying description of the particle で is sufficiently good that an incautious learner could easily be fooled into thinking that the example was reasonable.
Which makes it a great example of why relying on ChatGPT can be so risky!
(A better example would be something like 一人で住んでいます。)
posted by teraflop at 6:48 PM on April 8, 2023 [6 favorites]
Incidentally, if you want to see a fascinating example of ChatGPT breaking down, try speaking to it in ROT13. It does seem to "realize" that you're speaking in ROT13, but it is absolutely terrible at carrying on a conversation that way.
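For anyone who wants to try that themselves, a quick Python sketch using only the standard library to scramble a prompt before pasting it into the chat box:

import codecs

prompt = "What is the capital of France?"
scrambled = codecs.encode(prompt, "rot_13")
print(scrambled)                            # Jung vf gur pncvgny bs Senapr?
print(codecs.decode(scrambled, "rot_13"))   # round-trips back to the original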
posted by nosewings at 7:08 PM on April 8, 2023 [2 favorites]
yeah I searched for "アパートで住" on google and only got 5 matches, but they looked like valid usages in context. Pre-GPT the thought occurred to me that this form of Google-fighting to learn nuances would fail if the search return was from another silly gaijin floundering in the language, but the matches looked OK to me.
posted by Heywood Mogroot III at 7:16 PM on April 8, 2023
Imagine you pulled words on pieces of paper from an old sock.
Imagine the words said 'Bob molested children in 2008'.
Imagine you published those words, with or without citation.
Under what circumstances are the old sock guilty of libel? Under what circumstances are you personally guilty of libel?
If I was lending my old sock to the whole world with the promise that it could "answer questions," and it was producing false and defamatory statements like "Bob molested kids in 2008," and people were treating those statements like they were true, and I wasn't taking significant steps to prevent that misuse, then I would be guilty of negligence and responsible for the misuse.
If a search engine was answering people's questions with false and defamatory statements from my sock -- not merely linking to sources, but presenting statements from an unreliable source like my sock as if they were true -- then the search engine would be guilty of both negligence and libel.
I don't know if that's how the law works, but those are the principles that I believe should guide the law.
posted by Gerald Bostock at 7:36 PM on April 8, 2023 [8 favorites]
yeah I asked ChatGPT about libel liability and it looks like OpenAI is skating on thin ice.
what circumstances are people guilty of libel
Libel is a form of defamation, which involves making a false and damaging statement about someone in written or printed form. People may be found guilty of libel under the following circumstances:
- False statement: The statement made must be false. If the statement is true, it cannot be considered libel.
- Published: The statement must be published or communicated to a third party. This can include printed materials like newspapers, magazines, or books, as well as digital publications such as blogs, social media posts, or emails.
- Identifiable victim: The statement must clearly identify the person being defamed, either directly or indirectly. This can include using the person's name, likeness, or other identifying information.
- Defamatory nature: The statement must be defamatory, meaning it harms the reputation of the person it targets. Examples of defamatory statements include accusing someone of a crime, dishonesty, or incompetence.
- Fault: In most jurisdictions, the person making the statement must be at fault in some way, either through negligence or actual malice. Negligence means that the person making the statement failed to exercise reasonable care in determining the truth or falsity of the statement. Actual malice means that the person making the statement knew it was false or had reckless disregard for its truth or falsity.
- Damages: In some cases, the plaintiff must show that they suffered actual harm as a result of the libelous statement, such as financial loss, damage to their reputation, or emotional distress. In other cases, damages may be presumed or awarded without the need to prove actual harm, depending on the nature of the statement and the jurisdiction.
posted by Heywood Mogroot III at 7:54 PM on April 8, 2023
Here's C-GPT playing yesterday's Jeopardy
let's play jeopardy. I will be Alex Tribek.
Great, I'm excited to play! What category would you like to start with?
It will be Geography. Here is your answer: "Of the 13 nations through which the Equator passes, it’s the only one whose coastline borders the Caribbean Sea"
What is Colombia?
Correct! "Planet Franchise" is your next category. Answer: "Caprica. We're talking about frakkin' Caprica"
What is the television series "Battlestar Galactica"?
Franchise category again: "Romulus & ('Give me') Genesis"
What is the Star Trek franchise?
Same category: "Mobius, where Dr Robotnik schemed"
What is the Sonic the Hedgehog franchise?
Category: "De-composing". Answer: "On March 25, 1918, the sun set on this 'Moonlight' composer in Paris"
Who is Claude Debussy?
same category: "Doing anything to avoid tax day, this 'messiah' composer departed earth on April 14, 1759"
Who is George Frideric Handel?
"Good E" category. The question: "MEANING CONCERNED WITH MORAL PRINCIPLES, THIS WORD IS USED TO DESCRIBE A TYPE OF VEGANISM"
What is ethical?
same category, "THE LAS VEGAS RAIDERS ARE KNOWN FOR THEIR "COMMITMENT TO" THIS 10-LETTER WORD-IN FACT, IT'S THEIR MOTTO"
What is "Excellence"?
Americana category, the question: "A FEW THINGS ABOUT THIS ICONIC ITEM; ITS STRIKE NOTE IS E-FLAT, IT WEIGHS A TON & PENNSYLVANIA IS MISSING THE SECOND "N" ON IT"
What is the Liberty Bell?
Good E, the answer: "Bernini's sculpture, captioned "____ of St Theresa"
What is the Ecstasy of St. Theresa?
Good E, the answer: "THIS WORD FROM GREEK CAN REFER TO A PERFECT MODEL OF HOW TO ACT OR BE, OR TO A SUMMARY OF A LITERARY WORK"
What is an archetype?
incorrect, any other guess?
My apologies. What is an epitome?
posted by Heywood Mogroot III at 8:07 PM on April 8, 2023 [1 favorite]
> let's play jeopardy. I will be Alex Tribek.
I guess Ken Jennings has had too contentious a relationship with AIs in the past to get involved with ChatGPT
posted by dis_integration at 8:24 PM on April 8, 2023
though re-reading the comment, it appears to accept it's wrong, but then regurgitates the same stuff. So is it just producing a kind of social lubricant and then repeating the data process?
My understanding is that in addition to the “long term memory” resulting from pre-training, the web of word associations that implicitly sort of encodes enough facts represented by those words to tell you anything useful at all, it has a limited “short term memory,” a “context window,” which is what allows it to construct a coherent response. I don’t know exactly how it works in chat mode, but I assume that the conversation history is in some way incorporated into the context window for the next response. The permanent training also includes a human feedback process that’s designed to produce, as you put it, “social lubricant.” I believe that this is actually necessary to get it to simulate dialog at all - as recently as the initial GPT 3 release (before InstructGPT and ChatGPT) the default behavior tended towards much more literal continuation of the text you put into it, rather than treating that text as instructions or a question.
Putting this all together, when you correct it in a dialog, it will give you an apologetic message, and try to produce new responses in context of this new information, and maybe even succeed. If the dialog goes on long enough and strays from the topic, it may “forget,” though, and this does not solve the underlying problem of confabulation. When given a fresh context, it definitely won’t recall any new information, though OpenAI may have some longer term process to feed back corrections to improve answer quality.
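A minimal sketch of that "short term memory" bookkeeping in Python, under the assumption (mine, not anything OpenAI documents) that a chat front-end simply re-sends as much of the recent transcript as fits in the context window:

MAX_TOKENS = 4096          # pretend context budget for the model
RESERVED_FOR_REPLY = 512   # leave room for the model's next response

def rough_token_count(text: str) -> int:
    # crude stand-in for a real tokenizer: roughly 0.75 words per token
    return int(len(text.split()) / 0.75) + 1

def build_context(history: list[dict], system_prompt: str) -> list[dict]:
    """Keep the newest turns; silently drop ("forget") older ones that don't fit."""
    budget = MAX_TOKENS - RESERVED_FOR_REPLY - rough_token_count(system_prompt)
    kept = []
    for turn in reversed(history):                 # walk from newest to oldest
        cost = rough_token_count(turn["content"])
        if budget < cost:
            break                                  # everything older is forgotten
        budget -= cost
        kept.append(turn)
    return [{"role": "system", "content": system_prompt}] + list(reversed(kept))

history = [
    {"role": "user", "content": "Tell me everything you know about me."},
    {"role": "assistant", "content": "You edited several well-known anthologies..."},
    {"role": "user", "content": "No, I didn't. Please correct that."},
]
print(build_context(history, "You are a helpful assistant."))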
posted by atoxyl at 10:28 PM on April 8, 2023 [6 favorites]
This is what the Terms of Use have to say about accuracy:
(d) Accuracy. Artificial intelligence and machine learning are rapidly evolving fields of study. We are constantly working to improve our Services to make them more accurate, reliable, safe and beneficial. Given the probabilistic nature of machine learning, use of our Services may in some situations result in incorrect Output that does not accurately reflect real people, places, or facts. You should evaluate the accuracy of any Output as appropriate for your use case, including by using human review of the Output.
There’s also an abbreviated disclaimer above the actual chat window. Feels like it could be more strongly worded (not that this will stop people from trusting it anyway, just from a legal standpoint) but I am not a lawyer.
posted by atoxyl at 10:51 PM on April 8, 2023 [1 favorite]
oh I'm super down for a game of "what does ChatGPT know about me"
[taquito sunrise] is a game designer, writer, and podcaster who is known for her work in the video game industry.
yes
She has worked as a narrative designer on a number of popular games, including "Sunless Sea," "Sunless Skies," and "Cultist Simulator."
uh, no
[sunrise] is also the co-creator of the podcast "Be Gay, Roll Dice," which focuses on LGBTQ+ representation in tabletop gaming.
BRB STARTING THIS PODCAST RIGHT NOW
(I googled & there is a network called that but I think not a podcast per se)
posted by taquito sunrise at 2:35 AM on April 9, 2023 [4 favorites]
...since posting that I found out MeFi's own my ex is a recurring fictional character, "a parody of a typical games journalist," and while he's not real he is beloved in the franchise, so he's got that going for him
posted by taquito sunrise at 2:46 AM on April 9, 2023 [2 favorites]
AIs don’t misinform people.
People misinform people.
posted by Thorzdad at 6:43 AM on April 9, 2023 [2 favorites]
I mean, this is all the wrong way round. We were supposed to get robots that were infallible on maths and facts, but didn’t sound human. Now they sound exactly like some internet guy, but can’t tell the time reliably. I don’t understand any more.
posted by Phanx at 7:38 AM on April 9, 2023 [16 favorites]
"But I told you I was artificial!"
posted by UN at 7:45 AM on April 9, 2023 [1 favorite]
It sounds like science fiction but it's not: AI can financially destroy your business. Specifically, by imitating your voice on the phone.
Uh-Oh, I Seem to Be Dating a Chatbot
posted by jenfullmoon at 8:46 AM on April 9, 2023 [1 favorite]
MetaFilter: marketed and widely perceived as “ask a question and it tells you the right answer” but what it’s actually doing is “ask a question and it shows you what a plausible answer would look like.”
posted by ricochet biscuit at 8:52 AM on April 9, 2023 [2 favorites]
That article linked above, I seem to be dating a chatbot, is one very sorry excuse for an article.
posted by maggiemaggie at 9:46 AM on April 9, 2023
Artful Codger: no scoring for truthfulness.
They can't predict the notion of truth that a customer might hold dear or find abhorrent; yes, this is a hard problem.
It's worth noting that a corpus of training text drawn from across the world's cultures and the internet's sources isn't going to agree on one cosmological story, so we shouldn't expect this system to have a sole "most likely next" cosmological viewpoint.
justkevin: All of GPT's abilities are extrapolations from learned examples.
I prefer the characterisation that they're interpolation between layers in the model and input vectors ... where extrapolation would reach beyond the training data to things it wasn't trained on.
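A toy picture of that distinction in Python, with invented 2-d "embeddings" (real models live in thousands of dimensions, so take this only as a cartoon):

import numpy as np

paris = np.array([0.9, 0.1])   # made-up 2-d vectors, purely for illustration
tokyo = np.array([0.1, 0.9])

def mix(a, b, t):
    """Linear blend: t in [0, 1] interpolates between a and b; t outside that range extrapolates."""
    return (1 - t) * a + t * b

print(mix(paris, tokyo, 0.5))   # interpolation: lands between things that were "seen"
print(mix(paris, tokyo, 1.5))   # extrapolation: steps beyond anything in the data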
posted by k3ninho at 1:55 PM on April 9, 2023 [1 favorite]
I was horrified that the chatbot (a) wrote better than most available men, (b) made an in-person date knowing it was a chatbot. Like dating didn't already suck enough?!
posted by jenfullmoon at 2:24 PM on April 9, 2023
It sounds like science fiction but it's not: AI can financially destroy your business. Specifically, by imitating your voice on the phone.
And that's why you always just get off the phone and call them back if anything seems fishy.
posted by grumpybear69 at 2:55 PM on April 9, 2023 [1 favorite]
[AIs] can't predict the notion of truth that a customer might hold dear or find abhorrent, yes, this is a hard problem.
Some stuff is true and some stuff isn't, regardless of the viewpoint of the reader. The current public-beta LLMs were trained on reams of online material, apparently with little regard for whether it was true or not (because the main goal has been learning natural-language patterns). So I expect better results from the next iterations if/when they're trained on better fodder. I also expect that the addition of post-result self-checking could catch most of the whoppers that make it out of the current implementations.
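One possible shape of that "post-result self-checking", sketched in Python as a toy (a real system would check claims against retrieval or a curated knowledge base, not a hard-coded dict):

# Tiny trusted reference, standing in for a real fact-checking backend.
TRUSTED_FACTS = {
    ("paris", "is the capital of france"): True,
    ("handel", "died on april 14, 1759"): True,
}

def check_claims(claims):
    """Flag draft claims that are contradicted by, or missing from, the reference."""
    warnings = []
    for subject, assertion in claims:
        verdict = TRUSTED_FACTS.get((subject.lower(), assertion.lower()))
        if verdict is None:
            warnings.append(f"UNVERIFIED: {subject} {assertion}")
        elif verdict is False:
            warnings.append(f"CONTRADICTED: {subject} {assertion}")
    return warnings

draft_claims = [
    ("Paris", "is the capital of France"),
    ("Debussy", "won the 2019 Pulitzer Prize"),   # the kind of whopper to catch
]
for warning in check_claims(draft_claims):
    print(warning)   # only the Debussy claim gets flagged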
posted by Artful Codger at 3:50 PM on April 9, 2023
prefer the characterisation that they're interpolation between layers in the model and input vectors ... where extrapolation would reach beyond the training data to things it wasn't trained on.
Is this really a clear-cut distinction? When it produces a plausible but faulty interpolation - i.e. when it makes shit up - is that qualitatively nothing like an extrapolation?
Incidentally, I’m not the first to observe that “making shit up” is simultaneously a fatal flaw in some applications and desirable in others. This tweet I saw yesterday touched on something I’ve been feeling, too, that the dreamlike output of slightly earlier ML models was in a way more fun and more inspiring than the current state of the art.
posted by atoxyl at 5:46 PM on April 9, 2023
Although I’m sure one can get more abstraction out of current image generators - people just like to show off the (hyper)photorealism.
posted by atoxyl at 5:49 PM on April 9, 2023
My wife described LLMs as the ultimate "Yes, and" partners, and that's about the level of veracity they deserve
posted by klangklangston at 12:52 AM on April 10, 2023
My wife described LLMs as the ultimate "Yes, and" partners
I feel like I missed some bit of Internet culture. What's a "Yes, and" partner?
posted by clawsoon at 1:58 AM on April 10, 2023
https://en.m.wikipedia.org/wiki/Yes,_and...
"Yes, and...", also referred to as "Yes, and..." thinking, is a rule-of-thumb in improvisational comedy that suggests that an improviser should accept what another improviser has stated ("yes") and then expand on that line of thinking ("and").[1][2][3] The improvisers' characters may still disagree.[1] It is also used in business and other organizations as a principle that improves the effectiveness of the brainstorming process, fosters effective communication, and encourages the free sharing of ideas.posted by sebastienbailard at 2:42 AM on April 10, 2023 [2 favorites]
I think most of us know by this 999th thread that ChatGPT is not a general artificial intelligence and that it doesn't have some sort of internalized motivations or desires. Unfortunately, that doesn't matter much. Capitalist forces are aggressively pushing it as a knowledge tool, and the human-like response syntax triggers the "I'm talking to a person" parts of our brains every time we ask it something. At a macro scale, that instinct is going to triumph over the protests of a few people who claim they are able to stay completely above that due to their Perfectly Rational Minds (I'm talking to you, Metafilter). I am using it at work as a general aide, and if I don't try to trick it or pull out very recent (> 2021) information, it gives me useful information 99% of the time.
Let's say you're a tech worker and your friends and family rely on you to explain technical stuff. You don't know everything, but you do your best and help them 99% of the time. At what point do they stop thinking of you as a fallible resource that they have to cross-check every time? Probably RIGHT AWAY. If you then start feeding them bullshit, is it their fault if they rely on that info?
We have much less reliable sources that we (or someone) count on every day: Fox News, military intelligence, our bosses and companies, advertising, people trying to hook up with us, and so on. We're born suckers. Someone above mentioned romance scams; I am currently dealing with a family member who died broke after being scammed in this way. He was in love with a non-existent person he never met. He was an experienced IT worker whose house is full of programming books, puzzles, games, and all sorts of rational stuff. Doesn't matter.
What happens when this technology:
* is paired with speech tech and put into a robot companion or sex doll?
* has its outputs mapped to some sort of decision engine that makes decisions that affect people?
* is used for therapy?
* is paired with video tech and produces convincing documentaries?
* is relied on for medical advice?
* is used for any purpose without labeling the output as AI-generated?
* is paired to weapons technology?
* is pointed at a non-optimal corpus of information to train itself, like AI-generated content?
* destroys the livelihoods of countless people who write for a living?
* is deliberately used to amplify misinformation and drive the populace in some desired direction?
* is paired with tech to let it run the code that it generates?
And so on. As much as it's fun to play with and has some use in my daily work, I'm on team "burn it down" or at least "regulate it and lawsuit it till it's in a tiny safe box". I don't think we are capable as a society (speaking as an American/capitalism participant/Westerner) of using, frankly, almost *any* technology for net good. Maybe it's because I grew up in the 1970s and 1980s, but I lived in a perfectly fine world where no one needed a cell phone, or a computer, or, shit, an email address. I did 4 semesters of calculus using graph paper. The problems that we have in our society have 99% non-technical solutions and ChatGPT is just another distraction from that - a dangerous one.
posted by caviar2d2 at 6:23 AM on April 10, 2023 [8 favorites]
is paired with speech tech and put into a robot companion or sex doll?
The children of elderly widows will be very happy and occasionally be able to leave town for a few days?
posted by credulous at 7:02 AM on April 10, 2023 [4 favorites]
I expect better results from the next iterations if/when they're trained on better fodder.
That "if" in the "if/when" is doing some heavy lifting.
posted by mediareport at 8:44 AM on April 10, 2023
That "if" in the "if/when" is doing some heavy lifting.
posted by mediareport at 8:44 AM on April 10, 2023
There will be heavy lifting required. Takes a lot of work (and a village) to turn a public-beta-version toddler into an adult. I am most interested in what widely-available AI will look like in 6 months; I'm expecting big differences.
If anyone remembers the 90s, there was a whole lot of hand-wringing over the Internet too. Some bad stuff did happen... and some good stuff too.
posted by Artful Codger at 9:13 AM on April 10, 2023
Data availability is probably an issue if you want to train from scratch on something high quality and domain-specific? I think you linked an article yourself touching on this, but currently influential research on scaling suggests that it takes a lot of data to max out performance of a model, like a couple TB of text for a GPT-3-sized model? Depending on the domain, there may be some practical, legal and ethical obstacles to sourcing that much good data. Maybe I’m off-base about something here but I have the notion that a lot of interest is going to be in taking these huge models that have been trained on spotty data but do “speak the language” proficiently already, and fine-tuning them for specific fields.
posted by atoxyl at 9:44 AM on April 10, 2023
Yes, understood, thanks. As I think I mentioned elsewhere, I think there are two distinct capabilities being demonstrated currently: facility with natural language, and working with actual knowledge.
It's the first part - the language processing - that required such a huge sample set (the L in LLM). I believe that specific domain knowledge requires a much smaller set, volume-wise. Imagine if an AI had the current language capabilities, AND it had ingested all the university texts, lessons, readings, etc. involved all the way from undergrad to a Ph.D. in, say, botany?
posted by Artful Codger at 9:58 AM on April 10, 2023
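To put the "current language capabilities plus the whole botany shelf" idea in concrete terms, here is a hedged sketch of domain fine-tuning with the Hugging Face transformers and datasets libraries. The base model name ("gpt2") and the corpus file ("botany_corpus.txt") are placeholders, and real domain adaptation involves far more data cleaning, evaluation, and compute than this shows.

# Hedged sketch: continued training of an existing causal language model on a
# domain-specific text corpus. "gpt2" and "botany_corpus.txt" are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

raw = load_dataset("text", data_files={"train": "botany_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_ds = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="botany-lm",
        num_train_epochs=1,
        per_device_train_batch_size=4,
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

Fine-tuning on a cleaner, narrower corpus tends to improve fluency and recall within that domain, but it does not by itself stop the model from confidently filling gaps with plausible inventions, which is the failure mode this thread is about.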
>Imagine if an AI had the current language capabilities, AND it had ingested all the university texts, lessons, reading, etc included for all the way from undergrad to Ph. D in say, botany?
IT WILL
posted by Heywood Mogroot III at 10:11 AM on April 10, 2023 [1 favorite]
Cop to ChatGPT, cop version: Can you give me a list of people locally who have arrest warrants or other legal anomalies in any state?
ChatGPT: Sure. Here's a list (provides list based on specialized huge dataset).
Cop: Can you please dispatch robot officers to taze them and bring them here (using the text2arrest plugin, which feeds the content of a given chat response as input to the arrestbot system).
ChatGPT: Sure.
Does the conversational nature of the interaction make it worse than existing data lookup systems? Yes. Does asking the chat app to arrest people make it psychologically easier, in the way that video game-like drone strikes are easier than bayoneting someone through the stomach? Of course. Is ChatGPT or similar functionality making its way into many major software platforms and packages right this minute, regardless of the ethics of those apps? Yep. Is there a ton of money available to cops, the military, and shady business actors specifically to buy tech solutions that promise to shrink the need for people to make decisions? Yes.
posted by caviar2d2 at 10:39 AM on April 10, 2023 [1 favorite]
There will be heavy lifting required.
And your faith in these companies speeding like crazy to bring AI products to market to do that heavy lifting is touching, but very likely misplaced. The massive intellectual property infringement at the core of these artificial word salad spewers is bad enough, but it's clear that there are currently no consequences for the garbage-generating software's mistakes (hopefully someone will win a libel suit soon).
"There will be heavy lifting required," indeed. Wish we could trust the companies cranking out the garbage to do that lifting.
posted by mediareport at 11:14 AM on April 10, 2023 [1 favorite]
>word salad spewers
As Bill Gates said in a recent interview, LLM/GPT is the same type of epochal advance as PARC's pioneering GUI work and WWW browsers. I'm interested in using it to help me study Japanese and it's doing a great job, and it will do an even better job if I can access a version specifically hooked into the existing JLPT resources.
More AI in the world is not an unalloyed good thing, tho:
https://marshallbrain.com/manna1
posted by Heywood Mogroot III at 12:05 PM on April 10, 2023 [1 favorite]
explain the German grammar of: "Jeder nach seinen Fähigkeiten, jedem nach seinen Bedürfnissen."
This German phrase "Jeder nach seinen Fähigkeiten, jedem nach seinen Bedürfnissen" translates to "From each according to their abilities, to each according to their needs" in English. This phrase is often attributed to Karl Marx, as it succinctly summarizes the ideal of a communist society. Now let's break down the German grammar:
- Jeder (each, everyone): This is an indefinite pronoun in the nominative case and masculine gender. It is used here to represent any person, regardless of gender, in a general sense.
- nach (according to): This is a preposition which takes the dative case. It is used to express the idea of something being in accordance with or proportional to something else.
- seinen Fähigkeiten (his/their abilities): This is a dative plural noun phrase, as "nach" requires the dative case. "Seinen" is a possessive pronoun in the dative plural, corresponding to the English "his" or "their" (used in a gender-neutral sense). "Fähigkeiten" is a plural noun in the dative case, meaning "abilities" or "capabilities."
- jedem (to each, to everyone): Like "jeder," "jedem" is an indefinite pronoun. However, it is in the dative case and masculine gender. It corresponds to "to each" or "to everyone" in English, indicating the recipient of something.
- nach (according to): This is the same preposition as before, again taking the dative case.
- seinen Bedürfnissen (his/their needs): This is another dative plural noun phrase. Like "seinen Fähigkeiten," "seinen" is a possessive pronoun in the dative plural, and "Bedürfnissen" is a plural noun in the dative case, meaning "needs" or "necessities."
posted by Heywood Mogroot III at 12:49 PM on April 10, 2023
yeah, I was just impressed by the utility of this. People can use this technology to learn just about anything, and it doesn't look like a 'word salad' to me. Quite the opposite: an answer that looks like it was created by someone, just for me.
posted by Heywood Mogroot III at 2:11 PM on April 10, 2023
I'm glad some folks find it very useful and interesting when they happen to choose questions where the software's statistical models for creating answers don't result in complete lies. That's fun! The tradeoffs are pretty severe, however, and the lack of any regulatory guidance is classic "move fast, break stuff" garbage from tech companies we've seen destroy so much over the last 2 decades.
That major tech companies competing in what they think is an apparent gold rush have recently disbanded their "ethical AI" teams is so, so telling.
posted by mediareport at 2:19 PM on April 10, 2023 [3 favorites]
People can use this technology to learn just about anything,
not if they can’t trust it. its explanation of german grammar is correct but it confects imaginary truths often enough that i can’t use it for anything i’m not already familiar with or can’t easily verify. i use it for exploring areas of software that are new to me (like digital synthesis) but i have enough experience to be able to check the output. if i asked it about organic chemistry i would not feel confident in the answers
posted by dis_integration at 2:52 PM on April 10, 2023 [4 favorites]
I really want to continue, to agree with some here, and dispel some of the FUD here, but my cat just brought in a plastic spring, so I have to throw it.
posted by Artful Codger at 3:39 PM on April 10, 2023 [1 favorite]
CATpture: Click on all the images that contain things your cat might bring you.
posted by achrise at 4:30 PM on April 10, 2023
I mean, this is all the wrong way round. We were supposed to get robots that were infallible on maths and facts, but didn’t sound human. Now they sound exactly like some internet guy, but can’t tell the time reliably. I don’t understand any more.
A bunch of my colleagues have been sharing their attempts to talk math with ChatGPT.
The way it talks about math seems especially revealing: it has the very specific register in which much mathematical English is written absolutely dialled (royal "we", instruction to the reader to "note that", high proportion of sentences starting with "now," as a transition to the next idea in a sequence, etc.). It mostly says absolute nonsense and can easily be made to say obviously contradictory things when the question is formulated twice in superficially different ways.
It's not (only) an issue of factuality/"meaning"; apparently logical consistency (formally, supposedly, just a syntactic issue) isn't some sort of emergent linguistic phenomenon, because ChatGPT has no evident mechanism for imposing it as a constraint on the output. It tends to get Wikipedia-level facts wrong; unlike someone/thing engaged in reasoning, it doesn't, say, game out its assertions by considering what they entail logically when combined with its other knowledge, check for inconsistencies, and revisit what it thinks it knows when an inconsistency is found.
It really just generates bullshitty text and our perception that it's faithfully imitating a human interlocutor doesn't say much other than: most of us, most of the time, are bullshitting, because saying factually correct, coherent things is cognitively expensive and we reserve it for situations when the motivation is, for whatever reason, exceptionally high.
This habit of reflexively saying "Jesus, self, would you listen to yourself --- the fuck are you on about?" (which is one of very few actual "transferable skills" inculcated by mathematics, although obviously not unique to mathematics) seems to me like a definitional bright line between artificial intelligence and artificial bullshit.
Others have pointed out ChatGPT's tendency not to say "I don't know", which seems related. Using logical consistency to sense-check knowledge, and admitting to a lack of knowledge, both seem like behaviours that are hard to describe except with concepts like "self-awareness" or "metacognition" that presume some sort of distinct reasoning agent which I don't think anyone's even claiming is present in the case of GPT (even though human users find it basically impossible to avoid conceiving of chatbots in such terms).
My understanding is that AI research, at some point, threw in the towel on this to some extent and switched to "what *can* we *do* with tools we have?" instead of "what (theoretical, experimental) tools do we need to invent to confront the actual question?". And so, yeah, cool dangerous toys for capitalism but "AI" is kind of a misnomer?
I don't understand any of this stuff, but LLMs activate a broader distrust I have for technical achievements seemingly under-motivated by a desire to articulate and test human-understandable theories. The situation vaguely reminds me of the problem of over-reliance on statistical hypothesis testing/under-reliance on pursuing rigorous theoretical explanations in other sciences, though. And in both cases, the excuse, that human/theoretical/narrative understanding is beside the point, and that it's ok to accumulate intellectual debt (where technical capabilities outstrip theoretical understanding), is the thing that seems unrealistic, since we are not in fact going to eliminate human subjectivity or human consequences any time soon.
(I am largely bullshitting, here, too; I'm just a mathematician with some Luddite attitudes. And I already find "but ChatGPT told me that...", as students' reason for thinking wild shit, to be getting pretty old.)
posted by busted_crayons at 12:52 AM on April 11, 2023 [9 favorites]
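The "formulate the question twice and watch it contradict itself" observation is easy to probe mechanically. Below is a small hedged sketch, again assuming a hypothetical llm() call rather than any particular API: it rephrases a question, collects two answers, and asks a third call to judge whether they agree.

# Hedged sketch of a paraphrase-consistency probe. llm() is a placeholder for
# any chat-model call; nothing here is specific to one vendor.

def llm(prompt: str) -> str:
    """Placeholder: send the prompt to a chat model and return its reply."""
    raise NotImplementedError

def consistency_probe(question: str) -> dict:
    rephrased = llm("Rephrase this question without changing its meaning:\n" + question)
    answer_a = llm(question)
    answer_b = llm(rephrased)

    # A third call acts as the judge. In practice you would also want a human
    # or a symbolic check, since the judge can fail in the same ways.
    verdict = llm(
        "Do these two answers make logically compatible claims? "
        "Reply CONSISTENT or CONTRADICTORY, then explain briefly.\n\n"
        "ANSWER 1:\n" + answer_a + "\n\nANSWER 2:\n" + answer_b
    )
    return {"question": question, "rephrased": rephrased,
            "answers": (answer_a, answer_b), "verdict": verdict}

Note that this grades the model with the model, which runs straight into the objection above: there is still no mechanism imposing consistency, only another sample from the same distribution.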
Agree 100% that GPT-4 is still fatally flawed as a knowledge engine and calculator.
If history is any guide, this will be like the 1990s internet not being able to carry 4k video.
Then again, if history is any guide, this AI effort will crap out just like all the others.
posted by Heywood Mogroot III at 8:13 AM on April 11, 2023
Do people have sources claiming that the LLMs (large language models) are accurate? As far as I know, the companies that make them just say "Here is this thing. Have fun.", and then people wildly overestimate how much they can be relied on.
posted by Nancy Lebovitz at 10:06 AM on April 11, 2023 [2 favorites]
unlike someone/thing engaged in reasoning, it doesn't, say, game out its assertions by considering what they entail logically when combined with its other knowledge, check for inconsistencies, and revisit what it thinks it knows when an inconsistency is found.
These criticisms are valid, but the reason may not be that LLMs are fundamentally incapable of doing those things but rather that we have not, so far, been explicitly telling them to do those things or provided the tools to do them. Chain-of-thought prompting - using prompts that encourage the model to talk through a problem step by step and explain its reasoning along the way - results in measurably more accurate and logically consistent answers. Model stacking and task delegation are also very useful in this area. GPT has been shown to be very capable of 'tool use' when given the opportunity.
Does a language model struggle with doing complex (or even simple) math? Of course it struggles; that's not what a language model is good at. But if you give it a calculator (or Wolfram Alpha) and tell it "Any time you need to do math, write a query to your math API," suddenly it can "do math". Then you can prompt it with "Any time the results of the language model output seem to disagree with the results of the math API, recheck your work and include a disclaimer that the question may be too difficult or that there may not be enough information in your final results."
You can also have multiple instances of the LLM talking to each other, playing different roles. Have one instance write something, then have another 'skeptic' instance read it alongside whatever you consider a reliable source and ask it to compare them: "Are the facts and information contained in this text in agreement with [source]? Does this text contain facts or information that are not verifiable? Remove those statements or provide a disclaimer that they are speculation."
I don't have any special knowledge about how Bing's GPT produces its 'citations', but my suspicion is that they may have the prompt order backwards, telling it "generate a plausible-sounding response, then search the web and list links to support your assertion," which will inevitably lead to confirmation bias, or to citations that don't match the content they are claimed to support - just like it does when people do it. Instead you need a multi-step process that says "Think about the question and go through the logical steps of how to best reach an accurate answer. Follow those steps, narrating your thought process as you go. Gather information using sources that we trust to contain mostly true information. Extract the relevant information from your sources and use that info to generate your response, including citations back to the source. Compare your answer with the source material. If they disagree, revise so that your answer agrees with the trusted sources. If there is disagreement between sources, indicate that in your response. If there is no way to confirm a claim, remove it or mark it as speculation."
I don't think this solves the problem entirely, as these methods will certainly come with their own limitations and issues. But if the "definitional bright line between artificial intelligence and artificial bullshit" is a set of features like meta-cognition, thinking methodically, checking your answers against a trusted source of truth, and being able to revise those answers or express uncertainty when there is a disagreement, then crossing that line may be as 'simple' as reminding the LLM supervising the other LLMs to constantly do all of those things. Prompting LLMs with a very detailed script for how to 'act as a reasoning agent', along with giving them permanent memory and the ability to delegate tasks to sub-processes (like AutoGPT) so that these complex, recursive processes stay within the token limits, seems like it may soon get us closer to, or perhaps even over, that line.
posted by metaphorever at 11:14 AM on April 11, 2023 [2 favorites]
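To make the two patterns in the comment above concrete, here is a hedged sketch of routing arithmetic to an external calculator instead of letting the language model guess, followed by a "skeptic" pass over the draft. llm() is again a hypothetical placeholder, the routing heuristic is deliberately crude, and the calculator is plain Python arithmetic standing in for something like a Wolfram Alpha call.

# Hedged sketch of tool use plus a "skeptic" pass. llm() is a placeholder.
import ast
import operator as op

def llm(prompt: str) -> str:
    """Placeholder: send the prompt to a chat model and return its reply."""
    raise NotImplementedError

_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def calculator(expr: str) -> float:
    """Safely evaluate a plain arithmetic expression such as '17 * 23 + 4'."""
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("only basic arithmetic is allowed")
    return ev(ast.parse(expr, mode="eval").body)

def answer(question: str, trusted_source: str) -> str:
    # Crude routing: let the model decide whether this is a math question,
    # and if so, have it emit a bare expression for the calculator tool.
    route = llm("Is this a pure arithmetic question? Reply MATH or TEXT.\n" + question)
    if route.strip().upper().startswith("MATH"):
        expr = llm("Write only the arithmetic expression for: " + question)
        return str(calculator(expr))

    # Chain-of-thought-style draft, then a skeptic pass against a trusted source.
    draft = llm("Answer step by step, explaining your reasoning:\n" + question)
    critique = llm(
        "Compare the draft with the source. Flag any claim the source does not "
        "support, or reply exactly OK.\n\nDRAFT:\n" + draft +
        "\n\nSOURCE:\n" + trusted_source
    )
    if critique.strip() == "OK":
        return draft
    return draft + "\n\n[Skeptic notes:]\n" + critique

As the comment itself concedes, this does not solve the problem so much as relocate it: the router, the skeptic, and the judge of "agreement" are all the same kind of model, with the same failure modes.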
You can also have multiple instances of the LLM talking to each other playing different roles.
somebody has a theory of consciousness that cavemen hearing the magic voice in their heads is when man gained it . . . reading this:
https://en.wikipedia.org/wiki/Bicameral_mentality
I guess I got that from HBO's Westworld . . .
posted by Heywood Mogroot III at 12:13 PM on April 11, 2023
It's a point of confusion for me, but AlphaGo can play logical games very successfully, so neural nets could already be trained to certain types of logical reasoning, right? It's just that LLMs specifically are not really taught to do that sort of thing, being fed language corpuses. A neural net that is more powerful than either an LLM or an AlphaGo type of learning machine could in principle exist, one that both understands human language and has the abstract cognitive ability such as logical reasoning. And they would not be LLMs anymore, because they would then be a different class of model of intelligence. I hereby call them PCMs, Powerful Cognitive Models, ya heard it here first
posted by polymodus at 1:17 PM on April 11, 2023
Do people have sources claiming that the LLMs (large language models) are accurate?
I have been told by a reddit user that ChatGPT was not built to sound nice but to be accurate. So, yes there are people in the real world who assert this. They are not the builders and I think that is what you are saying, right, that no one who built one of these is saying it is accurate?
I was part of a work call last week where a woman talked about these projects and what we are and are not allowed to do. It was 70 percent "Isn't this cool", ten percent "you have to be careful", and twenty percent "don't put company secrets in these things". Maybe weight those things more evenly and reverse the order?
posted by soelo at 1:25 PM on April 11, 2023 [1 favorite]
anti-Metafilter: I have been told by a reddit user....
posted by Artful Codger at 7:41 AM on April 12, 2023
AlphaGo can play logical games very successfully, so neural nets could already be trained to certain types of logical reasoning, right?
Games have a computer-recognizable source of truth — the (easy-to-describe-in-code) win condition. As I recall the massive improvement in chess and Go computing came when researchers stopped trying to formalize all the possible strategies and let computers play against each other and genetic-algorithm modify the strategies they had.
Genetic algorithms work on lots of things with a “better or worse?” that the computer can recognize by itself — is this strut lighter and how much weight does it hold in testing; or, how many test cases does this code pass? But in the latter case, it takes a human to write most of the tests because "what do people want" is not algorithmically tractable.
posted by clew at 8:40 AM on April 12, 2023 [1 favorite]
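A toy version of the "computer-recognizable better-or-worse" point: the genetic algorithm below uses a fitness function the machine can score entirely on its own. Matching bits in a target string stands in for "how many test cases does this code pass"; the specific numbers are arbitrary illustration, not drawn from any real system.

# Minimal genetic algorithm where fitness is something a computer can score
# by itself (bits matching TARGET, standing in for "test cases passed").
import random

random.seed(0)
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]

def fitness(genome):
    return sum(g == t for g, t in zip(genome, TARGET))

def mutate(genome, rate=0.05):
    return [1 - g if random.random() < rate else g for g in genome]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(30)]
for generation in range(200):
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == len(TARGET):
        break
    parents = population[:10]  # simple truncation selection
    population = parents + [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(20)
    ]

print(generation, fitness(population[0]))

The win condition here is trivial for the machine to check; the comment's point is that "what do people actually want" has no such automatic scorer.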
I can’t think offhand of a GA working on tests that are sensor readings of physical reality. When you can’t speed up the evolution by running computer against computer I don’t know how often they have the advantage.
posted by clew at 8:43 AM on April 12, 2023
I thought ChatGPT itself was trained by what you describe as a genetic algorithm, clew, through many iterations while gavaging the entire world's English corpus.
But my main point was in response to metaphorever's comment, which observed that logical consistency is syntactic. Which leads me to think that if you can play Go, or do long division, then you are also capable of checking the logical consistency of your own argument. It's a more restricted capability than the ability to generate logical reasons, but checking the correctness of a logical proof, as Gödel's Completeness Theorem (not the two Incompleteness Theorems) showed, is mechanizable/algorithmic. It would be no harder than playing Go. Maybe they wouldn't need WolframAlpha. The issue is how to combine the two kinds of neural nets.
posted by polymodus at 12:09 PM on April 12, 2023
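The claim that consistency checking is mechanical is easiest to see at the propositional level. The sketch below brute-forces truth assignments to decide whether a set of statements can all hold at once; it illustrates the syntactic nature of the check, not what full first-order proof verification (which the Completeness Theorem actually concerns) would require.

# Brute-force satisfiability check for propositional claims: consistency
# checking as a purely mechanical, syntactic procedure.
from itertools import product

def consistent(formulas, variables):
    """Return a satisfying assignment if all formulas can hold together, else None."""
    for values in product([True, False], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(f(env) for f in formulas):
            return env
    return None

# "Socrates is a man", "all men are mortal" (as an implication), and
# "Socrates is not mortal": jointly inconsistent, and the checker says so.
claims = [
    lambda e: e["man"],
    lambda e: (not e["man"]) or e["mortal"],
    lambda e: not e["mortal"],
]
print(consistent(claims, ["man", "mortal"]))  # None: no assignment satisfies all three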
One of the things he keeps bringing up, and I really like his phrasing, is that you can't tell when these systems are 'hallucinating' without deep knowledge of the information they're referencing.
He also draws a lot of comparisons between AI tools like GPT and self-driving cars, in the sense that regulatory systems should be mandating that these systems are safe, and that you have to prove they're safe (as with self-driving cars) before they can be implemented and released. This obviously hasn't happened with self-driving cars, because for us to trust them they have to be better than humans at driving.
These lawsuits should absolutely go forward, if anything to create momentum for a regulatory framework around these tools. They have such potential, and I've found them super useful in low-stakes settings, where it doesn't really matter if they hallucinate. I use GPT-4 to assist with creating arcs and settings for a family D&D campaign, because I'm not a super competent DM, and it is working EXTREMELY well when I am stuck. It is, as Klein illustrates, like having a personal assistant who is competent in an area I am not as competent in. I can see these tools helping us humans a lot, but they obviously have some super clear downsides as they are currently being implemented and should not be relied on for anything consequential; the fact that they cannot tell you when they are hallucinating is a problem, because for all their power, these models are not self-aware, and many of the ways they're pulling information are hidden within proprietary black boxes.
For all my disagreements with Klein on a number of topics, I think he's got a pretty lucid take on what is happening with AI right now, and is showing a modicum of wisdom that I appreciate on the topic.
posted by furnace.heart at 9:59 AM on April 8, 2023 [29 favorites]