give him the gopher repellant
May 14, 2023 9:09 AM

In How To Find Things Online, v buckenham looks at the web as an unwilling data source for Large Language Models (among other omnivorous machine learning projects), and how the changing incentives for both people making content and corporations hosting/controlling that content may undercut the assumption that useful information will continue to find its way into those hungry artificial hippos.

Along the way, v provides a nice overview of some trends in search utility (and the continuing degradation of that as search companies' incentives have changed and AI guesswork has supplanted curation in some cases), the shifting of fandom practices over time (including the aggressive corporate ascent of Fandom as a crappy-but-seemingly-inescapable nexus of game walkthroughs and wiki content), and a brief cameo by our old friend the manually fully justified monospace Super Metroid walkthrough.

See also a nice collection of further reading links at the bottom, including a discussion of Dr. Prof. Style—the tendency of very old-school traditional 1993 era web design to persist on the pages of academics—that has, since its 2010 writing, seen many (but not all!) of its own cited old-school links either rot away entirely or succumb to more contemporary design/CMS aesthetics. Of which, see also this contemporaneous 2010 discussion of that article on some sort of internet discussion forum called [squints] "Meatfilter".
posted by cortex (30 comments total) 39 users marked this as a favorite
 
It's been wild to watch the internet become a huge success, and then a huge failure.
posted by rikschell at 9:41 AM on May 14, 2023 [31 favorites]


Ah, I hadn't even thought about that - we don't need to worry about ChatGPT etc. replacing human judgment because within a few years it will degrade to only spitting out Sponsored Content. Someone will figure out how to poison the input to make a few bucks, and in capitalism, you'd be a fool not to because someone else sure will.
posted by ctmf at 9:51 AM on May 14, 2023 [4 favorites]


Perhaps ChatGPT or some other AI or LLM would have decided that Finland was a more worthy winner than Sweden. In which case, there is finally an arguably valuable role for such systems.
posted by Wordshore at 10:06 AM on May 14, 2023 [2 favorites]


That would assume that it was not vulnerable to powerful organizations that own it putting their thumb on the scale for their own nefarious purpose (50th anniversary of ABBA’s win).
posted by Artw at 10:08 AM on May 14, 2023 [2 favorites]


I'm waiting for an LLM to compile all the best band names (ex. hungry artificial hippos) from the entire internet. Once that occurs I can happily retire my online presence with a final "My work here is done" comment.
posted by Greg_Ace at 11:07 AM on May 14, 2023 [2 favorites]


my use of Chat-GPT4 today was finding an alternative phrase that will pass DMV review for the personalized plate seven-letter acrostic for my Cybertruck. I actually like what it came up with more than my original, LOL (kinda says the same thing but doesn't fall afoul of DMV rules)
posted by Heywood Mogroot III at 11:22 AM on May 14, 2023


hungry artificial hippos

AHi
posted by chavenet at 11:25 AM on May 14, 2023 [1 favorite]


As an aside on the crappiness of cap-F Fandom as wiki Borg, I've been loving BreezeWiki, a proxy for Fandom wikis that strips out all the junk & reformats it into a clean page. And since Fandom text is Creative Commons BY-SA 3.0, it's above-board.
posted by CrystalDave at 11:42 AM on May 14, 2023 [15 favorites]


With PageRank, Google’s big innovation in the late 1990s was finding a way to see through keyword spam optimizations aimed at the early web’s first generation of search engines. That worked well until the arms race between search engines and spammers heated up over the 00s and 10s. I see a similar future for LLMs, trained on a web that is mostly known to be human-generated but soon forced to consume their own outputs. It’s going to be a giant mess: bad things happen when you get high on your own supply.
posted by migurski at 12:20 PM on May 14, 2023 [5 favorites]


Information wants to be free

Corporations want to harvest free things and sell them

Information wants to be hidden
posted by egypturnash at 12:24 PM on May 14, 2023 [13 favorites]


Oh my god CrystalDave thank you so much for the BreezeWiki link - the takeover of every game wiki by Fandom has been a near Pinterest-level apocalypse
posted by Skrubly at 12:57 PM on May 14, 2023 [1 favorite]


Posted this elsewhere but it seems relevant here, AO3 on AI and scraping.
posted by Artw at 1:15 PM on May 14, 2023 [4 favorites]


But are they as vicious as actual corporeal hippos? Can you shoot them like at Disneyland?
posted by y2karl at 1:48 PM on May 14, 2023


Just be sure to keep plenty of marbles on hand and you should be safe.
posted by Greg_Ace at 2:04 PM on May 14, 2023 [5 favorites]


I see a similar future for LLMs, trained on a web that is mostly known to be human-generated but soon forced to consume their own outputs. It’s going to be a giant mess: bad things happen when you get high on your own supply.

this relieves me a bit of anxiety from AI-related doomsdaying.
"It'll be indistinguishable from humans!"
...mmm bet it will be though. But really, REALLY will be.
posted by wellifyouinsist at 2:04 PM on May 14, 2023


@Greg_Ace I wish we could have, like, one superlike a week, for comments like that one [marbles]. You just made the internet a better place.
posted by hearthpig at 2:19 PM on May 14, 2023 [1 favorite]


I'm disappointed that my webpage is not mentioned in the Dr. Prof. article. :(
posted by heatherlogan at 3:47 PM on May 14, 2023


Thanks for posting this. I liked the closing discussion:
And the other way to look at this, really, is not about AI at all, but seeing this as the continuation of a gradual corporate incursion into the early spirit of sharing that characterised the internet. I say incursion but maybe the better word is enclosure, as in enclosure of the commons. And this positions AI as just a new method by which companies try to extract value from the things people share freely, and capture that value for themselves. And maybe the way back from this is being more intentional about building our communities in ways where the communities own them. GameFAQs was created to collate some useful stuff together for a community, and it ended up as part of a complicated chain of corporate mergers and acquisitions. But other communities experienced the kinds of upheaval that came with that, and then decided to create their own sites which can endure outside of that - I’m thinking here especially of Archive of Our Own, the biggest repository for fan-writing online. And incidentally, the source of 8.2 million words in that AI training set, larger even than Reddit.
posted by spamandkimchi at 7:17 PM on May 14, 2023 [3 favorites]


In the future, culture will just be fanfiction all the way down, authors pretending to write stories people pretend to read

And the humans left will secretly communicate via staticy AM radio
posted by eustatic at 9:54 PM on May 14, 2023 [1 favorite]




And the humans left will secretly communicate via staticy AM radio

There’s some bad news on that front, eustatic.
posted by autopilot at 10:45 PM on May 14, 2023 [2 favorites]


I can tell who read the article with one word. Cow.
posted by Nanukthedog at 5:45 AM on May 15, 2023 [1 favorite]


Sees the rainbow background, instantly trusts the article writer.

This is a really good entry-level article about how things online are moving from people posting things for other people to read for free, to corporations moving in to try to profit from that free labor and people moving their work behind paywalls / into private Discord servers in response.

And I also appreciated the bit about AO3 at the end because I think this is the first place I've seen it confirmed, not just that AO3 was used to train (at least Google's) commercial AI, but how much it was used (more even than reddit!!!!!). No wonder the African content moderators are striking for better conditions to remove all the porn.

(Long live Ao3.)
posted by subdee at 6:17 AM on May 15, 2023 [4 favorites]


I kind of like the idea - in an old man yelling at clouds / “no, it’s the children who are wrong” way - that current English is Last English because C4/Common Crawl represents the last known-good/pre-ouroboros sample set.

And Metafilter is part of that set! (actual link to previous thread with stat breakdown)
posted by Ryvar at 6:45 AM on May 15, 2023 [5 favorites]


Well it’s the language of machines now. The new everyones-second-language that English speakers can enjoy too.
posted by Artw at 8:01 AM on May 15, 2023 [1 favorite]


And the humans left will secretly communicate via staticy AM radio

There’s some bad news on that front, eustatic.


That's a weird conversation going on over there. Do they realize only some new car radios will no longer receive AM, and that AM radio in general isn't being abolished?
posted by Greg_Ace at 9:42 AM on May 15, 2023


Let them know. Don't worry it's basically the same people over there.
posted by Wood at 10:54 AM on May 15, 2023


In the future, culture will just be fanfiction all the way down, authors pretending to write stories people pretend to read

Previously: Eager Readers in Your Area!
posted by one for the books at 12:07 PM on May 15, 2023 [1 favorite]


my use of Chat-GPT4 today was finding an alternative phrase that will pass DMV review for the personalized plate seven-letter acrostic for my Cybertruck.

Too bad "IMASUCKER" won't fit. (Have to give some serious side-eye to anyone buying one of Elon's exploding Magamobiles, these days.)
posted by Pseudonymous Cognomen at 2:52 PM on May 15, 2023


Plus gives you ONFIRE as an ironic replacement if the first one burns down.
posted by Artw at 8:06 AM on May 16, 2023 [1 favorite]



