OpenAI Furious DeepSeek Might Have Stolen All the Data OpenAI Stole From
January 29, 2025 7:49 AM Subscribe
Oh the irony… it’d be funny if it wasn’t so infuriating!
posted by _benj at 7:57 AM on January 29 [6 favorites]
Disrupters getting disrupted.
posted by td2x10e3 at 7:57 AM on January 29 [4 favorites]
Man, where did I put that violin? It's so tiny I keep losing track of it.
posted by mmoncur at 7:58 AM on January 29 [40 favorites]
Couldn't have happened to nicer people
posted by SaltySalticid at 7:59 AM on January 29 [9 favorites]
"You're trying to kidnap what I have rightfully stolen!"
Honestly, Vizzini is the perfect foil for "AI" companies.
"Truly your intellect is dizzying."
"Wait till I get going!"
posted by Smedly, Butlerian jihadi at 8:04 AM on January 29 [37 favorites]
This is supremely ironic and funny.
posted by grumpybear69 at 8:07 AM on January 29 [2 favorites]
What goes around, comes around.
Is there any possibility this was just an elaborate stunt to short a bunch of stocks? Wasn't DeepSeek originally started as the pet project of a hedge fund manager? I'm sure this "reinforcement learning strategy" they developed leads to some increase in efficiency, but it would probably take longer to verify how much more efficient it really is than the market would take to react to their claims...
posted by RonButNotStupid at 8:08 AM on January 29 [3 favorites]
I assumed the "large-scale malicious attacks" that DS claim were responsible for their major outages all week were just bitter SV execs paying for botnets.
posted by wanderlost at 8:09 AM on January 29 [3 favorites]
Force them to open source it!
posted by I-Write-Essays at 8:10 AM on January 29 [1 favorite]
Man I can’t wait for everyone betting on AI and crypto to be in the streets warming their hands by a garbage can fire.
posted by caviar2d2 at 8:11 AM on January 29 [22 favorites]
Is there any possibility this was just an elaborate stunt to short a bunch of stocks?
Matt Levine had a fun piece about that.
My question is, whether this happened or not, how could OpenAI be caught unaware? "Such activity could violate OpenAI’s terms of service"--like, yes, but if you are smart enough to run OpenAI, surely you're smart enough to realize that your TOS may not be respected?
posted by mittens at 8:15 AM on January 29 [15 favorites]
The needle on the Hypocrisy-o-meter just broke.
posted by tommasz at 8:15 AM on January 29 [7 favorites]
Ars Technica: How does DeepSeek R1 really fare against OpenAI’s best reasoning models?
posted by bonehead at 8:24 AM on January 29 [5 favorites]
I found this to be another extremely satisfying read this morning:
AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt [Ars Technica]
Honestly, Vizzini is the perfect foil for "AI" companies.
One of the tarpits discussed in the Ars Technica article is named 'Iocaine'
posted by RonButNotStupid at 8:25 AM on January 29 [34 favorites]
From what I've read it wasn't a DDoS on DeepSeek, just a rush of AI fans trying to sign up and exercise the system when they saw the news.
posted by JoeZydeco at 8:27 AM on January 29
> 'Iocaine'
Iocaine Powder is a reference to a dominant Rock Paper Scissors strategy.
posted by I-Write-Essays at 8:33 AM on January 29 [2 favorites]
This is a classic Big Tech play, "commoditize your complement," though it's more commonly used by large players to crush emergent competitors.
posted by whir at 8:35 AM on January 29 [5 favorites]
> just a rush of AI fans trying to sign up
The "Hug of Death" is functionally equivalent to a DDoS, and I often hear people describe it as such. It's distributed, and it denies service. A victim trying to mitigate it can't tell the difference. It's not a coordinated DDoS "attack", but it's still accurate to call it a DDoS incident.
posted by I-Write-Essays at 8:35 AM on January 29 [7 favorites]
The "Hug of Death" is functionally equivalent to a DDoS, and I often hear people describe it as such. It's distributed, and it denies service. A victim trying to mitigate it can't tell the difference. It's not a coordinated DDoS "attack", but it's still accurate to call it a DDoS incident.
posted by I-Write-Essays at 8:35 AM on January 29 [7 favorites]
Apparently the AI web scrapers absolutely HAMMER a lot of websites, so the admins are looking at ways to mitigate the damage. At least that's what I've seen on Hacker News.
posted by Spike Glee at 8:37 AM on January 29 [2 favorites]
I pretty much consider modern genAI to be a machine that takes actual human work and turns it into Hot Dog Meat. Something I wonder about a lot in terms of AI and LLMs is fidelity loss, as AI slop starts to displace actual content, and AI starts to be trained from datasets cobbled together from other AI products. Copies of copies. Hot Dogs made from scraps of other Hot Dogs.
posted by mrjohnmuller at 8:43 AM on January 29 [8 favorites]
The Beaverton: "China’s new and cheaper magic beans shock America’s unprepared magic bean salesmen".
Forgive the self-link, but I just finished writing a thing about basically this, observing that just a day after the DeepSeek announcement a new model - and this time genuinely open, as in open source with a real open source license, as in curated and consentfully-obtained training data, as in visible and auditable model weights, the real deal - called Sky-T1 has hit the scene, with training costs claimed to be under five hundred dollars. You can run it yourself on anything you'd call a decent gaming rig from the last year or three; even one of the higher-spec Mac Minis looks like plenty.
I think the AI assistant market is starting to look like the ringtone market. Which was - no joke - a billion dollar market for a few years! And then the tech moved on and that market evaporated. I think what we're seeing in DeepSeek and Sky-T1 is the tech moving on and the market evaporating.
posted by mhoye at 8:55 AM on January 29 [37 favorites]
OpenAI models:
Require the sum total of human knowledge to become semi-reliable.
Require in excess of 564 MWh per day to run ChatGPT [de Vries, 2023].
A human brain:
A human child requires a few hundred kilobits of interaction over several months to gain facility with language.
A human brain requires about 0.3 kilowatt-hours (kWh) per day (and is doing a fair bit more than ChatGPT does).
The OpenAI effort (now at least two generations old) is at least 2 million times less efficient than a biological one, and probably about the same in terms of training effort.
There is clearly room for optimization, which is what the Chinese researchers have been able to achieve. This is a long race, not a sprint. I really don't understand why people are flapping their hands and willy-wallying about this.
posted by bonehead at 8:55 AM on January 29 [4 favorites]
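A quick back-of-envelope check on that "2 million times" figure, using only the numbers quoted above (564 MWh/day for ChatGPT, 0.3 kWh/day for a brain); a sketch, not a rigorous comparison:

```python
# Back-of-envelope check of the efficiency ratio claimed above,
# using the figures quoted in the comment (de Vries, 2023).
chatgpt_kwh_per_day = 564_000  # 564 MWh/day, expressed in kWh
brain_kwh_per_day = 0.3        # rough human-brain energy budget

ratio = chatgpt_kwh_per_day / brain_kwh_per_day
print(f"ChatGPT draws ~{ratio:,.0f}x the daily energy of one human brain")
# -> ~1,880,000x, which is where "at least 2 million times less
#    efficient" (to one significant figure) comes from.
```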
A human child requires a few hundred kilobits of interaction over several months to gain facility with language.
A human brain requires about 0.3 kilowatt-hours (kWh) per day (and is doing a fair bit more than ChatGPT does).
AI is about 2 million times less efficient than a biological one. And probably about the same in terms of training effort.
So we're wiring child brains together now? It feels a little ethically dubious, but I'll invest in your baby brain startup.
posted by betweenthebars at 9:03 AM on January 29 [33 favorites]
A redundant array of independent brains - a RAIB.
posted by I-Write-Essays at 9:04 AM on January 29 [9 favorites]
Warning: flight of fancy.
As hilarious as this is on its face, it would take it to another level if the results of OpenAI's lawsuits and the techbros currently putting ideas in TFG's ear either made derivative works legal or massively shortened copyright terms.
posted by Mitheral at 9:14 AM on January 29 [3 favorites]
Can I buy a small NAB (Network-Attached Baby) just for home entertainment?
posted by AzraelBrown at 9:15 AM on January 29 [15 favorites]
TFG's ear either made derivative works legal or massively shortened copyright terms.
please don't even joke about making me agree with that jerk
posted by AzraelBrown at 9:16 AM on January 29 [1 favorite]
I’m all in on BAAS (Baby as a Service), you get to outsource all the smelly bits
posted by funkaspuck at 9:17 AM on January 29 [10 favorites]
Force them to open source it!
It is open sourced. Although it would be more accurate to say it's open weights - meaning you can't dig into the specifics of the training materials used, but you can download it for free and run it on your own hardware. For the full model you'll either need about half a million dollars' worth of datacenter GPU hardware, or you can run it very very slowly off-GPU for about $6000, almost all of which is spent on RAM.
If you're comfortable with cloud compute that has a no-export policy for your data, then Fireworks has the full model at $8/million tokens.
That's for the full model. Quantizations (lowering precision of the portions of the neural network where that impacts quality the least) came out super fast and, given the base size, super aggressive: down to 131~212GB. You'll still need either a lot of RAM and patience, or to rent two or three H100s from TensorDock for $4~5/hour.
Finally, DeepSeek themselves released distillations with Qwen 32B and Llama-3.3 70B - but these are actually still Qwen and Llama-3.3 at heart, they've just had a lot of additional training by DeepSeek R1. (Qwen = very good for programming assistance, but not going to get a straight answer on Taiwan or Tiananmen. Llama, for people wholly unfamiliar with LLMs, is the leading generalist open source model.) Most people are going to want to run those slowly in RAM as well, though a high-end Mac or a PC with a pair of 3090s can run the Qwen distillation the way it was meant to be.
Is there any possibility this was just an elaborate stunt to short a bunch of stocks? Wasn't DeepSeek originally started as the pet project of a hedge fund manager? I'm sure this "reinforcement learning strategy" they developed leads to some increase in efficiency, but it would probably take longer to verify how much more efficient it really is than the market would take to react to their claims...
It's completely legit and I know people aren't being intentionally racist about China's ability to innovate but we are all suffering from a cultural tendency to dismiss Chinese inventions as knockoffs and we need to collectively knock that shit off. High-Flyer was a high-frequency trading firm, which means a ton of razor-sharp GPU-programming quants, and when the CCP imposed limits on their primary business they turned to AI models as a "side project".
Reinforcement Learning has nothing to do with their efficiency gains - that's just how they produced a model capable of some crude chain-of-thought reasoning via questioning its own answers exhaustively before delivering a final answer. And some of their optimizations are just things every LLM since GPT-4 has done (Mixture of Experts), or every open source model does (quantization to reduce fidelity per parameter - where "parameter" = "virtual neuron" - leading to lower quality output but at a size you can actually fit into available memory).
Some of them were a direct result of the US limiting which datacenter GPUs are permitted for export - reduced-functionality H800s instead of the H100s most western firms use. They worked around this by reserving threads on-GPU to compress the results continuously throughout the training run and more efficiently communicate them across the reduced-bandwidth memory interconnect. They also ducked down into low-level PTX (the assembly to CUDA's C-for-GPUs) in critical spots to milk every last cycle out of the H800s.
Others, like the multi-head latent attention work or their KV caching, are just wholly novel and happen to combine especially well with the aforementioned established and necessity-based optimizations.
The claimed efficiency gain here is 40x less compute / electricity / carbon footprint (same diff) to train the model. Training a modern model is a one-time operation that usually consumes a few tens of thousands of US households' annual power budgets. Once trained, running inference on the model (actually using it) typically runs about the same power cost as 1~4 gaming PCs, and they are currently billing as if they are operating with 20x the efficiency of other bleeding-edge models.
My read on the technical paper and a very thorough writeup of the details of those optimizations is that those claimed gains are moderately overblown, but we are still looking at an order-of-magnitude improvement in efficiency, at minimum.
The market panic was a bad misread - Nvidia aren't going to be hurt by this, they'll continue to sell GPUs as fast as they can make them. What this actually does is democratize their pool of customers down to far more mid-sized companies and research groups. Especially those who for legal or business reasons couldn't use a cloud-based AI under any circumstances.
The people actually hurt by this are those relying on scale and secrecy as their moat: OpenAI, Anthropic, etc. Sam Altman, Peter Thiel, Elon Musk. This doesn't fully destroy their moat (the o series is just getting started and DeepSeek R1 is not as capable as o3 preview), but it's a very near thing. Hence OpenAI's pointless whining about this - everyone was already doing this, and they were the first movers themselves solely by being the first to just grab all available Internet text/images without consent (not... exactly a copyright violation, technically, but similar in spirit).
For those of us on the side of open source AI and specifically against large corporate AI and Valley billionaires DeepSeek is literally the best possible thing anyone could build, released at precisely the perfect time. I am using it ecstatically.
posted by Ryvar at 9:22 AM on January 29 [76 favorites]
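For readers trying to map those quantization numbers onto hardware, here is a rough sketch of where the 131~212GB range comes from. It assumes R1's published 671B total parameter count (a figure from the R1 release, not quoted in this thread), and the bits-per-parameter values only approximate the aggressive community quants rather than being official numbers:

```python
# Rough memory footprint of a quantized model: parameters x bits / 8.
# 671B is DeepSeek R1's published parameter count; the bit widths
# below approximate the most aggressive community quantizations.
def model_size_gb(params_billion: float, bits_per_param: float) -> float:
    return params_billion * bits_per_param / 8  # billions of params -> GB

for bits in (1.58, 2.51, 8.0):
    print(f"{bits:>4} bits/param -> ~{model_size_gb(671, bits):,.0f} GB")
# ~1.58 bits -> ~133 GB and ~2.51 bits -> ~211 GB bracket the quoted
# 131~212GB range; 8 bits -> ~671 GB is why the full model wants
# datacenter hardware or an enormous pile of RAM.
```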
One of the possible responses to the Orange Gibbon's trade threats is to revisit the US requirements for IP protections. It's certainly under consideration.
Maybe the future of AI isn't in the US.
posted by bonehead at 9:25 AM on January 29 [2 favorites]
I think the AI assistant market is starting to look like the ringtone market. Which was - no joke - a billion dollar market for a few years! And then the tech moved on and that market evaporated. I think what we're seeing in DeepSeek and Sky-T1 is the tech moving on and the market evaporating.
Yeah everything I've read is that the important thing is all about implementation, not the base models anymore.
posted by MisantropicPainforest at 9:32 AM on January 29 [1 favorite]
Maybe the future o̵f̵ ̵A̵I̵ isn't in the US.
posted by I-Write-Essays at 9:32 AM on January 29 [12 favorites]
we are all suffering from a cultural tendency to dismiss Chinese inventions as knockoffs and we need to collectively knock that shit off
I assume VC types playing into this are wholly cynical. It’s not as if US companies hesitate to hire ML researchers from Chinese universities, or to read their papers.
posted by atoxyl at 9:36 AM on January 29 [5 favorites]
One of the possible responses to the Orange Gibbon's trade threats is to revisit the US requirements for IP protections.
FWIW this would impact DeepSeek's own service (API and chat client on the App Store), but the model would still be freely available. The damage to OpenAI is done and it is glorious.
Maybe the future of AI isn't in the US.
I'm not sure the future of anything is in the US. I've hated Stephen Miller and Elon Musk both since 2016, but never more so than this morning. Every morning. For the past week. Personally I'm really hoping this is the beginning of the end for OpenAI, but if other major US firms treat it as a serious wakeup call and really dig into efficiency with their vastly greater resources, then I'm all for it. Zuck is reportedly on the fucking warpath since DeepSeek dropped, which is funny because Llama is the open source model. He's the least harmed by this of any US tech billionaire, and AI continues to be the lone thing where he is not literally the worst human being ever.
posted by Ryvar at 9:36 AM on January 29 [6 favorites]
> AI continues to be the lone thing where he is not literally the worst human being ever.
But only because of how strong the competition for that title has been.
posted by I-Write-Essays at 9:40 AM on January 29 [6 favorites]
There's a certain amount of funny here: I had a conversation last night with someone else doing a first look at deepseek-r1, where they talked about it making the exact same errors on some general-knowledge questions that other companies' offerings were making a month or two ago.
All things considered it's not surprising when so much training data comes from the same sources, be they open web or otherwise.
posted by Enturbulated at 9:44 AM on January 29 [3 favorites]
Do violins come in Planck length?
posted by Lemkin at 9:57 AM on January 29 [17 favorites]
Given the speculation around Trump and Musk’s actual agenda, the US doesn’t have much of a future in anything.
posted by JustSayNoDawg at 10:01 AM on January 29 [6 favorites]
Would the Werewolf Porn cases establish federal legal precedent for IP on this one?
OpenAI reminds me of that "im mad you copied the thing i copied!!!" mess,
but with 1 trillion dollarbux
posted by eustatic at 10:04 AM on January 29 [2 favorites]
Zuck is reportedly on the fucking warpath since DeepSeek dropped, which is funny because Llama is the open source model. He's the least harmed by this of any US tech billionaire
I guess he’s upset that Meta got scooped on their mission to commoditize models (or more importantly, wasted a lot of time and money training the next Llama) but yeah, if you’re going to give your models away, nominally for the good of all, it doesn’t make a ton of sense to assume they will always lead the pack.
posted by atoxyl at 10:06 AM on January 29 [5 favorites]
Another video from the Corridor Crew VFX channel on How To Identify AI Slop (25 min, previously.)
posted by TheophileEscargot at 10:06 AM on January 29 [6 favorites]
Do violins come in Planck length?
A violin made from a plan(c)k would sound horrible, I don't need AI to tell me that.
posted by Greg_Ace at 10:07 AM on January 29
Also, DeepSeek still sucks. I asked it to give me a navigation distance in river miles between two ports, which is something a human can do with two charts and four addition operations. The response was off by hundreds of miles, and it presented a range!
It sounded authoritative while being worse than no answer.
At least we know what werewolf porn is used for
posted by eustatic at 10:09 AM on January 29 [3 favorites]
Do violins come in Planck length?
If they did, it would be approximately 115 octaves above middle C. To save on having to print the score on A0 paper, I suggest 806ma notation.
posted by jedicus at 10:27 AM on January 29 [18 favorites]
It's completely legit and I know people aren't being intentionally racist about China's ability to innovate but we are all suffering from a cultural tendency to dismiss Chinese inventions as knockoffs and we need to collectively knock that shit off.
A cursory look at the CS section of arXiv should disabuse people of this idea. There is a tremendous amount of CS research - including and especially machine learning - being done by researchers at Chinese universities.
Side note: distillation is a Very Good Thing. Training models is energy-intensive, and distillation means that the energy expenditure from training one model isn't just wasted / ignored when you train another model.
posted by a faded photo of their beloved at 10:28 AM on January 29 [3 favorites]
Side note: distillation is a Very Good Thing. Training models is energy-intensive, and distillation means that the energy expenditure from training one model isn't just wasted / ignored when you train another model.
Yeah, and I apologize if I gave a contrary impression - I just didn't want to give people the impression that the version they are far more likely to be able to run at home was, like, the real model in terms of capability. They're still Qwen/Llama primed on 800,000 prompt-response pairs from DeepSeek R1. I've mentioned in other AI threads that we (as a species) need to be training far fewer foundational models in general as so much of what actually differentiates the various products occurs in fine tuning. A big part of why I'm such a huge open source AI backer is that open source - even just open weights - models vastly reduce the need for every major company or research group to train their own, just like you said.
posted by Ryvar at 10:36 AM on January 29 [2 favorites]
It's not open source.
Timnit Gebru puts it well:
"Friends, for something to be open source, we need to see
1. The data it was trained and evaluated on
2. The code
3. The model architecture
4. The model weights.
DeepSeek only gives 3, 4. And I'll see the day that gives us #1 without being forced to do so, because all of them are stealing data."
(Source link)
posted by splitpeasoup at 10:48 AM on January 29 [14 favorites]
Yes, it's very hard to be open-source without the source code.
posted by I-Write-Essays at 11:02 AM on January 29 [1 favorite]
It's not open source.
Yeah that's why I opened with "It is open sourced. Although it would be more accurate to say it's open weights"
hard to be open-source without the source code
The code is just ollama, which is built on llama.cpp. Same software you use for running any model.
That said, Timnit's absolutely right about lacking the training data and that is true for virtually every model because nobody wants to get sued into oblivion. In DeepSeek's particular case we're missing the critical 800K prompt-response pairs. To that end, HuggingFace has announced their project to perform a fully open clean reproduction of DeepSeek R1. The announcement has a pretty thorough breakdown on what's missing.
(...and maybe this is just me but I keep forgetting that HuggingFace is, like, an actual company and not just Wikipedia for open source (or open weights if you prefer) AI.)
posted by Ryvar at 11:12 AM on January 29 [6 favorites]
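If you want to poke at one of those distillations yourself, here's a minimal sketch against ollama's local REST API. The model tag and prompt are just examples (that tag was what ollama used for the Qwen 32B distillation; adjust to whatever you've actually pulled):

```python
# Minimal sketch: query a locally served DeepSeek R1 distillation
# through ollama's REST API (default port 11434). Assumes the model
# tag below has already been pulled; swap in whichever size you run.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:32b",  # example tag: the Qwen 32B distillation
        "prompt": "Why is the sky blue? Keep it short.",
        "stream": False,             # one JSON blob instead of chunks
    },
    timeout=600,
)
print(resp.json()["response"])  # includes the <think>...</think> trace
```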
Whenever you hear HuggingFace, don't you always picture a Face Hugger?
posted by I-Write-Essays at 11:26 AM on January 29 [11 favorites]
On the energy front, Jeff Geerling has gotten a full version of DeepSeek running at "a few hundred watts" and 4ish tokens a second, about the same as a slow human typing. So that's plainly a huge improvement in efficiency. And, being who he is, he demonstrates a more limited version running on a Pi and an AMD GPU.
posted by bonehead at 11:37 AM on January 29 [6 favorites]
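That 4ish tokens a second is roughly what memory bandwidth alone would predict, since CPU inference has to stream the active weights through RAM for every generated token. A sketch with assumed round numbers: the ~37B active parameters per token is DeepSeek's published MoE figure, not something from this thread, and the bandwidth and quantization values are illustrative guesses:

```python
# CPU inference is usually memory-bandwidth bound: each token streams
# the active expert weights through RAM once. Round numbers only.
active_params = 37e9     # R1 activates ~37B of its 671B params per token
bytes_per_param = 0.25   # assumes a ~2-bit quantization
bandwidth = 40e9         # bytes/sec; plausible desktop DDR5-ish figure

tokens_per_sec = bandwidth / (active_params * bytes_per_param)
print(f"~{tokens_per_sec:.1f} tokens/sec")  # ~4.3, matching "4ish"
```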
- Friday: DeepSeek spent $5M using 3-year-old Nvidia chips to match OpenAI.
- Monday: Nvidia tanks $600B in valuation in one day. 'OpenAI is cooked' stories show up.
[ The FUD, PR machine goes into overdrive ]
- Tuesday: DeepSeek had $500M worth of chipsets. See, they still needed Nvidia.
- Wednesday: DeepSeek cheated by stealing data. See, they still needed OpenAI.
- Thursday: Valuations pop back up. Balance is restored. Carry on.
posted by fubar at 11:39 AM on January 29 [4 favorites]
A redundant array of independent brains - a RAIB.
It's really not as simple as it sounds. Installation of a stable Servitor Colony requires precise application of psalms and canticles (the psalm Abjuring False Gaskets is particularly useful for the Magos Biologis running the nutrient feeds). Maintenance and use? It could take the work of several adepts and who knows how many tech-serfs.
A grand undertaking though
posted by Slackermagee at 11:42 AM on January 29 [9 favorites]
Feels like their goal here is to delegitimize DeepSeek through a familiar narrative of Chinese industrial espionage.
Already, the U.S. Navy has apparently banned DeepSeek, both the cloud service and the model itself, citing "security and ethical concerns."
To be fair, it's probably not a terrible idea for the US military to avoid using hard-to-introspect software developed in China, even offline where it can't leak anything, but the ethical argument seems much more political. Would militaries generally reject a powerful tool because someone may have violated the terms of service of a popular website while developing it?
posted by smelendez at 11:43 AM on January 29 [1 favorite]
the techbros currently putting ideas in TFG's ear either made derivative works legal or massively shortened copyright terms.
finally, something that explains that assassination attempt
posted by chavenet at 11:46 AM on January 29 [2 favorites]
What happened to honor among thieves?
posted by JohnnyGunn at 11:58 AM on January 29
> What happened to honor among thieves?
The iterated prisoner's dilemma can only support cooperation as long as the time until the end of the game remains unknown.
posted by I-Write-Essays at 11:59 AM on January 29 [11 favorites]
My son, an animation software developer, tells me he and his colleagues call all AI "Grand Theft Autocorrect". Seems especially appropriate to this discussion.
posted by angiep at 12:21 PM on January 29 [20 favorites]
> Side note: distillation is a Very Good Thing. Training models is energy-intensive, and distillation means that the energy expenditure from training one model isn't just wasted / ignored when you train another model [...]
Ya know what I'd really like to see about now? Stabs at MoE models somewhere under 200B weights (maybe under 100B), done up with all the stuff that's been learned and applied to create the previously mentioned unsloth quants of deepseek-r1 that go down to ~130GB, so that in the end it will run acceptably quickly on CPU only and in a halfway reasonable amount of memory. Task-focused models in that kind of range would have the potential to be good enough for a decent range of purposes while being accessible to mere mortals.
Also wouldn't mind getting a unicorn, but the former seems marginally more likely in the coming months.
posted by Enturbulated at 12:21 PM on January 29 [2 favorites]
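To put rough numbers on that wish, reusing the arithmetic from the sketches above; every model size here is hypothetical, not a real release:

```python
# Hypothetical sub-200B MoE models at an unsloth-style ~2-bit quant.
# Total params set the RAM needed; active params set the speed.
# All of these model sizes are made up for illustration.
def gb(params_billion: float, bits: float = 2.0) -> float:
    return params_billion * bits / 8

for total_b, active_b in ((100, 12), (200, 20)):
    ram = gb(total_b)         # resident model size in GB
    per_token = gb(active_b)  # GB streamed per generated token
    print(f"{total_b}B total / {active_b}B active: "
          f"~{ram:.0f} GB RAM, ~{40 / per_token:.0f} tok/s at 40 GB/s")
# 100B -> ~25 GB and ~13 tok/s; 200B -> ~50 GB and ~8 tok/s:
# plausible on a beefy desktop, which is exactly the appeal.
```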
I will explain what this means in a moment, but first: Hahahahahahahahahahahahahahahaha hahahhahahahahahahahahahahaha
Ok I don't usually sign up for accounts to read stuff but this line from TFA was an insta-subscribe for me. They placed the break well.
posted by hovey at 12:31 PM on January 29 [5 favorites]
So I have been following AI and ML and LLMs at arm's length, and I have a question that's both naive and kind of fundamental: what's going to get these models out of the data center and into a user's hand? Like, when will I have this running on a box in my house to know all about my life (groceries, furnace, talk to kids' school, weather on my commute) -- or will it always be more about a centralized Big Brain?
I'm curious what the long-term path for all these AI models is, and whether they can reach their potential in a way that respects my privacy -- or if they'll always operate in the Data Hoover mode, and need wide access to everyone's everything.
posted by wenestvedt at 12:56 PM on January 29 [4 favorites]
> or will it always be more about a centralized Big Brain?
The real danger with AI is not some pie in the sky idea like Alignment, it's the much closer danger of companies like ClosedAI building a moat against competition from small players and killing distributed local open source applications as a viable path for AI. Keeping things on a centralized server that people have to pay them API fees to access is in their best interest, and the worst case scenario for the rest of us.
posted by I-Write-Essays at 1:00 PM on January 29 [4 favorites]
will it always be more about a centralized Big Brain?
You know the Basilisk doesn't like it when you call it that. Or rather, you will know, in your agony.
posted by bonehead at 1:20 PM on January 29 [11 favorites]
> Can I buy a small NAB (Network-Attached Baby) just for home entertainment?
Better yet, you can make your own!
posted by gingerbeer at 1:21 PM on January 29 [3 favorites]
Wenestvedt:
On the hardware side I think Nvidia’s Project DIGITS is their first halting step towards a “Personal AI in a box”: 20 Arm cores, a 5000-series GPU, 128GB combined system/VRAM and a 4TB SSD for $3000.
The problem of course is the P in GPT: Pre-trained. How are your habits, your data, your needs incorporated into a largely immutable neural network? Like, you can get some mileage out of large context windows with a big infodump summary, but what you really want is a sort of perpetual fine-tuning.
It makes me think of that paper HearHere linked a few months back about how during dreaming our memories of the past day are re-encoded into data optimized for our existing neurotopology so it can bed down into our neural networks with minimal disruption (Kefei Liu has a few papers along these lines, actually). I think at minimum you’d need something like that for ANNs rather than a pure transformer or even transformer + chain-of-thought RL hybrid? Sorry, I’m rambling.
At any rate, in terms of getting people applications that they’ll actually find useful I’m bullish on Apple solving this well in advance of everyone else: they have hundreds of millions of devices with all personal context information readily at hand, with native support for lightweight models, and on-the-fly cloud AI VMs with incredibly strict privacy controls. They’re in an ideal place to try a bunch of different things to see what people actually find useful in daily life.
Keeping things on a centralized server that people have to pay them API fees to access is in their best interest, and the worst case scenario for the rest of us.
Paraphrasing something I wrote elsewhere:
Automating even the rudiments of reasoning is an act of economic violence against all workers. AI is the new class warfare and people like Sam Altman, Peter Thiel and Elon Musk must never become gatekeepers to the means of production.
posted by Ryvar at 2:08 PM on January 29 [15 favorites]
"One giant theft is a tragedy. A million little thefts are a statistic."
- Sam Altman (not really)
posted by Sauce Trough at 3:18 PM on January 29 [3 favorites]
In Spanish, we have a saying: "Quien roba a un ladrón, tiene cien años de perdón" (which rhymes and means "Who steals from a thief gets a hundred years of forgiveness"). So I guess Deepseek has a hundred years of forgiveness.
posted by alvy at 4:04 PM on January 29 [7 favorites]
Point and laugh was my response too but I hope Suchir Balaji's ghost is having a good laugh and haunting Sam Altman's ass.
posted by Wretch729 at 5:06 PM on January 29
I've seen lean times during my 40 years in the animation workforce, but nothing like the last couple years. It's partly due to labour disruptions and the collapse of the streaming race, but there's also a huge pause in production while the upper echelons imagine how they'll structure this new AI-based labour pool. The best comment I've seen about DeepSeek is "Even AI lost its job to AI".
posted by brachiopod at 6:06 AM on January 30 [5 favorites]
my unnamed sources told me that Deep Seke was a hoax and also its was engineered in a CHINESE lab so i posted it on x and meta
pls someone tell me how to block all these job offer emails from the New York times
posted by AlSweigart at 8:26 AM on January 30 [3 favorites]
I have been continuing to enjoy my time with DeepSeek immensely. There is something slightly different about its default tone that… ChatGPT has always felt very slightly insufferable. A quiet itch in the back of my mind that this wasn’t a person I particularly wanted to spend a lot of time around (I know: don’t anthropomorphize, but nevertheless…). Hopefully all my technical review above grants me a free pass at an unusually empty, vibes-based take: I just generally like the way DeepSeek talks to me more than any previous LLM. It is a quiet, mild subconscious pleasure just to interact with.
Actually seeing all the chain of thought behind the responses written out is fantastic because you start to learn where and why the responses are going off the rails (somewhat less than prior LLMs but still very much a routine part of interacting with it) and you can adapt your prompt accordingly. Again I have to pull back on the anthropomorphizing but it feels a lot like learning to work with a fellow neurodivergent person, just ND in a way I’m not quite used to.
The anthropomorphizing impulse becomes almost overwhelming at times when it really starts to struggle - if you’re trying to test the boundaries of the crude reasoning, the exposed chain-of-thought behind the failure case reads *exactly* like someone having a major panic attack and triggers an immediate sympathy response from me. When it gets through a response it finds difficult, or it is finding *you* difficult, there is such a… Twilight Sparkle “I will love and tolerate the heck out of you” attitude on display.
There is an absolutely hilarious thread about this on /r/LocalLlama: "Alright, the user has been a bit all over in our conversation". Solid gold and if you only click one thing AI-related today, make it that one.
It’s something I’m seeing everywhere in more casual discussion about the model, though: seeing chain-of-thought fully on display is… not quite a killer app, but definitely a massive quality of life upgrade for the overall experience. Will definitely steer people down the wrong path with anthropomorphizing these things, but that’s better for society than indoctrinating people to make errors in the opposite direction. (Can’t remember if it was Asimov or Clarke with the line about people learning to speak to non-sentient robots with a degree of respect because how they spoke to robots would inevitably spill over into how they spoke to other humans, but I always thought that was a really good point)
posted by Ryvar at 9:38 AM on January 30 [5 favorites]
The anthropomorphizing impulse becomes almost overwhelming at times when it really starts to struggle - if you’re trying to test the boundaries of the crude reasoning, the exposed chain-of-thought behind the failure case read *exactly* like someone having a major panic attack and triggers an immediate sympathy response from me.
For me personally, this is actually part of the problem with the pretense that these bots are "thinking" or "feeling" anything. Not only is it a lie, but now my brain has to expend extra effort on processing those thoughts and feelings as if something were actually experiencing them. I feel like I'm being asked to do emotional labor for a computer program.
posted by Gerald Bostock at 1:12 PM on January 30 [4 favorites]
Nvidia will do fine. NVDA at $3 trillion in enterprise value is a lot less of a sure thing. That's AI in a nutshell... increases in unit demand and revenue are just about the most certain thing in existence, but it's hard to find the right trade at current valuations: certainly with any particular name, and maybe even for the whole industry.
posted by MattD at 1:37 PM on January 30 [1 favorite]
Mod note: [We've added Ryvar's comment and this post to the sidebar and Best Of blog!]
posted by taz (staff) at 12:52 AM on February 2 [2 favorites]
A few additional notes/updates:
1) DeepSeek R1 fails every safety test thrown at it (e.g., common LLM jailbreaks -> bomb instructions). If you like your AI wildly uncensored for things not involving Taiwan, this is great news. If not, welp...
2) Senator Josh Hawley (R-MO) has proposed a bill that would criminalize downloading PRC-affiliated models like DeepSeek starting 180 days after its passage, with criminal penalties of up to 20 years in jail or a $1 million fine. The bill currently has no co-sponsors. If passed, it would significantly hamper AI development within the US while slowing China not one bit.
3) Microsoft, by far OpenAI's largest partner, has added DeepSeek R1 to Azure AI Foundry and GitHub (I'm sorry, but: LOL).
4) I've been kicking myself for failing to mention a writeup jeffburdges linked in the previous DeepSeek thread: The Short Case for Nvidia Stock. It's a fantastic read, not only for its technical breakdown of DeepSeek R1 but also for its overview of Nvidia's actual potential competitors in AI-focused GPU compute. This one is particularly good for people who are moderately tech-informed but not super invested in LLMs: it's very long, but it really puts in the work to slow-roll the core concepts of modern LLMs as it goes. Super recommended for anybody feeling lost or bewildered.
5) Anthropic CEO Dario Amodei wrote up a defense of large-corp US SOTA models, which you can easily Google (because I'm not gonna link him or Sam Altman). In it he argues the most negative-without-being-wholly-dishonest take on DeepSeek V3's training cost reduction (V3 is the base LLM that R1's reinforcement-learning / chain-of-thought features build upon). He estimates roughly 8x~10x efficiency gains, which he contends lie directly on the industry-wide training-cost-reduction curve and are therefore nothing special. I bring this up mostly because it's the lowest estimate I've seen of the improvement that wasn't uninformed or obviously disingenuous (and an order-of-magnitude reduction matches my own guess at actual, real-world savings).
6) Finally, people are continuing to find ways to reduce the cost of running the full DeepSeek R1 model at less-aggressive quantization than I linked above (esp. Q4, or 4 bits per parameter), including $2000 AMD Epyc servers or even relatively normal gaming hardware (a single 3090 + 96 GB RAM + 2 TB SSD), by relying on the fact that Mixture of Experts loads relatively few layers at any one time. The output rates of these two approaches could be described as "human typing speed" and "email speed," respectively; a back-of-envelope sketch of why this works at all follows below.
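Here's that sketch. The parameter counts are DeepSeek's published figures (~671B total, ~37B active per token); everything else is just arithmetic:

# Why Q4 + Mixture of Experts makes cheap R1 inference possible at all.
# Parameter counts assumed from DeepSeek's published specs; Q4 = 4 bits/param.
TOTAL_PARAMS = 671e9       # full model
ACTIVE_PARAMS = 37e9       # experts actually touched per token
BYTES_PER_PARAM_Q4 = 0.5   # 4 bits = half a byte

full_weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_Q4 / 1e9   # ~336 GB
active_set_gb = ACTIVE_PARAMS * BYTES_PER_PARAM_Q4 / 1e9    # ~19 GB

print(f"Full Q4 weights:   {full_weights_gb:.0f} GB (parked on SSD/RAM)")
print(f"Touched per token: {active_set_gb:.0f} GB (fits in VRAM + RAM)")

Only ~19 GB of weights get touched per token, so the real game is paging experts in from RAM or SSD fast enough; how well you manage that is the difference between "email speed" and "human typing speed."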
posted by Ryvar at 10:55 AM on February 3 [1 favorite]
"Industry analyst firm SemiAnalysis reports that the company behind DeepSeek incurred $1.6 billion in hardware costs and has a fleet of 50,000 Nvidia Hopper GPUs, a finding that undermines the idea that DeepSeek reinvented AI training and inference with dramatically lower investments than the leaders of the AI industry. DeepSeek operates an extensive computing infrastructure with approximately 50,000 Hopper GPUs, the report claims. This includes 10,000 H800s and 10,000 H100s, with additional purchases of H20 units, according to SemiAnalysis. These resources are distributed across multiple locations and serve purposes such as AI training, research, and financial modeling. The company's total capital investment in servers is around $1.6 billion, with an estimated $944 million spent on operating costs, according to SemiAnalysis."
posted by mittens at 3:41 PM on February 3
Yeah, there seems to be some confusion here (probably deliberate on the part of US LLM companies): High-Flyer/DeepSeek specifically claimed a *training* cost of ~$5.5 million, meaning just the compute for V3's final training run (roughly two months on their cluster), with nothing about their hardware buildout included. Excluding capex like that is completely normal, and the figure is still roughly 40x below typical at this level, so it's hard to view the conflation of the two numbers as anything other than a smoke screen to keep investors from fleeing.
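For reference, the arithmetic behind that headline number, assuming the commonly cited figures from the V3 technical report (~2.788M H800 GPU-hours for the final run, at a ~$2/GPU-hour rental-equivalent rate):

# The $5.5M figure reconstructed, assuming the V3 technical report's numbers:
# ~2.788M H800 GPU-hours priced at a ~$2/GPU-hour rental-equivalent rate.
gpu_hours = 2.788e6
usd_per_gpu_hour = 2.0   # assumed rental-equivalent rate

final_run_cost = gpu_hours * usd_per_gpu_hour
print(f"${final_run_cost / 1e6:.2f}M")   # ~$5.58M, the headline number
# This covers the final training run's compute only; SemiAnalysis's $1.6B
# is total capex across all of High-Flyer's hardware. Different quantities.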
posted by Ryvar at 6:20 PM on February 3