LLM just needs a little help and a little prompt to fake a data set
November 27, 2023 6:45 AM
"ChatGPT generates fake data set to support scientific hypothesis" "In a paper published in JAMA Ophthalmology on 9 November1, the authors used GPT-4 — the latest version of the large language model on which ChatGPT runs — paired with Advanced Data Analysis (ADA), a model that incorporates the programming language Python and can perform statistical analysis and create data visualizations. The AI-generated data compared the outcomes of two surgical procedures and indicated — wrongly — that one treatment is better than the other."
"At the request of Nature’s news team, Wilkinson and his colleague Zewen Lu assessed the fake data set using a screening protocol designed to check for authenticity.
This revealed a mismatch in many ‘participants’ between designated sex and the sex that would typically be expected from their name. Furthermore, no correlation was found between preoperative and postoperative measures of vision capacity and the eye-imaging test. Wilkinson and Lu also inspected the distribution of numbers in some of the columns in the data set to check for non-random patterns. The eye-imaging values passed this test, but some of the participants’ age values clustered in a way that would be extremely unusual in a genuine data set: there was a disproportionate number of participants whose age values ended with 7 or 8."
I don't know how much of the screening protocol is automated.
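The terminal-digit part, at least, looks easy to script. A minimal sketch in Python (my guess only, not Wilkinson and Lu's actual protocol; the DataFrame and column name are hypothetical):

    import pandas as pd
    from scipy.stats import chisquare

    def terminal_digit_check(values):
        # Over a wide range, the last digits of genuine measurements like
        # age should be roughly uniform; a pile-up on 7s and 8s, as found
        # here, shows up as a tiny p-value.
        digits = pd.Series(values).dropna().astype(int).abs() % 10
        observed = digits.value_counts().reindex(range(10), fill_value=0)
        return chisquare(observed)

    # Hypothetical usage, assuming the fake data is loaded as a DataFrame:
    # stat, p = terminal_digit_check(df["age"])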
"At the request of Nature’s news team, Wilkinson and his colleague Zewen Lu assessed the fake data set using a screening protocol designed to check for authenticity.
This revealed a mismatch in many ‘participants’ between designated sex and the sex that would typically be expected from their name. Furthermore, no correlation was found between preoperative and postoperative measures of vision capacity and the eye-imaging test. Wilkinson and Lu also inspected the distribution of numbers in some of the columns in the data set to check for non-random patterns. The eye-imaging values passed this test, but some of the participants’ age values clustered in a way that would be extremely unusual in a genuine data set: there was a disproportionate number of participants whose age values ended with 7 or 8."
I don't know how much of the screening protocol is automated.
I do encourage people to actually read the links; the goal of the original paper was to make a fake dataset. This isn't fraud.
posted by MisantropicPainforest at 6:55 AM on November 27, 2023 [15 favorites]
I do understand what the goal of the paper was, but you better believe future fraud of this nature awaits.
posted by mcstayinskool at 6:56 AM on November 27, 2023 [9 favorites]
Fair enough. This was more like demonstrating the possibility of fraud. It's something like white hat hacking.
posted by Nancy Lebovitz at 6:57 AM on November 27, 2023 [8 favorites]
Like most worries about LLMs, all this does is compress the amount of code it takes to do something. It's very easy to fake data, it always has been, and perhaps the easiest thing to do is to take your data and just change a few numbers to get the results you want. This will most likely affect the glut of shitty and already fake papers that are out there and land in predatory journals.
The real issue is the perverse incentives that academics face. Faking data is always in the direction of 'getting the results you want'; as long as those incentives are there, fraud and bad research will continue.
posted by MisantropicPainforest at 7:05 AM on November 27, 2023 [15 favorites]
you better believe future fraud of this nature awaits
Where did you get it? Who taught you how to do this stuff?
You, alright? I learned it by watching you!
Sure. Plenty of past fraud of this nature exists, too, without AIs. It's actually kind of a big problem now in the sciences. Not so much intentionally made-up data as just bad science, with p-hacking or otherwise self-selecting work to reach a desired conclusion.
I take it as a sign of AI success that they can learn to fake data just like real scientists sometimes do. It is a huge problem for usability though. ChatGPT's eagerness to make shit up really limits its usefulness as a tool. You can even be clever and ask for citations and it will produce very plausible looking paper titles and URLs that are completely fabricated.
A newer crop of AI search UIs is doing better. I like how Bing and Phind are more search-engine oriented, where every sentence in an answer from the AI is a link to a web page or other citation. And the links are designed to encourage you to go deeper. With this UI the AI isn't the oracle giving answers, it is more of a knowledge guide pointing you to further resources.
posted by Nelson at 7:16 AM on November 27, 2023 [8 favorites]
The spam and plagiarism machine produces spam and plagiarism.
If you are a scientist who thinks you can trust this shit you are an incompetent and a fool.
posted by Artw at 7:42 AM on November 27, 2023 [15 favorites]
I do feel like ChatGPT is getting better at not making stuff up as often. A few months ago, you could casually ask it "hey, tell me about that politician who was arrested for punching out a mailman a while back, and give me sources", and it would obligingly fabricate a story, using a real politician's name, and "source" it with superficially convincing links to (non-existent) articles on real newspaper sites. I've tried this a few times recently and now just get "I can't find any articles about such a situation"-type responses. Whether ChatGPT gets to the point where it makes stuff up less often than humans do remains to be seen.
posted by senor biggles at 8:12 AM on November 27, 2023
I wonder if the GPT-4 Python-writing bot used the Faker library to generate the fake data. We'll never know unless I get a PubMed login, and this also assumes they pressed the little arrow to show the code that was generated.
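If it did, a participant table takes only a few lines. A hypothetical reconstruction (we have no idea what code ADA actually emitted) that would reproduce two of the tells the screeners caught:

    import random
    from faker import Faker

    fake = Faker()

    participants = []
    for _ in range(250):  # cohort size is made up
        participants.append({
            "name": fake.name(),
            # sex drawn independently of the name: the name/sex
            # mismatch the screening protocol flagged
            "sex": random.choice(["M", "F"]),
            "age": random.randint(18, 85),
            # pre- and post-op values drawn independently of each
            # other, hence no correlation between the measures
            "preop_acuity": round(random.uniform(0.1, 1.0), 2),
            "postop_acuity": round(random.uniform(0.1, 1.0), 2),
        })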
posted by credulous at 8:29 AM on November 27, 2023 [2 favorites]
Articles like this are just stupid moral panics. An LLM is just a statistical model of written language. This is about as surprising as "autocomplete suggested I like 'ducking' my wife and that's not true at all!"
posted by riotnrrd at 8:36 AM on November 27, 2023 [4 favorites]
Flipping things around, I'm more interested that the AI-generated fraud fails basic statistical checks. Maybe its flaws are more subtle than the stuff found in a Dan Ariely paper, but they don't seem that subtle.
My joke response is that if it can't help avoid detection what's the point? Another failure of AI vs. hype.
But yeah, maybe there's some risk? Like it makes (incompetent) fraud easier and thus some (incompetent) people will try fraud who previously wouldn't have, convinced that they'll get away with it because AI.
FWIW I have created fake data sets in R and other software. In my cases I used them to test some processing and visualization pipelines, either before data is available or to see what results might appear to be under various cases. The ability to do this is a useful tool.
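A sketch of that benign use, in Python here rather than R (names and numbers are illustrative): plant a known effect, then confirm the pipeline recovers it before any real data arrives.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)
    n = 200

    # Simulate a data set with a known, built-in group difference...
    df = pd.DataFrame({
        "group": np.repeat(["control", "treated"], n // 2),
        "outcome": np.concatenate([
            rng.normal(10.0, 2.0, n // 2),   # control: mean 10
            rng.normal(11.5, 2.0, n // 2),   # treated: mean 11.5
        ]),
    })

    # ...then check the analysis/visualization code reports roughly +1.5.
    print(df.groupby("group")["outcome"].mean())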
posted by mark k at 8:39 AM on November 27, 2023 [4 favorites]
Articles like this are just stupid moral panics. An LLM is just a statistical model of written language.
Statistical models of written language that some very rich, very stupid, deeply unethical people are trying to replace the entire middle class with, and while that is a dumb idea we need to keep an eye on it because they are going to cause a lot of damage along the way.
posted by Artw at 8:49 AM on November 27, 2023 [33 favorites]
"An LLM is just a statistical model of written language."
A howitzer is just a machine for moving metal from here to there.
posted by exlotuseater at 8:53 AM on November 27, 2023 [32 favorites]
Statistical models of written language that some very rich, very stupid, deeply unethical people are trying to replace the entire middle class with, and while that is a dumb idea we need to keep an eye on it because they are going to cause a lot of damage along the way.
And also which those same very rich, very stupid, and deeply unethical people are deeply invested in convincing everybody else (not to mention themselves) are either already sapient or soon will be fully sapient consciousnesses capable of evaluation and synthesis and analysis when, no, they're just spitting out words that statistically are likely to go next to each other.
posted by Pope Guilty at 8:54 AM on November 27, 2023 [17 favorites]
I've been using Google's AI Bard for some preliminary research related to my primary care clinical work and would peg its overall accuracy at about 70%. This surprised me, given the publicity around AIs acing the US Medical Licensing Exam. When confronted with questions where data is thin or consensus nonexistent, Bard consistently picks a side and will fabricate citations, sometimes attributing nonexistent books or papers to real living authors. Of course this may be a bug rather than a feature: AI plagiarists (at least in my field) are (for now) easily discoverable through citation search.
I still use LLMs, but in the same way that I might give work to a sloppy undergraduate research assistant who nevertheless sports quick turnaround time.
posted by Richard Saunders at 8:57 AM on November 27, 2023
A little over 30 years ago, I shifted country and research field in one swell foop. Shortly after my arrival, The Gaffer waved a paper at me; all indignant because a colleague/rival from his previous job a) seemed to be muscling in on the field b) had found something mildly unexpected. My task became a close scrutiny of this paper. It had the look-and-feel of science but the referees had done the usual half-arsed refereeing. Several of the column totals in the main data table didn't sum correctly and the stats were a bit wonk. Accordingly my first paper in the new field was a hatchet-job re-analysis of the data which [surprise] proved that Aspergillus nidulans showed the same effect as my new lab had demonstrated in several other species. The rogue paper had been cobbled together by a harried and hurried graduate student and not enough people downstream were sufficiently incentivized to do their jobs with enough rigour.
So far, so normal science. Over the next couple of years we put out a number of similar papers: Yada yada Candida albicans; Yada yada C. elegans etc. It was only a step to imagine writing a program that would write the paper as part of the protocol of programs that downloaded the data from GenBank, tallied it up and did the multivariate stats to reach a conclusion. The autopaper template would be the same boilerplate with variables for [species] [sample size] [most highly expressed gene] and would, of course, cite all our previous papers in the line.
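Something like this, say (a toy sketch; every value is illustrative):

    from string import Template

    # The bracketed variables above become template fields.
    BOILERPLATE = Template(
        "We analysed $sample_size GenBank records for $species and found "
        "that $top_gene was the most highly expressed gene, in agreement "
        "with our previous reports."
    )

    print(BOILERPLATE.substitute(
        species="Candida albicans",
        sample_size=412,
        top_gene="ACT1",
    ))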
posted by BobTheScientist at 9:01 AM on November 27, 2023 [7 favorites]
Regurgitation engine in talking absolute nonsense shock - film at 10 ....
posted by GallonOfAlan at 9:13 AM on November 27, 2023
It does seem like the larger problem here is academics not receiving credit for (a) more careful, detailed paper refereeing and (b) duplication/verification studies. If doing some amount of that work was a requirement for tenure and promotion, or if the major research journals started including sections for such work (= same citation index as the original work, for tenure and promotion calculations), our disciplinary knowledge bases would be more robust.
posted by eviemath at 9:49 AM on November 27, 2023 [6 favorites]
"But he warns that advances in generative AI could soon offer ways to circumvent these protocols"
"please state the nature of the medical emergency™"
posted by clavdivs at 9:49 AM on November 27, 2023
"please state the nature of the medical emergency™"
posted by clavdivs at 9:49 AM on November 27, 2023
And also which those same very rich, very stupid, and deeply unethical people are deeply invested in convincing everybody else (not to mention themselves) are either already sapient or soon will be fully sapient consciousnesses capable of evaluation and synthesis and analysis when, no, they're just spitting out words that statistically are likely to go next to each other.
A large number of these rich unethical stupid people are also in a fascist cult that worships the pretend fully sapient consciousness and has a Harry Potter fanfic as one of its religious texts.
It is both utterly ridiculous and a huge problem.
posted by Artw at 9:57 AM on November 27, 2023 [7 favorites]
Referees should get respect. They should also get paid.
posted by Nancy Lebovitz at 10:01 AM on November 27, 2023 [4 favorites]
I think the routine operation of academics trying to sweeten their resumes is more than enough of a problem.
posted by Nancy Lebovitz at 10:14 AM on November 27, 2023 [2 favorites]
Soylent ChatGPT is pretend fully sapient consciousnesses. Or is that consciences?
posted by y2karl at 10:34 AM on November 27, 2023
Yes it's easy to fake data and yes, there are shitty systemic incentives to do so.
However, it's pretty tough to fake usable amounts of data in a way that also passes forensic screening, and rather difficult to fake the data in a way that will stand up to careful scrutiny.
I feel like the take-home here is that faking data well is about to get a lot easier, and I'm not sure if the tools for detecting this kind of foul play are advancing fast enough to stem the tide.
posted by SaltySalticid at 11:17 AM on November 27, 2023 [5 favorites]
It's impossible to know without reading the original paper, but some other articles I've found imply that the researchers spent a lot of iterations crafting the prompt. Which makes me wonder if they were just taking a roundabout way of writing a Python script, which they could have perhaps written themselves more efficiently. I guess we could run a double-blind test with a fast-typing grad student.
posted by credulous at 11:29 AM on November 27, 2023
In a way, I do get paid to referee articles, in that I do it as part of my job, and my salary is fine. I mean, nobody told me I had to, but I understand it as part of my role, part of how we keep science afloat, so I do it a few times a year. I’m not convinced that it’s a great idea systemwise to separate that work out as its own income source. I’d be tempted to quit doing it because then it would feel like moonlighting, and I’d rather have my evenings/weekends than a tax complication. I guess I don’t know how many adjuncts (who do not all make a decent salary) also do this work.
What I don’t do, and can’t do without the infrastructure changing, is rerun authors’ code on their data to confirm that I get the same numbers they do. That wouldn’t say anything about whether they fabricated the data, it’s a much weaker test, but it’s a test that many of our papers probably still wouldn’t pass. And this I think would improve the whole process a great deal. More work, I’d referee fewer papers, but I’d be more confident that I was advising the editor well.
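Mechanically it wouldn't take much. A minimal sketch of the missing step (script and file names are hypothetical):

    import hashlib
    import subprocess

    # Rerun the authors' analysis script on their posted data...
    subprocess.run(
        ["python", "analysis.py", "--data", "data.csv", "--out", "rerun.csv"],
        check=True,
    )

    def digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    # ...and compare against the submitted results table. Byte-for-byte
    # equality is strict; floating-point output may need a tolerance-based
    # comparison instead.
    if digest("rerun.csv") == digest("results_submitted.csv"):
        print("numbers reproduce")
    else:
        print("rerun does not match the submission")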
posted by eirias at 11:51 AM on November 27, 2023 [2 favorites]
What I don’t do, and can’t do without the infrastructure changing, is rerun authors’ code on their data to confirm that I get the same numbers they do.
Lots of journals in my field do this after a paper is accepted. Replication is required for publication.
posted by MisantropicPainforest at 12:00 PM on November 27, 2023 [3 favorites]
Yeah, but in political science that means general journals and top subfield journals have to go through that effort on the 2-10% of submissions that are accepted instead of on all of them.
posted by GCU Sweet and Full of Grace at 12:15 PM on November 27, 2023
Lots of journals in my field do this after a paper is accepted. Replication is required for publication.
What's your field?
It's not common in the areas I work in (life sciences and adjacent.)
posted by mark k at 12:27 PM on November 27, 2023
As I coasted to retirement, I was asked by a Journal Editor pal to referee a paper which needed scrutiny by someone with my niche expertise. I spent a lot of time on it because the first and last authors were not native English speakers and this was their first paper in "my" field. Sometimes, you see in a paper's Acknowledgements "Thanks are due to an anonymous referee who made constructive comments, improved the English and made it a better paper" not in this case! The Journal did, at year end, send me an e-certificate. I am not ungrateful, so I wrote back to the robot who sent me the Cert: "Thank you so much, I printed it out at A3 and folded it into a hat: really useful to help survive the current heatwave. With care, recipients can ensure that WILEY is displayed at a jaunty angle over the left eyebrow". If we're going to whore ourselves out for Multinationals we may as well wear the cap that fits.
posted by BobTheScientist at 2:33 PM on November 27, 2023 [1 favorite]
I've been using Google's AI Bard for some preliminary research related to my primary care clinical work and would peg its overall accuracy at about 70%.
I'd be interested to know if Bard's alternate answers are any better. I haven't used it a ton, but when I have I've noticed that it's pretty common that at least one of the three drafts it generates is closer to reality than the others. It seems like whatever scoring function they are using to choose which draft is presented scores readability/conversational tone over correctness a lot of the time.
posted by wierdo at 7:59 PM on November 27, 2023
Tech Conference Canceled After Using AI to Generate Fake Women Speakers
posted by Artw at 10:26 AM on November 28, 2023 [6 favorites]