Is that enough to account for all human bias?
November 26, 2024 12:47 PM
This blog post is a bit ... different. Normally, a blog post is a static document, and the direction of communication is from the screen to the user. But this piece requires you to interact. You're not just reading the content; you'll actually change the story of the blog post as you interact with it. This makes the piece a bit more experimental, but hopefully, also much more interesting! from An Inverse Turing Test
posted by grumpybear69 at 12:52 PM on November 26 [4 favorites]
Keeps resetting. I must be a bot.
posted by toodleydoodley at 12:53 PM on November 26 [1 favorite]
Oh god I did not pass.
posted by mittens at 1:06 PM on November 26 [1 favorite]
01100001010!
posted by y2karl at 1:10 PM on November 26 [1 favorite]
I failed. But I can't figure out if that means I'm a human or a bot.
posted by If only I had a penguin... at 1:14 PM on November 26 [2 favorites]
First, let me say: this is a neat article & I appreciated reading it. It illustrates a lot about what it means to generate uniform random numbers, or at any rate to try to determine whether a sequence IS from a uniform random number generator.
I was curious, though, whether this would do any good at telling chatgpt from a person. So I created a program that could make fake key inputs into the web page, and solicited chatgpt to give me a long sequence of T & H. This could then be entered into the web page and we can answer the question of whether this technology is good at distinguishing a human from an LLM instructed to produce a "random sequence".
(At first, chatgpt generated and then executed a Python program; but when I told it I wanted it to generate the letters directly it obeyed)
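(For illustration only, a hypothetical sketch of sending fake key inputs like this; the comment doesn't say what tooling was actually used, and this assumes the page accepts plain 'h'/'t' keypresses and that the third-party pyautogui library is installed.)

import time
import pyautogui  # third-party: pip install pyautogui

sequence = "TTHTHTTHHHTHTHTHTH"  # stand-in for the LLM-generated string
time.sleep(5)  # a few seconds to click into the blog post before typing starts
for ch in sequence:
    pyautogui.press(ch.lower())  # assumes the page listens for 'h'/'t' keys
    time.sleep(0.05)  # small delay so the page registers each keypress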
tl;dr: Much like a human is expected to, the chatgpt sequence failed the majority of the statistical tests:
In our case we have 515 heads and 428 tails… At the moment 104 tests fail while 18 tests pass
I don't know how closely this resembles a "human distribution", whether it mimics the distribution I included in my prompt ('Make a random sequence of T & H (e.g., TTHTHTTHHHTHTHTHTH...). Make a minimum of 1000.'), or what.
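(As a quick back-of-the-envelope check of those counts, a sketch rather than the blog post's own code: under a fair coin, 515 heads out of 943 flips is already roughly a 2.8-sigma deviation, so the first, monobit-style level flags the sequence on its own.)

import math

heads, tails = 515, 428
n = heads + tails                       # 943 flips in total
z = (heads - n / 2) / math.sqrt(n / 4)  # normal approximation: z is about 2.83
p = math.erfc(abs(z) / math.sqrt(2))    # two-sided p-value, about 0.005
print(f"z = {z:.2f}, p = {p:.3f}")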
In any case, it's simultaneously interesting & unsurprising that the LLM's weights managed to produce "distinctly non-uniform" random numbers.
posted by the antecedent of that pronoun at 1:47 PM on November 26 [1 favorite]
I managed to get pretty close to randomness, not by trying directly but by (spoiler?) aiming for the line between the two buttons and making my hands vibrate slightly. (On my phone, in case that wasn't clear.)
posted by demi-octopus at 2:11 PM on November 26 [2 favorites]
Not sure what it says about Perl's random number generation that typing in the output of:
perl -e 'for (0..100) { print ((rand(2) > 1) ? 1 : 0)} print "\n";'
was not a slam dunk for robot...
posted by straw at 3:17 PM on November 26 [1 favorite]
That was a roller coaster. I got nearly 50-50 on the first level of the analysis, but that rapidly degenerated as the level went up, so I'm proud to report that I'm not a robot, and anyone who thinks I am can bite my shiny metal a** -- whoops.
posted by BCMagee at 3:19 PM on November 26 [1 favorite]
I passed all the tests up to size 3, then failed half and passed half the size 4 tests. Beep boop.
posted by dsword at 4:00 PM on November 26
To put this in context: The blog post in this FPP introduces the monobit test and the runs test of the K2 criterion from BSI for evaluating the randomness of pseudorandom number generators.
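(To make that concrete, here is a minimal sketch of both ideas applied to a string of H/T keypresses; it uses a generic z-score monobit check and the Wald-Wolfowitz runs test rather than the exact thresholds from the BSI specification.)

import math

def monobit_p(seq):
    # Two-sided p-value for "H and T are equally likely".
    n = len(seq)
    s_obs = abs(2 * seq.count('H') - n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

def runs_p(seq):
    # Wald-Wolfowitz runs test: too few runs means streaky,
    # too many runs means over-alternating (a classic human tell).
    n = len(seq)
    n_h = seq.count('H')
    n_t = n - n_h
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    expected = 2 * n_h * n_t / n + 1
    var = 2 * n_h * n_t * (2 * n_h * n_t - n) / (n ** 2 * (n - 1))
    return math.erfc(abs(runs - expected) / math.sqrt(2 * var))

# A perfectly alternating sequence is balanced (monobit p = 1.0)
# but has far too many runs, so the runs test rejects it.
print(monobit_p('HT' * 50), runs_p('HT' * 50))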
posted by The genius who rejected Anno's budget proposal. at 4:30 PM on November 26