"Chess has been called the drosophila of artificial intelligence"
June 10, 2014 11:07 AM Subscribe
How To Catch A Chess Cheater: Ken Regan Finds Moves Out Of Mind
"Regan clicks a few times on his mouse and then turns his monitor so I can view his test results from the German Bundesliga. His face turns to disgust. “Again, there’s no physical evidence, no behavioral evidence,” he says. “I’m just seeing the numbers. I’ll tell you, people are doing it.” Regan is 53. His hair has turned white. What remains of it, billows up in wild tufts that make him look the professor. When Regan acts surprised his thick, jet-black eyebrows rise like little boomerangs that return a hint of his youth. His enthusiasm for work never wanes; his voice merely shifts modes of erudition that make him sound the professor.
[...]
In Regan’s algorithms it is the relative differences in move quality that matter, not the absolute differences. So if, for example, three top candidate moves are judged by the engine to be only slightly apart, then these top three moves will each earn approximately 30 percent credit (the remaining 10 percent left for the remaining candidate moves). This emphasis on relative differences rather than absolute value explains why cheaters who use moves that are not always the engine’s first choice will still get caught. This also explains why it’s not possible for partial credit to be greater against weak opponents.
After a player’s partial credit is plotted for a set of positions, Regan graphically scores his exam by drawing a curve averaged through the data (See Figure 3). (In statistical jargon, this process is called a “least squares best fit.” The score on a standard multiple-choice exam can be thought of as a “best fit” too, but in this case its best fit is calculated between the points zero and one on a number line rather than between multiple points on a two-dimensional plot. See Figure 1 again.) The best fit produces a curve (shown as ‘y’ in Figure 3) and two values, ‘s’ and ‘c,’ which characterize the bend in the curve. Regan calls ‘s’ the sensitivity. It shifts the curve left and right and correlates to a player’s ability to sense small differences in move quality. Regan calls ‘c’ the consistency and it thins or thickens the tail of the curve. A larger ‘c’ represents a player’s avoidance of gross blunders (“gross” being somewhat relative to the interpretation of the engine). Regan has found that different values of ‘s’ and ‘c’ translate into well-defined categories that align with Elo ratings, similar to the way that a 95 percent and an 85 percent on an exam typically translate to an A and B, respectively. Back in the 1970s, when Arpad Elo designed the USCF and FIDE rating systems, he arbitrarily picked 2000 to mean expert, 2200 to mean master, etc. This arbitrary assignment means chess ratings are based on a curve, and specific values of ‘s’ and ‘c’ can be mapped directly to specific Elo. The mapped rating is the Intrinsic Performance Rating."
Previously: Chess mates' cheating checked; 'You're a pretty good player, but you're too pessimistic.'
"Regan clicks a few times on his mouse and then turns his monitor so I can view his test results from the German Bundesliga. His face turns to disgust. “Again, there’s no physical evidence, no behavioral evidence,” he says. “I’m just seeing the numbers. I’ll tell you, people are doing it.” Regan is 53. His hair has turned white. What remains of it, billows up in wild tufts that make him look the professor. When Regan acts surprised his thick, jet-black eyebrows rise like little boomerangs that return a hint of his youth. His enthusiasm for work never wanes; his voice merely shifts modes of erudition that make him sound the professor.
[...]
In Regan’s algorithms it is the relative differences in move quality that matter, not the absolute differences. So if, for example, three top candidate moves are judged by the engine to be only slightly apart, then these top three moves will each earn approximately 30 percent credit (the remaining 10 percent left for the remaining candidate moves). This emphasis on relative differences rather than absolute value explains why cheaters who use moves that are not always the engine’s first choice will still get caught. This also explains why it’s not possible for partial credit to be greater against weak opponents.
After a player’s partial credit is plotted for a set of positions, Regan graphically scores his exam by drawing a curve averaged through the data (See Figure 3). (In statistical jargon, this process is called a “least squares best fit.” The score on a standard multiple-choice exam can be thought of as a “best fit” too, but in this case its best fit is calculated between the points zero and one on a number line rather than between multiple points on a two-dimensional plot. See Figure 1 again.) The best fit produces a curve (shown as ‘y’ in Figure 3) and two values, ‘s’ and ‘c,’ which characterize the bend in the curve. Regan calls ‘s’ the sensitivity. It shifts the curve left and right and correlates to a player’s ability to sense small differences in move quality. Regan calls ‘c’ the consistency and it thins or thickens the tail of the curve. A larger ‘c’ represents a player’s avoidance of gross blunders (“gross” being somewhat relative to the interpretation of the engine). Regan has found that different values of ‘s’ and ‘c’ translate into well-defined categories that align with Elo ratings, similar to the way that a 95 percent and an 85 percent on an exam typically translate to an A and B, respectively. Back in the 1970s, when Arpad Elo designed the USCF and FIDE rating systems, he arbitrarily picked 2000 to mean expert, 2200 to mean master, etc. This arbitrary assignment means chess ratings are based on a curve, and specific values of ‘s’ and ‘c’ can be mapped directly to specific Elo. The mapped rating is the Intrinsic Performance Rating."
Previously: Chess mates' cheating checked; 'You're a pretty good player, but you're too pessimistic.'
Wow, I loved the section on “Converting Utilities into Probabilities.” This is an incredibly useful problem for this kind of framework, and like Reagan, i figured it would be kind of old research.
“I made it up,” he says. “I’ve been astounded, actually, that there doesn’t seem to be precedent in the literature for it. I was dead sure people were doing this problem.”
posted by DGStieber at 11:53 AM on June 10, 2014 [1 favorite]
“I made it up,” he says. “I’ve been astounded, actually, that there doesn’t seem to be precedent in the literature for it. I was dead sure people were doing this problem.”
posted by DGStieber at 11:53 AM on June 10, 2014 [1 favorite]
See also: Kenneth Regan's supplemental notes.
posted by gyc at 11:59 AM on June 10, 2014 [2 favorites]
posted by gyc at 11:59 AM on June 10, 2014 [2 favorites]
Curious to hear chess MeFite's reactions to this.
posted by benito.strauss at 12:00 PM on June 10, 2014
posted by benito.strauss at 12:00 PM on June 10, 2014
Claude Shannon, the father of information theory, in his famous paper “Programming a Computer for Playing Chess,” estimated the number of possible unique chess positions to be roughly 1043.
That's a hilarious formatting error. The real number is a little bigger than that (1043).
He said "roughly". Geez.
posted by Etrigan at 12:16 PM on June 10, 2014 [14 favorites]
That's a hilarious formatting error. The real number is a little bigger than that (1043).
He said "roughly". Geez.
posted by Etrigan at 12:16 PM on June 10, 2014 [14 favorites]
Well I had a mixed reaction to the article. I do think in principle that this kind of analysis could detect some kinds of cheating but not others. It seems that using a computer for every move would show up clearer than single-move cheating. He acknowledges that it's harder to detect single-move cheating, but thinks it would happen more often in critical positions. This sounds reasonable in high-level games, but he's also including regular tournament players. How do they determine critical moves? (Hindsight, usually)
The idea of using ratings certainly has validity, but if you're not greedy and you only cheat a little, your rating will go up to the point where it matches your cheating.
I know I'm capable of making moves that are better than my rating would suggest from time to time. How is that distinguishable from cheating using his program?
Takeaway: don't be greedy.
posted by MtDewd at 12:20 PM on June 10, 2014
The idea of using ratings certainly has validity, but if you're not greedy and you only cheat a little, your rating will go up to the point where it matches your cheating.
I know I'm capable of making moves that are better than my rating would suggest from time to time. How is that distinguishable from cheating using his program?
Takeaway: don't be greedy.
posted by MtDewd at 12:20 PM on June 10, 2014
if the chess cheater is using a hidden radio earphone, you find out what frequency it's on and broadcast fortissimo yoko ono music until he dies.
posted by bruce at 12:30 PM on June 10, 2014 [2 favorites]
posted by bruce at 12:30 PM on June 10, 2014 [2 favorites]
He said "roughly". Geez.
It's only a difference of 40 orders of magnitude. Piddling.
posted by maryr at 1:22 PM on June 10, 2014
It's only a difference of 40 orders of magnitude. Piddling.
posted by maryr at 1:22 PM on June 10, 2014
Naked chess in Faraday cages. And does chess being the drosophila of AI mean that it'll have won once we can't tell the difference between a conversation with a chess computer and one with a fruit fly?
More seriously, the observation that 'freestyle' chess teams - mixed humans and computers - can be better than purely carbon or silicon-based teams, seems to match the move in pub quizzes towards allowing people to use their mobile devices while skewing the questions to avoid stuff you can just Google. The whole question of what cheating is and what it means is far more malleable than I think we think. It's fun and status-enhancing to measure humans against other humans in ways that reflect talent and hard work, but if one just removes the opportunities for dishonesty does that diminish the comparisons?
Also, I've often found that expert practitioners in many fields can easily spot fraud, even if they can't prove it. A day trader I know got out of the game because she was disgusted at the level of manipulation; speak to professors or salesmen or CFOs and they'll know who's doing a bit too well in their world to be plausible, and often what is happening and how. But proving it? How many big financial fraud cases actually succeed? (I know to a high degree of certainty about quite serious fraud and corruption among bits of the IT world, but the practitioners are good at squid-inking and most people really don't care. They should, but they don't.)
That's what I like about writing fiction - you can rip the very skin and muscle off other people's ideas, but cheerfully admit it at the time and your work stands. And most of the rest of writing you just can't cheat at anyway. Prose style structure, characterisation, plot development, insight and whatever the hell the magic juice is: do it however you like, but you can't cheat.
posted by Devonian at 1:37 PM on June 10, 2014 [2 favorites]
More seriously, the observation that 'freestyle' chess teams - mixed humans and computers - can be better than purely carbon or silicon-based teams, seems to match the move in pub quizzes towards allowing people to use their mobile devices while skewing the questions to avoid stuff you can just Google. The whole question of what cheating is and what it means is far more malleable than I think we think. It's fun and status-enhancing to measure humans against other humans in ways that reflect talent and hard work, but if one just removes the opportunities for dishonesty does that diminish the comparisons?
Also, I've often found that expert practitioners in many fields can easily spot fraud, even if they can't prove it. A day trader I know got out of the game because she was disgusted at the level of manipulation; speak to professors or salesmen or CFOs and they'll know who's doing a bit too well in their world to be plausible, and often what is happening and how. But proving it? How many big financial fraud cases actually succeed? (I know to a high degree of certainty about quite serious fraud and corruption among bits of the IT world, but the practitioners are good at squid-inking and most people really don't care. They should, but they don't.)
That's what I like about writing fiction - you can rip the very skin and muscle off other people's ideas, but cheerfully admit it at the time and your work stands. And most of the rest of writing you just can't cheat at anyway. Prose style structure, characterisation, plot development, insight and whatever the hell the magic juice is: do it however you like, but you can't cheat.
posted by Devonian at 1:37 PM on June 10, 2014 [2 favorites]
It's only a difference of 40 orders of magnitude. Piddling.
What's an order of magnitude? Just another zero. What's a zero worth?
posted by blue t-shirt at 1:39 PM on June 10, 2014
What's an order of magnitude? Just another zero. What's a zero worth?
posted by blue t-shirt at 1:39 PM on June 10, 2014
What's a zero worth?
Depends on where it is and how it got there.
posted by Herodios at 4:53 PM on June 10, 2014
Depends on where it is and how it got there.
posted by Herodios at 4:53 PM on June 10, 2014
For instance, if it were in town.
posted by Simon! at 6:00 PM on June 10, 2014 [3 favorites]
posted by Simon! at 6:00 PM on June 10, 2014 [3 favorites]
This is actually something I'm interested in, peripherally at least, and so I stuck with the article, even though it had a few of those clichés of writing about computers—like mixing jargon with wording that panders to the uninformed without actually aiding understanding. For instance:
What Regan means by regression is this: While some ones and zeroes remain static in the engine’s initial program, other ones and zeroes essential for the engine’s evaluation function—those essential for the way it “thinks”—rapidly update in short-term random access memory (RAM).
The sentence manages to be both too technical about the easy stuff and not technical enough about the hard stuff, to the point that it doesn't work well as an explanation.
Then there's this:
Countless athletes and chess players, including Bobby Fischer, have compared sports to life. “Chess is life,” the former American world champion said.
That's just...a bad couple of sentences.
It also, yes, squicked me out a bit how much mention was made of Regan's Christianity—clearly it's important to him, and this is as much a personality profile as it is a technical exegesis of cheating in chess, so I can see why it's mentioned, but as an editor, I could see several places where the mentions just felt a bit indulgent of the subject and made my mind wander. Editing out some of that probably would have made it a stronger piece.
Looking into the writer, Goldowsky, though, it appears that this kind of reflects his philosophy—he's interested in the spiritual aspects of chess, so I guess that's why he focuses on that so much in the piece. He's also an alum of the University at Buffalo, where Regan works, so there's that personal connection there as well. (And that said, maybe I'm making Regan's argument—I'm critiquing this, but I've also written for UB's alumni magazine before, so there's another weird coincidence for ya.)
Anyway, overall, it's an interesting piece of writing!
posted by limeonaire at 6:17 PM on June 10, 2014 [2 favorites]
What Regan means by regression is this: While some ones and zeroes remain static in the engine’s initial program, other ones and zeroes essential for the engine’s evaluation function—those essential for the way it “thinks”—rapidly update in short-term random access memory (RAM).
The sentence manages to be both too technical about the easy stuff and not technical enough about the hard stuff, to the point that it doesn't work well as an explanation.
Then there's this:
Countless athletes and chess players, including Bobby Fischer, have compared sports to life. “Chess is life,” the former American world champion said.
That's just...a bad couple of sentences.
It also, yes, squicked me out a bit how much mention was made of Regan's Christianity—clearly it's important to him, and this is as much a personality profile as it is a technical exegesis of cheating in chess, so I can see why it's mentioned, but as an editor, I could see several places where the mentions just felt a bit indulgent of the subject and made my mind wander. Editing out some of that probably would have made it a stronger piece.
Looking into the writer, Goldowsky, though, it appears that this kind of reflects his philosophy—he's interested in the spiritual aspects of chess, so I guess that's why he focuses on that so much in the piece. He's also an alum of the University at Buffalo, where Regan works, so there's that personal connection there as well. (And that said, maybe I'm making Regan's argument—I'm critiquing this, but I've also written for UB's alumni magazine before, so there's another weird coincidence for ya.)
Anyway, overall, it's an interesting piece of writing!
posted by limeonaire at 6:17 PM on June 10, 2014 [2 favorites]
« Older "Just because we have the best hammer" | Take it easy babe Newer »
This thread has been archived and is closed to new comments
That's a hilarious formatting error. The real number is a little bigger than that (1043).
posted by blue t-shirt at 11:44 AM on June 10, 2014 [23 favorites]