Should have seen this one coming, too.
January 8, 2011 10:34 AM
Following the Journal of Personality and Social Psychology's decision to publish Daryl Bem's writeup of 8 studies (PDF) purporting to show evidence for precognition (previously), researchers from the University of Amsterdam have written a rebuttal (PDF) which finds methodological flaws not only in Bem's research, but in many other published papers in experimental psychology. Could this prove to be psychology's cold fusion moment?
There has already been at least one failed attempt at replicating Bem's findings. From the rebuttal paper:
"Parapsychology is worth serious study. (...) if it is wrong [i.e., psi
does not exist], it offers a truly alarming massive case study of how
statistics can mislead and be misused." (Diaconis, 1991, p. 386)
The argument boils down to these two main points:
1. Bem doesn't distinguish between exploratory and confirmatory studies (e.g., did the hypothesis come before or after data collection?). Doing a "fishing expedition" -- looking for something significant in a mountain of data -- is far more likely to turn up spurious "significant" results than starting with a hypothesis and testing it experimentally.
2. Bem's research confuses the probability of the data given the hypothesis with the probability of the hypothesis given the data. For example, what's the probability that someone is dead given that they were lynched? Near 1. What's the probability that someone was lynched given that they are dead? Near 0.
posted by yourcelf at 10:50 AM on January 8, 2011 [3 favorites]
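To make the asymmetry in point 2 concrete, here is a toy Bayes' rule calculation in Python; the probabilities are invented purely for illustration and are not taken from any paper:

# Toy numbers, invented for illustration: only the asymmetry matters.
p_dead_given_lynched = 0.99   # P(dead | lynched): being lynched almost always means death
p_lynched = 1e-6              # P(lynched): vanishingly rare overall
p_dead = 0.10                 # P(dead): share of people in the records who have died

# Bayes' rule: P(lynched | dead) = P(dead | lynched) * P(lynched) / P(dead)
p_lynched_given_dead = p_dead_given_lynched * p_lynched / p_dead
print(p_lynched_given_dead)   # ~1e-5: near zero, even though the reverse is near 1

This is the same asymmetry the rebuttal points to: P(data | no psi) being small does not by itself make P(no psi | data) small.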
Could this prove to be psychology's cold fusion moment?
It could prove to be yet another cold-fusion moment. (*sigh*)
posted by grumblebee at 10:51 AM on January 8, 2011
Nelson: Berger and Berry explained this in a fairly lucid way. I have put their paper on my website here: Berger and Berry, 1988.
You can also see the recent NY Times article about this, at NY Times. (Note: that's a bit of a self link, since my work is discussed in the article).
posted by Philosopher Dirtbike at 11:01 AM on January 8, 2011 [6 favorites]
In the late 1970s, James Maas hosted an extensive and incredibly well put together evening program at Bailey Hall not only debunking, but truly obliterating, many then-current claims of parapsychology and psi. My hazy recollection is that Dr. Bem, an amateur stage magician, may have even participated in such events. In any case, the rather theatrical approach traditionally employed at the Cornell psych department (using Candid Camera as a teaching tool, for example) suggests that this could be a sort of epic art piece by Dr. Bem about peer-reviewed social science, and not an actual investigation of ESP.
Incidentally, if there truly was a way to predict the future using porn, I would be Mandrake the fucking Magician.
posted by xowie at 11:01 AM on January 8, 2011 [4 favorites]
From the NYT:
In an interview, Dr. Bem, the author of the original paper and one of the most prominent research psychologists of his generation, said he intended each experiment to mimic a well-known classic study, “only time-reversed.”
It looks to me like he is trying to demonstrate methodological flaws common to psychology research by using the same methods to "prove" an absurd result.
posted by metaplectic at 11:04 AM on January 8, 2011
Experiment design, confounding during the experiment, and faulty analysis are always problems in any instantiation of the scientific method. Confounding is often identified by other researchers when they try to reproduce the results of another researcher's experiment.
One dishonest or deluded researcher, or even one out of three researchers being dishonest or deluded, does not condemn the entire field of Experimental Psychology. Three out of five, maybe, but the general assertion made about experimental psychologists in the abstract is not supported in the paper by any statistics regarding the rate of occurrence of weak experiments in publications.
posted by the Real Dan at 11:06 AM on January 8, 2011
Not to defend Bem's work, but some of the criticisms in the rebuttal seem crazy to me. For example,
Furthermore, in Bem’s Experiment 5 the analysis shows that “Women achieved a significant hit rate on the negative pictures, 53.6%, t(62) = 2.25, p = .014, d = .28; but men did not, 52.4%, t(36) = 0.89, p = .19, d = .15.” But why test for gender in the first place? There appears to be no good reason. Indeed, Bem himself states that “the psi literature does not reveal any systematic sex differences in psi ability”.
Uh, curiosity maybe? Apply this same reasoning to Rutherford's experiment with backscattering: there was no "good reason" for him to measure large-angle scattering events (which subsequently led to the nuclear model of the atom). Are experimenters only allowed to analyze data if there is a good reason? Isn't the result of the analysis itself evidence of whether the analysis was reasonable?
Also, can someone with a good understanding of statistics explain to me how coming up with a hypothesis before or after data collection changes anything? Do you actually use different statistical methods?
posted by Wet Spot at 11:12 AM on January 8, 2011
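For what it's worth, the figures quoted above look like ordinary one-sample t-tests of per-subject hit rates against the 50% chance level. A minimal sketch of that kind of test, using made-up hit rates rather than Bem's data (and assuming, as p = .014 for t(62) = 2.25 suggests, that the reported p-values are one-sided):

import numpy as np
from scipy import stats

# Invented per-subject hit rates; these are not Bem's data.
hit_rates = np.array([0.55, 0.48, 0.60, 0.52, 0.47, 0.58, 0.51, 0.54])

t, p_two_sided = stats.ttest_1samp(hit_rates, 0.50)       # test against chance (50%)
p_one_sided = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
d = (hit_rates.mean() - 0.50) / hit_rates.std(ddof=1)     # Cohen's d, as in the quoted figures
print(t, p_one_sided, d)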
Wet Spot: Also, can someone with a good understanding of statistics explain to me how coming up with a hypothesis before or after data collection changes anything? Do you actually use different statistical methods?
In classical statistics, statistical tests are constructed to keep the probability of a false alarm low (typically, 5%). That is, if there is TRULY no effect of a particular variable, then the probability of (incorrectly) concluding that there is an effect will be 5%. However, if you look at many variables at once, the probability that you'll get at least one false alarm increases. For instance, if you test two variables with no effect, the chance that you'll get a false alarm is 1-(1-.05)^2 = 0.0975. This is almost twice the false alarm probability you have with one test, and it grows with more tests.
Typically, to deal with this, what you do is you adjust the single-test criterion to be more conservative so that the probability of getting at least one false alarm across all tests is 5%. How you do this depends on how many tests you wanted to do in the first place, and whether you looked at the data first.
Why would looking at the data matter? Well, imagine that you have a bunch of variables, all of which truly have no effect. Just by chance, some effects will be bigger than others. If you restricted testing to only effects that look big, you increase your false alarm rate.
For more reading, check out the Wikipedia articles on multiple comparisons, and planned versus post hoc comparisons. As a Bayesian statistician, I'm obligated to point out that none of this is an issue in Bayesian statistics, due to the fact that the false alarm rate is not the focus in Bayesian methods.
posted by Philosopher Dirtbike at 11:49 AM on January 8, 2011 [11 favorites]
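A small sketch of the arithmetic in that first paragraph, plus the standard per-test adjustments (Šidák and Bonferroni are textbook corrections, not anything specific to the rebuttal):

# Family-wise false-alarm rate when testing k true-null effects, each at alpha = .05.
alpha = 0.05
for k in (1, 2, 5, 10, 20):
    p_at_least_one_false_alarm = 1 - (1 - alpha) ** k     # k = 2 gives 0.0975, as above
    print(k, round(p_at_least_one_false_alarm, 4))

# Making the per-test criterion more conservative so the overall rate stays near 5%:
k = 10
sidak_alpha = 1 - (1 - alpha) ** (1 / k)                  # Šidák adjustment
bonferroni_alpha = alpha / k                              # Bonferroni adjustment
print(sidak_alpha, bonferroni_alpha)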
Wet Spot, coming up with a hypothesis before collecting the data is crucial for avoiding the sharpshooter fallacy. Large sets of data will always appear to have meaningful patterns. Imagine, for instance, a survey of people's star signs and whether they have ever had cancer; finding that Capricorns are most prone to cancer simply because they happen to be at the top of that list is quite different from making a prior prediction that (say) the three winter star signs are more prone to cancer because their immune systems are compromised, then finding that relationship in the data. Same for the difference between hunting for any pattern in the works of Shakespeare and predicting a specific pattern in a specific place.
posted by dontjumplarry at 12:03 PM on January 8, 2011 [1 favorite]
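A quick Monte Carlo version of that star-sign example; the base rate, sample size, and cutoff below are arbitrary, chosen only to show the gap between a post hoc pick and a pre-specified one:

import random

def one_survey(n_per_sign=200, base_rate=0.3):
    # Cancer counts for 12 star signs, all drawn from the same null distribution:
    # there is no real difference between signs in this simulation.
    return [sum(random.random() < base_rate for _ in range(n_per_sign)) for _ in range(12)]

expected = 200 * 0.3
threshold = expected * 1.15          # an arbitrary "looks elevated" cutoff
trials = 2000

post_hoc = sum(max(one_survey()) > threshold for _ in range(trials))   # pick the worst sign after looking
planned = sum(one_survey()[0] > threshold for _ in range(trials))      # one sign named in advance
print("post hoc 'discovery' rate:", post_hoc / trials)
print("pre-specified sign rate:  ", planned / trials)

The post hoc rate comes out several times higher than the pre-specified one, even though no sign is actually special.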
The way I write a paper is that I start with an observation, which I show to be significant by doing an experiment, which I run three times. Then I start doing a theme and variation: what happens if I vary temperature? What happens if I use a different host? And so on. This not only gives me a meaty paper, but it also causes me to confirm my basic observation a number of times in different ways. That means you can pretty much trust my assertion about the basic observation.
Here's what's making me so depressed: my agency (I work for the government) has decreed that we publish twice as many papers a year as we had been. This forces me to choose between working twice as hard (which would be very difficult, realistically) and reducing the quality of my papers. If I reduce the number of experiments per paper, I will be risking doing exactly the sort of thing that happened here: misinterpreting data based on insufficient replication. The pressure on scientists to publish too many tiny papers is becoming a very big problem.
posted by acrasis at 12:59 PM on January 8, 2011 [3 favorites]
"(using Candid Camera as a teaching tool, for example)"
Allen Funt is an alum. Cornell tends to play up their alums (and yet I've still not read any Vonnegut).
posted by Eideteker at 1:25 PM on January 8, 2011
If I reduce the number of experiments per paper, I will be risking doing exactly the sort of thing that happened here: misinterpreting data based on insufficient replication. The pressure on scientists to publish too many tiny papers is becoming a very big problem.
That's actually not what happened here. Bem's paper has many experiments that all (under a particular interpretation...) provide consistent evidence. Also, Bem is a very senior, respected researcher. He is under no pressure to publish, and he's especially not under pressure to publish this stuff. This isn't his main area of research.
posted by Philosopher Dirtbike at 2:14 PM on January 8, 2011 [1 favorite]
It would be pretty simple to knock out a silly little Internet app that does dozens of simple experiments like that hidden porn thing. People could do dozens of trials per minute and it could easily triple his sample sizes in a day.
I'd say falsification should be easy here, but with this kind of ad hoc "supernatural" theorizing (like "people can see into the past, but only with porn for some reason, because that's what squinting at the datanoise shows right now") bullshit can live on forever. In the minds of the believers anyway.
posted by dgaicun at 2:54 PM on January 8, 2011
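A rough sketch of the kind of trial loop such an app would run, under the usual null assumption that guesses are independent of the later-chosen targets; the session length and the exact-binomial check below are my own choices, not anything from Bem's setup:

import random
from math import comb

def run_session(n_trials, guess_fn):
    hits = 0
    for _ in range(n_trials):
        guess = guess_fn()                           # in a real app, ask the participant
        target = random.choice(['left', 'right'])    # target chosen only after the guess
        hits += (guess == target)
    return hits

def p_at_least(hits, n, p=0.5):
    # Exact one-sided binomial p-value: P(X >= hits) under pure chance.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(hits, n + 1))

n = 200
hits = run_session(n, lambda: random.choice(['left', 'right']))   # simulated participant
print(hits, p_at_least(hits, n))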
Not to defend Bem's work, but some of the criticisms in the rebuttal seem crazy to me. For example,
Furthermore, in Bem’s Experiment 5 the analysis shows that “Women achieved a significant hit rate on the negative pictures, 53.6%, t(62) = 2.25, p = .014, d = .28; but men did not, 52.4%, t(36) = 0.89, p = .19, d = .15.” But why test for gender in the first place? There appears to be no good reason. Indeed, Bem himself states that “the psi literature does not reveal any systematic sex differences in psi ability”.
Uh, curiosity maybe?
I don't know enough about the study in question, but it's possible that this criticism is basically saying that if you look at random data in enough ways, you're bound to find a way that is an outlier in that particular test run (as opposed to due to something about the way itself).
That is, if he has a bunch of data about time-backwards ESP tests, and looks at it and sees...
"Catholics test average... left handed people test average... people with a college degree test average... Samoans test average... people who own dogs test average... divorced people test average... people over 40 years old test average... AH HAH! Women test above average!"... then the conclusion probably shouldn't be "Women are psychic"; you've looked at so many possible groups, none of them for any particular reason, that you're bound to have found one that was above average, even if there's no such thing as psychic ability whatsoever.
The conclusion should be "Let's try it again and see if women are above average next time, too". Because next time, it might be Samoans.
posted by Flunkie at 3:05 PM on January 8, 2011 [4 favorites]
I totally predicted this controversy... ten years ago.
Keeping in line with the way that the paper indicates that ESP may work, I will predict it ten years from now.
posted by Flunkie at 3:07 PM on January 8, 2011
Could this prove to be psychology's cold fusion moment?
What? Precognition is true and we just don't know it yet?
Low Energy Nuclear Reactions.
posted by rough ashlar at 3:27 PM on January 8, 2011
Could this prove to be psychology's cold fusion moment?
No, not psychology's... um, para-psychology's?
(Dr. Peter Venkman: Well, I have a PhD in parapsychology and psychology...)
posted by ovvl at 3:35 PM on January 8, 2011
What? Precognition is true and we just don't know it yet?
Low Energy Nuclear Reactions.
Is there a crank cause you WON'T endorse?
posted by Pope Guilty at 4:32 PM on January 8, 2011 [1 favorite]
Nelson, Berger and Berry explained this in a fairly lucid way. I have put their paper on my website here: Berger and Berry, 1988.
A personal derail here. Don Berry was on the faculty here at the University of Minnesota when I was a new graduate student; he was barely older than I was at the time. He taught my first graduate statistics course. He left for Duke, then M.D. Anderson, and has now started his own company. He has always been one of the most lucid writers and speakers on statistics I have ever dealt with. He is also a gentleman and an avid athlete.
That said, the problem here goes deeper than any confusion about the objectivity of statistics. This particular corner of psychology seems to have wrapped itself around the axle by accepting research that ignores problems of post hoc analysis, multiple inference, and publication bias.
posted by Mental Wimp at 2:44 PM on January 9, 2011
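Publication bias in particular is easy to demonstrate with a small simulation; everything below (sample size, number of studies, the normal approximation) is an arbitrary toy setup, not a model of any real literature:

import random
import statistics
from math import erf, sqrt

def one_null_study(n=30):
    # A study of an effect that is truly zero.
    data = [random.gauss(0, 1) for _ in range(n)]
    m = statistics.mean(data)
    se = statistics.stdev(data) / sqrt(n)
    z = m / se
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))    # two-sided, normal approximation
    return m, p

studies = [one_null_study() for _ in range(5000)]
published = [m for m, p in studies if p < 0.05]         # only "significant" results get written up
print(len(published) / len(studies))                     # roughly 5-6% of the null studies
print(statistics.mean(abs(m) for m in published))        # yet their average |effect| sits well above zero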
posted by Nelson at 10:44 AM on January 8, 2011
This thread has been archived and is closed to new comments