Norvig Does Probability
October 15, 2015 8:43 AM
A delightful exposition on probability and related paradoxes.
Peter Norvig is director of research at Google (previously). He has also solved Sudoku.
posted by savetheclocktower at 8:59 AM on October 15, 2015 [3 favorites]
He's very careful to show how these apparent paradoxes come from different interpretations of (possibly not well-formed) questions. I like it.
posted by RustyBrooks at 9:05 AM on October 15, 2015 [6 favorites]
This is a lot better than I was expecting. One of the better treatments of "a boy born on Tuesday" that I've seen.
posted by Proofs and Refutations at 9:28 AM on October 15, 2015 [2 favorites]
Damnit, does the airplane take off?
posted by k5.user at 9:38 AM on October 15, 2015 [2 favorites]
Only if your clothes are on the lower peg, and your brother is getting a haircut.
posted by Samuel Farrow at 9:48 AM on October 15, 2015 [2 favorites]
This is a great explanation of some fun paradoxes.
One of my favorite ways of making sense of probability is by imagining that you're gambling. Somehow, when you put money on the line, even in your imagination, things get clearer (at least for me!)
For instance, I find that the Sleeping Beauty problem makes more sense if, instead of just asking Sleeping Beauty what her belief is that the coin came up heads, you have her make a bet every time she wakes up. Then, it is clear that it is better for her to bet on tails than on heads -- so the probabilities of the two of them can't be equal.
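Here's a minimal Monte Carlo sketch (my own toy Python, not anything from Norvig's notebook) of that betting version: Beauty places one bet per awakening, and betting tails wins about two bets out of three.

```python
import random

def bet_every_awakening(n_trials=100_000, guess="tails"):
    """Toy Sleeping Beauty: heads -> one awakening, tails -> two.
    Beauty bets `guess` at every awakening; return the fraction of bets won."""
    bets = wins = 0
    for _ in range(n_trials):
        coin = random.choice(["heads", "tails"])
        awakenings = 1 if coin == "heads" else 2
        bets += awakenings
        if guess == coin:
            wins += awakenings
    return wins / bets

print(bet_every_awakening(guess="tails"))  # roughly 2/3
print(bet_every_awakening(guess="heads"))  # roughly 1/3
```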
posted by goingonit at 10:24 AM on October 15, 2015
But is that a difference in probability, or just a difference in payoffs?
If I flip a coin and say that if it's heads the value of your bet is doubled, it's logical to bet on heads, even though the odds are still equal.
posted by RobotHero at 10:31 AM on October 15, 2015
I am a thirder with respect to the Sleeping Beauty Problem, so maybe this is odd to say, but I'm not at all satisfied with his discussion of that problem. The main difficulty I have is that in that problem (but not in general, as you see later on) he follows Laplace in endorsing the principle of indifference and assigns equal probability to the events in the sample space:
A = {<heads, Monday, awake>, <heads, Tuesday, asleep>, <tails, Monday, awake>, <tails, Tuesday, awake>}
But nothing in the axioms of probability theory requires equal assignments of probabilities here, and the most serious paradoxes in probability theory (like Bertrand's Paradox and its relatives) are all related to the principle of indifference (making that principle very controversial).
Suppose then that a halfer replies by saying:
Yeah, A is the right sample space, but we should not assign equal probabilities to the simple events because they are not all equally likely; we should assign probabilities like this:
Pr(<heads, Monday, awake>) = 1/2
Pr(<heads, Tuesday, asleep>) = 0
Pr(<tails, Monday, awake>) = 1/4
Pr(<tails, Tuesday, awake>) = 1/4
What does the thirder say at this point? It isn't as simple as pointing out that the assignments are not equal: the halfer admits that the assignments are not equal and thinks it is a virtue of her approach.
To make things worse, the halfer can say, "Look, you're not really assigning equal probabilities to all of the points in the sample space either, since the full sample space is really much bigger than the one you are imagining." After all, the full sample space is:
S = {<heads, Monday, awake>, <heads, Monday, asleep>,
<heads, Tuesday, awake>, <heads, Tuesday, asleep>,
<tails, Monday, awake>, <tails, Monday, asleep>,
<tails, Tuesday, awake>, <tails, Tuesday, asleep>}
The thirder assigns probability zero to a bunch of these values. Why? And why shouldn't the halfer get to assign zeroes to one more point in the space and distribute the remaining probabilities unequally?
Complicating things further, it's really unclear in the problem how the thirder's assignment of Pr(<heads, Tuesday, asleep>) = 1/4 makes any sense. These are supposed to be probabilities from Beauty's point of view. But doesn't she know that she is awake?
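To make that concrete, here is a throwaway sketch (mine, not anything from Norvig's notebook): both the uniform "thirder" assignment and the non-uniform "halfer" assignment are perfectly good probability measures on A, both give the coin a 1/2 chance of heads unconditionally, and they only come apart once you condition on being awake.

```python
from fractions import Fraction

A = [("heads", "Monday", "awake"),
     ("heads", "Tuesday", "asleep"),
     ("tails", "Monday", "awake"),
     ("tails", "Tuesday", "awake")]

# Thirder: principle of indifference over A.
thirder = {e: Fraction(1, 4) for e in A}
# Halfer: the non-uniform assignment suggested above.
halfer = dict(zip(A, [Fraction(1, 2), Fraction(0), Fraction(1, 4), Fraction(1, 4)]))

def is_probability_measure(p):
    """All the axioms ask for here: non-negative values summing to one."""
    return all(v >= 0 for v in p.values()) and sum(p.values()) == 1

def pr_heads(p):
    return sum(v for (coin, _, _), v in p.items() if coin == "heads")

def pr_heads_given_awake(p):
    awake = {e: v for e, v in p.items() if e[2] == "awake"}
    return sum(v for (coin, _, _), v in awake.items() if coin == "heads") / sum(awake.values())

for name, p in [("thirder", thirder), ("halfer", halfer)]:
    print(name, is_probability_measure(p), pr_heads(p), pr_heads_given_awake(p))
# thirder True 1/2 1/3
# halfer  True 1/2 1/2
```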
posted by Jonathan Livengood at 10:39 AM on October 15, 2015
What does the thirder say at this point?
I think you need to discriminate between prior and posterior probabilities, no?
posted by MisantropicPainforest at 11:06 AM on October 15, 2015
I wish they had used a different presentation; I personally had a harder time following exactly what they were doing with the code.
posted by Carillon at 11:19 AM on October 15, 2015 [2 favorites]
I think you need to discriminate between prior and posterior probabilities, no?
Yes. Does that mean I'm missing something obvious here?
I thought my worry about Pr(<heads, Tuesday, asleep>) = 1/4 was exactly a worry about whether it's a posterior probability or a prior probability. If it's posterior, then it should be conditional on Beauty being awake. In which case, why isn't it 0? If it's prior, then why is it 1/4 in space A, rather than 1/8 in space S? And more generally, if it's prior, then there is still freedom to assign anything you like consistent with the axioms ... so what motivates some prior assignment in line with what a thirder wants as opposed to a prior assignment that a halfer might want?
posted by Jonathan Livengood at 11:39 AM on October 15, 2015
Oh, ipython notebooks! (or Jupyter), is there nothing you can't do?
posted by signal at 12:02 PM on October 15, 2015 [4 favorites]
I always have high expectations for Peter Norvig, and every time he exceeds them. This is a clear (and kind) explanation of tricky statistics, and the way he solves problems with such a small amount of clear code is something I keep in mind and aspire to when doing my own coding.
posted by jjwiseman at 1:23 PM on October 15, 2015 [1 favorite]
Jonathan Livengood: The Beauty who gets woken up twice counts for twice as much as the Beauty who gets woken up once in the thirder interpretation.
Let's assume we have two Beauties, one who gets woken up once ("heads") and one who gets woken up twice ("tails"). Since neither one knows which one they are, let's have them both guess the same thing. If they both guess "heads", they'll be right 1 out of 3 times. If they both guess "tails", they'll be right 2 out of 3 times.
So even though only half of Beauties are right in either case, two-thirds of questions can be answered correctly if they both guess "tails". Does that make sense?
This is what Norvig was saying about how it's important to define what the "experiment" is very carefully in order to answer the question. Yes, it's really a question about the underlying mechanism of assigning probabilities, but I feel it was well-defined in this case.
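If it helps, here is a quick sketch of that tally (illustrative Python of my own, not from the notebook): it counts accuracy per question asked and accuracy per Beauty.

```python
import random

def accuracy(n_beauties=100_000, guess="tails"):
    """Each Beauty gets one fair coin flip; heads means one awakening, tails two.
    She answers `guess` every time she is asked which way the coin came up.
    Returns (fraction of questions answered correctly, fraction of Beauties correct)."""
    correct_questions = total_questions = correct_beauties = 0
    for _ in range(n_beauties):
        coin = random.choice(["heads", "tails"])
        questions = 1 if coin == "heads" else 2
        total_questions += questions
        if guess == coin:
            correct_questions += questions
            correct_beauties += 1
    return correct_questions / total_questions, correct_beauties / n_beauties

print(accuracy(guess="tails"))  # about (0.67, 0.50): 2/3 of questions, half of Beauties
print(accuracy(guess="heads"))  # about (0.33, 0.50): 1/3 of questions, half of Beauties
```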
posted by goingonit at 2:21 PM on October 15, 2015 [1 favorite]
And again this is why I like betting. Sure, the halfer can claim that your assignment of probabilities is arbitrary, but the thirder will win the money in the end (assuming that the beauties are really assigned with a fair coin flip, etc.)
posted by goingonit at 2:26 PM on October 15, 2015
As I said up front: I am a thirder about the sleeping beauty problem. You don't need to convince me to accept Norvig's conclusion. I already accept the conclusion. (Though I will note that some very very smart people I respect a lot think the halfer position is correct, which gives me pause.) What I have a problem with is the reasoning that gets him there -- and his unnecessary swipe at philosophical arguments.
My main gripe is that we can't -- as Norvig suggests -- go straight from the sample space to the solution. We need to assign probabilities to the simple events in the sample space. To get Norvig's answer, we need to apply the principle of indifference to his proposed sample space (or some other principle that gives the same answer in this case). But the principle of indifference is notoriously fiddly. Why think that it is being correctly applied here? Or that it ought to be applied here at all?
posted by Jonathan Livengood at 2:33 PM on October 15, 2015
But I think he covers that. "A fair coin will be tossed to determine which experimental procedure to undertake: if the coin comes up heads, Beauty will be awakened and interviewed on Monday only. If the coin comes up tails, she will be awakened and interviewed on Monday and Tuesday."
This suggests that we can use the properties of an event we already know about (an idealized fair coin flip) to assign probabilities to the different events in the sample space.
Can you think of a way to assign non-uniform probabilities to the events in the sample space without it implying that there's not a 50/50 chance the coin comes up heads?
posted by goingonit at 2:46 PM on October 15, 2015
Or is the issue that Beauty is saying "what was the prior chance that I would be awake? What was the prior chance that it was Monday?" and thus assigning different probabilities to the sample space? I suppose you get into frequentism vs. Bayesianism here... so if you run the experiment *exactly once* you could get in trouble with the Bayesians somehow.
But as long as you performed the experiment multiple times, the Beauties who guessed tails would be right much more often in the end. Call me a frequentist, I guess, but I'm satisfied with that.
posted by goingonit at 2:51 PM on October 15, 2015
Well, what do you mean by there being a 50/50 chance that the coin comes up heads? If you look at the non-uniform distribution I recommended to halfers, you will see that Pr(heads) = Pr(<heads, Monday, awake>) = 1/2. So, it looks like the halfer agrees that the coin has a 50/50 chance of coming up heads. (Indeed, Lewis says as much in his halfer reply to Elga (pdf).)
posted by Jonathan Livengood at 2:54 PM on October 15, 2015
But wouldn't Lewis admit that even if Beauty is in some sense "right" to say that Pr(heads) = 0.5, she would be correct more often on the whole if she always answered "tails" when someone asked her, "which way did the coin come up?"
I am willing to accept that there could be a sense in which the right answer is 0.5, but from the experimenter's point of view, Beauty would do better to believe otherwise. And so if Beauty knows the "rules of the game", which are specified not from her viewpoint but from the experimenter's, she can "play by the rules" to get Pr(heads) = 1/3.
And this is where Norvig is going. He gives a definition of probability (the frequentist one!) and follows it to derive the "correct" (frequentist) answer to a bunch of different questions. And this is the tool he is trying to equip you with:
"We've seen how to manage probability paradoxes. Just be explicit about what the problem says, and then methodical about defining the sample space, and finally be careful in counting the number of outcomes in the numerator and denominator."
And it is! As long as you are explicit about the terms of the problem -- that you're always assuming an outside observer conducting an arbitrarily large number of identical experiments -- you'll have an easy time. Right?
posted by goingonit at 3:18 PM on October 15, 2015
And this is where Norvig is going. He gives a definition of probability (the frequentist one!) and follows it to derive the "correct" (frequentist) answer to a bunch of different questions. ... As long as you are explicit about the terms of the problem -- that you're always assuming an outside observer conducting an arbitrarily large number of identical experiments -- you'll have an easy time. Right?
Oh, no. I don't think so at all. In the first place, the Laplacean view that Norvig endorses is NOT frequentist. And the reasoning that he gives is not frequentist, either. He wants to set up a sample space in which all the simple events are equally probable and then count. I think it's not that easy and that reasoning this way doesn't always give easy or even correct answers.
I find the hypothetical frequentist reasoning in the Sleeping Beauty Problem intuitively appealing (even though I'm not a frequentist). But note that the result isn't simply read off of the sample space.
I wonder if part of why it doesn't seem paradoxical to Norvig (or to you?) is the way the situation is described in his version. Try the following reformulation.
Beauty is told all about the following set-up.
Beauty will be put to sleep on Sunday. She will be awakened on Monday. She will be told that it is Monday. She will be asked whether a fair coin flip later that day will come up heads or tails. She is put to sleep and then the coin is flipped. If the coin comes up heads, then she will be left to sleep until Wednesday when the experiment ends. If the coin comes up tails, then she will be awakened on Tuesday. In each case when she is put to sleep, her memory is erased so that during the experiment, she doesn't know without being told whether it is Monday or Tuesday.
When she is awakened on Monday and told that it is Monday, what credence should Beauty have that the coin will come up tails?
Lewis says the answer is easy: Apply the principal principle, which says that if you know the objective chances, you should align your credences with them. Since the objective chance of a fair coin landing tails is 1/2, Beauty should have credence 1/2 in tails.
As to what Lewis would say to your question: I have no idea! I thought he should have been persuaded by Elga's paper (pdf). But he wasn't. ;)
posted by Jonathan Livengood at 3:32 PM on October 15, 2015
This is good stuff, but the part I don't really go along with is summed up in his remark:
First, if you want to convince me, show me a sample space; don't just make philosophical arguments. (Although a philosophical argument can be employed to help you define the right sample space.)
I mean, once you decide that what probability means is "divide the measure of one set of outcomes by the measure of the sample space" then, yes, there aren't any paradoxes, because you're just doing real analysis. But lots of things we're uncertain about don't admit a natural sample space, at least not one that I can see.
If you want to convince me about the Sleeping Beauty problem, make a philosophical argument; don't just show me a sample space. I could have done that myself.
posted by escabeche at 9:02 PM on October 15, 2015
The article is excellent, but does Norvig's coding style strike anyone else as extremely terse? It's super-clever once you just read the statements as basically ordinary English (which his intention-revealing names help with), but I found it really hard to keep up with once I started forgetting the implementations of earlier functions.
posted by AABoyles at 7:01 AM on October 16, 2015
The coding is terse, but I think it's because he's really talking about the math and not the code. OTOH I'm totally in love with this style of notebook presentation right now. It's such a powerful tool, being able to put working executable code and visualizations right in line with text in a web page.
posted by Nelson at 7:45 AM on October 16, 2015 [2 favorites]
I think I've got a grip on the difference in interpretation for Sleeping Beauty.
The 1/3 interpretation thinks of it as randomly sampling all the possible occasions she can be asked the question. And that all three are equally likely.
Under the 1/2 interpretation, it's the coin flip that's being randomly sampled, and how many times she is asked the question is irrelevant. If the coin came up tails, then there is 50% chance that it's Monday and 50% chance that it's Tuesday. If the coin came up heads, then there is 100% chance that it is Monday.
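A little illustrative Python (mine, not Norvig's) for the two samplings just described:

```python
import random

def sample_awakening():
    """1/3 reading: sample uniformly over the occasions on which she can be asked."""
    return random.choice([("heads", "Monday"), ("tails", "Monday"), ("tails", "Tuesday")])

def sample_coin_flip():
    """1/2 reading: sample the coin flip; given tails, Monday and Tuesday are 50/50."""
    coin = random.choice(["heads", "tails"])
    day = "Monday" if coin == "heads" else random.choice(["Monday", "Tuesday"])
    return coin, day

def pr_heads(sampler, n=100_000):
    return sum(sampler()[0] == "heads" for _ in range(n)) / n

print(pr_heads(sample_awakening))  # about 1/3
print(pr_heads(sample_coin_flip))  # about 1/2
```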
posted by RobotHero at 1:09 PM on October 16, 2015
related article on displaying probabilities.
(both links via data elixir newsletter btw).
posted by andrewcooke at 1:16 PM on October 16, 2015 [1 favorite]
Ooh, that colah blog looks very cool. Thanks for linking.
posted by benito.strauss at 1:19 PM on October 16, 2015 [1 favorite]