Probability #2

tutorial

The Monty Hall Problem

You’ve surely heard of this one. It’s a really tough one to wrap your head around. I’ll describe the problem in order to solve it, but it’s worth looking into the background of the “Ask Marilyn” incident if you’re not familiar with it.

Behind Monty Hall’s Doors

The story shows that it’s not just hacks like myself that find this stuff non-intuitive. Lots of highly educated, very smart people got tricked by this one.

The problem is based on an old game show, “Let’s Make a Deal”. The host of the show was named Monty Hall. The contestant is shown three doors. Behind two doors are goats. Behind one is A NEW CAR!!!

The contestant chooses a door. If he chooses the one with THE NEW CAR, he gets to keep the car. If he chooses a goat door, he gets a goat. The unspoken assumption being that he’d rather have a car.

But… before the chosen prize is revealed, Monty Hall opens one of the two remaining doors, revealing a goat!

There are now two doors remaining. The one that the contestant chose, and one other one. Monty asks the contestant if he wants to stay with his choice, or if he’d like to switch to the other door.

What should he do?

To clarify, what should he do in order to optimize his chances of winning THE NEW CAR? Are the odds of winning the car better if he sticks with his original choice? Are they better if he switches? Does it just not matter?

Intuition tells us that it doesn’t matter. There are two doors. One has a goat, one has a car. The odds are 50/50. Doesn’t really matter if he sticks with the first choice or switches. The odds are the same.

The truth though? He should absolutely switch. It will significantly improve his odds of winning. In fact, he’s twice as likely to win if he switches than if he sticks with his original choice.

But why???

If you want a deep explanation… I mean really deep, just head over to the Monty Hall Problem page on wikipedia. But I’ll give my take on it.

When the contestant first chose a door, he had a 1 in 3 chance of winning the car. That’s pretty straightforward.

When Monty opens a door, he is not randomly opening a door. He’s opening a door with a goat behind it, always. So he is using his knowledge of what’s behind each door to change the nature of the game.

Let’s say we have doors A, B, and C. A and B have goats, C has a car.

There are three possible scenarios:

  1. Contestant chooses door A. Monty will open door B. If contestant switches doors, he wins.
  2. Contestant chooses door B. Monty opens A. If contestant switches, he wins.
  3. Contestant chooses door C. Monty opens either A or B. It doesn’t matter. If contestant switches, he loses.

So in two of the three scenarios, switching is the winning move.

Even though I’ve worked out the logic of this myself many times and done it in code more than once, it still feels vaguely magical.

Bring on the Code

Let’s start by playing the game 1000 times in a for loop. And we’ll start by setting up the three doors with goats behind them and swapping out a car on one of them.

// we'll play the game 1000 times.
for (let i = 0; i < 1000; i++) {
  // create 3 doors with goats behind them.
  let doors = ["goat", "goat", "goat"];

  // randomly choose one door and put a car behind it.
  let winner = Math.floor(Math.random() * 3);
  doors[winner] = "car";
  console.log(doors);
}

This outputs the 1000 random door configurations. Something like this:

[ 'car', 'goat', 'goat' ]
[ 'car', 'goat', 'goat' ]
[ 'goat', 'car', 'goat' ]
[ 'car', 'goat', 'goat' ]
[ 'goat', 'car', 'goat' ]
[ 'car', 'goat', 'goat' ]
[ 'goat', 'car', 'goat' ]
[ 'car', 'goat', 'goat' ]
[ 'car', 'goat', 'goat' ]
...

Now let’s have the contestant make a choice – 0, 1 or 2. Just to check ourselves, let’s see whether he wins a car and add up how many times he wins. Out of 1000 games, he should win somewhere close to 333.

let wins = 0;

// we'll play the game 1000 times.
for (let i = 0; i < 1000; i++) {
  // create 3 doors with goats behind them.
  let doors = ["goat", "goat", "goat"];
 
  // randomly choose one door and put a car behind it.
  let winner = Math.floor(Math.random() * 3);
  doors[winner] = "car";
  // console.log(doors);
 
  // choose a door and count how many wins we get
  let choice = Math.floor(Math.random() * 3);
  if (doors[choice] === "car") {
    wins++;
  }
}
// should be around 333
console.log(wins);

That checks out for me.

Next, we let Monty open a door to reveal a goat. There are probably plenty of clever ways to code this, but I’ll just loop through the three doors and choose one that is not the same as what the contestant chose, and make sure that it’s a goat door.

  ...
  // now monty chooses another door (must be a goat!)
  let montyChoice;
  for (let j = 0; j < 3; j++) {
    if (j != choice && doors[j] === "goat") {
      montyChoice = j;
      break;
    }
  }
  console.log(doors[montyChoice]);
  ...

Just to check myself, I logged what’s behind Monty’s door. Sure enough, nothing but goats.

Now we have the contestant’s choice and Monty’s choice. We can add the code back in that calculates how many times the contestant wins the car…

let wins = 0;
// we'll play the game 1000 times.
for (let i = 0; i < 1000; i++) {
  // create 3 doors with goats behind them.
  let doors = ["goat", "goat", "goat"];

  // randomly choose one door and put a car behind it.
  let winner = Math.floor(Math.random() * 3);
  doors[winner] = "car";
  // console.log(doors);

  // choose a door and count how many wins we get
  let choice = Math.floor(Math.random() * 3);

  // now monty chooses another door (must be a goat!)
  let montyChoice;
  for (let j = 0; j < 3; j++) {
    if (j != choice && doors[j] === "goat") {
      montyChoice = j;
      break;
    }
  }
  // console.log(doors[montyChoice]);
 
  // contestant does not switch
  if (doors[choice] === "car") {
    wins++;
  }
}
console.log(wins);

This isn’t any different than the first time we counted the wins. I consistently get numbers in the low 300s. Of course, because the fact that Monty opens a door doesn’t change the fact that there was a 1 in 3 chance of the contestant winning.

But let’s see what happens if the contestant switches doors. Again, you can get fancy here, but I’ll go brute force, looping through the doors till I find the one that is neither the contestant’s choice nor Monty’s choice. And I’ll count how many times he wins with that choice.

let wins = 0;
// we'll play the game 1000 times.
for (let i = 0; i < 1000; i++) {
  // create 3 doors with goats behind them.
  let doors = ["goat", "goat", "goat"];

  // randomly choose one door and put a car behind it.
  let winner = Math.floor(Math.random() * 3);
  doors[winner] = "car";
  // console.log(doors);

  // choose a door and count how many wins we get
  let choice = Math.floor(Math.random() * 3);

  // now monty chooses another door (must be a goat!)
  let montyChoice;
  for (let j = 0; j < 3; j++) {
    if (j != choice && doors[j] === "goat") {
      montyChoice = j;
      break;
    }
  }
  // console.log(doors[montyChoice]);
 
  // contestant does not switch
  // if (doors[choice] === "car") {
  //   wins++;
  // }
 
  // contestant switches
  let newChoice;
  for (let j = 0; j < 3; j++) {
    if (j != choice && j != montyChoice) {
      newChoice = j;
      break;
    }
  }
  if (doors[newChoice] === "car") {
    wins++;
  }
}
console.log(wins);

Running this I get numbers in the upper 600s! Winning roughly 2 out of 3 times, exactly as predicted.

Summary

So there you go. The code proves things out. But it still feels a bit magical.

I did think of another though experiment that helps to make it make sense. Let’s say there were four doors – three goats, one NEW CAR. Contestant chooses one, and Monty Hall opens two of them to reveal two goats. Or better yet, there are ten doors. Contestant chooses one and Monty opens eight. The only way he’d lose by switching is if he had chosen the car to begin with. And there was only a 1 in 10 chance he did that. So if he switches now, he’s got a 9 out of 10 chance of winning! That helps my brain a bit. But still seems a bit magical.

Probability #1

tutorial

I started re-reading The Drunkard’s Walk again today.

I read it a few years ago and remember really liking it. There are lots of examples in there of situations that seem to defy logic, or at least defy our sense of what is logical. But these are provable mathematically using the basic probability.

Whenever I come across a problem like this, even once I get my head around it as much as I can, I like to write some code to prove it out.

One of my favorite such problems is the two children problem. I haven’t gotten to it in the book yet, but I know it’s coming up. It’s a classic. Here it is, my paraphrase:

A woman says she has two children. She says one of them is a boy. What are the odds that the other one is a boy?

The obvious answer is 50%. There’s a child you don’t know about. It’s either a girl or a boy. Everything else is irrelevant, right?

Nope. Actually the odds are 1 in 3 that the other child a boy, 2 in 3 that it’s a girl.

And the really odd part of it is you can change the wording in a way that seems to make no difference, but totally changes it:

A woman says she has two children. She says the first born one is a boy. What are the odds that the other one is a boy?

In this case, yes, the odds are 50% that the other child is a boy.

To understand the odds of a particular situation occurring, such as the genders of two children, you have to consider all possible arrangements and then how many of those arrangements satisfy the criteria. Then divide.

In this case, we have a family who had one child, then another child. They may have had a boy first, then a girl. Or maybe a boy, then another boy. Or a girl and then a boy. Or a girl and another girl. That’s four possibilities:

  • boy boy
  • boy girl
  • girl boy
  • girl girl

So in the first problem, the mother says that one of her children is a boy. This narrows us down to just three possibilities.

  • boy boy
  • boy girl
  • girl boy

In all three of those, one of the children is a boy. In two of them the other child is a girl and in only one, the other child is also a boy. So, 1 in 3 for boy, 2 in 3 for girl.

But let’s look at the second problem. Mom says that her first child is a boy. That gives us only two possibilities:

  • boy boy
  • boy girl

In one of those, the other child is a girl and in one, it’s a boy. 50/50.

Prove it with Code

I’m doing this with JavaScript using node.js. But use whatever you want.

For probability situations, it helps to have a large number of samples. So let’s make 1000 families. Each family will have two children. We can represent these by strings: “bb” means they had a boy, then another boy. “gb” means they had a girl, then a boy, in that order. Likewise, “bg” means the opposite order and “gg” means they had two girls. We’ll store all the families in an array, and we’ll actually go through one by one, randomly choosing the gender of the first child, and then the second child.

let families = [];

// create 1000 families with two randomly gendered children.
for (let i = 0; i < 1000; i++) {
  // no kids yet.
  let family = "";

  // first child
  if (Math.random() < 0.5) {
    family += "b";
  } else {
    family += "g";
  }

  // second child
  if (Math.random() < 0.5) {
    family += "b";
  } else {
    family += "g";
  }

  families.push(family);
}
console.log(families);

This should give you something like the following output, with 1000 total strings.

['gb', 'gb', 'gg', 'gb', 'gb', 'gg', 'bb', 'bg', 'bb', 'gb',
'bg', 'gg', 'gg', 'bg', 'bg', 'gg', 'gg', 'bg', 'gb', 'gg',
'gg', 'gb', 'bb', 'bg', 'gg', 'gb', 'gg', 'gb', 'bb', 'gb',
'bg', 'bb', 'gg', 'gb', 'gb', 'bb', 'bg', 'bg', 'gb', 'bb',
'gb', 'gb', 'gg', 'gg', 'bb', 'bb', 'bb', 'gb', 'gg', 'gb',
'gb', 'bg', 'gg', 'bg', 'bg', 'gb', 'bg', 'gg', 'gg', 'bg',
'bb', 'gb', 'bg', 'gb', 'gg', 'gg', 'bg', 'bg', 'gg', 'bb',
'gb', 'bg', 'gg', 'gb', 'bg', 'bg', 'bg', 'gg', 'bb', 'gb',
'gg', 'bb', 'bb', 'gg', 'gg', 'bb', 'gg', 'gg', 'bg', 'bb',
'bb', 'gg', 'gg', 'gg', 'gg', 'bg', 'gg', 'gg', 'bg', 'gg',
...
]

We have multiples of every type of family in there: “bb”, “bg”, “gg”, “gb”. Now, the mother said that one of the children was a boy. So let’s filter this down to only the families that have a “b” in them.

// now lets get all the families who have at least one boy
let oneBoyFamilies = families.filter(family => family.indexOf("b") > -1);
console.log(oneBoyFamilies);

My var name isn’t the greatest. Consider oneBoyFamilies to mean “at least one boy”. This should give you something like the following output.

['gb', 'gb', 'gb', 'gb', 'bb', 'bg', 'bb', 'gb', 'bg', 'bg',
'bg', 'bg', 'gb', 'gb', 'bb', 'bg', 'gb', 'gb', 'bb', 'gb',
'bg', 'bb', 'gb', 'gb', 'bb', 'bg', 'bg', 'gb', 'bb', 'gb',
'gb', 'bb', 'bb', 'bb', 'gb', 'gb', 'gb', 'bg', 'bg', 'bg',
'gb', 'bg', 'bg', 'bb', 'gb', 'bg', 'gb', 'bg', 'bg', 'bb',
'gb', 'bg', 'gb', 'bg', 'bg', 'bg', 'bb', 'gb', 'bb', 'bb',
'bb', 'bg', 'bb', 'bb', 'bg', 'bg', 'bg', 'bb', 'bb', 'bb',
'gb', 'bb', 'bb', 'bb', 'bg', 'bg', 'bg', 'gb', 'bb', 'bb',
'gb', 'bb', 'bg', 'bg', 'bb',
...
]

I don’t see any “gg”s in there, but just to be sure, we can say:

// validate that there are no families with two girls here
console.log(oneBoyFamilies.indexOf("gg"));

If we’ve done that right, we should get a -1 here. I do, so I’m satisfied.

Now, to find families where the other child is a boy, we need to look for families consisting of “bb”. But for families where the other child is a girl, we need to look for either “gb” or “bg”. Already you can see why it’s 2 to 1 in favor of the other child being a girl. But here’s the code:

let otherChildIsABoy = oneBoyFamilies.filter(family => family === "bb");
let otherChildIsAGirl = oneBoyFamilies.filter(family => family === "bg" || family == "gb");
console.log("boy: " + otherChildIsABoy.length);
console.log("girl: " + otherChildIsAGirl.length);

Alternately, for the girls, you could do something like:

let otherChildIsAGirl = oneBoyFamilies.filter(family => family.indexOf("g") > -1);

Shouldn’t make any difference. With either method, I consistently get in the mid-200s for boys and around 500 for girls. Total would be around 750-ish, which makes sense since we filtered out one of the four arrangements (“gg”). So, 1 in 3 for boys, 2 in 3 for girls. Spot on.

Finally, let’s do the second version. Here, the mom says her first child is a boy. So for that we have to search for families where the first character is a “b”. Then we just get the count of the resulting families where the second character is a “b” and the count where it’s “g”.

// this time, lets get all the families where the FIRST child is a boy
let firstBoyFamilies = families.filter(family => family.charAt(0) === "b");
console.log(firstBoyFamilies);

let secondChildIsABoy = firstBoyFamilies.filter(family => family.charAt(1) === "b");
let secondChildIsAGirl = firstBoyFamilies.filter(family => family.charAt(1) === "g");

console.log("boy: " + secondChildIsABoy.length);
console.log("girl: " + secondChildIsAGirl.length);

The results you see here will vary. Sometimes you’ll get more boys than girls as the second child. Sometimes the opposite. Occasionally they’ll be dead even. We’ll call that 50%.

Why am I doing this?

This is firmly in the realm of recreational mathematics. I just find it really interesting to prove these things out even when my brain doesn’t agree 100% all the time. I enjoy doing it and I’ll probably do some more.

PS:

Here’s all the code in one spot:

let families = [];
// create 1000 families with two randomly gendered children.
for (let i = 0; i < 1000; i++) {
  // no kids yet.
  let family = "";

  // first child
  if (Math.random() < 0.5) {
    family += "b";
  } else {
    family += "g";
  }

  // second child
  if (Math.random() < 0.5) {
    family += "b";
  } else {
    family += "g";
  }

  families.push(family);
}
console.log(families);

// now lets get all the families who have at least one boy
let oneBoyFamilies = families.filter(family => family.indexOf("b") > -1);
console.log(oneBoyFamilies);

// validate that there are no families with two girls here
console.log(oneBoyFamilies.indexOf("gg"));

let otherChildIsABoy = oneBoyFamilies.filter(family => family === "bb");
let otherChildIsAGirl = oneBoyFamilies.filter(family => family === "bg" || family == "gb");
// alternately…
// let otherChildIsAGirl = oneBoyFamilies.filter(family => family.indexOf("g") > -1);

console.log("boy: " + otherChildIsABoy.length);
console.log("girl: " + otherChildIsAGirl.length);

// this time, lets get all the families where the FIRST child is a boy
let firstBoyFamilies = families.filter(family => family.charAt(0) === "b");
console.log(firstBoyFamilies);

let secondChildIsABoy = firstBoyFamilies.filter(family => family.charAt(1) === "b");
let secondChildIsAGirl = firstBoyFamilies.filter(family => family.charAt(1) === "g");

console.log("boy: " + secondChildIsABoy.length);
console.log("girl: " + secondChildIsAGirl.length);