*The third in a series of seven fables/lessons/meditations on probability.*

The teacher led the student to a blind vendor who sold two types of incense, identical in appearance.

“This one he calls *Forest*,” the teacher said, holding up a speckled brown bundle that smelled of sandalwood and pine.

“And this one he calls *Tea Garden*,” the teacher said, holding up another.

The student sniffed the *Tea Garden*. “It has no odor,” she said.

“Ah, a shame,” the teacher said. “Only some people can smell *Tea Garden*.”

The student shrugged. “Well, let’s not buy any of that one.”

“He sells them in bags of two,” the teacher continued. “But he does not pay attention to which incense goes in which bag.”

The blind vendor had a massive crate of incense, with the two types mixed together. His hands dove in and out, grabbing another stick with each motion, and throwing it into a plastic bag. When a bag contained two sticks, he set it aside for sale.

“I’ll just smell the bags before we buy them,” the student said. “And we’ll only buy ones where I detect *Forest*.”

“Ah,” the teacher said. “But won’t we end up paying for lots of *Tea Garden*, too?”

“The bags we’ll pick already have one *Forest*,” the student said. “So there’s a 50% chance they’ll contain another *Forest*, and a 50% chance they’ll contain a *Tea Garden*. That seems worth the risk.”

“Then let’s buy 120 bags,” the teacher said. “Start choosing.”

When they got home, the teacher put her to work immediately. “Smell each stick of incense, one by one,” she said. “Then make two piles of bags. The first is for those with two *Forest*. The second is for those with only one *Forest*.” The teacher smiled. “And we shall see which pile is larger.”

“They’ll be the same size,” the student said. “Like I said—there’s a 1 in 2 chance the second stick is *Forest*, and a 1 in 2 chance that it’s *Tea Garden*.”

“But how do you know which is the second stick?” The teacher giggled and walked off.

By the time she finished sorting, the student had grown convinced something was wrong. The first pile—with two-*Forest* bags—held only 41 bags. The other pile—the one-*Forest*, one-*Tea Garden* bags—stood almost twice as high, with 79 bags.

“The vendor cheated us,” the student seethed. “He deliberately gave me extra *Tea Garden*. Only half of our bags should have *Tea Garden*. But instead, 2 out of 3 do.”

They returned the next day to the vendor’s stall. “Don’t just watch him,” the teacher said. “Help fill the bags yourself. You must see the incense as the vendor does, not as the customer does.”

So the student began to fill bags, smelling each stick of incense as she grabbed it from the crate, and noting the order.

After half an hour, she exclaimed suddenly, “I get it! There are four types of bags.”

“Four?”

“Yes. There’s *Forest* plus *Forest*. And *Tea Garden* plus *Tea Garden*. And then, there are two more possibilities. There’s *Tea Garden* plus *Forest*, and there’s *Forest* plus *Tea Garden*.”

“Those last two,” the teacher said, “aren’t they the same?”

“The bags look the same when you’re done,” the student said, “but they’re not *created* the same. The process for each one is different.

“When we came yesterday,” the student continued, “we eliminated one of the four possible bags—the ones with only *Tea Garden*. That left three other types of bags. So of the ones we brought home, the double-*Forest* should only be 1 in 3. I thought I’d only picked bags where the first stick was *Forest*, but that was wrong. In some of the bags I picked, *Forest* was the second stick.”

When they arrived home, the student lit a stick of incense and sat with her eyes closed. “This *Forest* smells different than it did yesterday,” she said.

The teacher smiled. “I’m not surprised,” she said. “That’s *Tea Garden*.”

**Further Thoughts**

This is a classic—and very tricky—problem. Usually, it’s presented the following way: “If a two-child family has at least one daughter, what is the probability that *both* their children are daughters?” Lots of people make the same mistake as the student in the story. The problem is an interesting introduction to the idea of “sample space,” because it requires careful thinking but almost no computation.

*Sample space*, by the way, is nothing but a fancy term for “list of possibilities.” The trick is that you’ve got to list possibilities that are equally likely. The student’s original sample space (*F* + *F*, *TG* + *TG*, and “one of each”) was flawed, because the last item was twice as likely as either of the other two.
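For readers who like to check this by listing cases, here is a minimal sketch in Python. It assumes each stick is equally likely to be *Forest* or *Tea Garden* and that the two draws are independent (a very large, evenly mixed crate); the names `bags` and `kept` are just illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumption: each stick is equally likely to be Forest ("F") or Tea Garden ("T"),
# and the two draws are independent (a very large, evenly mixed crate).
bags = list(product("FT", repeat=2))  # ordered outcomes: FF, FT, TF, TT

# The student keeps only bags in which she can smell Forest.
kept = [bag for bag in bags if "F" in bag]

# Among the kept bags, count the ones holding two Forest sticks.
double_forest = [bag for bag in kept if bag == ("F", "F")]
p_double_forest = Fraction(len(double_forest), len(kept))

print(kept)             # [('F', 'F'), ('F', 'T'), ('T', 'F')]
print(p_double_forest)  # 1/3
```

With 120 bags, this predicts about 40 double-*Forest* bags and 80 mixed ones, close to the 41 and 79 in the story.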

Technically, the incense scenario is slightly different from the daughter scenario. If you have a daughter, the probability your next child is also a daughter is precisely 50%. But if our crate of incense starts out with a 50-50 mix, then once we pick out a stick of *Forest*, there’s less *Forest* than *Tea Garden* remaining. So our probability is just below 50%. (Luckily, if there are thousands of sticks of incense, as in the story, then this change is barely noticeable.)
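To see how small that change is, here is a rough sketch. It assumes the crate starts with exactly the same number of sticks of each type (`n_per_type`, a made-up parameter) and that sticks are drawn without replacement.

```python
from fractions import Fraction

def p_second_forest(n_per_type: int) -> Fraction:
    """Chance the second stick is Forest, given the first was Forest,
    for a crate that starts with n_per_type sticks of each kind."""
    # After removing one Forest: n_per_type - 1 Forest remain
    # out of 2 * n_per_type - 1 sticks in total.
    return Fraction(n_per_type - 1, 2 * n_per_type - 1)

for n in (2, 10, 1000):
    p = p_second_forest(n)
    print(n, p, float(p))
```

With only 2 sticks of each type the chance drops to 1/3, but with 1000 of each it is 999/1999, which is within a twentieth of a percentage point of 50%.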

*Get the pdf, or check out other stories in the series!*

*I’d like to thank my father, James Orlin, for providing some foundational ideas for these stories, as well as helpful feedback and conversations. Also for being one happenin’ dude.*

This is an awesome story!! How do you present this to students (or do you)?

Thanks! I’m taking a year off from teaching, so I haven’t had the chance to do lessons around any of this content. I’d probably either:

(A) Read it in a few chunks during class, letting it structure the lesson, but not reading the second part (finding out that the bags have a 2:1 ratio) or the final part (explaining why this is) until students have reached those conclusions on their own–essentially, an adulterated version of a Dan Meyer 3-act lesson; or

(B) Assign it as supplemental, follow-up reading after a day of doing related problems.

There are also some good (and tricky!) follow-up problems in the pdf (problems which my dad came up with). I’d probably use those either as an in-class challenge or as homework.

Actually the daughter scenario is more complicated. First, the probability of a daughter is slightly less than 50%. Second, the sexes of offspring are not quite independent—there can be biological reasons why one sex is more likely to come to term than the other (such as mutations on the X or Y chromosome from the father), which leads to a small positive correlation between sexes of siblings.

Ah, interesting!

Why is P(son) > P(daughter) in general? Is it because meiosis isn’t totally 50/50? Or because the XX genotype is somehow less likely to survive pregnancy?

See

http://www.livescience.com/33491-male-female-sex-ratio.html

http://en.wikipedia.org/wiki/Sex_ratio

and (best)

http://en.wikipedia.org/wiki/Human_sex_ratio

I think the claim that “the probability of a daughter is slightly less than 50%” comes from empirical statistics rather than classical probability. In theory, the probability of having either a son or a daughter is 50%.

What “theory”? The theory of sex determination for offspring is amazingly complex, and no one has come up with a good model that effectively predicts male/female ratios under a variety of different nutrition, stress, and age conditions. It is much more complicated than flipping a coin, and simplifying it to 50/50 is only very roughly reflective of reality.

If you are going to do statistics without data, then don’t pretend you are talking about real-world phenomena like the sex of offspring.

Speaking to your comments about the complications of the question: most studies are done on the distribution of large populations. The problem described is this: one family has two children, and you know one is a daughter; what is the probability for the other child? Are there studies that attempt to find the probability of same-sex or opposite-sex children among the children of the same two parents? From nothing but my own observation, I would be extremely suspicious that it is NOT a 50/50 proposition. It would be interesting to find out.

http://www.genetics.org/content/15/5/445.full.pdf has an article from 1929 that addresses the question of correlation of gender. The authors concluded that there was a small positive correlation of the sex of siblings, but I’m not sure their results would hold up with a larger data set (they were using an 1889 dataset; they did not collect new data in 1929).

Awesome. Way further from 1:1 than I’d have guessed.

<3.

This slightly different presentation of the problem makes a big difference. Turns out, in this case, the probability is actually 1/2. It may explain why people have trouble stomaching the 2/3, which is the answer to the question you gave, Ben.

For more on the difference between these two presentations of the problem, see James Tanton’s video here and the comment thread on Dave Richeson’s blog here.

Love the blog, by the way.

So in practical use, doesn’t it matter primarily what problem you are trying to solve, and making sure you frame the problem correctly? In the sticks of incense problem, if I were to ask you, “What is the probability that the next stick I pick will be Forest?”, and you first asked me, “What were the last two sticks you picked?”, I would know that you were likely to answer the question incorrectly (unless you were trying to throw me off 🙂), because, assuming the box is still 50/50, the odds are 50%. But suppose I told you that ultimately I would only pay for bags with 2 Forests, and asked how much you are charging me, given that your source is blind. Then framing the question to get the 2/3 of bags you would not pay for and the 1/3 you would is exactly the solution I need to price the bags to you properly. In fact, if I still thought it was 50/50, I could literally go broke thinking I was doing really well.

PS: my apologies, I should have edited the above a few more times before hitting submit.

I think you’re right that framing is a major (even THE major) issue here. Probability is all about information, and since these minor adjustments in phrasing affect what information is being assumed, as well as what information is being asked for, they must also affect the result.

Thanks – I like that as a point of comparison. The idea that “What’s the probability that the other is also a boy?” and “What’s the probability that both are boys?” could have different answers captures some of what’s maddening about probability.

The answer to both of those questions is still 1/2. The important issue is the *volunteering of the information*.

A man says he has two children, and says one is a boy. What is the probability that *both* are boys? Answer: 1/2

From the set of all possible 2-child families, a family is selected at random. If one child is a boy, what is the probability *both* are boys? Answer: 1/3

See the difference in phrasing? The man who is volunteering information will definitely say “one is a boy” if both children are boys. But he will only volunteer this information half of the time if he has a girl and a boy. The other half of the time he will say “one is a girl.”

Imagine 8 families of 2 kids each. We expect 2 families to have 2 boys, 2 families to have 2 girls, and 4 families to have one of each. All the parents of 2-boy families will volunteer the information “we have 1 boy.” And half of the 1-boy, 1-girl families will volunteer the information “we have 1 boy.” That’s 4 total families out of 8, and 2 of those 4 have two boys, which gives 1/2.

Like I said, the volunteering of information is the key. The same goes for the girl answering the door, which someone else commented on.
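The two phrasings can be compared by brute-force enumeration. This is only a sketch under stated assumptions: boys and girls are equally likely and independent, and a volunteering parent names the sex of one child chosen at random (the helper `p_says_boy` is a made-up name for that rule).

```python
from fractions import Fraction
from itertools import product

families = list(product("BG", repeat=2))  # BB, BG, GB, GG, equally likely

def p_says_boy(family):
    """Assumed rule: the parent names the sex of one child chosen at random."""
    return Fraction(family.count("B"), 2)

# Scenario 1: a parent volunteers "one is a boy".
weight_says_boy = sum(Fraction(1, 4) * p_says_boy(f) for f in families)
p_both_given_volunteered = (Fraction(1, 4) * p_says_boy(("B", "B"))) / weight_says_boy

# Scenario 2: we pick a random family and learn it has at least one boy.
with_boy = [f for f in families if "B" in f]
p_both_given_at_least_one = Fraction(with_boy.count(("B", "B")), len(with_boy))

print(p_both_given_volunteered)   # 1/2
print(p_both_given_at_least_one)  # 1/3
```

If the parent instead always announces a boy whenever there is one, `p_says_boy` becomes 1 for every family with a boy, and the first answer falls to 1/3 as well.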

The chance that both are boys is roughly 100%, because only somebody being deliberately obtuse would ask such a question.

Meanwhile, we have qualifiers in language for good reason, and their existence helps to define statements that do not have said qualifiers.

“I have one boy.” Means what it says.

“I have at least one boy.” Means..also..what it says.

Ergo..if one is to say “I have one.” it follows that one !does not! have more than one.

So..anyway.

I don’t think you can say that the answer is 1/2 without explicitly stating that the man volunteers information about whether one of his children is a boy or girl at random.

If for example the man always volunteered the information that he had a boy, whenever he had either a boy and a girl, or a boy and a boy, then the probability that the other child is a boy, given that he has volunteered the information that one child is a boy, would be 1/3.

Yes, I agree, Josh. In my analysis I assumed what you said (random volunteering of information), but it’s always better to be clear. It could very well be true that he has a bias toward announcing that he has a son, and that changes the problem.

Ben

The way I was introduced to the daughter problem was “A family has two children. You knock on the door and a young girl answers. What is the probability that she has a sister?” Then the whole birth order issue (like the incense order issue) becomes an interesting part of the conversation. Floyd Bullard, from NCSSM, talked me through this one carefully and I still have his helpful emails stored somewhere but they are not handy right now. My memory is that it is not a 2/3 issue but a 1/2 issue. Once you know a girl has answered then the BB possibility of birth order is off the table. Now, the possibilities are BG, GB, G1G2, and G2G1. You don’t know if the girl who answered is the younger or older girl (if there are two girls at all) so what you are looking at is 4 possibilities (each roughly equal – I think you HAVE to work with 50% birth for each gender) where 2 of the cases involve another girl and two of the cases involve a brother. I may be misremembering here, I would welcome smart corrections.

That sounds like a good analysis. Another approach–simpler, but maybe more prone to error–would be to say, “We know this child is a daughter. But we have NO information about the other child whatsoever. So the best we can do is 50%.”

This is very similar to the Monty Hall problem. By throwing out Tea-Tea, you’ve actually gained information, albeit indirectly. (In the MH problem, he gives you information indirectly by showing you a goat.) In your problem, the student and teacher end up with a total of 1/3 Tea and 2/3 Forest sticks, which is better than if they had just grabbed bags without smelling, but not as good as the 1/4 Tea and 3/4 Forest sticks they were expecting.

A good comparison. As in Monty Hall, the elimination of one possibility gives us some new information, and new information means new probabilities.

Great post, but you should add a spoiler tag for the solution imo.

Does the story change if the incense vendor grabs 2 sticks at once and packages them together?

Good question! That shouldn’t change anything. We could just arbitrarily label one of those sticks as “first” and one as “second,” and then the same analysis applies.

I think the problem was failing to smell the Forest for the Teas.