The sample size fallacy is the failure to take sample size into account when estimating the probability of obtaining a particular value in a sample drawn from a known population. It was first reported by Daniel Kahneman and Amos Tversky in a 1972 article in the journal Cognitive Psychology. In their experiment, participants were asked to estimate the probability of a group of people having an average height of more than six feet (1.8 m). The participants produced almost identical estimates for group sizes of 10, 100, and 1000, despite the probability of an unusually high average being much greater in a small sample than in a large one.[1]
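The effect of group size can be made concrete with the sampling distribution of the mean: the standard error of the average falls as σ/√n, so an unusually high average is far more likely in a small group than in a large one. The following sketch computes P(average height > 6 ft) for groups of 10, 100, and 1000, assuming an illustrative population with mean height 70 in (5 ft 10 in) and standard deviation 3 in; these figures are assumptions for illustration, not the values used in the original study.

import math
from scipy.stats import norm

MEAN, SD = 70.0, 3.0   # assumed population mean and standard deviation of height, in inches
THRESHOLD = 72.0       # six feet

for n in (10, 100, 1000):
    se = SD / math.sqrt(n)                       # standard error of the mean shrinks as 1/sqrt(n)
    p = norm.sf(THRESHOLD, loc=MEAN, scale=se)   # P(sample average exceeds six feet)
    print(f"n = {n:4d}: standard error = {se:.3f} in, P(average > 6 ft) = {p:.2e}")

With these assumed figures the probability drops from roughly 2% for a group of 10 to vanishingly small for a group of 1000, which is why giving identical estimates across the three group sizes reveals insensitivity to sample size.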

In another example, subjects were asked the following question:

A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50% of all babies are boys. However, the exact percentage varies from day to day. Sometimes it may be higher than 50%, sometimes lower.

The subjects were then asked which hospital, over the course of one year, they thought would record more days on which over 60 per cent of the babies born were boys. Fifty-six per cent of the subjects believed that the figures for both hospitals would be about the same, but the correct answer is that the smaller hospital is likely to record more such days, because its daily sample is smaller and smaller samples show greater variation in the proportion of boys.[2]
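The claim can be checked with the binomial distribution: if each birth is a boy with probability 0.5, the chance that strictly more than 60 per cent of a day's babies are boys is about 15% when 15 babies are born, but only about 7% when 45 are born. A minimal Python sketch of that calculation, assuming a 365-day year for the comparison:

from scipy.stats import binom

DAYS_PER_YEAR = 365   # assumed length of the reporting period

for name, n in (("smaller hospital", 15), ("larger hospital", 45)):
    cutoff = 3 * n // 5                  # 60% of the day's births: 9 of 15, 27 of 45
    p_day = binom.sf(cutoff, n, 0.5)     # P(strictly more than 60% boys on a given day)
    print(f"{name}: P(>60% boys on a day) = {p_day:.3f}, "
          f"about {DAYS_PER_YEAR * p_day:.0f} such days per year")

Under these assumptions the smaller hospital can expect roughly 55 such days in a year against roughly 25 for the larger one, because the proportion of boys fluctuates more from day to day when fewer babies are born.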

Tversky and Kahneman attributed these results to what they named the representativeness heuristic, according to which people intuitively judge a sample as sharing the properties of the population it is drawn from, without considering whether a sample of that size is likely to be representative.[3] The gambler’s fallacy is a well-known consequence of the same heuristic.[1]

References



Bibliography