Selection bias is a common problem that can occur when analyzing data that is not fully representative of the information intended to be studied. This can lead to inaccurate conclusions and decisions that are not based on the full picture. A classic example of selection bias is the story of the statistician Abraham Wald and the missing US Air Force planes.
the full story
Some people think hospitals are the most dangerous places on earth because more people die inside them than anywhere else. People who say things like that are technically right about the numbers, but they are being fooled by what’s known as selection bias.
A selection bias occurs when we look at information that is not fully representative of the data intended to be studied. Because the sample is biased, we then draw a false conclusion.
abraham wald’s story
One story that explains the bias perhaps better than any other is that of Abraham Wald. Wald was a brilliant mathematician, who, after the Nazis persecuted him and his family as Jews in Austria, fled to the United States in 1938.
During World War II, Wald was invited to become a member of the Statistical Research Group, an elite think tank formed to aid the American war effort against Nazi Germany. One day the US Air Force came to Wald and his colleagues with a problem. Many of their planes were being shot down due to a lack of armor.
The officers presented Wald with data on all the aircraft that made it back from their missions. The planes had lots of holes on the body and wings but fewer on the engines. The officers then asked the mathematicians to compute the optimal protection by concentrating the armor where the planes were getting hit the most.
the missing planes
After studying the problem, Wald suggested something unexpected. The armor, he said, doesn’t go where the bullet holes are. It goes where the bullet holes aren’t.
The officers didn’t understand, because they were looking at a biased sample. Wald wasn’t. He realized that to get representative data to analyze, he needed to include the missing holes, the missing planes, the missing information.
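Wald’s point can be illustrated with a small simulation. This is only a sketch: the three plane sections, the survival odds, and the function names below are made-up assumptions for illustration, not historical data. Every section is hit equally often, but the officers only ever see the planes that return.

```python
import random
from collections import Counter

# Hypothetical odds of returning after one hit; illustrative numbers, not historical data.
SURVIVAL_ODDS = {"engine": 0.4, "body": 0.9, "wings": 0.9}


def simulate_missions(n_planes, seed=0):
    """Return (hits on all planes, hits on returning planes) as Counters."""
    rng = random.Random(seed)
    sections = list(SURVIVAL_ODDS)
    all_hits, returned_hits = Counter(), Counter()
    for _ in range(n_planes):
        section = rng.choice(sections)  # assumption: every section is equally likely to be hit
        all_hits[section] += 1
        if rng.random() < SURVIVAL_ODDS[section]:
            returned_hits[section] += 1  # only these planes end up in the officers' data
    return all_hits, returned_hits


def share(hits, section):
    """Fraction of the recorded hits that landed on the given section."""
    return hits[section] / sum(hits.values())


all_hits, returned_hits = simulate_missions(10_000)
print(f"engine share of all hits:      {share(all_hits, 'engine'):.2f}")
print(f"engine share of returned hits: {share(returned_hits, 'engine'):.2f}")
```

Under these assumptions, engine hits make up a smaller share of the holes counted on returning planes than of the hits actually taken, even though no section was spared in the air.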
The reason planes were coming back with fewer hits to the engine is that planes that got hit in the engine weren’t coming back, he explained. But selection bias isn’t just the result of missing information.
Simpson’s paradox is a phenomenon in which a trend appears in several groups of data but disappears, or even reverses, when the groups are combined. It shows the importance of really understanding the data we select for analysis.
the real example
One famous example came from students applying to the University of California, Berkeley in 1973. The data showed that males applying were more likely to be accepted than females. People thought that the institution was discriminating against women.
When researchers dug deeper into the data, they found that men had applied to less competitive departments with higher rates of admission. Women chose more competitive departments with fewer available spots. After correcting for this detail, the data showed a small but statistically significant bias in favor of women, not men.
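The arithmetic behind such a reversal is easy to check. The sketch below uses invented applicant counts for two hypothetical departments, not the actual Berkeley figures: women are admitted at a higher rate in each department, yet at a lower rate overall, because most of them apply to the harder one.

```python
# Hypothetical (applied, admitted) counts, invented for illustration only.
applications = {
    "men": {"easy_dept": (800, 480), "hard_dept": (200, 40)},
    "women": {"easy_dept": (200, 130), "hard_dept": (800, 200)},
}


def department_rates(by_department):
    """Admission rate inside each department."""
    return {dept: admitted / applied
            for dept, (applied, admitted) in by_department.items()}


def overall_rate(by_department):
    """Admission rate after pooling all departments together."""
    applied = sum(a for a, _ in by_department.values())
    admitted = sum(m for _, m in by_department.values())
    return admitted / applied


for group, by_department in applications.items():
    rates = ", ".join(f"{dept} {rate:.0%}"
                      for dept, rate in department_rates(by_department).items())
    print(f"{group}: {rates}, overall {overall_rate(by_department):.0%}")
```

With these made-up numbers, men are admitted at 60% and 20% in the two departments and women at 65% and 25%, but the pooled rates are 52% for men and 33% for women.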
Planes that were shot in the engines were not analyzed. Women at Berkeley weren’t discriminated against, but instead picked more competitive departments. People who die in hospitals are often already sick when they are admitted.
what do you think?
What are your thoughts? Is selection bias corrupting your decision making? And if it’s fooling you, how about the people behind the “research” you see being published in popular media? Did they really make sure they selected an unbiased, fully representative sample? Share your thoughts in the comments below! And if you still don’t quite understand it, here is a simple challenge to experience it firsthand!
Go out of your house, knock at the doors of your next 10 neighbors, and ask those who open if they are afraid of strangers. After you are done, report your findings in the comments below and explain to us: what can your research tell us about your community? Anything?
- Selection bias – Wikipedia.org
- Abraham Wald – Wikipedia.org
- Hall, M. J., Levant, S., & DeFrances, C. J. (2017, May 24). Trends in inpatient hospital deaths: National Hospital Discharge Survey, 2000–2010. Centers for Disease Control and Prevention. Retrieved August 20, 2022, from https://www.cdc.gov/nchs/products/databriefs/db118.htm
- Read about Simpson’s paradox
- Read this article about the different possible types of selection bias
- Read about why a Hospital is the most dangerous place on earth.
- Read this article on how to avoid selection bias in geriatric research.
In the following activity students will learn about participant recruitment and selection bias.
- Split the class into groups of 4 or 5 students.
- Tell the groups they have to come up with a way of studying the different symptoms of ADHD. What would they do to recruit participants for their studies? While doing this, remind the students that the prevalence of ADHD in the population is between 2% and 4%, so simply asking people in the street is probably not a good idea.
- Show the class the video of Sprouts on Selection Bias.
- Share with the class the fact that in 2007 only 12% of people with ADHD in the US had a diagnosis. Also share the fact that most health studies recruit participants who already have a diagnosis, directly from clinics or hospitals.
- Ask the class if this selection method creates a bias in the study of the symptoms of ADHD.
- Ask the class how they would recruit participants in order to avoid this bias, and whether they would change their initial answer.
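To make the numbers in this activity concrete, the class can work through a quick calculation. The sketch below applies the figures above (2% to 4% prevalence, 12% diagnosed) to a hypothetical town of 10,000 people; the town size and the function name are assumptions chosen only for illustration.

```python
# Figures from the activity: 2% to 4% prevalence, 12% of cases diagnosed (US, 2007).
# TOWN_SIZE is a hypothetical population chosen only to make the numbers concrete.
TOWN_SIZE = 10_000
DIAGNOSED_PCT = 12


def reachable_through_clinics(population, prevalence_pct, diagnosed_pct=DIAGNOSED_PCT):
    """Return (people with ADHD, people with a diagnosis) as whole-number counts."""
    with_adhd = population * prevalence_pct // 100
    diagnosed = with_adhd * diagnosed_pct // 100
    return with_adhd, diagnosed


for prevalence_pct in (2, 4):
    with_adhd, diagnosed = reachable_through_clinics(TOWN_SIZE, prevalence_pct)
    print(f"{prevalence_pct}% prevalence: {with_adhd} with ADHD, "
          f"{diagnosed} reachable through a clinic, {with_adhd - diagnosed} missed")
```

In this hypothetical town, a study that recruits only diagnosed patients could reach at most 24 to 48 of the 200 to 400 people with ADHD, and it would learn nothing about the symptoms of everyone else.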
- Script: Jonas Koblin
- Co-Author: Ludovico Saint Amour Di Chanaz
- Artist: Pascal Gaggelli
- Voice: Matt Abbott
- Coloring: Nalin
- Editing: Peera Lertsukittipongsa
- Production: Selina Bador
- Sound Design: Miguel Ojeda