Statistics and hair color

TL;DR – Statistics applies to populations, not individuals. There is no such thing as the probability of one individual having some characteristic.

I remember talking to a friend of mine a long time ago. “I wonder what I will do when I grow up.” “What are the chances that I become a doctor?” “What are the chances that I snap my fingers right this second?” “What were the chances that I asked that question?” I asked myself: “What are the chances he understands statistics?”

There are two things I believe most human beings will never be able to understand: how to program a VCR, and statistics. We have finally got rid of VCRs (it seems the last one was built in July 2016), but statistics is still here. In fact, it will be with us for a long time because it is, perhaps unexpectedly, quite useful.

Some argue that statistics is, after all, a fairly recent invention, merely 100 years old, and that this is why the general population does not yet understand it well. I would note, though, that 20 years ago we didn’t have smartphones or social media, yet the general population doesn’t seem to have a problem with them. Well, at least not with using them. So that is just an excuse.

There is, I think, a fundamental misunderstanding about statistics, and it plagues experts as well. They will say things like “if you open a restaurant, you have about a 55% chance of going out of business in the first 3 years” or “if you are poor, you have a lower chance of finishing college”. But do you? Do you actually have a chance of doing or being anything? Let’s explore this question with another intangibly concrete thought experiment.

The color of someone’s hair

We want to study the chances of someone having blonde hair. Now I’ll give you some statistical data. Some of it may be made up, which is not a problem because 35% of all statistics are made up. But I didn’t make it up myself: I simply took it from the internet.

Let’s say 6% of the world population is blonde. Hhhhmmmm… maybe “blonde” is not politically correct anymore in the US, given that blondes are constantly targeted by blonde jokes and the like. But I don’t think the rest of the world has blonde jokes: the Wikipedia page on them exists in only four languages… and one of them is… Finnish? Finland is 58% blonde… I can’t imagine they have a stereotype claiming that 58% of their population is not smart…

Anyway, hair color is not distributed evenly across the globe. Some countries are richer in hair color than others. I don’t know what political process caused this inequality, but in Europe 20% of the people are blonde. In northern Europe it is more like 30%, and in Finland it is, as noted before, 58%. In southern Europe it is more like 8%. In Italy, where I am originally from, it is 7%. But that’s the average. In Veneto, a region in the north of Italy, it is about 12%, while in Sardinia it is less than 2%. Southern Italy averages around 5%, but in Benevento, a city in the south of Italy, 13% of the people are blonde.

So, suppose you have a person in front of you. What are the chances that this person is blonde? Is it 6%, the average of the world population? Maybe she’s Italian, from Benevento? Should we use the Italian percentage, 7%? The southern Italian percentage, 5%? The Benevento percentage, 13%? What if she moves to Finland? Does she suddenly have a 58% chance of being blonde? This is all very confusing.

Maybe it helps to be more concrete. You know where you are from. What is the chance that you have blonde hair? Do you think it’s 7%? 20%? Well, let me give you a hint: it should be either 0% or 100%. Why not just check in the mirror?

Populations and statistics

The point is that statistics applies to a population, not to an individual. The same individual can belong to many populations, so they cannot have any well-defined statistical attribute. You see, the question “what is the chance that a person is blonde?” makes as much sense as “what is the chance that I am going to order turkey?” To make it meaningful, you have to specify how you selected the person. For example, “if I pick a random person in Benevento, what is the chance that they have blonde hair?” That is 13%. Alternatively, you can simply ask “is this person blonde?”: a population of one.
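To make the distinction tangible, here is a small Python sketch. The blonde shares are the figures quoted above, taken as given; the person “Maria” and the names `Person`, `candidate_chances` and `pick_random_blonde_rate` are hypothetical, invented only for illustration. The idea is that a number like 13% belongs to a population together with a selection procedure (“pick someone at random from it”), while the individual just has a plain yes/no attribute.

```python
import random
from dataclasses import dataclass

# Blonde shares quoted in the post, keyed by population (taken as given).
BLONDE_SHARE = {
    "world": 0.06,
    "Europe": 0.20,
    "southern Europe": 0.08,
    "Italy": 0.07,
    "southern Italy": 0.05,
    "Benevento": 0.13,
}


@dataclass(frozen=True)
class Person:
    name: str
    blonde: bool                  # a plain fact about this person, no probability involved
    populations: tuple[str, ...]  # every population this person happens to belong to


def candidate_chances(person: Person) -> dict[str, float]:
    """Blonde share of each population the person belongs to.
    Every value describes a population, not the person."""
    return {p: BLONDE_SHARE[p] for p in person.populations}


def pick_random_blonde_rate(share: float, picks: int, seed: int = 0) -> float:
    """Simulate 'pick a random member of a population whose blonde share is
    `share`' many times and return the fraction of picks that were blonde."""
    rng = random.Random(seed)
    return sum(rng.random() < share for _ in range(picks)) / picks


# Hypothetical person, invented for illustration.
maria = Person(
    name="Maria",
    blonde=False,
    populations=("world", "Europe", "southern Europe", "Italy", "southern Italy", "Benevento"),
)

print(candidate_chances(maria))  # six different numbers, none of them "her" chance
print(maria.blonde)              # the only fact about Maria
print(pick_random_blonde_rate(BLONDE_SHARE["Benevento"], picks=100_000))  # close to 0.13
```

The simulated frequency only settles near 13% because we repeat a selection procedure over the Benevento population; asking for Maria’s own “chance” has no single answer, whereas `maria.blonde` does.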

Specifying the population is fundamental because the statistics change from one population to another. For example, suppose you have to pass a test and I tell you that only 33% of the people who have to take it pass it. You may think: whoa! That must be super hard! The odds are against me. The people designing the test must be very strict. What if I told you that 66% of the people who actually studied passed the test? Would you still think it’s a hard test? What if I told you that 100% of the people who studied and showed up passed the test? Do you still think it’s a hard test? You see, the test is not hard at all. It is just that the people who are supposed to take it are lazy, unreliable bastards. So the question becomes: what are the chances that you take the test seriously? Again, it’s either 100% (you take it seriously) or 0% (you do not).
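The test example is just the same people counted under three different selections. Below is a minimal Python sketch with a made-up cohort of 100 people; the counts and the names `Candidate` and `pass_rate` are hypothetical, chosen only so that the three rates match the ones in the text. Nothing about the test changes between the three lines of output, only which population we divide by.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    studied: bool
    showed_up: bool
    passed: bool


def pass_rate(population: list[Candidate]) -> float:
    """Fraction of this particular population that passed the test."""
    return sum(c.passed for c in population) / len(population)


# Hypothetical cohort of 100 people who are supposed to take the test.
# The counts are invented so that the rates match the ones in the text.
cohort = (
    [Candidate(studied=True, showed_up=True, passed=True)] * 33       # did everything right
    + [Candidate(studied=True, showed_up=False, passed=False)] * 17   # studied, never showed up
    + [Candidate(studied=False, showed_up=True, passed=False)] * 30   # showed up unprepared
    + [Candidate(studied=False, showed_up=False, passed=False)] * 20  # did neither
)

# Three nested populations, all drawn from the same 100 people.
everyone = cohort
studied = [c for c in cohort if c.studied]
studied_and_showed_up = [c for c in studied if c.showed_up]

for label, population in [
    ("everyone", everyone),
    ("studied", studied),
    ("studied and showed up", studied_and_showed_up),
]:
    print(f"{label}: {pass_rate(population):.0%} passed (n={len(population)})")
```

Running it prints pass rates of 33%, 66% and 100% for the three populations: the same 33 people passed every time, and only the denominator moved.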

Part of the confusion is that we also use probability to describe belief, or “credence”. For example, suppose I blindfold you and dye your hair. Now you don’t know what color your hair is. You may think there is a 75% chance that your hair is now blonde, because you think you glimpsed a blonde hair-dye box. The probability here represents what you know and what you would expect in similar situations. But it reflects _your_ lack of knowledge: I know for sure. So, again, you don’t have a well-defined chance of having blonde hair, since it depends on what you know.

This, I think, is the most difficult aspect to understand: probabilities, averages and all other statistical concepts apply to populations, not individuals. You personally don’t have a well-defined chance of being or doing anything. Not even of understanding statistics.

2 Replies to “Statistics and hair color”

  1. Ok but if I
    – put a white cat in a box with the usual stuff (a Geiger counter, a tiny bit of radioactive substance, a small flask of hydrocyanic acid…)
    – blindfold myself
    – then randomly pick a dye out of a 2-piece set that includes a blonde dye and a black dye
    – pour the dye over the cat

    …what are the chances that the cat is blonde AND alive? ;p

    Keep up with the good work, always a pleasure to read your posts!

    1. I think the cat is still white. You are blindfolded and the cat is in the box, so you simply poured the dye on the box.

      Did I get it right? 😛
