## September 18, 2005

### Overlapping Bell Curves can obscure important differences

Richard Lewontin's argument that there is more genetic variation within racial groups than between them is constantly cited and constantly misinterpreted. You often hear people say that this means that a black man is more genetically similar to a white man than two white men are to each other, which is completely brain-dead. That's not at all what Lewontin meant.

What Lewontin pointed out is that the difference between the averages for two racial groups is often less than the difference between two individuals. As so often with race, it helps in understanding to recast the statement as being about family. Consider two nuclear families, the Shorts and the Longs, each with three adult sons. The Short sons are 5'-4", 5'-8", and 6'-0", while the Long boys are 5'-7", 5'-11", and 6'-3". On average, the Longs are three inches taller than the Shorts, while the standard deviation within each family is 4 inches. So, one of the Shorts is taller than two of the Longs, but overall the Longs are notably taller than the Shorts on average. Many of the racial differences we see in the world are roughly similar.
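For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The heights are the ones given above, converted to inches; the variable names are mine, and "standard deviation" is taken to mean the sample standard deviation (n - 1 denominator), which is the reading that gives 4 inches.

```python
from statistics import mean, stdev

# Heights in inches, taken from the paragraph above.
shorts = [64, 68, 72]   # 5'-4", 5'-8", 6'-0"
longs = [67, 71, 75]    # 5'-7", 5'-11", 6'-3"

# Gap between the two family averages.
gap_between_means = mean(longs) - mean(shorts)

# Spread within each family (sample standard deviation, n - 1 denominator).
sd_shorts = stdev(shorts)
sd_longs = stdev(longs)

# How many Long sons is the tallest Short son taller than?
longs_beaten = sum(1 for h in longs if max(shorts) > h)

print("gap between averages:", gap_between_means)
print("SD within Shorts:", sd_shorts, "SD within Longs:", sd_longs)
print("Longs shorter than the tallest Short:", longs_beaten)
```

The gap between the family averages comes out smaller than the spread inside either family, yet the ordering of the averages is unambiguous.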

A reader sends me an example from statistical mechanics:

This is a Maxwellian distribution, which shows the number of air particles per unit volume as a function of their velocity, for a couple of (normalized) one-dimensional Maxwellians for dry air at about 81 degrees F. The variation within either distribution is obviously much larger than the variation between them. However, the red curve represents perfectly still air, while the blue curve represents a hurricane wind blowing at 120 mph. Is there no meaningful distinction to be made between a calm day and a raging hurricane?
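The reader's numbers can be roughly reproduced with a short sketch. The assumptions here are mine, not the reader's: dry air is treated as a single species with a mean molecular mass of about 28.97 atomic mass units, and each curve is a one-dimensional Gaussian in velocity whose standard deviation is sqrt(kT/m), centered on the wind speed.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053906660e-27   # atomic mass unit, kg
AIR_MASS = 28.97 * AMU    # assumed mean molecular mass of dry air, kg

temp_k = (81.0 - 32.0) * 5.0 / 9.0 + 273.15   # 81 degrees F in kelvin
wind_ms = 120.0 * 0.44704                      # 120 mph in m/s

# Thermal spread: standard deviation of one velocity component.
sigma = math.sqrt(K_B * temp_k / AIR_MASS)

def maxwellian_1d(v, drift):
    """Normalized 1-D Maxwellian density at velocity v for bulk speed drift."""
    z = (v - drift) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# How far the hurricane shifts the curve, measured in thermal spreads.
shift_in_sigmas = wind_ms / sigma

# Shared area of two equal-width Gaussians whose centers are d sigmas apart:
# 2 * Phi(-d / 2), which equals erfc(d / (2 * sqrt(2))).
overlap = math.erfc(shift_in_sigmas / (2.0 * math.sqrt(2.0)))

# Density of molecules at zero velocity on a calm day versus in the hurricane.
calm_at_zero = maxwellian_1d(0.0, 0.0)
storm_at_zero = maxwellian_1d(0.0, wind_ms)

print("thermal spread (m/s):", round(sigma, 1))
print("wind speed (m/s):", round(wind_ms, 1))
print("shift in sigmas:", round(shift_in_sigmas, 3))
print("overlap of the two curves:", round(overlap, 3))
```

Under these assumptions the thermal spread is several times the hurricane's wind speed, so the two curves share most of their area, which is the reader's point.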

What are you, some kind of windist?

With respect to Lewontin's comments, they're not even true by the lights of your most charitable interpretation. He wasn't talking about phenotypes per se, but about genetic variance, so to rephrase:

"What Lewontin pointed out is that the [genetic distance] between the averages for two racial groups is often less than the [genetic distance] between two individuals [in the same racial group]."

That is in fact *never* the case by any reasonable measure of genetic distance. He pulled a little shell game by focusing on the fluctuations possible in a single locus, fluctuations which will *never* do anything but average out over the whole genome. This is a little bit of a tricky topic, but it's absolutely crucial, so bear with me...

Suppose that there were 100 genes for governing a trait (like skin color or height), of equal effect. Suppose also that there were two possible alleles you could have at each site and those alleles were inherited independently of each other (i.e. you could have any combination with equal probability) and that there was a "low" allele and a "high" allele (say +0 and +1). Finally, suppose that at each locus, population 1 had the low allele 30% of the time while population 2 had the low allele 50% of the time.

At any given locus, yeah, you might find that a guy from population 1 and a guy from population 2 had the same allele.

But when you look at GENOME CONTENT -- meaning all 100 loci -- it becomes vanishingly unlikely that a guy from population 2 has the same alleles as a guy from population 1 at all the loci.
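The toy model above is easy to work through exactly. The sketch below uses the stated numbers (100 loci, low allele 30% of the time in population 1 and 50% in population 2, independent loci). Two things are my assumptions for illustration: each person carries one allele per locus, and the "classifier" is just a likelihood comparison on the count of high alleles with equal prior odds for the two populations.

```python
import math

N_LOCI = 100
P_HIGH_1 = 0.7   # population 1: low allele 30% of the time
P_HIGH_2 = 0.5   # population 2: low allele 50% of the time

# Chance that one person from each population carries the same allele
# at a single locus (both high, or both low).
match_one = P_HIGH_1 * P_HIGH_2 + (1 - P_HIGH_1) * (1 - P_HIGH_2)

# Chance that they match at every one of the independent loci.
match_all = match_one ** N_LOCI

def binom_pmf(k, n, p):
    """Probability of exactly k high alleles out of n independent loci."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Best achievable accuracy when guessing someone's population from a
# single locus, assuming equal prior odds.
accuracy_one_locus = 0.5 * (max(P_HIGH_1, P_HIGH_2)
                            + max(1 - P_HIGH_1, 1 - P_HIGH_2))

# Same guess using the count of high alleles over all loci: for each
# possible count, pick whichever population makes it more likely.
accuracy_all_loci = 0.5 * sum(
    max(binom_pmf(k, N_LOCI, P_HIGH_1), binom_pmf(k, N_LOCI, P_HIGH_2))
    for k in range(N_LOCI + 1)
)

print("match at one locus:", match_one)
print("match at all loci:", match_all)
print("accuracy from one locus:", accuracy_one_locus)
print("accuracy from all loci:", round(accuracy_all_loci, 3))
```

A single locus barely beats a coin flip, while the count over all 100 loci separates the two populations almost every time, even though no individual locus is diagnostic.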

This is a bit subtle statistically, but it's the same thing that Cochran and Steve Hsu were talking about at DeLong's blog (see the "Zulu" comments).

There's much more here at GNXP along with a figure that'll make it clear:

Pattern Classification in Population Genetics: Why Lewontin Was Wrong

And here are four cites to different authors debunking Lewontin:

Edwards, from Cambridge