Friday, 6 March 2015

Humanity rotating in 3 dimensions

  • What is this? 
It's a picture of the first three principal components of the DNA sequences of people from around the world. Each dot is one individual.

  • What are the colours?
They represent ethnicity:

European - blue
African/African American - red
East Asian - yellow
Latino - green
South Asian - brown

  • What do these dimensions mean? 
DNA is like a huge list of letters: ACGTGGCTAT... and so on for about 3 billion letters. In principle, people's DNA could differ from each other in billions of ways. But in fact a lot of this variation has some structure: if two people differ at one location, they probably also differ in a predictable way at many others. You can represent this by a "principal component score", which is a kind of shorthand for many variations that all go together. This picture shows the first 3 principal components. There's nothing special about 3 - there are many more - but only 3 can fit on a picture.
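
For the curious, here is a rough sketch of how principal component scores can be computed. It is not how the picture above was made: the data are simulated, the two groups and their allele frequencies are invented, and the function name principal_component_scores is just something I made up for illustration. Each individual is a row of allele counts (0, 1 or 2) at each variable location.

```python
import numpy as np

def principal_component_scores(genotypes, n_components=3):
    """Return each individual's score on the top principal components.

    genotypes: (individuals x variants) array of allele counts (0, 1 or 2).
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # centre each variant on its mean
    # SVD of the centred matrix; U * S gives the per-individual scores,
    # ordered from the component with the most variance to the least.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

# Toy data (an assumption for illustration): two hypothetical groups whose
# allele frequencies differ a little at every one of 500 variants.
rng = np.random.default_rng(0)
n_variants = 500
freq_a = rng.uniform(0.1, 0.9, n_variants)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_variants), 0.05, 0.95)
group_a = rng.binomial(2, freq_a, size=(50, n_variants))
group_b = rng.binomial(2, freq_b, size=(50, n_variants))

scores = principal_component_scores(np.vstack([group_a, group_b]))
pc1_a, pc1_b = scores[:50, 0], scores[50:, 0]
print("mean PC1 score, group A:", pc1_a.mean())
print("mean PC1 score, group B:", pc1_b.mean())
```

No single variant tells the two groups apart here, but the first component adds up hundreds of small differences that go together, which is the "shorthand" idea described above.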

  • OK, so these are like genetic differences. What are they related to in the real world?
Everything and nothing, ha ha.

Long answer: for example, they are related to whether the individuals use chopsticks. Obviously, this is not because there is a genetic propensity to use chopsticks. These genetic differences come from groups' ancestry. But ancestry also affects other things, such as culture. So, it is more or less impossible to tell what these broad-sweep genetic differences actually cause.

That doesn't mean that none of the underlying differences have real-world effects. We just can't tell what they are from the principal components alone. To do that, you have to look at individual SNPs (single-letter differences), or individual genes, and hold these principal components constant or control for them in some way.
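
Here is a small sketch of what "controlling for" a principal component can look like, again on simulated data. Everything in it is an assumption for illustration: the two groups, the trait (think chopstick use) that depends only on group membership and not on any gene, and the helper snp_effect. It compares a plain regression of the trait on one SNP with a regression that also includes the first principal component.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_group, n_variants = 200, 300

# Two hypothetical ancestry groups with different allele frequencies.
freq_a = rng.uniform(0.2, 0.8, n_variants)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_variants), 0.05, 0.95)
genotypes = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_group, n_variants)),
    rng.binomial(2, freq_b, size=(n_per_group, n_variants)),
]).astype(float)
group = np.repeat([0.0, 1.0], n_per_group)

# The trait depends on group membership (culture) only, not on any variant.
trait = 2.0 * group + rng.normal(0, 1, 2 * n_per_group)

# Pick the variant whose frequency differs most between the groups.
snp_index = int(np.argmax(np.abs(freq_a - freq_b)))
snp = genotypes[:, snp_index]

# First principal component score from the centred genotype matrix.
centred = genotypes - genotypes.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
pc1 = U[:, 0] * S[0]

def snp_effect(covariates):
    """Least-squares coefficient of the SNP, given extra covariate columns."""
    design = np.column_stack([np.ones_like(snp), snp] + covariates)
    coef, *_ = np.linalg.lstsq(design, trait, rcond=None)
    return coef[1]

naive = snp_effect([])        # trait ~ SNP
adjusted = snp_effect([pc1])  # trait ~ SNP + PC1
print("SNP coefficient without PC1:", naive)
print("SNP coefficient with PC1:   ", adjusted)
```

In this toy setup the SNP looks associated with the trait until the principal component is included, at which point most of the apparent effect should go away. Real studies typically include several components, not just one.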

The correlations can be quite amazing. The best known example is this picture of the first two principal components of a European population. It fits rather accurately onto a map of Europe, with most people shown quite close to their country of origin.

  • Why are the East Asians so separate?
I am not sure, and I think no one knows for sure. It may mean that they are "really more genetically separate" from the other groups, but it may just be a technical artifact: DNA sequencing is quite a tricky science.

  • What does this picture tell us?
Genetic diversity between human groups is a matter of overlapping clusters, not of discrete buckets. Also, modern genetics is cool.

  • Who made it?
Monkol Lek. Thanks for letting me use your graph, Monkol!