Friday, 6 March 2015

Things I learned at the Boulder workshop on statistical genetics

  • It is quite straightforward to "edit" a mouse's genome during the critical half hour when the (unborn) mouse is unicellular. Theoretically this would be possible for humans too.
From chatting to someone at lunch. Scary!
  • The cool guys at Genes for Good are collecting data from the general public for scientific use, in exchange for a free reading of your genome (typical cost at 23andMe: $100). They'll tell you about your genetic ancestry and give you your data. You will need a Facebook account, and you need to live in the US.
The US makes it hard to give people information about their genomes for some reason (regulation? liability law?) but you can probably get this info from somewhere outside the US.
  • GWAS (genome-wide association studies) have actually been very successful in discovering SNPs* associated with diseases and other traits.
There was a period when things didn't look good, and this was reported in the mainstream media. Basically, it turns out that many traits are influenced by lots of genes, each with a small effect, which means you need a large sample size to detect them. Now people are working together to share data and create these large samples, and many SNPs have been detected. For instance (IIRC) about 30% of variation in height and 10% of variation in IQ can be explained by a specific set of genes.

* Single Nucleotide Polymorphisms, i.e. a place where people's genomes differ by just one "letter"
  • Most behavioural traits are probably affected by many, many genetic variants acting together, each with a tiny effect on the trait.
Partly as a result:
  • SNP arrays (1 million independent variables per individual) are too easy. The cool new thing is sequencing data with billions of independent variables per individual.
The difference is that SNP arrays just store a few informative bits of someone's DNA, like an index to a book. Sequencing data gives you the whole book, and the cost of sequencing is coming down: right now it's about $2000 per person. Yes, there are serious computational difficulties in doing statistics at this level. No, I won't be using sequencing data (not brave or smart enough).
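
To make the "small effects need large samples" point from the GWAS item above concrete, here is a rough back-of-envelope sketch in Python. It is not from the workshop. It assumes a single SNP tested with a simple linear regression, uses a normal approximation to the test statistic, and takes the conventional genome-wide significance threshold of 5e-8 and 80% power as defaults. The function name and parameters are my own inventions for illustration.

```python
import math
from statistics import NormalDist


def required_sample_size(variance_explained, alpha=5e-8, power=0.8):
    """Approximate sample size needed to detect one SNP.

    variance_explained: fraction of trait variance the SNP accounts for.
    alpha: two-sided significance threshold (5e-8 is the usual
           genome-wide convention; an assumption, not from the post).
    power: desired probability of detecting the SNP.

    Uses the normal approximation
        N ~ (z_alpha + z_power)^2 * (1 - r2) / r2
    which is only a rough guide.
    """
    if not 0 < variance_explained < 1:
        raise ValueError("variance_explained must be strictly between 0 and 1")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    n = (z_alpha + z_power) ** 2 * (1 - variance_explained) / variance_explained
    return math.ceil(n)


if __name__ == "__main__":
    # Compare a "big" effect (1% of variance) with tiny ones.
    for r2 in (0.01, 0.001, 0.0001):
        print(f"variance explained {r2:.2%}: N is roughly {required_sample_size(r2)}")
```

Under these assumptions the required sample size grows roughly in proportion to 1 / (variance explained), so a SNP with a hundredth of the effect needs about a hundred times as many people. That is why pooling data across studies made such a difference.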

The workshop is here, and if you want to learn statistical methods for genetics, it is an absolutely awesome event.