NOTE: Getting Advice About Genetic Testing
Since my previous post comparing my 23andMe health report to Promethease was so popular, I thought it would be worthwhile to share what I have found from digging a little deeper into my raw 23andMe data.
This analysis required some coding on my part, but I've provided links to a detailed description of each step and instructions for reproducing it. If you don't want to run my scripts on your own data, you can just read the high-level discussion below.
Step #1: Annotate SNPs using SeattleSNP
Step #2: Match SNPs in GWAS Catalog
Step #3: Combine SeattleSNP and GWAS Catalog annotations. Add PAM score.
Step #4: Filter combined dataset
Step #5: Summarize features in combined dataset
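To give a rough idea of what Step #3 involves, here is a minimal Python sketch of joining two annotation tables on rsID. It is not the script linked above: the field names (rsid, function, trait, odds_ratio), the function name combine_annotations, and the example rows are all hypothetical placeholders. The one design point it illustrates is that a single SNP can have several GWAS Catalog entries, so each SNP keeps a list of entries rather than a single value.

```python
from collections import defaultdict

def combine_annotations(seattle_rows, gwas_rows):
    """Attach every GWAS Catalog entry to the matching SeattleSNP row.

    Both inputs are lists of dicts with a hypothetical 'rsid' key.
    SNPs without a GWAS Catalog match keep an empty list.
    """
    # Group GWAS entries by rsID, since one SNP may have several entries.
    gwas_by_rsid = defaultdict(list)
    for row in gwas_rows:
        gwas_by_rsid[row["rsid"]].append(row)

    combined = []
    for row in seattle_rows:
        merged = dict(row)  # copy so the input rows are not modified
        merged["gwas_entries"] = gwas_by_rsid.get(row["rsid"], [])
        combined.append(merged)
    return combined

# Made-up example rows (not real annotations).
seattle_rows = [
    {"rsid": "rs_example_1", "function": "missense"},
    {"rsid": "rs_example_2", "function": "intron"},
]
gwas_rows = [
    {"rsid": "rs_example_1", "trait": "trait A", "odds_ratio": 2.4},
    {"rsid": "rs_example_1", "trait": "trait B", "odds_ratio": 1.3},
]

combined = combine_annotations(seattle_rows, gwas_rows)
for snp in combined:
    print(snp["rsid"], len(snp["gwas_entries"]))
```

Keeping unmatched SNPs in the output (with an empty list) makes it easy to count how many SNPs have GWAS Catalog annotations at all.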
Description of my 23andMe SNPs
Number of 23andMe SNPs: 950,566 (v3 array)
Unique SNPs in Combined File (SeattleSNP + GWAS Catalog): 926,754 (97.5%)
Number of SNPs with GWAS Catalog Annotations: 3,050
Number of SNPs with Disease-Associated Alleles: 1,626
-Heterozygous Risk Allele: 990
-Homozygous Risk Allele: 636
Number of Coding SNPs: 288,894
Number of Non-synonymous SNPs: 16,993
Number of Non-synonymous SNPs with PAM Score < 0: 915
Number of SNPs Causing Premature Stop Codons: 57
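As an illustration of how the heterozygous and homozygous risk allele counts could be tallied, here is a small Python sketch. It assumes each record pairs a two-letter genotype (as in the 23andMe raw data) with the risk allele reported for that SNP, and it assumes both are given on the same strand, which real data does not guarantee. The function names and the example records are hypothetical.

```python
from collections import Counter

def risk_allele_copies(genotype, risk_allele):
    """Count copies of the risk allele in a two-letter genotype such as 'CT'."""
    return sum(1 for allele in genotype if allele == risk_allele)

def summarize_zygosity(records):
    """Tally SNPs carrying one copy (heterozygous) or two copies (homozygous)."""
    counts = Counter()
    for genotype, risk_allele in records:
        copies = risk_allele_copies(genotype, risk_allele)
        if copies == 1:
            counts["heterozygous"] += 1
        elif copies == 2:
            counts["homozygous"] += 1
        # copies == 0: the risk allele is absent, so nothing is counted
    return counts

# Made-up (genotype, risk allele) pairs; strand agreement is assumed.
example_records = [("CT", "T"), ("TT", "T"), ("CC", "T"), ("AG", "A")]
summary = summarize_zygosity(example_records)
print(summary["heterozygous"], summary["homozygous"])
```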
Integration of SeattleSNP and GWAS Catalog Annotations
If I filter my non-synonymous SNPs for those with odds ratios greater than 2 and a PAM score less than 0, I identify a single SNP (rs1260326) with 3 entries in the GWAS Catalog: associations with triglycerides (OR = 8.8, Teslovich et al. 2010), liver enzyme levels for gamma-glutamyl transferase (OR = 3.2, Chambers et al. 2011), and platelet counts (OR = 2.3, Gieger et al. 2011). This allele is present in approximately 40% of the population, and it changes the coding sequence of glucokinase (hexokinase 4) regulator (GCKR). Reviewing Chambers et al. 2011 was especially interesting because GCKR was one of the five genetic loci selected to also be tested for correlations with metabolomic data (figure 3 of that publication). In fact, GCKR seems to show the strongest correlation with increased LDL and VLDL in that figure.
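The filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used for the analysis: the record fields (is_nonsynonymous, odds_ratio, pam_score), the function name passes_filter, and the example values are hypothetical, and missing values are treated as failing the filter.

```python
def passes_filter(snp, min_odds_ratio=2.0, max_pam_score=0):
    """Keep non-synonymous SNPs with odds ratio > 2 and PAM score < 0."""
    if not snp["is_nonsynonymous"]:
        return False
    # Assumption: a SNP with a missing odds ratio or PAM score is excluded.
    if snp["odds_ratio"] is None or snp["pam_score"] is None:
        return False
    return snp["odds_ratio"] > min_odds_ratio and snp["pam_score"] < max_pam_score

# Made-up example records (not real annotations).
example_snps = [
    {"rsid": "rs_example_1", "is_nonsynonymous": True, "odds_ratio": 3.2, "pam_score": -1},
    {"rsid": "rs_example_2", "is_nonsynonymous": True, "odds_ratio": 1.4, "pam_score": -2},
    {"rsid": "rs_example_3", "is_nonsynonymous": False, "odds_ratio": 2.5, "pam_score": None},
    {"rsid": "rs_example_4", "is_nonsynonymous": True, "odds_ratio": 2.6, "pam_score": 1},
]

kept = [snp["rsid"] for snp in example_snps if passes_filter(snp)]
print(kept)
```

Dropping the is_nonsynonymous and pam_score checks would give the looser "odds ratio only" filter, which also lets through SNPs outside coding regions.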
NCBI Gene also indicates that GCKR is associated with diabetes, which is also described in the text of Chambers et al. 2011. Chambers et al. 2011 classify GCKR as a gene associated with inflammation, as measured by concentrations of C-reactive protein (CRP) in Elliott et al. 2009. As a general note, all of these publications require a subscription, but NCBI Gene is a good free source of information about gene functions.
The nice thing about these associations is that many of them are measured with routine blood tests. Although I have always received normal blood test results, I can easily keep an eye out for changes in the future. More specifically, Chambers et al. 2011 show an association between my GCKR SNP and gamma-glutamyl transferase (GGT) levels, which are "sensitive to most kinds of liver insult, particularly alcohol" (citing Pratt et al. 2000). So, perhaps this can encourage me to continue to drink only in moderation.
Comparison with Previous Analysis
In my previous blog post, I highlighted 3 disease associations: venous thromboembolism, rheumatoid arthritis, and type I diabetes. Of course, none of these associations are identified if I filter both by GWAS Catalog odds ratios and PAM scores, but I do find 3 SNPs associated with rheumatoid arthritis if I only filter for GWAS Catalog associations with an odds ratio greater than 2.
The reason I didn't see these SNPs with my first filter is that none of them cause non-synonymous mutations. Like most of the SNPs, they are not located in coding regions. Unfortunately, it is harder to characterize the likely function of these types of mutations, but this is certainly an exciting area of ongoing research.
If I look at the GWAS catalog annotations for my SNPs, I can confirm that the GWAS catalog does contain SNPs associated with venous thromboembolism and type I diabetes (in fact, there are a lot of SNPs associated with type I diabetes), and I can confirm that I am a carrier for some of these risk alleles. However, none of these SNPs showed associations with odds ratios greater than 2.
Although I am emphasizing the overlap between different methods of analyzing my 23andMe data, I think it would be too conservative to say that only candidates that are independently identified are worth examining. For example, it is very hard to determine the best way to predict the interaction of different variants. In fact, I found it especially exciting to read about the GCKR SNP, which didn't jump out at me in any of the other analyses, and I think gaining exposure to genomics research is a very important benefit of direct-to-consumer genetic testing.