Wednesday, December 4, 2013

Additional Analysis of AncestryDNA Data

NOTE: Getting Advice About Genetic Testing

A friend asked me what extra information she could get from her raw AncestryDNA data.  I've studied my own 23andMe data extensively, so I thought it would be useful to show which tools can also be applied to raw data from other genetic testing services.

I have not purchased an AncestryDNA kit for myself, so all the analysis that I performed was for somebody else.  Therefore, I will not mention any specific results from this analysis.

First, you may find it useful to convert your raw data to 23andMe format.  You can use this Perl script to convert your file.  The Perl script can be run using instructions similar to those provided here.
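If you would rather not run Perl, the same kind of conversion can be sketched in Python.  This is not the Perl script mentioned above, just a minimal illustration.  It assumes the AncestryDNA file is tab-delimited with '#' comment lines, a header row starting with "rsid", and five columns (rsid, chromosome, position, allele1, allele2), and that the 23andMe layout is four tab-delimited columns with the two alleles joined into one genotype.  The function name `ancestry_to_23andme` is made up for this example.  No-calls and chromosome labels are passed through unchanged.

```python
import csv
import io


def ancestry_to_23andme(src, dst):
    """Copy genotype rows from src to dst in a 23andMe-style 4-column layout.

    src and dst are open text file objects.  Returns the number of rows written.
    Assumed input columns: rsid, chromosome, position, allele1, allele2.
    """
    dst.write("# rsid\tchromosome\tposition\tgenotype\n")
    written = 0
    for row in csv.reader(src, delimiter="\t"):
        # Skip blank lines, '#' comment lines, and the column-header row.
        if not row or row[0].startswith("#") or row[0] == "rsid":
            continue
        # Skip rows that do not have all five expected columns.
        if len(row) < 5:
            continue
        rsid, chrom, pos, allele1, allele2 = row[:5]
        # Join the two allele columns into a single genotype column.
        dst.write(f"{rsid}\t{chrom}\t{pos}\t{allele1}{allele2}\n")
        written += 1
    return written


if __name__ == "__main__":
    # Tiny in-memory demonstration with made-up rows.
    demo = io.StringIO(
        "#AncestryDNA raw data\n"
        "rsid\tchromosome\tposition\tallele1\tallele2\n"
        "rs1\t1\t100\tA\tG\n"
    )
    out = io.StringIO()
    ancestry_to_23andme(demo, out)
    print(out.getvalue(), end="")
```

For a real file, you would open the input and output with `open(...)` and pass those file objects instead of the in-memory strings.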

Here are some potentially useful tools to learn more from your AncestryDNA data:

1) Interpretome - free, but requires 23andMe format file
2) Custom scripts - free, but require a 23andMe format file (and probably some comfort with programming)
3) Promethease - can directly use raw AncestryDNA file, but it costs $5.  I would probably recommend trying to use the free options first.

At first, I wasn't certain how well these strategies would work.  For example, I thought AncestryDNA might focus on non-functional regions that wouldn't affect disease risk.  However, I think there is a decent amount of additional information that can be gained from the raw data.

For example, all the functions that I tested in Interpretome worked properly.  Additionally, I found that 2,581 AncestryDNA SNPs were listed in the GWAS Catalog.  Of those 2,581 SNPs, 1,337 were risk alleles for the individual that I tested.  This is less than I had for my 23andMe data (3,050 with GWAS Catalog annotations, 1,626 risk alleles within GWAS Catalog variants), but I think it is decent for an assay that doesn't provide any health reports to the user.
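For readers comfortable with custom scripts, here is a rough sketch of how counts like these could be produced.  It is not the script I used.  It assumes you have already loaded your genotypes into a dictionary (rsID to a genotype string such as "AG") and built a second dictionary from the GWAS Catalog (rsID to a set of reported risk alleles).  Both dictionaries and the function name `count_gwas_hits` are hypothetical.  A SNP is counted as a risk-allele hit if the genotype carries at least one copy of any reported risk allele, and strand differences are ignored.

```python
def count_gwas_hits(genotypes, risk_alleles):
    """Return (annotated, with_risk) counts.

    genotypes:    dict mapping rsID -> genotype string, e.g. "AG"
    risk_alleles: dict mapping rsID -> set of risk alleles, e.g. {"G"}
    annotated:    SNPs in the genotype data that appear in the catalog
    with_risk:    annotated SNPs carrying at least one risk allele
    """
    annotated = 0
    with_risk = 0
    for rsid, genotype in genotypes.items():
        alleles = risk_alleles.get(rsid)
        if alleles is None:
            # This SNP has no catalog annotation.
            continue
        annotated += 1
        # Count the SNP once if any reported risk allele is present.
        if any(allele in genotype for allele in alleles):
            with_risk += 1
    return annotated, with_risk


if __name__ == "__main__":
    # Made-up example data, not real results.
    example_genotypes = {"rs1": "AG", "rs2": "CC", "rs3": "TT"}
    example_catalog = {"rs1": {"G"}, "rs2": {"T"}}
    print(count_gwas_hits(example_genotypes, example_catalog))
```

A more careful version would also check that the catalog allele and the chip genotype are reported on the same strand.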

For those who are interested, here is a Venn diagram of the overlap between 23andMe and AncestryDNA variants (V3 23andMe chip, AncestryDNA result from a few months ago):

You can see that the 23andMe chip covers ~50% more of the genome, but the AncestryDNA chip still covers a fair number of nucleotides.  My only real problem is that the raw data does not label the X, Y, and MT chromosomes as such.  There are chromosomes listed as "23" and "25," so I initially guessed that "24" is the Y-chromosome.  However, I noticed that randomly selected rsIDs from chromosomes "23" and "25" both came from the X-chromosome.  So, I don't see a simple way to change those into a standard format.
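If you want to check the overlap or the chromosome labels in your own files, something like the following sketch would work.  It assumes you have already read the rsIDs (and chromosome labels) out of each raw data file.  The function names `venn_counts` and `chromosome_label_counts` are made up for this example.

```python
from collections import Counter


def venn_counts(ids_a, ids_b):
    """Return (only_a, shared, only_b) counts for two collections of rsIDs."""
    a, b = set(ids_a), set(ids_b)
    return len(a - b), len(a & b), len(b - a)


def chromosome_label_counts(rows):
    """Count SNPs per chromosome label.

    rows is an iterable of (rsid, chromosome_label) pairs, with the label
    kept as the string found in the raw file (for example "23" or "25").
    """
    return Counter(chrom for _rsid, chrom in rows)


if __name__ == "__main__":
    # Made-up rsIDs, not real chip content.
    print(venn_counts(["rs1", "rs2", "rs3"], ["rs2", "rs3", "rs4"]))
    print(chromosome_label_counts([("rs1", "23"), ("rs2", "25")]))
```

The three numbers from `venn_counts` correspond to the three regions of a two-circle Venn diagram, and the label counts make it easy to spot non-standard chromosome names before trying to remap them.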

Nevertheless, I think the data from chromosomes 1-22 provides a lot of information to review.  I hope this post helps AncestryDNA customers find something interesting!


  1. Autosomal DNA is a random recombination of DNA from both parents. I assume an ancestor's DNA might be totally displaced, leaving no DNA evidence of that ancestor’s existence.

  2. I think people overestimate the ability to identify the precise lineage between distant cousins, but I think the DNA markers can clearly identify the relationship between closely related individuals (say, up to a 2nd cousin), and I think general ancestry results can also be accurate. For example, my 23andMe ancestry results accurately matched what I knew to be true.

    Also, please do not post links unrelated to the comment. I will generally delete these.



Creative Commons License
My Biomedical Informatics Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.