Thursday, June 14, 2012

Find 23andMe SNPs with GWAS Catalog Annotations

Although there are other tools to help sort through the annotations in the GWAS Catalog, I've found that none of them completely satisfy my needs.  More importantly, SeattleSNP clinical associations don't directly provide the name of the disease they are associated with, and they are not identical to the annotations in the GWAS Catalog.  So, this information is meant to complement the report that can be obtained from SeattleSNP.

Step #1: Download GWAS Catalog Data
  • There should be a link on the main GWAS catalog website to download the full catalog.  As of today, you can click this link to view / download the annotations.
    • For most internet browsers, you can download the data as a tab-delimited file by right-clicking on the link and then left-clicking "save target as...".
    • Please do not copy and paste the table from your browser.  This may not preserve the proper tab-delimited formatting.
  • Please save the GWAS annotations in the same folder as your 23andMe data
    • The file is currently saved as gwascatalog.txt.  If the name of this file changes in the future, please rename the file gwascatalog.txt
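
A quick way to confirm that the saved file kept its tab-delimited layout is sketched below in Python. This is only an illustration and not part of the Perl script from this post. The function name check_tab_delimited is hypothetical, and the column names "SNPs" and "Disease/Trait" are assumptions about the catalog header that may need adjusting if the file format has changed.

```python
import csv
import io


def check_tab_delimited(handle, required=("SNPs", "Disease/Trait")):
    """Return the header fields if the file looks tab-delimited, else raise ValueError.

    The `required` column names are assumptions about the catalog header;
    they are compared case-insensitively.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None or len(header) < 2:
        # A copy-and-pasted table usually loses its tab characters
        raise ValueError("no tab-separated header found")
    lowered = {name.strip().lower() for name in header}
    missing = [name for name in required if name.lower() not in lowered]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return header


# Small in-memory sample standing in for gwascatalog.txt
sample = "SNPs\tDisease/Trait\nrs123\tExample trait\n"
print(check_tab_delimited(io.StringIO(sample)))
```

To check a real download, pass an open file instead of the in-memory sample, for example open("gwascatalog.txt", encoding="utf-8", errors="replace").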

Step #2: Find Overlapping SNPs

  • Download the perl script
  • There is one parameter that you need to enter:
    • genome = raw data file from 23andMe
  • The resulting output file will have _GWAS.txt appended to the name of the genome file
  • PC Users
    • Open a terminal window (type "cmd" in Run, for example)
    • Move to the folder where your 23andMe data is saved.
      • Basic commands:
        • cd = change folder
          • If the data is not on your C:\ drive, you can switch drives by typing "cd /d D:"
        • .. = move up one folder
    • Type in "perl" and enter the required genome parameter.  See example below.

  • Mac Users
    • Open Terminal (in Applications/Utilities, for example)
    • Basic commands:
      • cd = change folder
      • .. = move up one folder
    • Type in "perl" and enter the required genome parameter.  See example below.
  • You can open and manipulate the resulting file in Excel (or OpenOffice Calc)
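
For readers who want to see the general idea behind finding overlapping SNPs, here is a minimal Python sketch. It is not the Perl script from this post, and it may differ from what that script does. It assumes the 23andMe raw data file has "#" comment lines followed by tab-delimited rsid, chromosome, position, and genotype columns, and that the catalog has columns named "SNPs" and "Disease/Trait" (both names are assumptions and can be changed through the parameters). The function names read_genotypes and annotate are hypothetical.

```python
import csv
import io


def read_genotypes(handle):
    """Map rsID -> genotype from a 23andMe raw data file, skipping '#' comment lines."""
    genotypes = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 4:
            rsid, _chrom, _pos, genotype = fields[:4]
            genotypes[rsid] = genotype
    return genotypes


def annotate(genome_handle, catalog_handle, snp_col="SNPs", trait_col="Disease/Trait"):
    """Return (rsID, genotype, trait) tuples for catalog SNPs present in the genome."""
    genotypes = read_genotypes(genome_handle)
    reader = csv.DictReader(catalog_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        # Assumption: one catalog row may list several SNPs separated by commas
        for rsid in (record.get(snp_col) or "").split(","):
            rsid = rsid.strip()
            if rsid in genotypes:
                rows.append((rsid, genotypes[rsid], record.get(trait_col) or ""))
    return rows


# Made-up in-memory samples standing in for the two input files
genome = io.StringIO(
    "# example comment line\n"
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs123\t1\t1000\tAG\n"
    "rs456\t2\t2000\tCC\n"
)
catalog = io.StringIO(
    "SNPs\tDisease/Trait\n"
    "rs123\tExample trait A\n"
    "rs999, rs456\tExample trait B\n"
)
for row in annotate(genome, catalog):
    print("\t".join(row))
```

The genotypes are held in a dictionary keyed by rsID so that each catalog row only needs a lookup, which keeps the comparison fast even though the raw data file contains hundreds of thousands of SNPs.
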
I have tested my Perl scripts on a PC and a Mac, but I cannot guarantee that they will work on every possible platform.  Also, these scripts may need modifications as file formats change, but I have currently confirmed that my scripts work with v2 and v3 arrays using genomes from Genomes Unzipped. If you have any questions or comments, please post them below and I will do my best to help troubleshoot.


  1. None of the scripts are available at "Find 23andMe SNPs with GWAS Catalog Annotations".
    Do they no longer function?

  2. Sorry about that - I've updated the link for the new Google Drive location.

    However, I've also saved those files in GitHub, which should have fewer problems with changing file locations:


My Biomedical Informatics Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.