Thursday, June 14, 2012

Combine SeattleSNP and GWAS Catalog Annotations for 23andMe SNPs

There are two main functions for this script:

1) Combine the results from the SeattleSNP and GWAS Catalog annotations, and

2) Add a score to predict the severity of non-synonymous SNPs.  In this case, I am adding a PAM score (created from this matrix).  These scores are correlated with the frequency of various amino acid substitutions over time.  In fact, there are different PAM matrices that can be used.  There are some slightly more rigorous tools to accomplish this (such as PolyPhen or SIFT), and SeattleSNP can provide PolyPhen predictions for certain SNPs.  However, I wanted to use the PAM score as something that can be quickly added to all the non-synonymous mutations.
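
To make the PAM lookup concrete, here is a minimal Python sketch (not the Perl script from this post). It assumes the matrix is stored as whitespace-delimited text with amino acid letters in the header row and at the start of each row; the actual layout of my PAM file may differ. The matrix values below are placeholders for illustration, not real PAM scores, and the function names are hypothetical.

```python
# Toy substitution matrix; the numbers are placeholders, NOT real PAM scores.
TOY_MATRIX = """\
   A  R  W
A  1 -1 -3
R -1  4  0
W -3  0  9
"""


def parse_matrix(text):
    """Turn a whitespace-delimited matrix into a {(ref, alt): score} dict."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    columns = rows[0]  # header row: one amino acid letter per column
    scores = {}
    for row in rows[1:]:
        ref_aa, values = row[0], row[1:]
        for alt_aa, value in zip(columns, values):
            scores[(ref_aa, alt_aa)] = int(value)
    return scores


def substitution_score(scores, ref_aa, alt_aa):
    """Return the score for ref_aa -> alt_aa, or None if the pair is absent."""
    return scores.get((ref_aa.upper(), alt_aa.upper()))


scores = parse_matrix(TOY_MATRIX)
print(substitution_score(scores, "A", "W"))
```

With a real PAM matrix, lower (more negative) scores mark substitutions that are observed less often, which is why the score can serve as a quick, rough severity estimate.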

Step #1: Prepare Input Files
  • SeattleSNP annotations (click here for details)
  • GWAS Catalog annotations (click here for details)
  • My PAM matrix can be downloaded here.
Step #2: Combine Files

  • Download the perl script
  • There are three parameters that you need to enter:
    • seattleSNP = 23andMe SNPs with SeattleSNP annotations (click here for details)
    • GWAS = 23andMe SNPs with GWAS Catalog annotations.  Please note that this is not the original GWAS annotation file but the file that was created at this step. (click here for details)
    • PAM = substitution matrix indicating the severity of the non-synonymous mutation (such as the file provided here)
  • The output file will have _combined.txt appended to the end of the seattleSNP file name.
  • PC Users
    • Open a terminal window (type "cmd" in Run, for example)
    • Move to the folder where your 23andMe data is saved.
      • Basic commands:
        • cd = change folder
          • If the data is not on your C:\ drive, you can type "cd /d D:"
        • cd .. = move up one folder
    • Type in "perl", followed by the script name and the required SeattleSNP, GWAS, and PAM parameters. See example below.

  • Mac Users
    • Open Terminal (in Applications/Utilities, for example)
    • Basic commands:
      • cd = change folder
      • cd .. = move up one folder
    • Type in "perl", followed by the script name and the required SeattleSNP, GWAS, and PAM parameters. See example below.
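
For readers who want to see the idea behind the combining step, here is a rough Python sketch (again, not the Perl script itself). It assumes both annotation files are tab-delimited and share an rsID column; the column names ("rsid", "function", "trait"), the rsIDs, and the values are all made up for illustration. The output name simply appends _combined.txt to the SeattleSNP file name as described above; whether the real script removes the file extension first is not something this sketch tries to reproduce.

```python
import csv
import io

# Hypothetical tab-delimited inputs; rsIDs, column names, and values are made up.
SEATTLE_TEXT = "rsid\tfunction\nrs0000001\tmissense\nrs0000002\tintron\n"
GWAS_TEXT = "rsid\ttrait\nrs0000001\texample trait\n"


def read_table(text, key="rsid"):
    """Index the rows of a tab-delimited table by the key column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row[key]: row for row in reader}


def combine(seattle, gwas, gwas_field="trait", missing="NA"):
    """Keep every SeattleSNP row; attach the GWAS field when the rsID matches."""
    combined = []
    for rsid, row in seattle.items():
        merged = dict(row)
        hit = gwas.get(rsid)
        merged[gwas_field] = hit[gwas_field] if hit else missing
        combined.append(merged)
    return combined


def combined_name(seattle_path):
    """Append _combined.txt to the SeattleSNP file name (literal append)."""
    return seattle_path + "_combined.txt"


rows = combine(read_table(SEATTLE_TEXT), read_table(GWAS_TEXT))
for row in rows:
    print(row["rsid"], row["function"], row["trait"], sep="\t")
```

SNPs without a GWAS Catalog entry are kept and marked "NA" here; that placeholder is a choice made for this sketch, so the real output may mark missing annotations differently.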

I have tested my Perl scripts on a PC and a Mac, but I cannot guarantee that they will work on every possible platform.  Also, these scripts may need modifications as file formats change, but I have confirmed that they currently work with v2 and v3 arrays using genomes from Genomes Unzipped. If you have any questions or comments, please post them below and I will do my best to help troubleshoot.

Creative Commons License
My Biomedical Informatics Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.