Charles Warden's Science Blog: Filter Combined Annotations for 23andMe SNPs

Thursday, June 14, 2012

Filter Combined Annotations for 23andMe SNPs

Step #1: Prepare Input File

Create a list of 23andMe SNPs with both SeattleSNP and GWAS Catalog annotations (details are in a separate post).

Step #2: Filter List of SNPs

Download the Perl script 23andMe_filter.pl.
There is one required parameter:

input = 23andMe SNP file with SeattleSNP and GWAS Catalog annotations (the file prepared in Step #1)

There are 5 optional parameters:

output = output file for the filtered SNP list. By default, "_filter.txt" is appended to the input file name
OR = odds ratio cutoff (keep SNPs with odds ratios greater than the cutoff) [default = 2]
PAM = PAM score cutoff (keep SNPs with PAM scores less than the cutoff) [default = 0]
risk_status = status of the GWAS Catalog risk allele: "Homozygous", "Heterozygous" (which actually keeps both homozygous and heterozygous risk allele carriers), or "none" [default = "Heterozygous"]
allele_freq = allele frequency cutoff. If provided, this parameter must use the format [genetic background]_[comparison type]_[threshold], for example European_gt_0.25 [default = "none"]

Genetic background can be "European", "African", or "Asian"
Comparison type can be "gt" for greater than or "lt" for less than
Threshold is the population frequency and must be between 0 and 1
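To make the cutoff rules above concrete, here is a minimal Python sketch of how each filter could be checked for a single SNP. It is not the code inside 23andMe_filter.pl: the function names, the genotype string format (e.g. "AG"), and the dictionary of per-background frequencies are hypothetical, and the post does not say how the script combines the filters, so each one is kept as a separate check.

```python
# Hypothetical sketch of the filters described in the parameter list.
# NOT the code of 23andMe_filter.pl; function names, argument layout, and
# how a SNP's values are supplied are assumptions made for illustration.
from typing import Dict, Optional, Tuple

BACKGROUNDS = {"European", "African", "Asian"}
COMPARISONS = {"gt", "lt"}


def passes_or(odds_ratio: float, cutoff: float = 2.0) -> bool:
    """OR filter: keep SNPs whose odds ratio is greater than the cutoff."""
    return odds_ratio > cutoff


def passes_pam(pam_score: float, cutoff: float = 0.0) -> bool:
    """PAM filter: keep SNPs whose PAM score is less than the cutoff."""
    return pam_score < cutoff


def passes_risk_status(genotype: str, risk_allele: str,
                       status: str = "Heterozygous") -> bool:
    """risk_status filter, based on copies of the risk allele in a genotype like "AG"."""
    copies = genotype.count(risk_allele)
    if status == "Homozygous":
        return copies == 2
    if status == "Heterozygous":
        # As described in the post, this keeps heterozygous AND homozygous carriers.
        return copies >= 1
    if status == "none":
        return True  # assumption: "none" turns this filter off
    raise ValueError(f"unknown risk_status: {status!r}")


def parse_allele_freq(spec: str) -> Optional[Tuple[str, str, float]]:
    """Split "[genetic background]_[comparison type]_[threshold]"; "none" -> None."""
    if spec == "none":
        return None
    parts = spec.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected background_comparison_threshold, got {spec!r}")
    background, comparison, threshold_text = parts
    if background not in BACKGROUNDS:
        raise ValueError(f"unknown genetic background: {background!r}")
    if comparison not in COMPARISONS:
        raise ValueError(f"comparison must be 'gt' or 'lt', got {comparison!r}")
    threshold = float(threshold_text)
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be between 0 and 1, got {threshold}")
    return background, comparison, threshold


def passes_allele_freq(freqs: Dict[str, float], spec: str = "none") -> bool:
    """allele_freq filter; freqs maps background name -> population frequency (hypothetical layout)."""
    parsed = parse_allele_freq(spec)
    if parsed is None:
        return True  # "none" means no allele frequency filter
    background, comparison, threshold = parsed
    freq = freqs[background]
    return freq > threshold if comparison == "gt" else freq < threshold


if __name__ == "__main__":
    print(parse_allele_freq("European_gt_0.25"))        # ('European', 'gt', 0.25)
    print(passes_risk_status("AG", "G"))                # True
    print(passes_risk_status("AG", "G", "Homozygous"))  # False
    print(passes_allele_freq({"European": 0.30}, "European_gt_0.25"))  # True
```

Keeping each cutoff as its own check makes it easy to see why a SNP was kept or dropped; requiring every active check to pass is one plausible way to combine them, but that is an assumption, not documented behavior of the script.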

PC Users

Open a command prompt window (for example, type "cmd" in the Run dialog)
Move to the folder where your 23andMe data is saved.

Basic commands:

cd = change folder

If the data is not on your C:\ drive, you can type "cd /d D:" to switch to the D:\ drive

cd .. = move up one folder

Type in "perl 23andMe_filter.pl" followed by the required input parameter. See the example below.

You can also enter optional parameters (OR, PAM, risk_status, and/or allele_freq). See the example below.

Mac Users

Open Terminal (in Applications/Utilities, for example)
Basic commands:

cd = change folder
cd .. = move up one folder

Type in "perl 23andMe_filter.pl" followed by the required input parameter. See the example below.

You can also enter optional parameters (OR, PAM, risk_status, and/or allele_freq). See the example below.

I have tested my Perl scripts on a PC and a Mac, but I cannot guarantee that they will work on every possible platform. These scripts may also need modifications as file formats change; for now, I have confirmed that they work with v2 and v3 arrays using genomes from Genomes Unzipped. If you have any questions or comments, please post them below and I will do my best to help troubleshoot.