Monday, October 21, 2013

Analyze Your 16S rRNA Data Using RDP-Classifier

Step #1: Convert Files from FASTQ to FASTA

There are lots of ways to do this, but I would recommend using Galaxy if you don't have any programming experience:

Go to the Galaxy website: https://usegalaxy.org/

If you are an academic researcher, your institution might have a local mirror (which should be faster).  However, the link above will work for everybody.

Upload your data using "Get Data" --> "Upload File" (the functions are available on the left-hand side of the screen).  You can set the file type to "fastq", but you probably don't need to.  Updates will appear on the right-hand side of the screen, so you know when each step is complete (the box for the corresponding step will turn green).

Go to "NGS: QC and manipulation" --> "FASTQ Groomer" (should be under "ILLUMINA DATA" in grey font).  Leave all the default settings and click "Execute".  This step is needed because the later Galaxy tools expect quality scores in a standardized (Sanger) FASTQ format, and the Groomer converts your file to that format.

Go to "Convert Formats" --> "FASTQ to FASTA".  Once this step is complete, click the appropriate green box on the right-hand side.  Once the box becomes larger (allowing you to see the first few lines of the file), click the purple floppy disk icon to download the FASTA file.  I would recommend renaming the FASTA file after it is downloaded, so it is easier to keep track of.
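If you are comfortable running a script, the same conversion can be done locally.  Here is a minimal sketch in Python (standard library only).  It assumes each read takes exactly four lines (header, sequence, "+", quality), which is typical for Illumina output but not guaranteed for every FASTQ file.  The function names and file names are my own placeholders, not part of Galaxy or RDP.

```python
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs from four-line FASTQ records.

    Assumption: sequences are not wrapped across multiple lines.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got: {header!r}")
        seq = next(it, None)
        plus = next(it, None)   # '+' separator line
        qual = next(it, None)   # quality scores (dropped in FASTA)
        if seq is None or plus is None or qual is None:
            raise ValueError(f"truncated record for read {header!r}")
        yield header[1:], seq.strip()


def fastq_to_fasta(lines: Iterable[str]) -> str:
    """Return FASTA text for the FASTQ records in `lines`."""
    out = []
    for read_id, seq in read_fastq(lines):
        out.append(f">{read_id}")
        out.append(seq)
    return "\n".join(out) + ("\n" if out else "")


def convert_file(fastq_path: str, fasta_path: str) -> None:
    """Convert one FASTQ file on disk to a FASTA file."""
    with open(fastq_path) as fin, open(fasta_path, "w") as fout:
        fout.write(fastq_to_fasta(fin))
```

For example, `convert_file("my_sample.fastq", "my_sample.fasta")` would write the FASTA file next to the original (both file names are hypothetical).  The quality scores are simply discarded, since the FASTA format has no place for them.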

Step #2: Create an RDP Account

You can sign up using this link: https://rdp.cme.msu.edu/user/createAcct.spr

An account will be created automatically.  You will receive an e-mail with a username and password (you will be asked to change your password the first time you sign in).  Technically, you don't need an account to run the classifier.  However, I think it may be helpful if you want to play around with some other tools.

Step #3: Sign In and Run RDP-Classifier

Using the link provided in the registration e-mail, sign in to myRDP.

Now, go to this link: https://rdp.cme.msu.edu/classifier/classifier.jsp

There will be an option to "Choose a file (unaligned format) to upload:".  Use the browser to select the FASTA (not FASTQ) file that you downloaded from Galaxy.  Next, click "Submit".

The classifier is very fast (you should get your results in a few minutes).  The results page is somewhat hard to read, but you can click on any entry to learn more.  The number of reads assigned to each group is shown in parentheses.

It really helps to remember biological classifications when interpreting these results.  Here is a quick cheat sheet:

phylum > class > order (> suborder) > family > genus

Unfortunately, the classifier won't provide species-level assignments.

You can also download the results in a text file.  If you do this, you can use a tool like Notepad++ to search for keywords (like phylum, genus, etc.), but I think the results are a little easier to view on the webpage.
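If you would rather count reads with a script than search by hand, here is a rough sketch in Python.  It assumes the downloaded file is tab-separated, with each read on one line and each assignment written as three consecutive fields (name, rank, confidence).  Please check your own file before relying on this, since I have not confirmed that every download option uses that layout.  The function names and the 0.8 confidence cutoff are illustrative choices of mine, not RDP settings.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Rank labels we look for in the file (assumed spelling).
RANKS = {"domain", "phylum", "class", "order", "suborder", "family", "genus"}


def parse_line(line: str) -> Dict[str, Tuple[str, float]]:
    """Map rank -> (name, confidence) for one tab-separated result line."""
    fields = line.rstrip("\r\n").split("\t")
    calls = {}
    for i in range(1, len(fields) - 1):
        if fields[i] in RANKS:
            try:
                conf = float(fields[i + 1])
            except ValueError:
                continue  # not a (name, rank, confidence) triplet after all
            calls[fields[i]] = (fields[i - 1].strip('"'), conf)
    return calls


def tally(lines: Iterable[str], rank: str = "genus", min_conf: float = 0.8) -> Counter:
    """Count reads per name at `rank`; low-confidence reads go to 'unclassified'."""
    counts = Counter()
    for line in lines:
        calls = parse_line(line)
        if not calls:
            continue  # header or comment line with no recognizable ranks
        name, conf = calls.get(rank, ("unclassified", 0.0))
        counts[name if conf >= min_conf else "unclassified"] += 1
    return counts


def percentages(counts: Counter) -> Dict[str, float]:
    """Convert read counts to percentages of all counted reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}
```

To use it, you could open the downloaded text file and call `percentages(tally(open_file, rank="phylum"))` to get the share of reads in each phylum.  Reads that fall below the cutoff at the chosen rank are grouped together as "unclassified", which makes it easy to see how much of the sample could not be assigned.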

4 comments:

  1. Hi! I randomly came across this when searching for help on the RDP classifier. I have my text file, which I can open with Excel. However, all the genera and families are reported as "family x, family xii, etc" (for families) and "gpv, gpvi, gpiia etc" (for genera). Any idea what these codes mean to corresponding family and genus names? I'd appreciate any pointers, I'm completely lost at this point!

    Thanks!!

  2. It has been a while since I checked those results, but I would say that there should be some (if not mostly) recognizable results (as well as some more generic results). For example, you can see some of my results on this link:

    http://cdwscience.blogspot.com/2013/10/open-source-analysis-of-my-raw-american.html

    I think it would be best to address your question to a developer: http://rdp.cme.msu.edu/misc/contacts.jsp

    On a side note, it will take a bit longer to get your results, but I think MG-RAST is probably a better tool that can give you more precise results:

    http://cdwscience.blogspot.com/2013/10/analyze-your-16s-rrna-data-using-mg-rast.html

    1. Alright thank you, I'll try the developer, good idea. Looking at your results, I could generate similar pie charts now on Excel, I just wanted to get more info on the genus level. I'll look into MG-RAST.

  3. Can anyone please help me analyze the RDP classifier results, specifically the % of reads under unclassified_root?


 
Creative Commons License
My Biomedical Informatics Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.