Monday, October 21, 2013

Analyze Your 16S rRNA Data Using MG-RAST

Step #1: Register for an Account

You can use this link to register:

Registration is not automated, so it is not immediate.  It took about a day for my account to be created.

Once you receive an e-mail saying "MG-RAST - account request approved", you can sign in for the next step.

Step #2: Upload Your Data

Go to the MG-RAST website:

I would recommend using Firefox - you will see a pop-up if you do not.

Sign in to MG-RAST (enter your username and password in the upper-right hand corner of the screen).

Choose "Upload".  This is represented by a green arrow pointing upwards.  There should be one in the middle of your screen (which says "Upload") as well as in the upper-right hand corner of the screen (although this one is not specifically labeled).

The metadata step is not required.  I skipped this because I figure the American Gut data should eventually be entered into this database, and I didn't want to produce a duplicate dataset (and I probably didn't know all of the details regarding funding, sample processing, etc.).  However, I contacted the MG-RAST developers, and they actually encouraged me to make the sample public.  If you take the time to fill out the metadata for your sample, it will be processed more quickly.

Under "PREPARE DATA", click "2. upload files" and browse for the FASTQ file that you downloaded from ENA.  You will see a pop-up, but just click "close"; it is not necessary to complete the check.  You can click "3. Manage Inbox" to see when the upload is complete (if you want to wait a few minutes, you can keep clicking "update inbox" until the files are ready).  Otherwise, you can just do something else and come back later.
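If you want to sanity-check the FASTQ file on your own computer before uploading it, something like the following Python sketch may help.  This is not part of MG-RAST; the function name summarize_fastq is my own, and it assumes the common four-lines-per-read FASTQ layout (no wrapped sequence lines), optionally gzip-compressed.

```python
import gzip
from pathlib import Path


def summarize_fastq(path):
    """Return (number_of_reads, mean_read_length) for a FASTQ file.

    Assumption: each record is exactly four lines (header, sequence,
    "+" separator, qualities). Raises ValueError on the first record
    that does not fit that layout.
    """
    path = Path(path)
    # Downloads are often gzip-compressed; read either form as text.
    opener = gzip.open if path.suffix == ".gz" else open
    n_reads = 0
    total_bases = 0
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate blank lines between or after records
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\r\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(
                    f"record {n_reads + 1}: bad header or separator line"
                )
            if len(seq) != len(qual):
                raise ValueError(
                    f"record {n_reads + 1}: sequence/quality length mismatch"
                )
            n_reads += 1
            total_bases += len(seq)
    mean_length = total_bases / n_reads if n_reads else 0.0
    return n_reads, mean_length
```

Calling summarize_fastq on your downloaded file (whatever it happens to be named) returns the read count and the mean read length, and it raises an error if a record looks truncated, which is a cheap way to catch an incomplete download before you wait for the upload.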

Step #3: Run the MG-RAST Pipeline

After the data has been uploaded, click "1. select metadata file" under "DATA SUBMISSION".  If you didn't create a metadata file, just click the box saying "I do not want to supply metadata" and click "select".

Click "2. select project".  You probably don't have an existing project, so just type in something like "American Gut" and click "select".

Under "3. select sequence file(s)", click the check marks next to the files that you want to analyze and click "select".

Unless you have some experience with metagenomic analysis, just open "4. choose pipeline options", leave the defaults as they are, and click "select".

Finally, choose a data submission option (if you don't provide metadata, you have to keep your data private) and click "submit job".

Step #4: Analyze Your Processed Data

The pipeline may take a while (at least a few hours and possibly as long as a week), especially if you are keeping the data private.  So, I would recommend doing something else and then signing back into MG-RAST.  You can check the status of your samples at any time by clicking the earth icon in the upper-right hand corner of the screen (or "Browse Metagenomes" in the middle of the screen).  There will be numbers next to different stages in the upper-left hand corner.  If you click the number next to "In Progress" and you see your samples, then they are not ready (but at least you can see where your samples are in the pipeline).  Your samples are ready once you can click the number next to "Available for Analysis" and see them in the next menu that is loaded.

Once your samples are available for analysis, click on the bar-plot icon in the upper-right hand corner of the screen.  There are a lot of options available for metagenomic analysis, but I will walk through what I think is the most useful analysis.

Under "Organism Abundance" on the left-hand side, click "Best Hit Classification".  Under "Data Selection", select your samples by clicking the "+" icon next to "Metagenomes".  If you left your samples as private, then they should be relatively easy to select.

Under "Annotation Sources", the default may be "M5NR".  I would strongly recommend you use an RNA database, such as RDP, Greengenes, or M5RNA.  M5RNA is a little more interesting because it also contains eukaryotic sequences, but I will mostly focus on RDP (so that I can compare the results to the RDP-Classifier).  Highlight the desired database and click "OK".

At this point, you should have all the necessary configurations set up, and your screen should look something like this:

To analyze your selected data, click a radio button under "Data Visualization" and then click "generate".  I think the tree and table tools are the most useful.


  1. Hi, I have tried this with RDP, M5RNA and Greengenes. My 16S metagenomic sequences are from a soil sample. It reports that no data match my selection criteria, or that the metagenome contains no organism data for the selected organism.

  2. Did you get any QC results from your sequencing facility? You could try BLASTing one of your reads to make sure your protocol worked correctly.

    I've never encountered this problem, but I admittedly have used mothur and the RDP-Classifier more often than MG-RAST.

    You can also contact MG-RAST with tech support questions:


Creative Commons License
My Biomedical Informatics Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.