I've provided a tutorial based entirely on web-based analysis, so you don't need to know any programming to follow these steps with your own data. You can also skip the tutorial links to just see my own results.
Step #1: Get Your Data
Step #2a: Analyze Your Data in MG-RAST (preferred, but time-consuming)
Step #2b: Analyze Your Data using RDP-Classifier (quick, but less functionality)
Here are some charts that I could quickly create in Excel using data from RDP-Classifier:
Family Genus | Fecal | Oral |
Streptococcaceae Streptococcus | 0 | 10614 |
Neisseriaceae Neisseria | 0 | 8824 |
Actinomycetaceae Actinomyces | 0 | 4129 |
Lachnospiraceae Oribacterium | 0 | 2319 |
Veillonellaceae Veillonella | 0 | 2124 |
Bacteroidaceae Bacteroides | 1800 | 5 |
Enterobacteriaceae Escherichia/Shigella | 8039 | 5 |
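None of this requires programming, but for readers who prefer a script to Excel, here is a minimal Python sketch that turns the counts in the table above into per-sample proportions (the numbers behind a pie chart). The `counts` dictionary and the `relative_abundance` helper are my own illustrative names, not part of RDP-Classifier, and the proportions are relative only to the genera listed here, not to the full sample.

```python
# Counts copied from the RDP-Classifier table above: genus -> (fecal, oral).
counts = {
    "Streptococcus": (0, 10614),
    "Neisseria": (0, 8824),
    "Actinomyces": (0, 4129),
    "Oribacterium": (0, 2319),
    "Veillonella": (0, 2124),
    "Bacteroides": (1800, 5),
    "Escherichia/Shigella": (8039, 5),
}


def relative_abundance(table, column):
    """Return each genus's share of the listed counts for one sample column.

    column 0 is the fecal sample, column 1 is the oral sample.
    """
    total = sum(row[column] for row in table.values())
    return {genus: row[column] / total for genus, row in table.items()}


fecal = relative_abundance(counts, 0)
oral = relative_abundance(counts, 1)

for genus in counts:
    print(f"{genus:22s} fecal {fecal[genus]:6.1%}   oral {oral[genus]:6.1%}")
```

The same dictionary could be filled in from any exported genus table; only the two count columns matter for the calculation.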
I was able to find reports that Actinomyces was associated with transformation of lymphocytes in patients with periodontal disease (Baker et al. 1976) and Streptococcus mutans plays a role in human dental decay (Loesche 1986), although I don't know if I had this specific strain (based upon the MG-RAST report, it looks like I don't). Likewise, I showed this list to my dentist, and she recognized these two genera as being associated with dental problems. Although I don't know how common these are overall, it seems to make sense that they could be found in an oral sample.
Obviously, I recognized Escherichia/Shigella. The American Gut report points out that the phylum containing this genus is not highly abundant in an average participant. At one point, I had to be hospitalized with ulcerative colitis (with a strain of E. coli producing shiga toxin), so perhaps this is related to the high abundance of this genus (although that was several years ago).
If you are patient enough to wait for your MG-RAST results, then you can make similar (but slightly cooler looking) pie charts and tables automatically. For example, you can take a look at the corresponding plots from MG-RAST for my fecal and oral samples. You can also create plots to compare species in multiple samples (red bars are for my oral samples, green bars are for my fecal sample):
Perhaps most importantly, MG-RAST will provide annotations down to the species level (and strain level, when possible). The species counts aren't perfectly correlated with the genera counts (predicted from the classifier), but the most interesting genera appeared in both lists.
metagenome | strain | abundance |
oral | uncultured bacterium | 19715 |
fecal | Escherichia coli ED1a | 11316 |
oral | Abiotrophia para-adiacens | 6185 |
oral | Actinomyces odontolyticus | 4154 |
oral | Veillonella dispar | 1913 |
oral | Butyrivibrio fibrisolvens | 1431 |
oral | Blautia sp. Ser8 | 1075 |
fecal | Bacteroides stercoris | 736 |
oral | Syntrophococcus sucromutans | 661 |
fecal | Pseudomonas fluorescens | 660 |
fecal | Bacteroides caccae | 559 |
fecal | Bacteroides stercoris ATCC 43183 | 558 |
fecal | Bacteroides vulgatus | 510 |
oral | Streptococcus | 499 |
oral | Rothia mucilaginosa | 450 |
fecal | uncultured bacterium | 370 |
oral | Haemophilus haemolyticus | 295 |
fecal | Prevotella buccalis | 287 |
oral | Ruminococcus gauvreauii | 282 |
oral | Streptococcus sanguinis | 275 |
oral | Ruminococcus torques L2-14 | 272 |
fecal | Escherichia coli | 232 |
oral | Abiotrophia defectiva | 213 |
oral | Gemella morbillorum | 204 |
fecal | Dialister propionicifaciens | 184 |
oral | Veillonella parvula | 175 |
oral | Butyrivibrio hungatei | 142 |
oral | Atopobium minutum | 137 |
oral | Parvimonas micra | 126 |
oral | Leptotrichia shahii | 121 |
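To make the comparison between the two tables concrete, here is a small Python sketch that sums the MG-RAST strain abundances by genus and prints them next to the RDP-Classifier genus counts. It only uses rows shown in the tables above, so the MG-RAST totals are lower bounds. Taking the first word of the strain name as the genus is a simplifying assumption of mine, and `genus_totals` is a hypothetical helper, not an MG-RAST function.

```python
from collections import defaultdict

# (metagenome, strain, abundance) rows copied from the MG-RAST table above,
# limited to genera that also appear in the RDP-Classifier table.
mgrast_rows = [
    ("fecal", "Escherichia coli ED1a", 11316),
    ("fecal", "Escherichia coli", 232),
    ("fecal", "Bacteroides stercoris", 736),
    ("fecal", "Bacteroides caccae", 559),
    ("fecal", "Bacteroides stercoris ATCC 43183", 558),
    ("fecal", "Bacteroides vulgatus", 510),
    ("oral", "Actinomyces odontolyticus", 4154),
    ("oral", "Veillonella dispar", 1913),
    ("oral", "Veillonella parvula", 175),
    ("oral", "Streptococcus", 499),
    ("oral", "Streptococcus sanguinis", 275),
]

# Genus counts copied from the RDP-Classifier table above.
rdp_counts = {
    ("fecal", "Escherichia"): 8039,  # reported there as Escherichia/Shigella
    ("fecal", "Bacteroides"): 1800,
    ("oral", "Actinomyces"): 4129,
    ("oral", "Veillonella"): 2124,
    ("oral", "Streptococcus"): 10614,
}


def genus_totals(rows):
    """Sum abundances per (sample, genus); genus = first word of the strain name."""
    totals = defaultdict(int)
    for sample, strain, abundance in rows:
        genus = strain.split()[0]
        totals[(sample, genus)] += abundance
    return dict(totals)


mgrast_counts = genus_totals(mgrast_rows)

for (sample, genus), rdp in rdp_counts.items():
    mgrast = mgrast_counts.get((sample, genus), 0)
    print(f"{sample:5s} {genus:14s} RDP {rdp:6d}   MG-RAST {mgrast:6d}")
```

Some genera line up closely in this sketch (Actinomyces, Veillonella) while others differ considerably (Streptococcus), which fits the observation that the two sets of counts are not perfectly correlated.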
The species-level information allowed me to conduct more effective literature searches. For example, my understanding is that the ED1a strain has not been shown to be associated with ulcerative colitis. On the other hand, the species information allowed me to find a paper for the discovery of my specific strain of Actinomyces, which was harvested from 450 tooth cavities (Batty 2005). Likewise, I could confirm that Streptococcus sanguinis was also pathogenic (Xu et al. 2007).
I am still interested in seeing what my official individual report will look like: although I have general experience with bioinformatics analysis, the folks at American Gut have looked at a lot more metagenomic data than I have. Likewise, I am interested in seeing how my profiles change at different time points: once I eventually get my uBiome results, I will put together another post to compare the results.
Charles, thank you for posting this information. Now that I have my Taxa Summary I'm hoping that MG-RAST will provide me with species information so I can try to better understand what's in me.
I loaded my data this afternoon so now I just need to hurry up and wait.
Do you plan to post your Taxa Summary information?
The first table in the post is from RDP-Classifier and the second table is from MG-RAST. The pie charts also come from the higher levels of the taxonomy.
I also posted my official American Gut report here:
http://cdwscience.blogspot.com/2013/11/my-american-gut-individual-report.html
Are you looking for something else?
Yes, I'm looking for your American Gut Taxa Summary. Mine was made available last week. It lists the detailed make up of the samples down to the Genus with their percentage.
Ok - the top 5 most abundant genera are in the PDF version of the report, but you can view my entire list here:
https://drive.google.com/file/d/0B1xpw6_kQMKuUmJnTGZMc0NtSm8/edit?usp=sharing
One thing that this report made more obvious is the lack of WAL_1855D (the most abundant taxon) anywhere in my MG-RAST results. So far, I've contacted the American Gut and MG-RAST tech support, and it looks like this may be due to the fact that MG-RAST uses an older version of the Greengenes 16S reference database. Because all samples need to be processed the same way, updating the reference databases takes some time in MG-RAST, but they did say this is something they are working on. I'll add a note to the individual report post once I can confirm this is the case.
Thank you for your interest!
How did you generate the "metagenome strain abundance" table? I have tried lots of MG-RAST settings but I have found none that produce a similar table. I'm very interested in the species level detail.
I think it might be best to post a question like this on the MG-RAST tutorial post that includes the detailed instructions. That way, I think other users with similar questions will be more likely to see the answer:
http://cdwscience.blogspot.com/2013/10/analyze-your-16s-rrna-data-using-mg-rast.html
It takes some time to process your sample (especially if you don't immediately make it publicly available).
Picking up from the end of that tutorial (selecting datasets, reference database, etc), here is what you should do:
1) Click radio button for "table"
2) Click "generate" on the right-hand side of the radio buttons
3) Wait for table to be produced. If the annotations are not specific enough, use the pull-down for "group table by" to select the specificity of the report (in your case, you should select "species")
4) Click the "change" button immediately to the right of the pull-down option.
5) Optionally, there is a "download" button if you want to use a text file to browse your results
Thanks, that did it! Off to do more research...