Monday, October 21, 2013

How to Download Your American Gut Data

Step #1: Find Your Barcode(s)

Each sample has a nine-digit barcode.  If your barcode has fewer digits, add leading zeros.

For example, my barcodes were 2683 (fecal) and 2684 (oral), so I need to use 000002683 and 000002684 as my sample IDs.
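If you have several barcodes, the zero-padding can be scripted. This is only a minimal sketch in Python: the function name `to_sample_id` is my own (hypothetical) choice, and it assumes the barcode contains nothing but digits.

```python
def to_sample_id(barcode, width=9):
    """Left-pad a numeric barcode with zeros to get the nine-digit sample ID."""
    digits = str(barcode).strip()
    if not digits.isdigit():
        raise ValueError(f"barcode must contain only digits: {barcode!r}")
    if len(digits) > width:
        raise ValueError(f"barcode is longer than {width} digits: {barcode!r}")
    # str.zfill adds leading zeros until the string reaches the given width
    return digits.zfill(width)


# The two barcodes from this post
for label, barcode in [("fecal", 2683), ("oral", 2684)]:
    print(label, to_sample_id(barcode))
```

The function accepts either an integer or a string, so a barcode that is already nine digits long passes through unchanged.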

Step #2: Search For Your Sample

Go to the European Nucleotide Archive (ENA) website:

Copy and paste your 9-digit barcode into the text search.

I get a single result for each of my samples when I do this.  If you get multiple results, choose the metagenome sample (see image below).

If you want to double-check you have the right sample, click on the "Sample accession" link (which will start with "ERS").  If you then click the "Attributes" tab, you should be able to see your metadata.  For example, I know I live in Los Angeles, so my state better not be GA.

Step #3: Download Your FASTQ Files

If you are certain you have the right sample, click the link for "Fastq files (ftp)" to start the download.  Note that the downloaded file will be named after the "Run accession" (starting with "ERR").  For example, here are the different IDs for my samples:

Fecal: 000002683 --> ERS345317 --> ERR336561
Oral: 000002684 --> ERS344890 --> ERR336138

The .fastq files will be compressed, so you should unzip them.  I would recommend using 7zip for this.  I would also recommend renaming your files (like fecal.fastq and oral.fastq) to make it easier for you to keep track of them.
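If you would rather script the unzip-and-rename step than use 7zip, here is a minimal sketch in Python. It assumes the downloads are gzip-compressed and named after the run accession (for example `ERR336561.fastq.gz`); those file names are my assumption, so adjust them to match whatever you actually downloaded. The helper name `decompress_fastq` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def decompress_fastq(gz_path, out_path):
    """Decompress a gzip-compressed FASTQ file to out_path and return that path."""
    # Copy in streaming fashion so a large file is never held fully in memory
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return Path(out_path)


# Assumed download names (run accession + .fastq.gz) mapped to friendlier names
RENAMES = {
    "ERR336561.fastq.gz": "fecal.fastq",
    "ERR336138.fastq.gz": "oral.fastq",
}

if __name__ == "__main__":
    for gz_name, new_name in RENAMES.items():
        if Path(gz_name).exists():
            decompress_fastq(gz_name, new_name)
            print(f"wrote {new_name}")
        else:
            print(f"skipping {gz_name}: file not found")
```

The original compressed file is left in place, so you can delete it yourself once you have checked the decompressed copy.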


  1. I can't find my barcode when I search in the ENA, even though I already received results in the mail. Any ideas? Maybe it takes a while for the data to get uploaded there?

  2. I don't work for AmericanGut, so I would recommend contacting them (

    I asked about the individual data after I saw the preliminary report for the first batch of data (which I assumed included my sample). If you submitted your sample over a year ago, I think it should be in the ENA. You'll have to ask about the submission time for newer samples. You can also ask whether they applied some sort of QC filter prior to sample submission.

    Hope this helps!

  3. Thanks for the information - just curious - what did you learn from your further analysis of your sample? I've posted our results & some of my theories on my site -

  4. It looks like this post is being passed along as a direct link, but it is probably useful to note that this is actually meant to be a sub-topic from a broader post:

    Additionally, I put together another post after I actually received my official report:

    I think this last link sounds like what you are most interested in.

    One thing that might be worth noting is that they have found that certain bacteria (namely Gammaproteobacteria) grow in the samples while they are in the mail and waiting to be processed. They filter out the likely culprits, but I feel that this filtering was probably too stringent in my case.

    For example, you can see my report here:

    The pie chart in my first post looks radically different from the bar chart in my official report because ~75% of my reads were ignored. The result is something that looks more like the result in your first link. However, I would argue that the PCA plot on the official report (furthest right plot on the bottom) supports my original analysis indicating a lot more Proteobacteria and a lot less Firmicutes (since my sample clusters with the others that still had a low Firmicutes count, even after the filter that removed ~75% of the reads from my sample). In other words, I think the "real" abundance is somewhere between my own analysis and the analysis presented in my official report.

    The short answer is that I would consider this ongoing research. My American Gut data/results have piqued my interest in certain bacteria a bit, but I didn't take any medical action because of my results.


Creative Commons License
My Biomedical Informatics Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.