Monday, January 30, 2012

Article Review: Accurate identification of A-to-I RNA editing in human by transcriptome sequencing

In this article, Bahn et al. develop a novel method to identify A-to-I RNA editing sites in next-generation sequencing data.

My favorite aspect of this paper was how the authors empirically estimated the false discovery rate of their algorithm using an ADAR siRNA knock-down in a cancer cell line that only showed normal expression levels for one member of the ADAR family (shown in Figure 2 of the paper).  Experimental validation with Sanger sequencing also shows a low false positive rate for the A-to-G events (although not necessarily for non-A-to-G events).

Supplemental Table 3 is also worth checking out: it provides a good review of genome-wide RNA editing studies, including the contentious study in Science by Li et al.  For example, only 34% of the RNA editing sites shared by Li et al. and this paper were A-to-G events, whereas 86-100% of the overlapping sites for all of the other studies were A-to-G events.  Likewise, the differences in the histograms for RNA editing sites (Figure 2A in this paper, and Figure 1A in Li et al.) emphasize how different the analysis in Li et al. is from other similar studies in the literature.
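
As a rough illustration of the kind of comparison summarized in Supplemental Table 3, the Python sketch below counts the sites shared by two call sets and the fraction of shared sites that are A-to-G. It is not the authors' code. The (chromosome, position, change) tuple format and the toy sites are hypothetical, and the sketch ignores strand (an A-to-G change on a minus-strand transcript appears as T-to-C relative to the reference genome).

```python
def overlap_summary(sites_a, sites_b):
    """Return (number of shared sites, fraction of shared sites that are A-to-G).

    Each site is a hypothetical (chromosome, position, change) tuple such as
    ("chr1", 1000, "A>G"); two calls count as shared only if all three fields match.
    """
    shared = set(sites_a) & set(sites_b)
    if not shared:
        return 0, 0.0
    n_a_to_g = sum(1 for _chrom, _pos, change in shared if change == "A>G")
    return len(shared), n_a_to_g / len(shared)


# Hypothetical toy call sets, not data from either study
study_1 = [("chr1", 100, "A>G"), ("chr1", 200, "C>T"), ("chr2", 50, "A>G")]
study_2 = [("chr1", 100, "A>G"), ("chr1", 200, "C>T"), ("chr3", 10, "A>G")]
print(overlap_summary(study_1, study_2))  # (2, 0.5)
```

A low A-to-G fraction among shared sites (like the 34% for Li et al.) suggests that many of the shared calls are not canonical ADAR editing events.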

The supplemental table also shows how few RNA editing sites overlap between studies.  For example, the authors emphasize that their study recovers 854 A-to-G differences in the DARNED database, but I think it is worth keeping in mind that there were 42,045 sites in the DARNED database and 9,636 predicted RNA editing sites in this study (using the threshold for comparison with other studies).  This seems to be a common problem that isn't unique to this study (and the authors emphasize that the overlap between genes with RNA editing sites is greater than the overlap of individual RNA editing sites), but I think it is still an interesting observation to keep in mind for future analyses (which will hopefully include larger numbers of paired DNA-Seq and RNA-Seq samples).
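
Using only the counts quoted in the previous paragraph, a quick calculation shows how small the overlap is relative to either set. The function name is just for illustration.

```python
def overlap_fractions(shared, set_a_size, set_b_size):
    """Return the shared count as a fraction of each of two set sizes."""
    return shared / set_a_size, shared / set_b_size


# Counts quoted in the text: 854 shared A-to-G sites,
# 9,636 predicted sites, 42,045 DARNED sites
frac_predicted, frac_darned = overlap_fractions(854, 9636, 42045)
print(f"{frac_predicted:.1%} of predicted sites, {frac_darned:.1%} of DARNED sites")
# prints "8.9% of predicted sites, 2.0% of DARNED sites"
```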

In general, I think this method does a good job of identifying and filtering likely causes of spurious RNA editing events (like those mentioned in Schrider et al. 2011).  For example, the authors use a "double-filtering" strategy to focus on reads with unique alignments (a conservative threshold is used to define alignments to potential RNA editing sites, but a more liberal criterion is used to search for homologous regions that could be causing inaccurate alignments).  I also liked that most of the in-depth analysis focused on sites with an editing ratio greater than 0.2.
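
The editing-ratio cutoff can be pictured as a simple filter on read counts. This is a minimal Python sketch, not the authors' pipeline. It assumes the editing ratio is the fraction of reads showing G at a reference A, that is G / (A + G). The `Site` class, its field names, and the read counts are hypothetical, and the sketch does not attempt to reproduce the alignment-based double-filtering step.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical candidate editing site with read counts at a reference A."""
    chrom: str
    pos: int
    a_reads: int  # reads matching the reference A
    g_reads: int  # reads showing G (the putative A-to-I edit)


def editing_ratio(site):
    """Fraction of A+G reads that show G; None if there is no coverage."""
    depth = site.a_reads + site.g_reads
    return site.g_reads / depth if depth else None


def filter_by_ratio(sites, min_ratio=0.2):
    """Keep sites whose editing ratio is strictly greater than min_ratio."""
    kept = []
    for site in sites:
        ratio = editing_ratio(site)
        if ratio is not None and ratio > min_ratio:
            kept.append(site)
    return kept


candidates = [
    Site("chr1", 100, a_reads=8, g_reads=2),  # ratio 0.2, not above the cutoff
    Site("chr1", 200, a_reads=5, g_reads=5),  # ratio 0.5, kept
    Site("chr2", 300, a_reads=0, g_reads=0),  # no coverage, skipped
]
print([s.pos for s in filter_by_ratio(candidates)])  # [200]
```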

This study focused on analysis of the grade IV glioma cell line U87MG (RNA-Seq: GSE28040, DNA-Seq: GSE19986) and a primary breast cancer sample (EGAS00000000054).  Although reusing existing data probably allowed for more cost-effective analysis, I wonder if the results would have been even cleaner if the RNA-Seq and DNA-Seq data had both been newly created for this study using similar technologies (for example, the RNA-Seq data were paired-end Illumina reads, whereas the DNA-Seq data came from another study using SOLiD reads).  However, based upon the ADAR knock-down data, I think the results were clean enough that this probably didn't matter too much.

The novel motif discovery (Figure 5) was interesting, but I had a hard time imagining the relevance of a motif that isn't found at a consistent distance from the A-to-I site (as the motifs shown in Figure 4 are).  That said, I would be interested in seeing any follow-up analysis that characterizes the mechanism by which this motif is involved in A-to-I editing.

I think this study provides only a very limited analysis of A-to-I editing in cancer.  To be fair, the sample size (one sample of each cancer type) is probably not sufficient to make many general claims about A-to-I editing in cancer.  However, I still think this aspect of the study was over-emphasized.  For example, Supplemental Table 13 shows how sensitive the hypergeometric test (comparing RNA editing sites in the two samples) can be when dealing with such a large background set: every category of RNA editing event except G-to-C was statistically significant (p < 0.05), even though A-to-G was the only category with more than 5 overlapping sites.  In other words, I don't think statistical significance was a strong indicator of biological importance for this analysis.  Likewise, it was nice that the enrichment analysis of the NCI Cancer Gene Index genes provided some candidate genes, but I don't think this study is useful for identifying a gene where A-to-I editing is highly likely to play an important role in oncogenesis.
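
To see why a large background makes the hypergeometric test so sensitive, here is a minimal Python sketch of the upper-tail hypergeometric p-value using only the standard library. The background size, per-sample site counts, and overlap are hypothetical round numbers, not the values in Supplemental Table 13.

```python
from math import comb


def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M, n, N).

    M: size of the background (all sites that could be called)
    n: number of sites called in sample 1
    N: number of sites called in sample 2
    k: observed number of sites shared by both samples
    """
    if k <= 0:
        return 1.0
    upper = min(n, N)
    if k > upper:
        return 0.0
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible overlaps contribute nothing to the sum.
    numerator = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return numerator / comb(M, N)


# Hypothetical numbers: 1,000,000 background sites, 200 sites per sample
M, n, N, k = 1_000_000, 200, 200, 3
expected = n * N / M  # overlap expected by chance = 0.04
p = hypergeom_upper_tail(k, M, n, N)
print(f"expected overlap {expected:.2f}, observed {k}, p = {p:.2g}")
```

With only 0.04 shared sites expected by chance, observing just 3 shared sites gives a p-value far below 0.05, even though 3 sites is far too few to support a biological conclusion.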

Overall, I would recommend this article to anyone interested in RNA editing and next-generation sequencing analysis.

Wednesday, November 16, 2011

Notes from ICHG / ASHG 2011

Although it may be old news for anyone following the #ICHG2011 Twitter feed, I figure some people might still be interested in the summary slides that I'll be presenting at a Bioinformatics Core group meeting to discuss what I learned at the conference (those slides are available here).

Generally speaking, I was very pleased with the number and variety of great speakers.  Plus, there were fun activities like a circus performance to open the conference and complimentary poutine for lunch on a couple of days.  I kind of wish the conference had been a day or two shorter and that there had been more activities / discussions to encourage networking among smaller groups of attendees, but I think these are only minor concerns.

Overall, I would consider the conference to be a great success, and I am seriously considering attending next year in San Francisco!

Saturday, August 6, 2011

What is it like to be a "Bioinformatics Specialist"?

I recently received a request from a complete stranger to learn more about the field of bioinformatics.  Since I think others may also benefit from my answers, I've converted this e-mail conversation into a blog post.  I've made some modifications to the questions and my responses, but all the main ideas are the same.

FYI, I've provided a link to my CV, so you can get a better idea about my background.

Q)  Can you tell me a little bit about your work as a bioinformatics specialist and what a typical day looks like?

A) I think it is safe to assume that someone with this job description will work for at least one lab, and your goal will usually be to help biologists without a strong computational background analyze their data.  In particular, I assist with microarray and next-generation sequencing data analysis.  Sometimes you may work in the lab of an individual scientist, but I work in a shared resource facility, so I work for several scientists on campus.  The Bioinformatics Core is also in charge of software support, so I also assist in installing and maintaining software and hardware.  Additionally, I assist in writing papers and grants, but I don't know if it is safe to assume that all Bioinformatics Specialists will be authors on papers.

Q) When you completed your MS degree, did you find the job market to be favorable? 

A) I actually have an MA degree in Molecular Biology (I was in a PhD program and left the program with just a Master's degree), so this may be a little different than someone going for an independent MS degree in Bioinformatics.  When I had to look for jobs, it did take a lot of effort, and I basically accepted the first offer I could get after a few months of hunting.  However, there are lots of people who are unemployed or go a year or more without a job, so it could be a lot worse.

Q)  How deeply would you suggest that a person searching for a job similar to yours get into programming?  What programming languages are most useful for your job?
 
A) I would say that you pretty much can’t get a job with “Bioinformatics” in the title without significant programming experience and a firm grasp of statistics.

I am proficient in R and Perl.  SQL is also very important.  Python is especially useful for next-generation sequencing analysis.  It is also valuable to learn Java and PHP, as well as how to work with the Apache web server.
 
Q)  Do you know of any useful resources for job seekers?  For example, what do you know about bioinformatics internships?  

A) I think that the importance of internships varies with your career goal.  I think they are a little less important if you plan to eventually get a PhD, but I think they can be very important for individuals with a terminal BS or MS degree.

I would suggest e-mailing PIs / Scientists whose work you find interesting to see if there are any jobs – this is how I have gotten all of my jobs.  

Before I got paid to do research, I had to do research for academic credit for 1-2 years.  If you don’t have considerable research experience, I would consider offering to do volunteer work (or an unpaid internship).  

You should also apply for jobs that you see posted for companies.  However, I haven't actually had much success with these.  I think companies are legally obligated to post jobs, even if they have already found an internal person for the job.  A lot of companies like to promote from within, so this is worth keeping in mind.  Nevertheless, I think anyone who is willing to pay money to post a job on an external website is probably serious about at least considering candidates from outside of the company.  Here are some useful resources when looking for bioinformatics jobs (in addition to places like Monster):
When you are in school, I believe there should be some sort of career services department that can help you.  For example, when the Bioinformatics Core was looking for recent graduates, I helped send our job postings to prestigious universities.

Also, be certain to take advantage of research experience (even if it is not required) for networking purposes.  Plus, research experience has other direct benefits, such as practical experience that will almost certainly be useful for jobs later down the road.

Please feel free to continue the discussion with questions and comments below!
 
Charles Warden's Science Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.