Charles Warden's Science Blog: Article Review: Accurate identification of A-to-I RNA editing in human by transcriptome sequencing

Monday, January 30, 2012

In this article, Bahn et al. develop a novel method to identify A-to-I RNA editing sites in next-generation sequencing data.

My favorite aspect of this paper was how the authors empirically estimated the false discovery rate of their algorithm using an ADAR siRNA knock-down in a cancer cell line that only showed normal expression levels for one member of the ADAR family (shown in Figure 2 of the paper). Experimental validation with Sanger sequencing also shows a low false positive rate for the A-to-G events (although not necessarily for non-A-to-G events).

Supplemental Table 3 is also worth checking out: it provides a good review of genome-wide RNA editing studies, including the contentious study in Science by Li et al. For example, only 34% of the RNA editing sites shared by Li et al. and this paper were A-to-G events, whereas 86-100% of the overlapping sites for all of the other studies were A-to-G events. Likewise, the differences in the histograms for RNA editing sites (Figure 2A in this paper, and Figure 1A in Li et al.) emphasize how different the analysis in Li et al. is from other similar studies in the literature.

The supplemental table also shows how few RNA editing sites overlap between studies. For example, the authors emphasize how their study recovers 854 A-to-G differences in the DARNED database, but I think it is worth keeping in mind that there were 42,045 sites in the DARNED database and 9,636 predicted RNA editing sites (using the threshold for comparison with other studies). This seems to be a common problem that isn't unique to this study (and the authors emphasize that the overlap between genes with RNA editing sites is greater than the overlap of individual RNA editing sites), but I think it is still an interesting observation that is worth keeping in mind for future analysis (which will hopefully have larger sets of paired DNA-Seq and RNA-Seq samples).
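To put those counts in perspective, here is a quick back-of-the-envelope calculation using only the three numbers quoted above. The variable names are my own, and note one assumption: the 9,636 figure counts all predicted sites rather than only A-to-G sites, so the second fraction is only a rough guide.

```python
# Counts quoted in the paragraph above (from the paper's supplemental table).
darned_sites = 42_045     # sites in the DARNED database
predicted_sites = 9_636   # predicted sites at the cross-study comparison threshold
shared_a_to_g = 854       # A-to-G differences also present in DARNED


def fraction(shared: int, total: int) -> float:
    """Return the share of `total` that is covered by `shared`."""
    return shared / total


frac_of_darned = fraction(shared_a_to_g, darned_sites)
frac_of_predicted = fraction(shared_a_to_g, predicted_sites)

print(f"{frac_of_darned:.1%} of DARNED sites were recovered")
print(f"{frac_of_predicted:.1%} of predicted sites were found in DARNED")
```

In other words, the shared sites are roughly 2% of DARNED and roughly 9% of the predicted set, which is why I would treat the 854 figure with some caution.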

In general, I think this method does a good job of identifying and filtering likely causes of spurious RNA editing events (like those mentioned in Schrider et al. 2011). For example, the authors use a "double-filtering" strategy to focus on reads with unique alignments (where a conservative threshold is used to define alignments to potential RNA editing sites but more liberal criteria are used to search for homologous regions that could be causing inaccurate alignments). I also liked that most of the in-depth analysis focused on sites with an editing ratio greater than 0.2.
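For readers who have not worked with editing ratios, the cutoff can be sketched as follows. This is only an illustration under my own assumptions: I am taking the editing ratio to be the fraction of reads at a site that carry the edited base (G reads over A plus G reads), and the site names and read counts are made up rather than taken from the paper.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    name: str          # hypothetical site label
    ref_reads: int     # reads supporting the reference base (A)
    edited_reads: int  # reads supporting the edited base (G)


def editing_ratio(site: Site) -> float:
    """Fraction of reads at the site that support the edited base."""
    total = site.ref_reads + site.edited_reads
    return site.edited_reads / total if total else 0.0


def filter_sites(sites: List[Site], min_ratio: float = 0.2) -> List[Site]:
    """Keep sites whose editing ratio is strictly greater than min_ratio."""
    return [s for s in sites if editing_ratio(s) > min_ratio]


# Made-up read counts, purely for illustration.
candidates = [
    Site("site_1", ref_reads=90, edited_reads=10),  # ratio 0.10
    Site("site_2", ref_reads=60, edited_reads=40),  # ratio 0.40
    Site("site_3", ref_reads=0, edited_reads=0),    # no coverage
]

kept = filter_sites(candidates)
print([s.name for s in kept])
```

A cutoff like this trades sensitivity for confidence: low-ratio sites are more easily explained by sequencing or alignment errors.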

This study focused on analysis of the grade IV glioma cell line U87MG (RNA-Seq: GSE28040, DNA-Seq: GSE19986) and a primary breast cancer sample (EGAS00000000054). Although reusing existing data probably allowed for more cost-effective analysis, I wonder if the results would have been even cleaner if the RNA-Seq and DNA-Seq data had both been newly created for this study using similar technologies (for example, the RNA-Seq data are paired-end Illumina reads whereas the DNA-Seq data came from another study using SOLiD reads). However, I think the results were clean enough that this probably didn't matter too much (based upon the ADAR knock-down data).

The novel motif discovery (Figure 5) was interesting, but I had a hard time imagining the relevance of a motif that isn't found at a consistent distance from the A-to-I site (unlike those shown in Figure 4). That said, I would be interested in seeing any follow-up analysis that characterizes the mechanism by which this motif is involved with A-to-I editing.

I think this study only provides very limited analysis of A-to-I editing in cancer. To be fair, the sample size (one sample at a time) is probably not sufficient to make many general claims about A-to-I editing in cancer. However, I still think this aspect of the study was over-emphasized. For example, Supplemental Table 13 shows how sensitive the hypergeometric test (comparing RNA editing sites in the two samples) will be when dealing with such a large background set; all of the RNA editing types except G-to-C were statistically significant with a p-value < 0.05, even though the A-to-G overlap was the only category with more than 5 overlapping sites. In other words, I don't think statistical significance was a strong indicator of biological importance for this analysis. Likewise, it was nice that the enrichment analysis of the NCI Cancer Gene Index genes provided some candidate genes, but I don't think this study is useful in identifying a gene where A-to-I editing is highly likely to play an important role in oncogenesis.
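To illustrate the point about the hypergeometric test, here is a minimal sketch of how a handful of overlapping sites can look highly significant against a very large background. The background size, per-sample site counts, and overlap below are numbers I made up for illustration; they are not the values from Supplemental Table 13, and the function is my own rather than the authors' code.

```python
from math import comb


def hypergeom_upper_tail(overlap: int, background: int, n_a: int, n_b: int) -> float:
    """P(X >= overlap) when n_b sites are drawn without replacement from
    `background` positions, of which n_a are sites in the other sample."""
    denom = comb(background, n_b)
    upper = min(n_a, n_b)
    # Sum exact integer counts first, then divide once to limit rounding error.
    numer = sum(
        comb(n_a, i) * comb(background - n_a, n_b - i)
        for i in range(overlap, upper + 1)
    )
    return numer / denom


# Hypothetical inputs: 1,000,000 candidate positions, 500 sites of one
# editing type in each sample, and only 3 sites shared between samples.
background = 1_000_000
n_a = n_b = 500
overlap = 3

expected_overlap = n_a * n_b / background
p_value = hypergeom_upper_tail(overlap, background, n_a, n_b)

print(f"expected overlap by chance: {expected_overlap}")
print(f"p-value for an overlap of {overlap}: {p_value:.2g}")
```

With these made-up inputs, only 0.25 shared sites are expected by chance, so an overlap of just 3 sites already gives a p-value well below 0.05. That is the sense in which I think the test says more about the size of the background than about biology.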

Overall, I would recommend this article to anyone interested in RNA editing and next-generation sequencing analysis.