Monday, March 14, 2011

Article Review: Epigenetic suppression of the TGF-beta pathway revealed by transcriptome profiling in ovarian cancer

In this paper, Matsumura et al. develop a method to identify methylated genes in ovarian cancer patients using gene expression data from roughly 40 ovarian cancer cell lines and 20 cultured primary tumor samples.  The authors posit that this method provides a unique opportunity to study pathways affected by methylation because it directly examines gene expression.

My overall thoughts on this paper:

  • The study produced a relatively large amount of data, which is now available in GEO.
  • The study utilized a large amount of publicly available data, providing a very useful list of citations for anyone interested in doing bioinformatics analysis on ovarian cancer (especially those interested in methylation).
  • The authors utilized useful open-source tools for pathway analysis (namely GATHER and the specialized binary regression method).

  • I think it is more likely that methylation directly suppresses EMT-related genes (such as those involved with cell adhesion) rather than repressing the TGF-beta pathway (which then regulates EMT genes).
  • Unlike in other cancers, patients with methylated genes do not show a worse prognosis.  In fact, I wouldn't be surprised if patients with methylated genes had a slightly better prognosis because methylation suppresses genes associated with the epithelial-mesenchymal transition (which is associated with progression to a more aggressive cancer).  This hypothesis is also supported by the stromal response data shown in Figure S9.

I think one of the most useful tools discussed in this paper is GATHER, which is very fast and has a simple user interface.  GATHER provides enrichment analysis for information from various databases, such as Gene Ontology, KEGG Pathways, TRANSFAC, and MEDLINE.  More detailed information about the data mined in GATHER can be found in the associated paper by Chang and Nevins.

In fact, GATHER was immediately useful in helping interpret the results of this study.  For example, I used GATHER to check the enrichment for the list of 378 methylated genes described in this paper.  This revealed that the TGF-beta signaling pathway was not the most significantly enriched pathway in the gene list, and that it actually had the smallest number of representative genes in the methylated gene list (out of the significantly enriched pathways).  GATHER was also useful for studying the enrichment of pathways in the more conservative "methyl cluster" gene list (which showed a weaker association with the TGF-beta pathway and a stronger association with other pathways, such as the focal adhesion genes).  These are some of the reasons that I believe methylation directly suppresses EMT-related genes in these ovarian cancer patients (rather than acting through the TGF-beta pathway).
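For readers who have not run an enrichment analysis before, here is a minimal sketch of the general idea: given a gene list, how surprising is its overlap with a pathway?  This uses a plain hypergeometric test, which is a common choice but is my own illustration and not necessarily the statistic that GATHER reports.  The function name and all of the counts in the usage note are hypothetical; only the list size of 378 comes from the paper.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, pathway_genes, list_size, overlap):
    """Upper-tail hypergeometric p-value for pathway enrichment.

    Returns P(X >= overlap), where X is the number of pathway genes seen
    when list_size genes are drawn without replacement from a background
    of total_genes, of which pathway_genes belong to the pathway.
    """
    max_overlap = min(pathway_genes, list_size)
    denominator = comb(total_genes, list_size)
    numerator = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts: a 20,000-gene background, a 100-gene pathway,
    # and 8 pathway genes found in a 378-gene list.
    print(hypergeom_enrichment_p(20000, 100, 378, 8))
```

With a calculation like this, a pathway with only a handful of genes in the list can still come out as significant if the pathway itself is small, which is why I looked at both the significance and the number of representative genes.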

Another useful open-source tool described in the paper is the binary regression method used to define the TGF-beta gene signature.  The binary regression method is especially useful for biologists without a lot of coding experience because it has a MATLAB GUI with a simple, user-friendly interface (and version 2.0 is even better than the original code).  In addition to defining gene and pathway signatures, the Bild lab is also currently using this binary regression algorithm to predict drug sensitivity from patient samples.

That said, there are probably a few things I should warn potential users about before giving this product my complete stamp of approval.  Although I have played around with this tool a little bit (with encouraging results), I haven't had a chance to use it as much as the relatively common R packages for SVMs (in the e1071 package) and classification trees (in the tree package).  Therefore, I can't really comment on the practical limitations of this algorithm.

I was also a little bit nervous when I saw that Anil Potti (who I mentioned in my previous blog post) was one of the authors on the original Nature paper by Bild et al. for the binary regression method.  However, Potti wasn't involved with the early framework for this method (described by West et al.), and a retraction request for one of the retracted Potti papers states "although we believe that the underlying approach to developing predictive signatures is valid, a corruption of several validation data sets precludes conclusions regarding these signatures."  Therefore, I don't think Anil Potti had any negative influence on the binary regression method.

Overall, I found this paper to be useful and informative, and I would recommend it for anyone interested in microarray analysis.

Thursday, March 10, 2011

Retractions in PubMed

For those who don't know, PubMed lists retractions (in addition to the standard stuff like articles, editorials, etc.).  The details regarding how PubMed decides when to flag retractions are provided here.

With retractions on the rise, I think PubMed retraction listings can play a useful role in helping hold authors accountable for publishing misleading information.  For example, I hope NIH reviewers check PubMed to watch out for PIs with tainted records.
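For anyone who wants to run this kind of check programmatically, here is a minimal sketch that builds a search URL for NCBI's E-utilities ESearch service, restricted to records carrying the "Retracted Publication" publication type.  The helper name and the example author string are my own assumptions, and the sketch only builds the URL (it does not send the request).  A search like this can only find records that PubMed has actually flagged.

```python
from urllib.parse import urlencode

# Base URL for the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def retraction_search_url(author, retmax=20):
    """Build an ESearch URL for an author's flagged retracted papers.

    The search term combines an author field tag with the
    "Retracted Publication" publication type.
    """
    term = f'{author}[Author] AND "Retracted Publication"[Publication Type]'
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # the query string built above
        "retmax": retmax,    # maximum number of PMIDs to return
        "retmode": "json",   # ask for a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical usage: the author string is only an example.
    print(retraction_search_url("Potti A"))
```

The same search term can also be pasted directly into the PubMed search box if you would rather not write any code.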

However, I noticed an inconsistency today that made me curious about how these retractions are annotated.

For example, take a look at these two Anil Potti papers that have been retracted:

This is how retractions should look:

However, this other paper doesn't have the retraction designation, even though there is already a separate PubMed entry for this paper's retraction:

There is a 2007 erratum mentioned for the New England Journal of Medicine paper but no retraction flag (as could be seen clearly for the Nature Medicine paper).

If anyone can provide additional information on this topic, then I would certainly appreciate it.
My Biomedical Informatics Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.