The initial analysis of the TCGA data for ovarian cancer was published in Nature this week. The Cancer Genome Atlas (TCGA) is a joint project by the NCI and NHGRI to study the genomic changes associated with many different types of cancer. It collects a large number of patient samples for analysis using mRNA gene expression microarrays, copy number arrays, methylation arrays, miRNA microarrays, and exome sequencing. The data can be freely downloaded using the TCGA Data Portal.
There is a huge amount of information presented in this paper. For comparison, the first TCGA paper provided an overview of the glioblastoma data, mostly focusing on somatic mutations and copy number alterations. There have been a number of subsequent papers studying the glioblastoma data, and the ones that I am most familiar with focused on subtypes defined by gene expression patterns (Verhaak et al. 2010) and methylation patterns (Noushmehr et al. 2010). The new ovarian cancer TCGA paper provides all of the types of information provided in the glioblastoma Nature paper, in addition to the kind of subtyping analysis that was covered for glioblastoma in multiple high-impact, highly cited papers.
I think one of the most important take-home messages was the central role of p53 in ovarian cancer. For example, 96% of high-grade tumors showed p53 mutations, a finding that has also been reported previously in publications such as Ahmed et al. 2010. In contrast, the TCGA glioblastoma paper showed p53 mutation rates of 38% and 58% for untreated and treated tumors, respectively. Interestingly, the ovarian cancer TCGA paper also revealed that the high rate of p53 mutation in ovarian cancers contributes to FOXM1 overexpression. This came from using PARADIGM to identify pathway alterations in the new TCGA data (where pathways were defined using the NCI Pathway Interaction Database).
Another striking result was how consistent the copy-number alterations were within each cancer type (ovarian tumors or glioblastomas) but how different they were between the two cancer types (as shown in Figure 1a).
Although I was impressed that the study defined separate subtypes for mRNA gene expression, miRNA expression, and CpG methylation status, I had mixed feelings about the results. For example, the subtypes defined by methylation only had "modest stability" (so, they have limited predictive power), and I thought the overlap between the mesenchymal mRNA subtype and the C2 miRNA subtype (and between the proliferative mRNA subtype and the C1 miRNA subtype) was overemphasized. I was also a little disappointed that the integrative analysis didn't substantially enhance the subtype definitions (for example, I think Figure S6.4 in the ovarian cancer paper looks less impressive than Figure 3 in Verhaak et al.). However, I did find it interesting that both the glioblastoma and ovarian cancers had a "mesenchymal" subtype (although I don't think these subtypes necessarily have the same biological meaning), and I think it will definitely be interesting to further characterize the subtypes defined based upon mRNA gene expression.
I was somewhat surprised at how much the survival curves varied across the 4 data sets shown in Figure 2c. For example, the TCGA test set (N = 255) and the data from Tothill et al. 2008 (N = 237) had very different Cox p-values (0.02 and 0.00008, respectively), even though the sample sizes are similar. Nevertheless, it is not trivial to get a statistically significant result in 4 independent data sets, and I think the survival results are certainly strong enough to warrant further investigation in order to understand the cause of this variation.
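For readers who want to look at this kind of variation themselves, here is a minimal sketch of how a survival curve can be estimated from follow-up data. It assumes the curves are Kaplan-Meier estimates, which is the usual choice for this type of figure. The function name `kaplan_meier` and the toy data are my own and purely illustrative; they are not taken from the paper or from TCGA.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients who survive time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both drop out of the risk set
        at_risk -= leaving[t]
    return curve


# Toy data, invented for illustration (not TCGA values)
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)

for t, s in toy_curve:
    print(f"t = {t:>2}  S(t) = {s:.3f}")
```

Running the same function on the high-risk and low-risk groups of each data set would give curves that can be compared side by side. A Cox model or log-rank test would still be needed to attach a p-value to the difference.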
Overall, I would consider this a must-read for any bioinformatician interested in cancer research.