EMT is the dominant program in human colon cancer (pdf)

Article PDF: http://www.biomedcentral.com/content/pdf/1755-8794-4-9.pdf

Open Access

Andre Loboda1, Michael V Nebozhyn1, James W Watters1, Carolyne A Buser3, Peter Martin Shaw2, Pearl S Huang3, Laura Van't Veer7, Rob AEM Tollenaar8, David B Jackson6, Deepak Agrawal5, Hongyue Dai4, Timothy J Yeatman5*

Background: Colon cancer has been classically described by clinicopathologic features that permit the prediction of outcome only after surgical resection and staging.

Methods: We performed an unsupervised analysis of microarray data from 326 colon cancers to identify the first principal component (PC1) of the most variable set of genes. PC1 deciphered two primary, intrinsic molecular subtypes of colon cancer that predicted disease progression and recurrence.

Results: Here we report that the most dominant pattern of intrinsic gene expression in colon cancer (PC1) was tightly correlated (Pearson R = 0.92, P < 10^-135) with the EMT signature both in gene identity and directionality. In a global micro-RNA screen, we further identified the most anti-correlated microRNA with PC1 as MiR200, known to regulate EMT.

Conclusions: These data demonstrate that the biology underpinning the native, molecular classification of human colon cancer, previously thought to be highly heterogeneous, was clarified through the lens of comprehensive transcriptome analysis.

Background

Colon cancer has long been postulated to be a molecularly heterogeneous disease. This heterogeneity has been proposed as the reason why it has been difficult to identify unifying molecular hypotheses explaining the biology and behavior of the disease. Molecular profiling of colon cancer has been a relatively effective approach for identifying prognosis of early and intermediate stage disease. We and others have identified biologically complex signatures that affect multiple programs such as adhesion, invasion, and angiogenesis and correlate well with cancer progression and recurrence.
These signatures appear to support Weinberg's hypothesis [1] of multiple programs leading to cancer development and progression. These signatures have generally been developed using supervised machine learning techniques that train their models on pre-determined good vs. poor prognosis patient populations [2-6]. Colon cancer, unlike breast cancer where luminal and basal intrinsic subtypes have been identified [7-13], or bladder cancer where intrinsic signatures of recurrence have been established [14,15], has yet to be classified by unsupervised, molecular profiling approaches. We believed it was important to attempt to uncover unbiased, native biological traits that might underpin colon cancer.

* Correspondence: 5Moffitt Cancer Center, 12902 Magnolia Drive, Tampa, FL 33612, USA. Full list of author information is available at the end of the article.

Methods

Colon Cancer Samples

326 human colon cancer samples derived from the Moffitt Cancer Center were previously assessed using a single Affymetrix U133Plus2.0 platform and a single standard operating procedure. Formalin-fixed, paraffin-embedded (FFPE) blocks were obtained for 69 of these cases and used to extract tumor RNA after macrodissection. Tumor RNA was submitted for global microRNA analysis using an Applied Biosystems platform covering ~700 unique microRNA species. The gene expression data were then compared directly to the microRNA data derived from the same samples. All patient samples and clinical information for the 326 colon samples were obtained through a protocol approved by The University of South Florida Institutional Review Board.

Identification of the cell line derived EMT signature

The EMT signature was derived from a microarray dataset of 93 lung cancer cell lines by performing a t-test comparing cell lines exhibiting a mesenchymal-like gene expression pattern (high levels of VIM and low levels of CDH1) vs. cell lines with an epithelial-like gene expression pattern (low levels of VIM and high levels of CDH1).
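The mesenchymal-like vs. epithelial-like comparison can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' code. The text does not say how "high" and "low" VIM/CDH1 levels were defined, so the median split used here is an assumption, as are the toy expression values and the placeholder gene names GENE_A and GENE_B. Only the Welch t statistic is computed; converting it to a p-value would additionally require a t-distribution CDF from a statistics library.

```python
import numpy as np

def split_by_emt_markers(expr, genes):
    """Label cell lines as mesenchymal-like or epithelial-like.

    ASSUMPTION: "high"/"low" is decided by a median split on VIM and CDH1;
    the source does not state the actual cutoff.
    expr has one row per cell line and one column per gene.
    """
    vim = expr[:, genes.index("VIM")]
    cdh1 = expr[:, genes.index("CDH1")]
    mesenchymal = (vim > np.median(vim)) & (cdh1 < np.median(cdh1))
    epithelial = (vim < np.median(vim)) & (cdh1 > np.median(cdh1))
    return mesenchymal, epithelial

def welch_t(group_a, group_b):
    """Per-gene Welch t statistic (group_a minus group_b)."""
    mean_diff = group_a.mean(axis=0) - group_b.mean(axis=0)
    se_a = group_a.var(axis=0, ddof=1) / group_a.shape[0]
    se_b = group_b.var(axis=0, ddof=1) / group_b.shape[0]
    return mean_diff / np.sqrt(se_a + se_b)

# Toy log-expression matrix: 6 hypothetical cell lines x 4 genes.
genes = ["VIM", "CDH1", "GENE_A", "GENE_B"]
expr = np.array([
    [10.0, 3.0, 8.0, 5.0],   # mesenchymal-like: high VIM, low CDH1
    [11.0, 2.5, 8.5, 5.2],
    [10.5, 3.2, 7.9, 4.9],
    [4.0, 9.0, 4.1, 5.1],    # epithelial-like: low VIM, high CDH1
    [3.5, 9.5, 3.8, 5.0],
    [4.2, 8.8, 4.0, 4.8],
])

mesenchymal, epithelial = split_by_emt_markers(expr, genes)
t_stats = welch_t(expr[mesenchymal], expr[epithelial])

for gene, t in zip(genes, t_stats):
    print(f"{gene}: t = {t:.2f}")
```

In this toy example GENE_A tracks the mesenchymal-like group and receives a large positive statistic, while GENE_B does not differ between groups and stays near zero.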
Genes with p-value < 0.01 by t-test were selected and split into those that were up-regulated in mesenchymal-like cell lines and those that were up-regulated in epithelial-like cell lines. Each of the up- and down-regulated gene sets was further restricted to approximately 200 unique gene symbols based on the absolute value of the fold change.

Identification of PC1

Unsupervised analysis of the most variable genes expressed in the colon cancer data set (n = 326) was undertaken to discover new, intrinsic biology of colon cancer. Principal component analysis was performed on the entire gene expression data set of 326 CRC samples, as implemented in the Princomp function in Matlab (Mathworks Inc.). The first principal component (PC1), corresponding to the highest eigenvalue of the covariance matrix, was selected as describing the inherent variability of the data.

Derivation of colon signatures

We identified a collection of gene sets that were associated with different endpoints related to tumor histology. Signatures were created for each of the following comparisons: right/left (RT/LT) colon, by comparing 60 samples collected in RT colon vs. 18 samples collected in LT colon; mucinous/non-mucinous colon carcinoma, by comparing 35 mucinous colon carcinomas vs. 165 non-mucinous; MSI/MSS, by comparing 6 MSI vs. 73 MSS samples; carcinoma vs. adenoma, by comparing 22 pure adenocarcinoma samples vs. 5 pure adenomas; poor/well differentiation, by comparing 32 poorly differentiated samples vs. 19 well differentiated; colon/rectum, by comparing 50 samples collected in colon vs. 19 samples collected in rectum; Stage 2/Stage 1, by comparing 59 stage 2 samples vs. 32 stage 1 samples; and Stage 3/Stage 2, by comparing 71 stage 3 samples vs. 59 stage 2 samples. Each comparison was carried out on non-metastatic samples with known stage, histology, and collection site.
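For readers who want to see the PC1 step from "Identification of PC1" concretely, here is a minimal NumPy sketch of taking the eigenvector of the covariance matrix with the highest eigenvalue. It is not the authors' Matlab code. The function name, the synthetic data, and the matrix sizes are all assumptions made for illustration.

```python
import numpy as np

def first_principal_component(expr):
    """Return (loadings, scores, explained_fraction) for PC1.

    expr has one row per sample and one column per gene (or probe).
    """
    centered = expr - expr.mean(axis=0)          # center each gene
    cov = np.cov(centered, rowvar=False)         # genes x genes covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    loadings = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
    scores = centered @ loadings                 # per-sample PC1 score
    explained = eigvals[-1] / eigvals.sum()      # fraction of variance on PC1
    return loadings, scores, explained

# Synthetic data (ASSUMPTION): one hidden factor drives four of six genes,
# two up and two down, plus a small amount of noise.
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 6
latent = rng.normal(size=n_samples)
weights = np.array([2.0, 1.5, -1.5, -2.0, 0.0, 0.0])
expr = np.outer(latent, weights) + 0.1 * rng.normal(size=(n_samples, n_genes))

loadings, scores, explained = first_principal_component(expr)
print("PC1 loadings:", np.round(loadings, 2))
print("variance explained by PC1:", round(float(explained), 3))
```

The sign of a principal component is arbitrary, so loadings and scores may come out flipped relative to the hidden factor. For a full array with tens of thousands of probes, a singular value decomposition of the centered matrix gives the same component without forming the genes-by-genes covariance matrix.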
For each comparison, two gene sets (up- and down-regulated) were identified: probes with t-test p-value < 0.01 were split by the sign of the fold change, and unique gene symbols were selected from among the 100 probes most differentially expressed by absolute value of fold change. Performance of these gene sets was evaluated by back substitution. The score for each comparison was computed as the mean of the probes that mapped by gene symbol to the up-regulated subset minus the mean of the probes that mapped by gene symbol to the down-regulated subset. The gene sets were found to have ROC AUC > 0.7 and 1-way ANOVA p-value < 1e-6 when applied to distinguish the same samples that were used to identify them.

Scoring of signatures in the data set

The signature score for a given gene set was obtained by averaging the expression levels of the probes that mapped by gene symbol to that gene set. MYC and RAS signatures were obtained from Nevins et al. [16,17].

Standard microarray data processing

The microarray data was processed by running RMA normalization method as implemented in Affymetrix Power Tools usin (...truncated)