Integrated Analysis of Copy Number Variations and Gene Expression Profiling in Hepatocellular carcinoma
www.nature.com/scientificreports
Received: 17 March 2017
Accepted: 18 August 2017
Published: xx xx xxxx
Chenhao Zhou1,3, Wentao Zhang1,3, Wanyong Chen1,2,3, Yirui Yin1,3, Manar Atyah1,3, Shuang
Liu1,3, Lei Guo1,3, Yi Shi4, Qinghai Ye1,3, Qiongzhu Dong2,5 & Ning Ren1,2,3
Hepatocellular carcinoma (HCC) is one of the top three cancer killers worldwide. To identify CNV-driven
differentially expressed genes (DEGs) in HBV-related HCC, this study integrated the analysis of copy number
variations (CNVs) and gene expression profiling. Significant genes in CNV regions were overlapped
with those obtained from expression profiling. We obtained 93 CNV-driven genes with increased expression
in duplicated regions and 45 with decreased expression in deleted regions;
these duplications and deletions were located mainly on chromosomes 1 and 4. Functional and
pathway enrichment analyses, performed using DAVID and KOBAS, respectively, showed that these genes were mainly
enriched in metabolic processes and the cell cycle. A protein-protein interaction (PPI) network was constructed
with Cytoscape, and four hub genes were identified. Subsequent survival analyses indicated that only high
NPM1 expression was significantly and independently associated with worse survival and increased
recurrence in HCC patients. Moreover, this correlation remained significant in patients with early-stage
HCC. In addition, we showed that NPM1 was overexpressed in HCC cells and in HCC tissues versus adjacent
non-tumor tissues. In conclusion, these results show that integrated analysis of genomic and
expression profiling may be a powerful approach for identifying CNV-driven genes in the pathogenesis of
HBV-related HCC.
Hepatocellular carcinoma (HCC) is one of the top three types of fatal cancer in China and the world1, 2. In China,
the high incidence of HCC is mainly attributed to the prevalence of hepatitis virus infection, especially hepatitis B
virus (HBV). Non-alcoholic fatty liver disease and alcoholic liver disease are also risk factors that drive the
development of HCC3. However, a lack of knowledge regarding the precise molecular mechanisms underlying HCC
progression limits the ability to treat HCC effectively.
Copy number variations (CNVs) are DNA segments larger than 1 kb whose copy number differs from that of
a reference genome; they can lead to activation of oncogenes and inactivation of tumor suppressor genes in cancers4, 5. CNVs can affect gene expression and are related to disease susceptibility. Several studies
have shown that duplication or deletion of a genomic region affects the expression of genes and cancer-related biological
processes6. Duplications of chromosomes 1, 7, 8 and 20 and deletions of chromosomes 4, 8, 13 and 17 have been
identified in HCC through traditional methods. For example, the CNV of chromosome 13q might be
used to monitor the progression of chronic hepatitis-associated liver carcinogenesis7–10.
Gene expression profiling by microarray analysis has been shown to be a powerful tool for the identification of
cancer-related genes. However, such analyses typically yield a large number of differentially expressed genes (DEGs).
Hence, a key challenge in analyzing gene expression profiles is to determine which
DEGs are critical to the neoplastic process (“driver genes”) and which are not (“passenger genes”)11.
1Department of Liver Surgery, Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China.
2Institute of Fudan-Minhang Academic Health System, Minhang Hospital, Zhongshan Hospital, Fudan University, Shanghai, China.
3Key Laboratory of Carcinogenesis and Cancer Invasion, Ministry of Education, Shanghai, China.
4Biomedical Research Centre, Zhongshan Hospital, Fudan University, Shanghai, China.
5Institutes of Biomedical Sciences, Fudan University, Shanghai, China.
Chenhao Zhou, Wentao Zhang and Wanyong Chen contributed equally to this work. Correspondence and requests for materials should be addressed to Q.D. (email: . cn) or N.R. (email: )
Scientific Reports | 7: 10570 | DOI:10.1038/s41598-017-11029-y
Figure 1. (a) Distributions of copy number deletions in the chromosomes. (b) Distributions of copy number
duplications in the chromosomes. (c) Hierarchical clustering of gene expression profiling. Samples are indicated
along the horizontal axis and grouped by the color bar above the heat map. Blue represents non-tumor tissue
and red represents tumor tissue.
Several studies have integrated the analysis of CNVs and gene expression profiling in
HCC, but they were limited by small numbers of tumor samples or relatively low-resolution platforms12, 13.
In this study, we applied a whole-genome SNP 6.0 array to analyze CNVs in 33 paired HBV-related HCC
and non-tumor tissues. The gene expression profiling data were obtained from our previous studies (GSE14520).
By integrating the analysis of CNVs and gene expression profiling to identify CNV-driven DEGs, we aim to shed
further light on the development of HBV-related HCC at the molecular level and to explore clinically useful candidate
genes for diagnosis, prognosis, and drug targeting.
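The integration idea can be illustrated with a minimal sketch. It assumes, following the abstract, that a CNV-driven DEG is a gene that is both duplicated and upregulated, or both deleted and downregulated. The function name and the placeholder gene names are hypothetical and are not taken from the study.

```python
def cnv_driven_genes(duplicated, deleted, upregulated, downregulated):
    """Return (duplicated-and-upregulated, deleted-and-downregulated) gene sets.

    Assumption: "CNV-driven" means the copy number change and the
    expression change point in the same direction.
    """
    gained_and_up = set(duplicated) & set(upregulated)
    lost_and_down = set(deleted) & set(downregulated)
    return gained_and_up, lost_and_down


# Hypothetical toy inputs; gene names are placeholders, not study results.
duplicated = {"GENE_A", "GENE_B", "GENE_C"}
deleted = {"GENE_D", "GENE_E"}
upregulated = {"GENE_A", "GENE_C", "GENE_E"}
downregulated = {"GENE_B", "GENE_D"}

gained_and_up, lost_and_down = cnv_driven_genes(
    duplicated, deleted, upregulated, downregulated
)
print(sorted(gained_and_up))  # ['GENE_A', 'GENE_C']
print(sorted(lost_and_down))  # ['GENE_D']
```

In this toy example GENE_B (duplicated but downregulated) and GENE_E (deleted but upregulated) are excluded because their expression change does not match the direction of the copy number change.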
Results
Identification of significant CNVs in HCC genomes. We analyzed the hybridization signal intensities of
33 paired HCC and non-tumor tissues to identify CNV regions. A total of 13,839 CNVs were identified in the
33 HCC genomes, including 5,457 copy number deletions (mean size, 349.1 kb) and 8,382 copy number duplications (mean size, 419.0 kb) (Supplementary Table 1). CNVs were scattered across chromosomes 1 to 22, and the
highest numbers of both duplications and deletions were found on chromosome 1. The second highest numbers of duplications
and deletions were found on chromosomes 5 and 4, respectively. HCC genomes had a mean of 419 CNVs,
and copy number duplications were more common than deletions (1.5:1). Regions longer than
100 kb had the most copy number deletions and duplications (Fig. 1a,b).
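The summary figures in this paragraph follow directly from the reported counts. The short check below uses only numbers stated in the text; the variable names are illustrative.

```python
# Counts reported in the text.
n_genomes = 33
n_deletions = 5_457
n_duplications = 8_382

total_cnvs = n_deletions + n_duplications
mean_per_genome = total_cnvs / n_genomes
dup_to_del = n_duplications / n_deletions

print(total_cnvs)              # 13839
print(round(mean_per_genome))  # 419
print(round(dup_to_del, 1))    # 1.5
```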
To find potentially HCC-related significant CNVs, we further filtered CNVs using the following criterion: the
CNV gene had to be present in at least 10% of samples (4 samples). Accordingly, a total of 2,912 significant CNV genes
were obtained, including 875 deletions and 2,037 duplications (Supplementary Tables 2 and 3).
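The recurrence criterion can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: it assumes that CNV calls are available as (sample, gene, CNV type) records and that a gene passes when a CNV of the same type affects it in at least 10% of samples. The function names, record layout, and gene names are hypothetical.

```python
from collections import defaultdict


def min_samples_required(n_samples, min_percent=10):
    # Ceiling of n_samples * min_percent / 100 in integer arithmetic,
    # which avoids floating-point rounding surprises.
    return -(-n_samples * min_percent // 100)


def recurrent_cnv_genes(calls, n_samples, min_percent=10):
    """Keep (gene, cnv_type) pairs seen in enough distinct samples.

    calls: iterable of (sample_id, gene, cnv_type) tuples (assumed layout).
    """
    threshold = min_samples_required(n_samples, min_percent)
    samples_by_gene = defaultdict(set)
    for sample_id, gene, cnv_type in calls:
        samples_by_gene[(gene, cnv_type)].add(sample_id)
    return {
        key for key, samples in samples_by_gene.items()
        if len(samples) >= threshold
    }


# Hypothetical toy calls: GENE_A duplicated in 4 samples, GENE_B deleted in 3.
calls = [(f"S{i}", "GENE_A", "duplication") for i in range(4)]
calls += [(f"S{i}", "GENE_B", "deletion") for i in range(3)]

print(min_samples_required(33))                  # 4
print(recurrent_cnv_genes(calls, n_samples=33))  # {('GENE_A', 'duplication')}
```

With 33 samples, 10% corresponds to 3.3, which rounds up to the 4-sample threshold stated in the text.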
Analysis of gene expression profiling. A total of 30 paired HCC and
non-tumor tissues from our previous study (GSE14520) were analyzed. The baseline characteristics of these patients were similar to those of the 33 patients whose paired HCC and non-tumor tissues were analyzed with the whole-genome
SNP 6.0 array (Table 1). All patients in these two cohorts were infected with hepatitis B virus. In total, 965 genes
were differentially expressed by at least two-fold with statistic (...truncated)