Pancan-meQTL: a database to systematically evaluate the effects of genetic variants on methylation in human cancer

Nucleic Acids Research, Jan 2019

DNA methylation is an important epigenetic mechanism for regulating gene expression. Aberrant DNA methylation has been observed in various human diseases, including cancer. Single-nucleotide polymorphisms can contribute to tumor initiation, progression and prognosis by influencing DNA methylation, and DNA methylation quantitative trait loci (meQTL) have been identified in physiological and pathological contexts. However, no database has been developed to systematically analyze meQTLs across multiple cancer types. Here, we present Pancan-meQTL, a database to comprehensively provide meQTLs across 23 cancer types from The Cancer Genome Atlas by integrating genome-wide genotype and DNA methylation data. In total, we identified 8 028 964 cis-meQTLs and 965 050 trans-meQTLs. Among these, 23 432 meQTLs are associated with patient overall survival times. Furthermore, we identified 2 214 458 meQTLs that overlap with known loci identified through genome-wide association studies. Pancan-meQTL provides a user-friendly web interface (http://bioinfo.life.hust.edu.cn/Pancan-meQTL/) that is convenient for browsing, searching and downloading data of interest. This database is a valuable resource for investigating the roles of genetics and epigenetics in cancer.

Nucleic Acids Research, 2019, Vol. 47, Database issue, D1066–D1072
doi: 10.1093/nar/gky814
Published online 7 September 2018

Jing Gong1,*, Hao Wan1, Shufang Mei1, Hang Ruan2, Zhao Zhang2, Chunjie Liu3, An-Yuan Guo3, Lixia Diao4,*, Xiaoping Miao1,* and Leng Han2,*

1 Department of Epidemiology and Biostatistics, Key Laboratory of Environmental Health of Ministry of Education, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, PR China
2 Department of Biochemistry and Molecular Biology, The University of Texas Health Science Center at Houston McGovern Medical School, Houston, TX 77030, USA
3 Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
4 Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA

Received July 23, 2018; Revised August 22, 2018; Editorial Decision August 30, 2018; Accepted August 30, 2018

* To whom correspondence should be addressed. Tel: +86 27 8365 0744.
Correspondence may also be addressed to Xiaoping Miao. Tel: +86 27 8365 0744.
Correspondence may also be addressed to Lixia Diao.
Correspondence may also be addressed to Leng Han. Tel: +1 713 500 6039.

© The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION

The interpretation of the function of genomic variants, particularly in non-coding regions, is a major challenge for the genetic dissection of complex diseases such as cancer (1). Genome-wide association studies (GWAS) have identified numerous genetic loci that influence the risk of human cancer (2,3), but most of these loci are located in non-coding regions and lack a clear molecular mechanism linking them to the phenotypic outcome. Previous studies considered a diverse set of functional regions, including miRNA binding sites, protein modification sites and transcription factor binding sites (4,5). However, the link between variants and epigenetic signals involved in the regulation of key biological processes has been largely overlooked.

As a major epigenetic mechanism that directs gene expression, DNA methylation plays a key role in the regulation of crucial biological and pathological processes (6). Aberrant DNA methylation is frequently observed in various cancers (7) and represents an attractive biomarker and therapeutic target (8,9). Increasing evidence indicates that single-nucleotide polymorphisms (SNPs) contribute to tumor initiation, progression and prognosis by influencing DNA methylation levels (10,11). Therefore, DNA methylation may be an important molecular-level phenotype that links a genotype with the trait of a complex disease. It is essential to build a public data repository to identify SNPs that significantly affect DNA methylation levels, i.e. methylation quantitative trait loci (meQTL). Recent methodological advances allow for genome-wide screening of meQTLs in different tissues, including blood (12), lung ...

DATA COLLECTION AND PROCESSING

Genotype data collection, imputation and processing

We downloaded genotype data (level 2) from the TCGA data portal (https://portal.gdc.cancer.gov/) (Figure 1A). We kept 7735 samples with both genotype data and methylation data. We then combined colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ) as colorectal cancer (CRC) (15) and removed cancer types with fewer than 100 primary tumor samples. Thus, for further analysis, we had 7242 samples across 23 cancer types. We performed genotype imputation and filtering per cancer type as described in our previous study (16). After imputation and quality filtering, on average, 4 318 218 genotypes per cancer type were included in the meQTL analysis.

Methylation data collection and processing

Methylation beta values (level 3) obtained from the TCGA data portal (https://gdc-portal.nci.nih.gov/) were measured by the Illumina Infinium HumanMethylation450 BeadChip array, which contained 485 512 probes for each sample.
Due to the specific nature of methylation patterns on sex chromosomes (17), we focused on autosomes. In each cancer type, probes were removed if they met any of the following criteria: (i) methylation beta value missing rate > 0.05, (ii) mapping to multiple locations on the genome (18) and (iii) containing a known SNP (1000 Genomes Phase 3 (19), MAF > 0.01) at CpG sites (20,21) (Figure 1B). On average, 369 244 high-quality methylation probes per cancer type were used for analyses. To minimize the effects of outliers on the regression scores, the values for each probe across samples per cancer type were transformed into a standard normal distribution based on rank (17,22,23).

Covariates

To correct for known and unknown confounders and increase the sensitivity of our analyses, we included several covariates. The top five principal components calculated by smartpca in the EIGENSOFT program (24) were included to control for ethnicity differences. To remove hidden batch effects and other confounders in the methylation data, we used PEER software (25) to select the first 15 PEER factors from the (...truncated)
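
The missing-rate filter and the rank-based transformation described in the methylation processing step above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the use of None for a missing beta value, the average-rank handling of ties and the (rank - 0.5) / n quantile convention are all assumptions made for illustration. Only the 0.05 missing-rate threshold and the idea of mapping ranks to a standard normal distribution come from the text.

```python
from statistics import NormalDist

MAX_MISSING_RATE = 0.05  # threshold stated in the text
RANK_OFFSET = 0.5        # assumed convention: quantile = (rank - 0.5) / n


def missing_rate(values):
    """Fraction of samples with no beta value (None) for one probe."""
    return sum(v is None for v in values) / len(values)


def passes_missing_filter(values, max_rate=MAX_MISSING_RATE):
    """A probe is dropped only when its missing rate exceeds max_rate."""
    return missing_rate(values) <= max_rate


def rank_inverse_normal(values):
    """Map one probe's beta values to standard-normal quantiles by rank.

    Missing entries (None) stay None; tied values share their average rank.
    """
    observed = sorted((v, i) for i, v in enumerate(values) if v is not None)
    n = len(observed)
    result = [None] * len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    start = 0
    while start < n:
        # Extend the window over all values tied with observed[start].
        end = start
        while end + 1 < n and observed[end + 1][0] == observed[start][0]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tie group
        z = std_normal.inv_cdf((avg_rank - RANK_OFFSET) / n)
        for _, idx in observed[start:end + 1]:
            result[idx] = z
        start = end + 1
    return result


if __name__ == "__main__":
    betas = [0.10, 0.80, None, 0.45, 0.30]  # made-up values for one probe
    print(passes_missing_filter(betas))      # 1 of 5 missing, so the probe fails
    print(rank_inverse_normal(betas))
```

Because the transformed values depend only on ranks, an extreme beta value cannot pull the regression any harder than the most extreme rank allows, which is the outlier protection the text refers to.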

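The cohort assembly described under genotype data collection (merging COAD and READ into CRC, then dropping cancer types with fewer than 100 primary tumor samples) can be sketched roughly as follows. The function name and the input shape (one cancer type code per sample that has both genotype and methylation data) are hypothetical, and the counts in the usage example are invented; they are not TCGA sample sizes.

```python
from collections import Counter

MIN_SAMPLES = 100  # cancer types below this size are dropped, per the text
MERGED_TYPES = {"COAD": "CRC", "READ": "CRC"}  # colon + rectum -> colorectal


def assemble_cohorts(sample_types, min_samples=MIN_SAMPLES):
    """Return {cancer_type: n_samples} after merging and size filtering.

    sample_types: iterable of cancer type codes, one entry per primary
    tumor sample that has both genotype and methylation data.
    """
    # Relabel COAD and READ as CRC before counting, leave other codes as-is.
    counts = Counter(MERGED_TYPES.get(code, code) for code in sample_types)
    return {code: n for code, n in counts.items() if n >= min_samples}


if __name__ == "__main__":
    # Invented counts, for illustration only.
    toy = ["COAD"] * 70 + ["READ"] * 40 + ["BRCA"] * 150 + ["CHOL"] * 30
    print(assemble_cohorts(toy))
```

Merging before filtering matters in this sketch: in the toy input neither COAD nor READ reaches 100 samples alone, but the combined CRC cohort does.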

This is a preview of a remote PDF: https://academic.oup.com/nar/article-pdf/47/D1/D1066/27436283/gky814.pdf
Article home page: https://academic.oup.com/nar/article/47/D1/D1066/5091954

Gong, Jing, Wan, Hao, Mei, Shufang, Ruan, Hang, Zhang, Zhao, Liu, Chunjie, Guo, An-Yuan, Diao, Lixia, Miao, Xiaoping, Han, Leng. Pancan-meQTL: a database to systematically evaluate the effects of genetic variants on methylation in human cancer, Nucleic Acids Research, 2019, pp. D1066-D1072, Volume 47, Issue D1, DOI: 10.1093/nar/gky814