Integrated analyses of gene expression and genetic association studies in a founder population

Human Molecular Genetics, May 2016

Genome-wide association studies (GWASs) have become a standard tool for dissecting genetic contributions to disease risk. However, these studies typically require extraordinarily large sample sizes to be adequately powered. Strategies that incorporate functional information alongside genetic associations have proved successful in increasing GWAS power. Following this paradigm, we present the results of 20 different genetic association studies for quantitative traits related to complex diseases, conducted in the Hutterites of South Dakota. To boost the power of these association studies, we collected RNA-sequencing data from lymphoblastoid cell lines for 431 Hutterite individuals. We then used Sherlock, a tool that integrates GWAS and expression quantitative trait locus (eQTL) data, to identify weak GWAS signals that are also supported by eQTL data. Using this approach, we found novel associations with quantitative phenotypes related to cardiovascular disease, including carotid intima-media thickness, left atrial volume index, monocyte count and serum YKL-40 levels.

Human Molecular Genetics, 2016, Vol. 25, No. 10, 2104–2112
doi: 10.1093/hmg/ddw061
Advance Access Publication Date: 29 February 2016
Association Studies Article

1 Department of Human Genetics and 2 Department of Medicine, Section of Cardiology, University of Chicago, Chicago, IL 60637, USA, 3 Division of Biology and Medicine, Brown University, Providence, RI 02912, USA and 4 Pulmonary and Critical Care, Yale School of Medicine, New Haven, CT 06519, USA

* To whom correspondence should be addressed at: Department of Human Genetics, University of Chicago, 920 E. 58th St CLSC 431F, Chicago, IL 60637, USA. Tel: +1 773 702 5898; Fax: +1 773 834 0505; Email:
Darren A. Cusanovich1,†,‡, Minal Caliskan1,‡, Christine Billstrand1, Katelyn Michelini1, Claudia Chavarria1, Sherryl De Leon1, Amy Mitrano1, Noah Lewellyn1, Jack A. Elias3, Geoffrey L. Chupp4, Roberto M. Lang2, Sanjiv J. Shah2,§, Jeanne M. Decara2, Yoav Gilad1 and Carole Ober1,*

† Present address: Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
‡ D.A.C. and M.C. contributed equally.
§ Present address: Division of Cardiology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA.

Received: August 15, 2015. Revised: February 12, 2016. Accepted: February 21, 2016

© The Author 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact

Introduction

Genome-wide association studies (GWASs) have become the gold standard for assessing the genetic underpinnings of complex traits in human populations (1). These studies are easily scalable, limited primarily by financial or practical challenges related to measuring the trait of interest in large samples of subjects. However, an obstacle commonly faced by GWAS is that the modest influence of genetic variation at any particular locus makes it difficult to identify variants with statistically significant associations. In light of this, various strategies have been adopted to increase the power of GWAS, including: increasing sample sizes [e.g. (2)], combining studies through meta-analysis [e.g. (3)], studying intermediate phenotypes [e.g. (4)] and integrating results with independent functional data sets [e.g. (5)]. While (...truncated)

Results

GWAS for 20 quantitative traits

To identify loci associated with 20 quantitative traits that are known or potential risk factors for asthma and/or CVD, we conducted a GWAS for each of the 20 traits (Table 1 and Supplementary Material, Tables S1 and S2). These studies ranged in size from 263 to 788 subjects and included 387 345–396 968 single-nucleotide polymorphisms (SNPs; Table 1 and Fig. 1B). We analyzed the data using the Genome-wide Efficient Mixed Model Association (GEMMA) algorithm (12), which allowed us to test for genetic associations while accounting for known covariates (Supplementary Material, Table S1) and for SNP-based estimates of relatedness between individuals.

Four of the 20 phenotypes we studied (triglycerides, neutrophil count, serum YKL-40 levels, Chitinase 1 activity) yielded significant associations at the genome-wide Bonferroni-corrected threshold (Table 1 and Fig. 2, Supplementary Material, Figs. S1–S16). The strongest associations were between SNPs at the 1q32.1 locus and both YKL-40 levels and chitotriosidase (Chitinase 1) activity. The genes encoding YKL-40 (CHI3L1; Chitinase 3-like 1) and chitotriosidase (CHIT1; Chitinase 1) are adjacent to each other on chromosome 1q32.1. Our top signal with YKL-40 levels (rs2153101) is in the promoter region of CHI3L1, as was the most significant SNP (rs4950928) in our previously published GWAS for this phenotype, and the r² between the two SNPs was 0.98 (8).

To our knowledge, we are reporting the first GWAS of chitotriosidase activity. It was previously shown that a 24 base-pair insertion polymorphism (rs3831317) in the CHIT1 gene results in the complete absence of chitotriosidase activity (13). The most significant SNP in our GWAS (rs2486070) was an intronic variant of CHIT1 that was in perfect linkage disequilibrium (LD; r² = 1) with the functional variant (rs3831317) reported previously.
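The two quantities used in the passage above, a Bonferroni-corrected significance threshold and the LD statistic r² between two SNPs, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the family-wise error rate of 0.05 is an assumption (the text states only that the threshold was Bonferroni-corrected), the function names are hypothetical, the genotypes are toy values, and r² is approximated here as the squared Pearson correlation of unphased genotype dosages rather than computed from phased haplotypes.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-SNP p-value cutoff for n_tests tests.

    alpha=0.05 is an assumed family-wise error rate, not a value from the paper.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def ld_r2(geno_a, geno_b):
    """Squared Pearson correlation between two SNPs' genotype dosages (0, 1 or 2).

    Individuals with a missing genotype (None) at either SNP are skipped.
    """
    pairs = [(a, b) for a, b in zip(geno_a, geno_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two individuals genotyped at both SNPs")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("r^2 is undefined when a SNP does not vary in the sample")
    return (cov * cov) / (var_a * var_b)


if __name__ == "__main__":
    # SNP counts are the range reported for the 20 GWASs in the text.
    for n_snps in (387345, 396968):
        print(n_snps, bonferroni_threshold(n_snps))

    # Toy genotypes (hypothetical): identical dosage vectors give r^2 = 1,
    # the situation described as perfect LD.
    snp_x = [0, 1, 2, 1, 0, 2]
    snp_y = [0, 1, 2, 1, 0, 2]
    print(ld_r2(snp_x, snp_y))
```

Under the assumed alpha of 0.05, both SNP counts give a per-SNP cutoff a little above 1e-7. A dosage-based r² is a common shortcut when phase is unknown; values computed from phased haplotypes can differ slightly.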
In turn, the GWAS for triglyceride levels implicated variants at 11q23.3, a locus previously associated with both triglyceride levels (14,15) and CVD (16,17). However, previously identified SNPs were not in high LD with the four SNPs that passed the genome-wide significance threshold in our study and no SNPs were pruned (see the ‘Materials and Methods’ section) from the GWAS because of LD with the top SNP. Hence, it is likely that more than one variant in the 11q23.3 locus affects triglyceride levels.

Finally, one SNP (rs12634993), an intronic variant of ROBO2, was associated with neutrophil counts at the genome-wide significance level. Neither this SNP nor the locus has previously been associated with neutrophil counts (18–23), indicating the necessity of further replication studies of this finding.

Mapping cis-expression quantitative trait loci

To further enrich the resources available for studying the genetic basis of disease in the Hutterites, we collected gene expression data from LCLs from a large cohort of individuals. In total, RNAseq data from 431 adult Hutterites were included in the final data set (see the ‘Materials and Methods’ section and Supplementary Materials for full details). The sample of 431 individuals is independent from the samples (...truncated)


This is a preview of a remote PDF: https://hmg.oxfordjournals.org/content/25/10/2104.full.pdf
Article home page: http://hmg.oxfordjournals.org/content/25/10/2104.abstract

Darren A. Cusanovich, Minal Caliskan, Christine Billstrand, Katelyn Michelini, Claudia Chavarria, Sherryl De Leon, Amy Mitrano, Noah Lewellyn, Jack A. Elias, Geoffrey L. Chupp, Roberto M. Lang, Sanjiv J. Shah, Jeanne M. Decara, Yoav Gilad, Carole Ober. Integrated analyses of gene expression and genetic association studies in a founder population, Human Molecular Genetics, 2016, pp. 2104-2112, 25/10, DOI: 10.1093/hmg/ddw061