Integrated analyses of gene expression and genetic association studies in a founder population
Human Molecular Genetics, 2016, Vol. 25, No. 10
2104–2112
doi: 10.1093/hmg/ddw061
Advance Access Publication Date: 29 February 2016
Association Studies Article
1Department of Human Genetics and 2Department of Medicine, Section of Cardiology, University of Chicago,
Chicago, IL 60637, USA, 3Division of Biology and Medicine, Brown University, Providence, RI 02912, USA and
4Pulmonary and Critical Care, Yale School of Medicine, New Haven, CT 06519, USA
*To whom correspondence should be addressed at: Department of Human Genetics, University of Chicago, 920 E. 58th St CLSC 431F, Chicago, IL 60637, USA.
Tel: +1 773 702-5898; Fax: +1 773 834-0505; Email:
Abstract
Genome-wide association studies (GWASs) have become a standard tool for dissecting genetic contributions to disease risk.
However, these studies typically require extraordinarily large sample sizes to be adequately powered. Strategies that incorporate
functional information alongside genetic associations have proved successful in increasing GWAS power. Following this
paradigm, we present the results of 20 different genetic association studies for quantitative traits related to complex diseases,
conducted in the Hutterites of South Dakota. To boost the power of these association studies, we collected RNA-sequencing data
from lymphoblastoid cell lines for 431 Hutterite individuals. We then used Sherlock, a tool that integrates GWAS and expression
quantitative trait locus (eQTL) data, to identify weak GWAS signals that are also supported by eQTL data. Using this approach, we
found novel associations with quantitative phenotypes related to cardiovascular disease, including carotid intima-media
thickness, left atrial volume index, monocyte count and serum YKL-40 levels.
Introduction
Genome-wide association studies (GWASs) have become the gold
standard for assessing the genetic underpinnings of complex
traits in human populations (1). These studies are easily scalable,
limited primarily by financial or practical challenges related to
measuring the trait of interest in large samples of subjects. However, an obstacle commonly faced by GWAS is that the modest
influence of genetic variation at any particular locus makes it difficult to identify variants with statistically significant associations. In light of this, various strategies have been adopted
to increase the power of GWAS, including: increasing sample
sizes [e.g. (2)], combining studies through meta-analysis [e.g.
(3)], studying intermediate phenotypes [e.g. (4)] and integrating
results with independent functional data sets [e.g. (5)]. While
†Present address: Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
‡D.A.C. and M.C. contributed equally.
§Present address: Division of Cardiology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA.
Received: August 15, 2015. Revised: February 12, 2016. Accepted: February 21, 2016
© The Author 2016. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/
licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact
Darren A. Cusanovich1,†,‡, Minal Caliskan1,‡, Christine Billstrand1,
Katelyn Michelini1, Claudia Chavarria1, Sherryl De Leon1, Amy Mitrano1,
Noah Lewellyn1, Jack A. Elias3, Geoffrey L. Chupp4, Roberto M. Lang2,
Sanjiv J. Shah2,§, Jeanne M. Decara2, Yoav Gilad1 and Carole Ober1, *
Results
GWAS for 20 quantitative traits
To identify loci associated with 20 quantitative traits that are
known or potential risk factors for asthma and/or cardiovascular disease (CVD), we conducted a GWAS for each trait (Table 1 and Supplementary Material, Tables S1 and S2). These studies ranged in size from
263 to 788 subjects and included between 387 345 and 396 968 single-nucleotide polymorphisms (SNPs; Table 1 and Fig. 1B). We analyzed
the data using the Genome-wide Efficient Mixed Model Association (GEMMA) algorithm (12), which allowed us to test for genetic
associations while accounting for known covariates (Supplementary Material, Table S1) and for SNP-based estimates of relatedness
between individuals. Four of the 20 phenotypes we studied (triglycerides, neutrophil count, serum YKL-40 levels and chitinase 1
activity) yielded significant associations at the Bonferroni-corrected
genome-wide threshold (Table 1, Fig. 2 and Supplementary
Material, Figs. S1–S16).
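As a rough illustration of the Bonferroni-corrected genome-wide threshold mentioned above, the following minimal Python sketch divides a family-wise error rate by the number of SNPs tested. The error rate of 0.05 and the function name bonferroni_threshold are assumptions made for illustration; the text here does not state the exact threshold applied to each trait.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value cutoff that controls the family-wise error rate at alpha.

    alpha = 0.05 is an assumed default, not a value taken from the article.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# SNP counts at the two ends of the range reported in the text.
for n_snps in (387_345, 396_968):
    cutoff = bonferroni_threshold(n_snps)
    print(f"{n_snps} SNPs tested: call a SNP significant if p < {cutoff:.3e}")
```

Under these assumptions, the cutoff differs only slightly across traits, because the number of SNPs tested varies by less than 3% between studies.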
The strongest associations were between SNPs at the 1q32.1
locus and both YKL-40 levels and chitotriosidase (Chitinase 1)
activity. The genes encoding YKL-40 (CHI3L1; Chitinase 3-like
1) and chitotriosidase (CHIT1; Chitinase 1) are adjacent to each
other on chromosome 1q32.1. Our top signal for YKL-40 levels
(rs2153101) is in the promoter region of CHI3L1, as was the most
significant SNP (rs4950928) in our previously published GWAS
for this phenotype, and the r² between the two SNPs was 0.98 (8).
To our knowledge, we are reporting the first GWAS of chitotriosidase activity. It was previously shown that a 24 base-pair insertion polymorphism (rs3831317) in the CHIT1 gene results in the
complete absence of chitotriosidase activity (13). The most significant SNP in our GWAS (rs2486070) was an intronic variant of
CHIT1 that was in perfect linkage disequilibrium (LD; r² = 1) with
the previously reported functional variant (rs3831317).
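The r² values quoted above measure LD between pairs of SNPs. One common approximation, sketched below in Python, is the squared Pearson correlation between genotype dosages (0, 1 or 2 copies of an allele) at the two sites. This is an illustrative sketch and not necessarily how LD was computed in this study; the function name genotype_r2 and the toy genotype vectors are hypothetical.

```python
from math import fsum


def genotype_r2(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors.

    g1 and g2 hold allele counts (0, 1 or 2) for the same individuals,
    in the same order. This genotype-based r2 approximates, but is not
    identical to, haplotype-based r2.
    """
    if len(g1) != len(g2) or len(g1) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    n = len(g1)
    mean1 = fsum(g1) / n
    mean2 = fsum(g2) / n
    cov = fsum((a - mean1) * (b - mean2) for a, b in zip(g1, g2))
    var1 = fsum((a - mean1) ** 2 for a in g1)
    var2 = fsum((b - mean2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return cov * cov / (var1 * var2)


# Hypothetical toy genotypes for eight individuals (not data from the study).
snp_a = [0, 1, 2, 1, 0, 2, 1, 1]
snp_b = [2 - g for g in snp_a]  # same SNP coded by the other allele
print(genotype_r2(snp_a, snp_b))
```

Because r² is a squared correlation, it does not depend on which allele is counted, so two SNPs in perfect LD give a value of 1 regardless of allele coding.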
The GWAS for triglyceride levels implicated variants at
11q23.3, a locus previously associated with both triglyceride
levels (14,15) and CVD (16,17). However, previously identified
SNPs were not in high LD with the four SNPs that passed the genome-wide significance threshold in our study and no SNPs were
pruned (see the ‘Materials and Methods’ section) from the GWAS
because of LD with the top SNP. Hence, it is likely that more than
one variant in the 11q23.3 locus affects triglyceride levels. Finally,
one SNP (rs12634993), an intronic variant of ROBO2, was
associated with neutrophil counts at the genome-wide significance level. Neither this SNP nor the locus has previously been
associated with neutrophil counts (18–23), so this finding requires replication in further studies.
Mapping cis-expression quantitative trait loci
To further enrich the resources available for studying the genetic
basis of disease in the Hutterites, we collected gene expression
data from lymphoblastoid cell lines (LCLs) from a large cohort of individuals. In total, RNA-seq data from 431 adult Hutterites were included in the final data
set (see the Materials and Methods section and Supplementary
Materials for full details). The sample of 431 individuals is independent from the samples (...truncated)