Very low-depth sequencing in a founder population identifies a cardioprotective APOC3 signal missed by genome-wide imputation
Human Molecular Genetics, 2016, Vol. 25, No. 11
2360–2365
doi: 10.1093/hmg/ddw088
Advance Access Publication Date: 4 May 2016
Association Studies Article
1Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK,
2European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK,
3Wellcome Trust Centre for Human Genetics, Oxford OX3 7BN, UK,
4Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University, Athens 17671, Greece and
5Anogia Medical Centre, Anogia 740 51, Greece
*To whom correspondence should be addressed. Tel: +44 (0)1223 834244; Fax: +44 (0)1223 496802; Email:
Abstract
Cohort-wide very low-depth whole-genome sequencing (WGS) can comprehensively capture low-frequency sequence
variation for the cost of a dense genome-wide genotyping array. Here, we analyse 1x sequence data across the APOC3 gene in
a founder population from the island of Crete in Greece (n = 1239) and find significant evidence for association of blood
triglyceride levels with the previously reported R19X cardioprotective null mutation (β = −1.09, σ = 0.163, P = 8.2 × 10⁻¹¹) and a
second loss of function mutation, rs138326449 (β = −1.17, σ = 0.188, P = 1.14 × 10⁻⁹). The signal cannot be recapitulated by
imputing genome-wide genotype data on a large reference panel of 5122 individuals including 249 with 4x WGS data from the
same population. Gene-level meta-analysis with other studies reporting burden signals at APOC3 provides robust evidence
for a replicable cardioprotective rare variant aggregation (P = 3.2 × 10⁻³¹, n = 13 480).
Introduction
Dyslipidaemia is a well-established risk factor for cardiovascular disease, the leading cause of death worldwide. Blood lipid
levels have a heritable component, and the underlying common-frequency genetic determinants have been studied in
large-scale genome-wide association studies (GWAS) (1,2).
Apolipoprotein CIII plays an important role in regulating triglyceride (TG) levels (3). Common-frequency variants upstream of
the APOC3 gene, coding for apolipoprotein CIII, have been
associated with plasma TG levels at genome-wide significance
in studies of 100 000 individuals (2). More recently, a rare splice
variant in APOC3 was found to be associated with blood TG levels in the UK10K study, replicating across a total of 15 000
European individuals (4). Power to detect genetic associations
can be considerably higher in isolated populations as rare variants may have drifted up in frequency following the bottleneck
event (5,6). In 2008, a low-frequency APOC3 null mutation (R19X)
was found to have a cardioprotective effect in the Amish founder population (n ~ 1800) (7), and the same variant was
subsequently found to be associated with reduced TG levels in
an isolated Greek population (n ~ 1000) (8). R19X has independently risen in frequency to over 1% in both isolates, and is very
rare (0.05%) in the general European population.
Received: December 23, 2015. Revised: March 4, 2016. Accepted: March 14, 2016
© The Author 2016. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/),
which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Arthur Gilly1, Graham RS Ritchie1,2, Lorraine Southam1,3,
Aliki-Eleni Farmaki4, Emmanouil Tsafantakis5, George Dedoussis4
and Eleftheria Zeggini1,*
A burden of rare loss of function (LoF) variants in APOC3 was
found to be associated with coronary heart disease and TG levels in the Exome Sequencing Project study across 110 000 individuals from cosmopolitan populations (9). Recently, exome
sequencing of 8500 European American and African American
individuals identified a rare LoF variant burden in APOC3, also
associated with TGs (10). Here, we use very low-depth whole-genome sequencing (WGS) data in a Greek isolated population to
describe an APOC3 cardioprotective signal missed by genome-wide imputation and to provide empirical proof-of-principle of
how very low-depth sequencing can leverage the power advantages afforded by founder populations in catalysing these
discoveries.
Results
A total of 990 individuals from the Hellenic Isolated Cohorts Minoan Isolates (HELIC-MANOLIS) study were sequenced at 1x
depth and 249 at 4x depth using Illumina HiSeq (total 1239 samples). Following variant calling and imputation-based genotype
refinement, we identified 57 single nucleotide variants (SNVs) in
the APOC3 gene (Supplementary Material, Table S1). We performed single-point association analysis with TG levels
(n = 1192), using a threshold of 1 × 10⁻⁸ to define genome-wide
significance. Two variants exceeded this threshold: the null mutation R19X (rs76353203, β = −1.09, σ = 0.163, P = 8.2 × 10⁻¹¹),
which is a C/T substitution in exon 2 that changes codon 19 into
a premature stop codon, and the splice donor variant
rs138326449 (β = −1.17, σ = 0.188, P = 1.14 × 10⁻⁹), located 1 base
pair downstream, which disrupts the donor splice site in intron
2. These two variants are in very low linkage disequilibrium (LD)
(r² < 0.0001) (Fig. 1).
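As a rough illustration of what a single-point test computes, the sketch below regresses a quantitative trait on alternate-allele dosage and reports an effect size, its standard error and a two-sided P-value. It is a simplified stand-in, not the analysis pipeline used in the study: it assumes ordinary least squares with no covariates and no correction for relatedness, and it uses a normal approximation for the P-value. The function name, the threshold constant and the toy data are all hypothetical.

```python
import math

# Significance threshold quoted in the text (1 x 10^-8).
GENOME_WIDE_THRESHOLD = 1e-8


def single_point_association(dosages, phenotypes):
    """Regress a quantitative phenotype on allele dosage (0, 1 or 2).

    Returns (beta, standard_error, p_value). Assumption: plain ordinary
    least squares, ignoring covariates and sample relatedness. The P-value
    uses a normal approximation to the Wald statistic.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need two equal-length vectors with at least 3 samples")

    mean_g = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotypes))

    beta = sxy / sxx
    intercept = mean_y - beta * mean_g

    # Residual variance with n - 2 degrees of freedom (slope and intercept).
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(dosages, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, 0.0, 0.0  # perfect fit, degenerate toy case

    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return beta, se, p_value


# Toy data (invented for illustration): two carriers with lower trait values.
toy_dosages = [0, 0, 0, 0, 1, 1]
toy_trait = [0.2, -0.1, 0.1, 0.0, -1.0, -1.2]

beta, se, p = single_point_association(toy_dosages, toy_trait)
print(f"beta={beta:.3f} se={se:.3f} p={p:.3g} "
      f"significant={p < GENOME_WIDE_THRESHOLD}")
```

With only six samples the normal approximation is not trustworthy; the toy input serves only to show the arithmetic. A negative beta corresponds to lower trait values in carriers of the alternate allele.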
To confirm genotype calling and imputation accuracy, we
genotyped both R19X and rs138326449 in a subset of 1087 individuals using Sequenom massARRAY technology. In total,
98.9% of all genotypes were concordant for R19X and 99.1% for
rs138326449. Minor allele concordance reached 72.2 and 80%, respectively. The fraction of true positives among non-reference
calls, or positive predictive value (PPV), was high for both variants (96.3 and 100%), indicating that most mismatches were
caused by false negatives rather than overconfidence in calling
the alternate allele. We repeated the association analysis using
the directly genotyped samples (n = 1087), and found both variants to remain significantly associated with TG levels (β = −1.19, σ = 0.165, P = 3.24 × 10⁻¹² for R19X; β = −1.10, σ = 0.190,
P = 1.63 × 10⁻⁸ for rs138326449), further confirming the validity
of this signal.
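The validation metrics in this paragraph can be sketched as follows, comparing sequencing-based calls against directly genotyped calls for the same samples. Overall concordance and PPV follow the descriptions in the text. The exact definition of minor allele concordance is not given in this section, so the version here (exact genotype agreement among samples that are non-reference in either call set) is an assumption, as is treating a non-reference call as a true positive whenever the validation genotype is also non-reference. The function name and toy genotypes are hypothetical.

```python
def concordance_metrics(called, truth):
    """Compare two genotype vectors coded as alternate-allele counts (0, 1, 2).

    called: genotypes from sequencing and imputation-based refinement.
    truth:  genotypes from direct genotyping of the same samples.
    Returns a dict with overall concordance, minor allele concordance and
    PPV. A metric is None when its denominator is empty.
    """
    if len(called) != len(truth) or not called:
        raise ValueError("need two non-empty genotype vectors of equal length")

    pairs = list(zip(called, truth))

    # Fraction of all samples whose two genotypes are identical.
    overall = sum(c == t for c, t in pairs) / len(pairs)

    # Assumed definition: agreement among samples that carry the minor
    # allele in at least one of the two call sets.
    nonref_either = [(c, t) for c, t in pairs if c > 0 or t > 0]
    minor = (sum(c == t for c, t in nonref_either) / len(nonref_either)
             if nonref_either else None)

    # PPV: fraction of true positives among non-reference sequencing calls.
    nonref_called = [(c, t) for c, t in pairs if c > 0]
    ppv = (sum(t > 0 for _, t in nonref_called) / len(nonref_called)
           if nonref_called else None)

    return {"overall": overall, "minor_allele": minor, "ppv": ppv}


# Toy data (invented): one true carrier is missed by sequencing (a false
# negative) and there are no false positives.
toy_called = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
toy_truth = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
print(concordance_metrics(toy_called, toy_truth))
```

In the toy input the single error is a missed carrier, which lowers minor allele concordance while leaving PPV untouched. This is the same pattern the text describes: mismatches driven by false negatives rather than by overcalling the alternate allele.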
For burden testing, we restricted our focus to the four potentially functional rare or low-frequency [minor allele frequency
(MAF) < 5%] variants that reside in exons or the essential splice
sites in the consensus splice variant of APOC3 (APOC3-001)
(Table 1). These included the two LoF variants R19X and
rs138326449. We additionally identified a single carrier of a
novel missense variant (11:116701489) also in codon 19 but in
exon 3 as the intron falls between the first and second bases of
the codon (Supplementary Material, Fig. S1). The resulting
amino acid substitution (R19L) is predicted (...truncated)