Gene-Wide Analysis Detects Two New Susceptibility Genes for Alzheimer's Disease
109 Department of Medicine (Geriatrics), University of Mississippi Medical Center, Jackson, Mississippi, United States of America, 110 Rush Alzheimer's Disease Center,
Rush University Medical Center, Chicago, Illinois, United States of America, 111 Laboratory of Epidemiology, Demography, and Biometry, National Institute of Health,
Bethesda, Maryland, United States of America, 112 Aging Research Center, Department Neurobiology, Care Sciences and Society, Karolinska Institutet and Stockholm
University, Stockholm, Sweden, 113 Department Geriatric Medicine, Genetics Unit, Karolinska University Hospital Huddinge, Stockholm, Sweden, 114 Division of Clinical
Neurosciences, School of Medicine, University of Southampton, Southampton, United Kingdom, 115 Departments of Neurology and Epidemiology, Erasmus MC
University Medical Center, Rotterdam, the Netherlands, 116 Department of Pathology, University of Washington, Seattle, Washington, United States of America,
117 Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, California, United
States of America, 118 INSERM UMR_S975-CNRS UMR 7225, Université Pierre et Marie Curie, Centre de recherche de l'Institut du Cerveau et de la Moelle épinière-CRICM,
Hôpital de la Salpêtrière, Paris, France, 119 AP-HP, Hôpital de la Pitié-Salpêtrière, Paris, France, 120 Department of Epidemiology, University of Washington, Seattle,
Washington, United States of America, 121 Laboratory of Neurogenetics, Intramural Research Program, National Institute on Aging, Bethesda, Maryland, United States of
America, 122 Imperial College, London, United Kingdom, 123 Department of Biology, Brigham Young University, Provo, Utah, United States of America, 124 Human
Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America, 125 Human Genetics Center and Div. of Epidemiology, University of
Texas Health Sciences Center at Houston, Houston, Texas, United States of America, 126 Hospital Universitari Vall d'Hebron - Institut de Recerca, Universitat Autònoma de
Barcelona (VHIR-UAB), Barcelona, Spain, 127 Department of Neurology, Medical University Graz, Graz, Austria, 128 Centre de Mémoire de Ressources et de Recherche de
Bordeaux, CHU de Bordeaux, Bordeaux, France, 129 Inserm U708, Victor Segalen University, Bordeaux, France, 130 Institute of Human Genetics, Department of Genomics,
Life and Brain Center, University of Bonn, and German Center for Neurodegenerative Diseases (DZNE, Bonn), Bonn, Germany, 131 Karolinska Institutet, Department of
Neurobiology, Care Sciences and Society, KIADRC, Stockholm, Sweden, 132 Group Health Research Institute, Group Health Cooperative, Seattle, Washington, United
States of America, 133 Vanderbilt Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America, 134 Department of
Epidemiology & Biostatistics, Case Western Reserve University, Cleveland, Ohio, United States of America, 135 McGill University and Génome Québec Innovation Centre,
Montreal, Canada, 136 Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts, United States of America, 137 Department of
Neurology, Boston University School of Medicine, Boston, Massachusetts, United States of America, 138 Center for Medical Systems Biology, Leiden, The Netherlands,
139 Department of Psychiatry and Psychotherapy and Institute of Human Genetics, University of Bonn, Bonn, Germany, 140 The Framingham Heart Study, Framingham,
Massachusetts, United States of America, 141 Centre Hospitalier Régional Universitaire de Lille, Lille, France
Background: Alzheimer's disease is a common debilitating dementia with known heritability, for which 20 late-onset
susceptibility loci have been identified, but more remain to be discovered. This study sought to identify new susceptibility
genes, using an alternative gene-wide analytical approach which tests for patterns of association within genes, in the
powerful genome-wide association dataset of the International Genomics of Alzheimer's Project Consortium, comprising
over 7 million genotypes from 25,580 Alzheimer's cases and 48,466 controls.
Principal Findings: In addition to earlier reported genes, we detected genome-wide significant loci on chromosomes 8
(TP53INP1, p = 1.4×10⁻⁶) and 14 (IGHV1-67, p = 7.9×10⁻⁸) which indexed novel susceptibility loci.
Significance: The additional genes identified in this study have an array of functions previously implicated in Alzheimer's
disease, including aspects of energy metabolism, protein degradation and the immune system, and add further weight to
these pathways as potential therapeutic targets in Alzheimer's disease.
Citation: Escott-Price V, Bellenguez C, Wang L-S, Choi S-H, Harold D, et al. (2014) Gene-Wide Analysis Detects Two New Susceptibility Genes for Alzheimer's
Disease. PLoS ONE 9(6): e94661. doi:10.1371/journal.pone.0094661
Editor: Yong-Gang Yao, Kunming Institute of Zoology, Chinese Academy of Sciences, China
Received December 3, 2013; Accepted March 17, 2014; Published June 12, 2014
This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for
any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Funding: The i-Select chips were funded by the French National Foundation on Alzheimer's disease and related disorders. The French National Foundation on
Alzheimer's disease and related disorders supported several I-GAP meetings and communications. Data management involved the Centre National de
Génotypage, and was supported by the Institut Pasteur de Lille, Inserm, FRC (fondation pour la recherche sur le cerveau) and Rotary. This work has been
developed and supported by the LABEX (laboratory of excellence program investment for the future) DISTALZ grant (Development of Innovative Strategies for a
Transdisciplinary approach to ALZheimer's disease) and by the LABEX GENMED grant (Medical Genomics). The French National Foundation on Alzheimer's disease
and related disorders and the Alzheimer's Association (Chicago, Illinois) grant supported IGAP in-person meetings and communication, and the Alzheimer's
Association (Chicago, Illinois) grant provided some funds to each consortium for analyses. EADI: The authors thank Dr. Anne Boland (CNG) for her technical help in
preparing the DNA samples for analyses. This work was supported by the National Foundation for Alzheimer's disease and related disorders, the Institut Pasteur
de Lille and the Centre National de Génotypage. The Three-City Study was performed as part of a collaboration between the Institut National de la Santé et de la
Recherche Médicale (Inserm), the Victor Segalen Bordeaux II University and Sanofi-Synthélabo. The Fondation pour la Recherche Médicale funded the preparation
and initiation of the study. The 3C Study was also funded by the Caisse Nationale Maladie des Travailleurs Salariés, Direction Générale de la Santé, MGEN, Institut
de la Longévité, Agence Française de Sécurité Sanitaire des Produits de Santé, the Aquitaine and Bourgogne Regional Councils, Agence Nationale de la
Recherche, ANR supported the COGINUT and COVADIS projects. Fondation de France and the joint French Ministry of Research/INSERM Cohortes et collections
de données biologiques programme. Lille Génopôle received an unconditional grant from Eisai. The Three-city biological bank was developed and maintained
by the laboratory for genomic analysis LAG-BRC - Institut Pasteur de Lille. Belgium sample collection: The patients were clinically and pathologically characterized
by the neurologists Sebastiaan Engelborghs, Rik Vandenberghe and Peter P. De Deyn, and in part genetically by Caroline Van Cauwenberghe, Karolien Bettens
and Kristel Sleegers. Research at the Antwerp site is funded in part by the Belgian Science Policy Office Interuniversity Attraction Poles program, the Foundation
Alzheimer Research (SAO-FRA), the Flemish Government initiated Methusalem Excellence Program, the Research Foundation Flanders (FWO) and the University of
Antwerp Research Fund, Belgium. Karolien Bettens is a postdoctoral fellow of the FWO. The Antwerp site authors thank the personnel of the VIB Genetic Service
Facility, the Biobank of the Institute Born-Bunge and the Departments of Neurology and Memory Clinics at the Hospital Network Antwerp and the University
Hospitals Leuven. Finnish sample collection: Financial support for this project was provided by the Health Research Council of the Academy of Finland, EVO grant
5772708 of Kuopio University Hospital, and the Nordic Centre of Excellence in Neurodegeneration. Italian sample collections: the Bologna site (FL) obtained funds
from the Italian Ministry of Research and University as well as Carimonte Foundation. The Florence site was supported by grant RF-2010-2319722, a grant from the
Cassa di Risparmio di Pistoia e Pescia (Grant 2012) and the Cassa di Risparmio di Firenze (Grant 2012). The Milan site was supported by a grant from the
(funding via numerous sources including Stichting MS Research, Brain Net Europe, Hersenstichting Nederland Breinbrekend Werk, International Parkinson
Fonds, Internationale Stichting Alzheimer Onderzoek), Institut de Neuropatologia, Servei Anatomia Patologica, Universitat de Barcelona. Marcelle
Morrison-Bogorad, PhD, Tony Phelps, PhD and Walter Kukull, PhD are thanked for helping to co-ordinate this collection. ADNI: Funding for ADNI is through the Northern
California Institute for Research and Education by grants from Abbott, AstraZeneca AB, Bayer Schering Pharma AG, Bristol-Myers Squibb, Eisai Global Clinical
Development, Elan Corporation, Genentech, GE Healthcare, Glaxo-SmithKline, Innogenetics, Johnson and Johnson, Eli Lilly and Co., Medpace, Inc., Merck and
Co., Inc., Novartis AG, Pfizer Inc, F. Hoffman-La Roche, Schering-Plough, Synarc, Inc., Alzheimer's Association, Alzheimer's Drug Discovery Foundation, the Dana
Foundation, and by the National Institute of Biomedical Imaging and Bioengineering and NIA grants U01 AG024904, RC2 AG036535, K01 AG030514. Data
collection and sharing for this project was funded by the ADNI (National Institutes of Health Grant U01 AG024904). ADNI is funded by the National Institute on
Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimer's Association;
Alzheimer's Drug Discovery Foundation; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and
Company; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; GE Healthcare; Innogenetics, N.V.; IXICO Ltd.; Janssen Alzheimer
Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale
Diagnostics, LLC.; NeuroRx Research; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Synarc Inc.; and Takeda Pharmaceutical
Company. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by
the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education,
and the study is coordinated by the Alzheimer's Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the
Laboratory for Neuro Imaging at the University of California, Los Angeles. This research was also supported by NIH grants P30 AG010129 and K01 AG030514.
The authors thank Drs. D. Stephen Snyder and Marilyn Miller from NIA who are ex-officio ADGC members. Support was also from the Alzheimer's Association
(LAF, IIRG-08-89720; MP-V, IIRG-05-14147) and the United States Department of Veterans Affairs Administration, Office of Research and Development,
Biomedical Laboratory Research Program. Peter St George-Hyslop is supported by Wellcome Trust, Howard Hughes Medical Institute, and the Canadian
Institute of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: Bruce M. Psaty serves on the DSMB for a clinical trial of a device funded by the manufacturer (Zoll LifeCor) and on the Steering
Committee of the Yale Open Data Access Project funded by Johnson & Johnson. Data used in preparation of this article were obtained from the Alzheimer's
Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation
of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at:
http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf. This does not alter the authors' adherence to PLOS ONE policies on sharing
data and materials.
. These authors contributed equally to this work.
" Membership of the UK Brain Expression consortium is provided in Materials S1.
The prevalence of Alzheimer's disease (AD) is increasing as
more people live into old age. Hope for finding preventative and
clinical therapies lies in the ability to gain a better understanding
of the underlying biology of the disease, and genetics will provide a
valuable starting point for advancement. Rare monogenic forms of
AD, the majority of which are attributable to mutations in one of
three genes, APP, PSEN1 and PSEN2, exist, but common,
late-onset AD is genetically complex with heritability estimated to be
between 56–79% [1,2]. Along with the APOE polymorphism,
20 common susceptibility loci associated with AD have been
identified. (This figure does not include CD33 as it did not show
genome-wide significance in the original report.) Recently, a
moderately rare variant in TREM2 has also shown evidence for
association. However, new variants remain to be found. This
study sought to identify new susceptibility genes, using an
alternative gene-wide analytical approach, which focuses on the
pattern of association within gene regions.
Genome-wide association (GWA) studies to date have focused
on single nucleotide polymorphisms (SNPs) as the unit of analysis.
Single locus tests are the simplest to generate and to interpret, but
have limitations. For example, if susceptibility is conferred by
multiple variants within a locus[11,12], this gives rise to complex
patterns of association that might not be reflected by association to
the same SNPs in different samples, despite apparently reasonably
powered tests[13,14]. In addition, rare risk-increasing variants
may not be tagged by single SNPs, as is e.g. the case for CLU in
which significant enrichment of rare variants in patients was
observed independent of the single locus GWA signal. It is
therefore likely that the power to detect association might be
enhanced by exploiting information from multiple signals within
genes encompassed by gene-wide statistical approaches.
Disease risk may reflect the co-action of several loci but the
number of loci involved at the individual or the population level
is unknown, as is the spectrum of allele frequencies and effect
sizes. The observation of multiple genome-wide significant or
suggestive linkage signals for disorders that do not readily
replicate between studies, but which are not randomly distributed
across the genome [17,18], is compatible with the existence of
multiple risk alleles of moderate effect that would implicate a locus
in disease risk when analysed together. Thus the first aim of this
study is to test for gene-wide association with AD, using a powerful
mega-meta analysis of genome-wide datasets as part of the
International Genomics of Alzheimer's Project (IGAP)
Consortium comprising four AD genetic consortia (see the full list of
consortia members in Materials S1): Genetic and Environmental
Risk in Alzheimer's Disease (GERAD), European Alzheimer's
Disease Initiative (EADI), Cohorts for Heart and Aging in
Genomic Epidemiology (CHARGE) and Alzheimer's Disease
Genetics Consortium (ADGC) (see full IGAP datasets description
in Materials S2). A two-stage study was undertaken. In Stage 1 the
combined sample included 17,008 AD cases and 37,154 controls.
In Stage 2, loci with p-values (combined over all SNPs at the locus)
less than 10⁻⁴ were selected for replication in 8,572 AD cases and
11,312 controls of European ancestry. We observed evidence for
gene-wide association at loci implicating genes that already
show genome-wide significant association from single SNP analysis
(CR1, BIN1, HLA-DRB5/HLA-DRB1, CD2AP, EPHA1, PTK2B,
CLU, MS4A6A, PICALM, SORL1, SLC24A4, ABCA7, APOE), three
new genes in the vicinity of recently reported single SNP hits
(ZNF3, NDUFS3, MTCH2) and two novel loci (TP53INP1,
combined p = 1.4×10⁻⁶ and IGHV1-67, combined p = 7.9×10⁻⁸).
Initially, we tested for excess genetic signal revealed by the Stage
1 IGAP SNP GWAS study. We observed more SNPs at all
significance intervals, and more genes at multiple significance
thresholds, than expected by chance (Table S1). This is unlikely to
be due to uncorrected stratification, since each of the individual
GWAS samples in the IGAP Stage 1 analysis was corrected for
ethnic variation. Thus it is likely that the sample contains novel
genetic signals, in addition to those detected by the primary
analysis.
Over-representation p-values were calculated with chi-square/Fisher's exact tests, counting the genes within 0.5 Mb as one locus.
Next, we looked at overrepresentation of significant genes in the
Stage 1 data. Table 1 gives the observed and expected numbers of
significant genes at significance levels 10⁻⁴, 10⁻⁵ and 10⁻⁶, both when all
genes are counted in the analyses and when the known genes
(Table S1) and genes within 500 kb of them are excluded. The
observed numbers of genes are much larger than expected at all
significance levels (all p ≤ 0.001). Thus there are more loci
associated with AD to find.
Furthermore, the number of independent nominally significant
loci at Stage 2 (N = 60, 13.5%) was significantly greater than
expected by chance (p = 4.6×10⁻¹²). The percentage of replicated
loci increased as the gene-wise significance
threshold at Stage 1 decreased (see Table 2 for details).
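
As a rough illustration of this excess, the sketch below computes how many of the 444 independent Stage 2 loci (the count given in the Stage 2 data description) would be expected to reach nominal significance by chance, and an exact one-sided binomial tail probability for observing 60 or more. The 0.05 nominal threshold, the choice of a binomial test and the function name are assumptions made for illustration; the value printed need not reproduce the published p-value.

```python
import math

def binomial_upper_tail(k_obs: int, n: int, p: float) -> float:
    """P(X >= k_obs) for X ~ Binomial(n, p), with each term built in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(k_obs, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1)
                    - math.lgamma(n - k + 1) + k * log_p + (n - k) * log_q)
        total += math.exp(log_term)
    return total

n_loci = 444        # independent Stage 2 loci (from the text)
n_significant = 60  # nominally significant loci at Stage 2 (from the text)
alpha = 0.05        # assumed nominal significance threshold

expected = n_loci * alpha
p_excess = binomial_upper_tail(n_significant, n_loci, alpha)
print(f"expected by chance: {expected:.1f}, observed: {n_significant}")
print(f"one-sided binomial p-value: {p_excess:.2e}")
```

Under these assumptions the expected count is about 22 loci, so the observed 60 is close to three times what chance alone would give.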
Combining the gene-wide p-values in both stages 1 and 2 using
Fisher's method revealed two new gene-based genome-wide
significant (p < 2.5×10⁻⁶) loci, TP53INP1 and IGHV1-67. The
TP53INP1 gene is located on chromosome 8:95,938,200–95,961,615
and its combined gene-based p-value = 1.4×10⁻⁶
(Table 3). Table S3 provides details for each SNP contributing
to the gene-based result. Out of 45 SNPs in the gene, three SNPs
(rs4735333, rs1713669, rs896855) have p-value ≤ 10⁻⁴. Figure 1
shows the LD plot of this gene and suggests that there are at least
two partially independent signals in the TP53INP1 gene (r²
between the pairs of most significant SNPs rs4735333-rs1713669
and rs1713669-rs896855 are 0.65 and 0.6, respectively).
The IGHV1-67 gene on chromosome 14:107,136,620–107,137,059
has combined p-value = 7.9×10⁻⁸ (Table 3). This
gene is covered by two SNPs (rs2011167, rs1961901), both of which are
significant at the 10⁻⁴ level. The LD plot in Figure 2 and Table S4 indicate
that the two most significant SNPs in the IGHV1-67 gene represent
almost the same signal (r² = 0.92, calculated with SNAP
software, 1000 genomes Pilot 1 dataset, CEU population).
To look at the gene expression patterns in these novel genes, we
used the Webster-Myers expression dataset, available at
Comparing 137 AD cases vs 176 controls with temporal or frontal
cortex expression values by t-test showed significantly higher
TP53INP1 expression in cases compared to controls (p = 0.0128).
Further examination in the BRAINEAC database (www.braineac.org)
from the UK Brain Expression Consortium showed
TP53INP1 to have a best cis-eQTL p-value of 6.8×10⁻⁶ (for
rs4582532 SNP, which is about 7.6 kb upstream of the gene). The
three SNPs with association p ≤ 10⁻⁴ mentioned above
(rs4735333, rs1713669, rs896855) had significant cis-eQTL
p-values of 8.2×10⁻⁶, 7.8×10⁻⁵ and 1.1×10⁻⁵, respectively, in
BRAINEAC brain expression data. The r² values between the cis-eQTL
and the three associated SNPs were 0.80, 0.65 and 0.81,
respectively. Further analysis of additional independent brain
expression and methylation datasets (see Methods S1) indicated
significant cis eQTLs and meQTLs for TP53INP1 (Tables S10 and
S11). The probe for the meQTL is in a CpG island region that
corresponds well with ENCODE DNAse/ChIP-seq/Histone
marks and is located upstream (~1.5 kb) of the TP53INP1
transcription start site. In combination these results suggest a
possible epigenetic mechanism whereby the associated variants in
the region influence TP53INP1 expression in several brain regions.
These expression data provide further evidence supporting the
functional relevance of TP53INP1 to AD susceptibility. The
IGHV1-67 gene was not found in those databases.
Table 2. Overrepresentation of significant loci, excluding regions of 0.5 Mb around previously reported and Stage 1 IGAP
genes[9,19] containing genome-wide significant SNPs.
Numbers of loci (genes)
The observed number of genes is calculated by combining significant loci within 0.5 Mb into one signal. The APOE region is excluded (CHR19; 44,411,940–46,411,945 bp).
The total number of genes after exclusions is 24,849.
Figure 1. Linkage disequilibrium structure of TP53INP1 gene. The SNPs which are significant at the 10⁻⁴ level are circled in red.
In addition we detected two genome-wide significant loci: 1)
ZNF3 (chr7: 99,661,653–99,679,371; p = 8.6×10⁻⁷) and 2) two
closely located genes on chromosome 11, MTCH2 (47,638,858–47,664,206,
combined p = 2.5×10⁻⁶) and NDUFS3 (47,600,632–47,606,114,
combined p = 4.8×10⁻⁷) (Table 4). None of these
genes harbours genome-wide significant SNPs in the SNP GWAS
analysis on its own (see Tables S5-S7). Figures S1-S3 show LD
plots of these additional genes.
The ZNF3 gene on chromosome 7 and the NDUFS3 and MTCH2 genes on chromosome 11
lie close to the SNPs rs1476679 (chr7:100,004,446; ZCWPW1)
and rs10838725 (chr11:47,557,871; CELF1), respectively, which were shown
to be genome-wide significant in the IGAP study when combining
Stage 1 and Stage 2 data. Figures S1-S3 show the LD structure of
these genes in relation to the IGAP single-SNP genome-wide significant
hits. (Note that the NDUFS3 gene on chromosome 11 was
gene-based genome-wide significant already at Stage 1.) Although none
of these SNPs actually lie within the genes mentioned above, it is
possible that they may account for the gene-based signals through
linkage disequilibrium. In order to test whether the gene-based
signals are independent of these strongly-associated SNPs, we
performed single-SNP association for each SNP annotated to these
genes by regression, adjusting for the significant SNPs mentioned
above, along with the other study covariates. The resulting
p-values were combined into gene-based tests, as described
previously. Under this conditional analysis the ZNF3 gene does not
show significant association; however, NDUFS3 still shows a trend
towards significance (p = 0.081) (see Table S8 for details).
Furthermore, five genes in chr11:47,593,749–47,615,961
(KBTBD4, NDUFS3, LOC100287127, FAM180B, C1QTNF4) all
have p < 0.05 with gene-based analysis ±10 kb, when conditioning
on the genome-wide significant hit rs10838725 in this region. This
may partially be explained by the SNP rs10838731 (p = 1.2×10⁻³
after conditioning on rs10838725), which is shared by all latter five
genes.
Gene-based analysis with ±10 kb around genes did not reveal
additional genome-wide significant loci in the Stage 1 data set.
Moreover, the significance of the genes identified above did not
improve in general, indicating that adding 10 kb flanking regions
to genes introduces more noise to the gene-based signal. The
combined Stage 1 and Stage 2 gene-based analysis provided
further evidence for significant signals in the loci on chr 11 with 8
genes (SPI1, SLC39A13, LOC100287086, PTPMT1, KBTBD4,
NDUFS3, LOC100287127, FAM180B) and on chr 7 with 6 genes
(LOC100128334, MCM7, PILRB, PILRA, LOC100289298,
C7orf51), all reaching genome-wide significance. This is likely to
be due to the fact that including genes' flanking regions captures a
greater number of the same SNPs or SNPs in high LD showing
The Manhattan plot of the gene-based p-values (Figure 3) gives
a general overview of the gene-based results and shows the new
loci in relation to previously reported genes (see also QQ-plots in
Figure S4). The results of gene-wide analysis for the genes which
were previously reported as associated with AD [4-8] and those
which are GWAS significant in the Stage 1 analysis are presented
in Table S9. Out of 16 reported susceptibility genes, 15 are
nominally significant with gene-wide analysis (almost all p-values
are smaller than 10⁻⁴); however, not all of them reach the
gene-based genome-wide significance level (2.5×10⁻⁶) when the
number of SNPs per gene and LD structure of the gene are taken
into account.
We did not observe genome-wide significance for the CD33 gene.
This gene was genome-wide significant in Stage 1 (p = 1.9×10⁻⁶),
but the association was attenuated when combining Stage 1 and
Stage 2 data (p = 1.79×10⁻⁵), similar to the single SNP association
result in the SNP GWAS study[9,19].
In this study we show that there are more signals in the GWAS
imputed data at SNP- and gene-based levels than revealed by
single SNP analysis. A gene-based analysis is a logical next step
after the single SNP analyses in any attempt to combine several
possible signals in genes and thus enhance the power of the
The first new gene TP53INP1 (chromosome 8) encodes a
protein that is involved in mediating autophagy-dependent cell
death via apoptosis through altering the phosphorylation state of
p53 and in modulating cell-extracellular matrix adhesion and
cell migration. TP53INP1 encodes a pro-apoptotic tumor
suppressor and its antisense oligonucleotide has been used as a
potential treatment for castration-resistant prostate cancer.
This association is notable, given the potential inverse association
between cancer and AD that has previously been reported [26,27].
The second new gene IGHV1-67 (chromosome 14) is a
pseudogene in the immunoglobulin (IgG) variable heavy chain
region of chromosome 14: its function is unknown but all genes in
this region are most likely to be involved in IgG heavy chain VDJ
recombinations that lead to the full repertoire of antigen-detecting
immune cell clones.
The gene-based analysis in this study has shown its utility to
enhance the information provided by single SNP analysis (i.e.
NDUFS3 gene was genome-wide significant from Stage 1 using
gene-based analysis whereas this gene was only genome-wide
significant after combining the two stages of single SNP analysis).
ZNF3 is a zinc-finger protein at the same locus on chromosome
7 as ZCWPW1 thus rendering it a candidate as the gene that
contains the functional signal in this region. Although we cannot
identify which gene actually confers the risk of AD, it is interesting
that, while ZNF3 function is unknown, it interacts with BAG3,
which is involved in ubiquitin/proteasomal functions in protein
degradation, and ZNF3 is regulated by upstream binding of
BACH1, whose target genes have roles in the oxidative stress
response and control of the cell cycle.
In the cluster of genes on chromosome 11, MTCH2 encodes one
of the large family of inner mitochondrial membrane
transporters which is associated with mitochondrially-mediated cell
death, adipocyte differentiation, insulin sensitivity and
has a genetic association with increased BMI. NDUFS3 also
has functions in the mitochondria as it encodes an iron-sulphur
component of complex 1 (mitochondrial NADH:ubiquinone
oxidoreductase) of the electron transport chain. A deficiency
causes a form of Leigh syndrome, an early-onset progressive
neurodegenerative disorder with a characteristic neuropathology
consisting of focal lesions including areas of demyelination and
In summary, we report two novel genes, TP53INP1 (chr8:
95,938,200–95,961,615; combined p = 1.4×10⁻⁶) and IGHV1-67
(chr14: 107,136,620–107,137,059; combined p = 7.9×10⁻⁸),
which were not reported as genome-wide significant before. We
also report the ZNF3 gene on chromosome 7 and a cluster of genes on
chromosome 11 (SPI1-MTCH2), showing gene-based
genome-wide significant association with Alzheimer's disease. These genes
are in proximity with, but not the same as, those detected by
genome-wide significant SNPs, demonstrating support for the
signals identified by IGAP[9,19]. They have an array of functions
previously implicated in AD, including aspects of energy
metabolism, protein degradation and the immune system, and add
further weight to these pathways as potential therapeutic targets in
Alzheimer's disease.
Figure 2. Linkage disequilibrium structure of IGHV1-67 gene 5 kb. The SNPs which are significant at the 10⁻⁴ level are circled in red.
Materials and Methods
Stage 1 data
The main dataset was reported by the IGAP consortium[9,19]
and consists in total of 17,008 cases and 37,154 controls. This
sample of AD cases and controls comprises 4 data sets taken from
genome-wide association studies performed by GERAD, EADI,
CHARGE and ADGC (see primary IGAP manuscript[9,19] for
more details). The full details of the samples and methods for
conduct of the GWA studies are provided in the respective
Each of these datasets was imputed with Impute2 or
MACH software using the 1000 genomes data (release
Dec2010) as a reference panel. In total 11,863,202 SNPs were
included in the SNP allelic association result file. To make our
analysis as conservative as possible, we only included autosomal
SNPs which passed stringent quality control criteria, i.e. we
included only SNPs with minor allele frequencies (MAF) ≥ 0.01
and imputation quality score greater than or equal to 0.3 in each
individual study, resulting in 7,055,881 SNPs which are present in
at least 40% of the AD cases and 40% of the controls in the
analysis. The summary statistics across datasets were combined
using fixed-effects inverse variance-weighted meta-analysis. We
corrected all individual SNP p-values for genomic control (GC),
λ = 1.087. These SNPs are well imputed on a large proportion of
the sample, which increases confidence in the accuracy of the
association analysis upon which gene-wide analysis is based.
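
The per-SNP steps described above (quality filter, fixed-effects inverse variance-weighted combination, genomic control) can be sketched as follows. This is a minimal illustration, not the consortium's pipeline: the function names and the per-study effect sizes are hypothetical, and applying the genomic control factor by dividing the 1-df chi-square statistic by λ is the usual convention, assumed here rather than stated in the text.

```python
import math

def passes_qc(maf, info, min_maf=0.01, min_info=0.3):
    """SNP filter from the text: MAF >= 0.01 and imputation quality >= 0.3."""
    return maf >= min_maf and info >= min_info

def ivw_fixed_effects(betas, ses):
    """Fixed-effects inverse-variance-weighted estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se

def chi2_1df_pvalue(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def gc_corrected_pvalue(beta, se, lam):
    """Assumed GC convention: divide the chi-square statistic by lambda."""
    return chi2_1df_pvalue((beta / se) ** 2 / lam)

# Hypothetical log-odds ratios and standard errors from four studies.
betas = [0.12, 0.08, 0.10, 0.15]
ses = [0.05, 0.04, 0.06, 0.07]

beta, se = ivw_fixed_effects(betas, ses)
p_raw = chi2_1df_pvalue((beta / se) ** 2)
p_gc = gc_corrected_pvalue(beta, se, lam=1.087)  # lambda value from the text
print(f"pooled beta = {beta:.4f}, se = {se:.4f}")
print(f"p (uncorrected) = {p_raw:.3e}, p (GC-corrected) = {p_gc:.3e}")
```

Because λ is greater than 1, the corrected p-value is always somewhat larger than the uncorrected one, which is the conservative direction.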
Stage 2 data
11,632 SNPs with p-values < 10⁻³ in the IGAP meta-analysis
were successfully genotyped in a Stage 2 sample comprising 8,572
cases and 11,312 controls (see primary IGAP manuscript[9,19]
for more details). An additional 771 SNPs were successfully
genotyped to test all genes with gene-wide p-values < 10⁻⁴ in the
IGAP Stage 1 analysis, excluding genes reported prior to
IGAP, the four loci reaching genome-wide significance in
the Stage 1 IGAP meta-analysis[9,19] and the 0.5 Mb regions
around them (Table S2). These SNPs cover 887 genes and
correspond to 444 independent loci, where all genes within
0.5 Mb are counted as one locus.
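
The rule of counting all genes within 0.5 Mb as one locus can be sketched as a simple merge over sorted gene intervals. How the distance is measured is not specified in the text; the version below assumes it is the gap between the end of the current locus and the start of the next gene, with chaining allowed, and `merge_into_loci` is a hypothetical helper. The gene coordinates are those quoted in this article.

```python
def merge_into_loci(genes, max_gap=500_000):
    """Group genes on the same chromosome into one locus when the gap to the
    previous locus is at most max_gap base pairs (assumed distance rule)."""
    loci = []
    for chrom, start, end, name in sorted(genes):
        last = loci[-1] if loci else None
        if last and last["chrom"] == chrom and start - last["end"] <= max_gap:
            last["end"] = max(last["end"], end)
            last["genes"].append(name)
        else:
            loci.append({"chrom": chrom, "start": start, "end": end,
                         "genes": [name]})
    return loci

# (chromosome, start, end, gene) using positions reported in the text.
genes = [
    ("8", 95_938_200, 95_961_615, "TP53INP1"),
    ("11", 47_638_858, 47_664_206, "MTCH2"),
    ("11", 47_600_632, 47_606_114, "NDUFS3"),
    ("7", 99_661_653, 99_679_371, "ZNF3"),
]

loci = merge_into_loci(genes)
for locus in loci:
    print(locus["chrom"], locus["start"], locus["end"], locus["genes"])
```

With these four genes, NDUFS3 and MTCH2 collapse into a single chromosome 11 locus, while ZNF3 and TP53INP1 each remain a locus of their own.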
Assignment of SNPs to genes
SNPs were assigned to genes if they were located within the
genomic sequence lying between the start of the first and the end
of the last exon of any transcript corresponding to that gene. The
chromosome and location for all currently known human SNPs
were taken from the dbSNP132 database, as was their assignment
to genes (using build 37.1). In total, we retained 2,804,431 (39.7%
of the total) SNPs which annotated 28,636 unique genes with
1–16,514 SNPs per gene. For the gene-wide analysis we have
excluded genes which contain only one SNP in the IGAP Stage 1
analysis, leaving a total of 25,310 genes. If a SNP belongs to more
than one gene, it was assigned to each of these genes. In order to
account for possible signals which are correlated with those in a
gene, gene-wide analysis was also performed using a 10 kb window
around genes to assign SNPs to genes.
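The assignment rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name, the tuple layouts and the toy coordinates are hypothetical, gene boundaries are taken as already collapsed across transcripts, and a simple nested loop stands in for the interval index a genome-scale run would need.

```python
from collections import defaultdict

def assign_snps_to_genes(snps, genes, window=0):
    """Map each gene to the SNPs inside its boundaries, optionally padded by `window` bp.

    snps  : list of (snp_id, chrom, pos)
    genes : list of (gene_name, chrom, start, end)
    A SNP lying in several genes is assigned to each of them.
    Genes left with a single SNP are dropped, as in the gene-wide analysis.
    """
    genes_by_chrom = defaultdict(list)
    for gene in genes:
        genes_by_chrom[gene[1]].append(gene)

    gene_to_snps = defaultdict(list)
    for snp_id, chrom, pos in snps:
        for name, _, start, end in genes_by_chrom[chrom]:
            if start - window <= pos <= end + window:
                gene_to_snps[name].append(snp_id)

    return {name: ids for name, ids in gene_to_snps.items() if len(ids) >= 2}

# Toy example with hypothetical genes and SNPs (coordinates are not real).
toy_genes = [("GENE_A", "1", 1000, 5000), ("GENE_B", "1", 4000, 9000)]
toy_snps = [("rs1", "1", 1500), ("rs2", "1", 4500), ("rs3", "1", 8000), ("rs4", "1", 9005)]
print(assign_snps_to_genes(toy_snps, toy_genes))
print(assign_snps_to_genes(toy_snps, toy_genes, window=10))
```

Setting `window=10_000` would correspond to the 10 kb flanking window mentioned in the text; the toy call uses a smaller value only because the toy coordinates are small.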
The gene-wide analysis was based on the summary
p-values, controlling for LD and the differing numbers of markers
per gene using an approximate statistical approach adopted
for set-based analysis of genetic data. This method calculates
the significance of a set of SNPs in the absence of
individual genotype data, based on a theoretical approximation to
Fisher's statistic for combining p-values. Fisher's statistic, −2 Σ ln(pᵢ),
combines probabilities and, under the null hypothesis, has a
chi-square distribution with 2N degrees of freedom, where N is the
number of markers and the summation is over i = 1, …, N. Whereas
Fisher's statistic combines the results of several tests when the tests
are independent, the approximate method combines
non-independent tests and requires only the list of p-values for each SNP
and knowledge of the correlations between SNPs. The value of
Fisher's statistic and the number of degrees of freedom are then corrected
by a coefficient that depends upon the number of SNPs and the
correlations (LD) between them. This approximation was applied
to the Stage 1 and Stage 2 samples separately, and the resulting
gene-wide p-values were combined using Fisher's method (since these
samples are independent). LD between markers was computed using 1000
Genomes data. The gene-based genome-wide significance level was
set to 2.5×10⁻⁶ to account for the number of tested genes.
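The idea of rescaling Fisher's statistic and its degrees of freedom for correlated tests can be sketched in Python as below. This is an illustration under explicit assumptions: the covariance term uses Brown's original approximation for one-sided tests, which may differ from the approximation actually applied in the analysis, and the function names and example p-values are hypothetical. The chi-square tail probability is computed with a small hand-written incomplete gamma routine so that only the standard library is needed.

```python
import math

def _upper_gamma_reg(a, x):
    """Regularized upper incomplete gamma function Q(a, x)."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion for the lower function P(a, x); Q = 1 - P.
        term = total = 1.0 / a
        n = a
        for _ in range(10000):
            n += 1.0
            term *= x / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz algorithm) for Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefactor)

def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with df degrees of freedom."""
    return _upper_gamma_reg(df / 2.0, x / 2.0)

def gene_wide_p(pvalues, corr):
    """Combine correlated SNP p-values; corr is the SNP correlation (LD) matrix."""
    n = len(pvalues)
    fisher = -2.0 * sum(math.log(p) for p in pvalues)
    mean = 2.0 * n
    var = 4.0 * n
    for i in range(n):
        for j in range(i + 1, n):
            r = corr[i][j]
            # Assumed covariance of -2 ln p_i and -2 ln p_j (Brown's approximation).
            cov = r * (3.25 + 0.75 * r) if r >= 0 else r * (3.27 + 0.71 * r)
            var += 2.0 * cov
    scale = var / (2.0 * mean)      # correction applied to Fisher's statistic
    df = 2.0 * mean ** 2 / var      # corrected degrees of freedom
    return chi2_sf(fisher / scale, df)

# Hypothetical gene with two SNPs in LD (r = 0.9).
stage1 = gene_wide_p([0.01, 0.02], [[1.0, 0.9], [0.9, 1.0]])
# Independent stages: an identity correlation matrix reduces to plain Fisher's method.
stage2 = 0.03  # hypothetical Stage 2 gene-wide p-value
combined = gene_wide_p([stage1, stage2], [[1.0, 0.0], [0.0, 1.0]])
print(stage1, combined)
```

With an identity correlation matrix the scale is 1 and the degrees of freedom are 2N, so the function reduces to Fisher's method, which is how the independent Stage 1 and Stage 2 gene-wide p-values are combined in the last lines.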
Test for excess of associated SNPs/loci
The effective number N of independent SNPs in the whole
genome (excluding genes with SNPs that are genome-wide
significant in the Stage 1 IGAP dataset, ±0.5 Mb) was estimated
by a previously described method that takes LD into account, as was
the observed number of independent SNPs significant at each
p-value criterion (adjusting individual SNP p-values for genomic
control λ = 1.087 beforehand). LD was computed from the 1000
Genomes database (http://www.1000genomes.org/). In the
absence of excess association, the number of independent
SNPs significant at significance level α is an approximately normally
distributed random variable whose mean and standard deviation (SD) can be
calculated as αN and √(Nα(1−α)) (the mean and SD of a binomial
distribution). The number of independent SNPs (and thus
statistical tests) in the whole genome was estimated as
~3.7×10⁶, ~3.6×10⁶ and ~3.5×10⁶ at significance levels below
0.1, between 0.05 and 0.1, and 0.2 and above, respectively (the
estimated number of independent tests depends on the significance
level). We then calculated the
mean of the expected number of significant SNPs in the intervals α₁ <
p ≤ α₂ (α₁, α₂ = 0, 10⁻⁶, 10⁻⁵, …, 0.5) as the difference between the
expected numbers of independent SNPs at the α₂ and α₁ significance
levels, and the SD as the square root of the sum of the corresponding
variances.
We calculated the significance of the excess number of genes
attaining the specified thresholds based upon the assumption that,
under the null hypothesis of no association, the number of
significant genes at a significance level of α in a scan is distributed
as binomial(N, α), where N is the total number of genes, assuming
that genes are independent. Genes within 0.5 Mb of each other
are counted as one signal when calculating the observed number
of significant genes. This prevents significance being inflated by LD
between genes, where a single association signal gives rise to
several significantly-associated genes. The total number of genes
was not corrected for LD in this way, making the estimate of
significance of the excess number of genes conservative.
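The test for an excess of significant genes can be sketched as below. The sketch is a hedged illustration: the function names, the toy gene coordinates and the observed count are hypothetical, and merging genes by chaining gaps of at most 0.5 Mb between consecutive genes is one possible reading of "within 0.5 Mb of each other".

```python
import math

def binomial_upper_tail(k, n, alpha):
    """P(X >= k) for X ~ Binomial(n, alpha), with each term computed in log space."""
    def log_pmf(i):
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(alpha) + (n - i) * math.log1p(-alpha))
    return min(1.0, sum(math.exp(log_pmf(i)) for i in range(k, n + 1)))

def expected_mean_sd(n, alpha):
    """Mean and SD of the number of significant tests under the null hypothesis."""
    return n * alpha, math.sqrt(n * alpha * (1.0 - alpha))

def count_signals(genes, max_gap=500_000):
    """Count significant genes, merging genes on one chromosome separated by <= max_gap bp.

    genes : iterable of (chrom, start, end) for the significant genes
    """
    signals = 0
    prev_chrom, prev_end = None, None
    for chrom, start, end in sorted(genes):
        if chrom != prev_chrom or start - prev_end > max_gap:
            signals += 1
            prev_end = end
        else:
            prev_end = max(prev_end, end)
        prev_chrom = chrom
    return signals

# Hypothetical significant genes: the first two merge into one signal.
toy = [("1", 100_000, 200_000), ("1", 600_000, 650_000),
       ("1", 2_000_000, 2_100_000), ("2", 100_000, 150_000)]
observed = count_signals(toy)

n_genes, alpha = 25_310, 0.001   # gene total from the text; alpha chosen for illustration
print(expected_mean_sd(n_genes, alpha), binomial_upper_tail(observed, n_genes, alpha))
```

Because the observed count is reduced by merging while the total number of genes is left uncorrected, the resulting tail probability errs on the conservative side, as the text notes.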
Figure 3. Manhattan plot of gene-wide p-values in the Stage 1 dataset and combined gene-wide p-values where Stage 2 data are
available. Each dot represents a gene; genes in blue lie within the previously reported associated regions.
Table S2 List of genes that are genome-wide significant
in the IGAP Stage 1 dataset and the flanking regions,
which included SNPs with either r² ≥ 0.3 or association
p-value ≤ 10⁻³, whichever covers the larger region.
Table S8 Gene-based analysis results when the single-SNP
p-values contributing to the gene-based p-value
were adjusted for the best genome-wide significant SNP
in the nearby location.
Figure S1 ZNF3 gene with rs1476679 (ZCWPW1)
reported by the Lambert et al. (2013) study. SNPs which are significant
at the 1e-3 level are circled in red; rs1476679 is highlighted in blue.
Figure S2 NDUFS3 gene with rs10838725 (CELF1) reported by the
Lambert et al. (2013) study. SNPs which are significant at the 1e-3
level are circled in red; rs10838725 is highlighted in blue.
Figure S4 QQ-plot of gene-wide p-values for all genes
(A) and excluding previously reported [4-8] GWAS
significantly associated genes ±0.5 Mb (B) in the discovery
dataset. Genomic control λ = 1.08 and 1.07, respectively.
Methods S1 Expression quantitative trait loci (eQTL) and
Methylation quantitative trait loci (meQTL) analyses.
This work was made possible by the generous participation of the control
subjects, the patients and their families. Complete acknowledgments are
detailed in Materials S3.
Conceived and designed the experiments: VEP D. Harold P. Holmans S.
Seshadri GDS PA JW. Analyzed the data: VEP JCL C. Bellengues LSW
SHC D. Harold P. Holmans A. Richards AJ AV GR MV VC. Contributed
reagents/materials/analysis tools: VEP C. Bellengues LSW SHC LJ P.
Holmans D. Harold AG AV A. Richards ALdS JCL CAIV ACN RS GJ
JCB GWB BGB GR TATW ND AVS VC C. Thomas MAI DZ BNV YK
CFL HS BK MLD MV A. Ruiz MTB C. Reitz F. Panza P. Hollingworth
OH ALF JDB D. Campion PKC C. Baldwin TB VG CC D. Craig NA C.
Berr OLL PLdJ VD JAJ DE S. Love LL IH DCR GE KS AMG NF VS
AFG MJH MG K. Brown MIK LK PBG BMcG EBL AFG AJM CD ST
DW S. Lovestone ER JG PStGH JC AL A. Bayer DWT LY MT P. Bosco
GS P. Proitsi JC S. Sorbi FSG NCF JH MCDN P. Bossù RC C. Brayne
DG ES UB M. Mancuso GS S. Moebus PM MdZ WM HH AP M. Boada
F. Pasquier PC BN WP M. Mayhaus LL HH SP MMC MI D. Beekly VA
FZ OV SGY EC KLHN WG C. Razquin P. Pastor IM MJO KMF PVJ
OC MCOD LBC HS D. Blacker S. Mead THM DAB TBH LF CH
RFAGbB P. P Passmore TJM K. Bettens JIR A. Brice KM TMF WAK D.
Hannequin JFP MAN KR KLL JSKL EB MR MH ERM RS DR JFD
RM C. Tzourio AH MMN CG BMP JLH ML MAPV LJL CvB LAF
CMvD A. Ramirez UKBEC S. Seshadri GDS PA JW. Wrote the paper:
VEP C. Bellengues LSW SHC D. Harold LJ P. Holmans AJ LAF S.
Seshadri GDS PA JW.
1. Gatz M, Reynolds CA, Fratiglioni L, Johansson B, Mortimer JA, et al. (2006) Role of genes and environments for explaining Alzheimer disease. Archives of General Psychiatry 63: 168-174.
2. Bettens K, Sleegers K, Van Broeckhoven C (2013) Genetic insights in Alzheimer's disease. Lancet neurology 12: 92-104.
3. Corder EH, Saunders AM, Strittmatter WJ, Schmechel DE, Gaskell PC, et al. (1993) Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer's disease in late onset families. Science 261: 921-923.
4. Harold D, Abraham R, Hollingworth P, Sims R, Gerrish A, et al. (2009) Genome-wide association study identifies variants at CLU and PICALM associated with Alzheimer's disease. Nature genetics 41: 1088-1093.
5. Hollingworth P, Harold D, Sims R, Gerrish A, Lambert JC, et al. (2011) Common variants at ABCA7, MS4A6A/MS4A4E, EPHA1, CD33 and CD2AP are associated with Alzheimer's disease. Nature Genetics 43: 429-435.
6. Lambert JC, Heath S, Even G, Campion D, Sleegers K, et al. (2009) Genomewide association study identifies variants at CLU and CR1 associated with Alzheimer's disease. Nature Genetics 41: 1094-U1068.
7. Naj AC, Jun G, Beecham GW, Wang LS, Vardarajan BN, et al. (2011) Common variants at MS4A4/MS4A6E, CD2AP, CD33 and EPHA1 are associated with late-onset Alzheimer's disease. Nature Genetics 43: 436-441.
8. Seshadri S, Fitzpatrick AL, Ikram MA, DeStefano AL, Gudnason V, et al. (2010) Genome-wide analysis of genetic loci associated with Alzheimer disease. JAMA: the journal of the American Medical Association 303: 1832-1840.
9. Lambert JC, Ibrahim-Verbaas CA, Harold D, Naj AC, Sims R, et al. (2013) Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease. Nat Genet 45: 1452-1458.
10. Guerreiro RJ, Hardy J (2011) Alzheimer's disease genetics: lessons to improve disease modelling. Biochemical Society transactions 39: 910-916.
11. Ioannidis JP (2007) Non-replication and inconsistency in the genome-wide association setting. Human heredity 64: 203-213.
12. Neale BM, Sham PC (2004) The future of association studies: gene-based analysis and replication. American journal of human genetics 75: 353-362.
13. Moskvina V, O'Donovan MC (2007) Detailed analysis of the relative power of direct and indirect association studies and the implications for their interpretation. Human heredity 64: 63-73.
14. Terwilliger JD, Hiekkalinna T (2006) An utter refutation of the "fundamental theorem of the HapMap". European journal of human genetics: EJHG 14: 426-437.
15. Bettens K, Brouwers N, Engelborghs S, Lambert JC, Rogaeva E, et al. (2012) Both common variations and rare non-synonymous substitutions and small insertion/deletions in CLU are associated with increased Alzheimer risk. Molecular neurodegeneration 7: 3.
16. Risch N (1990) Linkage strategies for genetically complex traits. I. Multilocus models. American journal of human genetics 46: 222-228.
17. Lewis CM, Levinson DF, Wise LH, DeLisi LE, Straub RE, et al. (2003) Genome scan meta-analysis of schizophrenia and bipolar disorder, part II: Schizophrenia. American journal of human genetics 73: 34-48.
18. Segurado R, Detera-Wadleigh SD, Levinson DF, Lewis CM, Gill M, et al. (2003) Genome scan meta-analysis of schizophrenia and bipolar disorder, part III: Bipolar disorder. American journal of human genetics 73: 49-62.
19. Lambert JC, et al. (2013) Extended meta-analysis of 74,538 individuals identifies 11 new susceptibility loci for Alzheimer's disease.
20. Johnson AD, Handsaker RE, Pulit SL, Nizzari MM, O'Donnell CJ, et al. (2008) SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics 24: 2938-2939.
21. Webster JA, Gibbs JR, Clarke J, Ray M, Zhang WX, et al. (2009) Genetic Control of Human Brain Transcript Expression in Alzheimer Disease. American Journal of Human Genetics 84: 445-458.
22. Trabzuni D, Ryten M, Walker R, Smith C, Imran S, et al. (2011) Quality control parameters on a large dataset of regionally dissected human control brains for whole genome expression studies. J Neurochem 119: 275-282.
23. Seux M, Peuget S, Montero MP, Siret C, Rigot V, et al. (2011) TP53INP1 decreases pancreatic cancer cell migration by regulating SPARC expression. Oncogene 30: 3049-3061.
24. Seillier M, Peuget S, Gayet O, Gauthier C, N'Guessan P, et al. (2012) TP53INP1, a tumor suppressor, interacts with LC3 and ATG8-family proteins through the LC3-interacting region (LIR) and promotes autophagy-dependent cell death. Cell death and differentiation 19: 1525-1535.
25. Giusiano S, Baylot V, Andrieu C, Fazli L, Gleave M, et al. (2012) TP53INP1 as new therapeutic target in castration-resistant prostate cancer. Prostate 72: 1286-1294.
26. Driver JA, Beiser A, Au R, Kreger BE, Splansky GL, et al. (2012) Inverse association between cancer and Alzheimer's disease: results from the Framingham Heart Study. British Medical Journal 344.
27. Roe CM, Fitzpatrick AL, Xiong C, Sieh W, Kuller L, et al. (2010) Cancer linked to Alzheimer disease but not vascular dementia. Neurology 74: 106-112.
28. Watson CT, Breden F (2012) The immunoglobulin heavy chain locus: genetic variation, missing data, and implications for human disease. Genes and immunity 13: 363-373.
29. Chen Y, Yang LN, Cheng L, Tu S, Guo SJ, et al. (2013) BAG3 Interactome Analysis Reveals a New Role in Modulating Proteasome Activity. Molecular & cellular proteomics: MCP.
30. Warnatz HJ, Schmidt D, Manke T, Piccini I, Sultan M, et al. (2011) The BTB and CNC homology 1 (BACH1) target genes are involved in the oxidative stress response and in control of the cell cycle. The Journal of biological chemistry 286: 23521-23532.
31. Palmieri F (2013) The mitochondrial transporter family SLC25: identification, properties and physiopathology. Molecular aspects of medicine 34: 465-484.
32. Katz C, Zaltsman-Amir Y, Mostizky Y, Kollet N, Gross A, et al. (2012) Molecular basis of the interaction between proapoptotic truncated BID (tBID) protein and mitochondrial carrier homologue 2 (MTCH2) protein: key players in mitochondrial death pathway. The Journal of biological chemistry 287: 15016-15023.
33. Bernhard F, Landgraf K, Kloting N, Berthold A, Buttner P, et al. (2013) Functional relevance of genes implicated by obesity genome-wide association study signals for human adipocyte biology. Diabetologia 56: 311-322.
34. Fall T, Arnlov J, Berne C, Ingelsson E (2012) The role of obesity-related genetic loci in insulin sensitivity. Diabetic medicine: a journal of the British Diabetic Association 29: e62-66.
35. Haupt A, Thamer C, Heni M, Machicao F, Machann J, et al. (2010) Novel obesity risk loci do not determine distribution of body fat depots: a whole-body MRI/MRS study. Obesity 18: 1212-1217.
36. Benit P, Slama A, Cartault F, Giurgea I, Chretien D, et al. (2004) Mutant NDUFS3 subunit of mitochondrial complex I causes Leigh syndrome. Journal of medical genetics 41: 14-17.
37. Dahl HH (1998) Getting to the nucleus of mitochondrial disorders: identification of respiratory chain-enzyme genes causing Leigh syndrome. American journal of human genetics 63: 1594-1597.
38. Howie BN, Donnelly P, Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS genetics 5: e1000529.
39. Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR (2010) MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genetic epidemiology 34: 816-834.
40. Brown MB (1975) A method for combining non-independent, one-sided tests of significance. Biometrics 31: 978-992.
41. Moskvina V, O'Dushlaine C, Purcell S, Craddock N, Holmans P, et al. (2011) Evaluation of an approximation method for assessment of overall significance of multiple-dependent tests in a genomewide association study. Genetic epidemiology 35: 861-866.
42. Kiezun A, Garimella K, Do R, Stitziel NO, Neale BM, et al. (2012) Exome sequencing and the genetic basis of complex traits. Nature genetics 44: 623-630.
43. Moskvina V, Schmidt KM (2008) On multiple-testing correction in genomewide association studies. Genetic epidemiology 32: 567-573.