Weighted Burden Analysis of Exome-Sequenced Case-Control Sample Implicates Synaptic Genes in Schizophrenia Aetiology

Behavior Genetics, Mar 2018

A previous study of exome-sequenced schizophrenia cases and controls reported an excess of singleton, gene-disruptive variants among cases, concentrated in particular gene sets. The dataset included a number of subjects with a substantial Finnish contribution to ancestry. We have reanalysed the same dataset after removal of these subjects and we have also included non-singleton variants of all types using a weighted burden test which assigns higher weights to variants predicted to have a greater effect on protein function. We investigated the same 31 gene sets as previously and also 1454 GO gene sets. The reduced dataset consisted of 4225 cases and 5834 controls. No individual variants or genes were significantly enriched in cases but 13 out of the 31 gene sets were significant after Bonferroni correction and the “FMRP targets” set produced a signed log p value (SLP) of 7.1. The gene within this set with the highest SLP, equal to 3.4, was FYN, which codes for a tyrosine kinase which phosphorylates glutamate metabotropic receptors and ionotropic NMDA receptors, thus modulating their trafficking, subcellular distribution and function. In the most recent GWAS of schizophrenia it was identified as a “prioritized candidate gene”. Two of the subunits of the NMDA receptor which are substrates of FYN are coded for by GRIN1 (SLP = 1.7) and GRIN2B (SLP = 2.1). Of note, for some sets there was a substantial enrichment of non-singleton variants. Of 1454 GO gene sets, three were significant after Bonferroni correction. Identifying specific genes and variants will depend on genotyping them in larger samples and/or demonstrating that they cosegregate with illness within pedigrees.


David Curtis, Leda Coelewij, Shou-Hwa Liu, Jack Humphrey, Richard Mott

Department of Neurodegenerative Disease, UCL Institute of Neurology, University College London, London, UK
Centre for Psychiatry, Barts and the London School of Medicine and Dentistry, London, UK
UCL Genetics Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK

Edited by Stacey Cherny.

Keywords: Schizophrenia; Exome; Gene; Weighted burden test; FYN; FMRP target

Introduction

Schizophrenia is a severe and disabling mental illness with onset typically in early adult life. It is associated with low fecundity but nevertheless remains fairly common with a lifetime prevalence of around 1% (Power et al. 2013). A variety of types of genetic variation contribute to risk. Many common variants demonstrate association with small effect sizes whereas extremely rare variants can have very large effect sizes. 108 SNPs have been reported to be genome-wide significant with odds ratio (OR) of 1.1–1.2 and it is likely that many other variants will achieve statistical significance when larger samples are genotyped (Schizophrenia Working Group of the Psychiatric Genomics Consortium 2014). Weak effects from common variants may arise from a number of mechanisms. The variant itself may exert a direct effect at some point in the pathogenic process, it may pick up a more indirect effect through involvement in gene regulatory networks or it may be in linkage disequilibrium with other variants that have a larger, direct effect (Boyle et al. 2017). A recent example of the last case is provided by SNPs in the HLA region which tag variant haplotypes of C4, the gene for complement component four, the different haplotypes producing different levels of C4A expression associated with OR for schizophrenia risk of 1.3 (Sekar et al. 2016). Variants associated with small effect sizes will be subject to relatively little selection pressure and hence can remain common.
By contrast, extremely rare variants such as some copy number variants (CNVs) or loss of function (LOF) variants of SETD1A may lead to a very high risk of developing schizophrenia (Deciphering Developmental Disorders Study 2017; Rees et al. 2014; Singh et al. 2016) . A proportion of cases of schizophrenia seem to be due to such variants with large effect size arising as de novo mutations (DNMs) (Fromer et al. 2014; Singh et al. 2017) . Such variants are likely to be subject to strong selection pressure and may only persist for a small number of generations. Theoretically, variants acting recessively might persist in the population and still have reasonably large effect size but attempts to identify these have to date been unsuccessful (Curtis 2015; Rees et al. 2015; Ruderfer et al. 2014) . In order to focus attention on only new or recent variants, the Swedish schizophrenia study of whole exome sequence data focussed on what were termed ultra-rare variants (URVs), that is variants which only occurred in a single subject and which were absent from ExAC. The effects of some of these variants on gene function were annotated as damaging or disruptive and these variants, termed dURVs, were found to be commoner in cases than controls across all genes, with the effect concentrated in particular sets of genes including FMRP targets, synaptically localised genes and genes which were LOF intolerant (Genovese et al. 2016) . The present study seeks to analyse this dataset further in order to consider whether rare non-singleton sequence variants, as well as singleton variants, contribute to schizophrenia risk. The dataset used in this study consists of the largest currently available sample of exome-sequenced schizophrenia cases and controls. It overlaps with a number of previously reported analyses. The full dataset consists of 4968 cases with schizophrenia and 6245 controls. 
Although the subjects were recruited in Sweden, it should be noted that some have a substantial Finnish component to their ancestry (Genovese et al. 2016). The earlier phase of this dataset consisted of 2045 cases and 2045 controls and the primary analysis of these subjects revealed an excess among cases of very rare, disruptive mutations spread over a number of different genes though concentrated in particular gene sets (Purcell et al. 2014). This first phase of the dataset was also used for analyses which attempted to detect recessive effects and to identify Gene Ontology (GO) pathways with an excess of rare, functional variants among cases but which did not produce statistically significant results (Curtis 2013, 2016). A subset of the full dataset with cases with Finnish ancestry removed was used to demonstrate a method for deriving an exome-wide risk score and to demonstrate an association of schizophrenia with variants in mir137 binding sites (Curtis 2017; Curtis and Emmett 2017). A genetically homogeneous subset of the full Swedish dataset was combined with a UK case-control association sample and nonsynonymous variants with Minor Allele Frequency (MAF) < 0.001 which were present on the Illumina HumanExome and HumanOmniExpressExome arrays were analysed (Leonenko et al. 2017). This revealed an enrichment of these variant alleles in LOF intolerant genes and FMRP targets. The present study uses a subset of the Swedish dataset after removal of subjects with a high Finnish ancestry component in order to avoid artefactual results produced by population stratification. It also utilises all rare (MAF < 0.01) variants analysed using a weighted burden test to identify genes and sets of genes associated with schizophrenia risk.

Methods

The data analysed consisted of whole exome sequence variants downloaded from dbGaP from the Swedish schizophrenia association study containing 4968 cases and 6245 controls (Genovese et al. 2016).
The dataset was managed and annotated using the GENEVARASSOC program which accompanies SCOREASSOC (https://github.com/davenomiddlenamecurtis/geneVarAssoc). Version hg19 of the reference human genome sequence and RefSeq genes were used to select variants on a gene-wise basis. Members of the protocadherin gamma gene cluster, whose transcripts overlap each other but which are entered separately in RefSeq, were treated as a single gene which was labelled PCDHG. A number of QC processes were applied. Variants were excluded if they did not have a PASS in the Variant Call Format (VCF) information field and individual genotype calls were excluded if they had a quality score less than 30. Sites were also excluded if there were more than 10% of genotypes missing or of low quality in either cases or controls or if the heterozygote count was smaller than both homozygote counts in both cohorts.

As previously reported (Curtis 2017), preliminary gene-wise weighted burden tests revealed that several genes had an apparent excess of rare, protein-altering variants in cases but that these results were driven by variants which were reported in ExAC to be commoner in Finnish as opposed to non-Finnish Europeans (Lek et al. 2016). Accordingly, subjects with an excess of alleles more frequent in Finns were identified using the methods previously described (Curtis 2017) and removed from the dataset. A total of 743 cases and 411 controls were removed in this way. Once this had been done, leaving a sample of 4225 cases and 5834 controls, the gene-wise weighted burden test results conformed well to what would be expected under the null hypothesis with no evidence for inflation of the test statistic across the majority of genes not thought to be implicated in disease. The tests previously carried out for an excess of dURVs among cases (Genovese et al. 2016) were performed on both the full and reduced datasets, with and without including covariates consisting of the total URV count and the first 20 principal components from the SNP and indel genotypes.

Weighted burden analysis of genes and gene sets as described below was carried out using SCOREASSOC, which analyses all variants simultaneously and can accord each variant a different weight according to its MAF and its predicted function (Curtis 2012, 2016). Each variant was annotated using VEP, PolyPhen and SIFT (Adzhubei et al. 2013; Kumar et al. 2009; McLaren et al. 2016). GENEVARASSOC was used to generate the input files for SCOREASSOC and the default weights were used, for example consisting of 5 for a synonymous variant and 20 for a stop gained variant, except that 10 was added to the weight if the PolyPhen annotation was possibly or probably damaging and also if the SIFT annotation was deleterious. The full set of weights is shown in Supplementary Table S1. SCOREASSOC also weights rare variants more highly than common ones but because it is well-established that no common variants have a large effect on the risk of schizophrenia we excluded variants with MAF > 0.01 in the cases and in the controls, so in practice weighting by rarity had negligible effect. For each subject a gene-wise risk score was derived as the sum of the variant-wise weights, each multiplied by the number of alleles of the variant which the given subject possessed. These scores were then compared between cases and controls using a t test. To indicate the strength of evidence in favour of an excess of rare, functional variants in cases we took the absolute value of the logarithm base ten of the p value from this t test and then gave it a positive sign if the average weighted sum was higher in cases and a negative sign if the average was higher in controls, to produce a signed log p (SLP).
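The scoring scheme just described can be illustrated with a minimal Python sketch. It is not the SCOREASSOC implementation: the function names and the annotation strings are assumptions made for illustration, only the two default weights quoted in the text are included (the full set is in Supplementary Table S1), and the p value is taken as an input rather than computed, so any t test routine could supply it.

```python
import math

# Only the two default weights quoted in the text; other consequence
# types are listed in Supplementary Table S1 and are not reproduced here.
BASE_WEIGHTS = {"synonymous": 5, "stop_gained": 20}


def variant_weight(base_weight, polyphen=None, sift=None):
    """Add 10 for a damaging PolyPhen call and 10 for a deleterious SIFT call.

    The annotation strings are hypothetical labels chosen for this sketch.
    """
    weight = base_weight
    if polyphen in ("possibly_damaging", "probably_damaging"):
        weight += 10
    if sift == "deleterious":
        weight += 10
    return weight


def gene_score(weights, allele_counts):
    """Sum of variant weights, each multiplied by the subject's allele count."""
    return sum(w * n for w, n in zip(weights, allele_counts))


def signed_log_p(p_value, mean_cases, mean_controls):
    """Signed log p: positive if the mean score is higher in cases."""
    magnitude = -math.log10(p_value)
    return magnitude if mean_cases >= mean_controls else -magnitude
```

For example, a subject carrying two alleles of a synonymous variant and one allele of a stop gained variant would receive `gene_score([5, 20], [2, 1])`, that is 30, and a t test p value of 0.001 with the higher mean in cases would give an SLP of 3.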
In order to explore the contribution of singleton variants, for the analyses of gene sets three sets of variants were used: singleton variants which were only observed in a single subject and not in ExAC; non-singleton variants, observed in more than one subject (though still with MAF < 0.01 in cases and/or controls); all variants, consisting of these singleton and non-singleton variants combined. Weighted burden analysis within sets of genes was carried out using PATHWAYASSOC, which for each subject sums up the gene-wise scores to produce an overall score for the gene set. These set-wise scores can then be compared between cases and controls using a t test. This approach has been demonstrated to produce appropriate p values through application to real data, supported by permutation testing (Curtis 2016). This analysis was applied to the 31 gene sets used in the Swedish study separately using singleton, non-singleton and all variants. The analysis was also applied using all variants to the 1454 “all GO gene sets, gene symbols” pathways downloaded from the Molecular Signatures Database at http://www.broadinstitute.org/gsea/msigdb/collections.jsp (Subramanian et al. 2005).

Logistic regression analyses of dURVs were carried out using R (R Core Team 2014). Weighted burden tests for genes and gene sets were carried out using SCOREASSOC and PATHWAYASSOC. Results from these programs are expressed as a Signed Log P (SLP) which is positive if there is an excess of variants among cases and negative if there is an excess among controls. Thus, an SLP of 3 would indicate that there was an excess of variants among cases with two-tailed significance p = 10^-3.

Results

Preliminary analysis of the whole dataset (i.e. all individuals before excluding those with Finnish ancestry), using a logistic regression analysis to test for an excess of dURVs among cases, was significant (p = 8.7 × 10^-10) when the total URV count and principal components were included as covariates.
However, without covariates this analysis was only marginally significant (p = 0.031). Further investigation showed that subjects with a substantial Finnish component to their ancestry had a larger number of URVs than those who did not. Cases tended to have a larger number of dURVs than controls, but only relative to the total number of URVs, and more cases had a substantial Finnish ancestry component than controls. Thus, in the whole sample the relative excess of dURVs among cases was almost completely masked by the fact that more cases had Finnish ancestry and that these cases had a smaller absolute number of URVs, meaning that overall there was only a small excess in the absolute number of dURVs among cases. Including the total URV count or the principal components or both as covariates allowed the relative excess among cases to become apparent. The analysis was then repeated on the reduced dataset without those subjects with a substantial Finnish ancestry component. Once this had been done, there was a significant absolute excess of dURVs among cases (p = 2.7 × 10^-5), without needing to include either total URV count or principal components as covariates.

The weighted burden tests evaluated 1,042,483 valid variants in 22,023 genes. As described in the Methods section, in preliminary analyses using the full dataset a number of genes yielded high SLPs. An example was COMT, with SLP = 7.4. On inspection, it seemed that this gene-wise result was largely driven by SNP rs6267, which was heterozygous in 51/6242 controls and 94/4962 cases (OR 2.3, p = 8 × 10^-7). However, this variant is noted in ExAC to have MAF = 0.002 in non-Finnish Europeans but MAF = 0.05 in Finns. Hence, its increased frequency among cases appeared to be due to the excess of cases with Finnish ancestry.
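The odds ratio quoted for rs6267 can be checked directly from the heterozygote counts. The short Python sketch below treats each subject as a carrier or non-carrier; the function name is an assumption made for illustration and the calculation is the ordinary 2 × 2 odds ratio, not a reproduction of the authors' software.

```python
def odds_ratio(carriers_cases, n_cases, carriers_controls, n_controls):
    """Odds of carrying the variant in cases divided by the odds in controls."""
    odds_cases = carriers_cases / (n_cases - carriers_cases)
    odds_controls = carriers_controls / (n_controls - carriers_controls)
    return odds_cases / odds_controls


# Full dataset counts for rs6267 as given in the text.
full_dataset_or = odds_ratio(94, 4962, 51, 6242)
print(round(full_dataset_or, 1))  # prints 2.3
```

The same function can be applied to the counts from any other subset of subjects.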
Once all subjects with a substantial Finnish ancestry component were excluded, the SLP for COMT fell to 1.7 and for rs6267 there were 36/5831 heterozygous controls and 36/4221 heterozygous cases (OR 1.4, p = 0.2). A similar effect was observed for other genes with excessively high SLPs in the full dataset but not in the reduced dataset, suggesting that removing subjects with substantial Finnish ancestry produced a satisfactorily homogeneous dataset.

QQ-plots for the gene-wise analyses using the reduced dataset are shown in Fig. 1. All of the plots are symmetrical, indicating that the test is unbiased. When only singleton variants are used the gene-wise tests are somewhat underpowered and the gradient is less than 1. However, for the tests using non-singleton variants or all variants the SLPs almost exactly follow the distribution expected under the null hypothesis. One outlier is apparent. This is caused by the gene CDCA8 which produces an SLP of −5.49 with all variants. Further inspection showed that this result was mainly driven by 22 highly weighted variant alleles among controls but only five among cases. For a gene-wise test to be exome-wide significant with 22,023 genes the absolute value of the SLP would need to exceed 5.64, so this result is still within chance expectation.

The results for the 31 gene sets which had previously been used in the Swedish study are shown in Table 1. Using the weighted burden test many, though not all, of the sets show an excess of variants among cases. For neurons, pLI09, fmrp and mir137 the non-singleton variants make a substantial contribution but for psd, rbfox13 and rbfox2 the bulk of the effect comes from only the singleton variants. Given that there are 31 sets, a simple Bonferroni correction would mean that a set could be declared statistically significant if the SLP using all variants exceeded −log10(0.05/31) = 2.8, although this threshold should be regarded as conservative because the sets overlap each other.
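The Bonferroni thresholds on the SLP scale follow from dividing the nominal significance level by the number of tests and taking minus the base ten logarithm. A minimal Python sketch of this arithmetic, with a function name chosen only for illustration and a nominal level of 0.05 assumed as in the text:

```python
import math


def bonferroni_slp_threshold(n_tests, alpha=0.05):
    """Absolute SLP needed for significance after Bonferroni correction."""
    return -math.log10(alpha / n_tests)


gene_set_threshold = bonferroni_slp_threshold(31)      # 31 candidate gene sets
gene_wise_threshold = bonferroni_slp_threshold(22023)  # 22,023 genes
print(round(gene_set_threshold, 1), round(gene_wise_threshold, 2))  # prints 2.8 5.64
```

Because the correction treats the tests as independent, the thresholds obtained this way are conservative whenever the sets share genes.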
For the 13 sets where SLP > 2.8 using all variants, the genes with the highest gene-wise SLPs are shown in Table 2. As expected, there is some overlap between the sets with several genes making contributions to more than one set. The gene with the highest gene-wise SLP in the fmrp set is FYN (SLP = 3.4) and it is also a member of 6 other sets. FYN codes for a tyrosine kinase which phosphorylates glutamate metabotropic receptors and ionotropic NMDA receptors, which modulates their trafficking, subcellular distribution and function (Mao and Wang 2016a). In the most recent GWAS of schizophrenia FYN was identified as a “prioritized candidate gene” and an intronic marker, rs7757969, was significant at p = 4.8 × 10^-8 (Li et al. 2017). The activity of FYN is regulated by dopamine DRD2 receptors (Mao and Wang 2016b). FYN is involved in neuronal apoptosis, brain development and synaptic transmission and lower expression has been observed in the platelets of schizophrenic patients compared with controls (Ali and Salter 2001; Du et al. 2012; Hattori et al. 2009). Two of the subunits of the NMDA receptor which are substrates of FYN are coded for by GRIN1 (SLP = 1.7) and GRIN2B (SLP = 2.1). In all three of these genes, the signal seems to be produced from a number of highly weighted variants which are individually commoner in cases but all are very rare, with MAF < 0.001 even among cases, so it is not possible to identify any obvious candidate variants.

Table 1 Gene sets previously used in the Swedish study. The lists of genes were obtained directly from the first author. The symbol used is the same as that used for the name of the file containing the list.

Symbol (number of genes): Gene set
alid (107): OMIM intellectual disability (Hamosh et al. 2005)
brain (2660): Expression specific to brain (Fagerberg et al. 2014)
celf4 (2675): Bound by CELF4 (Wagnon et al. 2012)
constrained (1005): Missense-constrained (Samocha et al. 2014)
dd (93): Involved in developmental disorder (Deciphering Developmental Disorders Study 2017)
denovo.aut (2927): De novo variants in autism (Fromer et al. 2014)
denovo.chd (249): De novo variants in coronary heart disease (Fromer et al. 2014)
denovo.epi (322): De novo variants in epilepsy (Fromer et al. 2014)
denovo.gain.asd (1365): De novo duplications in ASD (Kirov et al. 2012)
denovo.gain.bd (180): De novo duplications in bipolar disorder (Kirov et al. 2012)
denovo.gain.scz (200): De novo duplications in schizophrenia (Kirov et al. 2012)
denovo.id (251): De novo variants in intellectual disability (Fromer et al. 2014)
denovo.loss.asd (1179): De novo deletions in ASD (Kirov et al. 2012)
denovo.loss.bd (130): De novo deletions in bipolar disorder (Kirov et al. 2012)
denovo.loss.scz (246): De novo deletions in schizophrenia (Kirov et al. 2012)
denovo.scz (770): De novo variants in schizophrenia (Fromer et al. 2014)
fmrp (1244): Bound by FMRP (Darnell et al. 2011)
gwas (91): Implicated by GWAS (Schizophrenia Working Group of the Psychiatric Genomics Consortium 2014)
mir137 (3260): Targets of microRNA-137 (Robinson et al. 2015)
neurons (4747): Expression specific to neurons (Cahoy et al. 2008)
nmdarc (80): NMDAR and ARC complexes (Kirov et al. 2012)
pLI09 (3488): Loss-of-function intolerant (Lek et al. 2016)
psd95 (120): PSD-95 (Bayés et al. 2011)
rbfox13 (3445): Bound by RBFOX 1 or 3 (Weyn-Vanhentenryck et al. 2014)
rbfox2 (3068): Bound by RBFOX 2 (Weyn-Vanhentenryck et al. 2014)
synaptome (1887): Synaptic (Pirooznia et al. 2012)
x.escape (213): Escape X-inactivation (Cotton et al. 2013)
xlid.chicago (77): X-linked intellectual disability, Genetic Services Laboratories of the University of Chicago (Gécz et al. 2009; Moeschler 2008; Moeschler et al. 2006; Rauch et al. 2006)
xlid.gcc (114): X-linked intellectual disability, Greenwood Genetic Centre (Moeschler et al. 2006)
xlid.omim (57): X-linked intellectual disability, OMIM (Hamosh et al. 2005)
xlid (122): X-linked intellectual disability (combined)
Table 2 Genes with the highest gene-wise SLPs in the gene sets with SLP > 2.8 using all variants. The top ten genes are shown, providing that the gene-wise SLP was at least 1.3, equivalent to p < 0.05.
GRIN2B 2.1 PACS1 2.0 KCNQ3 1.8 ANKRD11 1.7 KIF1A 1.6 KCNH1 1.5 DYNC1H1 1.3 KAT6A 1.3 rbfox13 4.0 HPRT1 3.4 KLHL11 3.2 FYN 3.1 DGKI 2.9 GMCL1 2.9 PITPNA 2.8 SLC6A17 2.8 AAK1 2.7 AFF3 2.7 PSME3 4.0 HPRT1 3.7 KLHL11 3.4 FYN 3.3 DGKI 3.3 PITPNA 3.2 SLC6A17 3.1 RABEP2 2.9 AAK1 2.8 AFF3 2.8 PSME3 ADAMTSL1 4.3 TMC4 4.0 OR10Z1 3.2 VAMP2 2.4 FOCAD 2.4 C20orf96 2.3 HERC1 2.3 AGO3 2.2 RNF25 2.2 CDC42BPB 2.2 rbfox2 ARFGEF2 2.5 FYN 3.4 CDC42BPB 2.2 SLC6A17 3.1 EPHB1 2.2 AAK1 2.9 GRIN2B 2.1 AFF3 2.8 TMPRSS12 1.8 PTK2 2.7 KCNQ3 1.8 PREX2 2.5 MBD5 1.7 ARFGEF2 2.5 TNK2 1.7 VAMP2 2.4 SETDB2 1.6 HERC1 2.3 KCNH1 1.5 PACSIN1 2.3

Figure 2 shows the QQ plot for the set-wise analyses using the GO gene sets. Given that there is overlap of genes between sets, the SLPs are non-independent and it is expected that the gradient of the QQ plot will be less than 1. For those sets with a negative SLP this is indeed the case and these results are in accordance with the expectation under the null hypothesis. However, the gradient becomes steeper for sets with positive SLPs and this can be interpreted as showing that some sets have an excess of variants among cases above that which would be expected by chance. Given that 1454 GO gene sets were tested, a simple Bonferroni correction would mean that a test could be declared “exome-wide significant” if it achieved an SLP exceeding −log10(0.05/1454) = 4.5. Three sets did achieve this threshold. However, given the fact that the set-wise SLPs are not independent, a Bonferroni correction might be viewed as conservative and Table 3 shows all sets achieving SLP > 3. The full results are presented in Supplementary Table S2. The most significant set, INTRACELLULAR_SIGNALING_CASCADE with SLP = 5.4, contains FYN and two other genes with gene-wise SLP > 3, S1PR4 (SLP = 3.7) and RTKN (SLP = 3.2).
S1PR4 codes for the type 4 receptor for sphingosine-1-phosphate and the mouse strain carrying the mutation with genotype S1pr4tm1Dgen/S1pr4+ has decreased prepulse inhibition as a phenotype (http://www.informatics.jax.org/allele/genoview/MGI:3606610) (Blake et al. 2017; The Jackson Laboratory, n.d.). RTKN codes for rhotekin, a scaffold protein that interacts with GTP-bound Rho proteins. Again, inspecting results for individual variants within these genes did not reveal any obvious candidates. The full results for all genes and all gene sets can be downloaded at: http://www.davecurtis.net/downloads/SSS2WeightedBurdenAnalysisResults.tgz.

Fig. 2 QQ plot for set-wise SLPs for GO sets against the expected SLP if all sets were non-overlapping and independent

Discussion

This analysis identifies a number of sets of genes that meet Bonferroni-corrected criteria for statistical significance. It differs from previous analyses in a number of ways. In contrast to the original analysis of the Swedish dataset (Genovese et al. 2016) it uses non-singleton as well as singleton variants and it clearly demonstrates that there is a contribution to risk from these non-singleton variants. This is extremely important in terms of the prospects for identifying rare risk variants for schizophrenia. If only unique variants conferred risk, that is only variants which occur independently as de novo mutations and then disappear after a small number of generations, then it would not be possible to identify any single variant as definitively affecting risk. At best, one could perhaps identify classes of variant occurring in particular genes. Without being able to conclude that any particular variant affected risk, one could not carry out functional studies in model systems with the confidence that one was indeed studying a true risk variant.
Additionally, if only unique variants contributed to risk then strategies that might use linkage disequilibrium to implicate untyped variants could not succeed. If, on the other hand, there are risk variants which survive and spread in the population then potentially these could be tagged by haplotypes of common SNPs and imputed from GWAS data, in a way similar to that used to impute C4 risk variants (Sekar et al. 2016). Alternatively, population sequencing may soon become cheap and accurate enough to identify these rare variants directly.

This study differs from both the Swedish study (Genovese et al. 2016) and the Swedish-UK study (Leonenko et al. 2017) in that it uses a homogeneous dataset. The original study did not exclude the subjects with a substantial Finnish ancestry component whereas the Swedish-UK study did use a homogeneous subset of the Swedish subjects but then combined them with a UK sample. This meant that both studies needed to incorporate principal components to control for population stratification and this to some extent complicates the interpretation of their results. For example, the highly significant enrichment for SLP dURVs reported in the first study only becomes apparent when covariates are included. In the Swedish-UK study, the most highly significant variant (p = 3.4 × 10^-7), which occurs in the MCPH gene, has MAF of 0.0046 in cases and of 0.0012 in controls, meaning that the unadjusted risk ratio is approximately 3.8. However, after multivariate analysis including covariates the OR is reported as being only 1.2. By contrast, the reduced dataset we have used appears to be sufficiently homogeneous that the test statistic performs as expected without requiring any adjustment for population stratification. This allows for a simple, straightforward interpretation of the results obtained.

Another way our analysis differs is that it includes all variants in a single analysis.
Variants are assigned different weights according to an arbitrary pre-specified set of weights designed to emphasise those variants more likely to affect gene function. This meant that we carried out only a single analysis for each gene or set of genes, reducing any correction for multiple testing. Our analyses utilised 1,042,483 variants, compared with the 112,950 used in the Swedish-UK study. Using our method, 13 of the 31 candidate gene sets and 3 of the 1454 GO sets meet formal standards for statistical significance using a conservative Bonferroni correction. As in the other studies, none of the results for individual genes reach formal standards for statistical significance, although the results obtained for FYN are possibly of interest.

It seems likely that our results are detecting a real signal originating from rare variants concentrated within some of the genes that are members of the gene sets with high SLPs. These sets overlap each other to a considerable extent and it is difficult to tease out which ones best define a group of schizophrenia risk genes. An attempt to do this formally using exome-wide risk scores did not produce definitive results (Curtis 2017). It should be noted that different sets might be implicated for different reasons. For example, it may be that the high SLP for targets of miR-137 occurs because disruption of the regulation of these genes by miR-137 can lead to increased risk of schizophrenia, as supported by the association of schizophrenia with markers for miR-137 and with variants in its binding sites (Curtis and Emmett 2017; Olde Loohuis et al. 2017). On the other hand, there is no reported association of FMRP itself with schizophrenia and the high SLP for its targets may simply reflect that this identifies a group of genes whose mRNA is localised to the synapse. In any event, it is clear that with samples currently available we are only able to identify very broad gene sets but not yet specific genes.
With increased sample sizes it will become possible to identify specific genes and variants which have a moderate or large effect on risk. However such variants, although not singletons, will still be very rare and serious attention should be focussed on complementary approaches to confirm them. One such approach would be to use exome sequence data from affected subjects to provide reference haplotypes for imputation into large GWAS datasets, analogously to the way C4 variants implicating risk were identified (Sekar et al. 2016) . Another would be to search for affected relatives of subjects with candidate variants in order to see if the variants cosegregate with disease, a strategy which was successful in implicating RBM12 in the aetiology of psychosis (Curtis 2011; Steinberg et al. 2017) . If and when specific variants are identified as having substantial effects on risk then they can be incorporated into model systems in order to gain insight into the mechanisms affecting the development of schizophrenia. Acknowledgements The authors wish to thank Giulio Genovese for his assistance in providing supporting files and responding to queries. The datasets used for the analysis described in this manuscript were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000473.v2.p2. Samples used for data analysis were provided by the Swedish Cohort Collection supported by the NIMH grant R01MH077139, the Sylvan C. Herman Foundation, the Stanley Medical Research Institute and The Swedish Research Council ( Grants 2009 -4959 and 2011-4659). Support for the exome sequencing was provided by the NIMH Grand Opportunity grant RCMH089905, the Sylvan C. Herman Foundation, a grant from the Stanley Medical Research Institute and multiple gifts to the Stanley Center for Psychiatric Research at the Broad Institute of MIT and Harvard. JH is supported by MRC studentship 516702. 
Compliance with ethical standards

Conflict of interest

David Curtis, Leda Coelewij, Shou-Hwa Liu, Jack Humphrey and Richard Mott declare they have no conflict of interest.

Human and animal rights

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent

Informed written consent was obtained from all individual participants included in the study by the original researchers.

Open Access

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

Adzhubei I, Jordan DM, Sunyaev SR (2013) Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet. https://doi.org/10.1002/0471142905.hg0720s76
Ali DW, Salter MW (2001) NMDA receptor regulation by Src kinase signalling in excitatory synaptic transmission and plasticity. Curr Opin Neurobiol 11:336-342
Bayés A, van de Lagemaat LN, Collins MO, Croning MDR, Whittle IR, Choudhary JS, Grant SGN (2011) Characterization of the proteome, diseases and evolution of the human postsynaptic density. Nat Neurosci 14:19-21
Blake JA, Eppig JT, Kadin JA, Richardson JE, Smith CL, Bult CJ, the Mouse Genome Database Group (2017) Mouse genome database (MGD)-2017: community knowledge resource for the laboratory mouse. Nucleic Acids Res 45:D723-D729
Boyle EA, Li YI, Pritchard JK (2017) An expanded view of complex traits: from polygenic to omnigenic. Cell 169:1177-1186
Cahoy JD, Emery B, Kaushal A, Foo LC, Zamanian JL, Christopherson KS, Xing Y, Lubischer JL, Krieg PA, Krupenko SA, Thompson WJ, Barres BA (2008) A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function. J Neurosci 28:264-278
Cotton AM, Ge B, Light N, Adoue V, Pastinen T, Brown CJ (2013) Analysis of expressed SNPs identifies variable extents of expression from the human inactive X chromosome. Genome Biol 14:R122
Curtis D (2011) Assessing the contribution family data can make to case-control studies of rare variants. Ann Hum Genet 75:630-638
Curtis D (2012) A rapid method for combined analysis of common and rare variants at the level of a region, gene, or pathway. Adv Appl Bioinform Chem 5:1-9
Curtis D (2013) Approaches to the detection of recessive effects using next generation sequencing data from outbred populations. Adv Appl Bioinform Chem 6:29
Curtis D (2015) Investigation of recessive effects in schizophrenia using next-generation exome sequence data. Ann Hum Genet. https://doi.org/10.1111/ahg.12109
Curtis D (2016) Pathway analysis of whole exome sequence data provides further support for the involvement of histone modification in the aetiology of schizophrenia. Psychiatr Genet 26:223-227
Curtis D (2017) Construction of an exome-wide risk score for schizophrenia based on a weighted burden test. Ann Hum Genet. https://doi.org/10.1111/ahg.12212
Curtis D, Emmett W (2017) Association study of schizophrenia with variants in miR-137 binding sites. Schizophr Res. https://doi.org/10.1016/j.schres.2017.11.018
Darnell JC, Van Driesche SJ, Zhang C, Hung KYS, Mele A, Fraser CE, Stone EF, Chen C, Fak JJ, Chi SW, Licatalosi DD, Richter JD, Darnell RB (2011) FMRP stalls ribosomal translocation on mRNAs linked to synaptic function and autism. Cell 146:247-261
Deciphering Developmental Disorders Study (2017) Prevalence and architecture of de novo mutations in developmental disorders. Nature 542:433-438
Du C-P, Tan R, Hou X-Y (2012) Fyn kinases play a critical role in neuronal apoptosis induced by oxygen and glucose deprivation or amyloid-β peptide treatment. CNS Neurosci Ther 18:754-761
Fagerberg L, Hallstrom BM, Oksvold P, Kampf C, Djureinovic D, Odeberg J, Habuka M, Tahmasebpoor S, Danielsson A, Edlund K, Asplund A, Sjostedt E, Lundberg E, Szigyarto CA-K, Skogs M, Takanen JO, Berling H, Tegel H, Mulder J, Nilsson P, Schwenk JM, Lindskog C, Danielsson F, Mardinoglu A, Sivertsson A, von Feilitzen K, Forsberg M, Zwahlen M, Olsson I, Navani S, Huss M, Nielsen J, Ponten F, Uhlen M (2014) Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol Cell Proteom 13:397-406
Fromer M, Pocklington AJ, Kavanagh DH, Williams HJ, Dwyer S, Gormley P, Georgieva L, Rees E, Palta P, Ruderfer DM, Carrera N, Humphreys I, Johnson JS, Roussos P, Barker DD, Banks E, Milanova V, Grant SG, Hannon E, Rose SA, Chambert K, Mahajan M, Scolnick EM, Moran JL, Kirov G, Palotie A, McCarroll SA, Holmans P, Sklar P, Owen MJ, Purcell SM, O'Donovan MC (2014) De novo mutations in schizophrenia implicate synaptic networks. Nature 506:179-184
Gécz J, Shoubridge C, Corbett M (2009) The genetic landscape of intellectual disability arising from chromosome X. Trends Genet 25:308-316
Genovese G, Fromer M, Stahl EA, Ruderfer DM, Chambert K, Landén M, Moran JL, Purcell SM, Sklar P, Sullivan PF, Hultman CM, McCarroll SA (2016) Increased burden of ultra-rare protein-altering variants among 4,877 individuals with schizophrenia. Nat Neurosci 19:1433-1441
Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA (2005) Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 33:D514-D517
Hattori K, Fukuzako H, Hashiguchi T, Hamada S, Murata Y, Isosaka T, Yuasa S, Yagi T (2009) Decreased expression of Fyn protein and disbalanced alternative splicing patterns in platelets from patients with schizophrenia. Psychiatry Res 168:119-128
Kirov G, Pocklington AJ, Holmans P, Ivanov D, Ikeda M, Ruderfer D, Moran J, Chambert K, Toncheva D, Georgieva L, Grozeva D, Fjodorova M, Wollerton R, Rees E, Nikolov I, van de Lagemaat LN, Bayés À, Fernandez E, Olason PI, Böttcher Y, Komiyama NH, Collins MO, Choudhary J, Stefansson K, Stefansson H, Grant SGN, Purcell S, Sklar P, O'Donovan MC, Owen MJ (2012) De novo CNV analysis implicates specific abnormalities of postsynaptic signalling complexes in the pathogenesis of schizophrenia. Mol Psychiatry 17:142-153
Kumar P, Henikoff S, Ng PC (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4:1073-1081
Leonenko G, Richards AL, Walters JT, Pocklington A, Chambert K, Al Eissa MM, Sharp SI, O'Brien NL, Curtis D, Bass NJ, McQuillin A, Hultman C, Moran JL, McCarroll SA, Sklar P, Neale BM, Holmans PA, Owen MJ, Sullivan PF, O'Donovan MC (2017) Mutation intolerant genes and targets of FMRP are enriched for nonsynonymous alleles in schizophrenia. Am J Med Genet B. https://doi.org/10.1002/ajmg.b.32560
Li Z, Chen J, Yu H, He L, Xu Y, Zhang D, Yi Q, Li C, Li X, Shen J, Song Z, Ji W, Wang M, Zhou J, Chen B, Liu Y, Wang J, Wang P, Yang P, Wang Q, Feng G, Liu B, Sun W, Li B, He G, Li W, Wan C, Xu Q, Li W, Wen Z, Liu K, Huang F, Ji J, Ripke S, Yue W, Sullivan PF, O'Donovan MC, Shi Y (2017) Genome-wide association analysis identifies 30 new susceptibility loci for schizophrenia. Nat Genet 49:1576-1583
Mao L-M, Wang JQ (2016a) Tyrosine phosphorylation of glutamate receptors by non-receptor tyrosine kinases: roles in depression-like behavior. Neurotransmitter, Houston, p 3
Mao L-M, Wang JQ (2016b) Dopamine D2 receptors are involved in the regulation of fyn and metabotropic glutamate receptor 5 phosphorylation in the rat striatum in vivo. J Neurosci Res 94:329-338
McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, Flicek P, Cunningham F (2016) The ensembl variant effect predictor. Genome Biol 17:122
Moeschler JB (2008) Genetic evaluation of intellectual disabilities. Semin Pediatr Neurol 15:2-9
Moeschler JB, Shevell M, American Academy of Pediatrics Committee on Genetics (2006) Clinical genetic evaluation of the child with mental retardation or developmental delays. Pediatrics 117:2304-2316
Olde Loohuis NFM, Nadif Kasri N, Glennon JC, van Bokhoven H, Hébert SS, Kaplan BB, Martens GJM, Aschrafi A (2017) The schizophrenia risk gene MIR137 acts as a hippocampal gene network node orchestrating the expression of genes relevant to nervous system development and function. Prog Neuro-Psychopharmacol Biol Psychiatry 73:109-118
Pirooznia M, Wang T, Avramopoulos D, Valle D, Thomas G, Huganir RL, Goes FS, Potash JB, Zandi PP (2012) SynaptomeDB: an ontology-based knowledgebase for synaptic genes. Bioinformatics 28:897-899
Power RA, Kyaga S, Uher R, MacCabe JH, Långström N, Landen M, McGuffin P, Lewis CM, Lichtenstein P, Svensson AC (2013) Fecundity of patients with schizophrenia, autism, bipolar disorder, depression, anorexia nervosa, or substance abuse vs their unaffected siblings. JAMA Psychiatry 70:22-30
Purcell SM, Moran JL, Fromer M, Ruderfer D, Solovieff N, Roussos P, O'Dushlaine C, Chambert K, Bergen SE, Kähler A, Duncan L, Stahl E, Genovese G, Fernández E, Collins MO, Komiyama NH, Choudhary JS, Magnusson PKE, Banks E, Shakir K, Garimella K, Fennell T, DePristo M, Grant SGN, Haggarty SJ, Gabriel S, Scolnick EM, Lander ES, Hultman CM, Sullivan PF, McCarroll SA, Sklar P (2014) A polygenic burden of rare disruptive mutations in schizophrenia. Nature 506:185-190
R Core Team (2014) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Rauch A, Hoyer J, Guth S, Zweier C, Kraus C, Becker C, Zenker M, Hüffmeier U, Thiel C, Rüschendorf F, Nürnberg P, Reis A, Trautmann U (2006) Diagnostic yield of various genetic approaches in patients with unexplained developmental delay or mental retardation. Am J Med Genet A 140A:2063-2074
Rees E, Walters JTR, Georgieva L, Isles AR, Chambert KD, Richards AL, Mahoney-Davies G, Legge SE, Moran JL, McCarroll SA, O'Donovan MC, Owen MJ, Kirov G (2014) Analysis of copy number variations at 15 schizophrenia-associated loci. Br J Psychiatry 204:108-114
Rees E, Kirov G, Walters JT, Richards AL, Howrigan D, Kavanagh DH, Pocklington AJ, Fromer M, Ruderfer DM, Georgieva L, Carrera N, Gormley P, Palta P, Williams H, Dwyer S, Johnson JS, Roussos P, Barker DD, Banks E, Milanova V, Rose SA, Chambert K, Mahajan M, Scolnick EM, Moran JL, Tsuang MT, Glatt SJ, Chen WJ, Hwu H-G, Taiwanese Trios Exome Sequencing Consortium, Neale BM, Palotie A, Sklar P, Purcell SM, McCarroll SA, Holmans P, Owen MJ, O'Donovan MC (2015) Analysis of exome sequence in 604 trios for recessive genotypes in schizophrenia. Transl Psychiatry 5:e607
Robinson EB, Neale BM, Hyman SE (2015) Genetic research in autism spectrum disorders. Curr Opin Pediatr 27:685-691
Ruderfer DM, Lim ET, Genovese G, Moran JL, Hultman CM, Sullivan PF, McCarroll SA, Holmans P, Sklar P, Purcell SM (2014) No evidence for rare recessive and compound heterozygous disruptive variants in schizophrenia. Eur J Hum Genet. https://doi.org/10.1038/ejhg.2014.228
Samocha KE, Robinson EB, Sanders SJ, Stevens C, Sabo A, McGrath LM, Kosmicki JA, Rehnström K, Mallick S, Kirby A, Wall DP, MacArthur DG, Gabriel SB, DePristo M, Purcell SM, Palotie A, Boerwinkle E, Buxbaum JD, Cook EH, Gibbs RA, Schellenberg GD, Sutcliffe JS, Devlin B, Roeder K, Neale BM, Daly MJ (2014) A framework for the interpretation of de novo mutation in human disease. Nat Genet 46:944-950
Schizophrenia Working Group of the Psychiatric Genomics Consortium (2014) Biological insights from 108 schizophrenia-associated genetic loci. Nature 511:421-427
Sekar A, Bialas AR, de Rivera H, Davis A, Hammond TR, Kamitaki N, Tooley K, Presumey J, Baum M, Van Doren V, Genovese G, Rose SA, Handsaker RE, Daly MJ, Carroll MC, Stevens B, McCarroll SA (2016) Schizophrenia risk from complex variation of complement component 4. Nature. https://doi.org/10.1038/nature16549
Singh T, Kurki MI, Curtis D, Purcell SM, Crooks L, McRae J, Suvisaari J, Chheda H, Blackwood D, Breen G, Pietiläinen O, Gerety SS, Ayub M, Blyth M, Cole T, Collier D, Coomber EL, Craddock N, Daly MJ, Danesh J, DiForti M, Foster A, Freimer NB, Geschwind D, Johnstone M, Joss S, Kirov G, Körkkö J, Kuismin O, Holmans P, Hultman CM, Iyegbe C, Lönnqvist J, Männikkö M, McCarroll SA, McGuffin P, McIntosh AM, McQuillin A, Moilanen JS, Moore C, Murray RM, Newbury-Ecob R, Ouwehand W, Paunio T, Prigmore E, Rees E, Roberts D, Sambrook J, Sklar P, Clair DS, Veijola J, Walters JTR, Williams H, Sullivan PF, Hurles ME, O'Donovan MC, Palotie A, Owen MJ, Barrett JC (2016) Rare loss-of-function variants in SETD1A are associated with schizophrenia and developmental disorders. Nat Neurosci 19:571-577
Singh T, Walters JTR, Johnstone M, Curtis D, Suvisaari J, Torniainen M, Rees E, Iyegbe C, Blackwood D, McIntosh AM, Kirov G, Geschwind D, Murray RM, Di Forti M, Bramon E, Gandal M, Hultman CM, Sklar P, Palotie A, Sullivan PF, O'Donovan MC, Owen MJ, Barrett JC (2017) The contribution of rare variants to risk of schizophrenia in individuals with and without intellectual disability. Nat Genet. https://doi.org/10.1038/ng.3903
Steinberg S, Gudmundsdottir S, Sveinbjornsson G, Suvisaari J, Paunio T, Torniainen-Holm M, Frigge ML, Jonsdottir GA, Huttenlocher J, Arnarsdottir S, Ingimarsson O, Haraldsson M, Tyrfingsson T, Thorgeirsson TE, Kong A, Norddahl GL, Gudbjartsson DF, Sigurdsson E, Stefansson H, Stefansson K (2017) Truncating mutations in RBM12 are associated with psychosis. Nat Genet. https://doi.org/10.1038/ng.3894
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Nat Acad Sci 102(43):15545-15550
Wagnon JL, Briese M, Sun W, Mahaffey CL, Curk T, Rot G, Ule J, Frankel WN (2012) CELF4 regulates translation and local abundance of a vast set of mRNAs, including genes associated with regulation of synaptic function. PLoS Genet 8:e1003067
Weyn-Vanhentenryck SM, Mele A, Yan Q, Sun S, Farny N, Zhang Z, Xue C, Herre M, Silver PA, Zhang MQ, Krainer AR, Darnell RB, Zhang C (2014) HITS-CLIP and integrative modeling define the Rbfox splicing-regulatory network linked to brain development and autism. Cell Rep 6:1139-1152
The Jackson Laboratory (n.d.) Mouse Genome Database (MGD) at the Mouse Genome Informatics website [WWW Document]. http://www.informatics.jax.org. Accessed 18 Mar 2018


David Curtis, Leda Coelewij, Shou-Hwa Liu, Jack Humphrey, Richard Mott. Weighted Burden Analysis of Exome-Sequenced Case-Control Sample Implicates Synaptic Genes in Schizophrenia Aetiology, Behavior Genetics, 2018, 1-11, DOI: 10.1007/s10519-018-9893-3