Inferring rare disease risk variants based on exact probabilities of sharing by multiple affected relatives (pdf)

https://bioinformatics.oxfordjournals.org/content/30/15/2189.full.pdf

BIOINFORMATICS ORIGINAL PAPER
Genetics and population analysis
Vol. 30 no. 15 2014, pages 2189–2196
doi:10.1093/bioinformatics/btu198
Advance Access publication April 16, 2014

Inferring rare disease risk variants based on exact probabilities of sharing by multiple affected relatives

Alexandre Bureau1,2,*, Samuel G. Younkin3, Margaret M. Parker4, Joan E. Bailey-Wilson5, Mary L. Marazita6, Jeffrey C. Murray7, Elisabeth Mangold8, Hasan Albacha-Hejazi9, Terri H. Beaty4 and Ingo Ruczinski3,*

Associate Editor: Jeffrey Barrett

ABSTRACT

Motivation: Family-based designs are regaining popularity for genomic sequencing studies because they provide a way to test cosegregation with disease of variants that are too rare in the population to be tested individually in a conventional case–control study.

Results: Where only a few affected subjects per family are sequenced, the probability that any variant would be shared by all affected relatives, given it occurred in any one family member, provides evidence against the null hypothesis of a complete absence of linkage and association. A P-value can be obtained as the sum of the probabilities of sharing events as (or more) extreme in one or more families. We generalize an existing closed-form expression for exact sharing probabilities to more than two relatives per family. When pedigree founders are related, we show that an approximation of sharing probabilities based on empirical estimates of kinship among founders obtained from genome-wide marker data is accurate for low levels of kinship. We also propose a more generally applicable approach based on Monte Carlo simulations. We applied this method to a study of 55 multiplex families with apparent non-syndromic forms of oral clefts from four distinct populations, with whole exome sequences available for two or three affected members per family.
The rare single nucleotide variant rs149253049 in ADAMTS9 shared by affected relatives in three Indian families achieved significance after correcting for multiple comparisons ($p = 2 \times 10^{-6}$).

Availability and implementation: Source code and binaries of the R package RVsharing are freely available for download at http://cran.r-project.org/web/packages/RVsharing/index.html.

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

The advent of high-throughput sequencing of whole exomes and even whole genomes opens the possibility of detecting rare variants (RVs, including those unique to a family, and alleles up to a frequency of 1% in a population) impacting human health. The first successful applications of exome sequencing have been with rare Mendelian traits (Gilissen et al., 2012). A common study design to discover highly penetrant causal variants that are rare in families where previous genotyping has not been performed is to sequence the exome (or increasingly, the whole genome) of two or three affected subjects, and focus on novel variants predicted to be functional and shared by all sequenced family members as likely causal variants (Gilissen et al., 2012).

Contrary to monogenic Mendelian traits, considerable genetic heterogeneity must be expected with complex diseases. Familial forms of numerous common complex diseases are caused by RVs, supporting the hypothesis that RVs may explain a part of the so-called ‘missing heritability’ of these diseases, although the extent of the contribution of RVs to complex disease heritability is an ongoing debate (Gibson, 2012). In a family where cases cluster, there is a high probability that multiple affected members carry the same rare disease predisposing variant if such a variant exists and its penetrance is high (Cirulli and Goldstein, 2010; Wijsman, 2012).
This gives an advantage to family samples over samples of unrelated individuals, where disease-causing RVs may be seen only once or twice among tens of thousands of subjects. As with Mendelian disorders, it was initially proposed to use the RV sharing information to filter out RVs not shared in at least one family (Feng et al., 2011). For variants sufficiently rare that copies in the sequenced relatives are almost certainly identical by descent (IBD), Feng et al. (2011) computed the probability that an RV independent of the disease and detected in at least one sequenced subject would not be shared by the other sequenced affected relatives, to quantify the effectiveness of what they call the ‘concordance filter’ in discarding irrelevant RVs. We adopt the view that the probability that an RV would

Received on November 20, 2013; revised on March 14, 2014; accepted on April 9, 2014

1Centre de Recherche de l’Institut Universitaire en Santé Mentale de Québec, G1J 2G3, 2Département de Médecine Sociale et Préventive, Université Laval, Québec, G1V 0A6 Canada, 3Department of Biostatistics, 4Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, 5Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, MD 21224, 6Department of Oral Biology, Center for Craniofacial and Dental Genetics, School of Dental Medicine, University of Pittsburgh, PA 15219, 7Department of Pediatrics, School of Medicine, University of Iowa, IA 52242, USA, 8Institute of Human Genetics, University of Bonn, Bonn D-53127, Germany and 9Dr. Hejazi Clinic, P.O. Box 2519, Riyadh 11461, Saudi Arabia

*To whom correspondence should be addressed.

Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
... founders are unrelated and we assume the variant is rare enough that a single copy exists among all the alleles present among the $n_f$ founders of the pedigree linking the sequenced subjects. In a generalization, we allow founders to be related, and allow for up to two copies of the RV to be introduced into the pedigree by related founders. We finally demonstrate how RV sharing probabilities computed in a single family can be combined across multiple families where the same variant is seen, and how to derive the P-value for the hypothesis test.

2.1 Rare variant sharing probability assuming unrelated founders

We define the following random variables:

$C_i$: Number of copies of the RV received by sequenced subject $i$,
$F_j$: Indicator variable that founder $j$ introduced one copy of the RV into the pedigree,
$D_{ij}$: Number of generations (meioses) between subject $i$ and his or her ancestor $j$.

For a set of $n$ sequenced subjects, we want to compute the probability

$$
\begin{aligned}
P[\text{RV shared}] &= P[C_1 = \dots = C_n = 1 \mid C_1 + \dots + C_n \ge 1] \\
&= \frac{P[C_1 = \dots = C_n = 1]}{P[C_1 + \dots + C_n \ge 1]} \\
&= \frac{\sum_{j=1}^{n_f} P[C_1 = \dots = C_n = 1 \mid F_j]\, P[F_j]}{\sum_{j=1}^{n_f} P[C_1 + \dots + C_n \ge 1 \mid F_j]\, P[F_j]}
\end{aligned}
$$

where the expression on the third line results from our assumption of a single copy of that RV among all alleles present in the $n_f$ founders. The probabilities P[F (...truncated)
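
The ratio of sums over founders in Section 2.1 can be checked numerically on small pedigrees. The following is a minimal Python sketch, not the RVsharing package's implementation: the function name `sharing_probability` and the dictionary encoding of the pedigree are hypothetical, and it assumes that each founder is equally likely to have introduced the single copy ($P[F_j] = 1/n_f$), that the pedigree is not inbred, and that each carrier parent transmits the RV with probability 1/2. It enumerates every transmission outcome exactly rather than using a closed form.

```python
from fractions import Fraction


def sharing_probability(pedigree, sequenced):
    """Exact P[RV shared] for unrelated founders and a single founder copy.

    pedigree:  dict mapping each non-founder to its (parent, parent) pair,
               listed so that parents appear before their children.
    sequenced: names of the sequenced (affected) subjects.
    """
    parents = {p for pair in pedigree.values() for p in pair}
    founders = sorted(parents - set(pedigree))
    order = list(pedigree)

    def walk(idx, carriers, prob, totals):
        # Leaf: every non-founder has been assigned a carrier status.
        if idx == len(order):
            n_carriers = sum(s in carriers for s in sequenced)
            if n_carriers == len(sequenced):
                totals["all"] += prob   # contributes to P[C_1 = ... = C_n = 1]
            if n_carriers >= 1:
                totals["any"] += prob   # contributes to P[C_1 + ... + C_n >= 1]
            return
        child = order[idx]
        n_carrier_parents = sum(p in carriers for p in pedigree[child])
        if n_carrier_parents == 0:
            walk(idx + 1, carriers, prob, totals)
        elif n_carrier_parents == 1:
            half = prob / 2             # Mendelian transmission
            walk(idx + 1, carriers | {child}, half, totals)
            walk(idx + 1, carriers, half, totals)
        else:
            raise ValueError("inbred pedigrees are outside this sketch")

    totals = {"all": Fraction(0), "any": Fraction(0)}
    weight = Fraction(1, len(founders))  # assumed P[F_j] = 1 / n_f
    for founder in founders:
        walk(0, frozenset([founder]), weight, totals)
    return totals["all"] / totals["any"]


# Two sequenced siblings: founders are the two parents.
siblings = {"S1": ("P1", "P2"), "S2": ("P1", "P2")}

# Two sequenced first cousins: founders are both grandparents and both spouses.
first_cousins = {
    "A": ("G1", "G2"),
    "B": ("G1", "G2"),
    "X": ("A", "SA"),
    "Y": ("B", "SB"),
}

print(sharing_probability(siblings, ["S1", "S2"]))     # 1/3
print(sharing_probability(first_cousins, ["X", "Y"]))  # 1/15
```

Because the weight $1/n_f$ is the same for every founder under this assumption, it cancels between numerator and denominator; it is kept in the code only to mirror the formula term by term. Exhaustive enumeration grows exponentially with pedigree size, so this is suitable only for checking small examples.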