Multiple breast cancer risk variants are associated with differential transcript isoform expression in tumors

Human Molecular Genetics, Dec 2015

Genome-wide association studies have identified over 70 single-nucleotide polymorphisms (SNPs) associated with breast cancer. A subset of these SNPs are associated with quantitative expression of nearby genes, but the functional effects of the majority remain unknown. We hypothesized that some risk SNPs may regulate alternative splicing. Using RNA-sequencing data from breast tumors and germline genotypes from The Cancer Genome Atlas, we tested the association between each risk SNP genotype and exon-, exon–exon junction- or transcript-specific expression of nearby genes. Six SNPs were associated with differential transcript expression of seven nearby genes at FDR < 0.05 (BABAM1, DCLRE1B/PHTF1, PEX14, RAD51L1, SRGAP2D and STXBP4). We next developed a Bayesian approach to evaluate, for each SNP, the overlap between the signal of association with breast cancer and the signal of association with alternative splicing. At one locus (SRGAP2D), this method eliminated the possibility that the breast cancer risk and the alternate splicing event were due to the same causal SNP. Lastly, at two loci, we identified the likely causal SNP for the alternative splicing event, and at one, functionally validated the effect of that SNP on alternative splicing using a minigene reporter assay. Our results suggest that the regulation of differential transcript isoform expression is the functional mechanism of some breast cancer risk SNPs and that we can use these associations to identify causal SNPs, target genes and the specific transcripts that may mediate breast cancer risk.

Human Molecular Genetics, 2015, Vol. 24, No. 25, 7421–7431
doi: 10.1093/hmg/ddv432
Advance Access Publication Date: 15 October 2015
Association Studies Article

Jennifer L. Caswell1,2,3,5,*, Roman Camarda1,3,4, Alicia Y. Zhou1,3,4, Scott Huntsman1,2,3, Donglei Hu1,2,3, Steven E. Brenner6, Noah Zaitlen1,2, Andrei Goga1,3,4 and Elad Ziv1,2,3

1Department of Medicine, 2Institute for Human Genetics, 3Helen Diller Family Comprehensive Cancer Center and 4Department of Cell and Tissue Biology, University of California, San Francisco, CA, USA, 5Department of Medicine, Division of Medical Oncology, Stanford University, Stanford, CA, USA and 6Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA

*To whom correspondence should be addressed at: 875 Blake Wilbur Drive, Stanford, CA, 94305, USA. Tel: +1 3013326541; Fax: +1 4155144982; Email:

Received: May 6, 2015. Revised: September 10, 2015. Accepted: October 9, 2015

© The Author 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

Genome-wide association studies (GWASs) have identified thousands of disease risk-associated single-nucleotide polymorphisms (raSNPs), including, to date, 75 that are associated with breast cancer risk (1). The vast majority of raSNPs are located in noncoding regions of the genome; therefore, they, or SNPs in linkage disequilibrium (LD) with them, are likely to influence risk by affecting the regulation of nearby genes or noncoding RNAs (2,3). To determine their function, investigators have tested their association with expression levels of nearby genes (expression quantitative trait loci, or eQTLs) in cis (4–7) or in trans (4) and assessed whether SNPs in LD with the index raSNP demonstrate evidence for transcription factor binding or histone methylation (6,8). These methods have uncovered eight eQTL associations (4,7), three associations with the targets of a nearby transcription (...)

Results

Splicing QTL analysis of breast cancer raSNPs

We used the RNA-sequencing (RNA-seq) data and matched germline genotypes for 358 estrogen receptor (ER)-positive breast tumors and 109 ER-negative breast tumors from TCGA. For each of the breast cancer raSNPs, we searched for differential transcript isoform expression of nearby genes (Supplementary Material, Table S1), adjusting for overall gene expression, global expression variability (16,17) and genetic ancestry. We used three complementary approaches, testing the association between raSNPs and (1) rank-normalized reads per kilobase per million mapped reads (RPKM) mapping to each exon, (2) rank-normalized reads per million mapped reads (RPM) mapping to each exon–exon junction and (3) rank-normalized expression estimates of reconstructed transcripts of each annotated isoform, as generated by the RSEM algorithm using UCSC transcripts (chosen as its output is available through TCGA) (3) (Supplementary Material, Tables S2–S4). We identified 13 associations with 10 raSNPs using these methods at FDR < 0.05, including 9 exon associations, 8 junction associations and 6 whole-transcript associations; several splicing QTLs were identified by more than one approach (Fig. 1). Q–Q plots showed deviation from normality at the extremes of the P-value distributions (Supplementary Material, Fig. S1). When the analysis was repeated in the smaller set of ER-negative tumors, we identified four associations with four raSNPs, including two exon associations, two junction associations and two transcript associations (Supplementary Material, Table S5), all of which were also identified in the ER-positive tumors. For the exon-specific test, we also tested for differences in raw counts mapping to each exon, using the negative binomial distribution as implemented by the DEXSeq R Bioconductor software package V1.8.0 (18).
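As an illustration of the exon-level test described above, the following is a minimal sketch of rank normalization followed by a covariate-adjusted linear regression on genotype. It is not the authors' code. The rank-based inverse normal transform with a (rank - 0.5)/n offset, the function names `rank_normalize` and `ols_coefficients`, and all data values are assumptions made for illustration; the excerpt does not state which rank transform or software was used for this step. The sketch stops at the effect estimate and does not compute a P-value.

```python
from statistics import NormalDist


def rank_normalize(values):
    """Map values to standard-normal quantiles by rank (ties share the average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # Assumed offset: (rank - 0.5) / n keeps every quantile strictly inside (0, 1).
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


def ols_coefficients(y, columns):
    """Least-squares fit of y on an intercept plus the given predictor columns.

    Returns [intercept, coef_for_columns[0], coef_for_columns[1], ...].
    Assumes the predictors are not collinear (no singularity handling).
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    A = [
        [sum(X[r][a] * X[r][b] for r in range(n)) for b in range(p)]
        + [sum(X[r][a] * y[r] for r in range(n))]
        for a in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Made-up example: six tumors, one exon, one raSNP, one covariate.
exon_rpkm = [4.1, 7.9, 12.3, 8.4, 3.7, 11.0]  # hypothetical exon RPKM values
genotype = [0, 1, 2, 1, 0, 2]                 # risk-allele dosage per tumor
gene_expr = [0.3, -0.1, 0.8, 0.2, -0.5, 0.4]  # hypothetical overall gene expression

normalized = rank_normalize(exon_rpkm)
beta = ols_coefficients(normalized, [genotype, gene_expr])
print("per-allele effect on rank-normalized exon expression:", beta[1])
```

In a real analysis the covariate list would also carry the measures of global expression variability and genetic ancestry mentioned in the text, and a standard statistics package would supply the test statistic and P-value.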
Of the nine SNP-gene exon associations identified using rank-normalized RPKM values, seven were significant at FDR < 0.05 when using DEXSeq, although the methods identified differing numbers of exons as significant (Supplementary Material, Table S6). One additional exon association (DCLRE1B) identified with rank-normalized RPKM values was captured because our test adjusted for overall gene expression with the exon of interest excluded, rather than because of a difference between rank-normalized RPKM values/linear regression versus raw counts/the negative binomial distribution. Given the similarity (...truncated)

Figure 1. Flowchart for determining splicing QTL associations. We identified 13 SNP-gene associations through exon, junction and whole-transcript association tests with risk-associated SNPs; several associations were identified by multiple methods. After excluding SNP-gene associations that could not be corroborated with other tests, that could be related to the presence of pseudogenes or paralogs or that could have derived from mapping bias to the reference genome, seven SNP-gene associations remained.
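The FDR < 0.05 thresholds quoted in the Results can be illustrated with a short sketch of false discovery rate adjustment. The excerpt does not say which FDR procedure was applied, so the Benjamini-Hochberg step-up procedure used here is an assumption, as are the function name `benjamini_hochberg` and the example P-values.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values (q-values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, so each q-value is the minimum of
    # p * m / rank over its own rank and every larger rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Made-up P-values for four hypothetical SNP-exon tests.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
significant = [q < 0.05 for q in example_q]
print(example_q, significant)
```

A test is then called significant at FDR < 0.05 when its adjusted value falls below 0.05, which is how a list of exon, junction or transcript associations could be reduced to the reported set under this assumption.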


This is a preview of a remote PDF: https://hmg.oxfordjournals.org/content/24/25/7421.full.pdf
Article home page: http://hmg.oxfordjournals.org/content/24/25/7421.abstract

Jennifer L. Caswell, Roman Camarda, Alicia Y. Zhou, Scott Huntsman, Donglei Hu, Steven E. Brenner, Noah Zaitlen, Andrei Goga, Elad Ziv. Multiple breast cancer risk variants are associated with differential transcript isoform expression in tumors, Human Molecular Genetics, 2015, pp. 7421-7431, 24/25, DOI: 10.1093/hmg/ddv432