Multiple breast cancer risk variants are associated with differential transcript isoform expression in tumors
Human Molecular Genetics, 2015, Vol. 24, No. 25
7421–7431
doi: 10.1093/hmg/ddv432
Advance Access Publication Date: 15 October 2015
Association Studies Article
1Department of Medicine, 2Institute for Human Genetics, 3Helen Diller Family Comprehensive Cancer Center and
4Department of Cell and Tissue Biology, University of California, San Francisco, CA, USA, 5Department of
Medicine, Division of Medical Oncology, Stanford University, Stanford, CA, USA and 6Department of Plant
and Microbial Biology, University of California, Berkeley, CA, USA
*To whom correspondence should be addressed at: 875 Blake Wilbur Drive, Stanford, CA, 94305, USA. Tel: +1 3013326541; Fax: +1 4155144982;
Email:
Abstract
Genome-wide association studies have identified over 70 single-nucleotide polymorphisms (SNPs) associated with breast
cancer. A subset of these SNPs are associated with quantitative expression of nearby genes, but the functional effects of the
majority remain unknown. We hypothesized that some risk SNPs may regulate alternative splicing. Using RNA-sequencing data
from breast tumors and germline genotypes from The Cancer Genome Atlas, we tested the association between each risk SNP
genotype and exon-, exon–exon junction- or transcript-specific expression of nearby genes. Six SNPs were associated with
differential transcript expression of seven nearby genes at FDR < 0.05 (BABAM1, DCLRE1B/PHTF1, PEX14, RAD51L1, SRGAP2D and
STXBP4). We next developed a Bayesian approach to evaluate, for each SNP, the overlap between the signal of association with
breast cancer and the signal of association with alternative splicing. At one locus (SRGAP2D), this method eliminated the
possibility that the breast cancer risk and the alternative splicing event were due to the same causal SNP. Lastly, at two loci, we
identified the likely causal SNP for the alternative splicing event, and at one, functionally validated the effect of that SNP on
alternative splicing using a minigene reporter assay. Our results suggest that the regulation of differential transcript isoform
expression is the functional mechanism of some breast cancer risk SNPs and that we can use these associations to identify
causal SNPs, target genes and the specific transcripts that may mediate breast cancer risk.
Introduction
Genome-wide association studies (GWASs) have identified thousands of disease risk-associated single-nucleotide polymorphisms (raSNPs), including, to date, 75 that are associated with
breast cancer risk (1). The vast majority of raSNPs are located in
noncoding regions of the genome; therefore, they, or SNPs in linkage disequilibrium (LD) with them, are likely to influence risk by
affecting the regulation of nearby genes or noncoding RNAs (2,3).
To determine their function, investigators have tested their association with expression levels of nearby genes (expression quantitative trait loci, or eQTLs) in cis (4–7) or in trans (4) and assessed
whether SNPs in LD with the index raSNP demonstrate evidence
for transcription factor binding or histone methylation (6,8).
These methods have uncovered eight eQTL associations (4,7),
three associations with the targets of a nearby transcription
Received: May 6, 2015. Revised: September 10, 2015. Accepted: October 9, 2015
© The Author 2015. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/),
which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Jennifer L. Caswell1,2,3,5,*, Roman Camarda1,3,4, Alicia Y. Zhou1,3,4,
Scott Huntsman1,2,3, Donglei Hu1,2,3, Steven E. Brenner6, Noah Zaitlen1,2,
Andrei Goga1,3,4 and Elad Ziv1,2,3
Results
Splicing QTL analysis of breast cancer raSNPs
We used the RNA-sequencing (RNA-seq) data and matched
germline genotypes for 358 estrogen receptor (ER)-positive breast
tumors and 109 ER-negative breast tumors from TCGA. For each
of the breast cancer raSNPs, we searched for differential transcript isoform expression of nearby genes (Supplementary
Material, Table S1), adjusting for overall gene expression, global
expression variability (16,17) and genetic ancestry. We used
three complementary approaches, testing the association
between raSNPs and (1) rank-normalized reads per kilobase per
million mapped reads (RPKM) mapping to each exon, (2) rank-normalized reads per million mapped reads (RPM) mapping to
each exon–exon junction and (3) rank-normalized expression
estimates of reconstructed transcripts of each annotated isoform, as generated by the RSEM algorithm using UCSC transcripts
(chosen because its output is available through TCGA) (3) (Supplementary Material, Tables S2–S4). We identified 13 associations with 10
raSNPs using these methods at FDR < 0.05, including 9 exon associations, 8 junction associations and 6 whole-transcript associations; several splicing QTLs were identified by more than one
approach (Fig. 1). Q–Q plots showed deviation from normality at
the extremes of the P-value distributions (Supplementary Material, Fig. S1). When the analysis was repeated in the smaller set of
ER-negative tumors, we identified four associations with four
raSNPs, including two exon associations, two junction associations and two transcript associations (Supplementary Material,
Table S5), all of which were also identified in the ER-positive
tumors.
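The three quantifications and the covariate-adjusted association test described above can be illustrated with a short Python sketch. This is a minimal, hypothetical illustration, not the authors' pipeline, and it rests on several assumptions: "rank normalization" is taken to be a rank-based inverse normal transform with Blom offsets; the only covariate is expression of the rest of the gene, removed by residualization rather than fitted jointly (global expression variability and genetic ancestry are omitted); P-values use a normal approximation to the t statistic, which is reasonable only for sample sizes in the hundreds; and FDR is assumed to be controlled with the Benjamini-Hochberg procedure. All function names are illustrative.

```python
"""Hypothetical sketch of the per-feature splicing-QTL test described above.

Assumptions (not the authors' pipeline): rank normalization is a rank-based
inverse normal transform; the only covariate is expression of the rest of the
gene, removed by residualization; P-values use a normal approximation.
"""
from statistics import NormalDist, mean


def rpkm(count: int, feature_length_bp: int, total_mapped: int) -> float:
    """Reads per kilobase of feature per million mapped reads (exons)."""
    return count * 1e9 / (feature_length_bp * total_mapped)


def rpm(count: int, total_mapped: int) -> float:
    """Reads per million mapped reads (exon-exon junctions)."""
    return count * 1e6 / total_mapped


def rank_inverse_normal(values: list[float]) -> list[float]:
    """Rank-based inverse normal transform with Blom offsets; ties share the mean rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the block of tied values starting at position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based mean rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]


def simple_ols(x: list[float], y: list[float]) -> tuple[float, float, list[float]]:
    """Fit y = a + b*x by least squares; return (b, standard error of b, residuals)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    se = (sum(r * r for r in resid) / (n - 2) / sxx) ** 0.5
    return slope, se, resid


def feature_sqtl_pvalue(genotype: list[int], feature_expr: list[float],
                        rest_of_gene_expr: list[float]) -> float:
    """Two-sided P-value for genotype dosage (0/1/2) vs. feature expression,
    after removing the part explained by the rest of the gene's expression."""
    y = rank_inverse_normal(feature_expr)
    covariate = rank_inverse_normal(rest_of_gene_expr)
    _, _, adjusted = simple_ols(covariate, y)
    slope, se, _ = simple_ols([float(d) for d in genotype], adjusted)
    z = slope / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted P-values; call features with q < 0.05."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Adjusting for the rest of the gene is what separates a change in isoform usage from an ordinary eQTL in this sketch: a SNP that raises expression of every exon equally is largely absorbed by the covariate, leaving little genotype signal in the residual.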
Figure 1. Flowchart for determining splicing QTL associations. We identified 13
SNP-gene associations through exon, junction and whole-transcript association
tests with risk-associated SNPs; several associations were identified by multiple
methods. After excluding SNP-gene associations that could not be corroborated
with other tests, that could be related to the presence of pseudogenes or
paralogs or that could have derived from mapping bias to the reference
genome, seven SNP-gene associations remained.

For the exon-specific test, we also tested for differences in raw
counts mapping to each exon, using the negative binomial distribution as implemented by the DEXSeq R Bioconductor software
package V1.8.0 (18). Of the nine SNP-gene exon associations identified using rank-normalized RPKM values, seven were significant
at FDR < 0.05 when using DEXSeq, although the methods identified differing numbers of exons as significant (Supplementary
Material, Table S6). One additional exon association (DCLRE1B), identified with rank-normalized RPKM values,
was detected by our test because it adjusted for overall gene expression with the exon
of interest excluded, rather than because of a difference between the two modeling approaches
(rank-normalized RPKM values with linear regression versus raw
counts with the negative binomial distribution). Given the similarity (...truncated)