Allele-specific expression in the human heart and its application to postoperative atrial fibrillation and myocardial ischemia
Sigurdsson et al. Genome Medicine
Allele-specific expression in the human heart and its application to postoperative atrial fibrillation and myocardial ischemia
Martin I. Sigurdsson 0 3
Louis Saddic 0 3
Mahyar Heydarpour 0 3
Tzuu-Wang Chang 0 3
Prem Shekar 2
Sary Aranki 2
Gregory S. Couper 1
Stanton K. Shernan 0 3
Jon G. Seidman 4
Simon C. Body 0 3
Jochen D. Muehlschlegel 0 3
0 Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital / Harvard Medical School , 75 Francis Street, Boston, MA 02115 , USA
1 Division of Cardiac Surgery, Department of Surgery, Tufts Medical Center , Boston, MA , USA
2 Division of Cardiac Surgery, Department of Surgery, Brigham and Women's Hospital / Harvard Medical School , Boston, MA , USA
3 Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital / Harvard Medical School , 75 Francis Street, Boston, MA 02115 , USA
4 Department of Genetics, Harvard Medical School , Boston, MA , USA
Background: Allele-specific expression (ASE) is differential expression of each of the two chromosomal alleles of an autosomal gene. We assessed ASE patterns in the human left atrium (LA, n = 62) and paired samples from the left ventricle (LV, n = 76) before and after ischemia, and tested the utility of differential ASE to identify genes associated with postoperative atrial fibrillation (poAF) and myocardial ischemia. Methods: Following genotyping from whole blood and whole-genome sequencing of LA and LV samples, we called ASE using sequences overlapping heterozygous SNPs using rigorous quality control to minimize false ASE calling. ASE patterns were compared between cardiac chambers and with a validation cohort from cadaveric tissue. ASE patterns in the LA were compared between patients who had poAF and those who did not. Changes in ASE in the LV were compared between paired baseline and post-ischemia samples. Results: ASE was found for 3404 (5.1%) and 8642 (4.0%) of SNPs analyzed in the LA and LV, respectively. Out of 6157 SNPs with ASE analyzed in both chambers, 2078 had evidence of ASE in both LA and LV (p < 0.0001). The SNP with the greatest ASE difference in the LA of patients with and without postoperative atrial fibrillation was within the gelsolin (GSN) gene, previously associated with atrial fibrillation in mice. The genes with differential ASE in poAF were enriched for myocardial structure genes, indicating the importance of atrial remodeling in the pathophysiology of AF. The greatest change in ASE between paired post-ischemic and baseline samples of the LV was in the zinc finger and homeodomain protein 2 (ZHX2) gene, a modulator of plasma lipids. Genes with differential ASE in ischemia were enriched in the ubiquitin ligase complex pathway associated with the ischemia-reperfusion response. Conclusions: Our results establish a pattern of ASE in the human heart, with a high degree of shared ASE between cardiac chambers as well as chamber-specific ASE. Furthermore, ASE analysis can be used to identify novel genes associated with (poAF) and myocardial ischemia.
Genetic variation causes disease through alteration in
the quantity and function of proteins, among other
mechanisms . Expression of a gene’s messenger RNA
(mRNA) is controlled by numerous mechanisms that are
influenced by local and distant genetic variation, such as
single nucleotide polymorphisms (SNPs).
Quantifying gene expression by RNA sequencing
(RNA-seq) is performed by alignment of short RNA
sequence reads from a tissue that is mapped to a reference
genome sequence. The number of measured mRNA
reads accurately measures gene expression . Methods
utilizing high-throughput RNA-seq have been used to
study the genetic background of multiple cardiovascular
diseases. This has thus far mostly been performed in
animal models, exemplified by the assessment of changes in
gene expression in mouse models of myocardial
ischemia [3, 4]. In humans, we recently used this technique
to measure the effects of ischemia on the gene
expression of the left ventricle (LV) during cardiopulmonary
© The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
bypass . This indicated substantial changes in the
expression of a wide variety of functional categories of
genes between the baseline and post-ischemic samples
in a short amount of time.
Differences in expression between each of the two
SNP alleles in autosomal genes, called allele-specific
expression (ASE) , allows us to examine the biological
control of specific gene expression in health and disease
. This technique has successfully quantified ASE
across a host of tissues, revealing that 2.3% of all tested
SNPs control nearby gene expression, with substantial
tissue-specific variation and moderate similarity in the
ASE pattern of similar tissue types .
Studies on ASE in specific organs are sparse, especially
using living tissues, and the ASE landscape of the human
heart is poorly understood. Furthermore, limited data
exist to show that ASE analysis can be useful to
understand the pathophysiology of cardiac disease. Here we
utilized two unique high-throughput RNA-seq datasets
from the human left atrium (LA) and left ventricle (LV)
during open heart surgery. We used these data to
discover ASE in these tissues after filtering and quality
control of both SNPs and RNA reads to call ASE. Our
hypothesis was that there was ASE common to both
chambers, as well as chamber-specific ASE.
Furthermore, we assessed whether ASE could be used to
identify novel variants associated with cardiac disease by
comparing differential ASE between patients who had
postoperative atrial fibrillation (poAF) and those who
did not, and to find variants with differential ASE
associated with the myocardial response to ischemia.
We obtained tissue samples from two prospective
studies utilizing next-generation RNA-seq and high-density
genome-wide DNA genotyping. Samples from the LA
came from a cohort of 62 Caucasian patients undergoing
mitral valve repair or replacement surgery for mitral
valve regurgitation with cardiopulmonary bypass. During
incision of the LA to access the mitral valve, a small
sample of the left atrial free wall was obtained and used
for RNA isolation.
LV samples were obtained during placement of the LV
vent in 76 Caucasian patients undergoing aortic valve
replacement for aortic stenosis with or without
concomitant coronary artery bypass grafting. A small punch
biopsy of the anterior apex of the LV was obtained at
two time points: immediately after aortic cross clamping
(baseline) and shortly before aortic cross clamp removal
(post-ischemia) as described previously . Between
samples, cold blood cardioplegia was intermittently
administered to reduce the extent of myocardial ischemia.
There was no patient overlap between the LA cohort
and LV cohort.
Patient demographics, surgical, and clinical outcome
phenotypes were collected prospectively. Patients who
donated LA samples were followed for poAF, defined as
AF identified by any clinician during the primary
RNA-seq and genotyping
For both studies, tissue samples were placed in RNAlater
(Ambion; ThermoFisher Scientific) solution, followed by
whole RNA extraction using standardized methods .
After reverse transcription of single-stranded RNA to
double-stranded DNA, isolation of short fragments,
poly(A) addition and ligation of adaptors,
doublestranded sequencing was performed on an Illumina
HiSeq 2000 (Illumina, San Diego, CA, USA). Read
length for the samples was in the range of 90–100 bp.
DNA was isolated from whole blood using standard
methods. SNP genotyping was performed using the
Illumina Omni2.5 array with additional exome content
(Illumina, San Diego, CA, USA) chip, version 1.1 for the
LV samples and version 1.2 for the LA samples.
Sequence alignment and processing for ASE calling
After RNA-seq, adaptor sequences and low-quality reads
were removed, TopHat2 and BowTie algorithms were
used to align the sequenced reads to the human genome
(hg19) . A list of heterozygous SNPs for each
individual was generated and the RNA sequences that uniquely
overlapped the location of each heterozygous SNP were
identified . From the patient mRNA sequence reads
and the location of heterozygous SNPs in each patient,
the appropriate base read at the location of each
heterozygous individual was extracted and counted using
SAMtools and custom made scripts .
ASE calling and statistics
All statistics and imaging was done in R, version 3.1.0.
Several filters were applied to identify SNPs appropriate
for ASE calling . Only SNPs with at least one
heterozygous individual were analyzed. A minimum of 15 RNA
reads crossing the heterozygous SNP were required.
SNPs were also required to be in Hardy–Weinberg
equilibrium (p > 0.00001) and have a genotyping rate of more
than 95%, to be included. Only SNPs in regions with a
mappability score of 1 were included, indicating good
mapping of short reads to the region. Only reads with a
unique mapping to the genome were used. Finally, we
estimated the overall genotyping error (ratio of the
counts of alleles other than reference and alternative
allele to the overall number of allele counts). Using this
estimate, we tested the null hypothesis that the number
of alleles other than the reference and alternative alleles
was the same as the overall genotyping error. The SNPs
where the null hypothesis was rejected at p < 0.05 were
considered to potentially have evidence of genotyping
error or random monoallelic expression and were
excluded from further analysis.
After the application of each filter, we assessed the
reference sequence ratio, defined as REF/ALT + REF, where
REF is the read count of the allele listed in the human
reference genome (hg19) and ALT is the read count of
alternative genome allele. An overall ratio of 0.5
indicates an equal number of reads of reference and
alternative allele, indicating no preference for the allele listed in
the reference genome used for aligning (reference
genome bias). This can be used for individual SNPs to
assess ASE and also as a summary statistic over all SNPs
to assess reference genome bias.
We evaluated both the traditional binomial test and
algorithms utilizing different statistical distribution to call
ASE, with the goal of reducing the number of false
positives and reference genome bias. A binomial test was
used for each SNP using the sum of REF and ALT allele
counts over all individuals. This tests the REF/ALT +
REF ratio against the null hypothesis of a REF/ALT +
REF ratio of 0.5. The edgeR package was used to call
ASE, utilizing negative binominal distribution and
estimation of individual sample and variant expression
dispersion . This was performed using both the sum of
REF and ALT allele counts with a fixed dispersion
estimate of 0.1 and also by using REF and ALT allele counts
from each individual. Alternatively, the limma package
was used to call ASE after voom transformation of the
count matrix using REF and ALT allele counts from each
individual . The results of the algorithms were
compared by QQ and Venn plots to visualize the number of
SNPs/genes with ASE (Additional file 1: Figure S1-S3).
The ASE calling was assessed visually by plotting the
REF/ALT + REF ratio versus p value of the ASE
assumption (Additional file 1: Figure S4). A false discovery rate
(FDR)-adjusted p value < 0.05 was used to avoid
overcalling ASE, and considered indicative of ASE.
After ASE calling in the LA and LV samples, SNPs
with ASE in both chambers were identified, as well as
SNPs with chamber-specific ASE in either LA or LV but
not the other. The absolute number of shared SNPs was
compared against the distribution of the number of
shared SNPs with 10,000 permutations of a random
sample of eligible SNPs.
The left atrial expression of reference and alternative
alleles of each heterozygous SNP was compared between
patients who did and did not have poAF. Differential
ASE during ischemia was tested by comparing the
expression of the reference and alternative alleles of each
heterozygous SNP between the baseline and the
postischemia sample, using paired analysis of the LV samples.
A functional enrichment analysis was performed on the
gene sets with differential ASE (at FDR-adjusted p value
<0.05) using the GeneMANIA algorithm within the
Cytoscape network visualization tool [14, 15]. This
analysis was performed using the default interaction
networks with the exception of the default co-expression
networks, which were exchanged for custom-made LA
and LV co-expression networks using our gene
expression data. The algorithm allowed the inclusion of the
top 20 related genes and at most 20 attributes using
To contrast our result against an independent set of
data, we downloaded the ASE dataset from the
Genotype-Tissue Expression (GTEx) pilot analysis. The
generation of this dataset has been described in detail
elsewhere . In short, the dataset contains results from
RNA-seq, both exome sequencing and genome-wide
RNA-seq of various tissues in several hundred deceased
individuals after variable amount of warm ischemic time.
Sequence alignment and quality control of genotyping is
similar to the one done in this study. The GTEx dataset
release contains counts of reference and alternative
alleles of heterozygous SNPs. We extracted from this
dataset counts of reads overlapping reference and alternative
alleles of heterozygous SNPs from the left atrial
appendage tissue and from the left ventricular tissue. After
filtering the available SNPs using the same minimum
number of overlapping reads and both mappability and
read counts, we applied the same filters of minimum
read numbers and mappability criteria, and then called
ASE with the edgeR algorithm based on individual allele
counts. For those SNPs available for ASE analysis in
both our LA tissue and the GTEx left atrial appendage
tissue, we compared the number of SNPs with ASE in
both datasets with the number of ASE in either dataset.
This was contrasted with the same statistic after 10,000
random permutations of the eligible SNPs. The same
analysis was performed for SNPs in our LV tissue set
and the GTEx LV tissue.
The mean age of patients who had LA sampling (n = 62)
was 58 years and 44% were female. Following the
surgery, 21 (34%) patients had poAF, defined as any atrial
fibrillation diagnosed by clinician during the initial
postoperative hospitalization. The patients with poAF were
older (65 versus 56 years, p = 0.006) and had a higher
rate of hypertension (81% versus 41%, p = 0.01), but the
groups were otherwise highly similar. There was no
difference in the past history of myocardial infarction,
diabetes, or previous history of atrial fibrillation. Similarly
the rates of preoperative use of
renin-angiotensinaldosterone inhibitors, beta-blockers, calcium channel
blockers, and statins were similar between the two
groups. Furthermore, although LA volume was enlarged
(mean LA volume 71 mL/m2), there was no difference in
LA volume between the two groups. Similarly,
cardiopulmonary bypass and aortic cross-clamp time was
comparable between the two groups (data not shown).
The mean age of patients who had paired LV sampling
(n = 76) was 74 years and 42% were female. The vast
majority (86%) of patients had LV ejection fraction of more
than 50%. A total of 43 patients (56%) had concomitant
coronary bypass surgery alongside the aortic valve
replacement. The median ischemic interval created by
aortic cross-clamping was 90 min (range 48–284).
Filtering of RNA reads and SNPs and selection of ASE
After removal of low-quality reads, trimming of adaptor
sequences, and alignment to the human genome, a total
of 421,780,889 reads from the LA samples overlapped a
heterozygous SNP in at least one individual. Of those,
9,451,851 reads were not uniquely mapped to the human
genome and were removed, leaving 412,329,038 reads
for analysis. Similarly, from the LV samples a total of
1,879,293,644 reads overlapped a heterozygous SNP in
at least one individual. Of those, 158,775,428 reads were
not uniquely mapped, leaving 1,720,518,216 reads for
After filtering SNPs with an inadequate number of
overlapping reads and those failing Hardy–Weinberg
equilibrium, along with cross-checking for minimum
genotyping rate and mappability criteria, a total of
112,020 and 214,626 SNPs were available from the LA
and LV samples, respectively (Additional file 1: Table
S1). There was a substantial improvement in reference
genome bias following SNP filtering (Additional file 1:
There was a high agreement of ASE calling between
binomial test and ASE called by both the edgeR and
limma algorithms. However, there was a substantial
variability in the absolute number of SNPs with significant
ASE (Additional file 1: Figure S2). The edgeR ASE
calling using individual sampling, which was used for
further analysis, was sufficiently conservative in both the
LA and LV samples and the QQ curves had a similar
shape for both tissues (Additional file 1: Figure S3).
From the SNPs with ASE based on this algorithm
(FDRadjusted p < 0.05), a higher number of SNPs had REF/
ALT + REF ratio greater than 0.5 than a ratio less than
0.5 in both samples (3383 (59%) versus 2305 (41%), p <
0.001 for the LA samples, and 5539 (64%) versus 3103
(36%), p < 0.001 for the LV samples) (Additional file 1:
Figure S4). This indicates some residual reference
genome bias in the SNPs with ASE called. We
furthermore observed that this bias was more prominent
among SNPs with fewer covering reads (Additional file 1:
SNPs and genes with ASE in LA and LV
In the LA samples, there were 12,768 SNPs with
significant ASE at p < 0.05 (Additional file 2). Of those, 5688
had an FDR-adjusted p value < 0.05 (Fig. 1a). Table 1
shows the ten SNPs with the most significant ASE and
Fig. 1a shows the genomic distribution of all SNPs with
ASE in the LA. From 55,984 SNPs available for analysis
in our LA samples and the genome-wide LA appendage
samples from GTEx, 1356 had evidence of ASE in both
sample sets (p < 0.0001 by permutation, Additional file 1:
Figure S4). Similarly, of 24,830 SNPs available for
analysis in our LA samples and the exome-sequencing LA
appendage samples, 688 had evidence of ASE in both
sample sets (p < 0.0001 by permutation, Additional file 1:
In the LV samples, there were 13,009 SNPs with
significant ASE at p < 0.05 (Additional file 3). Of those,
3774 had an FDR-adjusted p value < 0.05 (Fig. 1b).
Table 2 shows the ten most significant SNPs and Fig. 1b
shows the genomic distribution of all SNPs with ASE in
the LV. From 45,496 SNPs available for analysis in our
LV samples and the genome-wide LV samples from
GTEx, 1358 had evidence of ASE in both sample sets (p
< 0.0001 by permutation, Additional file 1: Figure S8).
Similarly, of 24,830 SNPs available for analysis in our LV
samples and the exome-sequencing LV samples, 531 had
evidence of ASE in both sample sets (p < 0.0001 by
permutation, Additional file 1: Figure S9).
SNPs and genes with ASE in both LA and LV and differential
ASE in LA compared to LV
A total of 79,538 SNPs were available for ASE calling in
both the LA and LV samples and 6157 had evidence of
ASE (at FDR-adjusted p < 0.05) in at least one tissue
(Fig. 2a). Of those, 2078 had evidence of ASE in both
LA and LV, a significantly increased number compared
to random permutations of eligible SNPs (p < 0.0001
Fig. 2b). Out of the SNPs with ASE in either tissue, a
significantly higher ratio of SNPs had the same direction
of REF/ALT + REF ratio (either >0.5 or <0.5 in both
tissue types) in the group of SNPs with ASE in both LA
and LV, compared to both the group with ASE in LA
and not LV (88% versus 72%, p < 0.0001) and the group
with ASE in LV and not LA (88% versus 81%,
p < 0.0001). Table 3 shows the top ten SNPs with ASE in
both LA and LV.
We also tested for differential ASE in the LA
compared to the baseline LV samples. This identified a total
of 215 SNPs with a significantly different ASE between
Fig. 1 A Manhattan plot showing the distribution of SNPs with ASE in (a) left atrium (LA) and (b) left ventricle (LV). Horizontal lines indicate p = 0.05
(solid red) and a Bonferroni-adjusted p value of 0.05/n where n is the number of SNPs tested in each tissue (blue)
the two chambers at FDR p value of < 0.05. Table 4
shows the top ten SNPs with differential ASE in the LA
compared to the baseline LV sample.
Genes with differential ASE in patients with AF and
Finally, we assessed the utility of using ASE to discover
novel genetic variants associated with clinical
phenotypes by testing for differential allelic expression within
groups of patients or paired samples. From the LA
samples, the expression of each allele of available
heterozygous SNPs was compared between the subgroup of
Table 1 SNPs with allele-specific expression in the LA
SNP Gene Chr
patients who had poAF (34%) and those who did not.
This identified 24 SNPs with differential expression of
the REF and ALT alleles between the patients who had
poAF compared to those who did not, at an
FDRadjusted p value of < 0.05 (Table 5). Of those, three SNPs
had more than one heterozygous individual in both the
poAF and control groups, minimizing the effects of a
potential genotyping error on ASE calling. The SNP that
had the most difference in ASE between the patients
with poAF compared to those without was within the
GSN gene. Two SNPs with significantly different ASE
and more than one heterozygous individual in both
Table 2 SNPs with allele-specific expression in the LV
SNP Gene Chr
rs76148815 MTR 1
groups, were within the TNS1, LITAF, and CLDN18
genes (Table 5). Analysis of the functional category of
the genes with differential ASE in poAF revealed
enrichment within categories involved in cardiac structure,
such as cardiac muscle tissue development, the
sarcomere, and the myofibril (Table 6, Additional file 1:
From the LV samples, differential expression of
reference and alternative alleles was compared between the
post-ischemia sample and baseline sample in a paired
manner, searching for changes in ASE as a response to
ischemia. This identified three SNPs with differential
expression of reference and alternative alleles between
post-ischemia and baseline sample at an FDR-adjusted p
value of < 0.05 (Table 7). The SNP with the most
significant change in ASE after ischemia was within the ZHX2
gene, followed by CUL4A (Table 7). Analysis of the
functional category of the genes with differential ASE
between the baseline and post-ischemic sample revealed
enrichment within categories involved in the ubiquitin
ligase complex and the regulation of transcription in
response to stress (Table 8, Additional file 1: Figure S11).
Here we analyzed ASE utilizing a unique dataset of
high-throughput RNA-seq of a large number of samples
obtained from the living human LA and LV. This
eliminates artifacts induced by sampling cadaveric donors
with varying degrees of warm ischemia . We identified
the existence of a substantial amount of ASE in both the
LA and the LV of the heart. While a significant portion
of the ASE is shared between the two chambers, we
Fig. 2 a A Venn diagram showing the number of SNPs with ASE (called at p < 0.05) in both LA and LV and those with ASE in one tissue but not
the other. b The number of SNPs with ASE in both LA and LV (black bar, 2078) is greater than the distribution of 10,000 random draws from the
SNPs available for ASE analysis (gray histogram)
Table 3 SNPs with allele-specific expression in both LA and LV
rs61756583 OGDH 7 44,747,514 4 0.65 2 × 10–10 6 0.80 2 × 10–68 3 × 10–9
The table shows the ten SNPs with the most significantly different ASE in the LA compared to LV samples (baseline pre-ischemia). Shown is the gene the SNP is
located within, the genomic location, number of heterozygous samples for both LA and LV, the REF/ALT + REF ratio, the p value for ASE calling, and finally the
FDR-adjusted p value for the test of differential ASE between LA and LV tissues
rs7988661 LMO7 13 76,427,347 11 0.72 1 × 10–72 30 0.74 2 × 10–70
The table shows ten of the SNPs with significant ASE in both LA and LV, sorted by the sum of ranked p values for ASE calling for both LA and LV. Shown is the
SNP rsID, the gene it is located within, the genomic location, number of heterozygous samples for both LA and LV available for ASE calling, the REF/ALT + REF
ratio, and the FDR-adjusted p value for ASE for LA and LV tissues
observed LA and LV-specific patterns of ASE.
Furthermore, we show that the estimation of differential ASE
can be used to identify genes potentially involved in
pathogenesis of poAF and myocardial ischemia.
Quantifying global gene expression by
highthroughput sequencing is traditionally done by
alignment of short RNA sequence reads isolated to the tissue
of interest to a previously known reference genome
sequence, and subsequently to a library of known mRNA
sequences and quantification of the amount of each
mRNA molecule . This will, however, combine the
expression of both alleles, making assessment of ASE
impossible. In addition to the limitations and
methodological concerns involved in expression quantification,
several specific issues must be considered when
assessing ASE with high-throughput RNA-seq data . SNP
genotyping accuracy is of key importance, since
incorrect genotyping of a homozygous SNP as
heterozygous will lead to incorrect calling of ASE. We performed
SNP genotyping using a separate assay based on DNA
from peripheral blood to minimize genotyping error. A
major source of bias and false positives is due to
reference sequence/allele mapping bias. This is a bias
introduced by the preferential alignment of RNA reads to the
genome if the SNP included in the read is the reference
allele, i.e. the allele included in the reference sequence
used for aligning . It is extremely important to
quantify reference sequence bias and attempt to minimize it
to reduce false positives. In this project, reference
genome bias was minimized by utilizing only samples with
longer read lengths and by aggressive filtering of SNPs
and reads eligible for analysis. After ASE calling, some
degree of reference genome bias still existed, especially
for variants with fewer covering reads. Our results
Table 4 SNPs with differential allele-specific expression in LA compared to LV
Samples REF/ALT + REF FDR-adj.
43 0.96 4 × 10–99
Samples REF/ALT + REF FDR-adj.
Differential ASE LA vs. LV
FDR-adj. p value
Table 5 SNPs with differential allele-specific expression in the LA for patients with poAF
rs4280262 LITAF 16 11,647,492 1801 7 0.45 10 0.57 0.04
The table shows the SNPs with different ASE in the LA in patients who had poAF compared with those who did not. Shown is the gene the SNP is located within,
the genomic location, number of reads overlapping the SNP, number of heterozygous LA samples for both patient groups, the REF/ALT + REF ratio, and the
FDRadjusted p value for differential ASE between the two patient groups
therefore need to be cautiously interpreted, since some
SNPs with ASE are likely false positives from remaining
reference genome bias. Further filtering of reads and
variants to eliminate all reference genome bias is challenging,
as further filtering of reads can substantially reduce the
power of the analysis and other bias can be introduced by
methods such as masking the reference genome .
Finally, the usage of simple statistical methods that are
not specific to gene expression analysis to detect ASE,
such as binomial test, can increase false positives and
negatives since they fail to consider both inter-sample
and inter-variant variability in gene expression and its
measurement during differential expression calling. This
was avoided by using the edgeR algorithm to call ASE.
This utilizes the negative binomial algorithm, and
estimates both the sample and marker-specific dispersion to
generate variability estimates for each individual marker
before calling differential expression.
After calling ASE on the filtered datasets using edgeR,
we found that 3404 (5.1%) of SNPs eligible for ASE
calling in the LA and 8642 (4.0%) of SNPs eligible for
ASE calling in the LV had evidence of ASE at a FDR p
value of <0.05. The genes with the strongest evidence of
ASE in the LA were structural genes involving muscle
proteins, the cardiac contraction chain (TNNT2, TTN),
collagen (COL4A2), and cardiac energy homeostasis and
cardiac fibrosis (TIMP3) [17, 18]. Interestingly, the SNP
showing the greatest ASE in the LV sample was within
the methyltransferase (MTR) gene involved in folate
metabolism. Variants in this gene have been associated with
both congenital heart disease, but only when the
associated variant is inherited from the mother . This
suggests either global or tissue-specific imprinting of this
gene, i.e. expression of each allele dependent on parental
origin. Depending on the parental origin of the reference
and alternative allele, this could potentially generate an
ASE pattern observed within our assay, but we are
limited by the lack of parental genotyping to explore this
further. Also of interest is the plakophilin 2 gene (PKP2),
associated with right ventricular cardiomyopathy 
Cardiac muscle tissue development
Contractile fiber part
Cardiac ventricle morphogenesis
Cardiac ventricle development
Cardiac chamber morphogenesis
Muscle structure development
Striated muscle tissue development
Cardiac chamber development
Muscle tissue development
Ventricular cardiac muscle tissue
Ventricular cardiac muscle tissue
Muscle organ development
Regulation of ATPase activity
Table 6 Functional enrichment analysis of genes with differential
allele-specific expression in poAF
Table 8 Functional enrichment analysis of genes with differential
allele-specific expression in cardiac ischemia
FDR p value
Ubiquitin ligase complex
Cullin-RING ubiquitin ligase complex
Cul4-RING ubiquitin ligase complex
Regulation of transcription from RNA
polymerase II promoter in response to stress
Ubiquitin protein ligase binding
Negative regulation of cell cycle
Regulation of DNA-templated transcription
in response to stress
The table shows the results of functional enrichment analysis of the genes
with differential ASE in response to cardiac ischemia by the GeneMANIA
A second aspect of our analysis was to identify SNPs/
genes with differential ASE in disease states. Comparing
differential ASE in the LA of individuals between those
who had poAF compared to those who did not, we
identified multiple interesting genes, three of which have
previously been associated with AF. The gene with the
most significantly different allele-specific expression was
gelsolin (GSN), a protein that participates in
cytoskeleton maintenance, modulating calcium-channels. A GSN
mouse knockout has been found to have a short PQ
interval and a prolonged QRS and QT interval, and
importantly, an increased susceptibility to AF . The
collagen 3A1 gene (COL1A1) is a known target gene of the
microRNA miR29b. The atrial expression of both
miR29b and COL1A1 has been found to be changed
following ventricular tachypacing in a canine model,
indicating that the gene might play a role in atrial
remodeling following increased myocardial oxygen
demand . Additionally, a frameshift mutation in the
lamin A/C gene (LMNA) has been described in a family
with early-onset AF and sudden cardiac death . The
paucity of genes with prior association with AF in the
list of genes with differential ASE in patients with poAF
shows the power of utilizing ASE analysis to identify
novel genes associated with cardiac disease that cannot
be identified with analysis of the DNA sequence or
overall mRNA expression.
The table shows the results of functional enrichment analysis of the genes
with differential ASE in poAF by the GeneMANIA algorithm
and multiple genes involved in mitochondrial energy
metabolism (MTCH1, OXA1L, MUL1).
From the subset of SNPs eligible for ASE calling in
both LA and LV, we found that 2078 of the SNPs had
ASE in both tissues (at unadjusted p < 0.05), a
significantly higher number than would be expected at random
(Fig. 2b, p < 0.0001). This is in line with results from the
GTEx study, where tissues with similar origin had a
higher number of shared sites with ASE . Especially
interesting are those sites where the ASE pattern differs
between LA and LV. The strongest difference was in the
aconitase gene (ACO2), which is a part of the citric acid
cycle and is essential for maintenance of mitochondrial
DNA . It is plausible that the control of metabolism
fulfilling the differential energy needs of the two
chambers is mediated via ASE.
Table 7 SNPs with differential allele-specific expression in cardiac ischemia
The table shows the SNPs with differential ASE in the post-bypass sample compared with the baseline sample of the same individual, assessing the ASE response
to cardiac ischemia. Shown is the gene each SNP is located within, the genomic location, number of reads overlapping the SNP, number of heterozygous individuals
(each providing pre- and post-bypass sample), the REF/ALT + REF ratio, and the FDR-adjusted p value for differential ASE between the post-bypass compared
to the pre-bypass
Of the two genes with differential ASE and multiple
individuals in both the poAF and control group, the
tensin 1 (TNS1) gene codes for a protein with cytoskeleton
contact roles and contains a motif that mediates signal
transduction . Variants in this gene have been
associated with mitral valve prolapse , which was the
surgical indication for a substantial number of our patients.
Thus, the ASE could contribute to either AF
pathogenesis or the response to mitral valve prolapse. The
functional enrichment analysis of genes with differential ASE
in poAF indicated an enrichment of genes involved in
cardiac structure, that supports the importance of
structural remodeling in the pathogenesis of AF .
Similarly, the comparison of ASE in paired samples
before and after ischemic injury via cardioplegic arrest
identified three genes of interest. The zinc finger and
homeodomain protein 2 (ZHX2) has been identified as a
modulator of plasma lipids  and it is plausible that
regulation or dysregulation of lipid metabolism is a
response to hypoxemia. Interestingly the overexpression of
the ubiquitin E3 ligase gene (CUL4A), found to have
differential ASE in our ischemia model, has been found to
suppress apoptosis and DNA damage in experimental
pheochromocytoma model of ischemia-reperfusion .
This gene has not previously been associated with
response to hypoxia in cardiac cells and is an interesting
target for further studies. Functional enrichment analysis
highlighted the ubiquitin ligase complex pathway, but
other members of this pathway, including CHIP and
MDM2, have been associated with the regulation of
apoptosis in ischemia-reperfusion injury . It is likely
that expression differences affecting this pathway are
involved in the immediate response to ischemia and they
serve as interesting targets for follow-up studies.
Furthermore, our analysis is limited by a relatively short
time and mild ischemic insult between the baseline and
post-ischemic sample. More profound or prolonged
ischemia might yield allele-specific expression changes
within other pathways of interest or a more profound
Our results demonstrate that patterns of tissue-specific
ASE exist in the human heart, some shared between the
LA and LV, as well as chamber-specific ASE.
Furthermore, we have shown the utility of ASE to highlight
novel genes involved in disease states. With increasing
availability of RNA-seq datasets with a high number of
RNA reads of sufficient length and progress in the
methodology of ASE analysis, this opens novel and exciting
areas to advance the understanding of the genetic
background and molecular mechanisms behind the
pathogenesis of common and complex diseases, including
those in the human heart.
Additional file 1: Supplementary Table S1, Supplementary Figures S1–S11.
Assessment of allele-specific expression, quality control, comparison of
allele-specific expression calling algorithms, comparison of allele-specific
expression between datasets, and results from functional enrichment
analysis. (PDF 1091 kb)
Additional file 2: List of SNPs with significant allele-specific expression
in LA. (TXT 1290 kb)
Additional file 3: List of SNPs with significant allele-specific expression
in LV. (TXT 1941 kb)
We acknowledge the contribution to sample collection by the perioperative
genomics center and its staff: James Gosnell, RN; Kujtim Bodinaku, MD;
Svetlana Gorbatov, MPH.
Availability of data and material
Data from the LA/LV datasets analyzed during the current study are available
from the corresponding author upon reasonable request and will be
available on dbGAP in near future (accession number not yet available). The
GTEx project data that support the findings of this study are available from
GTEx consortium but availability restrictions apply. They were used under
license for the current study.
MIS: study design, data analysis, manuscript writing, and revision. LS: data
analysis, critical revision of manuscript, MH: data analysis, critical revision of
manuscript, TWX: data analysis, critical revision of manuscript, PS: obtained
tissue from patients, critical revision of manuscript, SA: obtained tissue from
patients, critical revision of manuscript, GSC: obtained tissue from patients,
critical revision of manuscript, SKS: study conception, critical revision of the
manuscript, JGS: study conception, critical revision of the manuscript, SCB:
study design and conception, data analysis, critical revision of manuscript,
JDM: study design and conception, data analysis, critical revision of
manuscript. All authors have reviewed and approve of the final version of
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
The study was approved by the Partners HealthCare Institutional Review
Board and all patients provided written informed consent. The study
complies with the principles of the Declaration of Helsinki.
1. Oshlack A , Robinson MD , Young MD . From RNA-seq reads to differential expression results . Genome Biol . 2010 ; 11 : 220 .
2. Conesa A , Madrigal P , Tarazona S , Gomez-Cabrero D , Cervera A , McPherson A , et al. A survey of best practices for RNA-seq data analysis . Genome Biol . 2016 ; 17 : 13 .
3. Haubner BJ , Adamowicz-Brice M , Khadayate S , Tiefenthaler V , Metzler B , Aitman T , et al. Complete cardiac regeneration in a mouse model of myocardial infarction . Aging (Albany NY) . 2012 ; 4 : 966 - 77 .
4. Ounzain S , Micheletti R , Beckmann T , Schroen B , Alexanian M , Pezzuto I , et al. Genome-wide profiling of the cardiac transcriptome after myocardial infarction identifies novel heart-specific long non-coding RNAs . Eur Heart J . 2015 ; 36 : 353 - 368a .
5. Muehlschlegel JD , Christodoulou DC , McKean D , Gorham J , Mazaika E , Heydarpour M , et al. Using next-generation RNA sequencing to examine ischemic changes induced by cold blood cardioplegia on the human left ventricular myocardium transcriptome . Anesthesiology . 2015 ; 122 : 537 - 50 .
6. Pastinen T. Genome-wide allele-specific analysis: insights into regulatory variation . Nat Rev Genet . 2010 ; 11 : 533 - 8 .
7. Pickrell JK , Marioni JC , Pai AA , Degner JF , Engelhardt BE , Nkadori E , et al. Understanding mechanisms underlying human gene expression variation with RNA sequencing . Nature . 2010 ; 464 : 768 - 72 .
8. GTEx Consortium . Human genomics . The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans . Science . 2015 ; 348 : 648 - 60 .
9. Kim D , Pertea G , Trapnell C , Pimentel H , Kelley R , Salzberg SL . TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions . Genome Biol . 2013 ; 14 :R36.
10. Li H , Handsaker B , Wysoker A , Fennell T , Ruan J , Homer N , et al. The Sequence Alignment/Map format and SAMtools . Bioinformatics . 2009 ; 25 : 2078 - 9 .
11. Castel SE , Levy-Moonshine A , Mohammadi P , Banks E , Lappalainen T. Tools and best practices for data processing in allelic expression analysis . Genome Biol . 2015 ; 16 : 195 .
12. Robinson MD , McCarthy DJ , Smyth GK . edgeR: a Bioconductor package for differential expression analysis of digital gene expression data . Bioinformatics . 2010 ; 26 : 139 - 40 .
13. Ritchie ME , Phipson B , Wu D , Hu Y , Law CW , Shi W , et al. limma powers differential expression analyses for RNA-sequencing and microarray studies . Nucleic Acids Res . 2015 ; 43 :e47.
14. Warde-Farley D , Donaldson SL , Comes O , Zuberi K , Badrawi R , Chao P , et al. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function . Nucleic Acids Res . 2010 ; 38 : W214 - 20 .
15. Shannon P , Markiel A , Ozier O , Baliga NS , Wang JT , Ramage D , et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks . Genome Res . 2003 ; 13 : 2498 - 504 .
16. Degner JF , Marioni JC , Pai AA , Pickrell JK , Nkadori E , Gilad Y , et al. Effect of read-mapping biases on detecting allele-specific expression from RNAsequencing data . Bioinformatics . 2009 ; 25 : 3207 - 12 .
17. Stohr R , Kappel BA , Carnevale D , Cavalera M , Mavilio M , Arisi I , et al. TIMP3 interplays with apelin to regulate cardiovascular metabolism in hypercholesterolemic mice . Mol Metab . 2015 ; 4 : 741 - 52 .
18. Fan D , Takawale A , Basu R , Patel V , Lee J , Kandalam V , et al. Differential role of TIMP2 and TIMP3 in cardiac hypertrophy, fibrosis, and diastolic dysfunction . Cardiovasc Res . 2014 ; 103 : 268 - 80 .
19. Goldmuntz E , Woyciechowski S , Renstrom D , Lupo PJ , Mitchell LE. Variants of folate metabolism genes and the risk of conotruncal cardiac defects . Circ Cardiovasc Genet . 2008 ; 1 : 126 - 32 .
20. Gerull B , Heuser A , Wichter T , Paul M , Basson CT , McDermott DA , et al. Mutations in the desmosomal protein plakophilin-2 are common in arrhythmogenic right ventricular cardiomyopathy . Nat Genet . 2004 ; 36 : 1162 - 4 .
21. Chen XJ , Wang X , Kaufman BA , Butow RA . Aconitase couples metabolic regulation to mitochondrial DNA maintenance . Science . 2005 ; 307 : 714 - 7 .
22. Schrickel JW , Fink K , Meyer R , Grohe C , Stoeckigt F , Tiemann K , et al. Lack of gelsolin promotes perpetuation of atrial fibrillation in the mouse heart . J Interv Card Electrophysiol . 2009 ; 26 : 3 - 10 .
23. Dawson K , Wakili R , Ordog B , Clauss S , Chen Y , Iwasaki Y , et al. MicroRNA29: a mechanistic contributor and potential biomarker in atrial fibrillation . Circulation . 2013 ; 127 : 1466 - 75 .
24. Pan H , Richards AA , Zhu X , Joglar JA , Yin HL , Garg V. A novel mutation in LAMIN A/C is associated with isolated early-onset atrial fibrillation and progressive atrioventricular block followed by cardiomyopathy and sudden cardiac death . Heart Rhythm . 2009 ; 6 : 707 - 10 .
25. Chen H , Duncan IC , Bozorgchami H , Lo SH. Tensin1 and a previously undocumented family member, tensin2, positively regulate cell migration . Proc Natl Acad Sci U S A . 2002 ; 99 : 733 - 8 .
26. Dina C , Bouatia-Naji N , Tucker N , Delling FN , Toomer K , Durst R , et al. Genetic association analyses highlight biological pathways underlying mitral valve prolapse . Nat Genet . 2015 ; 47 : 1206 - 11 .
27. Iwasaki YK , Nishida K , Kato T , Nattel S. Atrial fibrillation pathophysiology: implications for management . Circulation . 2011 ; 124 : 2264 - 74 .
28. Gargalovic PS , Erbilgin A , Kohannim O , Pagnon J , Wang X , Castellani L , et al. Quantitative trait locus mapping and identification of Zhx2 as a novel regulator of plasma lipid metabolism . Circ Cardiovasc Genet . 2010 ; 3 : 60 - 7 .
29. Tan C , Zhang LY , Chen H , Xiao L , Liu XP , Zhang JX . Overexpression of the human ubiquitin E3 ligase CUL4A alleviates hypoxia-reoxygenation injury in pheochromocytoma (PC12) cells . Biochem Biophys Res Commun . 2011 ; 416 : 403 - 8 .
30. Willis MS , Schisler JC , Patterson C. Appetite for destruction: E3 ubiquitinligase protection in cardiac disease . Future Cardiol . 2008 ; 4 : 65 - 75 .