Transcriptome-Wide Identification of Novel Imprinted Genes in Neonatal Mouse Brain
et al. (2008) Transcriptome-Wide Identification of Novel Imprinted Genes in Neonatal Mouse
Brain. PLoS ONE 3(12): e3839. doi:10.1371/journal.pone.0003839
Xu Wang
Qi Sun
Sean D. McGrath
Elaine R. Mardis
Paul D. Soloway
Andrew G. Clark
Anne C. Ferguson-Smith, University of Cambridge, United Kingdom
1 Department of Molecular Biology & Genetics, Cornell University, Ithaca, New York, United States of America, 2 Computational Biology Service Unit, Life Sciences Core Laboratories Center, Cornell University, Ithaca, New York, United States of America, 3 The Genome Center at Washington University, Washington University School of Medicine, St. Louis, Missouri, United States of America, 4 Division of Nutritional Sciences, College of Agriculture and Life Sciences, Cornell University, Ithaca, New York, United States of America
Imprinted genes display differential allelic expression in a manner that depends on the sex of the transmitting parent. The degree of imprinting is often tissue-specific and/or developmental stage-specific, and may be altered in some diseases including cancer. Here we applied Illumina/Solexa sequencing to the transcriptomes of reciprocal F1 mouse neonatal brains and identified 26 genes with parent-of-origin-dependent differential allelic expression. Allele-specific Pyrosequencing verified 17 of them, including three novel imprinted genes. The known and novel imprinted genes are all found in proximity to previously reported differentially methylated regions (DMRs). Ten genes known to be imprinted in placenta had sufficient expression levels to attain a read depth that provided statistical power to detect imprinting, and yet all were consistent with non-imprinting in our transcript count data for neonatal brain. Three closely linked and reciprocally imprinted gene pairs were also discovered, and their pattern of expression suggests transcriptional interference. Despite the coverage of more than 5000 genes, this scan identified only three novel imprinted RefSeq genes in neonatal brain, suggesting that this tissue is nearly exhaustively characterized. This approach has the potential to yield a complete catalog of imprinted genes after application to multiple tissues and developmental stages, shedding light on the mechanism, bioinformatic prediction, and evolution of imprinted genes and diseases associated with genomic imprinting.
To date, 98 genes have been shown to undergo genomic
imprinting in mouse, and 56 genes are imprinted in humans, with
an overlapping set of 38 genes imprinted in both species [1]. For
neither species is the list of imprinted genes complete.
Genome-wide bioinformatic predictions face the challenge of a high
false-positive rate, mostly because the training set of known imprinted
genes is small, and we do not know all the signals driving
tissue- and time-specificity of imprinting [2,3]. Attempts at exhaustive
scans for imprinted genes in humans have encountered several
drawbacks, including the challenge of using the most appropriate
tissue and developmental stage, a problem exacerbated by reliance
on lymphoblastoid cell lines (LCLs) [4]. Many imprinted genes
show tissue- and developmental stage-specific expression, and
many are expressed and imprinted only in specific stages of brain
development. Human studies also face the challenge of a low
number of informative heterozygous SNPs, so that allele-specific
assays are useful for only a small subset of individuals. Hence,
pedigree information is needed to distinguish genomic imprinting
from stochastic monoallelic expression [5,6]. These factors greatly
amplify the effort and cost needed for a transcriptome-wide scan
for imprinted genes in humans. By contrast, large-scale mouse
studies have used uniparental disomy [7-12] to detect
parent-of-origin effects. While this approach has led to the discovery of many
imprinted genes, and to the refinement of phenotypic analysis of
the consequences of disruptions in imprinting, not all genomic
regions are covered by uniparental disomies, and there is a risk
that such aberrant genome configurations may distort expression
patterns. Microarray-based approaches using allele-specific probes
can only detect nearly all-or-none imprinting with confidence,
because quantitative differences between maternal and paternal
allelic expression are measured with high error due to cross-hybridization
of the perfect-match and mismatch probes [13,14]. In fact, genomic
imprinting may occur as a continuum from complete uniparental
expression to a slight but significant bias in the parental allele that
is expressed, and a technology that could reliably detect
quantitative differences in allele-specific expression at a
transcriptome scale would greatly accelerate imprinting research.
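To make the idea of quantitative, allele-specific detection concrete, the following minimal sketch classifies a single gene from allele-specific read counts in two reciprocal crosses. It is an illustration only: the function names, the example counts, the 0.05 threshold, and the use of an exact binomial test against a 50:50 expectation are all assumptions introduced here, and it is not the statistical procedure used in this study. The logic is that a parent-of-origin effect biases expression toward the same parent in both reciprocal crosses, whereas a strain (cis-regulatory) effect biases expression toward the same strain, and therefore toward opposite parents in the two crosses.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k of n reads under a 50:50 null."""
    # Probability of each possible count 0..n when both alleles are equally expressed.
    probs = [comb(n, i) / 2 ** n for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    # Sum the probabilities of all outcomes at least as unlikely as the observed one.
    return min(1.0, sum(q for q in probs if q <= cutoff))


def classify_gene(cross_a: dict, cross_b: dict, alpha: float = 0.05) -> str:
    """Classify one gene from hypothetical read counts in two reciprocal crosses.

    Each argument is {"maternal": reads, "paternal": reads} for one cross.
    The threshold alpha is an illustrative assumption.
    """
    fractions = []
    for counts in (cross_a, cross_b):
        mat, pat = counts["maternal"], counts["paternal"]
        n = mat + pat
        if binomial_two_sided_p(mat, n) >= alpha:
            return "no significant allelic bias in at least one cross"
        fractions.append(mat / n)  # maternal fraction; 0.5 means no bias
    if all(f > 0.5 for f in fractions):
        return "maternally biased (candidate imprinted)"
    if all(f < 0.5 for f in fractions):
        return "paternally biased (candidate imprinted)"
    return "strain-dependent bias (not parent-of-origin)"


# Hypothetical counts, for illustration only.
print(classify_gene({"maternal": 30, "paternal": 2}, {"maternal": 28, "paternal": 4}))
print(classify_gene({"maternal": 30, "paternal": 2}, {"maternal": 2, "paternal": 30}))
```

Because the maternal fraction is a continuous quantity, the same counts can also describe partial imprinting: a gene with a consistent 70:30 bias toward one parent in both crosses would be flagged once read depth is high enough for the test to reach significance.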
Illumina sequencing results and SNP coverage
Short-read sequencing (e.g. Illumina/Solexa sequencing) of
transcripts provides many advantages in imprinting studies by
providing a large number of sequence tags that allow simple
counting of transcripts encoded by the two transmitted parental
alleles. In this study, we performed quantitative assessments of
genomic imprinting in transcripts from reciprocal cross progeny of
the AKR/J and PWD/PhJ mouse strains. Total RNA was
extracted from postnatal day 2 (P2) F1 female mouse whole
brains. One run of Illumina sequencing was done for each F1
female brain cDNA sample. We obtained 1072.63 Mbp of
sequence data from the PWD x AKR cross (listing the female strain
first) and 1136.35 Mbp from the AKR x PWD cross, as high-quality
32 bp reads (Figure S1.1). On average, 27.74% of the reads were
aligned to the NCBI RefSeq mouse genome database. Sequence
heterogeneity between alleles was great enough that ELAND
performed poorly in mapping reads to the genome, so this
mapping was performed with the NCBI BLAST program (Table
S1.1). Altogether, 33,519,739 and 35,510,887 reads were aligned
to the RefSeq database in the respective reciprocal crosses. The
sequences covered 15,491 RefSeq genes with at least one perfectly
matching Illumina read in each of the two reciprocal crosses.
Within these genes, we identified 814,360 and 884,828 reads
spanning Perlegen SNPs for the two respective reciprocal crosses
[15]. After quality control filtering (Table S1.2), 320,804 and
327,451 high-quality SNP-containing reads remained, allowing
the parent of origin of each read to be identified (see Methods for
more details). In total, 5,533 RefSeq genes (5,076 unique Entrez genes)
were covered in our study with a total SNP count of four or more
in both reciprocal crosses (Table S1.3). From the mouse Brain
EST Database, among the 5,500 cDNA clones of
polyA-containing 3'-end EST sequences in P4 cerebellum, 3,500 are
distinct species [16]. This contrasts with a recent SAGE study of
P30 mouse brain, where the number of matched GenBank
transcripts with copy number five or more per cell was 4,161 [17],
but those (...truncated)