Transcriptome-Wide Identification of Novel Imprinted Genes in Neonatal Mouse Brain

Citation: Wang X, Sun Q, McGrath SD, Mardis ER, Soloway PD, et al. (2008) Transcriptome-Wide Identification of Novel Imprinted Genes in Neonatal Mouse Brain. PLoS ONE 3(12): e3839. doi:10.1371/journal.pone.0003839

Authors: Xu Wang, Qi Sun, Sean D. McGrath, Elaine R. Mardis, Paul D. Soloway, Andrew G. Clark

Editor: Anne C. Ferguson-Smith, University of Cambridge, United Kingdom

Affiliations: 1 Department of Molecular Biology & Genetics, Cornell University, Ithaca, New York, United States of America; 2 Computational Biology Service Unit, Life Sciences Core Laboratories Center, Cornell University, Ithaca, New York, United States of America; 3 The Genome Center at Washington University, Washington University School of Medicine, St. Louis, Missouri, United States of America; 4 Division of Nutritional Sciences, College of Agriculture and Life Sciences, Cornell University, Ithaca, New York, United States of America

Imprinted genes display differential allelic expression in a manner that depends on the sex of the transmitting parent. The degree of imprinting is often tissue-specific and/or developmental stage-specific, and may be altered in some diseases, including cancer. Here we applied Illumina/Solexa sequencing to the transcriptomes of reciprocal F1 mouse neonatal brains and identified 26 genes with parent-of-origin-dependent differential allelic expression. Allele-specific pyrosequencing verified 17 of them, including three novel imprinted genes. The known and novel imprinted genes are all found in proximity to previously reported differentially methylated regions (DMRs). Ten genes known to be imprinted in placenta had sufficient expression levels to attain a read depth that provided statistical power to detect imprinting, and yet all were consistent with non-imprinting in our transcript count data for neonatal brain.
Three closely linked and reciprocally imprinted gene pairs were also discovered, and their pattern of expression suggests transcriptional interference. Despite coverage of more than 5,000 genes, this scan identified only three novel imprinted RefSeq genes in neonatal brain, suggesting that this tissue is nearly exhaustively characterized. This approach has the potential to yield a complete catalog of imprinted genes after application to multiple tissues and developmental stages, shedding light on the mechanism, bioinformatic prediction, and evolution of imprinted genes and on diseases associated with genomic imprinting.

To date, 98 genes have been shown to undergo genomic imprinting in mouse, and 56 genes are imprinted in humans, with an overlapping set of 38 genes imprinted in both species [1]. The list of imprinted genes is complete for neither species. Genome-wide bioinformatic predictions face the challenge of a high false positive rate, mostly because the training set of known imprinted genes is small, and we do not know all the signals driving tissue- and time-specificity of imprinting [2,3].

Attempts at exhaustive scans for imprinted genes in humans have encountered several drawbacks, including the challenge of using the most appropriate tissue and developmental stage, a problem exacerbated by reliance on lymphoblastoid cell lines (LCLs) [4]. Many imprinted genes show tissue- and developmental stage-specific expression, and many are expressed and imprinted only in specific stages of brain development. Human studies also face the challenge of a low number of informative heterozygous SNPs, so that allele-specific assays are useful for only a small subset of individuals. In addition, pedigree information is needed to distinguish genomic imprinting from stochastic monoallelic expression [5,6]. These factors greatly amplify the effort and cost needed for a transcriptome-wide scan for imprinted genes in humans.
By contrast, large-scale mouse studies have used uniparental disomy [7–12] to detect parent-of-origin effects. While this approach has led to the discovery of many imprinted genes, and to the refinement of phenotypic analysis of the consequences of disruptions in imprinting, not all genomic regions are covered by uniparental disomies, and there is a risk that such aberrant genome configurations may distort expression patterns. Microarray-based approaches using allele-specific probes can detect only nearly all-or-none imprinting with confidence, because quantitative differences between maternal and paternal allelic expression carry high error due to cross-hybridization of the perfect-match and mismatch probes [13,14]. In fact, genomic imprinting may occur as a continuum from complete uniparental expression to a slight but significant bias in the parental allele that is expressed, and a technology that could reliably detect quantitative differences in allele-specific expression at a transcriptome scale would greatly accelerate imprinting research.

Illumina sequencing results and SNP coverage

Short-read sequencing (e.g. Illumina/Solexa sequencing) of transcripts offers many advantages in imprinting studies because it provides a large number of sequence tags that allow simple counting of transcripts encoded by the two transmitted parental alleles. In this study, we performed quantitative assessments of genomic imprinting in transcripts from reciprocal cross progeny of the AKR/J and PWD/PhJ mouse strains. Total RNA was extracted from postnatal day 2 (P2) F1 female mouse whole brains. One run of Illumina sequencing was done for each F1 female brain cDNA sample. We obtained 1,072.63 Mbp of sequence data from the PWD × AKR cross (listing the female strain first) and 1,136.35 Mbp from the AKR × PWD cross as high-quality 32 bp reads (Figure S1.1). On average, 27.74% of the reads were aligned to the NCBI RefSeq mouse genome database.
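The idea of counting reads from the two parental alleles in reciprocal crosses can be sketched in a few lines. This is an illustration only and not the authors' analysis, whose statistical procedure is given in the Methods and is not reproduced here. It assumes a plain exact binomial test against a 50:50 expectation; the function names `binom_two_sided_p` and `classify`, the threshold `alpha`, and the example counts are all hypothetical.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value (sum of outcomes no more likely than k).

    Intended for modest read counts (n up to roughly 1000); larger n would
    need log-space arithmetic to avoid floating-point underflow or overflow.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def classify(pwd_x_akr, akr_x_pwd, alpha: float = 0.05) -> str:
    """Classify one gene from (PWD reads, AKR reads) in each reciprocal cross.

    Crosses are named with the female strain first, so PWD is the maternal
    allele in pwd_x_akr and AKR is the maternal allele in akr_x_pwd.
    alpha is a hypothetical threshold, not a value taken from the paper.
    """
    pwd1, akr1 = pwd_x_akr
    pwd2, akr2 = akr_x_pwd
    n1, n2 = pwd1 + akr1, pwd2 + akr2
    if n1 == 0 or n2 == 0:
        raise ValueError("need at least one SNP-containing read per cross")

    mat1 = pwd1 / n1  # maternal fraction in PWD x AKR
    mat2 = akr2 / n2  # maternal fraction in AKR x PWD
    sig1 = binom_two_sided_p(pwd1, n1) < alpha
    sig2 = binom_two_sided_p(akr2, n2) < alpha

    if sig1 and sig2:
        if mat1 > 0.5 and mat2 > 0.5:
            return "maternal bias"   # maternal allele favored in both crosses
        if mat1 < 0.5 and mat2 < 0.5:
            return "paternal bias"   # paternal allele favored in both crosses
        return "strain bias"         # same strain's allele favored in both
    return "no consistent bias"
```

With these made-up counts, `classify((40, 2), (3, 45))` returns "maternal bias", because the maternally inherited allele dominates in both crosses, whereas `classify((40, 2), (45, 3))` returns "strain bias", because the PWD allele dominates regardless of which parent transmitted it. The reciprocal design is what separates these two cases; a single cross could not distinguish them.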
Sequence heterogeneity between alleles was great enough that ELAND performed poorly in mapping reads to the genome, so mapping was performed with the NCBI BLAST program instead (Table S1.1). Altogether, 33,519,739 and 35,510,887 reads were aligned to the RefSeq database in the respective reciprocal crosses. The sequences covered 15,491 RefSeq genes with at least one perfectly matching Illumina read in each of the two reciprocal crosses. Within these genes, we identified 814,360 and 884,828 reads spanning Perlegen SNPs for the two respective reciprocal crosses [15]. After quality control filtering (Table S1.2), 320,804 and 327,451 high-quality SNP-containing reads remained, allowing identification of the parent of origin of each read (see Methods for more details). A total of 5,533 RefSeq genes (5,076 unique Entrez genes) were covered in our study with a total SNP count of four or more in both reciprocal crosses (Table S1.3). In the mouse Brain EST Database, 3,500 of the 5,500 cDNA clones of polyA-containing 3′-end EST sequences from P4 cerebellum are distinct species [16]. This contrasts with a recent SAGE study of P30 mouse brain, where the number of matched GenBank transcripts with copy number five or more per cell was 4,161 [17], but those (...truncated)