A method to assess compositional bias in biological sequences and its application to prion-like glutamine/asparagine-rich domains in eukaryotic proteomes

A question of size: the eukaryotic proteome and the problems in defining it

We discuss the problems in defining the extent of the proteomes for completely sequenced eukaryotic organisms (i.e. the total number of protein-coding sequences), focusing on yeast, worm, fly and human. (i) Six years after completion of its genome sequence, the true size of the yeast proteome is still not defined. New small genes are still being discovered, and a large number of...

An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions

Analysis of the role of retrotransposition in gene evolution in vertebrates

Background: The dynamics of gene evolution are influenced by several genomic processes. One such process is retrotransposition, where an mRNA transcript is reverse-transcribed and reintegrated into the genomic DNA. Results: We have surveyed eight vertebrate genomes (human, chimp, dog, cow, rat, mouse, chicken and the puffer-fish T. nigroviridis) for putatively retrotransposed...

Assessing the genomic evidence for conserved transcribed pseudogenes under selection

Background: Transcribed pseudogenes are copies of protein-coding genes that have accumulated indicators of coding-sequence decay (such as frameshifts and premature stop codons), but nonetheless remain transcribed. Recent experimental evidence indicates that transcribed pseudogenes may regulate the expression of homologous genes, through antisense interference, or generation of...

Bioinformatical parsing of folding-on-binding proteins reveals their compositional and evolutionary sequence design

Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes

Background: Pseudogenes often manifest themselves as disabled copies of known genes. In prokaryotes, it was generally believed (with a few well-known exceptions) that they were rare. Results: We have carried out a comprehensive analysis of the occurrence of pseudogenes in a diverse selection of 64 prokaryote genomes. Overall, we find a total of around 7,000 candidate pseudogenes...

Digging for dead genes: an analysis of the characteristics of the pseudogene population in the Caenorhabditis elegans genome

Estimating the occurrence of primary ubiquinone deficiency by analysis of large-scale sequencing data

Evidence for Retrogene Origins of the Prion Gene Family

The evolutionary origin of prion genes, only known to exist in the vertebrate lineage, had remained elusive until recently. Following a lead from interactome investigations of the murine prion protein, our previous bioinformatic analyses revealed the evolutionary descent of prion genes from an ancestral ZIP metal ion transporter. However, the molecular mechanism of evolution...

Evolutionary behaviour of bacterial prion-like proteins

Exhaustive assignment of compositional bias reveals universally prevalent biased regions: analysis of functional associations in human and Drosophila

Background: Compositionally biased (CB) regions are stretches in protein sequences made mainly from a distinct subset of amino acid residues; such regions are frequently associated with a structural role in the cell, or with protein disorder. Results: We derived a procedure for the exhaustive assignment and classification of CB regions, and have applied it to thirteen metazoan...

Experimentally Verified Parameter Sets for Modelling Heterogeneous Neocortical Pyramidal-Cell Populations

Models of neocortical networks increasingly include the diversity of excitatory and inhibitory neuronal classes. Significant variability in cellular properties is also seen within a nominal neuronal class, and this heterogeneity can be expected to influence the population response and information processing in networks. Recent studies have examined the population and...

Genomic evidence for non-random endemic populations of decaying exons from mammalian genes

Background: Functional diversification of genes in mammalian genomes is engendered by a number of processes, e.g., gene duplication and alternative splicing. Gene duplication is classically discussed as leading to neofunctionalization (generation of new functions), subfunctionalization (generation of a varied function), or pseudogenization (loss of the gene and its function...

Identification of pseudogenes in the Drosophila melanogaster genome

Pseudogenes are copies of genes that cannot produce a protein. They can be detected from disruptions to their apparent coding sequence, caused by frameshifts and premature stop codons. They are classed as either processed pseudogenes (made by reverse transcription from an mRNA) or duplicated pseudogenes, arising from duplication in the genomic DNA and subsequent disablement...

Interaction Networks of Prion, Prionogenic and Prion-Like Proteins in Budding Yeast, and Their Role in Gene Regulation

LPS-annotate: complete annotation of compositionally biased regions in the protein knowledgebase

Compositional bias (i.e. a skew in the composition of a biological sequence towards a subset of residue types) can occur at a wide variety of scales, from compositional biases of whole genomes, down to short regions in individual protein and gene–DNA sequences that are compositionally biased (CB regions). Such CB regions are made from a subset of residue types that are strewn...

Large-Scale Evidence for Conservation of NMD Candidature Across Mammals

Background: Alternatively-spliced (AS) forms can vary protein function, intracellular localization and post-translational modifications. AS coupled with mRNA nonsense-mediated decay (NMD) can also control transcript abundance. Here, we have investigated the genome-scale conservation of alternatively-spliced NMD candidates (AS-NMD candidates) in mammals. Methodology/Principal...

Mining Mammalian Transcript Data for Functional Long Non-Coding RNAs

Background: The role of long non-coding RNAs (lncRNAs) in controlling gene expression has garnered increased interest in recent years. Sequencing projects, such as Fantom3 for mouse and H-InvDB for human, have generated abundant data on transcribed components of mammalian cells, the majority of which appear not to be protein-coding. However, much of the non-protein-coding...

Origins and Evolution of the HET-s Prion-Forming Protein: Searching for Other Amyloid-Forming Solenoids

The HET-s prion-forming domain from the filamentous fungus Podospora anserina is gaining considerable interest since it yielded the first well-defined atomic structure of a functional amyloid fibril. This structure has been identified as a left-handed beta solenoid with a triangular hydrophobic core. To delineate the origins of the HET-s prion-forming protein and to discover...