Search: authors:"Victor Kunin"

20 papers found.
Evolutionary conservation of sequence and secondary structures in CRISPR repeats

Background: Clustered regularly interspaced short palindromic repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in approximately 40% of bacterial and most archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CASs), appear in conjunction with these repeats and...

Clustering the annotation space of proteins

Background: Current protein clustering methods rely on either sequence or functional similarities between proteins, thereby limiting inferences to one of these areas. Results: Here we report a new approach, named CLAN, which clusters proteins according to both annotation and sequence similarity. This approach is extremely fast, clustering the complete SwissProt database within...

Experimental factors affecting PCR-based estimates of microbial species richness and evenness

Affiliations: Department of Energy Joint Genome Institute, Walnut Creek, CA, USA: Anna Engelbrektson, Victor Kunin, Natasha Zvenigorodsky, Feng Chen & Philip Hugenholtz. Department of Plant and Microbial Biology, University of ...

GeneTRACE–reconstruction of gene content of ancestral species

Victor Kunin, Christos A. Ouzounis. Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK. While current computational methods...

Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes

Background: Pseudogenes often manifest themselves as disabled copies of known genes. In prokaryotes, it was generally believed (with a few well-known exceptions) that they were rare. Results: We have carried out a comprehensive analysis of the occurrence of pseudogenes in a diverse selection of 64 prokaryote genomes. Overall, we find a total of around 7,000 candidate pseudogenes...

Functional Evolution of the Yeast Protein Interaction Network

Protein interactions are central to most biological processes. We investigated the dynamics of emergence of the protein interaction network of Saccharomyces cerevisiae by mapping origins of proteins on an evolutionary tree. We demonstrate that evolutionary periods are characterized by distinct connectivity levels of the emerging proteins. We found that the most-connected group of...

Protein families and Tribes in genome sequence space

Anton J. Enright, Victor Kunin, Christos A. Ouzounis. Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK. Accurate detection...

The properties of protein family space depend on experimental design

Motivation: Databases of protein families often exhibit drastically different properties of the protein family space. Results: We compared the properties of protein family space as reflected by exhaustive protein family databases and databases with predefined families. We used TRIBES, Protomap, ProDom and COGs as representatives of the exhaustive databases, and Pfam-A and...

MagicMatch—cross-referencing sequence identifiers across databases

Motivation: At present, mapping of sequence identifiers across databases is a daunting, time-consuming and computationally expensive process, usually achieved by sequence similarity searches with strict threshold values. Summary: We present a rapid and efficient method to map sequence identifiers across databases. The method uses the MD5 checksum algorithm for message integrity...
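The checksum idea in this abstract can be illustrated with a minimal sketch. It is not the MagicMatch implementation: the function names, the toy records, and the normalization step (stripping whitespace and upper-casing) are assumptions made for illustration. The only element taken from the abstract is that an MD5 checksum of the sequence serves as the key, so identical sequences in two databases can be matched without a similarity search.

```python
import hashlib
from typing import Dict, List


def sequence_digest(sequence: str) -> str:
    """Return the MD5 hex digest of a sequence.

    Assumption: whitespace is removed and letters are upper-cased first,
    so formatting differences do not change the digest.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def index_by_digest(records: Dict[str, str]) -> Dict[str, List[str]]:
    """Group the identifiers of one database by sequence digest."""
    index: Dict[str, List[str]] = {}
    for identifier, sequence in records.items():
        index.setdefault(sequence_digest(sequence), []).append(identifier)
    return index


def map_identifiers(source: Dict[str, str],
                    target: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each source identifier to target identifiers with the same sequence."""
    target_index = index_by_digest(target)
    return {
        identifier: target_index.get(sequence_digest(sequence), [])
        for identifier, sequence in source.items()
    }


# Hypothetical toy records; identifiers and sequences are invented.
db_a = {"A001": "MKTAYIAKQR", "A002": "MSSHEGGKKK"}
db_b = {"B17": "mktayiakqr", "B42": "MLLAVLYCLA"}

mapping = map_identifiers(db_a, db_b)
print(mapping)
```

Because the lookup is an exact dictionary match on the digest, this kind of mapping only links entries whose sequences are identical after normalization; near-identical sequences would still need a similarity search.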

Measuring genome conservation across taxa: divided strains and united kingdoms

Species evolutionary relationships have traditionally been defined by sequence similarities of phylogenetic marker molecules, recently followed by whole-genome phylogenies based on gene order, average ortholog similarity or gene content. Here, we introduce genome conservation, a novel metric of evolutionary distances between species that simultaneously takes into account both...

Denoising inferred functional association networks obtained by gene fusion analysis

Background: Gene fusion detection – also known as the 'Rosetta Stone' method – involves the identification of fused composite genes in a set of reference genomes, which indicates potential interactions between its un-fused counterpart genes in query genomes. The precision of this method typically improves with an ever-increasing number of reference genomes. Results: In order to...

TreeQ-VISTA: an interactive tree visualization tool with functional annotation query capabilities

Summary: We describe a general multiplatform exploratory tool called TreeQ-Vista, designed for presenting functional annotations in a phylogenetic context. Traits, such as phenotypic and genomic properties, are interactively queried from a user-provided relational database with a user-friendly interface which provides a set of tools for users with or without SQL knowledge. The...

Genome Analysis of the Anaerobic Thermohalophilic Bacterium Halothermothrix orenii

Halothermothrix orenii is a strictly anaerobic thermohalophilic bacterium isolated from sediment of a Tunisian salt lake. It belongs to the order Halanaerobiales in the phylum Firmicutes. The complete sequence revealed that the genome consists of one circular chromosome of 2,578,146 bp encoding 2,451 predicted genes. This is the first genome sequence of an organism belonging to the...

Myriads of protein families, and still counting

From the historical record of genome sequencing, we show that the rate of discovery of new families has remained constant over time, indicating that our knowledge of sequence space is far from complete.

COmplete GENome Tracking (COGENT): a flexible data environment for computational genomics

Summary: We present a database of fully sequenced and published genomes to facilitate the re-distribution of data and ensure reproducibility of results in the field of computational genomics. For its design we have implemented an extremely simple yet powerful schema to allow linking of genome sequence data to other resources. Availability: http://maine.ebi.ac.uk:8000/services...

Expansion of the BioCyc collection of pathway/genome databases to 160 genomes

The BioCyc database collection is a set of 160 pathway/genome databases (PGDBs) for most eukaryotic and prokaryotic species whose genomes have been completely sequenced to date. Each PGDB in the BioCyc collection describes the genome and predicted metabolic network of a single organism, inferred from the MetaCyc database, which is a reference source on metabolic pathways from...

An experimental metagenome data management and analysis system

The application of shotgun sequencing to environmental samples has revealed a new universe of microbial community genomes (metagenomes) involving previously uncultured organisms. Metagenome analysis, which is expected to provide a comprehensive picture of the gene functions and metabolic capacity for microbial communities, needs to be conducted in the context of a comprehensive...

CoGenT++: an extensive and extensible data environment for computational genomics

... Kunin, Christos A. Ouzounis. Institute of Agrobiotechnology, National Center for Research and Technology, PO Box 361, Thermi, Thessaloniki GR-57001, Greece; Laboratory for Microbiology, Belgian ...

A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea

Sequencing of bacterial and archaeal genomes has revolutionized our understanding of the many roles played by microorganisms [1]. There are now nearly 1,000 completed bacterial and archaeal genomes available [2], most of which were chosen for sequencing on the basis of their physiology. As a result, the perspective provided by the currently available genomes is limited by a highly...