Database

http://database.oxfordjournals.org

List of Papers (Total 687)

blend4php: a PHP API for galaxy

Galaxy is a popular framework for execution of complex analytical pipelines, typically for large data sets, and is commonly used for (but not limited to) genomic, genetic and related biological analysis. It provides a web front-end and integrates with high performance computing resources. Here we report the development of the blend4php library that wraps Galaxy’s RESTful API...

KTCNlncDB—a first platform to investigate lncRNAs expressed in human keratoconus and non-keratoconus corneas

Keratoconus (KTCN, OMIM 148300) is a degenerative eye disorder characterized by progressive stromal thinning that leads to a conical shape of the cornea, resulting in optical aberrations and even loss of visual function. The biochemical background of the disease is poorly understood, which motivated us to perform an RNA-Seq experiment aimed at better characterizing the KTCN...

RAIN: RNA–protein Association and Interaction Networks

Protein association networks can be inferred from a range of resources including experimental data, literature mining and computational predictions. These types of evidence are emerging for non-coding RNAs (ncRNAs) as well. However, integration of ncRNAs into protein association networks is challenging due to data heterogeneity. Here, we present a database of ncRNA–RNA and ncRNA...

Automatic query generation using word embeddings for retrieving passages describing experimental methods

Information regarding the physical interactions among proteins is crucial, since protein–protein interactions (PPIs) are central for many biological processes. The experimental techniques used to verify PPIs are vital for characterizing and assessing the reliability of the identified PPIs. A lot of information about PPIs and the experimental methods is only available in the text...

FARME DB: a functional antibiotic resistance element database

Antibiotic resistance (AR) is a major global public health threat but few resources exist that catalog AR genes outside of a clinical context. Current AR sequence databases are assembled almost exclusively from genomic sequences derived from clinical bacterial isolates and thus do not include many microbial sequences derived from environmental samples that confer resistance in...

Duplicates, redundancies and inconsistencies in the primary nucleotide databases: a descriptive study

GenBank, the EMBL European Nucleotide Archive and the DNA DataBank of Japan, known collectively as the International Nucleotide Sequence Database Collaboration or INSDC, are the three most significant nucleotide sequence databases. Their records are derived from laboratory work undertaken by different individuals, by different teams, with a range of technologies and assumptions...

The BioC-BioGRID corpus: full text articles annotated for curation of protein–protein and genetic interactions

A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This...

MAHMI database: a comprehensive MetaHit-based resource for the study of the mechanism of action of the human microbiota

The Mechanism of Action of the Human Microbiome (MAHMI) database is a unique resource that provides comprehensive information about the sequence of potential immunomodulatory and antiproliferative peptides encrypted in the proteins produced by the human gut microbiota. Currently, the MAHMI database contains over 300 million peptide entries, with detailed information about...

FirebrowseR: an R client to the Broad Institute’s Firehose Pipeline

With its Firebrowse service (http://firebrowse.org/), the Broad Institute is making large-scale multi-platform omics data analysis results publicly available through a Representational State Transfer (REST) Application Programming Interface (API). Querying this database through an API client from an arbitrary programming environment is an essential task, allowing other developers...

Establishment of Kawasaki disease database based on metadata standard

Kawasaki disease (KD) is a rare disease that occurs predominantly in infants and young children. To identify KD susceptibility genes and to develop a diagnostic test, a specific therapy, or prevention method, collecting KD patients’ clinical and genomic data is one of the major issues. For this purpose, the Kawasaki Disease Database (KDD) was developed based on the efforts of Korean...

ArthropodaCyc: a CycADS powered collection of BioCyc databases to analyse and compare metabolism of arthropods

Arthropods interact with humans at different levels, with highly beneficial roles (e.g. as pollinators) as well as negative impacts, for example as vectors of human or animal diseases or as agricultural pests. Several arthropod genomes are available at present and many others will be sequenced in the near future in the context of the i5K initiative, offering opportunities...

Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges

Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop...

Can we replace curation with information extraction software?

Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current information extraction programs extract single narrow slivers of information, such as individual protein...

Crowd-sourcing and author submission as alternatives to professional curation

Can we decrease the costs of database curation by crowd-sourcing curation work or by offloading curation to publication authors? This perspective considers the significant experience accumulated by the bioinformatics community with these two alternatives to professional curation in the last 20 years; that experience should be carefully considered when formulating new strategies...

CaspNeuroD: a knowledgebase of predicted caspase cleavage sites in human proteins related to neurodegenerative diseases

Background: A variety of neurodegenerative diseases (NDs) have been associated with deregulated caspase activation that leads to neuronal death. Caspases appear to be involved in the molecular pathology of NDs by directly cleaving important proteins. For instance, several proteins involved in Alzheimer’s disease, including β-amyloid precursor protein (APP) and presenilins, are...

The importance of digitized biocollections as a source of trait data and a new VertNet resource

For vast areas of the globe and large parts of the tree of life, data needed to inform trait diversity is incomplete. Such trait data, when fully assembled, however, form the link between the evolutionary history of organisms, their assembly into communities, and the nature and functioning of ecosystems. Recent efforts to close data gaps have focused on collating trait-by-species...

TBro: visualization and management of de novo transcriptomes

RNA sequencing (RNA-seq) has become a powerful tool to understand molecular mechanisms and/or developmental programs. It provides a fast, reliable and cost-effective method to access sets of expressed elements in a qualitative and quantitative manner. Especially for non-model organisms and in the absence of a reference genome, RNA-seq data is used to reconstruct and quantify...

Construction of antimicrobial peptide-drug combination networks from scientific literature based on a semi-automated curation workflow

Considerable research efforts are being invested in the development of novel antimicrobial therapies effective against the growing number of multi-drug resistant pathogens. Notably, the combination of different agents is increasingly explored as a means to exploit and improve individual agent actions while minimizing microorganism resistance. Although there are several databases on...

Minimizing proteome redundancy in the UniProt Knowledgebase

Advances in high-throughput sequencing have led to an unprecedented growth in genome sequences being submitted to biological databases. In particular, the sequencing of large numbers of nearly identical bacterial genomes during infection outbreaks and for other large-scale studies has resulted in a high level of redundancy in nucleotide databases and consequently in the UniProt...

RiceATM: a platform for identifying the association between rice agronomic traits and miRNA expression

MicroRNAs (miRNAs) are known to play critical roles in plant development and stress-response regulation, and they frequently display multi-targeting characteristics. The control of defined rice phenotypes occurs through multiple genes; however, evidence demonstrating the relationship between agronomic traits and miRNA expression profiles is lacking. In this study, we investigated...

Large-scale inference of gene function through phylogenetic annotation of Gene Ontology terms: case study of the apoptosis and autophagy cellular processes

We previously reported a paradigm for large-scale phylogenomic analysis of gene families that takes advantage of the large corpus of experimentally supported Gene Ontology (GO) annotations. This ‘GO Phylogenetic Annotation’ approach integrates GO annotations from evolutionarily related genes across ∼100 different organisms in the context of a gene family tree, in which curators...

iMITEdb: the genome-wide landscape of miniature inverted-repeat transposable elements in insects

Miniature inverted-repeat transposable elements (MITEs) have attracted much attention due to their widespread occurrence and high copy numbers in eukaryotic genomes. However, systematic knowledge about MITEs in insects and other animals is still lacking. In this study, we identified 6012 MITE families from 98 insect species genomes. Comparison of these MITEs with known MITEs...

HEDD: the human epigenetic drug database

Epigenetic drugs are chemical compounds that target disordered post-translational modification of histone proteins and DNA through enzymes, and the recognition of these changes by adaptor proteins. Epigenetic drug-related experimental data such as gene expression probed by high-throughput sequencing, co-crystal structure probed by X-ray diffraction and binding constants probed by...

GeneBase 1.1: a tool to summarize data from NCBI gene datasets and its application to an update of human gene statistics

We release GeneBase 1.1, a local tool with a graphical interface useful for parsing, structuring and indexing data from the National Center for Biotechnology Information (NCBI) Gene data bank. Compared to its predecessor GeneBase (1.0), GeneBase 1.1 now allows dynamic calculation and summarization in terms of median, mean, standard deviation and total for many quantitative...

Reefgenomics.Org - a repository for marine genomics data

Over the last decade, technological advancements have substantially decreased the cost and time of obtaining large amounts of sequencing data. Paired with the exponentially increased computing power, individual labs are now able to sequence genomes or transcriptomes to investigate biological questions of interest. This has led to a significant increase in available sequence data...