Database: The Journal of Biological Databases and Curation

<a href="http://www.oxfordjournals.org/our_journals/databa/about.html">http://www.oxfordjournals.org/our_journals/databa/about.html</a>

List of Papers (Total 687)

SinEx DB: a database for single exon coding sequences in mammalian genomes

Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as ‘single exon genes’ (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas, many public databases of Eukaryotic multi-exon genes are...

Gene regulation knowledge commons: community action takes care of DNA binding transcription factors

A large gap remains between the amount of knowledge in scientific literature and the fraction that gets curated into standardized databases, despite many curation initiatives. Yet the availability of comprehensive knowledge in databases is crucial for exploiting existing background knowledge, both for designing follow-up experiments and for interpreting new experimental data...

HITSZ_CDR: an end-to-end chemical and disease relation extraction system for BioCreative V

In this article, an end-to-end system was proposed for the challenge task of disease named entity recognition (DNER) and chemical-induced disease (CID) relation extraction in BioCreative V, where DNER includes disease mention recognition (DMR) and normalization (DN). Evaluation on the challenge corpus showed that our system achieved the highest F1-scores 86.93% on DMR, 84.11% on...

ANItools web: a web tool for fast genome comparison within multiple bacterial strains

Background: Early classification of prokaryotes was based solely on phenotypic similarities, but modern prokaryote characterization has been strongly influenced by advances in genetic methods. With the fast development of the sequencing technology, the ever increasing number of genomic sequences per species offers the possibility for developing distance determinations based on...

Integration of new alternative reference strain genome sequences into the Saccharomyces genome database

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been...

CoopTFD: a repository for predicted yeast cooperative transcription factor pairs

In eukaryotic cells, transcriptional regulation of gene expression is usually accomplished by cooperative Transcription Factors (TFs). Therefore, knowing cooperative TFs is helpful for uncovering the mechanisms of transcriptional regulation. In yeast, many cooperative TF pairs have been predicted by various algorithms in the literature. However, until now, there is still no...

MET network in PubMed: a text-mined network visualization and curation system

Metastasis is the dissemination of a cancer/tumor from one organ to another, and it is the most dangerous stage during cancer progression, causing more than 90% of cancer deaths. Improving the understanding of the complicated cellular mechanisms underlying metastasis requires investigations of the signaling pathways. To this end, we developed a METastasis (MET) network...

Abasy Atlas: a comprehensive inventory of systems, global network properties and systems-level elements across bacteria

The availability of databases electronically encoding curated regulatory networks and of high-throughput technologies and methods to discover regulatory interactions provides an invaluable source of data to understand the principles underpinning the organization and evolution of these networks responsible for cellular regulation. Nevertheless, data on these sources never goes...

gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes

In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs...

URS DataBase: universe of RNA structures and their motifs

The Universe of RNA Structures DataBase (URSDB) stores information obtained from all RNA-containing PDB entries (2935 entries in October 2015). The content of the database is updated regularly. The database consists of 51 tables containing indexed data on various elements of the RNA structures. The database provides a web interface allowing user to select a subset of structures...

GESDB: a platform of simulation resources for genetic epidemiology studies

Computer simulations are routinely conducted to evaluate new statistical methods, to compare the properties among different methods, and to mimic the observed data in genetic epidemiology studies. Conducting simulation studies can become a complicated task as several challenges can occur, such as the selection of an appropriate simulation tool and the specification of parameters...

Predicting structured metadata from unstructured metadata

Enormous amounts of biomedical data have been and are being produced by investigators all over the world. However, one crucial and limiting factor in data reuse is accurate, structured and complete description of the data or data about the data—defined as metadata. We propose a framework to predict structured metadata terms from unstructured metadata for improving quality and...

BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences

BioSharing (http://www.biosharing.org) is a manually curated, searchable portal of three linked registries. These resources cover standards (terminologies, formats and models, and reporting guidelines), databases, and data policies in the life sciences, broadly encompassing the biological, environmental and biomedical sciences. Launched in 2011 and built by the same core team as...

A knowledge-poor approach to chemical-disease relation extraction

The article describes a knowledge-poor approach to the task of extracting Chemical-Disease Relations from PubMed abstracts. A first version of the approach was applied during the participation in the BioCreative V track 3, both in Disease Named Entity Recognition and Normalization (DNER) and in Chemical-induced diseases (CID) relation extraction. For both tasks, we have adopted a...

Argo: enabling the development of bespoke workflows and services for disease annotation

Argo (http://argo.nactem.ac.uk) is a generic text mining workbench that can cater to a variety of use cases, including the semi-automatic annotation of literature. It enables its technical users to build their own customised text mining solutions by providing a wide array of interoperable and configurable elementary components that can be seamlessly integrated into processing...

‘RE:fine drugs’: an interactive dashboard to access drug repurposing opportunities

The process of discovering new drugs has been extremely costly and slow in the last decades despite enormous investment in pharmaceutical research. Drug repurposing enables researchers to speed up the process of discovering other conditions that existing drugs can effectively treat, with low cost and fast FDA approval. Here, we introduce ‘RE:fine Drugs’, a freely available...

A web resource for mining HLA associations with adverse drug reactions: HLA-ADR

Human leukocyte antigens (HLA) are an important family of genes involved in the immune system. Their primary function is to allow the host immune system to be able to distinguish between self and non-self peptides—e.g. derived from invading pathogens. However, these genes have also been implicated in immune-mediated adverse drug reactions (ADRs), presenting a problem to patients...

BELTracker: evidence sentence retrieval for BEL statements

Biological expression language (BEL) is one of the main formal representation models of biological networks. The primary source of information for curating biological networks in BEL representation has been literature. It remains a challenge to identify relevant articles and the corresponding evidence statements for curating and validating BEL statements. In this paper, we...

DPTEdb, an integrative database of transposable elements in dioecious plants

Dioecious plants usually harbor ‘young’ sex chromosomes, providing an opportunity to study the early stages of sex chromosome evolution. Transposable elements (TEs) are mobile DNA elements frequently found in plants and are suggested to play important roles in plant sex chromosome evolution. The genomes of several dioecious plants have been sequenced, offering an opportunity to...

The Chinchilla Research Resource Database: resource for an otolaryngology disease model

The long-tailed chinchilla (Chinchilla lanigera) is an established animal model for diseases of the inner and middle ear, among others. In particular, chinchilla is commonly used to study diseases involving viral and bacterial pathogens and polymicrobial infections of the upper respiratory tract and the ear, such as otitis media. The value of the chinchilla as a model for human...

PepPSy: a web server to prioritize gene products in experimental and biocuration workflows

Among the 20 000 human gene products predicted from genome annotation, about 3000 still lack validation at protein level. We developed PepPSy, a user-friendly gene expression-based prioritization system, to help investigators to determine in which human tissues they should look for an unseen protein. PepPSy can also be used by biocurators to revisit the annotation of specific...

Mining chemical patents with an ensemble of open systems

The significant amount of medicinal chemistry information contained in patents makes them an attractive target for text mining. In this manuscript, we describe systems for named entity recognition (NER) of chemicals and genes/proteins in patents, using the CEMP (for chemicals) and GPRO (for genes/proteins) corpora provided by the CHEMDNER task at BioCreative V. Our chemical NER...

BelSmile: a biomedical semantic role labeling approach for extracting biological expression language from text

Biological expression language (BEL) is one of the most popular languages to represent the causal and correlative relationships among biological events. Automatically extracting and representing biomedical events using BEL can help biologists quickly survey and understand relevant literature. Recently, many researchers have shown interest in biomedical event extraction. However...

BioC-compatible full-text passage detection for protein–protein interactions using extended dependency graph

There has been a large growth in the number of biomedical publications that report experimental results. Many of these results concern detection of protein–protein interactions (PPI). In BioCreative V, we participated in the BioC task and developed a PPI system to detect text passages with PPIs in the full-text articles. By adopting the BioC format, the output of the system can...