Database: The Journal of Biological Databases and Curation

<a href="http://www.oxfordjournals.org/our_journals/databa/about.html">http://www.oxfordjournals.org/our_journals/databa/about.html</a>

List of Papers (Total 687)

BioCreative V CDR task corpus: a resource for chemical disease relation extraction

Community-run, formal evaluations and manually annotated text corpora are critically important for advancing biomedical text-mining research. Recently in BioCreative V, a new challenge was organized for the tasks of disease named entity recognition (DNER) and chemical-induced disease (CID) relation extraction. Given the nature of both tasks, a test collection is required to...

ToxEvaluator: an integrated computational platform to aid the interpretation of toxicology study-related findings

Attempts are frequently made to investigate adverse findings from preclinical toxicology studies in order to better understand underlying toxicity mechanisms. These efforts often begin with limited information, including a description of the adverse finding, knowledge of the structure of the chemical associated with its cause and the intended pharmacological target. ToxEvaluator...

Chemical entity recognition in patents by combining dictionary-based and statistical approaches

We describe the development of a chemical entity recognition system and its application in the CHEMDNER-patent track of BioCreative 2015. This community challenge includes a Chemical Entity Mention in Patents (CEMP) recognition task and a Chemical Passage Detection (CPD) classification task. We addressed both tasks by an ensemble system that combines a dictionary-based approach...

UbiNet: an online resource for exploring the functional associations and regulatory networks of protein ubiquitylation

Protein ubiquitylation catalyzed by E3 ubiquitin ligases is crucial in the regulation of many cellular processes. Owing to the high throughput of mass spectrometry-based proteomics, a number of methods have been developed for the experimental determination of ubiquitylation sites, leading to a large collection of ubiquitylation data. However, there exist no resources for the...

From data repositories to submission portals: rethinking the role of domain-specific databases in CollecTF

Domain-specific databases are essential resources for the biomedical community, leveraging expert knowledge to curate published literature and provide access to referenced data and knowledge. The limited scope of these databases, however, poses important challenges to their infrastructure, visibility, funding and usefulness to the broader scientific community. CollecTF is a...

CoReCG: a comprehensive database of genes associated with colon-rectal cancer

Cancer of the large intestine is commonly referred to as colorectal cancer, which is also the third most prevalent neoplasm across the globe. Although much work is being carried out to understand the mechanism of carcinogenesis and the advancement of this disease, fewer studies have been performed to collate the scattered information on alterations in tumorigenic cells like...

Integrated sequence and immunology filovirus database at Los Alamos

The Ebola outbreak of 2013–15 infected more than 28 000 people and claimed more lives than all previous filovirus outbreaks combined. Governmental agencies, clinical teams, and the world scientific community pulled together in a multifaceted response ranging from prevention and disease control, to evaluating vaccines and therapeutics in human trials. As this epidemic is finally...

Kalium: a database of potassium channel toxins from scorpion venom

Kalium (http://kaliumdb.org/) is a manually curated database that accumulates data on potassium channel toxins purified from scorpion venom (KTx). This database is an open-access resource, and provides easy access to pages of other databases of interest, such as UniProt, PDB, NCBI Taxonomy Browser, and PubMed. General achievements of Kalium are a strict and easy regulation of KTx...

PGP repository: a plant phenomics and genomics data publication infrastructure

Plant genomics and phenomics represent the most promising tools for accelerating yield gains and overcoming emerging crop productivity bottlenecks. However, accessing this wealth of plant diversity requires the characterization of this material using state-of-the-art genomic, phenomic and molecular technologies and the release of subsequent research data via a long-term stable...

A crowdsourcing workflow for extracting chemical-induced disease relations from free text

Relations between chemicals and diseases are one of the most queried biomedical interactions. Although expert manual curation is the standard method for extracting these relations from the literature, it is expensive and impractical to apply to large numbers of documents, and therefore alternative methods are required. We describe here a crowdsourcing workflow for extracting...

Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning

Medicinal chemistry patents contain rich information about chemical compounds. Although much effort has been devoted to extracting chemical entities from scientific literature, limited numbers of patent mining systems are publicly available, probably due to the lack of large manually annotated corpora. To accelerate the development of information extraction systems for...

Improving HIV proteome annotation: new features of BioAfrica HIV Proteomics Resource

The Human Immunodeficiency Virus (HIV) is one of the pathogens that cause the greatest global concern, with approximately 35 million people currently infected with HIV. Extensive HIV research has been performed, generating a large amount of HIV and host genomic data. However, no effective vaccine that protects the host from HIV infection is available and HIV is still spreading at...

NPInter v3.0: an upgraded database of noncoding RNA-associated interactions

Despite the fact that a large number of noncoding RNAs (ncRNAs) have been identified, their functions remain unclear. To enable researchers to have a better understanding of ncRNAs’ functions, we updated the NPInter database to version 3.0, which contains experimentally verified interactions between ncRNAs (excluding tRNAs and rRNAs), especially long noncoding RNAs (lncRNAs...

Exploiting syntactic and semantics information for chemical–disease relation extraction

Identifying chemical–disease relations (CDR) from biomedical literature could improve chemical safety and toxicity studies. This article proposes a novel syntactic and semantic information exploitation method for CDR extraction. The proposed method consists of a feature-based model, a tree kernel-based model and a neural network model. The feature-based model exploits lexical...

Extraction of chemical-induced diseases using prior knowledge and textual information

We describe our approach to the chemical–disease relation (CDR) task in the BioCreative V challenge. The CDR task consists of two subtasks: automatic disease-named entity recognition and normalization (DNER), and extraction of chemical-induced diseases (CIDs) from Medline abstracts. For the DNER subtask, we used our concept recognition tool Peregrine, in combination with several...

3CDB: a manually curated database of chromosome conformation capture data

Chromosome conformation capture (3C) is a biochemical technology to analyse contact frequencies between selected genomic sites in a cell population. Its recent genomic variants, e.g. Hi-C/chromatin interaction analysis by paired-end tag (ChIA-PET), have enabled the study of nuclear organization at an unprecedented level. However, due to the inherent low resolution and ultrahigh...

ToxDB: pathway-level interpretation of drug-treatment data

Motivation: Extensive drug treatment gene expression data have been generated in order to identify biomarkers that are predictive for toxicity or to classify compounds. However, such patterns are often highly variable across compounds and lack robustness. We and others have previously shown that supervised expression patterns based on pathway concepts rather than unsupervised...

BRONCO: Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations

Comprehensive knowledge of genomic variants in a biological context is key for precision medicine. As next-generation sequencing technologies improve, the amount of literature containing genomic variant data, such as new functions or related phenotypes, rapidly increases. Because numerous articles are published every day, it is almost impossible to manually curate all the variant...

PmiRExAt: plant miRNA expression atlas database and web applications

High-throughput small RNA (sRNA) sequencing technology enables an entirely new perspective for plant microRNA (miRNA) research and has immense potential to unravel regulatory networks. Novel insights gained through data mining of the publicly available, rich resource of sRNA data will help in designing biotechnology-based approaches for crop improvement to enhance plant yield and...

Efficient chemical-disease identification and relationship extraction using Wikipedia to improve recall

Awareness of the adverse effects of chemicals is important in biomedical research and healthcare. Text mining can allow timely and low-cost extraction of this knowledge from the biomedical literature. We extended our text mining solution, LeadMine, to identify diseases and chemical-induced disease relationships (CIDs). LeadMine is a dictionary/grammar-based entity recognizer and...

RegenBase: a knowledge base of spinal cord injury biology for translational research

Spinal cord injury (SCI) research is a data-rich field that aims to identify the biological mechanisms resulting in loss of function and mobility after SCI, as well as develop therapies that promote recovery after injury. SCI experimental methods, data and domain knowledge are locked in the largely unstructured text of scientific publications, making large scale integration with...

ATtRACT—a database of RNA-binding proteins and associated motifs

RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improving our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes...

ICEPO: the ion channel electrophysiology ontology

Ion channels are transmembrane proteins that selectively allow ions to flow across the plasma membrane and play key roles in diverse biological processes. A multitude of diseases, called channelopathies, such as epilepsies, muscle paralysis, pain syndromes, cardiac arrhythmias or hypoglycemia, are due to ion channel mutations. A wide corpus of literature is available on ion...