Bioinformatics

http://bioinformatics.oxfordjournals.org

List of Papers (Total 9,723)

AlmostSignificant: simplifying quality control of high-throughput sequencing data

Motivation: The current generation of DNA sequencing technologies produces a large amount of data quickly. All of these data need to pass some form of quality control (QC) processing and checking before they can be used for any analysis. The large number of samples that are run through Illumina sequencing machines makes the process of QC an onerous and time-consuming task that ...

PhenomeScape: a cytoscape app to identify differentially regulated sub-networks using known disease associations

Summary: PhenomeScape is a Cytoscape app which provides easy access to the PhenomeExpress algorithm to interpret gene expression data. PhenomeExpress integrates protein interaction networks with known phenotype to gene associations to find active sub-networks enriched in differentially expressed genes. It also incorporates cross-species phenotypes and associations to include ...

ImmQuant: a user-friendly tool for inferring immune cell-type composition from gene-expression data

Summary: The composition of immune-cell subsets is key to the understanding of major diseases and pathologies. Computational deconvolution methods enable researchers to investigate immune cell quantities in complex tissues based on transcriptome data. Here we present ImmQuant, a software tool allowing immunologists to upload transcription profiles of multiple tissue samples, apply ...

BatchQC: interactive software for evaluating sample and batch effects in genomic data

Sequencing and microarray samples often are collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in the downstream analysis. There are several existing batch adjustment tools for ‘-omics’ data, but they do not indicate a priori whether adjustment needs to be conducted or how correction should be ...

Integrating genomic information with protein sequence and 3D atomic level structure at the RCSB protein data bank

Summary: The Protein Data Bank (PDB) now contains more than 120,000 three-dimensional (3D) structures of biological macromolecules. To allow an interpretation of how PDB data relates to other publicly available annotations, we developed a novel data integration platform that maps 3D structural information across various datasets. This integration bridges from the human genome ...

LongISLND: in silico sequencing of lengthy and noisy datatypes

Summary: LongISLND is a software package designed to simulate sequencing data according to the characteristics of third generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the ...

STAMS: STRING-assisted module search for genome wide association studies and application to autism

Motivation: Analyzing genome wide association data in the context of biological pathways helps us understand how genetic variation influences phenotype and increases power to find associations. However, the utility of pathway-based analysis tools is hampered by undercuration and reliance on a distribution of signal across all of the genes in a pathway. Methods that combine genome ...

Fast-SNP: a fast matrix pre-processing algorithm for efficient loopless flux optimization of metabolic models

Motivation: Computation of steady-state flux solutions in large metabolic models is routinely performed using flux balance analysis based on a simple LP (Linear Programming) formulation. A minimal requirement for thermodynamic feasibility of the flux solution is the absence of internal loops, which are enforced using ‘loopless constraints’. The resulting loopless flux problem is a ...

Assembly-based inference of B-cell receptor repertoires from short read RNA sequencing data with V’DJer

Motivation: B-cell receptor (BCR) repertoire profiling is an important tool for understanding the biology of diverse immunologic processes. Current methods for analyzing adaptive immune receptor repertoires depend upon PCR amplification of VDJ rearrangements followed by long read amplicon sequencing spanning the VDJ junctions. While this approach has proven to be effective, it is ...

ReadXplorer 2—detailed read mapping analysis and visualization from one single source

Motivation: The vast amount of already available and currently generated read mapping data requires comprehensive visualization, and should benefit from bioinformatics tools offering a wide spectrum of analysis functionality from just one source. Appropriate handling of multiple mapped reads during mapping analyses remains an issue that demands improvement. Results: The ...

DTMiner: identification of potential disease targets through biomedical literature mining

Motivation: Biomedical researchers often search through massive catalogues of literature to look for potential relationships between genes and diseases. Given the rapid growth of biomedical literature, automatic relation extraction, a crucial technology in biomedical literature mining, has shown great potential to support research of gene-related diseases. Existing work in this ...

A profile-based method for identifying functional divergence of orthologous genes in bacterial genomes

Motivation: Next generation sequencing technologies have provided us with a wealth of information on genetic variation, but predicting the functional significance of this variation is a difficult task. While many comparative genomics studies have focused on gene flux and large scale changes, relatively little attention has been paid to quantifying the effects of single nucleotide ...

Multivariate Welch t-test on distances

Motivation: Permutational non-Euclidean analysis of variance, PERMANOVA, is routinely used in exploratory analysis of multivariate datasets to draw conclusions about the significance of patterns visualized through dimension reduction. This method recognizes that the pairwise distance matrix between observations is sufficient to compute the within and between group sums of squares necessary ...
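
The claim that a pairwise distance matrix suffices for the sums of squares can be illustrated with a minimal sketch. It uses the standard PERMANOVA partition (total and within-group sums of squares computed from squared pairwise distances), not the Welch-type statistic the paper goes on to propose. The function names `sums_of_squares` and `pseudo_f` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def sums_of_squares(dist, groups):
    """Return (SS_total, SS_within, SS_between) from a symmetric distance matrix.

    dist   : list of lists; dist[i][j] is the distance between observations i and j
    groups : list of group labels, one per observation
    """
    n = len(groups)
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n

    # Within-group sum of squares: the same computation restricted to each group,
    # divided by that group's size.
    ss_within = 0.0
    for label in set(groups):
        members = [i for i, g in enumerate(groups) if g == label]
        pair_sum = sum(dist[i][j] ** 2 for i, j in combinations(members, 2))
        ss_within += pair_sum / len(members)

    return ss_total, ss_within, ss_total - ss_within


def pseudo_f(dist, groups):
    """Pseudo-F ratio: between-group over within-group mean squares."""
    n = len(groups)
    a = len(set(groups))
    _, ss_within, ss_between = sums_of_squares(dist, groups)
    return (ss_between / (a - 1)) / (ss_within / (n - a))


# Toy example (assumed data): four points on a line, two groups.
points = [0.0, 2.0, 10.0, 12.0]
labels = ["A", "A", "B", "B"]
dist = [[abs(p - q) for q in points] for p in points]
ss_total, ss_within, ss_between = sums_of_squares(dist, labels)
```

For Euclidean distances these quantities match the usual sums of squared deviations from the overall and group means, which can be checked by hand on the toy data. In PERMANOVA, significance is then assessed by recomputing the ratio under random permutations of the group labels.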

LCA*: an entropy-based measure for taxonomic assignment within assembled metagenomes

Motivation: A perennial problem in the analysis of environmental sequence information is the assignment of reads or assembled sequences, e.g. contigs or scaffolds, to discrete taxonomic bins. In the absence of reference genomes for most environmental microorganisms, the use of intrinsic nucleotide patterns and phylogenetic anchors can improve assembly-dependent binning needed for ...

BioPartsDB: a synthetic biology workflow web-application for education and research

Summary: Synthetic biology has become a widely used technology, and expanding applications in research, education and industry require progress tracking for team-based DNA synthesis projects. Although some vendors are beginning to supply multi-kilobase sequence-verified constructs, synthesis workflows starting with short oligos remain important for cost savings and pedagogical ...

LAMPLINK: detection of statistically significant SNP combinations from GWAS data

Summary: One of the major issues in genome-wide association studies is to solve the missing heritability problem. While considering epistatic interactions among multiple SNPs may contribute to solving this problem, existing software cannot detect statistically significant high-order interactions. We propose software named LAMPLINK, which employs a cutting-edge method to enumerate ...

Online interactive analysis of protein structure ensembles with Bio3D-web

Summary: Bio3D-web is an online application for analyzing the sequence, structure and conformational heterogeneity of protein families. Major functionality is provided for identifying protein structure sets for analysis, their alignment and refined structure superposition, sequence and structure conservation analysis, mapping and clustering of conformations and the quantitative ...

MSAViewer: interactive JavaScript visualization of multiple sequence alignments

Summary: The MSAViewer is a quick and easy visualization and analysis JavaScript component for Multiple Sequence Alignment data of any size. Core features include interactive navigation through the alignment, application of popular color schemes, sorting, selecting and filtering. The MSAViewer is ‘web ready’: written entirely in JavaScript, compatible with modern web browsers and ...

ntHash: recursive nucleotide hashing

Motivation: Hashing has been widely used for indexing, querying and rapid similarity search in many bioinformatics applications, including sequence alignment, genome and transcriptome assembly, k-mer counting and error correction. Hence, expediting hashing operations would have a substantial impact in the field, making bioinformatics applications faster and more efficient. Results: ...

NCMine: Core-peripheral based functional module detection using near-clique mining

Motivation: The identification of functional modules from protein–protein interaction (PPI) networks is an important step toward understanding the biological features of PPI networks. The detection of functional modules in PPI networks is often performed by identifying internally densely connected subnetworks, and often produces modules with “core” and “peripheral” proteins. The ...

Drug drug interaction extraction from biomedical literature using syntax convolutional neural network

Motivation: Detecting drug-drug interaction (DDI) has become a vital part of public health safety. Therefore, using text mining techniques to extract DDIs from biomedical literature has received great attention. However, this research is still at an early stage and its performance has much room to improve. Results: In this article, we present a syntax convolutional neural network ...

PyPanda: a Python package for gene regulatory network reconstruction

Summary: PANDA (Passing Attributes between Networks for Data Assimilation) is a gene regulatory network inference method that uses message-passing to integrate multiple sources of ‘omics data. PANDA was originally coded in C++. In this application note we describe PyPanda, the Python version of PANDA. PyPanda runs considerably faster than the C++ version and includes additional ...