VacSol: a high throughput in silico pipeline to predict potential therapeutic targets in prokaryotic pathogens using subtractive reverse vaccinology

Background With advances in reverse vaccinology approaches, a progressive improvement has been observed in the prediction of putative vaccine candidates. Reverse vaccinology has changed the discovery process and provides a means of identifying targets with reduced time and labour. In this regard, high throughput genomic sequencing technologies and supporting bioinformatics...

A comparison of reference-based algorithms for correcting cell-type heterogeneity in Epigenome-Wide Association Studies

Background Intra-sample cellular heterogeneity presents numerous challenges to the identification of biomarkers in large Epigenome-Wide Association Studies (EWAS). While a number of reference-based deconvolution algorithms have emerged, their potential remains underexplored and a comparative evaluation of these algorithms beyond tissues such as blood is still lacking. Results...

ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains

Background Many entries in the protein data bank (PDB) are annotated to show their component protein domains according to the Pfam classification, as well as their biological function through the enzyme commission (EC) numbering scheme. However, despite the fact that the biological activity of many proteins often arises from specific domain-domain and domain-ligand interactions...

TipMT: Identification of PCR-based taxon-specific markers

Background Molecular genetic markers are one of the most informative and widely used genome features in clinical and environmental diagnostic studies. A polymerase chain reaction (PCR)-based molecular marker is very attractive because it is suitable for high-throughput automation and confers high specificity. However, the design of taxon-specific primers may be difficult and time...

iHMS: a database integrating human histone modification data across developmental stages and tissues

Background Differences in chromatin states are critical to the multiplicity of cell states. Recently genome-wide histone modification maps of diverse human developmental stages and tissues have been charted. Description To facilitate the investigation of epigenetic dynamics and regulatory mechanisms in cellular differentiation processes, we developed iHMS, an integrated human...

SEPIa, a knowledge-driven algorithm for predicting conformational B-cell epitopes from the amino acid sequence

Background The identification of immunogenic regions on the surface of antigens, which can be recognized by antibodies and trigger an immune response, is a major challenge for the design of new and effective vaccines. The prediction of such regions through computational immunology techniques is a challenging goal, which will ultimately lead to a drastic limitation of...

VMCMC: a graphical and statistical analysis tool for Markov chain Monte Carlo traces

Background MCMC-based methods are important for Bayesian inference of phylogeny and related parameters. Although computationally expensive, MCMC yields estimates of posterior distributions that are useful for estimating parameter values and are easy to use in subsequent analysis. There are, however, sometimes practical difficulties with MCMC, relating to convergence...

Incorporating prior biological knowledge for network-based differential gene expression analysis using differentially weighted graphical LASSO

Background Conventional differential gene expression analysis by methods such as Student’s t-test, SAM, and Empirical Bayes often searches for statistically significant genes without considering the interactions among them. Network-based approaches provide a natural way to study these interactions and to investigate the rewiring interactions in disease versus control groups. In...

Novel methods to optimize gene and statistic test for evaluation – an application for Escherichia coli

Background Since recombinant proteins were discovered, they have become more popular in many aspects of life science. The value of the global pharmaceutical market was $87 billion in 2008 and the sales of industrial enzymes exceeded $4 billion in 2012. This is strong evidence of the great potential of recombinant proteins. However, native genes introduced into a host can cause...

The Drosophila Gene Expression Tool (DGET) for expression analyses

Background Next-generation sequencing technologies have greatly increased our ability to identify gene expression levels, including at specific developmental stages and in specific tissues. Gene expression data can help researchers understand the diverse functions of genes and gene networks, as well as help in the design of specific and efficient functional studies, such as by...

How can functional annotations be derived from profiles of phenotypic annotations?

Background Loss-of-function phenotypes are widely used to infer gene function using the principle that similar phenotypes are indicative of similar functions. However, converting phenotypic to functional annotations requires careful interpretation of phenotypic descriptions and assessment of phenotypic similarity. Understanding how functions and phenotypes are linked will be...

An integrative Bayesian Dirichlet-multinomial regression model for the analysis of taxonomic abundances in microbiome data

Background The Human Microbiome has been variously associated with the immune-regulatory mechanisms involved in the prevention or development of many non-infectious human diseases such as autoimmunity, allergy and cancer. Integrative approaches which aim at associating the composition of the human microbiome with other available information, such as clinical covariates and...

SPANG: a SPARQL client supporting generation and reuse of queries for distributed RDF databases

Background Toward improved interoperability of distributed biological databases, an increasing number of datasets have been published in the standardized Resource Description Framework (RDF). Although the powerful SPARQL Protocol and RDF Query Language (SPARQL) provides a basis for exploiting RDF databases, writing SPARQL code is burdensome for users including bioinformaticians...

Not all predicted CRISPR–Cas systems are equal: isolated cas genes and classes of CRISPR like elements

Background The CRISPR–Cas systems in prokaryotes are RNA-guided immune systems that target and deactivate foreign nucleic acids. A typical CRISPR–Cas system consists of a CRISPR array of repeat and spacer units, and a locus of cas genes. The CRISPR and the cas locus are often located next to each other in the genomes. However, there is no quantitative estimate of the co-location...

PyHLA: tests for the association between HLA alleles and diseases

Background Recently, several tools have been designed for human leukocyte antigen (HLA) typing using single nucleotide polymorphism (SNP) array and next-generation sequencing (NGS) data. These tools provide high-throughput and cost-effective approaches for identifying HLA types. Therefore, tools for downstream association analysis are highly desirable. Although several tools have...

Evaluation of logistic regression models and effect of covariates for case–control study in RNA-Seq analysis

Background Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore...

MIDcor, an R-program for deciphering mass interferences in mass spectra of metabolites enriched in stable isotopes

Background Tracing stable isotopes, such as ¹³C, using various mass spectrometry (MS) methods provides valuable information necessary for the study of biochemical processes in cells. However, extracting such information requires special care, such as a correction for naturally occurring isotopes, or overlapping mass spectra of various components of the cell culture medium...

Mixture model normalization for non-targeted gas chromatography/mass spectrometry metabolomics data

Background Metabolomics offers a unique integrative perspective for health research, reflecting genetic and environmental contributions to disease-related phenotypes. Identifying robust associations in population-based or large-scale clinical studies demands large numbers of subjects and therefore sample batching for gas-chromatography/mass spectrometry (GC/MS) non-targeted...

Transductive learning as an alternative to translation initiation site identification

Background Correct identification of protein coding regions is an important and still unresolved problem in molecular biology. The problem is challenging because of limited knowledge of the biological systems and unfamiliarity with conserved characteristics in messenger RNA (mRNA). Therefore, it is fundamental to research computational methods aiming to...

DMDtoolkit: a tool for visualizing the mutated dystrophin protein and predicting the clinical severity in DMD

Background Dystrophinopathy is one of the most common human monogenic diseases, and it results in Duchenne muscular dystrophy (DMD) and Becker muscular dystrophy (BMD). Mutations in the dystrophin gene are responsible for both DMD and BMD. However, the clinical phenotypes and treatments are quite different in these two muscular dystrophies. Since early diagnosis and treatment...

Metabolomics variable selection and classification in the presence of observations below the detection limit using an extension of ERp

Background ERp is a variable selection and classification method for metabolomics data. ERp uses minimized classification error rates, based on data from a control and experimental group, to test the null hypothesis of no difference between the distributions of variables over the two groups. If the associated p-values are significant they indicate discriminatory variables (i.e...

A novel statistical approach for identification of the master regulator transcription factor

Background Transcription factors are known to play key roles in carcinogenesis and are therefore gaining popularity as potential therapeutic targets in drug development. A ‘master regulator’ transcription factor often appears to control most of the regulatory activities of the other transcription factors and the associated genes. This ‘master regulator’ transcription factor is...

Classifying kinase conformations using a machine learning approach

Background Signaling proteins such as protein kinases adopt a diverse array of conformations to respond to regulatory signals in signaling pathways. Perhaps the most fundamental conformational change of a kinase is the transition between active and inactive states, and defining the conformational features associated with kinase activation is critical for selectively targeting...

BicPAMS: software for biological data analysis with pattern-based biclustering

Background Biclustering has been widely applied to the unsupervised analysis of biological data and is recognised today as a key technique for discovering putative modules in both expression data (subsets of genes correlated in subsets of conditions) and network data (groups of coherently interconnected biological entities). However, given its computational complexity, only recent...

Visualizing phylogenetic tree landscapes

Background Genomic-scale sequence alignments are increasingly used to infer phylogenies in order to better understand the processes and patterns of evolution. Different partitions within these new alignments (e.g., genes, codon positions, and structural features) often favor hundreds if not thousands of competing phylogenies. Summarizing and comparing phylogenies obtained from...