BMC Bioinformatics

http://www.biomedcentral.com/bmcbioinformatics/

List of Papers (Total 19,377)

An oscillating reaction network with an exact closed form solution in the time domain

Oscillatory behavior is critical to many life-sustaining processes such as cell cycles, circadian rhythms, and Notch signaling. Important biological functions depend on the characteristics of these oscillations (hereafter, oscillation characteristics or OCs): frequency (e.g., event timings), amplitude (e.g., signal strength), and phase (e.g., event sequencing). Numerous...

Allele-specific binding (ASB) analyzer for annotation of allele-specific binding SNPs

Allele-specific binding (ASB) events occur when transcription factors (TFs) bind more favorably to one of the two parental alleles at heterozygous single nucleotide polymorphisms (SNPs). Evidence suggests that ASB events could reveal the impact of sequence variations on TF binding and may have implications for the risk of diseases. Here we present ASB-analyzer, a software...
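One common way to test a single heterozygous SNP for allele-specific binding is an exact binomial test on the reference and alternative read counts from a TF binding experiment, under the null hypothesis that both alleles are bound equally (p = 0.5). The sketch below illustrates only that idea. The function names, the 0.05 threshold, and the minimum depth of 10 reads are illustrative assumptions, not ASB-analyzer's interface or defaults.

```python
from math import comb


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic read imbalance.

    Sums the probabilities of every outcome that is no more likely than the
    observed reference-allele count under the null of balanced binding.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # A small relative tolerance keeps floating-point ties in the tail sum.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def call_asb(ref_reads: int, alt_reads: int, alpha: float = 0.05, min_depth: int = 10):
    """Hypothetical helper: flag one SNP as allele-specific if p < alpha.

    Returns None when coverage is below min_depth (too few reads to test).
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None
    p_value = binomial_two_sided_p(ref_reads, alt_reads)
    return {
        "depth": depth,
        "ref_fraction": ref_reads / depth,
        "p_value": p_value,
        "asb": p_value < alpha,
    }


print(call_asb(10, 0))  # strongly imbalanced toward the reference allele
print(call_asb(6, 6))   # balanced: not called as ASB
```

A genome-wide scan would additionally need a multiple-testing correction (for example, a false discovery rate procedure) and some handling of read-mapping bias toward the reference allele; both are omitted here.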

Hierarchical classification-based pan-cancer methylation analysis to classify primary cancer

Hierarchical classification offers a more specific categorization of data and breaks down large classification problems into subproblems, providing improved prediction accuracy and predictive power for undefined categories, while also mitigating the impact of poor-quality data. Despite these advantages, its application in predicting primary cancer is rare. To leverage the...
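Breaking one large classification problem into smaller subproblems can be sketched as a two-level classifier: a top-level model first assigns a broad group, and a second model trained only on that group's samples then picks the specific class. This is a minimal illustration that uses a nearest-centroid rule at each level. The class and function names and the toy data are hypothetical; the summary does not state which classifiers the paper uses.

```python
from collections import defaultdict
from math import dist


def centroids(samples):
    """Mean feature vector per label; samples is a list of (features, label)."""
    sums, counts = {}, defaultdict(int)
    for x, y in samples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def nearest(x, cents):
    """Label whose centroid is closest to x (Euclidean distance)."""
    return min(cents, key=lambda y: dist(x, cents[y]))


class TwoLevelClassifier:
    """Hypothetical top-down classifier: broad group first, then fine class."""

    def fit(self, X, coarse, fine):
        # Level 1: one centroid per broad group, trained on all samples.
        self.top = centroids(list(zip(X, coarse)))
        # Level 2: a separate sub-problem per group, trained on its samples only.
        groups = defaultdict(list)
        for x, c, f in zip(X, coarse, fine):
            groups[c].append((x, f))
        self.children = {c: centroids(s) for c, s in groups.items()}
        return self

    def predict(self, x):
        group = nearest(x, self.top)
        return group, nearest(x, self.children[group])


X = [[0, 0], [0, 1], [0, 3], [0, 4], [10, 10], [10, 11], [13, 10], [13, 11]]
coarse = ["A", "A", "A", "A", "B", "B", "B", "B"]
fine = ["A1", "A1", "A2", "A2", "B1", "B1", "B2", "B2"]
clf = TwoLevelClassifier().fit(X, coarse, fine)
print(clf.predict([0, 3.6]))  # ('A', 'A2')
```

Because each child model only separates classes within one group, a sample whose broad group is predicted incorrectly cannot be recovered at the lower level. That trade-off is inherent to this top-down design.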

pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods

Variability in datasets is not only the product of biological processes: it is also the product of technical biases. ComBat and ComBat-Seq are among the most widely used tools for correcting those technical biases, called batch effects, in microarray and RNA-Seq expression data, respectively. In this technical note, we present a new Python implementation of ComBat and ComBat...
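ComBat rests on a location/scale model: each batch is assumed to shift the mean and rescale the variance of every gene. The sketch below removes those per-batch shifts for a single gene by standardizing within each batch and mapping the values back onto the overall mean and the pooled within-batch standard deviation. It deliberately omits the empirical Bayes shrinkage of batch parameters and the covariate design matrix that ComBat uses, and the function name is hypothetical rather than part of the pyComBat API.

```python
from statistics import mean, pstdev


def location_scale_adjust(values, batches):
    """Hypothetical helper: remove per-batch mean/scale differences for one gene.

    Simplified location/scale adjustment WITHOUT ComBat's empirical Bayes
    shrinkage and WITHOUT biological covariates.
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")
    grand_mean = mean(values)

    # Group this gene's values by batch label.
    groups = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)

    # Per-batch location (mean) and scale (population standard deviation).
    params = {b: (mean(vs), pstdev(vs)) for b, vs in groups.items()}

    # Pooled within-batch standard deviation across all samples.
    resid_sq = [(v - params[b][0]) ** 2 for v, b in zip(values, batches)]
    pooled_sd = (sum(resid_sq) / len(values)) ** 0.5

    adjusted = []
    for v, b in zip(values, batches):
        m, s = params[b]
        z = (v - m) / s if s > 0 else 0.0  # a constant batch maps to the grand mean
        adjusted.append(grand_mean + z * pooled_sd)
    return adjusted


print(location_scale_adjust([1, 2, 3, 11, 12, 13], ["A", "A", "A", "B", "B", "B"]))
```

In the example, batch B is batch A shifted up by 10, so the adjusted values come out close to [6, 7, 8, 6, 7, 8]: the batch offset is gone while the within-batch ordering is kept. Full ComBat applies this per gene across a whole expression matrix and shrinks each batch's estimates toward values pooled over genes, which stabilizes small batches.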

Efficient design of synthetic gene circuits under cell-to-cell variability

Synthetic biologists use and combine diverse biological parts to build systems such as genetic circuits that perform desirable functions in, for example, biomedical or industrial applications. Computer-aided design methods have been developed to help choose appropriate network structures and biological parts for a given design objective. However, they almost always model the...

Lokatt: a hybrid DNA nanopore basecaller with an explicit duration hidden Markov model and a residual LSTM network

Basecalling long DNA sequences is a crucial step in nanopore-based DNA sequencing protocols. In recent years, the CTC-RNN model has become the leading basecalling model, supplanting earlier hidden Markov models (HMMs) that relied on pre-segmenting ion current measurements. However, the CTC-RNN model operates independently of prior biological and physical insights. We present a...

Incorporating mutational heterogeneity to identify genes that are enriched for synonymous mutations in cancer

Synonymous mutations, which change the DNA sequence but not the encoded protein sequence, can affect protein structure and function, mRNA maturation, and mRNA half-lives. The possibility that synonymous mutations might be enriched in cancer has been explored in several recent studies. However, none of these studies control for all three types of mutational heterogeneity (patient...

SingleScan: a comprehensive resource for single-cell sequencing data processing and mining

Single-cell sequencing has shed light on previously inaccessible biological questions from different fields of research, including organism development, immune function, and disease progression. The number of single-cell-based studies has increased dramatically over the past decade. New methods and tools are continuously being developed, making it extremely difficult to navigate...

G-bic: generating synthetic benchmarks for biclustering

Biclustering is increasingly used in biomedical data analysis, recommendation tasks, and text mining, with hundreds of biclustering algorithms proposed. When assessing the performance of these algorithms, real datasets alone are insufficient, as they do not offer a solid ground truth. Synthetic data overcome this limitation by producing reference solutions to be compared...
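A synthetic biclustering benchmark can be as simple as a background noise matrix containing a block of rows and columns whose values are shifted, so that the planted rows and columns serve as the ground truth. The generator and the cell-level Jaccard score below are a minimal sketch under those assumptions (Gaussian background, a single additive constant bicluster). Both function names are hypothetical; this is not G-bic's generator, which the summary does not describe.

```python
import random


def make_biclustering_benchmark(n_rows, n_cols, bic_rows, bic_cols,
                                shift=5.0, noise=1.0, seed=0):
    """Hypothetical generator: Gaussian noise matrix with one planted bicluster.

    Returns (data, (rows, cols)) where rows/cols are the planted indices.
    """
    rng = random.Random(seed)  # seeded for reproducible benchmarks
    data = [[rng.gauss(0.0, noise) for _ in range(n_cols)] for _ in range(n_rows)]
    rows = sorted(rng.sample(range(n_rows), bic_rows))
    cols = sorted(rng.sample(range(n_cols), bic_cols))
    for i in rows:
        for j in cols:
            data[i][j] += shift  # constant additive signal inside the bicluster
    return data, (rows, cols)


def bicluster_jaccard(a, b):
    """Jaccard index between the matrix cells covered by two (rows, cols) pairs."""
    cells_a = {(i, j) for i in a[0] for j in a[1]}
    cells_b = {(i, j) for i in b[0] for j in b[1]}
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 1.0


data, truth = make_biclustering_benchmark(20, 30, bic_rows=5, bic_cols=4, seed=1)
print(truth)
print(bicluster_jaccard(truth, truth))  # 1.0: perfect recovery
```

A biclustering algorithm's output, expressed as a (rows, columns) pair, can then be scored against the planted solution with bicluster_jaccard, where 1.0 means exact recovery and 0.0 means no overlapping cells.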

Detection for melanoma skin cancer through ACCF, BPPF, and CLF techniques with machine learning approach

Intense sun exposure is a major risk factor for the development of melanoma, an abnormal proliferation of skin cells. Yet, this more prevalent type of skin cancer can also develop in less-exposed areas, such as those that are shaded. Melanoma is the sixth most common type of skin cancer. In recent years, computer-based methods for imaging and analyzing biological systems have...

Protein–protein interaction site prediction by model ensembling with hybrid feature and self-attention

Protein–protein interactions (PPIs) are crucial in various biological functions and cellular processes. Thus, many computational approaches have been proposed to predict PPI sites. Although significant progress has been made, these methods still have limitations in encoding the characteristics of each amino acid in sequences. Many feature extraction methods rely on the sliding...

HostNet: improved sequence representation in deep neural networks for virus-host prediction

The increasing emergence of viruses over the past decade has highlighted the need to determine their respective hosts, particularly for emerging viruses that potentially threaten both human and animal health. Yet the traditional means of ascertaining the host range of viruses, which involves field surveillance and laboratory experiments, is a laborious and demanding...

A seed expansion-based method to identify essential proteins by integrating protein–protein interaction sub-networks and multiple biological characteristics

The identification of essential proteins is of great significance in biology and pathology. However, protein–protein interaction (PPI) data obtained through high-throughput technology include a high number of false positives. To overcome this limitation, numerous computational algorithms based on biological characteristics and topological features have been proposed to identify...

Completing a genomic characterisation of microscopic tumour samples with copy number

Genomic insights in settings where tumour sample sizes are limited to just hundreds or even tens of cells hold great clinical potential, but also present significant technical challenges. We previously developed the DigiPico sequencing platform to accurately identify somatic mutations from such samples. Here, we complete this genomic characterisation with copy number. We present...

PMFFRC: a large-scale genomic short reads compression optimizer via memory modeling and redundant clustering

Genomic sequencing read compressors are essential for balancing high-throughput short-read generation speed, large-scale genomic data sharing, and infrastructure storage expenditure. However, most existing short-read compressors rarely exploit big-memory systems or the duplicate information shared across diverse sequencing files to achieve a higher compression ratio for...

Biocaiv: an integrative webserver for motif-based clustering analysis and interactive visualization of biological networks

As an important task in bioinformatics, clustering analysis plays a critical role in understanding the functional mechanisms of many complex biological systems, which can be modeled as biological networks. The purpose of clustering analysis in biological networks is to identify functional modules of interest, but there is a lack of online clustering tools that visualize...

Machine learning-based approaches for ubiquitination site prediction in human proteins

Protein ubiquitination is a critical post-translational modification (PTM) involved in numerous cellular processes. Identifying ubiquitination sites (Ubi-sites) on proteins offers valuable insights into their function and regulatory mechanisms. Because traditional approaches to Ubi-site detection are costly and time-consuming, there has been a growing interest in...

Detection of continuous hierarchical heterogeneity by single-cell surface antigen analysis in the prognosis evaluation of acute myeloid leukaemia

Acute myeloid leukaemia (AML) is characterised by the malignant accumulation of myeloid progenitors with a high recurrence rate after chemotherapy. Blasts (leukaemia cells) exhibit a complete myeloid differentiation hierarchy hiding a wide range of temporal information from initial to mature clones, including genesis, phenotypic transformation, and cell fate decisions, which...

scMuffin: an R package to disentangle solid tumor heterogeneity by single-cell gene expression analysis

Single-cell (SC) gene expression analysis is crucial to dissect the complex cellular heterogeneity of solid tumors, which is one of the main obstacles to the development of effective cancer treatments. Such tumors typically contain a mixture of cells with aberrant genomic and transcriptomic profiles affecting specific sub-populations that might have a pivotal role in cancer...

Transformer-based tool recommendation system in Galaxy

Galaxy is a web-based, open-source platform for scientific analyses. Researchers use thousands of high-quality tools and workflows for their respective analyses in Galaxy. A tool recommender system predicts a collection of tools that can be used to extend an analysis. In this work, a tool recommender system is developed by training a transformer on workflows available on Galaxy...

AptaTrans: a deep neural network for predicting aptamer-protein interaction using pretrained encoders

Aptamers, which are biomaterials composed of single-stranded DNA/RNA that form tertiary structures, have significant potential as next-generation materials, particularly for drug discovery. The systematic evolution of ligands by exponential enrichment (SELEX) method is a critical in vitro technique employed to identify aptamers that bind specifically to target proteins. While...

Predicting anticancer synergistic drug combinations based on multi-task learning

The discovery of anticancer drug combinations is a crucial task in anticancer treatment. In recent years, using computational methods, especially deep learning methods, to pre-screen drug combinations with synergistic effects in a large-scale search space has become increasingly popular with researchers. Although progress has been made in predicting anticancer synergistic drug...

CoDock-Ligand: combined template-based docking and CNN-based scoring in ligand binding prediction

For ligand binding prediction, it is crucial for molecular docking programs to integrate template-based modeling with a precise scoring function. Here, we propose the CoDock-Ligand docking method, which combines template-based modeling with the GNINA scoring function, a convolutional neural network (CNN)-based scoring function, for ligand binding prediction in CASP15. Among the 21...

Relating mutational signature exposures to clinical data in cancers via signeR 2.0

Cancer is a collection of diseases caused by the deregulation of cell processes, which is triggered by somatic mutations. The search for patterns in somatic mutations, known as mutational signatures, is a growing field of study that has already become a useful tool in oncology. Several algorithms have been proposed to perform one or both of the following tasks: (1) de novo...

A cell abundance analysis based on efficient PAM clustering for a better understanding of the dynamics of endometrial remodelling

Single-cell RNA sequencing (scRNA-seq) is a powerful tool for investigating cell abundance changes during tissue regeneration and remodeling processes. Differential cell abundance analysis relies on an initial clustering of all cells; then the number of cells per cluster and sample is evaluated, and the dependence of these counts on the phenotypic covariates of the samples is...
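The counting step of a differential cell abundance analysis can be sketched as a cluster-by-sample contingency table, from which per-sample proportions follow. This assumes that cluster labels have already been assigned to every cell (for example, by PAM clustering) and that each cell carries a sample identifier. The function name is hypothetical. Relating these counts to phenotypic covariates would need a separate statistical model, such as a count regression, which is not shown.

```python
from collections import Counter


def abundance_table(cluster_labels, sample_labels):
    """Hypothetical helper: cells per (cluster, sample) and per-sample proportions.

    cluster_labels[i] and sample_labels[i] describe the same cell i.
    Returns (table, props), both nested dicts indexed [cluster][sample].
    """
    if len(cluster_labels) != len(sample_labels):
        raise ValueError("one cluster label and one sample label per cell")
    counts = Counter(zip(cluster_labels, sample_labels))
    per_sample = Counter(sample_labels)  # total cells in each sample
    clusters = sorted(set(cluster_labels))
    samples = sorted(per_sample)
    # Raw counts; clusters absent from a sample get an explicit zero.
    table = {c: {s: counts.get((c, s), 0) for s in samples} for c in clusters}
    # Normalize by sample size so samples with different cell numbers compare.
    props = {c: {s: table[c][s] / per_sample[s] for s in samples} for c in clusters}
    return table, props


table, props = abundance_table([0, 0, 1, 1, 1, 2], ["s1", "s1", "s1", "s2", "s2", "s2"])
print(table)  # {0: {'s1': 2, 's2': 0}, 1: {'s1': 1, 's2': 2}, 2: {'s1': 0, 's2': 1}}
```

Keeping explicit zeros matters here: a cluster that disappears from a sample is itself an abundance change, and dropping missing combinations would hide it from any downstream test.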