Oscillatory behavior is critical to many life-sustaining processes such as cell cycles, circadian rhythms, and Notch signaling. Important biological functions depend on the characteristics of these oscillations (hereafter, oscillation characteristics or OCs): frequency (e.g., event timings), amplitude (e.g., signal strength), and phase (e.g., event sequencing). Numerous...
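As a rough illustration of the three OCs named above, the following sketch estimates frequency, amplitude, and phase from an evenly sampled signal using a discrete Fourier transform. It assumes a single dominant oscillation of the form x(t) = A cos(2πft + φ) and a window that contains a whole number of cycles. The function name is hypothetical, and this is not the method of the work summarized here.

```python
import cmath
import math


def oscillation_characteristics(samples, dt):
    """Estimate (frequency, amplitude, phase) of the dominant oscillation.

    Assumes evenly spaced samples (spacing dt) and the model
    x(t) = A * cos(2*pi*f*t + phi). Uses a naive O(N^2) DFT; the estimate
    is exact only when the window holds a whole number of cycles.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]  # remove the constant (DC) offset
    best_k, best_coef = 1, 0j
    # Scan positive frequencies below Nyquist and keep the strongest bin.
    for k in range(1, (n + 1) // 2):
        coef = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                   for i, x in enumerate(centered))
        if abs(coef) > abs(best_coef):
            best_k, best_coef = k, coef
    frequency = best_k / (n * dt)        # bin index -> cycles per time unit
    amplitude = 2 * abs(best_coef) / n   # one-sided spectrum scaling
    phase = cmath.phase(best_coef)       # radians, relative to t = 0
    return frequency, amplitude, phase
```

For real, noisy time series, windowing or a least-squares sinusoid fit would usually give more robust estimates than picking a single DFT bin.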
Allele-specific binding (ASB) events occur when transcription factors (TFs) bind more favorably to one of the two parental alleles at heterozygous single nucleotide polymorphisms (SNPs). Evidence suggests that ASB events could reveal the impact of sequence variations on TF binding and may have implications for the risk of diseases. Here we present ASB-analyzer, a software...
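A common way to flag a candidate ASB event is to count the sequencing reads supporting each allele at a heterozygous SNP and test them against the 50:50 split expected under no allelic preference. The sketch below uses an exact two-sided binomial test. The function names and the 0.05 threshold are illustrative assumptions, and the sketch ignores complications such as reference mapping bias. It is not necessarily what ASB-analyzer does.

```python
import math


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed count k."""
    pmf = [math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so floating-point ties are counted.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


def is_allele_specific(ref_reads, alt_reads, alpha=0.05):
    """Hypothetical helper: True if the read split deviates from 50:50."""
    n = ref_reads + alt_reads
    return binomial_two_sided_p(ref_reads, n) < alpha
```

In a genome-wide scan, the resulting p-values would also need multiple-testing correction.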
Hierarchical classification offers a more fine-grained categorization of data and breaks large classification problems down into subproblems, improving prediction accuracy and predictive power for undefined categories while also mitigating the impact of poor-quality data. Despite these advantages, it has rarely been applied to primary cancer prediction. To leverage the...
Variability in datasets is not only the product of biological processes; it is also the product of technical biases. ComBat and ComBat-Seq are among the most widely used tools for correcting these technical biases, called batch effects, in microarray and RNA-Seq expression data, respectively. In this technical note, we present a new Python implementation of ComBat and ComBat...
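To make "correcting batch effects" concrete, the following is a deliberately simplified location/scale adjustment in the spirit of ComBat's model. Each batch of a single gene is standardized with its own mean and standard deviation and then mapped back onto the pooled mean and pooled within-batch standard deviation. It omits ComBat's empirical Bayes shrinkage and any biological covariates. The function name is hypothetical, and this is not the API of the implementation presented here.

```python
from statistics import mean, pstdev


def location_scale_adjust(values, batches):
    """Shift and scale each batch of one gene onto the pooled mean and pooled
    within-batch SD (no empirical Bayes shrinkage, no covariates)."""
    groups = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)
    # Per-batch location (mean) and scale (population SD).
    stats = {b: (mean(g), pstdev(g)) for b, g in groups.items()}
    grand_mean = mean(values)
    # Pooled within-batch SD: deviations from each sample's own batch mean.
    ss = sum((v - stats[b][0]) ** 2 for v, b in zip(values, batches))
    pooled_sd = (ss / len(values)) ** 0.5
    adjusted = []
    for v, b in zip(values, batches):
        m, s = stats[b]
        z = (v - m) / s if s > 0 else 0.0  # standardize within the batch
        adjusted.append(grand_mean + pooled_sd * z)
    return adjusted
```

For a genes × samples matrix, the function would be applied row by row. The full method also borrows strength across genes when estimating the batch parameters, which matters most for small batches.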
Synthetic biologists use and combine diverse biological parts to build systems such as genetic circuits that perform desirable functions in, for example, biomedical or industrial applications. Computer-aided design methods have been developed to help choose appropriate network structures and biological parts for a given design objective. However, they almost always model the...
Basecalling long DNA sequences is a crucial step in nanopore-based DNA sequencing protocols. In recent years, the CTC-RNN model has become the leading basecalling model, supplanting preceding hidden Markov models (HMMs) that relied on pre-segmenting ion current measurements. However, the CTC-RNN model operates independently of prior biological and physical insights. We present a...
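CTC-based basecallers emit a probability distribution over bases plus a special "blank" symbol for each signal frame. The simplest way to turn these outputs into a read is best-path (greedy) decoding: take the most likely symbol in each frame, collapse consecutive repeats, then drop blanks. The sketch assumes that index 0 is the blank and that indices 1 to 4 map to A, C, G, T. Production basecallers often use beam search instead.

```python
def ctc_greedy_decode(prob_frames, alphabet="ACGT", blank=0):
    """Best-path CTC decoding.

    prob_frames: one list of per-symbol probabilities per frame, where index
    `blank` is the CTC blank and index i >= 1 maps to alphabet[i - 1]
    (an assumed layout).
    """
    # Most probable symbol in each frame.
    best = [max(range(len(f)), key=f.__getitem__) for f in prob_frames]
    out = []
    prev = None
    for label in best:
        # Keep a symbol only if it differs from the previous frame and is
        # not blank; a blank between two equal symbols keeps both.
        if label != prev and label != blank:
            out.append(alphabet[label - 1])
        prev = label
    return "".join(out)
```

The blank symbol is what allows homopolymers such as "AA" to be represented: two A frames separated by a blank decode to two bases, whereas adjacent A frames collapse to one.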
Synonymous mutations, which change the DNA sequence but not the encoded protein sequence, can affect protein structure and function, mRNA maturation, and mRNA half-lives. The possibility that synonymous mutations might be enriched in cancer has been explored in several recent studies. However, none of these studies control for all three types of mutational heterogeneity (patient...
Single-cell sequencing has shed light on previously inaccessible biological questions from different fields of research, including organism development, immune function, and disease progression. The number of single-cell-based studies has increased dramatically over the past decade. New methods and tools are continuously being developed, making it extremely difficult to navigate...
Biclustering is increasingly used in biomedical data analysis, recommendation tasks, and text mining, with hundreds of biclustering algorithms proposed. When assessing the performance of these algorithms, real datasets alone are insufficient, as they do not offer a solid ground truth. Synthetic data overcome this limitation by producing reference solutions to be compared...
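The following minimal sketch shows how synthetic data provide a ground truth. A constant-shift bicluster is planted in a Gaussian background, so the planted row and column sets are known exactly, and a found bicluster can then be scored against them by the Jaccard index of their cells. The function names, the constant-shift pattern, and the parameter values are illustrative assumptions. Real generators support many more bicluster types and noise models.

```python
import random


def synthetic_bicluster_matrix(n_rows, n_cols, rows, cols,
                               shift=5.0, noise=1.0, seed=0):
    """Gaussian background with a constant-shift bicluster planted on
    rows x cols. Returns the matrix and the ground-truth (rows, cols)."""
    rng = random.Random(seed)
    matrix = [[rng.gauss(0.0, noise) for _ in range(n_cols)]
              for _ in range(n_rows)]
    for r in rows:
        for c in cols:
            matrix[r][c] += shift  # raise the planted block above background
    return matrix, (set(rows), set(cols))


def bicluster_jaccard(found, truth):
    """Jaccard index between the cells covered by two (rows, cols) pairs."""
    cells_f = {(r, c) for r in found[0] for c in found[1]}
    cells_t = {(r, c) for r in truth[0] for c in truth[1]}
    union = cells_f | cells_t
    return len(cells_f & cells_t) / len(union) if union else 1.0
```

Because the seed fixes the background noise, the same benchmark can be regenerated exactly when comparing algorithms.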
Intense sun exposure is a major risk factor for the development of melanoma, an abnormal proliferation of skin cells. Yet this prevalent type of skin cancer can also develop in less-exposed areas, such as shaded ones. Melanoma is the sixth most common type of skin cancer. In recent years, computer-based methods for imaging and analyzing biological systems have...
Protein–protein interactions (PPIs) are crucial in various biological functions and cellular processes. Thus, many computational approaches have been proposed to predict PPI sites. Although significant progress has been made, these methods still have limitations in encoding the characteristics of each amino acid in a sequence. Many feature extraction methods rely on the sliding...
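To illustrate the kind of sliding-window encoding the text refers to, the following sketch represents each residue by the window of neighbouring residues centred on it and one-hot encodes that window. The half-width of 3, the padding symbol "X", and the function names are assumptions chosen for illustration. The fixed window is exactly the limitation the abstract points to: context beyond the window edges is invisible to the encoding.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def sliding_windows(sequence, half_width=3, pad="X"):
    """For each residue, return the 2*half_width+1 residues centred on it;
    positions past either end of the sequence are filled with `pad`."""
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


def one_hot_window(window):
    """Concatenate 20-dim one-hot vectors; padding encodes as all zeros."""
    vec = []
    for aa in window:
        vec.extend(1 if aa == ref else 0 for ref in AMINO_ACIDS)
    return vec
```

Each window's vector would then be the input features for predicting whether its centre residue is an interaction site.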
The rapid increase in newly identified viruses over the past decade has highlighted the need to determine their hosts, particularly for emerging viruses that pose a potential threat to the health of both humans and animals. Yet the traditional means of ascertaining the host range of viruses, which involves field surveillance and laboratory experiments, is a laborious and demanding...
The identification of essential proteins is of great significance in biology and pathology. However, protein–protein interaction (PPI) data obtained through high-throughput technologies contain many false positives. To overcome this limitation, numerous computational algorithms based on biological characteristics and topological features have been proposed to identify...
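Degree centrality is one of the simplest topological features used to rank candidate essential proteins in a PPI network. It is based on the observation that highly connected hub proteins are more often essential. The sketch below ranks proteins by their number of distinct interaction partners. It is only an illustration of a topological score, not the algorithm proposed in this work, and the function name is hypothetical.

```python
from collections import defaultdict


def rank_by_degree(edges):
    """Rank proteins by degree in an undirected PPI network.

    edges: iterable of (protein_a, protein_b) pairs. Duplicate edges are
    merged and self-interactions ignored. Ties are broken alphabetically.
    Returns a list of (protein, degree) pairs, highest degree first.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # skip self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    return sorted(((p, len(nbrs)) for p, nbrs in neighbours.items()),
                  key=lambda item: (-item[1], item[0]))
```

Because false-positive edges inflate degrees directly, methods in this area typically combine such scores with biological evidence, as the text notes.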
Genomic insights in settings where tumour sample sizes are limited to just hundreds or even tens of cells hold great clinical potential, but also present significant technical challenges. We previously developed the DigiPico sequencing platform to accurately identify somatic mutations from such samples. Here, we complete this genomic characterisation with copy number analysis. We present...
Compressors for genomic sequencing reads are essential for balancing the speed of high-throughput short-read generation, large-scale genomic data sharing, and storage infrastructure costs. However, most existing short-read compressors rarely exploit big-memory systems or the duplicate information shared across different sequencing files to achieve a higher compression ratio for...
As an important task in bioinformatics, clustering analysis plays a critical role in understanding the functional mechanisms of many complex biological systems, which can be modeled as biological networks. The purpose of clustering analysis in biological networks is to identify functional modules of interest, but there is a lack of online clustering tools that visualize...
Protein ubiquitination is a critical post-translational modification (PTM) involved in numerous cellular processes. Identifying ubiquitination sites (Ubi-sites) on proteins offers valuable insights into their function and regulatory mechanisms. Because traditional approaches to Ubi-site detection are costly and time-consuming, there has been a growing interest in...
Acute myeloid leukaemia (AML) is characterised by the malignant accumulation of myeloid progenitors with a high recurrence rate after chemotherapy. Blasts (leukaemia cells) exhibit a complete myeloid differentiation hierarchy that conceals a wide range of temporal information from initial to mature clones, including genesis, phenotypic transformation, and cell fate decisions, which...
Single-cell (SC) gene expression analysis is crucial to dissect the complex cellular heterogeneity of solid tumors, which is one of the main obstacles to the development of effective cancer treatments. Such tumors typically contain a mixture of cells with aberrant genomic and transcriptomic profiles affecting specific sub-populations that might have a pivotal role in cancer...
Galaxy is a web-based open-source platform for scientific analyses. Researchers use thousands of high-quality tools and workflows for their respective analyses in Galaxy. A tool recommender system predicts a set of tools that can be used to extend an analysis. In this work, a tool recommender system is developed by training a transformer on workflows available on Galaxy...
Aptamers, which are biomaterials composed of single-stranded DNA or RNA that form tertiary structures, have significant potential as next-generation materials, particularly for drug discovery. The systematic evolution of ligands by exponential enrichment (SELEX) method is a critical in vitro technique employed to identify aptamers that bind specifically to target proteins. While...
The discovery of anticancer drug combinations is crucial to cancer treatment. In recent years, using computational methods, especially deep learning, to pre-screen large search spaces for drug combinations with synergistic effects has become increasingly popular with researchers. Although achievements have been made to predict anticancer synergistic drug...
For ligand binding prediction, it is crucial for molecular docking programs to integrate template-based modeling with a precise scoring function. Here, we proposed the CoDock-Ligand docking method that combines template-based modeling and the GNINA scoring function, a Convolutional Neural Network-based scoring function, for the ligand binding prediction in CASP15. Among the 21...
Cancer is a collection of diseases caused by the deregulation of cell processes, which is triggered by somatic mutations. The search for patterns in somatic mutations, known as mutational signatures, is a growing field of study that has already become a useful tool in oncology. Several algorithms have been proposed to perform one or both of the following tasks: (1) de novo...
Single-cell RNA sequencing (scRNA-seq) is a powerful tool for investigating cell abundance changes during tissue regeneration and remodeling processes. Differential cell abundance analysis relies on an initial clustering of all cells; then, the number of cells per cluster and sample is evaluated, and the dependence of these counts on the phenotypic covariates of the samples is...
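The bookkeeping step described above can be sketched as follows. Cells, each labelled with a sample and a cluster, are tallied into a per-sample, per-cluster count table, converted to within-sample proportions, and summarized by a sample-level covariate. The function names and the simple group average are illustrative assumptions. The statistical model actually used to test the dependence on covariates is not specified in the text shown here, and a real analysis would fit one rather than compare means.

```python
from collections import Counter, defaultdict


def cluster_counts(cell_samples, cell_clusters):
    """Number of cells for every (sample, cluster) pair."""
    return Counter(zip(cell_samples, cell_clusters))


def cluster_proportions(counts):
    """Convert (sample, cluster) counts to within-sample proportions."""
    totals = Counter()
    for (sample, _), n in counts.items():
        totals[sample] += n
    return {(s, c): n / totals[s] for (s, c), n in counts.items()}


def mean_proportion_by_group(props, sample_group, cluster):
    """Average proportion of one cluster per covariate level; a sample with
    no cells in that cluster contributes a proportion of 0."""
    by_group = defaultdict(list)
    for sample, group in sample_group.items():
        by_group[group].append(props.get((sample, cluster), 0.0))
    return {g: sum(v) / len(v) for g, v in by_group.items()}
```

Keeping zero counts explicit matters: dropping samples in which a cluster is absent would bias abundance estimates upward.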