BioData Mining

Publishing innovative data science and big data research, BioData Mining advances research on all aspects of data mining applied to high-dimensional biological ...

List of Papers (Total 573)

CAUSALRLSTACK: adaptive balancing of deep representation and causal effect estimation with application to HIV-related health data

Estimating individualized causal effects plays a vital role in data-driven decision-making, especially in high-risk domains such as public health. However, current causal inference models often lack flexibility and generalizability due to the tight coupling between representation learning and effect estimation. This study aims to develop a modular and adaptive framework to...

FISM: harnessing deep learning and reinforcement learning for precision detection of microaneurysms and retinal exudates for early diabetic retinopathy diagnosis

Diabetic retinopathy (DR) is a primary cause of blindness globally, and its treatment and management depend on accurate and timely identification. Current approaches to DR detection and segmentation repeatedly fall short in accuracy and robustness, highlighting the need for advanced computational methods. In this study, we propose a deep learning model, Fundus Images Segmentation...

Using artificial intelligence (AI) to model clinical variant reporting for next generation sequencing (NGS) oncology assays

Targeted next generation sequencing (NGS) of somatic DNA is now routinely used for diagnostic and predictive reporting in the oncology clinic. The expert genomic analysis required for NGS assays remains a bottleneck to scaling the volume of patients being assessed. This study harnesses data from targeted clinical sequencing to build machine learning models that predict whether...

Automatic computational classification of bone marrow cells for B cell pediatric leukemia using UMAP

B Acute Lymphoblastic Leukemia (B-ALL) accounts for approximately 80% of pediatric leukemia cases. Despite treatment advances, 15–20% of children experience relapse, highlighting the need for improved patient monitoring and for novel strategies that lead to successful therapies. Flow cytometry is an essential technique for measuring residual disease and guiding treatment. However...

WHFDL: an explainable method based on World Hyper-heuristic and Fuzzy Deep Learning approaches for gastric cancer detection using metabolomics data

Gastric cancer (GC) remains one of the most prevalent cancers worldwide, with its prognosis heavily reliant on early detection. Traditional GC diagnostic methods are invasive and risky, prompting interest in non-invasive alternatives that could enhance outcomes. In this study, we introduce a non-invasive approach, World Hyper-heuristic Fuzzy Deep Learning, for gastric cancer...

MediNet: ensemble transfer learning approach for classification of medical drugs-related text reviews using significant combined-embeddings

This research work provides an innovative approach, called MediNet, for drug safety review classification that integrates the strengths of three word embedding approaches: FastText, ELMo, and GloVe, alongside an ensemble of EfficientNetB4 and MobileNet models. The unique blend of these word embeddings captures both context-independent and context-dependent representations...

An intelligent healthcare system for rare disease diagnosis utilizing electronic health records based on a knowledge-guided multimodal transformer framework

Rare diseases are a common problem affecting millions of patients globally, but their diagnosis is difficult because of varied clinical presentations, small sample sizes, and disparate biomedical data sources. Current diagnostic tools cannot combine multimodal information effectively, which results in delayed or incorrect diagnoses. To fill this gap, this paper proposes a smart...

A graph-theoretic framework for quantitative analysis of angiogenic networks

The endothelial tube formation assay is an established in vitro model for evaluating angiogenesis. Although widely used, quantification of angiogenic behavior in such assays remains semi-empirical and often lacks spatial, topological, and structural context. Here, we present a graph-theoretic framework to quantify network morphology, temporal dynamics, and spatial heterogeneity...

MoRFs_TransFuse: a MoRFs predictor based on multimodal feature fusion and the lightweight Transformer network

Molecular recognition features (MoRFs) can facilitate specific protein-protein interactions by undergoing disorder-to-order transitions when binding to their protein partners. Thus, it is essential to accurately predict MoRFs. In this paper, we propose an innovative MoRFs prediction method, named MoRFs_TransFuse, based on multimodal feature fusion and a lightweight Transformer...

Decoding ancestry-specific genetic risk: interpretable deep feature selection reveals prostate cancer SNP disparities in diverse populations

The clinical potential of single nucleotide polymorphisms (SNPs) in prostate cancer (PCa) diagnosis has been extensively explored using conventional statistical and machine learning approaches. However, the predictive power and interpretability of these methods remain inadequate for clinical translation, primarily due to limited generalization across high-dimensional SNP datasets...

Cross-regional radiomics: a novel framework for relationship-based feature extraction with validation in Parkinson’s disease motor subtyping

Traditional radiomics approaches focus on single-region feature extraction, limiting their ability to capture complex inter-regional relationships crucial for understanding pathophysiological mechanisms in complex diseases. This study introduces a novel cross-regional radiomics framework that systematically extracts relationship-based features between anatomically and...

Temporal phenotyping and prognostic stratification of patients with sepsis through longitudinal clustering

Sepsis is a critical medical condition characterized by a highly variable and rapidly evolving clinical course, often necessitating early intervention and tailored treatment plans to improve patient outcomes. Due to its complexity and heterogeneity, understanding the progression of sepsis across different patient populations remains a significant challenge. In this study, we...

Construction and validation of a machine learning-based model predicting early readmission in patients with decompensated cirrhosis: a prospective two-center cohort study

Early (30-day) readmission remains a significant socioeconomic and healthcare burden in decompensated cirrhosis, so early recognition and accurate identification of at-risk patients are crucial. However, current evidence is limited, and traditional liver disease severity scores lack specificity and sensitivity. We sought to construct and validate an...

Investigating causal effects of HDL-C on cognitive function through cross-sectional and Mendelian randomization analyses: concentration–response patterns and clues for Alzheimer’s disease prevention

Disrupted cholesterol homeostasis may accelerate cognitive aging. This study investigated the relationship between serum HDL-C levels and cognitive function, utilizing cross-sectional data and Mendelian randomization (MR) analysis. A cross-sectional study was conducted using data from the National Health and Nutrition Examination Survey (NHANES) 2011–2014, including 19,931...

Identification of severity related mutation hotspots in SARS-CoV-2 using a density-based clustering approach

The immune response to SARS-CoV-2 varies greatly among individuals, yielding widely varying severity levels among patients. While there are various methods to identify severity-associated biomarkers in COVID-19 patients, we investigated highly mutated regions, or mutation hotspots, within the SARS-CoV-2 genome that correlate with patient severity levels. SARS-CoV-2 mutation...

Improving classification on imbalanced genomic data via KDE-based synthetic sampling

Class imbalance poses a serious challenge in biomedical machine learning, particularly in genomics, where datasets are characterized by extremely high dimensionality and very limited sample sizes. In such settings, standard classifiers tend to favor the majority class, leading to biased predictions — an especially problematic issue in clinical diagnostics where rare conditions...

Development of an AI-powered AR glasses system for real-time first aid guidance in emergency situations

AI-powered augmented reality (AR) systems provide real-time, hands-free guidance, enabling untrained individuals to respond effectively in emergencies. By combining AI decision-making with AR visuals, they enhance awareness, reduce stress, and help bridge the gap before professional help arrives, especially in critical or underserved settings. This paper introduces an AI-powered...

Mapping the evolving trend of research on efferocytosis: a comprehensive data-mining-based study

Efferocytosis, the process by which apoptotic cells are recognized and removed by phagocytes, plays a critical role in maintaining tissue homeostasis and modulating inflammatory responses. Over recent decades, an increasing number of studies have investigated the molecular mechanisms and clinical implications of efferocytosis. This bibliometric analysis aims to map the evolving...

The application of artificial intelligence models in predicting the risk of diabetic foot: a multicenter study

This study explores diabetic foot (DF), a severe complication of diabetes, by combining deep learning (DL) and machine learning (ML) to develop a multi-model prediction tool. Early identification of high-risk DF patients can reduce disability and mortality. The research also aims to create an integrated application to assist clinicians in precise, efficient risk assessment for...

Skin in the game: a review of computational models of the skin

With vast advances in computing technology, computational (or in silico) modelling has emerged as a transformative tool in dermatology. Such models can provide novel insights into complex biological processes and aid in the development of innovative therapeutic and regenerative strategies for the skin. Modelling combines experimental data and knowledge across multiple...

Exploring the common genetic basis of metabolic syndrome-related diseases and chronic kidney disease: insights from extensive genome-wide cross-trait analyses

Chronic kidney disease (CKD) is a globally prevalent chronic condition characterized by progressive renal function decline, imposing significant economic and psychological burdens on patients. Metabolic syndrome (MetS), characterized by obesity, hypertension, hyperglycemia, and dyslipidemia, is a significant risk factor for CKD. A strong epidemiological association exists between...

Short- and long-term weekly patient-reported outcomes prediction undergoing radiotherapy: single-patient time series model vs. transformer-based multi-patient time series model

Patient-reported outcomes (PROs) are direct reports from patients on health status, symptoms, quality of life, or treatment satisfaction, offering critical insights into subjective experiences that clinical metrics may overlook. Accurately predicting personalized short- and long-term weekly PROs during radiotherapy is essential for monitoring health status, optimizing treatment...

Exo-Tox: Identifying Exotoxins from secreted bacterial proteins

Bacterial exotoxins are secreted proteins that can affect target cells and are associated with disease. Their accurate identification can enhance drug discovery and ensure the safety of bacteria-based medical applications. However, current toxin predictors prioritize broad coverage by mixing toxins from multiple biological kingdoms and diverse control sets. This general approach has...

Drug repurposing for Alzheimer’s disease using a graph-of-thoughts based large language model to infer drug-disease relationships in a comprehensive knowledge graph

Drug repurposing (DR) offers a promising alternative to the high cost and low success rate of traditional drug development, especially for complex diseases like Alzheimer’s disease (AD). This study addressed DR for AD from three key angles: (1) demonstrating how disease-specific knowledge graphs can improve DR performance, (2) evaluating the role of large language models (LLMs...

circGPAcorr: an integrative tool for functional annotation of circular RNAs using expression data

Circular RNAs play a crucial role in cell development and serve as biomarkers in many diseases. Nevertheless, the function of many circular RNAs remains unknown. This function can be inferred from sponging and silencing interactions with microRNAs and messenger RNAs. We recently proposed a network-based circRNA functional annotation tool, circGPA. However, validation data for...