Prediction of 5-year overall survival of tongue cancer based machine learning

BMC Oral Health, Aug 2023

We aimed to develop a 5-year overall survival prediction model for patients with oral tongue squamous cell carcinoma (OTSCC) based on machine learning methods. The data were obtained from the electronic medical records of 224 OTSCC patients at the PLA General Hospital. Five-year overall survival prediction models were constructed using Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, Extreme Gradient Boosting, and Light Gradient Boosting Machine. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC). The output of the optimal model was explained using the Python package SHAP (SHapley Additive exPlanations). After grid search and secondary modeling, the Light Gradient Boosting Machine was the best prediction model (AUC = 0.860). According to SHAP, N-stage, age, systemic inflammation response index, positive lymph nodes, plasma fibrinogen, lymphocyte-to-monocyte ratio, neutrophil percentage, and T-stage were the features that contributed to the 5-year overall survival prediction for OTSCC. The 5-year survival rate was 42%. The Light Gradient Boosting Machine model predicted 5-year overall survival in OTSCC patients, and this predictive tool has potential prognostic implications for patients with OTSCC.
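
The models are compared by AUC. As a minimal sketch of what that number measures (this is not the authors' code), AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration only.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: sequence of 0/1 outcomes (1 = event of interest)
    scores: predicted scores, where a higher score means "more likely 1"
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy data invented for illustration; not taken from the study.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC on toy data: {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a cohort of a few hundred patients; library implementations typically use a rank-based computation instead.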

Li et al. BMC Oral Health (2023) 23:567
https://doi.org/10.1186/s12903-023-03255-w

RESEARCH, Open Access

Liangbo Li1,2†, Cheng Pu3,4†, Nenghao Jin1,2, Liang Zhu1,2, Yanchun Hu3,4, Piero Cascone5, Ye Tao2* and Haizhong Zhang2*

Keywords: Overall survival, Prediction model, Oral tongue squamous cell carcinoma, Machine learning, Electronic medical records

† Liangbo Li and Cheng Pu contributed equally to this work.
*Correspondence: Ye Tao, Haizhong Zhang
1 Medical School of Chinese PLA, Beijing, China
2 Department of Stomatology, Chinese PLA General Hospital, 28 Fuxing Road, Haidian District, Beijing 100853, China
3 Key Laboratory of Animal Disease and Human Health of Sichuan Province, Chengdu, China
4 College of Veterinary Medicine, Sichuan Agricultural University, Sichuan, China
5 Unicamillus International Medical University, Rome, Italy

© The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Introduction

Oral tongue squamous cell carcinoma (OTSCC) is a common oral cancer. Because OTSCC is characterized by local invasion and early lymph node metastasis, it often leads to high recurrence and mortality rates [1, 2]. According to statistics in the United States, there were 17,060 new tongue cancer cases and 3,020 tongue cancer deaths in 2019 [3]. Therefore, a clinically applicable OTSCC survival prediction model is needed to help clinicians make timely use of tertiary prevention strategies during treatment to reduce recurrence and complications [4].

Currently, the TNM staging system is an objective and accurate tool for predicting prognosis in OTSCC patients [5]. However, this prognostic tool considers only the characteristics of the tumor itself and does not account for other complex factors [6, 7]. Additionally, not everyone can afford it because of the high operation cost. Therefore, it is necessary to identify a simple, economical and accurate prognostic tool.

Studies have shown that machine learning on large-scale medical data obtained from real-world electronic medical records can support doctors in the diagnosis and management of diabetic nephropathy [8]. Inspired by this, we hoped to use machine learning technology to build a model that predicts the 5-year survival of OTSCC patients based on electronic medical records. To the best of our knowledge, there is no predictive model of OTSCC patient survival that uses six machine learning methods based on electronic medical records.

[...] tumors; (3) patients receiving anti-tumor treatment before surgery. After applying strict inclusion and exclusion criteria, 224 patients finally met the requirements. The endpoint of the present study was overall survival (OS). OS was defined as the interval between the date of surgery and death or the last follow-up. The last follow-up date was 1 April 2022. The flow chart of this study is shown in Fig. 1.
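
The OS definition above (surgery date to death or last follow-up) can be turned into a binary 5-year outcome. The sketch below is one plausible way to do that, not the authors' procedure: the function names, the 365.25-day year, and the choice to return `None` for patients who were alive at last follow-up with less than five years of observation are all assumptions, since this excerpt does not say how such cases were handled.

```python
from datetime import date

# Five years expressed in days; the 365.25-day year is an assumption of this sketch.
FIVE_YEARS_IN_DAYS = 5 * 365.25


def os_days(surgery, end):
    """Overall survival interval in days, from surgery to death or last follow-up."""
    return (end - surgery).days


def five_year_label(surgery, end, died):
    """Return 1 (alive at 5 years), 0 (died within 5 years) or None (unknown).

    `end` is the date of death when `died` is True, otherwise the last follow-up.
    """
    interval = os_days(surgery, end)
    if interval < 0:
        raise ValueError("end date precedes surgery date")
    if interval >= FIVE_YEARS_IN_DAYS:
        return 1  # observed for at least 5 years after surgery
    return 0 if died else None  # alive but followed for under 5 years: unknown


# Hypothetical patients, invented for illustration only.
last_follow_up = date(2022, 4, 1)
label_long_survivor = five_year_label(date(2010, 3, 1), last_follow_up, died=False)
label_early_death = five_year_label(date(2015, 6, 1), date(2017, 1, 15), died=True)
label_short_follow_up = five_year_label(date(2017, 12, 1), last_follow_up, died=False)
print(label_long_survivor, label_early_death, label_short_follow_up)
```

Keeping the undetermined case separate from 0 and 1 avoids silently counting a patient with short follow-up as either a survivor or a death.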
Materials and methods

Data source

Data were obtained from the electronic medical records of 224 patients with OTSCC reported at the PLA General Hospital from August 2009 to December 2017, containing 51 clinical features as follows: age, sex, height, weight, body mass index (BMI), hypertension, diabetes, white blood cell count (WBC), neutrophil percentage (N), lymphocyte percentage (L), monocyte percentage (M), platelet count (PLT), lymphocyte-to-monocyte ratio (LMR), platelet-to-lymphocyte ratio (PLR), neutrophil-to-lymphocyte ratio (NLR), systemic inflammation response index (SIRI), hematocrit (Hct), mean corpuscular hemoglobin concentration (MCHC), mean platelet volume (MPV), activated partial thromboplastin time (APTT), plasma fibrinogen (FIB), hemoglobin (Hb), albumin, glycosylated Hb, targeted therapy, tumor size, tumor location, T-stage, N-stage, positive lymph nodes, histologic grade, OTSCC classification, urinary specific gravity (SG), urinary red blood cell count (RBC), blood urea nitrogen (BUN), serum creatinine (SCR), serum uric acid (SUA), total bilirubin (T-BiL), direct bilirubin (D-BiL), homocysteine (HCY), γ-glutamyl transferase (GGT), random blood glucose (RBG), total cholesterol (TC), triglyceride (TG), high-density lipoprotein (HDL), low-density lipoprotein (LDL), calcium (Ca), phosphorus (P), serum potassium (K), serum sodium (Na), and bicarbonate.

Model development

Predictive models for the 5-year overall survival of OTSCC patients were constructed using six machine learning methods, (...truncated)
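
Several of the features listed under Data source (NLR, LMR, PLR, SIRI) are ratios derived from the routine blood count. The sketch below uses their commonly cited definitions, with SIRI taken as neutrophils × monocytes / lymphocytes. The excerpt does not state the exact formulas or units the authors used, so the conversion from percentages to absolute counts, the units, and the function name `inflammation_indices` are assumptions.

```python
def inflammation_indices(wbc, neut_pct, lymph_pct, mono_pct, plt):
    """Derive NLR, LMR, PLR and SIRI from a routine blood count.

    wbc, plt: counts in 10^9/L (assumed units)
    neut_pct, lymph_pct, mono_pct: percentages of the white blood cell count
    """
    # Convert percentages to absolute counts (assumption: indices use counts).
    neut = wbc * neut_pct / 100.0
    lymph = wbc * lymph_pct / 100.0
    mono = wbc * mono_pct / 100.0
    if lymph <= 0 or mono <= 0:
        raise ValueError("lymphocyte and monocyte counts must be positive")

    return {
        "NLR": neut / lymph,           # neutrophil-to-lymphocyte ratio
        "LMR": lymph / mono,           # lymphocyte-to-monocyte ratio
        "PLR": plt / lymph,            # platelet-to-lymphocyte ratio
        "SIRI": neut * mono / lymph,   # systemic inflammation response index
    }


# Hypothetical blood count, invented for illustration only.
example = inflammation_indices(
    wbc=6.0, neut_pct=60.0, lymph_pct=30.0, mono_pct=8.0, plt=240.0
)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Because these indices are deterministic functions of other columns in the feature table, they are correlated with the raw counts by construction, which is worth keeping in mind when reading feature-importance rankings.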


This is a preview of a remote PDF: https://bmcoralhealth.biomedcentral.com/counter/pdf/10.1186/s12903-023-03255-w
Article home page: https://bmcoralhealth.biomedcentral.com/articles/10.1186/s12903-023-03255-w

Li, Liangbo, Pu, Cheng, Jin, Nenghao, Zhu, Liang, Hu, Yanchun, Cascone, Piero, Tao, Ye, Zhang, Haizhong. Prediction of 5-year overall survival of tongue cancer based machine learning, BMC Oral Health, 2023, pp. 1-9, Volume 23, Issue 1, DOI: 10.1186/s12903-023-03255-w