Prediction of 5-year overall survival of tongue cancer based machine learning
(2023) 23:567
Li et al. BMC Oral Health
https://doi.org/10.1186/s12903-023-03255-w
BMC Oral Health
Open Access
RESEARCH
Prediction of 5‑year overall survival
of tongue cancer based machine learning
Liangbo Li1,2†, Cheng Pu3,4†, Nenghao Jin1,2, Liang Zhu1,2, Yanchun Hu3,4, Piero Cascone5, Ye Tao2* and
Haizhong Zhang2*
Abstract
Objective We aimed to develop a 5-year overall survival prediction model for patients with oral tongue squamous
cell carcinoma based on machine learning methods.
Subjects and methods The data were obtained from electronic medical records of 224 OTSCC patients at the PLA
General Hospital. A five-year overall survival prediction model was constructed using logistic regression, Support
Vector Machines, Decision Tree, Random Forest, Extreme Gradient Boosting, and Light Gradient Boosting Machine.
Model performance was evaluated according to the area under the receiver operating characteristic curve (AUC). The output of the optimal model was explained using the Python package SHAP (SHapley Additive
exPlanations).
Results After grid search and secondary modeling, the Light Gradient Boosting Machine
was the best prediction model (AUC = 0.860). According to SHapley Additive exPlanations, N-stage, age, systemic
inflammation response index, positive lymph nodes, plasma fibrinogen, lymphocyte-to-monocyte ratio, neutrophil
percentage, and T-stage could be used to predict 5-year overall survival in OTSCC. The 5-year survival rate was 42%.
Conclusion The Light Gradient Boosting Machine prediction model predicted 5-year overall survival in OTSCC
patients, and this predictive tool has potential prognostic implications for patients with OTSCC.
Keywords Overall survival, Prediction model, Oral tongue squamous cell carcinoma, Machine learning, Electronic
medical records
† Liangbo Li and Cheng Pu contributed equally to this work.
*Correspondence:
Ye Tao
Haizhong Zhang
1 Medical School of Chinese PLA, Beijing, China
2 Department of Stomatology, Chinese PLA General Hospital, 28 Fuxing
Road, Haidian District, Beijing 100853, China
3 Key Laboratory of Animal Disease and Human Health of Sichuan
Province, Chengdu, China
4 College of Veterinary Medicine, Sichuan Agricultural University, Sichuan,
China
5 Unicamillus International Medical University, Rome, Italy
Introduction
Oral tongue squamous cell carcinoma (OTSCC) is a
common oral cancer. Because OTSCC is characterized by local invasion and early lymph node metastasis,
it often leads to a high recurrence rate and mortality
rate [1, 2]. According to statistics in the United States,
there were 17,060 new tongue cancer cases and 3,020 tongue
cancer deaths in 2019 [3]. Therefore, a
clinically applicable OTSCC survival prediction model is needed
to help clinicians make timely use
of tertiary prevention strategies to reduce recurrence
and complications [4].
Currently, the TNM staging system is an objective and
accurate tool for predicting prognosis in OTSCC patients
© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the
original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or
other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line
to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this
licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
[5]. However, this prognostic tool considers only the characteristics of the tumor itself and does not account for multiple
complex factors [6, 7]. Additionally, not everyone can
afford it owing to the high operation cost. Therefore, a
simple, economical, and accurate
prognostic tool is needed.
Relevant studies have shown that machine
learning applied to large medical datasets obtained from real-world
electronic medical records supports doctors in the
diagnosis and management of diabetic nephropathy [8].
Inspired by this, we aimed to use machine learning technology to build a model that predicts the 5-year
survival of OTSCC patients based on electronic
medical records. To the best of our knowledge, there is
no predictive model of OTSCC patient survival using six
machine learning methods based on electronic medical
records.
Materials and methods
Data source
Data were obtained from the electronic medical records
of 224 patients with OTSCC reported at the PLA General
Hospital from August 2009 to December 2017, containing 51 clinical features as follows: age, sex, height, weight,
body mass index (BMI), hypertension, diabetes, white
blood cell count (WBC), neutrophil percentage (N), lymphocyte percentage (L), monocyte percentage (M), platelet count (PLT), lymphocyte-to-monocyte ratio (LMR),
platelet-to-lymphocyte ratio (PLR), neutrophil-to-lymphocyte ratio (NLR), systemic inflammation response
index (SIRI), hematocrit (Hct), mean corpuscular hemoglobin
concentration (MCHC), mean platelet volume (MPV),
activated partial thromboplastin time (APTT), plasma fibrinogen (FIB), hemoglobin (Hb), albumin, glycosylated Hb,
targeted therapy, tumor size, tumor location, T-stage,
N-stage, positive lymph nodes, histologic grade, OTSCC
classification, urinary specific gravity (SG), urinary red
blood cell count (RBC), blood urea nitrogen (BUN),
serum creatinine (SCR), serum uric acid (SUA), total
bilirubin (T-BiL), direct bilirubin (D-BiL), homocysteine
(HCY), γ-glutamyl transferase (GGT), random blood
glucose (RBG), total cholesterol (TC), triglyceride (TG),
high-density lipoprotein (HDL), low-density lipoprotein
(LDL), calcium (Ca), phosphorus (P), serum potassium
(K), serum sodium (Na), and bicarbonate.
[...] tumors; (3) patients receiving anti-tumor treatment
before surgery. After applying strict inclusion and exclusion criteria, 224 patients finally met the requirements.
The endpoint event of the present study was overall
survival (OS). OS was defined as the interval
between the date of surgery and death or the last follow-up. The last follow-up date was 1 April 2022. The flow
chart of this study is shown in Fig. 1.
Model development
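As an illustration only, the derived blood indices and the 5-year endpoint described in the Data source text could be prepared along the lines of the following Python sketch. The NLR, PLR, and LMR formulas follow directly from their names. The SIRI formula (neutrophils × monocytes / lymphocytes, on absolute counts) is the commonly used definition and is an assumption here, since the text does not state it. The function names, the use of absolute counts, and the rule of leaving patients unlabeled when they were alive with less than 5 years of follow-up are likewise hypothetical choices, not the authors' stated procedure.

```python
from datetime import date


def inflammation_indices(neutrophils, lymphocytes, monocytes, platelets):
    """Derived ratios from absolute counts (all in the same unit, e.g. 10^9/L).

    SIRI is computed with the commonly used definition
    neutrophils * monocytes / lymphocytes (an assumption, see lead-in).
    """
    return {
        "NLR": neutrophils / lymphocytes,
        "PLR": platelets / lymphocytes,
        "LMR": lymphocytes / monocytes,
        "SIRI": neutrophils * monocytes / lymphocytes,
    }


def add_years(d, years):
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def five_year_label(surgery, last_contact, died, horizon_years=5):
    """Binary 5-year overall survival label (hypothetical labeling rule).

    surgery      -- date of surgery (start of the OS interval)
    last_contact -- date of death if `died`, else date of last follow-up
    Returns 1 if the patient was known to be alive at the horizon,
    0 if the patient died before it, and None if follow-up ended
    earlier without a death (status at 5 years unknown).
    """
    cutoff = add_years(surgery, horizon_years)
    if last_contact >= cutoff:
        return 1
    if died:
        return 0
    return None


if __name__ == "__main__":
    print(inflammation_indices(4.0, 2.0, 0.5, 200.0))
    # Surgery in late 2017 with follow-up ending 1 April 2022: under 5 years observed.
    print(five_year_label(date(2017, 12, 1), date(2022, 4, 1), died=False))
```

Returning None for incompletely followed patients keeps them out of a binary classifier rather than guessing their status; a time-to-event model would be another way to use those records.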
Predictive models were used to construct the 5-year
overall survival of OTSCC patients using six machine
learning methods, (...truncated)