Machine learning based prediction for oncologic outcomes of renal cell carcinoma after surgery using Korean Renal Cell Carcinoma (KORCC) database

Scientific Reports, Apr 2023

We developed a novel prediction model for recurrence and survival in patients with localized renal cell carcinoma (RCC) after surgery, using machine learning (ML) to improve the accuracy of outcome prediction. The model was built on a large, nationwide Asian dataset, the updated KOrean Renal Cell Carcinoma (KORCC) database, which covers a total of 10,068 patients who underwent surgery for RCC. After data pre-processing, feature selection was performed with an elastic net: nine variables for recurrence and 13 variables for survival were selected from 206 variables. The synthetic minority oversampling technique (SMOTE) was applied to the training data set to address class imbalance. We evaluated performance using most of the ML algorithms introduced so far. We also performed subgroup analysis according to histologic type. All prediction models achieved high accuracy (range, 0.77–0.94) and F1-score (range, 0.77–0.97). In an external validation set, high accuracy and F1-score were well maintained for both recurrence and survival. Subgroup analysis of the clear cell and non-clear cell RCC groups also showed good prediction performance.
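The SMOTE step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `smote` and `oversample_training_set`, the neighbour count `k`, and the list-of-lists data layout are assumptions made for illustration. The sketch shows only the core idea, which is to interpolate between a minority-class sample and one of its nearest minority-class neighbours, and to apply the oversampling to the training split alone.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample_training_set(X_train, y_train, minority_label=1, k=5, seed=0):
    """Balance the two classes in the training set only (hypothetical helper)."""
    minority = [x for x, y in zip(X_train, y_train) if y == minority_label]
    n_majority = len(y_train) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_rows = smote(minority, n_new, k=k, seed=seed)
    return X_train + new_rows, y_train + [minority_label] * len(new_rows)
```

Validation and external test sets are left at their natural class ratio in this sketch, so that accuracy and F1-score are measured against the real outcome prevalence.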

www.nature.com/scientificreports OPEN

Jung Kwon Kim 1,2, Sangchul Lee 1,2, Sung Kyu Hong 1,2, Cheol Kwak 2,3, Chang Wook Jeong 2,3, Seok Ho Kang 4, Sung-Hoo Hong 5, Yong-June Kim 6, Jinsoo Chung 7, Eu Chang Hwang 8, Tae Gyun Kwon 9, Seok-Soo Byun 1,10*, Yu Jin Jung 10, Junghyun Lim 11, Jiyeon Kim 11 & Hyeju Oh 11

The incidence of renal cell carcinoma (RCC) is increasing worldwide. Approximately 76,000 new cases and almost 14,000 deaths from RCC were reported in the US in 2021 [1].
In Korea, we also observed the same trend according to the latest cancer incidence statistics from the Korea Central Cancer Registry [2]. Among RCC cases, the clear cell type represents approximately 70% of cases in adults [3]. The estimated 5-year survival rate of patients with localized RCC is approximately 90%. However, in the approximately 30% of cases with recurrence or metastasis, the survival rate is drastically reduced [4]. Thus, it is imperative to identify the group at high risk of recurrence in advance and to establish a differentiated surveillance protocol for patients who have undergone curative surgery.

Over the past decades, several nomograms for recurrence and/or survival of localized RCC have been developed and applied in clinical practice [5–8]. Among them, the Kattan nomogram, based on pathological T stage, nuclear grade, tumor size, necrosis, vascular invasion, and clinical presentation, was the first to be introduced and has been widely used [5,6]. Subsequently, the Leibovich model was developed by the Mayo Clinic to estimate the risk of metastasis or recurrence using tumor stage, regional lymph node status, tumor size, nuclear grade, and histologic tumor necrosis [7]. The most recently developed model, known as the GRANT score, is based on patient age, nuclear grade, and pathologic T/N stage [8]. However, these models were developed and validated using a small cohort from a single institution. In addition, they were limited to Western datasets. Moreover, their prediction accuracies were not as high as expected: for most models, accuracy values were around 0.7 [5–8]. Thus, we sought to develop a novel prediction model for recurrence and survival in patients with localized RCC after surgery using a large Asian nationwide dataset. We also used a novel statistical method of machine learning (ML) to improve accuracy in predicting outcomes.

Author affiliations: 1 Department of Urology, Seoul National University Bundang Hospital, Seongnam, Korea. 2 Department of Urology, Seoul National University College of Medicine, Seoul, Korea. 3 Department of Urology, Seoul National University Hospital, Seoul, Korea. 4 Department of Urology, Korea University Anam Hospital, Seoul, Korea. 5 Department of Urology, Seoul St. Mary's Hospital, The Catholic University of Korea, Seoul, Korea. 6 Department of Urology, Chungbuk National University Hospital, Cheongju, Korea. 7 Department of Urology, National Cancer Center, Goyang, Korea. 8 Department of Urology, Chonnam National University Medical School, Gwangju, Korea. 9 Department of Urology, Kyungpook National University Chilgok Hospital, Daegu, Korea. 10 Department of Medical Device Development, Seoul National University College of Medicine, Seoul, Korea. 11 The IMC Lnc., Daegu, Korea. *Corresponding author.

Scientific Reports | (2023) 13:5778 | https://doi.org/10.1038/s41598-023-30826-2

Materials and methods

Ethics statement. The Institutional Review Board (IRB) of Seoul National University Bundang Hospital approved this study (approval number: B-2106-688-108). The requirement for written informed consent was waived by the IRB owing to the retrospective nature of the study. Personal identifiers were completely removed so that data were analyzed anonymously. The study was conducted according to the ethical standards of the 1964 Declaration of Helsinki and its later amendments.

Data sets. The KOrean Renal Cell Carcinoma (KORCC) database was first established in 2011 with data from eight academic institutions nationwide [9]. The data of each institution were recently updated, from March to June 2021. The updated KORCC database covers a total of 10,068 patients who underwent surgery for RCC, with 206 variables including demographic, perioperative, pathologic, and survival information. Model development (n = 4,829) and internal validation (n = 2,070) were performed using data from seven centers, excluding Seoul National University Bundang Hospital (SNUBH, n = 3,169).
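As a rough illustration of this partitioning, the sketch below holds out one center and splits the remaining records at random. The 30% internal validation fraction is an assumption: it reproduces the reported counts (0.3 × 6,899 ≈ 2,070), but the text does not state the ratio or whether the split was stratified. The function name `split_by_center`, the `center` field, and the placeholder center labels are hypothetical.

```python
import random


def split_by_center(records, held_out_center, val_fraction=0.3, seed=0):
    """Hold out one center entirely; split the remaining records at random.

    Returns (development, internal_validation, held_out).
    """
    held_out = [r for r in records if r["center"] == held_out_center]
    remaining = [r for r in records if r["center"] != held_out_center]
    rng = random.Random(seed)
    rng.shuffle(remaining)
    n_val = round(val_fraction * len(remaining))
    return remaining[n_val:], remaining[:n_val], held_out


# Placeholder records that carry only a center label; the per-center counts
# of the seven development centers are not given, so they are spread evenly.
records = [{"center": "SNUBH"} for _ in range(3169)]
records += [{"center": f"center_{i % 7 + 1}"} for i in range(6899)]

development, internal_validation, held_out = split_by_center(records, "SNUBH")
print(len(development), len(internal_validation), len(held_out))
```

Holding out a whole institution, rather than a random sample of all patients, means the held-out set also tests whether the model transfers across sites.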
External validation was performed using data from SNUBH to assess the generalizability of model performance. SNUBH was suitable for external validation because of its size and diverse patient population. All study procedures followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) recommendations [10]. All institutions obtained IRB approval before entering data into the database. Unified data templates were used for consistent data collection at each institution. Survival data were retrospectively reviewed from medical records or identified from death certificate data.

Data processing and feature selection. Data pre-processing mainly consisted of handling missing values to obtain a reliable data set. The missing value imputation process was divided into three aspects: patients, predictors, and statistics. First, we excluded patients with missing basic information. Subsequently, we performed predictive analytics for variables including total protein, Hb, and creatinine. For this method, we used Euclidean distance to determine the si (...truncated)
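The description above is cut off, so the exact procedure is not known. One common technique that relies on Euclidean distance is k-nearest-neighbour imputation, sketched below purely as an assumption about what such a step could look like. The function name `knn_impute`, the value of `k`, and the example values are hypothetical and are not taken from the KORCC data.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is the root-mean-square difference over the columns that both
    rows have observed, i.e. Euclidean distance scaled by the number of
    shared columns so that rows with different gaps stay comparable.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]  # copy, so the input is not modified
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where this column is observed.
            donors = [r for j, r in enumerate(rows) if j != i and r[col] is not None]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            if nearest:
                filled[i][col] = sum(r[col] for r in nearest) / len(nearest)
    return filled


# Hypothetical columns: total protein, Hb, creatinine (illustrative values only).
rows = [
    [7.0, 14.0, 1.0],
    [7.2, 13.8, None],
    [6.0, 9.0, 2.5],
    [7.1, 14.1, 0.9],
]
completed = knn_impute(rows, k=2)
```

In practice the columns would usually be standardized first, since otherwise the variable with the largest numeric range dominates the distance.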


This is a preview of a remote PDF: https://www.nature.com/articles/s41598-023-30826-2.pdf
Article home page: https://www.nature.com/articles/s41598-023-30826-2

Kim, Jung Kwon, Lee, Sangchul, Hong, Sung Kyu, Kwak, Cheol, Jeong, Chang Wook, Kang, Seok Ho, Hong, Sung-Hoo, Kim, Yong-June, Chung, Jinsoo, Hwang, Eu Chang, Kwon, Tae Gyun, Byun, Seok-Soo, Jung, Yu Jin, Lim, Junghyun, Kim, Jiyeon, Oh, Hyeju. Machine learning based prediction for oncologic outcomes of renal cell carcinoma after surgery using Korean Renal Cell Carcinoma (KORCC) database, Scientific Reports, DOI: 10.1038/s41598-023-30826-2