Deep learning application of vertebral compression fracture detection using mask R-CNN
www.nature.com/scientificreports
Seungyoon Paik 1, Jiwon Park 2, Jae Young Hong 2 & Sung Won Han 1*
Vertebral compression fractures (VCFs) of the thoracolumbar spine are commonly caused by
osteoporosis or result from traumatic events. Early diagnosis of vertebral compression fractures can
prevent further damage to patients. When assessing these fractures, plain radiographs are used as
the primary diagnostic modality. In this study, we developed a deep learning based fracture detection
model that could be used as a tool for primary care in the orthopedic department. We constructed a
VCF dataset using 487 lateral radiographs, which included 598 fractures in the T11-L4 vertebrae. To
detect VCFs, a Mask R-CNN model was trained and optimized, and was compared with three other
popular instance segmentation models: Cascade Mask R-CNN, YOLACT, and YOLOv5. With
Mask R-CNN we achieved the highest mean average precision score of 0.58, and were able to locate each
fracture pixel-wise. In addition, the model showed high overall sensitivity, specificity, and accuracy,
indicating that it detected fractures accurately and without misdiagnosis. Our model can be a
potential tool for detecting VCFs from a simple radiograph and assisting doctors in making appropriate
decisions in initial diagnosis.
Vertebral compression fractures (VCFs) are breaks or cracks in the vertebrae, which can cause the spine to
weaken or collapse. VCFs affect approximately 1 to 1.5 million people annually in the United States1. Although
some VCFs are caused by trauma or tumors, they are more common in the elderly and women with osteoporosis. Most VCFs occur in the thoracic and lumbar vertebrae, or at the thoracolumbar junction. In the diagnosis
of VCFs, plain radiographs are the initial diagnostic modality. When a neurological disorder is suspected, more
complex modalities such as computed tomography (CT) or magnetic resonance imaging (MRI) are ordered.
Identifying fractures in bone images is a time-consuming and labor-intensive process that requires manual
inspection by a highly trained radiologist or orthopedist2. Clinician inexperience or fatigue caused by
excessive workloads can lead to an inaccurate diagnosis, which can be fatal to patients.
Deep learning (DL) algorithms, particularly convolutional neural networks (CNNs), have become a powerful method in medical imaging diagnosis3,4. Because they are designed to learn spatial hierarchies of features
through convolution layers, they are widely used in computer vision tasks such as image classification, object
detection, and segmentation. Many studies have dealt with identifying bone fractures in various areas of the body
using medical images5–7. Recently, there have been numerous studies on the use of CNN-based algorithms to
assist spinal disease diagnosis, including vertebral fractures. Some studies proposed segmentation models for the
vertebrae8–11. These studies utilized detection and segmentation models, and they approached VCF diagnosis as
a two-step process of segmenting every vertebra and then evaluating each of them. Other studies applied CNN-based models for classification of radiographs for diagnostic purposes12–15.
However, very few studies have dealt with the detection of vertebral fractures on X-rays, for
several reasons. It is difficult to acquire a sufficient number of spinal radiographs for a specific fracture
compared to other fractures of the body, because radiographs are not used for a final diagnosis. Moreover, the
labeling process for each fracture on the radiograph is very labor-intensive and challenging, because even experts
must match each radiograph with CT or MRI results to find the ground truth. Existing studies on the
diagnosis of vertebral fractures with DL algorithms have mostly focused on the classification of each medical image,
or on a two-step process of evaluating fractures after segmenting
every vertebra. In this study, (1) we constructed a high-quality dataset of VCFs of the T11-L4 vertebrae on lateral spinal
X-rays, which were annotated based on the MRI results; (2) subsequently, we proposed a pipeline of training
1School of Industrial and Management Engineering, Korea University, Anam‑ro 145, Seongbuk‑gu, Seoul 02841, South Korea. 2Department of Orthopaedic Surgery, Korea University Ansan Hospital, 123, Jeokgeum‑ro, Danwon‑gu, Ansan, Gyeonggi‑do, South Korea. *email:
Scientific Reports | (2024) 14:16308 | https://doi.org/10.1038/s41598-024-67017-6
and optimizing a highly accurate Mask R-CNN model to directly locate and classify the fracture, and compared
it with other popular CNN-based models; (3) and finally, we showed the feasibility of developing a generalized
deep learning-based diagnosis tool and widened the possibility of real-world use of the model to assist doctors
in detecting VCFs.
Materials and methods
Data source and preprocessing
The dataset used in this study consisted of lateral thoracolumbar radiographs of patients from Korea University Ansan Hospital. The collected dataset contained 487 radiographs with fractures and 141 normal
radiographs. Only X-rays confirmed as compression fractures based on MRI results were collected and labeled.
The X-rays were de-identified before use, so that each patient’s personal information was removed according
to the ethical guidelines. Overall, 598 segmentation masks of marked fractures were extracted from the 487 lateral
thoracolumbar X-rays and used to train and test each model.
A total of six MRI-based class labels were defined, and fracture locations were marked during data preprocessing: L1,
L2, L3, L4, T11, and T12 fractures. Two orthopedic experts labeled the location and the type of vertebra using the
open source labeling software ‘labelme’, version 5.0.2 (https://github.com/labelmeai/labelme)16. Each polygon
mask included the fracture class (one of the six classes, T11-L4) and the coordinates of each point of the polygon outlining the identified fracture. Figure 1 shows an example of labeled data used in training. Multiple VCFs
were identified in approximately 20% of the patients, and each was labeled as a separate polygon.
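As an illustration of the annotation format described above, the following minimal sketch reads labelme-style polygon annotations and derives a bounding box and pixel area for each labeled fracture. It assumes the usual labelme JSON layout (a "shapes" list whose entries carry "label", "points", and "shape_type"). The helper names and the example coordinates are hypothetical and are not taken from the study's data or code.

```python
import json


def polygon_area(points):
    """Area enclosed by a simple polygon [(x, y), ...] via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box(points):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around the polygon."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def load_fractures(labelme_json):
    """Return one record per polygon found in labelme-style JSON text."""
    doc = json.loads(labelme_json)
    records = []
    for shape in doc.get("shapes", []):
        if shape.get("shape_type", "polygon") != "polygon":
            continue  # skip rectangles, points, and other shape types
        pts = [tuple(p) for p in shape["points"]]
        records.append({
            "label": shape["label"],    # one of L1, L2, L3, L4, T11, T12
            "bbox": bounding_box(pts),
            "area": polygon_area(pts),
        })
    return records


# Hypothetical annotation: one radiograph with two separately labeled fractures.
EXAMPLE = json.dumps({
    "imageHeight": 600,
    "imageWidth": 400,
    "shapes": [
        {"label": "L1", "shape_type": "polygon",
         "points": [[100, 200], [180, 200], [180, 250], [100, 250]]},
        {"label": "T12", "shape_type": "polygon",
         "points": [[100, 130], [175, 130], [175, 180], [100, 180]]},
    ],
})

for record in load_fractures(EXAMPLE):
    print(record["label"], record["bbox"], record["area"])
```

Keeping one record per polygon mirrors the labeling rule that multiple fractures in the same radiograph are stored as separate instances, which is the form an instance segmentation model such as Mask R-CNN expects.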
Study settings
In this study, approximately 70% (346 radiographs) of the dataset was used to train the neural network, and
approximately 15% each was allocated to validation (71 radiographs) and test data (70 radiographs). The training,
validation, and test data were split in a stratified manner to preserve the class-wise distribution. Radiographs with no
fractures were used only in the test phase. We used stochastic gradient descent with momentum as the
optimization method. The learning rate was se (...truncated)