Categorized contrast enhanced mammography dataset for diagnostic and artificial intelligence research

Scientific Data, Jun 2022

Contrast-enhanced spectral mammography (CESM) is a relatively recent imaging modality with increased diagnostic accuracy compared to digital mammography (DM). New deep learning (DL) models were developed that have accuracies equal to that of an average radiologist. However, most studies trained the DL models on DM images as no datasets exist for CESM images. We aim to resolve this limitation by releasing a Categorized Digital Database for Low energy and Subtracted Contrast Enhanced Spectral Mammography images (CDD-CESM) to evaluate decision support systems. The dataset includes 2006 images, with an average resolution of 2355 × 1315, consisting of 310 mass images, 48 architectural distortion images, 222 asymmetry images, 238 calcifications images, 334 mass enhancement images, 184 non-mass enhancement images, 159 postoperative images, 8 post neoadjuvant chemotherapy images, and 751 normal images, with 248 images having more than one finding. This is the first dataset to incorporate data selection, segmentation annotation, medical reports, and pathological diagnosis for all cases. Moreover, we propose and evaluate a DL-based technique to automatically segment abnormal findings in images.
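
The per-category counts above add up to more than 2006, which is expected if an image with several findings is counted once per finding. The short check below uses only the numbers stated in the abstract; the variable names are illustrative, and the reading that each multi-finding image contributes exactly one extra label is an inference, not something the source states.

```python
# Per-category image counts exactly as listed in the abstract.
category_counts = {
    "mass": 310,
    "architectural distortion": 48,
    "asymmetry": 222,
    "calcifications": 238,
    "mass enhancement": 334,
    "non-mass enhancement": 184,
    "postoperative": 159,
    "post neoadjuvant chemotherapy": 8,
    "normal": 751,
}
total_images = 2006          # stated dataset size
multi_finding_images = 248   # stated number of images with more than one finding

# Total number of category labels across all images.
label_total = sum(category_counts.values())

# Labels beyond one-per-image must come from images with several findings.
surplus_labels = label_total - total_images

print(f"labels: {label_total}, surplus over images: {surplus_labels}")
print("surplus matches multi-finding count:", surplus_labels == multi_finding_images)
```

The surplus of labels over images equals the stated number of multi-finding images, so the figures in the abstract are internally consistent under that reading.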

Data Descriptor | OPEN | Scientific Data (2022) 9:122 | https://doi.org/10.1038/s41597-022-01238-0 | www.nature.com/scientificdata

Rana Khaled¹ ✉, Maha Helal¹, Omar Alfarghaly² ✉, Omnia Mokhtar¹, Abeer Elkorany² ✉, Hebatalla El Kassas¹ & Aly Fahmy² ✉

¹Cairo University, National Institute of Cancer, Radiology Department, Cairo, 11796, Egypt.
²Cairo University, Computers and Artificial Intelligence, Computer Science Department, Cairo, 12613, Egypt.
✉e-mail: r_hkhaled@hotmail.com

Background & Summary

Digital mammography (DM) is the gold standard imaging modality for early detection of breast cancer. However, its overall sensitivity decreases in patients with dense breasts¹. Contrast-enhanced spectral mammography (CESM) is a contrast-based digital mammogram that was approved by the Food and Drug Administration (FDA) in 2011 as an adjunct to DM and ultrasound examinations for localization and characterization of occult or inconclusive lesions. Dual-energy image acquisition is performed, in which low- and high-energy images are obtained. Several studies have shown that the low-energy images resemble standard DM images and are non-inferior to them². High-energy images are not interpretable on their own; after acquisition, the low- and high-energy images are therefore recombined and subtracted through image processing to suppress the background breast parenchyma. Figure 1 shows the resulting subtracted images obtained for interpretation, revealing areas of contrast enhancement against a suppressed breast tissue background. Findings can be identified according to their density, morphologic, and enhancement characteristics³. However, estimating whether a lesion is benign or malignant without review by a radiologist is challenging because of the significant variation in the lesions' visual characteristics⁴.

Fig. 1 (a) Low-energy, (b) High-energy, and (c) Subtracted image.

Computer-aided detection (CAD) systems were introduced in the early 2000s to help radiologists interpret mammography images. However, this proved challenging in clinical practice because of the increased rate of false positives marked by the CAD systems, which can distract radiologists⁵. The use of artificial intelligence (AI) in radiology is still in its early stages. Nonetheless, algorithms that analyze pixel data can distinguish patterns in images that might not have been previously identified even by expert radiologists⁶. Deep learning (DL) shows promise for many tasks, such as automatically detecting lesions and helping radiologists provide a more accurate diagnosis. Moreover, new multimodal DL models like the Perceiver⁷ make it feasible to train on large datasets and extract good unsupervised image representations that can be used on a wide range of tasks. However, large, fully annotated datasets are required and will be crucial for training new DL networks, for fine-tuning existing pre-trained DL networks, and for evaluating them. This is why it is important for radiologists to understand the impact of these machine-learning (ML) based analytical tools and to recognize how they might influence and change radiological practice soon⁸.

In the past couple of years, a small number of public mammography datasets were released, including the Digital Database for Screening Mammography (DDSM)⁹, the Image Retrieval in Medical Applications (IRMA) project¹⁰, the Mammographic Imaging Analysis Society (MIAS) database¹¹, and the Curated Breast Imaging Subset of DDSM (CBIS-DDSM)¹². These datasets contain DM images only, and none include CESM images.

In this paper, we present a categorized CESM dataset that provides easily accessible low-energy images with corresponding subtracted CESM images, abnormality segmentation annotation, verified medical reports, and pathological diagnosis for all cases. It will add to the ongoing advancements in future mammography DL-based systems. We also propose a new DL-based technique to automatically segment the abnormal findings in the images without intervention from radiologists, as segmentation annotation is a time-consuming task.

Methods

We collected and reformatted the data into an easily accessible format. Figure 2 displays the flow diagram of the process used to prepare our dataset: image preprocessing, manual annotation, and automatic segmentation.

Technique of contrast enhanced mammography examination.
CESM is performed using standard DM equipment with additional software that performs dual-energy image acquisition. Two minutes after the patient is injected intravenously with non-ionic low-osmolar iodinated contrast material (dose: 1.5 mL/kg), craniocaudal (CC) and mediolateral oblique (MLO) views are obtained. Each view comprises two exposures, one with low energy (peak kilovoltage values ranging from 26 to 31 kVp) and one with high energy (45 to 49 kVp). A complete examination is carried out in about 5–6 minutes.

Description of dataset.

The dataset is a collection of low-energy images with their corresponding subtracted CESM images, gathered from the Radiology Department of the National Cancer Institute, Cairo University, Egypt, over the period from January 2019 to February 2021. The images are all high resolution, with an average of 2355 × 1315 pixels. Institutional review board approval was obtained, and informed consent to carry out the study and publish the data was obtained from 326 female patients aged 18 to 90 years. The dataset contains 2006 images with CC and MLO views (1003 low-energy images and 1003 subtracted CESM images), samples of low (...truncated)
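
The two exposures taken for each view are what make the subtracted image possible. The source does not describe the recombination algorithm used by the scanner, so the sketch below illustrates only one commonly used model, weighted logarithmic subtraction, on a toy 2 × 2 image. All attenuation coefficients, thickness values, and function names are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical attenuation coefficients (illustrative, not from the paper).
# Iodine is assumed to attenuate the high-energy beam more strongly.
MU_TISSUE_LOW, MU_TISSUE_HIGH = 0.8, 0.4
MU_IODINE_LOW, MU_IODINE_HIGH = 2.0, 5.0


def expose(thickness, iodine, mu_tissue, mu_iodine):
    """Simulate one exposure with Beer-Lambert attenuation and unit incident intensity."""
    return [
        [math.exp(-mu_tissue * t - mu_iodine * c) for t, c in zip(t_row, c_row)]
        for t_row, c_row in zip(thickness, iodine)
    ]


def recombine(low, high, weight):
    """Weighted log subtraction per pixel: ln(high) - weight * ln(low)."""
    return [
        [math.log(h) - weight * math.log(l) for l, h in zip(low_row, high_row)]
        for low_row, high_row in zip(low, high)
    ]


# Toy breast: tissue thickness varies everywhere, contrast uptake in one pixel only.
thickness = [[1.0, 2.0], [3.0, 4.0]]
iodine = [[0.0, 0.0], [0.0, 0.1]]

low = expose(thickness, iodine, MU_TISSUE_LOW, MU_IODINE_LOW)
high = expose(thickness, iodine, MU_TISSUE_HIGH, MU_IODINE_HIGH)

# Choosing the weight as the ratio of tissue coefficients cancels the tissue term.
weight = MU_TISSUE_HIGH / MU_TISSUE_LOW
subtracted = recombine(low, high, weight)

for row in subtracted:
    print([round(value, 6) for value in row])
```

In this toy model the three pixels without contrast uptake come out at essentially zero regardless of tissue thickness, while the pixel with uptake stands apart, which mirrors the suppressed-background appearance of the subtracted images. Real systems must also handle polychromatic beams, scatter, and noise, none of which is modeled here.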


This is a preview of a remote PDF: https://www.nature.com/articles/s41597-022-01238-0.pdf
Article home page: https://www.nature.com/articles/s41597-022-01238-0

Khaled, R., Helal, M., Alfarghaly, O., Mokhtar, O., Elkorany, A., El Kassas, H. & Fahmy, A. Categorized contrast enhanced mammography dataset for diagnostic and artificial intelligence research. Scientific Data 9, 122 (2022). DOI: 10.1038/s41597-022-01238-0