Finite sample size errors in the context of multiple error sources in quantitative medical imaging: An evaluation for breast magnetic resonance diffusion-weighted imaging

RESEARCH ARTICLE

Jessica V. Eberle1*, Sebastian Bickelhaupt1,2, Lorenz A. Kapsner1,3, Sabine Ohlmeyer1, Evelyn Wenkel1, Michael Uder1, Dominika Skwierawska, Katharina Tkotz1, Dominique Hadler1, Tristan A. Kuder4, Frederik B. Laun1

1 Institute of Radiology, Universitätsklinikum Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
2 German Cancer Research Center (DKFZ), Medical Imaging and Radiology - Cancer Prevention, Heidelberg, Germany
3 Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Medical Informatics, Erlangen, Germany
4 German Cancer Research Center (DKFZ), Medical Physics in Radiology, Heidelberg, Germany

OPEN ACCESS

Citation: Eberle JV, Bickelhaupt S, Kapsner LA, Ohlmeyer S, Wenkel E, Uder M, et al. (2026) Finite sample size errors in the context of multiple error sources in quantitative medical imaging: An evaluation for breast magnetic resonance diffusion-weighted imaging. PLoS One 21(6): e0341201. https://doi.org/10.1371/journal.pone.0341201

Editor: Pascal A. T. Baltzer, Medical University of Vienna, AUSTRIA

Received: July 26, 2025
Accepted: May 15, 2026
Published: June 4, 2026

Copyright: © 2026 Eberle et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data availability statement: All relevant data are within the paper and its Supporting Information files.

Funding: The author(s) received no specific funding for this work.

Abstract

Background
Selecting appropriate sample sizes in magnetic resonance imaging studies is a complex process that requires balancing statistical rigor against the practical challenges of measuring a large patient population.
In this Institutional Review Board-approved study, we evaluate the dominant error types (“finite N” errors versus precision errors) for apparent diffusion coefficient (ADC)-based lesion characterization in diffusion-weighted magnetic resonance imaging (DWI) of the female breast in a local dataset and compare our results with the current literature.

Methods
First, in a literature review of 24 published breast DWI studies, the standard error of the area under the receiver operating characteristic curve (AUC), used as a measure of sample size-related errors (finite N errors), was estimated for the reported ADC values and compared with the values derived from expert readings of a university hospital cohort of 171 patients with suspicious breast lesions. Second, precision errors were assessed based on published analyses of the coefficient of variation of ADC values measured in breast DWI exams.

Results
Finite N errors were dominant in the in-house study and in most of the 24 reviewed studies. The median sample size at which finite N errors and precision errors were equal was n = 932.

Discussion
This analysis of dominant error types shows that the required sample sizes for the considered use case are not unreasonably large and that reducing sample sizes may not be justified on the basis of the conducted analysis. Nonetheless, incorporating dominant error type assessments into future studies may provide valuable insights for optimizing study design and improving methodological rigor.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: ADC: apparent diffusion coefficient, AUC: area under the receiver operating characteristic curve, BI-RADS: Breast Imaging Reporting and Data System, COV: coefficient of variation, DWI: diffusion-weighted imaging, MRI: magnetic resonance imaging, PDF: probability density function, Std: standard deviation.

PLOS One | https://doi.org/10.1371/journal.pone.0341201 June 4, 2026 1 / 24
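The abstract does not state which estimator was used for the standard error of the AUC. One widely used closed-form option is the approximation of Hanley and McNeil (1982), and the sketch below assumes that estimator purely for illustration; the study may have used a different one. The function name `auc_standard_error` and the case counts in the demo loop are hypothetical. The sketch shows how the finite N error shrinks, roughly as 1/sqrt(n), as the numbers of positive and negative cases grow.

```python
import math


def auc_standard_error(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC (Hanley and McNeil, 1982).

    Assumption: this estimator is used for illustration only.

    auc   -- area under the ROC curve, strictly between 0 and 1
    n_pos -- number of positive cases (e.g., malignant lesions)
    n_neg -- number of negative cases (e.g., benign lesions)
    """
    if not 0.0 < auc < 1.0:
        raise ValueError("auc must lie strictly between 0 and 1")
    if n_pos < 1 or n_neg < 1:
        raise ValueError("at least one positive and one negative case are required")

    # Closed-form approximations used by Hanley and McNeil:
    # Q1: probability that two random positives both score higher than one random negative.
    q1 = auc / (2.0 - auc)
    # Q2: probability that one random positive scores higher than two random negatives.
    q2 = 2.0 * auc**2 / (1.0 + auc)

    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc**2)
        + (n_neg - 1) * (q2 - auc**2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical AUC and balanced case counts, not values from the study.
    for n in (25, 50, 100, 200, 400):
        se = auc_standard_error(0.85, n_pos=n, n_neg=n)
        print(f"n_pos = n_neg = {n:4d}: SE(AUC) = {se:.4f}")
```

Quadrupling both case counts roughly halves the standard error, which is why finite N errors can only be reduced slowly by recruiting more patients.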
Introduction
Choosing an adequate sample size is a key task in research, whether for planning a study, obtaining institutional review board approval, or during the publication review process. Established methods to determine an adequate sample size are typically based on the (estimated) effect size, the desired significance level, and statistical power. These methods are widely used in research; however, they also have limitations. For example, effect sizes may not be known a priori, and there are no strict rules for choosing the significance level [1–3]. The conventional significance threshold is 0.05, but there are also arguments for other values, such as 0.005 [4]. Given this uncertainty, examining established practices may provide useful guidance. In the field of magnetic resonance imaging (MRI) research, for example, Hanspach et al. and Bögerl et al. investigated the sample sizes in methodological and clinical MRI studies, reporting median sample sizes of n = 6 [2] and n = 74 [5], respectively. While these surveys provided descriptive information, they did not assess whether the sample sizes used were adequate. To address this limitation, the present study assesses the adequacy of sample sizes following a methodology common in measurement science, namely estimating the individual uncertainty contributions and identifying which one contributes most to the total uncertainty (see Ch. 2–3 of [6]). This can be used to identify the limiting factor in diagnostic performance and to guide methodological optimization.

Uncertainty in a quantitative MRI research study may be introduced by using a finite sample size, leading to a “finite N” error. Naturally, further error sources are present in any study. At a conceptual level, these error types may be classified into accuracy and precision errors. Precision refers to test-retest reproducibility, whereas accuracy refers to how close the mean measured quantitative value is to the true value.
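The measurement-science idea of comparing individual uncertainty contributions can be illustrated with the root-sum-of-squares rule for combining independent standard uncertainties (as in the GUM, with unit sensitivity coefficients). The following is a simplified, hypothetical model and not the study's procedure: it assumes that the finite N error and the precision error are independent and already expressed on the same scale, that the finite N error scales as 1/sqrt(n) from a reference value `u_ref` observed at `n_ref`, and that the precision error does not depend on n. All function names and numbers are illustrative.

```python
import math


def combined_uncertainty(*contributions: float) -> float:
    """Root-sum-of-squares of independent standard uncertainties (unit sensitivities)."""
    return math.sqrt(sum(u * u for u in contributions))


def finite_n_error(n: int, n_ref: int, u_ref: float) -> float:
    """Hypothetical model: a finite N error observed at n_ref, rescaled as 1/sqrt(n)."""
    return u_ref * math.sqrt(n_ref / n)


def crossover_sample_size(n_ref: int, u_ref: float, u_precision: float) -> float:
    """Sample size at which the modelled finite N error equals the precision error.

    Solves u_ref * sqrt(n_ref / n) = u_precision for n.
    """
    return n_ref * (u_ref / u_precision) ** 2


def dominant_contribution(u_finite_n: float, u_precision: float) -> str:
    """Name the larger of the two contributions (near-ties reported as 'equal')."""
    if math.isclose(u_finite_n, u_precision):
        return "equal"
    return "finite N" if u_finite_n > u_precision else "precision"


if __name__ == "__main__":
    # Hypothetical inputs, not values from the study.
    n_ref, u_ref, u_precision = 100, 0.04, 0.01
    print(f"crossover n = {crossover_sample_size(n_ref, u_ref, u_precision):.0f}")
    for n in (100, 400, 1600, 6400, 25600):
        u_n = finite_n_error(n, n_ref, u_ref)
        total = combined_uncertainty(u_n, u_precision)
        label = dominant_contribution(u_n, u_precision)
        print(f"n = {n:6d}: finite N = {u_n:.4f}, total = {total:.4f}, dominant = {label}")
```

Under these assumptions, the total uncertainty levels off at `u_precision` once n is well beyond the crossover, so further recruitment buys little; below the crossover, the finite N term dominates and a larger sample is the more effective lever.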
Accuracy is generally much harder to assess in quantitative medical imaging studies, where a reliable ground truth is usually unavailable. Because reports on precision are therefore more readily available, we focused on comparing finite N errors with precision errors in the present investigation. Such an assessment may guide study planning. For example, when the precision error dominates the finite N error, further increasing the sample size may have a limited effect, and efforts may be better directed toward improving measurement precision rather than recruiting additional patients and burdening them with MRI exams.

For our analysis, we chose a use case that we deemed representative of the field: apparent diffusion coefficient (ADC)-based lesion characterization in diffusion-weighted magnetic resonance imaging (DWI) of the female breast, which is an established and relevant application field with suf (...truncated)