Finite sample size errors in the context of multiple error sources in quantitative medical imaging: An evaluation for breast magnetic resonance diffusion-weighted imaging
RESEARCH ARTICLE
Jessica V. Eberle1*, Sebastian Bickelhaupt1,2, Lorenz A. Kapsner1,3,
Sabine Ohlmeyer1, Evelyn Wenkel1, Michael Uder1, Dominika Skwierawska1,
Katharina Tkotz1, Dominique Hadler1, Tristan A. Kuder4, Frederik B. Laun1
1 Institute of Radiology, Universitätsklinikum Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg
(FAU), Erlangen, Germany, 2 German Cancer Research Center (DKFZ), Medical Imaging and Radiology
- Cancer Prevention, Heidelberg, Germany, 3 Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU),
Medical Informatics, Erlangen, Germany, 4 German Cancer Research Center (DKFZ), Medical Physics in
Radiology, Heidelberg, Germany
*
OPEN ACCESS
Citation: Eberle JV, Bickelhaupt S, Kapsner LA, Ohlmeyer S, Wenkel E, Uder M, et al. (2026) Finite sample size errors in the context of multiple error sources in quantitative medical imaging: An evaluation for breast magnetic resonance diffusion-weighted imaging. PLoS One 21(6): e0341201. https://doi.org/10.1371/journal.pone.0341201
Editor: Pascal A. T. Baltzer, Medical University of Vienna, AUSTRIA
Received: July 26, 2025
Accepted: May 15, 2026
Published: June 4, 2026
Copyright: © 2026 Eberle et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data availability statement: All relevant data are within the paper and its Supporting Information files.
Funding: The author(s) received no specific funding for this work.
Abstract
Background
Selecting appropriate sample sizes in magnetic resonance imaging studies is a
complex process that requires balancing statistical rigor with the practical challenges
of measuring a large patient population. In this Institutional Review Board-approved
study, we evaluate the dominant error types (“finite N” errors versus precision errors)
for apparent diffusion coefficient (ADC)-based lesion characterization in diffusion-weighted magnetic resonance imaging (DWI) of the female breast in a local dataset
and compare our results with current literature.
Methods
First, in a literature review of 24 published breast DWI studies, the standard
error of the area under the receiver operating characteristic curve, used as a measure of
sample size-related errors (finite N errors), was estimated for the reported ADC values
and compared with the values derived from expert readings of a university hospital’s
cohort of 171 patients with suspicious breast lesions. Second, precision errors were
assessed based on published analyses of the coefficient of variation of ADC values
measured in breast DWI exams.
Results
Finite N errors were dominant in the in-house study and most of the 24 reviewed
studies. The median sample size at which finite N errors and precision errors were
equal was determined to be n = 932.
PLOS One | https://doi.org/10.1371/journal.pone.0341201 June 4, 2026
1 / 24
Competing interests: The authors have
declared that no competing interests exist.
Abbreviations: ADC: apparent diffusion coefficient, AUC: area under the receiver operating
characteristic curve, BI-RADS: Breast Imaging
Reporting and Data System, COV: coefficient
of variation, DWI: diffusion-weighted imaging,
MRI: magnetic resonance imaging, PDF: probability density function, Std: standard deviation.
Discussion
This analysis of dominant error types shows that the required sample sizes for the
considered use case are not unreasonably large and that reducing sample sizes may
not be justified based on the merits of the conducted analysis. Nonetheless, incorporating dominant error type assessments into future studies may provide valuable
insights for optimizing study design and improving methodological rigor.
Introduction
Choosing an adequate sample size is a key task in research, whether for planning a
study, obtaining institutional review board approval, or during the publication review
process. Established methods to determine an adequate sample size are often based
on (estimated) effect sizes, the desired significance level, and statistical power.
These methods are widely used in research; however, they also
have limitations. For example, effect sizes may not be known a priori, and there are
no strict rules on how to choose the significance level [1–3]. A standard level for the
significance threshold is 0.05, but there are also reasons to choose other values,
such as 0.005 [4].
Given this uncertainty, examining established practices may provide useful guidance. In the field of magnetic resonance imaging (MRI) research, for example, Hanspach et al. and Bögerl et al. investigated the sample sizes in methodological and
clinical MRI studies, with median sample sizes of n = 6 [2] and n = 74 [5], respectively.
While these studies provided descriptive information, they did not assess the suitability of the
sample sizes used. To address this limitation, the present study assesses the adequacy of sample sizes, following a methodology common in measurement science –
namely, estimating individual uncertainty contributions and identifying which contributes most to the total uncertainty (see Ch. 2–3 of [6]). This can be used to identify the
limiting factor in diagnostic performance and guide methodological optimization.
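As a brief illustration of why the largest contribution matters most, consider the standard rule for combining independent uncertainty contributions in quadrature. The symbols u_i and u_c are notation assumed here for illustration and are not taken from the article, and the factor of three is an arbitrary example value.

```latex
% Assumed notation: u_i are independent standard uncertainties,
% u_c is their combined standard uncertainty.
\[
  u_c = \sqrt{\sum_i u_i^2}
\]
% Illustrative case with two contributions, where u_1 = 3 u_2:
\[
  u_c = \sqrt{(3u_2)^2 + u_2^2} = \sqrt{10}\,u_2 \approx 3.16\,u_2 ,
  \qquad
  \frac{u_c - u_1}{u_c} = 1 - \frac{3}{\sqrt{10}} \approx 0.05
\]
```

In this example, removing the smaller contribution entirely would lower the combined uncertainty by only about 5%, so effort is better spent on the dominant term.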
Uncertainty in a quantitative MRI research study may be introduced by using a
finite sample size, leading to a “finite N” error. Naturally, further error sources will
be present in any study. At a conceptual level, these error types may be classified
into accuracy and precision errors. Precision refers to the test-retest reproducibility,
whereas accuracy refers to how close the mean measured quantitative value is to
the true value. Generally, accuracy is much harder to assess in quantitative medical
imaging studies, where a reliable ground truth is usually missing. As reports on precision are thus generally more readily available, we focused on the comparison of finite
N errors and precision errors in the present investigation.
Such an assessment may guide study planning. For example, when the precision
error dominates relative to the finite N error, further increasing the sample size may
have a limited effect, and efforts may be better directed toward improving measurement precision rather than recruiting additional patients and burdening them with MRI
exams.
For our analysis, we chose a use case that we deemed representative of the field – apparent diffusion coefficient
(ADC)-based lesion characterization in diffusion-weighted magnetic resonance imaging (DWI) of the female breast, which
is an established and relevant application field with suf (...truncated)