Test–retest reliability of performance variables during treadmill rollerski skating
European Journal of Applied Physiology
https://doi.org/10.1007/s00421-025-05746-w
ORIGINAL ARTICLE
Thomas Losnegard1,2 · Paul André Solberg2 · Magne Lund-Hansen1 · Martin Skaugen2 · Joar Hansen3 · Knut Skovereng4 · Øyvind Sandbakk5
Received: 11 October 2024 / Accepted: 25 February 2025
© The Author(s) 2025
Abstract
Purpose We examined the test–retest reliability of rollerski testing across a familiarization trial followed by three separate
test trials (T1–T3) conducted within a 14-day period.
Methods Ten competitive cross-country skiers performed three submaximal tests (5% incline, speed range 10–16 km h−1) and a maximal speed test until failure (MTF; ~5–8 min, 7% incline, > 10 km h−1) on a rollerski treadmill using the Gear 3 ski skating sub-technique. Reliability was assessed as the within-subject typical error, expressed as a coefficient of variation (CV%, [confidence limits]), the intraclass correlation coefficient (ICC, [confidence limits]), and changes in the mean (%).
Results The speed at MTF demonstrated a mean CV (T1–T3) of 1.5% [1.1, 2.6] and an ICC of 0.96 [0.87, 0.99], but showed a systematic familiarization bias from T1 to T2 (1.2% [0.1, 2.3]) and from T2 to T3 (2.2% [0.1, 4.3]). Peak oxygen uptake exhibited a mean CV of 2.2% [1.6, 3.8] and an ICC of 0.93 [0.78, 0.98], with no systematic changes from T1 to T2 (−0.2% [−2.0, 1.6]) or from T2 to T3 (1.8% [−1.1, 4.7]). VO2 at submaximal load showed a mean CV of 2.1% [1.5, 3.3] and an ICC of 0.94 [0.84, 0.99], with no systematic changes from T1 to T2 (−0.7% [−2.4, 1.1]) or from T2 to T3 (−0.1% [−2.4, 2.3]).
Conclusion The relatively low CV and high ICC for most measures suggest a high degree of test–retest reliability. However, the systematic mean changes in MTF indicate that familiarization trials are essential if this test is to provide meaningful information about individual changes. Overall, these reliability measures can be used by practitioners as a framework to discern true changes when testing on a rollerski treadmill.
Keywords Oxygen uptake · Performance · Training · Nordic skiing · Cross-country skiing
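To illustrate how the CVs above might be used to judge whether an individual change exceeds measurement noise, the sketch below converts each reported mean CV into a 95% minimal detectable change, MDC95 = 1.96 × √2 × CV. This threshold is a common convention and is not necessarily the authors' approach; the function name is hypothetical, and applying the CV additively is an approximation that holds for small CVs.

```python
import math


def mdc95_percent(cv_percent: float) -> float:
    """Return a 95% minimal detectable change (%) for a difference between two tests.

    Assumption: the typical error is expressed as a CV (%) and the errors of the
    two tests are independent, so the SD of a difference score is sqrt(2) * CV.
    """
    return 1.96 * math.sqrt(2) * cv_percent


# Mean CVs (T1-T3) taken from the Results of the abstract
reported_cv = {"MTF speed": 1.5, "VO2peak": 2.2, "Submaximal VO2": 2.1}

for variable, cv in reported_cv.items():
    print(f"{variable}: CV {cv:.1f}% -> MDC95 about {mdc95_percent(cv):.1f}%")
```

With these inputs the thresholds are roughly 4.2% (MTF speed), 6.1% (VO2peak), and 5.8% (submaximal VO2). Such thresholds assume no systematic learning effect between tests, which is why the familiarization bias observed for MTF matters.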
Communicated by Michael I Lindinger.
* Thomas Losnegard
1 Department of Physical Performance, The Norwegian School of Sport Sciences, Ullevål Stadion, Post Box 4014, 0806 Oslo, Norway
2 Norwegian Olympic Federation, Oslo, Norway
3 Section of Health and Exercise Physiology, Inland Norway University of Applied Science, Lillehammer, Norway
4 Department of Neuromedicine and Movement Science, Faculty of Medicine and Health Sciences, Center for Elite Sports Research, Norwegian University of Science and Technology, Trondheim, Norway
5 School of Sport Science, UiT The Arctic University of Norway, Tromsø, Norway
Introduction
Physiological variables that determine performance are often evaluated by both researchers and coaches to provide diagnostic information about training-induced changes. In most endurance sports, laboratory tests are conducted in a "sport-specific" manner, aimed at identifying precise training-induced adaptations. However, data from such physiological tests can only be interpreted validly if the results are reliable. According to Hopkins (2000), three measures of reliability should be quantified: within-subject variation, retest correlation, and changes in the mean. When a subject undergoes repeated tests, the values vary randomly from trial to trial, and this variation is quantified as the standard deviation of that subject's values. This within-subject variation, also known as the typical error, is commonly expressed as a coefficient of variation (CV), that is, as a percentage of the mean.
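As commonly described following Hopkins (2000), the typical error for two trials can be estimated as the standard deviation of the difference scores divided by √2, and log transformation allows it to be expressed as a CV. The minimal sketch below follows that approach under stated assumptions: the function names are hypothetical, the change in the mean is derived from the mean log difference, and the VO2peak values are invented for illustration, not data from this study.

```python
import math
import statistics


def typical_error(trial_a, trial_b):
    """Typical error in raw units: SD of the paired differences divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(trial_a, trial_b)]
    return statistics.stdev(diffs) / math.sqrt(2)


def typical_error_cv(trial_a, trial_b):
    """Typical error as a CV (%), computed from log-transformed values."""
    log_diffs = [math.log(b) - math.log(a) for a, b in zip(trial_a, trial_b)]
    return 100 * (math.exp(statistics.stdev(log_diffs) / math.sqrt(2)) - 1)


def mean_change_percent(trial_a, trial_b):
    """Change in the mean (%) from trial A to trial B, via the mean log difference."""
    log_diffs = [math.log(b) - math.log(a) for a, b in zip(trial_a, trial_b)]
    return 100 * (math.exp(statistics.mean(log_diffs)) - 1)


# Hypothetical VO2peak values (mL/kg/min) for five skiers; not data from this study
test_1 = [70.1, 65.4, 72.8, 60.2, 68.0]
test_2 = [71.0, 64.9, 73.5, 61.1, 67.2]

print(f"Typical error: {typical_error(test_1, test_2):.2f} mL/kg/min")
print(f"Typical error as CV: {typical_error_cv(test_1, test_2):.2f}%")
print(f"Change in mean: {mean_change_percent(test_1, test_2):.2f}%")
```

Dividing by √2 reflects that each difference score contains the random error of two tests; the log scale makes the error proportional to the subject's level, which suits variables such as oxygen uptake.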
Cross-country skiing, biathlon, and Nordic combined are Olympic sports in which the freestyle technique, also known as ski skating, is used. As in other endurance sports, a higher aerobic metabolic energy turnover (i.e., peak oxygen uptake (VO2peak) and its fractional utilization) and/or a reduced cost of locomotion (i.e., enhanced work economy/efficiency) are the primary drivers of performance and are therefore commonly measured. In these sports, large rollerski treadmills are used for "sport-specific" testing, which allows skiers to replicate their skiing technique and to simulate on-snow skiing accurately from a biomechanical perspective (Myklebust et al. 2014, 2022).
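The cost of locomotion mentioned above is often expressed as a gross oxygen cost per metre travelled, i.e., oxygen uptake divided by the distance covered per minute. A minimal sketch, assuming VO2 in mL·kg⁻¹·min⁻¹ and treadmill speed in km·h⁻¹; the helper name and the example values are hypothetical, and a net cost (subtracting resting VO2) is another common definition.

```python
def o2_cost_ml_per_kg_per_m(vo2_ml_kg_min: float, speed_km_h: float) -> float:
    """Gross O2 cost (mL/kg/m): oxygen uptake divided by distance covered per minute.

    Hypothetical helper; assumes steady-state VO2 at a constant treadmill incline.
    """
    if speed_km_h <= 0:
        raise ValueError("speed must be positive")
    speed_m_min = speed_km_h * 1000 / 60  # convert km/h to m/min
    return vo2_ml_kg_min / speed_m_min


# Hypothetical example: 50 mL/kg/min at 12 km/h (200 m/min)
print(o2_cost_ml_per_kg_per_m(50.0, 12.0))  # 0.25
```

At a fixed incline and speed, a lower O2 cost indicates better work economy; comparisons across different inclines are not meaningful with this simple ratio.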
Physiological testing on rollerskis has been used extensively for decades (Hoffman et al. 1994; Holmberg et al. 2005; Sandbakk et al. 2010; Losnegard et al. 2013), yet the learning effect and typical error over multiple tests have not been thoroughly examined. Such information is crucial when determining training-induced changes or evaluating experimental interventions. To date, only two studies have investigated the test–retest reliability of performance and physiological measurements during treadmill skiing. Losnegard et al. (2013) reported CVs of 2.3% for VO2peak, 1.2% for O2-cost, and 2.7% for 1000-m time-trial performance during treadmill rollerski skating. Bucher et al. (2023) examined the test–retest reliability of a comprehensive test battery, including a VO2max test using the diagonal stride technique and a 24-min time trial using double poling on a treadmill. The CV was 1.4% for VO2peak and 1.0% for the 24-min time trial. However, because these previous studies included only two trials, less is known about possible learning effects over multiple trials within a short testing period.
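With three or more trials, the change in the mean and the typical error can be examined for each consecutive pair (T1 to T2, T2 to T3), which is what reveals whether a learning effect persists beyond the first retest. The sketch below is illustrative only: the function name is hypothetical, the speeds are invented, and summarizing the pairs with the arithmetic mean of their CVs is an assumed pooling choice rather than a method confirmed by this study.

```python
import math
import statistics


def pairwise_reliability(trials):
    """Return (label, mean change %, typical error as CV %) for each consecutive trial pair.

    `trials` is a list of trials; each trial holds one value per subject, in the
    same subject order. Uses log differences, so values must be positive.
    """
    results = []
    for i in range(len(trials) - 1):
        log_diffs = [math.log(b) - math.log(a) for a, b in zip(trials[i], trials[i + 1])]
        change = 100 * (math.exp(statistics.mean(log_diffs)) - 1)
        cv = 100 * (math.exp(statistics.stdev(log_diffs) / math.sqrt(2)) - 1)
        results.append((f"T{i + 1}->T{i + 2}", change, cv))
    return results


# Hypothetical maximal speeds (km/h) for four skiers over three trials; not study data
t1 = [17.0, 16.2, 18.1, 15.5]
t2 = [17.3, 16.4, 18.2, 15.8]
t3 = [17.6, 16.8, 18.5, 16.1]

pairs = pairwise_reliability([t1, t2, t3])
for label, change, cv in pairs:
    print(f"{label}: change in mean {change:+.1f}%, CV {cv:.1f}%")

# Assumed summary: arithmetic mean of the consecutive-pair CVs
print(f"Mean CV: {statistics.mean(cv for _, _, cv in pairs):.1f}%")
```

A two-trial design yields only the first of these pairs, so it cannot distinguish a one-off familiarization effect from a learning effect that continues over subsequent tests.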
Cross-country skiing is performed at varying speeds and inclines because of large variations in terrain. From a testing perspective, this means that the most relevant inclines and speeds must be covered. Nevertheless, there is general agreement that moderate uphill terrain is particularly relevant for testing, given the importance of uphill performance, the need to avoid excessively high speeds indoors where air drag is absent, and the possibility of eliciting competition-relevant speeds during both submaximal and maximal testing (Sandbakk et al. 2010; Losnegard et al. 2013; McGawley and Holmberg 2014; Andersson et al. 2016). On such inclines, the Gear 3 skating sub-technique (i.e., synchronized pole plants for every ski push-off) is the most frequently used sub-technique in races and testing (Andersson et al. 2010; Sandbakk et al. 2011; Sollie et al. 2021).
The aim of the present study was to examine the
test–retest reliability of performance-determining variables from submaximal and maximal tests using the skating
technique during treadmill rollerskiing. We chose a protocol with a constant incline and increasing speed, in which
within-subject variation, test–retest correlation, and changes
in the mean were investigated.
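Test–retest correlation is commonly quantified with an intraclass correlation coefficient computed from a two-way ANOVA on a subjects × trials table. The form used in this study is not specified here, so the sketch below assumes ICC(3,1) (two-way mixed, consistency, single measurement), which is not affected by a uniform shift in the mean between trials; an absolute-agreement form such as ICC(2,1) would penalize such shifts. The function name and data are hypothetical, and confidence limits are not computed.

```python
def icc_3_1(data):
    """ICC(3,1): two-way mixed, consistency, single measurement.

    `data` is a list of rows, one per subject, each holding that subject's k trial values.
    """
    n = len(data)
    k = len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Partition the total sum of squares into subjects, trials, and residual error
    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_trials = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_trials

    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)


# Hypothetical submaximal VO2 (L/min) for five skiers over three trials; not study data
data = [
    [3.10, 3.15, 3.12],
    [3.60, 3.55, 3.62],
    [2.85, 2.90, 2.88],
    [3.95, 4.00, 3.97],
    [3.30, 3.28, 3.33],
]
print(f"ICC(3,1) = {icc_3_1(data):.2f}")
```

Because the trial (column) variance is removed before computing the error term, a systematic learning effect shows up in the change in the mean rather than in this ICC, which is one reason all three reliability measures are reported together.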
Methods
Participants
Four female and six male competitive cross-country skiers
were recruited (age range 20–30 years). The participants
were categorized as Tier 3 according to McKay et al. (2022).
All subjects were familiar with testing and training on a
rollerski trea (...truncated)