The challenges of implementing hybrid baselines for the interpretation of longitudinal behavioral data from individuals

npj Digital Medicine | Comment | (2026) 9:331
Published in partnership with Seoul National University Bundang Hospital
https://doi.org/10.1038/s41746-026-02668-5

Sandra Anna Just, Enrico Tedeschi, Einar Holsbø, Karl Øyvind Mikalsen, Lars Ailo Bongo, Philipp Homan & Brita Elvevåg

Establishing whether observed behavioral differences reflect meaningful change in an individual necessitates baselines specific to the individual and task. Automated hybrid solutions combine adaptive baselines with fixed thresholds. Applying this approach to behavioral science harbors challenges: the pronounced gap between observable measurement and underlying construct means ground truth is typically unavailable. A stepwise framework is proposed to determine and evaluate the validity of baselines for longitudinal behavioral measurements.

The necessity for data-driven individual baselines in behavioral science

Group-level norms are not suitable for the interpretation of longitudinal data from individuals

Behavioral science domains such as psychology have overwhelmingly been built around group-level inference, aiming to understand inter-individual differences. The field lacks a comparable tradition of treating intra-individual change as a primary object of interpretation. This gap is significant because determining how a person changes over time requires different conceptualizations and statistical approaches than determining how individuals differ from one another at a given time [1,2].

Group-level norms help quantify how a single, cross-sectional measurement compares to a reference population. However, these norms are not appropriate when evaluating intra-individual change in longitudinal behavioral measurements [3]. For example, when an individual’s memory performance is assessed repeatedly over time, group-level norms are useful to indicate how task performance compares to a group average at each single measurement. They are not useful for understanding how the individual’s measurements relate to each other over time, how the measurements are augmented – and potentially transformed – by practice or learning effects, or when they represent a change.

Intra-individual measurements taken over time need to be interpreted relative to a baseline. In behavioral science, group averages are unsuitable as individual baselines. Treating a group average as an individual baseline assumes ergodicity, whereby individual-level statistical properties equal those observed at the group level. This assumption likely does not hold for most behavioral processes [4]. As shown in Fig. 1 (a), the use of group averages as baselines conflates variability between individuals with variability within an individual and mischaracterizes intra-individual differences and putative change [3]. Moreover, static group-level norms ignore temporal variability, which is essential for distinguishing routine fluctuations from unusual patterns in a time series [5]. Reliable interpretation of longitudinal behavioral data, therefore, requires baselines derived from an individual’s prior measurements rather than group averages [6].

This principle applies broadly across behavioral research, including clinical trials. For example, although psychiatric assessment is often criterion-based (i.e., is a symptom present or not?), changes in symptom severity must still be interpreted relative to an individual’s baseline psychopathology. To illustrate, consider a patient who experiences auditory hallucinations multiple times per day, who will have a fundamentally different baseline than one who hears voices only a few times per month. A change to experiencing hallucinations once a week would represent improvement for the former but worsening for the latter. Individual baselines are therefore essential for accurate interpretation of symptom trajectories in clinical research.

Longitudinal behavioral data can now be collected at an unprecedented density and frequency – through remote, digitized, or automated assessments [7–10]. These measurements produce time-series datasets, whose value lies in the trajectories and dependencies they reveal rather than single measurements. Moreover, as one task alone is rarely used to infer human behavior, measurements are collected across different assays. Such repeated and combined observations contain rich information about intra-individual patterns and dependencies, but also present methodological challenges that cannot be addressed using group-level norms.

The challenge is that the sheer volume and density of repeated measurements transform the very nature of the data being collected. There are two reasons for this. First, the experimental process becomes more familiar to the participant over time, and they learn what is expected from them and may perform better. Indeed, most cognitive and behavioral tasks are associated with familiarity, learning, and practice effects. Second, as the years, months, weeks of assessment go by, inevitably, participants will be required to engage in behavioral tasks where stimuli material may overlap with tasks they have previously taken part in. The observed effects will be the result of a combination of effects (e.g., multiplicative and task transfer effects). Thus, as data collection continues, the datasets have the potential to get messier and more complex (in terms of discerning what is attributable specifically to the task construct versus other variables).

Fig. 1 | Hybrid baselines for the interpretation of longitudinal behavioral data. The four panels show hypothetical line charts of longitudinal measurements. (a–c) show the trajectory of raw scores from two individuals for the same behavioral measure, while (d) shows the trajectory of raw scores from one person in two different measures.
(a) Group-average baselines ignore different individual averages and trajectories. Scores from person A (purple) and person B (brown) are plotted alongside a group baseline that represents the group’s average (yellow) with a defined uncertainty range (shaded light yellow area). Person A shows an individual average of scores far above the group baseline. Based on that baseline, the sudden decline in their last measurement is missed as it is still considered ‘above average’. In comparison, scores from person B decline more gradually but are flagged as they lie below the group-average baseline.
(b) Individual baselines support interpretation of longitudinal data. Plotted scores are identical to (a), but now each person has an individual baseline (baseline 1 and 2). While both baselines start off at the group average, they continuously adapt to the person’s measurements. In contrast to (a), person A’s last measurement is now flagged as it falls below their adapted individual-average baseline (...truncated)