Testing validity inferences for Genetic Drift Inventory scores using Rasch modeling and item order analyses
Tornabene et al. Evo Edu Outreach (2018) 11:6
https://doi.org/10.1186/s12052-018-0082-x
RESEARCH ARTICLE
Open Access
Robyn E. Tornabene^1*, Erik Lavington^2 and Ross H. Nehm^1
Abstract
Background: Concept inventories (CIs) are commonly used tools for assessing student understanding of scientific
and naive ideas, yet the body of empirical evidence supporting the inferences drawn from CI scores is often limited
in scope and remains deeply rooted in Classical Test Theory. The Genetic Drift Inventory (GeDI) is a relatively new CI
designed for use in diagnosing undergraduate students’ conceptual understanding of genetic drift. This study seeks to
expand the sources of evidence examining validity and reliability inferences produced by GeDI scores. Specifically, our
research focused on: (1) GeDI instrument and item properties as revealed by Rasch modeling, (2) item order effects on
response patterns, and (3) generalization to a new geographic sample.
Methods: A sample of 336 advanced undergraduate biology majors each completed one of four equivalent versions of the GeDI.
Rasch analysis was used to examine instrument dimensionality, item fit properties, person and item reliability, and
alignment of item difficulty with person ability. To investigate whether the presentation order of GeDI item suites
influenced overall student performance, scores were compared from randomly assigned, equivalent test versions
varying in item-suite presentation order. Scores from this sample were also compared with scores from similar but
geographically distinct samples to examine generalizability of score patterns.
Results: Rasch analysis indicated that the GeDI was unidimensional, with good fit to the Rasch model. Items had
high reliability and were well matched to the ability of the sample. Person reliability was low. Rotating the GeDI’s item
suites had no significant impact on scores, suggesting each suite functioned independently. Scores from our new
sample from the NE United States were comparable to those from other geographic regions and provide evidence
in support of score generalizability. Overall, most instrument features were robust. Suggestions for improvement
include: (1) incorporation of additional items to differentiate high-ability persons and improve person reliability, and
(2) re-examination of items with redundant or low difficulty levels.
Conclusions: Rasch analyses of the GeDI instrument and item order effects expand the range and quality of evidence in support of validity claims and illustrate changes that are likely to improve the quality of this (and other)
evolution education instruments.
Keywords: Concept inventories, Assessment, Validity evidence, Psychometrics, Genetic drift, Evolution
*Correspondence:
^1 Science Education Program, Institute for STEM Education, Stony Brook University, 092 Life Sciences Building, Stony Brook, NY 11794-5233, USA
Full list of author information is available at the end of the article
© The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium,
provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license,
and indicate if changes were made.
Introduction
The accurate measurement of student understanding
is an essential feature of educational practice because
it provides evidence-based insights into students’ conceptual ecologies, guides learning progression development, and permits empirical evaluation of the efficacy
of alternative educational interventions (National
Research Council 2001). A diverse array of assessment
tools and types have been developed for evolution
educators (Table 1). They range from static, multiple-choice formats (e.g., Price et al. 2014) to open-ended
questions whose answers can be scored by computers
(e.g., Moharreri et al. 2014). Available assessment tools
cover many different evolutionary concepts, including
natural selection, evo-devo, genetic drift, and macroevolution. These assessments vary significantly in the
types of information that they can reveal about student
understanding, in the situations in which they are most
appropriately implemented, and in the robustness of
the inferences that they are able to support (American
Association for the Advancement of Science (AAAS)
2011; American Educational Research Association,
American Psychological Association, and National
Council on Measurement in Education (AERA, APA,
NCME) 2014; Nehm and Schonfeld 2008).
Concept inventories (CIs) are a type of research-based educational assessment designed to rapidly reveal
(through easy administration and scoring) students’ preferences for normative (i.e., scientifically accurate) or non-normative (e.g., preconceptions, misconceptions) facets
of core ideas (e.g., natural selection, genetic drift) (Nehm
and Haertig 2012, pp. 56–57). Although CIs have become
indispensable tools for assessing undergraduate students’
conceptual understandings of many core ideas in the sciences (e.g., force and motion, chemical bonding), few
have been carefully evaluated in terms of (1) the forms
of validity outlined in the Standards for Educational and
Psychological Testing (American Educational Research
Association, American Psychological Association, and
National Council on Measurement in Education (AERA,
APA, NCME) 2014), (2) item order effects and associated
response biases (Federer et al. 2015, 2016), or (3) item
properties using ratio-scaled data (generated by Rasch
or Item Response Theory [IRT] analyses; Boone, Staver
and Yale 2014). Consequently, validity evidence—that is,
evidence that the measures derived from CIs accurately
Table 1 Evolution education instruments measuring knowledge of evolutionary processes: potential to elicit scientific and naive ideas about adaptive and non-adaptive evolution
Instrument | Format^a and target population | Conceptions measured^b: NS-S | NS-N | GD-S | GD-N
Bishop and Anderson’s diagnostic instrument (Bishop and Anderson 1990) | Combination MC and OR^c: undergraduates (introductory biology non-majors) | Intended | Intended | Possible^d | Possible^d
Concept Inventory of Natural Selection (CINS) (Anderson, Fisher and Norman 2002) | 20 MC: undergraduates | Intended | Intended | |
Assessing Contextual Reasoning about Natural Selection (ACORNS) (Nehm et al. 2012) | Flexible number OR: undergraduates | Intended | Intended | Possible^d | Possible^d
Conceptual Assessment of Natural Selection (CANS) (Kalinowski et al. 2016) | 24 MC: undergraduates (introductory biology majors) | Intended | Intended | ^e |
Daphne Assessment for Natural Selection (DANS) (Furtak et al. 2014) | 26 MC: high school | Intended
Genetic Drift Inventory (GeDI) (...truncated)