Testing validity inferences for Genetic Drift Inventory scores using Rasch modeling and item order analyses

https://evolution-outreach.biomedcentral.com/track/pdf/10.1186/s12052-018-0082-x

Tornabene et al. Evo Edu Outreach (2018) 11:6
https://doi.org/10.1186/s12052-018-0082-x

RESEARCH ARTICLE
Open Access

Robyn E. Tornabene¹*, Erik Lavington² and Ross H. Nehm¹

Abstract

Background: Concept inventories (CIs) are commonly used tools for assessing student understanding of scientific and naive ideas, yet the body of empirical evidence supporting the inferences drawn from CI scores is often limited in scope and remains deeply rooted in Classical Test Theory. The Genetic Drift Inventory (GeDI) is a relatively new CI designed to diagnose undergraduate students’ conceptual understanding of genetic drift. This study seeks to expand the sources of evidence supporting validity and reliability inferences drawn from GeDI scores. Specifically, our research focused on: (1) GeDI instrument and item properties as revealed by Rasch modeling, (2) item order effects on response patterns, and (3) generalization to a new geographic sample.

Methods: A sample of 336 advanced undergraduate biology majors completed the GeDI, each taking one of four equivalent versions. Rasch analysis was used to examine instrument dimensionality, item fit properties, person and item reliability, and alignment of item difficulty with person ability. To investigate whether the presentation order of GeDI item suites influenced overall student performance, scores were compared across randomly assigned, equivalent test versions that differed only in item-suite presentation order. Scores from this sample were also compared with scores from similar but geographically distinct samples to examine the generalizability of score patterns.

Results: Rasch analysis indicated that the GeDI was unidimensional, with good fit to the Rasch model. Items had high reliability and were well matched to the ability of the sample. Person reliability was low.
Rotating the GeDI’s item suites had no significant impact on scores, suggesting that each suite functioned independently. Scores from our new sample from the northeastern United States were comparable to those from other geographic regions and provide evidence in support of score generalizability. Overall, most instrument features were robust. Suggestions for improvement include: (1) incorporation of additional items to differentiate high-ability persons and improve person reliability, and (2) re-examination of items with redundant or low difficulty levels.

Conclusions: Rasch analyses of the GeDI instrument and item order effects expand the range and quality of evidence in support of validity claims and illustrate changes that are likely to improve the quality of this (and other) evolution education instruments.

Keywords: Concept inventories, Assessment, Validity evidence, Psychometrics, Genetic drift, Evolution

*Correspondence:
¹ Science Education Program, Institute for STEM Education, Stony Brook University, 092 Life Sciences Building, Stony Brook, NY 11794-5233, USA
Full list of author information is available at the end of the article

© The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Introduction

The accurate measurement of student understanding is an essential feature of educational practice because it provides evidence-based insights into students’ conceptual ecologies, guides learning progression development, and permits empirical evaluation of the efficacy of alternative educational interventions (National Research Council 2001).
A diverse array of assessment tools and types has been developed for evolution educators (Table 1). They range from static, multiple-choice formats (e.g., Price et al. 2014) to open-ended questions whose answers can be scored by computers (e.g., Moharreri et al. 2014). Available assessment tools cover many different evolutionary concepts, including natural selection, evo-devo, genetic drift, and macroevolution. These assessments vary significantly in the types of information that they can reveal about student understanding, in the situations in which they are most appropriately implemented, and in the robustness of the inferences that they are able to support (American Association for the Advancement of Science (AAAS) 2011; American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (AERA, APA, NCME) 2014; Nehm and Schonfeld 2008).

Concept inventories (CIs) are a type of research-based educational assessment designed to rapidly reveal (through easy administration and scoring) students’ preferences for normative (i.e., scientifically accurate) or non-normative (e.g., preconceptions, misconceptions) facets of core ideas (e.g., natural selection, genetic drift) (Nehm and Haertig 2012, pp. 56–57). Although CIs have become indispensable tools for assessing undergraduate students’ conceptual understandings of many core ideas in the sciences (e.g., force and motion, chemical bonding), few have been carefully evaluated in terms of (1) the forms of validity outlined in the Standards for Educational and Psychological Testing (AERA, APA, NCME 2014), (2) item order effects and associated response biases (Federer et al. 2015, 2016), or (3) item properties using ratio-scaled data (generated by Rasch or Item Response Theory [IRT] analyses; Boone, Staver and Yale 2014).
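Rasch analysis places persons and items on a common logit scale: under the dichotomous Rasch model, the probability that person n answers item i correctly is P = exp(θₙ − bᵢ) / (1 + exp(θₙ − bᵢ)), where θₙ is person ability and bᵢ is item difficulty. The Python sketch below is a minimal illustration of the standard model and of three quantities the abstract reports (ability estimates, item fit, and person reliability); it is not the authors' software or estimation procedure, which this section does not specify. All function names and the example difficulties are hypothetical, and item difficulties are assumed to be already known rather than estimated.

```python
import math
from typing import List, Sequence, Tuple


def rasch_prob(theta: float, b: float) -> float:
    """Dichotomous Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(raw_score: int, difficulties: Sequence[float],
                   tol: float = 1e-8, max_iter: int = 100) -> float:
    """Maximum-likelihood ability (logits) for one raw score, item difficulties held fixed.

    Newton-Raphson on the log-likelihood: its gradient is raw_score - sum(P_i)
    and its negative second derivative is sum(P_i * (1 - P_i)).
    """
    k = len(difficulties)
    if not 0 < raw_score < k:
        raise ValueError("zero and perfect scores have no finite ML estimate")
    # Start from the log-odds of the raw score, shifted by the mean item difficulty.
    theta = math.log(raw_score / (k - raw_score)) + sum(difficulties) / k
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        info = sum(p * (1.0 - p) for p in probs)  # test information at theta
        step = (raw_score - sum(probs)) / info
        theta += step
        if abs(step) < tol:
            break
    return theta


def item_fit(responses: Sequence[Sequence[int]], thetas: Sequence[float],
             difficulties: Sequence[float]) -> List[Tuple[float, float]]:
    """(infit MNSQ, outfit MNSQ) per item; responses[n][i] is 1 (correct) or 0."""
    fits = []
    for i, b in enumerate(difficulties):
        sum_z2 = 0.0   # sum of squared standardized residuals (outfit)
        sum_r2 = 0.0   # sum of squared raw residuals (infit numerator)
        sum_var = 0.0  # sum of model variances (infit denominator)
        for n, theta in enumerate(thetas):
            p = rasch_prob(theta, b)
            var = p * (1.0 - p)
            resid = responses[n][i] - p
            sum_z2 += resid * resid / var
            sum_r2 += resid * resid
            sum_var += var
        # Infit is information-weighted; outfit is an unweighted mean.
        fits.append((sum_r2 / sum_var, sum_z2 / len(thetas)))
    return fits


def person_reliability(thetas: Sequence[float], difficulties: Sequence[float]) -> float:
    """Person separation reliability: (observed var - mean error var) / observed var."""
    n = len(thetas)
    mean = sum(thetas) / n
    observed_var = sum((t - mean) ** 2 for t in thetas) / n
    if observed_var == 0.0:
        return 0.0
    # Squared standard error of each ability estimate is 1 / test information.
    error_vars = [
        1.0 / sum(rasch_prob(t, b) * (1.0 - rasch_prob(t, b)) for b in difficulties)
        for t in thetas
    ]
    mean_error_var = sum(error_vars) / n
    return max(0.0, (observed_var - mean_error_var) / observed_var)


if __name__ == "__main__":
    # Hypothetical difficulties (logits) for a short five-item test.
    difficulties = [-1.5, -0.5, 0.2, 1.0, 2.0]
    thetas = [estimate_theta(r, difficulties) for r in (1, 2, 3, 4)]
    print([round(t, 2) for t in thetas])
    print(round(person_reliability(thetas, difficulties), 2))
```

Infit and outfit mean-squares near 1.0 indicate responses about as noisy as the model expects. Because each ability's error variance is the reciprocal of test information, adding items shrinks measurement error and raises person reliability, which is one way to see why the abstract recommends additional items. A full analysis would estimate item and person parameters jointly (for example, by joint or conditional maximum likelihood); this sketch does not attempt that.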
Consequently, validity evidence, that is, evidence that the measures derived from CIs accurately

Table 1 Evolution education instruments measuring knowledge of evolutionary processes: potential to elicit scientific and naive ideas about adaptive and non-adaptive evolution

Instrument | Formatᵃ and target population | Conceptions measuredᵇ: NS-S | NS-N | GD-S | GD-N
Bishop and Anderson’s diagnostic instrument (Bishop and Anderson 1990) | Combination MC and ORᶜ: undergraduates (introductory biology non-majors) | Intended | Intended | Possibleᵈ | Possibleᵈ
Concept Inventory of Natural Selection (CINS) (Anderson, Fisher and Norman 2002) | 20 MC: undergraduates | Intended | Intended | |
Assessing Contextual Reasoning about Natural Selection (ACORNS) (Nehm et al. 2012) | Flexible number OR: undergraduates | Intended | Intended | Possibleᵈ | Possibleᵈ
Conceptual Assessment of Natural Selection (CANS) (Kalinowski et al. 2016) | 24 MC: undergraduates (introductory biology majors) | Intended | Intendedᵉ | |
Daphne Assessment for Natural Selection (DANS) (Furtak et al. 2014) | 26 MC: high school | Intended | | |
Genetic Drift Inventory (GeDI) (...truncated)