Data quality in online human-subjects research: Comparisons between MTurk, Prolific, CloudResearch, Qualtrics, and SONA (pdf)

https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0279720&type=printable

PLOS ONE

RESEARCH ARTICLE

Data quality in online human-subjects research: Comparisons between MTurk, Prolific, CloudResearch, Qualtrics, and SONA

Benjamin D. Douglas1, Patrick J. Ewell2, Markus Brauer1*

1 Department of Psychology, University of Wisconsin–Madison, Madison, Wisconsin, United States of America, 2 Department of Psychology, Kenyon College, Gambier, Ohio, United States of America

OPEN ACCESS

Citation: Douglas BD, Ewell PJ, Brauer M (2023) Data quality in online human-subjects research: Comparisons between MTurk, Prolific, CloudResearch, Qualtrics, and SONA. PLoS ONE 18(3): e0279720. https://doi.org/10.1371/journal.pone.0279720

Editor: Jeffrey S. Hallam, Kent State University, UNITED STATES

Received: April 11, 2022

Abstract

With the proliferation of online data collection in human-subjects research, concerns have been raised over the presence of inattentive survey participants and non-human respondents (bots). We compared the quality of the data collected through five commonly used platforms. Data quality was indicated by the percentage of participants who meaningfully respond to the researcher’s question (high quality) versus those who only contribute noise (low quality). We found that compared to MTurk, Qualtrics, or an undergraduate student sample (i.e., SONA), participants on Prolific and CloudResearch were more likely to pass various attention checks, provide meaningful answers, follow instructions, remember previously presented information, have a unique IP address and geolocation, and work slowly enough to be able to read all the items. We divided the samples into high- and low-quality respondents and computed the cost we paid per high-quality respondent. Prolific ($1.90) and CloudResearch ($2.00) were cheaper than MTurk ($4.36) and Qualtrics ($8.17). SONA cost $0.00, yet took the longest to collect the data.
Accepted: December 13, 2022

Published: March 14, 2023

Copyright: © 2023 Douglas et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: Complete data cannot be shared publicly because these data contain identifiable information. The anonymized data underlying the results presented in the study are available at https://osf.io/ern4u/. As required by our IRB, personally identifiable information including IP address, Geo Location, and participant worker ID have been removed from the uploaded data. We have also included an R Markdown file accompanying the anonymized data file to allow researchers to see the code and output used in our analyses.

Introduction

As online data collection for human subjects through platforms like Amazon’s Mechanical Turk (MTurk) becomes increasingly common [1], so too have concerns over the quality of these data. Do participants on these platforms provide meaningful responses? A recent study found that data quality from respondents on MTurk has decreased since 2015 [2]. That is, the number of incoherent responses to open-ended questions, inconsistent responses to the same questions, responses in which participants report experiencing something impossible or highly improbable, and patterns of responses indicating inattentive survey taking have increased. Such patterns of responding are of particular concern as researchers have found that low-quality respondents can confound established correlations between variables, either strengthening them [3–5] or, in some instances, changing the direction of the correlation [5]. While there is evidence that data screening procedures can improve data quality, these results are mixed [6].
Furthermore, even with the ability to clean data post-hoc, paying for participants can be expensive for researchers. Thus, it is important for researchers to understand their options with respect to online data collection and the quality of the data that each online platform provides. We investigated which of the most frequently used online data collection platforms produce the highest data quality.

Funding: BD, MB BRITE Lab Grant: Behavioral Research Insights Through Experiments Lab. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Defining high-quality data

To be able to compare the data quality of various platforms, it is important to first clarify what we mean when we describe a participant’s response as being high or low quality. Researchers have identified various categories of low-quality responding: inattentive respondents (those who hastily take a survey or do not follow the study’s explicit directions), dishonest respondents (those who deliberately provide false information), respondents who fail to comprehend a study’s directions, or unreliable respondents (those who provide different responses over time) [7].

Some researchers may be interested in data quality from the perspective of external validity. While representativeness is not the primary concern of this article, we provide a comparison between the demographic characteristics of the participants from each platform’s sample and the US population in S1 Appendix. In the present article, we only address data quality with respect to the percentage of participants who meaningfully respond to the researcher’s question (high quality) versus those who only contribute noise (low quality).
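The definition above treats data quality as a proportion, and the abstract reports a cost per high-quality respondent. A minimal sketch of both quantities follows, assuming that the cost figure is the total amount paid divided by the number of respondents classified as high quality. The class name, function names, and all numbers are hypothetical illustrations, not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass
class PlatformSample:
    """Hypothetical summary of one platform's sample (not data from the study)."""
    name: str
    n_recruited: int      # number of respondents who were paid
    n_high_quality: int   # number classified as high quality
    total_cost: float     # total amount paid, in dollars


def high_quality_rate(sample: PlatformSample) -> float:
    """Share of recruited respondents classified as high quality."""
    if sample.n_recruited <= 0:
        raise ValueError("n_recruited must be positive")
    return sample.n_high_quality / sample.n_recruited


def cost_per_high_quality(sample: PlatformSample) -> float:
    """Total cost divided by the number of high-quality respondents.

    Assumed formula: the source only states that a cost per
    high-quality respondent was computed.
    """
    if sample.n_high_quality <= 0:
        raise ValueError("no high-quality respondents; cost is undefined")
    return sample.total_cost / sample.n_high_quality


# Made-up numbers for illustration only.
example = PlatformSample("Platform A", n_recruited=200,
                         n_high_quality=150, total_cost=300.0)
print(f"{example.name}: {high_quality_rate(example):.0%} high quality")
print(f"{example.name}: ${cost_per_high_quality(example):.2f} per high-quality respondent")
```

Under this reading, a platform with a low nominal price per participant can still be the more expensive option once low-quality respondents are excluded from the denominator.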
Please note that our use of the term “data quality” does not concern whether a sample generalizes to a larger population.

Data quality for online data collection

In general, researchers examining data quality from online survey platforms have consistently found large proportions of low-quality responses [8–10]. Likewise, the Pew Research Center has noted as much as 4% of responses to online polls are from low-quality participants [11]. Given that some responses will inevitably be low-quality when conducting online research, researchers using these platforms benefit from knowing which platforms offer the highest quality data.

Previous studies concerning data quality have mainly focused on MTurk and have found mixed results. Initially, the studies found that alpha values for personality scales remained within one hundredth of alphas based on in-person data collection, even when MTurk participants were paid as little as $0.01 [12]. Likewise, Roulin [13] noted several benefits to collecti (...truncated)