Data quality in online human-subjects research: Comparisons between MTurk, Prolific, CloudResearch, Qualtrics, and SONA
PLOS ONE
RESEARCH ARTICLE
Benjamin D. Douglas1, Patrick J. Ewell2, Markus Brauer1*
1 Department of Psychology, University of Wisconsin–Madison, Madison, Wisconsin, United States of
America, 2 Department of Psychology, Kenyon College, Gambier, Ohio, United States of America
OPEN ACCESS
Citation: Douglas BD, Ewell PJ, Brauer M (2023) Data quality in online human-subjects research: Comparisons between MTurk, Prolific, CloudResearch, Qualtrics, and SONA. PLoS ONE 18(3): e0279720. https://doi.org/10.1371/journal.pone.0279720
Editor: Jeffrey S. Hallam, Kent State University,
UNITED STATES
Received: April 11, 2022
Abstract
With the proliferation of online data collection in human-subjects research, concerns have
been raised over the presence of inattentive survey participants and non-human respondents (bots). We compared the quality of the data collected through five commonly used
platforms. Data quality was indicated by the percentage of participants who meaningfully
respond to the researcher’s question (high quality) versus those who only contribute noise
(low quality). We found that compared to MTurk, Qualtrics, or an undergraduate student
sample (i.e., SONA), participants on Prolific and CloudResearch were more likely to pass
various attention checks, provide meaningful answers, follow instructions, remember previously presented information, have a unique IP address and geolocation, and work slowly
enough to be able to read all the items. We divided the samples into high- and low-quality
respondents and computed the cost we paid per high-quality respondent. Prolific ($1.90)
and CloudResearch ($2.00) were cheaper than MTurk ($4.36) and Qualtrics ($8.17). SONA
cost $0.00, yet took the longest to collect the data.
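The cost comparison above can be illustrated with a minimal sketch. It assumes the cost per high-quality respondent is the total amount paid for a sample divided by the number of respondents classified as high quality; the abstract does not spell out the formula. The class name, function names, and all numbers below are hypothetical and are not taken from the study's data.

```python
from dataclasses import dataclass


@dataclass
class PlatformSample:
    """Hypothetical summary of one platform's sample (illustrative only)."""
    name: str
    total_cost_usd: float  # everything paid to collect the sample
    n_respondents: int     # all completed responses
    n_high_quality: int    # responses that passed the quality screen


def high_quality_share(sample: PlatformSample) -> float:
    """Proportion of respondents classified as high quality."""
    if sample.n_respondents == 0:
        raise ValueError(f"{sample.name}: sample is empty")
    return sample.n_high_quality / sample.n_respondents


def cost_per_high_quality(sample: PlatformSample) -> float:
    """Assumed formula: total spend / number of high-quality respondents."""
    if sample.n_high_quality == 0:
        raise ValueError(f"{sample.name}: no high-quality respondents")
    return sample.total_cost_usd / sample.n_high_quality


# Made-up numbers: same spend and sample size, different quality rates.
samples = [
    PlatformSample("Platform A", total_cost_usd=750.0,
                   n_respondents=500, n_high_quality=375),
    PlatformSample("Platform B", total_cost_usd=750.0,
                   n_respondents=500, n_high_quality=150),
]

for s in samples:
    print(f"{s.name}: {high_quality_share(s):.0%} high quality, "
          f"${cost_per_high_quality(s):.2f} per high-quality respondent")
```

Under this assumed formula, two platforms with the same nominal price per response can differ sharply in effective cost once low-quality responses are discounted.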
Accepted: December 13, 2022
Published: March 14, 2023
Copyright: © 2023 Douglas et al. This is an open
access article distributed under the terms of the
Creative Commons Attribution License, which
permits unrestricted use, distribution, and
reproduction in any medium, provided the original
author and source are credited.
Data Availability Statement: Complete data
cannot be shared publicly because these data
contain identifiable information. The anonymized
data underlying the results presented in the study
are available at https://osf.io/ern4u/. As required
by our IRB, personally identifiable information,
including IP address, geolocation, and participant
worker ID, has been removed from the uploaded
data. We have also included an R Markdown file
accompanying the anonymized data file to allow
researchers to see the code and output used in our
analyses.
Introduction
As online data collection for human subjects through platforms like Amazon’s Mechanical
Turk (MTurk) has become increasingly common [1], so too have concerns over the quality of
these data. Do participants on these platforms provide meaningful responses? A recent study
found that data quality from respondents on MTurk has decreased since 2015 [2]. That is, the
number of incoherent responses to open-ended questions, inconsistent responses to the same
questions, responses in which participants report experiencing something impossible or highly
improbable, and patterns of responses indicating inattentive survey taking have increased.
Such patterns of responding are of particular concern as researchers have found that low-quality respondents can confound established correlations between variables, either strengthening
them [3–5] or, in some instances, changing the direction of the correlation [5]. While there is
evidence that data screening procedures can improve data quality, these results are mixed [6].
Furthermore, even with the ability to clean data post-hoc, paying for participants can be
expensive for researchers. Thus, it is important for researchers to understand their options
with respect to online data collection and the quality of the data that each online platform provides. We investigated which of the most frequently used online data collection platforms produce the highest data quality.
Funding: BD, MB BRITE Lab Grant: Behavioral Research Insights Through Experiments Lab. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Defining high-quality data
To be able to compare the data quality of various platforms, it is important to first clarify what
we mean when we describe a participant’s response as being high or low quality. Researchers
have identified various categories of low-quality responding: inattentive respondents (those
who hastily take a survey or do not follow the study’s explicit directions), dishonest respondents (those who deliberately provide false information), respondents who fail to comprehend
a study’s directions, or unreliable respondents (those who provide different responses over
time) [7]. Some researchers may be interested in data quality from the perspective of external
validity. While representativeness is not the primary concern of this article, we provide a comparison between the demographic characteristics of the participants from each platform’s sample and the US population in S1 Appendix. In the present article, we only address data quality
with respect to the percentage of participants who meaningfully respond to the researcher’s
question (high quality) versus those who only contribute noise (low quality). Please note that
our use of the term “data quality” does not concern whether a sample generalizes to a larger
population.
Data quality for online data collection
In general, researchers examining data quality from online survey platforms have consistently
found large proportions of low-quality responses [8–10]. Likewise, the Pew Research Center
has noted that as many as 4% of responses to online polls are from low-quality participants [11].
Given that some responses will inevitably be low-quality when conducting online research,
researchers using these platforms benefit from knowing which platforms offer the highest
quality data.
Previous studies concerning data quality have mainly focused on MTurk and have found
mixed results. Initially, studies found that alpha values for personality scales remained
within one hundredth of alphas based on in-person data collection, even when MTurk participants were paid as little as $0.01 [12]. Likewise, Roulin [13] noted several benefits to collecti (...truncated)