Functions for traditional and multilevel approaches to signal detection theory
RUTH HORRY
ELIN M. SKAGERBERG
University of Sussex, Brighton, England
In the present article, functions written in the freeware R are presented that calculate several measures from traditional signal detection theory for each individual in a sample, along with summary statistics for the sample. Bias-corrected and accelerated bootstrap confidence intervals are also produced. Arguments are made for using an alternative approach (multilevel generalized linear models), and a function is presented for it. These functions are part of the R package sdtalt, which is available on the Comprehensive R Archive Network. Recent data from memory recognition studies are used to illustrate these functions.
Methods from signal detection theory (SDT) are
popular in many areas of science, including psychology (Swets,
1996; Wickens, 2002). The traditional approach to SDT
within psychology is to calculate a measure of
diagnosticity (or accuracy) for each individual and then to perform
statistics on these aggregate numbers. There are several
different measures that could be used, along with arguments
among methodologists about their relative merits, but a
popular measure in psychology is d′. In SDT terminology,
d′ is the distance, in standard deviations, between the
distribution for items with whichever characteristic is being
looked for and the distribution for items without this
characteristic. Calculating aggregate values for each individual
and then performing statistics on them has both theoretical
and practical disadvantages. A theoretical disadvantage is
the fact that participants' values will be weighted equally,
which may not be appropriate if they have been involved in
different numbers of trials. A practical disadvantage is the
fact that it is difficult to include variables that vary among
trials, such as response time (RT).
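The traditional two-step calculation described above can be sketched in base R. This is a minimal illustration, not the sdtalt implementation: the counts are made up, and d′ is computed with the standard equal-variance formula, the z-transformed hit rate minus the z-transformed false alarm rate. Hit or false alarm rates of exactly 0 or 1 would give infinite values; no correction is applied here.

```r
# Hypothetical counts for three participants (illustrative data only).
counts <- data.frame(
  subno  = 1:3,
  hits   = c(18, 15, 12),   # old items called "old"
  fa     = c(4, 6, 10),     # new items called "old" (false alarms)
  misses = c(2, 5, 8),      # old items called "new"
  cr     = c(16, 14, 10)    # new items called "new" (correct rejections)
)

# Step 1: one d' per participant under the equal-variance Gaussian model.
hit_rate <- with(counts, hits / (hits + misses))
fa_rate  <- with(counts, fa / (fa + cr))
dprime   <- qnorm(hit_rate) - qnorm(fa_rate)

# Step 2: summarize the aggregate values; each participant is weighted
# equally, regardless of how many trials he or she completed.
mean_dprime <- mean(dprime)
print(round(dprime, 3))
print(round(mean_dprime, 3))
```

The equal weighting in the second step is the theoretical disadvantage noted in the text.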
The traditional SDT model, sometimes called the
Gaussian/normal model, is equivalent to a probit
regression (DeCarlo, 1998). In medical diagnostics, the
logistic model is usually used. This model is equivalent to
running a logistic regression for each individual (Zhou,
Obuchowski, & McClish, 2002). These two approaches,
which are both generalized linear models (GLMs;
McCullagh & Nelder, 1989), yield nearly equivalent results.
The main statistics of accuracy for these models are d′
and lnOR (i.e., the log-odds-ratio) for the Gaussian model
and the logistic model, respectively. And, except at the
extremes, these two measures are approximately
proportional (d′ ≈ 0.6 lnOR). The flexibility of GLMs allows
variables that can take different values for each trial to be
easily incorporated, thus addressing the practical
disadvantage listed above. Analyzing the data with a multilevel
model addresses the theoretical disadvantage from above.
It allows the analysis to be conducted in a single step and
weighs the data according to the number of trials, subject
to some further assumptions. For example, it is usually
assumed that the d′ or the lnOR values come from a
normally distributed population (for recent reviews of
multilevel modeling written for psychologists, see Baayen,
Davidson, & Bates, 2008; Hoffman & Rovine, 2007; and
Wright & London, 2009).
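The equivalence between the SDT measures and GLM coefficients can be checked for a single participant with base R's glm. The sketch below uses hypothetical counts (a hit rate of .8 and a false alarm rate of .2) and is not taken from sdtalt. The slope for the true state of the item is d′ under the probit link and lnOR under the logit link.

```r
# Hypothetical counts for one participant: 40 old items and 40 new items.
counts <- data.frame(
  isold   = c(1, 0),     # 1 = item was old, 0 = item was new
  saidold = c(32, 8),    # "old" responses: hits, then false alarms
  saidnew = c(8, 32)     # "new" responses: misses, then correct rejections
)

# Gaussian (probit) model: the slope for isold is d'.
probit_fit <- glm(cbind(saidold, saidnew) ~ isold,
                  family = binomial(link = "probit"), data = counts)

# Logistic model: the slope for isold is the log odds ratio.
logit_fit <- glm(cbind(saidold, saidnew) ~ isold,
                 family = binomial(link = "logit"), data = counts)

dprime <- unname(coef(probit_fit)["isold"])
lnOR   <- unname(coef(logit_fit)["isold"])
ratio  <- dprime / lnOR   # close to .6 away from the extremes

print(c(dprime = dprime, lnOR = lnOR, ratio = ratio))
```

Because each model has two parameters for two groups of trials, the fitted slopes match the values computed directly from the hit and false alarm rates.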
The remainder of the present article is divided into four
main sections. The two main functions described require
different data input formats. In the first section, functions
that allow the user to move between these formats are
described. In the next section, the function sdt is presented
for the traditional approach. Several different measures
are reported for each individual participant, and the user
has the option of reporting the mean (or a trimmed mean)
for the sample. Users can also enter their own statistics
into the function. Bias-corrected and accelerated (BCa)
bootstrap confidence intervals can be printed for all of
these measures. Next, the mlmsdt function is presented
for the multilevel approach. The user can include both
numeric and categorical covariates, and can see how these
relate to diagnosticity for either the normal or the logistic
model. The function produces what is called an S4 class
object, which allows many sample and individual
statistics to be evaluated. The final section is a summary of
these functions.
R was used for these functions because of its popularity,
because it is free, and because of the quality and quantity
of available functions (R Development Core Team, 2008).
We will assume that users have some experience with R
or S-Plus (and that they have access to the Internet). The
functions and example data sets have been combined into
the package sdtalt, which is part of the Comprehensive
R Archive Network (CRAN). To load the functions onto
your hard drive from within R, type
install.packages("sdtalt")
You may need to choose among locations from which to
download the package (there are about 50 mirror sites for
CRAN). To make the functions active, type
library(sdtalt)
The present article is based on sdtalt 0.1-0.
One area in which SDT is common in psychology is
memory recognition research (Banks, 1970; Wright,
Gabbert, Memon, & London, 2008). To make the descriptions
more concrete, examples from memory recognition will
be used.
Changing the Format: format2to4 and format4to2
These two functions are used to change the data between
the formats used for the sdt and mlmsdt functions. For
format2to4, three variables (subject number, true state
of the item, and what the participant says) are entered, with
a separate line for each trial. If you have a within-subjects
design, then let each condition by person correspond to a
different subject number, or run the conditions separately.
The function assumes that saying old and being old (using
the memory recognition terms) have higher values than do
saying new and being new. The output is an n × 5
numeric data matrix, where n is the number of subjects,
with columns named subno, hits, fa, misses, and
cr. You can change the column names returned by
specifying cnames = c("c1","c2","c3","c4","c5"),
where c1 to c5 are the names you want.
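The conversion from one row per trial to one row per subject can be sketched in base R as follows. This is an illustration of the idea only, not the source of format2to4. The helper name to_counts and the example data are hypothetical, and 1 is assumed to code "old" and 0 to code "new".

```r
# Hypothetical trial-level data: one row per trial.
trials <- data.frame(
  subno   = c(1, 1, 1, 1, 2, 2, 2, 2),
  isold   = c(1, 1, 0, 0, 1, 1, 0, 0),   # true state: 1 = old, 0 = new
  saysold = c(1, 0, 1, 0, 1, 1, 0, 0)    # response:   1 = "old", 0 = "new"
)

# Hypothetical helper: count the four outcomes for each subject.
to_counts <- function(d) {
  per_subject <- split(d, d$subno)
  rows <- lapply(per_subject, function(s) {
    c(subno  = s$subno[1],
      hits   = sum(s$isold == 1 & s$saysold == 1),
      fa     = sum(s$isold == 0 & s$saysold == 1),
      misses = sum(s$isold == 1 & s$saysold == 0),
      cr     = sum(s$isold == 0 & s$saysold == 0))
  })
  out <- do.call(rbind, rows)   # n x 5 numeric matrix
  rownames(out) <- NULL
  out
}

counts <- to_counts(trials)
print(counts)
```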
The default names subno, isold, and saysold are based on a typical
memory recognition study, but SDT has many other
applications. To change these, you can write, for example,
cnames = c("partno","true","says") within
the function.
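Going in the other direction, from counts per subject back to one row per trial, might look like the base R sketch below. It is not the source of format4to2. The helper name to_trials and the example counts are hypothetical, and the original order of the trials cannot be recovered from counts.

```r
# Hypothetical counts: one row per subject.
counts <- data.frame(
  subno  = c(1, 2),
  hits   = c(2, 1),
  fa     = c(1, 0),
  misses = c(0, 1),
  cr     = c(1, 2)
)

# Hypothetical helper: expand each subject's counts into trial rows.
to_trials <- function(m) {
  rows <- lapply(seq_len(nrow(m)), function(i) {
    n <- c(m$hits[i], m$fa[i], m$misses[i], m$cr[i])
    data.frame(
      subno   = rep(m$subno[i], sum(n)),
      isold   = rep(c(1, 0, 1, 0), times = n),  # hits and misses are old items
      saysold = rep(c(1, 1, 0, 0), times = n)   # hits and false alarms say "old"
    )
  })
  do.call(rbind, rows)
}

trials <- to_trials(counts)
print(trials)
```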
An obvious question is why we have not allowed both of
the main functions to accept data in either format,
recognize the form of the input (by looking at the dimensions of
the input data frame), and then call the appropriate
function. This would be simple to do and would be appropriate
if we were producing a stand-alone package. However,
because this is part of R, we assume that users will want to
perform other statistics on these reformatted data frames.
Of course, if users do not want to produce two commands
or for some reason are against creating a new data object,
they can simply nest the reformatting functions within the
others.
Traditional SDT Function: sdt
For any individual, the situation in which SDT has
traditio (...truncated)