Functions for traditional and multilevel approaches to signal detection theory

http://link.springer.com/content/pdf/10.3758%2FBRM.41.2.257.pdf

RUTH HORRY and ELIN M. SKAGERBERG
University of Sussex, Brighton, England

In the present article, functions written in the freeware R are presented that calculate several measures from traditional signal detection theory for each individual in a sample, along with summary statistics for the sample. Bias-corrected and accelerated bootstrap confidence intervals are also produced. Arguments are made for using an alternative approach (multilevel generalized linear models), and a function is presented for it. These functions are part of the R package sdtalt, which is available on the Comprehensive R Archive Network. Recent data from memory recognition studies are used to illustrate these functions.

Methods from signal detection theory (SDT) are popular in many areas of science, including psychology (Swets, 1996; Wickens, 2002). The traditional approach to SDT within psychology is to calculate a measure of diagnosticity (or accuracy) for each individual and then to perform statistics on these aggregate numbers. There are several different measures that could be used, along with arguments among methodologists about their relative merits, but a popular measure in psychology is d′. In SDT terminology, d′ is the distance, in standard deviations, between the distribution for items with whichever characteristic is being looked for and the distribution for items without this characteristic.

Calculating aggregate values for each individual and then performing statistics on them has both theoretical and practical disadvantages. A theoretical disadvantage is that participants' values will be weighted equally, which may not be appropriate if they have been involved in different numbers of trials. A practical disadvantage is that it is difficult to include variables that vary among trials, such as response time (RT).

The traditional SDT model, sometimes called the Gaussian/normal model, is equivalent to a probit regression (DeCarlo, 1998).
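The equivalence between the Gaussian model and a probit regression can be checked numerically for a single participant. The following is a minimal sketch in base R, not code from sdtalt: the counts are made-up illustrative numbers, and the variable names (hits, misses, fa, cr, isold, saysold) are chosen here only to match the memory recognition terminology. It uses the standard definition of d′ as the difference between the z-transformed hit rate and false alarm rate, and assumes neither rate is exactly 0 or 1.

```r
# Hypothetical counts for one participant (illustrative numbers, not from the article).
hits <- 40; misses <- 10   # old items: said "old" / said "new"
fa   <- 15; cr     <- 35   # new items: said "old" / said "new"

# Traditional d': difference between the z-transformed hit and false alarm rates.
hit_rate <- hits / (hits + misses)
fa_rate  <- fa / (fa + cr)
d_prime  <- qnorm(hit_rate) - qnorm(fa_rate)

# The same data as four weighted rows (one per outcome type) for a probit regression.
counts <- data.frame(
  isold   = c(1, 1, 0, 0),
  saysold = c(1, 0, 1, 0),
  n       = c(hits, misses, fa, cr)
)
fit <- glm(saysold ~ isold, family = binomial(link = "probit"),
           weights = n, data = counts)

# The slope for isold should reproduce d' up to numerical tolerance.
probit_slope <- unname(coef(fit)["isold"])
print(c(d_prime = d_prime, probit_slope = probit_slope))
```

Because the model has one binary predictor, its fitted probabilities match the observed hit and false alarm rates, so the slope on the probit scale is the same quantity as d′. The intercept is the z-transformed false alarm rate.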
In medical diagnostics, the logistic model is usually used. This model is equivalent to running a logistic regression for each individual (Zhou, Obuchowski, & McClish, 2002). These two approaches, which are both generalized linear models (GLMs; McCullagh & Nelder, 1989), yield nearly equivalent results. The main statistics of accuracy for these models are d′ for the Gaussian model and lnOR (i.e., the log odds ratio) for the logistic model. Except at the extremes, these two measures are approximately proportional (d′ ≈ .6 lnOR).

The flexibility of GLMs allows variables that can take different values for each trial to be easily incorporated, thus addressing the practical disadvantage listed above. Analyzing the data with a multilevel model addresses the theoretical disadvantage. It allows the analysis to be conducted in a single step and weights the data according to the number of trials, subject to some further assumptions. For example, it is usually assumed that the d′ or the lnOR values come from a normally distributed population (for recent reviews of multilevel modeling written for psychologists, see Baayen, Davidson, & Bates, 2008; Hoffman & Rovine, 2007; and Wright & London, 2009).

The remainder of the present article is divided into four main sections. The two main functions described require different data input formats. In the first section, functions that allow the user to move between these formats are described. In the next section, the function sdt is presented for the traditional approach. Several different measures are reported for each individual participant, and the user has the option of reporting the mean (or a trimmed mean) for the sample. Users can also enter their own statistics into the function. Bias-corrected and accelerated (BCa) bootstrap confidence intervals can be printed for all of these measures. Next, the mlmsdt function is presented for the multilevel approach.
The user can include both numeric and categorical covariates, and can see how these relate to diagnosticity for either the normal or the logistic model. The function produces what is called an S4 class object, which allows many sample and individual statistics to be evaluated. The final section is a summary of these functions.

R was used for these functions because of its popularity, because it is free, and because of the quality and quantity of available functions (R Development Core Team, 2008). We will assume that users have some experience with R or S-Plus (and that they have access to the Internet). The functions and example data sets have been combined into the package sdtalt, which is available on the Comprehensive R Archive Network (CRAN). To load the functions onto your hard drive from within R, type

    install.packages("sdtalt")

You may need to choose among locations from which to download the package (there are about 50 mirror sites for CRAN). To make the functions active, type

    library(sdtalt)

The present article is based on sdtalt 0.1-0.

One area in which SDT is common in psychology is memory recognition research (Banks, 1970; Wright, Gabbert, Memon, & London, 2008). To make the descriptions more concrete, examples from memory recognition will be used.

Changing the Format: format2to4 and format4to2

These two functions are used to change the data between the formats used for the sdt and mlmsdt functions. For format2to4, three variables (subject number, true state of the item, and what the participant says) are entered, with a separate line for each trial. If you have a within-subjects design, then let each combination of condition and person correspond to a different subject number, or run the conditions separately. The function assumes that saying "old" and being old (using the memory recognition terms) have higher values than do saying "new" and being new. The output is an n × 5 numeric data matrix, where n is the number of subjects, with columns named subno, hits, fa, misses, and cr.
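To make the two data layouts concrete, the reshaping from one row per trial to one row of counts per subject can be sketched in base R. This is only an illustration of the transformation described above, not the implementation of format2to4: the trial data are invented, and coding old as 1 and new as 0 is an assumption consistent with the rule that "old" takes the higher value.

```r
# Hypothetical trial-level data: one row per trial (values are made up).
trials <- data.frame(
  subno   = c(1, 1, 1, 1, 2, 2, 2, 2),
  isold   = c(1, 1, 0, 0, 1, 1, 0, 0),   # 1 = item really was old
  saysold = c(1, 0, 0, 0, 1, 1, 1, 0)    # 1 = participant said "old"
)

# Classify every trial into one of the four SDT outcomes.
trials$hits   <- as.integer(trials$isold == 1 & trials$saysold == 1)
trials$fa     <- as.integer(trials$isold == 0 & trials$saysold == 1)
trials$misses <- as.integer(trials$isold == 1 & trials$saysold == 0)
trials$cr     <- as.integer(trials$isold == 0 & trials$saysold == 0)

# Sum the outcome indicators within each subject: one row per subject.
counts <- aggregate(cbind(hits, fa, misses, cr) ~ subno, data = trials, FUN = sum)
print(counts)
```

The result here is a data frame with the five columns named in the text; as.matrix(counts) would turn it into a numeric matrix of the same shape.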
You can change the column names returned by specifying cnames = c("c1","c2","c3","c4","c5"), where c1 through c5 are the names you want. The names subno, isold, and saysold are based on a typical memory recognition study, but SDT has many other applications. To change these, you can write, for example, cnames = c("partno","true","says") within the function.

An obvious question is why we have not allowed both of the main functions to accept data in either format, recognize the form of the input (by looking at the dimensions of the input data frame), and then call the appropriate function. This would be simple to do and would be appropriate if we were producing a stand-alone package. However, because this is part of R, we assume that users will want to perform other statistics on these reformatted data frames. Of course, if users do not want to issue two commands or for some reason are against creating a new data object, they can simply nest the reformatting functions within the others.

Traditional SDT Function: sdt

For any individual, the situation in which SDT has traditio (...truncated)