QSAR Study of Skin Sensitization Using Local Lymph Node Assay Data
Int. J. Mol. Sci. 2004, 5, 56-66
International Journal of
Molecular Sciences
ISSN 1422-0067
© 2004 by MDPI
www.mdpi.org/ijms/
Adam Fedorowicz,1 Lingyi Zheng,2 Harshinder Singh1,2 and Eugene Demchuk1,3
1 National Institute for Occupational Safety and Health, Morgantown, WV. E-mail:
2 Department of Statistics, West Virginia University, Morgantown, WV.
3 School of Pharmacy, West Virginia University, Morgantown, WV.
Received: 28 April 2003 / Accepted: 18 July 2003 / Published: 30 January 2004
Abstract: Allergic Contact Dermatitis (ACD) is a common work-related skin disease that
often develops as a result of repetitive skin exposures to a sensitizing chemical agent. A
variety of experimental tests have been suggested to assess the skin sensitization potential.
We applied a method of Quantitative Structure-Activity Relationship (QSAR) to relate
measured and calculated physical-chemical properties of chemical compounds to their
sensitization potential. Using statistical methods, each of these properties, called molecular
descriptors, was tested for its propensity to predict the sensitization potential. A few of the
most informative descriptors were subsequently selected to build a model of skin
sensitization. In this work sensitization data for the murine Local Lymph Node Assay
(LLNA) were used. In principle, LLNA provides a standardized continuous scale suitable
for quantitative assessment of skin sensitization. However, at present many LLNA results
are still reported on a dichotomous scale, consistent with the scale of the guinea pig
tests that were widely used in past years. Therefore, in this study only a dichotomous
version of the LLNA data was used. For the statistical analysis, we relied on the logistic
regression approach. This approach provides a statistical tool for investigating and
predicting skin sensitization that is expressed only in categorical terms of activity and
non-activity. Based on the data of compounds used in this study, our results suggest a QSAR
model of ACD that is based on the following descriptors: nDB (number of double bonds),
C-003 (number of CHR3 molecular subfragments), GATS6M (autocorrelation coefficient)
and HATS6m (GETAWAY descriptor), although the relevance of the identified descriptors
to the continuous ACD QSAR has yet to be shown. The proposed QSAR model gives a
percentage of positively predicted responses of 83% on the training set of compounds, and
in cross validation it correctly identifies 79% of responses.
Keywords: ACD, LLNA, binary QSAR, logistic regression, skin sensitization.
Introduction
The Bureau of Labor Statistics estimates that occupational skin diseases constitute the second
largest group of occupational injuries in the U.S. [1]. Among them, Occupational Contact Dermatitis
(OCD) is the most common cause of work-related skin illness comprising up to 95% of registered
cases. Allergic Contact Dermatitis (ACD) may lead to severe recurrent forms of OCD because of the
long-lasting memory of the immune system. ACD, which is an adaptive, T-cell mediated immune response
[2], usually develops as a result of repetitive skin exposures to a sensitizing chemical agent. At least a
single excessive exposure is essential for the immune response to develop. Information that
supports recommended skin exposure limits, which would protect workers from
sensitizing overexposures, is therefore important to public health. A variety of experimental
tests have been suggested to assess the skin sensitization potential of a chemical [3]. Unfortunately,
many experimental protocols result in a dichotomous conclusion, more appropriate for
denial/acceptance decision-making in design and manufacturing of new chemicals rather than for
preventive protection of workers occupationally involved with sensitizing chemical agents. The murine
Local Lymph Node Assay (LLNA) has the capacity to provide dose response data that can be used as a
standardized continuous scale in the quantitative assessment of skin sensitization.
A combination of methods in statistics and computational chemistry, commonly referred to as
Quantitative Structure-Activity Relationship (QSAR) modeling, complements the experimental
approach. A QSAR method examines measured and calculated molecular
descriptors of compounds with known biological activity (in this work, the sensitization
potential) and then relates a
few of the most informative descriptors to the target bioactivity. The structure-activity relationships
constructed this way provide a means of investigating and predicting the sensitization potential of the
chemicals.
We rely on LLNA data to quantify the skin sensitization potential [4]. At present, the LLNA data
are (1) outnumbered by the long history of guinea pig assays, and (2) often reported as dichotomous
and congruous to the guinea pig data. Therefore, we started with LLNA data in a
dichotomous format to identify molecular descriptors that may be effective in the continuous-scale
LLNA QSAR. The work began with building a database of chemical names, structures, properties and
bioactivities, along with the design of appropriate software. Our immediate goal is to identify a pool of
potentially informative molecular descriptor classes that are most appropriate for QSAR modeling to
predict skin sensitization potential. In the present work, a QSAR based on a logistic regression is
proposed. The logistic regression permits construction of standard QSAR equations, in which the
activity data are represented only in terms of activity (1) or non-activity (0) values. To evaluate
molecular properties that may be associated with LLNA data on skin sensitization, 1204 molecular
descriptors were calculated and tested for their significance in predicting the skin sensitization
potential. Only a limited number of molecular descriptors were found to be statistically associated with
skin sensitization.
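The idea of testing one descriptor at a time against a 0/1 activity label can be illustrated with a small sketch. It is not the SAS procedure used in this study: the fitting routine (Newton-Raphson), the likelihood-ratio screen, the function names, and the toy descriptor values are all assumptions introduced here for illustration only.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(x, y, iters=25, tol=1e-10):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by Newton-Raphson.

    Assumes x is not constant and the classes are not perfectly
    separable by x; otherwise the maximum likelihood estimate
    does not exist or the 2x2 system is singular.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of X^T W X
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def log_likelihood(x, y, b0, b1):
    # Bernoulli log-likelihood of the fitted model.
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

# Hypothetical toy data: one descriptor value per compound and its
# dichotomous label (1 = sensitizer, 0 = non-sensitizer).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x, y)

# Intercept-only (null) model: every compound gets the base rate.
p_bar = sum(y) / len(y)
ll_null = sum(math.log(p_bar) if yi == 1 else math.log(1.0 - p_bar) for yi in y)

# Likelihood-ratio statistic; under the null hypothesis it is approximately
# chi-square distributed with 1 degree of freedom (5% critical value 3.841).
lr_stat = 2.0 * (log_likelihood(x, y, b0, b1) - ll_null)
print(f"b0={b0:.3f}  b1={b1:.3f}  LR statistic={lr_stat:.3f}")
```

Repeating such a fit for each candidate descriptor and ranking by the likelihood-ratio statistic is one simple way to shortlist descriptors before building a multivariable model. With only a handful of compounds, as in the toy data, the statistic is a rough guide rather than a reliable significance test.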
Materials and Methods
In the present study, a pool of 54 LLNA-tested compounds was used, of which 25 were sensitizers
and 29 were negative controls [5, 6]. The molecular structures of these compounds were first encoded
using the SMILES notation and subsequently transformed into three-dimensional co-ordinates using
Cerius2 from Accelrys, Inc. (Accelrys, San Diego, USA, http://www.accelrys.com/cerius2). The
Dragon 2.1 software developed by the Milano Chemometrics and QSAR Research Group
(http://www.disat.unimib.it/chm/Dragon.htm) was used to calculate a total of 1204 molecular
descriptors for each of the studied compounds. The statistical analysis was carried out using
the SAS 8.2 statistical package
[7].
The linear probability model is inadequate for modeling the probability of positive LLNA
sensitization response, since it is heteroscedastic and often leads to uninterpretable results. The logistic
r (...truncated)