Visual Agreement Analyses of Traditional Chinese Medicine: A Multiple-Dimensional Scaling Approach
Lun-Chien Lo,1,2 John Y. Chiang,3 Tsung-Lin Cheng,2 and Pei-Shuan Shieh2
1Department of Traditional Chinese Medicine, Changhua Christian Hospital, Changhua 50006, Taiwan
2Graduate Institute of Statistics and Information Science, National Changhua University of Education, Changhua 50058, Taiwan
3Department of Computer Science and Engineering, National Sun Yat-Sen University, Kaohsiung 80424, Taiwan
Received 4 July 2012; Revised 9 August 2012; Accepted 17 August 2012
Academic Editor: Zhaoxiang Bian
Copyright © 2012 Lun-Chien Lo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Studying the agreement of TCM diagnoses with a powerful statistical tool is critical for providing objective evaluations. Several previous studies have examined the consistency of TCM diagnoses, and the results indicate that agreement is low. Traditional agreement measures provide only a single value, which is not sufficient to judge whether the agreement among several raters is strong. In light of this observation, a novel visual agreement analysis for TCM via multiple-dimensional scaling (MDS) is proposed in this study. When latent clusters are present among the raters, MDS can serve as an effective tool for distinguishing them. A group of 11 experienced TCM practitioners from the Chinese Medicine Department at Changhua Christian Hospital (CCH) in Taiwan, with clinical experience ranging from 3 to 15 years (mean 5.5 years), was asked to diagnose a total of fifteen tongue images according to the Eight Principles derived from TCM theory. The results of the statistical analysis confirm that, when latent clusters are present among the raters, MDS distinguishes them effectively.
1. Introduction
Reliability is an indispensable requirement in biomedical diagnostics. Intraclass and interclass reliability measures have been proposed by many authors [1–7]. Many works study agreement measures for Western medical diagnostics, but only a few perform agreement analysis for TCM practitioners. In most of the literature concerning TCM agreement, even when complex combinations of TCM diagnostics are considered, a so-called proportion-of-agreement measure is adopted. The proportion of agreement, as evidence shows, overlooks the possible bias caused by chance. To remedy this bias, Cohen proposed his renowned kappa measure. Soon after his contribution, the weighted kappa, Fleiss' kappa, and related statistics were proposed to deal with more complex data types and more raters. A previous study considered a reliability measure called Krippendorff's alpha to investigate the agreement of tongue diagnoses when there are many practitioners and the data are ordinal; a Krippendorff's alpha coefficient of 0.7343 was reported in that study.
The core of diagnosis in Chinese medicine is "pattern identification/syndrome differentiation and treatment," with inspection, listening and smelling examination, inquiry, and palpation as the bases. Inspection tops the four diagnostics, and tongue diagnosis is a crucial part of it. The tongue is connected to the internal organs through meridians; thus the conditions of the organs, qi, blood, and body fluids, as well as the degree and progression of disease, are all reflected on the tongue. Organ conditions and the properties and variations of pathogens can be revealed through observation of the tongue. Tongue inspection covers the shape, color, and coating of the tongue; that is, tongue diagnosis is three-dimensional. Krippendorff's alpha is a good approach for analyzing the agreement of many TCM practitioners with ordinal data. However, it is complex, and it renders only a single index representing agreement. More importantly, Krippendorff's alpha cannot handle the high-dimensional ordinal data obtained through TCM tongue diagnosis. These two pitfalls invalidate the application of Krippendorff's alpha to the analysis of multidimensional agreement data, and other effective means have to be sought.
In light of the previous observations, we aim to propose an effective approach that simultaneously handles high-dimensional ordinal data and the case where clusters are present in the rating results.
A single agreement value can only represent the "average mass" of agreement. We can hardly derive any meaningful information from a single agreement measure, especially when clusters are present. For example, in the diagnosis of tongue shapes (thick, medium, and thin), suppose that three TCM practitioners judge some patients as "thick" and another three practitioners judge them as "medium." We might reach a low-agreement conclusion, even though the agreement within each of the two groups is strong. Interestingly, although overall agreement might be low, the different TCM prescriptions could work equally well. From this perspective, an alternative approach such as multiple-dimensional scaling (MDS) may prove a better way to analyze the agreement of diagnoses among many TCM practitioners with high-dimensional ordinal data. Kupper and Hafner proposed a method to assess the extent of interrater agreement when each unit to be rated is characterized by a subset of distinct nominal attributes. When the attribute data are high-dimensional, the interrater agreement can be treated as the similarity used in MDS. The essence of MDS is to represent observed similarities or dissimilarities in a geometrical model by embedding the stimuli of interest in a coordinate space, so that a specified distance measure, for example, the Euclidean distance between points in the space, represents the observed proximities. In other words, MDS searches for a low-dimensional space in which each point represents a stimulus and the distance between points corresponds to dissimilarity.
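To illustrate the embedding idea, the sketch below implements classical (Torgerson) scaling in Python from a toy dissimilarity matrix. Note that the paper itself uses Kruskal's nonmetric MDS; this simpler classical variant and the invented matrix are for demonstration only.

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed n points in k dimensions from a symmetric n x n
    dissimilarity matrix D via classical (Torgerson) scaling."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]            # keep the k largest
    scale = np.sqrt(np.clip(w[idx], 0, None))
    return V[:, idx] * scale                 # n x k coordinate matrix

# Toy dissimilarities for four raters: the first, second, and fourth
# agree closely, while the third is an outlier.
D = np.array([[0.0, 0.1, 0.9, 0.2],
              [0.1, 0.0, 0.8, 0.3],
              [0.9, 0.8, 0.0, 0.9],
              [0.2, 0.3, 0.9, 0.0]])
X = classical_mds(D)
```

In the resulting two-dimensional plot, the outlying rater sits far from the tight cluster of the other three, which is exactly the kind of latent structure a single agreement coefficient would hide.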
In this study, we recruited eleven TCM practitioners with ages ranging from 29 to 47. A total of 15 tongue pictures, taken by the Automatic Tongue Diagnosis System (ATDS), which was developed to extract tongue features and assist clinical diagnosis, were randomly chosen.
For each of these fifteen tongue images, the recruited TCM practitioners had to identify the patterns according to the Eight Principles. The Eight Principal syndromes are made up of four pairs of opposites, namely, Yin and Yang, Cold and Hot, Empty and Full (or Deficiency and Excess), and Exterior and Interior. A symptom or disease can possess several of these properties simultaneously.
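For concreteness, a rater's Eight-Principles judgment can be stored as a binary attribute vector over the eight principles; the ordering and function below are a hypothetical encoding, not the paper's actual data format.

```python
# Hypothetical fixed ordering of the eight principles.
PRINCIPLES = ["Yin", "Yang", "Cold", "Hot",
              "Empty", "Full", "Exterior", "Interior"]

def encode(diagnosis):
    """Turn a set of identified patterns into a binary attribute vector."""
    return [1 if p in diagnosis else 0 for p in PRINCIPLES]

# A diagnosis may combine several principles simultaneously.
v = encode({"Yin", "Cold", "Empty", "Interior"})
# v == [1, 0, 1, 0, 1, 0, 0, 1]
```

Under this representation, each rater contributes one eight-component binary vector per patient, which is the high-dimensional attribute data the agreement measures below operate on.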
2. Method and Results
2.1. Patients and TCM Tongue Inspectors
Fifteen pictures of tongues are randomly selected from the archive of the Department of TCM, Changhua Christian Hospital (CCH). The pictures were taken by a digital image capturing and analyzing system called ATDS and were rated by eleven TCM practitioners with ages ranging from 29 to 47. The recruited TCM physicians have to classify each image, based on the Eight Principles, according to the features revealed by the tongues.
2.2. Statistical Analysis
In this study we use four dissimilarity measures to conduct a nonmetric MDS, which was first proposed by Kruskal [11, 12]. The four measures are Kupper and Hafner's IAMA (interrater agreement for multiple attributes), the mean character difference (MCD), the index of association (IOA), and the averaged Cohen's kappa. The IAMA measure is a chance-corrected concordance. Among these four measures, IAMA and Cohen's kappa are similarity measures, while the other two measure dissimilarity. These four measures are described in detail in the Appendix. Table 1 summarizes the patterns of the fifteen patients as identified by the eleven TCM physicians of CCH according to the Eight Principles; the letters in the body of the table refer to specific TCM physicians. Table 2 lists the dissimilarities obtained by IAMA among the TCM physicians. For example, the interrater agreement between rater A and rater C is 0.2462; therefore, the dissimilarity can be defined as 1 − 0.2462 = 0.7538. Naturally, the diagonal entries are identically zero. The MDS graphs of the agreement measures under the four approaches are illustrated in Figure 1. The upper-left graph uses the IAMA measure to conduct MDS, the upper-right one corresponds to the MCD method, the lower-left one represents the IOA method, and the lower-right one employs the Cohen's kappa averaged over each attribute of the eight patterns between two distinct raters.
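The fourth measure, the averaged Cohen's kappa, can be sketched as follows: compute a per-attribute kappa between two raters' binary ratings across patients, average over attributes, and convert to a dissimilarity via 1 − mean kappa. The data layout and function names are our own assumptions.

```python
import numpy as np

def cohen_kappa(x, y):
    """Cohen's kappa for two binary rating vectors of equal length."""
    x, y = np.asarray(x), np.asarray(y)
    po = np.mean(x == y)                                  # observed agreement
    # Chance agreement from each rater's marginal frequencies.
    pe = np.mean(x) * np.mean(y) + np.mean(1 - x) * np.mean(1 - y)
    return 1.0 if pe == 1.0 else (po - pe) / (1 - pe)     # chance-corrected

def mean_kappa_dissimilarity(A, B):
    """1 minus the kappa averaged over attributes; A and B are
    (patients x attributes) binary matrices for two raters."""
    kappas = [cohen_kappa(A[:, j], B[:, j]) for j in range(A.shape[1])]
    return 1.0 - float(np.mean(kappas))

# Toy data: 3 patients, 2 attributes, the raters differ on one entry.
A = np.array([[1, 0], [0, 1], [1, 1]])
B = np.array([[1, 0], [0, 1], [0, 1]])
d = mean_kappa_dissimilarity(A, B)   # per-attribute kappas 0.4 and 1.0, so d = 0.3
```

Computing this dissimilarity for every pair of raters yields a symmetric matrix with zero diagonal, directly usable as MDS input.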
Table 1: A summary of the patterns of the fifteen patients identified by the eleven TCM physicians. Letters in the table entries correspond to the different participating TCM physicians.
Table 2: Dissimilarities obtained by IAMA among the TCM physicians.
Figure 1: MDS graphs for multiple attributes of Eight Principles for 11 TCM practitioners and 15 patients.
We summarize the diagnoses of the patterns of the fifteen patients in Table 1. MDS analysis can then be conducted on the four similarity or dissimilarity measures mentioned previously. Figure 1 shows that the MDS graphs produced by IAMA and Cohen's kappa are similar, and rater C is an outlier in all four graphs. The graphs by IAMA and Cohen's kappa share further characteristics: raters I and F lie a little away from the biggest cluster formed by raters B, D, E, G, J, and K, and raters A and H form a small cluster. The traditional MDS distances using MCD or IOA lead to similar results: in their graphs, raters C, I, H, and A are isolated singletons, and there exists only one cluster, formed by raters B, D, E, F, G, J, and K. In all four graphs, raters B, D, E, G, J, and K form a cluster.
In TCM diagnostics, practitioners are routinely confronted with a multiple-dimensional qualitative problem of symptom identification. Conventionally, a diagnosis according to the Eight Principles summarizes the dynamics of a patient pursuing TCM treatment. When a TCM practitioner receives the information gathered by way of the four diagnostics, "inspection, listening (smelling), inquiring, and palpation," the practitioner has to identify the patterns that are coherent with the symptoms exhibited by the patient. Therefore, how to measure the agreement of diagnoses based on the vector of attributes observed by TCM practitioners is an important issue.
For a single attribute, researchers typically adopt Cohen's kappa, Fleiss' kappa, or Krippendorff's alpha to obtain a single-valued agreement measure. These popular agreement measures share a drawback: there is no rule of thumb for judging the level of agreement. In this study, we introduce a novel approach to deriving interrater agreement, including the IAMA proposed by Kupper and Hafner and the averaged Cohen's kappa, to calculate the dissimilarities between any pair of raters. Using these dissimilarity measures, MDS analysis can be conducted and an agreement graph subsequently obtained. Figure 1 shows that rater C remains an outlier under all four methods. This might be because his diagnoses include many "mixture" patterns, for example, "Yin" mixed with "Yang," or "Cold" mixed with "Hot." Rater C is a senior TCM physician in the Department of TCM of CCH with very long research experience. Moreover, raters A and H are not only TCM practitioners in CCH but have also participated actively in advanced TCM studies for many years. From these analyses, beyond measuring agreement, we can distinguish the raters by clusters. As mentioned in the Introduction, a conventional single agreement value is quite restricted in its ability to reveal the meaning hidden underneath; it cannot judge whether a given "moderate" agreement coefficient is sufficient to quantify the reliability of TCM diagnostics. When latent clusters are present among the raters, MDS proves itself an effective distinguisher.
Appendices
A. IAMA for Multiple Attribute Responses Proposed by Kupper and Hafner
Consider a study in which two equally trained raters, say raters A and B, independently examine each of $n$ units, where each unit is characterized by $N$ possible attributes. Let $A_i$ denote the subset of attributes for the $i$th unit chosen by rater A, and let $\mathrm{card}(A_i)=a_i$, $0\le a_i\le N$, denote the cardinality of the set $A_i$; define $B_i$ and $b_i$ analogously for rater B. The symbol $\bar{A}$ stands for the complement of the set $A$.

Define the random variable
$$X_i = \mathrm{card}(A_i\cap B_i) + \mathrm{card}(\bar{A}_i\cap \bar{B}_i) = N - a_i - b_i + 2\,\mathrm{card}(A_i\cap B_i) \tag{A.1}$$
to be the number of attributes for the $i$th unit either chosen by both raters or not chosen by either rater. Define the agreement proportion
$$p_i = \frac{X_i}{N}, \tag{A.2}$$
the overall concordance
$$\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i, \tag{A.3}$$
and the chance-corrected concordance
$$\hat{\kappa} = \frac{\bar{p}-p_0}{1-p_0}, \quad\text{where } p_0 = \frac{1}{nN}\sum_{i=1}^{n}\min(a_i,b_i). \tag{A.4}$$
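A direct Python transcription of the appendix definitions might look as follows; the function name and input format (one attribute set per unit per rater) are illustrative, and the chance term follows the appendix formula for $p_0$.

```python
def iama_concordance(A_sets, B_sets, N):
    """Chance-corrected concordance between two raters, transcribing
    (A.1)-(A.4): A_sets[i] and B_sets[i] are the attribute subsets the
    raters chose for unit i, out of N attributes in total."""
    n = len(A_sets)
    p_bar, p0 = 0.0, 0.0
    for A, B in zip(A_sets, B_sets):
        a, b, m = len(A), len(B), len(A & B)
        X = N - a - b + 2 * m        # attributes chosen by both or by neither
        p_bar += X / N               # per-unit agreement proportion
        p0 += min(a, b)
    p_bar /= n                       # overall concordance
    p0 /= n * N                      # chance term
    return (p_bar - p0) / (1 - p0)

# Two units, eight attributes: full agreement on the second unit,
# partial agreement on the first.
kappa = iama_concordance([{"Yin", "Cold"}, {"Hot"}],
                         [{"Yin"}, {"Hot"}], N=8)
```

Applying this to every pair of raters, and taking one minus each value, produces the dissimilarity matrix of Table 2.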
B. MCD and IOA Distances
The mean character difference (MCD) and the index of association (IOA) are popular distances used in MDS analysis. Let $X=(x_1,\dots,x_N)$ and $Y=(y_1,\dots,y_N)$ be two vectors of attributes. The MCD distance is defined as
$$d_{\mathrm{MCD}}(X,Y)=\frac{1}{N}\sum_{i=1}^{N}\left|x_i-y_i\right|, \tag{B.1}$$
and the IOA distance is defined by
$$d_{\mathrm{IOA}}(X,Y)=\frac{1}{2}\sum_{i=1}^{N}\left|\frac{x_i}{\sum_{j=1}^{N}x_j}-\frac{y_i}{\sum_{j=1}^{N}y_j}\right|. \tag{B.2}$$
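Both distances are straightforward to compute; a minimal sketch (function names ours) is:

```python
import numpy as np

def mcd(x, y):
    """Mean character difference (B.1): average absolute difference."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.mean(np.abs(x - y)))

def ioa(x, y):
    """Index of association (B.2): half the L1 distance between the
    vectors after each is normalised to sum to one."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(0.5 * np.sum(np.abs(x / x.sum() - y / y.sum())))

d1 = mcd([1, 0, 1, 0], [1, 1, 0, 0])   # = 0.5
d2 = ioa([1, 0], [0, 1])               # = 1.0 for completely disjoint profiles
```

Because both are genuine dissimilarities, they feed into MDS directly, without the 1 − similarity conversion needed for IAMA and the averaged kappa.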
Conflict of Interests
No competing financial interests exist.
Acknowledgments
The authors are grateful to the anonymous reviewers for their valuable suggestions.
References
1. L. A. Goodman and W. H. Kruskal, "Measures of association for cross classifications," Journal of the American Statistical Association, vol. 49, pp. 732–764, 1954.
2. J. Cohen, "A coefficient of agreement for nominal scales," Educational and Psychological Measurement, vol. 20, no. 1, pp. 37–46, 1960.
3. J. Cohen, "Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit," Psychological Bulletin, vol. 70, no. 4, pp. 213–220, 1968.
4. J. L. Fleiss and J. Cohen, "The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability," Educational and Psychological Measurement, vol. 33, pp. 613–619, 1973.
5. J. L. Fleiss, "Measuring nominal scale agreement among many raters," Psychological Bulletin, vol. 76, no. 5, pp. 378–382, 1971.
6. K. Krippendorff, "Estimating the reliability, systematic error, and random error of interval data," Educational and Psychological Measurement, vol. 30, no. 1, pp. 61–70, 1970.
7. K. Krippendorff, "Quantitative guidelines for communicable disease control programs," Biometrics, vol. 34, no. 1, p. 142, 1978.
8. L. C. Lo, T. L. Cheng, Y. C. Huang, Y. L. Chen, and J. T. Wang, "Analysis of agreement on traditional Chinese medical diagnostics for many practitioners," Evidence-Based Complementary and Alternative Medicine, vol. 2012, Article ID 17801, 5 pages, 2012.
9. L. L. Kupper and K. B. Hafner, "On assessing interrater agreement for multiple attribute responses," Biometrics, vol. 45, no. 3, pp. 957–967, 1989.
10. B. S. Everitt, S. Landau, M. Leese, and D. Stahl, Cluster Analysis, Wiley Series in Probability and Statistics, John Wiley & Sons, Chichester, UK, 2011.
11. J. B. Kruskal, "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis," Psychometrika, vol. 29, no. 1, pp. 1–27, 1964.
12. J. B. Kruskal, "Nonmetric multidimensional scaling: a numerical method," Psychometrika, vol. 29, no. 2, pp. 115–129, 1964.