Discrimination between Alzheimer's Disease and Mild Cognitive Impairment Using SOM and PSO-SVM
Shih-Ting Yang,1 Jiann-Der Lee,1 Tzyh-Chyang Chang,1,2 Chung-Hsien Huang,1 Jiun-Jie Wang,3 Wen-Chuin Hsu,4,5 Hsiao-Lung Chan,1 Yau-Yau Wai,3,6 and Kuan-Yi Li7
1Department of Electrical Engineering, Chang Gung University, Tao-Yuan 333, Taiwan
2Department of Occupational Therapy, Bali Psychiatric Center, New Taipei City 249, Taiwan
3Department of Medical Imaging and Radiological Sciences, Chang Gung University, Tao-Yuan 333, Taiwan
4Department of Neuroscience, Chang Gung Memorial Hospital, Tao-Yuan 333, Taiwan
5Chang Gung Dementia Center, Chang Gung Memorial Hospital, Tao-Yuan 333, Taiwan
6Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital, Tao-Yuan 333, Taiwan
7Department of Occupational Therapy, Chang Gung University, Tao-Yuan 333, Taiwan
Received 15 February 2013; Accepted 13 April 2013
Academic Editor: Chung-Ming Chen
Copyright © 2013 Shih-Ting Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
In this study, an MRI-based classification framework was proposed to distinguish patients with AD and MCI from normal participants by using multiple features and different classifiers. First, we extracted features (volume and shape) from MRI data with a series of image processing steps. Subsequently, we applied principal component analysis (PCA) to convert a set of possibly correlated features into a smaller set of linearly uncorrelated variables, reducing the dimensionality of the feature space. Finally, we developed a novel data mining framework that combines a support vector machine (SVM) with particle swarm optimization (PSO) for AD/MCI classification. To compare the hybrid method with traditional classifiers, two kinds of classifiers, namely, SVM and a self-organizing map (SOM), were also trained for patient classification. With the proposed framework, the classification accuracy reached up to 82.35% and 77.78% in patients with AD and MCI, respectively. By combining the volumetric and shape features and using PCA, the accuracy reached up to 94.12% and 88.89% in AD and MCI, respectively. The present results suggest that novel multivariate methods of pattern matching reach a clinically relevant accuracy for the a priori prediction of the progression from MCI to AD.
Alzheimer’s disease (AD) is the most common type of dementia. Its clinical signs are progressive cognitive deterioration, together with declining activities of daily living and neuropsychiatric symptoms or behavioral changes. The early detection of AD is challenging for several reasons. First, there are no known biomarkers. Second, the disease usually has an insidious onset, which may result from a combination of genetic and environmental factors. Third, it is difficult to differentiate AD from other types of dementia.
Mild cognitive impairment (MCI) is a transitional stage between normal aging and dementia. The syndrome is defined by cognitive decline greater than that of age- and education-matched individuals, without interference with daily function. In terms of its major symptoms, MCI is characterized by memory loss and cognitive impairment. Studies have reported that individuals with MCI have a 10% to 64% risk of developing AD [3, 4]. AD is a progressive neurodegenerative disorder and is distinguished from MCI by the progressive deterioration of daily function. The prevalence of AD increases dramatically after age 65, and AD affects approximately 26 million people worldwide, a number that may increase fourfold by 2050. Recent reports on the treatment or prevention of AD have led to growing interest in early diagnosis. The detection of changes in brain tissue that reflect the pathological processes of MCI could therefore help prevent or postpone disease progression, either from normal control to MCI or from MCI to AD. If MCI can be diagnosed at an early stage and treated effectively, it may be possible to reduce damage at advanced stages.
Since poor performance in memory and executive function indicates a high risk of dementia, probable AD patients are usually evaluated with standardized neuropsychological tests [5–8]. Additionally, many studies have examined the predictive ability of nuclear imaging with respect to AD and other dementing illnesses [9–13]. However, considering imaging cost and the requirement for noninvasiveness, magnetic resonance imaging (MRI) has been widely used for the early detection and diagnosis of MCI and AD [14–17].
Atrophy typically starts in the medial temporal and limbic areas, subsequently extends to parietal association areas, and finally reaches frontal and primary cortices. Early changes in the hippocampus and entorhinal cortex have been demonstrated with MRI, and these changes are consistent with the underlying pathology of MCI and AD. Many studies have used manual or automatic methods to measure the hippocampus and entorhinal cortex [18–20]. Hippocampal volumes and entorhinal cortex measures have been found to be equally accurate in distinguishing between AD and cognitively normal elderly subjects. However, the segmentation and identification of the hippocampus or entorhinal cortex usually depend on the subjective judgment of the operator and are also time consuming. In addition, enlargement of the ventricles due to neuronal loss is also a significant characteristic of AD. The ventricles are filled with cerebrospinal fluid (CSF) and surrounded by gray matter (GM) and white matter (WM). As a result, the hemispheric atrophy rate, as measured by ventricular enlargement, shows a higher correlation with disease progression.
In this study, we designed an MRI-based classification framework to distinguish patients with MCI and AD from normal individuals using multiple features and different classifiers. Since the features adopted here are volume-related and shape-related, we also investigated whether combining statistical analysis and principal component analysis (PCA) would improve classification accuracy compared with using volume-related features alone, shape-related features alone, or all features. Our hypothesis was that the combination of all MRI-based features helps distinguish patients with early Alzheimer’s disease from subjects with mild cognitive impairment and from healthy controls.
The remainder of this paper is organized as follows. Section 2 describes the proposed scheme, including feature extraction, the classifiers used (self-organizing map (SOM) and support vector machine (SVM)), particle swarm optimization (PSO), and the proposed hybrid PSO-SVM. Statistical analysis, experimental results, and discussion are presented in Section 3. Finally, conclusions are given in Section 4.
2. The Proposed Schemes
Figure 1 shows the flowchart of the proposed system. In the Feature Extraction step, spatial normalization is performed by coregistering each individual's brain MRI data to a T1-weighted MRI template so that the images of all investigated subjects lie in the same coordinate space. Next, with the aid of segmentation and morphological procedures, all MRI brain images are segmented into GM, WM, CSF, and ventricle tissues, and ventricle shape descriptors are extracted. Volume-related and shape-related features are then used for classification. The Feature Reduction step is divided into two parts: (1) the Mann-Whitney U test is adopted to filter out features with low discriminative power; (2) principal component analysis (PCA) is applied to reduce the dimensionality of the feature space. Route I uses only the U test, whereas Route II combines the U test and PCA. Finally, a classifier (SOM, SVM, or PSO-SVM) is employed to classify the tested volunteers into three categories: normal individuals, MCI patients, and AD patients. The details of the proposed method are described below.
Figure 1: Flowchart of the proposed image-aided diagnosis system.
2.1. Spatial Normalization of MRI Data
Spatial normalization of the brain images is useful for determining what happens generically across individuals. It is a procedure that registers an MRI data set to a standard coordinate system, such as the Talairach and Tournoux coordinate system. In this study, all images were spatially normalized to the ICBM-152 stereotactic space via a 12-degrees-of-freedom affine transformation, which normalizes the brain in terms of dimensions, position, and spatial orientation.
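To make the 12 degrees of freedom concrete, the sketch below composes three translations, three rotations, three scalings, and three shears into one 4 × 4 homogeneous matrix and applies it to coordinates. It is a minimal illustration with NumPy, not SPM's implementation: the function names are hypothetical, and the composition order (translation, rotation, scaling, shear) is an assumption. Estimating the parameters against the ICBM-152 template, which is the registration step proper, is not shown.

```python
import numpy as np


def affine_12dof(translation, rotation_rad, scale, shear):
    """Build a 4x4 homogeneous matrix from 12 parameters:
    3 translations, 3 rotations (radians, about x, y, z), 3 scales, 3 shears.
    The composition order T @ R @ Z @ S is an assumption for illustration."""
    rx, ry, rz = rotation_rad
    T = np.eye(4)
    T[:3, 3] = translation
    Rx = np.array([[1, 0, 0], [0, np.cos(rx), -np.sin(rx)], [0, np.sin(rx), np.cos(rx)]])
    Ry = np.array([[np.cos(ry), 0, np.sin(ry)], [0, 1, 0], [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0], [np.sin(rz), np.cos(rz), 0], [0, 0, 1]])
    R = np.eye(4)
    R[:3, :3] = Rx @ Ry @ Rz
    Z = np.diag([*scale, 1.0])
    S = np.eye(4)
    S[0, 1], S[0, 2], S[1, 2] = shear  # upper-triangular shear terms
    return T @ R @ Z @ S


def apply_affine(A, points):
    """Map an (n, 3) array of coordinates through the 4x4 matrix A."""
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
    return (homogeneous @ A.T)[:, :3]
```

With all rotations and shears set to zero, for example, a scale of 2 followed by a translation of (10, 0, −5) maps the point (1, 2, 3) to (12, 4, 1).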
2.2. Volume Features Extraction
The volumes of brain tissues such as GM, WM, and CSF carry important information, especially in degenerative brain diseases. A clustering-based segmentation algorithm provided by SPM8 uses a modified Gaussian mixture model to extract GM, WM, and CSF probability maps from whole-brain MRI data. The intensities of voxels belonging to each of these clusters conform to a normal distribution, which can be described by a mean, a variance, and the number of voxels belonging to the distribution. Here, the volumes of GM, WM, CSF, and the whole brain are calculated by
$$\text{volume}_{\text{tissue}} \approx \sum_{\forall i \in C_{\text{tissue}},\; G(i) > 0.5} 1, \qquad \text{volume}_{\text{Whole}} \approx \sum_{\forall i \in C_{\text{GM}} \lor C_{\text{WM}},\; G(i) > 0.5} 1, \quad (1)$$
where $i$ is any voxel of the MRI data, $G(i)$ stands for the gray level of $i$, $C$ denotes the cluster, and tissue stands for GM, WM, or CSF. Figure 2 illustrates the segmentation results of a normal individual and an AD patient used in this study.
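Equation (1) amounts to counting the voxels whose tissue probability exceeds 0.5. The minimal sketch below assumes the SPM8 probability maps have already been loaded as NumPy arrays; the voxel-size argument and the conversion to millilitres are illustrative assumptions, since (1) itself yields a voxel count. The function names are hypothetical.

```python
import numpy as np


def tissue_volume_ml(prob_map, voxel_size_mm=(1.0, 1.0, 1.0), threshold=0.5):
    """Eq. (1): count voxels whose tissue probability exceeds `threshold`,
    then convert the count to millilitres (1 ml = 1000 mm^3)."""
    n_voxels = int(np.count_nonzero(prob_map > threshold))
    return n_voxels * float(np.prod(voxel_size_mm)) / 1000.0


def whole_brain_volume_ml(gm_prob, wm_prob, voxel_size_mm=(1.0, 1.0, 1.0), threshold=0.5):
    """Whole-brain volume: voxels assigned to GM or WM (logical OR)."""
    mask = (gm_prob > threshold) | (wm_prob > threshold)
    return int(np.count_nonzero(mask)) * float(np.prod(voxel_size_mm)) / 1000.0
```

Using the OR of the two masks avoids double-counting voxels that exceed the threshold in both maps.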
Figure 2: Segmentation results of a normal individual and an AD patient used in this study.
Next, we employ region growing and a double-threshold algorithm to extract binary ventricle volume data, that is, $B(x, y, z)$. Morphological operators such as erosion and dilation are used to obtain the binary ventricle regions, and the edges of the binary images are detected by applying the Sobel operator on a slice-by-slice basis. The segmented region then forms a binary mask image in which 1 (white) denotes a ventricle pixel and 0 (black) denotes a nonventricle pixel. Finally, the volume of the cerebral ventricle is calculated by
$$\text{volume}_{\text{Ventricle}} \approx \sum_{\forall i \in M,\; M(i) = 1} 1, \quad (2)$$
where $i$ is any pixel of the mask data, $M$ is the mask image, and $M(i)$ denotes the gray level of $i$.
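The sketch below illustrates one way to realize these steps; it is not the authors' implementation. It assumes the double threshold is an intensity window [low, high], implements region growing as keeping the connected component of that window that contains a seed voxel, applies a morphological opening (erosion followed by dilation), counts mask voxels as in (2), and computes slice-wise Sobel edge magnitudes. The function names, seed, and thresholds are hypothetical, and SciPy's `ndimage` module is assumed to be available.

```python
import numpy as np
from scipy import ndimage


def segment_ventricle(volume, seed, low, high, open_iters=1):
    """Hypothetical sketch: region growing from `seed` inside the intensity
    window [low, high] (the double threshold), then morphological opening."""
    candidate = (volume >= low) & (volume <= high)
    labels, _ = ndimage.label(candidate)  # face-connected components in 3D
    seed_label = labels[seed]
    if seed_label == 0:  # seed lies outside the intensity window
        return np.zeros_like(candidate, dtype=bool)
    mask = labels == seed_label  # region grown from the seed
    if open_iters > 0:
        mask = ndimage.binary_opening(mask, iterations=open_iters)  # erosion then dilation
    return mask


def ventricle_volume_voxels(mask):
    """Eq. (2): number of mask voxels equal to 1."""
    return int(np.count_nonzero(mask))


def slice_edges(mask, axis=2):
    """Sobel gradient magnitude computed slice by slice along `axis`."""
    out = np.zeros(mask.shape, dtype=float)
    m = mask.astype(float)
    for k in range(mask.shape[axis]):
        index = [slice(None)] * 3
        index[axis] = k
        s = m[tuple(index)]
        out[tuple(index)] = np.hypot(ndimage.sobel(s, axis=0), ndimage.sobel(s, axis=1))
    return out
```

The edge map is nonzero only along the mask boundary, so it can serve as input to the 2D shape measurements that follow.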
2.3. Shape Features Extraction
The volume features, which are extracted from the whole three-dimensional volume, cannot capture variations in anatomical shape. Wang et al. [27, 28] proposed a ventricle shape-based method for improved classification of Alzheimer’s patients. Therefore, to enhance classification accuracy, we added ventricle shape features to the volume features. Figure 3 shows a sagittal view of a segmented ventricle. The shape features we analyzed are of two types: three-dimensional and two-dimensional shape features. The algorithms used to obtain them are described in the following subsections.
Figure 3: Sagittal view of segmented ventricle.
2.3.1. 3D Shape Features
To obtain the 3D shape features, a leave-one-out scheme is used to construct the training and testing sets, following Wang’s method. Three probability maps are then built using
$$P_k(x, y, z) = \frac{1}{N} \sum_{n=1}^{N} M_n(x, y, z), \quad (3)$$
where $k$ indicates the type of subject (normal control, AD, or MCI), $N$ is the number of training samples, and $M_n$ denotes the gray level of the $n$th ventricular mask image. To capture the differences between patients (AD and MCI) and normal controls, we subtracted the normal probability map from the patient probability map to obtain the discriminant map. Finally, a matching coefficient (MC) between a testing input and the discriminant map is calculated by
$$\text{MC}_{\text{Normal or patient}} = \sum_{\forall x, y, z} D(x, y, z) \times T_{\text{Normal or patient}}(x, y, z), \quad (4)$$
where $D(x, y, z)$ is the discriminant map and $T$ denotes the testing ventricular mask image.
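A minimal sketch of (3) and (4), assuming each subject's ventricle mask is a binary NumPy array of the same shape after spatial normalization. The group labels (for example, "AD" and "NC") and the list-based data layout are hypothetical. For each left-out subject, the probability maps are rebuilt from the remaining subjects, mirroring the leave-one-out construction described above.

```python
import numpy as np


def probability_map(masks):
    """Eq. (3): voxel-wise mean of the N binary ventricle masks of one group."""
    return np.mean(np.stack(masks, axis=0).astype(float), axis=0)


def discriminant_map(patient_masks, normal_masks):
    """Patient probability map minus normal probability map."""
    return probability_map(patient_masks) - probability_map(normal_masks)


def matching_coefficient(disc_map, test_mask):
    """Eq. (4): sum over all voxels of D(x, y, z) * T(x, y, z)."""
    return float(np.sum(disc_map * test_mask))


def loo_matching_coefficients(masks, labels, patient_label, normal_label):
    """Leave-one-out: each subject's MC uses maps built without that subject."""
    mcs = []
    for i in range(len(masks)):
        train = [j for j in range(len(masks)) if j != i]
        patients = [masks[j] for j in train if labels[j] == patient_label]
        normals = [masks[j] for j in train if labels[j] == normal_label]
        mcs.append(matching_coefficient(discriminant_map(patients, normals), masks[i]))
    return mcs
```

Subjects whose ventricles overlap the regions where patients are more likely than controls to have ventricle tissue receive larger MC values.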
2.3.2. 2D Shape Features
The 2D shape features are extracted from the segmented ventricles on a slice-by-slice basis. Since each case contains many 2D ventricle slices, to compare cases effectively we selected the slice with the maximum area from each 3D ventricle data set as the datum plane. The 2D shape features used herein follow the work of Yang et al. and are listed as follows: (1) Area, (2) Perimeter, (3) Compactness, (4) Elongation, (5) Rectangularity, (6) Distances, (7) Minimum thickness, and (8) Mean signature value.
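Because the exact definitions from Yang et al. are not reproduced here, the sketch below uses common textbook definitions as assumptions: area as a pixel count, perimeter as the number of boundary pixels (4-neighborhood), compactness as perimeter² / (4π · area), rectangularity as area over bounding-box area, and elongation as one minus the bounding-box aspect ratio. It also shows the selection of the maximum-area slice as the datum plane. Distances, minimum thickness, and mean signature value are not sketched, and the function names are hypothetical.

```python
import numpy as np


def max_area_slice(mask3d, axis=0):
    """Return the index and the 2D slice with the largest ventricle area."""
    areas = mask3d.sum(axis=tuple(a for a in range(3) if a != axis))
    k = int(np.argmax(areas))
    return k, np.take(mask3d, k, axis=axis).astype(bool)


def shape_features_2d(mask2d):
    """Illustrative definitions (assumed; may differ from Yang et al.)."""
    area = int(mask2d.sum())
    padded = np.pad(mask2d, 1)  # pads with False
    # interior pixel: inside the region with all four 4-neighbors inside too
    interior = (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask2d & ~interior).sum())  # boundary pixel count
    rows, cols = np.nonzero(mask2d)
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    return {
        "area": area,
        "perimeter": perimeter,
        "compactness": perimeter ** 2 / (4 * np.pi * area),
        "rectangularity": area / (height * width),
        "elongation": 1 - min(height, width) / max(height, width),
    }
```

For a 5 × 5 square, for example, this gives an area of 25, a perimeter of 16 boundary pixels, a rectangularity of 1, and an elongation of 0.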
2.4. Learning Methods for Classification
Machine learning algorithms can be organized into a taxonomy based on the desired outcome of the algorithm or the type of input available during training. They are often divided into supervised, unsupervised, and reinforcement learning (RL). Supervised learning requires explicit input-output (I/O) pairs, and the task is to construct a mapping from one to the other. Unsupervised learning has no concept of target data and processes only the input data. In contrast, RL uses a scalar reward signal to evaluate I/O pairs and hence discovers, through trial and error, the optimal output for each input. In this sense, RL can be thought of as intermediate between supervised and unsupervised learning, since some form of supervision is present, albeit in the weaker guise of the reward signal. In each case, the trained algorithm may be treated as a “black box” that encapsulates knowledge gleaned from the training data and uses it to produce the expected outcome for new inputs. For this reason, machine learning and computer-aided diagnosis (CAD) have attracted growing interest in medical applications. To compare the performance of supervised and unsupervised methods, we used three classifiers.
In much pattern recognition research, the dataset is divided into two subsets, training and testing. The former is used to build the model, and the latter is used to assess how accurately the model predicts unknown samples; this is called the Train-and-Test method. Cross-validation is an experimental method for effectively estimating the generalization error. In this study, leave-one-out cross-validation (LOOCV) is adopted for all three classifiers to obtain a dependable estimate of the generalization error. LOOCV uses a single observation from the original sample as the validation data and the remaining observations as the training data, repeating this so that each observation is used once for validation. The classifiers we adopted are described in the following subsections.
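The LOOCV loop itself is classifier-agnostic. The minimal sketch below accepts any `fit_predict(train_X, train_y, test_x)` callable; the 1-nearest-neighbor stand-in is hypothetical and serves only to make the example runnable. It is not one of the three classifiers used in this study.

```python
from typing import Callable, Sequence


def loocv_accuracy(X: Sequence, y: Sequence, fit_predict: Callable) -> float:
    """Hold out each sample once, train on the rest, and return the accuracy."""
    n = len(X)
    correct = 0
    for i in range(n):
        train_X = [X[j] for j in range(n) if j != i]
        train_y = [y[j] for j in range(n) if j != i]
        if fit_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / n


def nearest_neighbor(train_X, train_y, x):
    """Hypothetical stand-in classifier: 1-nearest neighbor, squared Euclidean."""
    dists = [sum((a - b) ** 2 for a, b in zip(t, x)) for t in train_X]
    return train_y[dists.index(min(dists))]
```

Because each of the n folds retrains the model, LOOCV costs n training runs, which stays affordable for small cohorts like the one in this study.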
2.4.1. Self-Organizing Map Architecture
A self-organizing map (SOM), also called a Kohonen map, is a type of artificial neural network for the visualization of high-dimensional data. In general, an SOM operates in two modes: training and mapping. Training builds the map using input examples, and mapping assigns a new input vector to a location on the map. An SOM consists of components called nodes or neurons, and each node has a set of neighbors. When a node wins the competition, not only its weights but also those of its neighbors are adjusted, although the neighbors change less: the farther a neighbor is from the winner, the smaller its weight change. Furthermore, the neighborhood gradually shrinks as training proceeds, and by the end of training it has shrunk to zero size.
When a training example
$$\mathbf{x}(t) = [x_1(t), x_2(t), \ldots, x_n(t)] \quad (5)$$
is fed to the network, its Euclidean distance to all weight vectors is computed. Here, $n$ denotes the dimension of the data, and $t$ is the index of the data item in a given sequence.
The neuron whose weight vector is most similar to the input is called the best matching unit (BMU). The weights of the BMU and of the neurons close to it in the SOM lattice are adjusted towards the input vector, and the magnitude of the change decreases with time and with distance from the BMU. The update formula for neuron $j$ with weight vector $\mathbf{w}_j(t)$ is
$$\mathbf{w}_j(t+1) = \mathbf{w}_j(t) + \alpha(t)\, h_{cj}(t) \left[\mathbf{x}(t) - \mathbf{w}_j(t)\right], \quad (6)$$
where $\alpha(t)$ is a monotonically decreasing learning coefficient and $\mathbf{x}(t)$ is the input vector. The neighborhood function $h_{cj}(t)$ depends on the lattice distance between the BMU $c$ and neuron $j$:
$$h_{cj}(t) = \exp\left(-\frac{\|\mathbf{r}_c - \mathbf{r}_j\|^2}{2\sigma^2(t)}\right), \quad (7)$$
where $\mathbf{r}_c$ and $\mathbf{r}_j$ are the lattice positions of the BMU and of neuron $j$, and $\sigma(t)$ is the neighborhood radius.
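A minimal NumPy sketch of one SOM update following (6) and (7), assuming the usual Gaussian neighborhood over a rectangular lattice. The training loop uses a linearly decaying learning rate and neighborhood radius, with a small floor on the radius to avoid division by zero; this schedule and the initialization from randomly chosen data samples are assumptions and do not reproduce the two-stage ordering/tuning procedure described next. The function names are hypothetical.

```python
import numpy as np


def som_step(weights, grid, x, alpha, sigma):
    """One SOM update (Eqs. (6)-(7)).
    weights: (J, n) weight vectors; grid: (J, 2) lattice positions; x: (n,) input."""
    c = int(np.argmin(np.linalg.norm(weights - x, axis=1)))  # best matching unit
    lattice_sq = np.sum((grid - grid[c]) ** 2, axis=1)        # ||r_c - r_j||^2
    h = np.exp(-lattice_sq / (2.0 * sigma ** 2))              # Gaussian neighborhood
    return weights + alpha * h[:, None] * (x - weights), c


def train_som(data, rows, cols, epochs, alpha0, sigma0, seed=0):
    """Hypothetical training loop with linearly decaying alpha(t) and sigma(t)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, q) for r in range(rows) for q in range(cols)], dtype=float)
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    for t in range(epochs):
        decay = 1.0 - t / epochs
        alpha = alpha0 * decay
        sigma = max(sigma0 * decay, 0.1)  # floor keeps the Gaussian well defined
        for i in rng.permutation(len(data)):
            weights, _ = som_step(weights, grid, data[i], alpha, sigma)
    return weights, grid
```

After training, a new sample is mapped to the class most frequently associated with its BMU.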
Figure 4 illustrates the procedure of the SOM classifier. In this study, we use a two-stage learning method. First, we use fewer iterations, a higher learning rate, and a larger neighborhood distance so that the network converges quickly; after repeating this many times, we retain the network parameters with the best convergence. Next, starting from the network parameters obtained in the first stage, we use more iterations, a lower learning rate, and a smaller neighborhood distance to conduct a second learning stage that adjusts the network parameters slowly. Finally, we obtained the following parameters: number of iterations = 1000 epochs, ordering-phase learning rate = 0.9, tuning-phase learning rate = 0.5, and tuning-phase neighborhood distance = 0.5. To verify that the SOM generalizes stably, the classifier was trained 10 times. Twenty-two cases (AD = 7, Normal = 7, MCI = 8) were randomly chosen as the training set. Scaling of variables is of special importance in our model because the SOM algorithm uses the Euclidean metric to measure distances between vectors; we therefore linearly scaled all variables so that their variances were equal to one.
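Unit-variance scaling can be sketched as follows. Fitting the mean and standard deviation on the training set and reusing them for test samples is a design choice assumed here to keep test data from influencing the scaling; the text does not specify this detail. Constant features are left unscaled to avoid division by zero, and the function names are hypothetical.

```python
import numpy as np


def fit_unit_variance(train):
    """Per-feature mean and sample standard deviation of the training set."""
    mean = train.mean(axis=0)
    std = train.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant features unscaled
    return mean, std


def apply_scaling(X, mean, std):
    """Linearly rescale features to zero mean and unit variance."""
    return (X - mean) / std
```

Without this step, a feature measured in large units (such as a volume in cubic millimetres) would dominate the Euclidean distances used to find the BMU.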
Figure 4: Basic procedure of SOM classifier.
2.4.2. Support Vector Machine
SVMs, which are trained by supervised learning, have shown advantages in reducing training and testing errors, resulting in higher recognition accuracy. However, some feature data are not linearly separable, especially at the border between categories. To allow some flexibility in separating the categories, SVMs use a cost parameter, denoted as $C$, to control the trade-off between allowing training errors and forcing rigid margins. The cost function with $C$ is defined as
$$\text{Cost} = C \sum_{i=1}^{n} \xi_i, \quad (8)$$
where $\xi_i$ is a slack variable.
The patterns are mapped into a high-dimensional feature space through a kernel matrix, which is usually constructed with a kernel function that takes two patterns as arguments and outputs a value. In this study, a radial basis function (RBF) kernel, shown in (9), is employed. We use a one-against-rest scheme that assembles classifiers, each distinguishing one class from all the other classes: one SVM is constructed per class and trained to distinguish the samples of that class from the samples of all remaining classes. An unknown pattern is usually classified according to the maximum output among all SVMs.
$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2\right), \quad j = 1, 2, \ldots, N, \qquad \text{Fit}(C, \gamma) = \frac{\text{number of correctly classified testing data}}{\text{total number of testing data}}, \quad (9)$$
where $\mathbf{x}_i$ denotes the input vector, $\mathbf{x}_j$ denotes the $j$th prototype vector, and $\text{Fit}$ is the classification accuracy on the testing data. Finally, the optimal solution can be obtained with the Lagrange method:
$$L_P \equiv \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left[ y_i (\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i \right], \qquad L_D \equiv \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j), \quad (10)$$
where $\|\mathbf{w}\|$ is the Euclidean norm of the weight vector $\mathbf{w}$, $y_i \in \{-1, +1\}$ is the class label of $\mathbf{x}_i$, $b$ is the bias, $\alpha_i$ are the Lagrange multipliers, $L_P$ is the Lagrange function, and $L_D$ is its dual. The cost parameter $C$ and the RBF kernel parameter $\gamma$ control the trade-off between training errors and generalization ability in an SVM with an RBF kernel. Therefore, PSO was used to find the optimal combination of $C$ and $\gamma$.
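The sketch below, assuming scikit-learn is available, computes the RBF kernel of (9) explicitly and builds a one-against-rest ensemble from binary `sklearn.svm.SVC` models, predicting the class with the largest decision value. The class name `OneVsRestSVM` is hypothetical; `C` and `gamma` correspond to the cost parameter and the RBF kernel parameter. `SVC` on its own handles multiclass problems with a one-versus-one scheme, which is why the one-against-rest loop is written out here.

```python
import numpy as np
from sklearn.svm import SVC


def rbf_kernel(a, b, gamma):
    """Eq. (9): K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2) for all row pairs."""
    sq = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


class OneVsRestSVM:
    """One binary RBF-SVM per class; predict the class with the largest output."""

    def __init__(self, C=1.0, gamma=0.1):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # label 1 for "this class", 0 for "all remaining classes"
        self.models_ = [
            SVC(kernel="rbf", C=self.C, gamma=self.gamma).fit(X, (y == k).astype(int))
            for k in self.classes_
        ]
        return self

    def predict(self, X):
        scores = np.column_stack([m.decision_function(X) for m in self.models_])
        return self.classes_[np.argmax(scores, axis=1)]
```

`SVC(kernel="rbf")` evaluates the same kernel internally; `rbf_kernel` is shown only to make (9) explicit.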
2.4.3. Hybrid PSO-SVM
The particle swarm optimization (PSO) algorithm [33, 34] uses particles moving in a $D$-dimensional space to search for solutions of an optimization problem with $D$ variables. In our approach, PSO is initialized and searches for the optimal particle iteratively. Each particle represents a candidate solution, and an SVM classifier is built for each candidate solution to evaluate its performance. The velocity and position of the particles are updated by
$$v_{id}^{k+1} = \omega \cdot v_{id}^{k} + c_1\, \text{rand}_1 \left(\text{pbest}_{id}^{k} - x_{id}^{k}\right) + c_2\, \text{rand}_2 \left(\text{gbest}_{d}^{k} - x_{id}^{k}\right), \qquad x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}, \quad (11)$$
where $k$ is the evolutionary generation, $v_{id}$ is the velocity of particle $i$ on dimension $d$, $x_{id}$ is the position of particle $i$ on dimension $d$, and $\text{pbest}_{id}$ and $\text{gbest}_{d}$ are the best positions found so far by particle $i$ and by the whole swarm, respectively. The inertia weight $\omega$ balances global exploration and local exploitation, $\text{rand}_1$ and $\text{rand}_2$ are random functions, and $c_1$ and $c_2$ are the personal and social learning factors. If the number of particles, denoted as $m$, is too large, the optimization process becomes time consuming; conversely, if $m$ is too small, it is hard to find the optimal solution because of the limited search area. It has been reported in the literature that the optimal solution can be obtained when $m$ is between 20 and 40. In this work, the number of iterations and $m$ are set to 200 and 30, respectively. Similarly, the parameters $c_1$, $c_2$, and $\omega$ affect the convergence of the optimization process: if they are set too large, the particles move too fast and may miss the optimal solution; if they are set too small, finding the optimal solution becomes time consuming. Therefore, we set $c_1$, $c_2$, and $\omega$ to 2, 2, and 0.8, respectively.
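A minimal sketch of the update rule (11) for maximizing a generic fitness function, using the settings stated above (30 particles, 200 iterations, learning factors 2 and 2, inertia weight 0.8). The velocity clamp, the position bounds, and the optional `patience` stopping rule are assumptions added for numerical stability and illustration, and `pso_maximize` is a hypothetical name. In a PSO-SVM setting, the fitness would be the classification accuracy of an SVM trained with the particle's cost and kernel parameters, for example evaluated by LOOCV; a toy quadratic can stand in for that when testing the optimizer.

```python
import numpy as np


def pso_maximize(fitness, lower, upper, n_particles=30, n_iter=200,
                 w=0.8, c1=2.0, c2=2.0, v_frac=0.2, patience=None, seed=0):
    """Maximize `fitness` over the box [lower, upper] with the update of Eq. (11)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    vmax = v_frac * (upper - lower)  # velocity clamp (an added assumption)
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    v = rng.uniform(-vmax, vmax, size=(n_particles, dim))
    pbest = x.copy()
    pbest_fit = np.array([fitness(p) for p in x])
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    stall = 0
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # velocity update
        v = np.clip(v, -vmax, vmax)
        x = np.clip(x + v, lower, upper)  # position update, kept inside the box
        fit = np.array([fitness(p) for p in x])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = x[improved], fit[improved]
        g = int(np.argmax(pbest_fit))
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
            stall = 0
        else:
            stall += 1
            if patience is not None and stall >= patience:
                break  # global best has not improved for `patience` iterations
    return gbest, gbest_fit
```

SVM parameters are often searched on a logarithmic scale (for example, each particle coordinate as a base-2 exponent of the parameter), which keeps the box bounds symmetric; that encoding is a common practice rather than something stated in the text.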
More specifically, following a previously proposed approach, the hybrid PSO-SVM aims to optimize the accuracy of the SVM classifier by randomly generating the parameters (the cost parameter $C$ and the kernel parameter $\gamma$) and estimating the best values of these regularization and kernel parameters for the SVM model. The basic operation of the hybrid PSO-SVM proposed in this paper is shown in Figure 5.
Figure 5: Basic operation of proposed PSO-SVM approach.
This process continues until the performance of SVM converges. The termination criteria are that the iteration number reaches the maximum number of iterations (100%) or the value of global optimal fitness does not improve after 200 consecutive iterations. In this study, 22 cases were chosen (AD = 7, Normal = 7, MCI = 8) to be the training set.
3. Experimental Results and Discussion
3.1. Materials
According to previous research, most patients with Alzheimer’s disease are aged 65 years or older; therefore, most of the subjects in our dataset are over 65 years old. The image data used in this study were provided by Chang Gung Memorial Hospital, Lin-Kou, Taiwan. The clinical severity of each participant was evaluated by experienced clinicians who conducted independent semistructured interviews, which included a set of questions about the functional status of the participant, along with standardized neurologic, psychiatric, and health examinations. These assessments yielded an overall Clinical Dementia Rating (CDR) and a Mini Mental State Examination (MMSE) score. The whole dataset consists of three groups: normal control, MCI, and AD. Demographic information is provided in Table 1.
Table 1: Demographic data and cognitive scores.
Whole-brain MRI scans were obtained with a 3T MR scanner (Trio A TIM system, Siemens, Erlangen, Germany). T1-weighted images were acquired with a magnetization-prepared 180-degree radio-frequency pulse and rapid gradient-echo (T1-MPRAGE) sequence. The following imaging parameters were used: repetition time (TR) = 2000 ms, echo time (TE) = 4.16 ms, and flip angle = 9 degrees. The images were acquired as a 224 × 256 matrix with a slice thickness of 1 mm across 160 slices.
3.2. Statistical Analysis and Classification
Through image processing techniques, we obtained the volume and shape features of each individual. To determine whether these features differ significantly between groups, we used the Mann-Whitney (MW) test to compare the three groups on each feature (continuous variables).
The MW test, also called the Mann-Whitney $U$ or Mann-Whitney-Wilcoxon test, is a nonparametric rank-based test for identifying differences between populations with respect to their medians or means. The test does not require the sample data to be normally distributed (sample > 30), and it is relatively insensitive to nonhomogeneity of the variance of the sample data. The null hypothesis is that the two populations from which the samples were drawn have equal medians or means; the alternative is that they do not. The two samples are combined, and all observations are ranked from smallest to largest. The test was performed on each feature to evaluate its discriminative power, as shown in (12):
$$z = \frac{U_{\text{obt}} - n_1 n_2 / 2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1) / 12}}, \quad (12)$$
where $U_{\text{obt}}$ is the smaller of the two $U$ statistics, $U_1$ and $U_2$, computed from the rank sums of the two samples, and $n_1$ and $n_2$ are the sizes of the first and second samples, respectively.
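A pure-Python sketch of (12): pool and rank both samples (averaging ranks over ties), compute $U$ from the rank sum of the first sample, take the smaller $U$, and convert it to a $z$ score. The two-sided $p$ value from the normal approximation is an addition for illustration; (12) has no tie correction, so neither does the sketch. The function name is hypothetical, and `scipy.stats.mannwhitneyu` offers a tested implementation for real use.

```python
import math


def mann_whitney_z(a, b):
    """Return (U_obt, z, two-sided p) for samples a and b, following Eq. (12)."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for a run of ties
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u_obt = min(u1, u2)
    z = (u_obt - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    return u_obt, z, p
```

For two completely separated samples of size 3, for example, $U_{\text{obt}} = 0$ and $z \approx -1.96$, which lies right at the 0.05 boundary.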
The $p$ values obtained from the tests give the probability that the test statistic would take a value at least as extreme as the observed value by chance alone. A $p$ value less than the predetermined significance level (0.05) leads to rejection of the null hypothesis at the 5% significance level. All volume and shape features with $p < 0.05$ are shown in Table 2, comprising three volume features and seventeen shape features.
Table 2: Statistical analysis of features.
Although the features we adopted differ significantly ($p < 0.05$) among the three groups, some of them may be redundant or highly correlated. Therefore, principal component analysis (PCA) is used to reduce the dimensionality of a data set consisting of a large number of interrelated variables while retaining as much as possible of the variation present in the data set. It also reduces the computation time required for classification. This is achieved by transforming the data to a new set of variables, the principal components (PCs), which are uncorrelated and ordered so that the first few retain most of the variation present in all of the original variables. To represent the data effectively, we used the PCs that together captured 95% of the total variation in the data set. For the volume-feature-based classification, the first two principal components were adopted; for the shape-feature-based classification, the first eight; and when volume and shape features were integrated, the first six principal components were used to represent all of the features. Table 3 gives the variances and coefficients of the PCs when the analysis is performed on the correlation matrix. The symbol * indicates that a PCA coefficient is used as a feature for classification. SOM, SVM, and PSO-SVM were used to train classifiers, and the results are presented in Tables 4, 5, 6, 7, 8, and 9.
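A minimal NumPy sketch of PCA on the correlation matrix with the 95% variance criterion described above: each feature is standardized, the correlation matrix is eigendecomposed, and the smallest number of leading PCs whose cumulative proportion of variance reaches the threshold is kept. The function name is hypothetical, and the sketch does not reproduce the component counts in Table 3.

```python
import numpy as np


def pca_on_correlation(X, var_threshold=0.95):
    """PCA on the correlation matrix; keep the fewest leading PCs whose
    cumulative proportion of variance reaches `var_threshold`."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized features
    R = np.corrcoef(X, rowvar=False)                 # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)             # ascending order
    order = np.argsort(eigvals)[::-1]                # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()              # proportion of total variation
    k = min(int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1, len(eigvals))
    scores = Z @ eigvecs[:, :k]                      # PC scores used as features
    return scores, eigvecs[:, :k], explained
```

Working on the correlation matrix rather than the covariance matrix keeps features with large numeric ranges, such as volumes, from dominating the leading components.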
Table 3: PCs and their proportion of total variation.
Table 4: Classification results (SOM).
Table 5: Confusion matrix with SOM (volume + shape/volume + shape + PCA).
Table 6: Classification results (SVM).
Table 7: Confusion matrix with SVM (volume + shape/volume + shape + PCA).
Table 8: Classification results (PSO-SVM).
Table 9: Confusion matrix with PSO-SVM (volume + shape/volume + shape + PCA).
Tables 4–9 report the accuracy (proportion of all subjects correctly classified), sensitivity (proportion of actual positives correctly identified as positive), and specificity (proportion of actual negatives correctly identified as negative) obtained with different features. Accuracy, sensitivity, and specificity are defined in (13), where TP = true positive, TN = true negative, FP = false positive, FN = false negative, and $P$ and $N$ are the numbers of positive and negative subjects:
$$\text{Accuracy (ACC)} = \frac{TP + TN}{P + N}, \qquad \text{Sensitivity or true positive rate (TPR)} = \frac{TP}{P} = \frac{TP}{TP + FN}, \qquad \text{Specificity or true negative rate (TNR)} = \frac{TN}{N} = \frac{TN}{FP + TN}. \quad (13)$$
Incorporating shape features, volume features, and PCA provided better classification ability than using only one of them.
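The definitions in (13) translate directly into code. In the sketch below, the one-versus-rest reduction of a square confusion matrix (rows as true classes, columns as predicted classes) is an assumed layout, the function names are hypothetical, and any counts passed in are illustrative rather than taken from Tables 4–9.

```python
def binary_metrics(tp, tn, fp, fn):
    """Eq. (13): accuracy, sensitivity (TPR), and specificity (TNR)."""
    p, n = tp + fn, tn + fp  # actual positives and actual negatives
    return {
        "accuracy": (tp + tn) / (p + n),
        "sensitivity": tp / p,
        "specificity": tn / n,
    }


def metrics_from_confusion(cm, positive):
    """One-vs-rest counts from a square confusion matrix
    (rows = true class, columns = predicted class)."""
    total = sum(sum(row) for row in cm)
    tp = cm[positive][positive]
    fn = sum(cm[positive]) - tp              # true class, predicted otherwise
    fp = sum(row[positive] for row in cm) - tp  # predicted class, truly otherwise
    tn = total - tp - fn - fp
    return binary_metrics(tp, tn, fp, fn)
```

For instance, 8 true positives, 7 true negatives, no false positives, and 2 false negatives give an accuracy of 15/17, a sensitivity of 0.8, and a specificity of 1.0.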
In this study, we investigated the feasibility of using anatomical MR images to extract different types of features as predictive markers for AD and MCI. We employed multiple features and different classifiers to distinguish patients with AD and MCI from normal participants. The results show that volumetric analysis, including gray matter, white matter, and cerebrospinal fluid, together with local shape analysis of the ventricles, provides significant atrophy information. In particular, gray matter volume, ventricular area, elongation, mean signature value, and distances show statistical significance ($p < 0.01$). This implies that volume and shape features have the potential to distinguish among normal controls, AD, and MCI.
By combining the volumetric and shape features, the classification accuracy of SOM reached up to 76.47% and 66.67% in patients with AD and MCI, respectively; with PCA, it improved to 88.24% and 72.22%, respectively. The classification accuracy of SVM reached up to 76.47% and 77.78% in patients with AD and MCI, respectively; with PCA, it improved to 82.35% and 83.33%, respectively. With the hybrid PSO-based classification framework, the accuracy reached up to 82.35% and 77.78% in patients with AD and MCI, respectively; with PCA, it improved to 94.12% and 88.89%, respectively. These results indicate that combining PSO-SVM with statistical analysis and principal component analysis (PCA) improves classification accuracy.
It was also noted that classification performance was better for AD versus normal controls than for patients with MCI. MCI is a transitional stage between normal cognitive aging and dementia, so the characteristics of patients with MCI may resemble those of AD subjects on the one hand and those of normal participants on the other. Combining additional features is therefore essential to improve classification accuracy for patients with MCI at an early stage.
In this paper, we compared different methods for classifying patients with AD and MCI based on anatomical T1-weighted MRI. To evaluate and compare the performance of each method, two classification experiments were performed: normal controls (CN) versus AD and CN versus MCI. The results show that volume and shape features can be integrated to increase classification accuracy with low computational complexity. The classification results also support our hypothesis that combining multiple types of features, including volume and shape features, outperforms a single type of feature, possibly because different features are mutually complementary. Furthermore, applying statistical analysis and PCA achieved higher accuracies than using all of the adopted features. Among the classifiers used here, PSO-SVM achieved the best accuracy, sensitivity, and specificity for both CN versus AD and CN versus MCI.
At present, classification results are better for patients with AD and normal participants than for patients with MCI. The approach may provide clinically useful information in large-scale population-based screening studies, and the results could help in prognosticating disease progression and in providing an objective evaluation of cognitive rehabilitation treatments for dementing illnesses.
This work was supported by the National Science Council, Taiwan, under Grant no. NSC98-2221-E-182-040-MY3 and by Chang Gung Memorial Hospital under Grant no. CMRPD270053.