A classification of the questionnaire of reviewers and applicants

Ekonomiczne Problemy Usług, Dec 2013

Agata Kopacz, Marek Kozłowski, Jarosław Protasiewicz, Tomasz Stanisławek


A questionnaire is a research instrument consisting of a series of questions designed to gather information from respondents. Usually, a questionnaire consists of a number of questions that the respondent has to answer in a set format. A questionnaire1 can be defined as a series of processes that extract useful information in order to solve problems by asking the people involved in the problem the same questions, collecting the answers as data, and analyzing them. Questionnaires are mainly conducted for the statistical analysis of the responses. The form of a questionnaire consists of open-ended and closed-ended questions. A closed-ended question limits respondents to a given number of options from which they must choose their answer. The response options for a closed-ended question should be exhaustive and mutually exclusive. An open-ended question asks the respondent to formulate his or her own answer. This kind of question gives the respondent whatever scope of information seems appropriate to them. A respondent's answer to an open-ended question is afterwards coded into a response scale or categorized with multiple labels. The open form of the questionnaire consists of one style of responding to the questions. This open form is also called a free descriptive questionnaire, since in that style the respondents freely describe their answers to the prepared questions. This format is distinguished from the fixed-alternative format, in which answers have a closed form2.

1 H. Inui, M. Murata, K. Uchimoto, H. Isahara, Classification of open-ended questionnaires based on surface information in sentence structure, in: Proceedings of the 6th NLPRS 2001, pp. 315-322, 2001.
Questionnaire data that consist only of closed answers are relatively easy to handle, because they are structured. Researchers have proposed many methods for analyzing these kinds of answers using multivariate analysis techniques such as cluster analysis and correspondence analysis. Questionnaire data that include open answers are much more difficult to analyze automatically. At first, they are segmented (split into sequences of sentences) and tokenized (sentences are divided into lists of words). Next, texts represented as vectors of tokens are processed by text mining methods such as text-clustering techniques or the self-organizing map technique. The idea here is to view each answer as a vector of words and to use similarity measures to cluster the vectors. Methods of this kind are effective for summarizing answers, but they are inefficient at extracting target characteristics. Other researchers have proposed methods for analyzing open answers on the basis of associations between words. This approach calculates associations between word pairs based on their co-occurrences in open answers and then visually presents the words and associations on a two-dimensional map3. The authors of4 focus on the open questions in a questionnaire and discuss the problems encountered during the analysis of the responses to such questions from the viewpoint of statistical NLP. A combination of statistical analyses and information retrieval techniques in the context of questionnaires is discussed in5.

2 Ibidem.
3 K. Yamanishi, H. Li, Mining open answers in questionnaire data, IEEE Intelligent Systems 2002.
4 L. Lebart, A. Salem, L. Berry, Exploring Textual Data, Kluwer Academic Publishers 1998.
5 S. Hirasawa, F. Shih, W. Yang, Student questionnaire analyses for class management by text mining both in Japanese and in Chinese, in: Proc. 2007 IEEE International Conference on Systems, Man and Cybernetics 2007.
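The tokenization and vector-of-words clustering idea described above can be sketched in a few lines. This is a minimal illustration, not the cited authors' implementation; the tokenizer and the example answers are hypothetical, and real Polish text would need a proper tokenizer.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split an answer into lowercase word tokens (a crude stand-in for
    the segmentation step; real Polish text needs a proper tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counter vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical open answers, for illustration only
answers = ["the review process was slow",
           "the review process was very slow and opaque",
           "funding decisions seemed fair"]
vectors = [Counter(tokenize(a)) for a in answers]
print(cosine(vectors[0], vectors[1]) > cosine(vectors[0], vectors[2]))  # → True
```

Similar answers end up close under this measure, which is exactly what the clustering methods mentioned above exploit, and also why they summarize well but cannot extract specific target characteristics.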
The authors introduce data mining and text mining methods (e.g. LSI, EM algorithms) in order to cope with questions answered in a fixed format as well as in a free format. Apart from using traditional classifiers, there are also works focused on applying association rule techniques to analyze questionnaire data6. Based on fuzzy techniques, they discover fuzzy association rules from the questionnaire datasets, so that all the different data types can be handled in a uniform manner. Answers to open-ended questions often contain valuable information. The main problem associated with the analysis of survey data is that manual handling is both cumbersome and very costly, especially when the data exist in large volumes. However, analysis methods for open-ended answers have not been established well enough, and classification based on the content of the answers often needs manual operations. The costs of such operations are high, and the results of human judgment lack objectivity. In general, processing answers in natural language is difficult because of the enormous variation in linguistic expression. This problem might be solved by applying language processing techniques, such as information extraction or automatic classification. Our aim was to find the best computational approaches, using machine learning methods, for the automatic classification of collected open-ended questionnaires, in order to speed up and reduce the costs of questionnaire analysis. The presented approach is based on segmenting open answers into words and conducting the analysis at the word level as well as at the phrase level. We have developed a survey analysis system that works on these principles. The proposed text mining methods provide a new way of analyzing natural-language responses to questionnaires. Using multi-label categorization techniques, we are able to extract semantic information about the open-ended questions, which is complex and multi-dimensional.
This paper reports the results of our preliminary experiments using SVM and Naive Bayes classifiers for questionnaire classification.

Methods

1.1. Questionnaire of reviewers and applicants

Questionnaire foundations. The Information Processing Institute supports many processes of grant funding in Poland by providing information systems. The first information system was developed for the science funding streams (OSF) managed by the Ministry of Science and Higher Education. It was launched online in 2004, and after this success more science funding processes were computerized, for instance: the Polish-Norwegian Research Fund (PN FBN), the Polish-Swiss Research Programme (PSPB), and Innovative Economy (PO IG). All of them are managed by the Information Processing Institute. These systems usually contain the following modules: tools for preparing proposals online; tools for processing proposals used by an agency; a database and algorithms for selecting reviewers; and an online tool for reviews. Almost 19k reviewers have been asked since July 2011 whether they could prepare reviews using these systems. As a result, 132.5k requests for reviews were sent, of which 20.5k were returned by reviewers. The vast majority of the reviews were prepared for grant programs managed by the Ministry of Science and Higher Education. The reviewers' distribution was: 44% professors, 30% associate professors7, 20% assistant professors8 and 7% others. Most of them were employed at universities (67.1%), 14.2% in research institutes, and 18.7% in other places9. The peer review process assumes that expert assessors are qualified and able to perform a reasonable review of any scholarly work or research project, but in fact peer review is widely criticized. Neff and Olden10 maintain that this process is open to misuse and to influences on editor and reviewer integrity. Only 47% of scientists consider an article published in a peer-reviewed journal to be proof of its high quality11. Information obtained from foreign literature and desk research inspired us to conduct an anonymous online survey. The aim of this study was to verify researchers' perception of problems with the peer review process in Poland. The survey was conducted on a group of research staff that included both reviewers (almost 20%) and applicants (45%); 35% of respondents had experience in both areas. Most respondents were assistant professors (43%), 28% were professors, 24% were associate professors, and 5% had an unreported degree. Respondents came from different disciplines, such as medicine, biology, economics, chemistry, physics, history, philology and computer science. 95% of the respondents had experience in Ministry of Science and Higher Education grant programs. 18% of the scientists took part in Innovative Economy programs and 17% in National Centre for Research and Development programs. 14% of respondents applied to the Polish-Swiss Research Programme and 4% to the Polish-Norwegian Research Fund12.

Answer categories and subcategories. The survey contained 14 closed-ended questions about researchers' perception of the peer review process in Poland, and one open-ended question, which was a request for any further comments or suggestions about their experience of the peer review process. The questionnaire was completed by 8190 people, but the open-ended question was answered by only 2615 of them (about 32%). According to the OPI experts, 301 answers were incomplete or irrelevant. A manual analysis of the answers would be time-consuming and expensive. Therefore, our aim was to carry out an automatic classification using machine learning methods.

7 In Polish: dr hab.
8 In Polish: doktor.
9 Procedures for review and selection of reviewers, ed. J. Protasiewicz, Vol. 1 (in Polish), Information Processing Institute 2012.
10 B.D. Neff, J.D. Olden, Is peer review a game of chance?, BioScience 2006, 56 (4), pp. 333-340.
The answers have been categorized into five categories of problems, which consist of sixteen subcategories13 (Table 1).

Table 1. The categories and the subcategories of answers to the open-ended question

Problem definition. Let us consider a set of answers to the open-ended question in the questionnaire and denote it as

d = [d_1, d_2, ..., d_n]^T    (1)

Each answer d_i, i = 1, 2, ..., n may contain many statements

d_i = [s_{i,1}, s_{i,2}, ..., s_{i,m}]^T    (2)

and these statements can refer to the various problems mentioned by respondents, which we defined in Table 1. Let us denote a category as c_a and a corresponding subcategory as sc_ab. An answer d_i can belong to many categories or subcategories. The task is to build a classifier which will be able to automatically assign categories and subcategories to each answer d_i. We have divided the set d into the training set d_Train and the testing set d_Test. The experts manually prepared the training set in a special way: all the answers d_i in the training set were split into statements s_{i,j}, and then subcategories sc were assigned to them. One subcategory was assigned to one statement. A statement is treated as a set of sentences, or a single sentence, which should carry a consistent message within the same category.

Classifiers

Selected classification algorithms. Among the many classification algorithms, some are especially important, such as Support Vector Machines (SVM) and classifiers based on Bayes' theorem: Naive Bayes (NB) and Multinomial Naive Bayes (MNB). Naive Bayesian classifiers are based on two assumptions.

11 N. Macnab, G. Thomas, Quality in research and the significance of community assessment and peer review: education's idiosyncrasy, International Journal of Research & Method in Education 2007, 30(3), pp. 39-352.
12 Procedures for review and selection...
13 The categories and subcategories were identified by the OPI experts, but mainly by Agata Kopacz.
Firstly, they consider documents as a bag of words, where word position in a document does not affect the result of classification. Secondly, they assume that the probability of a word's occurrence in a document d_i is independent of the probability of other words' occurrences for the given class. Therefore, we can easily calculate the conditional probability that a sentence d_i, formed from a bunch of words x_{i,1}, x_{i,2}, ..., x_{i,k}, belongs to a class c_l ∈ c:

P(c_l | x_{i,1}, x_{i,2}, ..., x_{i,k}) ∝ P(c_l) · ∏_{k} P(x_{i,k} | c_l)    (3)

and finally determine to which class the document belongs:

c_winner = argmax_{c_l} P(c_l | x_{i,1}, x_{i,2}, ..., x_{i,k})    (4)

Although the assumption of feature independence is rather untrue, a Naive Bayes classifier works surprisingly well in practice. In this case the distribution of each feature P(x_{i,k} | c_l) is not defined. If we assume that each feature has a multinomial distribution, then we have Multinomial Naive Bayes. This assumption works well, for instance, in text classification, where it can be used in the word-counts model. A Bayesian classifier is learned from the set d_Train, and this process involves: extracting the vocabulary; computing a prior P(c_l); and calculating the likelihood P(x_{i,k} | c_l) of each word x_{i,k} belonging to each decision class c_l. These values are calculated as the ratio between the number of documents or words representing a particular class and the total number of documents or words in the class. It is possible that a particular word in the test set d_Test does not occur in the training set d_Train, so its likelihood will be equal to zero. Thus, due to the multiplication of the probabilities, an entire reviewer's answer will not be properly classified. There are several ways to solve this problem. The most frequent solution is to use Laplace smoothing, or to assign the likelihood a low value correlated with all the other probabilities14. Support Vector Machines were first presented in 1995 by Vladimir Vapnik.
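The Bayesian training and classification procedure of eqs. (3)-(4), with Laplace smoothing, can be sketched as follows. This is a toy illustration under stated assumptions, not the system used in the experiments; the example words and subcategory names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_mnb(docs):
    """docs: list of (tokens, subcategory). Returns log-priors and
    Laplace-smoothed log-likelihoods, the quantities of eq. (3)."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = sum(class_docs.values())
    log_prior = {c: math.log(class_docs[c] / n_docs) for c in class_docs}
    log_lik = {}
    for c in class_docs:
        total = sum(word_counts[c].values())
        # Laplace smoothing: add 1 to every count so an unseen word
        # does not zero out the whole product of probabilities
        log_lik[c] = {w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
                      for w in vocab}
    return log_prior, log_lik, vocab

def classify(tokens, log_prior, log_lik, vocab):
    """c_winner = argmax_c [log P(c) + sum_k log P(x_k | c)]  (eq. 4)."""
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in tokens if w in vocab)
              for c in log_prior}
    return max(scores, key=scores.get)

# Hypothetical statements with expert-assigned subcategories
train = [(["review", "slow", "delay"], "timeliness"),
         (["review", "anonymous", "identity"], "anonymity")]
log_prior, log_lik, vocab = train_mnb(train)
print(classify(["slow", "delay"], log_prior, log_lik, vocab))  # → timeliness
```

Working in log space replaces the product of eq. (3) with a sum, which avoids numerical underflow for long answers.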
SVM uses the principle of structural risk minimization. The main idea of the algorithm is to find a decision boundary which can separate the classes, usually a positive one and a negative one. Regarding the classification problem, linear and nonlinear cases are distinguished. The SVM classifiers consider a document or a sentence as a bag of words x, similarly to Naive Bayes. In the linear case the classes are separated by a hyperplane:

w · x − b = 0    (5)

where the weights w are selected during the training process using the training set d_Train and quadratic programming. Nonlinear cases are solved by using soft-margin methods, which allow some errors, or by using a kernel function such as a polynomial, Gaussian or hyperbolic tangent kernel15.

Multi-class and multi-label classification. Typically, a Bayesian classifier assigns only the one class with the highest probability when testing a particular answer (eq. 4). But, as we mentioned previously, an answer d_i, i = 1, 2, ..., n to an open-ended question can belong to many subcategories, which we denote as the classes c_l, l = 1, ..., L. Therefore, this case involves both multi-class and multi-label problems, because the data set contains many classes (categories and subcategories, see Table 1) and the answers are assigned to many classes (labels). We can solve this issue in two ways. The first approach assumes that it is possible to use only one classifier in a multi-label manner. The classifier, e.g. Multinomial Naive Bayes, produces as an output a vector of probabilities, one value for each class (eq. 3). The classes with the highest probabilities are taken as the outcome, but someone must decide how many classes should be taken into account. The second approach uses the procedure called one vs. others. This procedure implies the use of L − 1 classifiers to solve the multi-class problem. Each classifier, e.g. a Multinomial Naive Bayes, is trained in a binary manner to recognize one class against all the others. In the classification stage all the classifiers verify a new example, and finally many classes can be assigned to it. There can be a situation in which all the classifiers choose the class 'others' and the tested example is left unclassified, or, on the other hand, too many classes are assigned. In order to avoid over-classification, a probability threshold of belonging to the class has to be chosen experimentally16.

Model improvements. Before classification, the texts are pre-processed, which involves: lemmatization, removing stopwords, and determining the validity of the words using TF-IDF (term frequency - inverse document frequency). The classifiers are trained using the TF-IDF values of words from the pre-processed sentences. We call this the basic form of our classification model. It is easy to notice that the quality of the classifiers depends on the quality of the text pre-processing. We propose three improvements to the basic classification model. Firstly, the answers to the open-ended question contain many misspellings, which can interfere with the lemmatization process. They can be corrected using an electronic vocabulary set; in the case of our questionnaire this could be a Polish dictionary, for instance http://www.sjp.pl.

14 D. Fragoudis, D. Meretakis, S. Likothanassis, Best terms: An efficient feature-selection algorithm for text categorization, Knowledge and Information Systems 2005, 8 (1), pp. 16-33; T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer, New York 2009; Z. Hoare, Landscapes of naive bayes classifiers, Pattern Analysis and Applications 2008, 11 (1), pp. 59-72.
15 B. Liu, Web Data Mining: Exploring Hyperlinks, Contents and Usage Data, Springer, New York 2010; W. Noble, What is a support vector machine?, Nature Biotechnology 2006, No. 24, pp. 1565-1567; C. Silva, B. Ribeiro, On text-based mining with active learning and background knowledge using svm, Journal of Soft Computing - A Fusion of Foundations, Methodologies and Applications 2007, 11 (6), pp. 519-530.
Secondly, we deal with texts in Polish, and the Polish language has a different grammar than English, so it needs special algorithms in order to properly extract keywords. We have developed such an algorithm, the Polish Keyword Extractor17, which is based on Rapid Automatic Keyword Extraction (RAKE) and KEA. Finally, we should note that the effectiveness of classification models depends on the quality of the training set, and especially often on its size. The experts agreed that answers containing up to 220 words (about one or two sentences) should be classified into only one subcategory.

SVM classifier parameter optimization. The choice of parameters for the SVM classifier is a nontrivial and laborious task, because there is no automatic, deterministic method which would allow selecting the best parameters for a specific problem. It is a nonlinear problem, and it additionally involves many computations in the case of the questionnaire classification. Therefore, we propose applying a differential evolution (DE) algorithm18 to optimize the parameters of the SVM classifier. DE, as one of the evolutionary algorithms, uses a population of vectors which represent potential solutions. Finding the best vector means finding the best classifier parameters. It involves the following steps: initialization - a population of vectors is randomly created while keeping the constraints on each parameter; mutation - for each vector a mutated vector is created, assuming that they differ from each other; recombination - a new vector is created in order to increase the diversity of the population, provided that at least one parameter is derived from the mutated vector; selection - the vector formed during recombination is tested by an objective function, and the better one (new or old) is added to the new population. The algorithm stops when it achieves a fixed number of generations, and the best matched vector is returned as the outcome.
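The four DE steps just described can be sketched as a minimal DE/rand/1/bin loop. This is a hedged illustration: the objective below is a toy stand-in for the real cost (which evaluates an SVM), and the "optimal" parameter values C = 1.0 and gamma = 0.01 are hypothetical.

```python
import random

def differential_evolution(cost, bounds, pop_size=20, max_iter=100,
                           scale=0.9, cr=0.9, seed=1):
    """Minimal DE sketch: initialization, mutation, recombination, selection."""
    rng = random.Random(seed)
    dim = len(bounds)
    # initialization: random vectors within the per-parameter constraints
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(max_iter):
        next_pop = []
        for i, x in enumerate(pop):
            # mutation: combine three other population members
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            mutant = [min(max(a[d] + scale * (b[d] - c[d]), bounds[d][0]),
                          bounds[d][1]) for d in range(dim)]
            # recombination: at least one parameter comes from the mutant
            forced = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < cr or d == forced) else x[d]
                     for d in range(dim)]
            # selection: keep the better of the old and the new vector
            next_pop.append(trial if cost(trial) <= cost(x) else x)
        pop = next_pop
    return min(pop, key=cost)

# Toy stand-in for the real objective (eq. 12 later uses 100 minus the
# average F-score of an SVM); here we just minimize the distance to a
# hypothetical optimum at C = 1.0, gamma = 0.01.
best = differential_evolution(lambda v: (v[0] - 1.0) ** 2 + (v[1] - 0.01) ** 2,
                              bounds=[(0.01, 100.0), (0.0001, 1.0)])
```

In the real setting each `cost` call trains and evaluates an SVM, which is why the population size and the number of generations have to be kept small.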
2.1. Classification and assessment

To analyze open-ended questions using supervised methods, we first need to build a training set. Therefore, we divide our evaluation process into two stages: preliminary model selection and the final classification procedure. In the first stage we build the training set and select the classifier models which best match the problem. In the second stage we classify all the open-ended answers using the classifier models selected in the first stage.

Preliminary model selection. We propose a preliminary classification stage in order to select the appropriate models and pre-processing procedures. This stage involves four experiments, denoted as experiments 1-4 in the section Results. Each experiment contains the following steps:
1. The experts create an initial training set d_Train with the same size (number of answers d_i) for each subcategory sc_l.
2. Various classification models are tested using a cross-validation procedure, and the best classification model is chosen for further experiments.
3. The answers which have not yet been assigned to a subcategory (usually 100 answers) are classified using the model selected in the previous step.
4. The experts verify the experiment outcomes.
5. Based on the classification errors, the classification models are adjusted.
6. The training set d_Train is enlarged with the newly classified answers (labels assigned by the experts), and a new experiment starts from point 2.
The training set sizes for the consecutive experiments were as follows: 14 for experiment 1, 24 for experiment 2, 34 for experiment 3, and 43 for experiment 4. Using the above algorithm, we tested two approaches to the classification problem: using one classifier in comparison to using many classifiers; and the model improvements discussed above.

18 R. Storn, K. Price, Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces, Journal of Global Optimization 1997, 11, pp. 341-359.
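The iterative selection loop above can be sketched schematically. The helper functions here are hypothetical stand-ins for model selection, classification and expert verification, intended only to show the control flow of steps 1-6.

```python
def preliminary_selection(initial_train, unlabeled, pick_best_model,
                          classify, expert_verify, rounds=4, batch=100):
    """Grow the training set round by round, re-selecting the best
    model after each batch of expert-verified answers."""
    train = list(initial_train)
    best = None
    for _ in range(rounds):
        best = pick_best_model(train)            # step 2: cross-validation
        to_label, unlabeled = unlabeled[:batch], unlabeled[batch:]
        preds = [(answer, classify(best, answer)) for answer in to_label]  # step 3
        verified = expert_verify(preds)          # steps 4-5: experts correct labels
        train.extend(verified)                   # step 6: enlarge the training set
    return best, train

# Hypothetical stand-ins for the real components, for illustration only
best, train = preliminary_selection(
    initial_train=[("slow review", "timeliness")],
    unlabeled=["anonymous reviewer", "late decision",
               "fair process", "no feedback"],
    pick_best_model=lambda t: "mnb",             # pretend MNB always wins
    classify=lambda model, answer: "other",
    expert_verify=lambda preds: list(preds),     # experts accept the labels
    rounds=2, batch=2)
print(len(train))  # 1 initial + 2 rounds x 2 verified answers = 5
```

The point of the loop is that each round's expert-verified batch enlarges d_Train before the next model selection, which matches the growing training set sizes (14, 24, 34, 43) reported for experiments 1-4.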
The details of experiments 1-4 can be found in the section Results.

Final classification procedure. After selecting an adequate classification model, we conduct the classification of all the answers to the open-ended question by repeating the following steps:
1. A classification program randomly selects 100 new answers from d which have not yet been classified.
2. The best classifier among those tested in the previous experiment iteration classifies the answers.
3. The experts (people) verify the classification results.
4. All the classifier types carry out the experiment, and the best one is chosen according to the selection criteria.
5. The classification program adds the answers verified by the experts to the training set, and the next iteration is performed starting from point 1.
Using the above algorithm, we perform the final classification and also optimize the SVM classifier parameters. The details of experiments 5-17 can be found in the section Results.

Assessment measures. Classifiers need to be assessed on the basis of their outcomes. There are several measures that could be useful, but we should be aware of their meaning and use only those most suitable for our single-label and multi-label problem. Really simple and useful are measures based on the comparison of the real subcategory and the classifier decisions; as a result, the following values are obtained: true positive (TP), false positive (FP), false negative (FN) and true negative (TN). A combination of these values gives three measures:
- precision: Prec = TP / (TP + FP)    (6)
- recall, also called sensitivity: Rec = TP / (TP + FN)    (7)
- F-measure (or F-score), which is the harmonic mean of precision and recall: F = 2 · Prec · Rec / (Prec + Rec)    (8)
Because the evaluation of multi-label data is difficult, as a classification can be partially correct, we use the Exact Match Ratio (EM)19.
This measure indicates the percentage of examples that have all their labels correctly classified:

EM(d) = (1/k) · Σ_{i=1}^{k} I(sc_i = l_i)    (9)

where k is the number of test examples, I is the indicator function, l_i is the label (subcategory) vector of the i-th example, and sc_i is the predicted subcategory vector. Another important issue is the measurement of multi-label data, which can be characterized (just like single-label data) by the number of examples (n) and the number of subcategories (sc). We select three measures specific to the multi-label problem, introduced in20. Label Cardinality (LCARD) is a standard measure that simply takes the average number of labels associated with each example:

LCARD(d) = (1/N) · Σ_{i=1}^{N} |l_i|    (10)

where |l_i| is the number of subcategories of the i-th example. The second is Label Density (LDENS), which relates to LCARD and includes the size of the label space. This measure gives a good idea of how frequently labels occur:

LDENS(d) = LCARD(d) / |L|    (11)

Very often we use average values computed over many experiments; therefore, the measures presented above are denoted with the prefix Avg in the section Results.

Results

Model selection. The initial experiments focused on assessing two classification models. We tested the MNB single multi-label classifier in comparison to using many classifiers with the one vs. others procedure. In four experiments (denoted 1-4), 994 answers (36.1% of all answers) were classified using the preliminary model selection procedure (see section Classification and assessment). Based on the results presented in Table 2, we can conclude that the individual MNB classifier gives better than or similar results to the one vs. others procedure. Moreover, this classification model is also less complicated and easier to implement.

19 M.S. Sorower, A Literature Survey on Algorithms for Multi-label Learning, 2010.
20 G. Tsoumakas, I. Katakis, op. cit.
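The assessment measures defined in the previous subsection (eqs. 9-11) can be computed directly. A minimal sketch follows; the subcategory names are hypothetical and serve only to illustrate the formulas.

```python
def exact_match(pred, true):
    """EM (eq. 9): fraction of examples whose predicted label set
    equals the true label set exactly."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)

def label_cardinality(label_sets):
    """LCARD (eq. 10): average number of labels per example."""
    return sum(len(s) for s in label_sets) / len(label_sets)

def label_density(label_sets, label_space_size):
    """LDENS (eq. 11): cardinality normalized by the label space size."""
    return label_cardinality(label_sets) / label_space_size

# Hypothetical subcategory label sets for three answers
true = [{"timeliness"}, {"anonymity", "disclosure"}, {"costs"}]
pred = [{"timeliness"}, {"anonymity"}, {"costs"}]
print(exact_match(pred, true))    # 2 of 3 label sets match exactly
print(label_cardinality(true))    # (1 + 2 + 1) / 3 labels per answer
print(label_density(true, 16))    # cardinality spread over 16 subcategories
```

Note how strict EM is: the second prediction recovers one of the two true labels, yet contributes nothing to the score, which is why EM values reported later are much lower than precision and recall.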
Table 2. Comparison of the standard MNB classifier with the one vs. others MNB classifier

After selecting the classification model, we tested three improvements to the basic classification model: misspelling correction using a Polish dictionary (SJP); finding the most important words using the Polish Keyword Extractor (PKE); and enlarging the data set (see section Classifiers, Model improvements). In the experiments involving the model improvements we used the same data set as in the previous experiments; therefore, we also denote them as 1-4. The results presented in Table 3 indicate that the models containing the improvements can be more efficient than the basic Multinomial Naive Bayes classifier. In particular, spelling correction using the Polish language dictionary and the larger data set significantly improve the quality of classification.

Table 3. Classification experiments 1-4 with the model improvements (precision, recall and EM per experiment). Source: own.

Final classification. According to the above findings, for the final classification we decided to use the MNB classifier enriched with the Polish language dictionary and the larger data set (we call this the MNB classifier with improvements).
Moreover, in the new experiments we also evaluated the questionnaires using the SVM classifier, both with default parameters and with parameters selected manually in an intuitive way. Before proceeding to the final classification, the experts improved the training set: they examined the shortest texts, which may adversely affect the quality of the classifier, and added more relevant data from the original reviewers' responses. We carried out five experiments (denoted 5-9), classifying 100 new answers in each. The results presented in Table 4 show that the best results are achieved when the classifier assigns two classes, as in the case of experiments 1-4. In all cases the average recall and precision are between 49-52%, and the F-score is about 50%. The average exact match (AvgEM) is better when the classifier returns only one class, and is 29.08% for the SVM with a Gaussian kernel. There is only a small difference between the performance of the SVM and MNB classifiers in this case.

Table 4. Classification experiments 5-9 (AvgEM, AvgPrec and AvgRec for one and two assigned classes): MNB classifier with improvements; SVM classifier (polynomial kernel, exponent = 1, C = 1); SVM classifier (RBF kernel, gamma = 0.01, C = 21); one vs. others MNB classifier (threshold = 0.9). Source: own.

After analyzing the results of experiments 5-9, the experts decided to join two subcategories (disclosure and anonymity) into one, because these subcategories were difficult to differentiate. Moreover, they suggested increasing the number of answers in one experiment to 150 in order to obtain a more representative sample of data. Therefore, we carried out the next eight experiments (denoted 10-17) by classifying 150 new answers in each. The other parameters were the same as in the previous experiments. The results are presented in Table 5. There are no significant improvements, but on the other hand the SVM classifier with manually selected parameters achieved slightly better results.
Using the SVM algorithm, we achieved an F-score about 1-1.5 percentage points better than the MNB classifier and 6.82 percentage points better than the one vs. others model using the MNB classifier.

Table 5. Classification experiments 10-17 (AvgEM, AvgPrec and AvgRec for one and two assigned classes): MNB classifier; SVM classifier (polynomial kernel, exponent = 1, C = 1); SVM classifier (RBF kernel, gamma = 0.01, C = 21); one vs. others MNB classifier (threshold = 0.9). Source: own.

Optimization of SVM parameters. In the previous experiments we used default or manually selected parameters for the SVM classifier. We believe that it is possible to find optimal parameters which can improve the classification quality, and that this can be done using the differential evolution (DE) algorithm (see section Classifiers, SVM classifier parameter optimization). In order to find the optimal parameters for the SVM classifier, we again used the data from experiments 10-17. Half of the experiments were carried out using the training set, and half using the test set. The cost function can be presented as:

cost = 100 − F_avg(10-13)    (12)

where F_avg(10-13) is the average F-score from experiments 10-13. Given the huge number of training set evaluations, we decided to set a small population size, equal to 20, and a maximum number of iterations equal to 100. The other parameters of the DE algorithm were chosen intuitively: standard deviation 0.1, scale factor 0.9 and recombination probability 0.9. Vectors created from the SVM parameters were the input data for the DE algorithm. Before the evaluation it was necessary to set minimum and maximum values for all the included parameters. The experiments involved comparing the optimization of the polynomial and the RBF kernels (the best results are shown in Table 6) to the primary performance.

Table 6. Optimization of the SVM parameters (values for one / two assigned classes, four configurations):
26.92 / 20.98; 28.39 / 22.09; 27.48 / 20.24; 29.13 / 21.92
73.1 / 59.44; 74.38 / 59.49; 73.66 / 59.63; 77.7 / 62.21
Dataset statistics. The increasing popularity of multi-label classification in the academic literature has caused the emergence of publicly available datasets21. In order to facilitate further analysis and evaluation of our dataset, we present all the multi-label-specific measures described in Section 2.1 (Table 7). Equally important in multi-label classification is knowing the label set frequencies (Figure 1).

Table 7. Multi-label statistics of the dataset. Source: own.

Fig. 1. The label distributions of the dataset. Source: own.

21 Our dataset can be made available via email: .

Discussion

We have evaluated several machine learning methods to carry out an automatic classification of open-ended questions. We presented the multi-label classifiers which are responsible for labelling the open-ended answers. In the classification experiments we used the MNB and SVM methods and obtained an average precision of about 77% and an average recall of about 55%. At first we tested the MNB single multi-label classifier in comparison to the one vs. others procedure. We concluded that the individual MNB classifier gives better than or similar results to the one vs. others procedure, and it is less complicated. Surprisingly, the one vs. others model has a slightly higher recall than the standard classifier when only one class is assigned. The experiments involving the model improvements (the Polish language dictionary and the larger data set) achieved better results than the basic Multinomial Naive Bayes classifier. On the other hand, the model using the Polish Keyword Extractor algorithm is much worse in comparison to all the others. The reported results show a clear improvement after we aggregated the two most frequently confused subcategories (the experts decided to aggregate the disclosure and anonymity subcategories into one, because they were often mistaken for each other). Compared to the previous experiments in Table 4, the F-score increased by 6%. In order to find the best parameters for the SVM classifier we used the Differential Evolution algorithm.
A closer look at Table 6 shows that there is no great difference between the results achieved by the SVM classifier with manually selected parameters and with parameters selected by the DE algorithm (about 0.5 percentage points on AvgEM and AvgF). However, the SVM classifier with default settings achieves much worse results than the two previously mentioned. This means that it is important to look for optimal parameters for the SVM classifier, but not necessarily to use optimization methods such as evolutionary algorithms for that purpose.

Conclusion

The ongoing studies on the automatic classification of open-ended texts are still at an early stage, but the desire to use classification or analysis methods on the response texts of open-ended questionnaires is increasing. In this research, we conducted an automatic classification of the texts of an open-ended questionnaire. The results show that our best classification model (the SVM classifier with parameters selected by the DE algorithm) works well for multi-criteria classification and can produce questionnaire categories similar to those produced by humans. While questionnaires are inexpensive, quick and easy to analyse, they often produce many problems (which influenced the results achieved by the automatic classifiers). The people conducting the research may never know whether a respondent understood the question that was asked. The specificity of the questions means that the information gained can be minimal. Questionnaires conducted by mail or online produce very low return rates (only 32% of our respondents answered the open-ended questions). Another problem associated with return rates is that the people who return the questionnaire are often those who have a strongly positive or strongly negative viewpoint and want their opinion to be heard; the people most likely to be unbiased typically do not respond because it is not worth their time. Using machine learning algorithms sped up the process of questionnaire analysis.
On the other hand, the experts were still needed for model improvement and tuning. In future work, we plan to continue the analysis of characteristic expressions in the texts of open-ended questionnaires based on these experimental results, and to investigate other multi-label classification methods that can be applied to open-ended questions. The most critical problem is the estimation of the number of classes (labels), which we will try to resolve by using prediction methods.

Acknowledgements

The authors would like to thank Prof. Witold Pedrycz, a consultant in the field of classification theory, and also Anna Plewa, Sławomir Dadas and Kinga Skolimowska for the grammatical correction of the article.

CLASSIFICATION OF THE QUESTIONNAIRES OF REVIEWERS AND APPLICANTS

Summary

The article describes methods for the multi-label classification of texts from the open question of a questionnaire using machine learning techniques. The aim is to increase the speed and reduce the cost of analysing the open question in a questionnaire. First, various models of multi-label classifiers are described, by means of which a category is assigned to the texts. The experiments used the single-label classifiers Multinomial Naive Bayes (MNB) and Support Vector Machine (SVM). With them we obtained an average precision of 77% and an average recall of 55%. The experiments included many improvements (the size of the training set, vocabulary correction, optimization of the SVM classifier parameters using evolutionary methods...), thanks to which we increased the classification effectiveness compared to the original model. The proposed method was used to automatically assign categories to the texts from the open question of a questionnaire.

References

Chen Y., Weng C., Mining fuzzy association rules from questionnaire data, "Knowledge-Based Systems" 2009.
Fragoudis D., Meretakis D., Likothanassis S., Best terms: an efficient feature-selection algorithm for text categorization, "Knowledge and Information Systems" 2005, 8(1).
Hastie T., Tibshirani R., Friedman J., The Elements of Statistical Learning, Springer, New York 2009.
Hirasawa S., Shih F., Yang W., Student questionnaire analyses for class management by text mining both in Japanese and in Chinese, in: Proc. 2007 IEEE International Conference on Systems, Man and Cybernetics, 2007.
Hoare Z., Landscapes of naive Bayes classifiers, "Pattern Analysis and Applications" 2008, 11(1).
Inui H., Murata M., Uchimoto K., Isahara H., Classification of open-ended questionnaires based on surface information in sentence structure, in: Proceedings of the 6th NLPRS, 2001, pp. 315-322.
Lebart L., Salem A., Berry L., Exploring Textual Data, Kluwer Academic Publishers 1998.
Liu B., Web Data Mining: Exploring Hyperlinks, Contents and Usage Data, Springer, New York 2010.
Macnab N., Thomas G., Quality in research and the significance of community assessment and peer review: education's idiosyncrasy, "International Journal of Research & Method in Education" 2007, 30(3).
Neff B.D., Olden J.D., Is peer review a game of chance?, "BioScience" 2006, 56(4).
Noble W., What is a support vector machine?, "Nature Biotechnology" 2006, 24.
Procedures for review and selection of reviewers, Vol. 1, ed. J. Protasiewicz, Information Processing Institute, Warsaw 2012 [in Polish].
Procedures for review and selection of reviewers, Vol. 2, ed. J. Protasiewicz, Information Processing Institute, Warsaw 2012 [in Polish].
Silva C., Ribeiro B., On text-based mining with active learning and background knowledge using SVM, "Journal of Soft Computing - A Fusion of Foundations, Methodologies and Applications" 2007, 11(6).
Sorower M.S., A Literature Survey on Algorithms for Multi-label Learning, Oregon State University 2010.
Storn R., Price K., Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces, "Journal of Global Optimization" 1997, 11.
Tsoumakas G., Katakis I., Multi-label classification: an overview, "International Journal of Data Warehousing and Mining" 2007, 1-13.
Yamanishi K., Li H., Mining open answers in questionnaire data, "IEEE Intelligent Systems" 2002.


Agata Kopacz, Marek Kozłowski, Jarosław Protasiewicz, Tomasz Stanisławek, A classification of the questionnaire of reviewers and applicants, "Ekonomiczne Problemy Usług" 2013, 321-343.