A classification of the questionnaire of reviewers and applicants
Tomasz Stanisławek, Jarosław Protasiewicz, Marek Kozłowski, Agata Kopacz
Ekonomiczne Problemy Usług nr
ZESZYTY NAUKOWE UNIWERSYTETU SZCZECIŃSKIEGO
A questionnaire is a research instrument consisting of a series of questions whose purpose is to gather information from respondents. Usually, a questionnaire consists of a number of questions that the respondent has to answer in a set format. A questionnaire1 can be defined as a series of processes that extract useful information in order to solve problems by asking the people involved in the problem the same questions, collecting data as answers to the questions, and analyzing them. Questionnaires are mainly conducted for statistical analysis of the responses.
A questionnaire form consists of open-ended and closed-ended questions. A closed-ended question limits respondents to a given number of options from which they must choose to answer the question. The response options for a closed-ended question should be exhaustive and mutually exclusive. An open-ended question asks the respondent to formulate his or her own answer.
1 H. Inui, M. Murata, K. Uchimoto, H. Isahara, Classification of open-ended questionnaires based on surface information in sentence structure, in: Proceedings of the 6th NLPRS 2001, pp. 315-322, 2001.
This kind of question gives the answering person the scope to provide whatever information seems appropriate to them. A respondent's answer to an open-ended question is afterwards coded into a response scale or categorized with multiple labels.
The open form of the questionnaire consists of one style of responding to the questions. This open form is also called a free descriptive questionnaire, since, in that style, the respondents freely describe answers to the prepared questions. This format is distinguished from the fixed-alternative format, in which answers have a closed form2.
Questionnaire data that consist only of closed answers are relatively easy to handle, because they are structured. Researchers have proposed many methods for analyzing these kinds of answers, using such multivariate analysis techniques as cluster analysis and correspondence analysis. Questionnaire data that include open answers are much more difficult to analyze automatically.
At first, they are segmented (split into sequences of sentences) and tokenized
(sentences are divided into lists of words). Next, texts represented as vectors
of tokens are processed by text mining methods such as text-clustering
techniques or the self-organizing map technique. The idea here is to view each
answer as a vector of words and to use similarity measures to cluster the vectors.
Those kinds of methods are effective for summarizing answers, but they are inefficient at extracting target characteristics. Other researchers have proposed methods for analyzing open answers on the basis of associations between words. The approach is based on calculating associations between word pairs from their co-occurrences in open answers and then visually presenting the words and associations on a two-dimensional map3. In the paper4 the authors focus on the open questions in the questionnaire and discuss the problems encountered during the analysis of the responses to such questions from the viewpoint of statistical NLP. The combination of statistical analyses and information retrieval techniques in the context of questionnaires is discussed in5.
3 K. Yamanishi, H. Li, Mining open answers in questionnaire data, IEEE Intelligent Systems.
4 L. Lebart, A. Salem, L. Berry, Exploring Textual Data, Kluwer Academic Publishers 1998.
5 S. Hirasawa, F. Shih, W. Yang, Student questionnaire analyses for class management by text mining both in Japanese and in Chinese, in: Proc. 2007 IEEE International Conference on Systems, Man and Cybernetics, 2007.
The authors introduce methods of data mining and text mining (e.g. LSI, EM algorithms) in order to cope with questions answered in a fixed format and those answered in a free format. Apart from using traditional classifiers, there are also works focused on applying association rule techniques to the analysis of questionnaire data6. Based on fuzzy techniques, they discover fuzzy association rules in the questionnaire datasets, so that all the different data types can be handled in a uniform manner.
Answers to open-ended questions often contain valuable information.
The main problem associated with the analysis of survey data is that manual handling is both cumbersome and very costly, especially when the data exist in large volumes. However, the analysis method for open-ended answers has not been established well enough, and classification based on the content of the answers often needs manual operations. The costs of such operations are high and the results of human judgment lack objectivity. In general, the processing of answers in natural language is difficult because of the enormous variation in linguistic expression. This problem might be solved by applying language processing techniques, such as information extraction or automatic classification.
Our aim was to find the best computational approaches, using machine learning methods, for the automatic classification of collected open-ended questionnaires, in order to speed up and reduce the costs of questionnaire analysis. The presented approach is based on the segmentation of open answers into words and conducting an analysis at the word as well as the phrase level. We have developed a survey analysis system that works on these principles. The proposed text mining methods provide a new way of analyzing natural-language responses to questionnaires. Using multi-label categorization techniques, we are able to extract semantic information from the open-ended questions, which is complex and multi-dimensional. This paper reports the results of our preliminary experiments, using SVM and Naive Bayes for questionnaire classification.
1.1. Questionnaire of reviewers and applicants
Questionnaire foundations. The Information Processing Institute supports many grant funding processes in Poland by providing information systems. The first information system was developed for the science funding streams (OSF) managed by the Ministry of Science and Higher Education. It was launched on-line in 2004, and after this success more science funding processes have been computerized, for instance: the Polish-Norwegian Research Fund (PN FBN), the Polish-Swiss Research Programme (PSPB), and Innovative Economy (PO IG). All of them are managed by the Information Processing Institute. These systems usually contain the following modules: tools for on-line proposal preparation; tools for proposal processing used by an agency; a database and algorithms for the selection of reviewers; and an on-line tool for reviews.
Almost 19k reviewers have been asked since July 2011 whether they could prepare reviews using these systems. As a result, 132.5k requests for reviews were sent, but only 20.5k of them were returned by the reviewers. The vast majority of the reviews were prepared for grant programs managed by the Ministry of Science and Higher Education. The distribution of the reviewers was: 44% professors, 30% associate professors7, 20% assistant professors8 and 7% others. Most of them were employed at universities (67,1%), 14,2% in research institutes, and 18,7% in other places9.
The peer review process assumes that expert assessors are qualified and able to perform a reasonable review of any scholarly work or research project, but in fact peer review is widely criticized. Neff and Olden10 maintain that this process is open to misuse and influences on editor and reviewer
7 In Polish: dr hab.
8 In Polish: doktor.
9 Procedures for review and selection of reviewers, ed. J. Protasiewicz, Vol. 1 (in Polish),
Information Processing Institute 2012.
10 B.D. Neff, J.D. Olden, Is peer review a game of chance?, BioScience 2006, 56 (4).
integrity. Only 47% of scientists consider that an article published in a peer-reviewed journal proves its high quality11.
Information obtained from foreign literature and desk research was the inspiration for conducting an anonymous online survey. The aim of this study was to verify researchers' perception of problems with the peer review process in Poland. The survey was conducted on a group which included research staff, both reviewers (almost 20%) and applicants (45%); 35% of the respondents had experience in both areas. Most respondents were assistant professors (43%), 28% were professors, 24% were associate professors and 5% did not report a degree. Respondents came from different disciplines, such as medicine, biology, economics, chemistry, physics, history, philology and computer science.
95% of the respondents had experience in the Ministry of Science and Higher Education grant programs. 18% of the scientists took part in the Innovative Economy and 17% in the National Centre for Research and Development programs. 14% of the respondents applied to the Polish-Swiss Research Programme and 4% to the Polish-Norwegian Research Fund12.
Answer categories and subcategories. The survey contained 14 closed-ended questions about researchers' perception of the peer review process in Poland, and one open-ended question which was a request for any further comments or suggestions about the experience of the peer review process. The questionnaire was completed by 8190 people, but the open-ended question was answered by only 2615 of them (about 32%). According to the OPI experts, 301 answers were incomplete or irrelevant. A manual analysis of the answers would be time consuming and expensive. Therefore, our aim was to carry out an automatic classification using machine learning methods. The answers have been categorized into five categories of problems which consist of sixteen subcategories13 (Table 1).
11 N. Macnab, G. Thomas, Quality in research and the significance of community assessment and peer review: education's idiosyncrasy, International Journal of Research & Method in Education 2007, 30(3), pp. 39-352.
12 Procedures for review and selection…
13 Categories and subcategories were identified by the OPI experts, but mainly by Agata Kopacz.
The categories and the subcategories of answers to the open-ended question
Problem definition. Let us consider a set of answers to the open-ended question in the questionnaire and denote it as

d = [d1, d2, ..., dn]T (1)

Each answer di, i = 1, 2, ..., n may contain many statements

di = [si,1, si,2, ..., si,m]T (2)

and they can refer to various problems mentioned by the responders. These
problems, which we defined in Table 1. Let us denote a category as ca and the corresponding subcategory as scab. An answer di can belong to many categories or subcategories. The task is to build a classifier which will be able to automatically assign categories and subcategories to each answer di. We have divided the set d into the training set dTrain and the testing set dTest. The experts manually prepared the training set in a special way: all answers di in the training set were split into statements si,j and then subcategories sc were assigned to them. One subcategory was assigned to one statement. A statement is treated as one sentence or a set of sentences which should carry a consistent message in the same category.
Selected classification algorithms. Among the many classification algorithms, some are especially important, such as Support Vector Machines (SVM) and classifiers based on Bayes' theorem: Naive Bayes (NB) and Multinomial Naive Bayes (MNB).
Naive Bayesian classifiers are based on two assumptions. Firstly, they consider documents as bags of words, where the position of a word in a document does not affect the result of classification. Secondly, they assume that the probability of a word's occurrence in a document di is independent of the occurrence probabilities of the other words for the given class. Therefore, we can easily calculate the conditional probability that a document di composed of the words xi,1, xi,2, ..., xi,k belongs to a class cl ∈ c

P(cl | xi,1, xi,2, ..., xi,k) ∝ P(cl) ∏k P(xi,k | cl) (3)

and finally determine to which class the document belongs

cwinner = arg maxcl P(cl | xi,1, xi,2, ..., xi,k) (4)
Although the assumption of feature independence is rather untrue, a Naive Bayes classifier works surprisingly well in practice. In this case the distribution of each feature P(xi,k | cl) is not defined. If we assume that each feature has a multinomial distribution, then we have Multinomial Naive Bayes. This assumption works well, for instance, in text classification, where it can be used in the word-counts model. A Bayesian classifier is learned from the set dTrain and this process involves: extracting the vocabulary; computing the priors P(cl); and calculating the likelihood P(xi,k | cl) of each word xi,k belonging to each decision class cl. These values are calculated as the ratio between the number of documents or words representing a particular class and the total number of documents or words in the class. There is a possibility that a particular word in the test set dTest does not occur in the training set dTrain, so its likelihood will be equal to zero. Thus, due to the multiplication of the probabilities, an entire reviewer's answer would not be properly classified. There are several ways to solve this problem. The most frequent solution is to use Laplace smoothing or to assign a low likelihood value correlated with all the other probabilities14.
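As an illustration of the training procedure and the Laplace-smoothing remedy described above, a minimal Naive Bayes trainer and classifier can be sketched as follows. This is an illustrative re-implementation, not the authors' code; the toy statements, labels and the add-one value alpha = 1.0 are assumptions.

```python
import math
from collections import Counter

def train_mnb(docs, labels, alpha=1.0):
    """Estimate class priors and Laplace-smoothed word likelihoods."""
    vocab = {w for d in docs for w in d}
    classes = set(labels)
    priors = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        counts[y].update(d)
    likelihoods = {
        c: {w: (counts[c][w] + alpha) /
               (sum(counts[c].values()) + alpha * len(vocab))
            for w in vocab}
        for c in classes
    }
    return priors, likelihoods, vocab

def classify(doc, priors, likelihoods, vocab):
    """Pick the class maximizing log P(c) + sum of log P(w|c) over known words."""
    def score(c):
        return math.log(priors[c]) + sum(
            math.log(likelihoods[c][w]) for w in doc if w in vocab)
    return max(priors, key=score)

# toy training set: statements labelled with a single subcategory
model = train_mnb([["slow", "review"], ["unfair", "reviewer"], ["slow", "process"]],
                  ["time", "objectivity", "time"])
print(classify(["slow"], *model))
```

Thanks to the smoothing, a word unseen for a class still receives a small non-zero likelihood, so one out-of-vocabulary word cannot zero out the whole answer's probability.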
Support Vector Machines were first presented in 1995 by Vladimir Vapnik. SVM uses the principle of structural risk minimization. The main idea of the algorithm is to find a decision boundary which can separate the classes, usually a positive one and a negative one. Regarding the classification problem, linear and nonlinear cases are distinguished. The SVM classifier considers a document or a sentence as a bag of words x, similarly to Naive Bayes. In the linear case the classes are separated by a hyperplane:

w * x - b = 0 (5)

where the weights w are selected during the training process using the training set dTrain and quadratic programming. Nonlinear cases are solved by using soft-margin methods, which allow some errors, or by using a kernel function such as the polynomial, Gaussian or hyperbolic tangent15.
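A minimal sketch of the linear decision rule and of the Gaussian kernel mentioned above; the weight vector, bias and gamma are illustrative values, not parameters learned from data.

```python
import math

def linear_decision(w, x, b):
    """Linear SVM decision: the sign of w*x - b selects the class."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    return 1 if score >= 0 else -1

def rbf_kernel(x, z, gamma=0.01):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2),
    used when the classes are not linearly separable."""
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))

# a point on the positive side of the hyperplane w*x - b = 0
print(linear_decision([1.0, -2.0], [3.0, 0.5], 1.0))  # score = 3 - 1 - 1 = 1
```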
Multi-class and multi-label classification. Typically a Bayesian classifier assigns only one class, the one with the highest probability, while testing a particular answer (eq. 4). But as we mentioned previously, an answer di, i = 1, 2, ..., n to an open-ended question can belong to many subcategories, which we denote as the classes cl, l = 1, ..., l. Therefore, this case involves both multi-class and multi-label problems, because the data set contains many classes (categories and subcategories, see Table 1) and the answers are assigned to many classes (labels). We can solve this issue in two ways. The first approach assumes that it is possible to use only one classifier in the manner of multi-label classification. The classifier, e.g. Multinomial Naive Bayes, produces as an output a vector of probabilities, one value for each class (eq. 3). The classes with the highest
14 D. Fragoudis, D. Meretakis, S. Likothanassis, Best terms: an efficient feature-selection algorithm for text categorization, Knowledge and Information Systems 2005, 8 (1), pp. 16-33; T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer, New York 2009; Z. Hoare, Landscapes of naive Bayes classifiers, Pattern Analysis and Applications 2008, 11 (1), pp. 59-72.
15 B. Liu, Web Data Mining: Exploring Hyperlinks, Contents and Usage Data, Springer, New York 2010; W. Noble, What is a support vector machine?, Nature Biotechnology 2006, No. 24, pp. 1565-1567; C. Silva, B. Ribeiro, On text-based mining with active learning and background knowledge using SVM, Journal of Soft Computing - A Fusion of Foundations, Methodologies and Applications 2007, 11, pp. 519-530.
probability are taken as the outcome, but someone must decide how many classes should be taken into account. The second approach uses the procedure called one vs others. This procedure implies the use of l - 1 classifiers to solve the multi-class problem. Each classifier, e.g. Multinomial Naive Bayes, is trained in a binary manner to recognize one class against all the others. In the classification stage all the classifiers verify a new example and finally many classes can be assigned to it. There can be a situation when all the classifiers choose the class "others" and the tested example remains unclassified, or, on the other hand, too many classes are assigned. In order to avoid over-classification, one has to experimentally choose a probability threshold of belonging to the class16.
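The one vs others decision with a probability threshold can be sketched as below; the subcategory names and probabilities are invented for illustration, and the fallback to the single most probable class is one possible way of handling the "all classifiers vote others" case.

```python
def one_vs_others(binary_probs, threshold=0.9):
    """Each binary classifier reports P(class | answer); keep every class whose
    probability reaches the threshold. If all classifiers vote "others",
    fall back to the most probable class so the answer is not left
    unclassified."""
    chosen = sorted(c for c, p in binary_probs.items() if p >= threshold)
    if not chosen:
        chosen = [max(binary_probs, key=binary_probs.get)]
    return chosen

print(one_vs_others({"anonymity": 0.95, "time": 0.91, "fees": 0.10}))
print(one_vs_others({"anonymity": 0.40, "time": 0.30}))
```

Raising the threshold reduces over-classification at the cost of more fallback (single-label) decisions.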
Model improvements. Before classification, the texts are pre-processed, which involves: lemmatization, removing stopwords, and determining the validity of the words using TF-IDF (term frequency - inverse document frequency). The classifiers are trained using the TF-IDF values of the words from the pre-processed sentences. We call this the basic form of our classification model. It is easy to notice that the quality of the classifiers depends on the quality of the text pre-processing.
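The pre-processing pipeline (stopword removal followed by TF-IDF weighting) can be sketched as below; the stopword list and the tokens, shown here already lemmatized, are illustrative assumptions.

```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "of", "is"}  # illustrative; a Polish list would be used in practice

def tfidf(token_lists):
    """TF-IDF weights for a list of tokenized, lemmatized answers."""
    docs = [[t for t in d if t not in STOPWORDS] for d in token_lists]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    return [{t: (tf / len(d)) * math.log(n / df[t])
             for t, tf in Counter(d).items()}
            for d in docs]

weights = tfidf([["the", "review", "is", "slow"], ["the", "review", "is", "unfair"]])
# "review" occurs in every answer, so its IDF (and hence its weight) is zero
print(weights[0]["review"], round(weights[0]["slow"], 3))
```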
We propose three improvements to the basic classification model. Firstly, the answers to the open-ended question contain many misspellings, which can interfere with the lemmatization process. They can be corrected using an electronic dictionary; in the case of this questionnaire it could be a Polish dictionary, for instance http://www.sjp.pl. Secondly, we deal with texts in Polish, and the Polish language has a different grammar than English, so it needs special algorithms in order to properly extract keywords. We have developed such an algorithm, the Polish Keyword Extractor17, which is based on Rapid Automatic Keyword Extraction (RAKE) and KEA. Finally, we should note that the effectiveness of classification models depends on the quality of the training set and especially on its size. The experts have agreed that answers containing up to 220 words (about one or two sentences) should be classified into only one subcategory.
SVM classifier parameter optimization. The choice of parameters for the SVM classifier is a nontrivial and laborious task, because there is no automatic and deterministic method which would allow selection of the best parameters for a specific problem. It is a nonlinear problem, which additionally involves many computations in the case of the classification of the questionnaire. Therefore, we propose applying a differential evolution (DE) algorithm18 to optimize the parameters of the SVM classifier. DE, as one of the evolutionary algorithms,
uses a population containing vectors which represent potential solutions. Finding the best vector means finding the best classifier parameters. The algorithm involves the following steps: initialization, in which a population of vectors is randomly created while keeping the constraints on each parameter; mutation, in which a mutated vector is created for each vector, assuming that they differ from each other; recombination, in which a new vector is created in order to increase the diversity of the population, provided that at least one parameter is derived from the mutated vector; and selection, in which the vector formed during recombination is tested by an objective function, and the better one (new or old) is added to the new population. The algorithm stops when it reaches a fixed number of generations, and the best matched vector is returned as the outcome.
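The four steps above can be sketched as a compact DE loop. This is an illustrative minimal version, not the authors' implementation; the sphere-like cost function below stands in for the real objective (the error of an SVM trained with the candidate parameters), and the default population size, scale factor and crossover rate mirror values reported later in the paper.

```python
import random

def differential_evolution(cost, bounds, pop_size=20, max_iter=100,
                           f_scale=0.9, cr=0.9, seed=0):
    """Minimal DE/rand/1/bin loop: initialization, mutation, recombination,
    selection, repeated for a fixed number of generations."""
    rng = random.Random(seed)
    dim = len(bounds)
    def clip(v, lo, hi):
        return max(lo, min(hi, v))
    # initialization: random vectors within the parameter constraints
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(max_iter):
        new_pop = []
        for i, x in enumerate(pop):
            # mutation: combine three other vectors of the population
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            mutant = [clip(a[d] + f_scale * (b[d] - c[d]), *bounds[d])
                      for d in range(dim)]
            # recombination: at least one parameter comes from the mutant
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if (d == j_rand or rng.random() < cr) else x[d]
                     for d in range(dim)]
            # selection: keep the better of the trial and the current vector
            new_pop.append(trial if cost(trial) <= cost(x) else x)
        pop = new_pop
    return min(pop, key=cost)
```

In the paper's setting, `cost` would train and evaluate the SVM for a candidate parameter vector (e.g. C and gamma) and return 100 minus the average F-score.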
2.1. Classification and assessment
To analyse the open-ended answers using supervised methods, we first need to build a training set. Therefore, we divide our evaluation process into two stages (preliminary model selection and the final classification procedure). In the first stage we build the training set and select the classifier models which best match this problem. In the second one, we classify all the open-ended answers using the classifier models selected in the first stage.
Preliminary model selection. We propose a preliminary classification stage in order to select the appropriate models and pre-processing procedures. This stage involves four experiments; we denote them as experiments 1, 2, 3 and 4 in the section Results. Each experiment contains the following steps:
18 R. Storn, K. Price, Differential evolution - a simple and efficient heuristic for global
optimization over continuous spaces, Journal of Global Optimization 1997, 11, pp. 341-359.
1. The experts create an initial training set dTrain with the same size (number of answers di) for each subcategory scl.
2. Various classification models are tested using a cross-validation procedure and the best classification model is chosen for further experiments.
3. The answers which have not yet been assigned to a subcategory (usually 100 answers) are classified using the model selected in the previous step.
4. The experts verify the experiment outcomes.
5. Based on the classification errors, the classification models are adjusted.
6. The training set dTrain is extended with the classified answers (labels assigned by the experts), and a new experiment starts from point 2.
The training set sizes for the consecutive experiments were as follows: 14 for experiment 1, 24 for experiment 2, 34 for experiment 3, and 43 for experiment 4. Using the above algorithm we have tested two approaches to the classification problem: using one classifier in comparison with using many classifiers, and the model improvements which were discussed above. These are experiments 1-4, the details of which can be found in the section Results.
Final classification procedure. After the selection of an adequate classification model, we conduct classification experiments on all the answers to the open-ended question by repeating the following steps:
1. A classification program randomly selects 100 new answers from d which have not been classified yet.
2. The best classifier among those tested in the previous experiment iteration classifies the answers.
3. The experts (people) verify the classification results.
4. All classifier types carry out the experiments and the best one is chosen according to the selection criteria.
5. The classification program adds the answers verified by the experts to the training set, and the next iteration is performed starting from point 1.
Using the above algorithm we perform the final classification and also optimize the SVM classifier parameters. These are experiments 5-17, the details of which can be found in the section Results.
Assessment measures. Classifiers need to be assessed on the basis of their outcomes. There are several measures that could be useful, but we should be aware of their meaning and use only those most suitable for our single-label and multi-label problem. Really simple and useful are measures based on a comparison of the real subcategory with the classifier's decision; as a result, the following values are obtained: true positive (TP), false positive (FP), false negative (FN) and true negative (TN). A combination of these values gives:
- recall, also called sensitivity

Rec = TP / (TP + FN) (6)

- precision

Prec = TP / (TP + FP) (7)

- F-measure (or F-score), which is the harmonic mean of precision and recall

F = 2 * Prec * Rec / (Prec + Rec) (8)
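The precision, recall and F-score defined above can be computed directly from the counts; a minimal sketch with invented counts:

```python
def prf(tp, fp, fn):
    """Precision, recall and F-score from true/false positive and
    false negative counts."""
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    return prec, rec, 2 * prec * rec / (prec + rec)

# e.g. 8 correct assignments, 2 spurious ones, 8 missed ones
print(prf(8, 2, 8))  # precision 0.8, recall 0.5
```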
The evaluation of multi-label data is difficult because a prediction can be partially correct; therefore we use the Exact Match Ratio (EM)19. This measure indicates the percentage of examples that have all their labels correctly classified:

EM(D) = (1/k) Σi=1..k I(sci = li) (9)

where k is the number of test examples, I is the indicator function, li is the label (subcategory) vector of the i-th example, and sci is the predicted subcategory vector.
Another important issue is the measurement of multi-label data, which can be represented (just like single-label data) by the number of examples (n) and the number of subcategories (cs). We select three measures specific to the multi-label
19 M.S. Sorower, A Literature Survey on Algorithms for Multi-label Learning, 2010.
problem, introduced in20. Label Cardinality (LCARD) is a standard measure that simply takes the average number of labels associated with each example:

LCARD(D) = (1/N) Σi=1..N li (10)

where li is the number of subcategories in the i-th example.
The second one is Label Density (LDENS), which is related to LCARD and includes the size of the label space. This measure gives a good idea of how frequently each label is used:

LDENS(D) = LCARD(D) / |L| (11)
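The multi-label measures discussed above can be sketched as follows; the toy label sets are invented, and the label space size of 16 mirrors the sixteen subcategories mentioned earlier.

```python
def exact_match(predicted, actual):
    """Share of examples whose full label set is predicted exactly."""
    return sum(set(p) == set(a) for p, a in zip(predicted, actual)) / len(actual)

def label_cardinality(label_sets):
    """Average number of labels per example."""
    return sum(len(s) for s in label_sets) / len(label_sets)

def label_density(label_sets, label_space_size):
    """Label cardinality normalized by the size of the label space."""
    return label_cardinality(label_sets) / label_space_size

actual = [{"time"}, {"time", "anonymity"}, {"fees"}]
predicted = [{"time"}, {"time"}, {"fees"}]
print(exact_match(predicted, actual))   # 2 of 3 examples are fully correct
print(label_cardinality(actual))        # (1 + 2 + 1) / 3 labels per example
```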
Very often we use average values computed over many experiments. Therefore, the measures presented above are denoted with the prefix Avg in the section Results.
Model selection. The initial experiments focused on assessing two classification models. We have tested the MNB single multi-label classifier in comparison with many classifiers combined by the one vs others procedure. In four experiments (we denote them as 1-4) 994 answers (36,1% of all the answers) were classified using the preliminary model selection procedure (see section Classification and assessment). Based on the results, which are presented in Table 2, we can conclude that the individual MNB classifier gives better or similar results than the one vs others procedure. Moreover, this classification model is also less complicated and easier to implement.
20 G. Tsoumakas, I. Katakis, op. cit.
Comparison of the standard MNB classifier with the one vs others MNB classifier
After selecting the classification model, we tested three improvements to the basic classification model: misspelling correction using the Polish dictionary (SJP); finding the most important words using the Polish Keyword Extractor (PKE); and enlarging the data set (see section Model improvements). In the experiments involving the model improvements we used the same data set as in the previous experiments, therefore we also denote them as 1-4. The results presented in Table 3 indicate that the models containing the improvements can be more efficient than the basic Multinomial Naive Bayes classifier. In particular, spelling correction using the Polish language dictionary and a large data set significantly improve the quality of classification.
[Table 3: precision, recall and EM for experiments 1-4 with the model improvements, reported for one, two and three assigned classes]
Final classification. According to the above findings, for the final classification we decided to use the MNB classifier enriched with the Polish language dictionary and a larger data set (we call it the MNB classifier with improvements). Moreover, in the new experiments we have evaluated the questionnaires using an SVM classifier with default parameters and also with parameters selected manually in an intuitive way. Before proceeding to the final classification, the experts improved the training set. They examined the shortest texts, which may adversely affect the quality of the classifier, by adding more relevant data from the original reviewers' responses. We carried out five experiments (we denote them as 5-9), classifying 100 new answers in each one. The results presented in Table 4 show that the best results are achieved when the classifier assigns two classes, as in experiments 1-4. In all cases the average recall and precision are between 49-52%, and the F-score is about 50%. The average exact match (AvgEM) is better when the classifier returns only one class, and is 29,08% for the SVM with a Gaussian kernel. There is a small difference between the performance of the SVM and MNB classifiers in this case.
Classification experiments 5-9
MNB classifier with improvements
SVM classifier (polynomial kernel, exponent = 1; C = 1)
SVM classifier (RBF kernel, gamma = 0.01; C = 21)
One vs others MNB classifier (threshold - 0,9)
After analysing the results of experiments 5-9, the experts decided to join two subcategories (disclosure and anonymity) into one, because these subcategories were difficult to differentiate. Moreover, they suggested increasing the number of answers in one experiment to 150 in order to obtain a more representative sample of data. Therefore, we carried out the next eight experiments (we denote them as 10-17) by classifying 150 new answers in each one. The other parameters were the same as in the previous experiments. The results are presented in Table 5. There are no significant improvements, but on the other hand the SVM classifier with manually selected parameters achieved slightly better results. Using the SVM algorithm we achieved an F-score about 1-1,5 percentage points better than the MNB classifier and 6,82 percentage points better than the one vs others model using the MNB classifier.
SVM classifier (polynomial kernel, exponent = 1; C = 1)
SVM classifier (RBF kernel, gamma = 0.01; C = 21)
One vs others MNB classifier (threshold - 0,9)
Classification experiments 10-17
number of classes
Optimization of SVM parameters. In the previous experiments we used default or manually selected parameters for the SVM classifier. We believe that it is possible to find optimal parameters which can improve the classification quality, and that this can be done using the differential evolution (DE) algorithm (see section SVM classifier parameter optimization). In order to find the optimal parameters for the SVM classifier we again used the data from experiments 10-17. Half of the experiments were carried out using the training set, and half using the test set. The cost function can be presented as:

fcost = 100 - FAvg(10-13)

where FAvg(10-13) is the average F-score from experiments 10-13. Given the huge number of training set evaluations, we decided to set a small population size equal to 20 and a maximum number of iterations equal to 100. The other parameters of the DE algorithm were chosen intuitively: standard deviation (0.1), scale factor (0.9) and recombination probability (0.9). Vectors created from the SVM parameters were the input data for the DE algorithm. Before evaluation, it was necessary to set minimum and maximum values for all the included parameters. The experiments involved comparing optimization with the polynomial and RBF kernels (the best results are shown in Table 6) to the primary performance.
Optimization of SVM parameters
Dataset statistics. The increasing popularity of multi-label classification in the academic literature has caused the emergence of publicly available datasets21. In order to facilitate further analysis and evaluation of this dataset, we present all the multi-label-specific measures that were described in section 2.1 (Table 7). Equally important in multi-label classification is knowing the label set frequencies (Figure 1).
Fig. 1. The label distributions of dataset
21 Our dataset is available via email: .
We have evaluated several machine learning methods to carry out an automatic classification of open-ended questions. We presented multi-label classifiers which are responsible for labelling the open-ended questions. In the classification experiments we used the MNB and SVM methods and obtained an average precision of about 77% and an average recall of about 55%.
At first we tested the single MNB multi-label classifier in comparison with the one vs others procedure. We concluded that the individual MNB classifier gives better or similar results than the one vs others procedure, and it is less complicated. Surprisingly, the one vs others model has a slightly higher recall than the standard classifier with only one class assigned.
The experiments involving the model improvements (the Polish language dictionary and a larger data set) achieved better results than the basic Multinomial Naive Bayes classifier. On the other hand, the model using the Polish Keyword Extractor algorithm is much worse in comparison to all the others.
The reported factors show a clear improvement after we aggregated the two most frequently confused subcategories (the experts decided to merge the disclosure and anonymity subcategories into one because they were often mistaken for each other). Compared to the previous experiments in Table 4, the F-score increased by 6%.
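The aggregation step amounts to remapping the two confusable labels onto one merged label in every answer's label set before training and evaluation. A minimal sketch, where the merged label name and the toy label sets are illustrative:

```python
# Sketch: merging two frequently-confused subcategories into one label.
# The merged label name and the toy label sets are illustrative.
MERGED = {"disclosure": "disclosure/anonymity",
          "anonymity": "disclosure/anonymity"}

def aggregate(label_sets):
    # replace either subcategory with the merged label, leave others as-is
    return [{MERGED.get(label, label) for label in labels}
            for labels in label_sets]

out = aggregate([{"disclosure"}, {"anonymity", "cost"}, {"speed"}])
print([sorted(s) for s in out])
# → [['disclosure/anonymity'], ['cost', 'disclosure/anonymity'], ['speed']]
```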
In order to find the best parameters of the SVM classifier, we used the Differential Evolution algorithm. After a closer look at Table 6, we noticed that there is no big difference between the results achieved by the SVM classifier with parameters selected manually and with parameters selected by the DE algorithm (about 0.5% on AvgEM and AvgF). However, the SVM classifier with default settings achieves much worse results than the two previously mentioned. This means that it is important to look for optimal parameters of the SVM classifier, but not necessarily by using optimization methods such as evolutionary algorithms.
The ongoing studies on the automatic classification of open-ended texts are still at an early stage, but the desire to apply classification or analysis methods to the response texts of open-ended questionnaires is increasing. In this research, we conducted an automatic classification of the texts of an open-ended questionnaire. The results show that our best classification model (the SVM classifier with parameters selected by the DE algorithm) works well for multi-label classification and can produce questionnaire categories similar to those produced by humans.
While questionnaires are inexpensive, quick, and easy to analyze, they often produce many problems (which influenced the results achieved by the automatic classifiers). The people conducting the research may never know whether the respondent understood the question that was asked. Owing to the specificity of the questions, the information gained can be minimal. Questionnaires conducted by mail or online produce very low return rates (only 32% of our respondents answered the open-ended questions). Another problem associated with return rates is that the people who return the questionnaire are often those who have a really positive or a really negative viewpoint and want their opinion to be heard. People who are most likely to be unbiased typically do not respond because it is not worth their time.
Using machine learning algorithms sped up the process of questionnaire analysis; on the other hand, the experts were still needed for model improvement and tuning. In future work, we plan to proceed with the analysis of characteristic expressions in the texts of open-ended questionnaires based on these experimental results, and to investigate other multi-label classification methods which can be applied to open-ended questions. The most critical problem is the estimation of the number of classes (labels), which we will try to resolve by using prediction methods.
The authors would like to thank Prof. Witold Pedrycz, a consultant in the field of classification theory, as well as Anna Plewa, Sławomir Dadas, and Kinga Skolimowska for the grammatical correction of the article.
CLASSIFICATION OF THE QUESTIONNAIRES OF REVIEWERS AND APPLICANTS
The article describes methods for the multi-label classification of texts from an open-ended questionnaire question using machine learning techniques. The aim is to increase the speed and reduce the cost of analysing the open-ended question in a questionnaire. First, various models of multi-label classifiers, by means of which categories are assigned to texts, are described. In the experiments, single-label classifiers were used: Multinomial Naive Bayes (MNB) and the Support Vector Machine (SVM). With their help, we obtained an average precision of 77% and an average recall of 55%. The experiments included many improvements (the size of the training set, vocabulary correction, optimization of the SVM classifier parameters using evolutionary methods, and others), thanks to which we increased the classification effectiveness in comparison with the original model. The proposed method was used to automatically assign categories to texts from the open-ended question in a questionnaire.
Neff B.D., Olden J.D., Is peer review a game of chance?, "BioScience" 2006, 56(4).
Liu B., Web Data Mining: Exploring Hyperlinks, Contents and Usage Data, Springer, New York 2010.
Silva C., Ribeiro B., On text-based mining with active learning and background knowledge using SVM, "Journal of Soft Computing - A Fusion of Foundations, Methodologies and Applications" 2007, 11(6).
Fragoudis D., Meretakis D., Likothanassis S., Best terms: An efficient feature-selection algorithm for text categorization, "Knowledge and Information Systems" 2005, 8(1).
Lebart L., Salem A., Berry L., Exploring Textual Data, Kluwer Academic Publishers 1998.
Tsoumakas G., Katakis I., Multi-label classification: An overview, "Int J Data Warehousing and Mining" 2007.
Inui H., Murata M., Uchimoto K., Isahara H., Classification of open-ended questionnaires based on surface information in sentence structure, in: Proceedings of the 6th NLPRS2001, 2001.
Yamanishi K., Li H., Mining open answers in questionnaire data, "IEEE Intelligent Systems" 2002.
Sorower M.S., A Literature Survey on Algorithms for Multi-label Learning, Oregon State University 2010.
Macnab N., Thomas G., Quality in research and the significance of community assessment and peer review: education's idiosyncrasy, "International Journal of Research & Method in Education" 2007, 30(3).
Procedures for review and selection of reviewers, Vol. 1, ed. J. Protasiewicz, Information Processing Institute, Warsaw 2012 [in Polish].
Procedures for review and selection of reviewers, Vol. 2, ed. J. Protasiewicz, Information Processing Institute, Warsaw 2012 [in Polish].
Storn R., Price K., Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces, "Journal of Global Optimization" 1997, 11.
Hirasawa S., Shih F., Yang W., Student questionnaire analyses for class management by text mining both in Japanese and in Chinese, in: Proc. 2007 IEEE International Conference on Systems, Man and Cybernetics, 2007.
Hastie T., Tibshirani R., Friedman J., The Elements of Statistical Learning, Springer, New York 2009.
Noble W., What is a support vector machine?, "Nature Biotechnology" 2006, 24.
Chen Y., Weng C., Mining fuzzy association rules from questionnaire data, "Knowledge-Based Systems Journal" 2009.
Hoare Z., Landscapes of naive Bayes classifiers, "Pattern Analysis and Applications" 2008, 11(1).