Visualization of medical rule-based knowledge bases


JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 24/2015, ISSN 1642-6037

data mining, medical knowledge bases, cluster visualization, hierarchical clustering, treemaps

Agnieszka NOWAK-BRZEZIŃSKA1, Tomasz RYBOTYCKI1

VISUALIZATION OF MEDICAL RULE-BASED KNOWLEDGE BASES

In this work the topic of applying clustering as a method of extracting knowledge from real-world data is discussed. The authors propose a hierarchical clustering method and a visualization technique for knowledge base representation in the context of medical knowledge bases, for which data mining techniques are successfully employed and may resolve different problems. What is more, the authors analyze the impact of different clustering parameters on the result of searching through such a structure. Particular attention is also given to the problem of cluster visualization. The authors review selected two-dimensional approaches, stating their advantages and drawbacks in the context of representing complex cluster structures.

1. INTRODUCTION

In the domain of Decision Support Systems and Data Mining, the last decade brought a significant development of new algorithms, tools and applications. Knowledge bases (KBs) are constantly increasing in volume, so the knowledge stored as a set of rules or patterns is getting progressively more complex and much harder to interpret or analyze. Recent advances in the field of artificial intelligence have led to the emergence of expert systems, computational tools designed to capture and make available the knowledge of domain experts. The number of medical expert systems is growing and, thanks to progress in key areas such as knowledge acquisition, model-based reasoning and system integration for clinical environments, their efficiency is improving every day.
It is essential for physicians to understand the current state of such research, as well as the remaining theoretical and logistic barriers, before the full potential of these systems can be used and new patterns can be discovered. Among many other methods, doctors can use the visualization and analysis of medical data to extract new and potentially hidden knowledge, both common and unusual. The extraction and discovery of knowledge hidden in the data have become particularly important in recent years, especially when taking into consideration the constantly growing amount of information stored in databases and data warehouses. The data is collected because it can potentially be the source of previously unknown and useful correlations, anomalies and trends [4]. However, the discovered patterns, expressed in the form of an analytical model, may possess a complicated structure, which hinders further analysis. The excessive amount of available information is not the only thing that makes research difficult. A more important factor is the complicated structure of the data, in terms of both high dimensionality and the data types used.

1 Institute of Computer Science, University of Silesia, 39 Bedzińska Str., 41-200 Sosnowiec, Poland

CLASSIFICATION

In this paper a specific type of knowledge representation, rules (denoted as Horn clauses), is considered. Unfortunately, if we use (possibly different) tools for automatic acquisition and/or extraction of rules, their number grows rapidly. For modern problems, a KB can contain hundreds or even thousands of rules. For such KBs, the number of possible inference paths is enormous. In such cases a knowledge engineer cannot be certain that all possible rule interactions are legal and lead to the expected results. The large size of a KB causes problems with inference efficiency and with the interpretation of inference results.
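
As a rough illustration of how rules written as Horn clauses interact during inference, the sketch below stores each rule as a set of premises with a single conclusion and applies simple forward chaining. The `Rule` class, the `forward_chain` function and the toy attribute-value pairs are assumptions made for this example only; they are not taken from the authors' system and carry no clinical meaning.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple

Fact = Tuple[str, str]  # hypothetical (attribute, value) pair


@dataclass(frozen=True)
class Rule:
    """A Horn clause: if every premise holds, the conclusion holds."""
    premises: FrozenSet[Fact]
    conclusion: Fact


def forward_chain(rules: List[Rule], facts: Iterable[Fact]) -> Tuple[Set[Fact], List[int]]:
    """Fire rules until nothing new can be derived.

    Returns the set of known facts and the indices of fired rules,
    in the order in which they fired (one possible inference path).
    """
    known = set(facts)
    fired: List[int] = []
    changed = True
    while changed:
        changed = False
        for i, rule in enumerate(rules):
            # Each rule fires at most once, so the loop always terminates.
            if i not in fired and rule.premises <= known:
                fired.append(i)
                known.add(rule.conclusion)
                changed = True
    return known, fired


# Toy knowledge base (invented for illustration, not medical advice).
RULES = [
    Rule(frozenset({("fever", "high"), ("cough", "yes")}), ("suspicion", "infection")),
    Rule(frozenset({("suspicion", "infection")}), ("action", "order_test")),
    Rule(frozenset({("rash", "yes")}), ("suspicion", "allergy")),
]

known, fired = forward_chain(RULES, {("fever", "high"), ("cough", "yes")})
print("fired rules:", fired)
print("derived facts:", sorted(known))
```

Even in this three-rule example the second rule can fire only because the first one did. With hundreds or thousands of rules, such dependencies are what make the set of possible inference paths hard to inspect by hand.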
Even for a domain expert it is difficult to analyze the presented knowledge if the number of elements to analyze is too large. In such cases clustering the rules and visualizing the resulting structure can be helpful. That is why the authors propose a method of reorganizing the KB from a set of unrelated rules into groups of similar rules (using cluster analysis methods). Besides the information about the rules in each cluster, a visualization of the clusters is generated. Such a representation of a KB, especially in specific areas (like medicine), seems to be very helpful for an expert exploring the given domain.

The paper consists of 6 sections. Section 1 presents general information about the motivation behind the authors' scientific goals. The idea of cluster analysis for rules in a KB is described in Section 2. Section 3 presents methods of visualizing a hierarchical data structure. Section 4 describes the software created by the authors to group the data and represent it graphically. The experiments and an analysis of their results are presented in Section 5. Section 6 contains the summary.

2. HIERARCHICAL CLUSTERING ALGORITHM

Hierarchical clustering (or hierarchical cluster analysis) is one of many methods of cluster analysis. It seeks to build a hierarchical structure of clusters. Most basic hierarchical clustering algorithms merge only two clusters (or divide only one cluster) during one iteration step, and because of that the resulting structure is tree-like. There are two types of hierarchical clustering algorithms:
- agglomerative hierarchical clustering algorithms, or AGNES (from agglomerative nesting),
- divisive hierarchical clustering algorithms, or DIANA (from divisive analysis).

In divisive hierarchical clustering algorithms, at the beginning, all objects are members of one default group.
During every iteration step this basic group is divided into smaller groups until the stop condition is met. These methods are used less often than agglomerative methods, because finding an effective way to divide a cluster is a nontrivial task [6].

Agglomerative hierarchical clustering (AHC) algorithms take a different approach. During each iteration step clusters are merged with other clusters. At the beginning each object is considered a cluster itself (or one may say that each object is placed within a cluster that consists only of that object). It can be said that these two types are the reverse of one another [5].

In this paper the following version of the classic (basic) agglomerative hierarchical clustering algorithm [6] was used.
1) Place each object in a separate cluster.
2) Build the similarity matrix for every pair of clusters.
3) Using the similarity matrix, find the most similar pair of clusters and merge them.
4) Update the similarity matrix.
5) If the stop condition is met, end the procedure.
6) Else repeat from step 3.
7) Return the structure built this way.

One of the greatest advantages of these kinds of algorithms is that they are independent of how the similarity of objects is described. There are many methods of specifying the resemblance (or distance) of objects of different types [6]. In some cases complex objects consisting of numerical and symbolic data are analyzed and it's im (...truncated)