Knowledge and Information Systems

http://link.springer.com/journal/10115

List of Papers (Total 38)

Efficient approaches for multi-agent planning

Multi-agent planning (MAP) deals with planning systems in which multiple collaborative agents reason about long-term goals while keeping their knowledge private. Recently, new MAP techniques have been devised to provide efficient solutions. Most approaches expand distributed searches using modified planners, where agents exchange public information. They present two...

Origo: causal inference by compression

Causal inference from observational data is one of the most fundamental problems in science. In general, the task is to tell whether it is more likely that \(X\) caused \(Y\), or vice versa, given only data over their joint distribution. In this paper we propose a general inference framework based on Kolmogorov complexity, as well as a practical and computable instantiation based...

Concurrent bilateral negotiation for open e-markets: the Conan strategy

We develop a novel strategy that supports software agents to make decisions on how to negotiate for a resource in open and dynamic e-markets. Although existing negotiation strategies offer a number of sophisticated features, including modelling an opponent and negotiating with many opponents simultaneously, they abstract away from the dynamicity of the market and the model that...

Activity qualifiers using an argument-based construction

Based on an argumentation theory approach, we present a novel method for evaluating complex goal-based activities by generalizing a notion of qualifier defined in the health domain. Three instances of the general qualifier are proposed: Performance, Actuation and Capacity; the first one evaluates what a person does, the second how an individual follows an action plan, and the...

The BigGrams: the semi-supervised information extraction system from HTML: an improvement in the wrapper induction

The aim of this study is to propose an information extraction system, called BigGrams, which is able to retrieve relevant and structured information (relevant phrases, keywords) from semi-structured web pages, i.e. HTML documents. For this purpose, a novel semi-supervised wrapper induction algorithm has been developed and embedded in the BigGrams system. The wrapper induction...

Discovering topic structures of a temporally evolving document corpus

In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work our model does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of...

Event stream-based process discovery using abstract representations

The aim of process discovery, originating from the area of process mining, is to discover a process model based on business process execution data. A majority of process discovery techniques relies on an event log as an input. An event log is a static source of historical data capturing the execution of a business process. In this paper, we focus on process discovery relying on...

\(L_p\)-Support vector machines for uplift modeling

Uplift modeling is a branch of machine learning which aims to predict not the class itself, but the difference between the class variable behavior in two groups: treatment and control. Objects in the treatment group have been subjected to some action, while objects in the control group have not. By including the control group, it is possible to build a model which predicts the...
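
As a minimal illustration of the quantity described above, and not of the \(L_p\)-SVM method itself, uplift can be estimated on toy data as the difference in response rates between the treatment and control groups. The function names and the data below are hypothetical.

```python
def response_rate(outcomes):
    """Fraction of positive outcomes (1 = responded, 0 = did not)."""
    return sum(outcomes) / len(outcomes)


def uplift(treatment_outcomes, control_outcomes):
    """Difference in response rate between treatment and control."""
    return response_rate(treatment_outcomes) - response_rate(control_outcomes)


# Hypothetical toy data: binary responses per object.
treated = [1, 1, 0, 1]  # objects that were subjected to the action
control = [1, 0, 0, 0]  # objects that were not

estimated_uplift = uplift(treated, control)
print(estimated_uplift)
```

An uplift model aims to predict this difference for individual objects from their features, rather than for the whole group as done here.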

Prequential AUC: properties of the area under the ROC curve for data streams with concept drift

Modern data-driven systems often require classifiers capable of dealing with streaming imbalanced data and concept changes. The assessment of learning algorithms in such scenarios is still a challenge, as existing online evaluation measures focus on efficiency, but are susceptible to class ratio changes over time. In the case of static data, the area under the receiver operating...

Random indexing of multidimensional data

Random indexing (RI) is a lightweight dimension reduction method, which is used, for example, to approximate vector semantic relationships in online natural language processing systems. Here we generalise RI to multidimensional arrays and therefore enable approximation of higher-order statistical relationships in data. The generalised method is a sparse implementation of random...
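
For orientation, the classical one-dimensional form of random indexing can be sketched as follows; this is not the multidimensional generalisation the paper proposes. Each word receives a sparse ternary index vector, and a word's context vector is the sum of the index vectors of the words it co-occurs with. The function names, the dimension, the number of non-zero entries and the toy sentences are all assumptions made for illustration.

```python
import random


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary vector stored as {position: +1 or -1}."""
    positions = rng.sample(range(dim), nonzeros)
    return {p: rng.choice((-1, 1)) for p in positions}


def random_indexing(sentences, dim=64, nonzeros=4, seed=0):
    """Return (index vectors, context vectors) for all words."""
    rng = random.Random(seed)
    index, context = {}, {}
    for sentence in sentences:
        for word in sentence:
            if word not in index:
                index[word] = make_index_vector(dim, nonzeros, rng)
                context[word] = [0] * dim
    # Add the index vector of every co-occurring word to the context vector.
    for sentence in sentences:
        for i, word in enumerate(sentence):
            for j, other in enumerate(sentence):
                if i != j:
                    for pos, sign in index[other].items():
                        context[word][pos] += sign
    return index, context


# Hypothetical toy corpus.
sentences = [["cats", "chase", "mice"], ["dogs", "chase", "cats"]]
index, context = random_indexing(sentences)
```

The context vectors have a fixed dimension regardless of vocabulary size, which is what makes the method lightweight.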

Context-dependent combination of sensor information in Dempster–Shafer theory for BDI

There has been much interest in the belief–desire–intention (BDI) agent-based model for developing scalable intelligent systems, e.g. using the AgentSpeak framework. However, reasoning from sensor information in these large-scale systems remains a significant challenge. For example, agents may be faced with information from heterogeneous sources which is uncertain and incomplete...

Partial materialization for online analytical processing over multi-tagged document collections

The New York Times Annotated Corpus, the ACM Digital Library, and PubMed are three prototypical examples of document collections in which each document is tagged with keywords or phrases. Such collections can be viewed as high-dimensional document cubes against which browsers and search systems can be applied in a manner similar to online analytical processing against data cubes...

Clustering XML documents by patterns

Now that the use of XML is prevalent, methods for mining semi-structured documents have become even more important. In particular, one of the areas that could greatly benefit from in-depth analysis of XML’s semi-structured nature is cluster analysis. Most of the XML clustering approaches developed so far employ pairwise similarity measures. In this paper, we study clustering...

Model-based probabilistic frequent itemset mining

Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems, and data integration. To handle a large amount of imprecise information, uncertain databases have been recently developed. In this paper, we study how to efficiently discover frequent itemsets from large uncertain databases, interpreted under the Possible World...
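
To make the setting concrete, here is a small sketch that assumes each transaction contains a given itemset independently with a known probability. Under that assumption the support of the itemset is a random variable, and the probability that it reaches a minimum support can be computed exactly by dynamic programming. This is an exact baseline for illustration, not the model-based method of the paper; the function names and probabilities are hypothetical.

```python
def support_distribution(probs):
    """dist[k] = probability that exactly k transactions contain the itemset."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # transaction does not contain it
            nxt[k + 1] += mass * p      # transaction contains it
        dist = nxt
    return dist


def frequentness_probability(probs, min_support):
    """Probability that the support is at least min_support."""
    return sum(support_distribution(probs)[min_support:])


# Hypothetical per-transaction containment probabilities.
probs = [0.5, 0.5, 1.0]
dist = support_distribution(probs)
prob_frequent = frequentness_probability(probs, 2)
print(dist, prob_frequent)
```

The dynamic program is quadratic in the number of transactions, which hints at why cheaper approximations are attractive for large databases.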

Decision trees for uplift modeling with single and multiple treatments

Most classification approaches aim at achieving high prediction accuracy on a given dataset. However, in most practical cases, some action such as mailing an offer or treating a patient is to be taken on the classified objects, and we should model not the class probabilities themselves, but instead, the change in class probabilities caused by the action. The action should then be...

Decision rules extraction from data stream in the presence of changing context for diabetes treatment

Knowledge extraction is an important element of e-Health systems. In this paper, we introduce a new method for decision rules extraction called Graph-based Rules Inducer to support the medical interview in diabetes treatment. The emphasis is put on the ability to track hidden context changes. The context is understood as a set of all factors affecting patient...

Correcting evaluation bias of relational classifiers with network cross validation

Recently, a number of modeling techniques have been developed for data mining and machine learning in relational and network domains where the instances are not independent and identically distributed (i.i.d.). These methods specifically exploit the statistical dependencies among instances in order to improve classification accuracy. However, there has been little focus on how...

Data preprocessing techniques for classification without discrimination

Recently, the following Discrimination-Aware Classification Problem was introduced: Suppose we are given training data that exhibit unlawful discrimination; e.g., toward sensitive attributes such as gender or ethnicity. The task is to learn a classifier that optimizes accuracy, but does not have this discrimination in its predictions on test data. This problem is relevant in many...

A hybrid decision tree training method using data streams

Classical classification methods usually assume that pattern recognition models do not depend on the timing of the data. However, this assumption is not valid in cases where new data frequently become available. Such situations are common in practice, for example, spam filtering or fraud detection, where dependencies between feature values and class numbers are continually...

Product selection for promotion planning

This paper addresses an important question: how to select the right products to promote in order to maximize promotional benefit. We set up a framework to incorporate promotion decisions into the data-mining process, formulate the profit maximization problem as an optimization problem, and propose a heuristic search solution to discover the right products to promote. Moreover...