We investigate the learning of quantitative structure activity relationships (QSARs) as a case-study of meta-learning. This application area is of the highest societal importance, as it is a key step in the development of new medicines. The standard QSAR learning problem is: given a target (usually a protein) and a set of chemical compounds (small molecules) with associated ...

Ensembles of classifiers are among the best performing classifiers available in many data mining applications, including the mining of data streams. Rather than training one classifier, multiple classifiers are trained, and their predictions are combined according to a given voting schedule. An important prerequisite for ensembles to be successful is that the individual models are ...

This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action selection. Therefore, computational emotion models are usually grounded in the agent’s ...

What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this—how much we can gain from two simple classifier-independent strategies. If we have some binary labelled data and some unlabelled, we could assume the unlabelled data are all positives, or assume them all negatives. These minimalist, seemingly naive, ...

Identifying context-specific entity networks from aggregated data is an important task, arising often in bioinformatics and neuroimaging applications. Computationally, this task can be formulated as jointly estimating multiple different, but related, sparse undirected graphical models (UGM) from aggregated samples across several contexts. Previous joint-UGM studies have mostly ...

For semi-supervised techniques to be applied safely in practice we at least want methods to outperform their supervised counterparts. We study this question for classification using the well-known quadratic surrogate loss function. Unlike other approaches to semi-supervised learning, the procedure proposed in this work does not rely on assumptions that are not intrinsic to the ...

Additive composition (Foltz et al. in Discourse Process 15:285–307, 1998; Landauer and Dumais in Psychol Rev 104(2):211, 1997; Mitchell and Lapata in Cognit Sci 34(8):1388–1429, 2010) is a widely used method for computing meanings of phrases, which takes the average of vector representations of the constituent words. In this article, we prove an upper bound for the bias of additive ...

Expressive interpretation forms an important but complex aspect of music, particularly in Western classical music. Modeling the relation between musical expression and structural aspects of the score being performed is an ongoing line of research. Prior work has shown that some simple numerical descriptors of the score (capturing dynamics annotations and pitch) are effective for ...

In this paper we focus on constructing binary classifiers that are built on the premise of minimising an upper bound on their future misclassification rate. We pay particular attention to the approach taken by the minimax probability machine (Lanckriet et al. in J Mach Learn Res 3:555–582, 2003), which directly minimises an upper bound on the future misclassification rate in a ...

We present the hierarchical Dirichlet scaling process (HDSP), a Bayesian nonparametric mixed membership model. The HDSP generalizes the hierarchical Dirichlet process to model the correlation structure between metadata in the corpus and mixture components. We construct the HDSP based on the normalized gamma representation of the Dirichlet process, and this construction allows ...

Multi-label classification (MLC) tasks are encountered more and more frequently in machine learning applications. While MLC methods exist for the classical batch setting, only a few methods are available for streaming setting. In this paper, we propose a new methodology for MLC via multi-target regression in a streaming setting. Moreover, we develop a streaming multi-target ...

A preference relation-based Top-N recommendation approach is proposed to capture both second-order and higher-order interactions among users and items. Traditionally Top-N recommendation was achieved by predicting the item ratings first, and then inferring the item rankings, based on the assumption of availability of explicit feedback such as ratings, and the assumption that ...

We consider the problem of estimating the class prior in an unlabeled dataset. Under the assumption that an additional labeled dataset is available, the class prior can be estimated by fitting a mixture of class-wise data distributions to the unlabeled data distribution. However, in practice, such an additional labeled dataset is often not available. In this paper, we show that, ...

We consider optimization of generalized performance metrics for binary classification by means of surrogate losses. We focus on a class of metrics, which are linear-fractional functions of the false positive and false negative rates (examples of which include \(F_{\beta }\)-measure, Jaccard similarity coefficient, AM measure, and many others). Our analysis concerns the following ...

We introduce a new recursive aggregation procedure called Bernstein Online Aggregation (BOA). Its exponential weights include a second order refinement. The procedure is optimal for the model selection aggregation problem in the bounded iid setting for the square loss: the excess of risk of its batch version achieves the fast rate of convergence \(\log (M)/n\) in deviation. The BOA ...

Community discovery has emerged during the last decade as one of the most challenging problems in social network analysis. Many algorithms have been proposed to find communities on static networks, i.e. networks which do not change in time. However, social networks are dynamic realities (e.g. call graphs, online social networks): in such scenarios static community discovery fails ...

Conventional wisdom in machine learning says that all algorithms are expected to follow the trajectory of a learning curve which is often colloquially referred to as ‘more data the better’. We call this ‘the gravity of learning curve’, and it is assumed that no learning algorithms are ‘gravity-defiant’. Contrary to the conventional wisdom, this paper provides the theoretical ...