Machine Learning

Machine Learning is an international forum for research on computational approaches to learning. The journal publishes articles reporting substantive results ...

List of Papers (Total 2,153)

Navigating explanatory multiverse through counterfactual path geometry

Counterfactual explanations are the de facto standard when tasked with interpreting decisions of (opaque) predictive models. Their generation is often subject to technical and domain-specific constraints that aim to maximise their real-life utility. In addition to considering desiderata pertaining to the counterfactual instance itself, guaranteeing existence of a viable path...

One transformer for all time series: representing and training with time-dependent heterogeneous tabular data

There is growing interest in applying Deep Learning techniques to tabular data in order to replicate the success of other Artificial Intelligence areas in this structured domain. Particularly interesting is the case in which tabular data have a time dependence, as with financial transactions. However, the heterogeneity of the tabular values, in which...

Drop-in efficient self-attention approximation method

Transformers have achieved state-of-the-art performance in most common tasks to which they have been applied. Those achievements are attributed to the Self-Attention mechanism at their core. Self-Attention is understood to map the relationship between tokens of any given sequence. This exhaustive mapping incurs massive costs in memory and inference time, as Self-Attention scales...

Cost-sensitive classification with cost uncertainty: do we need surrogate losses?

In many binary classification applications, the costs of false positives and negatives are imbalanced. Furthermore, there is often uncertainty about the exact costs of these errors. A natural measure-of-interest to be minimised in such scenarios is the expected misclassification cost. We identify many situations where this measure has analytic gradients, and thus it can be used...
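As a rough illustration of the measure named in this abstract, the sketch below computes an empirical expected misclassification cost in Python. It assumes the uncertain costs are given as samples and are independent of the data, so only their means matter; the function name and arguments are hypothetical and are not taken from the paper.

```python
def expected_misclassification_cost(y_true, y_pred, fp_costs, fn_costs):
    """Average cost per instance under sampled (uncertain) error costs.

    Assumption: costs are independent of the data, so the expectation
    over costs reduces to using the mean false-positive and
    false-negative cost.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    mean_fp = sum(fp_costs) / len(fp_costs)
    mean_fn = sum(fn_costs) / len(fn_costs)
    # Count each error type over the labelled sample.
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (mean_fp * false_pos + mean_fn * false_neg) / len(y_true)


cost = expected_misclassification_cost(
    y_true=[0, 0, 1, 1],
    y_pred=[1, 0, 0, 1],
    fp_costs=[1.0, 3.0],  # hypothetical samples of the false-positive cost
    fn_costs=[10.0],      # hypothetical samples of the false-negative cost
)
```

This hard-label version is piecewise constant in the classifier's parameters; a version built on predicted probabilities would be needed before gradients could be taken.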

Temporal ensemble of multiple patterns’ instances for continuous prediction of events

In real-life data from various domains, such as traffic, meteorology, or healthcare, events may have varying durations. Moreover, heterogeneous multivariate temporal data may be sampled in varying ways, including regular sampling at different frequencies or irregular sampling, and may include event data of different types with fixed or varying duration. We propose to uniformly...

An unsupervised adversarial domain adaptation based on variational auto-encoder

Collecting a large amount of labeled data in machine learning is always challenging. Often, even with sufficient data, domain differences can cause a shift or bias in data distribution, affecting model performance during testing. Domain adaptation methods, especially adversarial techniques, are effective solutions for these challenges. The goal is to learn a classifier for an...

Enhanced route planning with calibrated uncertainty set

This paper investigates the application of probabilistic prediction methodologies in route planning within a road network context. Specifically, we introduce the Conformalized Quantile Regression for Graph Autoencoders (CQR-GAE), which leverages the conformal prediction technique to offer a coverage guarantee, thus improving the reliability and robustness of our predictions. By...
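For readers unfamiliar with the conformal step mentioned here, the following Python sketch shows the generic split-conformal calibration used in conformalized quantile regression. It is not the CQR-GAE method itself: the graph autoencoder is omitted, the lower and upper quantile predictions are assumed to be given, and the function name is hypothetical.

```python
import math


def cqr_margin(lower, upper, y, alpha):
    """Calibration margin for conformalized quantile regression.

    lower, upper: predicted lower/upper quantiles on a held-out calibration set
    y: observed targets on the same set
    alpha: target miscoverage level (e.g. 0.1 for 90% coverage)
    """
    # Conformity score: how far each target falls outside its predicted band
    # (negative when the target lies strictly inside the band).
    scores = sorted(max(lo - yi, yi - hi) for lo, hi, yi in zip(lower, upper, y))
    n = len(scores)
    # Finite-sample corrected rank of the empirical quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this coverage level.
        return math.inf
    return scores[k - 1]


margin = cqr_margin(
    lower=[0.0, 0.0, 0.0, 0.0],
    upper=[1.0, 1.0, 1.0, 1.0],
    y=[0.5, 1.5, -0.25, 3.0],
    alpha=0.5,
)
# A new prediction band [lo, hi] would be widened to [lo - margin, hi + margin].
```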

Online dimensionality reduction through stacked generalization of spectral methods with deep networks

Analyzing large volumes of high-dimensional data poses significant challenges. Dimensionality reduction aims to reveal the most prominent properties of data by embedding them into a low-dimensional representation. Spectral dimensionality reduction methods using kernel matrices have been proven to yield optimal results. Online versions of those methods are desirable to...

DatRel: a noise-tolerant data relocation approach for effective synthetic data generation in imbalanced classifiers

Most machine learning algorithms tend to bias towards the majority class when a dataset exhibits a skewed distribution in the class variable. This is called the class imbalance problem and is frequently encountered in real-life applications. One of the most prevalent methods for addressing class imbalance is data resampling, which generates or removes samples to balance the...

Adaptive optimization for prediction with missing data

When training predictive models on data with missing entries, the most widely used and versatile approach is a pipeline technique where we first impute missing entries and then compute predictions. In this paper, we view prediction with missing data as a two-stage adaptive optimization problem and propose a new class of models, adaptive linear regression models, where the...

Neural RELAGGS

Multi-relational databases are the basis of most consolidated data collections in science and industry today. Most learning and mining algorithms, however, require data to be represented in a propositional form. While there is a variety of specialized machine learning algorithms that can operate directly on multi-relational data sets, propositionalization algorithms transform...

An end-to-end explainability framework for spatio-temporal predictive modeling

The rising adoption of AI models in real-world applications characterized by sensor data creates an urgent need for inference explanation mechanisms to support domain experts in making informed decisions. Explainable AI (XAI) opens up a new opportunity to extend black-box deep learning models with such inference explanation capabilities. However, existing XAI approaches for...

Likelihood-ratio-based confidence intervals for neural networks

This paper introduces a first implementation of a novel likelihood-ratio-based approach for constructing confidence intervals for neural networks. Our method, called DeepLR, offers several qualitative advantages: most notably, the ability to construct asymmetric intervals that expand in regions with a limited amount of data, and the inherent incorporation of factors such as the...

Generalized median of means principle for Bayesian inference

The topic of robustness is experiencing a resurgence of interest in the statistical and machine learning communities. In particular, robust algorithms making use of the so-called median of means estimator were shown to satisfy strong performance guarantees for many problems, including estimation of the mean, covariance structure as well as linear regression. In this work, we...
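The classical median of means estimator mentioned in this abstract can be sketched in a few lines of Python. This is the standard estimator for a scalar mean, not the paper's Bayesian generalization; the function name, the random block assignment, and the seed parameter are illustrative choices.

```python
import random
import statistics


def median_of_means(values, num_blocks, seed=0):
    """Robust mean estimate: split the data into blocks, average each
    block, and return the median of the block averages."""
    if not 1 <= num_blocks <= len(values):
        raise ValueError("num_blocks must be between 1 and len(values)")
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)  # random assignment to blocks
    # Strided slices give num_blocks blocks of (nearly) equal size.
    blocks = [shuffled[i::num_blocks] for i in range(num_blocks)]
    block_means = [statistics.fmean(block) for block in blocks]
    return statistics.median(block_means)


data = [1.0] * 99 + [1000.0]  # one gross outlier
robust = median_of_means(data, num_blocks=5)
plain = statistics.fmean(data)
```

With a single outlier, at most one block mean is contaminated, so the median of the five block means stays at the typical value while the plain mean is pulled upward.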

A survey on self-supervised methods for visual representation learning

Learning meaningful representations is at the heart of many tasks in the field of modern machine learning. Recently, many methods have been introduced that learn image representations without supervision. These representations can then be used in downstream tasks like classification or object detection. The quality of these representations is close to supervised...

Pairwise learning to rank by neural networks revisited: reconstruction, theoretical analysis and practical performance

We reevaluate the pairwise learning to rank approach based on neural nets, called RankNet, and present a theoretical analysis of its architecture. We show mathematically that the model can, under certain conditions, learn reflexive, antisymmetric, and transitive relations, enabling simplified training and improved performance. Experimental results on the LETOR MSLR-WEB10K, MQ2007...

Intramodal consistency in triplet-based cross-modal learning for image retrieval

Cross-modal retrieval requires building a common latent space that captures and correlates information from different data modalities, usually images and texts. Cross-modal training based on the triplet loss with hard negative mining is a state-of-the-art technique to address this problem. This paper shows that such an approach is not always effective in handling intra-modal...

Deep Errors-in-Variables using a diffusion model

Errors-in-Variables is the statistical concept used to explicitly model input variable errors caused, for example, by noise. While it has long been known in statistics that not accounting for such errors can produce a substantial bias, the vast majority of deep learning models have thus far neglected Errors-in-Variables approaches. Reasons for this include a significant increase...

TCR: topologically consistent reweighting for XGBoost in regression tasks

Gradient boosted tree ensembles (GBTEs) such as XGBoost continue to outperform other machine learning models on tabular data. However, the plethora of adjustable hyperparameters can complicate optimisation, especially in regression tasks with no intuitive performance measures such as accuracy and confidence. Automated machine learning frameworks alleviate the hyperparameter...

On the usefulness of the fit-on-test view on evaluating calibration of classifiers

Calibrated uncertainty estimates are essential for classifiers used in safety-critical applications. If a classifier is uncalibrated, then there is a unique way to calibrate its uncertainty using the idealistic true calibration map corresponding to this classifier. Although the true calibration map is typically unknown in practice, it can be estimated with many post-hoc...

Testing exchangeability in the batch mode with e-values and Markov alternatives

The topic of this paper is testing the assumption of exchangeability, which is the standard assumption in mainstream machine learning. The common approaches are online testing by betting (such as conformal testing) and the older batch testing using p-values (as in classical hypothesis testing). The approach of this paper is intermediate in that we are interested in batch testing...

Calibrated explanations for regression

Artificial Intelligence (AI) methods are an integral part of modern decision support systems. The best-performing predictive models used in AI-based decision support systems lack transparency. Explainable Artificial Intelligence (XAI) aims to create AI systems that can explain their rationale to human users. Local explanations in XAI can provide information about the causes of...

HorNets: learning from discrete and continuous signals with routing neural networks

Construction of neural network architectures suitable for learning from both continuous and discrete tabular data is challenging, as contemporary high-dimensional tabular data sets are often characterized by a relatively small set of instances and the need for efficient learning. We propose HorNets (Horn Networks), a neural network architecture with state-of-the-art...