International Journal of Computer Vision

International Journal of Computer Vision (IJCV) details the science and engineering of this rapidly growing field. Regular articles present major technical ...

List of Papers (Total 490)

HiLM-D: Enhancing MLLMs with Multi-scale High-Resolution Details for Autonomous Driving

May 2025 | Ding, Xinpeng, Han, Jianhua, Xu, Hang, et al.

Recent efforts to use natural language for interpretable driving focus mainly on planning, neglecting perception tasks. In this paper, we address this gap by introducing ROLISP (Risk Object Localization and Intention and Suggestion Prediction), which targets interpretable risk object detection and suggestion for ego car motions. Accurate ROLISP implementation requires extensive...

Correction: Consistent Prompt Tuning for Generalized Category Discovery

May 2025 | Yang, Muli, Yin, Jie, Gu, Yanan, et al.

A Closer Look at Benchmarking Self-supervised Pre-training with Image Classification

Apr 2025 | Marks, Markus, Knott, Manuel, Kondapaneni, Neehar, et al.

Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels. The model is forced to learn about the data’s inherent structure or context by solving a pretext task. With SSL, models can learn from abundant and cheap unlabeled data, significantly reducing the cost of training models where labels...

Correction: Investigating Self-Supervised Methods for Label-Efficient Learning

Apr 2025 | Nandam, Srinivasa Rao, Atito, Sara, Feng, Zhenhua, et al.

Correction: Parameter Efficient Fine-Tuning for Multi-modal Generative Vision Models with Möbius-Inspired Transformation

Apr 2025 | Duan, Haoran, Shao, Shuai, Zhai, Bing, et al.

NU-AIR: A Neuromorphic Urban Aerial Dataset for Detection and Localization of Pedestrians and Vehicles

Apr 2025 | Iaboni, Craig, Kelly, Thomas, Abichandani, Pramod

This paper presents an open-source aerial neuromorphic dataset that captures pedestrians and vehicles moving in an urban environment. The dataset, titled NU-AIR, features over 70 min of event footage acquired with a 640 × 480 resolution neuromorphic sensor mounted on a quadrotor operating in an urban environment. Crowds of pedestrians, different types of vehicles, and...

Advances in 3D Neural Stylization: A Survey

Mar 2025 | Chen, Yingshu, Shao, Guocheng, Shum, Ka Chun, et al.

Modern artificial intelligence offers a novel and transformative approach to creating digital art across diverse styles and modalities like images, videos and 3D data, unleashing the power of creativity and revolutionizing the way that we perceive and interact with visual content. This paper reports on recent advances in stylized 3D asset creation and manipulation with the...

FlowSDF: Flow Matching for Medical Image Segmentation Using Distance Transforms

Mar 2025 | Bogensperger, Lea, Narnhofer, Dominik, Falk, Alexander, et al.

Medical image segmentation plays an important role in accurately identifying and isolating regions of interest within medical images. Generative approaches are particularly effective in modeling the statistical properties of segmentation masks that are closely related to the respective structures. In this work we introduce FlowSDF, an image-guided conditional flow matching...

Guest Editorial: Special Issue on Biometrics Security and Privacy

Mar 2025 | Wan, Jun, Ross, Arun, Escalera, Sergio

Guest Editorial: Special Issue on Large-Scale Generative Models for Content Creation and Manipulation

Mar 2025 | He, Shengfeng, Gao, Lin, Fu, Hongbo, et al.

LMD: Light-Weight Prediction Quality Estimation for Object Detection in Lidar Point Clouds

Feb 2025 | Riedlinger, Tobias, Schubert, Marius, Penquitt, Sarina, et al.

Object detection on Lidar point cloud data is a promising technology for autonomous driving and robotics which has seen a significant rise in performance and accuracy during recent years. In particular, uncertainty estimation is a crucial component for downstream tasks, and deep neural networks remain error-prone even for predictions with high confidence. Previously proposed...

Realistic Evaluation of Deep Active Learning for Image Classification and Semantic Segmentation

Feb 2025 | Mittal, Sudhanshu, Niemeijer, Joshua, Çiçek, Özgün, et al.

Active learning aims to reduce the high labeling cost involved in training machine learning models on large datasets by efficiently labeling only the most informative samples. Recently, deep active learning has shown success on various tasks. However, the conventional evaluation schemes are either incomplete or below par. This study critically assesses various active learning...

A Survey on Deep Stereo Matching in the Twenties

Feb 2025 | Tosi, Fabio, Bartolomei, Luca, Poggi, Matteo

Stereo matching is close to hitting a half-century of history, yet it has witnessed a rapid evolution in the last decade thanks to deep learning. While previous surveys in the late 2010s covered the first stage of this revolution, the last five years of research brought further ground-breaking advancements to the field. This paper aims to fill this gap in a two-fold manner: first, we...

DustNet++: Deep Learning-Based Visual Regression for Dust Density Estimation

Feb 2025 | Michel, Andreas, Weinmann, Martin, Kuester, Jannick, et al.

Detecting airborne dust in standard RGB images presents significant challenges. Nevertheless, the monitoring of airborne dust holds substantial potential benefits for climate protection, environmentally sustainable construction, scientific research, and various other fields. To develop an efficient and robust algorithm for airborne dust monitoring, several hurdles have to be...

Diagnosing Human-Object Interaction Detectors

Feb 2025 | Zhu, Fangrui, Xie, Yiming, Xie, Weidi, et al.

We have witnessed significant progress in human-object interaction (HOI) detection. However, relying solely on mAP (mean Average Precision) scores as a summary metric does not provide sufficient insight into the nuances of model performance (e.g., why one model outperforms another), which can hinder further innovation in this field. To address this issue, we introduce a diagnosis...

Correction: SplitNet: Learnable Clean-Noisy Label Splitting for Learning with Noisy Labels

Feb 2025 | Kim, Daehwan, Ryoo, Kwangrok, Cho, Hansang, et al.

Sample-Cohesive Pose-Aware Contrastive Facial Representation Learning

Jan 2025 | Liu, Yuanyuan, Feng, Shaoze, Liu, Shuyang, et al.

Self-supervised facial representation learning (SFRL) methods, especially contrastive learning (CL) methods, have been increasingly popular due to their ability to perform face understanding without heavily relying on large-scale well-annotated datasets. However, analytically, current CL-based SFRL methods still perform unsatisfactorily in learning facial representations due to...

Correction: Few Annotated Pixels and Point Cloud Based Weakly Supervised Semantic Segmentation of Driving Scenes

Jan 2025 | Ma, Huimin, Yi, Sheng, Chen, Shijie, et al.

Correction: Deep Attention Learning for Pre-operative Lymph Node Metastasis Prediction in Pancreatic Cancer via Multi-object Relationship Modeling

Jan 2025 | Zheng, Zhilin, Fang, Xu, Yao, Jiawen, et al.

GL-MCM: Global and Local Maximum Concept Matching for Zero-Shot Out-of-Distribution Detection

Jan 2025 | Miyai, Atsuyuki, Yu, Qing, Irie, Go, et al.

Zero-shot OOD detection is a task that detects OOD images during inference with only in-distribution (ID) class names. Existing methods assume ID images contain a single, centered object, and do not consider the more realistic multi-object scenarios, where both ID and OOD objects are present. To meet the needs of many users, the detection method must have the flexibility to adapt...

Correction: CMAE-3D: Contrastive Masked AutoEncoders for Self-Supervised 3D Object Detection

Jan 2025 | Zhang, Yanan, Chen, Jiaxin, Huang, Di

Unsupervised Semantic Segmentation of Urban Scenes via Cross-Modal Distillation

Jan 2025 | Vobecky, Antonin, Hurych, David, Siméoni, Oriane, et al.

Semantic image segmentation models typically require extensive pixel-wise annotations, which are costly to obtain and prone to biases. Our work investigates learning semantic segmentation in urban scenes without any manual annotation. We propose a novel method for learning pixel-wise semantic segmentation using raw, uncurated data from vehicle-mounted cameras and LiDAR sensors...

Relation-Guided Versatile Regularization for Federated Semi-Supervised Learning

Jan 2025 | Yang, Qiushi, Chen, Zhen, Peng, Zhe, et al.

Federated semi-supervised learning (FSSL) aims to address the increasing privacy concerns in practical scenarios where data holders are limited in labeling capability. The latest FSSL approaches leverage the prediction consistency between the local model and global model to exploit knowledge from partially labeled or completely unlabeled clients. However, they merely utilize...

AniClipart: Clipart Animation with Text-to-Video Priors

Dec 2024 | Wu, Ronghuan, Su, Wanchao, Ma, Kede, et al.

Clipart, a pre-made graphic art form, offers a convenient and efficient way of illustrating visual content. Traditional workflows to convert static clipart images into motion sequences are laborious and time-consuming, involving numerous intricate steps like rigging, key animation and in-betweening. Recent advancements in text-to-video generation hold great potential in resolving...

Structured Generative Models for Scene Understanding

Dec 2024 | Williams, Christopher K. I.

This position paper argues for the use of structured generative models (SGMs) for the understanding of static scenes. This requires the reconstruction of a 3D scene from an input image (or a set of multi-view images), whereby the contents of the image(s) are causally explained in terms of models of instantiated objects, each with their own type, shape, appearance and pose, along...
