MTAP special issue on methods and tools for ground truth collection in multimedia applications
Concetto Spampinato, Bastiaan J. Boom, Jiyin He

J. He, Centre of Mathematics and Informatics, Science Park 123, 1098 XG Amsterdam, The Netherlands
B. J. Boom, University of Edinburgh, 10 Crichton Street, EH8 9AB Edinburgh, UK
C. Spampinato, University of Catania, Viale Andrea Doria, 6 - 95125 Catania, Italy

The importance of high-quality ground truth annotations for a variety of multimedia applications is widely recognised. Indeed, one of the most time-consuming steps in method development is generating accurate ground truth and comparing it with the output of applications to provide evidence that the devised methods perform well in the targeted domain. However, the cost of creating labelled data, which requires a human to examine multimedia data thoroughly and provide labels, becomes impractical as the datasets to be labelled grow. This can lead to the creation of disparate datasets that are often too small for both learning and evaluating the underlying data distribution. To build large-scale datasets, methods exploiting the collaborative effort of a large population of user annotators (e.g. LabelMe, Caltech, PASCAL VOC, TRECVID) have recently been devised. Nevertheless, the creation of common, large-scale ground truth data to train, test and evaluate algorithms for multimedia processing is still a major concern. In particular, research on ground truth labelling is still lacking both in user-oriented tools and in automatic methods for supporting annotators in accomplishing their labelling tasks. Tools for ground truth annotation must be user-oriented, providing visual interfaces and methods that guide and speed up the process of ground truth creation. In this scenario, multimedia processing methods and collaborative methods play a crucial role. Further, setting up requirements and standards for the creation of multimedia datasets allows other researchers in the field to continue these efforts and to contribute to the creation and annotation of multimedia data. This allows researchers to share and extend each other's work, which is beneficial for the research community.
The special issue specifically addresses the development of multimedia processing
methods for supporting automatic ground truth generation, methods and tools for combining and
comparing ground truth labelled by multiple users in any field of multimedia where ground
truth is required, interfaces for collecting ground truth, obtaining ground truth by simulation,
and domain requirements and standardisation of ground truth data.
The paper "An Innovative Web-Based Collaborative Platform for Video Annotation" by
Kavasidis et al. presents a web-based tool for supporting collaborative generation of ground
truth for object detection, tracking and image segmentation. The authors also
introduce a new approach to combining the annotations of multiple users and show how the quality
of the annotations increases incrementally as more users create annotations (independently
of user skills and the quality of their individual annotations).
The paper "A web-based platform for biosignal visualisation and annotation" by
Lourenco et al. develops a new web-based platform for the visualisation, retrieval and
annotation of biosignals, allowing non-technical users to provide ground truth labels
for biomedical applications. Machine learning algorithms can then use this
ground truth information as input for both learning and evaluation. To evaluate the
usability of the system, non-technical users were asked to perform certain tasks that assess the
functionalities of the annotation system.
The paper "Ground Truth annotation of Traffic Video Data" by Mossi et al. describes an
application for annotating traffic surveillance videos (although the method can be extended to other
application domains). The main novelty of the method is the use of a jog shuttle wheel to
navigate through the video frames, which results in a substantial gain in efficiency in the
video annotation task.
The paper "From Global Image Annotation to Interactive Object Segmentation" by Giro
et al. deals with the annotation of still images at two different scales: 1) at a global scale, the
proposed method allows users to semantically tag images with labels taken from an ontology,
and 2) at a local scale, it supports interactive segmentation of objects starting from automatically
segmented regions, using a hierarchical partition obtained with a Binary Partition Tree.
The paper "Robust Semi-automatic Head Pose Labeling for Real-World Face Video
Sequences" by Demirkus et al. presents a methodology for annotating the temporal
head pose of faces in real-world video sequences. In order to annotate the head pose
efficiently, a semi-automatic framework is used, where faces are automatically detected in a
subset of the frames; the head pose of each detected face is then labelled for those frames and
using an interpolati (...truncated)