MTAP special issue on methods and tools for ground truth collection in multimedia applications

Multimedia Tools and Applications, May 2014

Concetto Spampinato, Bastiaan J. Boom, Jiyin He

Concetto Spampinato, University of Catania, Viale Andrea Doria, 6 - 95125 Catania, Italy
Bastiaan J. Boom, University of Edinburgh, 10 Crichton Street, EH8 9AB Edinburgh, UK
Jiyin He, Centre of Mathematics and Informatics, Science Park 123, 1098 XG Amsterdam, The Netherlands

The importance of high-quality ground truth annotations for a variety of multimedia applications is widely recognised. Indeed, one of the most time-consuming steps in method development is generating accurate ground truth and comparing it with the output of applications to provide evidence that the devised methods perform well in the targeted domain. However, the cost of creating labelled data, which requires a human to examine multimedia data thoroughly and provide labels, becomes impractical as the datasets to be labelled grow. This can lead to the creation of disparate datasets that are often too small for both learning and evaluating the underlying data distribution.

To build large-scale datasets, methods exploiting the collaborative effort of a large population of user annotators (e.g. LabelMe, Caltech, PASCAL VOC, TRECVID) have recently been devised. Nevertheless, the creation of common, large-scale ground truth data to train, test and evaluate algorithms for multimedia processing is still a major concern. In particular, research in ground truth labelling still falls short both in developing user-oriented tools and in automatic methods that support annotators in accomplishing their labelling tasks. In fact, tools for ground truth annotation must be user-oriented, providing visual interfaces and methods that guide and speed up the process of ground truth creation. In this scenario, multimedia processing methods and collaborative methods play a crucial role.
Further, setting up requirements and standards for the creation of multimedia datasets allows other researchers in the field to continue these efforts and to contribute to the creation and annotation of multimedia data. This allows researchers to share and extend each other's work, which is beneficial for the research community.

The special issue specifically addresses the development of multimedia processing methods for supporting automatic ground truth generation, methods and tools for combining and comparing ground truth labelled by multiple users in any field of multimedia where ground truth is required, interfaces for collecting ground truth, obtaining ground truth by simulation, and domain requirements and standardisation of ground truth data.

The paper "An Innovative Web-Based Collaborative Platform for Video Annotation" by Kavasidis et al. presents a web-based tool for supporting collaborative generation of ground truth for object detection, tracking and image segmentation. The authors, moreover, introduce a new approach to combine annotations of multiple users and show how the quality of the annotations increases incrementally as more users create annotations (independently of user skills and of the quality of their individual annotations).

The paper "A web-based platform for biosignal visualisation and annotation" by Lourenco et al. develops a new web-based platform for visualisation, retrieval and annotation of biosignals, aimed at non-technical users and allowing them to provide ground truth labels for biomedical applications. This allows automated machine learning algorithms to use the ground truth information as input for both learning and evaluation. To evaluate the usability of the system, non-technical users were asked to perform certain tasks to assess the functionalities of the annotation system.

The paper "Ground Truth Annotation of Traffic Video Data" by Mossi et al. describes an application to annotate traffic surveillance videos (although the method can be extended to other application domains). The main novelty of the method is the use of a jog shuttle wheel to navigate through the video frames, which results in a substantial gain in efficiency in the video annotation task.

The paper "From Global Image Annotation to Interactive Object Segmentation" by Giro et al. deals with annotations of still images at two different scales: 1) at a global scale, the proposed method allows users to semantically tag images with labels taken from an ontology, and 2) at a local scale, it supports interactive segmentation of objects starting from automatically segmented regions, using a hierarchical partition obtained with a Binary Partition Tree.

The paper "Robust Semi-automatic Head Pose Labeling for Real-World Face Video Sequences" by Meltem Demirkus et al. presents a methodology to annotate the temporal head pose of faces in real-world video sequences. In order to annotate the head pose efficiently, a semi-automatic framework is used, where faces are automatically detected in a subset of the frames, afterwards the head pose of the face is labelled for those frames and using an interpolati (...truncated)


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007%2Fs11042-014-1853-1.pdf

Concetto Spampinato, Bastiaan J. Boom, Jiyin He. MTAP special issue on methods and tools for ground truth collection in multimedia applications, Multimedia Tools and Applications, 2014, pp. 409-412, Volume 70, Issue 1, DOI: 10.1007/s11042-014-1853-1