VISION: a video and image dataset for source identification
Shullani et al. EURASIP Journal on Information Security
Dasara Shullani 1
Marco Fontani 0 1
Massimo Iuliani 0 1
Omar Al Shaya 1 2
Alessandro Piva 0 1
0 FORLAB, Multimedia Forensics Laboratory, PIN Scrl, Piazza G. Ciardi, 25, 59100 Prato, Italy
1 Department of Information Engineering, University of Florence, Via di S. Marta, 3, 50139 Florence, Italy
2 Department of Electronic Media, Saudi Electronic University, Abi Bakr As Sadiq Rd, Riyadh, 11673, Saudi Arabia
The forensic research community keeps proposing new techniques to analyze digital images and videos. However, the performance of the proposed tools is usually tested on data that are far from reality in terms of resolution, source device, and processing history. Remarkably, in recent years, portable devices have become the preferred means to capture images and videos, and contents are commonly shared through social media platforms (SMPs, for example, Facebook, YouTube, etc.). These facts pose new challenges to the forensic community: for example, most modern cameras feature digital stabilization, which is proven to severely hinder the performance of video source identification technologies; moreover, the strong re-compression enforced by SMPs during upload threatens the reliability of multimedia forensic tools. On the other hand, portable devices capture both images and videos with the same sensor, opening new forensic opportunities. The goal of this paper is to propose the VISION dataset as a contribution to the development of multimedia forensics. The VISION dataset is currently composed of 34,427 images and 1914 videos, both in their native format and in their social version (Facebook, YouTube, and WhatsApp are considered), from 35 portable devices of 11 major brands. VISION can be exploited as a benchmark for the exhaustive evaluation of several image and video forensic tools.
Dataset; Multimedia forensics; Image forensics; Video forensics; Source identification
1 Introduction
In the last decades, visual data have gained a key role in
providing information. Images and videos are used to convey
persuasive messages in several different contexts,
from propaganda to child pornography.
The wild world of the web also allows users to easily share
visual contents through social media platforms. Statistics
[1] show that a significant portion of the world's
population owns a digital camera and can capture pictures.
Furthermore, one third of these people can go online and
upload their pictures to websites and social networks.
Given their digital nature, these data also convey a good deal
of information related to their life cycle (e.g., the source device,
the processing they have been subjected to). Such
information may become relevant when visual data are involved
in a crime. In this scenario, multimedia forensics (MF) has
been proposed as a solution for investigating images and
videos to determine information about their life cycle [2].
Over the years, the research community has developed
several tools to analyze a digital image, focusing on issues
related to the identification of the source device and the
assessment of content authenticity [3].
Generally, the effectiveness of a forensic technique
should be verified on image and video datasets that are
freely available and shared among the community.
Unfortunately, these datasets, especially for the case of videos,
are outdated and non-representative of real-case
scenarios. Indeed, most multimedia contents are currently
acquired by portable devices that are updated year after
year. These devices are also capable of acquiring both videos
and images with the same sensor, thus opening new
investigation opportunities in linking different kinds of
content [4]. This motivates the need for a new dataset
containing a heterogeneous and sufficiently large set of
visual data—both images and videos—as benchmark to
test and compare forensic tools.
In this paper, we present a new dataset of native images
and videos captured with 35 modern smartphones/tablets
belonging to 11 different brands: Apple, Asus, Huawei,
Lenovo, LG Electronics, Microsoft, OnePlus, Samsung,
Sony, Wiko, and Xiaomi.
Overall, we collected 11,732 native images; 7565 of them
were shared through Facebook, in both high and low
quality, and through WhatsApp, resulting in a total of 34,427
images. Furthermore, we acquired 648 native videos, 622
of which were shared through YouTube at the maximum
available resolution, and 644 through WhatsApp,
resulting in a total of 1914 videos1.
To exemplify the usefulness of the VISION dataset, we
test the performance of a well-known forensic tool, i.e.,
the detection of the sensor pattern noise (SPN) left by
the acquisition device [5] for the source identification
of native/social media contents; moreover, we describe
some new opportunities deriving from the availability of
images and videos captured with the same sensor to find
a solution to current (...truncated)
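As background for the SPN-based test mentioned above, the general idea can be sketched in a few lines: a camera fingerprint is estimated by averaging the denoising residuals of several images from the same device, and a probe image is attributed by correlating its own residual against that fingerprint. The snippet below is only a toy illustration under strong simplifying assumptions, not the detector of [5]: it uses a plain 3x3 box blur in place of the wavelet denoiser common in the SPN literature, and an exaggerated synthetic pattern in place of a real (much weaker) sensor fingerprint.

```python
import numpy as np

def box_blur(img):
    # 3x3 mean filter: a crude stand-in (assumption) for the
    # wavelet-based denoiser used in the SPN literature
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def noise_residual(img):
    # residual W = I - denoise(I) retains the high-frequency sensor noise
    return img - box_blur(img)

def fingerprint(images):
    # camera fingerprint: average residual over many images of one device
    return np.mean([noise_residual(im) for im in images], axis=0)

def ncc(a, b):
    # normalized cross-correlation between a residual and a fingerprint
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy check with a synthetic, deliberately strong pattern (real SPN is far weaker)
rng = np.random.default_rng(0)
pattern = rng.normal(0.0, 2.0, (64, 64))          # fake per-pixel sensor pattern
cam_imgs = [rng.normal(128.0, 10.0, (64, 64)) + pattern for _ in range(30)]
K = fingerprint(cam_imgs)
same = rng.normal(128.0, 10.0, (64, 64)) + pattern   # probe from the same "camera"
other = rng.normal(128.0, 10.0, (64, 64))            # probe from another "camera"
print(ncc(noise_residual(same), K) > ncc(noise_residual(other), K))  # True
```

In practice the correlation of a same-device probe clearly exceeds that of a foreign one, which is the decision statistic thresholded in SPN-based source identification.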