Determining modified versions of social media images

World Wide Web, Apr 2025

Social media platforms usually contain several modified versions of an image. This proliferation of versions calls the trustworthiness of social media images into question. We propose a novel framework to find modified versions of social media images using only their metadata. We consider several aspects to determine whether an image is a modified version of another image: the topic of the image, spatio-temporal information, and semantic similarity. First, we perform topic modeling to find images linked to the same context. Second, we perform spatio-temporal clustering to group spatio-temporally close images. Finally, we perform hierarchical clustering to form more precise clusters of versions. Notably, the proposed framework also considers modifications introduced in an image’s metadata while determining versions of the image. Modifications in social media images make it challenging to cluster versions together correctly, as a version may deviate significantly from its original image. We address this issue by exploring inconsistencies in the image metadata, which reflect the changes made to an image. We validate our model on a fact-checked image verification corpus and the Multimodal C4 dataset. We achieve around 95% accuracy, validating the effectiveness of the proposed approach.
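
As an illustration only, the last two stages described in the abstract (spatio-temporal grouping followed by hierarchical clustering) can be sketched in plain Python. This is not the authors' implementation: the topic-modeling stage is omitted, token overlap (Jaccard) stands in for semantic similarity, and the `ImageMeta` record, the function names, and all thresholds are assumptions made for this sketch.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class ImageMeta:
    """Hypothetical metadata record; the field names are assumptions."""
    image_id: str
    lat: float
    lon: float
    timestamp: float  # seconds since epoch
    description: str


def haversine_km(a, b):
    """Great-circle distance in kilometres between two records."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def jaccard(a, b):
    """Token overlap of two descriptions (stand-in for semantic similarity)."""
    ta, tb = set(a.description.lower().split()), set(b.description.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def spatio_temporal_groups(images, max_km=5.0, max_seconds=86400.0):
    """Group images connected by chains of spatio-temporally close pairs."""
    parent = list(range(len(images)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(images)):
        for j in range(i + 1, len(images)):
            near = haversine_km(images[i], images[j]) <= max_km
            recent = abs(images[i].timestamp - images[j].timestamp) <= max_seconds
            if near and recent:
                parent[find(i)] = find(j)

    groups = {}
    for i, img in enumerate(images):
        groups.setdefault(find(i), []).append(img)
    return list(groups.values())


def agglomerate(group, min_similarity=0.5):
    """Average-linkage agglomerative clustering; stops below min_similarity."""
    clusters = [[img] for img in group]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [jaccard(a, b) for a in clusters[i] for b in clusters[j]]
                avg = sum(sims) / len(sims)
                if avg > best:
                    best, pair = avg, (i, j)
        if best < min_similarity:
            break
        i, j = pair
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters


def find_versions(images):
    """Return candidate version clusters as sorted lists of image ids."""
    result = []
    for group in spatio_temporal_groups(images):
        result.extend(agglomerate(group))
    return [sorted(img.image_id for img in cluster) for cluster in result]
```

Running the coarse spatio-temporal pass first keeps the quadratic similarity comparisons confined to small groups, which is one plausible reason to order the stages from coarse to fine.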

World Wide Web (2025) 28:32
https://doi.org/10.1007/s11280-025-01335-1

Qijun He · Muhammad Umair · Athman Bouguettaya · Amani Abusafia
School of Computer Science, The University of Sydney, Camperdown, Sydney 2006, New South Wales, Australia
Qijun He and Muhammad Umair contributed equally to this work.

Received: 20 February 2024 / Revised: 3 February 2025 / Accepted: 23 February 2025
© The Author(s) 2025

Keywords Image metadata · Fake images · Social media · Trust · Image provenance · Modified images · Misinformation

1 Introduction

Social media platforms have become crucial for sharing real-life event updates [1–3]. The number of active social media users surpassed 4 billion by 2019 [4], and around 350 million new photos are uploaded daily on Facebook [5]. These social media images may contain useful information about recent incidents. Therefore, these images are actively used in many critical applications, such as scene reconstruction and scene analysis [6]. These applications underline the significant impact of social media images in modern information dissemination.

With image editing tools such as Photoshop now ubiquitous, people frequently alter images [7]. They alter these images to meet their objectives [8], e.g., spreading false information about COVID-19 [9]. These objectives can encompass anything from lighthearted improvements to the dissemination of rumors and misinformation [10]. Consequently, numerous versions of images are disseminated on the internet. We define a “version” of an image as a modified copy of an original image. Identifying the modified versions of an image may help in several applications, such as understanding the manipulations in different events reported on social media.

Social media platforms usually contain numerous versions of an image [11]. This proliferation of versions on social media platforms poses a significant challenge for users seeking to establish the authenticity and trustworthiness of a particular image. The various alterations, filters, and edits applied to images before they are shared online may obscure their original context and veracity [12]. This necessitates a vigilant approach to discerning credible information in the digital realm. For example, a fake image of Greta Thunberg, the Swedish climate activist, was shared widely on social media (see Figure 1) [13].
The image (along with its metadata) was altered to create the impression that Thunberg was having lunch in front of poor kids.

This paper focuses on identifying the modified versions of social media images. Tracking the versions of an image requires the detection of modifications in these images. In the realm of social media, numerous tools have been developed to identify similar images, including Reverse Image Search [14]. However, merely identifying similar images offers limited utility in tracking their versions. This limitation is due to the fact that two visually similar images may not share the same origin.

Researchers have employed a multitude of techniques aimed at detecting modifications in images. These methodologies may be categorized into computer vision, image forensics, service-oriented, and image-metadata-based approaches. Computer vision-based approaches primarily rely on the content within images. For instance, image processing is employed in [15] to extract an image’s residual signals. Image forensics methods detect modifications by scrutinizing inconsistencies between the visual content and associated metadata [16]. These techniques focus on various factors, including metadata discrepancies, traces of manipulation, and anomalies that may signal alterations in the image. By meticulously cross-referencing metadata with visual characteristics, forensic methods uncover evidence of tampering or editing. This approach not only reveals potential modifications but also offers a comprehensive understanding of the image’s authenticity and provenance, enhancing the ability to validate and trace its history. Service-oriented approaches assess image authenticity by analyzing user-generated content such as comments and timestamps [17]. Metadata-based approaches employ image metadata to identify modified images. Image metadata may offer a substantial representation of its content, making it possible to detect image alterations through metadata analysis [18].
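
To make the idea of metadata-based detection concrete, the following is a minimal sketch of a rule-based consistency check over a dictionary of metadata tags. It is an illustrative assumption, not a method taken from the paper or from the cited works. `DateTimeOriginal`, `DateTime`, and `Software` are standard EXIF tag names; the `PostedAt` key, the list of editor names, and the three rules are hypothetical.

```python
from datetime import datetime

EXIF_TIME_FORMAT = "%Y:%m:%d %H:%M:%S"  # EXIF date-time layout
EDITOR_HINTS = ("photoshop", "gimp", "lightroom")  # illustrative, not exhaustive


def parse_exif_time(value):
    """Parse an EXIF-style timestamp; return None if missing or malformed."""
    try:
        return datetime.strptime(value, EXIF_TIME_FORMAT)
    except (TypeError, ValueError):
        return None


def metadata_inconsistencies(meta):
    """Return a list of rule names that the metadata dictionary violates."""
    issues = []
    captured = parse_exif_time(meta.get("DateTimeOriginal"))
    modified = parse_exif_time(meta.get("DateTime"))
    posted = parse_exif_time(meta.get("PostedAt"))  # hypothetical platform field

    # The file was changed after the recorded capture time.
    if captured and modified and modified > captured:
        issues.append("modified_after_capture")

    # The post predates the claimed capture time.
    if captured and posted and posted < captured:
        issues.append("posted_before_capture")

    # The Software tag names a known image editor.
    software = str(meta.get("Software", "")).lower()
    if any(hint in software for hint in EDITOR_HINTS):
        issues.append("editing_software_tag")

    return issues
```

A flagged rule is only a hint that an image may have been altered. Metadata can be stripped or rewritten by platforms, so an empty result does not establish authenticity.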
The aforementioned methods for detecting image modifications are limited in detecting the versions of an image. For instance, computer vision and image forensic methods heavily rely on extensive training over image content [19, 20]. In addition, they often lack the ability to identify any changes in image metadata. Similarly, service-oriented methods rely on user comments but may be subjective and biased [17]. Metadata-based methods may detect further insights during image forensics [21]. However, most metadata-based methods fail to utilize all the attributes of metadata. These solutions typically analyze a very specific kind of image modification (e.g., changes in spatio-temporal attributes) and use a small subset of EXIF metadata tags [22]. It is important to highlight that these solutions often overlook the semantic aspects embedded within image metadata.

Figure 1 Altered images of Greta Thunberg

An image is a well-defined entity that consists of pixels and metadata [23]. We servitize the detection of image (...truncated)


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007/s11280-025-01335-1.pdf
Article home page: https://link.springer.com/article/10.1007/s11280-025-01335-1

He, Qijun, Umair, Muhammad, Bouguettaya, Athman, Abusafia, Amani. Determining modified versions of social media images, World Wide Web, 2025, pp. 1-29, Volume 28, Issue 3, DOI: 10.1007/s11280-025-01335-1