International Journal of Multimedia Information Retrieval

https://link.springer.com/journal/13735

List of Papers (Total 98)

Few-shot and meta-learning methods for image understanding: a survey

State-of-the-art deep learning systems (e.g., ImageNet image classification) typically require very large training sets to achieve high accuracy. One of the grand challenges is therefore few-shot learning, in which only a few training samples are required for good performance. In this survey, we illuminate one of the key paradigms in few-shot learning, called meta-learning...

Emotion-aware music tower blocks (EmoMTB): an intelligent audiovisual interface for music discovery and recommendation

Music listening has experienced a sharp increase during the last decade thanks to music streaming and recommendation services. While they offer text-based search functionality and provide recommendation lists of remarkable utility, their typical mode of interaction is unidimensional, i.e., they provide lists of consecutive tracks, which are commonly inspected in sequential order...

Optical music recognition for homophonic scores with neural networks and synthetic music generation

The recognition of patterns that have a time dependency is common in areas like speech recognition or natural language processing. The equivalent situation in image analysis is present in tasks like text or video recognition. Recently, Convolutional Recurrent Neural Networks (CRNN) have been broadly applied to solve these tasks in an end-to-end fashion with successful performance...

MemeTector: enforcing deep focus for meme detection

Image memes, and specifically their widely known variation, image macros, are a special new media type that combines text with images and is used in social media to playfully or subtly express humor, irony, sarcasm, and even hate. It is important to accurately retrieve image memes from social media to better capture the cultural and social aspects of online phenomena and detect...

Multimodal Quasi-AutoRegression: forecasting the visual popularity of new fashion products

Estimating the preferences of consumers is of utmost importance for the fashion industry as appropriately leveraging this information can be beneficial in terms of profit. Trend detection in fashion is a challenging task due to the fast pace of change in the fashion industry. Moreover, forecasting the visual popularity of new garment designs is even more demanding due to lack of...

A unified approach of detecting misleading images via tracing its instances on web and analyzing its past context for the verification of multimedia content

The verification of multimedia content on social media is a challenging and crucial issue that is gaining prominence in an age where user-generated content and online social platforms are the leading sources shaping and propagating news stories. As these sources allow users to share their opinions without restriction, opportunistic users...

Generative adversarial networks and its applications in the biomedical image segmentation: a comprehensive survey

Recent advances in deep generative models have shown significant potential in tasks such as image synthesis, detection, segmentation, and classification. Segmenting medical images is considered a primary challenge in the biomedical imaging field. Various GAN-based models have been proposed in the literature to address medical segmentation challenges. Our research...

How can users’ comments posted on social media videos be a source of effective tags?

This paper proposes a new approach for extracting tags from users’ comments on videos. Videos on social media platforms such as Facebook and YouTube are usually accompanied by comments in which users may give opinions about things evoked in the video. The main challenge is how to extract relevant tags from them. To the best of the authors’ knowledge, this is the...

Music emotion recognition based on segment-level two-stage learning

In most Music Emotion Recognition (MER) tasks, researchers tend to use supervised learning models based on music features and corresponding annotation. However, few researchers have considered applying unsupervised learning approaches to labeled data except for feature representation. In this paper, we propose a segment-based two-stage model combining unsupervised learning and...

Siamese coding network and pair similarity prediction for near-duplicate image detection

Near-duplicate detection in a dataset involves finding the elements that are closest to a new query element according to a given similarity function and proximity threshold. The brute force approach is very computationally intensive as it evaluates the similarity between the queried item and all items in the dataset. The potential application domain is an image sharing website...
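
The brute-force baseline described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's method: the cosine similarity, the 0.95 threshold, the plain-list feature vectors, and the function names are all assumptions chosen for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def brute_force_near_duplicates(query, dataset, threshold=0.95):
    """Compare the query against every item and keep those above the threshold.

    The similarity function and threshold value are hypothetical stand-ins
    for the "given similarity function and proximity threshold".
    """
    matches = []
    for index, item in enumerate(dataset):
        score = cosine_similarity(query, item)
        if score >= threshold:
            matches.append((index, score))
    # Most similar items first.
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches


# Toy feature vectors standing in for image descriptors.
dataset = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
print(brute_force_near_duplicates([1.0, 0.0], dataset))
```

Every query costs one similarity evaluation per stored item, which is the linear cost the abstract calls computationally intensive.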

Anomaly detection using edge computing in video surveillance system: review

The current concept of smart cities encourages urban planners and researchers to provide modern, secure, and sustainable infrastructure and to give residents a decent quality of life. To fulfill this need, video surveillance cameras have been deployed to enhance the safety and well-being of citizens. Despite technical developments in modern science, abnormal event...

Few2Decide: towards a robust model via using few neuron connections to decide

Research has shown that image classification networks are vulnerable to adversarial examples, which seriously limits their application in safety-critical scenarios. Existing defense methods usually employ adversarial training or adjust the network structure to resist adversarial attacks. Although these defense methods can improve model robustness to some extent, they often...

Interactive video retrieval evaluation at a distance: comparing sixteen interactive video search systems in a remote setting at the 10th Video Browser Showdown

The Video Browser Showdown addresses difficult video search challenges through an annual interactive evaluation campaign attracting research teams focusing on interactive video retrieval. The campaign aims to provide insights into the performance of participating interactive video retrieval systems, tested by selected search tasks on large video collections. For the first time in...

Multimodal image and audio music transcription

Optical Music Recognition (OMR) and Automatic Music Transcription (AMT) are the research fields that aim to obtain a structured digital representation from sheet music images and acoustic recordings, respectively. While these fields have traditionally evolved independently, the fact that both tasks may share the same output representation poses the question of whether...

Towards a high robust neural network via feature matching

Image classification systems have been found vulnerable to adversarial attacks, which are imperceptible to humans but can easily fool deep neural networks. Recent research indicates that regularizing the network by introducing randomness could greatly improve the model’s robustness against adversarial attacks, but the randomness module would normally involve complex calculations and...

A review on deep learning in medical image analysis

Ongoing improvements in AI, particularly in deep learning techniques, are helping to identify, classify, and quantify patterns in clinical images. Deep learning is the fastest-developing field in artificial intelligence and has recently been used effectively in numerous areas, including medicine. A brief outline is given on studies carried out on the region of...

Multimodal news analytics using measures of cross-modal entity and context consistency

The World Wide Web has become a popular source for gathering information and news. Multimodal information, e.g., text supplemented with photographs, is typically used to convey the news more effectively or to attract attention. The photographs can be decorative or depict additional details, but might also contain misleading information. The quantification of the cross-modal consistency...

Counterfactual attribute-based visual explanations for classification

In this paper, our aim is to provide human-understandable, intuitive factual and counterfactual explanations for the decisions of neural networks. Humans tend to reinforce their decisions by providing attributes and counterattributes. Hence, in this work, we utilize attributes as well as examples to provide explanations. In order to provide counterexplanations we make use of...

Design ensemble deep learning model for pneumonia disease classification

With the recent spread of the SARS-CoV-2 virus, computer-aided diagnosis (CAD) has received more attention. The most important CAD application is to detect and classify pneumonia using X-ray images, especially in a critical period such as the pandemic of COVID-19, which is a kind of pneumonia. In this work, we aim to evaluate the performance of single and ensemble learning models...