Multimedia Systems

https://link.springer.com/journal/530

List of Papers (Total 109)

LET-Net: locally enhanced transformer network for medical image segmentation

Medical image segmentation has attracted increasing attention due to its practical clinical requirements. However, the prevalence of small targets still poses great challenges for accurate segmentation. In this paper, we propose a novel locally enhanced transformer network (LET-Net) that combines the strengths of transformer and convolution to address this issue. LET-Net utilizes...

A social-aware video sharing solution using demand prediction of epidemic-based propagation in wireless networks

The video services that account for the majority of global network traffic consume significant amounts of electricity and network resources to meet the large-scale demand of users. Variations in user interest and social influence lead to high maintenance costs for achieving a dynamic balance between supply and demand, which negatively impacts the sustainable development of video...

View-target relation-guided unsupervised 2D image-based 3D model retrieval via transformer

Unsupervised 2D image-based 3D model retrieval aims to retrieve 3D models from a gallery given 2D query images. Despite the encouraging progress made in this task, there are still two significant limitations: (1) feature alignment between 2D images and the 3D model gallery remains difficult due to the large gap between the two modalities; (2) the important view information...

LPR: learning point-level temporal action localization through re-training

Point-level temporal action localization (PTAL) aims to locate action instances in untrimmed videos with only one timestamp annotation for each action instance. Existing methods adopt the localization-by-classification paradigm, locating action boundaries in the temporal class activation map (TCAM) by thresholding; these are known as TCAM-based methods. However, TCAM-based methods are...

Context-guided coarse-to-fine detection model for bird nest detection on high-speed railway catenary

As a critical component of ensuring the safe and stable operation of trains, the detection of bird’s nests on the rail catenary has always been essential. Low-resolution images and the lack of labelled data, however, make it difficult to detect smaller bird’s nests (those occupying few pixels in the input image). Previous solutions rely on manual online patrol or offline video...

Multi-modal humor segment prediction in video

Humor can be induced by various signals in the visual, linguistic, and vocal modalities emitted by humans. Finding humor in videos is an interesting but challenging task for an intelligent system. Previous methods predict humor at the sentence level given some text (e.g., speech transcript), sometimes together with other modalities, such as videos and speech. Such methods ignore...

Virtual reality in medical emergencies training: benefits, perceived stress, and learning success

Medical graduates lack procedural skills experience required to manage emergencies. Recent advances in virtual reality (VR) technology enable the creation of highly immersive learning environments representing easy-to-use and affordable solutions for training with simulation. However, the feasibility in compulsory teaching, possible side effects of immersion, perceived stress...

COVID-SegNet: encoder–decoder-based architecture for COVID-19 lesion segmentation in chest X-ray

Coronavirus disease 2019 (COVID-19), initially named 2019-nCoV, was declared a global pandemic by the World Health Organization in March 2020. Because of the growing number of COVID patients, the world’s health infrastructure has collapsed, and computer-aided diagnosis has become a necessity. Most of the models proposed for COVID-19 detection in chest X-rays do image...

Ensemble deep honey architecture for COVID-19 prediction using CT scan and chest X-ray images

The infectious disease COVID-19 continues to have a catastrophic effect on the lives of human beings all over the world. To combat this deadly disease, it is essential to screen the affected people quickly and inexpensively. Radiological examination is considered the most feasible step toward attaining this objective; however, chest X-ray (CXR) and computed...

A survey on the pipeline evolution of facial capture and tracking for digital humans

With the introduction of concepts for virtual interaction and digital doubles, a rich scenario has been created for embodied avatars to thrive. These avatars, more recently referred to as digital humans, have become a popular area of research, resulting in various techniques and methods that focus on improving the perception of their realism, fidelity, empathic response, and...

An overview of deep learning techniques for COVID-19 detection: methods, challenges, and future works

The World Health Organization (WHO) declared a pandemic in response to the coronavirus COVID-19 in 2020, which resulted in numerous deaths worldwide. Although the disease appears to have lost its impact, millions of people have been affected by this virus, and new infections still occur. Identifying COVID-19 requires a reverse transcription-polymerase chain reaction test (RT-PCR...

A survey on face presentation attack detection mechanisms: hitherto and future perspectives

The advances in human face recognition (FR) systems have recorded sublime success for automatic and secured authentication in diverse domains. Although the traditional methods have been overshadowed by face recognition counterpart during this progress, computer vision gains rapid traction, and the modern accomplishments address problems with real-world complexity. However...

Performance analysis of U-Net with hybrid loss for foreground detection

With the latest developments in deep neural networks, the convolutional neural network (CNN) has made considerable progress in the area of foreground detection. However, the top-ranked background subtraction algorithms for foreground detection still have many shortcomings. It is challenging to extract the true foreground against a complex background. To tackle the bottleneck, we...

Learning effective embedding for automated COVID-19 prediction from chest X-ray images

The pandemic caused by SARS-CoV-2, which originated in 2019, continues to cause serious havoc on the global population’s health, economy, and livelihood. A critical way to suppress and restrain this pandemic is the early detection of COVID-19, which will help to control the virus. Chest X-rays are one of the more straightforward ways to detect the COVID-19 virus compared to the...

DATaR: depth augmented target redetection using kernelized correlation filter

Unlike deep learning, which requires large training datasets, correlation filter-based trackers like Kernelized Correlation Filter (KCF) use implicit properties of tracked images (circulant structure) for training in real time. Despite their popularity in tracking applications, there exist significant drawbacks of the tracker in cases like occlusions and out-of-view scenarios...

Fake COVID-19 videos detector based on frames and audio watermarking

With the innovation and development of advanced video editing technology and the widespread use of video information and services in our society, it is increasingly necessary to maintain the reliability of video information. As a result, sensitive video contents in various fields such as surveillance, medical, and others should be secured against attempts to alter them because...

Combating multimodal fake news on social media: methods, datasets, and future perspective

The growth in the use of social media platforms such as Facebook and Twitter over the past decade has significantly facilitated and improved the way people communicate with each other. However, the information that is available and shared online is not always credible. These platforms provide a fertile ground for the rapid propagation of breaking news along with other misleading...

A survey on the interpretability of deep learning in medical diagnosis

Deep learning has demonstrated remarkable performance in the medical domain, with accuracy that rivals or even exceeds that of human experts. However, a significant problem is that these models are “black-box” structures, which means they are opaque, non-intuitive, and difficult for people to understand. This creates a barrier to the application of deep learning models in...