The increase in precision agriculture has promoted the development of picking robot technology, and the visual recognition system at its core is crucial for improving the level of agricultural automation. This paper reviews the progress of visual recognition technology for picking robots, including image capture technology, target detection algorithms, spatial positioning...
Graph-structured data are pervasive in the real-world such as social networks, molecular graphs and transaction networks. Graph neural networks (GNNs) have achieved great success in representation learning on graphs, facilitating various downstream tasks. However, GNNs have several drawbacks such as lacking interpretability, can easily inherit the bias of data and cannot model...
Traditional parametric system identification methods usually rely on apriori knowledge of the targeted system, which may not always be available, especially for complex systems. Although neural networks (NNs) have been increasingly adopted in system identification, most studies have failed to derive interpretable parametric models for further analysis. In this paper, we propose a...
Graph neural networks (GNNs) have made rapid developments in the recent years. Due to their great ability in modeling graph-structured data, GNNs are vastly used in various applications, including high-stakes scenarios such as financial analysis, traffic predictions, and drug discovery. Despite their great potential in benefiting humans in the real world, recent study shows that...
This paper tackles the high computational/space complexity associated with multi-head self-attention (MHSA) in vanilla vision transformers. To this end, we propose hierarchical MHSA (H-MHSA), a novel approach that computes sell-attention in a hierarchical fashion. Specifically, we first divide the input image into patches as commonly done, and each patch is viewed as a token...
Unlike existing fully-supervised approaches, we rethink colorectal polyp segmentation from an out-of-distribution perspective with a simple but effective self-supervised learning approach. We leverage the ability of masked autoencoders–self-supervised vision transformers trained on a reconstruction task–to learn in-distribution representations, here, the distribution of healthy...
Zero trust architecture (ZTA) is a paradigm shift in how we protect data, stay connected and access resources. ZTA is non-perimeter-based defence, which has been emerging as a promising revolution in the cyber security field. It can be used to continuously maintain security by safeguarding against attacks both from inside and outside of the network system. However, ZTA automation...
Using knowledge graphs to assist deep learning models in making recommendation decisions has recently been proven to effectively improve the model’s interpretability and accuracy. This paper introduces an end-to-end deep learning model, named representation-enhanced knowledge graph convolutional networks (RKGCN), which dynamically analyses each user’s preferences and makes a...
Graph neural networks have been shown to be very effective in utilizing pairwise relationships across samples. Recently, there have been several successful proposals to generalize graph neural networks to hypergraph neural networks to exploit more complex relationships. In particular, the hypergraph collaborative networks yield superior results compared to other hypergraph neural...
Generative AI models for music and the arts in general are increasingly complex and hard to understand. The field of explainable AI (XAI) seeks to make complex and opaque AI models such as neural networks more understandable to people. One approach to making generative AI models more understandable is to impose a small number of semantically meaningful attributes on generative AI...
Neuroimaging data typically include multiple modalities, such as structural or functional magnetic resonance imaging, diffusion tensor imaging, and positron emission tomography, which provide multiple views for observing and analyzing the brain. To leverage the complementary representations of different modalities, multimodal fusion is consequently needed to dig out both inter...
The recent rapid development of deep learning has laid a milestone in industrial image anomaly detection (IAD). In this paper, we provide a comprehensive review of deep learning-based image anomaly detection techniques, from the perspectives of neural network architectures, levels of supervision, loss functions, metrics and datasets. In addition, we extract the promising setting...
Facial expression recognition (FER) is still challenging due to the small interclass discrepancy in facial expression data. In view of the significance of facial crucial regions for FER, many existing studies utilize the prior information from some annotated crucial points to improve the performance of FER. However, it is complicated and time-consuming to manually annotate facial...
With the breakthrough of AlphaGo, deep reinforcement learning has become a recognized technique for solving sequential decision-making problems. Despite its reputation, data inefficiency caused by its trial and error learning mechanism makes deep reinforcement learning difficult to apply in a wide range of areas. Many methods have been developed for sample efficient deep...
While recent years have witnessed a dramatic upsurge of exploiting deep neural networks toward solving image denoising, existing methods mostly rely on simple noise assumptions, such as additive white Gaussian noise (AWGN), JPEG compression noise and camera sensor noise, and a general-purpose blind denoising method for real images remains unsolved. In this paper, we attempt to...
This paper aims to address the problem of supervised monocular depth estimation. We start with a meticulous pilot study to demonstrate that the long-range correlation is essential for accurate depth estimation. Moreover, the Transformer and convolution are good at long-range and close-range depth estimation, respectively. Therefore, we propose to adopt a parallel encoder...