Edge computing empowered anomaly detection framework with dynamic insertion and deletion schemes on data streams

World Wide Web, May 2022

Anomaly detection plays a crucial role in many Internet of Things (IoT) applications, such as traffic anomaly detection for smart transportation and medical diagnosis for smart healthcare. With the explosion of IoT data, anomaly detection on data streams demands real-time response and strong robustness, both on large-scale data arriving at the same time and across various application fields. However, existing methods are either slow or application-specific. Inspired by edge computing and generic anomaly detection techniques, we propose an isolation forest based framework with dynamic Insertion and Deletion schemes (IDForest), which incrementally updates the forest to detect anomalies on data streams. In addition, IDForest is deployed on edge servers in parallel by packing each tree into a subtask, which speeds up anomaly detection on data streams. Extensive experiments on both synthetic and real-life datasets demonstrate the efficiency and robustness of our framework for anomaly detection.

Haolong Xiang · Xuyun Zhang
Faculty of Science and Engineering, Macquarie University, Sydney 2122, Australia

https://doi.org/10.1007/s11280-022-01052-z
Received: 2 November 2021 / Revised: 17 March 2022 / Accepted: 7 April 2022
© The Author(s) 2022

This article belongs to the Topical Collection: Special Issue on Resource Management at the Edge for Future Web, Mobile, and IoT Applications. Guest Editors: Qiang He, Fang Dong, Chenshu Wu, and Yun Yang.

Keywords: Anomaly detection · Data streams · Large-scale data · Edge computing · Efficiency and robustness

1 Introduction

The application of Internet of Things (IoT) technologies to the smart world has improved quality of life and attracted significant attention in academia [4, 28]. With the fast development and wide deployment of IoT technologies, the volume of data coming from intelligent applications such as smart cities, smart homes, smart hospitals and smart farms has exploded [13, 22]. Large-scale IoT data make it harder to detect, quantify and understand the surrounding environment, which criminals are then more likely to invade [29]. For instance, identifying hacker intrusions in massive network data, or detecting anomalous trends in industrial data that indicate a pending system failure, requires accurate and fast anomaly detection. In real-life applications, these data are sampled at very short time intervals and keep flowing in indefinitely, forming data streams that require real-time response to abnormal events. Therefore, developing effective, real-time anomaly detection techniques for large-scale data streams should be a research priority [12, 23].

Streams can be time series or multidimensional, and unlike static data, a data stream has no fixed length [10]. On an infinite data stream, anomaly detection is performed over a sliding window, which confines the data instances to a fixed-size context. As the window slides, expired data points are removed from the window and an equal number of new data points are added, and anomalies are detected within each window. For example, monitoring the click rate of shopping websites to find anomalous click times is a typical case of time-series anomaly detection on data streams.
In addition, real-time cardiac monitoring produces multidimensional data streams: medical information is collected from implanted or wearable sensors and transmitted to a server for diagnosis [17]. However, these methods are all application-specific, performing anomaly detection in one application field or on one type of data stream. As the number of application types grows, designing a separate anomaly detection method for each becomes burdensome. It is therefore worthwhile to design a generic framework for anomaly detection on data streams that is robust across application fields.

Monitoring data streams often requires real-time response to anomalous events, which makes efficient anomaly detection on streams with large numbers of data instances harder. Because sensor-equipped devices have limited storage and computing resources, these data are offloaded to cloud or edge servers for storage and processing. Edge servers are geographically closer to devices than cloud servers and offer sufficient computing and storage power for data streams, so deploying models on edge servers is regarded as a practical way to shorten processing time on streams with massive data. To illustrate the efficiency of edge computing, Mehnaz et al. recently ran an experiment and found that processing a data stream of 100000 data points takes around 5 times longer on smart devices than on edge servers [21]. In that experiment, the windowed Gaussian (W-Gaussian) method is used to detect anomalies. As a statistics-based method, W-Gaussian is accurate on data that follow the assumed distribution but may perform poorly on data that are not normally distributed. This work gives us the idea of combining anomaly detection methods with edge computing.
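As a rough illustration of the sliding-window and windowed Gaussian ideas described above, the following Python sketch scores each arriving value against the mean and standard deviation of the preceding fixed-size window. It is a minimal sketch and not the implementation evaluated in [21]: the function name, the window size, the z-score threshold and the rule of waiting for a full window before scoring are all assumptions made for illustration.

```python
from collections import deque
from statistics import fmean, pstdev


def windowed_gaussian_flags(stream, window_size=10, z_threshold=3.0):
    """Flag values that deviate strongly from the preceding window.

    Hypothetical helper: window_size and z_threshold are illustrative
    defaults, not values taken from the paper.
    """
    # A bounded deque drops the oldest point whenever a new one is appended,
    # which mirrors "expired points leave, new points enter" in the text.
    window = deque(maxlen=window_size)
    flags = []
    for x in stream:
        if len(window) == window_size:
            mu = fmean(window)
            sigma = pstdev(window)
            if sigma > 0:
                is_anomaly = abs(x - mu) / sigma > z_threshold
            else:
                # No spread in the window: any different value is a deviation.
                is_anomaly = x != mu
        else:
            is_anomaly = False  # not enough history to score yet
        flags.append(is_anomaly)
        window.append(x)
    return flags
```

Calling `windowed_gaussian_flags` on a series that hovers around a constant level and then jumps should flag the jump once the window is full. Because the test relies on a mean and standard deviation, it inherits the limitation noted above: it is only meaningful when the windowed data are roughly normally distributed.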
However, building an accurate anomaly detection method for data streams with complex data, and deploying it on edge servers to monitor data in real time, remains a big challenge. Given the distributed nature of edge computing, a distributed processing method can be deployed on edge servers to speed up detection. Among anomaly detection methods, ensemble-based methods can be broken down into multiple concurrent tasks that are handled independently. We therefore consider integrating an ensemble-based anomaly detection method with edge computing to solve the above problem. In previous research, the ensemble-based isolation forest (iForest) was proposed to provide fast anomaly detection on big data. Benefiting from its sampling-based ensemble, iForest achieves good detection accuracy with short processing time on large datasets [18]. Building on this scheme, Guha et al. proposed the robust random cut forest (RRCF) to detect anomalies in dynamic data streams [9]. RRCF improves the original data partitioning of iForest and updates the tree structure by inserting and deleting leaves. Although experiments showed that RRCF can capture the beginning and end of an anomalous event on a single data stream, it falls short in detection accuracy and real-time response on data streams with multidimensional data instances. Therefore, we aim to desi (...truncated)
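The observation that an ensemble can be split into independent per-tree tasks can be sketched with a plain isolation forest. The Python code below is a minimal sketch of the standard iForest scoring idea, not IDForest itself: it contains no insertion or deletion scheme, all function names and parameters are hypothetical, and a local thread pool merely stands in for a set of edge servers.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # Euler-Mascheroni constant
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively isolate points with random axis-parallel splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on this dimension
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(node, x, depth=0):
    """Depth at which x lands, plus a correction for unsplit leaves."""
    if "split" not in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(child, x, depth + 1)


def tree_subtask(args):
    """One independent subtask: build a single tree and walk every query."""
    data, queries, sample_size, seed = args
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)
    max_depth = math.ceil(math.log2(sample_size))
    tree = build_tree(sample, 0, max_depth, rng)
    return [path_length(tree, q) for q in queries]


def forest_scores(data, queries, n_trees=100, sample_size=64, workers=4):
    """Run one subtask per tree, then average path lengths into scores."""
    sample_size = min(sample_size, len(data))
    if sample_size < 2:
        raise ValueError("need at least two data points")
    tasks = [(data, queries, sample_size, seed) for seed in range(n_trees)]
    # Assumption: the thread pool is only a placeholder for edge servers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_tree = list(pool.map(tree_subtask, tasks))
    scores = []
    for i in range(len(queries)):
        mean_depth = sum(t[i] for t in per_tree) / n_trees
        scores.append(2.0 ** (-mean_depth / c(sample_size)))
    return scores
```

Each call to `tree_subtask` needs only the data, the query points and a seed, so the subtasks share no state and can run anywhere. Scores closer to 1 indicate points that are isolated after few splits, which is the usual iForest signal for an anomaly.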


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007/s11280-022-01052-z.pdf
Article home page: https://link.springer.com/article/10.1007/s11280-022-01052-z

Xiang, Haolong, Zhang, Xuyun. Edge computing empowered anomaly detection framework with dynamic insertion and deletion schemes on data streams, World Wide Web, 2022, pp. 1-21, DOI: 10.1007/s11280-022-01052-z