A process-guided uncertainty-aware deep learning framework for reliable and interpretable industrial fault diagnosis (pdf)


RESEARCH ARTICLE

A process-guided uncertainty-aware deep learning framework for reliable and interpretable industrial fault diagnosis

Babar Hayat 1, Shabeer Ahmad 2, Muhammad Asfandyar Shahid 3, Adil Khan 1, Md. Rajibul Islam 4, Md Shohel Sayeed 5*, Yasir Ullah 6

1 School of Information Engineering, Xi’an Eurasia University, Xi’an, Shaanxi, China
2 School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing, China
3 School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China
4 Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh
5 Centre for Intelligent Cloud Computing, CoE for Advanced Cloud, Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, Malaysia
6 Centre for Wireless Technology, Faculty of AI and Engineering, Multimedia University, Cyberjaya, Selangor, Malaysia

OPEN ACCESS

Citation: Hayat B, Ahmad S, Shahid MA, Khan A, Islam MR, Sayeed MS, et al. (2026) A process-guided uncertainty-aware deep learning framework for reliable and interpretable industrial fault diagnosis. PLoS One 21(6): e0349385. https://doi.org/10.1371/journal.pone.0349385

Editor: Muhammad Shahid Anwar, Gachon University, KOREA, REPUBLIC OF

Received: December 22, 2025
Accepted: April 29, 2026
Published: June 2, 2026

Copyright: © 2026 Hayat et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data availability statement: All relevant data are within the manuscript and its Supporting information files.

Funding: The author(s) received no specific funding for this work.

Abstract

Timely fault detection is essential for safety, product quality, and energy efficiency in advanced industrial processes.
However, many existing fault diagnosis methods insufficiently exploit process structure and sensor reliability, which limits their robustness and practical usefulness for process engineers. This study presents an improved framework, SAU-PGA-CNN-BiLSTM. First, it couples Convolutional Neural Networks (CNNs) and Bidirectional Long Short-Term Memory (BiLSTM) layers to extract the multivariate temporal dynamics and spatial correlations of the process data. Second, it introduces a process-guided, sensor-aware attention mechanism that embeds process centrality, sequence-level sensor reliability, and uncertainty into attention learning, suppressing unreliable channels and biasing the model toward informative and stable sensors. In addition, Monte Carlo dropout with sensor prior-conditioning is used to provide calibrated confidence estimates that reflect both predictive uncertainty and sensor reliability. Finally, two lightweight sigmoid output heads perform fault detection and diagnosis jointly, promoting mutual reinforcement between the tasks. Validated on the Tennessee Eastman Process benchmark, the proposed framework outperforms baseline models and achieves 93.6% multiclass diagnosis accuracy with a 94.0% F1 score. After temperature scaling, the proposed model also demonstrates improved calibration compared with an otherwise identical model without sensor awareness, reducing negative log-likelihood from 0.197 to 0.182, Brier score from 0.101 to 0.095, and expected calibration error from 0.040 to 0.037. Attention visualizations further show that the model focuses on process-relevant and reliable sensors, supporting reliable industrial fault diagnosis.

PLOS One | https://doi.org/10.1371/journal.pone.0349385 June 2, 2026

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

Detecting and diagnosing abnormal events quickly in large industrial processes is crucial for safety, product quality, and energy efficiency [1]. As modern industrial processes deploy increasingly dense sensor networks, they generate and store huge amounts of process data every day, creating an opportunity for intelligent monitoring [2]. However, the complex, nonlinear, and constantly changing nature of industrial systems makes fault detection and diagnosis (FDD) challenging [3]. Faults that are missed or misdiagnosed may propagate rapidly, resulting in damaged equipment, environmental risks, and significant financial loss [4].

Common process monitoring methods that have been used extensively for FDD in industrial systems include Principal Component Analysis (PCA), Partial Least Squares (PLS), and Multivariate Statistical Process Control (MSPC) [5–7]. Data-driven techniques, including multivariate statistical process monitoring and machine learning methods, have therefore gained substantial attention due to their flexibility and reduced dependence on explicit process models. However, classical methods such as PCA, PLS, and shallow classifiers typically assume linear relationships and lack the capacity to capture complex temporal dependencies and fault propagation patterns [8,9]. Moreover, such methods offer little practical insight into the underlying causes of failures, which limits their usefulness for operator intervention and troubleshooting [3].

The recent surge in deep learning has produced advanced data-intensive models for monitoring and diagnosing industrial processes. CNNs are adept at capturing local spatial correlations among sensors, whereas Recurrent Neural Networks (RNNs), especially LSTM units, are adept at capturing temporal dependencies [10,11].
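The division of labour between convolutional and recurrent layers can be illustrated with a toy, dependency-free sketch. It is not the authors' implementation: the function names, the scalar hidden state, the fixed weights, and the plain tanh recurrence (used here as a simplified stand-in for an LSTM cell) are all assumptions made for illustration. The convolution looks at neighbouring sensor channels within one time step, and the two recurrent passes summarise the window forward and backward in time.

```python
import math


def conv_across_sensors(window, kernel):
    """Slide a 1-D kernel across neighbouring sensor channels at each time step.

    window: list of time steps, each a list of sensor readings.
    Returns one ReLU-activated feature vector per time step (valid convolution).
    """
    k = len(kernel)
    features = []
    for reading in window:
        if k > len(reading):
            raise ValueError("kernel is wider than the number of sensors")
        row = []
        for start in range(len(reading) - k + 1):
            s = sum(w * x for w, x in zip(kernel, reading[start:start + k]))
            row.append(max(0.0, s))  # ReLU keeps only positive responses
        features.append(row)
    return features


def recurrent_pass(features, w_in, w_rec):
    """Plain tanh recurrence with a scalar hidden state (stand-in for an LSTM)."""
    h = 0.0
    states = []
    for row in features:
        # Mix the mean feature of this step with the previous hidden state.
        h = math.tanh(w_in * sum(row) / len(row) + w_rec * h)
        states.append(h)
    return states


def bidirectional_summary(features, w_in=0.5, w_rec=0.8):
    """Return (forward, backward) hidden states aligned by time step."""
    forward = recurrent_pass(features, w_in, w_rec)
    backward = recurrent_pass(features[::-1], w_in, w_rec)[::-1]
    return list(zip(forward, backward))


# Hypothetical window: 3 time steps x 4 sensors; sensor 2 drifts upward.
window = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.2, 1.0, 1.0],
    [1.0, 2.0, 1.0, 1.0],
]
features = conv_across_sensors(window, kernel=[1.0, -1.0])
summary = bidirectional_summary(features)
```

In this toy window the forward state at the first time step carries no evidence of the drift, while the backward state at the same step does, because it has already seen the later readings. That is the practical reason for running the recurrence in both directions.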
Hybrid models that combine CNNs with LSTMs or BiLSTMs have shown notable accuracy improvements on benchmark datasets such as the Tennessee Eastman Process (TEP). Nevertheless, more powerful deep learning models are also costly to train and run, which makes them impractical in industrial settings with tight time, latency, and hardware constraints [12,13]. More importantly, the vast majority of deep neural networks remain black boxes that offer little interpretability to process engineers [14]. This lack of transparency makes automated monitoring systems difficult to deploy, because operators need more than alerts: they want to know which variables and time periods are behind the observed anomalies [15].

Attention mechanisms have been proposed as a promising way to provide interpretability, that is, to make the internal workings of neural networks more transparent by assigning importance weights to input elements and time steps [16]. However, process monitoring models that use attention are often based on multi-head attention or transformer designs, which add complexity and computational cost [17].

Beyond predictive accuracy and interpretability, practical industrial fault detection and diagnosis (FDD (...truncated)