Transformer fault diagnosis method based on SMOTE and NGO-GBDT

www.nature.com/scientificreports OPEN

Li-zhong Wang 1, Jian-fei Chi 1, Ye-qiang Ding 1, Hai-yan Yao 2, Qiang Guo 2 & Hai-qi Yang 3*

To improve the accuracy of transformer fault diagnosis and to mitigate the low identification accuracy that unbalanced samples cause through insufficient model training, this paper proposes a transformer fault diagnosis method based on SMOTE and NGO-GBDT. First, the Synthetic Minority Over-sampling Technique (SMOTE) is used to expand the minority samples. Second, the non-coding ratio method is used to construct multi-dimensional feature parameters, and a Light Gradient Boosting Machine (LightGBM) feature optimization strategy is introduced to screen the optimal feature subset. Finally, the Northern Goshawk Optimization (NGO) algorithm is used to optimize the parameters of a Gradient Boosting Decision Tree (GBDT), which then performs the transformer fault diagnosis. The results show that the proposed method reduces the misjudgment of minority samples. Compared with other ensemble models, the proposed method has high fault identification accuracy, a low misjudgment rate, and stable performance.

Keywords: Fault diagnosis, Transformers, Oversampling, LightGBM feature selection, GBDT, Northern goshawk optimization algorithm

Power transformers are key equipment in power transmission and transformation systems, and their operating status affects the stability of the power system. When a transformer malfunctions and is not diagnosed accurately and promptly, significant economic losses can result. Improving the accuracy of transformer fault diagnosis has therefore long been an active research topic. As transformer insulation ages, H2, CH4, C2H6, C2H4, C2H2, CO2, and other gases are produced and dissolve into the insulating oil.
The present condition of the transformer can be inferred from the concentration and composition of these dissolved gases in the oil [1]. The predominant analytical techniques for assessing transformer condition include the IEC three-ratio method [2], Rogers' four-ratio method [3], the Duval Pentagon [4], and Doernenburg's ratio method [5], among others. In [6], a fuzzy logic approach was proposed to overcome the shortcomings of traditional IEC methods and enhance diagnostic accuracy. In [7], a fuzzy logic-based transformer fault diagnosis model using the Rogers four-ratio method was developed from dissolved gas data; it corrects deficiencies of conventional fault diagnosis methods and thereby improves diagnostic accuracy. However, this approach has incomplete coding and overly rigid diagnostic thresholds, so it fails to capture the intricate nature of transformer faults, which compromises diagnostic accuracy [8]. In [9], the ratio coding method and raw gas data are used to construct 24-dimensional features, which improves the model's ability to distinguish between different faults and makes it more versatile. Ref. [10] proposes a PSO-RF diagnostic model that extracts transformer fault characteristic information without using coding ratios, thereby improving the model's fault diagnosis capability. However, existing research rarely considers the dimensionality explosion problem when constructing feature parameters. A fault diagnosis model generally improves as the sample size increases, but a higher feature dimension leads to an exponential increase in computation and to more redundant information. It is therefore necessary to remove redundant information to improve the model's efficiency and diagnostic accuracy.
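The ratio-based feature construction discussed above can be illustrated with a short sketch. It keeps the three gas ratios of the IEC three-ratio method as raw (non-coded) values and adds each gas's share of the total. The function name `ratio_features`, the sample values, and the choice of extra share features are assumptions made for illustration; the feature set actually used in the paper is not specified in this section.

```python
# Illustrative sketch only: the function name and the feature choice are assumptions.
GASES = ("H2", "CH4", "C2H6", "C2H4", "C2H2")


def ratio_features(sample, eps=1e-9):
    """Build non-coded ratio features from one sample (gas name -> concentration)."""
    feats = {}
    # The three gas ratios of the IEC three-ratio method, kept as raw values
    # instead of being mapped to discrete codes.
    feats["C2H2/C2H4"] = sample["C2H2"] / (sample["C2H4"] + eps)
    feats["CH4/H2"] = sample["CH4"] / (sample["H2"] + eps)
    feats["C2H4/C2H6"] = sample["C2H4"] / (sample["C2H6"] + eps)
    # Share of each gas in the summed concentration (hypothetical extra features).
    total = sum(sample[g] for g in GASES) + eps
    for g in GASES:
        feats[g + "/total"] = sample[g] / total
    return feats


# Made-up concentrations, for demonstration only.
example = {"H2": 100.0, "CH4": 50.0, "C2H6": 20.0, "C2H4": 40.0, "C2H2": 10.0}
features = ratio_features(example)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

The small `eps` term only guards against division by zero when a gas is absent. Every added ratio raises the feature dimension, which is why a later screening step is needed to drop redundant features.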
1 State Grid Zhejiang Power Co., Ltd, Hangzhou Linping Power Supply Company, Hangzhou 311199, China. 2 Hangzhou Electric Power Equipment Manufacturing Co., Ltd, Yuhang Qunli Complete Sets Electricity Manufacturing Branch Electric, Hangzhou 311000, China. 3 School of Mechanical Engineering, Northeast Electric Power University, Jilin 132012, China.

Scientific Reports | (2024) 14:7179 | https://doi.org/10.1038/s41598-024-57509-w

As artificial intelligence technology advances, machine learning has gained momentum in transformer fault diagnosis. Support Vector Machine [11–13], Convolutional Neural Network (CNN) [14,15], Self-Organizing Mapping Neural Network (SOM) [16], Gated Recurrent Unit (GRU) [17,18], Cloud Model (CM) [19], Adaptive Boosting (AdaBoost) [20], Gradient Boosting Decision Tree (GBDT) [21], and other models have demonstrated remarkable success in classification. However, these fault diagnosis models were all built on the assumption that a relatively large dataset is available. In practice, transformers rarely fail, and the frequencies of different fault types vary significantly, which makes it difficult to obtain the large datasets these models require. Sample imbalance therefore needs to be addressed directly when tackling practical transformer fault diagnosis. Research on imbalanced datasets mainly focuses on two levels: classification algorithms and data preprocessing.
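To make the imbalance problem concrete, the following is a minimal sketch of the interpolation idea behind SMOTE, the oversampling step named in the abstract: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the data are assumptions for illustration. The sketch picks base samples at random rather than iterating over every minority sample, so it is a simplification and not the authors' implementation.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base sample, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up two-dimensional minority-class samples, for demonstration only.
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 1.8]]
new_points = smote(minority, n_new=6)
print(len(new_points), "synthetic samples generated")
```

Because each synthetic point lies between two real minority samples, the new data stay inside the region the minority class already occupies instead of duplicating existing samples exactly.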
Data-level processing reconstructs the dataset so that it better reflects its inherent characteristics, thereby addressing the problems caused by imbalanced class frequencies. Undersampling [22] selects a subset of the most representative samples from the majority classes to mitigate class imbalance. However, this approach may discard crucial information about the majority classes, ultimately impairing classifier performance. Oversampling artificially expands the minority classes to balance the data. Techniques include the Synthetic Minority Oversampling Technique (SMOTE) [23,24], SVM SMOTE [25], Borderline-SMOTE [26], Adaptive Synthetic Sampling (ADASYN) [27], Generative Adversarial Network (GAN) [28], and others. Common approaches at the classification algorithm level include cost-sensitive learning [29] and ensemble learning [30]. In [31], cost-sensitive classifiers are used to address class disparities and improve fault categorization accuracy. The Auxiliary Generation Mutual Countermeasure Network (AGMAN) was proposed in Ref. [32] to enhance the accuracy of fault diagnosis with small, class-imbalanced samples. In [33], MeanRadius-SMOTE (...truncated)