Journal
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
Volume 126, Issue -, Pages -
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.engappai.2023.106859
Keywords
Machine learning methods; Fault Detection models; Unsupervised learning; Condition monitoring system; Cross-validation; Receiver operating characteristic curve
This paper presents a robust unsupervised machine-learning approach for fleet-based anomaly detection in wind turbines' critical components. The approach preprocesses and extracts features from noisy, unlabeled, and unstructured vibration data and optimizes the performance of eleven machine learning algorithms. The six best models are selected based on robust performance metrics and achieve an area under the ROC curve above 90%.
Wind turbine condition monitoring systems (CMS) produce large amounts of unlabeled data that capture the turbines' operational status. Given this volume of data, robust systems that detect incipient wind turbine faults and perform well on unseen test data are crucial to maximizing wind farm performance. This paper presents an implementation of a robust unsupervised machine-learning approach capable of fleet-based anomaly detection in wind turbines' critical components. The proposed methodology is applied to noisy, unlabeled, and unstructured vibration data, which must go through databank decoding, data engineering, preprocessing, and feature extraction. Twelve operational wind turbines with varying health conditions are used to train, validate, and test the models. Features from different domains (time, frequency, and mechanical) are extracted and used as model inputs. The condition of each wind turbine component was labeled through expert analysis, based on an evaluation of the CMS output. The paper's novelty lies in combining distinctive approaches to optimize eleven unsupervised machine learning algorithms through an unusual 5x2 cross-validation approach applied to real, noisy, and unstructured wind turbine data. The methodology selected the six best models (k-nearest neighbors, clustering-based local outlier, histogram-based outlier, isolation forest, principal component analysis, and minimum covariance determinant) based on robust performance metrics such as accuracy, F1-score, precision, recall, and area under the receiver operating characteristic (ROC) curve. These models generalized well and returned reasonable classification metrics for such a complex problem, with values above 90% for the area under the ROC curve.
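As a rough illustration of the evaluation idea described in the abstract, the sketch below scores two unsupervised detectors with a 5x2 cross-validation (five repetitions of a 2-fold split) and the area under the ROC curve. It is not the authors' pipeline: the use of scikit-learn, the synthetic Gaussian features, the helper names `make_synthetic_features` and `five_by_two_auc`, and all parameter values are assumptions made for the example. Labels are used only to stratify the folds and to score the held-out half, never to fit the models.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold


def make_synthetic_features(n_normal=400, n_anomalous=40, n_features=6, seed=0):
    """Hypothetical stand-in for extracted vibration features (not real CMS data)."""
    rng = np.random.default_rng(seed)
    normal = rng.normal(0.0, 1.0, size=(n_normal, n_features))
    anomalous = rng.normal(4.0, 1.0, size=(n_anomalous, n_features))
    X = np.vstack([normal, anomalous])
    # 0 = healthy, 1 = anomalous; used only for fold stratification and scoring.
    y = np.concatenate([np.zeros(n_normal, dtype=int),
                        np.ones(n_anomalous, dtype=int)])
    return X, y


def five_by_two_auc(model_factory, X, y, seed=0):
    """Return the 10 ROC AUC values from five repetitions of 2-fold CV."""
    cv = RepeatedStratifiedKFold(n_splits=2, n_repeats=5, random_state=seed)
    aucs = []
    for train_idx, test_idx in cv.split(X, y):
        model = model_factory()
        model.fit(X[train_idx])  # unsupervised: no labels passed to fit
        # score_samples is higher for inliers, so negate it to get an anomaly score.
        anomaly_score = -model.score_samples(X[test_idx])
        aucs.append(roc_auc_score(y[test_idx], anomaly_score))
    return np.array(aucs)


if __name__ == "__main__":
    X, y = make_synthetic_features()
    candidates = {
        "isolation forest": lambda: IsolationForest(random_state=0),
        # EllipticEnvelope fits a minimum covariance determinant estimate.
        "minimum covariance determinant": lambda: EllipticEnvelope(random_state=0),
    }
    for name, factory in candidates.items():
        aucs = five_by_two_auc(factory, X, y)
        print(f"{name}: mean AUC = {aucs.mean():.3f} (std {aucs.std():.3f})")
```

Comparing the mean and spread of the ten AUC values across candidate models is one simple way such a procedure could be used to rank detectors; the abstract does not state the exact selection rule the authors applied.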