Proceedings Paper

A statistical framework for labeling unlabelled data: a case study on anomaly detection in pressurization systems for high-speed railway trains

Publisher

IEEE
DOI: 10.1109/IJCNN55064.2022.9892880

Keywords

Predictive Maintenance; Dissimilarity Space; Unsupervised Learning; Supervised Learning; Anomaly Detection; Condition-based Maintenance

This paper introduces an anomaly detection system for railway environments, focusing on the pressurization systems of high-speed trains. A statistical technique is first used to label the unlabeled time series; the resulting, strongly imbalanced dataset is then treated as a classification problem.
The ability to perform predictive maintenance, one of the main assets of Industry 4.0, is known to help reduce downtime and costs and to improve control and production quality. Modern predictive maintenance programs involve machine learning techniques, within the AI umbrella, that work in a data-driven fashion. This holds for any machinery where intelligent sensors make it possible to collect data that can be processed to detect faults or carry out anomaly detection. This paper presents a system for the detection of anomalies in the railway context and, specifically, in the pressurization systems of Italian high-speed trains. The available real-world dataset is in the form of unlabeled time series with a fixed length of 600 samples. Hence, a two-stage machine learning workflow is proposed. The first stage acts in an unsupervised fashion, using a statistical technique validated by field experts to build a labeled dataset. In the second stage, the problem is framed as a classification task under strong class imbalance, which is very likely in predictive maintenance, and two feature engineering techniques are compared. The first feeds the raw signals directly to an SVM algorithm. In the second, the time series are subjected to an adaptive heuristic procedure of piecewise approximation, whose output is a sequence of vectors in R^2 (slopes and intercepts). In this case, the classification task is carried out in the so-called dissimilarity space for pattern recognition, using representation sets of different sizes obtained through a clustering algorithm. The dissimilarity measure is an ad hoc edit distance capable of measuring the dissimilarity between sequences of 2-dimensional vectors.
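
The second feature engineering route can be illustrated with a minimal sketch. It is not the authors' implementation: the equal-width segmentation stands in for the adaptive heuristic, the edit distance with a constant `gap_cost` stands in for the ad hoc distance (whose definition the abstract does not give), and all function names are hypothetical.

```python
import math


def fit_line(values):
    """Least-squares (slope, intercept) for values sampled at t = 0, 1, 2, ..."""
    n = len(values)  # must be at least 2
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return slope, v_mean - slope * t_mean


def piecewise_linear(signal, n_segments):
    """Approximate a signal by one fitted line per window.

    ASSUMPTION: equal-width windows; the paper uses an adaptive procedure.
    Returns a list of (slope, intercept) pairs, i.e. vectors in R^2.
    """
    n = len(signal)
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [fit_line(signal[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]


def edit_distance(a, b, gap_cost=1.0):
    """Edit distance between two sequences of 2-D vectors.

    ASSUMPTION: substitution costs the Euclidean distance between vectors,
    insertion and deletion cost a constant gap_cost.
    """
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap_cost
    for j in range(1, m + 1):
        d[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + math.dist(a[i - 1], b[j - 1]),  # substitute
                d[i - 1][j] + gap_cost,                           # delete
                d[i][j - 1] + gap_cost,                           # insert
            )
    return d[n][m]


def dissimilarity_embedding(sequences, prototypes):
    """Represent each sequence by its distances to the representation set."""
    return [[edit_distance(s, p) for p in prototypes] for s in sequences]
```

Each time series becomes a vector whose length equals the size of the representation set, so a standard classifier such as an SVM can then be trained on these vectors.
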
In this study, a k-medoids clustering procedure is adopted to balance the dataset, together with additional techniques for the challenging problem of unbalanced data, and the various experimental methodologies are compared in depth.
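
One common way to use k-medoids for balancing is to undersample the majority class by keeping only the cluster medoids. The sketch below follows that reading, which is an assumption: the abstract does not say exactly how the clustering is applied. The functions `k_medoids` and `undersample_majority` are hypothetical names, and the simple alternating (Voronoi iteration) scheme is a stand-in for whichever k-medoids variant the authors used.

```python
import random


def k_medoids(dist, k, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed square distance matrix.

    Returns the sorted indices of the k medoids.
    """
    rng = random.Random(seed)
    medoids = rng.sample(range(len(dist)), k)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest medoid;
        # a medoid always stays in its own cluster.
        clusters = {m: [] for m in medoids}
        for i in range(len(dist)):
            if i in clusters:
                nearest = i
            else:
                nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes the total distance
        # to the other members of its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


def undersample_majority(majority, n_keep, distance):
    """Keep n_keep representative majority-class samples (the medoids)."""
    dist = [[distance(a, b) for b in majority] for a in majority]
    return [majority[i] for i in k_medoids(dist, n_keep)]
```

Because the procedure only needs pairwise distances, `distance` can be any dissimilarity measure, including an edit distance between sequences; `n_keep` would typically be set close to the size of the minority class.
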
