4.7 Article

Significance and methodology: Preprocessing the big data for machine learning on TBM performance

Journal

UNDERGROUND SPACE
Volume 7, Issue 4, Pages 680-701

Publisher

KEAI PUBLISHING LTD
DOI: 10.1016/j.undsp.2021.12.003

Keywords

Big data; Data processing; Anomaly classification; Machine learning

Funding

  1. National Program on Key Basic Research Project (973 Program) of China [2015CB058100]
  2. Key Research Project of China Institute of Water Resources and Hydropower Research Limited [HTGE0203A03201900000, HTGE0203A20202000000]
  3. Natural Science Foundation of Shaanxi Province [2019JLZ-13, 2019JLP-23]
  4. China Railway Engineering Equipment Group Corporation

This paper discusses the importance of preprocessing large amounts of data collected from tunnel boring machine excavations before using it for machine learning on TBM performance predictions. The research work is based on two water diversion tunneling projects and suggests using moving average methods and noise reduction filters to process the data. A drilling efficiency index is introduced to assess the relationships between mechanical parameters in a boring cycle. The paper also defines irrelevant data caused by human or mechanical errors and provides a program for recognizing and classifying these categories.
This paper addresses the significance of preprocessing the big data collected during tunnel boring machine (TBM) excavation before they are used for machine learning on various TBM performance predictions. The research work is based on two water diversion tunneling projects that cover 29.52 km and 17,051 boring cycles. The penetration rate calculated from the raw measured penetration distances was found to exhibit pronounced random behavior owing to the percussive and vibratory behavior of the cutterhead. A moving average method to process the negative instantaneous velocities and a noise reduction filter to deal with signals with abnormal frequencies are recommended. An index called the drilling efficiency index is introduced to assess the relationships between the mechanical parameters in a boring cycle; the coefficient of determination R² of its linear regression is taken for a preliminary investigation of possible problems requiring preprocessing. The research work defines irrelevant data as data whose errors are caused by human or mechanical mistakes and which should therefore be cleaned or amended. These irrelevant data can be divided into five categories: (1) premature cycles, (2) sensor defects, (3) mechanical defects, (4) human interruption, and (5) missing files. A program, TBM-Processing, has been coded for the recognition and classification of these categories. PDF books generated by the program have been uploaded to GitHub to encourage discussion, collaboration, and upgrading of the data processing work with our peers.
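As a rough illustration of two steps named in the abstract, the sketch below smooths finite-difference velocities with a trailing moving average and computes R² for a straight-line fit between two parameters. It is not the TBM-Processing implementation. The function names, the 1 s sampling interval, the 3-sample window, and the sample values are all assumptions made for this example, and because the abstract does not give the formula of the drilling efficiency index, `r_squared` is shown for a generic pair of series only.

```python
from typing import List, Sequence


def instantaneous_velocity(distance_mm: Sequence[float], dt_s: float) -> List[float]:
    """Finite-difference velocity (mm/s) from cumulative penetration distance."""
    return [(b - a) / dt_s for a, b in zip(distance_mm, distance_mm[1:])]


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; early samples average the values seen so far."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[float] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination of the least-squares line y = a*x + b."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined for a constant series")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical cumulative penetration distances (mm), sampled every 1 s.
# Vibration makes some raw readings step backwards.
distance = [0.0, 1.2, 0.9, 2.4, 2.1, 3.6]
raw_velocity = instantaneous_velocity(distance, dt_s=1.0)
smoothed_velocity = moving_average(raw_velocity, window=3)
```

In this made-up series the raw velocities contain negative values, while the smoothed ones do not. In practice the window length would have to be chosen against the actual sampling rate and vibration period of the machine.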
