4.7 Review

Performance evaluation of outlier detection techniques in production timeseries: A systematic review and meta-analysis

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 191

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2021.116371

Keywords

Outlier detection; Production data analysis; Decline curve analysis; Performance evaluation; Binary classification

Funding

  1. Office of Energy Research and Development, Natural Resources Canada

Time-series data are collected and analyzed in many disciplines, and outliers introduce uncertainty into interpretation results, so accurate and efficient outlier removal is essential. This study applies 17 outlier detection techniques to oil and gas production data, 15 of them for the first time in production data analysis. Evaluation on several metrics shows that eight unsupervised algorithms outperform the others and that ML-based techniques perform better than statistical methods.
Time-series data have been extensively collected and analyzed in many disciplines, such as the stock market, medical diagnosis, meteorology, and the oil and gas industry. Much of the data in these disciplines consists of sequences of observations measured as functions of time, which can be used in different applications via analytical or data analytics techniques (e.g., to forecast future prices or climate change). However, the presence of outliers can introduce significant uncertainty into interpretation results; hence, it is essential to remove outliers accurately and efficiently before conducting any further analysis. A total of 17 techniques for outlier detection in time series, belonging to statistical, regression-based, and machine learning (ML) based categories, are applied to oil and gas production data analysis. Fifteen of these methods are used for production data analysis for the first time. Two state-of-the-art, high-performance techniques that require minimal user control and have low time complexity are then selected for data cleaning. Moreover, the performance of these techniques is evaluated with several metrics, including accuracy, precision, recall, F1 score, and Cohen's Kappa, in order to rank them. Results show that eight unsupervised algorithms outperform the rest of the methods on a synthetic case study with known outliers. For example, the accuracies of the eight shortlisted methods range from 0.83 to 0.99 and their precisions from 0.83 to 0.98, compared to accuracies of 0.65-0.82 and precisions of 0.07-0.77 for the others. In addition, ML-based techniques perform better than statistical techniques. Our experimental results on real field data further indicate that the k-nearest neighbor (KNN) and Fulford-Blasingame methods are superior to the other frameworks for outlier detection in production data, followed by four others including density-based spatial clustering of applications with noise (DBSCAN) and angle-based outlier detection (ABOD). Although the techniques are examined with oil and gas production data, the same data cleaning workflow can be used to detect time-series outliers in other disciplines.
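To illustrate the kind of evaluation the abstract describes, the following is a minimal sketch, not the authors' implementation. It scores points on a synthetic exponential decline curve by distance to their k-th nearest neighbor, flags high scores as outliers, and computes accuracy, precision, recall, F1 score, and Cohen's Kappa against the known labels. The function names, the log-rate transform, k = 3, the median-based cutoff, and the synthetic curve parameters are all assumptions made for this example and do not come from the paper.

```python
import math
from statistics import median


def knn_scores(times, rates, k=3):
    """Distance from each point to its k-th nearest neighbor in
    min-max scaled (time, log-rate) space. The log transform is an
    assumption: it makes an exponential decline a straight line."""
    log_rates = [math.log(q) for q in rates]

    def scale(xs):
        lo, hi = min(xs), max(xs)
        span = (hi - lo) or 1.0  # avoid division by zero for constant input
        return [(x - lo) / span for x in xs]

    ts, qs = scale(times), scale(log_rates)
    scores = []
    for i in range(len(ts)):
        dists = sorted(
            math.hypot(ts[i] - ts[j], qs[i] - qs[j])
            for j in range(len(ts)) if j != i
        )
        scores.append(dists[k - 1])
    return scores


def flag_outliers(scores, factor=3.0):
    """Flag points whose score exceeds factor * median score
    (a hypothetical cutoff rule chosen for this sketch)."""
    cutoff = factor * median(scores)
    return [1 if s > cutoff else 0 for s in scores]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and Cohen's Kappa for 0/1 labels,
    where 1 marks an outlier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Agreement expected by chance, from the marginal label frequencies.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# Synthetic case with known outliers: an exponential decline with three
# rates artificially reduced to 20% of their true value.
times = list(range(50))
rates = [1000.0 * math.exp(-0.05 * t) for t in times]
known_outliers = {10, 25, 40}
for i in known_outliers:
    rates[i] *= 0.2

y_true = [1 if i in known_outliers else 0 for i in times]
y_pred = flag_outliers(knn_scores(times, rates))
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

This toy curve is noise-free, so the detector separates the injected points easily; it says nothing about the rankings reported above. On noisy field data the choice of k and of the cutoff would need tuning, which is why the abstract's emphasis on techniques requiring minimal user control matters.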
