4.7 Article

Data Imputation for Multivariate Time Series Sensor Data With Large Gaps of Missing Data

Journal

IEEE SENSORS JOURNAL
Volume 22, Issue 11, Pages 10671-10683

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JSEN.2022.3166643

Keywords

Data imputation; large missing data gap; MICE; multivariate; time series

Funding

  1. VT EPSCoR [EPS-IIA 1330446]
  2. National Science Foundation [1730568]
  3. National Science Foundation under Vermont Established Program to Stimulate Competitive Research (VT EPSCoR) [OIA-1556770]
  4. Div Of Engineering Education and Centers
  5. Directorate For Engineering [1730568] Funding Source: National Science Foundation


This paper proposes a framework to improve the accuracy of the popular multivariate imputation by chained equations (MICE) method for dealing with missing data. The framework involves reshaping the original sensor data and leveraging the correlation between missing and observed data. Experimental results using water quality monitoring data demonstrate a significant improvement in MICE model accuracy with these strategies.
Imputation of missing sensor-collected data is often an important step prior to machine learning and statistical data analysis. One particular data imputation challenge is filling large data gaps when the only related data comes from the same sensor station. In this paper, we propose a framework to improve the popular multivariate imputation by chained equations (MICE) method for dealing with missing data. One key strategy we use to improve model accuracy is to reshape the original sensor data to leverage the correlation between the missing data and the observed data. We demonstrate our framework using data from continuous water quality monitoring stations in Vermont. Because irregularly spaced peaks may occur throughout the time series, the reshaped data is split into extreme and normal values, and a separate MICE model is built for each. We also recommend that sensor-collected data be transformed to meet the assumptions of the machine learning model. According to our experimental results, these strategies can improve the accuracy of the MICE data imputation model by at least 23% for large data gaps, based on R² values, and show promise for application to other data imputation algorithms.
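
The reshaping idea can be illustrated with a small sketch. The abstract does not give the exact layout, transform, or regression model, so everything below is an assumption made for illustration: the series is folded so that each column holds one consecutive segment (the gap then sits in one column next to fully observed columns), a log transform stands in for the recommended data transformation, and a simplified, deterministic chained least-squares loop stands in for MICE. The function names `fold_into_segments`, `chained_regression_impute`, and `r_squared` are hypothetical, the data is synthetic, and the split into extreme and normal values is omitted.

```python
import numpy as np

def fold_into_segments(series, segment_len):
    """Reshape a 1-D series so that each column holds one consecutive segment."""
    series = np.asarray(series, dtype=float)
    n_segments = len(series) // segment_len
    trimmed = series[: n_segments * segment_len]
    # Rows index the position within a segment; columns index the segment.
    return trimmed.reshape(n_segments, segment_len).T

def chained_regression_impute(X, n_iter=5):
    """Simplified chained-equations imputation (assumption: plain least squares,
    single deterministic imputation, no random draws as in full MICE)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue  # nothing to impute in this column
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(len(others)), others])
            # Fit column j on the other columns using its observed rows only.
            coef, *_ = np.linalg.lstsq(A[~rows], filled[~rows, j], rcond=None)
            filled[rows, j] = A[rows] @ coef
    return filled

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic, strictly positive "sensor" signal with a repeating pattern.
rng = np.random.default_rng(0)
segment_len, n_segments = 48, 6
t = np.arange(segment_len * n_segments)
amplitude = np.repeat(rng.uniform(0.8, 1.5, n_segments), segment_len)
truth = np.exp(amplitude * np.sin(2 * np.pi * t / segment_len)
               + rng.normal(0.0, 0.02, t.size))

# Knock out one large gap (30 of the 48 samples in the fourth segment).
observed = truth.copy()
gap = slice(3 * segment_len + 10, 3 * segment_len + 40)
observed[gap] = np.nan

# Transform, reshape, impute, then undo the reshape and the transform.
log_matrix = fold_into_segments(np.log(observed), segment_len)
imputed_log = chained_regression_impute(log_matrix)
imputed = np.exp(imputed_log.T.reshape(-1))

score = r_squared(truth[gap], imputed[gap])
print(f"R^2 on the held-out gap: {score:.3f}")
```

In this layout the segment length must exceed the gap length, otherwise the column containing the gap has no observed rows to fit on. For real data, a full MICE implementation such as the R `mice` package, or the MICE-inspired `IterativeImputer` in scikit-learn, would likely replace the hand-written loop.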
