3.8 Article

Importance of data preprocessing in time series prediction using SARIMA: A case study

出版社

IOS PRESS
DOI: 10.3233/KES-200065

关键词

Time series data prediction; SARIMA; data preprocessing

向作者/读者索取更多资源

Over last decades, time series data analysis has been in practice of specific importance. Different domains such as financial data analysis, analyzing biological data and speech recognition inherently deal with time dependent signals. Monitoring the past behavior of signals is a key for precise predicting the behavior of a system in near future. In scenarios such as financial data prediction, the predominant signal has a periodic behavior (starting from beginning of the month, week, etc.) and a general trend and seasonal behavior can also be assumed. Autoregressive Integrated Moving Average (ARIMA) model and its seasonal extension, SARIMA, have been widely used in forecasting time-series data, and are also capable of dealing with the seasonal behavior/trend in the data. Although the behavior of data may be autoregressive and trends and seasonality can be detected and handled by SARIMA, the data is not always exactly compatible with SARIMA (or more generally ARIMA) assumptions. In addition, the existence of missing data is not pre-assumed in SARIMA, while in real-world, there can be always missing data for different reasons such as holidays for which no data may be recorded. For different week days, different working hours may be a cause of observing irregular patterns compared to what is expected by SARIMA assumptions. In this paper, we investigate the effectiveness of applying SARIMA on such real-world data, and demonstrate preprocessing methods that can be applied in order to make the data more suitable to be modeled by SARIMA model. The data in the existing research is derived from transactions of a mutual fund investment company, which contains missing values (single point and intervals) and also irregularities as a result of the number of working hours per week days being different from each other which makes the data inconsistent leading to poor result without preprocessing. In addition, the number of data points was not adequate at the time of analysis in order to fit a SARIM model. Preprocessing steps such as filling missing values and tricks to make data consistent has been proposed to deal with existing problems. Results show that prediction performance of SARIMA on this set of real-world data is significantly improved by applying several preprocessing steps introduced in order to deal with mentioned circumstances. The proposed preprocessing steps can be used in other real-world time-series data analysis.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

3.8
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据