Article

How to Handle Data Imbalance and Feature Selection Problems in CNN-Based Stock Price Forecasting

IEEE ACCESS (2022)

Journal

IEEE ACCESS

Volume 10, Issue -, Pages 31297-31305

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/ACCESS.2022.3160797

Keywords

Predictive models; Labeling; Data models; Convolutional neural networks; Biological system modeling; Stock markets; Forecasting; CNN model; feature selection; labeling; stock prediction

Categories

Computer Science, Information Systems; Engineering, Electrical & Electronic; Telecommunications

Summary

Stock market forecasting is a challenging task because stock data is highly uncertain and influenced by numerous factors. This study proposes a new labeling algorithm and a new feature selection approach to address the data imbalance and feature selection problems, and applies a CNN-based model to predict the next day's trade action. The experimental results show that the proposed methods achieve higher prediction accuracy than the models in other studies.

Abstract

Stock market forecasting is a time series problem that aims to predict the future price or direction of an index or stock. Stock data is highly uncertain and influenced by many factors, so traditional time series methods struggle to achieve this goal. In the literature, convolutional neural network (CNN) models have been applied to stock market forecasting with successful results. However, these models suffer from data imbalance caused by labeling and from feature selection problems. This study therefore proposes a new rule-based labeling algorithm and a new feature selection approach to address both issues. To check the effectiveness of the labeling and feature selection approaches, a CNN-based model was constructed to predict the next day's trade action for stocks in the Dow30 index. Different image-based input variable sets were created from technical indicators and from gold and oil price data to feed the CNN model. The prediction performance of the CNN models was compared with other studies in the literature. The experimental results showed that the CNN prediction model using the proposed feature selection and labeling approaches achieves 3-22% higher accuracy than the CNN-based models in other studies. The proposed labeling approach is also more successful than Chen and Huang's data weighting approach at solving the stock data imbalance problem: it reduced the ratio between labeled data classes from 15 times to 1.8 times.
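
The imbalance figures quoted in the abstract can be illustrated with a small calculation. The sketch below takes one plausible reading of "the ratio between labeled data", namely the count of the most frequent label divided by the count of the least frequent one. The class names (buy, hold, sell), the helper imbalance_ratio, and the label counts are all hypothetical and chosen only to reproduce the two ratios; they are not taken from the paper, and the code does not implement the paper's rule-based labeling algorithm.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the largest class count divided by the smallest class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two distinct classes")
    return max(counts.values()) / min(counts.values())


# Hypothetical label sets (assumption: three trade-action classes).
# The counts are invented solely to match the ratios in the abstract.
skewed = ["hold"] * 150 + ["buy"] * 10 + ["sell"] * 10
rebalanced = ["hold"] * 90 + ["buy"] * 50 + ["sell"] * 50

print(imbalance_ratio(skewed))      # 15.0
print(imbalance_ratio(rebalanced))  # 1.8
```

A ratio close to 1 means the classes are nearly balanced, so a classifier cannot reach high accuracy simply by always predicting the majority label.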
