Article

Factors affecting text mining based stock prediction: Text feature representations, machine learning models, and news platforms

Journal

APPLIED SOFT COMPUTING
Volume 130

Publisher

ELSEVIER
DOI: 10.1016/j.asoc.2022.109673

Keywords

Machine learning; Text mining; Feature representation; Stock prediction; Financial news

Funding

  1. Ministry of Science and Technology of Taiwan [MOST 109-2410-H-182-012]
  2. Chang Gung Memorial Hospital at Linkou [BMRPH13, CMRPG3J0732]

This study applied text mining techniques and machine learning algorithms to stock market prediction and found that two combinations, CNN with Word2vec and CNN with BERT, performed best. It also found that the choice of text feature representation, learning model, and news platform can affect prediction results.
Text mining techniques have proven effective for stock market prediction, and different text feature representation approaches (e.g., TF-IDF and word embedding) have been adapted to extract textual information from financial news sources. In addition, different machine learning techniques, including deep learning, have been employed to construct the prediction models. Various combinations of text feature representations and learning models have been applied to stock prediction, but it is unknown which performs best or which can be regarded as representative baselines for future research. Moreover, because the textual content of financial news articles differs somewhat across news platforms, the choice of platform may affect prediction performance. Both questions are examined in experiments comparing eight different combinations comprising two context-free and two contextualized text feature representations (TF-IDF, Word2vec, ELMo, and BERT) and three learning techniques (SVM, CNN, and LSTM). The textual material is taken from three public news platforms: Reuters, CNBC, and The Motley Fool. The experimental results show that CNN+Word2vec and CNN+BERT perform the best. We also found that the learning models constructed and the news platforms used can affect the prediction of stock prices for different companies. © 2022 Published by Elsevier B.V.
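To make the term "context-free representation" concrete, the following is a minimal sketch of TF-IDF weighting in plain Python. It is an illustration only, not the authors' pipeline: the function name `tfidf`, the whitespace tokenisation, the unsmoothed `log(N / df)` weighting, and the toy headlines are all assumptions introduced here. Each term receives one weight per document regardless of the words around it, which is what separates TF-IDF from contextualized representations such as ELMo and BERT.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: term frequency normalised by document length,
    multiplied by idf = log(N / df). Smoothed or sublinear variants
    are also common and would give different numbers.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents each term occurs in.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy headlines, not data from the study.
headlines = [
    "shares rise after strong earnings",
    "shares fall after weak earnings",
    "company announces new product",
]
vectors = tfidf(headlines)
```

In this sketch a word that occurs in only one headline ("rise") gets a larger weight than a word shared by two headlines ("shares"). The resulting sparse vectors are the kind of input a classifier such as an SVM would consume.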
