Article

Time Series Impact Through Topic Modeling

Journal

IEEE ACCESS
Volume 10, Pages 97327-97347

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2022.3202960

Keywords

Expectation-maximization algorithms; natural language processing; regression analysis; text mining; text analysis; time series analysis

Funding

  1. Spanish Ministry of Economy and Business (MINECO) through Digital Enabling Technologies (THD) Program [TSI-100905-2019-9]
  2. Spanish Ministry of Science and Innovation [PID2021-124361OB-C32]

This paper introduces Time Series Impact Through Topic Modeling (TSITM), a method that models the impact of the underlying themes discussed in text data on a time series. The method couples latent Dirichlet allocation (LDA) with linear regression, using an elastic net prior to set the impact of uncorrelated topics to zero. Experimental results show that TSITM outperforms both a baseline and state-of-the-art methods in terms of mean squared error (MSE), mean absolute error (MAE), and out-of-sample R².
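The three evaluation metrics named above can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, and the out-of-sample R² shown here uses the training-set mean as the benchmark forecast, which is one common convention and an assumption about the definition used in the paper.

```python
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def out_of_sample_r2(y_train, y_test, y_pred):
    """1 - SSE(model) / SSE(benchmark) on held-out data.

    Assumption: the benchmark forecast is the mean of the training targets.
    """
    benchmark = fmean(y_train)
    sse_model = sum((t - p) ** 2 for t, p in zip(y_test, y_pred))
    sse_benchmark = sum((t - benchmark) ** 2 for t in y_test)
    return 1.0 - sse_model / sse_benchmark


# Toy numbers, invented purely for illustration.
y_train = [1.0, 2.0, 3.0]
y_test = [2.0, 4.0]
y_pred = [3.0, 3.0]

print(mse(y_test, y_pred), mae(y_test, y_pred),
      out_of_sample_r2(y_train, y_test, y_pred))
```

Under this convention a positive out-of-sample R² means the model beats the constant training-mean forecast on unseen data, and a negative value means it does worse.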
A time series of numerical data and a sequence of time-ordered documents are often correlated. This paper aims to model the impact that the underlying themes discussed in the text data have on the time series. To do so, we introduce an original topic model, Time Series Impact Through Topic Modeling (TSITM), that includes contextual data by coupling Latent Dirichlet Allocation (LDA) with linear regression, using an elastic net prior to set the impact of uncorrelated topics to zero. The resulting topics act as explanatory variables for the regression of the numerical time series, which allows us to understand the time series movements based on the events described in the text data. We tested our model on two datasets: first, we used political news to explain the US president's disapproval ratings; then, we considered a corpus of economic news to explain the financial returns of four multinational corporations. Our experiments show that an appropriate selection of hyperparameters (via repeated random subsampling validation and Bayesian optimization) leads to significant correlations: according to our hypothesis tests, TSITM significantly outperformed both an intrinsic baseline and state-of-the-art methods in MSE, MAE, and out-of-sample R². We believe that this framework can be useful in the context of reputational risk management.
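To illustrate how an elastic net penalty can drive the coefficient of an uncorrelated topic to exactly zero, here is a minimal sketch of elastic net regression by coordinate descent. It is not the TSITM inference procedure: TSITM couples the regression with LDA, whereas this sketch assumes the per-period topic weights are already given. The function names, the penalty parameterization (`lam`, `l1_ratio`), and the toy data are all assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_cd(X, y, lam=0.5, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for
        (1 / 2n) * ||y - X b||^2 + lam * (l1_ratio * |b|_1
                                          + (1 - l1_ratio) / 2 * ||b||^2).
    Assumes the columns of X and the target y are already centered
    (no intercept is fitted)."""
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(k):
            col = [row[j] for row in X]
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            z = sum(c * c for c in col) / n
            new_beta = soft_threshold(rho, lam * l1_ratio) / (z + lam * (1.0 - l1_ratio))
            delta = new_beta - beta[j]
            residual = [r - c * delta for c, r in zip(col, residual)]
            beta[j] = new_beta
    return beta


# Hypothetical centered topic weights over six time steps (invented data).
topic_a = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]  # drives the series
topic_b = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]   # uncorrelated with the series
series = [2.0 * a for a in topic_a]

X = [list(pair) for pair in zip(topic_a, topic_b)]
beta = elastic_net_cd(X, series)
print(beta)  # coefficient of topic_b is exactly 0.0; topic_a is shrunk below 2
```

The L1 part of the penalty is what produces exact zeros through the soft-thresholding step, while the L2 part only shrinks coefficients. This matches the role the abstract assigns to the elastic net prior, which is to remove topics unrelated to the time series from the regression.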
