
Time Series Impact Through Topic Modeling

IEEE ACCESS (2022)

Journal

IEEE ACCESS

Volume 10, Pages 97327-97347

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/ACCESS.2022.3202960

Keywords

Expectation-maximization algorithms; natural language processing; regression analysis; text mining; text analysis; time series analysis

Funding

Spanish Ministry of Economy and Business (MINECO) through Digital Enabling Technologies (THD) Program [TSI-100905-2019-9]
Spanish Ministry of Science and Innovation [PID2021-124361OB-C32]

Automated Summary

This paper introduces Time Series Impact Through Topic Modeling (TSITM), a method that models the impact on a time series of the underlying themes discussed in accompanying text data. The method couples Latent Dirichlet Allocation (LDA) with linear regression, using an elastic net prior to set the impact of uncorrelated topics to zero. Experimental results show that TSITM outperforms an intrinsic baseline and state-of-the-art methods in mean squared error (MSE), mean absolute error (MAE), and out-of-sample R^2.

Abstract

A time series of numerical data and a sequence of time-ordered documents are often correlated. This paper aims to model the impact that the underlying themes discussed in the text data have on the time series. To do so, we introduce an original topic model, Time Series Impact Through Topic Modeling (TSITM), that includes contextual data by coupling Latent Dirichlet Allocation (LDA) with linear regression, using an elastic net prior to set the impact of uncorrelated topics to zero. The resulting topics act as explanatory variables for the regression of the numerical time series, which allows us to understand the movements of the time series in terms of the events described in the text data. We tested our model on two datasets: first, we used political news to explain the US president's disapproval ratings; then, we used a corpus of economic news to explain the financial returns of four multinational corporations. Our experiments show that an appropriate selection of hyperparameters (via repeated random subsampling validation and Bayesian optimization) leads to significant correlations: according to our hypothesis tests, TSITM significantly outperformed both an intrinsic baseline and state-of-the-art methods in MSE, MAE, and out-of-sample R^2. We believe that this framework can be useful in the context of reputational risk management.
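The evaluation described above can be illustrated with a minimal sketch of the three reported metrics and of repeated random subsampling splits. It is not the authors' code. The function names are hypothetical, and the out-of-sample R^2 shown here assumes the common definition that benchmarks predictions against the training-set mean; the paper may define it differently.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def out_of_sample_r2(y_train, y_test, y_pred):
    """Out-of-sample R^2 on held-out data.

    Assumption: the benchmark forecast is the mean of the training
    targets, so a value above zero means the model beats that mean.
    """
    mean_train = sum(y_train) / len(y_train)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_test, y_pred))
    ss_bench = sum((t - mean_train) ** 2 for t in y_test)
    return 1.0 - ss_res / ss_bench


def random_subsample_splits(n, test_fraction, repeats, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated random subsampling.

    Each repeat draws a fresh random partition of range(n); unlike
    k-fold cross-validation, test sets from different repeats may overlap.
    """
    rng = random.Random(seed)
    n_test = max(1, int(round(n * test_fraction)))
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        yield sorted(idx[n_test:]), sorted(idx[:n_test])
```

In a hyperparameter search, each candidate setting would be fitted on every training split and scored on the matching test split, with the scores averaged over the repeats before candidates are compared.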
