4.6 Article

Time Series Impact Through Topic Modeling

Journal

IEEE ACCESS
Volume 10, Issue -, Pages 97327-97347

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2022.3202960

Keywords

Expectation-maximization algorithms; natural language processing; regression analysis; text mining; text analysis; time series analysis

Funding

  1. Spanish Ministry of Economy and Business (MINECO) through Digital Enabling Technologies (THD) Program [TSI-100905-2019-9]
  2. Spanish Ministry of Science and Innovation [PID2021-124361OB-C32]


This paper introduces Time Series Impact Through Topic Modeling (TSITM), a method that models the impact of the underlying themes discussed in text data on a time series. The method couples latent Dirichlet allocation (LDA) with linear regression, using an elastic net prior to set the impact of uncorrelated topics to zero. In the reported experiments, TSITM outperforms both a baseline and state-of-the-art methods in mean squared error (MSE), mean absolute error (MAE), and out-of-sample R².
A time series of numerical data and a sequence of time-ordered documents are often correlated. This paper aims to model the impact that the underlying themes discussed in the text data have on the time series. To do so, we introduce an original topic model, Time Series Impact Through Topic Modeling (TSITM), that includes contextual data by coupling Latent Dirichlet Allocation (LDA) with linear regression, using an elastic net prior to set to zero the impact of uncorrelated topics. The resulting topics act as explanatory variables for the regression of the numerical time series, which allows us to understand the time series movements based on the events described in the text data. We have tested our model on two datasets: first, we used political news to explain the US president's disapproval ratings; then, we considered a corpus of economic news to explain the financial returns of four different multinational corporations. Our experiments show that an appropriate selection of hyperparameters (via repeated random subsampling validation and Bayesian optimization) leads to significant correlations: both an intrinsic baseline and state-of-the-art methods were significantly outperformed by TSITM in MSE, MAE and out-of-sample R², according to our hypothesis tests. We believe that this framework can be useful in the context of reputational risk management.
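The sparsity mechanism mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' TSITM implementation: TSITM learns the topics and the regression jointly, whereas the sketch assumes per-period topic features are already available and only fits the penalized regression that corresponds to an elastic net prior. The function names, the toy binary "topic activity" indicators, the coefficients, and the hyperparameters `alpha` and `l1_ratio` are all assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.05, l1_ratio=0.5, sweeps=100):
    """Cyclic coordinate descent for the objective
    (1/2n)*||y - Xw||^2 + alpha*(l1_ratio*|w|_1 + 0.5*(1 - l1_ratio)*|w|_2^2).
    Columns of X and the target y are centered internally (no intercept returned)."""
    n, k = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(k)]
    Xc = [[row[j] - col_mean[j] for j in range(k)] for row in X]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual when all weights are zero
    w = [0.0] * k
    for _ in range(sweeps):
        for j in range(k):
            # Covariance of topic j with the residual, with topic j's own
            # current contribution added back in.
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_w - w[j]
            for i in range(n):
                resid[i] -= delta * Xc[i][j]
            w[j] = new_w
    return w


# Toy data (hypothetical): 16 periods, 4 topics, each topic either active (1)
# or inactive (0) in a period. Topic 0 pushes the series up, topic 1 pushes it
# down, topic 2 has only a very weak effect, and topic 3 has none.
X = [[(t >> k) & 1 for k in range(4)] for t in range(16)]
y = [2.0 * x[0] - 1.5 * x[1] + 0.05 * x[2] for x in X]

weights = elastic_net(X, y)
for topic, weight in enumerate(weights):
    print(f"topic {topic}: impact {weight:+.4f}")
```

In this toy setting the two influential topics keep nonzero (shrunken) weights, while the weak and the unrelated topic are set to exactly zero by the soft-thresholding step. That is the behavior the abstract attributes to the elastic net prior; how strongly weights are shrunk depends on the assumed `alpha` and `l1_ratio`.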
