Article

Parameter-efficient deep probabilistic forecasting

Journal

INTERNATIONAL JOURNAL OF FORECASTING
Volume 39, Issue 1, Pages 332-345

Publisher

ELSEVIER
DOI: 10.1016/j.ijforecast.2021.11.011

Keywords

Probabilistic forecasting; Temporal convolutional network; Efficiency in forecasting methods; Large-scale forecasting; Forecasting with neural networks


Probabilistic time series forecasting is crucial in various domains, and Transformer-based methods have achieved state-of-the-art performance. However, they require a large number of parameters and impose high memory requirements. To address this, we propose a novel bidirectional temporal convolutional network with significantly fewer parameters. Our method performs on par with state-of-the-art approaches while requiring less memory, reducing infrastructure cost.
Probabilistic time series forecasting is crucial in many application domains, such as retail, e-commerce, finance, and biology. With the increasing availability of large volumes of data, a number of neural architectures have been proposed for this problem. In particular, Transformer-based methods achieve state-of-the-art performance on real-world benchmarks. However, these methods require a large number of parameters to be learned, which imposes high memory requirements on the computational resources for training such models. To address this problem, we introduce a novel bidirectional temporal convolutional network that requires an order of magnitude fewer parameters than a common Transformer-based approach. Our model combines two temporal convolutional networks: the first network encodes future covariates of the time series, whereas the second network encodes past observations and covariates. We jointly estimate the parameters of an output distribution via these two networks. Experiments on four real-world datasets show that our method performs on par with four state-of-the-art probabilistic forecasting methods, including a Transformer-based approach and WaveNet, on two point metrics (sMAPE and NRMSE) as well as on a set of range metrics (quantile loss percentiles) in the majority of cases. We also demonstrate that our method requires significantly fewer parameters than Transformer-based methods, which means that the model can be trained faster with significantly lower memory requirements, which as a consequence reduces the infrastructure cost for deploying these models. © 2021 The Author(s). Published by Elsevier B.V. on behalf of International Institute of Forecasters. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
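The abstract describes two temporal convolutional networks, one encoding past observations and covariates and one encoding known future covariates, whose representations are combined to estimate the parameters of an output distribution. The following is a minimal NumPy sketch of that idea, not the authors' implementation: all layer sizes, weight shapes, and the Gaussian output head are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def causal_dilated_conv(x, w, dilation):
    """Causal dilated 1-D convolution, the basic TCN building block.
    x: (T, C_in) time-major input; w: (K, C_in, C_out) kernel.
    Left-padding ensures the output at step t depends only on x[: t + 1]."""
    K, C_in, C_out = w.shape
    pad = dilation * (K - 1)
    xp = np.concatenate([np.zeros((pad, C_in)), x], axis=0)
    T = x.shape[0]
    out = np.zeros((T, C_out))
    for t in range(T):
        for k in range(K):  # tap k looks back k * dilation steps
            out[t] += xp[pad + t - k * dilation] @ w[K - 1 - k]
    return out

def tcn_encode(x, weights):
    """Stack of causal conv layers with ReLU; dilation doubles per layer,
    so the receptive field grows exponentially with depth."""
    h = x
    for i, w in enumerate(weights):
        h = np.maximum(causal_dilated_conv(h, w, dilation=2 ** i), 0.0)
    return h[-1]  # final-step summary of the whole sequence

# Hypothetical sizes: 24 past steps, 8 future steps, 1 target, 3 covariates.
T_past, T_fut, C_obs, C_cov, H = 24, 8, 1, 3, 16
past = rng.normal(size=(T_past, C_obs + C_cov))  # past observations + covariates
future_cov = rng.normal(size=(T_fut, C_cov))     # covariates known in advance

# Two separate two-layer TCN encoders (random untrained weights for the sketch).
w_past = [rng.normal(scale=0.1, size=(3, C_obs + C_cov, H)),
          rng.normal(scale=0.1, size=(3, H, H))]
w_fut = [rng.normal(scale=0.1, size=(3, C_cov, H)),
         rng.normal(scale=0.1, size=(3, H, H))]

# Concatenate the two encodings and jointly project to distribution parameters.
z = np.concatenate([tcn_encode(past, w_past), tcn_encode(future_cov, w_fut)])
w_mu, w_sigma = rng.normal(scale=0.1, size=(2, 2 * H))
mu = z @ w_mu
sigma = np.log1p(np.exp(z @ w_sigma))  # softplus keeps the scale positive
print("Gaussian output parameters:", mu, sigma)
```

The parameter saving the paper claims comes from such convolutional stacks reusing a small kernel across all time steps, in contrast to the attention and feed-forward matrices of a Transformer layer.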


