Article

Incremental Factorization of Big Time Series Data with Blind Factor Approximation

Journal

IEEE Transactions on Knowledge and Data Engineering
Publisher

IEEE Computer Society
DOI: 10.1109/TKDE.2019.2931687

Keywords

Big time series data; tensor factorization; blind factor approximation; parallel factor analysis; variational Bayesian inference; EEG; massively parallel computing

Funding

  1. National Natural Science Foundation of China [61772380]
  2. Foundation for Innovative Research Groups of Hubei Province [2017CFA007]
  3. Major Project for Technological Innovation of Hubei Province [2019AAA044]


This study proposes an incrementally parallel factorization solution for extracting the latent factors of big time series data, revealing key insights into the mechanisms of the systems under observation. Through a three-phase algorithm running on a GPU cluster, the solution derives the multi-mode factors of growing big data without requiring prior knowledge.
Extracting the latent factors of big time series data is an important means to examine the dynamic complex systems under observation. These low-dimensional and compact representations reveal the key insights into the overall mechanisms, which can otherwise be obscured by the notoriously high dimensionality and scale of big data, as well as by the enormously complicated interdependencies amongst data elements. However, grand challenges still remain: (1) incrementally deriving the multi-mode factors of the augmenting big data, and (2) achieving this goal with insufficient a priori knowledge. This study develops an incrementally parallel factorization solution (namely I-PARAFAC) for huge augmenting tensors (multi-way arrays), consisting of three phases over a cutting-edge GPU cluster: in the giant-step phase, a variational Bayesian inference (VBI) model estimates the distribution of the close neighborhood of each factor with a high confidence level, without requiring a priori knowledge of the tensor or problem domain; in the baby-step phase, a massively parallel Fast-HALS algorithm (namely G-HALS) derives the accurate subfactors of each subtensor on the basis of the initial factors; in the final fusion phase, I-PARAFAC fuses the known factors of the original tensor with the accurate subfactors of the increment to obtain the final full factors. Experimental results indicate that: (1) the VBI model enables blind factor approximation, where the distribution of the close neighborhood of each final factor can be derived quickly (10 iterations for the test case); as a result, this low-time-complexity model significantly accelerates the derivation of the final accurate factors and lowers the risk of errors; (2) I-PARAFAC significantly outperforms even the latest high-performance counterpart when handling augmenting tensors, e.g., the added overhead is proportional only to the increment, whereas the counterpart has to repeatedly factorize the whole tensor, and the overhead of fusing subfactors is always minimal; (3) I-PARAFAC can factorize a huge tensor (up to 500 TB over 50 nodes) as a whole, with a capacity several orders of magnitude beyond that of conventional methods, and the runtime scales as $\frac{1}{n}$ with the number of compute nodes; (4) I-PARAFAC supports correct factorization-based analysis of a real 4th-order EEG dataset captured from a variety of epilepsy patients. Overall, it should also be noted that counterpart methods have to factorize the whole tensor from scratch if the tensor is augmented in any dimension; in contrast, the I-PARAFAC framework only needs to incrementally compute the full factors of the huge augmented tensor.
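To illustrate the kind of factorization the baby-step phase performs, the following is a minimal single-node NumPy sketch of nonnegative CP (PARAFAC) decomposition with HALS-style column updates for a 3-way tensor. It is not the paper's massively parallel G-HALS or its VBI initialization; all function names and parameters here are our own illustrative choices.

```python
import numpy as np

def unfold(T, mode):
    """Matricize tensor T along the given mode (rows indexed by that mode)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker product of two factor matrices of equal rank."""
    r = A.shape[1]
    return (A[:, None, :] * B[None, :, :]).reshape(-1, r)

def cp_hals(T, rank, n_iter=300, eps=1e-9):
    """Nonnegative CP (PARAFAC) factorization via HALS column updates."""
    rng = np.random.default_rng(0)
    factors = [rng.random((dim, rank)) for dim in T.shape]
    for _ in range(n_iter):
        for n in range(3):
            others = [factors[m] for m in range(3) if m != n]
            KR = khatri_rao(others[0], others[1])  # Khatri-Rao of the other modes
            M = unfold(T, n) @ KR                  # X_(n) * KR, shape (I_n, rank)
            G = KR.T @ KR                          # Gram matrix, shape (rank, rank)
            A = factors[n]
            for r in range(rank):                  # update one column at a time
                A[:, r] = np.maximum(eps, A[:, r] + (M[:, r] - A @ G[:, r]) / G[r, r])
    return factors

def reconstruct(factors):
    """Rebuild the 3-way tensor from its CP factor matrices."""
    A, B, C = factors
    return np.einsum('ir,jr,kr->ijk', A, B, C)
```

Because HALS updates a single factor column at a time using only matricized-tensor products and a small Gram matrix, the per-column updates are independent and data-parallel, which is the property that makes this family of algorithms amenable to the GPU-cluster parallelization described above.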

