Article

Learning a symbolic representation for multivariate time series classification

Journal

DATA MINING AND KNOWLEDGE DISCOVERY
Volume 29, Issue 2, Pages 400-422

Publisher

SPRINGER
DOI: 10.1007/s10618-014-0349-y

Keywords

Supervised learning; Codebook; Decision trees

Funding

  1. ONR [N00014-09-1-0656]


Multivariate time series (MTS) classification has gained importance as the number of temporal datasets grows across domains such as medicine, finance, and multimedia. Similarity-based approaches, such as nearest-neighbor classifiers, are often used for univariate time series, but MTS are characterized not only by their individual attributes but also by the relationships among them. Here we provide a classifier based on a new symbolic representation for MTS (denoted SMTS) with several important elements. SMTS considers all attributes of an MTS simultaneously, rather than separately, to extract the information contained in their relationships. Symbols are learned by a supervised algorithm that requires neither pre-defined intervals nor features. The elementary representation consists of the time index and the values (plus first differences for numerical attributes) of the individual time series as columns. That is, there is essentially no feature extraction (aside from first differences), and the local series values are tied to their time position through the time index. This initial representation of the raw data is simple both conceptually and operationally. Still, a tree-based ensemble can detect interactions in the space of the time index and the series values, and this is exploited to generate a high-dimensional codebook from the terminal nodes of the trees. Because the time index is included as an attribute, the trees learn to segment each MTS by time or by the value of one of its attributes. The codebook is then processed by a second ensemble, whose implicit feature selection handles the high-dimensional input. Together, these properties yield a distinctly different algorithm. Moreover, MTS with nominal and missing values are handled efficiently by tree learners. Experiments on a large collection of multivariate (and univariate) datasets demonstrate the effectiveness of the proposed approach in terms of accuracy and computation time.
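
To make the two data structures in the abstract concrete, here is a minimal sketch, not the authors' implementation. It shows (1) the elementary representation, one row per time step holding the time index, the attribute values, and their first differences, and (2) the bag-of-symbols vector built by counting which terminal nodes the rows of one series reach. The function names, the zero first difference at the first time step, and the `leaf_ids` input format are all assumptions made for illustration. The tree ensembles themselves are not shown: any learner that reports a terminal-node index per row and per tree could supply `leaf_ids`.

```python
from collections import Counter


def elementary_rows(series):
    """Turn one MTS into rows of [time index, values..., first differences...].

    `series` is a list of time steps, each a list of numeric attribute values.
    The first difference at t = 0 is set to zero here; this is an assumption,
    since the abstract does not say how the first time step is handled.
    """
    rows = []
    for t, values in enumerate(series):
        prev = series[t - 1] if t > 0 else values
        diffs = [v - p for v, p in zip(values, prev)]
        rows.append([t] + list(values) + diffs)
    return rows


def codebook_histogram(leaf_ids, n_leaves_per_tree):
    """Count how often each (tree, leaf) symbol occurs for one series.

    leaf_ids[r][k] is the terminal node reached by row r in tree k
    (hypothetical input format, produced by a trained tree ensemble).
    Returns one flat count vector: tree 0's leaves, then tree 1's, and so on.
    """
    counts = Counter()
    for row in leaf_ids:
        for tree, leaf in enumerate(row):
            counts[(tree, leaf)] += 1
    hist = []
    for tree, n_leaves in enumerate(n_leaves_per_tree):
        hist.extend(counts[(tree, leaf)] for leaf in range(n_leaves))
    return hist
```

In a full pipeline under these assumptions, the rows from `elementary_rows` for all training series, each labeled with its series' class, would train the first ensemble, and the vectors from `codebook_histogram` (one per series) would be the input to the second ensemble.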
