Article

catch22: CAnonical Time-series CHaracteristics Selected through highly comparative time-series analysis

DATA MINING AND KNOWLEDGE DISCOVERY (2019)

Journal

DATA MINING AND KNOWLEDGE DISCOVERY

Volume 33, Issue 6, Pages 1821-1852

Publisher

SPRINGER

DOI: 10.1007/s10618-019-00647-x

Keywords

Time series; Classification; Clustering; Time-series features

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems

Funding

Engineering and Physical Sciences Research Council (EPSRC) [EP/L016737/1]
Natural Environment Research Council through the Science and Solutions for a Changing Planet DTP
National Health and Medical Research Council (NHMRC) Grant [1089718]
EPSRC [EP/N014529/1, EP/K503733/1]
Natural Environment Research Council
EPSRC [EP/J021199/1, EP/N014529/1] Funding Source: UKRI
NERC [NE/L012456/1] Funding Source: UKRI


Abstract

Capturing the dynamical properties of time series concisely as interpretable feature vectors can enable efficient clustering and classification for time-series applications across science and industry. Selecting an appropriate feature-based representation of time series for a given application can be achieved through systematic comparison across a comprehensive time-series feature library, such as those in the hctsa toolbox. However, this approach is computationally expensive and involves evaluating many similar features, limiting the widespread adoption of feature-based representations of time series for real-world applications. In this work, we introduce a method to infer small sets of time-series features that (i) exhibit strong classification performance across a given collection of time-series problems, and (ii) are minimally redundant. Applying our method to a set of 93 time-series classification datasets (containing over 147,000 time series) and using a filtered version of the hctsa feature library (4791 features), we introduce a set of 22 CAnonical Time-series CHaracteristics, catch22, tailored to the dynamics typically encountered in time-series data-mining tasks. This dimensionality reduction, from 4791 to 22, is associated with an approximately 1000-fold reduction in computation time and near linear scaling with time-series length, despite an average reduction in classification accuracy of just 7%. catch22 captures a diverse and interpretable signature of time series in terms of their properties, including linear and non-linear autocorrelation, successive differences, value distributions and outliers, and fluctuation scaling properties. We provide an efficient implementation of catch22, accessible from many programming environments, that facilitates feature-based time-series analysis for scientific, industrial, financial and medical applications using a common language of interpretable time-series properties.
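The two selection criteria in the abstract, strong performance across tasks and minimal redundancy, can be illustrated with a small sketch. The abstract does not spell out the selection procedure, so the greedy correlation filter below is a stand-in chosen for brevity, not the authors' method. The function names, thresholds, feature names, and accuracy values are all hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def select_features(accuracy, min_mean_accuracy=0.6, max_abs_correlation=0.8):
    """Greedy stand-in for 'high-performing, minimally redundant' selection.

    `accuracy` maps a feature name to its accuracy on each task (hypothetical data).
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    # Criterion (i): keep features whose mean accuracy across tasks clears the bar.
    candidates = {
        name: scores
        for name, scores in accuracy.items()
        if mean(scores) >= min_mean_accuracy
    }
    # Criterion (ii): visit the best performers first and keep a feature only if
    # its per-task accuracy profile is not strongly correlated with a kept one.
    ranked = sorted(candidates, key=lambda n: mean(candidates[n]), reverse=True)
    selected = []
    for name in ranked:
        if all(
            abs(pearson(candidates[name], candidates[kept])) <= max_abs_correlation
            for kept in selected
        ):
            selected.append(name)
    return selected


# Toy per-task accuracies for four made-up features on four made-up tasks.
toy_accuracy = {
    "autocorr_lag1": [0.90, 0.60, 0.80, 0.70],
    "autocorr_lag2": [0.88, 0.58, 0.79, 0.69],  # near-duplicate of autocorr_lag1
    "outlier_timing": [0.75, 0.75, 0.55, 0.75],
    "noise_feature": [0.50, 0.45, 0.55, 0.50],  # weak performer
}
chosen = select_features(toy_accuracy)
print(chosen)
```

On this toy input the weak performer is dropped by the accuracy threshold, and the near-duplicate is dropped because its accuracy profile closely tracks a feature that was already kept.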
