Article

Decision Tree for Sequences

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2021.3075023

Keywords

Sequence classification; decision tree; discriminative pattern mining; sequential pattern mining

Decision trees such as C4.5 and CART are simple, accurate, and easy to interpret, which has made them widely used in various fields. However, they are limited in handling complex data such as sequences. A two-step procedure is therefore commonly adopted: convert the sequential data into vector data, then apply an existing tree-based classifier. This approach relies heavily on feature generation and may miss features that are crucial for tree construction. To overcome these challenges, we propose a new tree-based sequence classification method that constructs a concise decision tree from the feature space composed of all subsequences in the training data. Experimental results on 14 real datasets demonstrate superior performance compared to state-of-the-art sequence classification algorithms. The source code for our method is available at: https://github.com/ZiyaoWu/SeqDT.
Decision trees such as C4.5 and CART are widely used in different fields because of their simplicity, accuracy, and intuitive interpretation. Like other popular classifiers, these tree-based classification algorithms were developed for fixed-length vector data and have intrinsic limitations in handling complex data such as sequences. For the discrete sequence classification task, the dominant strategy is a two-step procedure: first transform the sequential dataset into a vector dataset, then apply an existing tree-based classifier to the new vector data. However, such methods depend heavily on the feature generation procedure, and some features that are critical to tree construction may be missed. To alleviate these issues, we present a new tree-based sequence classification method that constructs a concise decision tree from the feature space composed of all subsequences present in the training sequences. Experimental results on fourteen real datasets show that our method achieves better performance than state-of-the-art sequence classification algorithms. The source code of our method is available at: https://github.com/ZiyaoWu/SeqDT.
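To make the idea of splitting on subsequences concrete, the following is a minimal sketch and not the authors' SeqDT implementation (see the repository above for that). It assumes a Gini impurity criterion, treats a subsequence as an ordered pattern with gaps allowed, and caps pattern length with a hypothetical `max_len` parameter so that brute-force enumeration stays small. The function names and the toy data are illustrative only.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def candidate_patterns(sequences, max_len=2):
    """Enumerate every subsequence of length <= max_len in the training data.

    max_len is an assumption made for brevity; unbounded enumeration
    grows exponentially with sequence length.
    """
    patterns = set()
    for seq in sequences:
        for k in range(1, max_len + 1):
            patterns.update(combinations(seq, k))
    return patterns


def best_split(sequences, labels, max_len=2):
    """Pick the pattern whose presence/absence split has the lowest weighted Gini."""
    best = None
    n = len(sequences)
    for pattern in sorted(candidate_patterns(sequences, max_len)):
        has = [y for s, y in zip(sequences, labels) if is_subsequence(pattern, s)]
        lacks = [y for s, y in zip(sequences, labels) if not is_subsequence(pattern, s)]
        score = (len(has) * gini(has) + len(lacks) * gini(lacks)) / n
        if best is None or score < best[0]:
            best = (score, pattern)
    return best


# Toy data (made up): "a" precedes "b" only in the positive class.
train = ["abc", "acb", "bca", "cba"]
labels = ["pos", "pos", "neg", "neg"]
score, pattern = best_split(train, labels)
print(pattern, score)
```

Applying `best_split` recursively to the two resulting partitions would grow a full tree. The sketch searches the pattern space directly at each node instead of fixing a feature vector in advance, which is the contrast with the two-step procedure that the abstract draws.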
