Article

3-D Deconvolutional Networks for the Unsupervised Representation Learning of Human Motions

Journal

IEEE TRANSACTIONS ON CYBERNETICS
Volume 52, Issue 1, Pages 398-410

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCYB.2020.2973300

Keywords

Machine learning; Task analysis; Correlation; Optimization; Feature extraction; Convolution; Data models; 3-D deconvolutional networks (3DDNs); data representation; human motion analysis; unsupervised learning; video representation learning

Funding

  1. National Natural Science Foundation of China [61603096, 61751202, 61751205, 61572540, U1813203, U1801262]
  2. Natural Science Foundation of Fujian Province [2017J01750]

Abstract

Data representation learning is one of the most important problems in machine learning. Unsupervised representation learning is particularly attractive because it requires no label information for the observed data. Because training deep-learning models is highly time-consuming, many machine-learning systems directly reuse well-trained deep models, obtained in a supervised, end-to-end manner, as feature extractors for different problems. However, different machine-learning tasks require different representations of the original input data. Taking human action recognition as an example, human actions in a video sequence are 3-D signals containing both the visual appearance and the motion dynamics of humans and objects, so representation approaches that capture both spatial and temporal correlations in videos are needed. Most existing human motion recognition models build classifiers on deep-learning structures such as deep convolutional networks. These models require a large quantity of annotated training videos, and such supervised models cannot recognize samples from a different dataset without retraining. In this article, we propose a new 3-D deconvolutional network (3DDN) for representation learning of high-dimensional video data, in which the high-level features are obtained through an optimization approach. The proposed 3DDN decomposes video frames into spatiotemporal features under a sparse constraint in an unsupervised way. It can also serve as a building block for developing deep architectures by stacking. Because the resulting high-level representation of sequential input data can be used in multiple downstream machine-learning tasks, we evaluate the proposed 3DDN and its deep models on human action recognition. Experimental results on three datasets: 1) KTH; 2) HMDB-51; and 3) UCF-101, demonstrate that the proposed 3DDN is an alternative to feedforward convolutional neural networks (CNNs) that attains comparable results.
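The abstract describes the 3DDN as inferring sparse spatiotemporal feature maps by optimization, so that a 3-D deconvolution reconstructs the input frames. The following minimal PyTorch sketch illustrates that general idea only; the clip size, number of latent channels, kernel shape, sparsity weight, optimizer, and number of steps are all assumptions made for illustration and do not reproduce the authors' actual architecture or training procedure.

```python
# Illustrative sketch only (not the authors' implementation): infer a sparse
# 3-D feature map z by optimization so that a ConvTranspose3d decoder
# reconstructs the video clip from it.
import torch
import torch.nn.functional as F

# Hypothetical clip: batch of 1, 3 channels, 16 frames, 64x64 pixels (placeholder data).
x = torch.randn(1, 3, 16, 64, 64)

# Spatiotemporal deconvolution with 8 latent feature channels (assumed sizes);
# padding keeps the output the same size as the input clip.
decoder = torch.nn.ConvTranspose3d(
    in_channels=8, out_channels=3,
    kernel_size=(3, 5, 5), padding=(1, 2, 2),
)

# Latent feature map, inferred per clip by optimization rather than predicted feedforward.
z = torch.zeros(1, 8, 16, 64, 64, requires_grad=True)
opt = torch.optim.Adam([z] + list(decoder.parameters()), lr=1e-2)

sparsity_weight = 1e-3  # strength of the sparse constraint (assumed)
for step in range(200):
    opt.zero_grad()
    recon = decoder(z)  # reconstruct frames from the spatiotemporal features
    loss = F.mse_loss(recon, x) + sparsity_weight * z.abs().mean()
    loss.backward()
    opt.step()
```

In this sketch the latent map and the decoder filters are updated jointly; a sparse-coding-style formulation would typically alternate between inferring z for each clip and updating the shared filters, and stacking such blocks would yield the deeper architectures the abstract mentions.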
