Article

Deepfake Video Detection via Predictive Representation Learning

Publisher

Association for Computing Machinery (ACM)
DOI: 10.1145/3536426

Keywords

Deepfake video detection; representation learning; deep learning; video understanding

Funding

  1. National Key Research and Development Plan [2020AAA0140001]
  2. Beijing Natural Science Foundation [19L2040]
  3. National Natural Science Foundation of China [61772513]
  4. Youth Innovation Promotion Association, Chinese Academy of Sciences

Abstract

This paper proposes a predictive representation learning approach called Latent Pattern Sensing for deepfake video detection. The approach captures semantic change characteristics in deepfake videos by analyzing temporal inconsistencies and uses a unified framework to describe latent patterns across frames. Experimental results show the effectiveness of the approach, achieving high AUC scores on benchmark datasets.
Increasingly advanced deepfake approaches have made the detection of deepfake videos very challenging. We observe that deepfake videos generally exhibit appearance-level temporal inconsistencies in some facial components between frames, resulting in discriminative spatiotemporal latent patterns among semantic-level feature maps. Inspired by this finding, we propose a predictive representation learning approach termed Latent Pattern Sensing to capture these semantic change characteristics for deepfake video detection. The approach cascades a Convolutional Neural Network (CNN)-based encoder, a ConvGRU-based aggregator, and a single-layer binary classifier. The encoder and aggregator are pretrained in a self-supervised manner to form representative spatiotemporal context features. Then, the classifier is trained to classify the context features, distinguishing fake videos from real ones. Finally, we propose a selective self-distillation fine-tuning method to further improve the robustness and performance of the detector. In this manner, the extracted features can simultaneously describe the latent patterns of videos across frames spatially and temporally in a unified way, leading to an effective and robust deepfake video detector. Extensive experiments and comprehensive analysis prove the effectiveness of our approach, e.g., achieving a very high Area Under Curve (AUC) score of 99.94% on the FaceForensics++ benchmark and surpassing 12 state-of-the-art methods by at least 7.90%@AUC and 8.69%@AUC on the challenging DFDC and Celeb-DF(v2) benchmarks, respectively.
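To make the cascaded design in the abstract concrete, the sketch below shows one plausible arrangement of the three stages it names: a CNN encoder applied per frame, a ConvGRU aggregator that accumulates spatiotemporal context across frames, and a single-layer binary classifier on the aggregated feature. This is a minimal illustration, not the authors' implementation; all module names, layer sizes, and the pooling step are assumptions, and the self-supervised pretraining and selective self-distillation fine-tuning described in the paper are omitted.

# Minimal sketch (assumed structure, not the authors' code) of the cascade
# described in the abstract: per-frame CNN encoder -> ConvGRU aggregator
# -> single-layer binary classifier.
import torch
import torch.nn as nn


class ConvGRUCell(nn.Module):
    """Minimal ConvGRU cell: update/reset gates computed with 2D convolutions."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        self.gates = nn.Conv2d(2 * channels, 2 * channels, kernel_size, padding=padding)
        self.cand = nn.Conv2d(2 * channels, channels, kernel_size, padding=padding)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde


class LatentPatternDetector(nn.Module):
    """Hypothetical cascade: CNN encoder -> ConvGRU aggregator -> linear classifier."""
    def __init__(self, feat_channels=64):
        super().__init__()
        # CNN encoder: maps each RGB frame to a semantic-level feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_channels, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(feat_channels, feat_channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        # ConvGRU aggregator: accumulates temporal context across frame features.
        self.aggregator = ConvGRUCell(feat_channels)
        # Single-layer binary classifier on the pooled context feature.
        self.classifier = nn.Linear(feat_channels, 2)

    def forward(self, clip):                        # clip: (B, T, 3, H, W)
        b, t, c, h, w = clip.shape
        feats = self.encoder(clip.reshape(b * t, c, h, w))
        feats = feats.reshape(b, t, *feats.shape[1:])
        context = torch.zeros_like(feats[:, 0])
        for i in range(t):                          # aggregate frame by frame
            context = self.aggregator(feats[:, i], context)
        pooled = context.mean(dim=(2, 3))           # global average pooling
        return self.classifier(pooled)              # real-vs-fake logits


if __name__ == "__main__":
    logits = LatentPatternDetector()(torch.randn(2, 8, 3, 112, 112))
    print(logits.shape)  # torch.Size([2, 2])

In the paper's setting, the encoder and aggregator would first be pretrained with a self-supervised predictive objective and only the final classifier (and later the fine-tuned detector) trained on real-vs-fake labels; the sketch above covers only the forward pass of the cascade.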
