Article

Audio-Visual Autoencoding for Privacy-Preserving Video Streaming

Journal

IEEE INTERNET OF THINGS JOURNAL
Volume 9, Issue 3, Pages 1749-1761

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JIOT.2021.3089080

Keywords

Streaming media; Privacy; Visualization; Internet of Things; Cryptography; Predictive models; Generative adversarial networks; Audio-visual; privacy; video streaming; vector quantized variational autoencoder (VQ-VAE)

Funding

  1. U.S. National Science Foundation [1741277, 1829674, 1704287, 1912753, 2011845]
  2. Microsoft Investigator Fellowship
  3. Directorate for Education and Human Resources, Division of Graduate Education [1912753] - National Science Foundation
  4. Directorate for Engineering, Division of Electrical, Communications & Cyber Systems [2011845] - National Science Foundation

Abstract

The demand for sharing video streaming has increased dramatically due to the proliferation of Internet of Things (IoT) devices in recent years, and the explosive development of artificial intelligence (AI) detection techniques has made visual privacy protection more urgent and difficult than ever before. Although a number of approaches have been proposed, their essential drawbacks limit the effectiveness of visual privacy protection in real applications. In this article, we propose a cycle vector-quantized variational autoencoder (cycle-VQ-VAE) framework to encode and decode a video together with its extracted audio, taking advantage of the multiple heterogeneous data sources within the video itself to protect individuals' privacy. In our cycle-VQ-VAE framework, a fusion mechanism is designed to integrate the video and its extracted audio. In particular, the extracted audio acts as random noise with a nonpatterned distribution, which outperforms noise drawn from a patterned distribution for hiding visual information in the video. Under this framework, we design two models, the frame-to-frame (F2F) model and the video-to-video (V2V) model, to obtain privacy-preserving video streaming. In F2F, the video is processed as a sequence of independent frames, whereas in V2V, the relations between frames are exploited, greatly improving privacy protection, video compression, and video reconstruction. Moreover, the video stream is compressed in our encoding process, which can resist side-channel inference attacks during transmission and reduce transmission time. Through experiments on real data, we validate the superiority of our models (F2F and V2V) over existing methods in visual privacy protection, visual quality preservation, and video transmission efficiency. The code of our model implementation and additional experimental results are available at https://github.com/ahahnut/cycle-VQ-VAE.
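For context, the vector-quantization step at the core of any VQ-VAE maps each encoder output to its nearest codebook entry. The sketch below is illustrative only: it shows the generic VQ lookup, not the paper's cycle-VQ-VAE (which additionally fuses extracted audio and trains with a cycle objective); all names and shapes here are assumptions, not the authors' implementation.

```python
import numpy as np

def vector_quantize(z_e, codebook):
    """Generic VQ-VAE quantization step (illustrative sketch).

    z_e:      (N, D) array of encoder output latents.
    codebook: (K, D) array of learned embedding vectors.
    Returns the quantized latents (N, D) and nearest-code indices (N,).
    """
    # Squared Euclidean distance between every latent and every code,
    # computed via broadcasting: (N, 1, D) - (1, K, D) -> (N, K, D).
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)   # index of the nearest code per latent
    z_q = codebook[indices]          # replace each latent with its code
    return z_q, indices

# Tiny usage example with a 2-entry codebook in 2-D latent space.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
z_e = np.array([[0.1, -0.1], [0.9, 1.2]])
z_q, idx = vector_quantize(z_e, codebook)
```

In the example, each latent snaps to its nearest code ([0.1, -0.1] to the first entry, [0.9, 1.2] to the second); the decoder then reconstructs the video only from these discrete indices, which is what makes the encoding compact enough to reduce transmission time.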
