Proceedings Paper

Multimodal Federated Learning on IoT Data

Publisher

IEEE Computer Society
DOI: 10.1109/IoTDI54339.2022.00011

Keywords

collaborative work; semisupervised learning; edge computing; multimodal sensors

Funding

  1. UK Dementia Research Institute

Abstract

This paper proposes a multimodal, semi-supervised federated learning framework in which clients train local autoencoders to extract shared or correlated representations from their different data modalities, and the server aggregates these autoencoders with a multimodal aggregation algorithm. The experimental results show that introducing data from multiple modalities improves the classification performance of federated learning, and that labelled data from a single modality can be used for supervised learning on the server and then applied to test data from other modalities.
Federated learning is proposed as an alternative to centralized machine learning since its client-server structure provides better privacy protection and scalability in real-world applications. In many applications, such as smart homes with Internet-of-Things (IoT) devices, local data on clients are generated from different modalities such as sensory, visual, and audio data. Existing federated learning systems only work on local data from a single modality, which limits the scalability of the systems. In this paper, we propose a multimodal and semi-supervised federated learning framework that trains autoencoders to extract shared or correlated representations from different local data modalities on clients. In addition, we propose a multimodal FedAvg algorithm to aggregate local autoencoders trained on different data modalities. We use the learned global autoencoder for a downstream classification task with the help of auxiliary labelled data on the server. We empirically evaluate our framework on different modalities including sensory data, depth camera videos, and RGB camera videos. Our experimental results demonstrate that introducing data from multiple modalities into federated learning can improve its classification performance. In addition, we can use labelled data from only one modality for supervised learning on the server and apply the learned model to testing data from other modalities to achieve decent F-1 scores (e.g., with the best performance being higher than 60%), especially when combining contributions from both unimodal clients and multimodal clients.
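The abstract describes the aggregation step only at a high level. Below is a minimal sketch of how a multimodal, FedAvg-style aggregation could be organised: shared-representation layers are averaged across all clients, while modality-specific layers are averaged only within clients of the same modality. The split into "shared" and "private" parameter groups, along with all names and data structures, are assumptions made for illustration and do not reproduce the authors' implementation.

```python
# Minimal sketch (not the authors' code) of a multimodal, FedAvg-style
# aggregation of client autoencoder weights, weighted by sample counts.
import numpy as np

def fed_avg(updates):
    """Weighted average of a list of (param_dict, num_samples) pairs."""
    total = sum(n for _, n in updates)
    keys = updates[0][0].keys()
    return {k: sum(w[k] * (n / total) for w, n in updates) for k in keys}

def multimodal_fed_avg(client_updates):
    """
    client_updates: list of dicts with keys
        "modality"     e.g. "sensor", "depth", "rgb"
        "shared"       {param_name: np.ndarray}  shared-representation layers
        "private"      {param_name: np.ndarray}  modality-specific layers
        "num_samples"  int
    Shared layers are averaged over all clients; modality-specific layers
    are averaged only among clients holding the same modality (assumption).
    """
    # 1. Aggregate the shared representation layers over every client.
    global_shared = fed_avg(
        [(u["shared"], u["num_samples"]) for u in client_updates]
    )

    # 2. Aggregate the modality-specific layers per modality.
    global_private = {}
    for modality in {u["modality"] for u in client_updates}:
        same = [(u["private"], u["num_samples"])
                for u in client_updates if u["modality"] == modality]
        global_private[modality] = fed_avg(same)

    return global_shared, global_private

if __name__ == "__main__":
    # Toy usage with two clients of different modalities.
    rng = np.random.default_rng(0)
    clients = [
        {"modality": "sensor", "num_samples": 120,
         "shared": {"enc.w": rng.normal(size=(8, 4))},
         "private": {"head.w": rng.normal(size=(4, 2))}},
        {"modality": "rgb", "num_samples": 80,
         "shared": {"enc.w": rng.normal(size=(8, 4))},
         "private": {"head.w": rng.normal(size=(4, 2))}},
    ]
    shared, private = multimodal_fed_avg(clients)
    print(shared["enc.w"].shape, sorted(private.keys()))
```

In this sketch the shared layers benefit from every client regardless of modality, which mirrors the paper's claim that contributions from both unimodal and multimodal clients can be combined; the per-modality averaging of private layers is one plausible way to handle heterogeneous encoders, not a detail confirmed by the abstract.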

