Article

Federated data processing and learning for collaboration in the physical sciences

MACHINE LEARNING-SCIENCE AND TECHNOLOGY (2022)

Journal

MACHINE LEARNING-SCIENCE AND TECHNOLOGY

Volume 3, Issue 4

Publisher

IOP Publishing Ltd

DOI: 10.1088/2632-2153/aca87c

Keywords

machine learning; federated learning; physical science; nanoparticles

Categories

Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications; Multidisciplinary Sciences

Funding

National Computing Infrastructure (NCI)

Smart summary

This paper discusses the challenges of property analysis and prediction in fields such as chemistry, nanotechnology and materials science, which are often hampered by a lack of data. It introduces federated learning (FL) as a machine learning framework that encourages privacy-preserving collaborations between data owners, removing the need to pool data that may contain proprietary information. The paper proposes horizontal FL to mitigate data limitations, together with FedRed, a new dimensionality reduction method for FL that speeds up convergence. Experiments on eight metallic nanoparticle data sets show that FL substantially reduces the negative impact of insufficient data.

Abstract

Property analysis and prediction is a challenging topic in fields such as chemistry, nanotechnology and materials science, and often suffers from a lack of data. Federated learning (FL) is a machine learning (ML) framework that encourages privacy-preserving collaborations between data owners, and potentially overcomes the need to combine data that may contain proprietary information. Combining information from different data sets within the same domain can also produce ML models with more general insight and reduce the impact of the selection bias inherent in small, individual studies. In this paper we propose using horizontal FL to mitigate these data limitation issues and explore the opportunity for data-driven collaboration under these constraints. We also propose FedRed, a new dimensionality reduction method for FL that allows faster convergence and accounts for differences between individual data sets. The FL pipeline has been tested on a collection of eight different data sets of metallic nanoparticles. Although there are expected losses relative to training on a combined data set that does not preserve the privacy of the collaborators, we obtained extremely good results compared with local training on individual data sets. We conclude that FL is an effective and efficient method for the physical science domain that could greatly reduce the negative effect of insufficient data.
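In horizontal FL, each collaborator holds different samples described by the same feature columns. This idea can be illustrated with a minimal federated averaging (FedAvg) sketch. FedAvg is a generic technique substituted here for illustration. It is not the authors' pipeline, and it does not implement FedRed. The linear model, the synthetic data, the eight-client split and the function names (`local_update`, `fedavg`) are all hypothetical assumptions. Only model weights leave each client; raw rows never do.

```python
import numpy as np

def local_update(w, X, y, lr=0.05, epochs=20):
    """Hypothetical client step: full-batch gradient descent on mean squared error."""
    w = w.copy()
    n = len(y)
    for _ in range(epochs):
        grad = 2.0 / n * X.T @ (X @ w - y)  # gradient of mean((Xw - y)^2)
        w -= lr * grad
    return w

def fedavg(clients, dim, rounds=50, lr=0.05, epochs=20):
    """Hypothetical server loop; all clients share the same feature columns (horizontal FL)."""
    w_global = np.zeros(dim)
    total = sum(len(y) for _, y in clients)
    for _ in range(rounds):
        # Each client starts from the current global weights and trains only on its own rows.
        local_ws = [local_update(w_global, X, y, lr, epochs) for X, y in clients]
        # Server step: average client weights, weighted by each client's sample count.
        w_global = sum(len(y) / total * w for w, (_, y) in zip(local_ws, clients))
    return w_global

# Synthetic example (assumption): eight clients of different sizes whose
# feature distributions are slightly shifted from one another.
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])
clients = []
for k, n in enumerate([20, 30, 40, 50, 60, 70, 80, 90]):
    X = rng.normal(loc=0.1 * k, size=(n, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    clients.append((X, y))

w = fedavg(clients, dim=3)
print(np.round(w, 2))
```

Weighting each client by its sample count is the usual FedAvg choice, so larger data sets pull the global model harder. Other aggregation rules are possible, and this sketch does not model the privacy-versus-accuracy losses discussed in the abstract.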