Article

A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2023)

Journal

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

Volume 34, Issue 6, Pages 1699-1715

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TPDS.2023.3244135

Keywords

Random variables; Scalability; Probability distribution; Markov processes; Machine learning algorithms; Bayes methods; Software algorithms; Bayesian networks; constraint-based learning; parallel machine learning; gene networks; reproducibility

Categories

Computer Science, Theory & Methods; Engineering, Electrical & Electronic

Summary

This article presents a parallel framework for scaling Bayesian network structure learning algorithms to tens of thousands of variables. The framework parallelizes three different algorithms and is able to construct large-scale networks from real data sets in less than a minute on 1024 cores, achieving significant speedup and efficiency. The scalability of the framework is also demonstrated using simulated data sets.

Abstract

Bayesian networks (BNs) are widely used graphical models in machine learning. As learning the structure of BNs is NP-hard, high-performance computing methods are necessary for constructing large-scale networks. In this article, we present a parallel framework to scale BN structure learning algorithms to tens of thousands of variables. Our framework is applicable to learning algorithms that rely on the discovery of Markov blankets (MBs) as an intermediate step. We demonstrate the applicability of our framework by parallelizing three different algorithms: Grow-Shrink (GS), Incremental Association MB (IAMB), and Interleaved IAMB (Inter-IAMB). Our implementations are available as part of an open-source software package called ramBLe, and are able to construct BNs from real data sets with tens of thousands of variables and thousands of observations in less than a minute on 1024 cores, with a speedup of up to 845X and 82.5% efficiency. Furthermore, we demonstrate using simulated data sets that our proposed parallel framework can scale to BNs of even higher dimensionality. Our implementations were selected for the reproducibility challenge component of the 2021 Student Cluster Competition (SCC'21), which tasked undergraduate teams from around the world with reproducing the results that we obtained using the implementations. We discuss details of the challenge and the results of the experiments conducted by the top teams in the competition. The results of these experiments indicate that our key results are reproducible, despite the use of completely different data sets and experiment infrastructure, and validate the scalability of our implementations.
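For readers unfamiliar with Markov blanket discovery, the following is a minimal sequential sketch of the textbook Grow-Shrink procedure named in the abstract. It is not the ramBLe implementation and says nothing about how the framework parallelizes the work. The function names (`is_independent`, `grow_shrink_mb`), the Fisher z-test on partial correlations used as the conditional independence test, the significance level, and the toy chain data are all assumptions chosen for illustration.

```python
import math
import numpy as np


def is_independent(data, x, y, cond, alpha=1e-4):
    """Assumed CI test: Fisher z-test on the partial correlation of
    columns x and y given the columns in cond (Gaussian data)."""
    idx = [x, y] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)  # precision matrix of the selected columns
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = max(min(r, 0.999999), -0.999999)  # keep the log below finite
    n = data.shape[0]
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return p_value > alpha


def grow_shrink_mb(data, target, alpha=1e-4):
    """Return a candidate Markov blanket of column `target` as a sorted list."""
    n_vars = data.shape[1]
    mb = []
    # Grow phase: add any variable that is dependent on the target
    # given the current candidate blanket, until nothing changes.
    changed = True
    while changed:
        changed = False
        for v in range(n_vars):
            if v == target or v in mb:
                continue
            if not is_independent(data, target, v, mb, alpha):
                mb.append(v)
                changed = True
    # Shrink phase: drop any variable that is independent of the target
    # given the rest of the candidate blanket.
    for v in list(mb):
        rest = [u for u in mb if u != v]
        if is_independent(data, target, v, rest, alpha):
            mb.remove(v)
    return sorted(mb)


# Toy data (hypothetical): a linear Gaussian chain X0 -> X1 -> X2 -> X3,
# plus an unrelated variable X4.
rng = np.random.default_rng(0)
n_samples = 5000
x0 = rng.normal(size=n_samples)
x1 = 0.8 * x0 + rng.normal(size=n_samples)
x2 = 0.8 * x1 + rng.normal(size=n_samples)
x3 = 0.8 * x2 + rng.normal(size=n_samples)
x4 = rng.normal(size=n_samples)
data = np.column_stack([x0, x1, x2, x3, x4])

print(grow_shrink_mb(data, target=1))
```

In the toy chain, the true Markov blanket of X1 consists of its parent X0 and its child X2. Each call to `grow_shrink_mb` touches only one target variable, which hints at why blanket discovery is a natural unit of work to distribute, although the scheduling actually used by the framework is not described on this page.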
