Article

A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2023.3244135

Keywords

Random variables; Scalability; Probability distribution; Markov processes; Machine learning algorithms; Bayes methods; Software algorithms; Bayesian networks; constraint-based learning; parallel machine learning; gene networks; reproducibility


This article presents a parallel framework that scales Bayesian network structure learning to tens of thousands of variables. The framework parallelizes three algorithms (Grow-Shrink, IAMB, and Inter-IAMB) and constructs large-scale networks from real data sets in less than a minute on 1024 cores, with a speedup of up to 845X and 82.5% efficiency. Experiments on simulated data sets show that the framework scales to networks of even higher dimensionality.
Bayesian networks (BNs) are a widely used class of graphical models in machine learning. Because learning the structure of BNs is NP-hard, high-performance computing methods are necessary for constructing large-scale networks. In this article, we present a parallel framework to scale BN structure learning algorithms to tens of thousands of variables. Our framework is applicable to learning algorithms that rely on the discovery of Markov blankets (MBs) as an intermediate step. We demonstrate the applicability of our framework by parallelizing three different algorithms: Grow-Shrink (GS), Incremental Association MB (IAMB), and Interleaved IAMB (Inter-IAMB). Our implementations are available as part of the open-source software ramBLe, and are able to construct BNs from real data sets with tens of thousands of variables and thousands of observations in less than a minute on 1024 cores, with a speedup of up to 845X and 82.5% efficiency. Furthermore, we demonstrate using simulated data sets that our proposed parallel framework can scale to BNs of even higher dimensionality. Our implementations were selected for the reproducibility challenge component of the 2021 Student Cluster Competition (SCC'21), which tasked undergraduate teams from around the world with reproducing our results using these implementations. We discuss details of the challenge and the results of the experiments conducted by the top teams in the competition. These results indicate that our key results are reproducible, despite the use of completely different data sets and experiment infrastructure, and validate the scalability of our implementations.
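
The speedup and efficiency figures quoted above are related by simple arithmetic. The following is a minimal sketch, assuming the usual definition of parallel efficiency (speedup divided by the number of cores) and assuming that the 845X speedup and the 82.5% efficiency refer to the same 1024-core run. The function name `parallel_efficiency` is hypothetical and is not part of ramBLe.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return parallel efficiency, assumed here to be speedup / cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Figures quoted in the abstract: up to 845X speedup on 1024 cores.
efficiency = parallel_efficiency(845, 1024)
print(f"{efficiency:.1%}")  # prints 82.5%
```

Under these assumptions the two reported numbers are consistent with each other: 845 / 1024 is approximately 0.825.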
