Article

Distributed Sparse Class-Imbalance Learning and Its Applications

IEEE TRANSACTIONS ON BIG DATA (2021)

Journal

IEEE TRANSACTIONS ON BIG DATA

Volume 7, Issue 5, Pages 832-844

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TBDATA.2017.2688372

Keywords

Class-imbalance learning; distributed algorithm; anomaly detection

Categories

Computer Science, Information Systems; Computer Science, Theory & Methods

Funding

Prime Minister's Fellowship for Doctoral Research, joint initiative by Confederation of Indian Industry (CII)
industry partner Robert Bosch Engineering & Business Solution (RBEI), Bangalore, India

AI Summary

The study addresses class-imbalance problems in a distributed setting by exploiting the sparsity structure in the data. The class-imbalance learning problem is formulated as a cost-sensitive learning problem with L-1 regularization, and the optimization is carried out within the Distributed Alternating Direction Method of Multipliers (DADMM) framework. The results show that the distributed solution approximates the centralized solution on numerous benchmark datasets, which points to promising real-world applications such as anomaly detection and class-imbalance learning.
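For concreteness, the formulation named above can be written out as follows. The notation is an assumption made here for illustration: m samples (x_i, y_i) with labels y_i in {-1, +1}, class-dependent costs c_{y_i}, regularization weight \lambda, and N nodes holding disjoint sample index sets D_k. The particular smooth hinge shown is one common form; the source does not say which variant is used.

```latex
% Cost-sensitive, L-1 regularized objective (assumed notation)
\min_{w \in \mathbb{R}^d} \; \sum_{i=1}^{m} c_{y_i}\, \ell\!\left(y_i\, w^\top x_i\right) + \lambda \lVert w \rVert_1,
\qquad
\ell(z) =
\begin{cases}
0, & z \ge 1, \\
\tfrac{1}{2}(1 - z)^2, & 0 < z < 1, \\
\tfrac{1}{2} - z, & z \le 0.
\end{cases}

% Equivalent consensus form after partitioning the samples across N nodes
\min_{w_1, \dots, w_N,\, z} \; \sum_{k=1}^{N} \sum_{i \in D_k} c_{y_i}\, \ell\!\left(y_i\, w_k^\top x_i\right) + \lambda \lVert z \rVert_1
\quad \text{subject to} \quad w_k = z, \;\; k = 1, \dots, N.
```

In the consensus form each node only touches its own samples, and the L-1 penalty acts on the shared variable z alone, which is what allows the two parts to be handled separately.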

Abstract

This work studies class-imbalance problems in a distributed setting by exploiting the sparsity structure in the data. We formulate class-imbalance learning as a cost-sensitive learning problem with L-1 regularization, where the cost-sensitive loss function is a cost-weighted smooth hinge loss. The resulting optimization problem is minimized within the Distributed Alternating Direction Method of Multipliers (DADMM) framework. We partition the data matrix across samples, which splits the original problem into a distributed L-2 regularized smooth loss minimization and an L-1 regularized squared loss minimization. The L-2 regularized subproblem is solved via Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) and random coordinate descent methods, run in parallel at multiple processing nodes using MPI, whereas the L-1 regularized subproblem reduces to a simple soft-thresholding operation. We show empirically that the distributed solution approximates the centralized solution on many benchmark datasets; the centralized solution is obtained via Cost-Sensitive Stochastic Coordinate Descent (CSSCD). Empirical results on small and large-scale benchmark datasets suggest that the proposed algorithms merit further investigation in real-world applications such as anomaly detection and class-imbalance learning. To the best of our knowledge, ours is the first work to study class imbalance in a distributed environment on large-scale sparse data.
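The splitting described in the abstract can be sketched on a single machine as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nodes are simulated with a loop rather than MPI, plain gradient descent stands in for L-BFGS and random coordinate descent, the smooth hinge is one common form that the abstract does not specify, and the names `consensus_admm`, `local_update`, `lam`, `rho` and the class-ratio cost weights are hypothetical.

```python
import numpy as np


def smooth_hinge(z):
    """Assumed smooth hinge: 0 if z >= 1, (1 - z)^2 / 2 if 0 < z < 1, 1/2 - z if z <= 0."""
    return np.where(z >= 1, 0.0, np.where(z <= 0, 0.5 - z, 0.5 * (1 - z) ** 2))


def smooth_hinge_grad(z):
    """Derivative of smooth_hinge with respect to the margin z."""
    return np.where(z >= 1, 0.0, np.where(z <= 0, -1.0, z - 1.0))


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding, the proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def local_update(X, y, c, w, target, rho, steps=50):
    """Approximately minimize sum_i c_i * loss(y_i * x_i.w) + (rho / 2) * ||w - target||^2.

    Gradient descent is a stand-in here (assumption); the abstract names
    L-BFGS and random coordinate descent for this L-2 regularized subproblem.
    """
    # Step size from a Lipschitz bound on the gradient (loss curvature <= 1).
    lipschitz = np.sum(c * np.sum(X ** 2, axis=1)) + rho
    for _ in range(steps):
        margins = y * (X @ w)
        grad = X.T @ (c * y * smooth_hinge_grad(margins)) + rho * (w - target)
        w = w - grad / lipschitz
    return w


def consensus_admm(parts, lam, rho=1.0, iters=100):
    """Scaled-form consensus ADMM; parts is a list of (X_k, y_k, c_k) sample blocks."""
    n_nodes = len(parts)
    dim = parts[0][0].shape[1]
    W = np.zeros((n_nodes, dim))  # local weight vectors w_k
    U = np.zeros((n_nodes, dim))  # scaled dual variables u_k
    z = np.zeros(dim)             # shared consensus variable
    for _ in range(iters):
        # Local step: each simulated node solves its smooth subproblem.
        for k, (X, y, c) in enumerate(parts):
            W[k] = local_update(X, y, c, W[k], z - U[k], rho)
        # Global step: the L-1 part is a single soft-thresholding of the average.
        z = soft_threshold((W + U).mean(axis=0), lam / (rho * n_nodes))
        # Dual step: accumulate each node's disagreement with the consensus.
        U += W - z
    return z


# Synthetic imbalanced data (hypothetical), split across 4 simulated nodes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.where(X[:, 0] + 0.5 * X[:, 1] - 1.2 > 0, 1.0, -1.0)
n_pos = max(int(np.sum(y > 0)), 1)
n_neg = len(y) - n_pos
# Assumed cost scheme: up-weight the minority (positive) class by the class ratio.
c = np.where(y > 0, n_neg / n_pos, 1.0)
parts = [(X[i::4], y[i::4], c[i::4]) for i in range(4)]

z_hat = consensus_admm(parts, lam=1.0, rho=1.0, iters=30)
print("non-zero weights:", int(np.count_nonzero(z_hat)), "of", z_hat.size)
```

Only the averaging inside the global step needs communication between nodes; in an MPI setting that average would typically be computed with a reduction, while the local steps run independently on each node's block of samples.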
