Article

Distributed Sparse Class-Imbalance Learning and Its Applications

Journal

IEEE TRANSACTIONS ON BIG DATA
Volume 7, Issue 5, Pages 832-844

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TBDATA.2017.2688372

Keywords

Class-imbalance learning; distributed algorithm; anomaly detection

Funding

  1. Prime Minister's Fellowship for Doctoral Research, joint initiative by Confederation of Indian Industry (CII)
  2. industry partner Robert Bosch Engineering & Business Solution (RBEI), Bangalore, India


The study addresses class-imbalance problems in a distributed setting by exploiting the sparsity structure of the data. The class-imbalance learning problem is formulated as a cost-sensitive learning problem with L-1 regularization, and the optimization is carried out within the Distributed Alternating Direction Method of Multipliers (DADMM) framework. Empirically, the distributed solution approximates the centralized solution on many benchmark datasets, which suggests promising avenues for real-world applications such as anomaly detection and class-imbalance learning.
In the present work, we study class-imbalance problems in a distributed setting, exploiting the sparsity structure of the data. We formulate the class-imbalance learning problem as a cost-sensitive learning problem with L-1 regularization. The cost-sensitive loss function is a cost-weighted smooth hinge loss. The resulting optimization problem is minimized within the Distributed Alternating Direction Method of Multipliers (DADMM) framework. We partition the data matrix across samples. This splits the original problem into a distributed L-2 regularized smooth loss minimization and an L-1 regularized squared loss minimization. The L-2 regularized subproblem is solved in parallel at multiple processing nodes using MPI, via Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) and random coordinate descent, whereas the L-1 regularized subproblem reduces to a simple soft-thresholding operation. We show empirically that the distributed solution approximates the centralized solution on many benchmark datasets. The centralized solution is obtained via Cost-Sensitive Stochastic Coordinate Descent (CSSCD). Empirical results on small and large-scale benchmark datasets suggest promising avenues for further investigating real-world applications of the proposed algorithms, such as anomaly detection and class-imbalance learning. To the best of our knowledge, ours is the first work to study class imbalance in a distributed environment on large-scale sparse data.
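
The two building blocks named in the abstract, a cost-weighted smooth hinge loss and a soft-thresholding step, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the particular smooth hinge variant, the function names, the per-class costs `cost_pos` and `cost_neg`, and the standard consensus-ADMM form of the global update (threshold `lam / (rho * n_nodes)`) are all assumptions, since the abstract does not spell them out. The per-node L-2 regularized subproblem (solved with L-BFGS or random coordinate descent in the paper) is omitted.

```python
from typing import List, Sequence


def smooth_hinge(margin: float) -> float:
    """One common smooth hinge variant (assumed; the paper's exact form may differ)."""
    if margin >= 1.0:
        return 0.0
    if margin <= 0.0:
        return 0.5 - margin
    return 0.5 * (1.0 - margin) ** 2


def cost_weighted_loss(w: Sequence[float], X: Sequence[Sequence[float]],
                       y: Sequence[int], cost_pos: float, cost_neg: float) -> float:
    """Sum of smooth hinge losses, each scaled by a per-class cost (hypothetical names)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        score = sum(w_j * x_ij for w_j, x_ij in zip(w, x_i))
        cost = cost_pos if y_i > 0 else cost_neg
        total += cost * smooth_hinge(y_i * score)
    return total


def soft_threshold(v: float, kappa: float) -> float:
    """Shrink v toward zero by kappa; values within [-kappa, kappa] become 0."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def z_update(local_w: Sequence[Sequence[float]], local_u: Sequence[Sequence[float]],
             lam: float, rho: float) -> List[float]:
    """Global step of standard consensus ADMM (assumed form): average the
    per-node iterates plus scaled duals, then soft-threshold coordinate-wise."""
    n_nodes = len(local_w)
    dim = len(local_w[0])
    kappa = lam / (rho * n_nodes)
    z = []
    for j in range(dim):
        avg = sum(local_w[i][j] + local_u[i][j] for i in range(n_nodes)) / n_nodes
        z.append(soft_threshold(avg, kappa))
    return z


# Two hypothetical nodes, two features; the small second coordinate is zeroed out.
w_nodes = [[1.0, 0.1], [3.0, -0.1]]
u_nodes = [[0.0, 0.0], [0.0, 0.0]]
z = z_update(w_nodes, u_nodes, lam=1.0, rho=1.0)
```

Setting `cost_pos` larger than `cost_neg` makes errors on the minority (positive) class more expensive, which is the cost-sensitive idea; the soft-thresholding step is what drives small coordinates of the consensus weight vector exactly to zero.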
