Journal
IEEE TRANSACTIONS ON BIG DATA
Volume 7, Issue 5, Pages 832-844
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TBDATA.2017.2688372
Keywords
Class-imbalance learning; distributed algorithm; anomaly detection
Funding
- Prime Minister's Fellowship for Doctoral Research, joint initiative by Confederation of Indian Industry (CII)
- industry partner Robert Bosch Engineering & Business Solution (RBEI), Bangalore, India
The study addresses class-imbalance problems in a distributed setting by exploiting the sparsity structure in the data. The class-imbalance learning problem is formulated as a cost-sensitive learning problem with L-1 regularization, and the optimization is carried out within the Distributed Alternating Direction Method of Multipliers (DADMM) framework. The results show that the distributed solution approximates the centralized solution on many benchmark datasets, suggesting promising avenues for real-world applications such as anomaly detection and class-imbalance learning.
This work studies class-imbalance problems in a distributed setting by exploiting the sparsity structure in the data. We formulate the class-imbalance learning problem as a cost-sensitive learning problem with L-1 regularization. The cost-sensitive loss function is a cost-weighted smooth hinge loss. The resulting optimization problem is minimized within the Distributed Alternating Direction Method of Multipliers (DADMM) framework. We partition the data matrix across samples. This operation splits the original problem into a distributed L-2 regularized smooth loss minimization and an L-1 regularized squared loss minimization. The L-2 regularized subproblem is solved via Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) and a random coordinate descent method, in parallel at multiple processing nodes using MPI, whereas the L-1 regularized subproblem reduces to a simple soft-thresholding operation. We show empirically that the distributed solution approximates the centralized solution on many benchmark datasets. The centralized solution is obtained via Cost-Sensitive Stochastic Coordinate Descent (CSSCD). Empirical results on small and large-scale benchmark datasets suggest promising avenues for further investigating real-world applications of the proposed algorithms, such as anomaly detection and class-imbalance learning. To the best of our knowledge, ours is the first work to study class imbalance in a distributed environment on large-scale sparse data.
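The splitting described in the abstract can be illustrated with a minimal single-process sketch of consensus ADMM. This is not the authors' implementation, and several pieces are assumptions: the particular piecewise-quadratic form of the smooth hinge loss, the inverse-class-frequency cost weights, the synthetic data, and all function names (`smooth_hinge`, `soft_threshold`, `local_update`, `dadmm_sketch`) are hypothetical. Plain gradient descent is swapped in for L-BFGS and random coordinate descent, and a sequential loop over partitions stands in for the MPI processing nodes.

```python
import numpy as np

def smooth_hinge(z):
    """Loss and derivative w.r.t. the margin z = y * (x @ w).
    Assumed form: 0.5 - z for z <= 0, 0.5 * (1 - z)^2 for 0 < z < 1, else 0."""
    loss = np.where(z <= 0, 0.5 - z, np.where(z < 1, 0.5 * (1 - z) ** 2, 0.0))
    grad = np.where(z <= 0, -1.0, np.where(z < 1, z - 1.0, 0.0))
    return loss, grad

def soft_threshold(v, kappa):
    """Closed-form minimiser of kappa * ||z||_1 + 0.5 * ||z - v||^2."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def local_update(X, y, cost, z, u, rho, steps=20):
    """One node's L-2 regularized smooth subproblem:
    sum_i cost_i * loss(y_i * x_i @ w) + (rho / 2) * ||w - z + u||^2.
    Solved approximately by gradient descent (stand-in for L-BFGS)."""
    w = z.copy()
    # The loss has curvature at most 1, which bounds the gradient's Lipschitz constant.
    lr = 1.0 / (cost.max() * np.linalg.norm(X, 2) ** 2 + rho)
    for _ in range(steps):
        _, g = smooth_hinge(y * (X @ w))
        w -= lr * (X.T @ (cost * g * y) + rho * (w - z + u))
    return w

def dadmm_sketch(parts, lam=1.0, rho=1.0, iters=30):
    """Consensus ADMM over sample-wise partitions (sequential, no MPI)."""
    n_nodes, d = len(parts), parts[0][0].shape[1]
    z = np.zeros(d)             # shared sparse consensus weights
    W = np.zeros((n_nodes, d))  # per-node primal variables
    U = np.zeros((n_nodes, d))  # per-node scaled dual variables
    for _ in range(iters):
        for i, (X, y, c) in enumerate(parts):   # would run in parallel
            W[i] = local_update(X, y, c, z, U[i], rho)
        # L-1 regularized squared-loss step: a single soft-thresholding.
        z = soft_threshold((W + U).mean(axis=0), lam / (rho * n_nodes))
        U += W - z
    return z

# Synthetic sparse, imbalanced data (illustrative only).
rng = np.random.default_rng(0)
n, d = 400, 20
X = rng.normal(size=(n, d)) * (rng.random((n, d)) < 0.2)
w_true = np.zeros(d)
w_true[:3] = [2.0, -2.0, 1.5]
y = np.where(X @ w_true + 0.1 * rng.normal(size=n) > 1.0, 1.0, -1.0)

# Assumed cost scheme: weight the positive class by the class ratio.
n_pos = max(int((y == 1).sum()), 1)
cost = np.where(y == 1, (n - n_pos) / n_pos, 1.0)

parts = [(X[idx], y[idx], cost[idx]) for idx in np.array_split(np.arange(n), 4)]
w_hat = dadmm_sketch(parts)
```

In this sketch only `z` and the average of `W + U` would need to be communicated between nodes in each iteration, which is why partitioning across samples suits a message-passing setup.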