Journal
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
Volume 36, Issue 11, Pages 6161-6179
Publisher
WILEY
DOI: 10.1002/int.22546
Keywords
big data; feature selection; locality-sensitive hashing; ReliefF; scalability
Funding
- Ministerio de Economia y Competitividad [PID2019-109238GB-C2, TIN 2015-65069-C2-1-R, TIN 2015-65069-C2-2-R]
- Xunta de Galicia [ED431C 2018/34]
- European Union
The ReliefF-LSH algorithm simplifies the costliest step of the ReliefF algorithm by approximating the nearest neighbor graph using locality-sensitive hashing. It can process data sets that are too large for the original ReliefF, and it obtains better results and is more generally applicable than currently available alternatives to the original ReliefF.
Feature selection algorithms, such as ReliefF, are very important for processing high-dimensional data sets. However, widespread use of such popular and effective algorithms is limited by their computational cost. We describe an adaptation of the ReliefF algorithm that simplifies its costliest step by approximating the nearest neighbor graph using locality-sensitive hashing (LSH). The resulting ReliefF-LSH algorithm can process data sets that are too large for the original ReliefF, a capability further enhanced by a distributed implementation in Apache Spark. Furthermore, ReliefF-LSH obtains better results and is more generally applicable than currently available alternatives to the original ReliefF, as it can handle regression and multiclass data sets. Because it requires no additional hyperparameters with respect to ReliefF, it also avoids costly tuning. A set of experiments demonstrates the validity of this new approach and confirms its good scalability.
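To make the idea in the abstract concrete, the following is a minimal single-machine sketch, not the authors' implementation. It assumes random-hyperplane hashing as the LSH scheme (the abstract does not say which scheme is used), handles classification only, pools all other classes as "misses" instead of applying ReliefF's class-prior weighting, and does not use Apache Spark. The names lsh_buckets and relief_lsh_weights, and the parameters n_bits and k, are hypothetical and chosen for illustration only.

```python
import numpy as np


def lsh_buckets(X, n_bits=2, seed=0):
    """Group row indices by the sign pattern of random projections.

    Assumption: random-hyperplane hashing stands in for the paper's LSH scheme.
    """
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(X.shape[1], n_bits))
    bits = ((X @ planes) > 0).astype(np.int64)
    keys = bits @ (1 << np.arange(n_bits))  # pack the bits into one integer key
    buckets = {}
    for i, key in enumerate(keys):
        buckets.setdefault(int(key), []).append(i)
    return buckets


def relief_lsh_weights(X, y, k=3, n_bits=2, seed=0):
    """Relief-style feature weights with neighbors searched only inside LSH buckets."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant features
    Xs = (X - lo) / span   # scale every feature to [0, 1]

    # Center the data so that hyperplanes through the origin split it.
    buckets = lsh_buckets(Xs - Xs.mean(axis=0), n_bits=n_bits, seed=seed)

    weights = np.zeros(X.shape[1])
    used = 0
    for members in buckets.values():
        members = np.array(members)
        for i in members:
            others = members[members != i]
            if others.size == 0:
                continue
            diffs = np.abs(Xs[others] - Xs[i])  # per-feature differences
            dist = diffs.sum(axis=1)            # Manhattan distance
            same = y[others] == y[i]
            hits = np.argsort(dist[same])[:k]
            misses = np.argsort(dist[~same])[:k]
            if hits.size == 0 or misses.size == 0:
                continue  # bucket holds only one class for this instance
            # Reward features that differ on misses, penalize those that differ on hits.
            weights += diffs[~same][misses].mean(axis=0)
            weights -= diffs[same][hits].mean(axis=0)
            used += 1
    return weights / max(used, 1)
```

Restricting the neighbor search to each bucket replaces the all-pairs distance computation with many small ones, which is the cost saving the abstract attributes to approximating the nearest neighbor graph. In this sketch, more hash bits give smaller buckets and faster but coarser neighbor estimates.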