Article

Efficient K-Nearest Neighbor Graph Construction Using MapReduce for Large-Scale Data Sets

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS (2014)

Journal

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS

Volume E97D, Issue 12, Pages 3142-3154

Publisher

IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG

DOI: 10.1587/transinf.2014EDP7108

Keywords

k-nearest neighbor graph; Hadoop MapReduce; distributed computing

Categories

Computer Science, Information Systems; Computer Science, Software Engineering

Abstract

This paper presents an efficient method for constructing a K-nearest neighbor graph (K-NNG) from a large-scale data set using Hadoop MapReduce. The K-NNG is used as a data structure by data analysis techniques in various applications, and applying those techniques to large-scale data sets calls for an efficient K-NNG construction method. We focus on NN-Descent, a recently proposed method that efficiently constructs an approximate K-NNG. NN-Descent is implemented on a shared-memory system with OpenMP-based parallelization, and its extension to the Hadoop MapReduce framework has been suggested for data sets too large for a shared-memory system to handle. However, a simple extension to the Hadoop MapReduce framework is impractical: it requires extremely high system performance because of the high memory consumption and low data transmission efficiency of its MapReduce jobs. The proposed method relaxes this requirement by improving the MapReduce jobs, employing an appropriate key-value pair format and an efficient sampling strategy. Experiments on large-scale data sets demonstrate that the proposed method works efficiently and scales with the data size, the number of machine nodes, and the graph structural parameter K.
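To make the idea behind the abstract concrete, the following is a minimal single-process sketch of an NN-Descent-style iteration arranged as map, shuffle, and reduce steps. It is not the paper's implementation: it does not use Hadoop, it does not reproduce the paper's key-value pair format or sampling strategy, and it omits the new/old neighbor flags of the original NN-Descent. All function names (`map_phase`, `shuffle`, `reduce_phase`, `update`, `build_knn_graph`) and the `sample_size` parameter are hypothetical, and squared Euclidean distance is an assumed metric.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance (assumed metric for this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(graph, sample_size, rng):
    """Emit (node, neighbor) pairs for sampled neighbors, in both directions."""
    for v, neighbors in graph.items():
        ids = [u for _, u in neighbors]
        for u in rng.sample(ids, min(sample_size, len(ids))):
            yield v, u  # forward edge
            yield u, v  # reverse edge


def shuffle(pairs):
    """Group values by key, standing in for the MapReduce shuffle step."""
    groups = defaultdict(set)
    for key, value in pairs:
        groups[key].add(value)
    return groups


def reduce_phase(groups):
    """Local join: neighbors of the same node propose each other."""
    for cands in groups.values():
        ordered = sorted(cands)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                yield a, b
                yield b, a


def update(graph, data, proposals, k):
    """Keep the k closest of current and proposed neighbors; count changes."""
    changed = 0
    for v, cands in proposals.items():
        current = {u for _, u in graph[v]}
        merged = current | (cands - {v})
        best = sorted((sq_dist(data[v], data[u]), u) for u in merged)[:k]
        changed += len({u for _, u in best} - current)
        graph[v] = best
    return changed


def build_knn_graph(data, k, sample_size=None, max_iters=10, seed=0):
    """Build an approximate K-NNG; requires len(data) > k."""
    rng = random.Random(seed)
    n = len(data)
    sample_size = sample_size or k
    graph = {}
    for v in range(n):
        # Start from k random neighbors, as NN-Descent does.
        others = rng.sample([u for u in range(n) if u != v], k)
        graph[v] = sorted((sq_dist(data[v], data[u]), u) for u in others)
    for _ in range(max_iters):
        joined = shuffle(map_phase(graph, sample_size, rng))
        proposals = shuffle(reduce_phase(joined))
        if update(graph, data, proposals, k) == 0:
            break  # no neighbor list changed, so stop early
    return graph
```

Calling `build_knn_graph(points, k=10)` on a list of equal-length numeric tuples returns a dictionary that maps each point index to a sorted list of `(squared_distance, neighbor_index)` pairs. Lowering `sample_size` reduces the number of pairs that pass through the shuffle step, which illustrates in a small way why sampling matters for the data transmission cost the abstract mentions.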
