Publisher
ELSEVIER SCIENCE BV
DOI: 10.1016/j.procs.2015.06.035
Keywords
Big data; Classification; Gene selection; Hadoop; K-nearest neighbor; MapReduce; Microarray
The major drawback of microarray data is the "curse of dimensionality", which obscures the useful information in the dataset and leads to computational instability. Selecting relevant genes is therefore imperative in microarray data analysis. Most existing schemes follow a two-phase process: feature selection/extraction followed by classification. In this paper, a MapReduce-based statistical test, ANOVA, is proposed to select the relevant features. After feature selection, a MapReduce-based K-Nearest Neighbor (K-NN) classifier is proposed to classify the microarray data. Both algorithms are successfully implemented on the Hadoop framework, and a comparative analysis is performed on various datasets. (C) 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
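The two stages named in the abstract (ANOVA-based gene selection, then K-NN classification, each expressed as map and reduce steps) can be sketched in a single process as shown below. This is a minimal illustration under stated assumptions, not the authors' Hadoop implementation. `run_mapreduce` is a hypothetical in-memory stand-in for a MapReduce job. The other function names, Euclidean distance and majority voting are also illustrative choices, and the toy data is invented.

```python
import heapq
import math
from collections import Counter, defaultdict

def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory MapReduce: map every record, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Stage 1: one-way ANOVA F-statistic per gene.
def anova_mapper(sample):
    label, expr = sample
    for gene, x in enumerate(expr):
        yield (gene, label), (1, x, x * x)  # count, sum, sum of squares

def anova_reducer(_key, values):
    # Aggregate the sufficient statistics for one (gene, class) pair.
    return tuple(sum(v[i] for v in values) for i in range(3))

def anova_f_scores(samples):
    stats = run_mapreduce(samples, anova_mapper, anova_reducer)
    per_gene = defaultdict(list)
    for (gene, _label), triple in stats.items():
        per_gene[gene].append(triple)
    scores = {}
    for gene, groups in per_gene.items():
        n_total = sum(n for n, _, _ in groups)
        grand_mean = sum(s for _, s, _ in groups) / n_total
        ssb = sum(n * (s / n - grand_mean) ** 2 for n, s, _ in groups)
        ssw = sum(max(0.0, ss - s * s / n) for n, s, ss in groups)
        df_b, df_w = len(groups) - 1, n_total - len(groups)
        if df_b <= 0 or df_w <= 0:
            scores[gene] = 0.0  # F is undefined; treat the gene as uninformative
        elif ssw == 0.0:
            scores[gene] = math.inf if ssb > 0 else 0.0
        else:
            scores[gene] = (ssb / df_b) / (ssw / df_w)
    return scores

def select_top_genes(samples, m):
    scores = anova_f_scores(samples)
    return sorted(scores, key=lambda g: scores[g], reverse=True)[:m]

def project(samples, genes):
    return [(label, [expr[g] for g in genes]) for label, expr in samples]

# Stage 2: K-NN, with the training set as the mapped input.
def knn_classify(train, test, k):
    def mapper(sample):
        label, expr = sample
        for test_id, query in enumerate(test):
            yield test_id, (math.dist(expr, query), label)

    def reducer(_test_id, values):
        nearest = heapq.nsmallest(k, values)  # k smallest (distance, label) pairs
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    preds = run_mapreduce(train, mapper, reducer)
    return [preds[i] for i in range(len(test))]

if __name__ == "__main__":
    train = [("ALL", [5.0, 1.0, 3.0]), ("ALL", [5.2, 2.0, 3.1]),
             ("AML", [1.0, 1.5, 3.0]), ("AML", [1.1, 1.2, 2.9])]
    genes = select_top_genes(train, 2)
    print(genes)  # [0, 2]
    queries = [[q[g] for g in genes] for q in ([4.9, 1.4, 3.0], [1.2, 1.4, 3.0])]
    print(knn_classify(project(train, genes), queries, k=3))  # ['ALL', 'AML']
```

In a real Hadoop job, each mapper would typically emit only its local k nearest neighbours per test sample, which reduces shuffle volume. The reducer above would merge those candidates in the same way.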