Article

Feature Selection and Classification of Microarray Data for Cancer Prediction Using MapReduce Implementation of Random Forest Algorithm

Journal

JOURNAL OF SCIENTIFIC & INDUSTRIAL RESEARCH
Volume 78, Issue 3, Pages 158-161

Publisher

NATL INST SCIENCE COMMUNICATION-NISCAIR

Keywords

Microarray; Random Forest; Hadoop

Cancer becomes an invasive disease if it is detected only at a late stage. We strongly believe that early detection of cancer can increase the efficiency of treatment and decrease the mortality rate. Microarray technology makes it possible to study thousands of genes in a short amount of time compared with traditional methods. The main drawback of microarray data is the curse of dimensionality: because the number of genes (features) is very large relative to the number of samples, processing the data on a single system becomes computationally unstable and struggles to give an effective result. Hadoop processes big data in a divide-and-conquer manner using its master-slave architecture, which yields results in a short amount of time. Random forest is an ensemble technique for feature selection and classification. To select relevant features from the full feature set of microarray data, the values of each feature are randomly permuted and the misclassification rate is checked. If the misclassification rate changes, that feature is considered important. With the proposed method, the accuracy of detecting cancer at an early stage is effectively improved.
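
The permutation test described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MapReduce implementation: it runs on a single machine, uses a simple nearest-centroid classifier as a stand-in for the random forest, and scores features on the training data itself (a random forest would normally use out-of-bag samples). The toy data set and all function names (`train_centroids`, `permutation_importance`, `make_toy_data`) are hypothetical and chosen only for illustration.

```python
import random


def train_centroids(X, y):
    """Stand-in classifier (assumption, not a random forest): per-class feature means."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict(model, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, model[c]))
    return min(model, key=dist)


def error_rate(model, X, y):
    """Misclassification rate of the model on (X, y)."""
    wrong = sum(predict(model, row) != label for row, label in zip(X, y))
    return wrong / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean increase in misclassification rate after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = error_rate(model, X, y)
    scores = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increase += error_rate(model, X_perm, y) - baseline
        scores.append(increase / n_repeats)
    return scores


def make_toy_data(n=200, seed=1):
    """Synthetic data (hypothetical): feature 0 tracks the label, features 1-2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(3.0 * label, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


X, y = make_toy_data()
model = train_centroids(X, y)
scores = permutation_importance(model, X, y)
for j, score in enumerate(scores):
    print(f"feature {j}: mean increase in error = {score:.3f}")
```

In this sketch the informative feature should show a large increase in error when permuted, while the noise features should stay close to zero. Features whose score exceeds a chosen threshold would be kept for classification.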
