Journal
JOURNAL OF SCIENTIFIC & INDUSTRIAL RESEARCH
Volume 78, Issue 3, Pages 158-161
Publisher
NATL INST SCIENCE COMMUNICATION-NISCAIR
Keywords
Microarray; Random Forest; Hadoop
Cancer is an invasive disease when it is detected at a late stage. We strongly believe that early detection of cancer can increase the efficiency of treatment and decrease the mortality rate. Microarray technology allows thousands of genes to be studied in a short amount of time compared with traditional methods. The main drawback of microarray data is the curse of dimensionality: because the number of genes (features) is very large compared with the number of samples, computation on a single system becomes unstable and fails to give an effective result. Hadoop processes big data in a divide-and-conquer manner on its master-slave architecture to produce results in a short amount of time. Random forest is an ensemble technique for feature selection and classification. To select relevant features from the full feature set of microarray data, we randomly permute the values of each feature and check the misclassification rate. If the misclassification rate rises after permutation, that feature is considered important. With the proposed method, the accuracy of detecting cancer at an early stage is effectively improved.
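The permutation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a small bagged ensemble of depth-one decision stumps as a deliberately simplified stand-in for a random forest, synthetic data in place of microarray data, a held-out split in place of out-of-bag samples, and no Hadoop. All function names (`make_data`, `fit_forest`, `permutation_importance`) and parameter values are hypothetical and chosen only for illustration.

```python
import random

def make_data(n_samples=120, n_features=6, informative=0, seed=0):
    """Synthetic stand-in for expression data: only one feature carries signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2                       # alternate the two classes
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[informative] += 3.0 * label     # shift the informative feature for class 1
        X.append(row)
        y.append(label)
    return X, y

def fit_stump(X, y):
    """Return the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (0, 1):
                errors = sum(
                    (polarity if row[f] > t else 1 - polarity) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    return best[1:]

def fit_forest(X, y, n_trees=15, seed=0):
    """Bagging: fit each stump on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest

def predict(forest, row):
    """Majority vote over all stumps (n_trees is odd, so there are no ties)."""
    votes = sum(p if row[f] > t else 1 - p for f, t, p in forest)
    return 1 if 2 * votes > len(forest) else 0

def error_rate(forest, X, y):
    return sum(predict(forest, r) != l for r, l in zip(X, y)) / len(y)

def permutation_importance(forest, X, y, seed=0):
    """Increase in misclassification rate after shuffling each feature column."""
    rng = random.Random(seed)
    base = error_rate(forest, X, y)
    scores = []
    for f in range(len(X[0])):
        col = [row[f] for row in X]
        rng.shuffle(col)                    # break the feature-label association
        X_perm = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, col)]
        scores.append(error_rate(forest, X_perm, y) - base)
    return scores

if __name__ == "__main__":
    X, y = make_data()
    forest = fit_forest(X[:60], y[:60])                 # train on the first half
    scores = permutation_importance(forest, X[60:], y[60:])
    for f, s in enumerate(scores):
        print(f"feature {f}: importance {s:+.3f}")
```

In this sketch a feature whose permutation raises the error rate the most is ranked as most relevant, so the single informative feature should stand out from the noise features. A real analysis would use full decision trees with per-split feature subsampling and would repeat the permutation several times to reduce variance.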