Article

Lung cancer survival prediction using ensemble data mining on SEER data

Journal

SCIENTIFIC PROGRAMMING
Volume 20, Issue 1, Pages 29-42

Publisher

HINDAWI LTD
DOI: 10.1155/2012/920245

Keywords

Ensemble data mining; lung cancer; predictive modeling; outcome calculator

Funding

  1. NSF [CCF-0621443, OCI-0724599, CCF-0833131, CNS-0830927, IIS-0905205, OCI-0956311, CCF-0938000, CCF-1043085, CCF-1029166, OCI-1144061, CNS-0551639, IIS-0536994]
  2. DOE [DE-FC02-07ER25808, DE-FG02-08ER25848, DE-SC0001283, DE-SC0005309, DE-SC0005340]


We analyze the lung cancer data available from the SEER program with the aim of developing accurate survival prediction models for lung cancer. Carefully designed preprocessing steps resulted in the removal, modification, or splitting of several attributes, and 2 of the 11 derived attributes were found to have significant predictive power. Several supervised classification methods were applied to the preprocessed data, along with various data mining optimizations and validations. In our experiments, ensemble voting of five decision-tree-based classifiers and meta-classifiers gave the best prediction performance in terms of accuracy and area under the ROC curve. We have developed an on-line lung cancer outcome calculator for estimating the risk of mortality after 6 months, 9 months, 1 year, 2 years, and 5 years of diagnosis. For this calculator, a smaller non-redundant subset of 13 attributes was selected using attribute selection techniques, while trying to retain the predictive power of the original set of attributes. Ensemble voting models were also created for predicting the conditional survival outcome for lung cancer (estimating the risk of mortality after 5 years of diagnosis, given that the patient has already survived for a period of time) and were included in the calculator. The on-line lung cancer outcome calculator developed as a result of this study is available at http://info.eecs.northwestern.edu:8080/LungCancerOutcomeCalculator/.
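The ensemble voting idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes average-of-probabilities ("soft") voting, which may differ from the combination rule used in the paper. The base classifiers, attribute names (`age`, `stage`, `tumor_size_mm`), thresholds, and probabilities below are all hypothetical stand-ins for trained models and carry no clinical meaning.

```python
from typing import Callable, Dict, Sequence

# A "classifier" here is any function that maps a patient record to an
# estimated probability of mortality within the chosen time horizon.
Classifier = Callable[[Dict[str, float]], float]


def soft_vote(classifiers: Sequence[Classifier], record: Dict[str, float]) -> float:
    """Average the probabilities predicted by the base classifiers."""
    if not classifiers:
        raise ValueError("at least one classifier is required")
    probabilities = [clf(record) for clf in classifiers]
    return sum(probabilities) / len(probabilities)


def predict_label(
    classifiers: Sequence[Classifier],
    record: Dict[str, float],
    threshold: float = 0.5,
) -> int:
    """Return 1 (predicted mortality) if the averaged probability reaches the threshold."""
    return 1 if soft_vote(classifiers, record) >= threshold else 0


# Hypothetical one-split "decision stumps" standing in for trained
# decision-tree-based models; the numbers are illustrative only.
def stump_age(record: Dict[str, float]) -> float:
    return 0.8 if record["age"] >= 70 else 0.3


def stump_stage(record: Dict[str, float]) -> float:
    return 0.9 if record["stage"] >= 3 else 0.2


def stump_size(record: Dict[str, float]) -> float:
    return 0.7 if record["tumor_size_mm"] >= 40 else 0.4


ensemble = [stump_age, stump_stage, stump_size]
patient = {"age": 72, "stage": 2, "tumor_size_mm": 45}

risk = soft_vote(ensemble, patient)
label = predict_label(ensemble, patient)
print(f"estimated risk: {risk:.3f}, predicted label: {label}")
```

Averaging probabilities, rather than counting hard votes, lets a confident base model outweigh uncertain ones, and the averaged value can be reported directly as a risk estimate in a calculator-style interface.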
