Review

Text mining of cancer-related information: Review of current status and future directions

INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS (2014)

Journal

INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS

Volume 83, Issue 9, Pages 605-623

Publisher

ELSEVIER IRELAND LTD

DOI: 10.1016/j.ijmedinf.2014.06.009

Keywords

Cancer; Natural language processing; Data mining; Electronic medical records

Categories

Computer Science, Information Systems; Health Care Sciences & Services; Medical Informatics

Funding

Christie NHS Foundation Trust
Health e-Research Centre (HeRC)
Serbian Ministry of Education and Science [III44006, III47003]
Medical Research Council [MC_PC_13042] Funding Source: researchfish

Abstract

Purpose: This paper reviews the research literature on text mining (TM) with the aim of finding out (1) which cancer domains have been the subject of TM efforts, (2) which knowledge resources can support TM of cancer-related information and (3) to what extent systems that rely on knowledge and computational methods can convert text data into useful clinical information. These questions were used to determine the current state of the art in this particular strand of TM and suggest future directions in TM development to support cancer research.

Methods: A review of the research on TM of cancer-related information was carried out. A literature search was conducted on the Medline database as well as IEEE Xplore and ACM digital libraries to address the interdisciplinary nature of such research. The search results were supplemented with the literature identified through Google Scholar.

Results: A range of studies have proven the feasibility of TM for extracting structured information from clinical narratives such as those found in pathology or radiology reports. In this article, we provide a critical overview of the current state of the art for TM related to cancer. The review highlighted a strong bias towards symbolic methods, e.g. named entity recognition (NER) based on dictionary lookup and information extraction (IE) relying on pattern matching. The F-measure of NER ranges between 80% and 90%, while that of IE for simple tasks is in the high 90s. To further improve the performance, TM approaches need to deal effectively with idiosyncrasies of the clinical sublanguage such as non-standard abbreviations as well as a high degree of spelling and grammatical errors. This requires a shift from rule-based methods to machine learning following the success of similar trends in biological applications of TM. Machine learning approaches require large training datasets, but clinical narratives are not readily available for TM research due to privacy and confidentiality concerns. This issue remains the main bottleneck for progress in this area. In addition, there is a need for a comprehensive cancer ontology that would enable semantic representation of textual information found in narrative reports. © 2014 The Authors. Published by Elsevier Ireland Ltd.
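
To illustrate the symbolic methods the abstract refers to (NER by dictionary lookup, IE by pattern matching) and the F-measure used to score them, a minimal Python sketch might look like the following. The toy lexicon, the tumour-size pattern, the sample sentence, and all function names are hypothetical and are not taken from the reviewed systems.

```python
import re

# Hypothetical toy lexicon; a real system would draw on a curated terminology.
LEXICON = {
    "adenocarcinoma": "DIAGNOSIS",
    "ductal carcinoma": "DIAGNOSIS",
    "breast": "SITE",
    "colon": "SITE",
}


def dictionary_ner(text, lexicon=LEXICON):
    """Return sorted (start, end, label) spans for lexicon terms found in text."""
    spans = []
    for term, label in lexicon.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


# Hypothetical pattern for one simple IE task: a tumour size such as "2.3 cm".
SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)\b", re.IGNORECASE)


def extract_tumour_size_mm(text):
    """Return the first size mentioned in text, converted to millimetres, or None."""
    match = SIZE_PATTERN.search(text)
    if match is None:
        return None
    value = float(match.group(1))
    return value * 10 if match.group(2).lower() == "cm" else value


def f_measure(predicted, gold):
    """Balanced F-measure (harmonic mean of precision and recall) over two sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up sentence in the style of a pathology report.
report = "Invasive ductal carcinoma of the left breast, measuring 2.3 cm."
entities = dictionary_ner(report)
size_mm = extract_tumour_size_mm(report)
```

A lookup of this kind only finds exact lexicon entries, so a misspelling or a non-standard abbreviation in the report would be missed, which is the limitation the abstract attributes to rule-based approaches on clinical text.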
