Associative Feature Information Extraction Using Text Mining from Health Big Data

WIRELESS PERSONAL COMMUNICATIONS (2019)

Journal

WIRELESS PERSONAL COMMUNICATIONS

Volume 105, Issue 2, Pages 691-707

Publisher

SPRINGER

DOI: 10.1007/s11277-018-5722-5

Keywords

Information extraction; Text mining; Health big data; TF-IDF; Data mining

Funding

Kyonggi University Research Grant

Abstract

With the development of big data computing technology, most documents in areas such as politics, economics, society, culture, life, and public health have been digitized. The structure of conventional documents varies with the author or the organization that produced them, which has motivated policies and studies on digitizing and using them efficiently. Text mining is a technology for classifying, clustering, extracting, searching, and analyzing data to find patterns or features in a set of unstructured or structured documents written in natural language. This paper proposes a method for extracting associative feature information from health big data using text mining. Health documents are collected from the Web through Web scraping and saved on a file server as raw data, from which health big data are built. Useful information contained in the health documents is then extracted through text mining. Because the collected raw documents are in sentence form, morphological analysis is applied to create a corpus. As preprocessing, the file server performs stop-word removal, tagging, and analysis of polysemous words to create a candidate corpus. TF-C-IDF is applied to the candidate corpus to evaluate the importance of words across the document set. Words that TF-C-IDF ranks as highly important are added to a keyword set, and a transaction is created for each document. The Apriori mining algorithm then analyzes the association rules among the keywords in these transactions and generates associative keywords. The TF-C-IDF weights and the associative keywords are extracted from health big data as associative features. The proposed method is a base technology for creating added value in the healthcare industry in the era of the 4th industrial revolution. In evaluation, it showed high performance in terms of F-measure and efficiency. The method is expected to contribute to healthcare big data management and information search.
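The weighting and rule-mining steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not define TF-C-IDF, so plain TF-IDF is used here as a stand-in. The toy corpus, the function names (`tfidf`, `keyword_transactions`, `apriori`, `association_rules`), and the thresholds (`top_k`, `min_support`, `min_confidence`) are all assumptions made for the example. Scraping, morphological analysis, and stop-word removal are assumed to have already produced the token lists.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is assumed to be tokenized and
# stop-word filtered already (the preprocessing steps are not shown).
DOCS = [
    ["diabetes", "insulin", "diet", "exercise"],
    ["diabetes", "insulin", "glucose"],
    ["hypertension", "diet", "exercise", "sodium"],
    ["diabetes", "diet", "exercise", "glucose"],
]


def tfidf(docs):
    """Return one {word: weight} dict per document using plain TF-IDF.

    Stand-in for TF-C-IDF, whose definition is not given in the abstract.
    """
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()}
        )
    return weights


def keyword_transactions(docs, top_k=3):
    """Keep the top_k highest-weighted words of each document as its transaction."""
    transactions = []
    for w in tfidf(docs):
        # Sort by descending weight, breaking ties alphabetically.
        ranked = sorted(w, key=lambda word: (-w[word], word))
        transactions.append(frozenset(ranked[:top_k]))
    return transactions


def apriori(transactions, min_support=0.5):
    """Level-wise Apriori: return {itemset: support} for all frequent itemsets."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while level:
        current = {c: support(c) for c in level}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence=0.6):
    """Return sorted (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    rhs = itemset - lhs
                    found.append(
                        (tuple(sorted(lhs)), tuple(sorted(rhs)), sup, confidence)
                    )
    return sorted(found)


if __name__ == "__main__":
    transactions = keyword_transactions(DOCS)
    for rule in association_rules(apriori(transactions, min_support=0.5)):
        print(rule)
```

In this sketch, the per-document TF-IDF weights together with the mined rules play the role of the "associative features" mentioned in the abstract. A real pipeline would replace the weighting function with the paper's TF-C-IDF and tune the support and confidence thresholds on the actual corpus.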