Journal
JOURNAL OF INFORMATION SCIENCE
Volume 33, Issue 6, Pages 660-677
Publisher
SAGE PUBLICATIONS LTD
DOI: 10.1177/0165551506076401
Keywords
text mining; data quality; knowledge discovery; term similarity; text cleaning
Abstract
This research demonstrates the development of a 'concept-clumping algorithm' designed to improve the clustering of technical concepts. The algorithm first identifies technically relevant noun phrases from a cleaned, extracted term list and then applies rule-based matching to identify synonymous terms based on the words they share. An assessment found that the algorithm achieves 89-91% precision, moves technically important terms higher in the term frequency list, and improves the technical specificity of term clusters.
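The idea of grouping terms by shared words can be illustrated with a minimal sketch. The abstract does not give the paper's actual rules, so everything here is an assumption made for illustration: the function names `content_words` and `clump_terms`, the small stopword list, the `min_shared` threshold, and the greedy "most frequent term becomes the representative" strategy are hypothetical and are not the authors' method.

```python
# Illustrative sketch only: the rules below are assumptions, not the paper's algorithm.
STOPWORDS = {"of", "the", "a", "an", "for", "and", "in", "on"}  # assumed stopword list


def content_words(term):
    """Lowercase a term and return its non-stopword words as a frozenset."""
    return frozenset(w for w in term.lower().split() if w not in STOPWORDS)


def clump_terms(term_counts, min_shared=2):
    """Greedily merge terms that share at least `min_shared` content words.

    term_counts: dict mapping a noun phrase to its frequency.
    Returns a dict mapping each representative term to
    (combined frequency, list of member terms).
    """
    clumps = []
    # Visit frequent terms first so the most common variant becomes the representative.
    ordered = sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for term, count in ordered:
        words = content_words(term)
        for clump in clumps:
            # Compare against the representative's words only.
            if len(words & clump["words"]) >= min_shared:
                clump["count"] += count
                clump["members"].append(term)
                break
        else:
            # No existing clump shares enough words: start a new one.
            clumps.append(
                {"rep": term, "words": words, "count": count, "members": [term]}
            )
    return {c["rep"]: (c["count"], c["members"]) for c in clumps}


if __name__ == "__main__":
    counts = {
        "solid oxide fuel cell": 5,
        "oxide fuel cell": 3,
        "fuel cell stack": 2,
        "data quality": 4,
    }
    for rep, (total, members) in clump_terms(counts).items():
        print(rep, total, members)
```

In this toy input, the three fuel-cell phrases collapse into one clump whose combined frequency exceeds that of any single variant, which mirrors the abstract's claim that clumping moves technically important terms higher in the term frequency list. A realistic version would also need stemming or lemmatization and more careful rules to avoid merging related but non-synonymous terms.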