4.3 Article

A Visual Analytics Approach for Interactive Document Clustering

出版社

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3241380

关键词

Interactive document clustering; key-term; visualization; document projection; user study; text; email list; seeding; deterministic

资金

  1. Natural Sciences and Engineering Research Council of Canada
  2. Boeing Company (Canada)
  3. CNPq
  4. FAPESP (Brazil)
  5. International Development Research Center, Ottawa, Canada

向作者/读者索取更多资源

Document clustering is a necessary step in various analytical and automated activities. When guided by the user, algorithms are tailored to imprint a perspective on the clustering process that reflects the user's understanding of the dataset. More than just allow for customized adjustment of the clusters, a visual analytics approach will provide tools for the user to draw new insights on the collection. While contributing his or her perspective, the user will also acquire a deeper understanding of the data set. To that effect, we propose a novel visual analytics system for interactive document clustering. We built our system on top of clustering algorithms that can adapt to user's feedback. In the proposed system, initial clustering is created based on the user-defined number of clusters and the selected clustering algorithm. A set of coordinated visualizations allow the examination of the dataset and the results of the clustering. The visualization provides the user with the highlights of individual documents and understanding of the evolution of documents over the time period to which they relate. The users then interact with the process by means of changing key-terms that drive the process according to their knowledge of the documents domain. In key-term-based interaction, the user assigns a set of key-terms to each target cluster to guide the clustering algorithm. We have improved that process with a novel algorithm for choosing proper seeds for the clustering. Results demonstrate that not only the system has improved considerably its precision, but also its effectiveness in the document-based decision making. A set of quantitative experiments and a user study have been conducted to show the advantages of the approach for document analytics based on clustering. We performed and reported on the use of the framework in a real decision-making scenario that relates users discussion by email to decision making in improving patient care. Results show that the framework is useful even for more complex data sets such as email conversations.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.3
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据