3.8 Proceedings Paper

Software Clusterings with Vector Semantics and the Call Graph

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3338906.3342483

Keywords

document embeddings; doc2vec; linux kernel; natural language processing; software architecture recovery; software clustering; static graph analysis; vector semantics

Funding

  1. European Union's Horizon 2020 Research and Innovation Programme [732223]

In this paper, we propose a novel method to determine a software system's modules without knowledge of its architectural structure, and we empirically validate the method's performance. We cluster files by combining document embeddings, generated with the Doc2Vec algorithm, and the call graph, provided by static graph analyzers, into an augmented graph. We use the Louvain algorithm to determine its community structure and propose a module-level clustering. Our method performs better in terms of stability, authoritativeness, and extremity than other state-of-the-art clustering methods proposed in the literature, and it recovers the ground-truth clustering of the Linux kernel reasonably well. Finally, we conclude that semantic information from vector semantics, combined with the call graph, can produce accurate results for software clusterings of large systems.
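
The abstract does not say how the embeddings and the call graph are merged, so the following is only a minimal sketch of one plausible way to build such an augmented graph. It assumes that each file already has an embedding vector (for example from Doc2Vec), that semantic similarity is measured with cosine similarity, and that the two edge sources are mixed with a weight `alpha`. The function name `augmented_graph`, the parameters `alpha` and `min_sim`, and the toy vectors are all hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def augmented_graph(embeddings, call_edges, alpha=0.5, min_sim=0.0):
    """Merge semantic and call-graph edges into one weighted undirected graph.

    embeddings: {file name: embedding vector}
    call_edges: iterable of (caller file, callee file) pairs
    alpha:      assumed mixing weight for the semantic part (hypothetical)
    min_sim:    assumed cutoff; semantic edges need similarity above it
    Returns {(file_a, file_b): weight} with file_a < file_b.
    """
    weights = defaultdict(float)
    files = sorted(embeddings)

    # Semantic edges: one per file pair whose similarity exceeds the cutoff.
    for i, a in enumerate(files):
        for b in files[i + 1:]:
            sim = cosine(embeddings[a], embeddings[b])
            if sim > min_sim:
                weights[(a, b)] += alpha * sim

    # Structural edges: each call between two distinct files adds weight,
    # so repeated calls between the same pair accumulate.
    for caller, callee in call_edges:
        if caller != callee:
            weights[tuple(sorted((caller, callee)))] += 1.0 - alpha

    return dict(weights)


# Toy input: made-up two-dimensional vectors standing in for Doc2Vec output.
embeddings = {
    "fs/inode.c": [1.0, 0.0],
    "fs/namei.c": [1.0, 0.0],
    "net/socket.c": [0.0, 1.0],
}
call_edges = [("fs/namei.c", "fs/inode.c"), ("net/socket.c", "fs/inode.c")]

graph = augmented_graph(embeddings, call_edges)
for pair, weight in sorted(graph.items()):
    print(pair, round(weight, 3))
```

The resulting edge weights could then be loaded into a graph library and passed to a Louvain implementation, for instance NetworkX's `louvain_communities` with `weight="weight"`, to obtain file-level communities as module candidates.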
