Journal
Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.1145/3338906.3342483
Keywords
document embeddings; doc2vec; linux kernel; natural language processing; software architecture recovery; software clustering; static graph analysis; vector semantics
Funding
- European Union's Horizon 2020 Research and Innovation Programme [732223]
In this paper, we propose a novel method to determine a software system's modules without knowledge of its architectural structure, and we empirically validate the method's performance. We cluster files by combining document embeddings, generated with the Doc2Vec algorithm, and the call graph, provided by static graph analyzers, into an augmented graph. We then use the Louvain algorithm to determine the community structure of this graph and propose a module-level clustering. Our method outperforms other state-of-the-art clustering methods proposed in the literature in terms of stability, authoritativeness, and extremity, and it recovers the ground-truth clustering of the Linux kernel reasonably well. We conclude that combining semantic information from vector semantics with the call graph can produce accurate clusterings of large software systems.
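The pipeline described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not say how embeddings and call edges are merged, so the linear mix controlled by `alpha`, the `threshold` cutoff, the function names, and the toy file names and vectors are all assumptions made for illustration. In practice the vectors would come from a trained Doc2Vec model and the partition from a Louvain implementation. Here the sketch only builds an augmented graph and scores two candidate partitions with modularity, the quantity Louvain greedily optimizes.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def augmented_graph(embeddings, call_edges, alpha=0.5, threshold=0.0):
    """Build an undirected weighted graph {frozenset({a, b}): weight}.

    Assumed weighting (not taken from the paper):
    weight = alpha * max(cosine, 0) + (1 - alpha) * [a and b share a call edge].
    """
    calls = {frozenset(edge) for edge in call_edges if edge[0] != edge[1]}
    graph = {}
    for a, b in combinations(sorted(embeddings), 2):
        similarity = max(cosine(embeddings[a], embeddings[b]), 0.0)
        has_call = 1.0 if frozenset((a, b)) in calls else 0.0
        weight = alpha * similarity + (1 - alpha) * has_call
        if weight > threshold:
            graph[frozenset((a, b))] = weight
    return graph


def modularity(graph, partition):
    """Weighted modularity: sum over communities of L_c/m - (d_c/(2m))^2."""
    total = sum(graph.values())
    if total == 0:
        return 0.0
    strength = {}
    for edge, weight in graph.items():
        for node in edge:
            strength[node] = strength.get(node, 0.0) + weight
    score = 0.0
    for community in partition:
        internal = sum(w for edge, w in graph.items() if edge <= community)
        degree = sum(strength.get(node, 0.0) for node in community)
        score += internal / total - (degree / (2 * total)) ** 2
    return score


# Toy data: hypothetical file names and 2-d stand-ins for Doc2Vec vectors.
embeddings = {
    "sched.c": [1.0, 0.1],
    "fork.c": [0.9, 0.2],
    "tcp.c": [0.1, 1.0],
    "udp.c": [0.2, 0.9],
}
call_edges = [("sched.c", "fork.c"), ("tcp.c", "udp.c"), ("fork.c", "tcp.c")]

graph = augmented_graph(embeddings, call_edges)
by_subsystem = [{"sched.c", "fork.c"}, {"tcp.c", "udp.c"}]
mixed = [{"sched.c", "tcp.c"}, {"fork.c", "udp.c"}]

print("modularity (by subsystem):", round(modularity(graph, by_subsystem), 3))
print("modularity (mixed):", round(modularity(graph, mixed), 3))
```

In this toy setup the partition that groups files with similar vectors and shared call edges scores higher than the mixed one, which is the signal a modularity-based community detector would follow. A real run would replace the exhaustive pairwise loop with something sparser, since comparing every file pair does not scale to a code base the size of the Linux kernel.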