Proceedings Paper

Software Clusterings with Vector Semantics and the Call Graph

Publisher

Association for Computing Machinery (ACM)
DOI: 10.1145/3338906.3342483

Keywords

document embeddings; doc2vec; linux kernel; natural language processing; software architecture recovery; software clustering; static graph analysis; vector semantics

Funding

  1. European Union's Horizon 2020 Research and Innovation Programme [732223]

Abstract

In this paper, we propose a novel method for determining a software system's modules without prior knowledge of its architectural structure, and we empirically validate the method's performance. We cluster files by combining document embeddings, generated with the Doc2Vec algorithm, with the call graph, extracted by static graph analyzers, into a single augmented graph. We then apply the Louvain algorithm to detect the community structure of this augmented graph and derive a module-level clustering from it. Our method outperforms other state-of-the-art clustering methods from the literature in terms of stability, authoritativeness, and extremity, and it recovers the ground-truth clustering of the Linux kernel reasonably well. We conclude that semantic information from vector semantics, together with the call graph, can produce accurate software clusterings for large systems.
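The pipeline described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the toy two-dimensional vectors stand in for Doc2Vec embeddings, the file names and call edges are invented, and the mixing rule (the hypothetical parameters `alpha` and `min_sim`) is only one plausible way to merge semantic similarity with call edges. The clustering step implements only the local-moving phase of the Louvain algorithm, without the aggregation phase of the full method.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_augmented_graph(embeddings, call_edges, alpha=0.5, min_sim=0.5):
    """Merge semantic and call-graph edges into one weighted, undirected graph.

    Assumption: edge weight = alpha * cosine similarity (kept only if it is
    at least min_sim) + (1 - alpha) for each call between the two files.
    """
    weights = defaultdict(float)
    files = sorted(embeddings)
    for i, a in enumerate(files):
        for b in files[i + 1:]:
            sim = cosine(embeddings[a], embeddings[b])
            if sim >= min_sim:
                weights[frozenset((a, b))] += alpha * sim
    for caller, callee in call_edges:
        if caller != callee:  # ignore self-calls
            weights[frozenset((caller, callee))] += 1 - alpha
    return dict(weights)

def local_moving(nodes, weights):
    """First phase of Louvain: move each node to the neighbouring community
    with the highest modularity gain until no move improves modularity."""
    adj = {n: {} for n in nodes}
    for pair, w in weights.items():
        a, b = tuple(pair)
        adj[a][b] = w
        adj[b][a] = w
    degree = {n: sum(adj[n].values()) for n in nodes}
    two_m = sum(degree.values())
    community = {n: n for n in nodes}  # every node starts alone
    if two_m == 0:
        return community
    comm_degree = dict(degree)  # total degree per community
    improved = True
    while improved:
        improved = False
        for n in nodes:
            current = community[n]
            comm_degree[current] -= degree[n]  # take n out of its community
            links = defaultdict(float)  # weight from n to each community
            for nb, w in adj[n].items():
                links[community[nb]] += w
            best = current
            best_gain = links[current] - comm_degree[current] * degree[n] / two_m
            for c, k_in in links.items():
                gain = k_in - comm_degree[c] * degree[n] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            community[n] = best
            comm_degree[best] += degree[n]
            if best != current:
                improved = True
    return community

# Toy stand-ins for Doc2Vec embeddings (hypothetical files and values).
embeddings = {
    "fs_a": [1.0, 0.1], "fs_b": [0.9, 0.2], "fs_c": [0.95, 0.0],
    "net_a": [0.1, 1.0], "net_b": [0.0, 0.9], "net_c": [0.2, 0.95],
}
# Toy call graph as (caller, callee) pairs, including one cross-module call.
call_edges = [("fs_a", "fs_b"), ("fs_c", "fs_a"), ("net_a", "net_b"),
              ("net_c", "net_b"), ("fs_b", "net_a")]

graph = build_augmented_graph(embeddings, call_edges)
modules = local_moving(sorted(embeddings), graph)
```

On this toy input, the single cross-module call is outweighed by the edges that both the embeddings and the call graph agree on, so the two groups of files end up in separate communities. A real pipeline would train the embeddings on source files and would typically rely on an established Louvain implementation rather than this reduced version.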
