4.6 Article

A Fast Clustering Algorithm for Modularization of Large-Scale Software Systems

Journal

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
Volume 48, Issue 4, Pages 1451-1462

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TSE.2020.3022212

Keywords

Clustering algorithms; Software algorithms; Search problems; Software systems; Semantics; Software architecture; Software clustering; software modularization; software maintenance; software comprehension; architecture recovery

Ask authors/readers for more resources

This paper presents a new and fast clustering algorithm, FCA, that overcomes the time and space constraints of existing algorithms by performing operations on the dependency matrix and extracting other matrices. Experimental results show that the proposed algorithm achieves higher quality modularization compared to hierarchical algorithms and can compete with search-based algorithms and a clustering algorithm based on subsystem patterns. Additionally, the running time of the proposed algorithm is much shorter than that of other algorithms.
A software system evolves over time in order to meet the needs of users. Understanding a program is the most important step to apply new requirements. Clustering techniques through dividing a program into small and meaningful parts make it possible to understand the program. In general, clustering algorithms are classified into two categories: hierarchical and non-hierarchical algorithms (such as search-based approaches). While clustering problems generally tend to be NP-hard, search-based algorithms produce acceptable clustering and have time and space constraints and hence they are inefficient in large-scale software systems. Most algorithms which currently used in software clustering fields do not scale well when applied to large and very large applications. In this paper, we present a new and fast clustering algorithm, FCA, that can overcome space and time constraints of existing algorithms by performing operations on the dependency matrix and extracting other matrices based on a set of features. The experimental results on ten small-sized applications, ten folders with different functionalities from Mozilla Firefox, a large-sized application (namely ITK), and a very large-sized application (namely Chromium) demonstrate that the proposed algorithm achieves higher quality modularization compared with hierarchical algorithms. It can also compete with search-based algorithms and a clustering algorithm based on subsystem patterns. But the running time of the proposed algorithm is much shorter than that of the hierarchical and non-hierarchical algorithms. The source code of the proposed algorithm can be accessed at https://github.com/SoftwareMaintenanceLab.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.6
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available