
Software reuse analytics using integrated random forest and gradient boosting machine learning algorithm

Journal

SOFTWARE-PRACTICE & EXPERIENCE
Volume 51, Issue 4, Pages 735-747

Publisher

WILEY
DOI: 10.1002/spe.2921

Keywords

AdaBoostM1; confusion matrix; DecisionStump; gradient boosting machine; J48; JRip; LMT; LogitBoost; one R; part; random forest; software metrics; software reuse


Cleaner Production is central to sustainable production in companies, and for software enterprises software reuse plays the pivotal role. This paper introduces an integrated Random Forest and Gradient Boosting Machine learning algorithm (RFGBM) that tests whether given software code is reusable, and reports improved accuracy, error rate, Relative Absolute Error, and Mean Absolute Error compared with existing algorithms.
Cleaner Production (CP) is regarded as influential for achieving sustainable production in production companies. CP mainly deals with the three R's: reuse, reduce, and recycle. For software enterprises, software reuse plays a pivotal role. Software reuse is the process of producing new products or software from existing software by updating it. Data mining is used to extract useful information from the existing software. The algorithms used for software reuse face issues related to maintenance cost, accuracy, and performance, and the currently used algorithms do not give accurate results on whether a software component can be reused. Machine learning gives the best results for predicting whether a given software component is reusable. This paper introduces an integrated Random Forest and Gradient Boosting Machine learning algorithm (RFGBM), which tests the reusability of given software code using object-oriented parameters such as cohesion, coupling, cyclomatic complexity, bugs, number of children, and depth of inheritance tree. The proposed algorithm is compared with the J48, AdaBoostM1, LogitBoost, Part, One R, LMT, JRip, and DecisionStump algorithms. Performance metrics such as accuracy, error rate, Relative Absolute Error, and Mean Absolute Error are improved using RFGBM. The algorithm also preprocesses the data with an unsupervised filter that removes missing values, which improves efficiency. The proposed algorithm outperforms the existing algorithms on these performance parameters.
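The four performance metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the definitions below are assumptions: labels are encoded as 1 (reusable) and 0 (not reusable), Mean Absolute Error is the average absolute difference between predicted and actual labels, and Relative Absolute Error divides the total absolute error by that of a baseline that always predicts the mean actual label. The function name `evaluate` and the sample labels are hypothetical, not taken from the paper.

```python
from statistics import mean


def evaluate(actual, predicted):
    """Compute accuracy, error rate, MAE and RAE for 0/1 labels.

    Assumed definitions (not taken from the paper):
      MAE = mean(|predicted - actual|)
      RAE = sum(|predicted - actual|) / sum(|actual - mean(actual)|)
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")

    n = len(actual)
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    accuracy = correct / n

    abs_errors = [abs(p - a) for a, p in zip(actual, predicted)]
    mae = sum(abs_errors) / n

    # Baseline predictor: always output the mean of the actual labels.
    baseline = mean(actual)
    baseline_error = sum(abs(a - baseline) for a in actual)
    # RAE is undefined when every actual label is identical.
    rae = sum(abs_errors) / baseline_error if baseline_error else float("nan")

    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "mae": mae,
        "rae": rae,
    }


if __name__ == "__main__":
    # Hypothetical labels for eight components: 1 = reusable, 0 = not reusable.
    actual = [1, 0, 1, 1, 0, 1, 0, 0]
    predicted = [1, 0, 1, 0, 0, 1, 1, 0]
    print(evaluate(actual, predicted))
```

Under these definitions, an RAE below 1 means the classifier's total error is smaller than that of the constant baseline. Tools that compute these metrics on predicted class probabilities rather than hard labels will give different values.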
