☆ 3.8 Proceedings Paper

Towards better ML-based software services: an investigation of source code engineering impact

2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE SERVICES ENGINEERING, SSE (2023)

期刊

2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE SERVICES ENGINEERING, SSE

卷 -, 期 -, 页码 128-137

出版社

IEEE COMPUTER SOC

DOI: 10.1109/SSE60056.2023.00027

关键词

Software service; source code engineering; parsing; machine learning; data engineering

类别

Computer Science, Software Engineering

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

In recent years, there has been rapid development in machine learning-based software service solutions, specifically for source code. However, the impact of source code engineering on these models is often overlooked. This study evaluates different parsing tools for their impact on a Code2Vec model's prediction task for method names in Java language. The results show that ASTs generated by different parsing tools vary significantly in terms of source code structures and contents, which can significantly affect the model's performance. Therefore, the selection of appropriate parsing tools during data pre-processing is crucial for machine learning models implemented in software services.

In recent years, the development of machine learning-based solutions for software services, particularly for source code, has grown rapidly. It is witnessed that many machine learning models for software services require the input of source code snippets in a desired form of abstract syntax tree (AST), which is mostly generated from an external tool. However, such data pre-processing tasks could be done by different engineering tools, and the impact of these tools towards final models is often neglected. In this work, we aim to investigate the source code engineering impacts towards machine learning-based software services. Three different types of parsing tools are identified, which are parser generator, parsing library and parser developed for a certain purpose. They are thoroughly evaluated towards the impacts on the prediction model of Code2Vec for the prediction task of the method name in Java language. The collective result on the Java-small dataset shows that the generated ASTs differ a lot in terms of source code structures and contents when using different parsing tools. The difference could influence the performance of the trained model significantly. Our result suggests that when machine learning models are implemented for software services, especially for code-related tasks, the selection of parsing tools should be thoroughly considered during the data pre-processing stage. While there are some interesting findings on Java-med and Java-small, we anticipate this work could provide some insights for better ML-based software service solutions from the perspective of source code engineering.

Towards better ML-based software services: an investigation of source code engineering impact

期刊

2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE SERVICES ENGINEERING, SSE

出版社

IEEE COMPUTER SOC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Towards better ML-based software services: an investigation of source code engineering impact

期刊

2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE SERVICES ENGINEERING, SSE

出版社

IEEE COMPUTER SOC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文