Article

Hierarchical features extraction and data reorganization for code search

JOURNAL OF SYSTEMS AND SOFTWARE (2024)

Journal

JOURNAL OF SYSTEMS AND SOFTWARE

Volume 208

Publisher

ELSEVIER SCIENCE INC

DOI: 10.1016/j.jss.2023.111896

Keywords

AI in SE; Code search; Transformer-based architecture

Categories

Computer Science, Software Engineering; Computer Science, Theory & Methods

Summary

This study proposes a novel method called HFEDR that utilizes the hierarchical features of Transformer models and reorganizes training data to improve code search performance. Experimental results demonstrate the effectiveness and rationality of the proposed approach.

Abstract

Given a natural language query, code search aims to retrieve relevant code snippets from a codebase. Recent works mainly rely on transformer-based pretrained models to measure the matching degree between queries and code. Compared with works that rely on earlier deep learning methods, such as LSTM and Attention, they significantly improve performance on code search tasks. However, the different layers of transformer-based models capture different features that are intuitive and efficient for understanding the semantics of code and queries, yet these features are rarely considered. Moreover, existing methods do not consider further increasing the amount of training data during training to improve the model's performance.

To this end, we propose a novel method called HFEDR, which utilizes the hierarchical features of transformer-based models and reorganizes the original training data during the training phase. Specifically, we first extract high-level and low-level features of queries and code from the higher and lower layers of GraphCodeBERT, respectively, achieving a multi-view and comprehensive semantic representation. After that, we organize the original training data into hierarchical-uncorrelated feature pairs and then reorganize them into hierarchical-correlated feature pairs, so that the model is trained on more data. Finally, we update the model's parameters using a contrastive training method. We conduct extensive experiments on CodeSearchNet, demonstrating the effectiveness and rationality of our proposed approach.
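The abstract names the ingredients (features from a lower and a higher layer, several query/code feature pairings, a contrastive objective) but not their exact form. The sketch below is one possible reading, not the authors' implementation: the toy vectors stand in for per-layer GraphCodeBERT outputs, the four pairings built in `level_pairs`, the cosine similarity, the in-batch InfoNCE loss, and the temperature of 0.05 are all assumptions made for illustration, and the names `level_pairs` and `info_nce` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def info_nce(queries, codes, temperature=0.05):
    """Mean in-batch contrastive loss; queries[i] is the positive for codes[i].

    Assumption: the abstract only says "contrastive training"; InfoNCE with
    in-batch negatives is a common choice, used here as a stand-in.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, c) / temperature for c in codes]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


def level_pairs(q_layers, c_layers, low=0, high=-1):
    """Build query/code batches from a lower and a higher layer.

    q_layers[i][k] is the vector of example i at layer k (same for c_layers).
    Assumption: same-level and cross-level pairings are one way to obtain
    more training pairs from the same data; the paper may define them differently.
    """
    q_low = [layers[low] for layers in q_layers]
    q_high = [layers[high] for layers in q_layers]
    c_low = [layers[low] for layers in c_layers]
    c_high = [layers[high] for layers in c_layers]
    return {
        "low-low": (q_low, c_low),
        "high-high": (q_high, c_high),
        "low-high": (q_low, c_high),
        "high-low": (q_high, c_low),
    }


# Toy data: 2 query/code examples, 2 layers each, 2-dimensional vectors.
q_layers = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
c_layers = [[[0.8, 0.2], [1.0, 0.0]], [[0.2, 0.8], [0.0, 1.0]]]

pairs = level_pairs(q_layers, c_layers)
losses = {name: info_nce(q, c) for name, (q, c) in pairs.items()}
for name, value in losses.items():
    print(f"{name}: {value:.6f}")
```

In a training loop, the per-pairing losses would typically be summed or averaged before the parameter update; how HFEDR weights them is not stated in the abstract.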
