Article

Hierarchical features extraction and data reorganization for code search

Journal

JOURNAL OF SYSTEMS AND SOFTWARE
Volume 208

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.jss.2023.111896

Keywords

AI in SE; Code search; Transformer-based architecture


This study proposes a novel method called HFEDR that utilizes the hierarchical features of transformer-based models and reorganizes the training data to improve code search performance. Experimental results demonstrate the effectiveness and rationality of the proposed approach.

Given a natural language query, code search aims to retrieve relevant code snippets from a codebase. Recent works mainly rely on transformer-based pretraining models to measure the degree of matching between queries and code snippets. Compared with works that rely on earlier deep learning methods, such as LSTMs and attention mechanisms, they can significantly improve the performance of code search tasks. However, the different layers of transformer-based models capture different features that are intuitive and efficient for understanding the semantics of code and queries, yet these are rarely considered. Moreover, existing methods do not consider further increasing the amount of training data during training to improve the model's performance.

Toward this end, we propose a novel method called HFEDR, which utilizes the hierarchical features of transformer-based models and reorganizes the original training data during the training phase. Specifically, we first extract high-level and low-level features of queries and codes from the higher and lower layers of GraphCodeBERT, respectively, achieving a multi-view and comprehensive semantic representation. After that, we organize the original training data into hierarchical-uncorrelated feature pairs and then reorganize them into hierarchical-correlated feature pairs, which allows the model to be trained with more data. Finally, we update the model's parameters using a contrastive training method. We conduct extensive experiments on CodeSearchNet, demonstrating the effectiveness and rationality of our proposed approach.
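The pipeline the abstract describes — pooling hidden states from a lower and a higher transformer layer into a combined representation, then training with a contrastive objective over query–code pairs — might be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the layer indices, mean pooling, feature concatenation, and the InfoNCE-style loss with temperature are illustrative choices, not the paper's exact configuration.

```python
import numpy as np

def mean_pool(hidden, mask):
    # hidden: (batch, seq, dim); mask: (batch, seq) with 1 for real tokens
    m = mask[..., None]
    return (hidden * m).sum(axis=1) / np.maximum(m.sum(axis=1), 1e-9)

def hierarchical_embed(all_layers, mask, low=3, high=11):
    # Concatenate a low-level view (earlier layer) with a high-level view
    # (later layer); layer indices 3 and 11 are assumptions for illustration.
    return np.concatenate([mean_pool(all_layers[low], mask),
                           mean_pool(all_layers[high], mask)], axis=-1)

def contrastive_loss(q, c, tau=0.05):
    # InfoNCE-style objective: matching query/code pairs (the diagonal)
    # are positives; all other in-batch pairs serve as negatives.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    logits = (q @ c.T) / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_prob)))

# Simulated hidden states standing in for a 12-layer encoder's outputs
# (embeddings + 12 layers); batch of 4, sequence length 16, dim 8.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((4, 16, 8)) for _ in range(13)]
mask = np.ones((4, 16))

q_emb = hierarchical_embed(layers, mask)   # query-side representation
c_emb = hierarchical_embed(layers, mask)   # code-side representation
loss = contrastive_loss(q_emb, c_emb)
print(q_emb.shape, round(loss, 4))
```

Concatenating the two pooled layers yields a representation twice the hidden width, giving the retrieval head access to both lexical (lower-layer) and semantic (higher-layer) signals; in practice the layer choices and pooling would be tuned on the target corpus.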

