Article

Hierarchical features extraction and data reorganization for code search

Journal

JOURNAL OF SYSTEMS AND SOFTWARE
Volume 208

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.jss.2023.111896

Keywords

AI in SE; Code search; Transformer-based architecture


This study proposes a novel method called HFEDR that utilizes the hierarchical features of transformer-based models and reorganizes the training data to improve code search performance. Experimental results demonstrate the effectiveness and rationality of the proposed approach.

Given a natural language query, code search aims to retrieve relevant code snippets from a codebase. Recent works mainly rely on transformer-based pretraining models to measure the degree of matching between queries and code snippets. Compared with works that rely on earlier deep learning methods, such as LSTMs and attention mechanisms, they can significantly improve the performance of code search tasks. However, the different layers of transformer-based models capture different features that are intuitive and efficient for understanding the semantics of code and queries, yet these are rarely considered. Moreover, existing methods do not consider further increasing the amount of training data during training to improve the model's performance.

Toward this end, we propose a novel method called HFEDR, which utilizes the hierarchical features of transformer-based models and reorganizes the original training data during the training phase. Specifically, we first extract high-level and low-level features of queries and codes from the higher and lower layers of GraphCodeBERT, respectively, achieving a multi-view and comprehensive semantic representation. After that, we organize the original training data into hierarchical-uncorrelated feature pairs and then reorganize them into hierarchical-correlated feature pairs, which allows the model to be trained with more data. Finally, we update the model's parameters using a contrastive training method. We conduct extensive experiments on CodeSearchNet, demonstrating the effectiveness and rationality of our proposed approach.
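The pipeline the abstract describes — pooling hidden states from a lower and a higher transformer layer into a combined representation, then training with a contrastive objective over query–code pairs — might be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the layer indices, mean pooling, feature concatenation, and the InfoNCE-style loss with temperature are illustrative choices, not the paper's exact configuration.

```python
import numpy as np

def mean_pool(hidden, mask):
    # hidden: (batch, seq, dim); mask: (batch, seq) with 1 for real tokens
    m = mask[..., None]
    return (hidden * m).sum(axis=1) / np.maximum(m.sum(axis=1), 1e-9)

def hierarchical_embed(all_layers, mask, low=3, high=11):
    # Concatenate a low-level view (earlier layer) with a high-level view
    # (later layer); layer indices 3 and 11 are assumptions for illustration.
    return np.concatenate([mean_pool(all_layers[low], mask),
                           mean_pool(all_layers[high], mask)], axis=-1)

def contrastive_loss(q, c, tau=0.05):
    # InfoNCE-style objective: matching query/code pairs (the diagonal)
    # are positives; all other in-batch pairs serve as negatives.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    logits = (q @ c.T) / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_prob)))

# Simulated hidden states standing in for a 12-layer encoder's outputs
# (embeddings + 12 layers); batch of 4, sequence length 16, dim 8.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((4, 16, 8)) for _ in range(13)]
mask = np.ones((4, 16))

q_emb = hierarchical_embed(layers, mask)   # query-side representation
c_emb = hierarchical_embed(layers, mask)   # code-side representation
loss = contrastive_loss(q_emb, c_emb)
print(q_emb.shape, round(loss, 4))
```

Concatenating the two pooled layers yields a representation twice the hidden width, giving the retrieval head access to both lexical (lower-layer) and semantic (higher-layer) signals; in practice the layer choices and pooling would be tuned on the target corpus.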

