Proceedings Paper

PTM4Tag: Sharpening Tag Recommendation of Stack Overflow Posts with Pre-trained Models

Publisher

IEEE COMPUTER SOC
DOI: 10.1145/3524610.3527897

Keywords

Tag Recommendation; Transformer; Pre-Trained Models

Funding

  1. National Research Foundation, Singapore


This article introduces PTM4Tag, a tag recommendation framework that utilizes pre-trained language models. By modeling the title, description, and code of Stack Overflow posts with independent encoders, and using CodeBERT, a software engineering domain-specific pre-trained model, PTM4Tag achieves the best performance among the evaluated models. The results show that the title is the most important component in predicting relevant tags.
Stack Overflow is often viewed as one of the most influential Software Question & Answer (SQA) websites, containing millions of programming-related questions and answers. Tags play a critical role in efficiently structuring the contents in Stack Overflow and are vital to support a range of site operations, e.g., querying relevant contents. Poorly selected tags often introduce extra noise and redundancy, which gives rise to problems such as tag synonyms and tag explosion. Thus, an automated tag recommendation technique that can accurately recommend high-quality tags is desired to alleviate these problems. Inspired by the recent success of pre-trained language models (PTMs) in natural language processing (NLP), we present PTM4Tag, a tag recommendation framework for Stack Overflow posts that utilizes PTMs with a triplet architecture, which models the components of a post, i.e., Title, Description, and Code, with independent language models. To the best of our knowledge, this is the first work that leverages PTMs in the tag recommendation task of SQA sites. We comparatively evaluate the performance of PTM4Tag based on five popular pre-trained models: BERT, RoBERTa, ALBERT, CodeBERT, and BERTOverflow. Our results show that leveraging CodeBERT, a software engineering (SE) domain-specific PTM, in PTM4Tag achieves the best performance among the five considered PTMs and outperforms the state-of-the-art Convolutional Neural Network-based approach by a large margin in terms of average Precision@k, Recall@k, and F1-score@k. We conduct an ablation study to quantify the contribution of a post's constituent components (Title, Description, and Code Snippets) to the performance of PTM4Tag. Our results show that Title is the most important in predicting the most relevant tags, and utilizing all the components achieves the best performance.
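
To make the triplet architecture concrete, below is a minimal sketch of how a post's Title, Description, and Code could be encoded by three independent pre-trained models and fused for multi-label tag classification. It is illustrative only: the microsoft/codebert-base checkpoint is the public CodeBERT release on Hugging Face, but the mean pooling, the linear classifier head, the tag-vocabulary size (num_tags=100), and the precision_at_k helper are assumptions made for this sketch, not the paper's exact implementation.

    # Illustrative sketch of a PTM4Tag-style triplet architecture.
    # Pooling, head, and tag vocabulary size are assumptions, not the
    # paper's exact design.
    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    class TripletTagger(nn.Module):
        def __init__(self, checkpoint="microsoft/codebert-base", num_tags=100):
            super().__init__()
            # One independent encoder per post component.
            self.title_enc = AutoModel.from_pretrained(checkpoint)
            self.desc_enc = AutoModel.from_pretrained(checkpoint)
            self.code_enc = AutoModel.from_pretrained(checkpoint)
            hidden = self.title_enc.config.hidden_size
            # Fuse the three component embeddings, then score every tag.
            self.classifier = nn.Linear(3 * hidden, num_tags)

        @staticmethod
        def _pool(output, mask):
            # Mean-pool token embeddings, ignoring padding positions.
            emb = output.last_hidden_state
            mask = mask.unsqueeze(-1).float()
            return (emb * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

        def forward(self, title, desc, code):
            # Each argument is a tokenizer output with input_ids and
            # attention_mask for one component of the post.
            t = self._pool(self.title_enc(**title), title["attention_mask"])
            d = self._pool(self.desc_enc(**desc), desc["attention_mask"])
            c = self._pool(self.code_enc(**code), code["attention_mask"])
            return self.classifier(torch.cat([t, d, c], dim=-1))  # tag logits

    def precision_at_k(pred_tags, true_tags, k=5):
        # Fraction of the top-k recommended tags that are ground-truth tags.
        return len(set(pred_tags[:k]) & set(true_tags)) / k

    tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
    model = TripletTagger()
    enc = lambda s: tok(s, return_tensors="pt", truncation=True, padding=True)
    logits = model(enc("How to sort a dict by value?"),
                   enc("I have a dictionary and want it sorted by its values."),
                   enc("sorted(d.items(), key=lambda kv: kv[1])"))
    top5 = torch.sigmoid(logits).topk(5).indices  # recommend 5 tags per post

Keeping the three encoders separate lets each one specialize in the distinct statistical properties of natural-language titles, longer descriptions, and source code, which is the intuition behind the triplet design described in the abstract.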
