Proceedings Paper

Cross-Language Code Search using Static and Dynamic Analyses

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3468264.3468538

Keywords

code-to-code search; cross-language code search; non-dominated sorting; static analysis; dynamic analysis

Funding

  1. National Science Foundation under NSF SHF [1645136, 1749936, 2006947]
  2. Directorate for Computer & Information Science & Engineering, Division of Computing and Communication Foundations [1749936, 2006947]. Funding source: National Science Foundation
  3. Directorate for Computer & Information Science & Engineering, Division of Computing and Communication Foundations [1645136]. Funding source: National Science Foundation

Code-to-Code Search Across Languages (COSAL) is a cross-language technique that combines static and dynamic analyses to identify similar code without the need for a machine learning model. It ranks code snippets using non-dominated sorting based on code token, structural, and behavioral similarity, outperforming current within-language and cross-language code-to-code search tools in terms of precision and recall. COSAL shows promise for practical, multi-language code search on large open-source repositories.
As code search permeates most activities in software development, code-to-code search has emerged to support using code as a query and retrieving similar code in the search results. Applications include duplicate code detection for refactoring, patch identification for program repair, and language translation. Existing code-to-code search tools rely on static similarity approaches, such as the comparison of tokens and abstract syntax trees (ASTs), to approximate dynamic behavior, leading to low precision. Most tools do not support cross-language code-to-code search, and those that do rely on machine learning models that require labeled training data. We present Code-to-Code Search Across Languages (COSAL), a cross-language technique that uses both static and dynamic analyses to identify similar code and does not require a machine learning model. Code snippets are ranked using non-dominated sorting based on code token similarity, structural similarity, and behavioral similarity. We empirically evaluate COSAL on two datasets of 43,146 Java and Python files and 55,499 Java files and find that 1) code search based on non-dominated ranking of static and dynamic similarity measures is more effective than search based on single or weighted measures; and 2) COSAL has better precision and recall than state-of-the-art within-language and cross-language code-to-code search tools. We explore the potential for using COSAL on large open-source repositories and discuss scalability to more languages and similarity metrics, providing a gateway for practical, multi-language code-to-code search.
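
To make the ranking idea in the abstract concrete, the following is a minimal sketch of non-dominated (Pareto) sorting over three similarity scores. It is not COSAL's implementation: the function names, the file names, the score values, and the convention that every score is a similarity in [0, 1] where higher is better are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical score layout: (token, structural, behavioral) similarity to the
# query, each assumed to lie in [0, 1] with higher meaning more similar.
Scores = Tuple[float, float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if a is at least as good as b on every measure
    and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_sort(candidates: Dict[str, Scores]) -> List[List[str]]:
    """Group candidates into Pareto fronts; front 0 is the top-ranked group."""
    remaining = dict(candidates)
    fronts: List[List[str]] = []
    while remaining:
        # A candidate belongs to the current front if nothing left dominates it.
        front = sorted(
            name
            for name, scores in remaining.items()
            if not any(dominates(other, scores) for other in remaining.values())
        )
        fronts.append(front)
        for name in front:
            del remaining[name]
    return fronts


# Illustrative, made-up scores for four candidate files.
candidates = {
    "a.java": (0.9, 0.8, 0.7),
    "b.py": (0.6, 0.9, 0.5),
    "c.java": (0.5, 0.5, 0.5),
    "d.py": (0.4, 0.3, 0.2),
}

for rank, front in enumerate(non_dominated_sort(candidates)):
    print(rank, front)
```

In this example `a.java` and `b.py` share the first front because neither beats the other on all three measures, while `c.java` and `d.py` fall into later fronts. No weights are needed to combine the measures, which is the property the abstract contrasts with single or weighted measures.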
