code2vec: Learning Distributed Representations of Code

PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL (2019)

Journal

PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL

Volume 3, Issue -, Pages -

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/3290353

Keywords

Big Code; Machine Learning; Distributed Representations

Category

Computer Science, Software Engineering

Funding

European Union [615688-ERC-COG-PRIME]
AWS Cloud Credits for Research award

Abstract

We present a neural model for representing snippets of code as continuous distributed vectors (code embeddings). The main idea is to represent a code snippet as a single fixed-length code vector, which can be used to predict semantic properties of the snippet. To this end, code is first decomposed into a collection of paths in its abstract syntax tree. Then, the network learns the atomic representation of each path while simultaneously learning how to aggregate a set of them. We demonstrate the effectiveness of our approach by using it to predict a method's name from the vector representation of its body. We evaluate our approach by training a model on a dataset of 12M methods. We show that code vectors trained on this dataset can predict method names from files that were unobserved during training. Furthermore, we show that our model learns useful method name vectors that capture semantic similarities, combinations, and analogies. A comparison of our approach to previous techniques over the same dataset shows an improvement of more than 75%, making it the first to successfully predict method names based on a large, cross-project corpus. Our trained model, visualizations, and vector similarities are available as an interactive online demo at http://code2vec.org. The code, data, and trained models are available at https://github.com/tech-srl/code2vec.
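The two steps named in the abstract, decomposing code into syntax-tree paths and aggregating them into one vector, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the released code: the toy tree encoding, the `^`/`_` path notation, the function names `root_to_leaf`, `path_contexts`, and `aggregate`, and the choice of a softmax-weighted sum as the aggregation (the abstract does not specify the mechanism). In a real model, the path vectors and the attention vector would be learned, not fixed by hand.

```python
import math
from itertools import combinations

# Hypothetical toy AST for `x = y + 1`: (node_type, child, ...) with string leaves.
TOY_AST = ("Assign", "x", ("BinOp", "y", "1"))

def root_to_leaf(tree, trail=()):
    """Yield (leaf_token, trail); trail is a tuple of (node_type, child_index) steps."""
    label, *children = tree
    for i, child in enumerate(children):
        step = trail + ((label, i),)
        if isinstance(child, str):
            yield child, step
        else:
            yield from root_to_leaf(child, step)

def path_contexts(tree):
    """Connect every pair of leaves through their lowest common ancestor (LCA)."""
    contexts = []
    for (src, trail_a), (dst, trail_b) in combinations(list(root_to_leaf(tree)), 2):
        k = 0
        while trail_a[k] == trail_b[k]:  # skip the shared prefix above the LCA
            k += 1
        up = [label for label, _ in reversed(trail_a[k + 1:])]  # src side, bottom-up
        lca = trail_a[k][0]
        down = [label for label, _ in trail_b[k + 1:]]          # dst side, top-down
        path = "^".join(up + [lca]) + "".join("_" + d for d in down)
        contexts.append((src, path, dst))
    return contexts

def aggregate(vectors, attention):
    """Softmax-weighted sum of path vectors (an assumed aggregation scheme)."""
    scores = [sum(x * a for x, a in zip(v, attention)) for v in vectors]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(attention)
    code_vector = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return code_vector, weights

contexts = path_contexts(TOY_AST)
# Placeholder 2-d vectors, one per path-context; a trained model would learn these.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
code_vector, weights = aggregate(vectors, attention=[0.5, -0.5])
```

The resulting `code_vector` has a fixed length regardless of how many path-contexts the snippet yields, which is the property the abstract relies on for downstream prediction.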
