Proceedings Paper

CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3475960.3475985

Keywords

Security vulnerabilities; dataset; software repository mining; vulnerability prediction; vulnerability classification; source code repair

Funding

  1. Research Council of Norway [288787]

The study proposes a method to automatically collect and curate a comprehensive vulnerability dataset from CVE records in NVD, which includes vulnerable code and fixes, along with metadata and detailed code and security metrics. The dataset can be easily updated with newly discovered or patched vulnerabilities, and supports various types of data-driven software security research.
Data-driven research on the automated discovery and repair of security vulnerabilities in source code requires comprehensive datasets of real-life vulnerable code and their fixes. To assist in such research, we propose a method to automatically collect and curate a comprehensive vulnerability dataset from Common Vulnerabilities and Exposures (CVE) records in the National Vulnerability Database (NVD). We implement our approach in a fully automated dataset collection tool and share an initial release of the resulting vulnerability dataset named CVEfixes. The CVEfixes collection tool automatically fetches all available CVE records from the NVD, gathers the vulnerable code and corresponding fixes from associated open-source repositories, and organizes the collected information in a relational database. Moreover, the dataset is enriched with meta-data such as programming language, and detailed code and security metrics at five levels of abstraction. The collection can easily be repeated to keep up-to-date with newly discovered or patched vulnerabilities. The initial release of CVEfixes spans all published CVEs up to 9 June 2021, covering 5365 CVE records for 1754 open-source projects that were addressed in a total of 5495 vulnerability fixing commits. CVEfixes supports various types of data-driven software security research, such as vulnerability prediction, vulnerability classification, vulnerability severity prediction, analysis of vulnerability-related code changes, and automated vulnerability repair.
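
To make the relational organization described in the abstract more concrete, the following is a minimal sketch of how CVE records might be linked to their fixing commits and then queried. The table names (`cve`, `fixes`, `commits`), the column names, the helper functions, and all rows are assumptions made for illustration; they are not taken from the released CVEfixes schema or data, and the CVE identifiers and URLs are placeholders.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One row per CVE record (hypothetical columns).
        CREATE TABLE cve (
            cve_id      TEXT PRIMARY KEY,
            description TEXT
        );
        -- Link table: which commit in which repository fixes which CVE.
        CREATE TABLE fixes (
            cve_id      TEXT REFERENCES cve(cve_id),
            repo_url    TEXT,
            commit_hash TEXT
        );
        -- One row per vulnerability-fixing commit.
        CREATE TABLE commits (
            commit_hash TEXT,
            repo_url    TEXT,
            message     TEXT,
            PRIMARY KEY (commit_hash, repo_url)
        );
        """
    )
    # Placeholder rows only; none of these are real CVEs or repositories.
    conn.executemany(
        "INSERT INTO cve VALUES (?, ?)",
        [
            ("CVE-0000-0001", "placeholder description A"),
            ("CVE-0000-0002", "placeholder description B"),
        ],
    )
    conn.executemany(
        "INSERT INTO fixes VALUES (?, ?, ?)",
        [
            ("CVE-0000-0001", "https://example.org/repo-a", "aaa111"),
            ("CVE-0000-0001", "https://example.org/repo-a", "bbb222"),
            ("CVE-0000-0002", "https://example.org/repo-b", "ccc333"),
        ],
    )
    conn.executemany(
        "INSERT INTO commits VALUES (?, ?, ?)",
        [
            ("aaa111", "https://example.org/repo-a", "fix bounds check"),
            ("bbb222", "https://example.org/repo-a", "follow-up fix"),
            ("ccc333", "https://example.org/repo-b", "sanitize input"),
        ],
    )
    conn.commit()
    return conn


def fixing_commits_per_cve(conn):
    """Return {cve_id: number of fixing commits} by joining fixes to commits."""
    rows = conn.execute(
        """
        SELECT f.cve_id, COUNT(*)
        FROM fixes AS f
        JOIN commits AS c
          ON c.commit_hash = f.commit_hash AND c.repo_url = f.repo_url
        GROUP BY f.cve_id
        ORDER BY f.cve_id
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    demo = build_demo_db()
    print(fixing_commits_per_cve(demo))
```

Keeping the CVE-to-commit link in its own table allows one CVE to be addressed by several commits, which is consistent with the abstract reporting more fixing commits (5495) than CVE records (5365).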
