☆ 4.1 Article

Fuzzing-based grammar learning from a minimal set of seed inputs

JOURNAL OF COMPUTER LANGUAGES (2024)

期刊

JOURNAL OF COMPUTER LANGUAGES

卷 78, 期 -, 页码 -

出版社

ELSEVIER SCI LTD

DOI: 10.1016/j.cola.2023.101252

关键词

Fuzzing; Grammar-based fuzzing; Software testing; Grammar inference; Grammar learning; Program analysis

类别

Computer Science, Software Engineering

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

To be effective, a fuzzer needs to generate well-formed inputs that can detect meaningful bugs. Grammar based fuzzers require a known grammar of the input language, which is often unknown. Our proposed algorithm can learn grammars from recursive descent parsers with high levels of recall and precision without requiring complex program analysis techniques.

To be effective, a fuzzer needs to generate inputs that are well formed, so that they are not outright rejected by the Software Under Test (SUT) and can thus detect meaningful bugs. Grammar based fuzzers solve this problem, but they obviously require a grammar of the input language accepted by the SUT. Many times such grammar is unknown. Therefore, different black-and white-box algorithms have been proposed for learning them from SUTs. Black-box algorithms rely only on membership queries, but need access to carefully crafted well formed inputs in order to obtain good results. White-box algorithms require access to the source code and generally produce grammars with higher precision and recall, but at the expense of working only for specific programming languages and libraries. We propose a new algorithm and show through extensive experimentation that it can learn grammars from recursive descendent parsers with consistently high levels of both, recall and precision. Notably, this result was obtained starting with a couple of arbitrary seed inputs and includes evaluations with sophisticated languages such as Java Script Object Notation (JSON). Different to other state of the art white-box approaches, our method does not require sophisticated program analysis techniques such as dynamic tainting or symbolic execution. In fact, the experiments confirm that our method performs extremely well with just a (standard) generic Abstract Syntax Tree (AST) of the parsing program as input. The core of our method uses fuzzing techniques combined with fundamental theoretical results on grammar learning. Compared to other white-box approaches, ours is not tied to specific programming languages and tools, and thus can be easily ported. Regarding performance, we have shown that our algorithm works well in practice and that, under reasonable assumptions, its worst-case complexity is polynomial (with low exponents) w.r.t. time and space requirements.

Fuzzing-based grammar learning from a minimal set of seed inputs

期刊

JOURNAL OF COMPUTER LANGUAGES

出版社

ELSEVIER SCI LTD

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Fuzzing-based grammar learning from a minimal set of seed inputs

期刊

JOURNAL OF COMPUTER LANGUAGES

出版社

ELSEVIER SCI LTD

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文