4.1 Article

Fuzzing-based grammar learning from a minimal set of seed inputs

期刊

JOURNAL OF COMPUTER LANGUAGES
卷 78, 期 -, 页码 -

出版社

ELSEVIER SCI LTD
DOI: 10.1016/j.cola.2023.101252

关键词

Fuzzing; Grammar-based fuzzing; Software testing; Grammar inference; Grammar learning; Program analysis

向作者/读者索取更多资源

To be effective, a fuzzer needs to generate well-formed inputs that can detect meaningful bugs. Grammar based fuzzers require a known grammar of the input language, which is often unknown. Our proposed algorithm can learn grammars from recursive descent parsers with high levels of recall and precision without requiring complex program analysis techniques.
To be effective, a fuzzer needs to generate inputs that are well formed, so that they are not outright rejected by the Software Under Test (SUT) and can thus detect meaningful bugs. Grammar based fuzzers solve this problem, but they obviously require a grammar of the input language accepted by the SUT. Many times such grammar is unknown. Therefore, different black-and white-box algorithms have been proposed for learning them from SUTs. Black-box algorithms rely only on membership queries, but need access to carefully crafted well formed inputs in order to obtain good results. White-box algorithms require access to the source code and generally produce grammars with higher precision and recall, but at the expense of working only for specific programming languages and libraries. We propose a new algorithm and show through extensive experimentation that it can learn grammars from recursive descendent parsers with consistently high levels of both, recall and precision. Notably, this result was obtained starting with a couple of arbitrary seed inputs and includes evaluations with sophisticated languages such as Java Script Object Notation (JSON). Different to other state of the art white-box approaches, our method does not require sophisticated program analysis techniques such as dynamic tainting or symbolic execution. In fact, the experiments confirm that our method performs extremely well with just a (standard) generic Abstract Syntax Tree (AST) of the parsing program as input. The core of our method uses fuzzing techniques combined with fundamental theoretical results on grammar learning. Compared to other white-box approaches, ours is not tied to specific programming languages and tools, and thus can be easily ported. Regarding performance, we have shown that our algorithm works well in practice and that, under reasonable assumptions, its worst-case complexity is polynomial (with low exponents) w.r.t. time and space requirements.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.1
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据