Collections

Journal
2025-10-28

LMFuzz: Program repair fuzzing based on large language models

Lin, Renze; Wang, Ran; Hu, Guanghuan; +1
Automated Software Engineering
Abstract

Generating programs using large language models (LLMs) for fuzz testing has emerged as a significant testing methodology. While traditional fuzzers can produce correct programs, their effectiveness is limited by excessive constraints and restricted API combinations, resulting in insufficient coverage of the target system’s code and reducing testing efficiency. Unlike traditional methods, large language model-based fuzzers can generate more diverse code, effectively addressing key issues of conventional fuzzers. However, the lack of constraints on API combinations during the generation process often leads to reduced program validity. Therefore, a crucial challenge is to enhance the validity of generated code while maintaining its diversity. To address this issue, we propose a novel and universal fuzzer, LMFuzz. To ensure the fuzzer’s generation capability, we utilize a large language model as the primary generator and model the operator selection problem within the fuzzing loop as a multi-armed bandit problem. We introduce the Thompson Sampling algorithm to enhance both the diversity and validity of program generation. To improve the validity of the generated code, we incorporate a program repair loop that iteratively corrects the generated programs, thereby reducing errors caused by the lack of API combination constraints. Experimental results demonstrate that LMFuzz significantly surpasses existing state-of-the-art large language model-based fuzzers in terms of coverage and validity, and also exhibits notable advantages in generating diverse programs. Furthermore, LMFuzz has identified 24 bugs across five popular programming languages and their corresponding systems.
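The operator-selection idea in this abstract can be sketched as a Bernoulli multi-armed bandit solved with Thompson Sampling. This is a minimal illustration, not LMFuzz's implementation; the operator names and the success signal (e.g. "generated program was valid / increased coverage") are assumptions.

```python
import random

class ThompsonSampler:
    """Thompson Sampling over fuzzing operators, modeled as a
    Bernoulli multi-armed bandit (operator names are hypothetical)."""

    def __init__(self, operators):
        # Beta(1, 1) prior per operator: [alpha, beta] = [successes+1, failures+1]
        self.stats = {op: [1, 1] for op in operators}

    def select(self):
        # Draw one sample from each operator's Beta posterior; pick the largest.
        samples = {op: random.betavariate(a, b) for op, (a, b) in self.stats.items()}
        return max(samples, key=samples.get)

    def update(self, op, success):
        # success: True if the generated program was valid or gained coverage.
        if success:
            self.stats[op][0] += 1
        else:
            self.stats[op][1] += 1
```

In a fuzzing loop one would call `select()` to pick the next operator, run it, and feed the outcome back via `update()`; operators that keep producing valid, coverage-increasing programs are sampled more often while losing arms still get occasional exploration.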

Journal Data Ref: EP2JV2WH
Journal
2022-12-01

AMSFuzz: An adaptive mutation schedule for fuzzing

Zhao, Xiaoqi; Qu, Haipeng; Xu, Jianliang; +2
Expert Systems with Applications
Abstract

Mutation-based fuzzing is one of the most popular software testing techniques. After allocating a specific amount of energy (i.e., the number of test cases generated from the seed) to a seed, it applies existing mutation operators to continuously mutate the seed, generating new test cases that are fed into the target program to discover unexpected behaviors such as bugs, crashes, and vulnerabilities. However, the random selection of mutation operators and the sequential selection of mutation positions in existing fuzzers limit path discovery and bug detection. In this paper, a novel adaptive mutation schedule framework, AMSFuzz, is proposed. To address the random selection of mutation operators, AMSFuzz adaptively adjusts the probability distribution over mutation operators when selecting them. To address the sequential selection of mutation positions, seeds are dynamically sliced into segments of different sizes during fuzzing, giving more seeds the opportunity to mutate preferentially and improving fuzzing efficiency. AMSFuzz is implemented and evaluated on 12 real-world programs and the LAVA-M dataset. The results show that AMSFuzz substantially outperforms state-of-the-art fuzzers in terms of path discovery and bug detection. Additionally, AMSFuzz has detected 17 previously unknown bugs in several projects, 15 of which were assigned CVE IDs.
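The adaptive-adjustment idea can be illustrated with a simple reward-weighted distribution over operators. This is a sketch of the general idea, not AMSFuzz's actual update rule; the operator names, the path-count reward, and the smoothing term are assumptions.

```python
import random

class AdaptiveMutationSchedule:
    """Sketch of an adaptive mutation schedule: operator selection
    probabilities are re-weighted by how many new paths each operator
    has discovered so far (names and reward signal are hypothetical)."""

    def __init__(self, operators, smoothing=1.0):
        self.operators = list(operators)
        self.smoothing = smoothing                  # keeps every operator selectable
        self.new_paths = {op: 0 for op in operators}

    def probabilities(self):
        # Additive smoothing so unrewarded operators retain nonzero probability.
        weights = [self.new_paths[op] + self.smoothing for op in self.operators]
        total = sum(weights)
        return [w / total for w in weights]

    def choose(self):
        return random.choices(self.operators, weights=self.probabilities())[0]

    def reward(self, op, paths_found):
        # Called after executing a mutated test case against the target.
        self.new_paths[op] += paths_found
```

Compared with uniform random selection, productive operators are chosen more often as evidence accumulates, while the smoothing term preserves exploration of the rest.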

Journal Data Ref: D3UAX9ZU
Conference
2023

DARWIN: Survival of the Fittest Fuzzing Mutators

Jauernig, Patrick; Jakobovic, Domagoj; Picek, Stjepan; +2
Abstract

Fuzzing is an automated software testing technique broadly adopted by the industry. A popular variant is mutation-based fuzzing, which discovers a large number of bugs in practice. While the research community has studied mutation-based fuzzing for years now, the algorithms' interactions within the fuzzer are highly complex and can, together with the randomness in every instance of a fuzzer, lead to unpredictable effects. Most efforts to improve this fragile interaction focused on optimizing seed scheduling. However, real-world results like Google's FuzzBench highlight that these approaches do not consistently show improvements in practice. Another approach to improve the fuzzing process algorithmically is optimizing mutation scheduling. Unfortunately, existing mutation scheduling approaches also failed to convince because of missing real-world improvements or too many user-controlled parameters whose configuration requires expert knowledge about the target program. This leaves the challenging problem of cleverly processing test cases and achieving a measurable improvement unsolved. We present DARWIN, a novel mutation scheduler and the first to show fuzzing improvements in a realistic scenario without the need to introduce additional user-configurable parameters, opening this approach to the broad fuzzing community. DARWIN uses an Evolution Strategy to systematically optimize and adapt the probability distribution of the mutation operators during fuzzing. We implemented a prototype based on the popular general-purpose fuzzer AFL. DARWIN significantly outperforms the state-of-the-art mutation scheduler and the AFL baseline in our own coverage experiment, in FuzzBench, and by finding 15 out of 21 bugs the fastest in the MAGMA benchmark. Finally, DARWIN found 20 unique bugs (including one novel bug), 66% more than AFL, in widely-used real-world applications.
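The abstract's core mechanism, an Evolution Strategy adapting the probability distribution over mutation operators, can be sketched as a minimal (1+1)-ES. This is an illustration of the general technique, not DARWIN's algorithm; the fitness callable stands in for a real fuzzing campaign's coverage feedback.

```python
import random

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def evolve_distribution(fitness, dim=4, sigma=0.1, steps=50, rng=None):
    """Minimal (1+1)-Evolution-Strategy sketch: evolve a probability
    distribution over `dim` mutation operators, keeping a perturbed
    candidate only if it scores at least as well on `fitness`
    (a hypothetical stand-in for coverage gained during fuzzing)."""
    rng = rng or random.Random(0)
    parent = normalize([1.0] * dim)       # start from a uniform distribution
    best = fitness(parent)
    for _ in range(steps):
        # Perturb every weight with Gaussian noise, clamp, renormalize.
        child = normalize([max(1e-6, p + rng.gauss(0, sigma)) for p in parent])
        score = fitness(child)
        if score >= best:                 # survival of the fittest
            parent, best = child, score
    return parent
```

No user-tuned, target-specific parameters appear beyond the ES step size; that parameter-free quality for the end user is what the abstract highlights as DARWIN's practical advantage.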

Conference Data Ref: R3L4W5XB
Blog

FirmAgent: Leveraging Fuzzing to Assist LLM Agents with IoT Firmware Vulnerability Discovery

Abstract

No abstract available

Blog Data Ref: 88P9Y9UZ
Conference
2024-04-12

Fuzz4All: Universal Fuzzing with Large Language Models

Xia, Chunqiu Steven; Paltenghi, Matteo; Tian, Jia Le; +2
Abstract

Fuzzing has achieved tremendous success in discovering bugs and vulnerabilities in various software systems. Systems under test (SUTs) that take in programming or formal language as inputs, e.g., compilers, runtime engines, constraint solvers, and software libraries with accessible APIs, are especially important as they are fundamental building blocks of software development. However, existing fuzzers for such systems often target a specific language, and thus cannot be easily applied to other languages or even other versions of the same language. Moreover, the inputs generated by existing fuzzers are often limited to specific features of the input language, and thus can hardly reveal bugs related to other or new features. This paper presents Fuzz4All, the first fuzzer that is universal in the sense that it can target many different input languages and many different features of these languages. The key idea behind Fuzz4All is to leverage large language models (LLMs) as an input generation and mutation engine, which enables the approach to produce diverse and realistic inputs for any practically relevant language. To realize this potential, we present a novel autoprompting technique, which creates LLM prompts that are well-suited for fuzzing, and a novel LLM-powered fuzzing loop, which iteratively updates the prompt to create new fuzzing inputs. We evaluate Fuzz4All on nine systems under test that take in six different languages (C, C++, Go, SMT2, Java and Python) as inputs. The evaluation shows, across all six languages, that universal fuzzing achieves higher coverage than existing, language-specific fuzzers. Furthermore, Fuzz4All has identified 98 bugs in widely used systems, such as GCC, Clang, Z3, CVC5, OpenJDK, and the Qiskit quantum computing platform, with 64 bugs already confirmed by developers as previously unknown.
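The LLM-powered fuzzing loop described here can be sketched as follows. The `llm` and `oracle` callables are hypothetical stand-ins (for a model API and the system under test), and the three prompt-update instructions paraphrase the generate-new / mutate / semantically-equivalent strategies; none of this is Fuzz4All's actual code.

```python
import random

def fuzzing_loop(llm, initial_prompt, oracle, iterations=3):
    """Simplified sketch of an LLM-powered fuzzing loop: generate an
    input, run it against the SUT via `oracle`, then update the prompt
    with a randomly chosen continuation strategy for the next round."""
    prompt = initial_prompt
    results = []
    for _ in range(iterations):
        program = llm(prompt)              # generation step
        results.append(oracle(program))    # run against the SUT, record outcome
        # Update the prompt for the next iteration.
        strategy = random.choice([
            "Please create a new program.",
            "Please mutate the previous program.",
            "Please write a semantically equivalent program.",
        ])
        prompt = f"{initial_prompt}\n{program}\n{strategy}"
    return results
```

Because only the prompt text is language-specific, the same loop can drive fuzzing of a C compiler, an SMT solver, or a Python runtime, which is what makes the approach "universal".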

Conference Data Ref: Y3KAUUIV
Preprint
2023-04-04

Large Language Models are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT

Deng, Yinlin; Xia, Chunqiu Steven; Yang, Chenyuan; +3
Abstract

Deep Learning (DL) library bugs affect downstream DL applications, emphasizing the need for reliable systems. Generating valid input programs for fuzzing DL libraries is challenging due to the need for satisfying both language syntax/semantics and constraints for constructing valid computational graphs. Recently, the TitanFuzz work demonstrates that modern Large Language Models (LLMs) can be directly leveraged to implicitly learn all the constraints to generate valid DL programs for fuzzing. However, LLMs tend to generate ordinary programs following similar patterns seen in their massive training corpora, while fuzzing favors unusual inputs that cover edge cases or are unlikely to be manually produced. To fill this gap, this paper proposes FuzzGPT, the first technique to prime LLMs to synthesize unusual programs for fuzzing. FuzzGPT is built on the well-known hypothesis that historical bug-triggering programs may include rare/valuable code ingredients important for bug finding. Traditional techniques leveraging such historical information require intensive human effort to design dedicated generators and ensure the validity of generated programs. FuzzGPT demonstrates that this process can be fully automated via the intrinsic capabilities of LLMs (including fine-tuning and in-context learning), while being generalizable and applicable to challenging domains. While FuzzGPT can be applied with different LLMs, this paper focuses on the powerful GPT-style models: Codex and CodeGen. Moreover, FuzzGPT also shows the potential of directly leveraging the instruction-following capability of the recent ChatGPT for effective fuzzing. Evaluation on two popular DL libraries (PyTorch and TensorFlow) shows that FuzzGPT can substantially outperform TitanFuzz, detecting 76 bugs, with 49 already confirmed as previously unknown bugs, including 11 high-priority bugs or security vulnerabilities.
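The in-context learning variant described here amounts to assembling a few-shot prompt from historical bug-triggering snippets. A minimal sketch, with a hypothetical example corpus and prompt wording (not FuzzGPT's actual templates):

```python
def build_fewshot_prompt(bug_examples, target_api, k=2):
    """Sketch of FuzzGPT-style priming: prefix the prompt with up to k
    historical bug-triggering snippets so the LLM favors unusual,
    edge-case programs. `bug_examples` would come from a corpus of
    past bug reports (hypothetical data here)."""
    shots = "\n\n".join(
        f"# The following program triggers a bug:\n{ex}"
        for ex in bug_examples[:k]
    )
    return (
        f"{shots}\n\n"
        f"# Write an unusual program using {target_api} that may trigger a bug:\n"
    )
```

The same priming effect can alternatively be baked in via fine-tuning on the bug corpus, which is the other LLM capability the abstract mentions.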

Preprint Data Ref: VMBHEQR5