Collections

Preprint
2025-04-18

Large Language Models for Validating Network Protocol Parsers

Zheng, Mingwei; Xie, Danning; Zhang, Xiangyu
Abstract

Network protocol parsers are essential for enabling correct and secure communication between devices. Bugs in these parsers can introduce critical vulnerabilities, including memory corruption, information leakage, and denial-of-service attacks. An intuitive way to assess parser correctness is to compare the implementation with its official protocol standard. However, this comparison is challenging because protocol standards are typically written in natural language, whereas implementations are in source code. Existing methods like model checking, fuzzing, and differential testing have been used to find parsing bugs, but they either require significant manual effort or ignore the protocol standards, limiting their ability to detect semantic violations. To enable more automated validation of parser implementations against protocol standards, we propose PARVAL, a multi-agent framework built on large language models (LLMs). PARVAL leverages the capabilities of LLMs to understand both natural language and code. It transforms both protocol standards and their implementations into a unified intermediate representation, referred to as format specifications, and performs a differential comparison to uncover inconsistencies. We evaluate PARVAL on the Bidirectional Forwarding Detection (BFD) protocol. Our experiments demonstrate that PARVAL successfully identifies inconsistencies between the implementation and its RFC standard, achieving a low false positive rate of 5.6%. PARVAL uncovers seven unique bugs, including five previously unknown issues.
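The differential comparison described in this abstract can be pictured with a small sketch. The dictionary layout below is an assumption made for illustration and is not PARVAL's actual format-specification representation. The standard-side constraints follow the BFD packet checks in RFC 5880, while the implementation-side entry is invented for the example and does not correspond to any bug reported in the paper.

```python
def diff_format_specs(standard, implementation):
    """Compare two field -> constraints maps and list every mismatch.

    Both arguments use a hypothetical layout: {field_name: {constraint: value}}.
    """
    findings = []
    for field in sorted(set(standard) | set(implementation)):
        if field not in implementation:
            findings.append(f"{field}: missing from implementation")
        elif field not in standard:
            findings.append(f"{field}: not defined by standard")
        else:
            keys = sorted(set(standard[field]) | set(implementation[field]))
            for key in keys:
                want = standard[field].get(key)
                got = implementation[field].get(key)
                if want != got:
                    findings.append(
                        f"{field}.{key}: standard={want!r}, implementation={got!r}"
                    )
    return findings


# Constraints taken from the standard (RFC 5880 packet validation rules).
standard_spec = {
    "version": {"bits": 3, "must_equal": 1},
    "detect_mult": {"bits": 8, "nonzero": True},
    "length": {"bits": 8, "min": 24},
}

# Invented implementation-side spec that omits the nonzero check.
implementation_spec = {
    "version": {"bits": 3, "must_equal": 1},
    "detect_mult": {"bits": 8},
    "length": {"bits": 8, "min": 24},
}

findings = diff_format_specs(standard_spec, implementation_spec)
```

Reducing both sides to the same structure is what makes the comparison mechanical: each finding names a field and the constraint on which the two specifications disagree.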

Preprint Data Ref: GD8E7F8Y
Journal
2025-10-27

OmniFuzz: A Multi-agent Reinforcement Learning Framework for Protocol-Aware Fuzzing in Power IoT Devices

Song, Yubo; Chen, Weiwei; Sun, Xin +5
J. Netw. Syst. Manage.
Abstract

Power IoT devices, as critical components in industrial control systems, often operate in heterogeneous environments and support multiple communication protocols such as Modbus TCP, EtherNet/IP, and Siemens S7. Unlike general embedded systems, these devices have strict real-time constraints, safety-critical characteristics, and a large protocol surface exposed to external networks, making them vulnerable to protocol-level attacks. However, most existing fuzzing tools only test individual protocols independently, making it difficult to detect protocol-stack-level or multi-interface vulnerabilities. To address this, we propose OmniFuzz, a protocol-aware fuzzing framework based on multi-agent reinforcement learning, specifically designed for power IoT devices. For the multi-protocol scenarios supported by the devices, the framework constructs a dedicated agent array for each protocol. Each agent mutates specific protocol fields through an independently learned policy network and collaborates via a shared value network, forming a directed multi-protocol concurrent testing mechanism. The framework incorporates a domain-specific reward function cluster (covering vulnerability severity, code path depth, and input diversity), which effectively improves testing efficiency and code coverage. OmniFuzz supports concurrent multi-protocol fuzzing during runtime, enabling comprehensive vulnerability discovery across concurrent heterogeneous protocol interfaces. Although the current implementation does not explicitly model inter-protocol behavior sequences, it lays the foundation for future exploration of cross-protocol attack paths. Experiments on real-world PLC devices from multiple vendors show that OmniFuzz outperforms baseline fuzzers by approximately 10% in terms of time to first vulnerability, exception triggering rate, and effective recognition rate. 
Through this framework, we discovered 5 high-risk buffer overflow vulnerabilities in the State Grid’s Smart-distribution-transformer-combine-terminal-unit, with relevant demonstration videos published on GitHub. Detailed descriptions of this will be provided in the discussion section.

Journal Data Ref: RU8UFL2H
Preprint
2025-08-19

MultiFuzz: A Dense Retrieval-based Multi-Agent System for Network Protocol Fuzzing

Maklad, Youssef; Wael, Fares; Hamdi, Ali +2
Abstract

Traditional protocol fuzzing techniques, such as those employed by AFL-based systems, often lack effectiveness due to a limited semantic understanding of complex protocol grammars and rigid seed mutation strategies. Recent works, such as ChatAFL, have integrated Large Language Models (LLMs) to guide protocol fuzzing and address these limitations, pushing protocol fuzzers to wider exploration of the protocol state space. But ChatAFL still faces issues like unreliable output, LLM hallucinations, and assumptions of LLM knowledge about protocol specifications. This paper introduces MultiFuzz, a novel dense retrieval-based multi-agent system designed to overcome these limitations by integrating semantic-aware context retrieval, specialized agents, and structured tool-assisted reasoning. MultiFuzz utilizes agentic chunks of protocol documentation (RFC Documents) to build embeddings in a vector database for a retrieval-augmented generation (RAG) pipeline, enabling agents to generate more reliable and structured outputs, enhancing the fuzzer in mutating protocol messages with enhanced state coverage and adherence to syntactic constraints. The framework decomposes the fuzzing process into modular groups of agents that collaborate through chain-of-thought reasoning to dynamically adapt fuzzing strategies based on the retrieved contextual knowledge. Experimental evaluations on the Real-Time Streaming Protocol (RTSP) demonstrate that MultiFuzz significantly improves branch coverage and explores deeper protocol states and transitions over state-of-the-art (SOTA) fuzzers such as NSFuzz, AFLNet, and ChatAFL. By combining dense retrieval, agentic coordination, and language model reasoning, MultiFuzz establishes a new paradigm in autonomous protocol fuzzing, offering a scalable and extensible foundation for future research in intelligent agentic-based fuzzing systems.
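The dense retrieval step mentioned in this abstract can be sketched as a top-k cosine-similarity lookup over chunk embeddings. This is a minimal illustration, not MultiFuzz's pipeline: the three-dimensional vectors are toy values standing in for real embeddings, the chunk titles are hypothetical, and a plain list stands in for a vector database.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=2):
    """Return the k chunk texts whose embeddings are most similar to the query.

    index is a list of (chunk_text, embedding) pairs.
    """
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy index: hypothetical chunk titles with hand-written embeddings.
toy_index = [
    ("SETUP request syntax", [1.0, 0.0, 0.0]),
    ("PLAY state transitions", [0.0, 1.0, 0.0]),
    ("Session header format", [0.9, 0.1, 0.0]),
]

top_chunks = retrieve([1.0, 0.0, 0.0], toy_index, k=2)
```

In a retrieval-augmented setup, the returned chunks would be placed in the agent's prompt so that its output is grounded in the retrieved protocol text rather than in what the model happens to remember.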

Preprint Data Ref: WG77B4JY
Journal
2025-10-28

LMFuzz: Program repair fuzzing based on large language models

Lin, Renze; Wang, Ran; Hu, Guanghuan +1
Automated Software Engineering
Abstract

Generating programs using large language models (LLMs) for fuzz testing has emerged as a significant testing methodology. While traditional fuzzers can produce correct programs, their effectiveness is limited by excessive constraints and restricted API combinations, resulting in insufficient coverage of the target system’s code and impacting testing efficiency. Unlike traditional methods, large language model based fuzzers can generate more diverse code, effectively addressing key issues of conventional fuzzers. However, the lack of constraints on API combinations during the generation process often leads to reduced program validity. Therefore, a crucial challenge is to enhance the validity of generated code while maintaining its diversity. To address this issue, we propose a novel and universal fuzzer, LMFuzz. To ensure the fuzzer’s generation capability, we utilize a large language model as the primary generator and model the operator selection problem within the fuzzing loop as a multi-armed bandit problem. We introduce the Thompson Sampling algorithm to enhance both the diversity and validity of program generation. To improve the validity of the generated code, we incorporate a program repair loop that iteratively corrects the generated programs, thereby reducing errors caused by the lack of API combination constraints. Experimental results demonstrate that LMFuzz significantly surpasses existing state-of-the-art large language model based fuzzers in terms of coverage and validity, and also exhibits notable advantages in generating diverse programs. Furthermore, LMFuzz has identified 24 bugs across five popular programming languages and their corresponding systems.
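The abstract frames operator selection as a multi-armed bandit solved with Thompson Sampling. A minimal Beta-Bernoulli sketch of that idea follows. The operator names, the binary success signal, and the simulated success rates are all assumptions for illustration; the abstract does not describe LMFuzz's actual operators or reward definition.

```python
import random


class ThompsonOperatorSelector:
    """Beta-Bernoulli Thompson Sampling over a set of generation operators."""

    def __init__(self, operators, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior per operator, stored as [alpha, beta].
        self.stats = {op: [1, 1] for op in operators}

    def select(self):
        # Draw one plausible success rate per operator and pick the largest draw.
        draws = {op: self.rng.betavariate(a, b) for op, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, op, success):
        # success is a hypothetical signal, e.g. "program was valid and useful".
        self.stats[op][0 if success else 1] += 1


def simulate(true_rates, rounds=2000, seed=0):
    """Run the selector against simulated operators; return how often each was picked."""
    rng = random.Random(seed)
    selector = ThompsonOperatorSelector(true_rates, seed=seed)
    counts = {op: 0 for op in true_rates}
    for _ in range(rounds):
        op = selector.select()
        counts[op] += 1
        selector.update(op, rng.random() < true_rates[op])
    return counts


# Hypothetical operators with made-up success rates.
counts = simulate({"splice": 0.8, "insert": 0.2, "delete": 0.1})
```

Because each choice is driven by a random draw from the posterior, weak operators are still tried occasionally, while the operator with the best observed record comes to dominate the schedule.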

Journal Data Ref: EP2JV2WH
Journal
2022-12-01

AMSFuzz: An adaptive mutation schedule for fuzzing

Zhao, Xiaoqi; Qu, Haipeng; Xu, Jianliang +2
Expert Systems with Applications
Abstract

Mutation-based fuzzing is one of the most popular software testing techniques. After allocating a specific amount of energy (i.e., the number of test cases generated from the seed) to a seed, it uses existing mutation operators to continuously mutate the seed, generating new test cases and feeding them into the target program to discover unexpected behaviors, such as bugs, crashes, and vulnerabilities. However, the random selection of mutation operators and sequential selection of mutation positions in existing fuzzers affect path discovery and bug detection. In this paper, a novel adaptive mutation schedule framework, AMSFuzz, is proposed. To address the random selection of mutation operators, AMSFuzz adaptively adjusts the probability distribution from which mutation operators are selected. To address the sequential selection of mutation positions, seeds are dynamically sliced into different sizes during the fuzzing process, giving more seeds the opportunity to be mutated preferentially and improving the efficiency of fuzzing. AMSFuzz is implemented and evaluated on 12 real-world programs and the LAVA-M dataset. The results show that AMSFuzz substantially outperforms state-of-the-art fuzzers in terms of path discovery and bug detection. Additionally, AMSFuzz has detected 17 previously unknown bugs in several projects, 15 of which were assigned CVE IDs.
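One simple way to picture an adaptively adjusted probability distribution over mutation operators is to weight each operator by its smoothed rate of discovering new paths. The sketch below rests on that assumption. The abstract does not give AMSFuzz's update rule, so the scoring formula, the `floor` parameter, and the operator names are all hypothetical.

```python
import random


class AdaptiveMutationSchedule:
    """Select mutation operators with probability tied to past path discoveries."""

    def __init__(self, operators, floor=0.05, seed=None):
        self.rng = random.Random(seed)
        self.operators = list(operators)
        self.uses = {op: 0 for op in self.operators}
        self.hits = {op: 0 for op in self.operators}
        self.floor = floor  # keeps every operator selectable (assumed parameter)

    def probabilities(self):
        # Smoothed success rate plus a floor, normalised to sum to 1.
        scores = {
            op: (self.hits[op] + 1) / (self.uses[op] + 2) + self.floor
            for op in self.operators
        }
        total = sum(scores.values())
        return {op: score / total for op, score in scores.items()}

    def pick(self):
        probs = self.probabilities()
        weights = [probs[op] for op in self.operators]
        return self.rng.choices(self.operators, weights=weights)[0]

    def record(self, op, found_new_path):
        self.uses[op] += 1
        if found_new_path:
            self.hits[op] += 1


# Hypothetical operator names; "havoc" is made to succeed and "bitflip" to fail.
schedule = AdaptiveMutationSchedule(["bitflip", "arith", "havoc"], seed=0)
for _ in range(10):
    schedule.record("havoc", True)
    schedule.record("bitflip", False)
probs = schedule.probabilities()
```

After these updates the distribution favours the operator that has been finding new paths, while the floor keeps the others from being excluded entirely.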

Journal Data Ref: D3UAX9ZU