Abstract
System simulators have been developed to aid quality assurance of large, complex hardware architectures. However, such simulators do not always accurately mirror what would happen on a real device. A significant challenge in testing them is the complexity of modelling both the simulation itself and the unbounded range of software that could run on such a device.
Our previous work introduced SearchSYS, a testing framework for software simulators. SearchSYS uses a large language model (LLM) to generate initial seed C code, which is then compiled; the resulting binary is fed to a fuzzer. We then apply differential testing, running the outputs of fuzzing on both real hardware and a system simulator to identify mismatches.
In this paper, we present and discuss our solution to the problem of testing software simulators, using SearchSYS to test the gem5 VLSI digital circuit simulator, employed by ARM to test their systems. In particular, we focus on the simulation of the ARM silicon chip Instruction Set Architecture (ISA). SearchSYS creates test cases that activate bugs by combining LLMs, fuzzing, and differential testing. Using only the LLM, SearchSYS identified 74 test cases that activated bugs. Incorporating fuzzing yielded 93 additional bug-activating cases within 24 hours. Through differential testing, we identified 624 bugs with LLM-generated test cases and 126 with fuzzed test inputs. From these bug-activating test cases, 4 unique bugs have been reported to and acknowledged by developers. Additionally, we provided developers with a test case suite and fuzzing statistics, and open-sourced SearchSYS.
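The differential-testing step described above can be illustrated with a minimal Python sketch. It is not the SearchSYS implementation: the names `RunResult`, `run`, `find_mismatches`, `REFERENCE_CMD` and `SIMULATOR_CMD` are hypothetical, and the two commands are small Python stand-ins rather than a real ARM device or a gem5 invocation. The sketch assumes that a mismatch means a difference in exit code or standard output for the same input.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """The observable behaviour compared by the oracle (an assumption)."""
    exit_code: int
    stdout: bytes


def run(cmd, stdin_data=b"", timeout=10.0):
    """Run one command on one input and capture exit code and stdout.

    A timeout raises subprocess.TimeoutExpired; this sketch does not
    handle it.
    """
    proc = subprocess.run(
        cmd,
        input=stdin_data,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return RunResult(proc.returncode, proc.stdout)


def find_mismatches(reference_cmd, simulator_cmd, inputs):
    """Return (input, reference result, simulator result) for each input
    on which the two executions disagree."""
    mismatches = []
    for data in inputs:
        ref = run(reference_cmd, data)
        sim = run(simulator_cmd, data)
        if ref != sim:
            mismatches.append((data, ref, sim))
    return mismatches


# Hypothetical stand-ins: in practice these would be the compiled test
# binary on real hardware and the same binary under the simulator.
REFERENCE_CMD = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
# Deliberately faulty stand-in: inputs of four or more characters are
# echoed unchanged instead of being upper-cased.
SIMULATOR_CMD = [
    sys.executable, "-c",
    "import sys; s = sys.stdin.read(); "
    "sys.stdout.write(s.upper() if len(s) < 4 else s)",
]
```

Calling `find_mismatches(REFERENCE_CMD, SIMULATOR_CMD, inputs)` with a list of byte strings, such as inputs produced by a fuzzer, returns only the inputs on which the two executions diverge. Each returned triple is a candidate bug-activating test case that still needs manual triage.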
Original language | English |
---|---|
Title of host publication | ICSE SEIP 2025 |
Number of pages | 12 |
Publication status | Accepted/In press - 15 Dec 2024 |
Datasets
- Artifact of Search+LLM-based Testing for ARM Simulators
Even-Mendoza, K., Menendez Benito, H., Langdon, W. B., Dakhama, A., Petke, J. & Bruce, B. R., Zenodo, 10 Oct 2024
DOI: 10.5281/zenodo.13909721, https://zenodo.org/records/13909721
Dataset