Not all software testing techniques have origin stories, but fuzz testing does: On a stormy evening in 1988, Barton Miller, a computer science professor at the University of Wisconsin–Madison, was using a dial-up connection to work remotely on a Unix computer from his apartment. He was trying to feed input to a program, only to see it crash repeatedly.
He knew that the electrical noise from the thunderstorm was distorting his inputs into the program as they traveled through the phone line. The distorted inputs were different from what the software needed from the user, resulting in errors. But as he describes in his book, Fuzzing for Software Security Testing and Quality Assurance, Miller was surprised that even programs he considered robust were crashing as a result of the unexpected input, instead of gracefully handling the error and asking for input again.
Alongside his graduate students at the university, Miller set out to explore the extent of the issue in common computer applications. Their research, conducted over several years, caused program failures across a wide array of Unix, Windows and Macintosh applications by feeding them noisy inputs. He gave their new testing strategy the name “fuzz” testing to “evoke the feeling of random, unstructured data.”
But the new model wasn’t without its critics.
“In the process of writing our early fuzz papers, we came across strong resistance from the testing and software engineering community,” Miller writes in his book. “The lack of a formal model and methodology and undisciplined approach to testing often offended experienced practitioners in the field.”
Today, many open-source and commercial tools help developers incorporate fuzzing into the software development lifecycle. The tools often have whimsical names, such as the Peach fuzzer or the american fuzzy lop (AFL) fuzzer, which is named after a breed of rabbit. There are now several types of fuzz testing to choose from, each designed to address the weaknesses of the last.
And yet, fuzz testing still has a reputation in many circles as a mindless strategy that doesn’t yield significant results. The tools available make that reputation less and less deserved, and when it’s configured correctly, fuzzing is an effective, low-effort way of testing applications.
The Benefits of Fuzz Testing
Miller’s concern about what he saw during his thunderstorm experience extended beyond the annoyance of having applications crash unexpectedly. Applications that are not able to handle unexpected input also pose security concerns. Errors that aren’t handled by the program are vulnerabilities that attackers can exploit to hack into systems.
In fact, attackers often use fuzz testing tools to locate vulnerabilities in applications, according to Jared DeMott, the CEO of VDA Labs security testing company and the instructor of several Pluralsight courses on testing.
“If you follow what we call a secure development lifecycle ... fuzzing is one piece of the lifecycle that relates to the testing portion of it,” DeMott said.
There are many types of security testing available, such as static application security testing (SAST) and dynamic application security testing (DAST). SAST tools examine application code at rest, scanning for known mistakes that can lead to security vulnerabilities, while DAST tools find bugs by running the application. Fuzz testing resembles DAST in that it checks how a running application responds to different inputs, but the two methods find different kinds of errors.
“DAST is looking for known vulnerabilities, things like SQL injection, cross-site scripting ... not something unknown to it,” said David DeSanto, director of product for security at GitLab.
But there are other types of errors that SAST and DAST tools aren’t able to catch. DeSanto gave an example of a code snippet that allocated a set amount of memory for an array, then proceeded to read beyond the space given to the array. This type of error can crash an application and introduce security vulnerabilities, but it isn’t among the kinds of errors that SAST and DAST tools look for. Fuzzers, however, find that type of bug easily.
The Types of Fuzzers
All types of fuzz testing work by sending data to an application that is different from what the application wants or expects to receive. The simplest type of fuzz testing is similar to what happened to Miller during the thunderstorm, where applications are bombarded with completely random inputs. But sending random data is inefficient, because there is no strategy for testing applications where they are most likely to be vulnerable.
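The purely random approach can be sketched in a few lines of Python. This is a minimal illustration, not any particular tool: `parse_record` is a hypothetical target with a deliberate bug of the read-past-the-end kind, and `random_fuzz`, the input sizes and the trial count are all assumptions made for the example.

```python
import random


def parse_record(data: bytes) -> int:
    """Hypothetical target: the first byte declares the payload length.

    Deliberate bug: the declared length is trusted without checking that
    the payload is really that long, so indexing can run off the end.
    """
    if not data:
        raise ValueError("empty input")  # an expected, handled rejection
    length = data[0]
    payload = data[1:]
    total = 0
    for i in range(length):
        total += payload[i]  # IndexError when length > len(payload)
    return total


def random_fuzz(target, trials: int, seed: int = 0):
    """Throw random byte strings at target and collect unexpected failures."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    crashes = []
    for _ in range(trials):
        size = rng.randint(0, 16)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass  # the target rejected the input gracefully
        except Exception as exc:  # anything else counts as a crash
            crashes.append((data, type(exc).__name__))
    return crashes


crashes = random_fuzz(parse_record, trials=200)
print(f"{len(crashes)} crashing inputs out of 200")
```

A bug this shallow falls to random bytes almost immediately. Random input does much worse when the interesting code sits behind format checks.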
“It’s an infinite space problem,” said Jonathan Knudsen, senior security strategist at Synopsys, a tech company that focuses on silicon design and software security. “You can make an infinite number of bad inputs for a piece of software. So, to do effective fuzzing in finite time, you have to pick some subset of those to use as test cases.”
Simply generating random data also often won’t work because applications usually expect inputs to be formatted a certain way and reject anything else. A simple example Knudsen gave was an application that read in data from text files, but only treated a file as input if its name contained a certain five-digit code. If fuzz testing generated completely random files with random file names, none of the files would even be read by the application.
That means that it’s important to understand the parameters of what an application is expecting when generating fuzz testing data.
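Knudsen's file-name example can be made concrete with a rough sketch. The naming rule below (`<letters>_<five digits>.txt`) is an assumption chosen for illustration, since the article does not give the real pattern, and `is_accepted`, `random_name` and `format_aware_name` are hypothetical helpers.

```python
import random
import re
import string

# Assumed gate: the application only reads files named like
# "report_12345.txt" (letters, underscore, five digits, ".txt").
NAME_PATTERN = re.compile(r"[a-z]+_\d{5}\.txt")


def is_accepted(filename: str) -> bool:
    """Return True if the application would treat this file as input."""
    return NAME_PATTERN.fullmatch(filename) is not None


def random_name(rng: random.Random) -> str:
    """A completely random 16-character file name."""
    alphabet = string.ascii_lowercase + string.digits + "._"
    return "".join(rng.choice(alphabet) for _ in range(16))


def format_aware_name(rng: random.Random) -> str:
    """A name that respects the expected format; only the code varies."""
    code = "".join(rng.choice(string.digits) for _ in range(5))
    return f"report_{code}.txt"


rng = random.Random(0)
random_hits = sum(is_accepted(random_name(rng)) for _ in range(10_000))
aware_hits = sum(is_accepted(format_aware_name(rng)) for _ in range(10_000))
print(f"random names accepted: {random_hits} of 10000")
print(f"format-aware names accepted: {aware_hits} of 10000")
```

Every format-aware name gets past the gate, so the fuzzing effort can go into the file contents instead of being wasted on names the application never opens.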
“The next step up from random is mutational fuzzing,” Knudsen said. “You start with a known good input, and then you just mess it up in certain ways to get the test cases.”
Instead of generating a completely random input file, for instance, the tester would take a legitimate input file that the application accepts, then make edits to generate different tests.
“You might go and shorten it, or insert some extra data in the middle, or flip some bits through it,” Knudsen said. “Because you started from a good input, the test cases look pretty much correct except for the place where they’re messed up, and so they’re more believable. So the target software will do more work on them and you’ll be more likely to find bugs in the target software.”
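The three edits Knudsen mentions (shortening, inserting extra data, flipping bits) might look like this in Python. It is a sketch under assumptions: the function names, the JSON-like seed input and the choice of one mutation per test case are illustrative, not taken from any specific fuzzer.

```python
import random


def truncate(data: bytes, rng: random.Random) -> bytes:
    """Shorten the input by cutting it at a random point."""
    return data[: rng.randrange(len(data))] if data else data


def insert_bytes(data: bytes, rng: random.Random) -> bytes:
    """Insert a short run of random bytes somewhere in the input."""
    pos = rng.randint(0, len(data))
    extra = bytes(rng.randrange(256) for _ in range(rng.randint(1, 8)))
    return data[:pos] + extra + data[pos:]


def flip_bit(data: bytes, rng: random.Random) -> bytes:
    """Flip a single randomly chosen bit."""
    if not data:
        return data
    out = bytearray(data)
    pos = rng.randrange(len(out))
    out[pos] ^= 1 << rng.randrange(8)
    return bytes(out)


MUTATORS = (truncate, insert_bytes, flip_bit)


def mutate(seed_input: bytes, rng: random.Random) -> bytes:
    """Apply one randomly chosen mutation to a known-good input."""
    return rng.choice(MUTATORS)(seed_input, rng)


seed_input = b'{"user": "alice", "age": 30}'  # a known-good input
rng = random.Random(1)
for _ in range(5):
    print(mutate(seed_input, rng))
```

Each test case differs from the valid input in only one place, which is why the target is likely to parse most of it before hitting the damaged part.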
Some of the most effective data to test with is found by looking at the “boundary conditions” of the data, as DeMott says in his course on fuzz testing.
“If you’re thinking about an integer, or a string, maybe you’re only supposed to send a small string, but we send a really big one,” DeMott says. “Or maybe you’re only supposed to send a small number for the number of bytes, but you send a null, or negative or a huge [one].”
Boundary data is great for testing because these are likely the conditions that developers of an application didn’t consider and forgot to handle in the code, and are therefore likely to expose errors.
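A small sketch of what boundary data could look like for the integer and string cases DeMott describes. The 32-bit width, the expected maximum string length and the helper names are assumptions for the example.

```python
def integer_boundaries(bits: int = 32) -> list:
    """Values at, and just past, the edges of common integer ranges."""
    signed_max = 2 ** (bits - 1) - 1
    signed_min = -(2 ** (bits - 1))
    unsigned_max = 2 ** bits - 1
    return [
        0, -1, 1,
        signed_max, signed_max + 1,      # largest signed value, then overflow
        signed_min, signed_min - 1,      # smallest signed value, then underflow
        unsigned_max, unsigned_max + 1,  # largest unsigned value, then overflow
    ]


def string_boundaries(expected_max: int) -> list:
    """Strings around the length the application is supposed to accept."""
    return [
        "",                          # empty
        "A" * (expected_max - 1),    # just under the limit
        "A" * expected_max,          # exactly at the limit
        "A" * (expected_max + 1),    # one past the limit
        "A" * (expected_max * 100),  # far beyond the limit
        "\x00",                      # a lone null character
    ]


for value in integer_boundaries(32):
    print(value)
print([len(s) for s in string_boundaries(8)])
```

A mutational or generational fuzzer could draw on lists like these when deciding what to substitute into a length field or a text field.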
Another type of fuzzer, known as the generational fuzzer, is better than mutational fuzzers at creating data at these boundary conditions. The downside is that generational fuzzers rely on developers to create data templates that the fuzzer then uses to generate test inputs. Knudsen’s company, Synopsys, makes a popular commercial generational fuzzing tool called Defensics.
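To illustrate the idea of a data template, here is a minimal sketch of generation from a model. The three-field message format, the `MODEL` structure and `generate_cases` are hypothetical and are not how Defensics or any other product represents its models.

```python
# Hypothetical data model for a tiny "COMMAND KEY VALUE" message format.
# Each field lists one valid value plus anomalies chosen for that field.
MODEL = [
    {"name": "command", "valid": "SET", "anomalies": ["", "set", "S" * 1000]},
    {"name": "key", "valid": "temp", "anomalies": ["", "k" * 4096, "te mp"]},
    {"name": "value", "valid": "21", "anomalies": ["-1", "99999999999", "NaN", ""]},
]


def generate_cases(model):
    """Yield messages that are valid except for one anomalous field."""
    valid = [field["valid"] for field in model]
    yield " ".join(valid)  # baseline: the fully valid message
    for index, field in enumerate(model):
        for bad in field["anomalies"]:
            parts = list(valid)
            parts[index] = bad  # break exactly one field at a time
            yield " ".join(parts)


cases = list(generate_cases(MODEL))
print(len(cases), "test cases")
for case in cases[:4]:
    print(repr(case[:40]))
```

The cost is visible even at this scale: someone has to write the model and choose the anomalies for every field before any test case exists.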
A third type of fuzzer is called the guided fuzzer, or evolutionary fuzzer, which combines aspects of both mutational and generational fuzzers.
“[Guided fuzzers] combine these ideas of randomly sending data and also knowing something about the spec,” DeMott said. “Knowing something about the spec is kind of hard, because it’s expensive to pre-create all these test cases. You have to pay somebody to do that, and they need to know a lot about the protocol.”
Guided fuzzers use code coverage analysis to calculate how well different test cases perform, and they use mutational fuzzing to generate more test cases similar to the high-performing ones.
“If you’re covering more and more of the program, you’re likely to uncover more and more of the bugs,” DeMott said.
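The feedback loop can be sketched as follows. This is a toy under stated assumptions: the target is instrumented by hand (each branch records an ID in a set), whereas real guided fuzzers gather coverage automatically, and `target`, `mutate`, `guided_fuzz` and the iteration budget are all hypothetical.

```python
import random


def target(data: bytes, hit: set) -> None:
    """Hypothetical target, hand-instrumented: each branch records an ID."""
    hit.add("entry")
    if len(data) > 0 and data[0] == ord("F"):
        hit.add("F")
        if len(data) > 1 and data[1] == ord("U"):
            hit.add("FU")
            if len(data) > 2 and data[2] == ord("Z"):
                raise RuntimeError("bug reached")


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    out = bytearray(data)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)


def guided_fuzz(seed_input: bytes, max_iters: int, seed: int = 0):
    """Keep any input that reaches new code and mutate from the kept set."""
    rng = random.Random(seed)
    corpus = [seed_input]
    seen = set()              # every branch ID reached so far
    target(seed_input, seen)  # assumes the seed input itself does not crash
    for iteration in range(1, max_iters + 1):
        candidate = mutate(rng.choice(corpus), rng)
        hit = set()
        try:
            target(candidate, hit)
        except RuntimeError:
            return candidate, iteration, corpus
        if not hit <= seen:   # reached something new: worth keeping
            seen |= hit
            corpus.append(candidate)
    return None, max_iters, corpus


crash, iterations, corpus = guided_fuzz(b"\x00\x00\x00\x00", max_iters=200_000)
print(f"crash input {crash!r} found after {iterations} iterations")
print(f"corpus size at that point: {len(corpus)}")
```

Because each partial match is kept and built upon, the loop only has to guess one byte at a time. Guessing all three bytes blindly would succeed about once in 256^3 (roughly 16.8 million) attempts.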
There’s also an “advanced” type of fuzzer that pulls in elements from all three of the other types.
“They combine symbolic execution, which is not just having code coverage and guessing, but actually measuring decision points in the software,” DeMott said.
DeMott recommended that developers use the more advanced types of fuzzers, each of which includes the gains from the previous iterations of fuzzers. However, he noted that the most up-to-date commercial tools may be more expensive, and some fuzzers may only be compatible with certain operating systems.
Setting Up Fuzzers Can Be Cumbersome
One of the things preventing greater adoption of fuzz testing among developers is that set-up still takes some time.
“On one hand, these tools are getting easier to use all the time,” DeMott said. “But on the other hand, it still takes deep expertise and security and coding to really set up, monitor, manage and triage the bugs that come out.”
Knudsen, whose company creates a generational fuzzer, said the hardest part for customers is building out the data model for the fuzzer to use as a template to create test cases. The commercial tool includes a large catalog of test suites, which means that customers using standard network protocols and file formats can simply use existing templates instead of taking time to create their own.
“Let’s say you’re making an Internet of Things device, like a smart thermostat,” Knudsen said. “It’s going to have to communicate with a server somewhere. So to do that, there are already established network protocols that people can use, so you can reuse a lot of existing implementations.”
But most applications are still going to have custom code on top of these existing protocols that needs to be tested with new data models. Defensics has a software development kit (SDK) that developers can use to create custom data models, but like with other generational fuzzing tools, it takes some effort to make the models.
It’s also important to integrate fuzz testing into the DevOps pipeline, and to set it up so that test failures are logged. If tests simply fail without logging the input data that caused the error, developers won’t know how to fix the code. Google maintains a tool for managing fuzzing infrastructure and tracking fuzzer performance, called ClusterFuzz. (“I actually came up with the name first many years ago and my advisor made me change it,” DeMott said.)
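As a rough sketch of what logging the failing input can mean in practice, the snippet below saves the exact bytes that caused a failure alongside a small report. It is not how ClusterFuzz works; `run_and_record`, `fragile`, the file naming scheme and the report fields are assumptions for illustration.

```python
import hashlib
import json
import tempfile
import traceback
from pathlib import Path


def run_and_record(target, data: bytes, crash_dir: Path):
    """Run target(data); on an unexpected exception, save the input.

    Returns the path of the saved reproducer, or None if nothing failed.
    """
    try:
        target(data)
        return None
    except Exception as exc:
        crash_dir.mkdir(parents=True, exist_ok=True)
        # Name files by content hash so duplicate inputs share one file.
        digest = hashlib.sha256(data).hexdigest()[:16]
        input_path = crash_dir / f"crash-{digest}.bin"
        input_path.write_bytes(data)  # the exact bytes needed to reproduce
        report = {
            "input_file": input_path.name,
            "exception": type(exc).__name__,
            "message": str(exc),
            "traceback": traceback.format_exc(),
        }
        report_path = crash_dir / f"crash-{digest}.json"
        report_path.write_text(json.dumps(report, indent=2))
        return input_path


def fragile(data: bytes) -> None:
    """Hypothetical target: fails on any byte outside the ASCII range."""
    data.decode("ascii")


crash_dir = Path(tempfile.mkdtemp()) / "crashes"
saved = run_and_record(fragile, b"\xff\x00", crash_dir)
print("reproducer saved to", saved)
```

With the raw input on disk, a developer can replay the failure locally instead of guessing what the fuzzer sent.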
Last month, GitLab acquired Peach Tech and Fuzzit, two fuzz testing companies whose tools GitLab is integrating into its CI/CD process.
“We felt like there was a gap,” DeSanto said, referring to GitLab’s previous offering of security testing tools, which included SAST and DAST offerings. “Our security team today uses GitLab Secure as its primary testing tool for making sure our application is secure and reliable. For our customers, starting with our release next month, they will begin to be able to run it, assessing themselves using GitLab against their own software.”
GitLab’s goal is to make set-up easier for developers wanting to incorporate fuzz testing into their development lifecycles. Developers who want to use one of the fuzzers only need to add an additional line to the YAML script used to orchestrate the CI/CD process.
“People who use fuzzing traditionally had to be an expert in fuzzing to use it,” DeSanto said. “The advantage to using fuzzing with GitLab [is that] we understand your application. You’re using us for source code management and CI. So instead of having to tell [the fuzzer], ‘Here is the structure of my API,’ we can just scan your source code and build that definition for you automatically. There’s no need for you to have to try to understand the tool to use the tool.”
If more fuzzing tools are more easily integrated into developers’ workflows, fuzz testing may become a more widespread and effective part of the testing landscape.
Fuzz Testing Is Good for Native Languages
Before you go and fuzz test all your applications, however, it’s worth considering which types of applications benefit most from fuzz testing.
“This type of testing is pretty specific to code called native code,” DeMott said. “Basically where you’re mishandling data in memory, you’re overwriting memory in some way, you’re writing out of bounds of a buffer or an array, or you’re writing to a portion of memory that you shouldn’t have a pointer to — a lot of it’s related to memory corruption.”
DeMott said that native programming languages, like C and C++, leave memory management up to the developer, which opens up a lot of opportunities for dangerous memory errors. Meanwhile, higher-level programming languages such as Java and C# handle memory for the developer, so those programs won’t suffer from the same problems.
That said, there are plenty of applications that are written in C and C++, including a lot of programs in defense, automotive and aerospace industries, according to DeMott.
“These are all mission-critical apps,” DeMott said. “If your car crashes, if your missile shoots when it shouldn’t, bad things happen, so all these domains are very important.”
Knudsen agreed that memory mismanagement causes more than its fair share of software failures, but said that there are still many other coding errors that fuzz testing could catch for programs written in higher-level languages.
“There are all these different ways that software can fail,” Knudsen said. “One way is that maybe you crash a process.... You can do things like send the process into an infinite loop, which could happen in any language.”
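Catching an infinite loop requires the fuzzer to notice that the target never finished, which usually means running it with a time limit. The sketch below assumes a hypothetical target script run as a child process; `TARGET_SOURCE`, `classify` and the three-second limit are choices made for the example.

```python
import subprocess
import sys

# Hypothetical target, run as a separate process so it can be killed.
# Deliberate bug: a step of 0 means the loop counter never advances.
TARGET_SOURCE = """
import sys
step = int(sys.stdin.read().strip() or "1")
i = 0
while i < 10:
    i += step
"""


def classify(data: str, timeout_s: float = 3.0) -> str:
    """Run the target on one input and report 'ok', 'crash' or 'hang'."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", TARGET_SOURCE],
            input=data,
            text=True,
            capture_output=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "hang"
    return "ok" if result.returncode == 0 else "crash"


for case in ["1", "abc", "0"]:
    print(repr(case), classify(case))
```

Here `"1"` finishes normally, `"abc"` makes the script exit with an unhandled exception, and `"0"` never terminates, so it is reported as a hang even though no memory was corrupted.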
Fuzz testing works together with SAST and DAST tools, the unit and integration tests that developers write, and the workflow and DevOps tools built around them to find security vulnerabilities that no single tool would be able to catch.
“All of these testing techniques complement each other, they’re all likely to find things that other approaches miss,” DeMott said.
Ultimately, fuzz testing is just another useful tool to complement the other kinds of testing you do.