It’s not how long you let it run, it’s how you wiggle your fuzzer
Sun Tzu
I spent some time hacking libFuzzer and pondering its techniques. I’ve come up with some additions that I expect will dramatically speed up finding certain edge cases.
First of all, a huge vote of appreciation for Michał Zalewski and the people behind libFuzzer and the various sanitizers for their work. The remarkable ease with which fuzzers can be attached to arbitrary software to find world-class bugs that affect millions is at least as commendable as the technical underpinnings. The shoulders of giants.
You can find my fuzzer here: https://github.com/guidovranken/libfuzzer-gv
Remember that these features are very experimental. Developers of libFuzzer and other fuzzers are encouraged to merge them into their work if they like them.
Code coverage is just one way to guide the fuzzer
Code coverage is the chief metric that a fuzzer like libFuzzer uses to increase the likelihood of finding a code path that results in an error. But the exact course of execution is determined by many more factors, and these are not accounted for by code coverage metrics alone. So I’ve implemented a number of additional program-state signals that help reach faulty code quickly. Without these, certain bugs will only be uncovered after a very long time of fuzzing.
Stack-depth-guided fuzzing
#include <stddef.h>
#include <stdint.h>

void recur(size_t depth, size_t maxdepth) {
    if (depth >= maxdepth) {
        return;
    }
    recur(depth + 1, maxdepth);
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    size_t i, maxdepth = 0;
    for (i = 0; i < size; i++) {
        if (i % 3 == 0 && data[i] == 0xAA) {
            maxdepth += 1;
        }
    }
    maxdepth *= 400;
    recur(0, maxdepth);
    return 0;
}
Given enough 0xAA’s in the input, the program will crash due to a stack overflow (recursing too deep). With -stack_depth_guided=1 -use_value_profile=1 it usually takes about 0.5 – 5 seconds to crash on my system.
With just -use_value_profile=1 (and ASAN_OPTIONS=coverage=1:coverage_counters=1), it takes about 5-10 minutes. I think this is pure chance, though; I’ve done runs where it was still busy after an hour.
static void getStackDepth(void) {
    size_t p;
    asm("movq %%rsp,%0" : "=r"(p));
    p = 0x8000000000000000 - p;
    if (p > fuzzer::stackDepthRecord) {
        fuzzer::stackDepthRecord = p;
        if (fuzzer::stackDepthBase == 0) {
            fuzzer::stackDepthBase = p;
        }
    }
}
(yes, this specific implementation works only on x86-64. If this doesn’t work for you, comment it out or change it to suit your architecture.)
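If inline assembly is not an option for your platform, a rough, portable approximation is to measure how far the address of a local variable has moved from a baseline captured on the first call. This is only a sketch of that idea, not part of the fork; the globals below are stand-ins for the fuzzer::stackDepthRecord / fuzzer::stackDepthBase counters used above:

#include <cstddef>
#include <cstdint>

// Stand-ins for the fuzzer:: globals, defined here only so the sketch is self-contained.
static size_t stackDepthRecord = 0;
static uintptr_t stackDepthBase = 0;

static void getStackDepthPortable(void) {
    volatile char marker;  // the address of a local tracks the current stack position
    uintptr_t here = reinterpret_cast<uintptr_t>(&marker);
    if (stackDepthBase == 0) {
        // First call: remember where the stack was and use it as the baseline.
        stackDepthBase = here;
        return;
    }
    // The stack grows downward on common ABIs, so deeper calls have smaller addresses.
    size_t depth = stackDepthBase > here ? (size_t)(stackDepthBase - here) : 0;
    if (depth > stackDepthRecord) {
        stackDepthRecord = depth;  // new record depth for this run
    }
}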
If you need an input that exceeds a certain stack depth saved as a file, you can lower the stack size with ulimit -s before running the fuzzer. It will crash, and libFuzzer will write the crashing input to disk.
Crashes due to excessive recursion are, I think, an under-appreciated class of vulnerabilities. For server applications, it matters a lot whether an untrusted client can trigger a stack overflow on the server. These vulnerabilities are relatively rare, but I did manage to find a remote, unauthenticated crasher in high-profile software (Apache httpd CVE-2015-0228).
A lot of applications that parse context-free grammars, such as
- Programming languages (an expression can contain an expression can contain an expression..)
- Serialization formats (JSON: an array can contain an array can contain an array ..)
are in theory susceptible to this. A minimal sketch of such a parser follows.
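The harness below is hypothetical (not from the post or the fork), but it shows the pattern: a toy recursive parser for nested arrays, where every opening bracket adds a stack frame, so a sufficiently nested input exhausts the stack on its own.

#include <stddef.h>
#include <stdint.h>

// Parse a (possibly nested) array like "[[[]]]" starting at 'pos';
// each '[' costs one stack frame.
static size_t parseArray(const uint8_t *data, size_t size, size_t pos) {
    if (pos >= size || data[pos] != '[') return pos;
    pos++;                                    // consume '['
    while (pos < size && data[pos] == '[') {
        pos = parseArray(data, size, pos);    // recurse for each nested array
    }
    if (pos < size && data[pos] == ']') pos++; // consume ']'
    return pos;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    parseArray(data, size, 0);
    return 0;
}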
PS: you can use my tool to find call graph loops in binaries.
Intensity-guided fuzzing
This feature quantifies how many instrumented locations are hit in a single run: the aggregate count of all (non-unique) location hits.
So if a certain for loop of 1 iteration causes the coverage callback to be called 5 times, the same loop of 5 iterations results in an aggregate value of 5*5=25.
Great for finding slow inputs.
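For illustration, here is a hypothetical target (not from the post) where intensity guidance has something to work with while plain code coverage does not: the set of edges hit is the same for every input, but the amount of work is not.

#include <stddef.h>
#include <stdint.h>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    size_t i, work = 0;
    // Derive an iteration count from the input bytes.
    for (i = 0; i < size; i++) {
        work += data[i];
    }
    // The busy loop hits the same instrumented locations over and over;
    // only the aggregate hit count distinguishes fast inputs from slow ones.
    volatile size_t sink = 0;
    for (i = 0; i < work * 100; i++) {
        sink += i;
    }
    return 0;
}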
Allocation-guided fuzzing
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) { size_t i, alloc = 0; void* p; for (i = 0; i < size; i++) { if (i % 3 == 0 && data[i] == 0xAA) { alloc += 1; } } if (alloc >= 1350) alloc = -1; p = malloc(alloc); free(p); return 0; }
Given enough 0xAA’s in the input, the program will attempt an allocation of (size_t)-1 bytes. AddressSanitizer does not tolerate this and will crash.
With -alloc_guided=1 -use_value_profile=1, it usually takes 10-25 seconds on my system until it crashes (which is what we want).
With just -use_value_profile=1 (and ASAN_OPTIONS=coverage=1:coverage_counters=1), it was still running after more than an hour. It has very little to go on, and it cannot figure out the logic.
I expect this feature will help find certain threshold-constrained issues. For instance, an application runs fine if fewer than 8192 elements of something are involved; beyond that threshold, it resorts to different, erroneous logic (maybe a wrong use of realloc()). This feature guides the fuzzer towards that pivot.
Aside from finding crashes, this feature is great at providing insight into the peak memory usage of an application, and it automatically finds the worst-case input in terms of heap usage (because fuzzing is guided by the malloc()s). If you can discover an input that makes a server application reserve 50MB of memory whereas the average memory usage for normal requests is 100KB, that is not a vulnerability in the traditional sense (although it may be a very cheap DoS opportunity), but it might make you consider refactoring some code.
Custom-guided fuzzing
libFuzzer expects LLVMFuzzerTestOneInput to return 0 and will halt if it returns something else; the return value isn’t used for anything beyond that at the moment. So I thought I’d put it to good use. With -custom_guided=1, the value you return is used as the guidance signal.
You can now connect libFuzzer to literally anything. I’m experimenting with connecting to a remote server in LLVMFuzzerTestOneInput, hashing what the server returns, and returning the number of unique hashes produced so far. So I am in fact fuzzing a remote, uninstrumented application.
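A rough sketch of that setup, with hypothetical glue code: sendToServer() stands in for whatever transport you use, and std::hash stands in for a real hash of the response.

#include <stddef.h>
#include <stdint.h>
#include <functional>
#include <set>
#include <string>

// Stand-in for real transport to the remote, uninstrumented target
// (an assumption: replace with actual socket code). Here it just echoes
// the input so the sketch compiles.
static std::string sendToServer(const uint8_t *data, size_t size) {
    return std::string(reinterpret_cast<const char *>(data), size);
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    static std::set<size_t> uniqueResponses;
    const std::string response = sendToServer(data, size);
    uniqueResponses.insert(std::hash<std::string>{}(response));
    // Per the description above, the return value is the guidance signal
    // under -custom_guided=1: any input provoking a new response is kept.
    return (int)uniqueResponses.size();
}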
Disable coverage-guided fuzzing
Use -no_coverage_guided=1 to disable coverage-guided fuzzing. This is useful if you want to rely purely on, say, allocation guidance.
Techniques tried and discarded
Favoring efficient mutators
I’ve tried keeping a histogram for mutator efficacy. So each time a certain mutator (like EraseBytes, InsertBytes, …) was responsible for an increase in code coverage, I incremented its histogram value. Then, when the mutator for the next iteration had to be selected, I favored the most efficient mutator (but less efficient mutators could be chosen as well, just with a smaller likelihood).
Upon class construction I created a lin-log look-up table. For 5 mutators, it looks like this:
LUT = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4]
Every iteration, I sorted the histogram and saved the order of the indices. So if the histogram looks like this:
Mutator 0: 100 hits
Mutator 1: 1000 hits
Mutator 2: 500 hits
Mutator 3: 1200 hits
Mutator 4: 10 hits
The sequence of indices, sorted by hit count (ascending), is then:
LUT2 = [4, 0, 2, 1, 3]
To choose a new mutator:
curMutator = LUT2[ LUT[ rand() % 15 ] ] (where 15 is the length of LUT for 5 mutators)
So mutator 3 is now strongly favored (chance of 1 in 3), but there is still a 1 in 15 chance that mutator 4 gets chosen.
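A self-contained sketch of this selection scheme (hypothetical standalone code, not the fork's actual implementation):

#include <algorithm>
#include <cstdlib>
#include <numeric>
#include <vector>

struct MutatorSelector {
    std::vector<size_t> histogram;   // coverage wins per mutator
    std::vector<size_t> LUT;         // lin-log table: rank i gets i+1 slots

    explicit MutatorSelector(size_t numMutators) : histogram(numMutators, 0) {
        for (size_t i = 0; i < numMutators; i++)
            LUT.insert(LUT.end(), i + 1, i);  // 5 mutators -> [0,1,1,2,2,2,3,3,3,3,4,4,4,4,4]
    }

    void RecordWin(size_t mutator) { histogram[mutator]++; }

    size_t Choose() const {
        // Indices sorted by hit count, ascending, so the last rank is the best mutator.
        std::vector<size_t> byHits(histogram.size());
        std::iota(byHits.begin(), byHits.end(), 0);
        std::sort(byHits.begin(), byHits.end(),
                  [&](size_t a, size_t b) { return histogram[a] < histogram[b]; });
        // Index with rand() % LUT.size() (15 for 5 mutators); using
        // rand() % numMutators would never reach the table's upper slots.
        return byHits[LUT[rand() % LUT.size()]];
    }
};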
Unfortunately, this effort was in vain. It appeared to only slow down fuzzing. Apparently the fuzzer needs mutator diversity in order to reach new coverage. Or I have been overlooking something, in which case you are free to comment ;).
Unique call graph traversal
I figured that an approach embedding both stack-depth guidance and intensity guidance would be to keep an array of the code locations hit by the application in one run, hash the array, and use the number of unique hashes as guidance. Unfortunately, this number increments for nearly every input, and memory is soon exhausted. Maybe a less granular coverage instrumentation could work.
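For reference, a rough sketch of the idea (hypothetical code, assuming the coverage callback hands you the hit locations):

#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

static std::vector<uintptr_t> runTrace;   // locations hit during the current run
static std::set<uint64_t> uniqueTraces;   // hashes of all traces seen so far

// Called from the coverage callback for every instrumented location hit.
static void onLocationHit(uintptr_t pc) { runTrace.push_back(pc); }

// Called at the end of a run; the size of uniqueTraces would be the guidance value.
static size_t endOfRun() {
    uint64_t h = 1469598103934665603ULL;  // FNV-1a over the trace
    for (uintptr_t pc : runTrace) {
        for (size_t i = 0; i < sizeof(pc); i++) {
            h ^= (pc >> (8 * i)) & 0xff;
            h *= 1099511628211ULL;
        }
    }
    uniqueTraces.insert(h);
    runTrace.clear();
    return uniqueTraces.size();
}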
Fuzzing tips du jour
- Sanitizers and fuzzers are distinct technologies. You can fuzz without sanitizers (and sanitize without fuzzing): fuzzing without sanitizers speeds up corpus generation by an order of magnitude; afterwards, run the resulting corpus through a sanitizer-instrumented build.
- Developers: you can use fuzzing to verify application logic. Put an abort() where you would normally just print a debug message for a condition that you believe can never occur (the kind of thing an assert() would catch). Now fuzz it (see the sketch after this list).
- Sometimes optimizations and compiler versions matter. gcc + ASAN detects an issue in the following program with -O0, but not with -O1 and higher: int main(){char* b;int l=strlen(b);} . clang doesn’t find it with any optimization flag. The reverse (crashes with -O3, not with -O1) can also happen (see my OpenSSH CVE-2016-10012). The fact that detection depends on specific compiler versions and flags probably makes this a great way to contribute backdoored code to open-source software, if you are so inclined. Had I been a bad hombre, this is what I would do. Maintainers testing your code with their clang -O2 build system + regression tests + fuzzing rig will probably not detect your malicious code hiding in plain sight, but it is nonetheless going to creep into some percentage of binaries.
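A minimal illustration of the second tip (hypothetical code; FUZZING_BUILD is just a macro you would define in your fuzzing build):

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

static void handleLength(size_t claimed, size_t actual) {
    if (claimed > actual) {
#ifdef FUZZING_BUILD
        abort();   // fuzzing build: turn the "impossible" condition into a crash
#else
        fprintf(stderr, "warning: claimed length exceeds buffer (should never happen)\n");
#endif
    }
    // ... normal processing ...
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 1) return 0;
    handleLength(data[0], size - 1);  // first byte claims the payload length
    return 0;
}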
Work
There’s been a lot of commercial interest in my activities since my OpenVPN work. Yes, I am available for contracting work.
I’ve recently completed work for a well-respected open-source application. I had a wonderful run: about 10 remote vulnerabilities in one week (release 17 Jul 2017).
I love to go all-out on software and use every technique I know to squeeze out every vulnerability. I’ve got a lot of lesser-known tricks up my sleeve that I like to use.
Feel free to contact me at guidovranken @ gmail com to inquire about the possibilities.
