The OpenVPN post-audit bug bonanza

Summary

I’ve discovered 4 important security vulnerabilities in OpenVPN. Interestingly, these were not found by the two recently completed audits of OpenVPN code. Below you’ll find mostly technical information about the vulnerabilities and about how I found them, but also some commentary on why commissioning code audits isn’t always the best way to find vulnerabilities.

Here you can find the latest version of OpenVPN: https://openvpn.net/index.php/open-source/downloads.html

This was a labor of love. Nobody paid me to do this. If you appreciate this effort, please donate BTC to 1BnLyXN2QwdMZLZTNqKqY48bU4hN2A3MwZ.

Introduction

After a hardening of the OpenVPN code (as commissioned by the Dutch intelligence service AIVD) and two recent audits 1 2, I thought it was now time for some real action ;).

Most of this issues were found through fuzzing. I hate admitting it, but my chops in the arcane art of reviewing code manually, acquired through grueling practice, are dwarfed by the fuzzer in one fell swoop; the mortal’s mind can only retain and comprehend so much information at a time, and for programs that perform long cycles of complex, deeply nested operations it is simply not feasible to expect a human to perform an encompassing and reliable verification.

End users and companies who want to invest in validating the security of an application written in an “unsafe” language like C, such as those who crowd-funded the OpenVPN audit, should not request a manual source code audit, but rather task the experts with the goal of ensuring intended operation and finding vulnerabilities, using that strategy that provides the optimal yield for a given funding window.

Upon first thought you’d assume both endeavors boil down to the same thing, but my fuzzing-based strategy is evidently more effective. What’s more, once a set of fuzzers has been written, these can be integrated into a continuous integration environment for permanent protection henceforth, whereas a code review only provides a “snapshot” security assessment of a particular software version.

Manual reviews may still be part of the effort, but only there where automation (fuzzing) is not adequate. Some examples:

verify cryptographic operations
other application-level logic, like path traversal (though a fuzzer may help if you’re clever)
determine the extent to which timing discrepancies divulge sensitive information
determine the extent to which size of (encrypted) transmitted data divulges sensitive information (see also). Beyond the sphere of cryptanalysis, I think this is an underappreciated way of looking at security.
applications that contain a lot of pointer comparisons (not a very good practice to begin with — OpenVPN is very clean in this regard, by the way) may require manual inspection to see if behavior relies on pointer values (example)
can memory leaks (which may be considered a vulnerability themselves) can lead to more severe vulnerabilities? (eg. will memory corruption take place if the system is drained of memory?)
can very large inputs (say megabytes, gigabytes, which would be very slow to fuzz) cause problems?
does the software rely on the behavior of certain library versions/flavors? (eg. a libc function that behaves a certain way with glibc may behave differently with the BSD libc — I’ve tried making a case around the use of ctime() in OpenVPN)

So doing a code audit to find memory vulnerabilities in a C program is a little like asking car wash employees to clean your car with a makeup brush. A very noble pursuit indeed, and if you manage to complete it, the overall results may be even better than automated water blasting, but unless you have infinite funds and time, resources are better spent on cleaning the exterior with a machine, vacuuming the interior followed by an evaluation of the overall cleanliness, and acting where necessary.

Vulnerabilities

Remote server crashes/double-free/memory leaks in certificate processing

Reported to the OpenVPN security list on May 13.

CVE-2017-7521

There are several issues in the extract_x509_extension() function in ssl_verify_openssl.c. This function is called if the user has used the ‘x509-username-field’ directive in their configuration.

GENERAL_NAMES *extensions;
int nid = OBJ_txt2nid(fieldname);

extensions = (GENERAL_NAMES *)X509_get_ext_d2i(cert, nid, NULL, NULL);

The first issue. The ‘fieldname’ variable is the value specified in the configuration file after the ‘x509-username-directive’. Different NID’s require different storage structures. That is to say, using a GENERAL_NAMES structure for every NID will result in spectacular crashes for some NIDs.


ASN1_STRING_to_UTF8((unsigned char **)&buf, name->d.ia5);
if (strlen(buf) != name->d.ia5->length)
{
    msg(D_TLS_ERRORS, "ASN1 ERROR: string contained terminating zero");
    OPENSSL_free(buf);
}
else
{
    strncpynt(out, buf, size);
    OPENSSL_free(buf);
    retval = true;
}

The second issue. The return value of ASN1_STRING_to_UTF8 is not checked. It may return failure, in which case buf retains its value. This code is executed in a loop (for every GENERAL_NAME encoded in the certificate). So let’s consider this scenario:

First loop: ASN1_STRING_to_UTF8 succeeds, and buf is processed and freed in any of the following branches.
Second loop: ASN1_STRING_to_UTF8 fails, and buf is processed (use-after-free) and freed (double-free) in any of the following branches.

In spite of extensive fuzzing I could not trigger a single ASN1_STRING_to_UTF8 failure using OpenSSL 1.0.2l. It may or may not be possible with other versions of OpenSSL, LibreSSL, BoringSSL. This would NOT indicate a bug in those libraries — as an API, they are allowed to fail for any reason. The actual error is OpenVPN not checking the return value.

But what makes this interesting is that at the end of this function, the following attempt is made to free the ‘extensions’ variable.

sk_GENERAL_NAME_free(extensions);

This is wrong. The correct way to do this is to call GENERAL_NAMES_free. This is because sk_GENERAL_NAME_free frees only the containing structure, whereas GENERAL_NAMES_free frees the structure AND its items.

Hence, there is a remote memory leak here.

If you look in the OpenSSL source code, one way through which ASN1_STRING_to_UTF8 can fail is if it cannot allocate sufficient memory. So the fact that an attacker can trigger a double-free IF the server has insufficient memory, combined with the fact that the attacker can arbitrarily drain the server of memory, makes it plausible that a remote double-free can be achieved. But if a double-free is inadequate to achieve remote code execution, there are probably other functions, whose behavior is wildly different under memory duress, that you can exploit.

Furthermore, there are two more instances of ASN1_STRING_to_UTF8 in this file:

(in the function extract_x509_field_ssl)

tmp = ASN1_STRING_to_UTF8(&buf, asn1);
if (tmp <= 0) {    return FAILURE; }

(in the function x509_setenv_track)

 if (ASN1_STRING_to_UTF8(&buf, val) > 0)
{
    do_setenv_x509(es, xt->name, (char *)buf, depth);
    OPENSSL_free(buf);
}

(in the function x509_setenv)

if (ASN1_STRING_to_UTF8(&buf, val) <= 0)
{
    continue;
}

Here, the code assumes that a return value that is negative or zero indicates failure, and ‘buf’ is not initialized, and needs not to be freed. But in fact, this is ONLY the case if ASN1_STRING_to_UTF8 returns a negative value. A return value 0 simply means a string of length 0, but memory is nonetheless allocated, so there are memory leaks here as well.

Remote (including MITM) client crash, data leak

Reported to the OpenVPN security list on May 19.

CVE-2017-7520

This only affects clients who use OpenVPN to connect to an NTLM version 2 proxy.

ntlm_phase_3() in ntlm.c:

if (( *((long *)&buf2[0x14]) & 0x00800000) == 0x00800000)          /* Check for Target Information block */
{
    tib_len = buf2[0x28];            /* Get Target Information block size */
    if (tib_len > 96)
    {
        tib_len = 96;
    }
    {
        char *tib_ptr = buf2 + buf2[0x2c];           /* Get Target Information block pointer */
        memcpy(&ntlmv2_blob[0x1c], tib_ptr, tib_len);           /* Copy Target Information block into the blob */
    }
}

‘buf2’ is an array of type char (signed), which contains data sent by the peer (the proxy).
‘tib_len’ is of type int.

First issue: remote crash. If buf[0x28] contains a value of 0x80 or higher, ‘tib_len’ will be negative; both variables are signed, after all. This will cause memcpy to crash.

Second issue: data leak. buf[0x2c] is used as an index to the buf2 array. Because ‘buf2[0x2c]’ is a signed value, if it is >= 0x80, it will cause tib_ptr to point BEFORE ‘buf2’.
Memory at this location is then copied to ntlmv2_blob, which is then sent to the peer.
This constitutes a data leak.

Because the user’s password is also stored on the stack (the variable ‘pwbuf’ in this function), this or other sensitive information to the peer in cleartext.

These issues can be triggered by an actor in an active man-in-the-middle role.

Remote (including MITM) client stack buffer corruption

Reported to the OpenVPN security list on June 6.
No CVE
Patch: https://github.com/OpenVPN/openvpn/commit/69162924de3600bfe8ae9708a1d6e3f4515ef995

This is exceedingly unlikely to occur.

The my_strupr function in ntlm.c is constructed as follows:

unsigned char *
my_strupr(unsigned char *str)
{
    /* converts string to uppercase in place */
    unsigned char *tmp = str;

    do
    {
        *str = toupper(*str);
    } while (*(++str));
    return tmp;
}

From this code it is obvious that if a string of length 0 is passed, OOB read(s) and possibly write(s) will occur.

In the case of a string of length 0, the null terminator is toupper()’ed, pointer is incremented, byte AFTER null terminator is evaluated, and if not null toupper()’ed, until a second NULL byte is seen.

The function is invoked once:

my_strupr((unsigned char *)strcpy(userdomain, username));

Exploitation can only be achieved if:

NTLM version 2 is used.
The user specified a username ending with a backslash.
The (uninitialized) ‘username’ array constists entirely of non-null values.
The stack layout is such that the ‘username’ array is followed by a pointer, or something else that, if toupper()’ed, could cause arbitrary code execution.

This issue can be triggered by an actor in an active man-in-the-middle role.

Remote server crash (forced assertion failure)

Reported to the OpenVPN security list on May 20.
CVE-2017-7508

The OpenVPN server can be crashed by sending crafted data.

mss_fixup_ipv6() in mss.c:

if (buf_advance( &newbuf, 40 ) )
{
    struct openvpn_tcphdr *tc = (struct openvpn_tcphdr *) BPTR(&newbuf);
    if (tc->flags & OPENVPN_TCPH_SYN_MASK)
    {
        mss_fixup_dowork(&newbuf, (uint16_t) maxmss-20);
    }
}

in mss_fixup_dowork():

ASSERT(BLEN(buf) >= (int) sizeof(struct openvpn_tcphdr));

It is possible to construct a packet to the server such that this assertion will fail, and the server will stop.

Crash mbed TLS/PolarSSL-based server

Reported to the OpenVPN security list on May 22.

CVE-2017-7522

This requires that the –x509-track configuration option has been set. It affects OpenVPN 2.4 (not 2.3) compiled with mbed TLS/PolarSSL as the cryptography backend. The crafted certificate must have been signed by the CA.

When parsing the client certificate, asn1_buf_to_c_string() may be called (via x509_setenv_track -> do_setenv_name).
It iterates over an ASN1 string as follows:

for (i = 0; i < orig->len; ++i)
{
    if (orig->p[i] == '\0')
    {
        return "ERROR: embedded null value";
    }
}

If a null byte is found within this string (ASN1 allows this), the static string “ERROR: embedded null value” is returned. If no null byte is found, a heap-allocated string is returned.

The static string becomes problematic if a while later string_mod() is called. This attempts to modify the string. This will typically cause a crash, because the static string is stored in a read-only memory region.

Stack buffer overflow if long –tls-cipher is given

Reported to the OpenVPN security list on May 12
No CVE
Patch: https://github.com/OpenVPN/openvpn/commit/e6bf7e033d063535a4414a4cf49c8f367ecdbb4f

An excessively long –tls-cipher option can cause stack buffer corruption. This can only affect the user if they load untrusted options. Not considered an actual vulnerability because untrusted options may execute arbitrary code via other option directives by design (see commit message).

As a general rule, don’t load untrusted configuration files.

(v)s(n)printf hardening

Reported to the OpenVPN security list on May 23

This is not a vulnerability. It is a proposed hardening technique. My motivation can be read here: https://community.openvpn.net/openvpn/ticket/894
The gist is that vsnprintf and related functions (upon which OpenVPN heavily relies) can, in theory, fail. The reasons for this are entirely inherent to the libc’s internal logic, and behavior may differ from one libc to the other. It must be noted that it is exceedingly unlikely that these functions fail in practice. However, should this happen, this could create dangerous data leaks of sensitive data. My proposed patch remedies this and ensures no data is ever leaked.

Other bugs

Some other minor bugs, that don’t impact security, have been found: https://github.com/OpenVPN/openvpn/commits/master

How I fuzzed OpenVPN

Fuzzing OpenVPN has been an extensive effort. You can’t just chain the fuzzer to arbitary internal functions for various reasons:

OpenVPN executes external programs like ipconfig and route to modify the system’s networking state. This is not acceptable within a fuzzing environment.
Direct resource access (files, networking) occurs throughout the code. You certainly don’t want the fuzzer to end up writing random files and sending data to random IP’s.
There are many ASSERT() statements throughout the code. These will cause a direct abort if the enclosed condition is false. This makes fuzzing impossible; you want the fuzzer to run for hours, not abort after 2 seconds.

To work around the first problem, I modified the source code such that in fuzzing mode, everything leading up to the actual execve() is executed (processing of arguments to the external program), but the actual execve() call is commented out. It will return success or failure based on a bit in the fuzzer input data.

To prevent access to resources, I implemented abstractions for libc functions. For example, recv() is now platform_recv(), and within platform_recv() I either call recv() directly (in non-fuzzing mode), or grab data from the fuzzer input data (in fuzzing mode). Similarly, through abstractions such as platform_read(), the application can open, read and write to files at will. The data that it expects is transparently pulled from the fuzzing input.

To deal with the assertions, there is no other way than to comment them out (#ifndef FUZZING .. ASSERT(condition) .. #else if (condition) return; #endif), but only in certain cases.

I leave them in place in situations where the assertion condition depends directly on untrusted data. As an example, say the application recv()’s data from the peer, and then does ASSERT(recvd_data[3] == 0x20). It is important to leave this ASSERT in; it implies that the client can force an abort() on the server (or vice-versa); this can be considered a security issue.

But there are also ASSERTs that rely on variables within an internal data structure. I typically fill these data structures with fuzzer input. Rather than manually ensuring that these variables are valid and coherent with regard to the application’s logic, I simply change the ASSERTs that rely on this validity into ‘return’ where possible (and free objects, where applicable).

I’ve used libFuzzer combined with AddressSanitizer (ASAN), UndefinedBehaviorSanitizer (UBSAN) and MemorySanitizer (MSAN). ASAN cannot be combined with MSAN, and moreover MSAN does not work with libFuzzer (due to the apparent use of uninitialized memory within libFuzzer itself). So the way to go is to generate a corpus with the fuzzer, and then execute each of the resulting inputs with a MSAN-enabled standalone version.

There are various discrete components in OpenVPN that together constitute the application. There is an extensive suite of functions to deal with data buffers (buffer.c, buffer.h), an extensive option parser (options.c — parses the configuration file, command line arguments as well as commands pushed by server to client), a base64 encoder/decoder, etc.
Thanks to this relative modularity in OpenVPN it has been possible to use and abuse these components as if they were an API with relatively little effort.
My approach for testing all of these API-like components is as follows:

(assume 3 functions to be tested)

Get a number from the fuzzer input data in the range 0 – 2.
Call either of the three functions based in the number
Provide each function with parameters derived from the input data where dynamic parameters are required
Repeat the above process a number of times (for example (for i = 0; i < 5; i++) { … })

This will cause an ever-permutating sequence of invocations. Essentially the coverage surface becomes (near-) absolute, that is to say, (almost) every conceivable way to use the API is a contender to be tested via this algorithm.

This approach is especially useful to test the functions that operate on the same structure. If there exists any sequenced set of functions that would cause memory violations, this setup is bound to find it. Of course, the actual use of any group of functions within the application is only a small subset of all permutations and parameters that the fuzzer sets out to test, and any mishaps the fuzzer finds for very particular circumstances may not actually occur within the code. But it is nonetheless good to know, because:

If you know that a certain sequence of calls and their parameters will lead to memory corruption, you can now perform a manual code analysis to see if this situation occurs.
Corner-case API bugs that are not invoked now, may become manifest in the future once code (with calls to the API) is added that does trigger these bugs.

In MSAN-enabled builds I serialize the output structure (if there is one) to /dev/null. For example, the options parser stores all its data in a struct options variable. MSAN does not immediately report the use of uninitialized data; it only does so if it is used in conditions that lead to branching (if (x) …) or when the data is used for I/O.

Hence, by serializing this data to /dev/null (normally a no-op), I force MSAN to detect uninitialized data. In C, there is no automatic way to serialize nested data structures (struct A contains a pointer to struct B etc), so for some structures I had to manually make a serialization stack of functions.

Limited fuzzing on a 32 bit platform has also been performed. This did not find any issues that do not occur on 64 bit.

Sorry I strongly disagree with you point of view. Mostly that fuzzing has it place but it is more often than not the wrong method. Also manual reviews for lot of what you listed is also wrong.

What is wrong in most cases is no specification extraction tool and no complete specification to compare programs functionality to. Also why is manual reviews wrong in most cases.
http://ts.data61.csiro.au//projects/TS/autocorres/
There are tool assisted reviews. These reduce the lines of code reviewer has to go over and can be used to compare the program final logic. The more lines a human has to review the more likely they will miss something and less likely they will have as complete of picture.

Most validation performed is not done to formal validation standards include the fact that the2 openvpn ones were not done to formal validation standard.

http://ts.data61.csiro.au//projects/TS/tools.pml
If you read here you will notice C to HOL. This then can compare that the C code matches a HOL defined specification of action. So you are no longer limited by C logic in validation. Everything you listed for manual review can be in an almost 100 percent automated review that can be performed every code modification.

>>can very large inputs (say megabytes, gigabytes, which would be very slow to fuzz) cause problems?<<
This is something making a specification of action and performing a formal mathematical proof on it has major advantage over fuzz. Since you have reduced the code to a mathematically processable theory you can find issues that are impossible for fuzz testing. Mathematics calculating every possible outcome provides a different method of validation to brute forcing that fuzz testing is.

The problem is how do we get projects that are required for security to start using mathematical proofs. Fuzzing is better than nothing. Note the thing about using a formal mathematical proof system is you don't have to run the code to find a lot of error and the formal mathematical proof can be used to guide you fuzz testing to make sure you cover all code paths and to intentionally guide the fuzz into suspected errors.

Integrating Fuzz testing into build process means project does not go on and develop a mathematical proof system so have a lot more flaws than they should and still using manual testing for a lot of those flaw. It is possible for a mathematical proof to tell you that is pointless to-do any Fuzz testing as no flaw does exist. A fuzz test solution can never tell you that you have found every flaw.

16 thoughts on “The OpenVPN post-audit bug bonanza”

ofr June 21, 20171:39 pm Reply

Great work. I am curious why you don’t fuzz in a VM where it wouldn’t matter to have the fuzzer modify network state or create arbitrary files. It might even be easier to inspect memory etc.
Bill June 21, 20172:11 pm Reply

Fuzzing and manual code audit both often find bugs, but very rarely do sets intersect. Nice bugs (and writeup)!
Ross June 21, 20174:31 pm Reply

THANK YOU. Manual source code review, pfft!!
Nick June 21, 201710:45 pm Reply

libFuzzer works with MSAN, but you have to use a libc++ built with MSAN when you compile libfuzzer with MSAN.
1. guidovranken June 24, 20171:31 am Reply
  
  Thanks! Gonna try that.
ad June 22, 20179:35 am Reply

Excellent write-up (rare thing these days). Kudos!
Chris June 22, 20179:45 am Reply

Would static analysis tools like Coverity have caught these issues?
Jakub June 22, 201711:53 am Reply

Great article, thanks!
JackTechie (@jacktechnie) June 22, 20173:04 pm Reply

Awesome
Maleko June 23, 20172:36 am Reply

Good read, even though I don’t have any idea how most of the details work. I am aspiring to the field of networking and security myself and find these sorts of in-depth analysis intrigue me. I hope to join you in making software more secure someday.
oiaohm June 25, 20173:23 am Reply

Sorry I strongly disagree with you point of view. Mostly that fuzzing has it place but it is more often than not the wrong method. Also manual reviews for lot of what you listed is also wrong.

What is wrong in most cases is no specification extraction tool and no complete specification to compare programs functionality to. Also why is manual reviews wrong in most cases.
http://ts.data61.csiro.au//projects/TS/autocorres/
There are tool assisted reviews. These reduce the lines of code reviewer has to go over and can be used to compare the program final logic. The more lines a human has to review the more likely they will miss something and less likely they will have as complete of picture.

Most validation performed is not done to formal validation standards include the fact that the2 openvpn ones were not done to formal validation standard.

http://ts.data61.csiro.au//projects/TS/tools.pml
If you read here you will notice C to HOL. This then can compare that the C code matches a HOL defined specification of action. So you are no longer limited by C logic in validation. Everything you listed for manual review can be in an almost 100 percent automated review that can be performed every code modification.

>>can very large inputs (say megabytes, gigabytes, which would be very slow to fuzz) cause problems?<<
This is something making a specification of action and performing a formal mathematical proof on it has major advantage over fuzz. Since you have reduced the code to a mathematically processable theory you can find issues that are impossible for fuzz testing. Mathematics calculating every possible outcome provides a different method of validation to brute forcing that fuzz testing is.

The problem is how do we get projects that are required for security to start using mathematical proofs. Fuzzing is better than nothing. Note the thing about using a formal mathematical proof system is you don't have to run the code to find a lot of error and the formal mathematical proof can be used to guide you fuzz testing to make sure you cover all code paths and to intentionally guide the fuzz into suspected errors.

Integrating Fuzz testing into build process means project does not go on and develop a mathematical proof system so have a lot more flaws than they should and still using manual testing for a lot of those flaw. It is possible for a mathematical proof to tell you that is pointless to-do any Fuzz testing as no flaw does exist. A fuzz test solution can never tell you that you have found every flaw.
1. guidovranken June 25, 20174:03 am Reply
  
  I see where you’re coming from. But the software that we use daily simply is a wild patchwork of ancient code whose quality is often dubious, let alone tries to conform to a specification, or is designed such that it can be verified automatically. Code reviews + static analyzers + fuzzing + symbolic execution like Klee + tests are simply the best approaches we have that can be applied to arbitrary software within an acceptable amount of human and computational resources. I agree that security should not be a second thought but rather every opportunity to minimize dangerous edge cases should be taken and be implemented at the outset. But in the here and now we find ourselves using software that has been accumulating cruft and complexity for 20+ years, and until safer alternatives are created (and adopted!), we can only attack their insecurity with the tools that we have. But if you want to write a formally verified VPN server and client for free, be my guest :).
oiaohm June 25, 20174:56 am Reply

The idea of having to write the code again is wrong. Linux kernel is developing a full formal memory model same with working on making a full specification of the user-space interface.

Do note something like autocorres in review process is about building specification from the existing source code even if that specification is going to be flawed. A flawed specification is better than no specificaiton. So openvpn has been reviewed and no one even started building a specification.

You can say that making a specification automatically and then running mathematics on it does what static analyzers except it provides you something that is more compact and readable than the source code it made from. There is more than 1 way to implemented the static analyzers setup. Automatic generated specification also provides route from automatic specification to do what klees does.

Using tools to generate a specification is over looked. The generated specification can be used for three roles code review, static analyzers and test coverage. Also the generated specification reduces code review size and can be tighten over time.

Code reviews + static analyzers + Klees should all be part of one thing creating and validating specification. This is where everything is going south they are being treated as 3 independent things so there is no proper common core and a lot of duplicate processing.

The problem here is without specifications, the code reviews, static analyzers and test coverage work is going to be flawed so requiring fuzz testing to fill gaps.
1. happyg June 26, 20173:29 pm Reply
  
  The grammatical flow of your comments are really hard to read and follow, just FYI.
VPN June 26, 20173:57 pm Reply

As they said, problem will be resolved soon
c1ue July 3, 20173:07 am Reply

The fundamental issue with any code development is the primary motivation.
A large company could perhaps tolerate a detailed specification with accompanying testing – if said company has the right security culture.
However, the sad reality is that the vast majority of code written out there is not created under such environmental conditions. They very much are created under the conditions noted by the author – and so the need for fuzzing is only going to increase.
Kudos for a great writeup and a very illuminating view of an expert fuzzer at work.

	guidovranken on Differential fuzzing of crypto…
	Philippe Antoine on Differential fuzzing of crypto…
	guidovranken on One more OpenVPN vulnerability…
	windhl on One more OpenVPN vulnerability…
	Andrew Williams (@Sm… on VrankenFuzz – a multi-se…