OpenSSL version 3.0.4, released on June 21st 2022, is susceptible to remote memory corruption which can be triggered trivially by an attacker. BoringSSL, LibreSSL and the OpenSSL 1.1.1 branch are not affected. Furthermore, only x64 systems with AVX512 support are affected. The bug is fixed in the repository, but a new release is still pending.

Somewhat peculiarly, almost nobody is talking about this. If RCE exploitation is possible, this makes it worse than Heartbleed in an isolated severity assessment, though the potential blast radius is limited by several factors: many people are still using the 1.1.1 tree rather than 3, OpenSSL has been forked into LibreSSL and BoringSSL, the vulnerability has only existed for a week (Heartbleed existed for years), and an AVX512-capable CPU is required.

This post gives some background on how this bug came to be and some notes on its exploitability.

On May 31st 2022 I found and reported an issue with constant-time Montgomery modular exponentiation in OpenSSL and BoringSSL (but not LibreSSL). For some values of `B, E, M` for which `B ^ E % M == 0`, some functions would return `M` instead of 0; the result was not fully reduced.

It was found that four distinct code paths are affected by this.

David Benjamin of Google analyzed the issue extensively and found that the bug does not constitute a security risk (at least not internally; external callers might end up in incorrect states depending on what they are trying to compute). Interestingly, he also found an apparent bug in the paper by Shay Gueron upon which the RSAZ code is based.

This took my fuzzer a long time to find, because the odds of finding `B ^ E % M == 0` for large, `N`-bit values of `B, E, M`, where those values are semi-random, are small. So I added a modular exponentiation solver; the bug can now be found quite quickly, and hopefully this helps find more, similar bugs in the future (in OpenSSL or elsewhere). (Z3's performance in solving equations involving modular exponentiation is quite poor, so I had to roll my own.)

The fix that was applied to the dual 1024-bit RSAZ code is wrong, because the reduction function is called with `num` set to the bit size, where it should be the number of `BN_ULONG` elements (which are always 8 bytes large, because that is the size of an unsigned long on x64 systems, which is the only architecture that can have AVX512 support). So, with the input sizes being 1024 bits, 8192 bytes are accessed (read from or written to) instead of 128.
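The unit confusion can be made concrete with a little arithmetic (a sketch; the helper name is mine):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t BN_ULONG; /* 8 bytes on x64 */

/* A reduction over `num` limbs touches num * sizeof(BN_ULONG) bytes. */
size_t bytes_touched(size_t num) {
    return num * sizeof(BN_ULONG);
}
```

The correct call passes `num = 1024 / 64 = 16` limbs (128 bytes); the buggy call passes the bit size, `num = 1024`, touching 8192 bytes.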

On to the bug internals. There are 5 distinct arrays involved. 3 arrays are overwritten.

| Variable | Description | Allocated size (bytes) | Over-read/write (bytes) | Total (bytes) | Read/write? |
|---|---|---|---|---|---|
| `res1` | modexp result 1 | 128 | 8064 | 8192 | Read, then write |
| `m1` | Modulus 1 | 128 | 8064 | 8192 | Read |
| `res2` | modexp result 2 | 128 | 8064 | 8192 | Read, then write |
| `m2` | Modulus 2 | 128 | 8064 | 8192 | Read |
| `storage` | Scratch space | 1184 | 7008 | 8192 | Write, then read |

- 8192 bytes are read from `res1`, `res2`, `m1`, `m2` and `storage`.
- 8192 bytes are written to `res1`, `res2` and `storage` (this is where the memory corruption takes place).
- If we consider `res1_bn` to be a bignum comprising `res1[0..8192]` (where the last byte is the most significant byte) and `m1_bn` to be `m1[0..8192]`, then if `res1_bn < m1_bn`, the contents of `res1[0..8192]` will be left unchanged after being overwritten. The same applies for `res2` and `m2`.
- This implies that if you can set the most significant bit of `m1[8191]` to `1` and the most significant bit of `res1[8191]` to `0`, then `res1[0..8192]` will retain its original state (and no actual corruption takes place). This circumstance may occur by chance. The same applies for `res2` and `m2`.
- Conversely, if `res1_bn >= m1_bn`, then after the write, `res1[N]` will be one of `{res1[N], res1[N] - m1[N], res1[N] - m1[N] - 1}`. The same applies for `res2` and `m2`.
- After the overwrites have occurred, `storage[N]` will be `~(m2[N] - res2[N])` or `~(m2[N] - res2[N]) + 1`.
- From this it follows that if you control `m2[N]` and `res2[N]`, you mostly control `storage[N]`.
- The original contents of `storage` are never read, so they do not influence the end state in any way.

Summarized:

| Variable | Post-write value |
|---|---|
| `res1[N]` | `res1[N]` or `res1[N] - m1[N]` or `res1[N] - m1[N] - 1` |
| `res2[N]` | `res2[N]` or `res2[N] - m2[N]` or `res2[N] - m2[N] - 1` |
| `storage[N]` | `~(m2[N] - res2[N])` or `~(m2[N] - res2[N]) + 1` |

Each of these lemmas holds regardless of the inputs to the modexp function, and of any other variable or state.
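These invariants can be reproduced with a toy model of the reduce-once step (my reconstruction of the general conditional-subtract pattern, not the actual AVX512 code): compute `tmp = r - m` limb-by-limb with borrow propagation, then commit `tmp` into `r` only when no final borrow occurred.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t BN_ULONG;

/* Toy reduce-once over `num` limbs: tmp = r - m with borrow propagation;
 * commit tmp into r only if r >= m (i.e. no final borrow). */
void reduce_once(BN_ULONG *r, const BN_ULONG *m, BN_ULONG *tmp, size_t num) {
    BN_ULONG borrow = 0;
    for (size_t i = 0; i < num; i++) {
        unsigned __int128 d = (unsigned __int128)r[i] - m[i] - borrow;
        tmp[i] = (BN_ULONG)d;           /* equals ~(m[i] - r[i]) or +1 */
        borrow = (BN_ULONG)(d >> 64) & 1;
    }
    if (borrow == 0) {
        memcpy(r, tmp, num * sizeof(BN_ULONG)); /* r >= m: r -= m */
    }
}
```

Note how the scratch buffer (`tmp`, i.e. `storage`) always ends up holding the wrapped difference whether or not it is committed, which is exactly the `~(m[N] - res[N])` / `~(m[N] - res[N]) + 1` behaviour described above.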

Here is a fuzzer which demonstrates these invariants.

The (wrapping) subtraction mechanics at play here are interesting, because you can use them to apply pointer deltas to a function pointer to make it point to a different function. This can help circumvent ASLR: while the function addresses are randomized, their spacing is constant for any given binary.

For example, if:

- a single, particular `BN_ULONG` in the `res1` array is known to contain an active pointer to a specific function
- and you can control the `BN_ULONG` in `m1` at the same index (for example via heap spraying)
- and the rest of the state is completely unknown and uncontrolled

then by setting `m1[N]` to `oldfunc - newfunc`, the original function pointer will be exactly `newfunc` after the overflow with a 25% probability:

```
#include <assert.h>
#include <stdio.h>

void goodfunc(void) { printf("good\n"); }
void badfunc(void) { printf("bad\n"); }

int main(void)
{
    /* Indices 0..127 are within allocated bounds.
     * Beyond that is either unallocated or allocated by different parts
     * of the program. */
    struct {
        BN_ULONG storage[1024], res1[1024], m1[1024], res2[1024], m2[1024];
    } vars;

    /* Randomize everything; not known to the attacker */
    FILE* fp = fopen("/dev/urandom", "rb");
    assert(fp != NULL);
    assert(fread(&vars, 1, sizeof(vars), fp) == sizeof(vars));
    fclose(fp);

    /* Assume res1[345] contains a function pointer used internally by OpenSSL */
    vars.res1[345] = (BN_ULONG)(&goodfunc);

    /* Assume we control m1[345] */
    vars.m1[345] = (BN_ULONG)(&goodfunc) - (BN_ULONG)(&badfunc);

    bn_reduce_once_in_place(vars.res1, /*carry=*/0, vars.m1, vars.storage, 1024);
    bn_reduce_once_in_place(vars.res2, /*carry=*/0, vars.m2, vars.storage, 1024);

    void (*fnptr)(void) = (void*)vars.res1[345];
    fnptr(); /* good or bad? */

    return 0;
}
```
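The ASLR-bypass property this relies on can be checked in isolation (a sketch; `apply_delta` and the two functions are my own illustration): under ASLR the absolute addresses of two functions change on every run, but their difference is fixed at link time, so an attacker with a copy of the binary knows the delta in advance.

```c
#include <stdint.h>

void funcA(void) {}
void funcB(void) {}

/* Recover a target pointer from a known pointer plus a link-time delta;
 * this is the primitive the subtract-by-m overwrite above provides. */
void *apply_delta(void *known, intptr_t delta) {
    return (void *)((intptr_t)known + delta);
}
```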

OpenSSL makes heavy use of function pointers. Running `find -name '*.c' -exec grep 'METH.* = {' {} \; | grep -v test` from the repository root shows over 130 data structures that encapsulate a set of function pointers. Delta subtraction may be useful in exploiting this circumstance to make OpenSSL misbehave to varying degrees of severity.
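To make the attack surface concrete, here is the general shape of such a structure (hypothetical fields, not an actual OpenSSL definition): a named table of function pointers that calling code dereferences without validation, so shifting one slot by a delta silently redirects the call.

```c
#include <stddef.h>

/* Hypothetical METHOD-style table (not an actual OpenSSL definition):
 * a named bundle of function pointers invoked indirectly by callers. */
typedef struct {
    const char *name;
    int (*init)(void *ctx);
    int (*update)(void *ctx, const unsigned char *in, size_t len);
    int (*final)(void *ctx, unsigned char *out);
} TOY_METHOD;

static int toy_init(void *ctx) { (void)ctx; return 1; }

/* Callers dereference the table blindly; a corrupted slot is followed
 * without any validation. */
int call_init(const TOY_METHOD *meth, void *ctx) {
    return meth->init(ctx);
}
```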

Apart from code execution, there can also be scenarios where private data is leaked to the attacker.

Assume:

- R¹ = `res[I..J]`
- M¹ = `m[K..L]`
- I, K >= 128
- J, L <= 1023

Let R¹ be an allocated space which the attacker can read and write (for example an internal TLS state whose value is, to an extent, determined by how the attacker conducts the handshake, and which can be read, to an extent, by how the TLS code subsequently behaves based on its state).

Let M¹ contain some kind of secret information.

Recall that the combination of `res[N]` (before the overwrite) and `m[N]` leaks into `res[N]`. It follows that if the attacker has read-write access to R¹, M¹ can be partially or completely inferred.
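A sketch of the inference step (my illustration; `infer_m` is not a real helper): if the attacker reads `res[N]` before and after the overwrite, the wrapped difference recovers the secret limb `m[N]` up to an off-by-one, since the post-write value is `res[N] - m[N]` or `res[N] - m[N] - 1`.

```c
#include <stdint.h>

typedef uint64_t BN_ULONG;

/* Given res[N] observed before and after the overwrite, recover the
 * secret limb m[N]; the result is m[N] or m[N] + 1 because of the
 * possible extra borrow from the limb below. */
BN_ULONG infer_m(BN_ULONG res_before, BN_ULONG res_after) {
    return res_before - res_after; /* wrapping subtraction */
}
```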

This is what I’ve deduced so far. It’s possible there are errors in this post; please e-mail guido@guidovranken.com and I’ll correct them and credit you. You can follow me on Twitter.