Python 2.7 32-bit heap corruption in JSON encoder

The ascii_escape_str and ascii_escape_unicode functions in Python 2.7 were prone to a heap corruption vulnerability on 32-bit builds. Several code paths reach these functions and trigger the vulnerability; one of them is encoding a dict object with a very large key:

python -c 'import json;json.dumps({chr(0x22)*0x2AAAAAAB:0})'
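The key length in that one-liner does not look arbitrary. As a rough sketch, and assuming (this is not stated above) that the encoder budgets up to 6 output bytes per input character and does the multiplication in a 32-bit size type, 0x2AAAAAAB is the smallest length for which that product wraps around. The names MAX_EXPANSION, SIZE_BITS and wrapped_size are hypothetical, chosen only for illustration.

```python
# Assumption: worst-case expansion of 6 output bytes per input character
# (e.g. a \uXXXX escape) and a 32-bit size type. Neither is taken from the
# CPython source; they are illustrative guesses.
MAX_EXPANSION = 6
SIZE_BITS = 32


def wrapped_size(input_len, expansion=MAX_EXPANSION, bits=SIZE_BITS):
    """Return input_len * expansion as a fixed-width unsigned integer would hold it."""
    return (input_len * expansion) & ((1 << bits) - 1)


key_len = 0x2AAAAAAB

print(hex(key_len * MAX_EXPANSION))   # true product: 0x100000002
print(wrapped_size(key_len))          # what 32 bits can hold: 2
print(hex(wrapped_size(key_len - 1))) # one character shorter still fits: 0xfffffffc
```

Under those assumptions, a buffer sized from the wrapped value would be far smaller than the data later written into it.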

A fix has been implemented: https://hg.python.org/cpython/rev/9375c8834448

Node.js memory corruption from JavaScript as a feature

Update 26 Sept 2016: a fix is being prepared at https://github.com/nodejs/node/issues/8724

As I was casually browsing the Node.js 6.6.0 source code, I stumbled upon this suspect piece of code.

src/node_buffer.cc:

template <typename T, enum Endianness endianness>
void WriteFloatGeneric(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);

  bool should_assert = args.Length() < 4;

  if (should_assert) {
    THROW_AND_RETURN_UNLESS_BUFFER(env, args[0]);
  }

  Local<Uint8Array> ts_obj = args[0].As<Uint8Array>();
  ArrayBuffer::Contents ts_obj_c = ts_obj->Buffer()->GetContents();
  const size_t ts_obj_offset = ts_obj->ByteOffset();
  const size_t ts_obj_length = ts_obj->ByteLength();
  char* const ts_obj_data =
      static_cast<char*>(ts_obj_c.Data()) + ts_obj_offset;
  if (ts_obj_length > 0)
    CHECK_NE(ts_obj_data, nullptr);

  T val = args[1]->NumberValue(env->context()).FromMaybe(0);
  size_t offset = args[2]->IntegerValue(env->context()).FromMaybe(0);

  size_t memcpy_num = sizeof(T);

  if (should_assert) {
    CHECK_NOT_OOB(offset + memcpy_num >= memcpy_num);
    CHECK_NOT_OOB(offset + memcpy_num <= ts_obj_length);
  }

  if (offset + memcpy_num > ts_obj_length)
    memcpy_num = ts_obj_length - offset;

  union NoAlias {
    T val;
    char bytes[sizeof(T)];
  };

  union NoAlias na = { val };
  char* ptr = static_cast<char*>(ts_obj_data) + offset;
  if (endianness != GetEndianness())
    Swizzle(na.bytes, sizeof(na.bytes));
  memcpy(ptr, na.bytes, memcpy_num);
}

As you can see, should_assert is set to false whenever a 4th argument is passed, and in that case both CHECK_NOT_OOB bounds checks are skipped before the memcpy.
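To make the consequence concrete, here is a minimal sketch that mirrors only the size_t arithmetic of the unchecked path in the listing above. It assumes a 64-bit size_t and models unsigned wraparound with a mask; the function name unchecked_memcpy_len and the constant MASK64 are mine, not Node's.

```python
MASK64 = (1 << 64) - 1  # assumption: size_t is 64 bits wide


def unchecked_memcpy_len(buf_len, offset, elem_size=4):
    """Byte count that reaches memcpy when should_assert is false.

    Mirrors:
        if (offset + memcpy_num > ts_obj_length)
            memcpy_num = ts_obj_length - offset;
    with all arithmetic done modulo 2**64, as unsigned size_t would.
    """
    memcpy_num = elem_size
    if ((offset + memcpy_num) & MASK64) > buf_len:
        memcpy_num = (buf_len - offset) & MASK64
    return memcpy_num


print(unchecked_memcpy_len(10, 0))           # in bounds: 4
print(unchecked_memcpy_len(10, 8))           # straddles the end, clamped: 2
print(unchecked_memcpy_len(10, 1000))        # offset past the end: 2**64 - 990
print(unchecked_memcpy_len(10, MASK64 - 1))  # offset + 4 wraps to 2: 4
```

The clamp only helps when the offset lies inside the buffer. Once the offset is past the end, the subtraction underflows to an enormous length, and when the addition itself wraps, the comparison passes and 4 bytes are written at a pointer far outside the buffer.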

This is what the documentation says about skipping those checks:

https://nodejs.org/api/buffer.html#buffer_buf_writefloatbe_value_offset_noassert

buf.writeFloatBE(value, offset[, noAssert])
buf.writeFloatLE(value, offset[, noAssert])
Added in: v0.11.15

    value <Number> Number to be written to buf
    offset <Integer> Where to start writing. Must satisfy: 0 <= offset <= buf.length - 4
    noAssert <Boolean> Skip value and offset validation? Default: false
    Return: <Integer> offset plus the number of bytes written

Writes value to buf at the specified offset with specified endian format (writeFloatBE() writes big endian, writeFloatLE() writes little endian). value should be a valid 32-bit float. Behavior is undefined when value is anything other than a 32-bit float.

Setting noAssert to true allows the encoded form of value to extend beyond the end of buf, but the result should be considered undefined behavior.

So it’s not a bug but a feature...

Let’s try it on 64 bit:

node-v6.6.0$ ./node -e 'new Buffer(10).writeFloatBE(1, 0xFFFFFFFFFFFFFFFF-3000, 1);'
Segmentation fault

Groovy!

Disclaimer: I never use Node.js and I know next to nothing about it. Maybe there is a good use for this “feature” (but what?), but other popular high-level languages have a zero-tolerance policy with regard to raw memory corruption from scripts (see the Python, Ruby, Perl and PHP vulnerabilities, etc., in the Internet Bug Bounty program).