-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Make HashContexts serializable #5702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for your comments @sgolemon!
Those functions appear not to be used on master very much?
This does not seem worthwhile, since the serialization of binary strings is already binary.
The current version does a better job of this.
It would be fine with me to refuse to serialize an HASH_HMAC context. But if we serialize such a context, then we need to serialize the key.
There are still a couple of those, sorry. Will fix them after a round of comments |
ext/hash/hash.c
Outdated
|
||
ops = php_hash_fetch_ops(Z_STR_P(algo_zv)); | ||
if (!ops) { | ||
zend_throw_exception(spl_ce_UnexpectedValueException, "Unknown hash algorithm", 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of SPL's UnexpectedValueExceptions, lately we throw ValueErrors in similar cases via zend_argument_value_error and zend_value_error. Probably the examples that are tonbe found in the sodium extension are the most interesting for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I did this for all except the LogicError (when attempting to call unserialize
on an already-initialized context), I wasn't sure what was appropriate there.
The serialization of the context is binary (8-bit, non-ascii transparent), the output of base64_encode() or bin2hex() is ASCII (7-bit). The latter had fewer storage and transport issues than the former. |
Sorry, had to go back and refer to the HMAC spec. Yeah. We need the secret at the start AND at the end of stream. Ugh. Okay. It doesn't need to be in this project, BUT I'd like to consider adding a couple API methods to HashContext.
Then security conscious callers could do:
This way the HMAC secret never needs to leave the app, but you have to opt-into that extra layer of security. Another option might be a global setting for a secret encryptor/decryptor (maybe calling into sodium? Maybe just a userspace callback which can figure out encryption/stripping/decryption as needed). I just want us to avoid spilling secrets before we push this through. Edit to add:
We can put this issue down the road by initially refusing to serialize HMAC. The window for 8.0 is closing soon and I'd hate to lose general hash serialization because of an issue specific to HMAC. |
foreach ($algos as $algo) { | ||
var_dump($algo); | ||
$ctx0 = hash_init($algo); | ||
$serial = serialize($ctx0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Output the serialized string. We want to make sure the expect which we get on LE arches match what we get on BE arches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this for a selection of algorithms and not all of them? In particular if fast-SHA3 is enabled, the serialization is still endian-sensitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new hash_serialize_003.phpt covers this case.
|
||
var_dump($algo); | ||
$ctx0 = hash_init($algo, HASH_HMAC, $key); | ||
$serial = serialize($ctx0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, I'd like to include the serialized state in the expect so that we can confirm endianness is accounted for correctly.
I understand, but this patch is designed to fit well into the existing PHP serialization framework, which does not generate 7-bit safe output. For instance the serialization of a binary string contains binary characters. ( |
Thanks very much for the comments! Seems I should:
At that point might we be good to merge? |
|
So it turns out that many hash algorithms do not fully initialize portions of their contexts. For instance the input buffers in MD4, MD5, SHA1, etc. This is not a huge problem since those bytes are never read by the hash algorithm, but it does mean that the serialized version of a HashContext contains some bytes with indeterminate values, and we can't rely on the output of serialize() equaling any particular string. I can see a couple ways forward, including (1) ensuring that HashContexts are always fully initialized (this might slow down hashing a tiny bit), (2) changing the tests to use particular serialization strings (generated on a mix of big- and little-endian). Any preference? |
b986a77
to
3106e92
Compare
Hi all, thanks very much for the suggestions! I added a test validating that specific serialized representations can be un-serialized. This found some bugs, in particular with 32-bit vs. 64-bit architectures—so thanks for pushing me to add those tests. (I added more internal documentation too.) At this point all CI tests pass. Commit 15d2b65 isn't a critical part of this PR, and I could make a separate PR for it, but it seems valuable anyway. |
Exposing uninitialized memory is bad, so now hash context memory is always zeroed before use. There was no performance impact in a test that did a couple million hashes of short data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with this. I still personally think making an effort to be ASCII clean is worthwhile is serialize output, but I do understand that there's no expectation of ASCII safety currently, so I don't need to lose sleep over it.
The serialization of the sha3 context seems to be unsafe. Using this base64-encoded unserialize payload:
We get:
and various variations thereon. I haven't looked closer at the cause. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks to be in good shape, I only have some cosmetic comments.
However, I'm not convinced about the safety of this. I pointed out the sha3 issue above and fuzzing hasn't found issues with other hashes. However, this is pure unserialize fuzzing and I'm pretty sure we'd be seeing issues if we fed in any data after unserializing. From a cursory code inspection
php-src/ext/hash/hash_snefru.c
Lines 154 to 158 in 956dde0
if (context->length) { | |
i = 32 - context->length; | |
memcpy(&context->buffer[context->length], input, i); | |
SnefruTransform(context, context->buffer); | |
} |
ext/hash/hash.c
Outdated
if (!hash->ops->hash_serialize) { | ||
goto serialize_failure; | ||
} else if (hash->options & PHP_HASH_HMAC) { | ||
zend_value_error("HashContext with HASH_HMAC option cannot be serialized"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe for serialization failures it is necessary to use an Exception subclass (just Exception itself is fine), not an Error subclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I was advised to go for zend_value_error above. Can you help out with a suggeste change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went with Exception.
ext/hash/hash.c
Outdated
|| !options_zv || Z_TYPE_P(options_zv) != IS_LONG | ||
|| !hash_zv | ||
|| !members_zv || Z_TYPE_P(members_zv) != IS_ARRAY) { | ||
zend_value_error("Incomplete or ill-formed serialization data"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same in this function, we shouldn't use ValueError.
To give an example of the snefru issue:
Would cause an out-of-bounds memory access without ubsan. The premise of serializing the raw internal state seems to be on pretty shaky ground. |
First off, thanks!!!
I audited all the hash functions for possible out-of-bounds memory accesses and similar issues. Gost, md2, sha3, snefru, tiger, & whirlpool states contain positions into “sponge” buffers that absorb input. I updated the unserialize codes to validate these positions, ensuring they're within bounds. The other hash functions look safe to me already. Some (e.g. adler32) work bytewise and have no sponge buffers. Some (e.g. sha256) encode their sponge buffer positions as a bitfield; when reading the position, they mask off uninteresting bits, so no input value will cause a problem. After ac607dc, the bad snefru serialization produces:
Hash serialization is a useful feature, so I'd love to hear how I could help raise the comfort level around the implementation and find the bugs that remain. |
One more note: The Keccak implementation of SHA3 is now serialized semantically, rather than just by serializing the entire raw internal state. That means every hash function is serialized semantically. |
That looks like a reasonable approach in general. We just need to be somewhat confident that everything that needs checking has been checked. I've done a fuzzing run with the following patch: https://gist.github.com/nikic/34d798a0c579e5069ac9e9eccf8511c8 Which turned up the following issue:
The part before
|
Fixed the whirlpool issue. Also committed an unserializehash fuzzer based on your gist. Maybe it would be good to have around? |
Hi! I rebased and compressed the commits, updated some documentation, switched to using Exception instead of ValueError, and moved the configury around. An overnight fuzzing run of the fuzzer committed above didn't find anything after 131M runs. (I checked that the fuzzer worked by undoing the Snefru fix; it found the problem after 200K runs.) Happy to answer any further questions or problems. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good to me. The only thing that's missing is some test coverage for the error conditions in the unserialize implementation (we should at least have the distinct exceptions thrown once).
ext/hash/hash.c
Outdated
|
||
PHP_HASH_API int php_hash_serialize_spec(const php_hashcontext_object *hash, zend_long *magic, zval *zv, const char *spec) /* {{{ */ | ||
{ | ||
size_t pos = 0, max_alignment = 1, sz, count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, what I meant here is that you can write size_t count = ...
in the while loop below, as the value does not need to be available outside the loop.
Previously, the Keccak_HashInstance was separately allocated. This could cause memory leaks on errors. For instance, in php_hash_do_hash_hmac, the following code cleans up after a file read error: if (n < 0) { efree(context); efree(K); zend_string_release(digest); RETURN_FALSE; } This does not call the context's hash_final operation, which was the only way to free the separately-allocated Keccak state. The simplest fix is simply to place the Keccak_HashInstance state inside the context object. Then it doesn't need to be freed. As a result, there is no need to call hash_final in the HashContext destructor: HashContexts cannot contain internally allocated resources.
To avoid undefined behavior warnings for those accesses.
|
* Modify php_hash_ops to contain the algorithm name and serialize and unserialize methods. * Implement __serialize and __unserialize magic methods on HashContext. Note that serialized HashContexts are not necessarily portable between PHP versions or from architecture to architecture. (Most are, though Keccak and slow SHA3s are not.) An exception is thrown when an unsupported serialization is attempted. Because of security concerns, HASH_HMAC contexts are not currently serializable; attempting to serialize one throws an exception. Serialization exposes the state of HashContext memory, so ensure that memory is zeroed before use by allocating it with a new php_hash_alloc_context function. Performance impact is negligible. Some hash internal states have logical pointers into a buffer, or sponge, that absorbs input provided in bytes rather than chunks. The unserialize functions for these hash functions must validate that the logical pointers are all within bounds, lest future hash operations cause out-of-bounds memory accesses. * Adler32, CRC32, FNV, joaat: simple state, no buffer positions * Gost, MD2, SHA3, Snefru, Tiger, Whirlpool: buffer positions must be validated * MD4, MD5, SHA1, SHA2, haval, ripemd: buffer positions encoded bitwise, forced to within bounds on use; no need to validate
Unlike the straight unserialize fuzzer, this runs only on HashContexts, and it does an update and finalize on the contexts it creates. With @nikic.
OK, the AppVeyor build demonstrated that some platforms have |
I think that's it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Thanks @nikic! |
serialize and unserialize methods.
HashContext.
Note that serialized HashContexts are not necessarily portable
between PHP versions or from architecture to architecture.
(Most are, but fast SHA3s are not necessarily.)
An UnexpectedValueException is thrown when an unsupported
serialization is attempted.