gh-131525: Cache the result of tuple_hash #131529
Conversation
Objects/tupleobject.c
Outdated
```diff
@@ -530,6 +547,8 @@ tuple_repeat(PyObject *self, Py_ssize_t n)
     if (np == NULL)
         return NULL;

+    _PyTuple_RESET_HASH_CACHE(np);
```
Is there any reason not to just move this into `tuple_alloc()` itself, instead of repeating it everywhere `tuple_alloc()` is called?
I don't personally have an objection. Most `*_alloc` functions only do allocation, not initialization, so I didn't want to confuse that. But the cached hash is kind of a special case...
Objects/tupleobject.c
Outdated
```c
static Py_hash_t
tuple_hash(PyObject *op)
{
    PyTupleObject *v = _PyTuple_CAST(op);

    // For the empty singleton, we don't need to dereference the pointer
    if (op == (PyObject *)&_Py_SINGLETON(tuple_empty)) {
```
You could set the hash of `()` statically in `pycore_runtime_init.h`; then this test would be unnecessary.
Yes, but interestingly, this is measurably faster -- it doesn't have to chase the pointer.
Seems like the difference here is in the noise, so might as well do this at build time and skip this extra branch.
Just cosmetic nits, and I think it's good.
Note: it's also possible to use a static inline function instead for `_PyTuple_RESET_HASH_CACHE`, but that would mean that `*op` needs to be a `PyObject`, and I think not all calls pass a true `PyObject`.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Objects/tupleobject.c
Outdated
```c
static Py_hash_t
tuple_hash(PyObject *op)
{
    PyTupleObject *v = _PyTuple_CAST(op);

    if (v->ob_hash != -1) {
```
The loads of `ob_hash` need relaxed memory ordering, i.e. use `FT_ATOMIC_LOAD_SSIZE_RELAXED`.
Still learning the free-threaded conventions. Can you explain why (or point me at an explanation)? Since the write is atomic and we don't care if there is a race (worst case, the hash gets recomputed), and the thread sanitizer CI isn't complaining, I don't see why this would be necessary -- but also I'm new to this stuff so maybe I'm missing something.
The `FT_...` macros translate to plain loads/stores on the default build, so there is no overhead there.
It is necessary because accessing a field from multiple threads without any kind of synchronization is undefined behavior.
Even on the free-threaded build, the `RELAXED` part means that there is no synchronization at the hardware level on most platforms; it just convinces the compiler and sanitizers that this is OK.
When you're done making the requested changes, leave the comment:
I have made the requested changes; please review again
Thanks for making the requested changes! @picnixz, @kumaraditya303: please review the changes made to this pull request.
We should probably just remove the old comments and keep one in
This PR changes 102 files, which makes it hard to review. Most files are generated by Argument Clinic. Would it be possible to have a first PR which only adds the
GitHub seems to be doing the right thing by hiding the generated changes. Is that not happening for you? I'd be happy to rearrange the commits, but I don't think it makes sense to merge a first PR if the second one may not be approved. Let me know, as splitting would also invalidate the existing review comments on this PR.
Co-authored-by: Chris Eibl <138194463+chris-eibl@users.noreply.github.com>
Back in 2013, it was determined that caching the result of tuple_hash did not yield any significant speedup.
However, a lot has changed since then, and in a recent experiment that added a tuple hash cache back in, the mdp benchmark improved by 86%.
Admittedly, there was no measurable improvement on any other benchmark, but it also seems to have no downside, including for memory usage as measured with max_rss.