Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-123358: Use _PyStackRef in LOAD_DEREF #130064

Merged
merged 5 commits into from
Mar 26, 2025

Conversation

colesbury
Copy link
Contributor

@colesbury colesbury commented Feb 13, 2025

@colesbury
Copy link
Contributor Author

!buildbot AMD64 Windows Server 2022 NoGIL

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @colesbury for commit e4c7608 🤖

The command will test the builders whose names match following regular expression: AMD64 Windows Server 2022 NoGIL

The builders matched are:

  • AMD64 Windows Server 2022 NoGIL PR

@python python deleted a comment from bedevere-bot Feb 13, 2025
@python python deleted a comment from bedevere-bot Feb 13, 2025
@@ -19,7 +20,7 @@ PyCell_SwapTakeRef(PyCellObject *cell, PyObject *value)
PyObject *old_value;
Py_BEGIN_CRITICAL_SECTION(cell);
old_value = cell->ob_ref;
cell->ob_ref = value;
FT_ATOMIC_STORE_PTR_RELEASE(cell->ob_ref, value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naiive question: out of curiosity, why does this need release? I don't see the acquire anywhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally want to use at least release when storing pointers that may be loaded concurrently. This ensures that previously written data is visible before the store of value to cell->ob_ref.

For example, we probably initialize value's type earlier in the program execution:

value->ob_type = &PyFloat_Type;  // for example
...
FT_ATOMIC_STORE_PTR_RELEASE(cell->ob_ref, value);

It's really important that value's ob_type field is visible before the write of value to cell->ob_ref or a reader might see some previous, garbage data for ob_type.

The load below uses seq-cst, which is at least as strong as acquire. The minimum in the C11/C++11 memory model would be consume for the load (for data dependencies), but no compiler implements that -- they all just upgrade it to "acquire", so it's kind of a mess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough explanation!

@colesbury
Copy link
Contributor Author

For context, I'm still running into the issues we saw at the sprint in September where some of the tests on Windows time out with this change.

With the locks around cell reads gone, some of the free threading tests
were prone to starvation: the readers were able to run in a tight loop
and the writer threads weren't scheduled frequently enough to make
timely progress.
@colesbury
Copy link
Contributor Author

Ok, I finally figured out the Windows test timeouts. We have tests like test_attr_cache_consistency where the readers run until the writers finish. Before this change, the readers would frequently block on access to the cell containing C, which gave the writer more chances to make progress. With this change, the readers would run without blocking and effectively starving the writer thread. A few other things contribute to this problem:

  • the runners only have 4 (virtual?) CPUs and some tests use a lot of threads
  • the reader threads are often started before the writer. Using a barrier helps here.

def test_attr_cache_consistency(self):
class C:
x = 0
DONE = False
def writer_func():
for i in range(3000):
C.x
C.x
C.x += 1
nonlocal DONE
DONE = True
def reader_func():
while True:
# We should always see a greater value read from the type than the
# dictionary
a = C.__dict__['x']
b = C.x
self.assertGreaterEqual(b, a)
if DONE:
break
self.run_one(writer_func, reader_func)

I've gone through the test_free_threading tests and fixed the tests that were very slow.

@colesbury colesbury marked this pull request as ready for review March 25, 2025 22:06
@colesbury colesbury requested a review from markshannon as a code owner March 25, 2025 22:06
@colesbury colesbury requested a review from mpage March 25, 2025 22:06
@colesbury
Copy link
Contributor Author

!buildbot Windows

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @colesbury for commit c5c689b 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F130064%2Fmerge

The command will test the builders whose names match following regular expression: Windows

The builders matched are:

  • AMD64 Windows Server 2022 NoGIL PR
  • AMD64 Windows11 Refleaks PR
  • AMD64 Windows PGO NoGIL PR
  • AMD64 Windows11 Bigmem PR
  • ARM64 Windows Non-Debug PR
  • AMD64 Windows10 PR
  • ARM64 Windows PR
  • AMD64 Windows11 Non-Debug PR
  • AMD64 Windows PGO PR

Copy link
Contributor

@mpage mpage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - nice job figuring out what was going on here!

@colesbury colesbury merged commit 3d4ac1a into python:main Mar 26, 2025
79 of 80 checks passed
@colesbury colesbury deleted the gh-123358-load-deref branch March 26, 2025 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants