Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-123471: Make concurrent iteration over itertools.cycle safe under free-threading #131212

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Mar 13, 2025

See #124397

  • In the FT build we cannot clear the iterator cycle->it. Instead we use cycle->index as a check whether the iterator has been exhausted or not.
  • We remove dead code related to cycle->firstpass
  • The unit test is combined with itertools.batched in a single file.

@rhettinger
Copy link
Contributor

In the "Strategy for Free Threading" umbrella issue, principle 3 is:

Other iterators implemented in C will get only the minimal changes necessary to cause them to not crash in a free-threaded build. The edits should be made in a way that does not impact existing semantics or performance (i.e. do not damage the standard GIL build). Concurrent access is allowed to return duplicate values, skip values, or raise an exception.

Given that plan, can you please articulate reason for this PR. Is there currently a risk of a crash?

Also what is specifically is meant by "safe under free-threading"? AFAICT there is no concurrent access use case for cycle() that wouldn't be met by the planned options to either wrap with serialize() or tee() depending on whether the user wants atomic/consecutive semantics or parallel/independent semantics. That is what we would do for an equivalent generator.

The itertools code was highly refined and micro-optimized with painful tooling (valgrind, vtune, etc), so I'm a bit reluctant to change long stable code unless it is really necessary. Also, I was waiting to see how the essential builtin functions/classes evolved so that itertools could follow a proven model rather than being on the bleeding edge where misadventures might persist for a long time before being noticed.

@eendebakpt
Copy link
Contributor Author

The current implementation of itertools.cycle can crash under the free-threading build for two reasons:

  • Concurrent iteration can result in lz->index being incremented by a thread to a value of PyList_GET_SIZE(lz->saved)while in another thread the non thread-safePyList_GET_ITEM` is used.

item = PyList_GET_ITEM(lz->saved, lz->index);
lz->index++;
if (lz->index >= PyList_GET_SIZE(lz->saved))
lz->index = 0;

  • One thread can exhaust the iterator lz->it and clear it

Py_CLEAR(lz->it);
}

while another thread is still using lz->it (since the cycleobject might be holding the only reference to the iterator this is not safe). The test added in this PR will often crash the interpreter in the free-threading build on my system (increasing the number_of_iterations will increase the odds).

By "safe under free-threading" I mean the free-threading build will not crash. No guarantees are given about any "correctness" of the results (I can adapt the title and news entry if desired). It is true that using the planned serialize would be a good solution for someone planning to use the cycle with concurrent iteration. But I have not doubt some users will be using the itertools.cycle in a free-threading setting (perhaps even being unaware of this via 3rd party packages). I believe we should make the cpython free-threading safe against race conditions leading to crashes like this. These kind of bugs can be difficult to debug, and could potentially only trigger very rarely.

@rhettinger
Copy link
Contributor

rhettinger commented Mar 17, 2025

Okay, this sounds reasonable :-) Thanks for explaining the purpose of the edit.

Please do check that performance hasn't been negatively impacted (if so, the make an ifdef so that the baseline isn't impacted; if not, then we're good).

@eendebakpt
Copy link
Contributor Author

@rhettinger In the benchmarks I did this PR is faster for the normal build. There are two reasons for this

  • The unused variable firstpass is removed, making the cycleobject smaller in memory and avoiding the initialiation of the variable in the constructor. (note: this change is not related to FT)
  • Once the iterator is exhausted, we only access lz->index and lz->saved. In current main in addition the lz->it is accessed.

The gain from this PR is small though (and I would not be surprised if for different platforms or benchmark tests the gain is zero).

Here are the results of one of the benchmarks I did:

Benchmark code
import time
import gc
from itertools import cycle

t0=time.perf_counter()
for ii in range(100):
    c  = cycle( (1, 2, 3, 4))
    
    for _ in range(200):
        next(c)

gc.collect() # make sure that in both the normal and free-threading build we clean up the constructed objects
dt=time.perf_counter()-t0
print(dt)
Script to generate and plot the data
""" Interleaved benchmark for itertools.cycle

@author: eendebakpt
"""

import subprocess

import matplotlib.pyplot as plt
import numpy as np

test_script = '/home/eendebakpt/python_testing/benchmarks/bm_itertools_cycle.py'
cmds = ["/home/eendebakpt/cpython0/python {test_script", "/home/eendebakpt/cpython/python {test_script}" ]

verbose = False
tt = []
for ii in range(600):
    print(f"run {ii}")
    for cmd in cmds:
        p = subprocess.run(cmd, shell=True, check=True, capture_output=True, encoding="utf-8")
        if verbose:
            print(f"Command {p.args} exited with {p.returncode} code, output: \n{p.stdout}")
        tt.append(float(p.stdout))

tt_main = tt[::2]
tt_pr = tt[1::2]

# %% Show results
plt.figure(10)
plt.clf()
plt.plot(tt_main[::2], ".", label="Main")
plt.plot(tt_pr[1::2], ".", label="PR")
plt.axhline(np.mean(tt_main), color="C0", label="mean for main")
plt.axhline(np.mean(tt_pr), color="C1", label="mean for PR")
plt.ylabel("Execution time [s]")
plt.legend()

gain = np.mean(tt_main) / np.mean(tt_pr)
plt.title(f"Performance gain: {gain:.3f}")

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants