We need unambiguous, meaningful naming of stack reference operations dependent on how counting is done #130789

markshannon · 2025-03-03T12:40:17Z

faster-cpython/ideas#700 describes three ways to count references:

Virtual,
Embedded, and
Immediate

Virtual references are references that are known to exist to the relevant code generator, but are elided at runtime, so no API is needed for them.
Embedded references are marked by bit(s) in the reference and not in the ob_refcount field of the object.
Immediate references are counted in the ob_refcount (or free-threading equivalent) field of the object.

To these we should add "uncounted" references, which are references to immortal objects (including NULL).
Note that it is possible to have embedded or immediate references to immortal objects if the object was mortal when the reference, or the reference it was created from, was created.
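
As a rough illustration of the difference between embedded and immediate counting, here is a minimal toy model in C. It is not CPython's implementation: `ToyObject`, `ToyRef`, the `toy_*` helpers, and the use of the low pointer bit as the tag are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an object; `refcount` plays the role of ob_refcount. */
typedef struct { long refcount; } ToyObject;

/* Hypothetical reference: a pointer whose low bit is used as a tag. */
typedef struct { uintptr_t bits; } ToyRef;

#define TOY_TAG_EMBEDDED ((uintptr_t)1)

static inline ToyObject *toy_as_object(ToyRef ref) {
    return (ToyObject *)(ref.bits & ~TOY_TAG_EMBEDDED);
}

static inline int toy_is_embedded(ToyRef ref) {
    return (ref.bits & TOY_TAG_EMBEDDED) != 0;
}

/* Immediate reference: counted in the object's refcount field. */
static inline ToyRef toy_new_immediate(ToyObject *obj) {
    assert(obj != NULL);
    obj->refcount++;
    return (ToyRef){ (uintptr_t)obj };
}

/* Embedded reference: marked by a bit in the reference; the object's
   refcount field is left untouched. */
static inline ToyRef toy_new_embedded(ToyObject *obj) {
    assert(obj != NULL);
    return (ToyRef){ (uintptr_t)obj | TOY_TAG_EMBEDDED };
}

/* Closing only updates the refcount field for immediate references. */
static inline void toy_close(ToyRef ref) {
    if (!toy_is_embedded(ref)) {
        toy_as_object(ref)->refcount--;
    }
}
```

In this toy, creating an immediate reference changes `refcount`, while creating or closing an embedded reference does not, so an embedded reference is only valid while something else keeps the object alive.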

Why this matters

It is important that the use of references is understandable without referring to the implementation. We have multiple implementations of stackrefs, so the interface needs to be clear.

Multiple implementations

Even when we merge the free-threading and default implementations of stackrefs, we will still have the Py_STACKREF_DEBUG implementation which is very different and vital to finding reference errors.

Examples:

When creating an embedded stackref from another stackref, we should use PyStackRef_DUP_Embedded which has the same semantics as PyStackRef_DUP but creates an embedded reference if the implementation supports it.

There are circumstances when a method of counting is not safe, e.g. using embedded references in the heap is not safe. For that we will want to physically transform a reference without a logical change in ownership.
E.g. PyStackRef_ToNonEmbedded. In terms of ownership, this is a no-op: PyStackRef_ToNonEmbedded(ref) is equivalent to ref, but ensures that any embedded count is turned into an immediate count.
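
The two operations named above could be sketched roughly as follows. This is a toy model under stated assumptions, not the proposed implementation: `ToyObject`, `ToyRef`, `toy_dup_embedded` and `toy_to_non_embedded` are hypothetical names, and the low pointer bit is assumed to be the "embedded" tag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an object; `refcount` plays the role of ob_refcount. */
typedef struct { long refcount; } ToyObject;

/* Hypothetical reference: a pointer whose low bit marks an embedded count. */
typedef struct { uintptr_t bits; } ToyRef;

#define TOY_TAG_EMBEDDED ((uintptr_t)1)

static inline int toy_is_embedded(ToyRef ref) {
    return (ref.bits & TOY_TAG_EMBEDDED) != 0;
}

/* Analogue of PyStackRef_DUP_Embedded: a new reference to the same object
   that carries its count in the tag bit instead of the refcount field. */
static inline ToyRef toy_dup_embedded(ToyRef ref) {
    return (ToyRef){ ref.bits | TOY_TAG_EMBEDDED };
}

/* Analogue of PyStackRef_ToNonEmbedded: the result replaces `ref`, so
   ownership is unchanged, but an embedded count becomes an immediate one. */
static inline ToyRef toy_to_non_embedded(ToyRef ref) {
    if (toy_is_embedded(ref)) {
        ToyObject *obj = (ToyObject *)(ref.bits & ~TOY_TAG_EMBEDDED);
        assert(obj != 0);
        obj->refcount++;
        return (ToyRef){ (uintptr_t)obj };
    }
    return ref; /* already immediate: nothing to do */
}
```

Applying `toy_to_non_embedded` to a reference that is already immediate returns it unchanged, which matches the "no-op in terms of ownership" reading.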

We should probably use "uncounted" only when describing references in docs and comments. We already have PyStackRef_FromPyObjectImmortal, so there is no need for PyStackRef_FromPyObjectUncounted as well.


markshannon · 2025-03-10T10:15:29Z

We also need names for the use and lifetime of references. #130708 introduces the concept of references that depend, for correctness, on another reference outliving them.
The free-threading build allows deferred reclamation of some objects. For those objects, the immediate reference count may reach zero while the object is still alive. Those objects are not reclaimed by Py_Dealloc, but by the cyclic GC when it can prove that there are no references (immediate or embedded) to the object.

With that in mind, we can use the term "deferred" for the embedded reference count when it relies on the GC to prevent premature reclamation, and "scoped" for the embedded reference count when it depends on the scope of the reference to prevent premature reclamation.
A reference with a scoped reference count can refer to any object. A reference with a deferred reference count can only refer to an object that supports deferred reclamation.

Note: there are no "deferred references", only deferred reference counts.

We should never use the term "borrowed", as that is used for PyObject * references, and is already ambiguous enough without adding yet another meaning.

The term "scoped" was suggested by @nascheme.

markshannon · 2025-03-10T10:26:12Z

Names for the kinds of counts:

The problem with "embedded" and "immediate" is that they do not form a pair, making them harder to remember.

Maybe "external" and "internal" are more appropriate:

External references are counted outside of the object and marked by bit(s) in the reference.
Internal references are counted in the object in its ob_refcount (or free-threading equivalent) field.

mpage · 2025-03-19T21:19:53Z

Thanks for pushing for consistency in how we talk about these concepts. I think that sticking with existing names will be clearer than introducing new ones:

A stackref whose lifetime must not exceed another stackref's lifetime is borrowed and is created by calling PyStackRef_Borrow(). Typically, the reference count of the referenced object will not be updated on creation and destruction of stackrefs created with PyStackRef_Borrow(), but that's an implementation detail. The use of "borrow" here matches my understanding of how it's used within CPython and elsewhere (e.g. Rust), and accurately describes the constraints in which it's safe to use stackrefs created this way. Introducing a new term would increase cognitive load without improving clarity.

A stackref that does not update the reference count of the referenced object on creation/destruction defers the reference count update. Such stackrefs are deferred. The use of "defer" here is consistent with our use in "deferred reference counting," in that the application of the reference count updates has been deferred.

A minimal interface using these names might look like:

// Create a new stackref
_PyStackRef PyStackRef_FromPyObjectNew(PyObject *obj);

// Create a copy of a stackref
_PyStackRef PyStackRef_DUP(_PyStackRef stackref);

// Retrieve the PyObject * without changing the reference count on the object
PyObject *PyStackRef_AsPyObjectBorrow(_PyStackRef stackref);

// Create a stackref whose lifetime must not exceed that of `stackref`
_PyStackRef PyStackRef_Borrow(_PyStackRef stackref);

// Has the reference count update on the referenced object been deferred?
bool PyStackRef_IsDeferred(_PyStackRef stackref);

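// Is it safe to store this stackref in the heap? A deferred stackref is
// only safe if the object has a deferred refcount or is immortal.
bool PyStackRef_IsHeapSafe(_PyStackRef stackref) {
  if (PyStackRef_IsDeferred(stackref)) {
    PyObject *obj = PyStackRef_AsPyObjectBorrow(stackref);
    return PyObject_HasDeferredRefcount(obj) || _Py_IsImmortal(obj);
  }
  return true;
}

// Return a heap-safe stackref for the same object, creating a new
// stackref only if `stackref` is not already heap safe.
_PyStackRef PyStackRef_MakeHeapSafe(_PyStackRef stackref) {
  if (PyStackRef_IsHeapSafe(stackref)) {
    return stackref;
  }
  return PyStackRef_FromPyObjectNew(PyStackRef_AsPyObjectBorrow(stackref));
}

// Destroy a stackref
void PyStackRef_CLOSE(_PyStackRef stackref);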
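
To make the intended behaviour of the interface above concrete, here is a self-contained toy model that mirrors its shape. Everything in it is an assumption for illustration: `ToyObject`, `ToyRef` and the `toy_*` functions are hypothetical stand-ins, the "deferred" marker is assumed to be the low pointer bit, and borrowing is assumed never to touch the reference count (which the comment above calls an implementation detail).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PyObject. */
typedef struct {
    long refcount;
    bool has_deferred_refcount; /* object supports deferred reclamation */
    bool immortal;
} ToyObject;

/* Hypothetical stand-in for _PyStackRef: pointer plus a low tag bit. */
typedef struct { uintptr_t bits; } ToyRef;

#define TOY_TAG_DEFERRED ((uintptr_t)1)

static inline ToyObject *toy_as_object_borrow(ToyRef ref) {
    return (ToyObject *)(ref.bits & ~TOY_TAG_DEFERRED);
}

static inline bool toy_is_deferred(ToyRef ref) {
    return (ref.bits & TOY_TAG_DEFERRED) != 0;
}

/* New counted reference: updates the object's refcount. */
static inline ToyRef toy_from_object_new(ToyObject *obj) {
    assert(obj != 0);
    obj->refcount++;
    return (ToyRef){ (uintptr_t)obj };
}

/* Borrow: the result must not outlive `ref`; no count update in this toy. */
static inline ToyRef toy_borrow(ToyRef ref) {
    return (ToyRef){ ref.bits | TOY_TAG_DEFERRED };
}

static inline bool toy_is_heap_safe(ToyRef ref) {
    if (toy_is_deferred(ref)) {
        ToyObject *obj = toy_as_object_borrow(ref);
        return obj->has_deferred_refcount || obj->immortal;
    }
    return true;
}

static inline ToyRef toy_make_heap_safe(ToyRef ref) {
    if (toy_is_heap_safe(ref)) {
        return ref;
    }
    return toy_from_object_new(toy_as_object_borrow(ref));
}

/* Closing only updates the refcount for non-deferred references. */
static inline void toy_close(ToyRef ref) {
    if (!toy_is_deferred(ref)) {
        toy_as_object_borrow(ref)->refcount--;
    }
}
```

In this model a borrowed reference to an ordinary object is not heap safe, and `toy_make_heap_safe` turns it into a counted reference. A borrowed reference to an object with a deferred refcount, or to an immortal object, is returned unchanged.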

Tagging a few other folks who have expressed opinions (sorry!): @nascheme @brandtbucher @Yhg1s @colesbury

Yhg1s · 2025-03-19T23:20:56Z

FWIW, coming at this as someone who wasn't exposed to most of the new interpreter design until recently, "borrow" and "borrowed references" are immediately clear to me, and they behave exactly as I expected. "Deferred references" took a bit of understanding of how the bookkeeping was handled, but the concept was pretty apparent from the name. ("Immediate" references are fine as well, although my immediate reaction was "why does that need a name, that's just references".)

"Embedded" and "virtual" references are meaningless to me. We already use "embedded" in other contexts in Python, and virtual is so overloaded in computing in general it's not a good term for anything anymore. But all things considered, they're not worse than any other non-obvious name.

"Borrowed" and "deferred" definitely have my vote.

markshannon added the interpreter-core label Mar 3, 2025

markshannon mentioned this issue Mar 3, 2025

gh-130704: Strength reduce LOAD_FAST{_LOAD_FAST} #130708
