
We need unambiguous, meaningful naming of stack reference operations dependent on how counting is done #130789

Open
markshannon opened this issue Mar 3, 2025 · 4 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@markshannon
Member

faster-cpython/ideas#700 describes three ways to count references:

  • Virtual,
  • Embedded, and
  • Immediate

Virtual references are references that the relevant code generator knows to exist but that are elided at runtime, so no API is needed for them.
Embedded references are marked by bit(s) in the reference and not in the ob_refcount field of the object.
Immediate references are counted in the ob_refcount (or free-threading equivalent) field of the object.

To these we should add uncounted references, which are references to immortal objects (including NULL).
Note that it is possible to have embedded or immediate references to immortal objects if the object was mortal when the reference (or the reference it was created from) was created.
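To make the counting kinds concrete, here is a toy model in C. It is not CPython's stackref implementation: ToyObject, ToyStackRef, CountKind, toy_new_ref and toy_close are hypothetical names, and the kind field stands in for the tag bit(s) a real implementation would keep in the reference itself. Virtual references have no runtime representation, so they do not appear.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object: refcount stands in for ob_refcount. Not CPython's PyObject. */
typedef struct {
    long refcount;
    int immortal;  /* nonzero: the refcount is never modified */
} ToyObject;

/* How a reference is counted (virtual references have no runtime form). */
typedef enum {
    COUNT_UNCOUNTED,  /* refers to an immortal object, or is NULL */
    COUNT_EMBEDDED,   /* count is recorded in the reference, not the object */
    COUNT_IMMEDIATE   /* count is recorded in obj->refcount */
} CountKind;

/* Toy stackref: kind stands in for tag bit(s) in the pointer. */
typedef struct {
    ToyObject *obj;
    CountKind kind;
} ToyStackRef;

/* Create a reference, choosing how it is counted. */
static ToyStackRef toy_new_ref(ToyObject *obj, int want_embedded) {
    if (obj == NULL || obj->immortal) {
        return (ToyStackRef){ obj, COUNT_UNCOUNTED };
    }
    if (want_embedded) {
        return (ToyStackRef){ obj, COUNT_EMBEDDED };  /* object untouched */
    }
    obj->refcount++;
    return (ToyStackRef){ obj, COUNT_IMMEDIATE };
}

/* Release a reference. Only immediate references to mortal objects touch
   the object's refcount. */
static void toy_close(ToyStackRef ref) {
    if (ref.kind == COUNT_IMMEDIATE && !ref.obj->immortal) {
        assert(ref.obj->refcount > 0);
        ref.obj->refcount--;
    }
}
```

If the object becomes immortal after an immediate reference to it was created, the reference keeps its kind and toy_close simply skips the decrement, mirroring the way Py_DECREF leaves immortal objects alone.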

Why this matters

It is important that the use of references is understandable without referring to the implementation. We have multiple implementations of stackrefs, so the interface needs to be clear.

Multiple implementations

Even when we merge the free-threading and default implementations of stackrefs, we will still have the Py_STACKREF_DEBUG implementation, which is very different and vital to finding reference errors.

Examples:

When creating an embedded stackref from another stackref, we should use PyStackRef_DUP_Embedded which has the same semantics as PyStackRef_DUP but creates an embedded reference if the implementation supports it.
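A toy sketch of the intended semantics of PyStackRef_DUP_Embedded, assuming a simplified stackref that carries an embedded flag next to the pointer. toy_dup, toy_dup_embedded, toy_close and the toy_supports_embedded switch are hypothetical names; the point is only that both kinds of duplicate are owned and released the same way, and only where the count is recorded differs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy object; refcount stands in for ob_refcount. */
typedef struct { long refcount; } ToyObject;

/* Toy stackref; embedded stands in for tag bit(s) in the pointer. */
typedef struct {
    ToyObject *obj;
    bool embedded;
} ToyStackRef;

/* Hypothetical switch: does this implementation support embedded counts? */
static bool toy_supports_embedded = true;

/* Analogue of PyStackRef_DUP: the copy holds an immediate count. */
static ToyStackRef toy_dup(ToyStackRef ref) {
    ref.obj->refcount++;
    return (ToyStackRef){ ref.obj, false };
}

/* Analogue of the proposed PyStackRef_DUP_Embedded: same ownership
   semantics as toy_dup, but the count is embedded when supported. */
static ToyStackRef toy_dup_embedded(ToyStackRef ref) {
    if (!toy_supports_embedded) {
        return toy_dup(ref);  /* fall back to an immediate count */
    }
    return (ToyStackRef){ ref.obj, true };
}

/* Release either kind of copy; only immediate counts touch the object. */
static void toy_close(ToyStackRef ref) {
    if (!ref.embedded) {
        assert(ref.obj->refcount > 0);
        ref.obj->refcount--;
    }
}
```

The caller closes the result of toy_dup_embedded exactly as it would close the result of toy_dup, whichever representation was chosen.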

There are circumstances when a method of counting is not safe; e.g. using embedded references in the heap is not safe. For those cases we will want to physically transform a reference without a logical change in ownership.
E.g. PyStackRef_ToNonEmbedded. In terms of ownership this is a no-op (PyStackRef_ToNonEmbedded(ref) is equivalent to ref), but it ensures that any embedded count is turned into an immediate count.
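The ownership-neutral conversion can be sketched with the same kind of toy stackref. toy_to_non_embedded and toy_store_in_heap are hypothetical stand-ins for PyStackRef_ToNonEmbedded and for a write into a heap object; the caller owns exactly one reference before and after the conversion, and only the representation changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy object; refcount stands in for ob_refcount. */
typedef struct { long refcount; } ToyObject;

/* Toy stackref; embedded stands in for tag bit(s) in the pointer. */
typedef struct {
    ToyObject *obj;
    bool embedded;
} ToyStackRef;

/* Analogue of PyStackRef_ToNonEmbedded: move an embedded count into the
   object's refcount. Ownership is unchanged; an already-immediate
   reference is returned as-is. */
static ToyStackRef toy_to_non_embedded(ToyStackRef ref) {
    if (ref.embedded) {
        ref.obj->refcount++;
        ref.embedded = false;
    }
    return ref;
}

/* Hypothetical heap slot: embedded references must never be stored here. */
typedef struct { ToyStackRef slot; } ToyHeapCell;

static void toy_store_in_heap(ToyHeapCell *cell, ToyStackRef ref) {
    assert(!ref.embedded);  /* embedded counts are not safe in the heap */
    cell->slot = ref;
}
```

Because the conversion is idempotent, code that stores into the heap can apply it unconditionally without tracking which kind of reference it was handed.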

We should probably use "uncounted" only when referring to references in docs and comments. Since we already have PyStackRef_FromPyObjectImmortal, there is no need for PyStackRef_FromPyObjectUncounted as well.

@markshannon markshannon added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Mar 3, 2025
@markshannon
Member Author

We also need names for the use and lifetime of references. #130708 introduces the concept of references that depend, for correctness, on another reference outliving them.
The free-threading build allows deferred reclamation of some objects. For those objects, the immediate reference count may reach zero while the object is still alive. Such objects are not reclaimed by Py_Dealloc, but by the cyclic GC when it can prove that there are no references (immediate or embedded) to the object.

With that in mind, we can use the term "deferred" for the embedded reference count when it relies on the GC to prevent premature reclamation, and "scoped" for the embedded reference count when it depends on the scope of the reference to prevent premature reclamation.
A reference with a scoped refcount can refer to any object. A reference with a deferred refcount can only refer to an object that supports deferred reclamation.
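A toy model of the two kinds of embedded count, assuming a single object with an immediate refcount field, a flag for deferred reclamation, and a caller-supplied count of embedded references visible to the collector. ToyObject, EmbeddedKind, toy_embedded_allowed, toy_decref and toy_gc_collect are hypothetical names; a real collector would have to discover embedded references itself rather than being told how many there are.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy object; not CPython's PyObject. */
typedef struct {
    long refcount;          /* immediate count, stands in for ob_refcount */
    bool deferred_reclaim;  /* only the (toy) GC may free this object */
    bool freed;
} ToyObject;

/* What prevents premature reclamation for an embedded count. */
typedef enum {
    EMBEDDED_SCOPED,   /* another reference that outlives this one */
    EMBEDDED_DEFERRED  /* the GC, which accounts for embedded references */
} EmbeddedKind;

/* Scoped counts may refer to any object; deferred counts only to objects
   that support deferred reclamation. */
static bool toy_embedded_allowed(const ToyObject *obj, EmbeddedKind kind) {
    return kind == EMBEDDED_SCOPED || obj->deferred_reclaim;
}

/* Drop an immediate reference. Objects with deferred reclamation are not
   freed when the immediate count reaches zero. */
static void toy_decref(ToyObject *obj) {
    assert(obj->refcount > 0);
    if (--obj->refcount == 0 && !obj->deferred_reclaim) {
        obj->freed = true;  /* stands in for Py_Dealloc */
    }
}

/* Toy GC pass: frees a deferred-reclamation object only when there are
   no immediate and no embedded references to it. */
static void toy_gc_collect(ToyObject *obj, int embedded_refs_seen) {
    if (obj->deferred_reclaim && !obj->freed &&
        obj->refcount == 0 && embedded_refs_seen == 0) {
        obj->freed = true;
    }
}
```

The ordinary object is freed as soon as its immediate count hits zero, so only a scoped embedded count (backed by some longer-lived reference) is safe for it.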

Note: there are no "deferred references", only deferred reference counts.

We should never use the term "borrowed", as that is used for PyObject * references, and is already ambiguous enough without adding yet another meaning.

The term "scoped" was suggested by @nascheme.

@markshannon
Member Author

Names for the kinds of counts:

The problem with "embedded" and "immediate" is that they do not form a pair, making them harder to remember.

Maybe "external" and "internal" are more appropriate:

  • External references are counted outside of the object and marked by bit(s) in the reference.
  • Internal references are counted in the object in its ob_refcount (or free-threading equivalent) field.

@mpage
Contributor

mpage commented Mar 19, 2025

Thanks for pushing for consistency in how we talk about these concepts. I think that sticking with existing names will be clearer than introducing new ones:

A stackref whose lifetime must not exceed another stackref's lifetime is borrowed and is created by calling PyStackRef_Borrow(). Typically, the reference count of the referenced object will not be updated on creation and destruction of stackrefs created with PyStackRef_Borrow(), but that's an implementation detail. The use of "borrow" here matches my understanding of how it's used within CPython and elsewhere (e.g. Rust), and accurately describes the constraints under which it's safe to use stackrefs created this way. Introducing a new term would increase cognitive load without improving clarity.

A stackref that does not update the reference count of the referenced object on creation/destruction defers the reference count update. Such stackrefs are deferred. The use of "defer" here is consistent with our use in "deferred reference counting," in that the application of the reference count updates has been deferred.
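A toy model of this terminology, assuming a stackref that carries a deferred flag next to the pointer. toy_from_new, toy_borrow, toy_is_deferred and toy_close are hypothetical analogues of PyStackRef_FromPyObjectNew, PyStackRef_Borrow, PyStackRef_IsDeferred and PyStackRef_CLOSE; this toy always implements borrowing by deferring the count update, which is described above as typical but an implementation detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy object; refcount stands in for ob_refcount. */
typedef struct { long refcount; } ToyObject;

/* Toy stackref; deferred stands in for a tag bit in the pointer. */
typedef struct {
    ToyObject *obj;
    bool deferred;  /* true: creation/destruction skips the count update */
} ToyStackRef;

/* Analogue of PyStackRef_FromPyObjectNew: a new strong reference. */
static ToyStackRef toy_from_new(ToyObject *obj) {
    obj->refcount++;
    return (ToyStackRef){ obj, false };
}

/* Analogue of PyStackRef_Borrow: the result must not outlive ref.
   No count is taken, so the result is deferred. */
static ToyStackRef toy_borrow(ToyStackRef ref) {
    return (ToyStackRef){ ref.obj, true };
}

/* Analogue of PyStackRef_IsDeferred. */
static bool toy_is_deferred(ToyStackRef ref) {
    return ref.deferred;
}

/* Analogue of PyStackRef_CLOSE: a deferred stackref has no count to undo. */
static void toy_close(ToyStackRef ref) {
    if (!ref.deferred) {
        assert(ref.obj->refcount > 0);
        ref.obj->refcount--;
    }
}
```

The borrowed stackref must be closed before the stackref it was borrowed from; the toy cannot check that rule, which is exactly why the name needs to make it obvious.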

A minimal interface using these names might look like:

// Create a new stackref
_PyStackRef PyStackRef_FromPyObjectNew(PyObject *obj);

// Create a copy of a stackref
_PyStackRef PyStackRef_DUP(_PyStackRef stackref);

// Retrieve the PyObject * without changing the reference count on the object
PyObject *PyStackRef_AsPyObjectBorrow(_PyStackRef stackref);

// Create a stackref whose lifetime must not exceed that of `stackref`
_PyStackRef PyStackRef_Borrow(_PyStackRef stackref);

// Has the reference count update on the referenced object been deferred?
bool PyStackRef_IsDeferred(_PyStackRef stackref);

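// Non-deferred stackrefs are always heap-safe; a deferred one is heap-safe
// only if the referenced object has a deferred reference count or is immortal
bool PyStackRef_IsHeapSafe(_PyStackRef stackref) {
  if (PyStackRef_IsDeferred(stackref)) {
    PyObject *obj = PyStackRef_AsPyObjectBorrow(stackref);
    return PyObject_HasDeferredRefcount(obj) || _Py_IsImmortal(obj);
  }
  return true;
}

// Return a stackref that may be stored in the heap, creating a new strong
// reference if `stackref` is not heap-safe
_PyStackRef PyStackRef_MakeHeapSafe(_PyStackRef stackref) {
  if (PyStackRef_IsHeapSafe(stackref)) {
    return stackref;
  }
  return PyStackRef_FromPyObjectNew(PyStackRef_AsPyObjectBorrow(stackref));
}

// Release the reference held by `stackref`
void PyStackRef_CLOSE(_PyStackRef stackref);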
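A toy, compilable model of the heap-safety logic in this interface. The flags has_deferred_rc and immortal stand in for PyObject_HasDeferredRefcount and _Py_IsImmortal, and all toy_* names are hypothetical; the control flow of toy_is_heap_safe and toy_make_heap_safe follows PyStackRef_IsHeapSafe and PyStackRef_MakeHeapSafe as written.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy object; not CPython's PyObject. */
typedef struct {
    long refcount;
    bool has_deferred_rc;  /* analogue of PyObject_HasDeferredRefcount */
    bool immortal;         /* analogue of _Py_IsImmortal */
} ToyObject;

/* Toy stackref; deferred stands in for a tag bit. */
typedef struct {
    ToyObject *obj;
    bool deferred;
} ToyStackRef;

/* New strong reference; immortal objects are not counted. */
static ToyStackRef toy_from_new(ToyObject *obj) {
    if (!obj->immortal) {
        obj->refcount++;
    }
    return (ToyStackRef){ obj, false };
}

/* Borrow without touching the count. */
static ToyStackRef toy_borrow(ToyStackRef ref) {
    return (ToyStackRef){ ref.obj, true };
}

/* Same logic as PyStackRef_IsHeapSafe. */
static bool toy_is_heap_safe(ToyStackRef ref) {
    if (ref.deferred) {
        return ref.obj->has_deferred_rc || ref.obj->immortal;
    }
    return true;
}

/* Same logic as PyStackRef_MakeHeapSafe. The deferred input is not closed;
   in this toy, closing a deferred stackref would be a no-op anyway. */
static ToyStackRef toy_make_heap_safe(ToyStackRef ref) {
    if (toy_is_heap_safe(ref)) {
        return ref;
    }
    return toy_from_new(ref.obj);
}
```

A borrowed stackref to an ordinary object is upgraded to a new strong reference, while one to an object with a deferred reference count is returned unchanged.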

Tagging a few other folks who have expressed opinions (sorry!): @nascheme @brandtbucher @Yhg1s @colesbury

@Yhg1s
Member

Yhg1s commented Mar 19, 2025

FWIW, coming at this as someone who wasn't exposed to most of the new interpreter design until recently, "borrow" and "borrowed references" are immediately clear to me, and they behave exactly as I expected. "Deferred references" took a bit of understanding of how the bookkeeping was handled, but the concept was pretty apparent from the name. ("Immediate" references are fine as well, although my immediate reaction was "why does that need a name, that's just references".)

"Embedded" and "virtual" references are meaningless to me. We already use "embedded" in other contexts in Python, and virtual is so overloaded in computing in general that it's not a good term for anything anymore. But all things considered, they're not worse than any other non-obvious name.

"Borrowed" and "deferred" definitely have my vote.
