Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-130798: Add type hints to pathlib.types #131639

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

ap--
Copy link
Contributor

@ap-- ap-- commented Mar 23, 2025

Hello everyone,

This PR addresses #130798

As requested, the type hints are kept compatible with Python 3.9 to simplify release in pathlib-abc.

I also just noticed #131621 which I assumed will be merged soon, because it simplifies typing of _ReadablePath.copy and _ReadablePath.copy_into. Otherwise, for str targets the _ReadablePath subclass has to be a _WritablePath subclass too to make sense as a target.

Since, I haven't contributed type hinted code to cpython so far, it would be great if you could let me know if this PR should also ship tests for verifying the types somewhere.

Cheers,
Andreas 😃

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about the covariant returns that are lacking but I don't remember if it's allowed to have a method with def f(self: T_co) -> T_co: ...

@ap--
Copy link
Contributor Author

ap-- commented Mar 24, 2025

Thanks for the review!

I'm worried about the covariant returns that are lacking but I don't remember if it's allowed to have a method with def f(self: T_co) -> T_co: ...

Could you explain what you mean specifically? _JoinablePath is not generic so I don't see how the covariant typevars would play a role here.

@picnixz
Copy link
Member

picnixz commented Mar 24, 2025

Could you explain what you mean specifically

I should avoid reviewing while doing C++. Invariance is sufficient here, my bad.

@barneygale
Copy link
Contributor

barneygale commented Mar 24, 2025

BTW @ap--, this is awesome! Thanks so much!! And thanks also @picnixz! I appreciate your scrutiny :)

ap-- added 2 commits March 24, 2025 22:31
- pathsegments are `str`: joinpath, __truediv__, __rtruediv__, with_segments
- symlink_to target is `str`
- glob recurse_symlinks is `Literal[True]`
- full_match and glob pattern is `str`
@ap--
Copy link
Contributor Author

ap-- commented Mar 24, 2025

@barneygale @picnixz I made all changes as suggested ☺️

I am now wondering if it wouldn't be better to just use typing.Self and rely on typing_extensions in pathlib-abc. I think it would make this module a lot easier to parse visually.

@barneygale
Copy link
Contributor

barneygale commented Mar 24, 2025

That's a much better plan - please feel free.

@barneygale
Copy link
Contributor

What do you chaps think about WritablePath.write_text() and write_bytes() returning an int? It might be a pain to implement for users who override those methods. We could make that a pathlib.Path-exclusive feature perhaps?

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For write_* methods, one could return int or int / None if you want children classes to only return None if it's too hard for them to know how many characters were written. WDYT @barneygale?

@ap--
Copy link
Contributor Author

ap-- commented Mar 24, 2025

What do you chaps think about WritablePath.write_text() and write_bytes() returning an int? It might be a pain to implement for users who override those methods.

Given that __open_wb__ is abstract, and write_text and write_bytes ship default implementations, I don't think they need to be simplified.

"""
Return the path to which the symbolic link points.
"""
raise NotImplementedError

def copy(self, target, **kwargs):
def copy(self, target: _WP, **kwargs: Any) -> _WP:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use a ParamSpec/similar here to show that the keyword arguments are passed to target._copy_from()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent quite a bit of time now trying to get this to work, but I believe the way _ReadablePath.copy and _WritablePath._copy_from are coupled is currently not representable in the Python type system.

Be aware, I am not an expert in this, so all conclusions I draw here should be confirmed with someone who is.

Ideally the typechecker would just be able to infer if keyword arguments are not compatible with the signature of a detected subclass of _WritablePath._copy_from . But all my attempts failed so far. Basically the minimal scenario that needs to be supported is:

class R:
    def copy[T: W](self, target: T, **kwargs) -> T:
        target._copy_from(self, **kwargs)
        return target

class W:
    def _copy_from(self, source: R, /, *, follow_symlinks: Literal[True] = True) -> None:
        return
  1. In this example: "not typing the kwargs" (or typing them as Any) leads to missing incorrect keyword arguments: https://mypy-play.net/?mypy=latest&python=3.13&gist=7608e8dcd5e4b58dc2fc2c355bf6ed89

  2. Trying to use TypedDicts to define the kwargs doesn't work either because the definitions would have to be provided to _ReadablePath.copy for each speficic target class: https://mypy-play.net/?mypy=latest&python=3.13&gist=12f4cbdd725073d8a152a90a941922cf

  3. ParamSpec looks pretty promising but to be able to actually bind the parameters, you need to go via a protocol class, which prevents you from reusing the actual target type as return type of the _ReadablePath.copy method. Moreover ParamSpec can't be used without binding P.args to the function signature: https://mypy-play.net/?mypy=latest&python=3.13&gist=b47ea834a0f57a722b8124f76f3fb6d4

Ideally we'd be able to do something like:

  class R:
      def copy[T: W, **P](self, target: T[P], **kwargs: P.kwargs) -> T[P]: ...
  class W[**P]:
      def _copy_from(self, source: R, **kwargs: P.kwargs) -> None: ...

But from reading through a bunch of docs and issues, I this seems impossible. I believe the correct related issue here should be this one python/typing#548 for support of higher-kinded typevars, and we'd need kwargs_only support for ParamSpec: python/typing#1524

My guess is that by redesigning the coupling between _ReadablePath.copy and _WritablePath._copy_from it should be possible to better represent this in the type system. But it's getting late here, and I think I'll revisit this with a fresh mind tomorrow.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intersection types might also work, since then you could do the following:

def copy[T: W, **P](self, target: T & SupportsCopyFrom[P], *args: P.args, **kwargs: P.kwargs) -> T: ...

This would mean that target must be both a T and SupportsCopyFrom simultaneously. Intersections I think are being considered more seriously than HKT, but they're also another quite substantial addition to the type system.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the analysis! Feels like we're pushing the current type system a bit far, so I'm happy with Any. I could log an issue in pathlib-abc and link back to this thread.

I'm also happy to change how copy() and _copy_from() interact if there's something better we can do.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have a solution, though it's awkward. First change is to pass *args too, meaning ParamSpec works. Second is to require _copy_to() to redundantly return self. Then we could do the following:

class _HasCopyTo[**Args, R](Protocol):
    def _copy_to(
        self, path: ReadablePath, /, 
        *args: Args.args, **kwargs: Args.kwargs) -> R: ...

class ReadablePath:
    def copy[**Args, R](
        self, path: _HasCopyTo[Args, R], /, 
        *args: Args.args, **kwargs: Args.kwargs,
    ) -> R:
        res = path._copy_to(self, *args, **kwargs)
        assert path is res, "Must return self"
        return res

class WritablePath(ReadablePath):
    @abstractmethod
    def _copy_to(self, path: ReadablePath, /, *args: Any, **kwargs: Any) -> Self: ...

class Path(WritablePath):
    def _copy_to(self, path: ReadablePath, /, some_option: bool) -> Self:
        do_copy(self, path)
        return self

The Any in WritablePath is required to allow subclasses to require any arguments, but it also (correctly?) means if you call copy() with something that's just WritablePath it'll accept anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That LGTM - hopefully it can be expressed in 3.9 syntax?

Copy link
Contributor Author

@ap-- ap-- Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does! I'll make the change:

https://mypy-play.net/?mypy=latest&python=3.11&gist=8c300324c700eea4ffa7b7776a460f6e

code
from typing import Literal
from typing import Protocol
from typing import Generic
from typing import TypeVar
from typing import ParamSpec
from typing import Callable
from typing import Self
from typing import ClassVar
from typing import Any

R = TypeVar("R", bound="_WritablePath", covariant=True)
P = ParamSpec("P")

class _HasCopyFrom(Protocol[R, P]):
    def _copy_from(self, source: _ReadablePath, /, *args: P.args, **kwargs: P.kwargs) -> R:
        ...

class _ReadablePath:
    def copy(self, path: _HasCopyFrom[R, P], /, *args: P.args, **kwargs: P.kwargs) -> R:
        return path._copy_from(self, *args, **kwargs)

class _WritablePath:
    def _copy_from(self, path: _ReadablePath, /, *args: Any, **kwargs: Any) -> Self: ...

class MyPathR(_ReadablePath):
    pass

class MyPathW0(_WritablePath):
    def _copy_from(self, path: _ReadablePath, /, *, some_option: bool = True) -> Self:
        return self

class MyPathW1(_WritablePath):
    def _copy_from(self, path: _ReadablePath, /, *, other_option: int = 3) -> Self:
        return self

r = MyPathR()       
w0 = MyPathW0()
w1 = MyPathW1()
 
# correct       
reveal_type(r.copy(w0))
r.copy(w0, some_option=False)
reveal_type(r.copy(w1))
r.copy(w1, other_option=4)

# error
r.copy(w0, dont_exist=2)
r.copy(w1, text="str")
r.copy(w0, some_option=1.0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I tried implementing the changes, and they don't fully work. While they solve the copy (*args and) **kwargs issue, they cause type checking issues in for example copy_into because mypy can't infer the actual type of path, but only that path has the structural type _HasCopyFrom. This is a limitation of how the protocol is expressed (see here: https://mypy-play.net/?mypy=latest&python=3.12&gist=c744090b69aad82d75c14e6ab7dedb4c ). So it seems intersection types would be required to do this correctly.

Also on a side note: I now wonder if follow_symlinks should be an explicit keyword argument in copy and copy_into since it's mandatory in _WritablePath._copy_from's default implementation.

Additionally, forcing _copy_from to return Self would interfere with #131636

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the issues - you need to be returning R from _copy_from, and _HasCopyFrom.__truediv__/joinpath needs to return _HasCopyFrom, not just R. That was causing it to discard the paramspec and specific return value. For #131636, you can still do -> Generator[tuple[...], None, Self].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but the problem imo is that both joinpath and __truediv__ should actually return the same type. The workaround is nice, but to make it work, one needs to be typed as returning R because it has to return the WritablePath instance and the other as the HasCopyFrom protocol because it's passed again into .copy ...

So in the end you have to adjust the protocol to the exact implementation of copy and copy_into which is unfortunate. If both in copy and copy_into joinpath would be used once as the return type and once to pass as the target arg to copy, the typing workaround would not work.

I'm not sure if the benefit of being able to type check the copy kwargs justifies the introduction of this trick.

@barneygale
Copy link
Contributor

Let's keep the int return values for write_text() and write_bytes() - thanks both.

@ap--
Copy link
Contributor Author

ap-- commented Mar 27, 2025

Just noticed the discussion in #130798: I'm happy to recreate this PR in typeshed if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants