
"Attribute 'path' unset" when accessing the blobs/trees in a tree object returned by git.repo.fun.name_to_object #759


Open
ali1234 opened this issue May 27, 2018 · 8 comments

Comments

@ali1234

ali1234 commented May 27, 2018

Example code:

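from collections import defaultdict

import git
from git.repo.fun import name_to_object

# `repo` is assumed to be an existing git.Repo instance.
object_parents = defaultdict(set)

for o in repo.git.rev_list('--objects', '-g', '--no-walk', '--all').split('\n'):
    name = o.split()[0]
    obj = name_to_object(repo, name)
    print(obj.hexsha)
    if isinstance(obj, git.objects.tree.Tree):
        # Record this tree as a parent of every blob and subtree it contains.
        for b in obj.blobs:
            object_parents[b.binsha].add(obj.binsha)
        for t in obj.trees:
            object_parents[t.binsha].add(obj.binsha)
    elif isinstance(obj, git.objects.commit.Commit):
        # A commit is the parent of its root tree.
        object_parents[obj.tree.binsha].add(obj.binsha)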

Output:

  File "/home/al/Source/gitxref/gitxref/__main__.py", line 37, in main
    for b in obj.blobs:
  File "/usr/lib/python3/dist-packages/gitdb/util.py", line 258, in __getattr__
    return object.__getattribute__(self, attr)
  File "/usr/lib/python3/dist-packages/git/objects/tree.py", line 263, in blobs
    return [i for i in self if i.type == "blob"]
  File "/usr/lib/python3/dist-packages/git/objects/tree.py", line 263, in <listcomp>
    return [i for i in self if i.type == "blob"]
  File "/usr/lib/python3/dist-packages/git/objects/tree.py", line 207, in _iter_convert_to_object
    path = join_path(self.path, name)
  File "/usr/lib/python3/dist-packages/gitdb/util.py", line 256, in __getattr__
    self._set_cache_(attr)
  File "/usr/lib/python3/dist-packages/git/objects/tree.py", line 200, in _set_cache_
    super(Tree, self)._set_cache_(attr)
  File "/usr/lib/python3/dist-packages/git/objects/base.py", line 164, in _set_cache_
    % (attr, type(self).__name__))
AttributeError: Attribute 'path' unset: path and mode attributes must have been set during Tree object creation

The same thing happens with obj.trees.

@ali1234
Author

ali1234 commented May 27, 2018

version: 2.1.8-1/python3.6

@Byron
Member

Byron commented Jun 5, 2018

Thanks for letting us know!

name_to_object is used internally, and seeing the very specific error message, this case is anticipated.
I wonder if it works if name_to_object is avoided, and rev_parse is used instead? The name_to_object function is never used directly, but only from rev_parse.

@ali1234
Author

ali1234 commented Jun 5, 2018

It does not work with rev_parse either. It raises the same exception.

As a workaround, I simply set a fake path on the object like this:

obj = name_to_object(...)
obj.path = 'unknown'
for b in obj.blobs:
    ...

This has no effect on the result. The actual code is like this: https://github.com/ali1234/gitxref/blob/8bced542d6d60493d9fdd5a8e4b27402c79741eb/gitxref/backrefs.py#L96
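
The traceback suggests why the workaround helps: `path` is looked up through a lazy `__getattr__`, which only runs when the attribute has not been set yet, and the fallback raises for `path`. The following is a simplified stand-in that mimics that chain. It is not GitPython's actual code; `LazyObject`, `children` and `children_with_placeholder_path` are hypothetical names used only for illustration.

```python
class LazyObject:
    """Simplified stand-in for a lazily populated object (not GitPython's real code)."""

    def __getattr__(self, attr):
        # Python calls __getattr__ only when normal attribute lookup fails.
        self._set_cache_(attr)
        return object.__getattribute__(self, attr)

    def _set_cache_(self, attr):
        if attr == "entries":
            # Pretend to load the children from the object database.
            self.entries = [("blob", "a.txt"), ("tree", "src")]
        else:
            # Nothing can compute a path after the fact, so the lookup fails.
            raise AttributeError(
                "Attribute %r unset: path and mode attributes must have been "
                "set during %s object creation" % (attr, type(self).__name__)
            )

    def children(self):
        # The parent's path is read only to build the child paths.
        return [(kind, self.path + "/" + name) for kind, name in self.entries]


def children_with_placeholder_path(obj, placeholder="unknown"):
    """Apply the workaround: set a dummy path before iterating."""
    obj.path = placeholder
    return obj.children()
```

Once `path` is set as an ordinary attribute, `__getattr__` is never consulted for it, so iteration succeeds. The placeholder only ends up in the children's paths, which is consistent with the observation that the shas are unaffected.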

@ali1234
Author

ali1234 commented Jun 5, 2018

If it's not clear from the above code, what I am trying to do is iterate over every commit and tree in the repo and for each, return a list of binsha of subtrees, and for trees, also a list of binsha of blobs. The order does not matter and the paths do not matter. Speed is very important - this operation takes 15 minutes on a kernel tree when using 8 processes. Perhaps there is a way to do this without having to look up each hexsha? Maybe by using gitdb directly?
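
One way to get child shas without building Tree objects (and without needing paths) would be to parse the raw tree data directly. The sketch below relies on git's tree object layout, where each entry is `<octal mode> <name>\0<20-byte sha>`. It assumes a SHA-1 repository, and `iter_tree_entries` and `split_children` are hypothetical helper names, not part of GitPython or gitdb.

```python
from typing import Iterator, List, Tuple

SHA1_LEN = 20  # assumption: SHA-1 repository (SHA-256 repositories use 32-byte ids)


def iter_tree_entries(data: bytes) -> Iterator[Tuple[int, bytes, bytes]]:
    """Yield (mode, name, binsha) for each entry in raw tree object data."""
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)        # end of the octal mode
        nul = data.index(b"\0", space + 1)   # end of the entry name
        mode = int(data[pos:space], 8)
        name = data[space + 1:nul]
        binsha = data[nul + 1:nul + 1 + SHA1_LEN]
        yield mode, name, binsha
        pos = nul + 1 + SHA1_LEN


def split_children(data: bytes) -> Tuple[List[bytes], List[bytes]]:
    """Return (subtree binshas, blob binshas); gitlinks (submodules) are skipped."""
    trees: List[bytes] = []
    blobs: List[bytes] = []
    for mode, _name, binsha in iter_tree_entries(data):
        kind = mode >> 12  # 0o04 = directory, 0o16 = gitlink, otherwise file or symlink
        if kind == 0o04:
            trees.append(binsha)
        elif kind != 0o16:
            blobs.append(binsha)
    return trees, blobs
```

With GitPython, the raw bytes could presumably come from something like `repo.odb.stream(binsha).read()`; GitPython also appears to ship a similar parser as `git.objects.fun.tree_entries_from_data`. Treat both as assumptions to verify against the installed version.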

@Byron
Member

Byron commented Jun 6, 2018

I see, so it does appear this bug is inherent to all of GitPython, unless of course the code path GitPython takes right after calling rev_parse applies a similar fix.

For the fastest possible access, you could try using a GitCmdObjectDB. Under the hood, when accessing objects, it will use a persistent instance of git cat-file, which gets fed the SHAs you want information about. I would assume this is the fastest way possible, as most work is offloaded to cgit (the C implementation of git).
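
The persistent cat-file approach can also be sketched without GitPython. With `--batch`, git answers each object name written to stdin with a header line `<sha> <type> <size>`, followed by the contents and a newline, or with `<name> missing`. The helper names `read_batch_reply` and `open_batch` below are hypothetical, and this is a minimal sketch rather than how GitCmdObjectDB is actually implemented.

```python
import io
import subprocess
from typing import BinaryIO, Optional, Tuple


def read_batch_reply(stream: BinaryIO) -> Optional[Tuple[str, str, bytes]]:
    """Read one `git cat-file --batch` reply: (hexsha, type, data), or None if missing."""
    header = stream.readline().split()
    if not header:
        raise EOFError("cat-file stream ended unexpectedly")
    if len(header) == 2 and header[1] == b"missing":
        return None
    hexsha, objtype, size = header
    data = stream.read(int(size))
    stream.read(1)  # consume the newline that follows the contents
    return hexsha.decode(), objtype.decode(), data


def open_batch(repo_dir: str) -> subprocess.Popen:
    """Start one long-lived cat-file process (needs git and a repository)."""
    return subprocess.Popen(
        ["git", "-C", repo_dir, "cat-file", "--batch"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )


# Canned reply instead of a live process; the 40-character id is a placeholder.
canned = io.BytesIO(b"a" * 40 + b" blob 6\nhello\n\n")
reply = read_batch_reply(canned)
```

Against a live process, each lookup would write the hexsha plus a newline to `proc.stdin`, flush, and then call `read_batch_reply(proc.stdout)`, so a single git process serves every request.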

@ali1234
Author

ali1234 commented Jun 6, 2018

My version of GitPython appears to use GitCmdObjectDB by default. In fact my program does not work correctly if I tell it to use GitDB - it crashes with "unexpected delta opcode 0", possibly related to the other bug I reported.

@Byron
Member

Byron commented Jun 6, 2018 via email

@Byron
Member

Byron commented Jun 6, 2018

For the fun of it, I have created a small program which for now only effectively counts commits: https://github.com/Byron/git-count . It now uses the odb for iteration, and seems to produce acceptable results.
You can run it with cargo run.
