I'm using GitPython to do data mining on Git repositories on a Windows 10 laptop. To retrieve the stats for commits (which might be in different repositories), I do the following:
import platform

import git

# Tried this. It didn't work.
if platform.system() == 'Windows':
    import win32file
    win32file._setmaxstdio(2048)

# About 20,000 commits
commits = get_commits()
for commit_sha, repository in commits:
    repository_location = REPO_LOCATION + repository
    repository = git.Repo(repository_location)
    commit = repository.rev_parse(commit_sha)
    total_stats = commit.stats.total
    process_stats(total_stats)
    # Tried this too. It didn't work either.
    del total_stats
    del repository
However, I get the following error message every time:
File "my_code.py", line 126, in my_code
File "\Anaconda2\lib\site-packages\git\objects\commit.py", line 229, in stats
File "\Anaconda2\lib\site-packages\gitdb\util.py", line 237, in __getattr__
File "\Anaconda2\lib\site-packages\git\objects\commit.py", line 141, in _set_cache_
File "\Anaconda2\lib\site-packages\git\db.py", line 45, in stream
File "\Anaconda2\lib\site-packages\git\cmd.py", line 982, in stream_object_data
File "\Anaconda2\lib\site-packages\git\cmd.py", line 948, in _get_persistent_cmd
File "\Anaconda2\lib\site-packages\git\cmd.py", line 878, in _call_process
File "\Anaconda2\lib\site-packages\git\cmd.py", line 604, in execute
File "\Anaconda2\lib\subprocess.py", line 732, in __init__
IOError: [Errno 24] Too many open files
Is there a way to free resources on every loop iteration to avoid this error?
I just spent some time looking for something along the lines of calling release() on the odb instance of the repository, but only came to the conclusion that such functionality does not exist.
When I wrote GitPython for py2.X, I was counting on the somewhat deterministic destruction of objects, and built everything around that. However, by now this is simply not the case anymore (if it ever was ...), so GitPython does have a problem with releasing system resources properly in some cases.
A known workaround for this issue is to fork the code into its own subprocess, so that the operating system cleans up its resources when it exits. Doing this in your case might add a lot of complexity.
Something you could try is to use libgit2 directly, which by its very nature provides methods to release resources explicitly.
Also, I am afraid there is no fix for this issue at this time, unless someone is willing to dig in and ensure that the respective release methods are added to the types in question.
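For illustration, here is a minimal sketch of the subprocess workaround, not a tested fix. It assumes the commits can be grouped per repository and that `commit.stats.total` is a plain dictionary that survives a round trip through JSON. The names `CHILD_SCRIPT`, `group_by_repo` and `stats_in_subprocess` are hypothetical helpers made up for this example; `get_commits`, `REPO_LOCATION` and `process_stats` are the ones from the question.

```python
import json
import subprocess
import sys

# Code executed in the short-lived child process. It reads commit SHAs from
# stdin (one per line) and prints {sha: commit.stats.total} as JSON.
CHILD_SCRIPT = """
import json, sys
import git
repo = git.Repo(sys.argv[1])
shas = sys.stdin.read().split()
json.dump({sha: repo.rev_parse(sha).stats.total for sha in shas}, sys.stdout)
"""


def group_by_repo(commits, repo_root):
    """Turn (sha, repository_name) pairs into {repository_path: [sha, ...]}."""
    grouped = {}
    for sha, name in commits:
        grouped.setdefault(repo_root + name, []).append(sha)
    return grouped


def stats_in_subprocess(repo_path, shas, script=CHILD_SCRIPT):
    """Collect stats for one repository in a child process; return them as a dict."""
    result = subprocess.run(
        [sys.executable, "-c", script, repo_path],
        input="\n".join(shas).encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    # The child has exited by now, so the OS has closed every handle it held.
    return json.loads(result.stdout.decode("utf-8"))
```

In the original loop this would mean iterating over `group_by_repo(get_commits(), REPO_LOCATION).items()`, calling `stats_in_subprocess(path, shas)` once per repository, and passing each returned value to `process_stats`. Starting one child per repository rather than one per commit keeps the process start-up overhead manageable for about 20,000 commits, and the SHAs go through stdin so a large repository does not hit the command-line length limit.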