IOError: [Errno 24] Too many open files #421

Open
cptanalatriste opened this issue Apr 30, 2016 · 2 comments
Comments

cptanalatriste commented Apr 30, 2016

I'm using GitPython to do data mining on Git repositories on a Windows 10 laptop. To retrieve the stats for commits (which might be in different repositories), I do the following:

    import platform

    import git

    # Tried this. It didn't work.
    if platform.system() == 'Windows':
        import win32file
        win32file._setmaxstdio(2048)

    # About 20,000 commits
    commits = get_commits()
    for commit_sha, repository in commits:
        repository_location = REPO_LOCATION + repository
        repository = git.Repo(repository_location)
        commit = repository.rev_parse(commit_sha)

        total_stats = commit.stats.total
        process_stats(total_stats)

        # Tried this as well. It didn't work either.
        del total_stats
        del repository

However, I get the following error message every time:

  File "my_code.py", line 126, in my_code
  File "\Anaconda2\lib\site-packages\git\objects\commit.py", line 229, in stats
  File "\Anaconda2\lib\site-packages\gitdb\util.py", line 237, in __getattr__
  File "\Anaconda2\lib\site-packages\git\objects\commit.py", line 141, in _set_cache_
  File "\Anaconda2\lib\site-packages\git\db.py", line 45, in stream
  File "\Anaconda2\lib\site-packages\git\cmd.py", line 982, in stream_object_data
  File "\Anaconda2\lib\site-packages\git\cmd.py", line 948, in _get_persistent_cmd
  File "\Anaconda2\lib\site-packages\git\cmd.py", line 878, in _call_process
  File "\Anaconda2\lib\site-packages\git\cmd.py", line 604, in execute
  File "\Anaconda2\lib\subprocess.py", line 732, in __init__
IOError: [Errno 24] Too many open files

Is there a way to free resources on every loop iteration to avoid the error message?

Member

Byron commented May 18, 2016

I just spent some time looking for something along the lines of calling release() on the odb instance of the repository, but came to the conclusion that such functionality does not exist.
When I wrote GitPython for py2.X, I was counting on the somewhat deterministic destruction of objects, and built everything around that. However, this is simply not the case anymore (if it ever was ...), so GitPython does have a problem with releasing system resources properly in some cases.

A known workaround for this issue is to move the code into its own subprocess, so that the operating system cleans up after it when it is done. Doing this in your case might add a lot of complexity.
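A minimal sketch of that subprocess workaround, for illustration only. It assumes Python 3.7+ (the traceback above comes from Python 2) and that `commits` is a list of `(commit_sha, repository_location)` pairs with full repository paths. The names `WORKER_SCRIPT`, `run_isolated`, `stats_in_batches` and the batch size of 100 are hypothetical and not part of GitPython. Each batch runs in a fresh interpreter, so any handles GitPython leaves open should be reclaimed when that child exits.

```python
import json
import subprocess
import sys

# Child-side script (hypothetical). GitPython is imported only in the child,
# so whatever it leaves open goes away when the child process exits.
WORKER_SCRIPT = r"""
import json
import sys

import git

batch = json.load(sys.stdin)  # [[commit_sha, repository_location], ...]
repos = {}                    # open each repository once per child
results = []
for commit_sha, repository_location in batch:
    if repository_location not in repos:
        repos[repository_location] = git.Repo(repository_location)
    commit = repos[repository_location].rev_parse(commit_sha)
    results.append(commit.stats.total)  # plain dict of ints
json.dump(results, sys.stdout)
"""


def run_isolated(script, payload):
    """Run `script` in a fresh Python interpreter.

    `payload` is sent to the child as JSON on stdin; the JSON the child
    writes to stdout is parsed and returned. A non-zero exit status raises
    subprocess.CalledProcessError.
    """
    completed = subprocess.run(
        [sys.executable, "-c", script],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


def stats_in_batches(commits, batch_size=100):
    """Yield one stats dict per commit, using one child process per batch.

    The batch size is an arbitrary assumption; tune it to stay under the
    handle limit of the platform.
    """
    for start in range(0, len(commits), batch_size):
        batch = [list(pair) for pair in commits[start:start + batch_size]]
        for total_stats in run_isolated(WORKER_SCRIPT, batch):
            yield total_stats
```

With this in place, the loop in the original post would become something like `for total_stats in stats_in_batches(commits): process_stats(total_stats)`. The cost is one interpreter start-up per batch, which is the added complexity mentioned above.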

Something you could try is to use libgit2 directly, which will by its very nature provide methods to release resources explicitly.

Also, I am afraid there is no fix for this issue at this time, unless someone is willing to dig in and ensure that the respective release methods are added to the types in question.
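A note for readers arriving later: as far as I know, GitPython releases published after this comment expose a `Repo.close()` method, which is the kind of explicit release asked for here. Please check the installed version before relying on it, since it may not free every handle on every platform. The sketch below assumes that method exists; `commit_totals` and its `open_repo` parameter are hypothetical names introduced for illustration, and `open_repo` would normally be `git.Repo`.

```python
from contextlib import closing


def commit_totals(open_repo, repository_location, commit_sha):
    """Return the total stats for one commit, closing the repository afterwards.

    `open_repo` is any callable that returns an object with `rev_parse()`
    and `close()`; passing `git.Repo` assumes a GitPython version that
    provides `Repo.close()`.
    """
    # closing() calls repo.close() on exit, even if rev_parse() or stats raises.
    with closing(open_repo(repository_location)) as repo:
        return dict(repo.rev_parse(commit_sha).stats.total)
```

In the loop from the original post this would be called as `process_stats(commit_totals(git.Repo, repository_location, commit_sha))`, replacing the two `del` statements.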

@abourget

I'm hitting this, 9 years later. I can't believe there's nothing that can be done?!
