Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inconsistent handling of duplicate ZipFile entries #117779

Open
obfusk opened this issue Apr 11, 2024 · 4 comments
Open

inconsistent handling of duplicate ZipFile entries #117779

obfusk opened this issue Apr 11, 2024 · 4 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@obfusk
Copy link
Contributor

obfusk commented Apr 11, 2024

Bug report

Bug description:

Create a ZIP file with duplicate central directory entries pointing to the same local file header (these can be found in the wild, see e.g. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1068705, this is just an easy way to create one for testing).

>>> import zipfile
>>> with zipfile.ZipFile("foo.zip", "w") as zf:
...     info = zipfile.ZipInfo(filename="foo")
...     zf.writestr(info, "FOO")
...     zf.filelist.append(info)

Opening the duplicate entry fails if using the name or the later entry in infolist(), but works using the earlier entry (since the later one is considered to overlap with the earlier one, but the earlier one isn't considered to overlap with another entry or the central directory).

>>> import zipfile
>>> zf = zipfile.ZipFile("foo.zip")
>>> zf.infolist()[0]
<ZipInfo filename='foo' filemode='?rw-------' file_size=3>
>>> zf.infolist()[1]
<ZipInfo filename='foo' filemode='?rw-------' file_size=3>
>>> zf.open("foo") # fails
zipfile.BadZipFile: Overlapped entries: 'foo' (possible zip bomb)
>>> zf.open(zf.infolist()[1]) # fails
zipfile.BadZipFile: Overlapped entries: 'foo' (possible zip bomb)
>>> zf.open(zf.infolist()[0]) # works fine
<zipfile.ZipExtFile name='foo' mode='r'>

If I modify NameToInfo to contain the earlier entry instead, f.open("foo") works fine. On the one hand these ZIP files are broken. On the other hand, it would be easy to simply not overwrite existing entries in NameToInfo, allowing these files to be opened. And this affects real-world programs trying to open real-world files. So it could be considered a regression caused by #110016). Perhaps a warning would be in order when duplicates are detected; e.g. unzip shows an error but does extract the files.

CPython versions tested on:

3.11, 3.12

Operating systems tested on:

Linux

Linked PRs

@obfusk obfusk added the type-bug An unexpected behavior, bug, or error label Apr 11, 2024
@h01ger
Copy link

h01ger commented Apr 12, 2024

obfusk: thank you for also filing a bug here!

@raininja
Copy link

raininja commented Aug 25, 2024

This is still evident, and rather annoying!

[ 18/291] Writing tensor blk.1.attn_norm.weight                 | size   4096           | type F32  | T+   3
[ 19/291] Writing tensor blk.1.ffn_norm.weight                  | size   4096           | type F32  | T+   3
[ 20/291] Writing tensor blk.2.attn_q.weight                    | size   4096 x   4096  | type F32  | T+   3
[ 21/291] Writing tensor blk.2.attn_k.weight                    | size   4096 x   4096  | type F32  | T+   3
[ 22/291] Writing tensor blk.2.attn_v.weight                    | size   4096 x   4096  | type F32  | T+   3
[ 23/291] Writing tensor blk.2.attn_output.weight               | size   4096 x   4096  | type F32  | T+   3
Traceback (most recent call last):
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 1219, in <module>
    main()
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 1214, in main
    OutputFile.write_all(outfile, ftype, params, model, vocab, special_vocab, concurrency = args.concurrency, endianess=endianess)
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 941, in write_all
    for i, ((name, lazy_tensor), ndarray) in enumerate(zip(model.items(), ndarrays)):
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 781, in bounded_parallel_map
    result = futures.pop(0).result()
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 905, in do_item
    tensor = lazy_tensor.load().to_ggml()
             ^^^^^^^^^^^^^^^^^^
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 510, in load
    ret = self._load()
          ^^^^^^^^^^^^
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 520, in load
    return self.load().astype(data_type)
           ^^^^^^^^^^^
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 510, in load
    ret = self._load()
          ^^^^^^^^^^^^
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 668, in load
    return UnquantizedTensor(storage.load(storage_offset, elm_count).reshape(size))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 652, in load
    fp = self.zip_file.open(info)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/zipfile/__init__.py", line 1652, in open
    raise BadZipFile(f"Overlapped entries: {zinfo.orig_filename!r} (possible zip bomb)")
zipfile.BadZipFile: Overlapped entries: 'pytorch_model-00001-of-00003/data/23' (possible zip bomb)

this conversion works with python 3.10.4

[290/291] Writing tensor output_norm.weight                     | size   4096           | type F32  | T+ 110
[291/291] Writing tensor output.weight                          | size  32000 x   4096  | type F32  | T+ 110
Wrote /home/raijin/workspace/model_repo/ReluLLaMA-7B-PowerInfer-GGUF/ReluLLaMA-7B.powerinfer.gguf

@serhiy-storchaka
Copy link
Member

I didn't expect such archives to be found in the real world.

I can imagine a use case for central directory entries with different names pointing to the same local file header, as a simulation of hardlinks (although this can be used to create a ZIP bomb), but such case was prohibited in zipfile a long time ago. The central directory entries with the same name can occur if the file in the ZIP archive was "replaced", but then the entries will point to different locations. The regression occurs only if the central directory entries have the same name and point to the same location. I cannot imagine why anybody would create such archive.

Anyway, there is inconsistency in handling such weird case. When resolved by name, the last of the duplicated entries is used (and this is correct, taking into account the "replacing" scenario), but only the first of them can be read.

The following PR fixes this. Now the last of the duplicated entries can be read, so you will not get an error when read by name. You will still get errors when reading other duplicated entries, this is necessary to prevent ZIP bombing.

@mortenb-buypass
Copy link

mortenb-buypass commented Mar 25, 2025

We just upgraded out buildserver and we then get this, when trying to extract it with python on client side:

cmd: python.exe C:\dist\bam-extras-mobj\repo.py --op=download --dir=C:\\temp\bam-lra-test --buildnum=8205
20250325161315.291|INFO|C:\dist\bam-extras-mobj\deploy.py:311|dir:C:\\temp\bam-lra-test
Traceback (most recent call last):
  File "C:\dist\bam-extras-mobj\deploy.py", line 329, in <module>
    tools.tools.unzip(fname, mydir)
  File "C:\dist\bam-extras-mobj\tools\tools.py", line 155, in unzip
    zip_ref.extractall(destpath)
  File "C:\dist\python312\Lib\zipfile\__init__.py", line 1749, in extractall
    self._extract_member(zipinfo, path, pwd)
  File "C:\dist\python312\Lib\zipfile\__init__.py", line 1805, in _extract_member
    with self.open(member, pwd=pwd) as source, \
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\dist\python312\Lib\zipfile\__init__.py", line 1657, in open
    raise BadZipFile(f"Overlapped entries: {zinfo.orig_filename!r} (possible zip bomb)")
zipfile.BadZipFile: Overlapped entries: 'TestConfigs/Local.Master.xml' (possible zip bomb)

It seems like this is just a warning, all files seem to be there so I just added it inside the exception and continue happily.

            try:
                tools.tools.unzip(fname, mydir)
            except Exception as e:
                log.error(f"unzip-error: {fname} to {mydir} - {e}")

Not optimal but all files seem to be there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
Status: No status
Development

No branches or pull requests

6 participants