Issue with IncrementalDecoder and pipreqsnb #131437

Closed
mauriciomm7 opened this issue Mar 18, 2025 · 4 comments
Labels
stdlib (Python modules in the Lib dir), topic-unicode, type-bug (An unexpected behavior, bug, or error)

Comments

@mauriciomm7

mauriciomm7 commented Mar 18, 2025

Bug report

Bug description:

I am running pipreqsnb . which relies on the incremental decoder class IncrementalDecoder from this library, and it fails with this error:

  File "C:\Users\[USERNAME]\anaconda3\envs\pdfparser\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 124872: character maps to <undefined>

To resolve this, you need to specify an encoding that can handle a broader range of characters, such as utf-8, and also specify how to handle decoding errors. Here's how you can modify your IncrementalDecoder class to handle this:

import codecs

class IncrementalDecoder(codecs.IncrementalDecoder):
    def __init__(self, errors='ignore'):
        super().__init__(errors=errors)
        self.encoding = 'utf-8'

    def decode(self, input, final=False):
        try:
            # Attempt to decode using utf-8
            return codecs.getdecoder(self.encoding)(input, errors=self.errors)[0]
        except UnicodeDecodeError:
            # Not reached with the default errors='ignore', which never raises.
            # charmap_decode takes positional arguments only; with no mapping
            # table it decodes the bytes as Latin-1.
            return codecs.charmap_decode(input, self.errors)[0]

But I'm not really sure. Hopefully this solves my issue.
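For comparison, here is a minimal sketch of chunked decoding with the incremental decoder that the standard library already ships for UTF-8, rather than a custom subclass. The sample bytes are made up for illustration; the point is that a multi-byte character split across two chunks is buffered instead of raising or being dropped.

```python
import codecs

# "é" is two bytes in UTF-8 (0xC3 0xA9); split it across two chunks on purpose.
chunks = [b"caf\xc3", b"\xa9 ok"]

# The stdlib incremental decoder keeps the dangling 0xC3 until the next chunk arrives.
decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
parts = [decoder.decode(chunk) for chunk in chunks]

# Flush at the end; this raises if an incomplete sequence is still buffered.
parts.append(decoder.decode(b"", final=True))

text = "".join(parts)
print(text)
```

A stateless decode call per chunk, as in the class above, cannot do this buffering, which is one reason to prefer the built-in decoder when the data really is UTF-8.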

CPython versions tested on:

3.13

Operating systems tested on:

Windows

@mauriciomm7 mauriciomm7 added the type-bug An unexpected behavior, bug, or error label Mar 18, 2025
@picnixz picnixz added stdlib Python modules in the Lib dir topic-unicode labels Mar 19, 2025
@picnixz
Member

picnixz commented Mar 19, 2025

I don't have time right now to check whether this is a CPython issue. The path C:\Users\[USERNAME]\anaconda3\envs\pdfparser\Lib\encodings\cp1252.py hints that it's not from us, though I don't know whether the pdfparser environment is just bundling our Lib/encodings/cp1252.py. It might also be an issue with how pipreqsnb actually calls the incremental decoder.

I can have a look at this on Sunday.

@picnixz picnixz changed the title from "Charmap Error" to "Issue with IncrementalDecoder and pipreqsnb" Mar 19, 2025
@terryjreedy
Member

I don't think this is a CPython issue. 0x81 is not a legal byte in a cp1252-encoded file. Either the data file has an error or pipreqsnb is wrong to specify that encoding. codecs.IncrementalDecoder just specifies the methods needed for each codec-specific incremental decoder. The incremental decoder for cp1252 should raise on 0x81 (and a few other bytes).
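The behavior described above can be checked with a short script. This is only a sketch: it probes every single-byte value against Python's cp1252 codec to list the ones that have no mapping, then confirms that the cp1252 incremental decoder raises on 0x81 under the default strict error handling.

```python
import codecs

# Collect the byte values that Python's cp1252 codec cannot decode.
undefined = []
for value in range(256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(value)

print([hex(v) for v in undefined])

# The cp1252 incremental decoder uses errors="strict" by default,
# so feeding it 0x81 raises instead of returning text.
decoder = codecs.getincrementaldecoder("cp1252")()
try:
    decoder.decode(b"ok \x81")
    raised = False
except UnicodeDecodeError:
    raised = True

print(raised)
```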

@ericvsmith
Member

I agree with @terryjreedy : this isn't a Python bug.

@mauriciomm7
Author

You are totally right, the issue was with pipreqsnb. Their function is not encoding-agnostic, and I have to specify the encoding even when it's UTF-8.
:)
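As a rough illustration of why naming the encoding matters, the sketch below writes a small UTF-8 file and reads it back twice. The file name and contents are hypothetical stand-ins, not taken from pipreqsnb. The character U+2501 encodes to the bytes E2 94 81 in UTF-8, so the file contains a 0x81 byte; reading it as cp1252, which is a common locale default on Windows, fails in the same way as the traceback in the report.

```python
import os
import tempfile

# Hypothetical stand-in for a notebook file with non-ASCII content.
path = os.path.join(tempfile.mkdtemp(), "example.ipynb")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"cells": ["\u2501"]}')

# Passing the encoding explicitly gives the same result on every platform.
with open(path, encoding="utf-8") as f:
    content = f.read()

# Forcing cp1252 reproduces the failure: byte 0x81 has no mapping there.
try:
    with open(path, encoding="cp1252") as f:
        f.read()
    failed = False
except UnicodeDecodeError:
    failed = True

print(failed)
```

When open() is called without an encoding argument, Python falls back to the locale's preferred encoding unless UTF-8 mode is enabled, which is how a UTF-8 file can end up being decoded as cp1252.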
