Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ftplib retrbinary(rest=...) seems to send incorrect data #127561

Open
zeddit opened this issue Dec 3, 2024 · 11 comments
Open

ftplib retrbinary(rest=...) seems to send incorrect data #127561

zeddit opened this issue Dec 3, 2024 · 11 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@zeddit
Copy link

zeddit commented Dec 3, 2024

Bug report

Bug description:

blocksize = 8192 #262144 # default 8192
local_tmp_filepath.touch(exist_ok=True)
## backward 1 blocksize to ensure data correctness
request_offset = max(0, local_tmp_filepath.stat().st_size - blocksize)
if request_offset > 0:
	print(f"continue downloading at position {request_offset}.")
with local_tmp_filepath.open('r+b') as f:
	f.seek(request_offset)
	ftp_hook.get_conn().retrbinary(f"RETR {str(remote_filepath)}",
				f.write, rest=request_offset, blocksize=blocksize)

when I used the code above to download some ftp files from the server, there would be a condition that my file finish downloaded has a byte of 32768 larger than it should be.

I don't know what's wrong with it, it's conditional, about 1~2 time in 10 downloads. great thanks!

CPython versions tested on:

3.10

Operating systems tested on:

Linux

@zeddit zeddit added the type-bug An unexpected behavior, bug, or error label Dec 3, 2024
@picnixz
Copy link
Member

picnixz commented Dec 3, 2024

Do you mean that you have a file too large? maybe the FTP server is sending incorrect data?

@picnixz picnixz added stdlib Python modules in the Lib dir pending The issue will be closed if no feedback is provided labels Dec 3, 2024
@zeddit
Copy link
Author

zeddit commented Dec 4, 2024

@picnixz is there a way to check if it's the error with ftp server rather than the client. I could find out more.

The file I try to download with the command above is about 25MB each, which I think is not quite large. and it's strange that the error looks same if the download is incorrect.

and I found if I offset the download data 32768 bytes forward at a specific position, the data is matched. There are 32768 bytes of data inserted in the middle of the file. The ftp connection is not broken during the whole process, but the final file is not correct.

I don't know how to figure it out. any idea for me? great thanks.

@picnixz
Copy link
Member

picnixz commented Dec 5, 2024

32768 is exactly 2^15, so there's probably something happening out there. Can you perhaps always try to download the file but without using the previous attempt?

and it's strange that the error looks same if the download is incorrect.

What is actually the error? the code you've shown does not seem to raise an error?

@zeddit
Copy link
Author

zeddit commented Dec 8, 2024

what's the meaning of without using the previous attempt, does it mean the rest parameter in retrbinary method? I would try and it seems to be ok if I don't specify the rest parameter.

and it's strange that the error looks same if the download is incorrect.

What is actually the error? the code you've shown does not seem to raise an error?

I am sorry for using the word error improperly. Actually, there is no explicit error shown up, the ftplib retrbinary will go on and complete successfully but the result file is corrupted.

@picnixz
Copy link
Member

picnixz commented Dec 9, 2024

I would try and it seems to be ok if I don't specify the rest parameter

It's either us or the FTP server that is unable to correctly handle the REST command. We have tests on our side and the rest parameter seems to be correctly handled.

  • Can you check that the position after seek() is correct?
  • Can you check that the previous data is correctly overwritten at least?
  • Can you check to see the content being received (in a fresh buffer) is the correct one? namely, create some io.BytesIO buffer and use buf.write instead of f.write and inspect buf.getvalue() to check that the file was correctly downloaded from the desired position.

@zeddit
Copy link
Author

zeddit commented Dec 20, 2024

@picnixz sorry for not finishing test yet. I will feedback soon.

Can you check that the previous data is correctly overwritten at least?

One more question about the overwrite. The reason why I added the offset is that I found there are corrupted datas. so I guess if the error would only occur at the end of connection stream, so I try to add an offset to download some previous data to fix the error on tail. but it not works, the corrupted data still occur.

I wonder if the REST command by design needs user to handle the offset to download some previous data, or it is designed to be that the data written will be guaranteed correct.

@picnixz
Copy link
Member

picnixz commented Dec 20, 2024

The reason why I added the offset is that I found there are corrupted datas.

In this case, the issue might be on the server side.

Can you check that the previous data is correctly overwritten at least?

What I meant was:

  • Instead of using a real file, use io.BytesIO. Only download some data and inspect how many bytes you wrote and what they were.

  • Then, start another connection to the FTP server and requests the remaining data from some offset. Check if the data you receive is indeed correct. For instance, if in the first phase, you requested bytes 0 to 1024, in the second phase, request the bytes 512 to 1536 (using another io.BytesIO object). Then compare that the last N/2 bytes of the first request are the same as the first N/2 bytes of the second request.

  • For the second experiment, use the same io.BytesIO for the first and second request and check that bytes 0 to 1536 are correct.

  • For the last experiment, instead of a io.BytesIO object, use a true file object, but instead of opening the file in r+b mode, only open in ab mode (append/bytes). Indeed, you don't need to be able to read the file (r+b is read-write binary mode).

I wonder if the REST command by design needs user to handle the offset to download some previous data, or it is designed to be that the data written will be guaranteed correct.

What do you mean by "handle the offset". The offset you specify is the position from where the server starts sending the rest. Since you've done a f.seek(...), the cursor on your side for the file should be correct (or maybe there is by-1 index issue?)

@picnixz picnixz removed the pending The issue will be closed if no feedback is provided label Dec 20, 2024
@picnixz picnixz self-assigned this Dec 20, 2024
@picnixz
Copy link
Member

picnixz commented Dec 20, 2024

(I'm assigning myself the issue since I decided to investigate / help you since I don't if it's an issue or not)

@picnixz picnixz changed the title ftplib retrbinary get 32768 more bytes than upstream origin file ftplib retrbinary(rest=...) seems to send incorrect data Dec 20, 2024
@zeddit
Copy link
Author

zeddit commented Feb 7, 2025

I have tested for a few times, and I make sure with the data vendor about the ftp server, it's the latest version of Filezilla Server.

It's may be the case that the server cannot response REST correctly, it's strange because it's the latest version of Filezilla. and the case only occurs when the server output network bandwidth is overloaded. in that case, a file of 20MB needs about 3 retries to complete download.

I don't how to dig it more. Would you give me some more advices. great thanks!

@zeddit
Copy link
Author

zeddit commented Feb 24, 2025

@picnixz I noticed a case that is quite expressive.

There is an upstream ftp file with size of 25165824, and I am uisng REST to download it.
there is no network broken and other error before the retrbinary complete, however, the final size is of 25198592, which is 32768 larger than it should be.

I have compared the bit for these two files, the wrong one downloaded and the right one of the size it should be. I found the wrong bit is at 9483164, which is in the middle of the file, however, there is no error message in each process, the retrbinary function has never been jumped out.

and the wrong file just insert a size of 32768 corrupted bits at 9483164, when shift the wrong file's data bit from 9483164+32768 to 9483164, the wrong file is fixed.

@picnixz
Copy link
Member

picnixz commented Mar 23, 2025

when shift the wrong file's data bit from 9483164+32768 to 9483164

I think it might be because the transfer gets interrupted. Or something like this. Or the entire buffer is not fully written. Have you actually tried with another server? on another location (if possible). Maybe it's a server side issue.

and the case only occurs when the server output network bandwidth is overloaded

Corruption could happen because of that but that's something Python won't be able to help :c AFAIK, I can't find anything else to help you. Maybe you could try https://discuss.python.org/c/help/7 (they might have more ideas than me).

To investigate even more, I would try to see what happens after you interrupt the first time. Namely, what's the content of the first download. Namely:

  • Start the download.
  • Interrupt it.
  • Check how many bytes were effectively written.
  • Restart it with RETR command but send the content to another buffer.
  • Check if the first buffer and the second buffer can be joined together or if they overlapped.
  • If they overlap (which seems to happen), it may be an issue with the remote server that cannot jump to an exact offset.

I found the wrong bit is at 9483164, which is in the middle of the file

What do those bytes contain? could they contain some special bytes that we incorrectly interpret on our side?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants