Updating Google Drive file contents yields a corrupted file due to multipart #118

Open
AdeelK93 opened this issue May 6, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@AdeelK93

AdeelK93 commented May 6, 2023

Hello! Big fan of this library. Let's say I have a CSV I'd like to update, and the new revision looks like this:

a,b
1,4
2,5
3,6

If I try to update an existing file's contents (so that I keep its revision history, rather than deleting and re-creating the file) like this:

req = drive.files.update(fileId=new_file_id, upload_file=data.read(), supportsAllDrives=True, fields=fields)

Google Drive ends up storing a corrupted file that contains the raw multipart body:

--3c76f6a5ff7d445f9320bfd7b5bdfaee
Content-Type: application/json
Content-Length: 4

null
--3c76f6a5ff7d445f9320bfd7b5bdfaee
Content-Type: text/csv

a,b
1,4
2,5
3,6

--3c76f6a5ff7d445f9320bfd7b5bdfaee--
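For context, a Google-style multipart upload (uploadType=multipart) sends a single multipart/related body whose first part is JSON metadata and whose second part is the media. The stored file above is exactly such an envelope, and its metadata part is `null`, which is what Python's `json.dumps(None)` produces when no metadata is supplied (the update call passes no `json=`). One plausible reading, not confirmed in this thread, is that Drive received the envelope without treating it as multipart and saved it verbatim. The sketch below only illustrates the envelope's shape; it is not aiogoogle's code, and `build_multipart_related` is a hypothetical name.

```python
import json
import uuid
from typing import Any, Optional, Tuple


def build_multipart_related(
    metadata: Optional[Any],
    media: bytes,
    media_type: str,
    boundary: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Illustrative multipart/related body: a JSON metadata part, then the media part.

    Returns (body, value for the request's Content-Type header). Not aiogoogle's code.
    """
    boundary = boundary or uuid.uuid4().hex
    # With no metadata, json.dumps(None) yields "null": the 4-byte part seen above.
    meta = json.dumps(metadata).encode("utf-8")
    delim = b"--" + boundary.encode("ascii")
    lines = [
        delim,
        b"Content-Type: application/json",
        b"Content-Length: " + str(len(meta)).encode("ascii"),
        b"",  # blank line ends the part headers
        meta,
        delim,
        b"Content-Type: " + media_type.encode("ascii"),
        b"",
        media,
        delim + b"--",  # closing delimiter
    ]
    body = b"\r\n".join(lines)
    # A receiver can only split the body into parts if it knows the boundary,
    # which travels in this Content-Type header.
    content_type = f'multipart/related; boundary="{boundary}"'
    return body, content_type
```

Passing real metadata (for example `{"name": "test.csv"}`) fills the first part with that JSON instead of `null`.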

But if I disable multipart after building the req and before sending it:

req.media_upload.multipart = False

The file updates fine! Is there a way this could be fixed more automatically in the library?

Also, disabling multipart does nothing to fix the issue for pipe_from uploads. You get an identically corrupted file either way.
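Until the library handles this automatically, the workaround above can be wrapped in a small helper. This is a minimal sketch, assuming an already-authenticated Aiogoogle client and service-account auth as in this report; `update_file_contents` is a hypothetical name, and it uses only the calls that appear in this thread (`discover`, `drive.files.update`, `req.media_upload.multipart`, `as_service_account`). As noted above, it does not help pipe_from uploads.

```python
from typing import Any


async def update_file_contents(
    aiogoogle: Any, file_id: str, payload: bytes, fields: str = "id"
) -> Any:
    """Replace a Drive file's contents in place, keeping its revision history.

    Sketch only: assumes `aiogoogle` is an authenticated Aiogoogle client and
    that service-account auth is used, as in the report.
    """
    async with aiogoogle:
        drive = await aiogoogle.discover("drive", "v3")
        req = drive.files.update(
            fileId=file_id,
            upload_file=payload,
            supportsAllDrives=True,
            fields=fields,
        )
        # Workaround from this issue: no metadata is being sent, so disable
        # multipart; per the report, the file then updates correctly.
        req.media_upload.multipart = False
        return await aiogoogle.as_service_account(req)
```

Taking the client as a parameter keeps the helper easy to exercise with a stand-in object.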

@omarryhan
Owner

Hi, thanks for reporting the issue, and I'm glad you're finding the lib useful!

I'll be happy to accept a PR with a fix. Thanks!

@omarryhan omarryhan added the bug Something isn't working label May 8, 2023
@AdeelK93
Author

AdeelK93 commented May 8, 2023

Since everything is dynamically generated, I don't know how to fix this for one specific method.

Also, ideally the fix would work for pipe_from as well, but I'm not quite sure what that looks like.

Basically, there's probably a better fix than me patching one specific method, which is what I'm doing in my code right now.

@omarryhan
Owner

Can you share a full example for reproduction, please?

Also, please include the expected result so that I can compare it to the corrupted multipart file.
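One way to compare the two is to pull the media part back out of the stored envelope and check it against the expected bytes (for the CSV above, the 16-byte `a,b\n1,4\n2,5\n3,6\n`). This is a rough sketch under stated assumptions: the stored file starts with a `--<boundary>` line and the media is the last part, as in the corrupted file in the report. `extract_leaked_media` is a hypothetical helper, not part of aiogoogle.

```python
import re
from typing import Optional

# A saved envelope begins with a "--<boundary>" line, like the corrupted file in the report.
_FIRST_LINE = re.compile(rb"--([^\r\n]+)\r?\n")


def _strip_one_newline(data: bytes, at_start: bool) -> bytes:
    """Drop a single CRLF or LF from the start or end of `data`, if present."""
    for nl in (b"\r\n", b"\n"):
        if at_start and data.startswith(nl):
            return data[len(nl):]
        if not at_start and data.endswith(nl):
            return data[: -len(nl)]
    return data


def extract_leaked_media(content: bytes) -> Optional[bytes]:
    """Return the last part's body if `content` is a multipart envelope saved
    verbatim; return None if it does not look like one."""
    m = _FIRST_LINE.match(content)
    if m is None:
        return None
    delim = b"--" + m.group(1)
    # Expected pieces: b"", metadata part(s), media part, b"--" (+ optional trailer).
    pieces = content.split(delim)
    if len(pieces) < 3 or not pieces[-1].startswith(b"--"):
        return None
    # Each part: line break after the delimiter, headers, a blank line, the body.
    part = _strip_one_newline(pieces[-2], at_start=True)
    split = re.split(rb"\r?\n\r?\n", part, maxsplit=1)
    if len(split) != 2:
        return None
    # The line break before the next delimiter belongs to the delimiter, not the body.
    return _strip_one_newline(split[1], at_start=False)
```

A correctly updated file is not an envelope, so the helper returns `None` for it and the downloaded bytes can be compared directly.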

@omarryhan
Owner

Also, I just noticed that you're passing a file object instead of a path as the upload_file argument. I don't think this is correct.
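If `upload_file` is meant to receive a path, an in-memory buffer such as the BytesIO used in this thread can be written to a temporary file first. This is a minimal sketch using only the standard library; `buffer_as_temp_path` is a hypothetical helper, and whether passing a path (rather than bytes) avoids the multipart corruption is not established in this thread.

```python
import os
import tempfile
from contextlib import contextmanager
from io import BytesIO
from typing import Iterator


@contextmanager
def buffer_as_temp_path(buffer: BytesIO, suffix: str = ".csv") -> Iterator[str]:
    """Expose an in-memory buffer as a file path for the duration of a with-block."""
    # delete=False so the file can be reopened by name after it is closed
    # (an open NamedTemporaryFile cannot be reopened by name on Windows).
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        # getvalue() returns the whole buffer regardless of the current read position.
        tmp.write(buffer.getvalue())
    try:
        yield tmp.name
    finally:
        os.remove(tmp.name)
```

The request would then be built and awaited inside `with buffer_as_temp_path(data) as path:`, passing `upload_file=path`, so the file still exists while the upload is sent.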

@AdeelK93
Author

Sure, here's a more complete example. I'll let you fill in your own aiogoogle client and parent_id. I'm using a file object because that works for both creates and updates, whereas pipe_from only works for creates. The multipart workaround also does not work for pipe_from.

from io import BytesIO
import pandas as pd

# Assumes `aiogoogle` (an authenticated Aiogoogle client) and `parent_id` are
# already defined, and that this runs where top-level `async with` is allowed
# (e.g. a notebook) or inside an async function.

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

data = BytesIO()
df.to_csv(data, index=False)
data.seek(0)

# Create that works (16 byte file)
async with aiogoogle:
    drive = await aiogoogle.discover('drive', 'v3')
    req = drive.files.create(upload_file=data.read(), supportsAllDrives=True, fields='id', json={ # type: ignore
        'name': 'test.csv',
        'parents': [parent_id]
    })
    res = await aiogoogle.as_service_account(req)

new_file_id = res['id']

# Update that doesn't work (229 byte file)
data.seek(0)  # rewind: the previous read() left the buffer at its end
async with aiogoogle:
    drive = await aiogoogle.discover('drive', 'v3')
    req = drive.files.update(fileId=new_file_id, upload_file=data.read(), supportsAllDrives=True, fields='id')
    # req.media_upload.multipart = False
    await aiogoogle.as_service_account(req)

# Update that works (16 byte file)
data.seek(0)  # rewind again before re-reading
async with aiogoogle:
    drive = await aiogoogle.discover('drive', 'v3')
    req = drive.files.update(fileId=new_file_id, upload_file=data.read(), supportsAllDrives=True, fields='id')
    req.media_upload.multipart = False  # workaround: send the bytes without a multipart envelope
    await aiogoogle.as_service_account(req)
