Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(ObjectStoreStorage): Improve (slow) move on same object bucket #47986

Merged
merged 1 commit into from
Sep 17, 2024

Conversation

cfiehe
Copy link

@cfiehe cfiehe commented Sep 14, 2024

Summary

This commit fixes the issue #47856. When you upload a file into a group folder and when you use a single S3 bucket as primary storage, the final move operation hangs for a long time. In the background, Nextcloud initiates a copy-delete sequence from the bucket into the bucket, with causes a lot unnecessary overhead. Nextcloud thinks that the file must be imported to another storage and does not recognize that everything is done on the same object bucket. In that case, the import step can be completely skipped, which saves time, network bandwidth and reduces the load on the object storage.

The behavior improves a lot with #46013. However, after upload there are still some unnecessary put messages that are being sent to the object storage when you use an object storage as primary storage and upload files into a group folder.

Checklist

Copy link
Member

@provokateurin provokateurin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs some tests to verify the behavior and prevent regressions.

Copy link
Author

@cfiehe cfiehe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Changes committed :-).

@cfiehe cfiehe force-pushed the fix_move_on_same_bucket branch 2 times, most recently from 21172bf to b11ab3c Compare September 15, 2024 11:57
@provokateurin
Copy link
Member

Tests are still missing though.

@cfiehe
Copy link
Author

cfiehe commented Sep 15, 2024

Tests are still missing though.

I hope, that the object store tests from https://github.com/nextcloud/server/tree/master/tests/lib/Files/ObjectStore will give us a hint, if this change causes harm.

@provokateurin
Copy link
Member

Sure there shouldn't be any regressions from this change, but we also need a test that ensures the new behavior is present. So please add a test that moves a file on an object storage and ensures this case is triggered.

@cfiehe cfiehe force-pushed the fix_move_on_same_bucket branch 2 times, most recently from a025436 to a700954 Compare September 15, 2024 16:29
@cfiehe cfiehe force-pushed the fix_move_on_same_bucket branch 4 times, most recently from ddd2296 to 7c1a141 Compare September 15, 2024 16:36
Copy link
Member

@provokateurin provokateurin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! This is a really good improvement 💙

@cfiehe cfiehe force-pushed the fix_move_on_same_bucket branch 2 times, most recently from 27984de to f724be3 Compare September 15, 2024 16:53
@provokateurin
Copy link
Member

You need to follow the conventional commit format, so the commit message should be something like fix(ObjectStoreStorage): Remove unnecessary move on same bucket.

@cfiehe cfiehe changed the title Fix (slow) move on same object bucket. fix(files): Fix (slow) move on same object bucket Sep 15, 2024
@cfiehe cfiehe force-pushed the fix_move_on_same_bucket branch from f724be3 to 2639aa0 Compare September 15, 2024 17:23
@unteem
Copy link

unteem commented Oct 29, 2024

This does not only improve perfs but fixes a bug when you have an object storage as primary. In that case without this change, when object in unlinked it is deleted and as the object copied is the same than the original one, if object being moved is actually deleted. In the end you can see your moved objects in your nextcloud interface but it has been deleted in your object store.

Please backport it, this a pretty dangerous bug, we had thousands of files being deleted (we can provide guidelines to restore it for those who also face this issue).

@pierreozoux
Copy link
Member

/backport to stable28

@provokateurin
Copy link
Member

I'm not convinced this needs to be backported to fix a bug. Are there any reports about it already?

@hrenard
Copy link
Contributor

hrenard commented Oct 29, 2024

We have experienced the issue on multiple Nextcloud instances at IndieHosters.

It could lead to data loss on buckets without versioning. So I think it's pretty serious.

@acazenave
Copy link

acazenave commented Oct 29, 2024

Like @unteem said, it is quite an impacting bug.
Easy to reproduce : Move a file to a folder owned by someone-else (file in shared folder to user root / or local file to share folder).
Then file is removed on S3.

@provokateurin
Copy link
Member

But the file isn't just deleted from the S3. This change only skips deleting and uploading the same content again.
Please show me some existing bug reports about this, I'm really confused how suddenly a few people comment here that it apparently is such a big problem.

@hrenard
Copy link
Contributor

hrenard commented Oct 29, 2024

We're from the same company.

3e12e1e introduced the idea of keeping the same file id and thus object key in s3. It theoretically works fine on different s3 storage because it copies the object first and then unlinks it from the source (delete it).
But when it's the same s3 storage, the source and destination are now the same s3 object, so unlinking deletes both.
This optimization while bypassing this code path, also fix the issue.

@provokateurin
Copy link
Member

CC @icewind1991

@MaximeMaW
Copy link

We also had tens of files removed, using v28.0.11.

@pierreozoux
Copy link
Member

/backport to stable29

@provokateurin
Copy link
Member

The problem is the tests added in this PR were broken (as all of our object storage tests weren't executed by CI it wasn't noticed) and @icewind1991 made some fixes, but they didn't get backported and there were so many changes I'm not sure what is needed to make it work.

@unteem
Copy link

unteem commented Oct 30, 2024

we could just remove the tests ? Do you want me to make a PR with just the fix and not the tests ?
The soonest we get this pushed the best

@artonge
Copy link
Contributor

artonge commented Oct 30, 2024

@unteem would be awesome if you could look into fixing the tests. The fixes are known, it is a matter of extracting them from follow-up PRs. Here are some pointers:

@danxuliu
Copy link
Member

The regression when moving files fixed by this pull request can be tested in the following way:

  • Setup Nextcloud with S3 as primary storage
  • Add users admin and user0
  • As admin, create a folder (curl --user admin:admin --request MKCOL --header "OCS-APIRequest: true" "http://127.0.0.1:8000/remote.php/dav/files/admin/sub")
  • Upload a file into that folder (curl --user admin:admin --upload-file /PATH/TO/test.txt --header "OCS-APIRequest: true" http://127.0.0.1:8000/remote.php/dav/files/admin/sub/test.txt)
  • Share the folder with user0 (curl --user admin:admin --request POST --header "OCS-APIRequest: true" "http://127.0.0.1:8000/ocs/v1.php/apps/files_sharing/api/v1/shares?path=sub&shareType=0&shareWith=user0")
  • As user0, move the file from the shared folder to the root (curl --user user0:123456 --request MOVE --header "OCS-APIRequest: true" --header "Destination: http://127.0.0.1:8000/remote.php/dav/files/user0/test.txt" "http://127.0.0.1:8000/remote.php/dav/files/user0/sub/test.txt")
  • Try to get the file (curl --user user0:123456 --request GET --header "OCS-APIRequest: true" "http://127.0.0.1:8000/remote.php/dav/files/user0/test.txt")

Before 9597072 an internal error is returned, and the logs show that the object could not be got from the storage. After 9597072 the contents of the file are returned as expected.

However, the regression was not introduced in 3e12e1e, but in bd740ac. Therefore, through backports it was introduced in Nextcloud 30.0.0, Nextcloud 29.0.7 and Nextcloud 28.0.11. Backports to Nextcloud 27 and Nextcloud 26 are still open and not merged.

@unteem
Copy link

unteem commented Oct 31, 2024

It is the combination of 3e12e1e and how it is used in 3e12e1e that is causing the issue.

Since this first one when object are copied the fileid are not changed (which is a good think) but in moveFromStorage the file is copied then deleted. It was ok before as when an object was copied a new one with a new id was created.
This issue only happens when you are in the same storage. Lets say I move an object from my primary storage to a secondary storage, a new object will be created and its fine.

That is why this PR fixes it.

@provokateurin
Copy link
Member

/backport to stable28

Copy link

backportbot bot commented Nov 6, 2024

The backport to stable28 failed. Please do this backport manually.

# Switch to the target branch and update it
git checkout stable28
git pull origin stable28

# Create the new backport branch
git checkout -b backport/47986/stable28

# Cherry pick the change from the commit sha1 of the change against the default branch
# This might cause conflicts, resolve them
git cherry-pick 9597072a

# Push the cherry pick commit to the remote repository and open a pull request
git push origin backport/47986/stable28

Error: No changes found in backport branch


Learn more about backports at https://docs.nextcloud.com/server/stable/go.php?to=developer-backports.

@artonge
Copy link
Contributor

artonge commented Nov 6, 2024

/backport to stable28

Isn't it already backported?

@icewind1991
Copy link
Member

only 30 got this change so far

@provokateurin
Copy link
Member

provokateurin commented Nov 7, 2024

only 30 got this change so far

I tried to backport locally and the change is already on stable28.
It was already backported, but not released: #48985

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: ☑️ Done
Development

Successfully merging this pull request may close these issues.