Improve speed for initial sync with virtual files #4424

Open
wonx opened this issue Apr 7, 2022 · 33 comments
Labels
enhancement enhancement of a already implemented feature/code feature: 💽 virtual filesystem Performance 🚀

Comments

@wonx

wonx commented Apr 7, 2022

How to use GitHub

  • Please use the 👍 reaction to show that you want to have the same feature implemented.
  • Please don't comment if you have no relevant information to add. It's just extra noise for everyone subscribed to this issue.
  • Subscribe to receive notifications on status change and new comments.

Feature description

When using virtual files, the first login after a new installation starts a syncing process that can take a very long time, depending on the number of files to synchronize.

In my case, I'm syncing around 700,000 files. My computer has already been up for 29 hours without a restart, and the sync process has now reached the 50% mark. I can see that the virtual files are created one by one, sometimes as slowly as 2 per second. Waiting two or more days until Nextcloud is usable is too much, in my opinion.

It would be cool if there was any way to speed up the initial sync.

PS: This is related to #4421

@wonx wonx added the enhancement enhancement of a already implemented feature/code label Apr 7, 2022
@marcotrevisan

I'm experiencing a similar problem on Mac OS (around 200k files). In my humble opinion, synchronizing the full hierarchy is the key problem here. The typical end user doesn't need the full folder hierarchy saved and synchronized. A lazier approach (i.e. trigger on open and/or scan only the opened subfolders, not the whole depth but perhaps 1 or 2 levels below) would be more scalable and would reduce the load on the NC server.

@johannes-luebke

I'd like to add that restarting the sync, the client or the PC restarts the whole process from the beginning. Also, the sync doesn't seem to start immediately: it first counts all the files it will sync and only then starts syncing. The counting alone takes two days for me, and the sync still isn't done after more than 10 days. At the very least, the sync should pick up where it left off.
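A resumable discovery phase could look roughly like the following minimal Python sketch. It is not how the Nextcloud client's discovery or sync journal actually works; `discover`, `list_remote_dir` and the JSON checkpoint file are hypothetical. The idea is to persist the queue of folders still to be listed after each folder, so that a restart continues from the saved queue instead of from the root.

```python
import json
import os
from collections import deque
from typing import Callable, Iterable, Tuple

# Hypothetical: list_remote_dir(path) yields (child_path, is_dir) for one folder level.
Lister = Callable[[str], Iterable[Tuple[str, bool]]]


def discover(root: str, list_remote_dir: Lister, checkpoint_path: str,
             on_entry: Callable[[str], None]) -> None:
    """Breadth-first discovery that resumes from a checkpoint after an interruption."""
    if os.path.exists(checkpoint_path):
        # Resume: reload the folders that were still waiting to be listed.
        with open(checkpoint_path) as f:
            pending = deque(json.load(f))
    else:
        pending = deque([root])

    while pending:
        folder = pending[0]
        for child, is_dir in list_remote_dir(folder):
            on_entry(child)  # e.g. create a placeholder for this entry
            if is_dir:
                pending.append(child)
        pending.popleft()
        # Persist remaining work; an interruption now loses at most one folder.
        with open(checkpoint_path, "w") as f:
            json.dump(list(pending), f)

    os.remove(checkpoint_path)  # discovery finished; nothing left to resume
```

Because an interruption can happen after a folder's entries were reported but before the checkpoint is written, `on_entry` should be idempotent (for example, creating a placeholder that already exists is a no-op).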

@CWempe

CWempe commented Aug 12, 2022

I have the same issue.

The most annoying part is that I do not need the folders with all the little files available on my desktop.

So it would be enough if I could say "do not sync this folder unless it is accessed by the user".

I think the suggestion from @marcotrevisan (see #4464) also sounds promising.

@PhilippSchlesinger

See #4918 (comment) for a description of a problem with the tray window related to speed issues during the initial sync.

@tobiasKaminsky tobiasKaminsky moved this to 🧭 Planning evaluation (dont pick) in 🤖 🍏 Clients team Sep 26, 2022
@CWempe

CWempe commented Dec 7, 2022

I can confirm this issue.

I started syncing virtual files (~1 million) on a new notebook.
I knew it would take a while.
The next day I checked and saw about 30 % finished.
The day after that only 10 % more (= 40%).
In the application window I could see that roughly one file was processed per second.

Then I read here about restarting the client software.
After a restart, the sync speed (files per second) increased dramatically.

I collected some data to verify this behavior:

image

image

So the best workaround would be a script that restarts the Nextcloud client every 30 minutes or so. 😜

But it would be great if this could be fixed.

Server: 24.0.7 (docker)
Client: 3.6.2 (Windows)

@PhilippSchlesinger

With the latest Nextcloud Client 3.7.3, an initial sync of ~150k files took less than 1 hour, whereas in the past it took a whole night and produced endless errors.
Maybe you could also check again and see whether it has improved with the latest version.

@CWempe
Since you described the issue in detail and with numbers previously, could you maybe check again with 3.7.3 or later and report if anything changed?

@tobiasKaminsky tobiasKaminsky moved this from 🧭 Planning evaluation (dont pick) to 📄 To do (max 2 entries / member) in 🤖 🍏 Clients team May 11, 2023
@tomdereub
Contributor

As I said in #3120 (comment), I'm still having the problem with Nextcloud 25 and desktop client 3.8.2. After 24 hours it still had not finished counting the files to synchronize; then it lost the connection and restarted from scratch. This is with about 2,000,000 files.

@limatus

limatus commented Jun 15, 2023

I can also confirm that this issue persists with 3.9.0 and [Cloud] 26.0.2.
For approximately 500k files, the estimated time jumps between 6 days and “A few seconds”. It “syncs” (virtual files) roughly 100 files per second.
Just for testing purposes, I tried to sync the same set of files with the ownCloud client v4.1.0-rc.2 (https://github.com/owncloud/client/tree/v4.1.0-rc.2). This client does the job much faster, approx. 500–700 files per second, against the same server. It could be my laptop, but I see the same issue with the NC client on our other 30–50 laptops.

@hodyroff

@limatus Try ownCloud Infinite Scale; 3.0 was just released, and I would expect 4x the performance of oC10.

@limatus

limatus commented Jun 15, 2023

@hodyroff thanks for the hint, but I do not intend to switch servers. The server was and still is Nextcloud!

@tobiasKaminsky
Member

@claucambra is this a duplicate of #5692, or vice versa?

@claucambra
Collaborator

They are different: this one is related to the Windows VFS (normal sync engine), while #5692 is related to the macOS-specific sync engine in the file provider module.

@allexzander allexzander self-assigned this Jul 25, 2023
@allexzander
Copy link
Contributor

@limatus @CWempe Just to get a bit more context on Virtual Files vs. normal sync: is syncing much slower with Virtual Files than with normal sync when you also choose to sync everything?

@tomdereub
Contributor

@allexzander: The problem is only with the initial sync. I'm using VFS successfully on my personal server, and it works well.
The problem appears with lots of data: with 500,000 files, the initial sync takes a few days to complete. After that, syncing seems as quick as with normal sync.
Is there any chance of seeing progress on this issue? It was acknowledged 2 years ago (#3120 (comment)), with no visible progress since...

@limatus

limatus commented Aug 3, 2023

@allexzander if I sync the files via normal sync, the bottleneck seems to be the connection speed, which is understandable. Sadly, we mostly use virtual files, as there are simply too many files. It's similar to what @tomdereub mentioned: the initial sync takes days; after that, it's fine.

@tobiasKaminsky tobiasKaminsky moved this from 📄 To do (max 2 entries / member) to 🧭 Planning evaluation (don't pick) in 🤖 🍏 Clients team Aug 31, 2023
@PhilippSchlesinger

PhilippSchlesinger commented Sep 6, 2023

@allexzander For the sake of completeness, I'd like to add that what @tomdereub and others are describing also happens when a significant number of files are added to the Nextcloud account after the initial sync.
When the Nextcloud client needs to sync this large batch of newly added files, it shows the same problem as during the initial sync.

As described by @CWempe in #4424 (comment), the sync speed decreases dramatically over time.
Is this perhaps due to the real-time listing of activities in the tray window for each individual file being synced?
If this could be identified as a cause of the slowdown, then perhaps lazy loading of activities, or even a summary listing for large numbers of files, would be an option.
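To illustrate the summary-listing idea, here is a minimal Python sketch, assuming (hypothetically) that the tray window receives one event per synced file. `ActivitySummarizer` and its `threshold` are illustrative names, not the client's activity model. Folders with many synced files collapse into a single row, so a large batch produces one row per folder instead of one row per file.

```python
import posixpath
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ActivitySummarizer:
    """Collapses per-file sync events into per-folder summary rows (hypothetical)."""
    threshold: int = 50  # above this many files in one folder, show a single summary row
    _by_folder: Dict[str, List[str]] = field(
        default_factory=lambda: defaultdict(list), init=False, repr=False)

    def record(self, file_path: str) -> None:
        # Cheap bookkeeping per file; no UI row is created here.
        self._by_folder[posixpath.dirname(file_path)].append(file_path)

    def rows(self) -> List[str]:
        # Build the visible list only when the UI asks for it.
        out: List[str] = []
        for folder in sorted(self._by_folder):
            files = self._by_folder[folder]
            if len(files) > self.threshold:
                out.append(f"{len(files)} files synced in {folder}")
            else:
                out.extend(f"{path} synced" for path in files)
        return out
```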

@tomdereub
Contributor

tomdereub commented Sep 19, 2023

As @PhilippSchlesinger said: after using VFS for some time on that folder with about 500,000 files, I find it wasteful to keep syncing the whole folder tree. Every time somebody modifies a lot of files, a long sync starts. It seems impossible to deploy this for 30 people; it would put a heavy load on the server and on each computer.
From my point of view, the right way to make it scalable is to sync only folders that have been accessed at least once. I mean:

  • first sync: sync only the first level of the folder tree. It will be ready instantly.
  • when the user opens a folder, sync that folder's content, but not recursively, and add the folder to the list of folders to keep synced when something changes in it.
  • the user can manually select a folder to be fully synced (as is already possible).

So the first access to each folder will be a bit slower, but step by step the folders the user actually uses will get synced, and all the other folders will never be synced.
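The scheme in the list above can be sketched as follows. This is a minimal Python illustration under stated assumptions: `LazyTree` and `list_remote_dir` (returning `(path, is_dir)` pairs for one folder level) are hypothetical, and real placeholder creation on Windows would go through the OS cloud-files API rather than a dictionary. Folders are listed non-recursively on first open, only visited folders react to remote changes, and pinning walks a subtree fully.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical: returns (child_path, is_dir) pairs for one folder level.
Lister = Callable[[str], Iterable[Tuple[str, bool]]]


class LazyTree:
    """Creates placeholders one folder level at a time, only when visited (hypothetical)."""

    def __init__(self, list_remote_dir: Lister) -> None:
        self._list = list_remote_dir
        self.placeholders: Dict[str, List[str]] = {}  # folder -> child paths created locally
        self.tracked: Set[str] = set()                # folders kept in sync after first visit

    def open_folder(self, folder: str) -> List[str]:
        # First access costs one listing request; later accesses are served locally.
        if folder not in self.placeholders:
            self._populate(folder)
            self.tracked.add(folder)
        return self.placeholders[folder]

    def on_remote_change(self, folder: str) -> bool:
        # Changes in folders nobody has opened trigger no listing request at all.
        if folder in self.tracked:
            self._populate(folder)
            return True
        return False

    def pin(self, folder: str) -> None:
        # Full recursive sync of a subtree, as the client already allows today.
        stack = [folder]
        while stack:
            current = stack.pop()
            entries = self._populate(current)
            self.tracked.add(current)
            stack.extend(path for path, is_dir in entries if is_dir)

    def _populate(self, folder: str) -> List[Tuple[str, bool]]:
        # Non-recursive: only direct children become placeholders.
        entries = list(self._list(folder))
        self.placeholders[folder] = [path for path, _ in entries]
        return entries
```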

Is this technically possible? If so, what do you (Nextcloud devs) think about it?
This seems to be how the Android client currently behaves.

@marcotrevisan

I'd like to add that under Mac OS things are changing towards a FileProvider based implementation, which will solve the issue by delegating a good part of the sync logic to MacOS.

IMHO, if there's no API like FileProvider under Windows, then the client itself should evolve towards a lazier approach... A "full sync" approach works against scalability, and in the long run it's a major limiting factor for a broader adoption of Nextcloud.
In the case of 500k files and 30 users that are actively working, push notifications tend to generate very frequent peaks of PROPFIND requests coming from all the clients. Such peaks will cause slowdowns not only to the clients themselves but also to the other apps (talk, mail, calendar, deck...), and the end result is a busy server instance that actually is not doing anything except triggering propfinds and responding to propfinds, for files/folders that are often far away from where the actual users are working.
That's why, in my humble opinion, this is a critical and high-priority issue.

@marcotrevisan

@tomdereub I'm in a very similar situation to yours and as a mitigation solution I ended up as follows:

  • use a webdav client like Mountain Duck in "online" mode for occasional browsing and work on the folder structure. It has its own issues but it basically works (don't forget to generate an "application" password for this client in the user's Settings -> Security section);
  • also use Nextcloud Client without virtual files, selecting those folders containing the most heavily used projects for the user, and instructing them how to add/remove folders to sync.

In this way, server load is under control (push notifications won't wake up all clients every time) and the clients are snappy enough to work. The advantage is that, for heavily used folders, the NC client has all the files downloaded and ready; the disadvantage is that not all the users are comfortable with such setup.

Hope it helps

@tomdereub
Copy link
Contributor

@marcotrevisan I'm currently trying Mountain Duck, and it seems to do everything I want in "smart synchronization" mode. There is an option to index files or not.
With this option unchecked, it does not index all files; it only keeps an index of visited folders. There is also an option to keep a folder offline on the local disk.
So it does what Nextcloud VFS does, but with two advantages (from my point of view):

  • it's possible not to index all files → far more scalable
  • it mounts the WebDAV folder as a drive letter, which is useful (on Windows)

It seems that part of Mountain Duck's code is open source; it could be interesting to have a look at it.

@marcotrevisan

Yes, but don't get drunk too fast, it has its own bugs (in Mac OS at least) :-D
Avoid unzipping archives in the share for example. Sometimes it'll screw things up, and I don't know why. The safest mode in my experience is the Online mode. If you're in Windows it may behave differently.

@roberix

roberix commented Oct 5, 2023

Hi. I can confirm this. We have extensively tested the "Duck" on Windows, and while the client performs very well, there are many other issues around file locking, online detection, working with MS Office, and so forth.

Is any progress to be expected on improving the initial VFS sync speed? We are currently migrating a lot of files to NC, and I am already afraid of starting the sync on our clients.

At the moment, the initial sync of about 100K files takes about 60 minutes.

Regards

Rob

@PhilippSchlesinger

Just a small addition regarding the initial scan:
Synchronizing placeholder files for an additional 100k files is estimated to take 0 seconds (after a previous operation had already taken over 90 minutes for 60k files):

Screenshot 2023-10-17 101406

@tomdereub
Contributor

@allexzander @mgallien could you please give us some idea of the priority of this issue and of the ways to solve it? Something like "it's not the priority at the moment, so we don't know when it will be worked on", or "it's very complicated to solve, we have to rewrite the sync engine entirely, so it will take some time before we can work on it", or "only a few of you are affected, so it's not a priority; most of our users don't have that much data"...

As users, we need to know whether there is some chance of VFS becoming scalable in the short or medium term, or whether we have to find other solutions. I don't want to see my company give up on Nextcloud and the other open-source software we're using and switch entirely to Microsoft solutions.
I've been trying Mountain Duck as an alternative for some time, but as @marcotrevisan and @roberix have said, in some cases it doesn't work as well as the Nextcloud desktop client. So I need to know a bit more about the future development of the Nextcloud desktop client before deploying it to all users.

@tomdereub
Contributor

@joshtrichards: you added a label to this issue; what does that mean? Will somebody start working on it?

@OpsecPGR

OpsecPGR commented Apr 8, 2024

From what I can see, it looks like they began working on this about a week ago.

@tomdereub
Copy link
Contributor

#6461 is exactly what is needed for Windows too.

@PhilippSchlesinger

Dear Nextcloud developers, @allexzander
It would be great if you could shed some light on what is actually being worked on. Many are following this bug and many of us contributed to this issue.

See #4918 for a description of a performance problem (PR intended to solve the problem in #5941) with the tray window. Solving this heavy issue could also pay off in improving the speed problems with initial sync.

@tobiasKaminsky tobiasKaminsky moved this from 🏗️ In progress to 📄 To do (max 2 entries / member) in 🤖 🍏 Clients team Jun 4, 2024
@psxvoid

psxvoid commented Oct 1, 2024

For me, the initial sync has been in progress for several days, and it seems that laptop restarts and network connection issues restart this process from scratch each time. In the screenshot, the total number of files is constantly increasing (~1–5 items per second); note that the synced file count is always 0:
image

and there are no files in the sync folder except these (the size of sync.db is NOT changing either):
image

It seems completely unusable at this point.

P.S.:
Client: Nextcloud-3.14.1-x64 for Windows
Server: Nextcloud 29 on Docker (the server is quite slow running on Raspberry Pi 4)
Files Total: > 300 000

@tomdereub
Contributor

This first step of the initial sync is very hard on the server. Have a look at your server's CPU usage; I think it's the bottleneck. In my case, I have an Intel i5-10210U with 6 cores dedicated to my server, and it uses almost 100% of all cores during this first scan of all files. I have about 700,000 files, and the scan takes between half an hour and an hour. So I'm not surprised that it takes so long on an RPi.
Once the server-side scan is finished, the client takes up to 48 hours nonstop to create the whole file tree.
In my case, once the first sync is done, it works well (20 people use it), and the load on the server is fine.
Looking forward to some improvement on this issue...

@github-project-automation github-project-automation bot moved this to 🧭 Planning evaluation (don't pick) in 💻 Desktop Clients team Oct 14, 2024
@Rello Rello moved this from 🧭 Planning evaluation (don't pick) to 📄 To do (max 2 entries / member) in 💻 Desktop Clients team Oct 14, 2024
@ne0YT

ne0YT commented Oct 31, 2024

@Rello hey there, do you have an estimate of when this will be done?

We are planning to move away from our weird software solution built on top of the Windows built-in WebDAV client, which has a lot of other issues and has officially been discontinued (still available, but no longer getting updates, they say), so we need to switch as soon as possible.

@vagner-dias

vagner-dias commented Nov 13, 2024

OneDrive takes a smarter approach: it replicates the file and folder structure on the local system as soon as it arrives from the server. This ends up being more efficient than Nextcloud's approach, which downloads the entire structure first and only then starts creating it locally.

It also looks like Nextcloud uses just one thread to handle both downloading and syncing, while OneDrive splits the work into two threads: one for downloading data into a buffer and another for reading from that buffer to create the local structure. This split approach helps OneDrive sync files faster.
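The two-thread split described above is a standard producer/consumer pipeline with a bounded buffer. Below is a minimal Python sketch using `threading` and `queue.Queue`; `fetch_pages` (yielding listing pages from the server) and `create_placeholder` are hypothetical stand-ins, and this is not OneDrive's or Nextcloud's actual implementation. Placeholder creation starts as soon as the first page arrives instead of after the whole listing has been downloaded.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel marking the end of the listing


def pipelined_sync(fetch_pages: Callable[[], Iterable[List[str]]],
                   create_placeholder: Callable[[str], None],
                   buffer_size: int = 8) -> int:
    """Download listing pages on a worker thread while this thread creates placeholders."""
    # Bounded buffer: the producer blocks when the consumer falls behind.
    buf: "queue.Queue[object]" = queue.Queue(maxsize=buffer_size)

    def producer() -> None:
        try:
            for page in fetch_pages():
                buf.put(page)
        finally:
            buf.put(_DONE)  # always unblock the consumer, even if fetching fails

    worker = threading.Thread(target=producer)
    worker.start()

    created = 0
    while True:
        item = buf.get()
        if item is _DONE:
            break
        for path in item:
            create_placeholder(path)
            created += 1

    worker.join()
    return created
```

The bounded queue keeps memory use flat: if creating placeholders is slower than downloading, the producer blocks on `put` instead of buffering the entire listing.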
