Gluster FS possibly causing high latency on high workload #127
Comments
Just a suggestion, but you could enable accelerated networking on your already deployed environment and retest; that could give us good insight into the impact of network latency on this.
Thanks for the suggestion. I believe @hosungs is running some tests on this as I type.
Yes. I already tried AN (accelerated networking) on all VMs, and the latency wasn't much better (still far from acceptable). I also suspected that the separate subnet for the Gluster VMs could be a reason, so I ran another load test with the Gluster VMs moved to the web subnet, and the latency was still far from acceptable. At this point, I have exhausted all the workarounds I can think of to improve latency with Gluster FS under high workload...
Could you please share your testing methodology and exact results? I would like to conduct a similar test in our deployments for comparison.
Sure, but the methodology has always been available at https://github.com/Azure/Moodle/tree/master/loadtest . That README.md links to a shared Excel spreadsheet with many test results, though my most recent ones for this specific engagement are not entered there. The jMeter results zip files are rather big and not appropriate to check in here, so I hope I can send them to you by email. Please feel free to email me. My email address is available on my GitHub profile page at https://github.com/hosungsmsft . I should mention that the specific test plan for this Gluster perf issue has just been checked in (merged to master) as https://github.com/Azure/Moodle/blob/master/loadtest/time-gated-exam-test.jmx . You'll need to change the host name and other params as needed for your case. If you are not familiar with jMeter, I'd be happy to walk you through it some other way. Thanks.
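For anyone comparing runs without the spreadsheet, here is a minimal sketch of how per-request latency could be summarized from a jMeter results file. It assumes the results were saved in jMeter's CSV format with a header row that includes the `label`, `elapsed` (milliseconds), and `success` columns; the function name `summarize_jtl` and the choice of statistics are mine, not part of the load test in this repo.

```python
import csv
import statistics
from collections import defaultdict


def summarize_jtl(lines):
    """Summarize jMeter CSV results per sampler label.

    `lines` is any iterable of CSV lines (for example an open file)
    whose header row contains `label`, `elapsed`, and `success`.
    """
    elapsed_by_label = defaultdict(list)
    errors_by_label = defaultdict(int)
    for row in csv.DictReader(lines):
        label = row["label"]
        elapsed_by_label[label].append(int(row["elapsed"]))
        if row["success"].strip().lower() != "true":
            errors_by_label[label] += 1

    summary = {}
    for label, samples in elapsed_by_label.items():
        ordered = sorted(samples)
        # Nearest-rank style 95th percentile; coarse for small sample counts.
        p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
        summary[label] = {
            "count": len(ordered),
            "median_ms": statistics.median(ordered),
            "p95_ms": p95,
            "max_ms": ordered[-1],
            "errors": errors_by_label[label],
        }
    return summary
```

To use it, open the results file with `open(path, newline="")` and pass the file object to `summarize_jtl`; comparing the `p95_ms` values between a Gluster run and an NFS run is one way to put numbers on "far from acceptable".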
On a similar topic: if you have specific scenarios that need testing, we would encourage you to submit a PR with a test plan. As a community we can't guarantee that we will run all test plans, but the more we have access to, the more we will collectively run as we validate improvements to the templates.
We've moved to Azure Files Premium over GlusterFS as the default file share. While the testing is being wrapped up, we will keep this issue open for any input and will close it in the coming days/weeks.
We've been experiencing high latency with Gluster FS in the time-gated exam scenario. We don't know for certain whether Gluster is the cause, or why it would be, but replacing Gluster with NFS makes the high latency go away, so Gluster is the natural suspect.
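One way to narrow this down outside of a full jMeter run is to time small file operations directly on each share. The following is only a rough sketch under my own assumptions: the function names, file count, and file size are hypothetical, and it measures synchronous small-file writes and read-backs rather than anything Moodle-specific. Read timings in particular may be served from the client's cache, so the write numbers are likely the more telling ones.

```python
import os
import statistics
import time


def measure_small_file_latency(directory, count=200, size=4096):
    """Write (with fsync) and read back `count` small files in `directory`.

    Returns two lists of per-operation latencies in milliseconds:
    (write_ms, read_ms). Probe files are removed as we go.
    """
    payload = os.urandom(size)
    write_ms, read_ms = [], []
    for i in range(count):
        path = os.path.join(directory, f"latency-probe-{i}.bin")

        start = time.perf_counter()
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the write out to the share
        write_ms.append((time.perf_counter() - start) * 1000)

        start = time.perf_counter()
        with open(path, "rb") as f:
            data = f.read()
        read_ms.append((time.perf_counter() - start) * 1000)

        if data != payload:
            raise RuntimeError(f"read-back mismatch for {path}")
        os.remove(path)
    return write_ms, read_ms


def summarize(samples):
    """Return median, approximate 95th percentile, and max of the samples."""
    ordered = sorted(samples)
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": p95,
        "max_ms": ordered[-1],
    }
```

Running `measure_small_file_latency` from a web VM against a directory on the Gluster mount and then against the same kind of directory on an NFS mount, and comparing the `summarize` output of each, would at least show whether the gap is visible at the file system level.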
We'll need to dive deeper into how Gluster works and why it might cause a perf bottleneck like the one we've been experiencing. In the meantime, we might need to provide an alternative, like HA NFS, as described in places like the following:
These are pretty old, yet they are still the top results of my related web searches, which makes me think that this is not a widely used solution, but we should evaluate the option. A PR with an ARM template that deploys a 2-VM HA NFS cluster would be highly appreciated.