Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix shared fs problem #22

Merged
merged 1 commit into from
Aug 9, 2024
Merged

Fix shared fs problem #22

merged 1 commit into from
Aug 9, 2024

Conversation

dadav
Copy link
Owner

@dadav dadav commented Aug 9, 2024

This is an attempt to fix #19

@dadav dadav marked this pull request as ready for review August 9, 2024 19:16
@dadav dadav merged commit 1d6f2ee into main Aug 9, 2024
5 checks passed
@johnwarburton
Copy link
Contributor

johnwarburton commented Aug 12, 2024

Hi @dadav

Thanks for looking at this. I have noticed two things with this patch for file locking:

  1. Pattern of lock / read / remove lock / lock. This leaves all files locked. I assume the last lock is not needed?
  2. When the gorge process is shut down, all lock files are left in place
2024-08-12T11:56:39.119+1000    DEBUG   backend/filesystem.go:350       locking /var/tmp/gorge/modules/saz-timezone/saz-timezone-6.3.0.tar.gz.lock

2024-08-12T11:56:39.119+1000    DEBUG   backend/filesystem.go:356       Reading /var/tmp/gorge/modules/saz-timezone/saz-timezone-6.3.0.tar.gz

2024-08-12T11:56:39.119+1000    DEBUG   backend/filesystem.go:361       removing lock /var/tmp/gorge/modules/saz-timezone/saz-timezone-6.3.0.tar.gz.lock

2024-08-12T11:56:39.120+1000    DEBUG   backend/filesystem.go:220       locking /var/tmp/gorge/modules/saz-timezone/saz-timezone-6.3.0.tar.gz.lock

2024-08-12T11:56:39.120+1000    INFO    cmd/serve.go:246        Listen on 0.0.0.0:7777

^C2024-08-12T11:57:32.412+1000  DEBUG   cmd/serve.go:293        Shutting down server (timeout: 5s)

[root@sypls059 gorge]# ls -l /var/tmp/gorge/modules/saz-timezone/saz-timezone-6.3.0.tar.gz.lock
-rwxr-x--- 1 root root 0 Aug 12 11:46 /var/tmp/gorge/modules/saz-timezone/saz-timezone-6.3.0.tar.gz.lock

Thanks

John

@johnwarburton
Copy link
Contributor

Actually, digging further, I put this into a chart that mounted an NFS filesystem as documented where we experienced the problem, and the file locking mechanism would not work at all - I assume flock is being called - which does not work on NFS

Would it be simpler to unpack the file into a separate unique directory, say .gorge/$(uuidgen) or equivalent?

@johnwarburton
Copy link
Contributor

Ok, I have dug further into the code and data structures and the file locking "problem" only comes about from my naive implementation of being able to read for new modules by looping over LoadModules

This naive approach wasn't suitable for LoadModules which was written to run only once, so:

  • Module and Release data structures were initialised every time
  • If there was an existing release, it was treated as an error
  • Release files were always written

I think most of the issues came from reading the Release file tar.gz from the NFS share on one replica while another replica was writing that same file - hence 'unexpected end of file' type errors

I have made changes so that

  • LoadModules is run at startup, when ModulesScanSec is set (previous would sleep ModulesScanSec first before scanning)
  • Module and Release data structures are only initialised once
  • If there is an existing release, return that from memory rater than reading files
  • Only write release files if they do not exist

I have created PR #23 for these changes

Thanks

John

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Temporary file management on shared filesystems
2 participants