Synchronization Issue between gcsfuse and Kubernetes Pod: 'No Such File or Directory' Error on File Update #44
Comments
Could you clarify the workflow a little bit?
I am trying to collect enough information to reproduce the issue on my end. Thank you!
So first we use a shell script to download the files to the bucket, which is mounted to a path, and then we try to access the files after the download is complete.
Hi @sethiay, could you help take a look at this issue? Seems like disabling cache and enabling implicit dir using
@songjiaxun any luck with this?
Just confirming whether you are doing something similar to the steps mentioned below: (a) Opening a file handle from the GCSFuse mount, let's say f1. If you are doing something different, could you please share the gcsfuse logs for the steps you performed? Thanks,
Any solution to this issue? We have the same situation.
Hi @raj-prince, it seems that in @skyjacker2005's case, two gcsfuse instances cannot sync the file states immediately. Can you provide some suggestions here? Is this expected?
To describe, let the two gcsfuse-mounted directories (one per gcsfuse instance) be
Now, a couple of scenarios.
Case 1: Reading from the same fileHandle forever:
If you perform the above operation, you will get the error `FileNotFoundError: [Errno 2] No such file or directory` (assuming object versioning is not enabled in the mounted bucket).
Case 2: Reading via a different handle every time:
You will always get the latest content.
Case 3 [very rare]: Reading via a different handle every time, but the latest updated object comes with a smaller generation number (lexicographically) than the old one:
This is a bug in GCSFuse, although it's very rare. We have a generation comparison in which the inode is updated only when the latest generation of the particular object is greater than the existing generation number. You can refer to the code here: https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/internal/fs/fs.go#L877 So,
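The generation check described in this comment can be sketched roughly as follows. This is a minimal Python illustration, not the actual GCSFuse code (which is written in Go): `CachedInode`, `maybe_update_inode`, and the object name are hypothetical. It assumes only what the comment states, namely that the cached inode is replaced when the incoming generation is greater than the cached one, so an update that arrives with a smaller generation would be ignored (Case 3).

```python
from dataclasses import dataclass


@dataclass
class CachedInode:
    """Hypothetical stand-in for the per-object state a mount keeps."""
    name: str
    generation: int


def maybe_update_inode(cached: CachedInode, latest_generation: int) -> bool:
    """Replace the cached generation only if the incoming one is greater.

    Returns True when the cache was updated and False when the incoming
    generation was ignored.
    """
    if latest_generation > cached.generation:
        cached.generation = latest_generation
        return True
    return False


# "session.pkl" is a made-up object name used only for this illustration.
inode = CachedInode(name="session.pkl", generation=100)
updated_newer = maybe_update_inode(inode, 250)  # greater generation: accepted
updated_older = maybe_update_inode(inode, 90)   # smaller generation: ignored
```

In the second call the cached generation stays at 250, which mirrors how a mount could keep serving stale state in the rare case described above.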
Hi @raj-prince, we don't control exactly how the code accesses files on the filesystem. It's very strange that GCSFuse doesn't work as expected with a standard library that is used successfully with other ReadWriteMany file systems (we ourselves use the library with NFS versions 3 and 4 and with the competing CephFS). Thanks a lot
@skyjacker2005 Could you please confirm whether you are using the same fileHandle to create the Unpickler object? In the meantime, I'll discuss this behavior within the team and come back to you.
@raj-prince we use it to unpickle sessions that we store in the filesystem (so that they are available to all replicas). In our code I see: f = open(self.get_session_filename(sid), "rb") Then "f" is passed to a function executing: Where the default encoding is ASCII. unpickler.load fails, with errors logged both by our code and by gcsfuse (FileNotFoundError). Thanks again
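For reference, the loading pattern described in this comment can be sketched as below. This is only an illustration under assumptions: `load_session`, the file name, and the session contents are hypothetical, and the only details taken from the thread are the `open(..., "rb")` call, an unpickler created with ASCII encoding, and the `load` call. The sketch opens a new file handle on every call, which corresponds to Case 2 in the earlier comment; the temporary directory stands in for the mounted path.

```python
import os
import pickle
import tempfile


def load_session(path: str, encoding: str = "ASCII"):
    """Open a new handle for each load and unpickle the session from it."""
    with open(path, "rb") as f:
        return pickle.Unpickler(f, encoding=encoding).load()


# Local demonstration; a temporary directory stands in for the mount.
with tempfile.TemporaryDirectory() as tmp:
    session_path = os.path.join(tmp, "session")  # hypothetical file name
    with open(session_path, "wb") as f:
        pickle.dump({"user": "alice"}, f)
    first = load_session(session_path)

    # Another writer replaces the file; the next load opens a new handle.
    with open(session_path, "wb") as f:
        pickle.dump({"user": "bob"}, f)
    second = load_session(session_path)
```

Because no handle outlives a single call, each load sees whatever the file contains at open time.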
Thanks for the information! Ohh, that means the same "f" is passed to the function executing
I don't understand why you think the same file handle is passed to the function Unpickler(stream, encoding=encoding).
Got it. This is strange. There are two options now: Thanks,
A gentle follow-up! @skyjacker2005, could you please open a support ticket, as we need the bucket name, project number, and cluster ID to access the GCS and gcsfuse logs? Otherwise, it's hard to debug.
@raj-prince we'll do it as soon as possible. Unfortunately, I'm not the person in charge of the cloud project administration, so I'm forwarding the question internally. Thanks again,
Just to reiterate what was mentioned in #44 (comment): Mounting gcsfuse with
In addition to what was discussed in #44 (comment), this issue can also arise when there are concurrent reads and writes to GCS from different GCSFuse mount points.
Example Scenario:
Scenario:
Note: Reading via a new file handle works only if that file handle is opened after the file has been updated/written by the other mount.
Next Steps: Please feel free to reopen this issue if you have any other questions. Thanks,
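One way an application might cope with this, sketched under assumptions: open a new file handle for every read and, if the open or read fails with `FileNotFoundError`, retry with another new handle after a short delay. `read_latest`, `attempts`, and `delay_seconds` are hypothetical names, and the retry policy is an illustration rather than a recommendation from the gcsfuse maintainers.

```python
import os
import tempfile
import time


def read_latest(path: str, attempts: int = 3, delay_seconds: float = 0.1) -> bytes:
    """Read a file through a new handle, retrying if it is reported missing."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            with open(path, "rb") as f:  # new handle on every attempt
                return f.read()
        except FileNotFoundError as err:
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    raise last_error


# Local demonstration; a temporary directory stands in for the mount.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.bin")  # hypothetical file name
    with open(target, "wb") as f:
        f.write(b"v2")
    content = read_latest(target)

    try:
        read_latest(os.path.join(tmp, "missing.bin"), attempts=2, delay_seconds=0.0)
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```

A retry only helps when the failing handle was opened before the other mount finished writing; a file that is genuinely absent still raises `FileNotFoundError` once the attempts are exhausted.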
I am encountering a synchronization issue between gcsfuse and a pod in a Kubernetes environment. When I update the files in the Google Cloud Storage (GCS) bucket mounted by gcsfuse, the gcsfuse sidecar fails to access the updated files and throws an error.
Error Logs:
Configuration:
GKE Pod: Mounted GCS bucket using gcsfuse with the following configuration:
mountOptions: 'uid=101,gid=82'
Kubernetes Job: Updates the contents of the GCS bucket by replacing the existing files. Names are not changed.
Observations and Troubleshooting Steps Taken:
Anything that I'm missing here?