Ephemeral storage for very large tar ~100 GB #57

I have a local folder (backup) of ~100 GB of files. If I directly tar the folder onto the bucket, e.g. `tar -cf /tmp/bucketmount/backup.tar /backup/`, will there be any issues with the CSI driver? I see that the gcsfuse CSI driver depends on an emptyDir{} or some temp directory for staging files before they are uploaded to the bucket.

Comments
I think the CSI driver should work in this use case. @ashish01987, do you see any errors? As you mentioned, gcsfuse uses a temp directory for staging files, so please consider increasing the sidecar container ephemeral-storage limit so that gcsfuse has enough space for staging the files. See the GKE documentation for more information.
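For reference, on GKE the sidecar's ephemeral storage limit is set through pod annotations. A minimal sketch (the pod name, container, and the 50Gi value are only illustrative and should be sized for your workload):

```yaml
# Sketch: raise the gcsfuse sidecar's ephemeral storage limit via pod annotations.
apiVersion: v1
kind: Pod
metadata:
  name: tar-backup            # hypothetical pod name
  annotations:
    gke-gcsfuse/volumes: "true"
    # Give gcsfuse enough room to stage large files before upload.
    gke-gcsfuse/ephemeral-storage-limit: "50Gi"
spec:
  containers:
  - name: backup              # hypothetical container
    image: busybox
    command: ["sleep", "infinity"]
```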
Thanks for the quick response. I created a tar file "backup.tar" of 30 GB directly on the GCS bucket mounted by the CSI sidecar and did not find any issues with it. Just one question: I am a bit concerned about the case where the "backup.tar" size keeps increasing (maybe to 100 GB or more due to regular backups) and sufficient node ephemeral storage is not available. In that case, one may have to increase the node's ephemeral storage manually, which might cause downtime for the cluster (probably?). I see that the CSI sidecar uses the "gke-gcsfuse-tmp" mount point, backed by emptyDir{}, for staging files before uploading.
It would be great if allocating storage from a regular persistent disk (or an NFS share) were supported for gke-gcsfuse-tmp. That way we could allocate any amount of storage without changing the node's ephemeral storage (and avoid cluster downtime). I tried something along those lines, but it did not work: the deployment did not start and could not find the CSI sidecar. Probably some validation is in place to check that "gke-gcsfuse-tmp" uses emptyDir: {} only? Supporting both emptyDir: {} and allocating storage from a PVC for "gke-gcsfuse-tmp" might be beneficial (if the implementation is feasible). @songjiaxun, let me know your thoughts on this.
@songjiaxun, any thoughts on this?
Hi @ashish01987, thanks for testing this out. To answer your question: yes, in the current design, the gke-gcsfuse-tmp volume has to be an emptyDir. The GCS FUSE team is working on write-through features, which means the staging volume may not be needed in a future release. @sethiay and @Tulsishah, could you share more information about the write-through feature? Will it support this "tar file" use case? Meanwhile, FYI @judemars, as you may need to add a new volume to the sidecar container for the read caching feature.
Thanks @songjiaxun for looping us in. Currently, we are evaluating support for a write-through feature in GCSFuse, i.e. allowing users to write directly to GCS without buffering on local disk. Given that tar works now with GCSFuse, we expect it to work with the write-through feature as well.
What is the expected timeline for the write-through feature?
@ashish01987 Currently, we don't have any timelines to share.
@songjiaxun, since we don't know the timeline for the write-through feature, as a workaround can we disable this validation logic and support allocating storage from any PVC for gke-gcsfuse-tmp, i.e. allocate the storage from a persistent disk instead of the node's ephemeral storage? That way, customers using the gcsfuse CSI driver would never face issues like "insufficient ephemeral storage". Not sure, but such issues can arise in a cluster where multiple pods each have their own gcs-csi sidecar instance.
@ashish01987, thanks for the suggestion. As we have more and more customers reporting "insufficient ephemeral storage" issues, we are exploring the possibility of allowing users to use other volume media rather than emptyDir for the write staging. FYI @judemars.
For the time being, is it possible to make this validation optional, so that I can define the sidecar with gke-gcsfuse-tmp pointing to a PVC? Something like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sidecar-test
spec:
```
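For illustration, a minimal sketch of the idea being described here (gke-gcsfuse-tmp backed by a PVC rather than an emptyDir) might look like the following. The PVC name my-pvc-backup is the one mentioned later in this thread; note that, per the discussion above, the sidecar validation at the time required gke-gcsfuse-tmp to be an emptyDir, so a spec like this was rejected:

```yaml
# Sketch only: gke-gcsfuse-tmp backed by a pre-created PVC (rejected by the
# driver's validation at the time of this discussion).
apiVersion: v1
kind: Pod
metadata:
  name: sidecar-test
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  containers:
  - name: app                      # hypothetical workload container
    image: busybox
    command: ["sleep", "infinity"]
  volumes:
  - name: gke-gcsfuse-tmp
    persistentVolumeClaim:
      claimName: my-pvc-backup     # hypothetical pre-created PVC
```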
I see that writing very large files, like a 70 GB .tar file, will fail if that much ephemeral storage is not present on the node.
@ashish01987, thanks for the suggestion and for reporting the issue. I am actively working on skipping the validation and will keep you posted.
Thanks for looking into this. Maybe it would be great if the `claimName: my-pvc-backup` for `gke-gcsfuse-tmp` could be passed as a parameter through an annotation on the pod.
I am working on a new feature to allow you to specify a separate volume for the write buffering. I will keep you posted.
To avoid cross-posting: I think that in the meantime we could still better notify the user about the sidecar-specific ephemeral storage usage.
@songjiaxun, in the meantime, is there a temporary workaround to monitor the ephemeral storage occupancy with a command inside the sidecar container?
Hi @bhack, because the gcsfuse sidecar container is a distroless container, you cannot run any bash commands in it via `kubectl exec`. We are rolling out the feature to support custom volumes for write buffering; it should be available soon. Meanwhile, if you are experiencing ephemeral storage limit issues, consider setting the pod annotation `gke-gcsfuse/ephemeral-storage-limit` to a larger value.
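As a general Kubernetes workaround (not specific to the CSI driver), the kubelet Summary API exposes per-pod and per-volume filesystem usage, including the sidecar's emptyDir. A rough sketch, with NODE_NAME and the pod name as placeholders and assuming `jq` is installed locally:

```sh
# Query the kubelet Summary API through the API server proxy.
# Each volume entry reports usedBytes / capacityBytes for that volume.
kubectl get --raw "/api/v1/nodes/NODE_NAME/proxy/stats/summary" \
  | jq '.pods[]
        | select(.podRef.name == "sidecar-test")
        | .volume[]
        | select(.name == "gke-gcsfuse-tmp")
        | {name: .name, usedBytes: .usedBytes, capacityBytes: .capacityBytes}'
```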
Is this OK in Autopilot or will it be rejected?
Oh sorry, I forgot the context of Autopilot. Unfortunately, no. Is your application writing large files back to the bucket?
Not so large. The main pod is quite complex, but it seems it is not writing anything other than to the csi-gcsfuse mounted volumes. That is why we need something to monitor the sidecar's pressure/occupancy on ephemeral storage (and likely on CPU and memory as well), both to debug when things fail and to keep a residual margin for resource planning. Just to focus on the ephemeral point: can the usage reported in the sidecar log be monitored?
Thanks for the information @bhack. Yes, we do plan to add more logs and warnings to make the ephemeral storage usage more observable. Can I know your node type? What compute class or hardware configuration are you using? On Autopilot, for most of the compute classes, the maximum ephemeral storage you can use is 10Gi, so you can set the `gke-gcsfuse/ephemeral-storage-limit` annotation up to that value.
Is the container image counted against the node's ephemeral storage but not against the pod's ephemeral storage request? Because if the image is not part of the pod request, we could just request 4Gi or 5Gi for the sidecar.
Hi @bhack,
The main problem is still auditing the gcsfuse sidecar vs. the pod.
@bhack, yes, it makes sense. I will let you know when the warning logs are ready.
Thanks. I hope we can also add this for CPU and memory later, especially as in Autopilot we cannot set the sidecar resources to "0".
Hi @bhack, I wanted to use the same filesystem metrics collection approach that Kubernetes uses. However, I am exploring other approaches to calculate just the buffer volume usage.
Are they not using the same approach?
What do you think about kubernetes/kubernetes#121489?
Relevant info is now covered in https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#prepare-mount
As the doc https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#buffer-volume mentions, for large file write operations, please consider increasing the size of the write buffer volume. Closing this issue.
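Following that doc's custom buffer volume approach, here is a minimal sketch of a pod that backs the sidecar's write buffer with a PVC instead of node ephemeral storage. The PVC name my-pvc-backup comes from earlier in this thread; the image and bucket name are placeholders, and the gke-gcsfuse-buffer volume name should be verified against the current documentation:

```yaml
# Sketch: custom write buffer volume for the gcsfuse sidecar, backed by a PVC.
apiVersion: v1
kind: Pod
metadata:
  name: sidecar-test
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  containers:
  - name: app                        # hypothetical workload container
    image: busybox
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: gcs-bucket
      mountPath: /tmp/bucketmount
  volumes:
  # Volume picked up by the gcsfuse sidecar for write staging instead of
  # the default emptyDir, so large tar files no longer consume node
  # ephemeral storage.
  - name: gke-gcsfuse-buffer
    persistentVolumeClaim:
      claimName: my-pvc-backup       # pre-created PVC
  - name: gcs-bucket
    csi:
      driver: gcsfuse.csi.storage.gke.io
      volumeAttributes:
        bucketName: my-bucket        # hypothetical bucket name
```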