[bitnami/kuberay] Missing Cluster Role rules causes Ray Service to be in WaitForServeDeploymentReady #30648
Comments
Hi! Thank you so much for reporting. If I understood correctly, it seems that the RBAC rules may not be in sync with some changes in upstream. Would you like to submit a PR adding the missing rules?
Hello @javsalgar, thank you very much! Sure, I will create the PR. I hope it is the correct way to do it.
…to solve bitnami#30648 Signed-off-by: Francisco Rivas <frivas@navteca.com>
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Hello dear Bitnami team, this issue is still open and there is a PR to solve it: #30665
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Name and Version
bitnami/kuberay 1.2.19
What architecture are you using?
amd64
What steps will reproduce the bug?
This is my first issue, so I hope I can provide all the information required for better understanding and troubleshooting. I might even be wrong about this, so please bear with me.
Context: The infrastructure was just deployed from scratch using Terraform (TF). All apps/services are up and running except a Kuberay worker (more details below). Using the Helm provider, I deployed kuberay-operator with a few custom values (shown below), and I created a sample Ray Service using TF's kubectl provider to deploy the manifest (also shown below).
Kubernetes Cluster: AWS EKS
Helm:
Images:
In this cluster I have deployed other apps/services using Bitnami's charts.
Deploy your Kubernetes cluster as usual. Use Helm to install Bitnami's kuberay chart. Deploy a RayService and check the kuberay-operator logs, as well as the RayService status.
Are you using any custom parameters or values?
I am adding the RBAC rules and the service account token because of the apparent issue I am seeing. I am adding RAYCLUSTER_DEFAULT_REQUEUE_SECONDS_ENV because the kuberay-operator logs show a message stating that, since the variable was not set, another value was being used. No biggie with this one, I am just explaining why I added it.
The Ray Service I am using as an example is this one:
Note: I also tried version 2.39.0 just in case, but the results are the same. Since the Ray image used by Bitnami's kuberay operator is 2.38, and it is advised to use the same version in custom images, I built my app image with 2.38.
What is the expected behavior?
The Ray Service should be in the Running state, with no messages in the kuberay-operator logs.
What do you see instead?
When I check the Ray Service, it is stuck in WaitForServeDeploymentReady.
I have read that it is normal for the worker group to show 0/1; in fact, even under these conditions the app works.
I also see these in the kuberay-operator logs:
Additional information
Stuff that I have tried:
When I edit the cluster role and add the endpoints resource, the Ray Service status changes to Running and the messages no longer appear in the kuberay-operator log.
What I did was:
$ kubectl edit clusterrole kuberay-kuberay-operator -n kuberay
...
- apiGroups:
  - ""
  resources:
  - endpoints
  verbs:
  - list
  - watch
...
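The manual edit above can also be sketched as two small shell helpers: one that asks the API server whether the operator's service account may access endpoints, and one that appends the same list/watch rule with a JSON patch instead of an interactive edit. This is only a rough sketch under assumptions: the function names are made up for illustration, and the service account name and namespace used in the usage note are guesses that may not match what the chart actually creates.

```shell
#!/usr/bin/env bash
# Sketch only. Function names are hypothetical; the service account name
# and namespace passed in by the caller are assumptions, not chart facts.

# Ask the API server whether a service account may perform a verb on
# endpoints in all namespaces. kubectl prints "yes" or "no".
# $1 = service account namespace, $2 = service account name, $3 = verb
can_access_endpoints() {
  local ns="$1" sa="$2" verb="$3"
  kubectl auth can-i "$verb" endpoints \
    --as="system:serviceaccount:${ns}:${sa}" \
    --all-namespaces
}

# Append a rule granting list/watch on core-group endpoints to a ClusterRole.
# "/rules/-" in a JSON patch means "append to the end of the rules array".
# $1 = ClusterRole name
add_endpoints_rule() {
  local role="$1"
  kubectl patch clusterrole "$role" --type=json -p \
    '[{"op":"add","path":"/rules/-","value":{"apiGroups":[""],"resources":["endpoints"],"verbs":["list","watch"]}}]'
}
```

For example, `can_access_endpoints kuberay kuberay-kuberay-operator list` could be run before and after `add_endpoints_rule kuberay-kuberay-operator` to compare the answers. A patch applied this way is a workaround only: it is likely to be overwritten the next time the chart is upgraded, so the lasting fix still belongs in the chart's clusterrole.yaml.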
If this is in fact an issue, and not a mistake on my side from adding these in the wrong place, the change would go in clusterrole.yaml, adding the resources and verbs shown above.
I have forked the project and, if this is in fact something to be fixed, I am ready to create a PR with the solution described above.
I hope I am not missing anything.
Update 11/27/2024: I can see the required rules are in the Ray project chart helper