Track follow-up for adding CI image build to the Kind cluster #2364

Closed
1 task
sandipanpanda opened this issue Dec 27, 2024 · 9 comments

@sandipanpanda
Member

Following the review in #2346 (comment), we need to implement @andreyvelich's suggestion to build the image in CI and load it directly into the Kind cluster. This will let us automatically build images from the code submitted in a PR, similar to the approach used for the Training Operator manager and LLM Trainer.

  • Cherry-pick the changes into the release-1.9 branch once the implementation is complete.
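
For illustration, the build-then-load flow described above could be sketched roughly as follows. This is a minimal sketch, not the change that was eventually merged: the image tag, Dockerfile path, and the function names `build_and_load_commands` and `run` are hypothetical. Only `docker build -t/-f` and `kind load docker-image --name` are real CLI calls.

```python
import shlex
import subprocess
from typing import List


def build_and_load_commands(
    image: str,
    dockerfile: str,
    context: str = ".",
    cluster: str = "kind",
) -> List[List[str]]:
    """Return the two commands needed to build an image from the PR
    checkout and make it available inside a Kind cluster."""
    return [
        # Build the image from the code currently checked out in CI.
        ["docker", "build", "-t", image, "-f", dockerfile, context],
        # Copy the locally built image onto the Kind cluster nodes.
        ["kind", "load", "docker-image", image, "--name", cluster],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and execute it unless dry_run is set."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical image tag and Dockerfile path, for illustration only.
    run(build_and_load_commands("example/jax-e2e:test", "examples/jax/Dockerfile"))
```

With `dry_run=True` the script only prints the commands, which makes it easy to check what CI would execute before wiring it into a workflow.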
@andreyvelich
Member

/assign @saileshd1402


@andreyvelich: GitHub didn't allow me to assign the following users: saileshd1402.

Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @saileshd1402

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@helenxie-bit
Contributor

I'm also happy to help if needed 😀

@saileshd1402
Contributor

/assign

@saileshd1402
Contributor

saileshd1402 commented Jan 9, 2025

I'd like some input on the following ideas:
Other job types in the e2e (integration) tests might also need images to be built (for example, JAX, PyTorch, XGBoost), so we could build all of those images in one script, similar to build-image.sh, called scripts/gha/build-e2e-test-images.sh.

We also have multiple steps to load images into kind cluster in integration-tests. Maybe we could write a small GitHub Action that loads a given set of images into the cluster, so that the image-loading steps are consolidated into one?

Or should we just include the JAX Job image for now?

cc: @andreyvelich @helenxie-bit @Electronic-Waste @sandipanpanda
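
One possible shape for the consolidation idea above, as a rough sketch only: a single table of e2e images plus one `kind load docker-image` call that takes several images at once. The `E2E_IMAGES` table, its tags, and the helper names are hypothetical. The `only` filter shows how the set could be restricted to a single job type such as JAX.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical job-type -> image tag table; the real tags would come
# from the repository's own build script.
E2E_IMAGES: Dict[str, str] = {
    "jax": "example/jax-e2e:test",
    "pytorch": "example/pytorch-e2e:test",
    "xgboost": "example/xgboost-e2e:test",
}


def select_images(only: Optional[Iterable[str]] = None) -> List[str]:
    """Return image tags for the requested job types (all when only is None)."""
    if only is None:
        return list(E2E_IMAGES.values())
    return [E2E_IMAGES[name] for name in only]


def kind_load_command(images: Iterable[str], cluster: str = "kind") -> List[str]:
    """Build one `kind load docker-image` invocation for all given images."""
    images = list(images)
    if not images:
        raise ValueError("at least one image is required")
    # `kind load docker-image` accepts several image names in one call.
    return ["kind", "load", "docker-image", *images, "--name", cluster]


if __name__ == "__main__":
    # Load only the JAX image, as a smaller first step.
    print(" ".join(kind_load_command(select_images(["jax"]))))
```

Keeping the selection separate from the load command means the image set can grow later without touching the loading step.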

@andreyvelich
Member

We also have multiple steps to load images into kind cluster in integration-tests,

That sounds good, but we need to check whether we can load all images into a single Kind cluster, since we previously hit limits.
I would suggest we focus on the JAX job for now, and add more images in the future if that is required for Kubeflow Training V1.

@andreyvelich
Member

/milestone v1.9

@andreyvelich andreyvelich added this to the v1.9 milestone Jan 12, 2025
@andreyvelich
Member

This has been resolved.

@andreyvelich
Member

Ref PR: #2385
