KEP-2170: Add unit and E2E tests for model and dataset initializers #2323
base: master
# Private HuggingFace dataset test
# (
#     "HuggingFace - Private dataset",
#     "huggingface",
#     {
#         "storage_uri": "hf://username/private-dataset",
#         "use_real_token": True,
#         "expected_files": ["config.json", "dataset.safetensors"],
#         "expected_error": None
#     }
# ),
# Invalid HuggingFace dataset test
Do we have an access token for testing login and downloading resources from a private repo?
Not yet. Maybe we can track this in a separate issue: we should create a Kubeflow-owned account in HF for the token.
current_dir = os.path.dirname(os.path.abspath(__file__))
self.temp_dir = tempfile.mkdtemp(dir=current_dir)
os.environ[VOLUME_PATH_DATASET] = self.temp_dir
I currently test the dataset/model download by downloading resources to a temp folder and removing the temp folder after the test.
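The temp-folder approach described above could look roughly like the following sketch. `DATASET_PATH_ENV` and `fake_download` are hypothetical stand-ins for the SDK constant and the real initializer call; neither is taken from the PR.

```python
import os
import shutil
import tempfile

# Hypothetical environment variable name; the real constant lives in the SDK.
DATASET_PATH_ENV = "EXAMPLE_DATASET_PATH"


def fake_download(target_dir):
    """Hypothetical stand-in for the initializer's download step."""
    with open(os.path.join(target_dir, "data.txt"), "w") as f:
        f.write("hello")


def run_download_in_temp_dir():
    """Download into a throwaway directory and always clean up afterwards."""
    temp_dir = tempfile.mkdtemp()
    previous = os.environ.get(DATASET_PATH_ENV)
    os.environ[DATASET_PATH_ENV] = temp_dir
    try:
        fake_download(os.environ[DATASET_PATH_ENV])
        files = sorted(os.listdir(temp_dir))
    finally:
        # Restore the environment and remove the directory even on failure.
        if previous is None:
            os.environ.pop(DATASET_PATH_ENV, None)
        else:
            os.environ[DATASET_PATH_ENV] = previous
        shutil.rmtree(temp_dir, ignore_errors=True)
    return temp_dir, files
```

Putting the cleanup in `finally` means a failed download does not leave files behind or leak the environment variable into later tests.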
pkg/initializer_v2/test/conftest.py
@pytest.fixture
def real_hf_token():
    """Fixture to provide real HuggingFace token for E2E tests"""
    token = os.getenv("HUGGINGFACE_TOKEN")
    # if not token:
    #     pytest.skip("HUGGINGFACE_TOKEN environment variable not set")
    return token
If we have a private token, I will use this fixture to inject the token. If we don't, I can remove this.
python3 -m pip install -e sdk/python; pytest -s sdk/python/test --log-cli-level=debug --namespace=default
env:
  GANG_SCHEDULER_NAME: ${{ matrix.gang-scheduler-name }}

- name: Run specific tests for Python 3.10+
Since `match` was introduced in Python 3.10, I created another step for the e2e tests.
Where do you use `match` in the tests?
I didn't use `match` in the tests. `match` is used in https://github.com/kubeflow/training-operator/blob/master/pkg/initializer_v2/model/__main__.py#L23 and https://github.com/kubeflow/training-operator/blob/master/pkg/initializer_v2/dataset/__main__.py#L23
Oh, good point.
Let's actually use the same Python version that we use in our initializer images: https://github.com/kubeflow/training-operator/blob/master/cmd/initializer_v2/dataset/Dockerfile#L1.
E.g. Python 3.11
"HuggingFace - Public dataset",
"huggingface",
{
    "storage_uri": "hf://karpathy/tiny_shakespeare",
Does anyone know which `dataset`/`model` in HuggingFace is suitable for the connectivity test?
@seanlaii Which connectivity test do you want to perform?
I would like to test the actual downloading process and would like to know if there is any recommended dataset/model for testing. I currently choose a dataset that is only 1.11 MB.
Hi @andreyvelich, could you help review this PR? I have some questions. Once the SDK's PR gets approved, I will modify it accordingly. Thank you!
@seanlaii Sorry for the delay, sure, I will review it today
Thank you for this effort @seanlaii!
I left my initial thoughts.
Please take a look @Electronic-Waste @deepanker13 @kubeflow/wg-training-leads @varshaprasad96 @akshaychitneni @saileshd1402
from sdk.python.kubeflow.storage_initializer.constants import VOLUME_PATH_MODEL


class TestModelE2E:
@seanlaii @kubeflow/wg-training-leads @deepanker13 @Electronic-Waste @saileshd1402 What do you think about actually using Kubernetes to perform E2E tests for our initializers?
E.g. we can deploy a single Pod that runs two initContainers for the initializers and one container that just verifies that the model and dataset exist under the `/workspace/model` and `/workspace/dataset` dirs.
In that case, our E2Es verify that our Docker containers actually work to initialize assets.
Do we see any value in the tests that I propose compared to running just the initializers' Python scripts?
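The Pod layout proposed above can be sketched as a manifest built in Python. Everything concrete here is an assumption for illustration: the image names passed in, the container names, the shared `emptyDir` volume, and the `busybox` verification command are not taken from the project.

```python
import json


def build_initializer_test_pod(dataset_image: str, model_image: str) -> dict:
    """Build a Pod manifest: two initContainers plus one verifying container."""
    # One shared volume so the verifier sees what the initializers wrote.
    mount = [{"name": "workspace", "mountPath": "/workspace"}]
    verify_cmd = (
        'test -n "$(ls -A /workspace/dataset)" && '
        'test -n "$(ls -A /workspace/model)"'
    )
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "initializer-e2e"},
        "spec": {
            "restartPolicy": "Never",
            "volumes": [{"name": "workspace", "emptyDir": {}}],
            "initContainers": [
                {"name": "dataset-initializer", "image": dataset_image,
                 "volumeMounts": mount},
                {"name": "model-initializer", "image": model_image,
                 "volumeMounts": mount},
            ],
            "containers": [
                # Exits non-zero if either directory is missing or empty.
                {"name": "verify", "image": "busybox",
                 "command": ["sh", "-c", verify_cmd],
                 "volumeMounts": mount},
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder image names; substitute the real initializer images.
    pod = build_initializer_test_pod("example/dataset-init", "example/model-init")
    print(json.dumps(pod, indent=2))
```

With this shape, the test only has to wait for the Pod to finish and check its phase: `Succeeded` means both initializers ran and both directories were populated.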
@seanlaii @andreyvelich I think it would be better to perform e2e tests for our initializers on Kubernetes, since we ultimately download models and datasets to a Pod, not a TempDir. That way we can discover, at an early stage, potential errors in downloading datasets and models to a Pod that the initializers' Python scripts alone could not surface.
Yes, having the e2e tests for the Pod would be better. Should we consider having two sets of tests? One is like this for testing the Python script itself, the other for testing the initialization in the Pod.
From my perspective, it might be redundant. But I'm neutral on the final choice. WDYT👀 @andreyvelich
I think having unit tests and E2E tests where we create a Kind cluster and run containers should be sufficient.
We might want to consider using Golang and Ginkgo for the E2E tests of the initializers, so that we are consistent across our E2Es. That allows us to use Golang clients to create Kubernetes resources and run tests.
For example, JobSet and Kueue are already using Ginkgo for their E2Es:
- https://github.com/kubernetes-sigs/jobset/blob/main/test/e2e/e2e_test.go#L40
- https://github.com/kubernetes-sigs/kueue/blob/main/test/e2e/singlecluster/e2e_test.go#L47
From my point of view, the tests that we have in this PR are: Unit and Integration tests.
@tenzen-y @kubeflow/wg-training-leads What is your perspective on this?
So this PR focuses primarily on unit tests and integration tests, and we will have another e2e testing suite in #2213 , which will involve testing with a Kind Cluster. Is my understanding correct? Thank you!
That makes all perfect sense. As @tenzen-y said, I think it's a good compromise to have e2e tests covering the happy path / main use cases from an end-user perspective, and have unit / integration tests covering more, if not all, combinations, so corner and errors cases are also covered for all components.
> So this PR focuses primarily on unit tests and integration tests, and we will have another e2e testing suite in #2213, which will involve testing with a Kind Cluster. Is my understanding correct?
Yes, that is correct. We will have E2E tests when we create Jupyter Notebook with example that uses initializer.
> What do we think about having integration tests for initializer that perform such use-cases?
SGTM. If I understand correctly, you decide to create 3 types of test cases:
- Unit tests: Mock and test the logic of function
- Integration tests: Download models and datasets into TempDir and check the correctness of function
- E2e tests: Run jupyter notebooks with papermill to make sure e2e examples for users are correct
It's a good plan from my perspective, since multiple testcase types will surely help us detect bugs in advance with the help of CI.
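To make the unit-test tier of that plan concrete, here is a minimal sketch of the "mock and test the logic" style. `FakeInitializer` is a hypothetical stand-in written for this example, not the initializer class from the PR; the point is only that the network-facing calls are replaced by mocks.

```python
from unittest import mock


class FakeInitializer:
    """Hypothetical stand-in for an initializer; not the PR's code."""

    def __init__(self, login, download):
        self._login = login
        self._download = download

    def download_model(self, repo_id, access_token=None):
        # Log in only when a token is configured, then download.
        if access_token:
            self._login(token=access_token)
        self._download(repo_id=repo_id)


def test_login_then_download():
    login, download = mock.Mock(), mock.Mock()
    FakeInitializer(login, download).download_model(
        "username/model-name", "test_token"
    )
    login.assert_called_once_with(token="test_token")
    download.assert_called_once_with(repo_id="username/model-name")


def test_login_failure_skips_download():
    login = mock.Mock(side_effect=RuntimeError("bad token"))
    download = mock.Mock()
    try:
        FakeInitializer(login, download).download_model(
            "username/model-name", "test_token"
        )
    except RuntimeError:
        pass
    else:
        raise AssertionError("expected the login error to propagate")
    # The download must not be attempted after a failed login.
    download.assert_not_called()
```

An integration test would keep the same assertions about the end state but drop the mocks and check the files written to the TempDir instead.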
Yes, that is correct.
Signed-off-by: wei-chenglai <qazwsx0939059006@gmail.com>
Hi @andreyvelich, could you help review the PR? I addressed the comments. Thank you!
Sorry for the delay @seanlaii!
@seanlaii Thanks for your contributions! I left some comments for you.
As for the e2e test's pattern, we can discuss later with @andreyvelich :)
@pytest.fixture
def huggingface_dataset_instance():
    """Fixture for HuggingFace Dataset instance"""
    from pkg.initializer_v2.dataset.huggingface import HuggingFace

    return HuggingFace()
Could you please tell me why we need this fixture? Maybe we could simply declare `huggingface_dataset_instance = HuggingFace()` in each UT. WDYT👀 @andreyvelich @seanlaii
Thank you! Sounds good to me. I put it there to abstract the initialization of the HuggingFace instance, which is used in all the test functions. But since the initialization currently takes only one line of code, I agree with your approach.
Sounds good to me @Electronic-Waste.
@pytest.fixture
def huggingface_model_instance():
    """Fixture for HuggingFace Model instance"""
    from pkg.initializer_v2.model.huggingface import HuggingFace

    return HuggingFace()
Same suggestion as above.
    "Login failure",
    {
        "config": {
            "storage_uri": "hf://username/model-name",
            "access_token": "test_token",
        },
        "should_login": True,
        "expected_repo_id": "username/model-name",
        "mock_login_side_effect": Exception,
        "mock_download_side_effect": None,
        "expected_error": Exception,
    },
),
(
    "Download failure",
I am wondering what value we see in these test cases: Login failure and Download failure?
What are we trying to test here?
    },
),
(
    "Model download failure",
The same question.
@@ -0,0 +1,38 @@
import pytest
Do we really need this test case, since we don't have any logic in our DataClasses?
I think if we are going to add some validation to our DataClasses, we can write a unit test for it then.
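As an illustration of when such a test would start to earn its keep, the sketch below adds validation to a dataclass via `__post_init__` and tests it. `DatasetConfig` and its `hf://` rule are hypothetical; only the field names `storage_uri` and `access_token` are borrowed from the test cases in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetConfig:
    """Hypothetical config class with validation; not the project's dataclass."""

    storage_uri: str
    access_token: Optional[str] = None

    def __post_init__(self):
        # Assumed rule for illustration: only hf:// URIs are accepted.
        if not self.storage_uri.startswith("hf://"):
            raise ValueError(f"unsupported storage_uri: {self.storage_uri!r}")


def test_valid_uri_is_accepted():
    config = DatasetConfig(storage_uri="hf://username/dataset-name")
    assert config.access_token is None


def test_invalid_uri_is_rejected():
    try:
        DatasetConfig(storage_uri="ftp://username/dataset-name")
    except ValueError:
        return
    raise AssertionError("expected ValueError for a non-hf URI")
```

Without a `__post_init__` like this, a test of the dataclass would only exercise code generated by the standard library, which is the redundancy the comment above points out.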
Gotcha. Will remove it.
What this PR does / why we need it:
I added unit tests and e2e tests for model and dataset initializers.
Which issue(s) this PR fixes (optional, in `Fixes #<issue number>, #<issue number>, ...` format, will close the issue(s) when PR gets merged):
Fixes #2305
Checklist: