KEP-2170: Add unit and E2E tests for model and dataset initializers #2323

Open
seanlaii wants to merge 1 commit into base: master from the initializer-test branch
Conversation

@seanlaii (Contributor) commented Nov 9, 2024

What this PR does / why we need it:
I added unit tests and e2e tests for model and dataset initializers.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #2305

Checklist:

  • Docs included if any changes are user facing


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign johnugeorge for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines 59 to 70
# Private HuggingFace dataset test
# (
# "HuggingFace - Private dataset",
# "huggingface",
# {
# "storage_uri": "hf://username/private-dataset",
# "use_real_token": True,
# "expected_files": ["config.json", "dataset.safetensors"],
# "expected_error": None
# }
# ),
# Invalid HuggingFace dataset test
@seanlaii (Contributor, Author):

Do we have an access token for testing login and downloading resources from private repo?

Member:

Not yet. Maybe we can track this in a separate issue: we should create a Kubeflow-owned account in HF for the token.

Comment on lines 19 to 21
current_dir = os.path.dirname(os.path.abspath(__file__))
self.temp_dir = tempfile.mkdtemp(dir=current_dir)
os.environ[VOLUME_PATH_DATASET] = self.temp_dir
@seanlaii (Contributor, Author) commented Nov 9, 2024:

I currently test the dataset/model download by downloading resources to a temp folder and removing the temp folder after the test.
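
For reference, a minimal sketch of that setup/teardown pattern as a pytest fixture. The fixture name and cleanup details are illustrative; `VOLUME_PATH_DATASET` is the SDK constant used in the snippet above:

```python
import os
import shutil
import tempfile

import pytest

# SDK constant referenced in the snippet above (import path per this repo's layout).
from sdk.python.kubeflow.storage_initializer.constants import VOLUME_PATH_DATASET


@pytest.fixture
def dataset_temp_dir():
    """Create a temp dir next to the test file, point the initializer at it,
    and remove it after the test."""
    current_dir = os.path.dirname(os.path.abspath(__file__))
    temp_dir = tempfile.mkdtemp(dir=current_dir)
    os.environ[VOLUME_PATH_DATASET] = temp_dir
    try:
        yield temp_dir
    finally:
        os.environ.pop(VOLUME_PATH_DATASET, None)
        shutil.rmtree(temp_dir, ignore_errors=True)
```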

Comment on lines 46 to 52
@pytest.fixture
def real_hf_token():
    """Fixture to provide real HuggingFace token for E2E tests"""
    token = os.getenv("HUGGINGFACE_TOKEN")
    # if not token:
    #     pytest.skip("HUGGINGFACE_TOKEN environment variable not set")
    return token
@seanlaii (Contributor, Author):

If we have a private token, I will use this fixture to inject the token. If we don't, I can remove this.

@coveralls commented Nov 9, 2024

Pull Request Test Coverage Report for Build 12346097008

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall first build on initializer-test at 100.0%

Totals Coverage Status
Change from base Build 12345273877: 100.0%
Covered Lines: 85
Relevant Lines: 85

💛 - Coveralls

python3 -m pip install -e sdk/python; pytest -s sdk/python/test --log-cli-level=debug --namespace=default
env:
GANG_SCHEDULER_NAME: ${{ matrix.gang-scheduler-name }}

- name: Run specific tests for Python 3.10+
@seanlaii (Contributor, Author):

Since `match` was introduced in Python 3.10, I created a separate step for the e2e tests.
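
(For readers unfamiliar with the constraint: structural pattern matching was added in Python 3.10. A purely hypothetical illustration of the kind of dispatch it enables; the PR's actual use of `match` is not shown in this thread:)

```python
def storage_provider(storage_uri: str) -> str:
    """Hypothetical example: dispatch on the URI scheme with match (Python 3.10+)."""
    match storage_uri.split("://", 1)[0]:
        case "hf":
            return "huggingface"
        case "s3":
            return "s3"
        case _:
            raise ValueError(f"Unsupported storage URI: {storage_uri}")
```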

Member:

Where do you use `match` in the tests?

@seanlaii (Contributor, Author) commented Nov 27, 2024:

Member:

Oh, good point.
Let's actually use the same Python version that we use in our initializer images: https://github.com/kubeflow/training-operator/blob/master/cmd/initializer_v2/dataset/Dockerfile#L1.
E.g. Python 3.11

"HuggingFace - Public dataset",
"huggingface",
{
"storage_uri": "hf://karpathy/tiny_shakespeare",
@seanlaii (Contributor, Author):

Does anyone know which dataset/model on HuggingFace is suitable for the connectivity test?

Member:

@seanlaii Which connectivity test do you want to perform ?

@seanlaii (Contributor, Author):

I would like to test the actual downloading process, and would like to know if there is any recommended dataset/model for testing. I currently use a dataset that is only 1.11 MB.

@seanlaii force-pushed the initializer-test branch 4 times, most recently from 8930b80 to c6e0a83 on November 9, 2024 18:17
@seanlaii (Contributor, Author) commented Nov 26, 2024:

Hi @andreyvelich ,

Could you help review this PR? I have some questions. Once the SDK's PR gets approved, I will modify it accordingly.

Thank you!

@andreyvelich (Member):
@seanlaii Sorry for the delay, sure, I will review it today

@andreyvelich (Member) left a comment:

Thank you for this effort @seanlaii!
I left my initial thoughts.
Please take a look @Electronic-Waste @deepanker13 @kubeflow/wg-training-leads @varshaprasad96 @akshaychitneni @saileshd1402


Resolved (outdated) review threads in:

  • .github/workflows/test-python.yaml
  • pkg/initializer_v2/test/unit/dataset/test_dataset.py
  • pkg/initializer_v2/test/unit/model/test_model_config.py
  • pkg/initializer_v2/test/unit/model/test_model.py
  • pkg/initializer_v2/test/unit/test_utils.py
from sdk.python.kubeflow.storage_initializer.constants import VOLUME_PATH_MODEL


class TestModelE2E:
@andreyvelich (Member) commented Nov 27, 2024:

@seanlaii @kubeflow/wg-training-leads @deepanker13 @Electronic-Waste @saileshd1402 What do you think about actually using Kubernetes to perform E2E tests for our initializers?
E.g. we can deploy a single Pod that runs two initContainers for the initializers and one container that just verifies the model and dataset exist under the /workspace/model and /workspace/dataset dirs.

In that case, our E2Es verify that our Docker containers actually work to initialize assets.

Do we see any value in the tests I propose compared to just running the initializer Python scripts?
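
A rough sketch of that Pod shape using the Kubernetes Python client. The image names, the STORAGE_URI environment variable, the HuggingFace repos, and the namespace are assumptions for illustration, not the project's actual values:

```python
from kubernetes import client, config


def build_initializer_e2e_pod() -> client.V1Pod:
    """Single Pod: two initContainers download assets, the main container verifies them."""
    workspace = client.V1VolumeMount(name="workspace", mount_path="/workspace")
    return client.V1Pod(
        metadata=client.V1ObjectMeta(name="initializers-e2e"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            volumes=[client.V1Volume(
                name="workspace",
                empty_dir=client.V1EmptyDirVolumeSource(),
            )],
            init_containers=[
                client.V1Container(
                    name="dataset-initializer",
                    image="kubeflow/dataset-initializer:test",  # assumed image name
                    env=[client.V1EnvVar(name="STORAGE_URI",  # assumed env var
                                         value="hf://karpathy/tiny_shakespeare")],
                    volume_mounts=[workspace],
                ),
                client.V1Container(
                    name="model-initializer",
                    image="kubeflow/model-initializer:test",  # assumed image name
                    env=[client.V1EnvVar(name="STORAGE_URI",
                                         value="hf://hf-internal-testing/tiny-random-gpt2")],
                    volume_mounts=[workspace],
                ),
            ],
            containers=[
                client.V1Container(
                    name="verify",
                    image="python:3.11-slim",
                    command=["sh", "-c",
                             "ls /workspace/dataset && ls /workspace/model"],
                    volume_mounts=[workspace],
                ),
            ],
        ),
    )


if __name__ == "__main__":
    config.load_kube_config()
    client.CoreV1Api().create_namespaced_pod(namespace="default",
                                             body=build_initializer_e2e_pod())
```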

Member:

@seanlaii @andreyvelich I think it would be better if we perform e2e tests for our initializers on Kubernetes, since we ultimately download models and datasets to a Pod, not a TempDir. Thus, we can discover more potential errors at an early stage when downloading datasets and models to a Pod, which could not be tested by the initializer Python scripts alone.

@seanlaii (Contributor, Author):

Yes, having the e2e tests for the Pod would be better. Should we consider having two sets of tests? One like this, for testing the Python script itself, and the other for testing the initialization in the Pod.

Member:

From my perspective, it might be redundant. But I'm neutral on the final choice. WDYT👀 @andreyvelich

Member:

I think having unit tests and E2E tests where we create a Kind cluster and run containers should be sufficient.
We might want to consider using Golang and Ginkgo for the E2E tests of the initializer, so we stay consistent across our E2Es. That allows us to use Golang clients to create Kubernetes resources and run tests.

For example, JobSet and Kueue are already using Ginkgo for their E2Es.

From my point of view, the tests that we have in this PR are unit and integration tests.

@tenzen-y @kubeflow/wg-training-leads What is your perspective on this?

@seanlaii (Contributor, Author) commented Jan 9, 2025:

So this PR focuses primarily on unit tests and integration tests, and we will have another e2e testing suite in #2213, which will involve testing with a Kind cluster. Is my understanding correct? Thank you!

Contributor:

That all makes perfect sense. As @tenzen-y said, I think it's a good compromise to have e2e tests covering the happy path / main use cases from an end-user perspective, and unit / integration tests covering more, if not all, combinations, so that corner and error cases are also covered for all components.

Member:

So this PR focuses primarily on unit tests and integration tests, and we will have another e2e testing suite in #2213 , which will involve testing with a Kind Cluster. Is my understanding correct?

Yes, that is correct. We will have E2E tests when we create the Jupyter Notebook with an example that uses the initializer.

Member:

What do we think about having integration tests for the initializer that perform such use-cases?

SGTM. If I understand correctly, you decided to create 3 types of test cases:

  1. Unit tests: mock and test the logic of each function
  2. Integration tests: download models and datasets into a TempDir and check the correctness of the function
  3. E2E tests: run Jupyter notebooks with papermill to make sure the end-to-end examples for users are correct

It's a good plan from my perspective, since multiple test case types will surely help us detect bugs in advance with the help of CI.
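
For the third category, papermill drives a notebook end to end and fails the run if any cell raises. A minimal sketch, with placeholder notebook paths and parameters:

```python
import papermill as pm

# Execute the example notebook end to end; papermill raises if any cell errors.
pm.execute_notebook(
    "examples/pytorch/initializer-example.ipynb",  # placeholder input path
    "/tmp/initializer-example-output.ipynb",       # executed copy for inspection
    parameters={"namespace": "default"},           # placeholder parameters
)
```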

Member:

Yes, that is correct.

pkg/initializer_v2/test/unit/dataset/test_dataset.py (outdated, resolved)
@seanlaii force-pushed the initializer-test branch 5 times, most recently from 228c4b6 to 08fbd57 on December 16, 2024 05:39
Signed-off-by: wei-chenglai <qazwsx0939059006@gmail.com>
@seanlaii (Contributor, Author):

Hi @andreyvelich , could you help review the PR? I addressed the comments. Thank you!

@andreyvelich (Member):

Sorry for the delay @seanlaii!
I will review it this week.

@Electronic-Waste (Member) left a comment:

@seanlaii Thanks for your contributions! I left some comments for you.

As for the e2e test's pattern, we can discuss later with @andreyvelich :)

Comment on lines +9 to +15
@pytest.fixture
def huggingface_dataset_instance():
    """Fixture for HuggingFace Dataset instance"""
    from pkg.initializer_v2.dataset.huggingface import HuggingFace

    return HuggingFace()

Member:

Could you please tell me why we need this fixture? Maybe we could simply declare huggingface_dataset_instance = HuggingFace() in each UT. WDYT👀 @andreyvelich @seanlaii

@seanlaii (Contributor, Author):

Thank you! Sounds good to me. I just put it there to abstract the initialization of the HuggingFace instance, which is used in all the test functions. But since the initialization currently only involves one line of code, I agree with your approach.
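
In other words, the suggested simplification would look roughly like this (assuming HuggingFace() takes no constructor arguments, as in the fixture above; the test name is illustrative):

```python
from pkg.initializer_v2.dataset.huggingface import HuggingFace


def test_load_config():
    # Direct instantiation instead of a fixture, since construction is a single line.
    huggingface_dataset_instance = HuggingFace()
    assert huggingface_dataset_instance is not None
```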

Member:

Sounds good to me @Electronic-Waste.

Comment on lines +9 to +15
@pytest.fixture
def huggingface_model_instance():
    """Fixture for HuggingFace Model instance"""
    from pkg.initializer_v2.model.huggingface import HuggingFace

    return HuggingFace()

Member:

Same suggestion as above.


Comment on lines +76 to +90
"Login failure",
{
"config": {
"storage_uri": "hf://username/model-name",
"access_token": "test_token",
},
"should_login": True,
"expected_repo_id": "username/model-name",
"mock_login_side_effect": Exception,
"mock_download_side_effect": None,
"expected_error": Exception,
},
),
(
"Download failure",
Member:

I am wondering what value we see in these test cases: Login failure and Download failure?
What are we trying to test here?
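
(For context, such parametrized failure cases typically assert that errors from the HuggingFace Hub client propagate out of the initializer. A rough sketch of the pattern; the download_model method name and the patch target are assumptions, not necessarily this PR's actual code:)

```python
from unittest.mock import patch

import pytest

from pkg.initializer_v2.model.huggingface import HuggingFace


def test_login_failure_propagates():
    # Assumption: the initializer calls huggingface_hub.login() when an access
    # token is configured; a failing login should surface to the caller.
    with patch("huggingface_hub.login", side_effect=Exception("bad token")):
        with pytest.raises(Exception):
            HuggingFace().download_model()  # hypothetical method name
```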

},
),
(
"Model download failure",
Member:

The same question.

@@ -0,0 +1,38 @@
import pytest
Member:

Do we really need this test case, since we don't have any logic in our DataClasses?
I think, if we are going to have some validation in our DataClasses, we can write a unit test for it.
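
If validation is added later, the unit test would target that logic. A hypothetical sketch; the HuggingFaceModelInputConfig name and its fields are illustrative, not the actual DataClass:

```python
from dataclasses import dataclass

import pytest


@dataclass
class HuggingFaceModelInputConfig:
    """Hypothetical config DataClass with minimal validation."""
    storage_uri: str
    access_token: str = ""

    def __post_init__(self):
        if not self.storage_uri.startswith("hf://"):
            raise ValueError("storage_uri must start with hf://")


def test_config_rejects_invalid_uri():
    # The unit test exercises the validation, not the plain field assignment.
    with pytest.raises(ValueError):
        HuggingFaceModelInputConfig(storage_uri="s3://bucket/model")
```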

@seanlaii (Contributor, Author):

Gotcha. Will remove it.
