[bugfix, enhancement] enable proper GPU offloading with fp64 support when DPCtl unavailable #2152

icfaust · 2024-11-05T13:27:19Z

Description

This fixes a circular import involving onedal.utils.validation that appears when creating interfaces to the onedal backend in that file: _device_offload imports from validation, so using policy in the validation file creates a loop. Moving the check to C++ gives a better interface and removes the need for the _device_offload import in _policy.py entirely.

Checklist to comply with before moving PR from draft:

PR completeness and readability

I have reviewed my changes thoroughly before submitting this pull request.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have added a respective label(s) to PR if I have a permission for that.
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.
I have extended testing suite if new functionality was introduced in this PR.

Performance

I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
I have provided justification why performance has changed or why changes are not expected.
I have provided justification why quality metrics have changed or why changes are not expected.
I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

icfaust · 2024-11-05T21:11:12Z

/intelci: run

samir-nasibli · 2024-11-11T17:55:25Z

onedal/common/sycl.cpp

+        .def(py::init([](const py::int_& obj) {
+                return get_queue_by_pylong_pointer(obj);
+            })


Could you please share the case where this is needed? Is it covered by tests?

https://github.com/pytorch/pytorch/blob/main/torch/csrc/xpu/Stream.cpp#L72, waiting on a compiler compatibility fix here: pytorch/pytorch#139775

Please add a comment somewhere that this is needed to accept pytorch tensors.
I think the better place for the comment is along with the function's definition. But it's up to you to decide about the comment's placement.

I've now left a note. @Vika-F, please let me know what you think!
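
For readers unfamiliar with the pattern discussed in this thread, here is a minimal Python sketch of handing a native object between libraries as an integer-valued pointer. It assumes the producer exposes its queue as a plain Python int holding an address, which is what a constructor taking py::int_ suggests. FakeQueue, export_pointer, and queue_from_pointer are hypothetical names used only for illustration; they are not part of onedal, DPCtl, or PyTorch.

```python
import ctypes


class FakeQueue(ctypes.Structure):
    """Hypothetical stand-in for a native queue object (not a real sycl::queue)."""

    _fields_ = [("device_index", ctypes.c_int)]


def export_pointer(queue: FakeQueue) -> int:
    """Producer side: expose the native object's address as a Python int."""
    return ctypes.addressof(queue)


def queue_from_pointer(address: int) -> FakeQueue:
    """Consumer side: view the object stored at that address without copying it."""
    return FakeQueue.from_address(address)


producer_queue = FakeQueue(device_index=3)
address = export_pointer(producer_queue)

# The consumer only ever sees an integer, yet ends up with the same object.
consumer_view = queue_from_pointer(address)
consumer_view.device_index = 5
print(producer_queue.device_index)  # the change is visible through the producer's object
```

The producer must keep its object alive for as long as the consumer uses the address; an integer carries no ownership information, which is one reason such a constructor deserves an explanatory comment and a test.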

onedal/common/sycl_interfaces.cpp

samir-nasibli · 2024-11-11T18:00:13Z

onedal/dal.cpp

@@ -102,6 +106,7 @@ namespace oneapi::dal::python {
 #else
    #ifdef ONEDAL_DATA_PARALLEL
    PYBIND11_MODULE(_onedal_py_dpc, m) {
+        init_sycl(m);


Shouldn't this be sycl_interfaces?

@samir-nasibli would I need to rename sycl.cpp to sycl_interfaces.cpp, and if so, do you want me to move the contents of sycl.cpp to sycl_interfaces.cpp?

icfaust · 2024-11-12T08:52:00Z

Note to reviewers: we must figure out what to do with the deselections before merging this PR. Some of these issues are relevant only to 32-bit GPUs.

Could you please provide more details about the issue?

Intel Max GPUs support fp64 computation, but the dummy SYCL queue assumes that no GPU can compute in double precision. The dummy SYCL queue is used when DPCtl is not installed, which is the case when testing sklearn conformance on GPU. The resulting downconversion from double to float makes results less precise and accounts for many of the deselected GPU tests. I think there will be a follow-up PR that differentiates GPU deselections based on hardware characteristics; for now I will leave all deselections in place.
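
The behaviour described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the code in this PR: DeviceInfo, compute_dtype, and round_to_float32 are hypothetical names, and dtypes are represented as strings so the example needs nothing beyond the standard library.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceInfo:
    """Hypothetical description of the device behind a SYCL queue."""

    is_gpu: bool
    has_fp64: bool  # taken from the actual hardware rather than assumed


def compute_dtype(device: DeviceInfo, requested: str) -> str:
    """Return the dtype to compute in, downcasting only when the device requires it."""
    if requested == "float64" and device.is_gpu and not device.has_fp64:
        return "float32"
    return requested


def round_to_float32(value: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


fp64_gpu = DeviceInfo(is_gpu=True, has_fp64=True)
fp32_gpu = DeviceInfo(is_gpu=True, has_fp64=False)

print(compute_dtype(fp64_gpu, "float64"))  # stays in double precision
print(compute_dtype(fp32_gpu, "float64"))  # falls back to single precision
print(abs(round_to_float32(0.1) - 0.1))    # small but nonzero rounding error
```

A queue that always reports has_fp64 as False forces the second path on every GPU, including hardware that could have stayed in double precision, which is the precision loss behind the deselected tests mentioned above.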

Co-authored-by: Andreas Huber <9201869+ahuber21@users.noreply.github.com>

Vika-F · 2024-11-13T13:20:48Z

onedal/common/sycl.cpp

+#ifdef ONEDAL_DATA_PARALLEL
+
+void instantiate_sycl_interfaces(py::module& m){
+    py::class_<sycl::queue> syclqueue(m, "SyclQueue");


Please add a comment describing the purpose of this class, i.e., that it implements the SYCL queue interface for the case where dpctl's SYCL queue is not available.

Done, let me know what you think

icfaust · 2024-11-14T08:41:46Z

NOTE: now that #2160 is nearly merged, I would like that PR to be merged into main and then this PR rebased so that full testing is done.

icfaust · 2024-11-15T07:47:25Z

/intelci: run

icfaust · 2024-11-15T09:12:27Z

It seems that the previously implemented get_device_id does not conform to DPCtl, so the filter_string comparison checks will be commented out in this PR. A follow-up ticket will be created to correct this, ideally as part of a DLPack conformance rollout.

icfaust · 2024-11-15T09:17:40Z

/intelci: run

carryover from intel#2126

a1c9df5

icfaust requested review from Alexsandruss and samir-nasibli as code owners November 5, 2024 13:27

icfaust and others added 7 commits November 7, 2024 09:08

Merge branch 'intel:main' into dev/dummysyclqueue

7a18a1d

looking ahead

92f8c03

attempt to get it to compile

8b08af0

forgotten :

0941eb2

attempts at fixes

3aaa59c

maybe?

ca25fdc

modify properties

36fc5a2

icfaust marked this pull request as draft November 7, 2024 12:11

icfaust and others added 18 commits November 7, 2024 13:52

modify properties

f20252f

maybe?

e5284db

another change

085c170

cleanup

d47b05f

Update policy.cpp

56ec2a6

add necessary features

350c5d7

attempt to fix compiling issues

0c03531

attempt to fix compiling issues

58202f4

try again

1ab22fd

try to deal with sycl queue pointers

70dd06f

missing value

f18abc9

try again

c21af6d

extract DummySyclQueue

4278de3

temporary solution to this issue...

e4129d5

changes just to test operation

2f4c476

Update _device_offload.py

bb04233

move to a central sycl storage

02ae4ed

last change

fd35cc9

samir-nasibli reviewed Nov 11, 2024

icfaust and others added 3 commits November 12, 2024 10:27

Update sycl_interfaces.cpp

660aaf1

Update deselected_tests.yaml

7bdb533

Update onedal/_device_offload.py

e59b4c5

Co-authored-by: Andreas Huber <9201869+ahuber21@users.noreply.github.com>

icfaust requested review from samir-nasibli and ahuber21 November 12, 2024 09:46

Vika-F reviewed Nov 13, 2024

icfaust and others added 3 commits November 13, 2024 16:39

Update _device_offload.py

f3c6a80

add tests and fix errors observed in CI

781271b

merge upstream

589c5d9

icfaust and others added 12 commits November 14, 2024 10:23

add missing file

a55a56f

fix error in _is_dpc_backend

4cb1564

fix coding issues

1f287fe

fix one of many mistakes

b4d137c

Merge branch 'intel:main' into dev/dummysyclqueue

563612f

switch to filter_string

6710772

Merge branch 'dev/dummysyclqueue' of https://github.com/icfaust/sciki…

f2161cb

…t-learn-intelex into dev/dummysyclqueue

Update test_sycl.py

91522f4

Update test_sycl.py

e71e7c0

Update test_sycl.py

89e369b

formatting

e237c2c

add requested comments

d43ca05

remove filter_string checks and add comment

b083ff3

remove unneccessary comment out

4e3e82d

icfaust requested a review from Vika-F November 15, 2024 09:21
