[bugfix, enhancement] enable proper GPU offloading with fp64 support when DPCtl unavailable #2152

Open
wants to merge 73 commits into
base: main

Contributor

@icfaust icfaust commented Nov 5, 2024

Description

This corrects a circular import issue with onedal.utils.validation that arises when creating interfaces to the onedal backend in that file: _device_offload imports from validation, so using policy in the validation file creates an import loop. Moving the check to C++ gives a cleaner interface and removes the need for the _device_offload import in _policy.py entirely.
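For readers unfamiliar with this failure mode, here is a minimal sketch of how such an import loop fails. The module names `validation_demo` and `offload_demo` are hypothetical stand-ins written to a temporary directory; they are not the real onedal modules, and the actual dependency chain in this PR may differ.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-ins for the two modules; not the real onedal sources.
VALIDATION_SRC = (
    "from offload_demo import dispatch  # validation needs the offload layer\n"
    "def check(x):\n"
    "    return x\n"
)
OFFLOAD_SRC = (
    "from validation_demo import check  # offload layer needs validation\n"
    "def dispatch(x):\n"
    "    return check(x)\n"
)


def try_circular_import():
    """Return the ImportError message raised by the loop, or None if it imports."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "validation_demo.py").write_text(VALIDATION_SRC)
        Path(tmp, "offload_demo.py").write_text(OFFLOAD_SRC)
        sys.path.insert(0, tmp)
        try:
            importlib.import_module("validation_demo")
            return None
        except ImportError as exc:
            return str(exc)
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            sys.path.remove(tmp)
            for name in ("validation_demo", "offload_demo"):
                sys.modules.pop(name, None)


message = try_circular_import()
print(message)
```

The import fails because `validation_demo` is only partially initialized when `offload_demo` asks it for `check`. Removing one of the two imports, which is what moving the check to C++ achieves here, breaks the loop.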


PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.
  • I have provided justification why quality metrics have changed or why changes are not expected.
  • I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

Contributor Author

icfaust commented Nov 5, 2024

/intelci: run

@icfaust icfaust marked this pull request as draft November 7, 2024 12:11
Comment on lines +33 to +35
.def(py::init([](const py::int_& obj) {
return get_queue_by_pylong_pointer(obj);
})
Contributor

Could you please share the case where this is needed? Is it covered by tests?

Contributor Author

@icfaust icfaust Nov 12, 2024

Contributor

Please add a comment somewhere noting that this is needed to accept PyTorch tensors.
I think the best place for the comment is alongside the function's definition, but the placement is up to you.

Contributor Author

I've now left a note. @Vika-F, please let me know what you think!

onedal/common/sycl_interfaces.cpp (outdated)
@@ -102,6 +106,7 @@ namespace oneapi::dal::python {
#else
#ifdef ONEDAL_DATA_PARALLEL
PYBIND11_MODULE(_onedal_py_dpc, m) {
init_sycl(m);
Contributor

Shouldn't this be sycl_interfaces?

Contributor Author

@samir-nasibli would I need to rename sycl.cpp to sycl_interfaces.cpp, and if so, do you want me to move the contents of sycl.cpp to sycl_interfaces.cpp?

Contributor Author

icfaust commented Nov 12, 2024

Note to reviewers: we must decide what to do with the deselections before merging this PR. Some of these issues are relevant only to 32-bit GPUs.

Could you please provide more details about the issue?

Intel Max GPUs support fp64 computation, but the dummy SYCL queue assumes that no GPU can compute with doubles. The dummy SYCL queue is used when DPCtl is not installed, which is the case when testing sklearn conformance on GPU. The resulting downconversion from double to float makes results less precise and accounts for many of the deselected GPU tests. I think there will be a follow-up PR that differentiates GPU deselections based on hardware characteristics; for now I will leave all deselections in place.
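As a rough illustration of why the forced conversion matters, the sketch below round-trips a double through single precision using only the standard library. The `compute_dtype` helper and its `device_has_fp64` flag are hypothetical and only mimic the kind of capability check under discussion; they are not the onedal API.

```python
import struct


def to_float32(x: float) -> float:
    """Round an fp64 value to the nearest fp32 value, returned as a Python float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def compute_dtype(device_has_fp64: bool) -> str:
    """Hypothetical helper: pick the working precision from a capability flag."""
    return "float64" if device_has_fp64 else "float32"


value = 0.1
# Error introduced purely by storing the value in single precision.
rounding_error = abs(to_float32(value) - value)

print(compute_dtype(False), "rounding error:", rounding_error)
print(compute_dtype(True), "rounding error:", 0.0)
```

A queue that always reports no fp64 support takes the first path even on hardware that could take the second, so every input picks up this kind of rounding error before any computation starts.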

icfaust and others added 3 commits November 12, 2024 10:27
Co-authored-by: Andreas Huber <9201869+ahuber21@users.noreply.github.com>
#ifdef ONEDAL_DATA_PARALLEL

void instantiate_sycl_interfaces(py::module& m){
py::class_<sycl::queue> syclqueue(m, "SyclQueue");
Contributor

Please write a comment about the purpose of this class, i.e. that it implements the SYCL queue interface for cases where dpctl's SYCL queue is not available.

Contributor Author

Done, let me know what you think

Contributor Author

icfaust commented Nov 14, 2024

NOTE: now that #2160 is nearly merged, I would like that PR to be merged into main and then this PR rebased so that full testing is done.

Contributor Author

icfaust commented Nov 15, 2024

/intelci: run

Contributor Author

icfaust commented Nov 15, 2024

It seems that the previously implemented get_device_id does not conform to DPCtl, so the filter_string comparison checks will be commented out in this PR. A follow-up ticket will be created to correct this, ideally as part of a DLPack conformance rollout.

Contributor Author

icfaust commented Nov 15, 2024

/intelci: run

@icfaust icfaust requested a review from Vika-F November 15, 2024 09:21
4 participants