Introduce dispatching in oneAPI covariance algorithm #2527

Merged: 12 commits into oneapi-src:master on Oct 17, 2023

Conversation

@Vika-F (Contributor) commented on Sep 26, 2023:

Only the batch algorithm is modified for now. The online part of the covariance algorithm will be modified in a separate PR.

Changes proposed in this pull request:

  • Added compute_parameters_cpu and compute_parameters_gpu classes to the covariance algorithm. These classes encapsulate the dispatching functionality for CPU and GPU, respectively (a simplified sketch of the pattern is shown after this list).
  • Added a test for the dispatching functionality in the covariance algorithm.
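
For readers following along, here is a minimal, self-contained sketch of the dispatching idea described above. It is not the oneDAL code: the struct and member names and the default block size are illustrative, while the row-count thresholds mirror the values discussed in the review threads below.

#include <cstdint>
#include <iostream>

// Illustrative stand-ins for the CPU/GPU execution contexts; the real oneDAL
// types and interfaces differ.
struct context_cpu {};
struct context_gpu {};

// Illustrative parameter set; the member name and default value are hypothetical.
struct compute_parameters {
    std::int64_t cpu_macro_block = 140;
};

// CPU-side dispatching: tune the block size from the input row count.
// The 5000/50000/1024 values mirror the heuristic discussed in the review below.
struct compute_parameters_cpu {
    compute_parameters operator()(const context_cpu& /*ctx*/,
                                  std::int64_t row_count) const {
        compute_parameters params{};
        if (5000l < row_count && row_count <= 50000l) {
            params.cpu_macro_block = 1024l;
        }
        return params;
    }
};

// GPU-side dispatching: no CPU-style blocking, so defaults are returned
// (as in the GPU operator() shown later in this review).
struct compute_parameters_gpu {
    compute_parameters operator()(const context_gpu& /*ctx*/,
                                  std::int64_t /*row_count*/) const {
        return compute_parameters{};
    }
};

int main() {
    const auto params = compute_parameters_cpu{}(context_cpu{}, 20000l);
    std::cout << "CPU block size: " << params.cpu_macro_block << "\n"; // prints 1024
    return 0;
}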

@Vika-F added the enhancement and dpc++ (Issue/PR related to DPC++ functionality) labels on Oct 4, 2023.
@Vika-F (Contributor, Author) commented on Oct 5, 2023:

/intelci: run

@Vika-F (Contributor, Author) commented on Oct 9, 2023:

/intelci: run

@Vika-F marked this pull request as ready for review on October 10, 2023 at 08:09.
@inteldimitrius (Contributor) left a comment:

LGTM

@ahuber21 (Contributor) left a comment:

Just a few points for discussion and a request for more tests. Very interesting work!

@@ -35,9 +36,24 @@ template <typename Float, daal::CpuType Cpu>
using daal_covariance_kernel_t = daal_covariance::internal::
CovarianceDenseBatchKernel<Float, daal_covariance::Method::defaultDense, Cpu>;

template <typename Float, typename Task>
static auto convert_parameters(const detail::compute_parameters<Task>& params) {
Contributor:
I wouldn't use auto as return type

Contributor:
It is not a user facing function
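
As a side note on this thread, the snippet below is a standalone illustration of the two styles under discussion, using placeholder types that do not come from oneDAL; the choice only affects how readable the internal helper's signature is.

#include <cstdint>

// Placeholder stand-in for the hyperparameter object that the real
// convert_parameters helper builds; none of these names come from oneDAL.
struct hyperparameters_stub {
    std::int64_t block_size = 0;
};

// Variant A: explicit return type, documenting the contract in the signature,
// as the reviewer suggests.
static hyperparameters_stub convert_parameters_explicit(std::int64_t block) {
    return hyperparameters_stub{ block };
}

// Variant B: deduced return type, terser for a non-user-facing helper,
// as kept in the PR.
static auto convert_parameters_deduced(std::int64_t block) {
    return hyperparameters_stub{ block };
}

int main() {
    const auto a = convert_parameters_explicit(1024);
    const auto b = convert_parameters_deduced(1024);
    return static_cast<int>(a.block_size - b.block_size); // 0: both variants behave identically
}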

cpp/daal/include/algorithms/algorithm_base_mode.h: outdated comment, resolved.
inline auto implementation(const Policy& ctx,
const descriptor_base<Task>& desc,
const compute_parameters<Task>& params,
const compute_input<Task>& input) const {
using kernel_dispatcher_t = dal::backend::kernel_dispatcher< //
Contributor:
I see the same // to achieve a line break was already in the code ... hmm, is this really the best way to do it?
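
For context on the trailing //: clang-format generally will not merge a line that ends in a comment, so the empty comment pins the line break after the <. Below is a minimal, compilable illustration with placeholder types that are not the actual oneDAL dispatcher.

// Placeholder kernel and dispatcher types; the real oneDAL dispatcher takes
// CPU and GPU kernel functors instead.
struct kernel_cpu {};
struct kernel_gpu {};

template <typename... Kernels>
struct kernel_dispatcher {};

// The empty trailing comment stops clang-format from joining this line with
// the next one, so the one-kernel-per-line layout survives reformatting.
using kernel_dispatcher_t = kernel_dispatcher< //
    kernel_cpu,
    kernel_gpu>;

int main() {
    kernel_dispatcher_t dispatcher{};
    (void)dispatcher; // silence unused-variable warnings
    return 0;
}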

@Vika-F (Contributor, Author) commented on Oct 11, 2023:

/intelci: run

@ahuber21 (Contributor) left a comment:

Thanks for addressing my comments! I don't want to block progress here any longer.

@@ -35,9 +36,24 @@ template <typename Float, daal::CpuType Cpu>
using daal_covariance_kernel_t = daal_covariance::internal::
CovarianceDenseBatchKernel<Float, daal_covariance::Method::defaultDense, Cpu>;

template <typename Float, typename Task>
static auto convert_parameters(const detail::compute_parameters<Task>& params) {
auto status = daal_hyperparameter.set(HyperparameterId::denseUpdateStepBlockSize, block);
interop::status_to_exception(status);

return daal_hyperparameter;
Contributor:
Do we really want to pass it by value? What about using std::shared_ptr?

Contributor (Author):
I think shared_ptr is too heavy for this case.
It's better to just bind the result to a const reference -> no passing by value, and the lifetime is acceptable.
The full description of the trick is in GotW #88
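
For reference, here is a minimal standalone illustration of the GotW #88 point, with a plain std::string standing in for the hyperparameter object: binding the returned temporary to a const reference keeps it alive for the reference's scope.

#include <iostream>
#include <string>

// Stand-in for a factory like convert_parameters that returns by value.
std::string make_parameters() {
    return std::string("hyperparameters");
}

int main() {
    // GotW #88: the temporary returned by make_parameters() is bound to a
    // const reference, which extends its lifetime to the end of this scope,
    // so no shared_ptr or extra copy is needed.
    const std::string& params = make_parameters();
    std::cout << params << "\n"; // safe: the temporary is still alive here
    return 0;
}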

Comment on lines +51 to +53
if (5000l < row_count && row_count <= 50000l) {
block_size = 1024l;
}
Contributor:
Do we want dispatching for avx2 as well?

Contributor (Author):
This logic is replicated from DAAL. I think performance testing is needed here to be sure that the change gives some benefit on AVX2.

I think this is outside of the scope of this PR.

params_t operator()(const context_gpu& ctx,
const detail::descriptor_base<Task>& desc,
const compute_input<Task>& input) const {
return params_t{};
Contributor:
Do we have blocking here?

Comment on lines 79 to 83
GENERATE_DATAFRAME(te::dataframe_builder{ 1000, 20 }.fill_uniform(-30, 30, 7777),
te::dataframe_builder{ 100, 10 }.fill_uniform(0, 1, 7777),
te::dataframe_builder{ 100, 10 }.fill_uniform(-10, 10, 7777),
te::dataframe_builder{ 500, 40 }.fill_uniform(-100, 100, 7777),
te::dataframe_builder{ 500, 250 }.fill_uniform(0, 1, 7777));
Contributor:
I think you want to have some non-trivial dispatch results. Please remove the really small tests and add tests designed for wide and/or long datasets.

Contributor (Author):
Done.

@Vika-F (Contributor, Author) commented on Oct 12, 2023:

/intelci: run

@Vika-F merged commit 85ad140 into oneapi-src:master on Oct 17, 2023; 14 checks passed.
@Vika-F deleted the dev/vsfedoto/cov_dispatcher branch on January 17, 2024 at 10:26.
Labels: dpc++ (Issue/PR related to DPC++ functionality), enhancement
5 participants