Introduce dispatching in oneAPI covariance algorithm #2527
Conversation
… in Covariance algorithm
…pute_parameters_impl classes
Pull changes from master
/intelci: run
LGTM
Just a few points for discussion and a request for more tests. Very interesting work!
@@ -35,9 +36,24 @@ template <typename Float, daal::CpuType Cpu>
using daal_covariance_kernel_t = daal_covariance::internal::
    CovarianceDenseBatchKernel<Float, daal_covariance::Method::defaultDense, Cpu>;

template <typename Float, typename Task>
static auto convert_parameters(const detail::compute_parameters<Task>& params) {
I wouldn't use auto as the return type.
It is not a user-facing function.
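For illustration, the explicit-return-type alternative under discussion might look like the sketch below; the Hyperparameter type name is inferred from the interop hunk further down and is an assumption:

```cpp
// Sketch only: spelling out the return type instead of auto.
// daal_covariance::internal::Hyperparameter is an assumed name;
// the real alias in the interop layer may differ.
template <typename Float, typename Task>
static daal_covariance::internal::Hyperparameter convert_parameters(
    const detail::compute_parameters<Task>& params) {
    daal_covariance::internal::Hyperparameter daal_hyperparameter;
    // ... translate oneAPI compute_parameters into DAAL hyperparameters ...
    return daal_hyperparameter;
}
```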
inline auto implementation(const Policy& ctx,
                           const descriptor_base<Task>& desc,
                           const compute_parameters<Task>& params,
                           const compute_input<Task>& input) const {
    using kernel_dispatcher_t = dal::backend::kernel_dispatcher< //
I see that the same trailing // to achieve a line break was already in the code... hmm, is this really the best way to do it?
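For context, the trailing // is a known clang-format idiom: an end-of-line comment pins the line break so the formatter cannot re-join the wrapped lines. A minimal sketch with illustrative alias names:

```cpp
// The empty end-of-line comment stops clang-format from collapsing
// the template argument list back onto one line:
using kernel_dispatcher_t = dal::backend::kernel_dispatcher< //
    kernel_cpu_t, // hypothetical CPU kernel alias
    kernel_gpu_t>; // hypothetical GPU kernel alias
```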
/intelci: run
Thanks for addressing my comments! I don't want to block progress here any longer.
auto status = daal_hyperparameter.set(HyperparameterId::denseUpdateStepBlockSize, block);
interop::status_to_exception(status);

return daal_hyperparameter;
Do we really want to pass it by value? What about using std::shared_ptr?
I think shared_ptr is too heavy for this case.
It's better to just assign the result to a const reference: no passing by value, and the lifetime is acceptable.
The full description of the trick is in GotW #88.
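A minimal sketch of that trick, with a hypothetical caller: binding the returned temporary to a reference-to-const extends the temporary's lifetime to the lifetime of the reference, so no copy is made.

```cpp
// GotW #88 ("A Candidate For the 'Most Important const'"): the
// temporary returned by convert_parameters() lives as long as the
// const reference bound to it; nothing is copied or heap-allocated.
const auto& daal_hyperparameter = convert_parameters<Float, Task>(params);
// daal_hyperparameter stays valid until the end of the enclosing scope.
```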
if (5000l < row_count && row_count <= 50000l) {
    block_size = 1024l;
}
Do we want dispatching for avx2 as well?
This logic is replicated from DAAL. I think performance testing is needed here to be sure that the change gives some benefit on AVX2.
I think this is outside the scope of this PR.
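If AVX2-specific tuning were added later, the selection could hypothetically branch on the detected extension as well; the sketch below is illustrative only, and every constant in it would need performance validation:

```cpp
#include <cstdint>

// Hypothetical sketch: extension-aware block-size selection.
// The signature, the default, and all thresholds are placeholders,
// not measured values.
std::int64_t propose_block_size(bool is_avx512, std::int64_t row_count) {
    std::int64_t block_size = 140l; // assumed default
    if (is_avx512 && 5000l < row_count && row_count <= 50000l) {
        block_size = 1024l;
    }
    // An avx2 branch with its own tuned thresholds could be added here.
    return block_size;
}
```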
params_t operator()(const context_gpu& ctx,
                    const detail::descriptor_base<Task>& desc,
                    const compute_input<Task>& input) const {
    return params_t{};
Do we have blocking here?
I think not, at least for now. I do not see any blocking here:
https://github.com/oneapi-src/oneDAL/blob/master/cpp/oneapi/dal/backend/primitives/stat/cov_dpc.cpp#L55
and here:
https://github.com/oneapi-src/oneDAL/blob/master/cpp/oneapi/dal/algo/covariance/backend/gpu/compute_kernel_dense_impl_dpc.cpp#L120
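For reference, "blocking" here would mean walking the input in row chunks instead of a single pass; a purely schematic sketch, not something present in the linked kernels:

```cpp
#include <algorithm>
#include <cstdint>

// Schematic only: row-blocked traversal with an illustrative block size.
void process_by_blocks(std::int64_t row_count) {
    const std::int64_t block_size = 4096;
    for (std::int64_t first = 0; first < row_count; first += block_size) {
        const std::int64_t last = std::min(first + block_size, row_count);
        // accumulate partial sums / crossproducts over rows [first, last)
        (void)last;
    }
}
```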
GENERATE_DATAFRAME(te::dataframe_builder{ 1000, 20 }.fill_uniform(-30, 30, 7777),
                   te::dataframe_builder{ 100, 10 }.fill_uniform(0, 1, 7777),
                   te::dataframe_builder{ 100, 10 }.fill_uniform(-10, 10, 7777),
                   te::dataframe_builder{ 500, 40 }.fill_uniform(-100, 100, 7777),
                   te::dataframe_builder{ 500, 250 }.fill_uniform(0, 1, 7777));
I think you want to have some non-trivial dispatch results. Please remove the really small tests and extend them with tests designed for wide and/or long datasets.
Done.
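For illustration, extending the generator toward long and wide shapes might look like the following; the dimensions are hypothetical, not the values actually committed:

```cpp
// Hypothetical shapes aimed at exercising non-trivial dispatch:
GENERATE_DATAFRAME(te::dataframe_builder{ 100000, 20 }.fill_uniform(-30, 30, 7777), // long
                   te::dataframe_builder{ 1000, 1000 }.fill_uniform(0, 1, 7777)); // wide
```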
/intelci: run
Only the batch algorithm was modified for now. The online part of the covariance algorithm will be modified in a separate PR.
Changes proposed in this pull request: