Commit

Merge branch 'main' into dev/skopienko/optimize_merge_sort_V1
Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>

# Conflicts:
#	include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h
#	include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge_sort.h
SergeyKopienko committed Dec 19, 2024
2 parents 263a09d + e65fcd2 commit caf7693
Showing 62 changed files with 496 additions and 131 deletions.
1 change: 0 additions & 1 deletion .github/workflows/ci-docs.yml
@@ -5,7 +5,6 @@ on:
  branches: [main]
  pull_request:
  branches:
- - release_oneDPL
  - main
  - 'release/**'

1 change: 0 additions & 1 deletion .github/workflows/ci-testing.yml
@@ -5,7 +5,6 @@ on:
  branches: [main]
  pull_request:
  branches:
- - release_oneDPL
  - main
  - 'release/**'
  paths:
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -7,7 +7,7 @@ We welcome community contributions to oneAPI DPC++ Library (oneDPL). You can:

  # License

- oneDPL is licensed under the terms in [LICENSE](https://github.com/oneapi-src/oneDPL/blob/release_oneDPL/licensing/LICENSE.txt).
+ oneDPL is licensed under the terms in [LICENSE](https://github.com/oneapi-src/oneDPL/blob/main/LICENSE.txt).
  By contributing to the project, you agree to the license and copyright terms therein and
  release your contribution under these terms.

4 changes: 2 additions & 2 deletions README.md
@@ -22,7 +22,7 @@ for more information.
  Visit the latest [Release Notes](https://github.com/oneapi-src/oneDPL/blob/main/documentation/release_notes.rst).

  ## License
- oneDPL is licensed under [Apache License Version 2.0 with LLVM exceptions](https://github.com/oneapi-src/oneDPL/blob/release_oneDPL/licensing/LICENSE.txt).
+ oneDPL is licensed under [Apache License Version 2.0 with LLVM exceptions](https://github.com/oneapi-src/oneDPL/blob/main/LICENSE.txt).
  Refer to the [LICENSE](licensing/LICENSE.txt) file for the full license text and copyright notice.

  ## Security
@@ -31,7 +31,7 @@ for information on how to report a potential security issue or vulnerability.
  You can also view the [Security Policy](SECURITY.md).

  ## Contributing
- See [CONTRIBUTING.md](https://github.com/oneapi-src/oneDPL/blob/release_oneDPL/CONTRIBUTING.md) for details.
+ See [CONTRIBUTING.md](https://github.com/oneapi-src/oneDPL/blob/main/CONTRIBUTING.md) for details.

  ## Documentation

31 changes: 17 additions & 14 deletions cmake/templates/oneDPLConfig.cmake.in
@@ -117,26 +117,29 @@ if (EXISTS "${_onedpl_headers}")
  endif()

  # Check SYCL support by the compiler
- check_cxx_compiler_flag("-fsycl" _fsycl_option)
+ set(FSYCL_OPTION "-fsycl")
+ check_cxx_compiler_flag(${FSYCL_OPTION} _fsycl_option)
  if (_fsycl_option)
- CHECK_INCLUDE_FILE_CXX("sycl/sycl.hpp" _sycl_header "-fsycl")
- if (NOT _sycl_header)
- CHECK_INCLUDE_FILE_CXX("CL/sycl.hpp" _sycl_header_old "-fsycl")
- endif()
- if (_sycl_header OR _sycl_header_old)
- set(_sycl_support TRUE)
- endif()
+ set(FSYCL_OPTION_IF_SUPPORTED ${FSYCL_OPTION})
  endif()

+ CHECK_INCLUDE_FILE_CXX("sycl/sycl.hpp" SYCL_HEADER ${FSYCL_OPTION_IF_SUPPORTED})
+ if (NOT SYCL_HEADER)
+ CHECK_INCLUDE_FILE_CXX("CL/sycl.hpp" SYCL_HEADER_OLD ${FSYCL_OPTION_IF_SUPPORTED})
+ endif()
+ if (SYCL_HEADER OR SYCL_HEADER_OLD)
+ set(SYCL_SUPPORT TRUE)
+ endif()
+
- if (_sycl_support)
+ if (SYCL_SUPPORT)
  # Enable SYCL* with compilers/compiler drivers not passing -fsycl by default
- if (NOT CMAKE_CXX_COMPILER MATCHES ".*(dpcpp-cl|dpcpp)(.exe)?$")
- message(STATUS "Adding -fsycl compiler option")
- target_compile_options(oneDPL INTERFACE -fsycl)
- target_link_libraries(oneDPL INTERFACE -fsycl)
+ if (_fsycl_option AND NOT CMAKE_CXX_COMPILER MATCHES ".*(dpcpp-cl|dpcpp)(.exe)?$")
+ message(STATUS "Adding ${FSYCL_OPTION} compiler option")
+ target_compile_options(oneDPL INTERFACE ${FSYCL_OPTION})
+ target_link_libraries(oneDPL INTERFACE ${FSYCL_OPTION})
  endif()
  else()
- message(STATUS "oneDPL: -fsycl compiler option is not supported or sycl.hpp is not available, set ONEDPL_USE_DPCPP_BACKEND=0")
+ message(STATUS "oneDPL: SYCL is not supported. Set ONEDPL_USE_DPCPP_BACKEND=0")
  set_property(TARGET oneDPL APPEND PROPERTY INTERFACE_COMPILE_DEFINITIONS ONEDPL_USE_DPCPP_BACKEND=0)
  endif()
  endif()
2 changes: 1 addition & 1 deletion documentation/library_guide/notices_disclaimers.rst
@@ -20,7 +20,7 @@ License

  oneDPL is licensed under Apache License Version 2.0 with LLVM exceptions.

- Refer to the `LICENSE <https://github.com/oneapi-src/oneDPL/blob/release_oneDPL/licensing/LICENSE.txt>`_ file for the full license text and copyright notice.
+ Refer to the `LICENSE <https://github.com/oneapi-src/oneDPL/blob/main/LICENSE.txt>`_ file for the full license text and copyright notice.



11 changes: 10 additions & 1 deletion documentation/library_guide/onedpl_gsg.rst
@@ -35,6 +35,15 @@ page for:
  Install the `Intel® oneAPI Base Toolkit (Base Kit) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html>`_
  to use |onedpl_short|.

+ Additionally, to use |onedpl_short| and other Base Kit components on non-Intel GPUs install either the
+ `oneAPI for NVIDIA® GPUs plugin <https://developer.codeplay.com/products/oneapi/nvidia/home/>`_ or the
+ `oneAPI for AMD GPUs plugin <https://developer.codeplay.com/products/oneapi/amd/home/>`_.
+
+ See the |onedpl_short|
+ `System Requirements <https://www.intel.com/content/www/us/en/docs/onedpl/developer-guide/2022-7/intel-oneapi-dpc-library-introduction.html>`_
+ for details on the compiler support required to compile applications using |onedpl_short| on CPU using OpenMP* or Intel®
+ oneAPI Threading Building Blocks (oneTBB).
+
  To use Parallel API, include the corresponding header files in your source code.

  All |onedpl_short| header files are in the ``oneapi/dpl`` directory. Use ``#include <oneapi/dpl/…>`` to include them.
@@ -220,4 +229,4 @@ Find More
  - Add oneAPI components to a Yocto project build using the meta-intel layers.
  * - `oneAPI Samples Catalog <https://oneapi-src.github.io/oneAPI-samples/>`_
  - Explore the complete list of oneAPI code samples in the oneAPI Samples Catalog (GitHub*).
- These samples were designed to help you develop, offload, and optimize multiarchitecture applications targeting CPUs, GPUs, and FPGAs.
+ These samples were designed to help you develop, offload, and optimize multiarchitecture applications targeting CPUs, GPUs, and FPGAs.
2 changes: 1 addition & 1 deletion examples/convex_hull/README.md
@@ -25,7 +25,7 @@ Correctness of the convex hull is checked by `std::any_of` algorithm using `coun

  ## License

- This code example is licensed under [Apache License Version 2.0 with LLVM exceptions](https://github.com/oneapi-src/oneDPL/blob/release_oneDPL/licensing/LICENSE.txt). Refer to the "[LICENSE](licensing/LICENSE.txt)" file for the full license text and copyright notice.
+ This code example is licensed under [Apache License Version 2.0 with LLVM exceptions](https://github.com/oneapi-src/oneDPL/blob/main/LICENSE.txt). Refer to the "[LICENSE](licensing/LICENSE.txt)" file for the full license text and copyright notice.

  ## Building the 'Convex hull' Program

2 changes: 1 addition & 1 deletion examples/dot_product/README.md
@@ -11,7 +11,7 @@ This example contains the oneDPL-based implementation of dot product based on `s

  ## License

- This code example is licensed under [Apache License Version 2.0 with LLVM exceptions](https://github.com/oneapi-src/oneDPL/blob/release_oneDPL/licensing/LICENSE.txt). Refer to the "[LICENSE](licensing/LICENSE.txt)" file for the full license text and copyright notice.
+ This code example is licensed under [Apache License Version 2.0 with LLVM exceptions](https://github.com/oneapi-src/oneDPL/blob/main/LICENSE.txt). Refer to the "[LICENSE](licensing/LICENSE.txt)" file for the full license text and copyright notice.

  ## Building the 'Dot product' Program

2 changes: 1 addition & 1 deletion examples/random/README.md
@@ -11,7 +11,7 @@ This example demonstrates how to use scalar and vector random number generation

  ## License

- This code example is licensed under [Apache License Version 2.0 with LLVM exceptions](https://github.com/oneapi-src/oneDPL/blob/release_oneDPL/licensing/LICENSE.txt). Refer to the "[LICENSE](licensing/LICENSE.txt)" file for the full license text and copyright notice.
+ This code example is licensed under [Apache License Version 2.0 with LLVM exceptions](https://github.com/oneapi-src/oneDPL/blob/main/LICENSE.txt). Refer to the "[LICENSE](licensing/LICENSE.txt)" file for the full license text and copyright notice.

  ## Building the 'Random' Program for CPU and GPU

2 changes: 1 addition & 1 deletion include/oneapi/dpl/internal/version_impl.h
@@ -13,7 +13,7 @@
  // The library version
  #define ONEDPL_VERSION_MAJOR 2022
  #define ONEDPL_VERSION_MINOR 7
- #define ONEDPL_VERSION_PATCH 0
+ #define ONEDPL_VERSION_PATCH 1

  #if _ONEDPL___cplusplus >= 202002L && __has_include(<version>)
  # include <version> // The standard C++20 header
69 changes: 26 additions & 43 deletions include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h
@@ -217,50 +217,33 @@ __find_start_point_in(const _Rng1& __rng1, const _Index __rng1_from, _Index __rn
  // to rng3 (starting from start3) in 'chunk' steps, but do not exceed the total size of the sequences (n1 and n2)
  template <typename _Rng1, typename _Rng2, typename _Rng3, typename _Index, typename _Compare>
  void
- __serial_merge(const _Rng1& __rng1, const _Rng2& __rng2, _Rng3& __rng3, _Index __start1, _Index __start2,
- const _Index __start3, const std::uint8_t __chunk, const _Index __n1, const _Index __n2, _Compare __comp)
+ __serial_merge(const _Rng1& __rng1, const _Rng2& __rng2, _Rng3& __rng3, const _Index __start1, const _Index __start2,
+ const _Index __start3, const _Index __chunk, const _Index __n1, const _Index __n2, _Compare __comp)
  {
- if (__start1 >= __n1)
- {
- //copying a residual of the second seq
- const _Index __n = std::min<_Index>(__n2 - __start2, __chunk);
- for (std::uint8_t __i = 0; __i < __n; ++__i)
- __rng3[__start3 + __i] = __rng2[__start2 + __i];
- }
- else if (__start2 >= __n2)
- {
- //copying a residual of the first seq
- const _Index __n = std::min<_Index>(__n1 - __start1, __chunk);
- for (std::uint8_t __i = 0; __i < __n; ++__i)
- __rng3[__start3 + __i] = __rng1[__start1 + __i];
- }
- else
+ const _Index __rng1_size = std::min<_Index>(__n1 > __start1 ? __n1 - __start1 : _Index{0}, __chunk);
+ const _Index __rng2_size = std::min<_Index>(__n2 > __start2 ? __n2 - __start2 : _Index{0}, __chunk);
+ const _Index __rng3_size = std::min<_Index>(__rng1_size + __rng2_size, __chunk);
+
+ const _Index __rng1_idx_end = __start1 + __rng1_size;
+ const _Index __rng2_idx_end = __start2 + __rng2_size;
+ const _Index __rng3_idx_end = __start3 + __rng3_size;
+
+ _Index __rng1_idx = __start1;
+ _Index __rng2_idx = __start2;
+
+ for (_Index __rng3_idx = __start3; __rng3_idx < __rng3_idx_end; ++__rng3_idx)
  {
- for (std::uint8_t __i = 0; __i < __chunk && __start1 < __n1 && __start2 < __n2; ++__i)
- {
- const auto& __val1 = __rng1[__start1];
- const auto& __val2 = __rng2[__start2];
- if (__comp(__val2, __val1))
- {
- __rng3[__start3 + __i] = __val2;
- if (++__start2 == __n2)
- {
- //copying a residual of the first seq
- for (++__i; __i < __chunk && __start1 < __n1; ++__i, ++__start1)
- __rng3[__start3 + __i] = __rng1[__start1];
- }
- }
- else
- {
- __rng3[__start3 + __i] = __val1;
- if (++__start1 == __n1)
- {
- //copying a residual of the second seq
- for (++__i; __i < __chunk && __start2 < __n2; ++__i, ++__start2)
- __rng3[__start3 + __i] = __rng2[__start2];
- }
- }
- }
+ const bool __rng1_idx_less_n1 = __rng1_idx < __rng1_idx_end;
+ const bool __rng2_idx_less_n2 = __rng2_idx < __rng2_idx_end;
+
+ // One of __rng1_idx_less_n1 and __rng2_idx_less_n2 should be true here
+ // because 1) we should fill output data with elements from one of the input ranges
+ // 2) we calculate __rng3_idx_end as std::min<_Index>(__rng1_size + __rng2_size, __chunk).
+ __rng3[__rng3_idx] =
+ ((__rng1_idx_less_n1 && __rng2_idx_less_n2 && __comp(__rng2[__rng2_idx], __rng1[__rng1_idx])) ||
+ !__rng1_idx_less_n1)
+ ? __rng2[__rng2_idx++]
+ : __rng1[__rng1_idx++];
  }
  }
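
The rewritten `__serial_merge` replaces the three branches with a single loop that picks one element per output slot. The following is a rough host-side sketch of that selection rule, not the library's code: the name `serial_merge_chunk`, the `std::vector` parameters, and the `std::size_t` index type are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the library routine: writes at most `chunk` merged
// elements to out[start3...], reading in1 from start1 and in2 from start2.
template <typename T, typename Compare = std::less<T>>
void serial_merge_chunk(const std::vector<T>& in1, const std::vector<T>& in2, std::vector<T>& out,
                        std::size_t start1, std::size_t start2, std::size_t start3,
                        std::size_t chunk, Compare comp = Compare{})
{
    const std::size_t n1 = in1.size();
    const std::size_t n2 = in2.size();

    // Clamp how many elements each input may contribute to this chunk.
    const std::size_t size1 = std::min(n1 > start1 ? n1 - start1 : std::size_t{0}, chunk);
    const std::size_t size2 = std::min(n2 > start2 ? n2 - start2 : std::size_t{0}, chunk);
    const std::size_t size3 = std::min(size1 + size2, chunk);

    const std::size_t end1 = start1 + size1;
    const std::size_t end2 = start2 + size2;
    const std::size_t end3 = start3 + size3;

    std::size_t i1 = start1;
    std::size_t i2 = start2;
    for (std::size_t i3 = start3; i3 < end3; ++i3)
    {
        const bool has1 = i1 < end1;
        const bool has2 = i2 < end2;
        // Take from in2 only when it compares strictly before in1's element,
        // or when in1 has nothing left for this chunk.
        const bool take2 = (has1 && has2 && comp(in2[i2], in1[i1])) || !has1;
        out[i3] = take2 ? in2[i2++] : in1[i1++];
    }
}
```

Calling the sketch twice with a chunk of 3 on `{1, 3, 5}` and `{2, 4, 6}`, first at offsets (0, 0, 0) and then at (2, 1, 3), fills a six-element output with 1 through 6. In the library, the per-chunk start offsets come from `__find_start_point`.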

@@ -279,7 +262,7 @@ struct __parallel_merge_submitter<_IdType, _CustomName, __internal::__optional_k
  {
  std::size_t base_diag_count = 0;
  std::size_t steps_between_two_base_diags = 0;
- std::uint8_t chunk = 0;
+ _IdType chunk = 0;
  _IdType steps = 0;
  };
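
This hunk, together with the `__serial_merge` signature change, widens the chunk size from `std::uint8_t` to the index type. The commit message does not give the reason. As an illustration only, the hypothetical helper below (not part of oneDPL) shows what an 8-bit field does to a value above 255; whether such chunk values occur in practice is not stated in the commit.

```cpp
#include <cstdint>

// Hypothetical helper (not part of oneDPL): returns the value that survives
// when `requested` is stored in an 8-bit chunk field.
inline std::uint32_t chunk_after_uint8_store(std::uint32_t requested)
{
    // Unsigned narrowing keeps only the low 8 bits (value modulo 256).
    const std::uint8_t stored = static_cast<std::uint8_t>(requested);
    return stored;
}
```

With this sketch, a requested chunk of 128 is preserved, while 300 is stored as 44.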

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge_sort.h
@@ -64,7 +64,7 @@ struct __group_merge_path_sorter
  template <typename _StorageAcc, typename _Compare>
  bool
  sort(const sycl::nd_item<1>& __item, const _StorageAcc& __storage_acc, _Compare __comp, std::uint32_t __start,
- std::uint32_t __end, std::uint32_t __sorted, std::uint16_t __data_per_workitem,
+ std::uint32_t __end, std::uint32_t __sorted, std::uint32_t __data_per_workitem,
  std::uint32_t __workgroup_size) const
  {
  const std::uint32_t __sorted_final = __data_per_workitem * __workgroup_size;
@@ -92,7 +92,8 @@ struct __group_merge_path_sorter
  auto __in_ptr1 = __in_ptr + __start1;
  auto __in_ptr2 = __in_ptr + __start2;

- const auto __start = __find_start_point(__in_ptr1, __in_ptr2, __id_local, __n1, __n2, __comp);
+ const std::pair<std::uint32_t, std::uint32_t> __start =
+ __find_start_point(__in_ptr1, __in_ptr2, __id_local, __n1, __n2, __comp);
  // TODO: copy the data into registers before the merge to halve the required amount of SLM
  __serial_merge(__in_ptr1, __in_ptr2, __out_ptr, __start.first, __start.second, __id, __data_per_workitem,
  __n1, __n2, __comp);
@@ -245,10 +246,10 @@ struct __merge_sort_global_submitter<_IndexT,

  struct nd_range_params
  {
- std::size_t base_diag_count = 0;
- std::size_t steps_between_two_base_diags = 0;
- std::uint32_t chunk = 0;
- std::size_t steps = 0;
+ std::size_t base_diag_count = 0;
+ std::size_t steps_between_two_base_diags = 0;
+ _IndexT chunk = 0;
+ _IndexT steps = 0;
  };

  struct WorkDataArea
24 changes: 18 additions & 6 deletions rfcs/README.md
@@ -66,14 +66,15 @@ The "RFC" label can be used to mark PRs containing RFC/design proposals.

  The RFC approval process generally follows the guidelines in the [UXL Foundation Operational Procedures](
  https://github.com/uxlfoundation/uxl_operational_procedures/blob/release/Process_Documents/Organization_Operational_Process.md#review--approval-process).
- Once two or more maintainers approve the PR, it is merged into the main branch
- as an RFC proposed for implementation.
+ Once two or more maintainers approve the PR, it is merged into the main branch.

- As the RFC moves to different states, use new PRs to update the RFC document
- with additional information.
+ RFC documents can be developed iteratively at each stage. For example, an initial RFC
+ can be approved even if some details of the design or the API are not yet sufficiently
+ elaborated. In that case, subsequent revisions (new PRs) should update the document
+ in `rfcs/proposed`, adding the requested information.

  A proposal that is subsequently implemented and released as an experimental feature
- is moved into the `rfcs/experimental` folder.
+ is moved into the `rfcs/experimental` directory.
  The RFC for such a feature should include a description
  of what is required to move it from experimental to fully supported -- for
  example, feedback from users, demonstrated performance improvements, etc.
@@ -84,9 +85,20 @@ changes and should therefore have a link to the section in the specification
  with its formal wording.

  A feature that is removed or a proposal that is abandoned or rejected will
- be moved to the `rfcs/archived` folder. It should state the reasons for
+ be moved to the `rfcs/archived` directory. It should state the reasons for
  rejection or removal.

+ There is no requirement that an RFC should pass all the stages in order.
+ A typical flow for an RFC would include at least `proposed` and `supported`;
+ however, any state can be skipped, depending on the progress and the needs.
+
+ For a document that describes a wide set of functionality or a general direction
+ and includes sub-RFCs for specific features, a few instances might simultaneously
+ reside in different states, adjusted as necessary to reflect the overall progress
+ on the direction and on its sub-proposals.
+
+ See the README files in respective directories for additional information.
+
  ## Document Style Recommendations

  - Follow the document structure described in [template.md](template.md).
16 changes: 9 additions & 7 deletions rfcs/experimental/README.md
@@ -5,19 +5,21 @@ released as experimental features in the library. An experimental
  feature is expected to have an implementation that is of comparable quality
  to a fully supported feature. Sufficient tests are required.

- An experimental feature does not yet appear as part of the oneDPL
+ An experimental feature does not yet appear as part of the oneDPL
  specification. Therefore, the interface and design can change.
  There is no commitment to backward compatibility for experimental features.

  The documents in this directory should include a list of the exit conditions
  that need to be met to move the functionality from experimental to fully supported.
- These conditions might include demonstrated performance improvements,
- demonstrated interest from the community,
- acceptance of the required specification changes, etc.
+ These conditions might include demonstrated performance improvements, demonstrated
+ interest from the community, acceptance of the required specification changes, etc.

- For features that require specification changes, the document might
- include wording for those changes or a link to any PRs opened
- against the specification.
+ A document here needs to be updated if the corresponding feature undergoes
+ modifications while remaining experimental. Other changes, such as updates on the
+ exit conditions or on the implementation and usage experience, are also welcome.
+
+ For features that require specification changes prior to production, the document might
+ include wording for those changes or a link to any PRs opened against the specification.

  Proposals in the `rfcs/experimental` directory do not remain there indefinitely.
  They should move either to `rfcs/supported` when they become fully supported
4 changes: 3 additions & 1 deletion rfcs/proposed/README.md
@@ -6,4 +6,6 @@ However, the proposed changes have not yet been implemented as a
  preview or fully supported feature of the library.

  RFCs in the `rfcs/proposed` directory should explain the motivation,
- design, and open questions related to the proposed extension.
+ design, and open questions related to the proposed extension. There can be
+ several update iterations on a proposed RFC to clarify the necessary details
+ and address the questions before it is accepted for the implementation.
3 changes: 2 additions & 1 deletion rfcs/supported/README.md
@@ -9,4 +9,5 @@ changes in the oneDPL specification. The RFC document should, in that case,
  have a link to the formal wording in the specification.

  Proposals that appear in `rfcs/supported` may be retained indefinitely to
- provide insight into the design of existing features.
+ provide insight into the design of existing features. They could be updated
+ over time if the corresponding functionality is extended or modified.