Initial support for RISCV vector extension #1716
Thanks a lot for this huge contribution. I will add the RISC-V compiler to our CI image so we can have the tests running. We are going to review this soon.
include/eve/detail/spy.hpp
Outdated
@@ -808,6 +814,11 @@ namespace avx512
# endif
# endif
#endif
#if !defined(SPY_SIMD_DETECTED) && defined(__riscv) && defined(__riscv_vector)
# define SPY_SIMD_DETECTED ::spy::detail::simd_version::rvv_
# define SPY_SIMD_IS_RISCV_FLEXIBLE_SVE
Should it be just SPY_SIMD_IS_RISCV_FLEXIBLE?
Also, care to make an issue/PR over at www.github.com/jfalcou/spy?
Hi.
Thank you for putting in the effort!
96 files will take some time to review. I will personally do it in chunks: some comments, then more comments, and so on.
static constexpr bool is_fp_v = std::is_floating_point_v<Type>;
static constexpr bool is_signed_v = std::is_signed_v<Type>;

# ifdef EVE_RISCV_REG_CHOOSE
Heavy macro usage seems unjustified on the surface.
Can you explain what you are trying to do, and then we can figure out how?
I am trying to create a group of functions that pick the best vector type for the requested cardinal, element type, and SEW (selected element width). For example, suppose VLEN == 128:
- to work with 16 int8 elements, you can use the vector type vint8m1_t
- to work with 128 uint8 elements, you can use the vector type vuint8m8_t
- to work with 8 float elements, you can use the vector type vfloat32m2_t
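The selection described above can be sketched in portable C++. This is only an illustration: `choose_lmul` is a hypothetical helper, not the PR's `getLMUL`, and it assumes the signed convention visible in the reviewed snippets (a positive value is a group of that many registers, a negative value is a fraction of one register). With VLEN == 128, 8 floats occupy 256 bits, i.e. two registers.

```cpp
#include <cstddef>

// Hypothetical helper, not the PR's getLMUL.
// Returns  k > 0 for LMUL = k   (register group of k registers, "mk"),
//          k < 0 for LMUL = 1/k (fraction of one register, "mfk"),
//          0 when the request does not fit in the largest group.
// Assumption: this sketch ignores the rule that not every SEW / fractional
// LMUL pair is actually available.
constexpr int choose_lmul(std::size_t vlen_bits, std::size_t sew_bits, std::size_t cardinal)
{
  const std::size_t needed_bits = sew_bits * cardinal;
  if( needed_bits >= vlen_bits )
  {
    // Round up to whole registers; groups come in sizes 1, 2, 4 and 8.
    const std::size_t regs = (needed_bits + vlen_bits - 1) / vlen_bits;
    if( regs <= 1 ) return 1;
    if( regs <= 2 ) return 2;
    if( regs <= 4 ) return 4;
    if( regs <= 8 ) return 8;
    return 0;
  }
  // Less than one register: pick the smallest fraction that still fits.
  const std::size_t fraction = vlen_bits / needed_bits;
  if( fraction >= 8 ) return -8;
  if( fraction >= 4 ) return -4;
  if( fraction >= 2 ) return -2;
  return 1;
}

// The three cases from the comment above, with VLEN == 128.
static_assert(choose_lmul(128, 8, 16) == 1, "16 x int8  -> m1");
static_assert(choose_lmul(128, 8, 128) == 8, "128 x uint8 -> m8");
static_assert(choose_lmul(128, 32, 8) == 2, "8 x float  -> m2");
```

Because the function is `constexpr`, the result can feed an `if constexpr` chain that maps (element type, LMUL) to a concrete register type.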
I lack an understanding of RISC-V SIMD fundamentals.
I see this doc https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc but it's huge and contains a lot of information that I don't immediately need.
I also can't find descriptions of the intrinsics.
What are you using?
//==================================================================================================
#pragma once

#include <eve/detail/function/friends.hpp>
That's not an include you should use
OK, I wanted to use it because I needed self_neq, but it seems I do not need it. Removed.
EVE_FORCEINLINE logical<wide<T, N>>
rvv_true()
{
static constexpr auto lmul = riscv_rvv_dyn_::getLMUL<T>(N::value);
I'm struggling with riscv_rvv_dyn_. I can't find where it's defined.
Is it that different machines use a different number of bits per logical element?
For example, on one machine a logical is represented by 1 bit, and on another by 8 bits?
It is defined in tags, and I use it as a common place for various support functions, such as calculating LMUL (the vector register group count, i.e. the number of vector registers that participate in the operation), as well as for standard EVE functionality (e.g. expected_cardinal).
> different machines use different number of bits per logical element
Renamed. There I calculate ratio = SEW/LMUL.
{
static constexpr auto lmul = riscv_rvv_dyn_::getLMUL<T>(N::value);
static constexpr size_t size = sizeof(T) * 8;
static constexpr size_t bit_size = lmul > 0 ? size / lmul : size * (-lmul);
This seems to be copy-pasted a lot. Should we just have logical<wide<T, N>>::platform_bit_size or something?
Also, really, does this change depending on the machine?
> Should we just have logical<wide<T, N>>::platform_bit_size or smth?
This is not the platform_bit_size of logical; it is the number needed to form the right mask type. This number is equal to SEW/LMUL. Renamed it to ratio.
> Also - really - this is changing depending on machine?
Well, it depends on the VLEN of your machine.
For example, suppose VLEN == 128. To operate on 8 int32 elements with one instruction, you need to set LMUL (the number of consecutive vector registers that participate as one operand) to M2 (you need 2 registers). The mask for this needs the type vbool16_t (32/2 = 16).
If you have VLEN == 256, you can operate on 8 int32 elements with LMUL == M1, so you need to use vbool32_t (32/1 = 32).
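The arithmetic in this reply can be checked with a few lines of portable C++. The sketch below is illustrative only: `mask_ratio` and `whole_lmul` are hypothetical names, and the signed LMUL convention (negative means fractional) is taken from the `bit_size` expression in the reviewed snippet.

```cpp
#include <cstddef>

// ratio = SEW / LMUL, with the signed convention from the reviewed snippet:
// lmul > 0 is a group of lmul registers, lmul < 0 is the fraction 1/(-lmul).
constexpr std::size_t mask_ratio(std::size_t sew_bits, int lmul)
{
  return lmul > 0 ? sew_bits / static_cast<std::size_t>(lmul)
                  : sew_bits * static_cast<std::size_t>(-lmul);
}

// Hypothetical helper: number of whole registers needed for `cardinal`
// elements of `sew_bits` each. Only meaningful here for power-of-two sizes.
constexpr int whole_lmul(std::size_t vlen_bits, std::size_t sew_bits, std::size_t cardinal)
{
  return static_cast<int>((sew_bits * cardinal + vlen_bits - 1) / vlen_bits);
}

// 8 x int32 with VLEN == 128: LMUL = 2, ratio = 16, so the mask is vbool16_t.
static_assert(mask_ratio(32, whole_lmul(128, 32, 8)) == 16, "VLEN 128");
// 8 x int32 with VLEN == 256: LMUL = 1, ratio = 32, so the mask is vbool32_t.
static_assert(mask_ratio(32, whole_lmul(256, 32, 8)) == 32, "VLEN 256");
```

The same cardinal and element type therefore map to different mask types on machines with different VLEN, which is why the ratio has to be computed rather than hard-coded.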
include/eve/arch/riscv/tags.hpp
Outdated
//================================================================================================
template<std::size_t Size> struct rvv_abi_
{
static_assert(CHAR_BIT == 8, "[eve riscv] - For riscv we expect CHAR_BIT to be 8");
We expect that everywhere; I would remove this.
Removed.
static constexpr auto lmul = riscv_rvv_dyn_::getLMUL<T>(N::value);
static constexpr size_t size = sizeof(T) * 8;
static constexpr size_t bit_size = lmul > 0 ? size / lmul : size * (-lmul);
if constexpr( bit_size == 1 ) return __riscv_vmclr_m_b1(N::value);
My lack of knowledge here shows through heavily: what is the difference between the different clears? I'd expect all clears to be the same.
They return different types.
For example, __riscv_vmclr_m_b1 returns vbool1_t, and __riscv_vmclr_m_b8 returns vbool8_t.
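Since each clear returns a different type, a single function has to select the intrinsic at compile time. The following is a minimal portable sketch of that pattern; `fake_vbool`, `fake_vmclr_b1` and `fake_vmclr_b8` are stand-ins invented here so the example compiles without a RISC-V toolchain, and they do not reproduce the real intrinsics' behaviour.

```cpp
#include <cstddef>
#include <type_traits>

// Stand-ins for vbool1_t, vbool8_t, ...: one distinct type per ratio.
template<std::size_t Ratio> struct fake_vbool
{
  static constexpr std::size_t ratio = Ratio;
};

// Stand-ins for __riscv_vmclr_m_b1 and __riscv_vmclr_m_b8 (assumption: the
// real ones take the vector length and return an all-false mask).
inline fake_vbool<1> fake_vmclr_b1(std::size_t) { return {}; }
inline fake_vbool<8> fake_vmclr_b8(std::size_t) { return {}; }

// One entry point whose return type depends on the compile-time ratio.
template<std::size_t Ratio> auto clear_mask(std::size_t vl)
{
  if constexpr( Ratio == 1 ) return fake_vmclr_b1(vl);
  else if constexpr( Ratio == 8 ) return fake_vmclr_b8(vl);
  else static_assert(Ratio == 1 || Ratio == 8, "ratio not covered by this sketch");
}

static_assert(std::is_same_v<decltype(clear_mask<1>(16)), fake_vbool<1>>, "b1");
static_assert(std::is_same_v<decltype(clear_mask<8>(16)), fake_vbool<8>>, "b8");
```

Only the branch matching `Ratio` is instantiated, so the deduced return type is the one belonging to that ratio.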
include/eve/arch/riscv/tags.hpp
Outdated
auto type_size = sizeof(Type);
if( type_size == 1 ) return 2;
if( type_size == 2 ) return 2;
if( type_size == 4 ) return 2;
This looks weird. Maybe add a comment and return 2 if it's less than 4?
Yep, fixed
{};
#else
struct riscv_rvv_dyn_ : rvv_abi_<1>
{};
This doesn't make sense to me. Is it like a fallback? If we can't run RISC-V, RISC-V should not be used by EVE.
The idea is the following:
When running the compiler, you need to specify VLEN (the bit size of one vector register) by passing -mrvv-vector-bits=size. The compiler then defines __riscv_v_fixed_vlen, which equals the number you passed to the compiler (currently I use 128).
Since the tags are included without knowing whether the current platform is RISC-V with the vector extension, we should check that __riscv_v_fixed_vlen is defined and, if it is not, define an empty riscv_rvv_dyn_ just so that compilation does not break.
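The guard described in this reply might look roughly like the sketch below. `rvv_vlen_bits`, `rvv_available` and `expected_lanes` are hypothetical names used for illustration; only the `__riscv_v_fixed_vlen` macro comes from the discussion.

```cpp
#include <cstddef>

#if defined(__riscv_v_fixed_vlen)
// Defined by the compiler when a fixed VLEN was requested on the command line.
inline constexpr std::size_t rvv_vlen_bits = __riscv_v_fixed_vlen;
inline constexpr bool        rvv_available = true;
#else
// Placeholder values so that headers including this file still compile on
// targets without the vector extension.
inline constexpr std::size_t rvv_vlen_bits = 0;
inline constexpr bool        rvv_available = false;
#endif

// Hypothetical helper: lanes in one register for an element of the given
// size, falling back to 1 when no fixed VLEN is known.
constexpr std::size_t expected_lanes(std::size_t element_bytes)
{
  return rvv_available ? rvv_vlen_bits / (element_bytes * 8) : 1;
}
```

On a host build `expected_lanes(4)` evaluates to 1, while a build with a fixed VLEN of 128 would give 4.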
include/eve/arch/riscv/tags.hpp
Outdated
#endif

//================================================================================================
// Dispatching tag for ARM SIMD implementation
not arm
My bad, fixed to RISC-V
include/eve/arch/riscv/tags.hpp
Outdated
};

//================================================================================================
// SVE extensions tag objects
comments
Fixed
// RISCV SVE ABI concept
//================================================================================================
template<typename T>
concept rvv_abi = detail::is_one_of<T>(detail::types<riscv_rvv_dyn_> {});
Why dyn?
Well, I wanted to note that we should use this template for any VLEN. I could rename it if you want.
test/unit/memory/load/tuple.cpp
Outdated
, &data1[idx1] - 1
, eve::as_aligned(&data2[idx2],typename w8_t::cardinal_type{})
The test was correct. It's testing a fixed-size wide.
@@ -72,7 +72,7 @@ TTS_CASE_TPL( "Check eve::wide splat constructor", eve::test::simd::all_types)

// Test smaller size wide for non-garbage
using v_t = typename T::value_type;
if constexpr( T::size() < eve::fundamental_cardinal_v<v_t> && !eve::has_emulated_abi_v<T> )
this comes from the fundamental wide being one idea I think.
For RISC-V I forbid this, because the storage type can differ depending on the cardinal value. To support it, I would need to add an additional constructor to the core wide functionality.
Why do you need this check if fundamental cardinal is 1?
I think I need to understand more about RISC-V SIMD/vector code. What docs do you use? I only see a very large GitHub repo that has a lot of information and no intrinsics. Can you share something?
Yes, there is a separate repository: https://github.com/riscv-non-isa/rvv-intrinsic-doc (you need release v1.0-rc0 there).
self = bit_cast(i_t(intrinsic(self_cast, other_cast, N::value)), as(self)); \
return self; \
} \
}
I dislike macros intensely.
I would prefer callbacks, but that is going to be annoying because of forceinline.
For example, we can do:
enum class bit_op { and_, or_, xor_ };
template<bit_op op,
plain_scalar_value T, typename N, plain_scalar_value P>
EVE_FORCEINLINE auto& rvv_bit_compound_impl(wide<T, N>& self, P const& other) {
...
if constexpr (op == bit_op::and_) self_cast = __riscv_vand(self_cast, other_cast);
if constexpr (op == bit_op::xor_) self_cast = __riscv_vxor(self_cast, other_cast);
...
}
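To make the suggestion above concrete without an RVV toolchain, here is a portable sketch of the same enum-plus-`if constexpr` dispatch on plain unsigned integers. The name `bit_compound` and the scalar operators standing in for `__riscv_vand` / `__riscv_vor` / `__riscv_vxor` are assumptions made for illustration.

```cpp
#include <cstdint>
#include <type_traits>

enum class bit_op { and_, or_, xor_ };

// One implementation parameterised by the operation; in a real backend each
// branch would call the matching intrinsic instead of a scalar operator.
template<bit_op Op, typename T>
constexpr T& bit_compound(T& self, T other)
{
  static_assert(std::is_unsigned_v<T>, "this sketch only handles unsigned integers");
  if constexpr( Op == bit_op::and_ ) self &= other;
  else if constexpr( Op == bit_op::or_ ) self |= other;
  else self ^= other;
  return self;
}

// Thin named wrappers replace the three macro expansions.
template<typename T> constexpr T& self_bitand(T& a, T b) { return bit_compound<bit_op::and_>(a, b); }
template<typename T> constexpr T& self_bitor(T& a, T b)  { return bit_compound<bit_op::or_>(a, b); }
template<typename T> constexpr T& self_bitxor(T& a, T b) { return bit_compound<bit_op::xor_>(a, b); }
```

The shared cast-in and cast-out code is written once in the template, and a debugger or compiler error points at real C++ rather than at a macro expansion.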
EVE_FORCEINLINE wide<T, N>
perform_load(logical<wide<T, N>> mask, as<wide<T, N>> tgt, PtrTy p)
{
auto zero_init = make(as<wide<T, N>> {}, static_cast<T>(0));
wide<T, N> zero_init{0};
or
auto zero_init = eve::zero(tgt);
Fixed.
I skimmed a bit of the doc and will look at more of it. In the meantime, I really think that heavy macro usage is not helpful. Let's try to figure out how we can use macros less.
if constexpr( bit_size == 16 ) { EVE_RVV_RET_MASK_TYPE(16); }
if constexpr( bit_size == 32 ) { EVE_RVV_RET_MASK_TYPE(32); }
if constexpr( bit_size == 64 ) { EVE_RVV_RET_MASK_TYPE(64); }
}
Heavy macro usage makes this very difficult to review.
How close can you get to this?
template <typename rvv_type, std::size_t num, std::size_t denum>
using rvv_with_attr = rvv_type __attribute__((riscv_rvv_vector_bits(__riscv_v_fixed_vlen * num / denum)));
auto find() {
if constexpr ( std::same_as<T, std::int8_t> ) {
if constexpr ( N::value == 1 ) return rvv_with_attr<...>{};
}
}
Removed the defines from as_register.hpp.
I will do a deeper dive later, as there is still a lot. Thank you for cleaning up the macros. I think there are some simplifications that can be made.
FYI: we usually don't enable all tests for a new arch in one go. We start with internal.exe, then core.exe, and then add more targets. You don't need to change this for this PR, just FYI.
FYI2: in my comments I went from bottom to top (in case it seems confusing).
}
}
}
else { TTS_PASS("For RISC-V uint8 not enough to store element index."); }
On rvv, let's instead just pass the cardinal explicitly.
auto alg0 = eve::algo::min_element //
[eve::algo::single_pass] //
[eve::algo::index_type<std::uint8_t>] //
[eve::algo::unroll<2>];
auto alg = [&]{
if constexpr( eve::expected_cardinal_v<std::uint8_t> < 128) {
return alg0;
} else {
return alg0[eve::algo::force_cardinal<64>];
}
}();
@@ -17,68 +17,79 @@ TTS_CASE("Min element one pass, uint8 index")
[eve::algo::single_pass] //
[eve::algo::index_type<std::uint8_t>] //
[eve::algo::unroll<2>];
if constexpr( eve::current_api != eve::rvv )
Same here as suggested below.
if constexpr (eve::current_api != eve::rvv) {
TTS_CONSTEXPR_EXPECT(match(eve::detail::categorize<T>(), lanes));
TTS_CONSTEXPR_EQUAL(eve::detail::categorize<T>(), uint8 | lanes);
}
@jfalcou - I don't understand this test. What should be done here?
//==================================================================================================
#pragma once

namespace eve::detail
I would expect the emulation code to work here. What happened?
{
template<scalar_value T, typename N, std::ptrdiff_t Shift>
EVE_FORCEINLINE auto
slide_left_(EVE_SUPPORTS(rvv_), logical<wide<T, N>> v, index_t<Shift>) noexcept
Again, emulation should do this. Why doesn't that work?
logical<wide<U, N>> const &b) noexcept
requires rvv_abi<abi_t<T, N>>
{
return self_neq(a, bit_cast(b, as<logical<wide<T, N>>> {}));
a != bit_cast(b, as(a));
logical<wide<T, N>> masked = __riscv_vmand(v0.storage, m, N::value);
return last_true(masked);
}
}
Are you sure you need this one?
if( v0.get(i) ) return i;
}
return {};
}
I am not wild about doing this, to be honest. Is there really no good solution in the ISA? It should be something easy-ish.
You can do iota + mask + maximum, for example.
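The "iota + mask + maximum" idea can be modelled on a plain array to show why it finds the highest set lane. This is a scalar sketch of the technique under the assumption that lane indices are shifted by one so that 0 can mean "no lane set"; the function name and the `std::array` mask are illustrative, not the PR's code. Using a minimum over the set lanes instead would give the lowest set lane.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Returns the index of the highest set lane, or nullopt if none is set.
template<std::size_t N>
std::optional<std::size_t> last_true(std::array<bool, N> const& mask)
{
  std::size_t best = 0;
  for( std::size_t i = 0; i < N; ++i )
  {
    // iota (shifted by one) combined with the mask: inactive lanes become 0.
    const std::size_t lane = mask[i] ? i + 1 : 0;
    // maximum reduction across lanes.
    best = lane > best ? lane : best;
  }
  if( best == 0 ) return std::nullopt;
  return best - 1;
}
```

Each loop iteration is independent except for the final reduction, which is what makes the formulation a good fit for a vector index, masked merge and max-reduction sequence.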
auto bitnot_res = self_bitnot(v0_copy);
return self_bitand(bitnot_res, v1);
}
}
I suspect this file can be deleted
Looked a bit more.
So far I am not convinced about riscv_rvv_dyn_ and using length multipliers in code. Let's first clean up everything else and then come back to this.
to_return &= to_clean.to_ullong();
return to_return;
}
Oof, not amazing. But I guess you can't do anything about it. Maybe we can just not have this method at all.
if constexpr( out_lmul == 4 ) return __riscv_vlmul_ext_u8m4(a);
if constexpr( out_lmul == 8 ) return __riscv_vlmul_ext_u8m8(a);
}
}
On a surface level, I don't think that extract should be used in bit_cast.
if constexpr( is_aggregated_v<abi_t<T, N>> || is_aggregated_v<abi_t<U, M>> ) static_assert(false);
if constexpr( is_aggregated_v<abi_t<T, typename N::combined_type>>
|| is_aggregated_v<abi_t<U, M>> )
static_assert(false);
Aggregated won't be here, because it is not rvv_abi.
Also, if sizeof(T) * N == sizeof(U) * M, it should work.
EVE_FORCEINLINE wide<U, M>
bit_cast_(EVE_SUPPORTS(rvv_), wide<T, N> const& x, as<wide<U, M>> const& to_as) noexcept
requires rvv_abi<abi_t<T, N>> && rvv_abi<abi_t<U, M>> && same_wide_size<T, N, U, M>
&& (sizeof(T) * N::value > sizeof(U) * M::value)
You can only bit_cast if sizeof(T) * N::value == sizeof(U) * M::value, so this overload is invalid.
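The size rule stated here can be expressed as a compile-time check. The sketch below models a wide as a `std::array` and is only an illustration of the constraint; `bit_cast_lanes` is a hypothetical name and not EVE's `bit_cast`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Reinterprets N lanes of T as M lanes of U. The total byte size must match,
// otherwise the call does not compile.
template<typename U, std::size_t M, typename T, std::size_t N>
std::array<U, M> bit_cast_lanes(std::array<T, N> const& x)
{
  static_assert(sizeof(T) * N == sizeof(U) * M, "bit_cast requires equal total size");
  static_assert(std::is_trivially_copyable_v<T> && std::is_trivially_copyable_v<U>,
                "lanes must be trivially copyable");
  std::array<U, M> out;
  std::memcpy(out.data(), x.data(), sizeof(U) * M);
  return out;
}
```

For instance, `bit_cast_lanes<std::uint8_t, 8>` accepts an array of two `std::uint32_t`, while asking for 4 or 16 bytes is rejected at compile time, which mirrors why an overload constrained on "greater than" cannot be a valid bit_cast.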
to_mask(rvv_ const&, logical<wide<T, N>> p) noexcept
{
return bit_cast(p.bits(), as<typename logical<wide<T, N>>::mask_type> {});
}
Why this overload when the previous overload exists? It should be one or the other, unless I am missing something.
{
return self = __riscv_vmul(self, static_cast<T>(other), N::value);
}
if constexpr( match(c, category::float_) )
You can just use else for the two cases; it's fine.
RVV_BIT(self_bitand, __riscv_vand)
RVV_BIT(self_bitxor, __riscv_vxor)
RVV_BIT(self_bitor, __riscv_vor)
this macro gotta go
i_t to_cast_res = __riscv_vnot(self_cast, N::value);
self = bit_cast(to_cast_res, as(self));
return self;
}
if constexpr (match(cat, unsigned_)) {
__riscv_vnot
} else {
// to unsigned and recurse
}
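The "handle unsigned directly, otherwise convert to unsigned and recurse" shape from the pseudocode above can be shown on scalars. This is a portable sketch under stated assumptions: `uint_of` and `bit_not` are hypothetical names, `~` stands in for `__riscv_vnot`, and `std::memcpy` stands in for the library's bit_cast.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Maps a byte size to the unsigned integer of that size.
template<std::size_t Bytes> struct uint_of;
template<> struct uint_of<1> { using type = std::uint8_t; };
template<> struct uint_of<2> { using type = std::uint16_t; };
template<> struct uint_of<4> { using type = std::uint32_t; };
template<> struct uint_of<8> { using type = std::uint64_t; };

template<typename T> T bit_not(T x)
{
  if constexpr( std::is_unsigned_v<T> )
  {
    // Base case: the only place that performs the actual operation.
    return static_cast<T>(~x);
  }
  else
  {
    // Signed integers and floats: reinterpret as unsigned, recurse, cast back.
    using u_t = typename uint_of<sizeof(T)>::type;
    u_t bits;
    std::memcpy(&bits, &x, sizeof(T));
    bits = bit_not(bits);
    std::memcpy(&x, &bits, sizeof(T));
    return x;
  }
}
```

With this structure only one branch ever touches the intrinsic, and every other element type reuses it through a pair of casts.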
constexpr auto c = categorize<wide<T, N>>();
if constexpr( match(c, category::uint_) )
{
auto shift_casted = convert(shift, as<as_integer_t<U, unsigned>>());
this is the same between branches
RVV_LOGIC(self_geq, __riscv_vmsge, __riscv_vmsgeu, __riscv_vmfge)
RVV_LOGIC(self_leq, __riscv_vmsle, __riscv_vmsleu, __riscv_vmfle)
RVV_LOGIC(self_eq, __riscv_vmseq, __riscv_vmseq, __riscv_vmfeq)
RVV_LOGIC(self_neq, __riscv_vmsne, __riscv_vmsne, __riscv_vmfne)
Please, no macros like this. We can figure out a template to do this if needed.
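One possible template shape for this is a single function parameterised by the comparison and the element type, which picks the signed, unsigned or floating-point variant with `if constexpr`. The sketch below only returns the name of the intrinsic that would be chosen, using the names listed in the macro invocations above, so that it compiles without an RVV toolchain; `cmp_op` and `pick_intrinsic` are hypothetical.

```cpp
#include <string_view>
#include <type_traits>

enum class cmp_op { geq, leq, eq, neq };

// Selects the per-category intrinsic name for a comparison. A real backend
// would call the intrinsic in each branch instead of returning its name.
template<cmp_op Op, typename T>
constexpr std::string_view pick_intrinsic()
{
  constexpr bool is_fp   = std::is_floating_point_v<T>;
  constexpr bool is_uint = std::is_unsigned_v<T>;
  if constexpr( Op == cmp_op::geq )
    return is_fp ? "__riscv_vmfge" : is_uint ? "__riscv_vmsgeu" : "__riscv_vmsge";
  else if constexpr( Op == cmp_op::leq )
    return is_fp ? "__riscv_vmfle" : is_uint ? "__riscv_vmsleu" : "__riscv_vmsle";
  else if constexpr( Op == cmp_op::eq )
    return is_fp ? "__riscv_vmfeq" : "__riscv_vmseq";
  else
    return is_fp ? "__riscv_vmfne" : "__riscv_vmsne";
}

static_assert(pick_intrinsic<cmp_op::geq, unsigned>() == "__riscv_vmsgeu", "unsigned geq");
static_assert(pick_intrinsic<cmp_op::eq, float>() == "__riscv_vmfeq", "float eq");
```

The table that the macro arguments encoded is now ordinary code, so adding a comparison means adding one branch rather than another macro invocation.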
Hello there! I've added initial support for the RISC-V vector extension in the EVE library. All unit tests have successfully passed with this patch. To verify, please follow these steps:
Obtain the riscv-gcc-13 toolchain (for sysroot). Set the path to it as the environment variable RISCV_GCC.
Use clang with the patch currently under review ([https://github.com/llvm/llvm-project/pull/76510]). To build it, navigate to the llvm directory:
Add it to your PATH.
After completing the above steps, you should be able to run unit tests by specifying cmake/toolchain/clang.rvv128.cmake as the toolchain file.
Your contributions are highly appreciated! If you encounter any issues or have questions, feel free to reach out. Thanks for your valuable work!