Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pseudo-inverse of symmetric matrices using SVD / Utility for moving least squares #950

Merged
merged 20 commits into from
Sep 27, 2023

Conversation

mrlag31
Copy link
Contributor

@mrlag31 mrlag31 commented Sep 11, 2023

In an attempt to break-up #946, this PR focuses only on the pseudo inverse used in the moving least squares algorithm. It includes the pseudo inverse itself in src/interpolation folder, tests and an iterator for multi-dimensional views (for use in boost tests) a macro to test equality on md views

@aprokop aprokop added the enhancement New feature or request label Sep 11, 2023
symmetricPseudoInverseSVDSerialKernel(InOutMatrix &io, SMatrix &s, UMatrix &u)
{
using value_t = typename InOutMatrix::non_const_value_type;
std::size_t const size = io.extent(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is io.extent(1) == size a precondition?
Presumably s and u must also be properly sized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and yes. io, s and u must have the same size and this should be guaranteed by the host function. But due to the fact I cannot use assert in kernels, I skipped checking it. (although, speaking of it, I could use KOKKOS_ASSERT)

@@ -235,6 +235,12 @@ target_link_libraries(ArborX_Test_BoostAdapters.exe PRIVATE ArborX Boost::unit_t
target_compile_definitions(ArborX_Test_BoostAdapters.exe PRIVATE BOOST_TEST_DYN_LINK)
add_test(NAME ArborX_Test_BoostAdapters COMMAND ArborX_Test_BoostAdapters.exe)

add_executable(ArborX_Test_InterpDetailsSymmPInvSVD.exe tstInterpDetailsSymmPInvSVD.cpp utf_main.cpp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably change all the relevant names ArborX_Test_DetailsInterpolationSVD. This would emphasize that it is a Details test, as it starts with ArborX_Test_Details. We don't need to emphasize what exact SVD is done, so may omit that. And imho, Interpolation is better than Interp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will stick with InterpDetailsSVD because it mimics the location and name of the tested file (in src/interpolation/details). But maybe would it be better to have ArborX_InterpDetailsSymmetricPseudoInverseSVD.hpp elsewhere?

Comment on lines 108 to 110
value_t const a = sigma(p, p);
value_t const b = sigma(p, q);
value_t const c = sigma(q, q);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
value_t const a = sigma(p, p);
value_t const b = sigma(p, q);
value_t const c = sigma(q, q);
auto const a = sigma(p, p);
auto const b = sigma(p, q);
auto const c = sigma(q, q);

// | b | c | | 0 | y |
// +---+---+ +---+---+

value_t cos, sin, x, y;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a bad idea to name variables cos and sin. If one puts using namespace std; somewhere above, this will break.

if (a == c)
{
cos = Kokkos::sqrt(value_t(2)) / 2;
sin = Kokkos::sqrt(value_t(2)) / 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dunno if the compiler is smart enough to not do this computation twice.

Suggested change
sin = Kokkos::sqrt(value_t(2)) / 2;
sin = cos;

Comment on lines 136 to 137
value_t const u = (2 * b) / (a - c);
value_t const v = 1 / Kokkos::sqrt(u * u + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
value_t const u = (2 * b) / (a - c);
value_t const v = 1 / Kokkos::sqrt(u * u + 1);
auto const u = (2 * b) / (a - c);
auto const v = 1 / Kokkos::sqrt(u * u + 1);


template <typename ExecutionSpace, typename InvMatrices>
void symmetricPseudoInverseSVD(ExecutionSpace const &space,
InvMatrices &inv_matrices)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name InvMatrices is misleading, given that it is both input and output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have renamed the type InOutMatrices, as the inverse is already implied by the function name.

{
isSquareMatrix(mat);
using value_t = typename Matrix::non_const_value_type;
int const size = mat.extent(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this closer to the place you first use it.

Comment on lines +63 to +68
struct
{
value_t max = 0;
int row = 0;
int col = 0;
} result;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't even know you could return an object from a local unnamed class.
Sure why not given that these are implementation details.

test/ArborX_EnableViewComparison.hpp Outdated Show resolved Hide resolved
test/ArborX_EnableViewComparison.hpp Outdated Show resolved Hide resolved
@mrlag31 mrlag31 marked this pull request as ready for review September 15, 2023 13:50
Comment on lines 171 to 173
int i = 0;
for (; i < p; i++)
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int i = 0;
for (; i < p; i++)
{
for (int i = 0; i < p; i++)
{

Comment on lines 180 to 181
i++;
for (; i < q; i++)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
i++;
for (; i < q; i++)
for (int i = p + 1; i < q; i++)

Comment on lines 189 to 190
i++;
for (; i < size; i++)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
i++;
for (; i < size; i++)
for (int i = q + 1; i < size; i++)

Comment on lines +200 to +206
for (int i = 0; i < size; i++)
{
auto const u_ip = U(i, p);
auto const u_iq = U(i, q);
U(i, p) = cos_theta * u_ip + sin_theta * u_iq;
U(i, q) = -sin_theta * u_ip + cos_theta * u_iq;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use hierarchical parallelism for these for-loops or do we assume size to be small enough that it doesn't matter? Or do we have sufficient parallelism already in the outer loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make a lot of sense to use hierarchical parallelism here (and in other loops) as well as using scratch pads for the auxiliary matrices. However, because this is part of the MLS algorithm, it might not be a bottleneck and it is best to avoid early optimization.

However, the following should be a logical distribution:

  • Each team treats a single matrix
  • Threads/Vectors handle the loops and reductions
  • Use the team scratch pad for ES and U

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.

// +---+---+ +---+---+

value_t cos_theta, sin_theta, x, y;
if (a == c)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need to have a tolerance here? What happens if a = 1 and c = 1+1e-15?

Copy link
Contributor Author

@mrlag31 mrlag31 Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A tolerance is not really needed as the algorithm behaves correctly even if the values are very close together. This case is here to avoid a "true" division by zero.

symmetricPseudoInverseSVDSerialKernel(A, ES, U);
});
}
else if constexpr (InOutMatrices::rank == 2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a case where we care about using SVD for a single matrix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, at least for now. This was added because the interface of the kernel itself takes only a single matrix. However, I will probably remove it as this might be the source of why cuda+nvcc 11.0.3 fails.

}

if (!same_dim_size)
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you don't need this. BOOST_TEST_REQUIRE does what you want (to check).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, as this will abort the current test and not check the other matrices of the same test.

}
makeCase<ExecutionSpace, view_t, double[128][128]>(space, 0, mat, inv, 128);

// Case for invertible 128x128 matrix
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to add a test in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I do find an way to easily build a non-trivial 128x128 matrix and its inverse. Since I added this comment I did not found one so I will remove that command. This test could be extended if I ever find said matrix.

@masterleinad
Copy link
Collaborator

Windows CI is reporting

LINK : fatal error LNK1104: cannot open file 'D:\a\ArborX\ArborX\build\test\headers_self_contained\Debug\ArborX_HeaderSelfContained_interpolation_details_ArborX_InterpDetailsSymmetricPseudoInverseSVD_hpp.exe' [D:\a\ArborX\ArborX\build\test\headers_self_contained\ArborX_HeaderSelfContained_interpolation_details_ArborX_InterpDetailsSymmetricPseudoInverseSVD_hpp.vcxproj]

and all GPU builds seem to be stuck in ArborX_Test_InterpDetailsSVD.

for (int i = 0; i < size; i++)
max_eigen = Kokkos::max(Kokkos::abs(ES(i, i)), max_eigen);

// We inverse the diagonal of ES, except if "0" is found
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// We inverse the diagonal of ES, except if "0" is found
// We invert the diagonal of ES, except if "0" is found

typename ESMatrix::value_type> &&
std::is_same_v<typename ESMatrix::value_type,
typename UMatrix::value_type>,
"Each input matrix must have the same value type");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Each input matrix must have the same value type");
"All input matrices must have the same value type");

// Pseudo-inverse of symmetric matrices using SVD
// We must find U, E (diagonal and positive) and V such that A = U.E.V^T
// We also suppose, as the input, that A is symmetric, so U = SV where S is
// a sign matrix (only 1 or -1 in the diagonal, 0 elsewhere).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// a sign matrix (only 1 or -1 in the diagonal, 0 elsewhere).
// a sign matrix (only 1 or -1 on the diagonal, 0 elsewhere).

}

static constexpr value_t epsilon = Kokkos::Experimental::epsilon_v<float>;
while (true)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the CI is stuck in this loop?

// | ES(q, p) | ES(q, q) | | b | c |
// +----------+----------+ +---+---+

// Lets compute x, y and theta such that
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Lets compute x, y and theta such that
// Let's compute x, y and theta such that

y = a + c - x;
}

// Now lets compute the following new values for U and ES
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Now lets compute the following new values for U and ES
// Now let's compute the following new values for U and ES

Comment on lines +200 to +206
for (int i = 0; i < size; i++)
{
auto const u_ip = U(i, p);
auto const u_iq = U(i, q);
U(i, p) = cos_theta * u_ip + sin_theta * u_iq;
U(i, q) = -sin_theta * u_ip + cos_theta * u_iq;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.

}
}

// We compute the max to get a range of the invertible eigen values
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// We compute the max to get a range of the invertible eigen values
// We compute the max to get a range of the invertible eigenvalues

@mrlag31
Copy link
Contributor Author

mrlag31 commented Sep 20, 2023

Windows CI is reporting

LINK : fatal error LNK1104: cannot open file 'D:\a\ArborX\ArborX\build\test\headers_self_contained\Debug\ArborX_HeaderSelfContained_interpolation_details_ArborX_InterpDetailsSymmetricPseudoInverseSVD_hpp.exe' [D:\a\ArborX\ArborX\build\test\headers_self_contained\ArborX_HeaderSelfContained_interpolation_details_ArborX_InterpDetailsSymmetricPseudoInverseSVD_hpp.vcxproj]

I don't know why Windows cannot build this. The logs show no error. I am wondering if the headers must be referenced somewhere else.

and all GPU builds seem to be stuck in ArborX_Test_InterpDetailsSVD.

It seems more likely that, in job 12, only cuda 11.0.3 + clang stopped on that test due to the job time limit (3h) and the others copied the logs of that test.

Comment on lines 42 to 49
auto const val = Kokkos::abs(mat(i, j) - mat(j, i));
auto const ref = Kokkos::abs(mat(i, j));
static constexpr value_t epsilon =
Kokkos::Experimental::epsilon_v<float>;
if (ref == value_t(0) && val > epsilon)
return false;
if (ref != value_t(0) && val / ref > epsilon)
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why the check is using tolerances. I think the input matrix should always be symmetric without tolerances, if constructed correctly. So I would just do

Suggested change
auto const val = Kokkos::abs(mat(i, j) - mat(j, i));
auto const ref = Kokkos::abs(mat(i, j));
static constexpr value_t epsilon =
Kokkos::Experimental::epsilon_v<float>;
if (ref == value_t(0) && val > epsilon)
return false;
if (ref != value_t(0) && val / ref > epsilon)
return false;
if (mat(i, j) != mat(j, i))
return false;

If that's not the case, I would rename the function to include Tol in the name, and provide an epsilon function argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change to the tolerance-free version, as it forces users to have their matrices properly symmetric.

#include "BoostTest_CUDA_clang_workarounds.hpp"
#include <boost/test/unit_test.hpp>

namespace axid = ArborX::Interpolation::Details;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does axid stand for?

Copy link
Contributor Author

@mrlag31 mrlag31 Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It stands for ArborX Interpolation Details. It is simply a namespace shortcut. But because it is not used in a lot of places, I will use the full namespace name.

Comment on lines 26 to 28
host_view src("src", m, n, n);
host_view ref("ref", m, n, n);
U inv("inv", m, n, n);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
host_view src("src", m, n, n);
host_view ref("ref", m, n, n);
U inv("inv", m, n, n);
host_view src("Testing::src", m, n, n);
host_view ref("Testing::ref", m, n, n);
U inv("Testing::inv", m, n, n);

@aprokop
Copy link
Contributor

aprokop commented Sep 20, 2023

It seems more likely that, in job 12, only cuda 11.0.3 + clang stopped on that test due to the job time limit (3h) and the others copied the logs of that test.

I'm confused about what's going on with the testing.

@mrlag31
Copy link
Contributor Author

mrlag31 commented Sep 22, 2023

The windows CI fails because of the location of the added header (two folder deep from src). Jenkins passes with no issues.

@masterleinad
Copy link
Collaborator

The windows CI fails because of the location of the added header (two folder deep from src). Jenkins passes with no issues.

There are still some warnings to fix, see https://cloud.cees.ornl.gov/jenkins-ci/job/ArborX/job/PR-950/19/gcc/.

@masterleinad
Copy link
Collaborator

We need to deal with the failing Windows CI one way or the other. As I said earlier, I think it's OK to disable the header self-containment test for that CI build and open an issue for it.

@mrlag31
Copy link
Contributor Author

mrlag31 commented Sep 25, 2023

I have removed the self-containment test from the Windows CI and it does pass, and made an issue #953. CUDA 11.5 + MPI pipeline did not start but all other succeeded.

Copy link
Collaborator

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good enough.

@aprokop aprokop merged commit ae00f3d into arborx:master Sep 27, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants