
Reducing memory allocation using lazy evaluation #29

Open
Shuenhoy opened this issue Aug 13, 2024 · 5 comments
Labels
enhancement New feature or request


@Shuenhoy

Thank you for this project! The .NET world really needs good numerics and linear algebra libraries these days.

The README indicates that the impact of memory allocation on performance has already been noticed, and in-place operations are suggested as a remedy.
In principle, such in-place operations do avoid memory allocations. However, they require the coder to manually arrange the order of operations for a complex chained expression, and they make the code harder to read.

The popular libraries in the C++ world (e.g. Eigen3 and Blaze) use a lazy evaluation technique to handle such difficulties. For example, `var X = A.Add(B)` produces a `MatAdd<Mat, Mat>` instead of a `Mat`, and `var X = A.Add(B.MultiplyBy(C))` produces a `MatAdd<Mat, MatMul<Mat, Mat>>`, where `MatAdd` and `MatMul` only record the operations to be performed (without actually performing them) and are something like a `ref struct` that needs no heap allocation.

Finally, `X.Eval()` allocates a new matrix, while `X.EvalTo(M)` uses the pre-allocated `M`. Only at this point is the lazy evaluation actually performed, and all the intermediate temporary allocations can be avoided.

(This may be simplified by the upcoming Extension Everything feature; e.g. `var R = X.Eval()` might become just `Mat R = X`, but I am not sure.)

@sinshu
Owner

sinshu commented Aug 13, 2024

Thanks for the kind comment 😊

Regarding the lazy evaluation technique, while it's an interesting topic, it's difficult for me to implement and maintain due to my current workload 😖

Anyway, I really hope that interest in numerical computation and ML expands within the .NET community.

@sinshu sinshu added the enhancement New feature or request label Aug 13, 2024
@MarsRover75

MarsRover75 commented Aug 28, 2024

> The popular libraries in the C++ world (e.g. Eigen3 and Blaze) use a lazy evaluation technique to handle such difficulties. For example, `var X = A.Add(B)` produces a `MatAdd<Mat, Mat>` instead of a `Mat`, and `var X = A.Add(B.MultiplyBy(C))` produces a `MatAdd<Mat, MatMul<Mat, Mat>>`, where `MatAdd` and `MatMul` only record the operations to be performed (without actually performing them) and are something like a `ref struct` that needs no heap allocation.

IMHO, in C# any lazy evaluation will imply either boxing of value types or allocation of delegates, which in both cases means heap allocations (and the impossibility of using ref structs).
Maybe I'm wrong, but that's exactly the reason there are so many constraints on Span usage.

With the current approach, all intermediate objects are structs, so they are stack-allocated by the compiler, which is cheap.

Something like this could be achieved with code generators, but that's another level of effort.

UPD: There's some hope with the upcoming .NET 10, though: https://github.com/dotnet/runtime/issues/104936.
But by that time I hope we'll also have a more usable System.Numerics.Tensors.Tensor.

@Shuenhoy
Author

Shuenhoy commented Aug 28, 2024

> > The popular libraries in the C++ world (e.g. Eigen3 and Blaze) use a lazy evaluation technique to handle such difficulties. For example, `var X = A.Add(B)` produces a `MatAdd<Mat, Mat>` instead of a `Mat`, and `var X = A.Add(B.MultiplyBy(C))` produces a `MatAdd<Mat, MatMul<Mat, Mat>>`, where `MatAdd` and `MatMul` only record the operations to be performed (without actually performing them) and are something like a `ref struct` that needs no heap allocation.
>
> IMHO, in C# any lazy evaluation will imply either boxing of value types or allocation of delegates, which in both cases means heap allocations (and the impossibility of using ref structs). Maybe I'm wrong, but that's exactly the reason there are so many constraints on Span usage.
>
> With the current approach, all intermediate objects are structs, so they are stack-allocated by the compiler, which is cheap.
>
> Something like this could be achieved with code generators, but that's another level of effort.
>
> UPD: There's some hope with the upcoming .NET 10, though: https://github.com/dotnet/runtime/issues/104936. But by that time I hope we'll also have a more usable System.Numerics.Tensors.Tensor.

Here by "lazy evaluation" I do not refer to any specific existing implementation like `System.Lazy<T>`, which indeed introduces heap allocations as you described. Instead, I mean a general evaluation strategy, and some extra work may be needed to avoid the heap allocations. I have made some preliminary experiments: https://github.com/Shuenhoy/DotnetETExp. In short, I let the structs implement some interfaces; when a struct is passed to a generic method with an interface constraint, boxing can be avoided. The approach in that repo is not good enough yet, though, as the struct may still be boxed accidentally. C# 13 will allow ref structs to implement interfaces, which may improve the situation.

As for System.Numerics.Tensors, of course I also hope it will be good enough to use. However, AFAIK Microsoft is not particularly experienced with numerics, so I am not confident they will get it right. In fact, I have already seen some severe criticism (arguing that the whole design is problematic) of System.Numerics.Tensors, and of System.Numerics in general. You could use a translator to read https://zhuanlan.zhihu.com/p/713157309 if you are interested.

@MarsRover75

My point was that when constructing any deferred-execution expression chain, you'll have to return something from its building blocks to get nice syntax.

You cannot return a ref struct as an interface it implements (a ref struct can't be converted to an instance of an interface it implements); those interfaces serve different goals.
So what will be returned instead? Some interface for a class (regular structs are boxed when referenced as an interface they implement) that represents the expression being built, or a stateful delegate?
All of this leads to heap allocation at some point.

@Shuenhoy
Author

Shuenhoy commented Aug 28, 2024

> My point was that when constructing any deferred-execution expression chain, you'll have to return something from its building blocks to get nice syntax.
>
> You cannot return a ref struct as an interface it implements (a ref struct can't be converted to an instance of an interface it implements); those interfaces serve different goals. So what will be returned instead? Some interface for a class (regular structs are boxed when referenced as an interface they implement) that represents the expression being built, or a stateful delegate? All of this leads to heap allocation at some point.

Why is returning the raw ref struct a problem? I cannot see why we would need to convert it to an interface or delegate at all.

For example, suppose the operation is defined as:

```csharp
struct AddGeneric<V1, V2> : IVecExp
    where V1 : IVecExp
    where V2 : IVecExp
{
    V1 left;
    V2 right;
}
```

The expression `v1.Add(v2).Add(v3)` will have a value of type `AddGeneric<AddGeneric<Vec, Vec>, Vec>`, where the inner `AddGeneric<Vec, Vec>` does not need to be boxed, because the type of `left` in `AddGeneric` is declared as a generic parameter with constraints, rather than as an interface type like:

```csharp
struct AddInterface : IVecExp
{
    IVec left;
    IVec right;
}
```

`AddInterface` will lead to the heap allocations you are concerned about, which we want to avoid, but `AddGeneric` will not.
