Basic arithmetic quality-of-life array methods? #740

iliya-malecki · 2024-04-24T23:11:42Z


Hi there! I am looking into options to speed up some hypothetical numpy code. I don't have anything in particular in mind; I'm just preparing for the future. I stumbled upon this wonderful package, but quickly realized it was developed by people who know what they are doing, for people who know what they are doing. I'm not one of those: I do ML, so I'm proud just to know what a register is. What I'd like is some quality-of-life functionality for mere mortals, so to speak.

What I envision is the ability to send numpy arrays to a GPU device and receive some overgrown smart pointer in return that knows how to schedule unary and binary math operations. Even better if it lets me keep those arrays on the device, do in-place operations, and retrieve only the final results of long chains of math!

The questions are:

- Does this exist in pyopencl? I didn't find an answer in the docs: they don't have many examples, and for 90% of the explanations I'm not sure I understand the intended usage.
- Could this in principle be implemented easily once I'm familiar with the exposed functionality, or are there fundamental footguns I don't understand?
- Has someone attempted this? Surely someone has, but given that the only well-known GPU accelerator for numpy-style code is CuPy, I have an ominous feeling about this.

Answered by inducer

Apr 25, 2024


inducer · 2024-04-25T01:13:25Z

Maintainer

There's https://documen.tician.de/pyopencl/array.html. But beware: while this code can store multi-dimensional metadata just like numpy, arrays must be contiguous, and non-scalar broadcasting and arrays with differing strides will not work.


iliya-malecki Apr 25, 2024
Author

I couldn't find many examples of how to work with that! For example, self.set accepts an ary, which makes zero sense to me, and there is no explanation as to why.

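Do I understand correctly that

import pyopencl.array as cl_array  # queue: an existing pyopencl CommandQueue
gpu_backed_ary = cl_array.empty(queue, ary.shape, ary.dtype)
gpu_backed_ary.set(ary)  # copy the host numpy array into the device array
gpu_backed_ary2 = cl_array.empty(queue, ary2.shape, ary2.dtype)
gpu_backed_ary2.set(ary2)
rse = ((gpu_backed_ary - gpu_backed_ary2)**2)**0.5  # evaluated on the device
rse.get()  # copy the result back to the host as a numpy array

is the right workflow for doing multiple operations on the GPU and then getting the data back?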

inducer Apr 25, 2024
Maintainer

Here's a simple example:

https://github.com/inducer/pyopencl/blob/58717882517e49b451980437074f560cdb823a85/examples/demo_array.py

Generally, that's OK, though I would strongly suggest to_device over set.

Answer selected by iliya-malecki

iliya-malecki Apr 25, 2024
Author

Thanks! The first time I looked at the example file, I got scared by the shader code and didn't realize there were arithmetic operations at the bottom.
