Hello everyone,

As we continue to develop our library, a critical decision lies ahead of us: should we integrate GPU support using Nvidia CUDA or OpenCL? We have experimented with both to some extent, but now it's time to make a strategic choice.
Our Experience So Far
CUDA: We've noticed that CUDA, being specific to Nvidia GPUs, offers excellent optimization and performance. Its ecosystem, documentation, and community support are robust. However, it limits our library's usability to only those with Nvidia GPUs.
OpenCL: On the other hand, OpenCL provides a more flexible approach, being cross-platform and able to run on a variety of hardware including Nvidia and AMD GPUs, and even CPUs. But, it might not be as optimized as CUDA for specific Nvidia hardware.
We have already implemented some code for both CUDA and OpenCL.

The CUDA-related code is on the main branch: search for "cuda" and take a look at kernels/cuda. For OpenCL, @acsolle66 contributed 3753e07.

We should compare the kernel files; from a development point of view, this is the most difficult part.
Seeking Your Input
We value the community's input and would love to hear your thoughts:
Experiences and Preferences: What have been your experiences with CUDA and OpenCL, especially in the context of Java-based ML libraries?
Technical Insights: Any technical insights or benchmarks that could guide our decision?
Long-Term Perspective: From a long-term development perspective, which option do you believe is more sustainable?
Your feedback is crucial in helping us make an informed decision. We are excited to build a library that not only meets our technical needs but also aligns with our community's preferences and requirements.
Looking forward to your thoughts and insights!
Example: Matrix Multiplication
Kernel for OpenCL
// SIZE is the number of columns of A and is assumed to be injected as a
// compile-time constant at kernel build time (e.g. via a "-D SIZE=..."
// build option), so it must match the aCols argument.
__kernel void matrixProduct(__global double *A, __global double *B, __global double *C,
                            const int aCols, const int bCols) {
    int j, k;
    int i = get_global_id(0); // one work-item per row of A

    // Copy row i of A into private memory to avoid repeated global reads.
    // This must be double, not int, or the values would be truncated.
    double localCopyA[SIZE];
    for (k = 0; k < SIZE; k++) {
        localCopyA[k] = A[i * SIZE + k];
    }

    double result;
    for (j = 0; j < bCols; j++) {
        result = 0;
        for (k = 0; k < aCols; k++) {
            result += localCopyA[k] * B[k * bCols + j];
        }
        C[i * bCols + j] = result;
    }
}
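For comparing kernel outputs against a known-good result, it may help to have the same row-parallel structure in plain Java: each parallel task below plays the role of one OpenCL work-item and computes one row of C. This is only a sketch for validation; the class and method names are our own, not part of the library.

```java
import java.util.stream.IntStream;

public class MatMulReference {
    // CPU reference with the same structure as the OpenCL kernel:
    // one parallel task per row of A, row copied out first, then the
    // j/k loops over columns of B and columns of A.
    static double[] matrixProduct(double[] a, double[] b,
                                  int aRows, int aCols, int bCols) {
        double[] c = new double[aRows * bCols];
        IntStream.range(0, aRows).parallel().forEach(i -> {
            // Local copy of row i of A, mirroring localCopyA in the kernel.
            double[] rowA = new double[aCols];
            System.arraycopy(a, i * aCols, rowA, 0, aCols);
            for (int j = 0; j < bCols; j++) {
                double result = 0;
                for (int k = 0; k < aCols; k++) {
                    result += rowA[k] * b[k * bCols + j];
                }
                c[i * bCols + j] = result;
            }
        });
        return c;
    }

    public static void main(String[] args) {
        // 2x2 example: [[1,2],[3,4]] * [[5,6],[7,8]] = [[19,22],[43,50]]
        double[] a = {1, 2, 3, 4};
        double[] b = {5, 6, 7, 8};
        System.out.println(java.util.Arrays.toString(
                matrixProduct(a, b, 2, 2, 2))); // [19.0, 22.0, 43.0, 50.0]
    }
}
```

Running the GPU kernel and this reference on the same random inputs and comparing element-wise (with a small floating-point tolerance) would give us a portable correctness check for both backends.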
Kernel for CUDA
extern "C"
__global__ void matrixMultiply(double *A, double *B, double *C,
                               int numARows, int numAColumns, int numBColumns) {
    // Each thread computes one element of C, addressed via a 2D grid.
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < numARows && col < numBColumns) { // guard against partial edge blocks
        double sum = 0;
        for (int i = 0; i < numAColumns; ++i) {
            sum += A[row * numAColumns + i] * B[i * numBColumns + col];
        }
        C[row * numBColumns + col] = sum;
    }
}
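Unlike the OpenCL kernel, the CUDA kernel maps one thread to each element of C via a 2D grid, so the host side must choose a block shape and round the grid up to cover the whole output matrix. A minimal sketch of that launch-geometry calculation (the 16×16 block shape here is an illustrative assumption, not a value from our code):

```java
public class LaunchGeometry {
    // Ceiling division: how many blocks of size `block` are needed
    // to cover `n` elements.
    static int ceilDiv(int n, int block) {
        return (n + block - 1) / block;
    }

    public static void main(String[] args) {
        int numARows = 1000, numBColumns = 1000;
        int blockX = 16, blockY = 16; // assumed block shape
        int gridX = ceilDiv(numBColumns, blockX); // columns along x
        int gridY = ceilDiv(numARows, blockY);    // rows along y
        // The kernel's bounds check (row < numARows && col < numBColumns)
        // deactivates the surplus threads in the partial edge blocks.
        System.out.println(gridX + " x " + gridY
                + " blocks of " + blockX + " x " + blockY); // 63 x 63 blocks of 16 x 16
    }
}
```

Whichever backend we pick, this rounding-up plus in-kernel bounds check is the standard way to handle matrix dimensions that are not multiples of the block size.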