Hello everyone,

As we continue to develop our library, a critical decision lies ahead of us: should we integrate GPU support using Nvidia CUDA or OpenCL? We have experimented with both to some extent, but now it's time to make a strategic choice.
Our Experience So Far
CUDA: We've noticed that CUDA, being specific to Nvidia GPUs, offers excellent optimization and performance. Its ecosystem, documentation, and community support are robust. However, it limits our library's usability to only those with Nvidia GPUs.
OpenCL: On the other hand, OpenCL provides a more flexible approach, being cross-platform and able to run on a variety of hardware including Nvidia and AMD GPUs, and even CPUs. But, it might not be as optimized as CUDA for specific Nvidia hardware.
We have already implemented some code for both CUDA and OpenCL.

The CUDA-related code is on the main branch: search for "cuda" and take a look at kernels/cuda. For OpenCL, @acsolle66 contributed 3753e07.

We should compare the kernel files; from a development point of view, this is the most difficult part.
Seeking Your Input
We value the community's input and would love to hear your thoughts:
Experiences and Preferences: What have been your experiences with CUDA and OpenCL, especially in the context of Java-based ML libraries?
Technical Insights: Any technical insights or benchmarks that could guide our decision?
Long-Term Perspective: From a long-term development perspective, which option do you believe is more sustainable?
Your feedback is crucial in helping us make an informed decision. We are excited to build a library that not only meets our technical needs but also aligns with our community's preferences and requirements.
Looking forward to your thoughts and insights!
Example: Matrix Multiplication
Kernel for OpenCL
// SIZE is the number of columns of A and is assumed to be injected as a
// compile-time constant at kernel build time (e.g. via a "-D SIZE=..."
// build option), so it must match the aCols argument.
__kernel void matrixProduct(__global double *A, __global double *B, __global double *C,
                            const int aCols, const int bCols) {
    int j, k;
    int i = get_global_id(0); // one work-item per row of A

    // Copy row i of A into private memory to avoid repeated global reads.
    // This must be double, not int, or the values would be truncated.
    double localCopyA[SIZE];
    for (k = 0; k < SIZE; k++) {
        localCopyA[k] = A[i * SIZE + k];
    }

    double result;
    for (j = 0; j < bCols; j++) {
        result = 0;
        for (k = 0; k < aCols; k++) {
            result += localCopyA[k] * B[k * bCols + j];
        }
        C[i * bCols + j] = result;
    }
}
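For comparing kernel outputs against a known-good result, it may help to have the same row-parallel structure in plain Java: each parallel task below plays the role of one OpenCL work-item and computes one row of C. This is only a sketch for validation; the class and method names are our own, not part of the library.

```java
import java.util.stream.IntStream;

public class MatMulReference {
    // CPU reference with the same structure as the OpenCL kernel:
    // one parallel task per row of A, row copied out first, then the
    // j/k loops over columns of B and columns of A.
    static double[] matrixProduct(double[] a, double[] b,
                                  int aRows, int aCols, int bCols) {
        double[] c = new double[aRows * bCols];
        IntStream.range(0, aRows).parallel().forEach(i -> {
            // Local copy of row i of A, mirroring localCopyA in the kernel.
            double[] rowA = new double[aCols];
            System.arraycopy(a, i * aCols, rowA, 0, aCols);
            for (int j = 0; j < bCols; j++) {
                double result = 0;
                for (int k = 0; k < aCols; k++) {
                    result += rowA[k] * b[k * bCols + j];
                }
                c[i * bCols + j] = result;
            }
        });
        return c;
    }

    public static void main(String[] args) {
        // 2x2 example: [[1,2],[3,4]] * [[5,6],[7,8]] = [[19,22],[43,50]]
        double[] a = {1, 2, 3, 4};
        double[] b = {5, 6, 7, 8};
        System.out.println(java.util.Arrays.toString(
                matrixProduct(a, b, 2, 2, 2))); // [19.0, 22.0, 43.0, 50.0]
    }
}
```

Running the GPU kernel and this reference on the same random inputs and comparing element-wise (with a small floating-point tolerance) would give us a portable correctness check for both backends.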
Kernel for CUDA
extern "C"
__global__ void matrixMultiply(double *A, double *B, double *C,
                               int numARows, int numAColumns, int numBColumns) {
    // Each thread computes one element of C, addressed via a 2D grid.
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < numARows && col < numBColumns) { // guard against partial edge blocks
        double sum = 0;
        for (int i = 0; i < numAColumns; ++i) {
            sum += A[row * numAColumns + i] * B[i * numBColumns + col];
        }
        C[row * numBColumns + col] = sum;
    }
}
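Unlike the OpenCL kernel, the CUDA kernel maps one thread to each element of C via a 2D grid, so the host side must choose a block shape and round the grid up to cover the whole output matrix. A minimal sketch of that launch-geometry calculation (the 16×16 block shape here is an illustrative assumption, not a value from our code):

```java
public class LaunchGeometry {
    // Ceiling division: how many blocks of size `block` are needed
    // to cover `n` elements.
    static int ceilDiv(int n, int block) {
        return (n + block - 1) / block;
    }

    public static void main(String[] args) {
        int numARows = 1000, numBColumns = 1000;
        int blockX = 16, blockY = 16; // assumed block shape
        int gridX = ceilDiv(numBColumns, blockX); // columns along x
        int gridY = ceilDiv(numARows, blockY);    // rows along y
        // The kernel's bounds check (row < numARows && col < numBColumns)
        // deactivates the surplus threads in the partial edge blocks.
        System.out.println(gridX + " x " + gridY
                + " blocks of " + blockX + " x " + blockY); // 63 x 63 blocks of 16 x 16
    }
}
```

Whichever backend we pick, this rounding-up plus in-kernel bounds check is the standard way to handle matrix dimensions that are not multiples of the block size.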