Optimize Vulkan REPEAT performance #2
Commits on Jul 29, 2024
- 49164e6
- ggml : move c parameter comment to ggml_rope_ext (ggerganov#901) [29d87fc]
  This commit moves the comment for the c parameter from ggml_rope to ggml_rope_ext. The comment is currently incorrect as ggml_rope does not have a c parameter (freq_factors tensor).
  Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
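For reference, a hedged sketch of the two declarations this comment move distinguishes (signatures paraphrased from memory of ggml.h around this change, not copied from the header; treat parameter names and order as assumptions):

```c
struct ggml_context;
struct ggml_tensor;

// plain rope: no c parameter at all
struct ggml_tensor * ggml_rope(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        struct ggml_tensor  * b,
        int n_dims, int mode);

// extended rope: c is the optional freq_factors tensor (may be NULL),
// so the comment documenting c belongs on this declaration
struct ggml_tensor * ggml_rope_ext(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        struct ggml_tensor  * b,
        struct ggml_tensor  * c, // freq_factors
        int n_dims, int mode, int n_ctx_orig,
        float freq_base, float freq_scale, float ext_factor,
        float attn_factor, float beta_fast, float beta_slow);
```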
Commits on Aug 4, 2024
- vulkan : implement Stable Diffusion operators (ggerganov#904) [18703ad]
  * Fix Vulkan repeat op
  * Implement Vulkan concat op
  * Delete old Vulkan shader generator
  * Implement Vulkan im2col op
  * Implement Vulkan unary gelu_quick op
  * Implement Vulkan group_norm op
  * Implement Vulkan timestep_embedding op (see the sketch after this commit)
  * Implement Vulkan upscale op
  * Fix Vulkan vk_context tensor extra index issue
  * Fix Vulkan matmul shader parameter bug
  * Properly fix Vulkan matmul shader parameter bug
  * Add Vulkan ADD f16 + f32 -> f16 operator support
  * Implement Vulkan tanh op
  * Fix Vulkan group count too large Validation error on non-Nvidia GPUs
  * Throw error when too much memory is requested
  * Fix another Vulkan group count too large Validation error on non-Nvidia GPUs
  * Fix matmul MMQ condition
  * Implement Vulkan pad op
  * Fix Vulkan crash when tensor is used multiple times in a compute graph
  * Add Vulkan CONCAT f16 + f16 -> f16 op
  * Add Vulkan LEAKY_RELU op
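Of these, timestep_embedding is the operator specific to diffusion models. A minimal CPU sketch of the standard sinusoidal embedding it computes (a reference under the usual Stable Diffusion convention of max_period = 10000, not the actual shader code):

```c
#include <math.h>

// write the dim-wide sinusoidal embedding of timestep t into out
static void timestep_embedding_ref(float t, float * out, int dim) {
    const float max_period = 10000.0f;
    const int   half       = dim / 2;
    for (int i = 0; i < half; i++) {
        // frequencies decay geometrically from 1 down to 1/max_period
        const float freq = expf(-logf(max_period) * (float) i / (float) half);
        out[i]        = cosf(t * freq);
        out[half + i] = sinf(t * freq);
    }
    if (dim % 2 == 1) {
        out[dim - 1] = 0.0f; // zero-pad odd embedding sizes
    }
}
```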
Commits on Aug 7, 2024
- 1f2b80a
- 444e896
- 6c71d5a
Commits on Aug 8, 2024
- feat: Support Moore Threads GPU (llama/8383) [a00acc9]
  * Update doc for MUSA
  * Add GGML_MUSA in Makefile
  * Add GGML_MUSA in CMake
  * CUDA => MUSA
  * MUSA adds support for __vsubss4
  * Fix CI build failure
  Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
- dcb2400
- cuda : organize vendor-specific headers into vendors directory (llama/8746) [116362c]
  Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
- ggml : bugfix: fix the inactive-elements policy for risc-v vector (llama/8748) [4952391]
  In this code we want inactive elements to retain the value they previously held when mask[i] is false, so we should use the undisturbed policy. With the default agnostic policy of the rvv intrinsics, these values may either be kept or be overwritten with 1s.
  Co-authored-by: carter.li <carter.li@starfivetech.com>
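A scalar model of the two mask policies (a sketch for illustration only; the real code uses rvv intrinsics, and the "_mu" mask-undisturbed spelling of those intrinsics is an assumption):

```c
#include <stddef.h>
#include <stdint.h>

static void masked_add_undisturbed(int32_t * dst, const int32_t * a,
                                   const int32_t * b, const uint8_t * mask,
                                   size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (mask[i]) {
            dst[i] = a[i] + b[i];
        }
        // else: dst[i] is left untouched ("undisturbed"), which is what
        // the fixed code relies on; under the agnostic policy the
        // hardware is free to fill inactive elements with all 1s instead.
    }
}
```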
- Add TIMESTEP_EMBEDDING OP (llama/8707) [5efe2dc]
  Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
- 607299d
- added android implementation of ggml_print_backtrace_symbols (llama/8751) [597f40a]
  * added android implementation of ggml_print_backtrace_symbols
  * Update ggml/src/ggml.c (x5)
  Co-authored-by: slaren <slarengh@gmail.com>
- cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (llama/8800) [58e50d2]
  * cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X
  * update asserts
  * only use dmmv for supported types
  * add test
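The dmmv (dequantize mul-mat-vec) kernels consume a row in chunks of 2*GGML_CUDA_DMMV_X columns, so shorter or unaligned rows must take another mat-vec path. A hedged sketch of the corrected guard (GGML_CUDA_DMMV_X is the real build-time constant; the helper around it is illustrative, not the exact ggml-cuda code):

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef GGML_CUDA_DMMV_X
#define GGML_CUDA_DMMV_X 32 // default block width of the build option
#endif

// a row of ne00 columns is only eligible for the dmmv kernels when it
// divides evenly into chunks of 2*GGML_CUDA_DMMV_X
static bool can_use_dmmv(int64_t ne00) {
    return ne00 % (2 * GGML_CUDA_DMMV_X) == 0;
}
```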
- Build: Only include execinfo.h on linux systems that support it (llama/8783) [77e79cd]
  * Only enable backtrace on GLIBC linux systems
  * fix missing file from copy
  * use glibc macro instead of defining a custom one
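A sketch of the guard the commit describes: execinfo.h (backtrace, backtrace_symbols_fd) is a glibc extension, so the check gates on the __GLIBC__ macro that glibc itself defines rather than a custom one (function name here is illustrative):

```c
#if defined(__linux__) && defined(__GLIBC__)
#include <execinfo.h>

static void print_backtrace(void) {
    void * trace[64];
    const int n = backtrace(trace, 64);
    backtrace_symbols_fd(trace, n, 2 /* stderr */);
}
#else
static void print_backtrace(void) {
    // no-op on systems without execinfo.h (musl, non-linux, ...)
}
#endif
```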
- ggml-cuda: Adding support for unified memory (llama/8035) [0edb8d8]
  * Adding support for unified memory
  * adding again the documentation about unified memory
  * refactoring: Moved the unified memory code in the correct location.
  * Fixed compilation error when using hipblas
  * cleaning up the documentation
  * Updating the documentation
  * adding one more case where the PR should not be enabled
  Co-authored-by: matteo serva <matteo.serva@gmail.com>
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
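A hedged sketch of opt-in unified-memory allocation (cudaMallocManaged and cudaMalloc are the real CUDA runtime API; the env-var name and fallback structure are assumptions for illustration, not the exact llama.cpp code):

```c
#include <stdlib.h>
#include <cuda_runtime.h>

static cudaError_t ggml_cuda_device_malloc(void ** ptr, size_t size) {
    if (getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") != NULL) {
        // managed memory can be oversubscribed and paged between host
        // and device, letting models larger than VRAM still run
        return cudaMallocManaged(ptr, size, cudaMemAttachGlobal);
    }
    return cudaMalloc(ptr, size); // default: plain device memory
}
```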
- eaeff32
- cann: Fix ggml_cann_im2col for 1D im2col (llama/8819) [0a19c02]
  * fix ggml_cann_im2col for 1D im2col
  * fix build warning
- Fix conversion of unnormalized BF16->BF16 weights (llama/7843) [e59cf54]
  * add truncate_bf16
  * truncate intermediate fp32 if converting bf16 to bf16
  * fix masking in __compute_fp32_to_bf16
  * np.int16 no longer used
  * missing cast and additional numpy 2.x fix
  * ggml-impl : do not flush bf16 subnormals to zero
  * ggml : add reference fp32 to bf16 conversion
    The fast version is no longer equivalent for all platforms because of the handling of subnormal values.
  * gguf-py : remove flush to zero for bf16 subnormals
  * gguf-py : remove float32 truncation to bf16
    Rounding achieves the same thing in the cases where this was used.
  * missed prototype update in merge
  * merge cleanup
  Co-authored-by: Francis Couture-Harpin <git@compilade.net>
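A sketch of a reference fp32 -> bf16 conversion that rounds to nearest-even and does not flush subnormals to zero (illustrative; the function name is an assumption, not the exact ggml-impl.h code):

```c
#include <stdint.h>
#include <string.h>

static uint16_t fp32_to_bf16_ref(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof(u)); // bit-exact view of the float

    if ((u & 0x7fffffff) > 0x7f800000) {
        // NaN: keep the sign, set a mantissa bit to keep it quiet
        return (uint16_t) ((u >> 16) | 64);
    }
    // round to nearest, ties to even; subnormal inputs pass through
    // unchanged instead of being flushed to zero
    return (uint16_t) ((u + (0x7fff + ((u >> 16) & 1))) >> 16);
}
```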
- ggml : reading the runtime sve config of the cpu (llama/8709) [0b5195f]
  * ggml : reading the runtime sve config of the cpu
  * change to one time init to prevent performance drop
  * prefix variable to avoid possible conflicts
  * revert xxhash fix and add brackets
  Co-authored-by: domke <673751-domke@users.noreply.gitlab.com>
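A hedged sketch of reading the SVE vector length at runtime on linux (prctl(PR_SVE_GET_VL) is a real kernel interface; the one-time-init pattern follows the commit message, and the variable name is an assumption):

```c
#include <sys/prctl.h>

static int ggml_sve_cnt_b = 0; // cached SVE vector length in bytes

static int sve_vector_length_bytes(void) {
    if (ggml_sve_cnt_b == 0) {
        // the low 16 bits of the result hold the vector length in
        // bytes; query once, since prctl is a syscall and this sits
        // on hot SIMD paths
        ggml_sve_cnt_b = prctl(PR_SVE_GET_VL) & 0xffff;
    }
    return ggml_sve_cnt_b;
}
```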
- 1fb1c9d
- vulkan : fix Quantized Mat-Vec Mul on AMD GPUs for ncols < 64 (llama/8855) [a3b2059]
  * Fix Vulkan mul mat vec invalid results when ncols < warp size
  * Only run backend ops mul mat vec block size test if block size not already covered
- ggml : fix overflows in elu function (llama/8866) [15eac32]
  It's helpful to use expm1f(x), because expf(x)-1 will result in overflow for 25% of single-precision floating point numbers.
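A sketch of the fix: for x <= 0, ELU(x) = exp(x) - 1, and expm1f computes exp(x)-1 directly, avoiding both the overflow of the expf(x) intermediate for large inputs and the precision loss of subtracting 1 near zero:

```c
#include <math.h>

// ELU with alpha = 1 (reference form, not the dispatch code)
static inline float elu_f32(float x) {
    return (x > 0.0f) ? x : expm1f(x);
}
```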
- 70f29c7
- Fix ggml_backend_cann_buffer_get_tensor (llama/8871) [dc3dba3]
  * cann: fix ggml_backend_cann_buffer_get_tensor
    1. fix data ptr offset
    2. enable the acquisition of incomplete tensors
  * fix backend cann set_tensor
- ggml : add epsilon as a parameter for group_norm (llama/8818) [fc31d40]
  Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
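For context, a reference sketch of group normalization showing where the now caller-supplied eps enters (illustrative math, not the ggml kernel): each group of channels is normalized by its own mean and variance, and eps keeps the denominator away from zero.

```c
#include <math.h>

// normalize one group of n values in place; eps is a parameter
// instead of a hard-coded constant, per the commit
static void group_norm_ref(float * x, int n, float eps) {
    float mean = 0.0f, var = 0.0f;
    for (int i = 0; i < n; i++) mean += x[i];
    mean /= (float) n;
    for (int i = 0; i < n; i++) {
        const float d = x[i] - mean;
        var += d * d;
    }
    var /= (float) n;
    const float scale = 1.0f / sqrtf(var + eps);
    for (int i = 0; i < n; i++) x[i] = (x[i] - mean) * scale;
}
```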
- 9510e3c
- 63f2251
- Updated SYCL device filtering (llama/8901) [02a0b27]
  * Updated device filter to depend on default_selector (fixes non-intel device issues)
  * Small related update to example/sycl Readme
- ggml-backend : fix async copy from CPU (llama/8897) [67c3e78]
  * ggml-backend : fix async copy from CPU
  * cuda : more reliable async copy, fix stream used when the devices are the same
- 9793ab7
- 3058ec3
- 3266c07
- 723445e
- bc97237
- a06c683
Commits on Aug 9, 2024
- whisper : use vulkan as gpu backend when available (whisper/2302) [2373142]
  * ggml: use vulkan as gpu backend when available
  * whisper: enable using vk as default buffer type
  Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
- 797faa2
Commits on Aug 10, 2024
- rpc : sanitize tensor data + warnings (llama/0) [483ccfb]
  Co-authored-by: slaren <slarengh@gmail.com>
- 4bf4a25
Commits on Aug 11, 2024
- 9309817
- 681247d
- ggml : support forward pass broadcasting in ggml_sub (ggerganov#914) [a735a7b]
  * ggml: support forward pass broadcasting in ggml_sub
  * Use assert instead of GGML_ASSERT in ggml_compute_forward_sub_f32; the check is already performed in ggml_sub_impl
  Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
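A hedged usage sketch of the new broadcasting behavior (ggml_sub, ggml_new_tensor_2d, and GGML_TYPE_F32 are real ggml API; shapes and context setup are illustrative):

```c
#include "ggml.h"

void sub_broadcast_example(struct ggml_context * ctx) {
    // ggml dims are [ne0, ne1] = [columns, rows]
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 3);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 1);
    // with forward-pass broadcasting the single row of b is subtracted
    // from each of the 3 rows of a, matching what ggml_add already did
    struct ggml_tensor * c = ggml_sub(ctx, a, b);
    (void) c;
}
```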
Commits on Aug 12, 2024
- feat: add new sin and cos operators (ggerganov#919) [21f9e5c]
  * ggml : add sin/cos operators
  * ggml-cuda : add sin/cos operators
  * ggml : add corresponding tests for sin/cos
  * ggml : add backward computation for sin/cos operators
  * ggml-vulkan : add sin/cos operators
  * ggml-vulkan : add sin/cos shader source
  * metal : add sin, cos
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
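A hedged usage sketch of the new unary ops (ggml_sin/ggml_cos are assumed to follow the (ctx, tensor) shape of ggml's other unary ops such as ggml_sqr):

```c
#include "ggml.h"

struct ggml_tensor * sin_cos_example(struct ggml_context * ctx,
                                     struct ggml_tensor  * x) {
    struct ggml_tensor * s = ggml_sin(ctx, x); // element-wise sin(x)
    struct ggml_tensor * c = ggml_cos(ctx, x); // element-wise cos(x)
    // sin^2(x) + cos^2(x): handy as a numerical sanity check (== 1)
    return ggml_add(ctx, ggml_sqr(ctx, s), ggml_sqr(ctx, c));
}
```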
Commits on Aug 23, 2024
- Optimize Vulkan REPEAT performance [681d4f3]
  Co-Authored-By: 0cc4m <11707594+0cc4m@users.noreply.github.com>