
Optimize Vulkan REPEAT performance #2

Closed · wants to merge 45 commits

Commits on Jul 29, 2024

  1. 49164e6
  2. ggml : move c parameter comment to ggml_rope_ext (ggerganov#901)

    This commit moves the comment for the c parameter from ggml_rope to
    ggml_rope_ext. The comment is currently incorrect as ggml_rope does not
    have a c parameter (freq_factors tensor).
    
    Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
    danbev authored Jul 29, 2024
    29d87fc

Commits on Aug 4, 2024

  1. vulkan : implement Stable Diffusion operators (ggerganov#904)

    * Fix Vulkan repeat op
    
    * Implement Vulkan concat op
    
    * Delete old Vulkan shader generator
    
    * Implement Vulkan im2col op
    
    * Implement Vulkan unary gelu_quick op
    
    * Implement Vulkan group_norm op
    
    * Implement Vulkan timestep_embedding op
    
    * Implement Vulkan upscale op
    
    * Fix Vulkan vk_context tensor extra index issue
    
    * Fix Vulkan matmul shader parameter bug
    
    * Properly fix Vulkan matmul shader parameter bug
    
    * Add Vulkan ADD f16 + f32 -> f16 operator support
    
    * Implement Vulkan tanh op
    
    * Fix Vulkan group count too large Validation error on non-Nvidia GPUs
    
    * Throw error when too much memory is requested
    
    * Fix another Vulkan group count too large Validation error on non-Nvidia GPUs
    
    * Fix matmul MMQ condition
    
    * Implement Vulkan pad op
    
    * Fix Vulkan crash when tensor is used multiple times in a compute graph
    
    * Add Vulkan CONCAT f16 + f16 -> f16 op
    
    * Add Vulkan LEAKY_RELU op
    0cc4m authored Aug 4, 2024
    18703ad
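    The unary ops added above are simple elementwise kernels. As one illustration, gelu_quick is commonly defined as x · sigmoid(1.702x); the following scalar C reference is a sketch of the expected semantics (my reading of the op, not the Vulkan shader itself):

    ```c
    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    // Reference scalar gelu_quick: x * sigmoid(1.702 * x).
    // A backend shader is expected to match this elementwise definition.
    static float gelu_quick(float x) {
        return x / (1.0f + expf(-1.702f * x));
    }

    int main(void) {
        assert(gelu_quick(0.0f) == 0.0f);                  // exactly zero at the origin
        assert(fabsf(gelu_quick(10.0f) - 10.0f) < 1e-4f);  // ~identity for large positive x
        assert(fabsf(gelu_quick(-10.0f)) < 1e-4f);         // ~zero for large negative x
        printf("ok\n");
        return 0;
    }
    ```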

Commits on Aug 7, 2024

  1. 1f2b80a
  2. metal : fix struct name (ggerganov#912)

    ggml-ci
    ggerganov authored Aug 7, 2024
    444e896
  3. 6c71d5a

Commits on Aug 8, 2024

  1. feat: Support Moore Threads GPU (llama/8383)

    * Update doc for MUSA
    
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
    
    * Add GGML_MUSA in Makefile
    
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
    
    * Add GGML_MUSA in CMake
    
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
    
    * CUDA => MUSA
    
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
    
    * MUSA adds support for __vsubss4
    
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
    
    * Fix CI build failure
    
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
    
    ---------
    
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
    yeahdongcn authored and ggerganov committed Aug 8, 2024
    a00acc9
  2. add conv support (llama/8688)

    airMeng authored and ggerganov committed Aug 8, 2024
    dcb2400
  3. cuda : organize vendor-specific headers into vendors directory (llama/8746)
    
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
    yeahdongcn authored and ggerganov committed Aug 8, 2024
    116362c
  4. ggml: bugfix: fix the inactive elements is agnostic for risc-v vector (llama/8748)
    
    In these code paths we want inactive elements to retain the value they
    previously held when mask[i] is false, so the undisturbed policy should
    be used. Under the default agnostic policy of the RVV intrinsics,
    inactive elements may either be preserved or overwritten with all 1s.
    
    Co-authored-by: carter.li <carter.li@starfivetech.com>
    2 people authored and ggerganov committed Aug 8, 2024
    4952391
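    The RVV fix above hinges on the difference between the two mask policies. A scalar C model of the guarantee the commit relies on (illustrative names, not the actual RVV intrinsics):

    ```c
    #include <assert.h>
    #include <stdio.h>

    // Scalar model of a masked vector op under the "undisturbed" mask policy:
    // inactive lanes keep their previous destination value. Under the default
    // "agnostic" policy, inactive lanes may instead be filled with all 1s,
    // which breaks code that reads them back afterwards.
    static void masked_add_undisturbed(float *dst, const float *a, const float *b,
                                       const int *mask, int n) {
        for (int i = 0; i < n; i++) {
            if (mask[i]) dst[i] = a[i] + b[i];
            // else: dst[i] is left untouched
        }
    }

    int main(void) {
        float a[4] = {1, 2, 3, 4}, b[4] = {10, 10, 10, 10};
        float dst[4] = {-1, -2, -3, -4};
        int mask[4] = {1, 0, 1, 0};
        masked_add_undisturbed(dst, a, b, mask, 4);
        assert(dst[0] == 11 && dst[2] == 13);  // active lanes updated
        assert(dst[1] == -2 && dst[3] == -4);  // inactive lanes preserved
        printf("ok\n");
        return 0;
    }
    ```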
  5. Add TIMESTEP_EMBEDDING OP (llama/8707)

    Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
    zhentaoyu authored and ggerganov committed Aug 8, 2024
    5efe2dc
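    The TIMESTEP_EMBEDDING op computes the sinusoidal timestep embedding used by diffusion models. A plain-C sketch in the Stable Diffusion convention, assuming the cos half comes first and max_period = 10000 (both are assumptions about the layout, not taken from this commit):

    ```c
    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    // Sinusoidal timestep embedding: first half cos(t * freq_i), second half
    // sin(t * freq_i), with freq_i = exp(-ln(max_period) * i / half).
    static void timestep_embedding(float t, int dim, float max_period, float *out) {
        int half = dim / 2;
        for (int i = 0; i < half; i++) {
            float freq = expf(-logf(max_period) * (float)i / (float)half);
            out[i]        = cosf(t * freq);
            out[half + i] = sinf(t * freq);
        }
    }

    int main(void) {
        float emb[8];
        timestep_embedding(0.0f, 8, 10000.0f, emb);
        for (int i = 0; i < 4; i++) {
            assert(emb[i] == 1.0f);      // cos(0) = 1
            assert(emb[4 + i] == 0.0f);  // sin(0) = 0
        }
        printf("ok\n");
        return 0;
    }
    ```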
  6. cann: update cmake (llama/8765)

    wangshuai09 authored and ggerganov committed Aug 8, 2024
    607299d
  7. added android implementation of ggml_print_backtrace_symbols (llama/8751)
    
    * added android implementation of ggml_print_backtrace_symbols
    
    * Update ggml/src/ggml.c
    
    Co-authored-by: slaren <slarengh@gmail.com>
    
    * Update ggml/src/ggml.c
    
    Co-authored-by: slaren <slarengh@gmail.com>
    
    * Update ggml/src/ggml.c
    
    Co-authored-by: slaren <slarengh@gmail.com>
    
    * Update ggml/src/ggml.c
    
    Co-authored-by: slaren <slarengh@gmail.com>
    
    * Update ggml/src/ggml.c
    
    Co-authored-by: slaren <slarengh@gmail.com>
    
    ---------
    
    Co-authored-by: slaren <slarengh@gmail.com>
    2 people authored and ggerganov committed Aug 8, 2024
    597f40a
  8. cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (llama/8800)

    * cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X
    
    * update asserts
    
    * only use dmmv for supported types
    
    * add test
    slaren authored and ggerganov committed Aug 8, 2024
    58e50d2
  9. Build: Only include execinfo.h on linux systems that support it (llama/8783)
    
    * Only enable backtrace on GLIBC linux systems
    
    * fix missing file from copy
    
    * use glibc macro instead of defining a custom one
    acon96 authored and ggerganov committed Aug 8, 2024
    77e79cd
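    The guard this commit describes can be sketched as follows; the function name is illustrative, and `__GLIBC__` is the glibc macro the last bullet refers to:

    ```c
    #include <stdio.h>

    #if defined(__linux__) && defined(__GLIBC__)
    #include <execinfo.h>  // only present on glibc-based Linux systems
    // Print the current call stack; returns the number of captured frames.
    static int print_backtrace(void) {
        void *frames[16];
        int n = backtrace(frames, 16);
        backtrace_symbols_fd(frames, n, 1 /* stdout */);
        return n;
    }
    #else
    // No-op fallback where execinfo.h is unavailable (musl, Windows, ...).
    static int print_backtrace(void) { return 0; }
    #endif

    int main(void) {
        printf("frames: %d\n", print_backtrace());
        return 0;
    }
    ```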
  10. ggml-cuda: Adding support for unified memory (llama/8035)

    * Adding support for unified memory
    
    * adding again the documentation about unified memory
    
    * refactoring: Moved the unified memory code in the correct location.
    
    * Fixed compilation error when using hipblas
    
    * cleaning up the documentation
    
    * Updating the documentation
    
    Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
    
    * adding one more case where the PR should not be enabled
    
    ---------
    
    Co-authored-by: matteo serva <matteo.serva@gmail.com>
    Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
    3 people authored and ggerganov committed Aug 8, 2024
    0edb8d8
  11. eaeff32
  12. cann: Fix ggml_cann_im2col for 1D im2col (llama/8819)

    * fix ggml_cann_im2col for 1D im2col
    
    * fix build warning
    MengqingCao authored and ggerganov committed Aug 8, 2024
    0a19c02
  13. Fix conversion of unnormalized BF16->BF16 weights (llama/7843)

    * add truncate_bf16
    
    * truncate intermediate fp32 if converting bf16 to bf16
    
    * fix masking in __compute_fp32_to_bf16
    
    * np.int16 no longer used
    
    * missing cast and additional numpy 2.x fix
    
    * ggml-impl : do not flush bf16 subnormals to zero
    
    * ggml : add reference fp32 to bf16 conversion
    
    The fast version is no longer equivalent for all platforms
    because of the handling of subnormal values.
    
    * gguf-py : remove flush to zero for bf16 subnormals
    
    * gguf-py : remove float32 truncation to bf16
    
    Rounding achieves the same thing in the cases where this was used.
    
    * missed prototype update in merge
    
    * merge cleanup
    
    ---------
    
    Co-authored-by: Francis Couture-Harpin <git@compilade.net>
    2 people authored and ggerganov committed Aug 8, 2024
    e59cf54
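    The conversion this series settles on is round-to-nearest-even without flushing subnormals to zero. A self-contained C sketch of that behavior (my reconstruction of the standard bit trick, not ggml's exact code):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    // fp32 -> bf16 with round-to-nearest-even; subnormals are NOT flushed to
    // zero. NaN is truncated with the quiet bit forced so it stays a NaN.
    static uint16_t fp32_to_bf16_rne(float f) {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        if ((u & 0x7FFFFFFF) > 0x7F800000) {    // NaN: keep payload, set quiet bit
            return (uint16_t)((u >> 16) | 0x0040);
        }
        u += 0x7FFF + ((u >> 16) & 1);          // round to nearest, ties to even
        return (uint16_t)(u >> 16);
    }

    int main(void) {
        assert(fp32_to_bf16_rne(1.0f) == 0x3F80);  // 1.0 is exactly representable
        assert(fp32_to_bf16_rne(0.0f) == 0x0000);
        // 1.0f + 2^-8 lies exactly halfway between two bf16 values; ties-to-even
        // rounds down to 1.0 here because the low kept bit is already 0.
        assert(fp32_to_bf16_rne(1.00390625f) == 0x3F80);
        printf("ok\n");
        return 0;
    }
    ```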
  14. ggml : reading the runtime sve config of the cpu (llama/8709)

    * ggml : reading the runtime sve config of the cpu
    
    * change to one time init to prevent performance drop
    
    * prefix variable to avoid possible conflicts
    
    * revert xxhash fix and add brackets
    
    ---------
    
    Co-authored-by: domke <673751-domke@users.noreply.gitlab.com>
    2 people authored and ggerganov committed Aug 8, 2024
    0b5195f
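    The "one time init" bullet can be sketched as a lazily cached runtime probe: querying the hardware config on every call caused a measurable performance drop, so the result is computed once and reused. Names here are illustrative, not ggml's actual symbols:

    ```c
    #include <assert.h>
    #include <stdio.h>

    static int g_init_count = 0;

    // Stand-in for the expensive runtime probe (e.g. reading the SVE config).
    static int query_hw_config(void) {
        g_init_count++;
        return 256;  // e.g. a detected vector length in bits
    }

    // Cached accessor: the probe runs once; later calls hit the cache.
    static int get_hw_config(void) {
        static int cached = -1;
        if (cached < 0) {
            cached = query_hw_config();
        }
        return cached;
    }

    int main(void) {
        assert(get_hw_config() == 256);
        assert(get_hw_config() == 256);
        assert(g_init_count == 1);  // the probe ran exactly once
        printf("ok\n");
        return 0;
    }
    ```

    Note this sketch is not thread-safe; real code would guard the first call with `pthread_once` or C11 `call_once`.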
  15. 1fb1c9d
  16. vulkan : fix Quantized Mat-Vec Mul on AMD GPUs for ncols < 64 (llama/8855)
    
    * Fix Vulkan mul mat vec invalid results when ncols < warp size
    
    * Only run backend ops mul mat vec block size test if block size not already covered
    0cc4m authored and ggerganov committed Aug 8, 2024
    a3b2059
  17. ggml : fix overflows in elu function (llama/8866)

    It's helpful to use expm1f(x), because expf(x)-1 will result in overflow
    for 25% of single-precision floating point numbers.
    jart authored and ggerganov committed Aug 8, 2024
    15eac32
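    Besides the overflow case the commit message mentions for large x, `expf(x) - 1` also loses the result entirely for tiny x through catastrophic cancellation, which is what `expm1f` avoids. A small demo:

    ```c
    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        // For |x| far below float epsilon, expf(x) rounds to exactly 1.0f,
        // so expf(x) - 1 cancels to 0 and an ELU built on it returns 0.
        float x = -1e-10f;
        float naive = expf(x) - 1.0f;  // catastrophic cancellation
        float good  = expm1f(x);       // computes e^x - 1 directly
        assert(naive == 0.0f);
        assert(good != 0.0f && fabsf(good - x) < 1e-16f);
        printf("naive=%g expm1f=%g\n", naive, good);
        return 0;
    }
    ```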
  18. 70f29c7
  19. Fix ggml_backend_cann_buffer_get_tensor (llama/8871)

    * cann: fix ggml_backend_cann_buffer_get_tensor
    
     1. fix data ptr offset
     2. enable the acquisition of incomplete tensors
    
    * fix backend cann set_tensor
    MengqingCao authored and ggerganov committed Aug 8, 2024
    dc3dba3
  20. ggml : add epsilon as a parameter for group_norm (llama/8818)

    Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
    MollySophia authored and ggerganov committed Aug 8, 2024
    fc31d40
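    A reference for what the op computes, and where the newly parameterized eps enters: channels are split into groups, and each group is normalized with its own mean and variance. A plain-C sketch with an assumed [channels][n] row-major layout:

    ```c
    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    // group_norm: normalize each group of channels to zero mean / unit
    // variance; eps guards the division when a group's variance is ~0.
    static void group_norm(float *x, int channels, int n, int groups, float eps) {
        int per_group = (channels / groups) * n;  // elements per group
        for (int g = 0; g < groups; g++) {
            float *gx = x + g * per_group;
            float mean = 0.0f;
            for (int i = 0; i < per_group; i++) mean += gx[i];
            mean /= per_group;
            float var = 0.0f;
            for (int i = 0; i < per_group; i++) {
                float d = gx[i] - mean;
                var += d * d;
            }
            var /= per_group;
            float inv_std = 1.0f / sqrtf(var + eps);
            for (int i = 0; i < per_group; i++) gx[i] = (gx[i] - mean) * inv_std;
        }
    }

    int main(void) {
        float x[8] = {1, 2, 3, 4, 10, 20, 30, 40};  // 4 channels, n=2, 2 groups
        group_norm(x, 4, 2, 2, 1e-6f);
        float m0 = (x[0] + x[1] + x[2] + x[3]) / 4.0f;
        float m1 = (x[4] + x[5] + x[6] + x[7]) / 4.0f;
        assert(fabsf(m0) < 1e-5f && fabsf(m1) < 1e-5f);  // each group is zero-mean
        printf("ok\n");
        return 0;
    }
    ```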
  21. 9510e3c
  22. 63f2251
  23. Updated SYCL device filtering (llama/8901)

    * Updated device filter to depend on default_selector (fixes non-intel device issues)
    * Small related update to example/sycl Readme
    OuadiElfarouki authored and ggerganov committed Aug 8, 2024
    02a0b27
  24. ggml-backend : fix async copy from CPU (llama/8897)

    * ggml-backend : fix async copy from CPU
    
    * cuda : more reliable async copy, fix stream used when the devices are the same
    slaren authored and ggerganov committed Aug 8, 2024
    67c3e78
  25. sync : llama.cpp

    ggml-ci
    ggerganov committed Aug 8, 2024
    9793ab7
  26. 3058ec3
  27. 3266c07
  28. scripts : sync sycl (#0)

    ggerganov committed Aug 8, 2024
    723445e
  29. sync : vulkan (llama/0)

    ggerganov committed Aug 8, 2024
    bc97237
  30. ggml : add CANN backend (llama/0)

    ggml-ci
    hipudding authored and ggerganov committed Aug 8, 2024
    a06c683

Commits on Aug 9, 2024

  1. whisper : use vulkan as gpu backend when available (whisper/2302)

    * ggml: use vulkan as gpu backend when available
    
    Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
    
    * whisper: enable using vk as default buffer type
    
    Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
    
    ---------
    
    Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
    mstephenson6 authored and ggerganov committed Aug 9, 2024
    2373142
  2. sync : whisper.cpp

    ggerganov committed Aug 9, 2024
    797faa2

Commits on Aug 10, 2024

  1. rpc : sanitize tensor data + warnings (llama/0)

    Co-authored-by: slaren <slarengh@gmail.com>
    ggerganov and slaren committed Aug 10, 2024
    483ccfb
  2. sync : llama.cpp

    ggerganov committed Aug 10, 2024
    4bf4a25

Commits on Aug 11, 2024

  1. 9309817
  2. sync : llama.cpp

    ggerganov committed Aug 11, 2024
    681247d
  3. ggml : support forward pass broadcasting in ggml_sub (ggerganov#914)

    * ggml: support forward pass broadcasting in ggml_sub
    
    Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
    
    * Use assert instead of GGML_ASSERT in ggml_compute_forward_sub_f32
    
    The check is already performed in ggml_sub_impl
    
    Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
    
    ---------
    
    Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
    smeso authored Aug 11, 2024
    a735a7b
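    A minimal model of forward-pass broadcasting for subtraction: a vector b of shape [cols] is broadcast across every row of a matrix a of shape [rows][cols]. This is a sketch of the semantics, not the ggml kernel:

    ```c
    #include <assert.h>
    #include <stdio.h>

    // dst[r][c] = a[r][c] - b[c]; b is indexed without r, i.e. broadcast
    // across the rows of a.
    static void sub_broadcast(const float *a, const float *b, float *dst,
                              int rows, int cols) {
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                dst[r * cols + c] = a[r * cols + c] - b[c];
            }
        }
    }

    int main(void) {
        float a[6] = {1, 2, 3, 4, 5, 6};  // 2 rows x 3 cols
        float b[3] = {1, 1, 1};
        float dst[6];
        sub_broadcast(a, b, dst, 2, 3);
        assert(dst[0] == 0 && dst[2] == 2);
        assert(dst[3] == 3 && dst[5] == 5);
        printf("ok\n");
        return 0;
    }
    ```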

Commits on Aug 12, 2024

  1. feat: add new sin and cos operators (ggerganov#919)

    * ggml : add sin/cos operators
    
    * ggml-cuda : add sin/cos operators
    
    * ggml : add corresponding tests for sin/cos
    
    * ggml : add backward computation for sin/cos operators
    
    * ggml-vulkan : add sin/cos operators
    
    * ggml-vulkan : add sin/cos shader source
    
    * metal : add sin, cos
    
    ---------
    
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    Ronsor and ggerganov authored Aug 12, 2024
    21f9e5c
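    The backward computation these operators gain follows directly from the derivatives: for y = sin(x), dL/dx = dL/dy · cos(x) (and for cos, dL/dx = -dL/dy · sin(x)). A scalar C sketch of the forward and backward pass, checked against a finite difference:

    ```c
    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    // Elementwise sin forward pass.
    static void sin_forward(const float *x, float *y, int n) {
        for (int i = 0; i < n; i++) y[i] = sinf(x[i]);
    }

    // Backward pass: dx[i] = dy[i] * cos(x[i]).
    static void sin_backward(const float *x, const float *dy, float *dx, int n) {
        for (int i = 0; i < n; i++) dx[i] = dy[i] * cosf(x[i]);
    }

    int main(void) {
        float x[1] = {0.5f}, y[1], dy[1] = {1.0f}, dx[1];
        sin_forward(x, y, 1);
        sin_backward(x, dy, dx, 1);
        // check the analytic gradient against a central finite difference
        float h = 1e-3f;
        float fd = (sinf(x[0] + h) - sinf(x[0] - h)) / (2.0f * h);
        assert(fabsf(dx[0] - fd) < 1e-3f);
        assert(fabsf(y[0] - sinf(0.5f)) < 1e-7f);
        printf("ok\n");
        return 0;
    }
    ```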

Commits on Aug 23, 2024

  1. Optimize Vulkan REPEAT performance

    Co-Authored-By: 0cc4m <11707594+0cc4m@users.noreply.github.com>
    SkutteOleg and 0cc4m committed Aug 23, 2024
    681d4f3