gguf-py: Improve GGUFReader read-only mode performance #10159

Open
wants to merge 11 commits into
base: master

@Isotr0py Isotr0py commented Nov 4, 2024

This PR optimizes GGUFReader performance in read-only mode with the following changes:

  • Build fields with native Python file I/O instead of a memmap array.
  • Optimize the _get_str and _get functions in read-only mode with np.frombuffer.
  • Avoid computing offsets from arrays, which creates intermediate data; use tell() on the native Python file object to update offsets instead.
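
The three changes above can be sketched on a toy file. This is a minimal illustration, not the code in this PR: the helpers `read_scalar`, `read_string`, and `build_demo_file` are hypothetical names, and the in-memory layout (magic, version, two length-prefixed strings) is a simplified stand-in rather than the full GGUF header.

```python
import io
import struct

import numpy as np


def read_scalar(f, dtype):
    """Read one little-endian scalar from the current file position."""
    dt = np.dtype(dtype)
    buf = f.read(dt.itemsize)
    # np.frombuffer wraps the bytes without building intermediate arrays
    return np.frombuffer(buf, dtype=dt)[0]


def read_string(f):
    """Read a length-prefixed UTF-8 string: uint64 length, then the bytes."""
    length = int(read_scalar(f, "<u8"))
    return f.read(length).decode("utf-8")


def build_demo_file():
    """Build a tiny in-memory file (assumed layout, for illustration only)."""
    out = io.BytesIO()
    out.write(b"GGUF")               # 4-byte magic
    out.write(struct.pack("<I", 3))  # uint32 version
    for s in ("general.architecture", "qwen2"):
        raw = s.encode("utf-8")
        out.write(struct.pack("<Q", len(raw)))
        out.write(raw)
    out.seek(0)
    return out


f = build_demo_file()
magic = f.read(4)
version = int(read_scalar(f, "<u4"))
key = read_string(f)
key_end = f.tell()    # offset taken from the file object, no array arithmetic
value = read_string(f)
value_end = f.tell()

print(magic, version, key, value, key_end, value_end)
```

Because every read advances the file position, `f.tell()` always gives the offset of the next item, so no per-part sizes have to be summed to track where the reader is.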

Performance Comparison

Benchmark script
#!/usr/bin/env python3
import logging
import sys
import time
from pathlib import Path

import psutil

logger = logging.getLogger("reader")

# Make the local gguf package importable before importing from it
sys.path.insert(0, str(Path(__file__).parent.parent))

from gguf.gguf_reader import GGUFReader  # noqa: E402


def read_gguf_file(gguf_file_path):
    """
    Reads and prints key-value pairs and tensor information from a GGUF file in an improved format.

    Parameters:
    - gguf_file_path: Path to the GGUF file.
    """

    time0 = time.time()
    # Baseline system-wide memory readings: percent used and used bytes (as GB)
    ram_init1 = psutil.virtual_memory().percent
    ram_init2 = psutil.virtual_memory().used / 1e9

    reader = GGUFReader(gguf_file_path)

    # List all key-value pairs in a columnized format
    print("Key-Value Pairs:") # noqa: NP100
    max_key_length = max(len(key) for key in reader.fields.keys())
    for key, field in reader.fields.items():
        value = field.parts[field.data[0]]
        print(f"{key:{max_key_length}} : {value}") # noqa: NP100
    print("----") # noqa: NP100

    # List all tensors
    print("Tensors:") # noqa: NP100
    tensor_info_format = "{:<30} | Shape: {:<15} | Size: {:<12} | Quantization: {}"
    print(tensor_info_format.format("Tensor Name", "Shape", "Size", "Quantization")) # noqa: NP100
    print("-" * 80) # noqa: NP100
    for tensor in reader.tensors:
        shape_str = "x".join(map(str, tensor.shape))
        size_str = str(tensor.n_elements)
        quantization_str = tensor.tensor_type.name
        print(tensor_info_format.format(tensor.name, shape_str, size_str, quantization_str)) # noqa: NP100

    print('Time (s):', time.time() - time0) # noqa: NP100
    print('RAM memory % used:', psutil.virtual_memory().percent - ram_init1) # noqa: NP100
    print('RAM Used (GB):', psutil.virtual_memory().used / 1e9 - ram_init2) # noqa: NP100


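if __name__ == '__main__':
    # Configure logging so the usage message below is actually emitted
    logging.basicConfig(level=logging.INFO)
    if len(sys.argv) < 2:
        logger.info("Usage: reader.py <path_to_gguf_file>")
        sys.exit(1)
    gguf_file_path = sys.argv[1]
    read_gguf_file(gguf_file_path)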
Comparison Results

File: qwen2-0_5b-instruct-q2_k.gguf
CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
RAM: 16GB

Master

Time (s): 12.987974643707275
RAM memory % used: 1.7999999999999972
RAM Used (GB): 0.31249203199999975

This PR

Time (s): 4.433131456375122
RAM memory % used: 0.7999999999999972
RAM Used (GB): 0.1335459839999995
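
For a quick read of the numbers above, the relative change can be computed directly. This is only arithmetic on the single run reported here (one file, one machine), so it is a rough indication rather than a general claim; the dictionary names are chosen for illustration.

```python
# Figures copied from the "Master" and "This PR" outputs above
master = {"time_s": 12.987974643707275, "ram_gb": 0.31249203199999975}
pr = {"time_s": 4.433131456375122, "ram_gb": 0.1335459839999995}

# How many times faster the PR run finished
speedup = master["time_s"] / pr["time_s"]
# PR memory increase as a fraction of master's increase
ram_ratio = pr["ram_gb"] / master["ram_gb"]

print(f"speedup: {speedup:.2f}x")
print(f"RAM increase relative to master: {ram_ratio:.0%}")
```

On this run the load is roughly 2.9 times faster, and the measured RAM increase is a bit under half of master's.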

@github-actions github-actions bot added the python python script changes label Nov 4, 2024
@Isotr0py Isotr0py marked this pull request as draft November 4, 2024 13:26
@Isotr0py Isotr0py changed the title from "gguf-py: Improve GGUFReader performance" to "gguf-py: Improve GGUFReader read-only mode performance" Nov 5, 2024
@Isotr0py Isotr0py marked this pull request as ready for review November 5, 2024 07:31
Signed-off-by: isotr0py <2037008807@qq.com>