gguf-py: Improve `GGUFReader` read-only mode performance #10159

Isotr0py · 2024-11-04T09:11:58Z

This PR optimizes `GGUFReader` performance in read-only mode with the following changes:

- Build fields with native Python file I/O instead of a memmap array.
- Optimize the `_get_str` and `_get` functions in read-only mode with `np.frombuffer`.
- Stop calculating offsets from arrays, which creates intermediate data; use `tell()` on the native Python file object to update offsets instead.
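To illustrate the idea behind these changes, here is a minimal sketch, not the code from this PR. It assumes a length-prefixed string layout (a little-endian uint64 length followed by UTF-8 bytes) and uses an in-memory `io.BytesIO` as a stand-in for the GGUF file. The helper names `read_scalar` and `read_string` are hypothetical.

```python
import io
import struct

import numpy as np


def read_scalar(f, dtype, size):
    """Read `size` bytes at the current position and decode one scalar."""
    # np.frombuffer decodes the bytes directly, with no memmap slice involved
    return np.frombuffer(f.read(size), dtype=dtype)[0]


def read_string(f):
    """Read a uint64 length prefix followed by that many UTF-8 bytes."""
    length = int(read_scalar(f, "<u8", 8))
    return f.read(length).decode("utf-8")


# Hypothetical stand-in for a file: one length-prefixed key, then a uint32 value
key = b"general.alignment"
f = io.BytesIO(struct.pack("<Q", len(key)) + key + struct.pack("<I", 32))

name = read_string(f)
# The offset comes from the file object itself, not from summing part sizes
offset_after_key = f.tell()
value = int(read_scalar(f, "<u4", 4))
end_offset = f.tell()
```

Because the file object tracks its own position, no intermediate arrays are needed just to compute where the next field starts.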

Performance Comparison

Benchmark script

#!/usr/bin/env python3
import logging
import sys
import time
from pathlib import Path

import psutil

# Make the local gguf-py package importable before importing from it
sys.path.insert(0, str(Path(__file__).parent.parent))

from gguf.gguf_reader import GGUFReader  # noqa: E402

logger = logging.getLogger("reader")


def read_gguf_file(gguf_file_path):
    """
    Reads and prints key-value pairs and tensor information from a GGUF file in an improved format.

    Parameters:
    - gguf_file_path: Path to the GGUF file.
    """

    time0 = time.time()
    # System-wide memory baseline: percent in use and bytes in use (converted to GB)
    mem = psutil.virtual_memory()
    ram_init1 = mem.percent
    ram_init2 = mem.used / 1e9

    reader = GGUFReader(gguf_file_path)

    # List all key-value pairs in a columnized format
    print("Key-Value Pairs:") # noqa: NP100
    max_key_length = max(len(key) for key in reader.fields.keys())
    for key, field in reader.fields.items():
        value = field.parts[field.data[0]]
        print(f"{key:{max_key_length}} : {value}") # noqa: NP100
    print("----") # noqa: NP100

    # List all tensors
    print("Tensors:") # noqa: NP100
    tensor_info_format = "{:<30} | Shape: {:<15} | Size: {:<12} | Quantization: {}"
    print(tensor_info_format.format("Tensor Name", "Shape", "Size", "Quantization")) # noqa: NP100
    print("-" * 80) # noqa: NP100
    for tensor in reader.tensors:
        shape_str = "x".join(map(str, tensor.shape))
        size_str = str(tensor.n_elements)
        quantization_str = tensor.tensor_type.name
        print(tensor_info_format.format(tensor.name, shape_str, size_str, quantization_str)) # noqa: NP100

    print('Time (s):', time.time() - time0) # noqa: NP100
    mem = psutil.virtual_memory()
    print('RAM memory % used:', mem.percent - ram_init1) # noqa: NP100
    print('RAM Used (GB):', mem.used / 1e9 - ram_init2) # noqa: NP100


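if __name__ == '__main__':
    # Without a configured handler, the INFO-level usage message below is not shown
    logging.basicConfig(level=logging.INFO)
    if len(sys.argv) < 2:
        logger.info("Usage: reader.py <path_to_gguf_file>")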
        sys.exit(1)
    gguf_file_path = sys.argv[1]
    read_gguf_file(gguf_file_path)

Comparison Results

File: qwen2-0_5b-instruct-q2_k.gguf
CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
RAM: 16GB

Master

Time (s): 12.987974643707275
RAM memory % used: 1.7999999999999972
RAM Used (GB): 0.31249203199999975

This PR

Time (s): 4.433131456375122
RAM memory % used: 0.7999999999999972
RAM Used (GB): 0.1335459839999995

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

Signed-off-by: isotr0py <2037008807@qq.com>

Isotr0py added 5 commits October 31, 2024 20:19

refactor gguf reader

5ce2dbc

improve performance

bcef54e

fix mode

205676c

fix mode

dd320df

optimize offsets calculation

1dc0215

github-actions bot added the python python script changes label Nov 4, 2024

Isotr0py marked this pull request as draft November 4, 2024 13:26

Isotr0py added 5 commits November 5, 2024 01:03

revert unnecessary change

a92c920

revert unnecessary change

ad6fd8d

code format

6a13722

make mode compatiable

07ef1a8

revert

810f06b

Isotr0py changed the title ~~gguf-py: Improve GGUFReader performance~~ gguf-py: Improve GGUFReader read-only mode performance Nov 5, 2024

Isotr0py marked this pull request as ready for review November 5, 2024 07:31

ggerganov requested a review from compilade November 13, 2024 11:45

fix reader on linux

94d814c

Signed-off-by: isotr0py <2037008807@qq.com>
