cuBLAS error with NVIDIA H100 HGX, CUDA v12.1, and cuDNN 8.8.1 #76

BenFauber opened this issue Apr 14, 2023 · 0 comments

A cuBLAS error occurs when running the following HuggingFace accelerate benchmark code on an NVIDIA H100 HGX with CUDA v12.1, cuDNN 8.8.1, and pytorch==2.0.0+cu118, within a Jupyter notebook:

!CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python transformers-bloom-inference/bloom-inference-scripts/bloom-accelerate-inference.py --name bigscience/bloom --dtype int8 --batch_size 1 --benchmark

...or at the CLI:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python transformers-bloom-inference/bloom-inference-scripts/bloom-accelerate-inference.py --name bigscience/bloom --dtype int8 --batch_size 1 --benchmark
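For context, a minimal check (not part of the benchmark script) to confirm the PyTorch/CUDA pairing on the node before running:

import torch

# Report the PyTorch build, the CUDA toolkit it was compiled against, and the visible GPUs
print(torch.__version__)              # e.g. 2.0.0+cu118
print(torch.version.cuda)             # CUDA version PyTorch was built with (11.8 here, not the system toolkit's 12.1)
print(torch.cuda.is_available())
print(torch.cuda.device_count())      # expect 8 on an HGX H100 node
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA H100 80GB HBM3"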

SOLUTION:

Make the following edits within transformers-bloom-inference/bloom-inference-scripts/bloom-accelerate-inference.py to enable int8 loading with fp32 CPU offload:

line 10:
Add BitsAndBytesConfig to the import statement, as follows:
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

line 56:
Add quantization_config = BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True) to the int8 branch, as follows:
infer_dtype = args.dtype
if infer_dtype == "int8":
    dtype = torch.int8
    quantization_config = BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True)

kwargs = dict(
    device_map="auto",
)

line 77:
Edit kwargs to include quantization_config, as follows:
if infer_dtype == "int8": print_rank0("Using load_in_8bit=True to use quanitized model") #kwargs["load_in_8bit"] = True kwargs={"load_in_8bit":True, "quantization_config": quantization_config, "device_map": "auto"} else: kwargs["torch_dtype"] = dtype

Save the updated PY file, then re-run the accelerate inference code.
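For reference, a minimal standalone sketch of how the edited pieces fit together at load time (assumes bitsandbytes and accelerate are installed; the prompt and max_new_tokens values are placeholders, and load_in_8bit is folded into BitsAndBytesConfig rather than passed as a separate kwarg):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "bigscience/bloom"

# int8 weights on GPU, with fp32 CPU offload for modules that do not fit
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quantization_config,
)

inputs = tokenizer("DeepSpeed is a", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))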

The updated PY file runs the benchmarks without errors. I recommend making these, or similar, code changes in the parent repo.
