Created some models #10
Conversation
The main request is to move the models to the GPU before inference.
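A minimal sketch of the requested change, assuming a CUDA device is available (the model name is the one used in the diff below; inputs would need to be moved to the same device):

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("daryl149/llama-2-7b-chat-hf")

# Move the model to the GPU before inference; input tensors must be
# moved to the same device before calling the model.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)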
models/huggingface_models.py
model = AutoModel.from_pretrained("daryl149/llama-2-7b-chat-hf")
# tokenizer = LlamaTokenizer.from_pretrained("/output/path")
Better not to leave commented-out, unused code like this.
server/models/huggingface_models.py
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
Let's use torch.bfloat16 instead, but even then the model would be ~12 GB. I've used int4 quantisation:
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "daryl149/llama-2-7b-chat-hf"

# NF4 ("NormalFloat 4-bit") config: weights are stored in 4 bits,
# computation runs in bfloat16.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,  # also quantise the quantisation constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_nf4 = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=nf4_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)

pipeline = transformers.pipeline(
    "text-generation",
    model=model_nf4,
    torch_dtype=torch.bfloat16,
    tokenizer=tokenizer,
)
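For context on the savings: at roughly 2 bytes per parameter, a 7B-parameter model needs on the order of 13 GB of weights in bfloat16, while NF4 stores them at about half a byte per parameter (~3.5 GB), and double quantisation further shrinks the overhead of the quantisation constants. A minimal usage sketch for the pipeline above; the prompt and generation parameters are illustrative, not part of this PR:

output = pipeline(
    "Explain int4 quantisation in one sentence.",  # illustrative prompt
    max_new_tokens=64,
)
print(output[0]["generated_text"])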
The reason I cannot test this locally is that I'm on Apple's Metal architecture. The most common error is that int4 is not usable on Metal, and since the model implementation relies on it, I simply cannot reach this part of the code.
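For anyone hitting the same wall, a minimal device-check sketch (assuming, as above, that the bitsandbytes int4 kernels are CUDA-only and therefore unavailable on Metal/MPS):

import torch

if torch.cuda.is_available():
    device = "cuda"  # int4 quantisation via bitsandbytes works here
elif torch.backends.mps.is_available():
    device = "mps"   # Apple Metal: fall back to an unquantised bfloat16/float16 load
else:
    device = "cpu"
print(f"Selected device: {device}")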
BitsAndBytes, without local testing
ATM this cannot be merged until the conflicts are resolved.
LGTM, thanks!