The generated results are different when using greedy search during generation #65
Comments
Hi, `do_sample=True` with `top_k=1` should be fine, but the correct way to do it is simply `do_sample=False`.
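Below is a minimal sketch of the suggested fix using the standard `transformers` `generate()` API. The model name (`gpt2`) and the prompt are placeholders chosen for illustration, not taken from the issue:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model and prompt -- the issue itself uses bigscience/bloom.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("BLOOM is a large language model that", return_tensors="pt")

# do_sample=False makes generate() pick the argmax token at every step
# (greedy search), so repeated calls on the same inputs give the same output.
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```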
Hi @mayank31398, sorry for the late reply.
Refer to https://huggingface.co/blog/how-to-generate: sampling is designed to incorporate randomness into picking the next word, so its output is not expected to be deterministic.
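As a hedged illustration of that point, the sketch below (again with a placeholder model and prompt) shows how `do_sample=True` makes the output depend on the random seed, while greedy decoding does not sample at all:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The weather today is", return_tensors="pt")

# With do_sample=True, each step draws the next token from the (top-k
# filtered) distribution, so different RNG states give different outputs.
set_seed(0)
sample_a = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_k=50)
set_seed(1)
sample_b = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_k=50)

print(tokenizer.decode(sample_a[0], skip_special_tokens=True))
print(tokenizer.decode(sample_b[0], skip_special_tokens=True))  # generally differs
```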
But the
Thank you very much for your work. I ran into a problem when running BLOOM-176B on 8×A100.
I followed the README.md and executed the following command. Specifically, I set `do_sample: true` and `top_k: 1`, which I thought was equivalent to greedy search:

```bash
python -m inference_server.cli --model_name bigscience/bloom --model_class AutoModelForCausalLM --dtype bf16 --deployment_framework hf_accelerate --generate_kwargs '{"min_length": 100, "max_new_tokens": 100, "do_sample": true, "top_k": 1}'
```
However, the outputs generated by several forward passes were occasionally different even though the inputs were the same.
Do you have any clues or ideas about this?
My env info:
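For reference, applying the maintainer's suggestion to the original command would presumably mean changing only the `generate_kwargs`; everything else is copied verbatim from the issue:

```bash
python -m inference_server.cli --model_name bigscience/bloom --model_class AutoModelForCausalLM --dtype bf16 --deployment_framework hf_accelerate --generate_kwargs '{"min_length": 100, "max_new_tokens": 100, "do_sample": false}'
```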