The generated results are different when using greedy search during generation #65

Open
FrostML opened this issue Mar 14, 2023 · 4 comments

Comments

FrostML commented Mar 14, 2023

Thank you very much for your work. I got a problem when I ran BLOOM-176B on 8*A100.

I followed the README.md and executed the following command. Specifically, I set do_sample = true and top_k = 1, which I thought was equivalent to greedy search:

python -m inference_server.cli --model_name bigscience/bloom --model_class AutoModelForCausalLM --dtype bf16 --deployment_framework hf_accelerate --generate_kwargs '{"min_length": 100, "max_new_tokens": 100, "do_sample": true, "top_k": 1}'

However, the generated outputs of several forward passes differed even with identical inputs. This happened occasionally.

Do you have any clues or ideas about this?

My env info:

CUDA 11.7
nccl 2.14.3

accelerate 0.17.1
Flask 2.2.3
Flask-API 3.0.post1
gunicorn 20.1.0
pydantic 1.10.6
huggingface-hub 0.13.2
@mayank31398
Collaborator

Hi, do_sample = true with top_k = 1 should be fine, but the correct way to do greedy search is simply do_sample = False.
This is weird. I don't think this is a bug in the code in this repository, but I will try to give it a shot.
Can you try with just do_sample = False?
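For reference, the greedy-search invocation would presumably look like this (same flags as the original command, with only the generate_kwargs changed; untested sketch):

```shell
python -m inference_server.cli --model_name bigscience/bloom --model_class AutoModelForCausalLM --dtype bf16 --deployment_framework hf_accelerate --generate_kwargs '{"min_length": 100, "max_new_tokens": 100, "do_sample": false}'
```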


FrostML commented Mar 20, 2023

Hi @mayank31398 Sorry for the late reply.
It worked with do_sample=False; the results were all identical.
But I still can't figure out why sampling doesn't work properly here. Do you know whom, or which repo, I could turn to for help?

@richarddwang

Refer to https://huggingface.co/blog/how-to-generate. Sampling is designed to introduce randomness into picking the next word.


FrostML commented Mar 22, 2023

But k is 1, so there shouldn't be any randomness. @richarddwang
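For what it's worth, here is a minimal sketch (plain Python, hypothetical toy logits, not the actual transformers implementation) of why top-k filtering with k = 1 followed by multinomial sampling should collapse to argmax. If the outputs still differ, a plausible suspect is the logits themselves changing between runs (e.g. ties, or nondeterministic bf16 reduction kernels across GPUs flipping the argmax) rather than the sampling step:

```python
import math
import random

def top_k_filter(logits, k):
    # Keep the k largest logits and mask the rest to -inf.
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]

def sample(logits):
    # Softmax over the (filtered) logits, then one multinomial draw.
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]  # exp(-inf) == 0.0
    return random.choices(range(len(weights)), weights=weights)[0]

# Toy logits (hypothetical values): index 1 is the unique argmax.
logits = [1.3, 4.2, 0.7, 3.9]
greedy = max(range(len(logits)), key=lambda i: logits[i])

# With k = 1 only the argmax survives the filter, so every draw is identical.
assert all(sample(top_k_filter(logits, 1)) == greedy for _ in range(1000))
```

So with a fixed logit vector, do_sample = true plus top_k = 1 is deterministic; nondeterminism has to enter upstream of the sampler.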


3 participants