This is a text generation method, built on Hugging Face Transformers, that returns a generator and streams out each token in real time during inference.
```shell
pip install transformers-stream-generator
```
- Just add two lines of code before your original code:

```python
from transformers_stream_generator import init_stream_support
init_stream_support()
```
- Add `do_stream=True` to the `model.generate` call and keep `do_sample=True`; generation then returns a generator:

```python
generator = model.generate(input_ids, do_stream=True, do_sample=True)
for token in generator:
    word = tokenizer.decode(token)
    print(word)
```