Token-wise, real-time display inference module for Llama 2 and other LLMs.
- Install the dependencies:

cd llm-tokenwise-inference
pip install -r requirements.txt
- Run the following program in IPython or a Jupyter notebook.
from llminferencepkg import TokenWiseLLM

model = TokenWiseLLM("path/to/model")  # local path or a Hugging Face repo ID
model.inference("Question")            # prints each token as it is generated
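For context, token-wise display generally works by emitting each decoded token to the terminal as soon as it is sampled, rather than waiting for the full completion. The sketch below illustrates the display side of that loop with a plain generator; `stream_tokens` and the fixed token list are hypothetical stand-ins (not part of `llminferencepkg`), where a real `TokenWiseLLM` would pull tokens from the model's sampling loop instead.

```python
import sys
import time

def stream_tokens(tokens, delay=0.0):
    """Print tokens one at a time and return the assembled text.

    Illustrative only: `tokens` stands in for the stream a model's
    sampling loop would produce.
    """
    pieces = []
    for tok in tokens:
        sys.stdout.write(tok)
        sys.stdout.flush()  # flush per token so output appears in real time
        pieces.append(tok)
        time.sleep(delay)   # optional pacing; 0 by default
    sys.stdout.write("\n")
    return "".join(pieces)

if __name__ == "__main__":
    stream_tokens(["Hello", ",", " world", "!"])
```

The per-token `flush()` is the key detail: without it, stdout buffering can hold back output until generation finishes, defeating the real-time effect.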