Examples

xFasterTransformer provides C++ and Python (PyTorch) examples to help users learn the API. Gradio-based web demos are also provided for some models. All examples and web demos support multi-rank execution.

The C++ example automatically identifies the model and its tokenizer. Tokenizers are implemented with SentencePiece, except for the OPT model, whose tokenizer is hard-coded.

The Python (PyTorch) example performs end-to-end model inference with streaming output, using the tokenizer from the `transformers` library.
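To make the streaming pattern concrete, here is a minimal, self-contained sketch of how a tokenizer, a generation loop, and a streamer can fit together. It does not use xFasterTransformer or any real model: `ToyTokenizer`, `ListStreamer`, and `toy_generate` are hypothetical stand-ins invented for illustration. The `put()`/`end()` shape is loosely modeled on the streamer interface in Hugging Face `transformers`. See the example scripts in the repo for the actual API.

```python
from typing import Callable, Iterable, List


class ToyTokenizer:
    """Hypothetical whitespace tokenizer standing in for a real one."""

    def __init__(self) -> None:
        self.vocab: List[str] = []

    def encode(self, text: str) -> List[int]:
        ids = []
        for word in text.split():
            if word not in self.vocab:
                self.vocab.append(word)
            ids.append(self.vocab.index(word))
        return ids

    def decode(self, ids: Iterable[int]) -> str:
        return " ".join(self.vocab[i] for i in ids)


class ListStreamer:
    """Hypothetical streamer: decodes token ids as they arrive and
    forwards the text to a sink callable (print by default)."""

    def __init__(self, tokenizer: ToyTokenizer,
                 sink: Callable[[str], None] = print) -> None:
        self.tokenizer = tokenizer
        self.sink = sink

    def put(self, token_ids: List[int]) -> None:
        self.sink(self.tokenizer.decode(token_ids))

    def end(self) -> None:
        self.sink("<done>")


def toy_generate(input_ids: List[int], max_new_tokens: int,
                 streamer: ListStreamer) -> List[int]:
    """Hypothetical 'model' that repeats the prompt tokens cyclically,
    pushing each new token to the streamer as soon as it is produced."""
    if not input_ids:
        raise ValueError("input_ids must not be empty")
    output = list(input_ids)
    for step in range(max_new_tokens):
        next_id = input_ids[step % len(input_ids)]
        output.append(next_id)
        streamer.put([next_id])
    streamer.end()
    return output


tokenizer = ToyTokenizer()
chunks: List[str] = []
streamer = ListStreamer(tokenizer, sink=chunks.append)
prompt_ids = tokenizer.encode("hello streaming world")
output_ids = toy_generate(prompt_ids, max_new_tokens=4, streamer=streamer)
print(chunks)
print(tokenizer.decode(output_ids))
```

The design point is that the generation loop hands each token to the streamer immediately rather than returning all text at the end, which is what lets a console or web UI display partial output while inference is still running.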

A web demo based on Gradio is provided in the repo.
Supported models:

  • ChatGLM
  • ChatGLM2
  • ChatGLM3
  • ChatGLM4
  • Llama2
  • Llama3
  • Gemma
  • Yi
  • Baichuan2
  • Qwen
  • Qwen2