Examples

xFasterTransformer provides C++ and Python (PyTorch) examples to help users learn the API. Gradio-based web demos are also provided for some models. All examples and web demos support multi-rank execution.

C++ example

The C++ example automatically identifies the model and its tokenizer. Tokenization is implemented with SentencePiece, except for the OPT model, whose tokenizer is hard-coded.
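The dispatch rule above (SentencePiece for every model except OPT) can be sketched as follows. This is a hypothetical illustration in Python, not the C++ example's actual code. It assumes the model type can be read from an INI-style config whose first section is named after the model (e.g. `[llama]`). The function names `detect_model_type` and `select_tokenizer_backend` are invented for this sketch.

```python
"""Hypothetical sketch of tokenizer selection by model type.

Assumption: the converted model ships an INI-style config whose first
section header names the model type, e.g. "[llama]". All names here are
illustrative and not part of the xFasterTransformer API.
"""
import configparser

SENTENCEPIECE = "sentencepiece"
HARD_CODED = "hard-coded"


def detect_model_type(config_text: str) -> str:
    # Parse the (assumed) INI config and treat the first section as the model type.
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    sections = parser.sections()
    if not sections:
        raise ValueError("no model section found in config")
    return sections[0].strip().lower()


def select_tokenizer_backend(model_type: str) -> str:
    # OPT is the one exception: its tokenizer is hard-coded.
    if model_type == "opt":
        return HARD_CODED
    # Every other supported model uses a SentencePiece tokenizer.
    return SENTENCEPIECE


if __name__ == "__main__":
    model_type = detect_model_type("[llama]\nhead_num = 32\n")
    print(model_type, select_tokenizer_backend(model_type))
```

Keeping detection and backend selection in separate functions lets the OPT special case live in one place, so adding another hard-coded tokenizer later would only touch `select_tokenizer_backend`.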

Python (PyTorch) example

The Python (PyTorch) example performs end-to-end model inference with streaming output, using the tokenizer from the Hugging Face Transformers library.
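Streaming output works by decoding generated tokens as they arrive and printing only the newly produced text. The minimal sketch below shows that idea with a toy vocabulary. `DeltaStreamer` is a hypothetical class loosely modeled on the `put`/`end` interface of Transformers' `TextStreamer`. It is not the example's actual code. It re-decodes the whole sequence on each step because subword tokenizers such as SentencePiece can merge spaces or multi-byte characters across tokens, so decoding tokens one at a time may produce wrong text.

```python
from typing import Callable, Dict, List


def _print_inline(text: str) -> None:
    # Print without a trailing newline so chunks appear as one continuous stream.
    print(text, end="", flush=True)


class DeltaStreamer:
    """Hypothetical streamer: decodes all tokens seen so far and emits only
    the suffix that is new since the previous emission."""

    def __init__(self, decode: Callable[[List[int]], str],
                 emit: Callable[[str], None] = _print_inline) -> None:
        self.decode = decode
        self.emit = emit
        self.token_ids: List[int] = []
        self.emitted_len = 0

    def put(self, token_id: int) -> None:
        # Called once per generated token.
        self.token_ids.append(token_id)
        text = self.decode(self.token_ids)
        new_text = text[self.emitted_len:]
        if new_text:
            self.emit(new_text)
            self.emitted_len = len(text)

    def end(self) -> None:
        # Reset state so the streamer can be reused for the next prompt.
        self.token_ids.clear()
        self.emitted_len = 0


# Toy vocabulary standing in for a real tokenizer's decode().
VOCAB: Dict[int, str] = {0: "Hello", 1: ",", 2: " world", 3: "!"}


def toy_decode(ids: List[int]) -> str:
    return "".join(VOCAB[i] for i in ids)


if __name__ == "__main__":
    streamer = DeltaStreamer(toy_decode)
    for token in [0, 1, 2, 3]:  # pretend these come from model.generate()
        streamer.put(token)
    streamer.end()
    print()
```

This sketch assumes decoding a longer prefix never rewrites earlier characters. Production streamers typically also buffer text until a word boundary, which is omitted here.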

Web Demo

A Gradio-based web demo is provided in the repository.
Supported models:

ChatGLM
ChatGLM2
ChatGLM3
ChatGLM4
Llama2
Llama3
Gemma
Yi
Baichuan2
Qwen
Qwen2
