GitHub - aws-samples/amazon-sagemaker-llama2-response-streaming-recipes: Amazon SageMaker Llama 2 Inference via Response Streaming

Amazon SageMaker Llama 2 Inference via Response Streaming

Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. The fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama 2 outperforms other open source language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests.

This repo helps customers looking to have faster response times in the form of TTFB and thus reduce the overall perceived latency. The streaming support is possible with the latest announcement Sagemaker Real-time Inference now supports response streaming.

The samples covers notebook recipes on how to implement Response Streaming SageMaker Endpoints for Llama 2 LLMs. These models were deployed using the Amazon SageMaker Deep Learning Containers HF TGI and DLC for LMI. To be precise, these are DLC for Large Model Inference and the recently announced Hugging Face DLC powered by Text Generation Inference.

This repo covers Deploy and Inference Llama 2 Models on SageMaker via Response Streaming.

DLC	Model ID	Deploy Notebook	Inference Notebook
HF TGI	meta-llama/Llama-2-7b-chat-hf	Deploy	Inference
HF TGI	meta-llama/Llama-2-13b-chat-hf	Deploy	Inference
HF TGI	meta-llama/Llama-2-70b-chat-hf	Deploy	Inference
LMI	meta-llama/Llama-2-7b-chat-hf	Deploy	Inference
LMI	meta-llama/Llama-2-13b-chat-hf	Deploy	Inference
LMI	meta-llama/Llama-2-70b-chat-hf	Deploy	Inference

Blog

📖 Inference Llama 2 models with real-time response streaming using Amazon SageMaker

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
llama-2-hf-tgi		llama-2-hf-tgi
llama-2-lmi		llama-2-lmi
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
stream.gif		stream.gif

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Amazon SageMaker Llama 2 Inference via Response Streaming

Blog

References

Security

License

About

Releases

Packages

Contributors 3

Languages

License

aws-samples/amazon-sagemaker-llama2-response-streaming-recipes

Folders and files

Latest commit

History

Repository files navigation

Amazon SageMaker Llama 2 Inference via Response Streaming

Blog

References

Security

License

About

Topics

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages