From 8a6dc2d32354569ecd837577de26731aece0f5ec Mon Sep 17 00:00:00 2001
From: Appu Shaji
Date: Thu, 2 Nov 2023 09:03:58 +0100
Subject: [PATCH] Update README.md to include details on the blog and code

---
 README.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/README.md b/README.md
index b429778..de95c8a 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,12 @@
 # Low-Rank LLama2
+
+In the ever-evolving landscape of artificial intelligence (AI), one trend has become undeniable in recent years: the relentless growth in the size and complexity of machine learning models. In particular, large language models (LLMs), which mainly rely on transformers as building blocks, now contain billions of parameters and require an amount of compute that is expected to keep growing as ever larger models are released.
+
+In this blog post and supporting code, we explore low-rankness as a pruning technique for the LLama2-7B base model. We show that, by splitting almost all of the linear layer weights into low-rank pairs without fine-tuning and leveraging LoRA for custom training, we can achieve the following without implementing custom kernels:
+
+- ~50% reduction in the number of parameters.
+- Up to ~50% faster training vs. bitsandbytes’s 8-bit quantization.
+- Up to ~1.25x inference speed-up.
+
+The blog post is at [https://mobiusml.github.io/low-rank-llama2/](https://mobiusml.github.io/low-rank-llama2/)
+and the code is at [https://github.com/mobiusml/low-rank-llama2/tree/main/code](https://github.com/mobiusml/low-rank-llama2/tree/main/code)
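
To make the "splitting linear layer weights into low-rank pairs" idea in the README addition concrete, here is a minimal PyTorch sketch of one common way to do it: a truncated SVD of a linear layer's weight, folded into two smaller linear layers. This is an illustrative sketch under assumptions, not the repository's actual implementation; the helper name `low_rank_split` and the rank of 1024 are chosen only for the example.

```python
# Minimal sketch: split a linear layer's weight into a low-rank pair via truncated SVD.
# Illustrative only -- not the code from the low-rank-llama2 repository; the helper
# name and the rank used below are assumptions made for this example.
import torch
import torch.nn as nn


def low_rank_split(linear: nn.Linear, rank: int) -> nn.Sequential:
    """Replace `linear` with two smaller linears whose product is the best
    rank-`rank` approximation of the original (out_features x in_features) weight."""
    W = linear.weight.data  # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)

    # Keep the top-`rank` singular components and split the singular values
    # symmetrically between the two factors.
    sqrt_S = torch.diag(S[:rank].sqrt())
    A = U[:, :rank] @ sqrt_S   # (out_features, rank)
    B = sqrt_S @ Vh[:rank, :]  # (rank, in_features)

    down = nn.Linear(linear.in_features, rank, bias=False)
    up = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    down.weight.data.copy_(B)
    up.weight.data.copy_(A)
    if linear.bias is not None:
        up.bias.data.copy_(linear.bias.data)

    # x -> B x -> A (B x) ~= W x, with rank * (in + out) parameters
    # instead of in * out.
    return nn.Sequential(down, up)


# Usage example: a 4096x4096 projection (typical of LLama2-7B) split at rank 1024
# keeps roughly half of the original parameters.
layer = nn.Linear(4096, 4096, bias=False)
compressed = low_rank_split(layer, rank=1024)
print(sum(p.numel() for p in layer.parameters()),       # 16,777,216
      sum(p.numel() for p in compressed.parameters()))  # 8,388,608
```

With in and out dimensions both 4096, a rank of 1024 gives 2 * 1024 * 4096 parameters per pair versus 4096 * 4096 for the dense weight, which is where the roughly 50% parameter reduction claimed above comes from; the actual ranks used per layer in the project may differ.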