# Low-Rank LLama2

In the ever-evolving landscape of artificial intelligence (AI), one undeniable trend has emerged in recent years: the relentless growth in the size and complexity of machine learning models. In particular, large language models (LLMs), which mainly rely on transformers as building blocks, now contain a substantial number of parameters and require significant compute, and both are expected to keep growing as ever larger models are released.

In this blog post and the supporting code, we explore low-rank decomposition as a pruning technique for the LLama2-7B base model. We show that, by splitting almost all of the linear layer weights into low-rank pairs without fine-tuning, and leveraging LoRA for custom training, we can achieve the following without implementing custom kernels (see the sketch after the list below):

- ~50% reduction in the number of parameters.
- Up to ~50% faster training vs. bitsandbytes’s 8-bit quantization.
- Up to ~1.25x inference speed-up.

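To make the idea concrete, here is a minimal sketch of how a single linear layer can be replaced by a low-rank pair via truncated SVD. This is an illustrative example only, not the repository's code: the function name `low_rank_decompose`, the rank value, and the layer sizes are assumptions chosen to roughly match the ~50% parameter target mentioned above.

```python
# Minimal sketch (assumption, not the repo's implementation): replace a linear
# layer's weight W with a low-rank pair A @ B obtained from a truncated SVD.
import torch
import torch.nn as nn

def low_rank_decompose(linear: nn.Linear, rank: int) -> nn.Sequential:
    """Approximate `linear` with two smaller linear layers whose combined
    parameter count is rank * (in_features + out_features)."""
    W = linear.weight.data                      # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    # Keep only the top-`rank` singular components: W ~= (U * S) @ Vh.
    A = U[:, :rank] * S[:rank]                  # (out_features, rank)
    B = Vh[:rank, :]                            # (rank, in_features)

    first = nn.Linear(linear.in_features, rank, bias=False)
    second = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    first.weight.data = B
    second.weight.data = A
    if linear.bias is not None:
        second.bias.data = linear.bias.data
    return nn.Sequential(first, second)

# Example: a 4096x4096 projection at rank 1024 keeps roughly 50% of the
# original parameters (1024 * (4096 + 4096) vs. 4096 * 4096).
layer = nn.Linear(4096, 4096, bias=False)
compressed = low_rank_decompose(layer, rank=1024)
```

The rank controls the trade-off: lower ranks save more parameters but approximate the original weight matrix less faithfully, which is why the blog post pairs the decomposition with LoRA-style training to recover quality.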
The blog is at [https://mobiusml.github.io/low-rank-llama2/](https://mobiusml.github.io/low-rank-llama2/)
and code is at [https://github.com/mobiusml/low-rank-llama2/tree/main/code](https://github.com/mobiusml/low-rank-llama2/tree/main/code)
