Raxa/smol-vision

Smol Vision 🐣

Recipes for shrinking, optimizing, and customizing cutting-edge vision models.

| Category | Notebook | Description |
|---|---|---|
| Quantization/ONNX | Faster and Smaller Zero-shot Object Detection with Optimum | Quantize the state-of-the-art zero-shot object detection model OWLv2 using Optimum ONNXRuntime tools. |
| VLM Fine-tuning | Fine-tune PaliGemma | Fine-tune the state-of-the-art vision language backbone PaliGemma using transformers. |
| Intro to Optimum/ORT | Optimizing DETR with 🤗 Optimum | A gentle introduction to exporting vision models to ONNX and quantizing them. |
| Model Shrinking | Knowledge Distillation for Computer Vision | Knowledge distillation for image classification. |
| Quantization | Fit in vision models using Quanto | Fit vision models onto smaller hardware using Quanto. |
| Speed-up | Faster foundation models with torch.compile | Improve latency for foundation models using torch.compile. |
| Speed-up/Memory Optimization | Vision language model serving using TGI (SOON) | Explore speed-ups and memory improvements for vision-language model serving with Text Generation Inference (TGI). |
| Quantization/Optimum/ORT | All levels of quantization and graph optimizations for Image Segmentation using Optimum (SOON) | End-to-end model optimization using Optimum. |
| VLM Fine-tuning | Fine-tune Florence-2 | Fine-tune Florence-2 on the DocVQA dataset. |
| Fine-tune IDEFICS3 on visual question answering | QLoRA Fine-tune IDEFICS3 on VQAv2 | QLoRA fine-tune IDEFICS3 on the VQAv2 dataset. |
| Multimodal RAG using ColPali and Qwen2-VL | Multimodal RAG using ColPali and Qwen2-VL | Learn to retrieve documents and build a RAG pipeline without heavy document processing, using ColPali through Byaldi for retrieval and Qwen2-VL for generation. |
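
Several of the notebooks listed above revolve around quantization. As a rough illustration of the underlying idea only, here is a minimal, framework-free sketch of symmetric per-tensor int8 quantization. It is not taken from any of the notebooks and does not use Optimum, ONNX Runtime, or Quanto; the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical, chosen just for this example.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of floats to int8 codes in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero (or empty) tensor so the scale is never zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the int8 range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [q * scale for q in quantized]


# Hypothetical weights, for illustration only.
weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Largest round-trip error; for in-range values it stays within half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is stored as a single signed byte plus one shared `scale`, which is where the memory savings come from. The cost is a rounding error of at most half a quantization step per value. Real toolchains typically add calibration, per-channel scales, and operator-level support on top of this basic scheme.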

Languages

  • Jupyter Notebook 100.0%