
Fine-Tuning and Inference with LLaMA-3.1-8B Model

This project demonstrates how to fine-tune the LLaMA-3.1-8B model with LoRA adapters, apply chat templates, and save the resulting model for inference. The model is fine-tuned on local data using parameter-efficient techniques and pushed to the Hugging Face Hub.

Overview

  • Model: LLaMA-3.1-8B with 4-bit quantization for efficient memory usage (see the loading sketch after this list).
  • Techniques: Fine-tuning using LoRA (Low-Rank Adaptation) adapters, gradient checkpointing, and chat templates.
  • Data: Custom local CSV file used for training.
  • Goal: Train and deploy a chatbot model capable of handling user input in conversation-based formats.
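For orientation, here is a minimal sketch of loading the base model in 4-bit precision with Unsloth's `FastLanguageModel`. The checkpoint name and sequence length are illustrative assumptions, not this project's exact settings:

```python
from unsloth import FastLanguageModel

# Minimal sketch: load LLaMA-3.1-8B in 4-bit precision.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",  # assumed checkpoint name
    max_seq_length=2048,                     # assumed context length
    dtype=None,          # auto-detect (float16 / bfloat16)
    load_in_4bit=True,   # 4-bit quantization for reduced memory usage
)
```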

Features

  • LoRA Adapters: Parameter-efficient fine-tuning (see the sketch after this list).
  • Quantization: Efficient memory usage with 4-bit precision.
  • Chat Templates: Structured conversation flow with tokenization.
  • Model Deployment: Saving and pushing models to Hugging Face Hub in different quantization formats.
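A rough sketch of how the LoRA adapters and chat template fit together. The rank, alpha, target modules, and template name below are common Unsloth defaults, assumed here rather than taken from this project's config:

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# Attach LoRA adapters for parameter-efficient fine-tuning.
# r, lora_alpha, and target_modules are typical defaults, assumed here.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # memory-saving checkpointing
)

# Apply a Llama-3.1 chat template so conversations are tokenized
# in the structured format the model expects.
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")

messages = [{"role": "user", "content": "Hello, who are you?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
```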

Installation

To run this project, you'll need to install the required packages. You can set this up in Google Colab or your local environment:

pip install torch transformers datasets pandas unsloth trl
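Once the packages are installed, a fine-tuning run can be sketched as below. The CSV path, text column, hyperparameters, and Hub repo id are placeholders, not this project's actual values; the argument names follow the trl releases commonly paired with Unsloth (newer trl versions move some of them into SFTConfig):

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Load the local training CSV; "data.csv" and the "text" column
# are placeholder names, not the project's actual file or schema.
dataset = load_dataset("csv", data_files="data.csv", split="train")

trainer = SFTTrainer(
    model=model,                 # LoRA-wrapped model from the steps above
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",   # assumed column holding formatted chats
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,            # small run for illustration
        learning_rate=2e-4,
        logging_steps=1,
        output_dir="outputs",
    ),
)
trainer.train()

# Push the fine-tuned adapters to the Hugging Face Hub; the repo id is
# a placeholder. Unsloth also supports merged and GGUF exports for
# other quantization formats, e.g. model.push_to_hub_gguf(...).
model.push_to_hub("your-username/llama-3.1-8b-finetuned")
tokenizer.push_to_hub("your-username/llama-3.1-8b-finetuned")
```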