This repository hosts a modified version of the original UnSloth library code, adapted for fine-tuning any Hugging Face large language model (LLM) on custom data. It reads custom datasets in JSONL format from Google Drive, so fine-tuning can run directly within Google Colab.
- Google Colab Integration: Designed for seamless execution in Google Colab, leveraging its computational resources and pre-installed libraries.
- Custom Data Training: Enables fine-tuning with datasets in JSONL format from Google Drive.
- UnSloth Library Enhancements: Incorporates modifications to the original UnSloth codebase for improved performance and usability.
- Hugging Face Model Compatibility: Supports fine-tuning of any LLM available on Hugging Face.
- A Google Account for access to Google Colab and Google Drive.
- A dataset formatted in JSONL stored within your Google Drive.
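
As a rough illustration of the JSONL prerequisite above: a JSONL file holds one JSON object per line. The sketch below writes a tiny file and checks it using only the Python standard library. The field names `instruction` and `output`, the file name `train.jsonl`, and both helper functions are assumptions made for illustration; the notebook may expect different keys.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write an iterable of dicts as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def validate_jsonl(path, required_keys):
    """Return (line_number, message) pairs for lines that are not usable records."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((line_number, "not a JSON object"))
                continue
            missing = [key for key in required_keys if key not in record]
            if missing:
                problems.append((line_number, f"missing keys: {missing}"))
    return problems


# Hypothetical records; replace the keys with whatever your training setup expects.
sample = [
    {"instruction": "Translate 'hello' to French.", "output": "bonjour"},
    {"instruction": "Add 2 and 3.", "output": "5"},
]

path = Path(tempfile.mkdtemp()) / "train.jsonl"
write_jsonl(sample, path)
print(validate_jsonl(path, required_keys=["instruction", "output"]))
```

Running a check like this before uploading the file to Google Drive catches malformed lines early, before any Colab GPU time is spent.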
To begin fine-tuning your language model with custom data:
- Open the `CustomLMFineTuning.ipynb` notebook from the repository in Google Colab.
- Follow the step-by-step instructions within the notebook. These will guide you through setting up your environment, including mounting your Google Drive to access your dataset and executing the fine-tuning process.
- Adjust any configurations as necessary to tailor the fine-tuning to your specific model and dataset requirements.
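
The Drive-mounting and dataset-loading steps described above might look roughly like the following. `drive.mount` from `google.colab` is the standard Colab call for mounting Drive; the helper names, the mount point, and the `MyDrive/datasets/train.jsonl` path are placeholders chosen for this sketch, and the notebook's own cells may differ.

```python
import json
from pathlib import Path


def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)  # prompts for authorization on first use
    return True


def load_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    if mount_drive_if_colab():
        # Placeholder path: point this at your own file under MyDrive.
        dataset_path = Path("/content/drive/MyDrive/datasets/train.jsonl")
        records = load_jsonl(dataset_path)
        print(f"Loaded {len(records)} records from {dataset_path}")
```

If you prefer the Hugging Face `datasets` library, `load_dataset("json", data_files=...)` reads the same file format; the plain-Python loader here just keeps the sketch free of extra dependencies.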
Contributions are highly appreciated and help make this project even better. If you're interested in contributing:
- Fork this repository.
- Create your feature branch (`git checkout -b feature/AmazingFeature`).
- Commit your changes (`git commit -m 'Add some AmazingFeature'`).
- Push to the branch (`git push origin feature/AmazingFeature`).
- Open a pull request.
- Original UnSloth Library - For the foundational codebase.
- Hugging Face - For their extensive library of language models.
- Google Colab - For providing a flexible and powerful environment for machine learning projects.
This project is licensed under the MIT License. See the LICENSE file for more details.