diff --git a/README.md b/README.md
index c769ad4..e6033c0 100644
--- a/README.md
+++ b/README.md
@@ -57,6 +57,8 @@ We are looking for contributors! Please send an email to [founders@simplifine.co
 
 Simplifine is licensed under the GNU General Public License Version 3. See the LICENSE file for more details.
 
+## 📚 Documentation
+A MAJOR OVERHAUL OF THE DOCUMENTATION IS IN THE WORKS (expected by Aug 11th, 2024). In the meantime, please use this [notebook](https://github.com/simplifine-llm/Simplifine/blob/main/examples/cloud_quickstart.ipynb) to learn how to use the library.
 
 ## 💬 Support
 
diff --git a/README.md.orig b/README.md.orig
new file mode 100644
index 0000000..728f611
--- /dev/null
+++ b/README.md.orig
@@ -0,0 +1,135 @@
+# 🌟 Simplifine 🌟
+
+<<<<<<< HEAD
+## Super-Easy, Open-Source Cloud-Based LLM Finetuning
+
+**Try here –** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/simplifine-llm/Simplifine/blob/main/examples/cloud_quickstart.ipynb)
+
+### **Get a FREE API Key for FINETUNING [HERE](https://app.simplifine.com/#/signup)**
+=======
+Simplifine lets you invoke LLM finetuning with just one line of code using any Hugging Face dataset or model.
+> The easiest, fully open-source LLM finetuning library!
+
+
+**Get free Simplifine Cloud Credits to finetune [here](https://www.simplifine.com/api-key-interest)**
+
+## Roadmap
+- **COMPREHENSIVE DOCUMENTATION UPDATE INCOMING (by Aug 9th, 2024)** to match the new config files.
+
+## 🔄 Updates
+**v0.0.8 (2024-08-08)**
+- **Bug Fixes:** Code cleanup and trainer fixes.
+- **New Feature:** Ability to define more complex configuration files for the trainer.
+- **Examples:** New examples on cloud training and on training a fake news detector.
+>>>>>>> 3bd68be2d8b6173abf701764e497240bff577925
+
+
+Simplifine streamlines LLM finetuning on any dataset or model with one simple command, handling all infrastructure, job management, cloud model storage, and inference.
+
+## Features
+- **🚀 Easy Cloud-Based LLM Finetuning:** Fine-tune any LLM with just one command.
+
+- **☁️ Seamless Cloud Integration:** Automatically manage the downloading, storing, and running of models directly from the cloud.
+
+- **🤖 Built-in AI Assistance:** Get help with hyperparameter selection, synthetic dataset generation, and data quality checks.
+
+- **🔄 On-Device to Cloud Switching:** Add a simple decorator to transition from local to cloud-based training.
+
+- **⚡ Auto-Optimization:** Automatically optimizes model and data parallelization with Unsloth (*coming soon!*), DeepSpeed ✅, and FSDP ✅.
+
+- **📊 Custom Evaluation Support:** Use the built-in LLM evaluation functions or import your own custom evaluation metrics.
+
+- **💼 Community Support:** Ask any support questions on the Simplifine Community Discord.
+
+- **🏅 Trusted by Leading Institutions:** Research labs at the University of Oxford rely on Simplifine for their LLM finetuning needs.
+
+---
+
+## 🏁 Quickstart
+
+Get started here > [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/simplifine-llm/Simplifine/blob/main/examples/cloud_quickstart.ipynb)
+
+
+## 📚 Documentation
+
+Find our full documentation at [docs.simplifine.com](http://docs.simplifine.com). A minimal cloud-workflow sketch follows below.
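+
+As a quick reference, here is a minimal sketch of the cloud workflow, mirroring the quickstart notebook in this repo. The API key, job ID, job name, and the model/dataset choices below are placeholders, not working values:
+
+```python
+from simplifine_alpha.train_utils import Client
+
+# Placeholders: request an API key from Simplifine; gpu_type is 'l4' or 'a100'.
+client = Client(api_key='YOUR_SF_API_KEY', gpu_type='a100')
+
+# Submit a supervised fine-tuning job to the cloud.
+client.sft_train_cloud(
+    model_name='EleutherAI/pythia-160m',
+    from_hf=True,
+    dataset_name='nlpie/pandemic_pact',
+    keys=['title', 'abstract', 'explanation'],
+    template='### TITLE: {title}\n ### ABSTRACT: {abstract}\n ###EXPLANATION: {explanation}',
+    response_template='\n ###EXPLANATION:',
+    job_name='quickstart_job',
+    use_zero=True, use_ddp=False,
+)
+
+# Check the status of recent jobs, then download the finished model.
+for status in client.get_all_jobs()[-5:]:
+    print(status)
+client.download_model(job_id='YOUR_JOB_ID', extract_to='./sf_trained_model')
+```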
+
+## 📦 Installation
+
+Install from PyPI:
+```bash
+pip install simplifine-alpha
+```
+
+You can also install directly from GitHub using the following command:
+```bash
+pip install git+https://github.com/simplifine-llm/Simplifine.git
+```
+
+## 🤝 Contributing
+
+We are looking for contributors! Please send an email to [founders@simplifine.com](mailto:founders@simplifine.com) to get onboarded! We welcome all types of contributions.
+
+## 📄 License
+
+Simplifine is licensed under the GNU General Public License Version 3. See the LICENSE file for more details.
+
+<<<<<<< HEAD
+=======
+## 📚 Documentation
+A MAJOR OVERHAUL OF THE DOCUMENTATION IS IN THE WORKS (expected by Aug 11th, 2024). In the meantime, please use this [notebook](https://github.com/simplifine-llm/Simplifine/blob/main/examples/cloud_quickstart.ipynb) to learn how to use the library.
+>>>>>>> 3bd68be2d8b6173abf701764e497240bff577925
+
+## 💬 Support
+
+If you have any suggestions for new features you'd like to see implemented, please raise an issue; we will work hard to make it happen ASAP! For any other questions, feel free to contact us at [founders@simplifine.com](mailto:founders@simplifine.com).
+
+
+
+## 🔄 Updates
+
+#### **v0.0.8**
+- **🐛 Bug Fixes:** Streamlined code and resolved trainer-related issues for smoother operation.
+- **✨ New Feature:** Introduced support for defining more complex configuration files, enhancing the flexibility of the trainer.
+- **📚 Documentation:** Added new examples, including tutorials on cloud-based training and creating a fake news detector.
+- **🔗 Updated Documentation:** Check out the latest docs at [docs.simplifine.com](https://docs.simplifine.com).
+
+#### **v0.0.71**
+- **🐛 Bug Fixes:** Fixed issues that caused loading failures on certain configurations, ensuring broader compatibility.
+- **✨ New Feature:** Enabled direct installation from Git and added support for Hugging Face API tokens, allowing access to restricted models.
+- **📚 Documentation:** Refreshed examples to reflect the latest features.
+
+
+
+## ⛮ General Compute Considerations
+
+We currently support both DistributedDataParallel (DDP) and ZeRO from DeepSpeed.
+
+**TL;DR**:
+- **DDP** is useful when a model can fit in GPU memory (this includes gradients and activation states).
+- **ZeRO** is useful when a model requires sharding across multiple GPUs.
+
+**Longer Version**:
+
+- **DDP**: Distributed Data Parallel (DDP) creates a replica of the model on each processor (GPU). For example, imagine 8 GPUs, each being fed a single data point; this makes an effective batch size of 8. The model replicas are then updated on each device. DDP speeds up training by parallelizing the data-feeding process. However, DDP **fails** if a replica cannot fit in GPU memory. Remember, the memory hosts not only parameters but also gradients and optimizer states.
+
+- **ZeRO**: ZeRO is a powerful optimization developed by DeepSpeed, and it comes in different stages (1, 2, and 3). Each stage shards different parts of the training process (parameters, gradients, and activation states). This is really useful if a model cannot fit in GPU memory. ZeRO also supports offloading to the CPU, making even more room for training larger models. A minimal usage sketch follows the example scenarios below.
+
+### Example Scenarios and Appropriate Optimization Methods:
+1. **LLaMA-3-8b model with 16-bit precision**: Use ZeRO Stage 3 on 8 A100s.
+2. **LLaMA-3-8b model with LoRA adapters**: Usually fine with DDP on A100s.
+3. **GPT-2 with 16-bit precision**: Use DDP.
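+
+As a minimal sketch of how this choice surfaces in Simplifine (mirroring the repo's SFT example; the model, dataset, and template below are illustrative): `zero=True` selects DeepSpeed ZeRO sharding, while `zero=False, ddp=True` would replicate the full model on every GPU.
+
+```python
+from simplifine_alpha import train_engine
+
+# Illustrative values taken from the repo's SFT example; substitute your own.
+model_name = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'
+dataset_name = 'nlpie/pandemic_pact'
+
+# zero=True shards parameters, gradients, and optimizer states across GPUs
+# (DeepSpeed ZeRO); use zero=False, ddp=True when a full replica fits per GPU.
+train_engine.hf_sft(model_name, from_hf=True, dataset_name=dataset_name,
+                    keys=['title', 'abstract', 'explanation'],
+                    template='### TITLE: {title}\n ### ABSTRACT: {abstract}\n ###EXPLANATION: {explanation}',
+                    response_template='###EXPLANATION:',
+                    zero=True, ddp=False,
+                    gradient_accumulation_steps=4, fp16=True, max_seq_length=2048)
+```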
+
+## 🪲 FAQs
+
+**Issue: `RuntimeError: Error building extension 'cpu_adam'`**
+
+This error occurs when `python-dev` is not installed and ZeRO is using CPU offload. To resolve it, try:
+
+```bash
+# Try sudo apt-get install python3-dev if the following fails.
+apt-get install python-dev  # for Python 2.x installs
+apt-get install python3-dev # for Python 3.x installs
+```
+
+See this [link](https://stackoverflow.com/questions/21530577/fatal-error-python-h-no-such-file-or-directory) for more details.
diff --git a/examples/.DS_Store b/examples/.DS_Store
deleted file mode 100644
index dcdb9bb..0000000
Binary files a/examples/.DS_Store and /dev/null differ
diff --git a/examples/SFT_finetune/readme.md b/examples/SFT_finetune/readme.md
deleted file mode 100644
index 438f9bb..0000000
--- a/examples/SFT_finetune/readme.md
+++ /dev/null
@@ -1,19 +0,0 @@
-This folder ontains an example on instruction-tuning an LLM with supervised fine-tuning (SFT).
-
-In this example, you will see how to design a format, and what columns from a huggingface dataset to use to fine-tune this model.
-
-There are 2 methods of distributed training supported so far, ZeRO (DeepSpeed) and torch DDP.
-
-ZeRO enables training larger models on cheaper infra, by sharding gradients, parameters and optimizer states across processes.
-
-DDP, would require a replica on each worker (GPU) and as such, if the model cannot fit on a single GPU, would not work.
-
-To run the model, use torchrun such as this:
-
-```sh
-torchrun --nrpoc_per_node NUM_WORKERS sft_ft.py
-```
-
-Note that NUM_WORKERS is the number of GPUs in a single node.
-
-This requires for you to have installed the relevant NVCC and CUDA libraries.
diff --git a/examples/SFT_finetune/sft_ft.py b/examples/SFT_finetune/sft_ft.py
deleted file mode 100644
index f9b784d..0000000
--- a/examples/SFT_finetune/sft_ft.py
+++ /dev/null
@@ -1,15 +0,0 @@
-from simplifine_alpha import train_engine
-
-# model name
-model_name = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'
-
-# your huggingface token
-hf_token = ''
-
-# dataset path/name
-dataset_name = 'nlpie/pandemic_pact'
-
-train_engine.hf_sft(model_name,
-            keys=['title', 'abstract', 'explanation'],
-            template='''### TITLE: {title}\n ### ABSTRACT: {abstract}\n ###EXPLANATION: {explanation}''',
-            response_template='###EXPLANATION:', hf_token=hf_token, zero=True, ddp=False, gradient_accumulation_steps=4, fp16=True, max_seq_length=2048)
\ No newline at end of file
diff --git a/examples/SFT_finetune_cloud/readme.md b/examples/SFT_finetune_cloud/readme.md
deleted file mode 100644
index 222b9b9..0000000
--- a/examples/SFT_finetune_cloud/readme.md
+++ /dev/null
@@ -1,5 +0,0 @@
-This example shows how, with just changing the function name, you can send your training job to us.
-
-You can check the status of your jobs on the cloud based on your API key.
-
-To get your API key, please get in touch with us!
\ No newline at end of file
diff --git a/examples/SFT_finetune_cloud/sft_ft_cloud.py b/examples/SFT_finetune_cloud/sft_ft_cloud.py
deleted file mode 100644
index af75eee..0000000
--- a/examples/SFT_finetune_cloud/sft_ft_cloud.py
+++ /dev/null
@@ -1,19 +0,0 @@
-from simplifine_alpha import train_utils
-
-# model name
-model_name = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'
-
-# your huggingface token
-hf_token = ''
-
-# dataset path/name
-dataset_name = 'nlpie/pandemic_pact'
-
-# your simplifine API key
-simplifine_api_key = ''
-
-# and simply, pass it on to us for training!
-train_utils.sft_train_cloud(api_key=simplifine_api_key, job_name='sft_cloud_example', model_name=model_name, dataset_name=dataset_name, - huggingface_token=hf_token, keys=['title', 'abstract', 'explanation'], - template='''### TITLE: {title}\n ### ABSTRACT: {abstract}\n ###EXPLANATION: {explanation}''', - ) \ No newline at end of file diff --git a/examples/SFT_finetune_cloud/sft_ft_cloud_llama3.ipynb b/examples/SFT_finetune_cloud/sft_ft_cloud_llama3.ipynb deleted file mode 100644 index b1d8436..0000000 --- a/examples/SFT_finetune_cloud/sft_ft_cloud_llama3.ipynb +++ /dev/null @@ -1,200 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## ๐Ÿ“ฆ Installation\n", - "To get started with fine-tuning your own LLM models using the Simplifine library, install it directly from the GitHub repository using the following code:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Install the latest Simplifine library from the GitHub repository\n", - "!pip install git+https://github.com/simplifine-llm/Simplifine.git -q\n", - "\n", - "# The 'pip install' command is used to install Python packages.\n", - "# The '-q' option stands for 'quiet', which minimizes the amount of output produced during the installation.\n", - "# 'git+https://github.com/simplifine-llm/Simplifine.git' specifies the URL of the GitHub repository from which to install the package.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## ๐Ÿš€ Fine-Tuning LLaMA-3 8B Model\n", - "\n", - "In this section, we will focus on fine-tuning the LLaMA-3 8B model using the Simplifine library. Follow the steps below to set up your environment, initialize WandB, prepare your dataset, and configure the Simplifine client." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from simplifine_alpha.train_utils import Client\n", - "import wandb\n", - "import os\n", - "\n", - "# Disabling WandB logging. Change this if you'd like to enable it.\n", - "# Note that you will need a WandB token if you enable logging.\n", - "wandb.init(mode='disabled')\n", - "\n", - "# Define your dataset template and response keys.\n", - "# Be sure to adjust the keys, response template, and dataset accordingly.\n", - "template = '''### TITLE: {title}\\n ### ABSTRACT: {abstract}\\n ###EXPLANATION: {explanation}'''\n", - "response_template = '\\n ###EXPLANATION:'\n", - "keys = ['title', 'abstract', 'explanation']\n", - "dataset_name = '' # Provide a Hugging Face dataset name if applicable.\n", - "\n", - "# Set the model name to LLaMA-3 8B. Note that larger models may cause OOM (Out of Memory) errors.\n", - "model_name = 'meta-llama/Meta-Llama-3-8B'\n", - "hf_token = '' # Insert your Hugging Face token here to access the LLaMA-3 model.\n", - "\n", - "from_hf = True # Set to False if using custom data.\n", - "\n", - "# Option to use your own dataset. Change `own_data` to True if you have custom data.\n", - "own_data = False\n", - "if own_data:\n", - " from_hf = False\n", - " data = {} # Insert your custom dataset here.\n", - "\n", - "# Set up the Simplifine client with your API key and GPU type.\n", - "simplifine_api_key = ''\n", - "gpu_type = 'a100' # Options are 'l4' or 'a100'\n", - "\n", - "client = Client(api_key=simplifine_api_key, gpu_type=gpu_type)\n", - "\n", - "# Start the training process for fine-tuning LLaMA-3 8B. 
Adjust parameters for parallelization if needed.\n", - "client.sft_train_cloud(\n", - " model_name=model_name, \n", - " from_hf=from_hf, \n", - " dataset_name=dataset_name,\n", - " keys=keys,\n", - " template=template, \n", - " job_name='ddp_job',\n", - " response_template=response_template, \n", - " use_zero=True, \n", - " use_ddp=False, \n", - " hf_token=hf_token\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## ๐Ÿ“Š Checking Job Status\n", - "\n", - "After initiating the fine-tuning process, you might want to check the status of your training jobs. The following code will help you extract and display the statuses of the most recent jobs." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Retrieve the status of all jobs from the client.\n", - "status = client.get_all_jobs()\n", - "\n", - "# Display the status of the last 5 jobs.\n", - "for num, i in enumerate(status[-5:]):\n", - " print(f'Number {num} status: {i}\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## ๐Ÿ’พ Downloading the Trained Model\n", - "\n", - "Once your fine-tuning job is complete, the next step is to download the trained model. Follow the steps below to create a folder and save the model locally." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "job_id = '' # Get your job ID from the list of job statuses above.\n", - "\n", - "# Create a folder to store the trained model.\n", - "os.mkdir('sf_trained_model_ZeRO')\n", - "\n", - "# Download and save the model to the specified folder.\n", - "# This might take some time, so relax and enjoy a cup of coffee! :)\n", - "client.download_model(job_id=job_id, extract_to='/content/sf_trained_model_ZeRO')\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## ๐Ÿงช Testing Your Fine-Tuned Model\n", - "\n", - "Now that you've downloaded your fine-tuned model, it's time to test it. We'll load the model and tokenizer using the `transformers` library and generate a sample output." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from transformers import AutoModelForCausalLM, AutoTokenizer\n", - "\n", - "# Define the path where the trained model is stored.\n", - "path = '/content/sf_trained_model_ZeRO'\n", - "\n", - "# Load the fine-tuned model and tokenizer.\n", - "sf_model = AutoModelForCausalLM.from_pretrained(path)\n", - "sf_tokenizer = AutoTokenizer.from_pretrained(path)\n", - "\n", - "# Create an example input for the model.\n", - "input_example = '''### TITLE: title 1\\n ### ABSTRACT: abstract 1\\n ###EXPLANATION: '''\n", - "\n", - "# Tokenize the input example.\n", - "input_example = sf_tokenizer(input_example, return_tensors='pt')\n", - "\n", - "# Generate output from the fine-tuned model.\n", - "output = sf_model.generate(input_example['input_ids'],\n", - " attention_mask=input_example['attention_mask'],\n", - " max_length=30,\n", - " eos_token_id=sf_tokenizer.eos_token_id,\n", - " early_stopping=True,\n", - " pad_token_id=sf_tokenizer.eos_token_id)\n", - "\n", - "# Decode and print the generated output.\n", - "print(sf_tokenizer.decode(output[0]))" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "simplifine", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.12.4" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/examples/__pycache__/cont_emb_ft.cpython-312.pyc b/examples/__pycache__/cont_emb_ft.cpython-312.pyc deleted file mode 100644 index 67e14af..0000000 Binary files a/examples/__pycache__/cont_emb_ft.cpython-312.pyc and /dev/null differ diff --git a/examples/cloud_quickstart.ipynb b/examples/cloud_quickstart.ipynb index e839939..80e6a0d 100644 --- a/examples/cloud_quickstart.ipynb +++ b/examples/cloud_quickstart.ipynb @@ -4,10 +4,7 @@ "metadata": { "colab": { "provenance": [], - "machine_shape": "hm", - "gpuType": "L4", - "authorship_tag": "ABX9TyOkySQHV/4UdgZPvyOgwZDq", - "include_colab_link": true + "machine_shape": "hm" }, "kernelspec": { "name": "python3", @@ -16,10 +13,9 @@ "language_info": { "name": "python" }, - "accelerator": "GPU", "widgets": { "application/vnd.jupyter.widget-state+json": { - "11192f5f2a0f419abc73a0b19b778bc8": { + "85b43ab19cda4f72a77fdfd5dc096496": { "model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "model_module_version": "1.5.0", @@ -34,14 +30,14 @@ "_view_name": "HBoxView", "box_style": "", "children": [ - "IPY_MODEL_354f48b02ffe42189cde2221ad1410b4", - "IPY_MODEL_3075ac4886474981aa4ff5346031d59e", - "IPY_MODEL_6b35e9ce3e684dd3aefbc0b413a7f4b5" + "IPY_MODEL_121146c229204799b4d6c7defb3d6474", + "IPY_MODEL_05fba94d8a2c4e1c9beda729c24511cf", + "IPY_MODEL_b34a0424b9c54f20adfcf36fd31c20ec" ], - "layout": "IPY_MODEL_4576c6f8758b4ce0a20e1becfe1f5d41" + "layout": "IPY_MODEL_0897b7e7dd4745338bb516eea9a91459" } }, - "354f48b02ffe42189cde2221ad1410b4": { + "121146c229204799b4d6c7defb3d6474": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "model_module_version": "1.5.0", @@ -56,13 +52,13 @@ "_view_name": "HTMLView", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_b61bd3786bef452b938611ca8f8425ad", + "layout": "IPY_MODEL_a70c9f8d1b0d4d46ac64f06e2edb867b", "placeholder": "โ€‹", - "style": 
"IPY_MODEL_f81226a58e544f7283f7d8e22e4dffe6", + "style": "IPY_MODEL_0082ad32163248f4a2012837a73d07de", "value": "tokenizer_config.json:โ€‡100%" } }, - "3075ac4886474981aa4ff5346031d59e": { + "05fba94d8a2c4e1c9beda729c24511cf": { "model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "model_module_version": "1.5.0", @@ -78,15 +74,15 @@ "bar_style": "success", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_60a4dc08f4134011910fe5d2740cf821", - "max": 396, + "layout": "IPY_MODEL_42e2d8e7c622491d9c9acd8dd6d59493", + "max": 1289, "min": 0, "orientation": "horizontal", - "style": "IPY_MODEL_b034da9e06be4f6f9f21a4f01f1c85d2", - "value": 396 + "style": "IPY_MODEL_e46fa81470a445d898f60b4afbc52e02", + "value": 1289 } }, - "6b35e9ce3e684dd3aefbc0b413a7f4b5": { + "b34a0424b9c54f20adfcf36fd31c20ec": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "model_module_version": "1.5.0", @@ -101,13 +97,13 @@ "_view_name": "HTMLView", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_3e740156fc704c8fa4046f1c42ba513e", + "layout": "IPY_MODEL_7fbd642afa624c3593a484da4e223f2e", "placeholder": "โ€‹", - "style": "IPY_MODEL_1add250a0ca34f2b83a4cf773d44f433", - "value": "โ€‡396/396โ€‡[00:00<00:00,โ€‡30.9kB/s]" + "style": "IPY_MODEL_826de6d75f7f4642999e32ad61441cc6", + "value": "โ€‡1.29k/1.29kโ€‡[00:00<00:00,โ€‡110kB/s]" } }, - "4576c6f8758b4ce0a20e1becfe1f5d41": { + "0897b7e7dd4745338bb516eea9a91459": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -159,7 +155,7 @@ "width": null } }, - "b61bd3786bef452b938611ca8f8425ad": { + "a70c9f8d1b0d4d46ac64f06e2edb867b": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -211,7 +207,7 @@ "width": null } }, - "f81226a58e544f7283f7d8e22e4dffe6": { + "0082ad32163248f4a2012837a73d07de": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", @@ -226,7 +222,7 @@ "description_width": "" } }, - "60a4dc08f4134011910fe5d2740cf821": { + "42e2d8e7c622491d9c9acd8dd6d59493": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -278,7 +274,7 @@ "width": null } }, - "b034da9e06be4f6f9f21a4f01f1c85d2": { + "e46fa81470a445d898f60b4afbc52e02": { "model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "model_module_version": "1.5.0", @@ -294,7 +290,7 @@ "description_width": "" } }, - "3e740156fc704c8fa4046f1c42ba513e": { + "7fbd642afa624c3593a484da4e223f2e": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -346,7 +342,7 @@ "width": null } }, - "1add250a0ca34f2b83a4cf773d44f433": { + "826de6d75f7f4642999e32ad61441cc6": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", @@ -361,7 +357,7 @@ "description_width": "" } }, - "58cd9e66a03e45f5bf5054a9905948b5": { + "4d32eb49961e48f1958fc9bc6a494766": { "model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "model_module_version": "1.5.0", @@ -376,14 +372,14 @@ "_view_name": "HBoxView", "box_style": "", "children": [ - "IPY_MODEL_bff58105f90145869fbf6c5a49399c31", - "IPY_MODEL_d741ff2bc9914811a2cf2016d61b5ab0", - "IPY_MODEL_12b642c93fd34118b6d57ae38b6290ec" + "IPY_MODEL_4f6e9afd59974e31b81b06e95ca77704", + "IPY_MODEL_ed210f3938024db1af4c5f7c42b2fff7", + 
"IPY_MODEL_fb8aa59768494ac98b6e928f2ca3d99c" ], - "layout": "IPY_MODEL_c50ed3d16c4a4d10a35be8eb80a2198d" + "layout": "IPY_MODEL_ae34dc9c883b42cab7a342beb229f11f" } }, - "bff58105f90145869fbf6c5a49399c31": { + "4f6e9afd59974e31b81b06e95ca77704": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "model_module_version": "1.5.0", @@ -398,13 +394,13 @@ "_view_name": "HTMLView", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_4837b95ac97d40c1a5bab60d72eb34d6", + "layout": "IPY_MODEL_0cbfe2fc9b9f46be8b297d9ecd52beba", "placeholder": "โ€‹", - "style": "IPY_MODEL_8a9ce28a49d84d03b03c9c15e3a126dc", - "value": "tokenizer.json:โ€‡100%" + "style": "IPY_MODEL_a8052d6f4e354f32a8be207154050fbd", + "value": "tokenizer.model:โ€‡100%" } }, - "d741ff2bc9914811a2cf2016d61b5ab0": { + "ed210f3938024db1af4c5f7c42b2fff7": { "model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "model_module_version": "1.5.0", @@ -420,15 +416,15 @@ "bar_style": "success", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_15b34d19cb534e2da4bdb7ce63f26267", - "max": 2113710, + "layout": "IPY_MODEL_c868c5b13be44b3c8acaf1a04ba29444", + "max": 499723, "min": 0, "orientation": "horizontal", - "style": "IPY_MODEL_4db10b1cd4094a78bf59abbbe862602c", - "value": 2113710 + "style": "IPY_MODEL_3db4b88f52d8462cb68265dcddf61139", + "value": 499723 } }, - "12b642c93fd34118b6d57ae38b6290ec": { + "fb8aa59768494ac98b6e928f2ca3d99c": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "model_module_version": "1.5.0", @@ -443,13 +439,13 @@ "_view_name": "HTMLView", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_b340b7510add48ab8cf55f340afc2c15", + "layout": "IPY_MODEL_a25c8ead57594d2aaa103967b5666b1d", "placeholder": "โ€‹", - "style": "IPY_MODEL_c6a85afeffd441a1812d685a132124a8", - "value": "โ€‡2.11M/2.11Mโ€‡[00:01<00:00,โ€‡1.90MB/s]" + "style": "IPY_MODEL_4c6c83f35d25465aa66f59572fb5c109", + "value": "โ€‡500k/500kโ€‡[00:00<00:00,โ€‡26.2MB/s]" } }, - "c50ed3d16c4a4d10a35be8eb80a2198d": { + "ae34dc9c883b42cab7a342beb229f11f": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -501,7 +497,7 @@ "width": null } }, - "4837b95ac97d40c1a5bab60d72eb34d6": { + "0cbfe2fc9b9f46be8b297d9ecd52beba": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -553,7 +549,7 @@ "width": null } }, - "8a9ce28a49d84d03b03c9c15e3a126dc": { + "a8052d6f4e354f32a8be207154050fbd": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", @@ -568,7 +564,7 @@ "description_width": "" } }, - "15b34d19cb534e2da4bdb7ce63f26267": { + "c868c5b13be44b3c8acaf1a04ba29444": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -620,7 +616,7 @@ "width": null } }, - "4db10b1cd4094a78bf59abbbe862602c": { + "3db4b88f52d8462cb68265dcddf61139": { "model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "model_module_version": "1.5.0", @@ -636,7 +632,7 @@ "description_width": "" } }, - "b340b7510add48ab8cf55f340afc2c15": { + "a25c8ead57594d2aaa103967b5666b1d": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -688,7 +684,7 @@ "width": null } }, - "c6a85afeffd441a1812d685a132124a8": { + "4c6c83f35d25465aa66f59572fb5c109": { "model_module": 
"@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", @@ -703,7 +699,7 @@ "description_width": "" } }, - "70b470dd561c4589b056931f26e8a9bb": { + "793fbbc4eb60486ca9a8dd3a466d30d4": { "model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "model_module_version": "1.5.0", @@ -718,14 +714,14 @@ "_view_name": "HBoxView", "box_style": "", "children": [ - "IPY_MODEL_3ae7713c09f24fe1a3f91cfdd22fbff9", - "IPY_MODEL_5dc27afea8344173a2130b6553eb5be8", - "IPY_MODEL_6fcfc1af885c41a987454c5ee34b945a" + "IPY_MODEL_94e61d92d82446e4b05e55ead678adbb", + "IPY_MODEL_7c758e142ba24f978e3620f06ee28d94", + "IPY_MODEL_cf24deaffcfd4f518595321ce8060443" ], - "layout": "IPY_MODEL_7ca30f56c4c541ab97a17920facb482d" + "layout": "IPY_MODEL_34a12296d04f42a7a4217dc90eb11217" } }, - "3ae7713c09f24fe1a3f91cfdd22fbff9": { + "94e61d92d82446e4b05e55ead678adbb": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "model_module_version": "1.5.0", @@ -740,13 +736,13 @@ "_view_name": "HTMLView", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_af05a5aecb2a4046ac63d7c02b751ea0", + "layout": "IPY_MODEL_72af7044a375436ca9797b8d95a2c809", "placeholder": "โ€‹", - "style": "IPY_MODEL_e46110a64b6345488b19560229db62ce", - "value": "special_tokens_map.json:โ€‡100%" + "style": "IPY_MODEL_b2f3791c26664dcba447762f6f7488a7", + "value": "tokenizer.json:โ€‡100%" } }, - "5dc27afea8344173a2130b6553eb5be8": { + "7c758e142ba24f978e3620f06ee28d94": { "model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "model_module_version": "1.5.0", @@ -762,15 +758,15 @@ "bar_style": "success", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_3ce1b7d5a796496784c72c263e9f3181", - "max": 99, + "layout": "IPY_MODEL_43e20b7b3ba542e3bd712306881cca5f", + "max": 1842767, "min": 0, "orientation": "horizontal", - "style": "IPY_MODEL_757d171f689b43a5afe610e65123904f", - "value": 99 + "style": "IPY_MODEL_7d9540c9bbc54bb39778b85862caac67", + "value": 1842767 } }, - "6fcfc1af885c41a987454c5ee34b945a": { + "cf24deaffcfd4f518595321ce8060443": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "model_module_version": "1.5.0", @@ -785,13 +781,13 @@ "_view_name": "HTMLView", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_57081f159ab44b4ebbc25d805cb3daa4", + "layout": "IPY_MODEL_30c9b1c3f5ff482293a2674d8d106cdf", "placeholder": "โ€‹", - "style": "IPY_MODEL_45bd88ec92094662b0b735e9c62c3938", - "value": "โ€‡99.0/99.0โ€‡[00:00<00:00,โ€‡8.87kB/s]" + "style": "IPY_MODEL_e098b085b9804f559a0c738531d53048", + "value": "โ€‡1.84M/1.84Mโ€‡[00:00<00:00,โ€‡27.6MB/s]" } }, - "7ca30f56c4c541ab97a17920facb482d": { + "34a12296d04f42a7a4217dc90eb11217": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -843,7 +839,7 @@ "width": null } }, - "af05a5aecb2a4046ac63d7c02b751ea0": { + "72af7044a375436ca9797b8d95a2c809": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -895,7 +891,7 @@ "width": null } }, - "e46110a64b6345488b19560229db62ce": { + "b2f3791c26664dcba447762f6f7488a7": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", @@ -910,7 +906,7 @@ "description_width": "" } }, - "3ce1b7d5a796496784c72c263e9f3181": { + "43e20b7b3ba542e3bd712306881cca5f": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", 
"model_module_version": "1.2.0", @@ -962,7 +958,7 @@ "width": null } }, - "757d171f689b43a5afe610e65123904f": { + "7d9540c9bbc54bb39778b85862caac67": { "model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "model_module_version": "1.5.0", @@ -978,7 +974,7 @@ "description_width": "" } }, - "57081f159ab44b4ebbc25d805cb3daa4": { + "30c9b1c3f5ff482293a2674d8d106cdf": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -1030,7 +1026,7 @@ "width": null } }, - "45bd88ec92094662b0b735e9c62c3938": { + "e098b085b9804f559a0c738531d53048": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", @@ -1045,7 +1041,7 @@ "description_width": "" } }, - "4ddc93cc8a1f4f1a8aedae30c5b11179": { + "48a1153f99d1431aa1eed392c2dd73ee": { "model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "model_module_version": "1.5.0", @@ -1060,14 +1056,14 @@ "_view_name": "HBoxView", "box_style": "", "children": [ - "IPY_MODEL_7d7f5635f00a464d9271549eb28120cf", - "IPY_MODEL_c17acdfe7ad44754ab05acaee0c5829e", - "IPY_MODEL_ed7a5a37b14243d5a56610ebb6d9f951" + "IPY_MODEL_7a75bdd3a1494108879b98ae19ab51c7", + "IPY_MODEL_5ec82f8657724e72af779aad205f34f0", + "IPY_MODEL_19ae213a184647e39c97e6de4fcfe58e" ], - "layout": "IPY_MODEL_40a8acc274c54598bc9bc31dfa205b93" + "layout": "IPY_MODEL_17f04516bd00461781391d37eda14bdb" } }, - "7d7f5635f00a464d9271549eb28120cf": { + "7a75bdd3a1494108879b98ae19ab51c7": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "model_module_version": "1.5.0", @@ -1082,13 +1078,13 @@ "_view_name": "HTMLView", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_ffce1b42634f44abb9f6df2827cd5c99", + "layout": "IPY_MODEL_ce666f6bca8e43d582e78cb04405e84d", "placeholder": "โ€‹", - "style": "IPY_MODEL_643a8775f35342edb985cfd29dc963ac", - "value": "Map:โ€‡100%" + "style": "IPY_MODEL_96c3b56b2c564f0787841d7a9e2d1a0f", + "value": "special_tokens_map.json:โ€‡100%" } }, - "c17acdfe7ad44754ab05acaee0c5829e": { + "5ec82f8657724e72af779aad205f34f0": { "model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "model_module_version": "1.5.0", @@ -1104,15 +1100,15 @@ "bar_style": "success", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_4cd9b780722045aa92713f1e0076d09c", - "max": 480, + "layout": "IPY_MODEL_5e2194031c4445a08833e96daca605f6", + "max": 551, "min": 0, "orientation": "horizontal", - "style": "IPY_MODEL_1036434cb0784c6cb6b2b7ffdfa09e4e", - "value": 480 + "style": "IPY_MODEL_3acaea18f63445a8a8e4078203749afc", + "value": 551 } }, - "ed7a5a37b14243d5a56610ebb6d9f951": { + "19ae213a184647e39c97e6de4fcfe58e": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "model_module_version": "1.5.0", @@ -1127,13 +1123,13 @@ "_view_name": "HTMLView", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_5ee1670b508c4e319475bc8284842a51", + "layout": "IPY_MODEL_bf7943992abf410eb4239e81bf0908e2", "placeholder": "โ€‹", - "style": "IPY_MODEL_94f238cd12a94a5e8681c24ae2af59bf", - "value": "โ€‡480/480โ€‡[00:00<00:00,โ€‡14315.13โ€‡examples/s]" + "style": "IPY_MODEL_d9fd4fbf92cc4066ba627f43752692a7", + "value": "โ€‡551/551โ€‡[00:00<00:00,โ€‡34.3kB/s]" } }, - "40a8acc274c54598bc9bc31dfa205b93": { + "17f04516bd00461781391d37eda14bdb": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -1185,7 +1181,7 @@ 
"width": null } }, - "ffce1b42634f44abb9f6df2827cd5c99": { + "ce666f6bca8e43d582e78cb04405e84d": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -1237,7 +1233,7 @@ "width": null } }, - "643a8775f35342edb985cfd29dc963ac": { + "96c3b56b2c564f0787841d7a9e2d1a0f": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", @@ -1252,7 +1248,7 @@ "description_width": "" } }, - "4cd9b780722045aa92713f1e0076d09c": { + "5e2194031c4445a08833e96daca605f6": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -1304,7 +1300,7 @@ "width": null } }, - "1036434cb0784c6cb6b2b7ffdfa09e4e": { + "3acaea18f63445a8a8e4078203749afc": { "model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "model_module_version": "1.5.0", @@ -1320,7 +1316,7 @@ "description_width": "" } }, - "5ee1670b508c4e319475bc8284842a51": { + "bf7943992abf410eb4239e81bf0908e2": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -1372,7 +1368,7 @@ "width": null } }, - "94f238cd12a94a5e8681c24ae2af59bf": { + "d9fd4fbf92cc4066ba627f43752692a7": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", @@ -1387,7 +1383,7 @@ "description_width": "" } }, - "d25d3a7b410045bb9fce43509066c364": { + "618b14b62e9240adb441445ef43dc187": { "model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "model_module_version": "1.5.0", @@ -1402,14 +1398,14 @@ "_view_name": "HBoxView", "box_style": "", "children": [ - "IPY_MODEL_fb3891be3ed74ddc87ce8a1ebc2973b6", - "IPY_MODEL_75461fb629ad415eb3f0ebe12db22696", - "IPY_MODEL_0d2b7fadf52a4994b6d5ddc1e4fd1a9f" + "IPY_MODEL_88ac8b5a97bf4f10892ba89b9aa8bc22", + "IPY_MODEL_ec3e67749fb240049932732f7fc1ec4b", + "IPY_MODEL_04f66b0623bb48be9124418d91cb0a9b" ], - "layout": "IPY_MODEL_a671fb491a6941b5a93bf6a573e694ae" + "layout": "IPY_MODEL_688d4f3db00c48da974c58ef73e5ba8c" } }, - "fb3891be3ed74ddc87ce8a1ebc2973b6": { + "88ac8b5a97bf4f10892ba89b9aa8bc22": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "model_module_version": "1.5.0", @@ -1424,13 +1420,13 @@ "_view_name": "HTMLView", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_7aafc3aeb17c4ef19da98092ddb441f6", + "layout": "IPY_MODEL_853358201c564b5d9f6c8cfbcde10c11", "placeholder": "โ€‹", - "style": "IPY_MODEL_46b260beed2e46cea37938607f90b83e", - "value": "Map:โ€‡100%" + "style": "IPY_MODEL_7269d969455c4bec8b5cac0f7ed35d2c", + "value": "Downloadingโ€‡readme:โ€‡100%" } }, - "75461fb629ad415eb3f0ebe12db22696": { + "ec3e67749fb240049932732f7fc1ec4b": { "model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "model_module_version": "1.5.0", @@ -1446,15 +1442,15 @@ "bar_style": "success", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_7d74ac7da173430eae402359bd8d0f39", - "max": 120, + "layout": "IPY_MODEL_3e7c189588044c9387c67812edc86f24", + "max": 5010, "min": 0, "orientation": "horizontal", - "style": "IPY_MODEL_bbdffc025b814b4298409d4a54530290", - "value": 120 + "style": "IPY_MODEL_d36209a3813d4b52b7806091dfe2a8b6", + "value": 5010 } }, - "0d2b7fadf52a4994b6d5ddc1e4fd1a9f": { + "04f66b0623bb48be9124418d91cb0a9b": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "model_module_version": "1.5.0", @@ -1469,13 +1465,13 @@ 
"_view_name": "HTMLView", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_6b40e05732a848f6a3d7f296f94f2e59", + "layout": "IPY_MODEL_c17b106000c9452fb7702b1d583040b3", "placeholder": "โ€‹", - "style": "IPY_MODEL_2e15cf85703042af9ad22ad0b69e3d9a", - "value": "โ€‡120/120โ€‡[00:00<00:00,โ€‡5131.01โ€‡examples/s]" + "style": "IPY_MODEL_aa027bc42cba4c178af51be5775d33a5", + "value": "โ€‡5.01k/5.01kโ€‡[00:00<00:00,โ€‡406kB/s]" } }, - "a671fb491a6941b5a93bf6a573e694ae": { + "688d4f3db00c48da974c58ef73e5ba8c": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -1527,7 +1523,7 @@ "width": null } }, - "7aafc3aeb17c4ef19da98092ddb441f6": { + "853358201c564b5d9f6c8cfbcde10c11": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -1579,7 +1575,7 @@ "width": null } }, - "46b260beed2e46cea37938607f90b83e": { + "7269d969455c4bec8b5cac0f7ed35d2c": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", @@ -1594,7 +1590,7 @@ "description_width": "" } }, - "7d74ac7da173430eae402359bd8d0f39": { + "3e7c189588044c9387c67812edc86f24": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -1646,7 +1642,7 @@ "width": null } }, - "bbdffc025b814b4298409d4a54530290": { + "d36209a3813d4b52b7806091dfe2a8b6": { "model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "model_module_version": "1.5.0", @@ -1662,7 +1658,7 @@ "description_width": "" } }, - "6b40e05732a848f6a3d7f296f94f2e59": { + "c17b106000c9452fb7702b1d583040b3": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -1714,7 +1710,7 @@ "width": null } }, - "2e15cf85703042af9ad22ad0b69e3d9a": { + "aa027bc42cba4c178af51be5775d33a5": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", @@ -1729,7 +1725,7 @@ "description_width": "" } }, - "a9e0a37a1ba341c2b1448f7a5213e463": { + "b9741487d3254932a3d7ada3eeefc595": { "model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "model_module_version": "1.5.0", @@ -1744,14 +1740,14 @@ "_view_name": "HBoxView", "box_style": "", "children": [ - "IPY_MODEL_b898aad6e7004142b4fa13dc3ec5003c", - "IPY_MODEL_7d2ee5b19dbc4147af200d47a87672b7", - "IPY_MODEL_3920b321ba594c8292fc25610b776cdd" + "IPY_MODEL_48014c0699df4921ad0bd436578f5f78", + "IPY_MODEL_c16ad9c1067e4c14872be1dde19aa68c", + "IPY_MODEL_a679bfa91d9c4ba5b0b766dd21ac7bdf" ], - "layout": "IPY_MODEL_fdb0ef5e8ee841d2b7c1eab760d8c285" + "layout": "IPY_MODEL_28017307068642d9846314f22aba8dd9" } }, - "b898aad6e7004142b4fa13dc3ec5003c": { + "48014c0699df4921ad0bd436578f5f78": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "model_module_version": "1.5.0", @@ -1766,13 +1762,13 @@ "_view_name": "HTMLView", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_36ad1c4fe2c1467fa440e94ba553e306", + "layout": "IPY_MODEL_b67317eb1552400ab120b334d2692020", "placeholder": "โ€‹", - "style": "IPY_MODEL_0980f602e791476ebd37e9ed2d284543", - "value": "config.json:โ€‡100%" + "style": "IPY_MODEL_2020c1a4a675488ca8265d62e7e58f29", + "value": "Downloadingโ€‡data:โ€‡100%" } }, - "7d2ee5b19dbc4147af200d47a87672b7": { + "c16ad9c1067e4c14872be1dde19aa68c": { "model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", 
"model_module_version": "1.5.0", @@ -1788,15 +1784,15 @@ "bar_style": "success", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_292e081ea45e4a8fa28870b88badaabf", - "max": 569, + "layout": "IPY_MODEL_d0ca10eb9f2544eb9b332d919aac0fbf", + "max": 43101, "min": 0, "orientation": "horizontal", - "style": "IPY_MODEL_eb1d56cb9201467598a673492ec27e45", - "value": 569 + "style": "IPY_MODEL_54eaff9e53a143098bb3e77f2d824268", + "value": 43101 } }, - "3920b321ba594c8292fc25610b776cdd": { + "a679bfa91d9c4ba5b0b766dd21ac7bdf": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "model_module_version": "1.5.0", @@ -1811,13 +1807,13 @@ "_view_name": "HTMLView", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_007dc1327c7a4a04af370ebb8602a439", + "layout": "IPY_MODEL_610036373632486fbee44beb1e2e526d", "placeholder": "โ€‹", - "style": "IPY_MODEL_f955f4f8567a4eafa7e4615a8c7cff1d", - "value": "โ€‡569/569โ€‡[00:00<00:00,โ€‡46.2kB/s]" + "style": "IPY_MODEL_6a9d87e1c39a4e59b20315bdfd57e799", + "value": "โ€‡43.1k/43.1kโ€‡[00:00<00:00,โ€‡259kB/s]" } }, - "fdb0ef5e8ee841d2b7c1eab760d8c285": { + "28017307068642d9846314f22aba8dd9": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -1869,7 +1865,7 @@ "width": null } }, - "36ad1c4fe2c1467fa440e94ba553e306": { + "b67317eb1552400ab120b334d2692020": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -1921,7 +1917,7 @@ "width": null } }, - "0980f602e791476ebd37e9ed2d284543": { + "2020c1a4a675488ca8265d62e7e58f29": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", @@ -1936,7 +1932,7 @@ "description_width": "" } }, - "292e081ea45e4a8fa28870b88badaabf": { + "d0ca10eb9f2544eb9b332d919aac0fbf": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -1988,7 +1984,7 @@ "width": null } }, - "eb1d56cb9201467598a673492ec27e45": { + "54eaff9e53a143098bb3e77f2d824268": { "model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "model_module_version": "1.5.0", @@ -2004,7 +2000,7 @@ "description_width": "" } }, - "007dc1327c7a4a04af370ebb8602a439": { + "610036373632486fbee44beb1e2e526d": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -2056,7 +2052,7 @@ "width": null } }, - "f955f4f8567a4eafa7e4615a8c7cff1d": { + "6a9d87e1c39a4e59b20315bdfd57e799": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", @@ -2071,7 +2067,7 @@ "description_width": "" } }, - "af4fad602c3444078591c65aa90fd06f": { + "c462787a34024ffd84a048697906a5da": { "model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "model_module_version": "1.5.0", @@ -2086,14 +2082,14 @@ "_view_name": "HBoxView", "box_style": "", "children": [ - "IPY_MODEL_25a64611142247829fc1f692defb6a0b", - "IPY_MODEL_270be6c8fe164910897a67de83f6fc43", - "IPY_MODEL_187ffe00356e4c69a08d2478c3a24e7c" + "IPY_MODEL_ce13114c190a4580a88b7d87397799cb", + "IPY_MODEL_5c41598c9c134bb2a5e6d7ecc8180780", + "IPY_MODEL_c76946151342424b9f1c247b8d822c9f" ], - "layout": "IPY_MODEL_f02188f57f6e4b039a3cbe2f58b807da" + "layout": "IPY_MODEL_a76e818268294b72b2eb4f1b3e8e88a0" } }, - "25a64611142247829fc1f692defb6a0b": { + "ce13114c190a4580a88b7d87397799cb": { "model_module": "@jupyter-widgets/controls", 
"model_name": "HTMLModel", "model_module_version": "1.5.0", @@ -2108,13 +2104,13 @@ "_view_name": "HTMLView", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_44c01104c64d4b0d9a6e8b7de34b9ff5", + "layout": "IPY_MODEL_60de8ac15a4a460fa518fe05fde546b7", "placeholder": "โ€‹", - "style": "IPY_MODEL_baa2674453624873813daa3a2a65b833", - "value": "model.safetensors:โ€‡100%" + "style": "IPY_MODEL_be8caf785f124615a83a068226c11fc0", + "value": "Generatingโ€‡trainโ€‡split:โ€‡100%" } }, - "270be6c8fe164910897a67de83f6fc43": { + "5c41598c9c134bb2a5e6d7ecc8180780": { "model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "model_module_version": "1.5.0", @@ -2130,15 +2126,15 @@ "bar_style": "success", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_58d0645431ff4988948500f03fa4ba6c", - "max": 374998696, + "layout": "IPY_MODEL_8b481231e25b469d912c89d25f2e130a", + "max": 492, "min": 0, "orientation": "horizontal", - "style": "IPY_MODEL_51cdb01e0e3f433c8874d063305556ad", - "value": 374998696 + "style": "IPY_MODEL_b741b85fa6e14d318ddd2ea5b318f801", + "value": 492 } }, - "187ffe00356e4c69a08d2478c3a24e7c": { + "c76946151342424b9f1c247b8d822c9f": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "model_module_version": "1.5.0", @@ -2153,13 +2149,13 @@ "_view_name": "HTMLView", "description": "", "description_tooltip": null, - "layout": "IPY_MODEL_6bd0cd8260a941e38af15585939324b1", + "layout": "IPY_MODEL_a0e6f8b69797457481f7ff2b785ca099", "placeholder": "โ€‹", - "style": "IPY_MODEL_09307b7d15194ebfa8ec6dfda2325ced", - "value": "โ€‡375M/375Mโ€‡[00:00<00:00,โ€‡400MB/s]" + "style": "IPY_MODEL_ebe980a4978b422e991efc4c358fef14", + "value": "โ€‡492/492โ€‡[00:00<00:00,โ€‡20898.25โ€‡examples/s]" } }, - "f02188f57f6e4b039a3cbe2f58b807da": { + "a76e818268294b72b2eb4f1b3e8e88a0": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -2211,7 +2207,7 @@ "width": null } }, - "44c01104c64d4b0d9a6e8b7de34b9ff5": { + "60de8ac15a4a460fa518fe05fde546b7": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -2263,7 +2259,7 @@ "width": null } }, - "baa2674453624873813daa3a2a65b833": { + "be8caf785f124615a83a068226c11fc0": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", @@ -2278,7 +2274,7 @@ "description_width": "" } }, - "58d0645431ff4988948500f03fa4ba6c": { + "8b481231e25b469d912c89d25f2e130a": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -2330,7 +2326,7 @@ "width": null } }, - "51cdb01e0e3f433c8874d063305556ad": { + "b741b85fa6e14d318ddd2ea5b318f801": { "model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "model_module_version": "1.5.0", @@ -2346,7 +2342,7 @@ "description_width": "" } }, - "6bd0cd8260a941e38af15585939324b1": { + "a0e6f8b69797457481f7ff2b785ca099": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", @@ -2398,7 +2394,7 @@ "width": null } }, - "09307b7d15194ebfa8ec6dfda2325ced": { + "ebe980a4978b422e991efc4c358fef14": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", @@ -2412,897 +2408,3622 @@ "_view_name": "StyleView", "description_width": "" } - } - } - } - }, - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": 
"view-in-github", - "colab_type": "text" - }, - "source": [ - "\"Open" - ] - }, - { - "cell_type": "markdown", - "source": [ - "#Simplifine-tuning your LLMs! ๐Ÿ’ซ\n", - "\n", - "This is a quick guide on getting started with Simplifine!\n", - "\n", - "Below is an example of sending a supervised fine-tuning job to Simplifine's hosted servers.\n", - "\n", - "First, we start by downloading Simplifine's latest version from github." - ], - "metadata": { - "id": "wMs0O0fMLbkY" - } - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "GbF02GaxLSw8", - "outputId": "f74ee822-ca88-42c5-f028-aea13c96f8ac" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - " Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", - " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n", - " Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m37.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m547.8/547.8 kB\u001b[0m \u001b[31m2.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m64.9/64.9 kB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m337.0/337.0 kB\u001b[0m \u001b[31m28.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m296.4/296.4 kB\u001b[0m \u001b[31m26.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m232.6/232.6 kB\u001b[0m \u001b[31m18.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m227.1/227.1 kB\u001b[0m \u001b[31m21.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m245.8/245.8 kB\u001b[0m \u001b[31m23.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m6.8/6.8 MB\u001b[0m \u001b[31m108.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m116.3/116.3 kB\u001b[0m \u001b[31m11.4 MB/s\u001b[0m eta 
\u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m316.1/316.1 kB\u001b[0m \u001b[31m26.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m207.3/207.3 kB\u001b[0m \u001b[31m18.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m75.6/75.6 kB\u001b[0m \u001b[31m7.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m77.9/77.9 kB\u001b[0m \u001b[31m8.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m39.9/39.9 MB\u001b[0m \u001b[31m54.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m303.6/303.6 kB\u001b[0m \u001b[31m25.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m103.4/103.4 kB\u001b[0m \u001b[31m11.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m54.0/54.0 kB\u001b[0m \u001b[31m4.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m12.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m307.2/307.2 kB\u001b[0m \u001b[31m26.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m18.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m62.7/62.7 kB\u001b[0m \u001b[31m5.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m21.3/21.3 MB\u001b[0m \u001b[31m89.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m5.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25h Building 
wheel for simplifine-alpha (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", - " Building wheel for deepspeed (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", - "cudf-cu12 24.4.1 requires pyarrow<15.0.0a0,>=14.0.1, but you have pyarrow 17.0.0 which is incompatible.\n", - "gcsfs 2024.6.1 requires fsspec==2024.6.1, but you have fsspec 2024.5.0 which is incompatible.\n", - "google-colab 1.0.0 requires requests==2.31.0, but you have requests 2.32.3 which is incompatible.\n", - "ibis-framework 8.0.0 requires pyarrow<16,>=2, but you have pyarrow 17.0.0 which is incompatible.\u001b[0m\u001b[31m\n", - "\u001b[0m" - ] - } - ], - "source": [ - "!pip install git+https://github.com/simplifine-llm/Simplifine.git -q" - ] - }, - { - "cell_type": "markdown", - "source": [ - "Supervised fine tuning is a useful method to fine-tune a model for generating formatted answers, based on the provided input.\n", - "\n", - "An example would be to generate an answer based on provided context.\n", - "\n", - "An example would be:\n", - "\n", - "QUESTION: What is the capital France?\n", - "\n", - "CONTEXT: France has had its capital as Paris for some time now!\n", - "\n", - "ANSWER: Paris is the capital of France.\n", - "\n", - "In this example, you would want the model to fill in for the answer, having provided it with the question and context.\n", - "\n", - "In this example, an arbitrary dataset will be used. We will use the following prompt template:\n", - "\n", - "```\n", - "'''### TITLE: {title}\\n ### ABSTRACT: {abstract}\\n ###EXPLANATION: {explanation}'''\n", - "```\n", - "\n", - "Then as mentioned, we want the model to fill in the text for answer, so we asign this to a response template:\n", - "\n", - "\n", - "\n", - "```\n", - "response_template='\\n ###EXPLANATION:'\n", - "```\n", - "\n", - "In the example below, we use our own dataset. This dataset should be a python dictionary, which should include the keys that are required to populate the template you provided. You can also use any dataset hosted on huggingface (some require authentication/tokens)" - ], - "metadata": { - "id": "_hSHffSjLhSG" - } - }, - { - "cell_type": "code", - "source": [ - "from simplifine_alpha import train_engine\n", - "import wandb\n", - "import os\n", - "\n", - "# disabling WandB logging, change if you'd like to have one.\n", - "# Note that you will need a wandb token.\n", - "wandb.init(mode='disabled')\n", - "\n", - "# You can provided a HF dataset name.\n", - "# be sure to change the keys, response template and tempalte accordingly.\n", - "template = '''### TITLE: {title}\\n ### ABSTRACT: {abstract}\\n ###EXPLANATION: {explanation}'''\n", - "response_template='\\n ###EXPLANATION:'\n", - "keys = ['title', 'abstract', 'explanation']\n", - "dataset_name=''\n", - "\n", - "# you can change the model. 
bigger models might throw OOM errors.\n", - "model_name = 'EleutherAI/pythia-160m'\n", - "\n", - "from_hf = True\n", - "if True: # change this if you want to try this on a dataset on huggingface!\n", - " from_hf = False\n", - " data = {\n", - " 'title':['title 1', 'title 2', 'title 3']*200,\n", - " 'abstract':['abstract 1', 'abstract 2', 'abstract 3']*200,\n", - " 'explanation':['explanation 1', 'explanation 2', 'explanation 3']*200\n", - " }\n", - "\n", - "train_engine.hf_sft(model_name, from_hf=from_hf, dataset_name=dataset_name,\n", - " keys = keys, data = data,\n", - " template = template,\n", - " response_template=response_template, zero=False, ddp=False, gradient_accumulation_steps=4, fp16=True, max_seq_length=2048)" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 623, - "referenced_widgets": [ - "11192f5f2a0f419abc73a0b19b778bc8", - "354f48b02ffe42189cde2221ad1410b4", - "3075ac4886474981aa4ff5346031d59e", - "6b35e9ce3e684dd3aefbc0b413a7f4b5", - "4576c6f8758b4ce0a20e1becfe1f5d41", - "b61bd3786bef452b938611ca8f8425ad", - "f81226a58e544f7283f7d8e22e4dffe6", - "60a4dc08f4134011910fe5d2740cf821", - "b034da9e06be4f6f9f21a4f01f1c85d2", - "3e740156fc704c8fa4046f1c42ba513e", - "1add250a0ca34f2b83a4cf773d44f433", - "58cd9e66a03e45f5bf5054a9905948b5", - "bff58105f90145869fbf6c5a49399c31", - "d741ff2bc9914811a2cf2016d61b5ab0", - "12b642c93fd34118b6d57ae38b6290ec", - "c50ed3d16c4a4d10a35be8eb80a2198d", - "4837b95ac97d40c1a5bab60d72eb34d6", - "8a9ce28a49d84d03b03c9c15e3a126dc", - "15b34d19cb534e2da4bdb7ce63f26267", - "4db10b1cd4094a78bf59abbbe862602c", - "b340b7510add48ab8cf55f340afc2c15", - "c6a85afeffd441a1812d685a132124a8", - "70b470dd561c4589b056931f26e8a9bb", - "3ae7713c09f24fe1a3f91cfdd22fbff9", - "5dc27afea8344173a2130b6553eb5be8", - "6fcfc1af885c41a987454c5ee34b945a", - "7ca30f56c4c541ab97a17920facb482d", - "af05a5aecb2a4046ac63d7c02b751ea0", - "e46110a64b6345488b19560229db62ce", - "3ce1b7d5a796496784c72c263e9f3181", - "757d171f689b43a5afe610e65123904f", - "57081f159ab44b4ebbc25d805cb3daa4", - "45bd88ec92094662b0b735e9c62c3938", - "4ddc93cc8a1f4f1a8aedae30c5b11179", - "7d7f5635f00a464d9271549eb28120cf", - "c17acdfe7ad44754ab05acaee0c5829e", - "ed7a5a37b14243d5a56610ebb6d9f951", - "40a8acc274c54598bc9bc31dfa205b93", - "ffce1b42634f44abb9f6df2827cd5c99", - "643a8775f35342edb985cfd29dc963ac", - "4cd9b780722045aa92713f1e0076d09c", - "1036434cb0784c6cb6b2b7ffdfa09e4e", - "5ee1670b508c4e319475bc8284842a51", - "94f238cd12a94a5e8681c24ae2af59bf", - "d25d3a7b410045bb9fce43509066c364", - "fb3891be3ed74ddc87ce8a1ebc2973b6", - "75461fb629ad415eb3f0ebe12db22696", - "0d2b7fadf52a4994b6d5ddc1e4fd1a9f", - "a671fb491a6941b5a93bf6a573e694ae", - "7aafc3aeb17c4ef19da98092ddb441f6", - "46b260beed2e46cea37938607f90b83e", - "7d74ac7da173430eae402359bd8d0f39", - "bbdffc025b814b4298409d4a54530290", - "6b40e05732a848f6a3d7f296f94f2e59", - "2e15cf85703042af9ad22ad0b69e3d9a", - "a9e0a37a1ba341c2b1448f7a5213e463", - "b898aad6e7004142b4fa13dc3ec5003c", - "7d2ee5b19dbc4147af200d47a87672b7", - "3920b321ba594c8292fc25610b776cdd", - "fdb0ef5e8ee841d2b7c1eab760d8c285", - "36ad1c4fe2c1467fa440e94ba553e306", - "0980f602e791476ebd37e9ed2d284543", - "292e081ea45e4a8fa28870b88badaabf", - "eb1d56cb9201467598a673492ec27e45", - "007dc1327c7a4a04af370ebb8602a439", - "f955f4f8567a4eafa7e4615a8c7cff1d", - "af4fad602c3444078591c65aa90fd06f", - "25a64611142247829fc1f692defb6a0b", - "270be6c8fe164910897a67de83f6fc43", - "187ffe00356e4c69a08d2478c3a24e7c", - "f02188f57f6e4b039a3cbe2f58b807da", - 
"44c01104c64d4b0d9a6e8b7de34b9ff5", - "baa2674453624873813daa3a2a65b833", - "58d0645431ff4988948500f03fa4ba6c", - "51cdb01e0e3f433c8874d063305556ad", - "6bd0cd8260a941e38af15585939324b1", - "09307b7d15194ebfa8ec6dfda2325ced" - ] - }, - "id": "ctDa_6sMLiY9", - "outputId": "517bdad1-540b-4ad3-e595-8efc3186a55b" - }, - "execution_count": 2, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "[2024-07-28 18:09:39,647] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", - "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", - "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", - "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", - "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", - "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3\n", - "\u001b[93m [WARNING] \u001b[0m using untested triton version (2.3.1), only 1.0.0 is known to be compatible\n" - ] - }, - { - "output_type": "stream", - "name": "stderr", - "text": [ - "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: \n", - "The secret `HF_TOKEN` does not exist in your Colab secrets.\n", - "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n", - "You will be able to reuse this secret in all of your notebooks.\n", - "Please note that authentication is recommended but still optional to access public models or datasets.\n", - " warnings.warn(\n" - ] - }, - { - "output_type": "display_data", - "data": { - "text/plain": [ - "tokenizer_config.json: 0%| | 0.00/396 [00:00" - ], - "text/html": [ - "\n", - "
\n", - " \n", - " \n", - " [360/360 01:17, Epoch 3/3]\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
StepTraining Loss

" - ] - }, - "metadata": {} - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "Testing the model's generation after training.\n", - "The simplifine trainer saves the final model in a folder in output_dir called \"final_model\"." - ], - "metadata": { - "id": "fWjphDgAMxuk" - } - }, - { - "cell_type": "code", - "source": [ - "from transformers import AutoModelForCausalLM, AutoTokenizer\n", - "\n", - "# This the path that the model and other relevant files are saved to.\n", - "# this is the default folder name in the trainer.\n", - "# The final checkpoint is saved under final_model.\n", - "path = '/content/sft_output/final_model'\n", - "sf_model = AutoModelForCausalLM.from_pretrained(path)\n", - "sf_tokenizer = AutoTokenizer.from_pretrained(path)\n", - "\n", - "# an example following the arbitrary training data\n", - "input_example = '''### TITLE: title 1\\n ### ABSTRACT: abstract 1\\n ###EXPLANATION: '''\n", - "\n", - "input_example = sf_tokenizer(input_example, return_tensors='pt')\n", - "\n", - "output = sf_model.generate(input_example['input_ids'],\n", - " attention_mask=input_example['attention_mask'],\n", - " max_length=30,eos_token_id=sf_tokenizer.eos_token_id,\n", - " early_stopping=True,\n", - " pad_token_id=sf_tokenizer.eos_token_id\n", - ")\n", - "\n", - "print(sf_tokenizer.decode(output[0]))" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" + "dddd99644ee94fddb4f8b3b436a5f3c4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } }, - "id": "pYv_RPZWMzdD", - "outputId": "7d6b8222-94de-4a61-9f48-dd69cc5d846f" - }, - "execution_count": 3, - "outputs": [ - { - "output_type": "stream", - "name": "stderr", - "text": [ - "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n", - "/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:588: UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. 
You should set `num_beams>1` or unset `early_stopping`.\n",
 - " warnings.warn(\n"
 - ]
 - },
 - {
 - "output_type": "stream",
 - "name": "stdout",
 - "text": [
 - "### TITLE: title 1\n",
 - " ### ABSTRACT: abstract 1\n",
 - " ###EXPLANATION: explanation 1 3 explanation 1 3\n"
 - ]
 - }
 - ]
 - },
 - {
 - "cell_type": "markdown",
 - "source": [
 - "## Using Simplifine's GPU clusters\n",
 - "\n",
 - "In the example above, we fine-tuned a small Pythia model (160M parameters) on an L4 GPU. Note that we did not use any adapters, e.g. LoRA.\n",
 - "In the next step, we show how Simplifine lets you carry out the same training on GPU clusters, using functions from train_utils.\n",
 - "\n",
 - "With this command, you can manually choose the parallelization method.\n",
 - "\n",
 - "If your model is small enough, try DDP. In this method, each processor (fancy word for GPU!) holds a replica of the model and attends to a different sample.\n",
 - "\n",
 - "You can also utilize ZeRO from DeepSpeed. With ZeRO, the model parameters, activation states and gradients are sharded across the GPUs. You also have the option to offload some of them to the CPU, at the expense of lower throughput.\n",
 - "\n",
 - "**NOTE**: we currently support L4 and A100 GPUs. When initialising the client, you can define which GPU you would like to run your job on. Each server goes up to 8 GPUs. The default is L4 GPUs.\n",
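 - "\n",
 - "As a minimal sketch (reusing the variables defined in the local training cell above), the same `train_engine.hf_sft` call can be switched between the two methods via its `ddp` and `zero` flags:\n",
 - "\n",
 - "```python\n",
 - "# DDP: one model replica per GPU; the model (plus gradients and\n",
 - "# optimizer states) must fit in a single GPU's memory.\n",
 - "train_engine.hf_sft(model_name, from_hf=from_hf, dataset_name=dataset_name,\n",
 - "                    keys=keys, data=data, template=template,\n",
 - "                    response_template=response_template,\n",
 - "                    zero=False, ddp=True)\n",
 - "\n",
 - "# ZeRO: shard parameters, gradients and optimizer states across GPUs,\n",
 - "# for models that do not fit on a single device.\n",
 - "train_engine.hf_sft(model_name, from_hf=from_hf, dataset_name=dataset_name,\n",
 - "                    keys=keys, data=data, template=template,\n",
 - "                    response_template=response_template,\n",
 - "                    zero=True, ddp=False)\n",
 - "```\n",
 - "\n",
 - "On the cloud side, the equivalent switches are `use_ddp` and `use_zero`, as shown in the cells below."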
 - ],
 - "metadata": {
 - "id": "J0OQyt44M6Ei"
 }
 },
 {
 "cell_type": "markdown",
 "source": [
 "# Using DDP to train\n",
 "The example below uses DDP to distribute the training process.\n",
 "\n",
 "You will need a Simplifine API key. Contact us for one for free! :)\n",
 "\n",
 "See contact details in our GitHub repo at https://github.com/simplifine-llm/Simplifine/tree/main"
 ],
 "metadata": {
 "id": "csLAwuVmM8Va"
 }
 },
 {
 "cell_type": "code",
 "source": [
 "from simplifine_alpha.train_utils import Client\n",
 "\n",
 "# setting up the client\n",
 "# enter your Simplifine API key below\n",
 "api_key = ''\n",
 "gpu_type = 'a100' # l4 or a100\n",
 "client = Client(api_key=api_key, gpu_type=gpu_type)\n",
 "\n",
 "# simply pass all the arguments you used above, and switch between ddp and zero to pick the parallelization method.\n",
 "client.sft_train_cloud(model_name = model_name, from_hf=from_hf, dataset_name=dataset_name,\n",
 " keys = keys, data = data,\n",
 " template = template, job_name='ddp_job',\n",
 " response_template=response_template, use_zero=False, use_ddp=True)"
 ],
 "metadata": {
 "id": "ynn-NEDEM5qU"
 },
 "execution_count": 5,
 "outputs": []
 },
 {
 "cell_type": "markdown",
 "source": [
 "After sending the query, you can check the status of your jobs. 
Note that the status is one of the three options:\n", - "```text\n", - "status = complete|in progress|pending\n", - "```" + "- The `Simplifine` library helps in making the fine-tuning process more efficient, whether you're working locally or in the cloud.\n", + "- The `datasets` library is essential for loading and processing the dataset we'll be using for this project.\n", + "\n", + "Running this cell will install both libraries quietly in the background.\n" ], "metadata": { - "id": "I0cXnfYQPogc" + "id": "0SClYIzAQrpD" } }, { "cell_type": "code", - "source": [ - "status = client.get_all_jobs()\n", - "for num,i in enumerate(status[-5:]):\n", - " print(f'Job {num}: {i}')" - ], + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, - "id": "AulHnk5-Pqh8", - "outputId": "375a3336-2ecf-46d1-97e2-f4df5b003686" + "id": "lxDXEqYrw-gh", + "outputId": "b768c964-b87d-41da-f121-0e83373fbdac" }, - "execution_count": 6, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "Job 0: {'job_id': '544bb4f0-206f-43b7-850e-5e1e9f7b4d23', 'job_name': 'job-4', 'status': 'completed'}\n", - "Job 1: {'job_id': 'bde91132-9776-41ae-89f9-855dfb116a91', 'job_name': 'ddp_job', 'status': 'completed'}\n", - "Job 2: {'job_id': 'a1ff54dd-5ee2-4e35-9e78-6868f63dad37', 'job_name': 'zero_example_cloud', 'status': 'completed'}\n", - "Job 3: {'job_id': '543d3bc3-3ce4-4af6-9f9a-6c0823dcc9b0', 'job_name': 'ddp_job', 'status': 'in progress'}\n", - "Job 4: {'job_id': '5d55d46a-7793-4c06-9cef-279f03a0f953', 'job_name': 'job_1', 'status': 'pending'}\n" + " Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", + " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n", + " Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m6.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h Preparing metadata (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m547.8/547.8 kB\u001b[0m \u001b[31m249.1 kB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m360.4/360.4 kB\u001b[0m \u001b[31m6.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m296.4/296.4 kB\u001b[0m \u001b[31m6.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m232.6/232.6 kB\u001b[0m \u001b[31m12.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m227.1/227.1 kB\u001b[0m \u001b[31m11.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m245.8/245.8 kB\u001b[0m \u001b[31m12.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m6.8/6.8 MB\u001b[0m \u001b[31m51.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m116.3/116.3 kB\u001b[0m \u001b[31m6.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m316.1/316.1 kB\u001b[0m \u001b[31m13.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m207.3/207.3 kB\u001b[0m \u001b[31m11.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m75.6/75.6 kB\u001b[0m \u001b[31m3.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m77.9/77.9 kB\u001b[0m \u001b[31m4.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m318.9/318.9 kB\u001b[0m \u001b[31m16.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m39.9/39.9 MB\u001b[0m \u001b[31m44.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K 
\u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m301.8/301.8 kB\u001b[0m \u001b[31m12.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m103.4/103.4 kB\u001b[0m \u001b[31m4.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m54.0/54.0 kB\u001b[0m \u001b[31m1.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m5.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m307.2/307.2 kB\u001b[0m \u001b[31m13.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m11.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m62.7/62.7 kB\u001b[0m \u001b[31m3.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90mโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m2.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h Building wheel for simplifine-alpha (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", + " Building wheel for deepspeed (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", + "cudf-cu12 24.4.1 requires pyarrow<15.0.0a0,>=14.0.1, but you have pyarrow 17.0.0 which is incompatible.\n", + "gcsfs 2024.6.1 requires fsspec==2024.6.1, but you have fsspec 2024.5.0 which is incompatible.\n", + "ibis-framework 8.0.0 requires pyarrow<16,>=2, but you have pyarrow 17.0.0 which is incompatible.\u001b[0m\u001b[31m\n", + "\u001b[0m" ] } + ], + "source": [ + "!pip install git+https://github.com/simplifine-llm/Simplifine.git -q\n", + "!pip install datasets -q" ] }, { "cell_type": "markdown", "source": [ - "You can also stop an ongoing job, by calling the function below" + "### ๐Ÿ› ๏ธ Setting Up for Local Training\n", + "\n", + "In this section, weโ€™re preparing to fine-tune our fake news detector model using Google Colabโ€™s resources. The steps below outline how to configure and initiate the training process.\n", + "\n", + "1. **Importing Libraries:**\n", + " - We import `train_engine` from the `Simplifine` library, which provides the necessary functions to handle the fine-tuning process.\n", + " - We also import `SFTConfig` from the `trl` library, which allows us to configure the supervised fine-tuning parameters.\n", + "\n", + "2. 
**Dataset Selection:**\n", + " - We define the dataset name as `'community-datasets/fake_news_english'`. This dataset contains examples of fake news articles that we will use to fine-tune our model.\n", + "\n", + "3. **Prompt Configuration:**\n", + " - We create a `sftPromptConfig` object to specify how the training data is formatted.\n", + " - The `template` parameter defines the input format, and the `response_template` specifies how the model should generate outputs.\n", + " - The `use_chat_template` flag is set to `True` to format the inputs in a conversational style, which can be effective for chat-based models.\n", + "\n", + "4. **Training Configuration:**\n", + " - We define the training settings using `SFTConfig`. This includes parameters like batch size, learning rate, and the number of epochs.\n", + " - We also enable `fp16` (16-bit floating-point) training for faster computation and set `gradient_checkpointing` to save memory during training.\n", + "\n", + "5. **Model Selection:**\n", + " - The model weโ€™re fine-tuning is `'TinyLlama/TinyLlama-1.1B-Chat-v1.0'`. This is a smaller, efficient model suitable for demonstration purposes on Colab.\n", + "\n", + "6. **Training the Model:**\n", + " - Finally, we call `sft_train` to start the fine-tuning process. This step will take a while to complete, as weโ€™re training the model from scratch without any optimizations like quantization or LoRA.\n", + "\n", + "Running this cell will fine-tune the model locally on Colab, using the configurations weโ€™ve set up. This is ideal for quick experiments or when cloud resources are not available." ], "metadata": { - "id": "nPgJBtDbXola" + "id": "C0dDwmg4Rb3N" } }, { "cell_type": "code", "source": [ - "stop_running_job = False\n", - "if stop_running_job:\n", - " job_id = status[-1]['job_id']\n", - " client.stop_job(job_id)" - ], - "metadata": { - "id": "AG5HVYBAXr0w" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "# getting the job_id of the last job\n", - "job_id = status[-1]['job_id']\n", + "from simplifine_alpha import train_engine\n", + "from trl import SFTConfig\n", "\n", - "logs = client.get_train_logs(job_id)\n", - "print(logs['response'])" + "dataset_name = 'community-datasets/fake_news_english'\n", + "\n", + "# defining prompt config\n", + "sft_prompt_config = train_engine.sftPromptConfig(\n", + " keys = ['url_of_article', 'fake_or_satire'],\n", + " template = \"###URL: {url_of_article}. \\n###CLS: {fake_or_satire}\",\n", + " response_template = '. 
\\n###CLS: ',\n", + " use_chat_template=True\n", + " )\n", + "\n", + "# defining training config\n", + "sft_config = SFTConfig(\n", + " output_dir='/content/fake_news_english_phi3',\n", + " per_device_train_batch_size=1,\n", + " gradient_accumulation_steps=4,\n", + " learning_rate=1e-5,\n", + " num_train_epochs=2,\n", + " report_to='none',\n", + " fp16=True,\n", + " gradient_checkpointing=True,\n", + ")\n", + "\n", + "model_name = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'\n", + "\n", + "# this is just for demo purposes, this will take a while here, no quantization, no lora...\n", + "train_engine.sft_train(model_name=model_name, dataset_name=dataset_name,\n", + " sft_config = sft_config, sft_prompt_config=sft_prompt_config,\n", + " use_zero=False, use_ddp=False\n", + " )" ], "metadata": { "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 1000, + "referenced_widgets": [ + "85b43ab19cda4f72a77fdfd5dc096496", + "121146c229204799b4d6c7defb3d6474", + "05fba94d8a2c4e1c9beda729c24511cf", + "b34a0424b9c54f20adfcf36fd31c20ec", + "0897b7e7dd4745338bb516eea9a91459", + "a70c9f8d1b0d4d46ac64f06e2edb867b", + "0082ad32163248f4a2012837a73d07de", + "42e2d8e7c622491d9c9acd8dd6d59493", + "e46fa81470a445d898f60b4afbc52e02", + "7fbd642afa624c3593a484da4e223f2e", + "826de6d75f7f4642999e32ad61441cc6", + "4d32eb49961e48f1958fc9bc6a494766", + "4f6e9afd59974e31b81b06e95ca77704", + "ed210f3938024db1af4c5f7c42b2fff7", + "fb8aa59768494ac98b6e928f2ca3d99c", + "ae34dc9c883b42cab7a342beb229f11f", + "0cbfe2fc9b9f46be8b297d9ecd52beba", + "a8052d6f4e354f32a8be207154050fbd", + "c868c5b13be44b3c8acaf1a04ba29444", + "3db4b88f52d8462cb68265dcddf61139", + "a25c8ead57594d2aaa103967b5666b1d", + "4c6c83f35d25465aa66f59572fb5c109", + "793fbbc4eb60486ca9a8dd3a466d30d4", + "94e61d92d82446e4b05e55ead678adbb", + "7c758e142ba24f978e3620f06ee28d94", + "cf24deaffcfd4f518595321ce8060443", + "34a12296d04f42a7a4217dc90eb11217", + "72af7044a375436ca9797b8d95a2c809", + "b2f3791c26664dcba447762f6f7488a7", + "43e20b7b3ba542e3bd712306881cca5f", + "7d9540c9bbc54bb39778b85862caac67", + "30c9b1c3f5ff482293a2674d8d106cdf", + "e098b085b9804f559a0c738531d53048", + "48a1153f99d1431aa1eed392c2dd73ee", + "7a75bdd3a1494108879b98ae19ab51c7", + "5ec82f8657724e72af779aad205f34f0", + "19ae213a184647e39c97e6de4fcfe58e", + "17f04516bd00461781391d37eda14bdb", + "ce666f6bca8e43d582e78cb04405e84d", + "96c3b56b2c564f0787841d7a9e2d1a0f", + "5e2194031c4445a08833e96daca605f6", + "3acaea18f63445a8a8e4078203749afc", + "bf7943992abf410eb4239e81bf0908e2", + "d9fd4fbf92cc4066ba627f43752692a7", + "618b14b62e9240adb441445ef43dc187", + "88ac8b5a97bf4f10892ba89b9aa8bc22", + "ec3e67749fb240049932732f7fc1ec4b", + "04f66b0623bb48be9124418d91cb0a9b", + "688d4f3db00c48da974c58ef73e5ba8c", + "853358201c564b5d9f6c8cfbcde10c11", + "7269d969455c4bec8b5cac0f7ed35d2c", + "3e7c189588044c9387c67812edc86f24", + "d36209a3813d4b52b7806091dfe2a8b6", + "c17b106000c9452fb7702b1d583040b3", + "aa027bc42cba4c178af51be5775d33a5", + "b9741487d3254932a3d7ada3eeefc595", + "48014c0699df4921ad0bd436578f5f78", + "c16ad9c1067e4c14872be1dde19aa68c", + "a679bfa91d9c4ba5b0b766dd21ac7bdf", + "28017307068642d9846314f22aba8dd9", + "b67317eb1552400ab120b334d2692020", + "2020c1a4a675488ca8265d62e7e58f29", + "d0ca10eb9f2544eb9b332d919aac0fbf", + "54eaff9e53a143098bb3e77f2d824268", + "610036373632486fbee44beb1e2e526d", + "6a9d87e1c39a4e59b20315bdfd57e799", + "c462787a34024ffd84a048697906a5da", + "ce13114c190a4580a88b7d87397799cb", + "5c41598c9c134bb2a5e6d7ecc8180780", + 
"c76946151342424b9f1c247b8d822c9f", + "a76e818268294b72b2eb4f1b3e8e88a0", + "60de8ac15a4a460fa518fe05fde546b7", + "be8caf785f124615a83a068226c11fc0", + "8b481231e25b469d912c89d25f2e130a", + "b741b85fa6e14d318ddd2ea5b318f801", + "a0e6f8b69797457481f7ff2b785ca099", + "ebe980a4978b422e991efc4c358fef14", + "62c66fdfd5f945d4b31103b14324c2e5", + "fe1ca594239248199310a9e342c1cbf6", + "9869a50effa547c1892d2bc030f91b3c", + "9039e0f23305429c9e65da5d4d978027", + "5e5acc1d1fc3481a889000e1708be0e9", + "e2fbee372dbb4d70adfaea19e4689d58", + "24470dfeb185451d9c9d8ca77a8d96c2", + "34a64bde1da146929adc8add25ab62cf", + "3a4b43d1701248ada52d6a1b023c0c03", + "69bea92a64e44baa896a9ce46a1bdb9f", + "15302520e0824d8bb6047c6e9df14922", + "300d1ed9cf5a48d0939c44efaeaa4f0a", + "3ed611fdd019473791afe760eadd332c", + "ff8654654c8547dca02be130d3b90456", + "c6babed63a9c458e9f20533dada1c0ab", + "636489b3d9f64e5fb1a1d59cbe03e87c", + "666c3dd0a02345f88f4a133348930eb6", + "c9c78c49251d44d59274b0ac9476e302", + "2f82fa9b538b43fa94bdb0338488bbf3", + "5f3d21cdbf63479c95d9716194f8e379", + "3fb9a1c091b44f4c8204c51097c33ecd", + "161ffef13b4641228cf68d5e0cdb4752", + "30b358e1d97442c48fd139910d7bb18d", + "5d387c69d2574ff19cf968ac314b08ef", + "2d49108838c7419ebff091cb26d01fc1", + "362fd834bb1843f5830696d45b1313ac", + "d8cf4efccdde487e944376eebc9e1107", + "fc6a12045e4140d1984786159e887eb6", + "62f7da4f2ff4415680a2a7588176cf60", + "4cbdbc2a339c4c5ca1952240a3aad792", + "e6dbea9a7f5145d78842da1a8024f121", + "9804f53f6ce845f3bd4225341fbb1b83", + "d29a9bc16e594ca28b1e910e7351f9f7", + "bedbc8d020414d10946648ec8fb35b98", + "d8450c7c22664e50b7a2aa8d2aa450ca", + "76b45ad5d52a4239a7128458f2e4ff8e", + "2e96d67753aa4cdd87d1a12b63ad4557", + "8849369ef4c64bde9cd4873a660f0ecb", + "30ca5eda0d5e49989a96d62864a6f5ea", + "12223cb52da347e4bf44b793323809ab", + "3adcdfad6da04e119f6301f6272d94b6", + "f8723f4b610d4b649f063de7b55bc284", + "b487318c01d44853890029fb496b4347", + "9aadd246448343b7af6f1c2506e072c7", + "7153c9902f874e87a02ee45d2bb5c3e2", + "d10241169ba44a18a7a25a7b7383c318", + "c2e992a64b7b4c6b9a2de645d8eda4ef", + "9d090fd9a91d48e6b1c4f7ef13dfe0de", + "ad06e1bdf44e46658315320afe17398e", + "c9abd4782c9e461b98c2982cac983c9e", + "212a753d65224d82b483572073416337", + "3cc4d8faa6084ff3b4f2725e3b69a3a2", + "0be5ab6599be4360b282aa558514fd0c", + "17287fdf1c4e4fd0a9117896b4761b6f", + "ecaa05f88907468696610a4f623e7517" + ] }, - "id": "ziFDfaygbzOl", - "outputId": "2e2ca538-f85d-4a87-e50b-442a226fb25b" + "id": "uKH1cxpkxFAr", + "outputId": "bae79adb-9ed2-49f6-c618-efdb66923cc3" }, - "execution_count": 7, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "W0728 18:13:03.377000 134787342856320 torch/distributed/run.py:779] \n", - "W0728 18:13:03.377000 134787342856320 torch/distributed/run.py:779] *****************************************\n", - "W0728 18:13:03.377000 134787342856320 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
\n", - "W0728 18:13:03.377000 134787342856320 torch/distributed/run.py:779] *****************************************\n", - "[2024-07-28 18:13:08,712] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", - "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", - "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", - "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", - "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", - "[2024-07-28 18:13:08,803] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", - "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", - "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n", - "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n", - "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", - "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", - "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_fwd\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_bwd\n", - "[2024-07-28 18:13:08,963] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", - "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n", - "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n", - "[2024-07-28 18:13:09,002] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", - "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_fwd\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. 
Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_bwd\n", - "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", - "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", - "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", - "[2024-07-28 18:13:09,067] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", - "[2024-07-28 18:13:09,073] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", - "[2024-07-28 18:13:09,075] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", - "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", - "[2024-07-28 18:13:09,083] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", - "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", - "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", - "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", - "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", - "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", - "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", - "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", - "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", - "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", - "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", - "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", - "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", - "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", - "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", - "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", - "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", - "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n", - "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n", - "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", - "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from 
source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", - "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_fwd\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_bwd\n", - "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n", - "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_fwd\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_bwd\n", - "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n", - "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n", - "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n", - "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n", - "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n", - "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n", - "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n", - "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_fwd\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_bwd\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_fwd\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. 
Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_bwd\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_fwd\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_bwd\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_fwd\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_bwd\n", - "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n", - "\n", - "Map: 0%| | 0/480 [00:00\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 26\u001b[0m \u001b[0mmodel_name\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'TinyLlama/TinyLlama-1.1B-Chat-v1.0'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 27\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 28\u001b[0;31m train_engine.sft_train(model_name=model_name, dataset_name=dataset_name,\n\u001b[0m\u001b[1;32m 29\u001b[0m \u001b[0msft_config\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msft_config\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msft_prompt_config\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msft_prompt_config\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 30\u001b[0m \u001b[0muse_zero\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0muse_ddp\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/simplifine_alpha/train_engine.py\u001b[0m in \u001b[0;36msft_train\u001b[0;34m(model_name, dataset_name, hf_token, dataset_config_name, data_from_hf, do_split, split_ratio, use_peft, lora_config, sft_config, data, wandb_config, use_ddp, use_zero, sft_prompt_config)\u001b[0m\n\u001b[1;32m 842\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmakedirs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutput_dir_final\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexist_ok\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 843\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 844\u001b[0;31m \u001b[0mtrainer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 845\u001b[0m \u001b[0mtrainer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msave_model\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutput_dir_final\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 846\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + 
"\u001b[0;32m/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py\u001b[0m in \u001b[0;36mtrain\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 449\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmodel\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_trl_activate_neftune\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmodel\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 450\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 451\u001b[0;31m \u001b[0moutput\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msuper\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 452\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 453\u001b[0m \u001b[0;31m# After training we make sure to retrieve back the original forward pass method\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/transformers/trainer.py\u001b[0m in \u001b[0;36mtrain\u001b[0;34m(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)\u001b[0m\n\u001b[1;32m 1930\u001b[0m \u001b[0mhf_hub_utils\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0menable_progress_bars\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1931\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1932\u001b[0;31m return inner_training_loop(\n\u001b[0m\u001b[1;32m 1933\u001b[0m \u001b[0margs\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1934\u001b[0m \u001b[0mresume_from_checkpoint\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mresume_from_checkpoint\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/transformers/trainer.py\u001b[0m in \u001b[0;36m_inner_training_loop\u001b[0;34m(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)\u001b[0m\n\u001b[1;32m 2266\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2267\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maccelerator\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maccumulate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodel\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2268\u001b[0;31m \u001b[0mtr_loss_step\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtraining_step\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2269\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2270\u001b[0m if (\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/transformers/trainer.py\u001b[0m in \u001b[0;36mtraining_step\u001b[0;34m(***failed resolving arguments***)\u001b[0m\n\u001b[1;32m 3322\u001b[0m 
\u001b[0mscaled_loss\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3323\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 3324\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maccelerator\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mloss\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3325\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3326\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mloss\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdetach\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgradient_accumulation_steps\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py\u001b[0m in \u001b[0;36mbackward\u001b[0;34m(self, loss, **kwargs)\u001b[0m\n\u001b[1;32m 2149\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlomo_backward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mloss\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlearning_rate\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2150\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2151\u001b[0;31m \u001b[0mloss\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2152\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2153\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mset_trigger\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/torch/_tensor.py\u001b[0m in \u001b[0;36mbackward\u001b[0;34m(self, gradient, retain_graph, create_graph, inputs)\u001b[0m\n\u001b[1;32m 523\u001b[0m \u001b[0minputs\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0minputs\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 524\u001b[0m )\n\u001b[0;32m--> 525\u001b[0;31m torch.autograd.backward(\n\u001b[0m\u001b[1;32m 526\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgradient\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minputs\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0minputs\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 527\u001b[0m )\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py\u001b[0m in \u001b[0;36mbackward\u001b[0;34m(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)\u001b[0m\n\u001b[1;32m 265\u001b[0m \u001b[0;31m# some Python versions print out the first line of a multi-line function\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 266\u001b[0m \u001b[0;31m# calls in the traceback and some print out the last 
line\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 267\u001b[0;31m _engine_run_backward(\n\u001b[0m\u001b[1;32m 268\u001b[0m \u001b[0mtensors\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 269\u001b[0m \u001b[0mgrad_tensors_\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py\u001b[0m in \u001b[0;36m_engine_run_backward\u001b[0;34m(t_outputs, *args, **kwargs)\u001b[0m\n\u001b[1;32m 742\u001b[0m \u001b[0munregister_hooks\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_register_logging_hooks_on_whole_graph\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mt_outputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 743\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 744\u001b[0;31m return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass\n\u001b[0m\u001b[1;32m 745\u001b[0m \u001b[0mt_outputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 746\u001b[0m ) # Calls into the C++ engine to run the backward pass\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py\u001b[0m in \u001b[0;36mapply\u001b[0;34m(self, *args)\u001b[0m\n\u001b[1;32m 299\u001b[0m )\n\u001b[1;32m 300\u001b[0m \u001b[0muser_fn\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mvjp_fn\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mvjp_fn\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mFunction\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvjp\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0mbackward_fn\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 301\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0muser_fn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 302\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 303\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mapply_jvp\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py\u001b[0m in \u001b[0;36mbackward\u001b[0;34m(ctx, *args)\u001b[0m\n\u001b[1;32m 318\u001b[0m \u001b[0;34m\" this checkpoint() is not necessary\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 319\u001b[0m )\n\u001b[0;32m--> 320\u001b[0;31m \u001b[0mtorch\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mautograd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutputs_with_grad\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0margs_with_grad\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 321\u001b[0m grads = tuple(\n\u001b[1;32m 322\u001b[0m \u001b[0minp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgrad\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minp\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mtorch\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mTensor\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py\u001b[0m in \u001b[0;36mbackward\u001b[0;34m(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)\u001b[0m\n\u001b[1;32m 265\u001b[0m \u001b[0;31m# some Python versions print out the first line of a multi-line function\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 266\u001b[0m \u001b[0;31m# calls in the traceback and some print out the last line\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 267\u001b[0;31m _engine_run_backward(\n\u001b[0m\u001b[1;32m 268\u001b[0m \u001b[0mtensors\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 269\u001b[0m \u001b[0mgrad_tensors_\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py\u001b[0m in \u001b[0;36m_engine_run_backward\u001b[0;34m(t_outputs, *args, **kwargs)\u001b[0m\n\u001b[1;32m 742\u001b[0m \u001b[0munregister_hooks\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_register_logging_hooks_on_whole_graph\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mt_outputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 743\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 744\u001b[0;31m return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass\n\u001b[0m\u001b[1;32m 745\u001b[0m \u001b[0mt_outputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 746\u001b[0m ) # Calls into the C++ engine to run the backward pass\n", + "\u001b[0;31mKeyboardInterrupt\u001b[0m: " ] } ] @@ -3310,62 +6031,72 @@ { "cell_type": "markdown", "source": [ - "Finally, we test loading the model!" + "### โ˜๏ธ Training the Model on Cloud Servers\n", + "\n", + "In this section, weโ€™re moving from local training to cloud-based training using Simplifineโ€™s cloud infrastructure. This allows you to leverage powerful GPUs like the A100 for more intensive tasks, making it easier to handle larger models and datasets.\n", + "\n", + "1. **Importing the `train_utils` Module:**\n", + " - We start by importing the `train_utils` module from the `Simplifine` library. This module provides utilities to interact with Simplifine's cloud servers.\n", + "\n", + "2. **Model and API Configuration:**\n", + " - We select a different model for this cloud training: `'microsoft/Phi-3-mini-4k-instruct'`. This model is more powerful and well-suited for deployment on cloud GPUs.\n", + " - The `simplifine_api_key` is your unique key to access Simplifineโ€™s cloud services. Ensure you have it ready.\n", + " - The `gpu_type` is set to `'a100'`, which specifies the type of GPU to be used in the cloud. The A100 is a high-performance GPU ideal for deep learning tasks.\n", + "\n", + " ### ๐Ÿ”‘ Need an API Key?\n", + " If you don't have an API key yet, you can [**request one here for free**](https://www.simplifine.com/api-key-interest). 
The turnaround time is just 24 hours, so you'll be up and running in no time!\n", + "\n", + "3. **Client Initialization:**\n", + " - We create a `Client` object using the API key and GPU type. This client will handle the communication with Simplifineโ€™s cloud infrastructure, managing the training job on your behalf.\n", + "\n", + "4. **Defining the Training Job:**\n", + " - The `job_name` is set to `'fake_news_english_phi3'`, which uniquely identifies this training task.\n", + " - We then call the `sft_train_cloud` method on our `client` object. This method sends the training job to the cloud, using the model and configurations weโ€™ve defined earlier.\n", + "\n", + "5. **Cloud Training Setup:**\n", + " - We enable `use_zero=True` to utilize DeepSpeed's ZeRO optimization, allowing the model to scale effectively across multiple GPUs.\n", + " - We disable Distributed Data Parallel (DDP) for this job, which is appropriate when ZeRO is handling the distribution of data.\n", + "\n", + "Running this cell will initiate the training process on Simplifineโ€™s cloud servers, allowing you to offload the heavy lifting to a powerful cloud infrastructure. This is ideal when working with larger models or when your local resources are insufficient.\n" ], "metadata": { - "id": "67SHOrw0gUhM" + "id": "oehMA7hwRky5" } }, { "cell_type": "code", "source": [ - "from transformers import AutoModelForCausalLM, AutoTokenizer\n", - "\n", - "path = '/content/sf_trained_model'\n", - "sf_model = AutoModelForCausalLM.from_pretrained(path)\n", - "sf_tokenizer = AutoTokenizer.from_pretrained(path)\n", + "from simplifine_alpha import train_utils\n", "\n", - "input_example = '''### TITLE: title 1\\n ### ABSTRACT: abstract 1\\n ###EXPLANATION: '''\n", + "# change name to phi 3\n", + "model_name = 'microsoft/Phi-3-mini-4k-instruct'\n", + "simplifine_api_key = 'PUT YOUR OWN API KEY PROVIDED BY SIMPLIFINE'\n", + "gpu_type = 'a100'\n", + "client = train_utils.Client(simplifine_api_key, gpu_type)\n", "\n", - "input_example = sf_tokenizer(input_example, return_tensors='pt')\n", + "job_name = 'fake_news_english_phi3'\n", "\n", - "output = sf_model.generate(input_example['input_ids'],\n", - " attention_mask=input_example['attention_mask'],\n", - " max_length=30,eos_token_id=sf_tokenizer.eos_token_id,\n", - " early_stopping=True,\n", - " pad_token_id=sf_tokenizer.eos_token_id\n", - ")\n", "\n", - "print(sf_tokenizer.decode(output[0]))" + "client.sft_train_cloud(job_name=job_name, model_name=model_name, dataset_name=dataset_name,\n", + " sft_config = sft_config, sft_prompt_config=sft_prompt_config,\n", + " use_zero=True, use_ddp=False\n", + " )" ], "metadata": { + "id": "O1zdn8r85n-o", "colab": { "base_uri": "https://localhost:8080/" }, - "id": "IsUWweRdgVgZ", - "outputId": "ac088e5b-a57d-4640-d8ab-74bdbe631dbc" + "outputId": "d2510f4d-5246-4631-df37-8a741cf92240" }, - "execution_count": 9, + "execution_count": null, "outputs": [ - { - "output_type": "stream", - "name": "stderr", - "text": [ - "Some weights of the model checkpoint at /content/sf_trained_model were not used when initializing GPTNeoXForCausalLM: ['module.embed_out.weight', 'module.gpt_neox.embed_in.weight', 'module.gpt_neox.final_layer_norm.bias', 'module.gpt_neox.final_layer_norm.weight', 'module.gpt_neox.layers.0.attention.dense.bias', 'module.gpt_neox.layers.0.attention.dense.weight', 'module.gpt_neox.layers.0.attention.query_key_value.bias', 'module.gpt_neox.layers.0.attention.query_key_value.weight', 'module.gpt_neox.layers.0.input_layernorm.bias', 
'module.gpt_neox.layers.0.input_layernorm.weight', 'module.gpt_neox.layers.0.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.0.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.0.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.0.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.0.post_attention_layernorm.bias', 'module.gpt_neox.layers.0.post_attention_layernorm.weight', 'module.gpt_neox.layers.1.attention.dense.bias', 'module.gpt_neox.layers.1.attention.dense.weight', 'module.gpt_neox.layers.1.attention.query_key_value.bias', 'module.gpt_neox.layers.1.attention.query_key_value.weight', 'module.gpt_neox.layers.1.input_layernorm.bias', 'module.gpt_neox.layers.1.input_layernorm.weight', 'module.gpt_neox.layers.1.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.1.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.1.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.1.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.1.post_attention_layernorm.bias', 'module.gpt_neox.layers.1.post_attention_layernorm.weight', 'module.gpt_neox.layers.10.attention.dense.bias', 'module.gpt_neox.layers.10.attention.dense.weight', 'module.gpt_neox.layers.10.attention.query_key_value.bias', 'module.gpt_neox.layers.10.attention.query_key_value.weight', 'module.gpt_neox.layers.10.input_layernorm.bias', 'module.gpt_neox.layers.10.input_layernorm.weight', 'module.gpt_neox.layers.10.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.10.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.10.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.10.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.10.post_attention_layernorm.bias', 'module.gpt_neox.layers.10.post_attention_layernorm.weight', 'module.gpt_neox.layers.11.attention.dense.bias', 'module.gpt_neox.layers.11.attention.dense.weight', 'module.gpt_neox.layers.11.attention.query_key_value.bias', 'module.gpt_neox.layers.11.attention.query_key_value.weight', 'module.gpt_neox.layers.11.input_layernorm.bias', 'module.gpt_neox.layers.11.input_layernorm.weight', 'module.gpt_neox.layers.11.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.11.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.11.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.11.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.11.post_attention_layernorm.bias', 'module.gpt_neox.layers.11.post_attention_layernorm.weight', 'module.gpt_neox.layers.2.attention.dense.bias', 'module.gpt_neox.layers.2.attention.dense.weight', 'module.gpt_neox.layers.2.attention.query_key_value.bias', 'module.gpt_neox.layers.2.attention.query_key_value.weight', 'module.gpt_neox.layers.2.input_layernorm.bias', 'module.gpt_neox.layers.2.input_layernorm.weight', 'module.gpt_neox.layers.2.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.2.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.2.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.2.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.2.post_attention_layernorm.bias', 'module.gpt_neox.layers.2.post_attention_layernorm.weight', 'module.gpt_neox.layers.3.attention.dense.bias', 'module.gpt_neox.layers.3.attention.dense.weight', 'module.gpt_neox.layers.3.attention.query_key_value.bias', 'module.gpt_neox.layers.3.attention.query_key_value.weight', 'module.gpt_neox.layers.3.input_layernorm.bias', 'module.gpt_neox.layers.3.input_layernorm.weight', 'module.gpt_neox.layers.3.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.3.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.3.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.3.mlp.dense_h_to_4h.weight', 
'module.gpt_neox.layers.3.post_attention_layernorm.bias', 'module.gpt_neox.layers.3.post_attention_layernorm.weight', 'module.gpt_neox.layers.4.attention.dense.bias', 'module.gpt_neox.layers.4.attention.dense.weight', 'module.gpt_neox.layers.4.attention.query_key_value.bias', 'module.gpt_neox.layers.4.attention.query_key_value.weight', 'module.gpt_neox.layers.4.input_layernorm.bias', 'module.gpt_neox.layers.4.input_layernorm.weight', 'module.gpt_neox.layers.4.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.4.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.4.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.4.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.4.post_attention_layernorm.bias', 'module.gpt_neox.layers.4.post_attention_layernorm.weight', 'module.gpt_neox.layers.5.attention.dense.bias', 'module.gpt_neox.layers.5.attention.dense.weight', 'module.gpt_neox.layers.5.attention.query_key_value.bias', 'module.gpt_neox.layers.5.attention.query_key_value.weight', 'module.gpt_neox.layers.5.input_layernorm.bias', 'module.gpt_neox.layers.5.input_layernorm.weight', 'module.gpt_neox.layers.5.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.5.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.5.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.5.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.5.post_attention_layernorm.bias', 'module.gpt_neox.layers.5.post_attention_layernorm.weight', 'module.gpt_neox.layers.6.attention.dense.bias', 'module.gpt_neox.layers.6.attention.dense.weight', 'module.gpt_neox.layers.6.attention.query_key_value.bias', 'module.gpt_neox.layers.6.attention.query_key_value.weight', 'module.gpt_neox.layers.6.input_layernorm.bias', 'module.gpt_neox.layers.6.input_layernorm.weight', 'module.gpt_neox.layers.6.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.6.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.6.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.6.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.6.post_attention_layernorm.bias', 'module.gpt_neox.layers.6.post_attention_layernorm.weight', 'module.gpt_neox.layers.7.attention.dense.bias', 'module.gpt_neox.layers.7.attention.dense.weight', 'module.gpt_neox.layers.7.attention.query_key_value.bias', 'module.gpt_neox.layers.7.attention.query_key_value.weight', 'module.gpt_neox.layers.7.input_layernorm.bias', 'module.gpt_neox.layers.7.input_layernorm.weight', 'module.gpt_neox.layers.7.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.7.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.7.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.7.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.7.post_attention_layernorm.bias', 'module.gpt_neox.layers.7.post_attention_layernorm.weight', 'module.gpt_neox.layers.8.attention.dense.bias', 'module.gpt_neox.layers.8.attention.dense.weight', 'module.gpt_neox.layers.8.attention.query_key_value.bias', 'module.gpt_neox.layers.8.attention.query_key_value.weight', 'module.gpt_neox.layers.8.input_layernorm.bias', 'module.gpt_neox.layers.8.input_layernorm.weight', 'module.gpt_neox.layers.8.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.8.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.8.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.8.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.8.post_attention_layernorm.bias', 'module.gpt_neox.layers.8.post_attention_layernorm.weight', 'module.gpt_neox.layers.9.attention.dense.bias', 'module.gpt_neox.layers.9.attention.dense.weight', 'module.gpt_neox.layers.9.attention.query_key_value.bias', 
'module.gpt_neox.layers.9.attention.query_key_value.weight', 'module.gpt_neox.layers.9.input_layernorm.bias', 'module.gpt_neox.layers.9.input_layernorm.weight', 'module.gpt_neox.layers.9.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.9.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.9.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.9.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.9.post_attention_layernorm.bias', 'module.gpt_neox.layers.9.post_attention_layernorm.weight']\n", - "- This IS expected if you are initializing GPTNeoXForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n", - "- This IS NOT expected if you are initializing GPTNeoXForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n", - "Some weights of GPTNeoXForCausalLM were not initialized from the model checkpoint at /content/sf_trained_model and are newly initialized: ['embed_in.weight', 'embed_out.weight', 'final_layer_norm.bias', 'final_layer_norm.weight', 'layers.0.attention.dense.bias', 'layers.0.attention.dense.weight', 'layers.0.attention.query_key_value.bias', 'layers.0.attention.query_key_value.weight', 'layers.0.input_layernorm.bias', 'layers.0.input_layernorm.weight', 'layers.0.mlp.dense_4h_to_h.bias', 'layers.0.mlp.dense_4h_to_h.weight', 'layers.0.mlp.dense_h_to_4h.bias', 'layers.0.mlp.dense_h_to_4h.weight', 'layers.0.post_attention_layernorm.bias', 'layers.0.post_attention_layernorm.weight', 'layers.1.attention.dense.bias', 'layers.1.attention.dense.weight', 'layers.1.attention.query_key_value.bias', 'layers.1.attention.query_key_value.weight', 'layers.1.input_layernorm.bias', 'layers.1.input_layernorm.weight', 'layers.1.mlp.dense_4h_to_h.bias', 'layers.1.mlp.dense_4h_to_h.weight', 'layers.1.mlp.dense_h_to_4h.bias', 'layers.1.mlp.dense_h_to_4h.weight', 'layers.1.post_attention_layernorm.bias', 'layers.1.post_attention_layernorm.weight', 'layers.10.attention.dense.bias', 'layers.10.attention.dense.weight', 'layers.10.attention.query_key_value.bias', 'layers.10.attention.query_key_value.weight', 'layers.10.input_layernorm.bias', 'layers.10.input_layernorm.weight', 'layers.10.mlp.dense_4h_to_h.bias', 'layers.10.mlp.dense_4h_to_h.weight', 'layers.10.mlp.dense_h_to_4h.bias', 'layers.10.mlp.dense_h_to_4h.weight', 'layers.10.post_attention_layernorm.bias', 'layers.10.post_attention_layernorm.weight', 'layers.11.attention.dense.bias', 'layers.11.attention.dense.weight', 'layers.11.attention.query_key_value.bias', 'layers.11.attention.query_key_value.weight', 'layers.11.input_layernorm.bias', 'layers.11.input_layernorm.weight', 'layers.11.mlp.dense_4h_to_h.bias', 'layers.11.mlp.dense_4h_to_h.weight', 'layers.11.mlp.dense_h_to_4h.bias', 'layers.11.mlp.dense_h_to_4h.weight', 'layers.11.post_attention_layernorm.bias', 'layers.11.post_attention_layernorm.weight', 'layers.2.attention.dense.bias', 'layers.2.attention.dense.weight', 'layers.2.attention.query_key_value.bias', 'layers.2.attention.query_key_value.weight', 'layers.2.input_layernorm.bias', 'layers.2.input_layernorm.weight', 'layers.2.mlp.dense_4h_to_h.bias', 'layers.2.mlp.dense_4h_to_h.weight', 'layers.2.mlp.dense_h_to_4h.bias', 'layers.2.mlp.dense_h_to_4h.weight', 'layers.2.post_attention_layernorm.bias', 'layers.2.post_attention_layernorm.weight', 'layers.3.attention.dense.bias', 
'layers.3.attention.dense.weight', 'layers.3.attention.query_key_value.bias', 'layers.3.attention.query_key_value.weight', 'layers.3.input_layernorm.bias', 'layers.3.input_layernorm.weight', 'layers.3.mlp.dense_4h_to_h.bias', 'layers.3.mlp.dense_4h_to_h.weight', 'layers.3.mlp.dense_h_to_4h.bias', 'layers.3.mlp.dense_h_to_4h.weight', 'layers.3.post_attention_layernorm.bias', 'layers.3.post_attention_layernorm.weight', 'layers.4.attention.dense.bias', 'layers.4.attention.dense.weight', 'layers.4.attention.query_key_value.bias', 'layers.4.attention.query_key_value.weight', 'layers.4.input_layernorm.bias', 'layers.4.input_layernorm.weight', 'layers.4.mlp.dense_4h_to_h.bias', 'layers.4.mlp.dense_4h_to_h.weight', 'layers.4.mlp.dense_h_to_4h.bias', 'layers.4.mlp.dense_h_to_4h.weight', 'layers.4.post_attention_layernorm.bias', 'layers.4.post_attention_layernorm.weight', 'layers.5.attention.dense.bias', 'layers.5.attention.dense.weight', 'layers.5.attention.query_key_value.bias', 'layers.5.attention.query_key_value.weight', 'layers.5.input_layernorm.bias', 'layers.5.input_layernorm.weight', 'layers.5.mlp.dense_4h_to_h.bias', 'layers.5.mlp.dense_4h_to_h.weight', 'layers.5.mlp.dense_h_to_4h.bias', 'layers.5.mlp.dense_h_to_4h.weight', 'layers.5.post_attention_layernorm.bias', 'layers.5.post_attention_layernorm.weight', 'layers.6.attention.dense.bias', 'layers.6.attention.dense.weight', 'layers.6.attention.query_key_value.bias', 'layers.6.attention.query_key_value.weight', 'layers.6.input_layernorm.bias', 'layers.6.input_layernorm.weight', 'layers.6.mlp.dense_4h_to_h.bias', 'layers.6.mlp.dense_4h_to_h.weight', 'layers.6.mlp.dense_h_to_4h.bias', 'layers.6.mlp.dense_h_to_4h.weight', 'layers.6.post_attention_layernorm.bias', 'layers.6.post_attention_layernorm.weight', 'layers.7.attention.dense.bias', 'layers.7.attention.dense.weight', 'layers.7.attention.query_key_value.bias', 'layers.7.attention.query_key_value.weight', 'layers.7.input_layernorm.bias', 'layers.7.input_layernorm.weight', 'layers.7.mlp.dense_4h_to_h.bias', 'layers.7.mlp.dense_4h_to_h.weight', 'layers.7.mlp.dense_h_to_4h.bias', 'layers.7.mlp.dense_h_to_4h.weight', 'layers.7.post_attention_layernorm.bias', 'layers.7.post_attention_layernorm.weight', 'layers.8.attention.dense.bias', 'layers.8.attention.dense.weight', 'layers.8.attention.query_key_value.bias', 'layers.8.attention.query_key_value.weight', 'layers.8.input_layernorm.bias', 'layers.8.input_layernorm.weight', 'layers.8.mlp.dense_4h_to_h.bias', 'layers.8.mlp.dense_4h_to_h.weight', 'layers.8.mlp.dense_h_to_4h.bias', 'layers.8.mlp.dense_h_to_4h.weight', 'layers.8.post_attention_layernorm.bias', 'layers.8.post_attention_layernorm.weight', 'layers.9.attention.dense.bias', 'layers.9.attention.dense.weight', 'layers.9.attention.query_key_value.bias', 'layers.9.attention.query_key_value.weight', 'layers.9.input_layernorm.bias', 'layers.9.input_layernorm.weight', 'layers.9.mlp.dense_4h_to_h.bias', 'layers.9.mlp.dense_4h_to_h.weight', 'layers.9.mlp.dense_h_to_4h.bias', 'layers.9.mlp.dense_h_to_4h.weight', 'layers.9.post_attention_layernorm.bias', 'layers.9.post_attention_layernorm.weight']\n", - "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n", - "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n" - ] - }, { "output_type": "stream", "name": "stdout", "text": [ - "### TITLE: title 1\n", - " ### ABSTRACT: abstract 1\n", - " ###EXPLANATION: rugu stretmediate 
complains GermanServ\n" + "[2024-08-07 18:34:35,105] [WARNING] [real_accelerator.py:162:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.\n", + "[2024-08-07 18:34:35,110] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cpu (auto detect)\n" ] } ] @@ -3373,68 +6104,80 @@ { "cell_type": "markdown", "source": [ - "### Using ZeRO\n", - "ZeRO is a strong tool when a model cannot fit on GPU memory, so it is sharded across them (parameters, gradients and activations). Further memory reduction could be by enabling fp16/bf16, and gradient_checkpointing." + "### ๐Ÿ“ Checking the Status of Your Training Jobs\n", + "\n", + "After submitting your training job to Simplifineโ€™s cloud servers, itโ€™s important to monitor its status to ensure everything is running smoothly. In this section, weโ€™ll check the status of your most recent job.\n", + "\n", + "1. **Retrieving Job Status:**\n", + " - We call the `get_all_jobs` method on our `client` object. This method returns a list of all jobs associated with your API key, including their current statuses.\n", + "\n", + "2. **Displaying the Latest Job:**\n", + " - We loop through the latest job in the list and print its status. This gives you a quick overview of how your most recent training job is progressing.\n", + "\n", + "3. **Understanding Job Statuses:**\n", + " - Your job can have one of the following statuses:\n", + " - `pending`: The job has been submitted and is waiting to start.\n", + " - `in progress`: The job is currently running.\n", + " - `stopped`: The job was stopped before completion, either manually or due to an error.\n", + " - `completed`: The job has successfully finished.\n", + "\n", + "Running this cell will display the status of your most recent job, helping you keep track of your training tasks on Simplifineโ€™s cloud servers.\n" ], "metadata": { - "id": "os5pt22OgZc3" + "id": "W88J_Ef7yaYG" } }, { "cell_type": "code", "source": [ - "# This time, we just change the use_zero arg to True, and opposite to use_ddp.\n", - "client.sft_train_cloud(model_name = model_name, from_hf=from_hf, dataset_name=dataset_name,\n", - " keys = keys, data = data,\n", - " template = template, job_name='zero_example_cloud',\n", - " response_template=response_template, use_zero=True, use_ddp=False)" - ], - "metadata": { - "id": "m3LGu5ZYga2y" - }, - "execution_count": 10, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "# repeat the same step of extracting jobs and ids\n", "status = client.get_all_jobs()\n", - "\n", - "for num,i in enumerate(status[-5:]):\n", - " print(f'Number {num} status: {i}\\n')" + "for num,i in enumerate(status[-1:]):\n", + " print(f'Job {num}: {i}')" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, - "id": "poAxZn-bgcnC", - "outputId": "a78dede0-ddbd-4d1a-f80d-4f9b06b16a90" + "id": "l70vZyPV6_AC", + "outputId": "b32db3fe-e353-4105-e8b7-63a772d7ccde" }, - "execution_count": 11, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "Number 0 status: {'job_id': 'bde91132-9776-41ae-89f9-855dfb116a91', 'job_name': 'ddp_job', 'status': 'completed'}\n", - "\n", - "Number 1 status: {'job_id': 'a1ff54dd-5ee2-4e35-9e78-6868f63dad37', 'job_name': 'zero_example_cloud', 'status': 'completed'}\n", - "\n", - "Number 2 status: {'job_id': '543d3bc3-3ce4-4af6-9f9a-6c0823dcc9b0', 'job_name': 'ddp_job', 'status': 'completed'}\n", - "\n", - "Number 3 status: {'job_id': 
'5d55d46a-7793-4c06-9cef-279f03a0f953', 'job_name': 'job_1', 'status': 'completed'}\n", - "\n", - "Number 4 status: {'job_id': '42d965c0-773f-4b45-8dfb-a4f310e6606e', 'job_name': 'zero_example_cloud', 'status': 'in progress'}\n", - "\n" + "Job 0: {'job_id': '183c65ad-2b4e-4d11-b2a5-d66232d5b15b', 'job_name': 'fake_news_english_phi3', 'status': 'completed'}\n" ] } ] }, + { + "cell_type": "markdown", + "source": [ + "### ๐Ÿ“Š Retrieving and Viewing Training Logs\n", + "\n", + "After checking the status of your training job, you might want to dive deeper into the details by viewing the training logs. These logs provide insights into the training process, including any issues or updates on the progress.\n", + "\n", + "1. **Getting the `job_id`:**\n", + " - We start by extracting the `job_id` of the last job from the status list. The `job_id` is a unique identifier for each training job, which weโ€™ll use to retrieve its logs.\n", + "\n", + "2. **Retrieving Logs:**\n", + " - We call the `get_train_logs` method on our `client` object, passing in the `job_id`. This method fetches the detailed logs for the specified job, giving you access to the complete training history.\n", + "\n", + "3. **Viewing the Logs:**\n", + " - Finally, we print the `response` from the logs, which contains detailed information about the training process. This includes updates, errors, and any other relevant messages from the training run.\n", + "\n", + "Running this cell will display the logs for your most recent job, allowing you to monitor and troubleshoot the training process effectively.\n" + ], + "metadata": { + "id": "BDe93gbayl_n" + } + }, { "cell_type": "code", "source": [ - "# extracting logs again\n", + "# getting the job_id of the last job\n", "job_id = status[-1]['job_id']\n", "\n", "logs = client.get_train_logs(job_id)\n", @@ -3444,64 +6187,76 @@ "colab": { "base_uri": "https://localhost:8080/" }, - "id": "8zHTeTBmgzm7", - "outputId": "d5ada91a-76c1-48ac-df4b-ed35bf38661d" + "id": "jt35FPNn8ADK", + "outputId": "1de668ed-718e-452d-eb85-0632d7652008" }, - "execution_count": 12, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "W0728 18:16:44.514000 133239404900480 torch/distributed/run.py:779] \n", - "W0728 18:16:44.514000 133239404900480 torch/distributed/run.py:779] *****************************************\n", - "W0728 18:16:44.514000 133239404900480 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. \n", - "W0728 18:16:44.514000 133239404900480 torch/distributed/run.py:779] *****************************************\n", - "[2024-07-28 18:16:49,912] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", - "[2024-07-28 18:16:49,967] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", + "W0806 18:14:41.510000 129132731527296 torch/distributed/run.py:779] \n", + "W0806 18:14:41.510000 129132731527296 torch/distributed/run.py:779] *****************************************\n", + "W0806 18:14:41.510000 129132731527296 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
\n", + "W0806 18:14:41.510000 129132731527296 torch/distributed/run.py:779] *****************************************\n", + "[2024-08-06 18:14:46,878] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", + "[2024-08-06 18:14:46,910] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", + "[2024-08-06 18:14:46,961] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", - "[2024-07-28 18:16:50,049] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", - "[2024-07-28 18:16:50,075] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", - "[2024-07-28 18:16:50,082] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", + "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", + "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", + "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", + "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n", + "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n", + "[2024-08-06 18:14:47,065] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", + "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n", + " @autocast_custom_fwd\n", + "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. 
Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n", + " @autocast_custom_bwd\n", "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n", "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n", + "[2024-08-06 18:14:47,135] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", + "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", + "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n", + " @autocast_custom_fwd\n", + "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n", + " @autocast_custom_bwd\n", "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", - "[2024-07-28 18:16:50,149] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", - "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", - "[2024-07-28 18:16:50,153] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", + "[2024-08-06 18:14:47,158] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", + "[2024-08-06 18:14:47,172] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", + "[2024-08-06 18:14:47,194] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", + "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n", + "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n", "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", + "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", - "[2024-07-28 18:16:50,168] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator 
to cuda (auto detect)\n", + "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n", " @autocast_custom_fwd\n", "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n", " @autocast_custom_bwd\n", - "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n", - "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n", - "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", - "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", - "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", - "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", - "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n", + "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n", + "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n", "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n", "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n", "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n", @@ -3523,14 +6278,6 @@ " @autocast_custom_fwd\n", "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n", " @autocast_custom_bwd\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_fwd\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. 
Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_bwd\n", - "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n", - "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n", - "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n", - "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n", "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n", "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n", "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n", @@ -3541,85 +6288,232 @@ " @autocast_custom_fwd\n", "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n", " @autocast_custom_bwd\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_fwd\n", - "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n", - " @autocast_custom_bwd\n", + "[2024-08-06 18:14:48,688] [INFO] [comm.py:637:init_distributed] cdb=None\n", + "[2024-08-06 18:14:48,695] [INFO] [comm.py:637:init_distributed] cdb=None\n", + "[2024-08-06 18:14:48,785] [INFO] [comm.py:637:init_distributed] cdb=None\n", + "[2024-08-06 18:14:48,850] [INFO] [comm.py:637:init_distributed] cdb=None\n", + "[2024-08-06 18:14:48,890] [INFO] [comm.py:637:init_distributed] cdb=None\n", + "Destroying existing process group\n", + "Destroying existing process group\n", + "[2024-08-06 18:14:48,922] [INFO] [comm.py:637:init_distributed] cdb=None\n", + "[2024-08-06 18:14:48,946] [INFO] [comm.py:637:init_distributed] cdb=None\n", + "[2024-08-06 18:14:48,947] [INFO] [comm.py:637:init_distributed] cdb=None\n", + "Destroying existing process group\n", + "Destroying existing process group\n", + "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n", + "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n", + "Destroying existing process group\n", + "Destroying existing process group\n", + "Destroying existing process group\n", + "Destroying existing process group\n", + "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n", + "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n", + "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n", "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n", + "Special 
tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n", + "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n", + "\n", + "Map: 0%| | 0/393 [00:00\"Open" - ] + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/simplifine-llm/Simplifine/blob/main/examples/fake_url_detection.ipynb)", + "\n", + "### ๐Ÿ“ฆ Installing Required Libraries\n", + "\n", + "Before we begin fine-tuning our fake news detector, we need to install the necessary libraries. In this step, weโ€™re installing the `Simplifine` library, which provides tools to streamline the fine-tuning process for large language models. Weโ€™re also installing the `datasets` library, which allows us to easily access and manage datasets from Hugging Face.\n", + "\n", + "- The `Simplifine` library helps in making the fine-tuning process more efficient, whether you're working locally or in the cloud.\n", + "- The `datasets` library is essential for loading and processing the dataset we'll be using for this project.\n", + "\n", + "Running this cell will install both libraries quietly in the background.\n" + ], + "metadata": { + "id": "0SClYIzAQrpD" + } }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -5560,6 +5566,41 @@ "!pip install datasets -q" ] }, + { + "cell_type": "markdown", + "source": [ + "### ๐Ÿ› ๏ธ Setting Up for Local Training\n", + "\n", + "In this section, weโ€™re preparing to fine-tune our fake news detector model using Google Colabโ€™s resources. The steps below outline how to configure and initiate the training process.\n", + "\n", + "1. **Importing Libraries:**\n", + " - We import `train_engine` from the `Simplifine` library, which provides the necessary functions to handle the fine-tuning process.\n", + " - We also import `SFTConfig` from the `trl` library, which allows us to configure the supervised fine-tuning parameters.\n", + "\n", + "2. **Dataset Selection:**\n", + " - We define the dataset name as `'community-datasets/fake_news_english'`. This dataset contains examples of fake news articles that we will use to fine-tune our model.\n", + "\n", + "3. **Prompt Configuration:**\n", + " - We create a `sftPromptConfig` object to specify how the training data is formatted.\n", + " - The `template` parameter defines the input format, and the `response_template` specifies how the model should generate outputs.\n", + " - The `use_chat_template` flag is set to `True` to format the inputs in a conversational style, which can be effective for chat-based models.\n", + "\n", + "4. **Training Configuration:**\n", + " - We define the training settings using `SFTConfig`. This includes parameters like batch size, learning rate, and the number of epochs.\n", + " - We also enable `fp16` (16-bit floating-point) training for faster computation and set `gradient_checkpointing` to save memory during training.\n", + "\n", + "5. **Model Selection:**\n", + " - The model weโ€™re fine-tuning is `'TinyLlama/TinyLlama-1.1B-Chat-v1.0'`. This is a smaller, efficient model suitable for demonstration purposes on Colab.\n", + "\n", + "6. **Training the Model:**\n", + " - Finally, we call `sft_train` to start the fine-tuning process. 
This step will take a while to complete, as we’re training the model from scratch without any optimizations like quantization or LoRA.\n",
+ "\n",
+ "Running this cell will fine-tune the model locally on Colab, using the configurations we’ve set up. This is ideal for quick experiments or when cloud resources are not available."
+ ],
+ "metadata": {
+ "id": "C0dDwmg4Rb3N"
+ }
+ },
{ "cell_type": "code", "source": [ @@ -5738,7 +5779,7 @@ "id": "uKH1cxpkxFAr", "outputId": "bae79adb-9ed2-49f6-c618-efdb66923cc3" },
- "execution_count": 2,
+ "execution_count": null,
"outputs": [ { "output_type": "stream", @@ -5987,6 +6028,41 @@ } ] },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### ☁️ Training the Model on Cloud Servers\n",
+ "\n",
+ "In this section, we’re moving from local training to cloud-based training using Simplifine’s cloud infrastructure. This allows you to leverage powerful GPUs like the A100 for more intensive tasks, making it easier to handle larger models and datasets.\n",
+ "\n",
+ "1. **Importing the `train_utils` Module:**\n",
+ "   - We start by importing the `train_utils` module from the `Simplifine` library. This module provides utilities to interact with Simplifine's cloud servers.\n",
+ "\n",
+ "2. **Model and API Configuration:**\n",
+ "   - We select a different model for this cloud training: `'microsoft/Phi-3-mini-4k-instruct'`. This model is more powerful and well-suited for deployment on cloud GPUs.\n",
+ "   - The `simplifine_api_key` is your unique key to access Simplifine’s cloud services. Ensure you have it ready.\n",
+ "   - The `gpu_type` is set to `'a100'`, which specifies the type of GPU to be used in the cloud. The A100 is a high-performance GPU ideal for deep learning tasks.\n",
+ "\n",
+ "   ### 🔑 Need an API Key?\n",
+ "   If you don't have an API key yet, you can [**request one here for free**](https://www.simplifine.com/api-key-interest). The turnaround time is just 24 hours, so you'll be up and running in no time!\n",
+ "\n",
+ "3. **Client Initialization:**\n",
+ "   - We create a `Client` object using the API key and GPU type. This client will handle the communication with Simplifine’s cloud infrastructure, managing the training job on your behalf.\n",
+ "\n",
+ "4. **Defining the Training Job:**\n",
+ "   - The `job_name` is set to `'fake_news_english_phi3'`, which uniquely identifies this training task.\n",
+ "   - We then call the `sft_train_cloud` method on our `client` object. This method sends the training job to the cloud, using the model and configurations we’ve defined earlier.\n",
+ "\n",
+ "5. **Cloud Training Setup:**\n",
+ "   - We enable `use_zero=True` to utilize DeepSpeed's ZeRO optimization, allowing the model to scale effectively across multiple GPUs.\n",
+ "   - We disable Distributed Data Parallel (DDP) for this job, which is appropriate when ZeRO is handling the distribution of data.\n",
+ "\n",
+ "Running this cell will initiate the training process on Simplifine’s cloud servers, allowing you to offload the heavy lifting to a powerful cloud infrastructure. This is ideal when working with larger models or when your local resources are insufficient.\n"
],
"metadata": { "id": "oehMA7hwRky5" } },
{ "cell_type": "code", "source": [ @@ -5994,7 +6070,7 @@
"\n",
"# change name to phi 3\n",
"model_name = 'microsoft/Phi-3-mini-4k-instruct'\n",
- "simplifine_api_key = ''\n",
+ "simplifine_api_key = 'PUT YOUR OWN API KEY PROVIDED BY SIMPLIFINE'\n",
"gpu_type = 'a100'\n",
"client = train_utils.Client(simplifine_api_key, gpu_type)\n",
"\n",
@@ -6013,7 +6089,7 @@ }, "outputId": "d2510f4d-5246-4631-df37-8a741cf92240" },
- "execution_count": 2,
+ "execution_count": null,
"outputs": [ { "output_type": "stream", @@ -6028,13 +6104,24 @@ { "cell_type": "markdown", "source": [
- "You can check the status of your job. The status can be any of the following:\n",
+ "### 📝 Checking the Status of Your Training Jobs\n",
+ "\n",
+ "After submitting your training job to Simplifine’s cloud servers, it’s important to monitor its status to ensure everything is running smoothly. In this section, we’ll check the status of your most recent job.\n",
+ "\n",
+ "1. **Retrieving Job Status:**\n",
+ "   - We call the `get_all_jobs` method on our `client` object. This method returns a list of all jobs associated with your API key, including their current statuses.\n",
"\n",
+ "2. **Displaying the Latest Job:**\n",
+ "   - We take the most recent job from the list and print its status. This gives you a quick overview of how your most recent training job is progressing.\n",
"\n",
- "```\n",
- "pending | in progress | stopped | completed\n",
- "```\n",
- "\n"
+ "3. **Understanding Job Statuses:**\n",
+ "   - Your job can have one of the following statuses:\n",
+ "     - `pending`: The job has been submitted and is waiting to start.\n",
+ "     - `in progress`: The job is currently running.\n",
+ "     - `stopped`: The job was stopped before completion, either manually or due to an error.\n",
+ "     - `completed`: The job has successfully finished.\n",
+ "\n",
+ "Running this cell will display the status of your most recent job, helping you keep track of your training tasks on Simplifine’s cloud servers.\n"
],
"metadata": { "id": "W88J_Ef7yaYG" @@ -6054,7 +6141,7 @@ "id": "l70vZyPV6_AC", "outputId": "b32db3fe-e353-4105-e8b7-63a772d7ccde" },
- "execution_count": 3,
+ "execution_count": null,
"outputs": [ { "output_type": "stream", @@ -6068,7 +6155,20 @@ { "cell_type": "markdown", "source": [
- "To see how things are going, you can take a look at the logs, to see if there were any errors or what not."
+ "### 📊 Retrieving and Viewing Training Logs\n",
+ "\n",
+ "After checking the status of your training job, you might want to dive deeper into the details by viewing the training logs. These logs provide insights into the training process, including any issues or updates on the progress.\n",
+ "\n",
+ "1. **Getting the `job_id`:**\n",
+ "   - We start by extracting the `job_id` of the last job from the status list. The `job_id` is a unique identifier for each training job, which we’ll use to retrieve its logs.\n",
+ "\n",
+ "2. **Retrieving Logs:**\n",
+ "   - We call the `get_train_logs` method on our `client` object, passing in the `job_id`. This method fetches the detailed logs for the specified job, giving you access to the complete training history.\n",
+ "\n",
+ "3. **Viewing the Logs:**\n",
+ "   - Finally, we print the `response` from the logs, which contains detailed information about the training process. 
This includes updates, errors, and any other relevant messages from the training run.\n", + "\n", + "Running this cell will display the logs for your most recent job, allowing you to monitor and troubleshoot the training process effectively.\n" ], "metadata": { "id": "BDe93gbayl_n" @@ -6090,7 +6190,7 @@ "id": "jt35FPNn8ADK", "outputId": "1de668ed-718e-452d-eb85-0632d7652008" }, - "execution_count": 4, + "execution_count": null, "outputs": [ { "output_type": "stream", @@ -6552,6 +6652,27 @@ } ] }, + { + "cell_type": "markdown", + "source": [ + "### ๐Ÿ“‚ Downloading and Saving the Trained Model\n", + "\n", + "Once your training job is completed, the next step is to download the trained model so you can use it locally or for further fine-tuning.\n", + "\n", + "1. **Creating a Directory for the Model:**\n", + " - We begin by creating a new folder called `sf_trained_model_zero_phi`. This folder will serve as the destination for the downloaded model files.\n", + "\n", + "2. **Downloading the Model:**\n", + " - We use the `download_model` method on our `client` object to download the trained model from the cloud. The `job_id` is passed to specify which model to download, and we extract the files to the newly created directory.\n", + " \n", + " - **Tip:** This process might take some time depending on the size of the model, so feel free to take a break or grab a coffee while you wait! โ˜•\n", + "\n", + "Running this cell will download your trained model and save it in the specified directory, making it ready for use in your next project or analysis.\n" + ], + "metadata": { + "id": "koKpp2XNU-y1" + } + }, { "cell_type": "code", "source": [ @@ -6571,7 +6692,7 @@ }, "outputId": "b88812a9-8f64-464e-d4c5-d2ace8814f08" }, - "execution_count": 5, + "execution_count": null, "outputs": [ { "output_type": "stream", @@ -6593,6 +6714,31 @@ } ] }, + { + "cell_type": "markdown", + "source": [ + "### ๐Ÿ”„ Loading the Trained Model and Tokenizer\n", + "\n", + "Now that we've successfully downloaded the trained model, the next step is to load it into our environment so we can use it for inference or further fine-tuning.\n", + "\n", + "1. **Importing Required Libraries:**\n", + " - We import `AutoModelForCausalLM` and `AutoTokenizer` from the `transformers` library. These classes are used to load the model and tokenizer from the saved files.\n", + "\n", + "2. **Setting the Path:**\n", + " - We set the `path` variable to point to the directory where we saved the trained model (`'/content/sf_trained_model_zero_phi'`).\n", + "\n", + "3. **Loading the Model:**\n", + " - We use `AutoModelForCausalLM.from_pretrained(path)` to load the trained model from the specified path. This initializes the model so itโ€™s ready for use.\n", + "\n", + "4. **Loading the Tokenizer:**\n", + " - Similarly, we load the tokenizer using `AutoTokenizer.from_pretrained(path)`. The tokenizer is essential for processing text input into a format that the model can understand.\n", + "\n", + "Running this cell will load both the trained model and tokenizer into your environment, allowing you to start generating text or continue fine-tuning with your freshly trained model." 
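+ "\n",
+ "As a quick reference, the loading step boils down to just a few lines. This is a minimal sketch of the cell below (the variable names are illustrative; the path is the download folder created in the previous step):\n",
+ "\n",
+ "```python\n",
+ "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
+ "\n",
+ "# folder the trained model was extracted into during the download step\n",
+ "path = '/content/sf_trained_model_zero_phi'\n",
+ "sf_model = AutoModelForCausalLM.from_pretrained(path)\n",
+ "sf_tokenizer = AutoTokenizer.from_pretrained(path)\n",
+ "```"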
+ ], + "metadata": { + "id": "mQ1fk9tJVJKy" + } + }, { "cell_type": "code", "source": [ @@ -6624,7 +6770,7 @@ }, "outputId": "9b14f28f-3376-45bc-82cb-c6b09a31aa6c" }, - "execution_count": 6, + "execution_count": null, "outputs": [ { "output_type": "display_data", @@ -6649,6 +6795,25 @@ } ] }, + { + "cell_type": "markdown", + "source": [ + "### ๐Ÿ“š Loading the Dataset\n", + "\n", + "Before we can use our trained model for inference or further fine-tuning, we need to load the dataset that weโ€™ve been working with.\n", + "\n", + "1. **Importing the Datasets Library:**\n", + " - We start by importing the `datasets` library, which provides easy access to a wide range of datasets, including the one we've been using for training.\n", + "\n", + "2. **Loading the Dataset:**\n", + " - We load the dataset using the `load_dataset` function from the `datasets` library. The `dataset_name` variable contains the name of the dataset we specified earlier in our code.\n", + "\n", + "Running this cell will load the dataset into your environment, making it ready for evaluation, inference," + ], + "metadata": { + "id": "UZ-1si0bVOMC" + } + }, { "cell_type": "code", "source": [ @@ -6698,7 +6863,7 @@ "id": "Orm2RTPh1s-s", "outputId": "34794037-e2bb-4e64-cf52-445e61a7aaf6" }, - "execution_count": 8, + "execution_count": null, "outputs": [ { "output_type": "stream", @@ -6756,6 +6921,39 @@ } ] }, + { + "cell_type": "markdown", + "source": [ + "### ๐Ÿง  Generating Text with the Trained Model\n", + "\n", + "Now that we've loaded both the model and the dataset, itโ€™s time to generate some text using our trained model. In this section, weโ€™ll configure the generation settings and produce some sample outputs.\n", + "\n", + "1. **Importing Inference Tools:**\n", + " - We import `inference_tools` from the `simplifine_alpha` library. This module provides the necessary tools to generate text using the model weโ€™ve fine-tuned.\n", + "\n", + "2. **Configuring Text Generation:**\n", + " - We create a `GenerationConfig` object to define how the model should generate text. This configuration includes:\n", + " - `prompt_template` and `response_template`: Templates for how the inputs and outputs are formatted.\n", + " - `keys`: Specifies the data keys used in the templates.\n", + " - `train_type`: Indicates that we're using supervised fine-tuning (`sft`).\n", + " - `max_length`: The maximum length of the generated sequences.\n", + " - `num_return_sequences`: How many sequences to generate.\n", + " - `do_sample`, `top_k`, `top_p`, `temperature`: Parameters that control the randomness and diversity of the generated text.\n", + "\n", + "3. **Generating Text:**\n", + " - We call `generate_from_pretrained` using our fine-tuned model, tokenizer, and the generation configuration. We also pass in a small sample of the dataset to generate text based on the training data.\n", + " \n", + " - **Note:** Weโ€™re using only the first three examples from the training dataset (`dataset['train'][:3]`) for quick testing.\n", + "\n", + "4. **Displaying the Generated Text:**\n", + " - Finally, we print the generated text, which provides a glimpse into how well the model has learned to detect fake news.\n", + "\n", + "Running this cell will generate text using your trained model, showcasing its ability to produce outputs based on the fine-tuned dataset. This is where you can see the real impact of your training efforts!" 
+ ], + "metadata": { + "id": "tHGpRwU6VVav" + } + }, { "cell_type": "code", "source": [ @@ -6784,7 +6982,7 @@ "id": "8KWnTV9w1OMQ", "outputId": "e78d14ca-9b91-4412-8d16-ece24b3ffe7d" }, - "execution_count": 11, + "execution_count": null, "outputs": [ { "output_type": "stream", @@ -6803,4 +7001,4 @@ ] } ] -} \ No newline at end of file +}