"
- ],
- "text/html": [
- "\n",
- " \n",
- " \n",
- "
\n",
- " [360/360 01:17, Epoch 3/3]\n",
- "
\n",
- " \n",
- " \n",
- " \n",
- " Step | \n",
- " Training Loss | \n",
- "
\n",
- " \n",
- " \n",
- " \n",
- "
"
- ]
- },
- "metadata": {}
- }
- ]
- },
- {
- "cell_type": "markdown",
- "source": [
- "Testing the model's generation after training.\n",
- "The simplifine trainer saves the final model in a folder in output_dir called \"final_model\"."
- ],
- "metadata": {
- "id": "fWjphDgAMxuk"
- }
- },
- {
- "cell_type": "code",
- "source": [
- "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
- "\n",
- "# This the path that the model and other relevant files are saved to.\n",
- "# this is the default folder name in the trainer.\n",
- "# The final checkpoint is saved under final_model.\n",
- "path = '/content/sft_output/final_model'\n",
- "sf_model = AutoModelForCausalLM.from_pretrained(path)\n",
- "sf_tokenizer = AutoTokenizer.from_pretrained(path)\n",
- "\n",
- "# an example following the arbitrary training data\n",
- "input_example = '''### TITLE: title 1\\n ### ABSTRACT: abstract 1\\n ###EXPLANATION: '''\n",
- "\n",
- "input_example = sf_tokenizer(input_example, return_tensors='pt')\n",
- "\n",
- "output = sf_model.generate(input_example['input_ids'],\n",
- " attention_mask=input_example['attention_mask'],\n",
- " max_length=30,eos_token_id=sf_tokenizer.eos_token_id,\n",
- " early_stopping=True,\n",
- " pad_token_id=sf_tokenizer.eos_token_id\n",
- ")\n",
- "\n",
- "print(sf_tokenizer.decode(output[0]))"
- ],
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
+ "dddd99644ee94fddb4f8b3b436a5f3c4": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "ProgressStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "ProgressStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "bar_color": null,
+ "description_width": ""
+ }
},
- "id": "pYv_RPZWMzdD",
- "outputId": "7d6b8222-94de-4a61-9f48-dd69cc5d846f"
- },
- "execution_count": 3,
- "outputs": [
- {
- "output_type": "stream",
- "name": "stderr",
- "text": [
- "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
- "/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:588: UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.\n",
- " warnings.warn(\n"
- ]
+ "028492fb8b4d483cb13d45491353fd66": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
},
- {
- "output_type": "stream",
- "name": "stdout",
- "text": [
- "### TITLE: title 1\n",
- " ### ABSTRACT: abstract 1\n",
- " ###EXPLANATION: explanation 1 3 explanation 1 3\n"
- ]
+ "0fed45330b9b471185fe6d427ecf51f6": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "DescriptionStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "DescriptionStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "description_width": ""
+ }
}
- ]
- },
- {
- "cell_type": "markdown",
- "source": [
- "## Using Simplifine's GPU clusters\n",
- "\n",
- "In the example above, a small Pythia model (160m parameters) on a L4 GPU. Note that we do not use any adapters e.g. LoRA.\n",
- "In the next step, we show how simplifine allows to carry out the same thing, but on GPU clusters. This will use functions of train_utils.\n",
- "\n",
- "By using this command, you can manually pass the parallelization method.\n",
- "\n",
- "If you have a model that is small enough, try using DDP. In this method, each processor (fansy word for GPU!) has a replica of the model and attends to a different sample.\n",
- "\n",
- "You can also utilize ZeRO from DeepSpeed. With this, you can shard the model parameters, activation states and gradients across the GPUs. You also have the option to offload some to CPUs, at the expense of lower throughput.\n",
- "\n",
- "**NOTE**: we currently support L4 and A100 gpus. When initilising the client, you can define which GPU you would like to run your job on. each server goes up to 8 gpus. The default is L4 GPUs."
- ],
- "metadata": {
- "id": "J0OQyt44M6Ei"
}
- },
+ }
+ },
+ "cells": [
{
"cell_type": "markdown",
"source": [
- "# Using DDP to train\n",
- "The example below uses DDP to distribute the training process.\n",
+ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/simplifine-llm/Simplifine/blob/main/examples/cloud_quickstart.ipynb)",
"\n",
+ "### ๐ฆ Installing Required Libraries\n",
"\n",
- "you would need a simplifine API key. contact us for one for free! :)\n",
+ "Before we begin fine-tuning our fake news detector, we need to install the necessary libraries. In this step, weโre installing the `Simplifine` library, which provides tools to streamline the fine-tuning process for large language models. Weโre also installing the `datasets` library, which allows us to easily access and manage datasets from Hugging Face.\n",
"\n",
- "see contact details at our github repo at https://github.com/simplifine-llm/Simplifine/tree/main"
- ],
- "metadata": {
- "id": "csLAwuVmM8Va"
- }
- },
- {
- "cell_type": "code",
- "source": [
- "from simplifine_alpha.train_utils import Client\n",
- "\n",
- "# setting up the client with\n",
- "# enter your simplifine api key below\n",
- "api_key = ''\n",
- "gpu_type = 'a100' # l4 or a100\n",
- "client = Client(api_key=api_key, gpu_type=gpu_type)\n",
- "\n",
- "# simply pass all the arguements you used above, and change ddp ot zero if you want parallelization.\n",
- "client.sft_train_cloud(model_name = model_name, from_hf=from_hf, dataset_name=dataset_name,\n",
- " keys = keys, data = data,\n",
- " template = template, job_name='ddp_job',\n",
- " response_template=response_template, use_zero=False, use_ddp=True)"
- ],
- "metadata": {
- "id": "ynn-NEDEM5qU"
- },
- "execution_count": 5,
- "outputs": []
- },
- {
- "cell_type": "markdown",
- "source": [
- "After sending the query, you can check the status of your jobs. Note that the status is one of the three options:\n",
- "```text\n",
- "status = complete|in progress|pending\n",
- "```"
+ "- The `Simplifine` library helps in making the fine-tuning process more efficient, whether you're working locally or in the cloud.\n",
+ "- The `datasets` library is essential for loading and processing the dataset we'll be using for this project.\n",
+ "\n",
+ "Running this cell will install both libraries quietly in the background.\n"
],
"metadata": {
- "id": "I0cXnfYQPogc"
+ "id": "0SClYIzAQrpD"
}
},
{
"cell_type": "code",
- "source": [
- "status = client.get_all_jobs()\n",
- "for num,i in enumerate(status[-5:]):\n",
- " print(f'Job {num}: {i}')"
- ],
+ "execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
- "id": "AulHnk5-Pqh8",
- "outputId": "375a3336-2ecf-46d1-97e2-f4df5b003686"
+ "id": "lxDXEqYrw-gh",
+ "outputId": "b768c964-b87d-41da-f121-0e83373fbdac"
},
- "execution_count": 6,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
- "Job 0: {'job_id': '544bb4f0-206f-43b7-850e-5e1e9f7b4d23', 'job_name': 'job-4', 'status': 'completed'}\n",
- "Job 1: {'job_id': 'bde91132-9776-41ae-89f9-855dfb116a91', 'job_name': 'ddp_job', 'status': 'completed'}\n",
- "Job 2: {'job_id': 'a1ff54dd-5ee2-4e35-9e78-6868f63dad37', 'job_name': 'zero_example_cloud', 'status': 'completed'}\n",
- "Job 3: {'job_id': '543d3bc3-3ce4-4af6-9f9a-6c0823dcc9b0', 'job_name': 'ddp_job', 'status': 'in progress'}\n",
- "Job 4: {'job_id': '5d55d46a-7793-4c06-9cef-279f03a0f953', 'job_name': 'job_1', 'status': 'pending'}\n"
+ " Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
+ " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
+ " Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m6.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m547.8/547.8 kB\u001b[0m \u001b[31m249.1 kB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m360.4/360.4 kB\u001b[0m \u001b[31m6.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m296.4/296.4 kB\u001b[0m \u001b[31m6.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m232.6/232.6 kB\u001b[0m \u001b[31m12.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m227.1/227.1 kB\u001b[0m \u001b[31m11.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m245.8/245.8 kB\u001b[0m \u001b[31m12.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m6.8/6.8 MB\u001b[0m \u001b[31m51.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m116.3/116.3 kB\u001b[0m \u001b[31m6.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m316.1/316.1 kB\u001b[0m \u001b[31m13.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m207.3/207.3 kB\u001b[0m \u001b[31m11.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m75.6/75.6 kB\u001b[0m \u001b[31m3.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m77.9/77.9 kB\u001b[0m \u001b[31m4.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m318.9/318.9 kB\u001b[0m \u001b[31m16.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m39.9/39.9 MB\u001b[0m \u001b[31m44.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m301.8/301.8 kB\u001b[0m \u001b[31m12.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m103.4/103.4 kB\u001b[0m \u001b[31m4.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m54.0/54.0 kB\u001b[0m \u001b[31m1.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m5.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m307.2/307.2 kB\u001b[0m \u001b[31m13.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m11.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m62.7/62.7 kB\u001b[0m \u001b[31m3.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m2.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25h Building wheel for simplifine-alpha (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
+ " Building wheel for deepspeed (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+ "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
+ "cudf-cu12 24.4.1 requires pyarrow<15.0.0a0,>=14.0.1, but you have pyarrow 17.0.0 which is incompatible.\n",
+ "gcsfs 2024.6.1 requires fsspec==2024.6.1, but you have fsspec 2024.5.0 which is incompatible.\n",
+ "ibis-framework 8.0.0 requires pyarrow<16,>=2, but you have pyarrow 17.0.0 which is incompatible.\u001b[0m\u001b[31m\n",
+ "\u001b[0m"
]
}
+ ],
+ "source": [
+ "!pip install git+https://github.com/simplifine-llm/Simplifine.git -q\n",
+ "!pip install datasets -q"
]
},
{
"cell_type": "markdown",
"source": [
- "You can also stop an ongoing job, by calling the function below"
+ "### ๐ ๏ธ Setting Up for Local Training\n",
+ "\n",
+ "In this section, weโre preparing to fine-tune our fake news detector model using Google Colabโs resources. The steps below outline how to configure and initiate the training process.\n",
+ "\n",
+ "1. **Importing Libraries:**\n",
+ " - We import `train_engine` from the `Simplifine` library, which provides the necessary functions to handle the fine-tuning process.\n",
+ " - We also import `SFTConfig` from the `trl` library, which allows us to configure the supervised fine-tuning parameters.\n",
+ "\n",
+ "2. **Dataset Selection:**\n",
+ " - We define the dataset name as `'community-datasets/fake_news_english'`. This dataset contains examples of fake news articles that we will use to fine-tune our model.\n",
+ "\n",
+ "3. **Prompt Configuration:**\n",
+ " - We create a `sftPromptConfig` object to specify how the training data is formatted.\n",
+ " - The `template` parameter defines the input format, and the `response_template` specifies how the model should generate outputs.\n",
+ " - The `use_chat_template` flag is set to `True` to format the inputs in a conversational style, which can be effective for chat-based models.\n",
+ "\n",
+ "4. **Training Configuration:**\n",
+ " - We define the training settings using `SFTConfig`. This includes parameters like batch size, learning rate, and the number of epochs.\n",
+ " - We also enable `fp16` (16-bit floating-point) training for faster computation and set `gradient_checkpointing` to save memory during training.\n",
+ "\n",
+ "5. **Model Selection:**\n",
+ " - The model weโre fine-tuning is `'TinyLlama/TinyLlama-1.1B-Chat-v1.0'`. This is a smaller, efficient model suitable for demonstration purposes on Colab.\n",
+ "\n",
+ "6. **Training the Model:**\n",
+ " - Finally, we call `sft_train` to start the fine-tuning process. This step will take a while to complete, as weโre training the model from scratch without any optimizations like quantization or LoRA.\n",
+ "\n",
+ "Running this cell will fine-tune the model locally on Colab, using the configurations weโve set up. This is ideal for quick experiments or when cloud resources are not available."
],
"metadata": {
- "id": "nPgJBtDbXola"
+ "id": "C0dDwmg4Rb3N"
}
},
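+ {
+ "cell_type": "markdown",
+ "source": [
+ "Optional sanity check (a minimal sketch, not part of the Simplifine workflow): load the dataset and peek at one row to confirm it exposes the `url_of_article` and `fake_or_satire` fields used by the prompt template below.\n"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from datasets import load_dataset\n",
+ "\n",
+ "# optional sanity check: peek at one row to see the fields\n",
+ "# used by the prompt template ('url_of_article', 'fake_or_satire')\n",
+ "ds = load_dataset('community-datasets/fake_news_english', split='train')\n",
+ "print(ds)\n",
+ "print({k: ds[0][k] for k in ['url_of_article', 'fake_or_satire']})"
+ ]
+ },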
{
"cell_type": "code",
"source": [
- "stop_running_job = False\n",
- "if stop_running_job:\n",
- " job_id = status[-1]['job_id']\n",
- " client.stop_job(job_id)"
- ],
- "metadata": {
- "id": "AG5HVYBAXr0w"
- },
- "execution_count": null,
- "outputs": []
- },
- {
- "cell_type": "code",
- "source": [
- "# getting the job_id of the last job\n",
- "job_id = status[-1]['job_id']\n",
+ "from simplifine_alpha import train_engine\n",
+ "from trl import SFTConfig\n",
"\n",
- "logs = client.get_train_logs(job_id)\n",
- "print(logs['response'])"
+ "dataset_name = 'community-datasets/fake_news_english'\n",
+ "\n",
+ "# defining prompt config\n",
+ "sft_prompt_config = train_engine.sftPromptConfig(\n",
+ " keys = ['url_of_article', 'fake_or_satire'],\n",
+ " template = \"###URL: {url_of_article}. \\n###CLS: {fake_or_satire}\",\n",
+ " response_template = '. \\n###CLS: ',\n",
+ " use_chat_template=True\n",
+ " )\n",
+ "\n",
+ "# defining training config\n",
+ "sft_config = SFTConfig(\n",
+ " output_dir='/content/fake_news_english_phi3',\n",
+ " per_device_train_batch_size=1,\n",
+ " gradient_accumulation_steps=4,\n",
+ " learning_rate=1e-5,\n",
+ " num_train_epochs=2,\n",
+ " report_to='none',\n",
+ " fp16=True,\n",
+ " gradient_checkpointing=True,\n",
+ ")\n",
+ "\n",
+ "model_name = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'\n",
+ "\n",
+ "# this is just for demo purposes, this will take a while here, no quantization, no lora...\n",
+ "train_engine.sft_train(model_name=model_name, dataset_name=dataset_name,\n",
+ " sft_config = sft_config, sft_prompt_config=sft_prompt_config,\n",
+ " use_zero=False, use_ddp=False\n",
+ " )"
],
"metadata": {
"colab": {
- "base_uri": "https://localhost:8080/"
+ "base_uri": "https://localhost:8080/",
+ "height": 1000,
+ "referenced_widgets": [
+ "85b43ab19cda4f72a77fdfd5dc096496",
+ "121146c229204799b4d6c7defb3d6474",
+ "05fba94d8a2c4e1c9beda729c24511cf",
+ "b34a0424b9c54f20adfcf36fd31c20ec",
+ "0897b7e7dd4745338bb516eea9a91459",
+ "a70c9f8d1b0d4d46ac64f06e2edb867b",
+ "0082ad32163248f4a2012837a73d07de",
+ "42e2d8e7c622491d9c9acd8dd6d59493",
+ "e46fa81470a445d898f60b4afbc52e02",
+ "7fbd642afa624c3593a484da4e223f2e",
+ "826de6d75f7f4642999e32ad61441cc6",
+ "4d32eb49961e48f1958fc9bc6a494766",
+ "4f6e9afd59974e31b81b06e95ca77704",
+ "ed210f3938024db1af4c5f7c42b2fff7",
+ "fb8aa59768494ac98b6e928f2ca3d99c",
+ "ae34dc9c883b42cab7a342beb229f11f",
+ "0cbfe2fc9b9f46be8b297d9ecd52beba",
+ "a8052d6f4e354f32a8be207154050fbd",
+ "c868c5b13be44b3c8acaf1a04ba29444",
+ "3db4b88f52d8462cb68265dcddf61139",
+ "a25c8ead57594d2aaa103967b5666b1d",
+ "4c6c83f35d25465aa66f59572fb5c109",
+ "793fbbc4eb60486ca9a8dd3a466d30d4",
+ "94e61d92d82446e4b05e55ead678adbb",
+ "7c758e142ba24f978e3620f06ee28d94",
+ "cf24deaffcfd4f518595321ce8060443",
+ "34a12296d04f42a7a4217dc90eb11217",
+ "72af7044a375436ca9797b8d95a2c809",
+ "b2f3791c26664dcba447762f6f7488a7",
+ "43e20b7b3ba542e3bd712306881cca5f",
+ "7d9540c9bbc54bb39778b85862caac67",
+ "30c9b1c3f5ff482293a2674d8d106cdf",
+ "e098b085b9804f559a0c738531d53048",
+ "48a1153f99d1431aa1eed392c2dd73ee",
+ "7a75bdd3a1494108879b98ae19ab51c7",
+ "5ec82f8657724e72af779aad205f34f0",
+ "19ae213a184647e39c97e6de4fcfe58e",
+ "17f04516bd00461781391d37eda14bdb",
+ "ce666f6bca8e43d582e78cb04405e84d",
+ "96c3b56b2c564f0787841d7a9e2d1a0f",
+ "5e2194031c4445a08833e96daca605f6",
+ "3acaea18f63445a8a8e4078203749afc",
+ "bf7943992abf410eb4239e81bf0908e2",
+ "d9fd4fbf92cc4066ba627f43752692a7",
+ "618b14b62e9240adb441445ef43dc187",
+ "88ac8b5a97bf4f10892ba89b9aa8bc22",
+ "ec3e67749fb240049932732f7fc1ec4b",
+ "04f66b0623bb48be9124418d91cb0a9b",
+ "688d4f3db00c48da974c58ef73e5ba8c",
+ "853358201c564b5d9f6c8cfbcde10c11",
+ "7269d969455c4bec8b5cac0f7ed35d2c",
+ "3e7c189588044c9387c67812edc86f24",
+ "d36209a3813d4b52b7806091dfe2a8b6",
+ "c17b106000c9452fb7702b1d583040b3",
+ "aa027bc42cba4c178af51be5775d33a5",
+ "b9741487d3254932a3d7ada3eeefc595",
+ "48014c0699df4921ad0bd436578f5f78",
+ "c16ad9c1067e4c14872be1dde19aa68c",
+ "a679bfa91d9c4ba5b0b766dd21ac7bdf",
+ "28017307068642d9846314f22aba8dd9",
+ "b67317eb1552400ab120b334d2692020",
+ "2020c1a4a675488ca8265d62e7e58f29",
+ "d0ca10eb9f2544eb9b332d919aac0fbf",
+ "54eaff9e53a143098bb3e77f2d824268",
+ "610036373632486fbee44beb1e2e526d",
+ "6a9d87e1c39a4e59b20315bdfd57e799",
+ "c462787a34024ffd84a048697906a5da",
+ "ce13114c190a4580a88b7d87397799cb",
+ "5c41598c9c134bb2a5e6d7ecc8180780",
+ "c76946151342424b9f1c247b8d822c9f",
+ "a76e818268294b72b2eb4f1b3e8e88a0",
+ "60de8ac15a4a460fa518fe05fde546b7",
+ "be8caf785f124615a83a068226c11fc0",
+ "8b481231e25b469d912c89d25f2e130a",
+ "b741b85fa6e14d318ddd2ea5b318f801",
+ "a0e6f8b69797457481f7ff2b785ca099",
+ "ebe980a4978b422e991efc4c358fef14",
+ "62c66fdfd5f945d4b31103b14324c2e5",
+ "fe1ca594239248199310a9e342c1cbf6",
+ "9869a50effa547c1892d2bc030f91b3c",
+ "9039e0f23305429c9e65da5d4d978027",
+ "5e5acc1d1fc3481a889000e1708be0e9",
+ "e2fbee372dbb4d70adfaea19e4689d58",
+ "24470dfeb185451d9c9d8ca77a8d96c2",
+ "34a64bde1da146929adc8add25ab62cf",
+ "3a4b43d1701248ada52d6a1b023c0c03",
+ "69bea92a64e44baa896a9ce46a1bdb9f",
+ "15302520e0824d8bb6047c6e9df14922",
+ "300d1ed9cf5a48d0939c44efaeaa4f0a",
+ "3ed611fdd019473791afe760eadd332c",
+ "ff8654654c8547dca02be130d3b90456",
+ "c6babed63a9c458e9f20533dada1c0ab",
+ "636489b3d9f64e5fb1a1d59cbe03e87c",
+ "666c3dd0a02345f88f4a133348930eb6",
+ "c9c78c49251d44d59274b0ac9476e302",
+ "2f82fa9b538b43fa94bdb0338488bbf3",
+ "5f3d21cdbf63479c95d9716194f8e379",
+ "3fb9a1c091b44f4c8204c51097c33ecd",
+ "161ffef13b4641228cf68d5e0cdb4752",
+ "30b358e1d97442c48fd139910d7bb18d",
+ "5d387c69d2574ff19cf968ac314b08ef",
+ "2d49108838c7419ebff091cb26d01fc1",
+ "362fd834bb1843f5830696d45b1313ac",
+ "d8cf4efccdde487e944376eebc9e1107",
+ "fc6a12045e4140d1984786159e887eb6",
+ "62f7da4f2ff4415680a2a7588176cf60",
+ "4cbdbc2a339c4c5ca1952240a3aad792",
+ "e6dbea9a7f5145d78842da1a8024f121",
+ "9804f53f6ce845f3bd4225341fbb1b83",
+ "d29a9bc16e594ca28b1e910e7351f9f7",
+ "bedbc8d020414d10946648ec8fb35b98",
+ "d8450c7c22664e50b7a2aa8d2aa450ca",
+ "76b45ad5d52a4239a7128458f2e4ff8e",
+ "2e96d67753aa4cdd87d1a12b63ad4557",
+ "8849369ef4c64bde9cd4873a660f0ecb",
+ "30ca5eda0d5e49989a96d62864a6f5ea",
+ "12223cb52da347e4bf44b793323809ab",
+ "3adcdfad6da04e119f6301f6272d94b6",
+ "f8723f4b610d4b649f063de7b55bc284",
+ "b487318c01d44853890029fb496b4347",
+ "9aadd246448343b7af6f1c2506e072c7",
+ "7153c9902f874e87a02ee45d2bb5c3e2",
+ "d10241169ba44a18a7a25a7b7383c318",
+ "c2e992a64b7b4c6b9a2de645d8eda4ef",
+ "9d090fd9a91d48e6b1c4f7ef13dfe0de",
+ "ad06e1bdf44e46658315320afe17398e",
+ "c9abd4782c9e461b98c2982cac983c9e",
+ "212a753d65224d82b483572073416337",
+ "3cc4d8faa6084ff3b4f2725e3b69a3a2",
+ "0be5ab6599be4360b282aa558514fd0c",
+ "17287fdf1c4e4fd0a9117896b4761b6f",
+ "ecaa05f88907468696610a4f623e7517"
+ ]
},
- "id": "ziFDfaygbzOl",
- "outputId": "2e2ca538-f85d-4a87-e50b-442a226fb25b"
+ "id": "uKH1cxpkxFAr",
+ "outputId": "bae79adb-9ed2-49f6-c618-efdb66923cc3"
},
- "execution_count": 7,
+ "execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
- "W0728 18:13:03.377000 134787342856320 torch/distributed/run.py:779] \n",
- "W0728 18:13:03.377000 134787342856320 torch/distributed/run.py:779] *****************************************\n",
- "W0728 18:13:03.377000 134787342856320 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. \n",
- "W0728 18:13:03.377000 134787342856320 torch/distributed/run.py:779] *****************************************\n",
- "[2024-07-28 18:13:08,712] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
- "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
- "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
- "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
- "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
- "[2024-07-28 18:13:08,803] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
- "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
- "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
- "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n",
- "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
- "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
- "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_fwd\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_bwd\n",
- "[2024-07-28 18:13:08,963] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
- "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
- "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n",
- "[2024-07-28 18:13:09,002] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
- "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_fwd\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_bwd\n",
- "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
- "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
- "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
- "[2024-07-28 18:13:09,067] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
- "[2024-07-28 18:13:09,073] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
- "[2024-07-28 18:13:09,075] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
- "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
- "[2024-07-28 18:13:09,083] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
- "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
- "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
- "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
- "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
- "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
- "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
- "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
- "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
- "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
- "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
- "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
- "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
- "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
- "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
- "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
- "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
- "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
- "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n",
- "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
- "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
- "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_fwd\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_bwd\n",
- "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
- "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_fwd\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_bwd\n",
- "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
- "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n",
- "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
- "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n",
- "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
- "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n",
- "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
- "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_fwd\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_bwd\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_fwd\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_bwd\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_fwd\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_bwd\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_fwd\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_bwd\n",
- "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
- "\n",
- "Map: 0%| | 0/480 [00:00, ? examples/s]\n",
- "Map: 100%|โโโโโโโโโโ| 480/480 [00:00<00:00, 27874.92 examples/s]\n",
- "\n",
- "Map: 0%| | 0/120 [00:00, ? examples/s]\n",
- "Map: 100%|โโโโโโโโโโ| 120/120 [00:00<00:00, 15003.32 examples/s]\n",
- "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
- "\n",
- "Map: 0%| | 0/480 [00:00, ? examples/s]\n",
- "Map: 100%|โโโโโโโโโโ| 480/480 [00:00<00:00, 27687.08 examples/s]\n",
- "\n",
- "Map: 0%| | 0/120 [00:00, ? examples/s]\n",
- "Map: 100%|โโโโโโโโโโ| 120/120 [00:00<00:00, 13977.91 examples/s]\n",
- "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
- "\n",
- "Map: 0%| | 0/480 [00:00, ? examples/s]Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
- "\n",
- "Map: 100%|โโโโโโโโโโ| 480/480 [00:00<00:00, 25838.26 examples/s]\n",
- "\n",
- "Map: 0%| | 0/120 [00:00, ? examples/s]\n",
- "Map: 100%|โโโโโโโโโโ| 120/120 [00:00<00:00, 13956.59 examples/s]\n",
- "\n",
- "Map: 0%| | 0/480 [00:00, ? examples/s]\n",
- "Map: 100%|โโโโโโโโโโ| 480/480 [00:00<00:00, 27829.07 examples/s]\n",
- "\n",
- "Map: 0%| | 0/120 [00:00, ? examples/s]Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
- "\n",
- "Map: 100%|โโโโโโโโโโ| 120/120 [00:00<00:00, 14669.67 examples/s]\n",
- "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
- "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
- "\n",
- "Map: 0%| | 0/480 [00:00, ? examples/s]Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
- "\n",
- "Map: 0%| | 0/480 [00:00, ? examples/s]\n",
- "Map: 100%|โโโโโโโโโโ| 480/480 [00:00<00:00, 26593.92 examples/s]\n",
- "\n",
- "Map: 100%|โโโโโโโโโโ| 480/480 [00:00<00:00, 26710.35 examples/s]\n",
- "\n",
- "Map: 0%| | 0/480 [00:00, ? examples/s]\n",
- "Map: 0%| | 0/120 [00:00, ? examples/s]\n",
- "Map: 0%| | 0/480 [00:00, ? examples/s]\n",
- "Map: 0%| | 0/120 [00:00, ? examples/s]\n",
- "Map: 100%|โโโโโโโโโโ| 120/120 [00:00<00:00, 9773.33 examples/s]\n",
- "\n",
- "Map: 100%|โโโโโโโโโโ| 480/480 [00:00<00:00, 19004.19 examples/s]\n",
- "\n",
- "Map: 100%|โโโโโโโโโโ| 120/120 [00:00<00:00, 13260.87 examples/s]\n",
- "\n",
- "Map: 100%|โโโโโโโโโโ| 480/480 [00:00<00:00, 24817.45 examples/s]\n",
- "\n",
- "Map: 0%| | 0/120 [00:00, ? examples/s]\n",
- "Map: 0%| | 0/120 [00:00, ? examples/s]\n",
- "Map: 100%|โโโโโโโโโโ| 120/120 [00:00<00:00, 13024.10 examples/s]\n",
- "\n",
- "Map: 100%|โโโโโโโโโโ| 120/120 [00:00<00:00, 13083.01 examples/s]\n",
- "Using CUDA\n",
- "Initializing process group for DDP\n",
- "Using CUDA\n",
- "Initializing process group for DDP\n",
- "Using CUDA\n",
- "Initializing process group for DDP\n",
- "Using CUDA\n",
- "Initializing process group for DDP\n",
- "Using CUDA\n",
- "Initializing process group for DDP\n",
- "Using CUDA\n",
- "Initializing process group for DDP\n",
- "Using CUDA\n",
- "Initializing process group for DDP\n",
- "Using CUDA\n",
- "Initializing process group for DDP\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:494: UserWarning: You passed a dataset that is already processed (contains an `input_ids` field) together with a valid formatting function. Therefore `formatting_func` will be ignored.\n",
- " warnings.warn(\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:494: UserWarning: You passed a dataset that is already processed (contains an `input_ids` field) together with a valid formatting function. Therefore `formatting_func` will be ignored.\n",
- " warnings.warn(\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:494: UserWarning: You passed a dataset that is already processed (contains an `input_ids` field) together with a valid formatting function. Therefore `formatting_func` will be ignored.\n",
- " warnings.warn(\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:494: UserWarning: You passed a dataset that is already processed (contains an `input_ids` field) together with a valid formatting function. Therefore `formatting_func` will be ignored.\n",
- " warnings.warn(\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:494: UserWarning: You passed a dataset that is already processed (contains an `input_ids` field) together with a valid formatting function. Therefore `formatting_func` will be ignored.\n",
- " warnings.warn(\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:494: UserWarning: You passed a dataset that is already processed (contains an `input_ids` field) together with a valid formatting function. Therefore `formatting_func` will be ignored.\n",
- " warnings.warn(\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:494: UserWarning: You passed a dataset that is already processed (contains an `input_ids` field) together with a valid formatting function. Therefore `formatting_func` will be ignored.\n",
- " warnings.warn(\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:494: UserWarning: You passed a dataset that is already processed (contains an `input_ids` field) together with a valid formatting function. Therefore `formatting_func` will be ignored.\n",
- " warnings.warn(\n",
- "\n",
- " 0%| | 0/60 [00:00, ?it/s]\n",
- " 2%|โ | 1/60 [00:00<00:51, 1.14it/s]\n",
- " 7%|โ | 4/60 [00:01<00:11, 4.87it/s]\n",
- " 12%|โโ | 7/60 [00:01<00:06, 8.22it/s]\n",
- " 17%|โโ | 10/60 [00:01<00:04, 11.04it/s]\n",
- " 22%|โโโ | 13/60 [00:01<00:03, 13.51it/s]\n",
- " 27%|โโโ | 16/60 [00:01<00:02, 15.44it/s]\n",
- " 32%|โโโโ | 19/60 [00:01<00:02, 16.91it/s]\n",
- " 37%|โโโโ | 22/60 [00:01<00:02, 17.96it/s]\n",
- " 42%|โโโโโ | 25/60 [00:02<00:01, 18.82it/s]\n",
- " 47%|โโโโโ | 28/60 [00:02<00:01, 19.44it/s]\n",
- " 52%|โโโโโโ | 31/60 [00:02<00:01, 19.90it/s]\n",
- " 57%|โโโโโโ | 34/60 [00:02<00:01, 20.22it/s]\n",
- " 62%|โโโโโโโ | 37/60 [00:02<00:01, 20.47it/s]\n",
- " 67%|โโโโโโโ | 40/60 [00:02<00:00, 20.65it/s]\n",
- " 72%|โโโโโโโโ | 43/60 [00:02<00:00, 20.75it/s]\n",
- " 77%|โโโโโโโโ | 46/60 [00:03<00:00, 20.79it/s]\n",
- " 82%|โโโโโโโโโ | 49/60 [00:03<00:00, 20.83it/s]\n",
- " 87%|โโโโโโโโโ | 52/60 [00:03<00:00, 20.90it/s]\n",
- " 92%|โโโโโโโโโโ| 55/60 [00:03<00:00, 20.94it/s]\n",
- " 97%|โโโโโโโโโโ| 58/60 [00:03<00:00, 20.98it/s]\n",
- " \n",
- "{'train_runtime': 6.5488, 'train_samples_per_second': 73.295, 'train_steps_per_second': 9.162, 'train_loss': 0.1135852018992106, 'epoch': 1.0}\n",
- "\n",
- "100%|โโโโโโโโโโ| 60/60 [00:06<00:00, 20.98it/s]\n",
- "100%|โโโโโโโโโโ| 60/60 [00:06<00:00, 9.16it/s]\n",
- "\n"
+ "[2024-08-06 17:41:44,218] [WARNING] [real_accelerator.py:162:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.\n",
+ "[2024-08-06 17:41:44,222] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cpu (auto detect)\n"
]
- }
- ]
- },
- {
- "cell_type": "markdown",
- "source": [
- "### Downloading\n",
- "The trained model can be downloaded using the \"download_model\" function. it will be a zip file."
- ],
- "metadata": {
- "id": "6kqPRkPNgK07"
- }
- },
- {
- "cell_type": "code",
- "source": [
- "import os\n",
- "\n",
- "# creating a folder to store the model\n",
- "os.mkdir('sf_trained_model')\n",
- "\n",
- "# download and save the model to it.\n",
- "# This might take some time, have a sip of that coffee! :)\n",
- "client.download_model(job_id=job_id, extract_to='/content/sf_trained_model')"
- ],
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
},
- "id": "ivuJ8634gMVr",
- "outputId": "8cc93aef-5a38-4389-d956-ec24451af393"
- },
- "execution_count": 8,
- "outputs": [
{
"output_type": "stream",
- "name": "stderr",
+ "name": "stderr",
+ "text": [
+ "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: \n",
+ "The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
+ "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
+ "You will be able to reuse this secret in all of your notebooks.\n",
+ "Please note that authentication is recommended but still optional to access public models or datasets.\n",
+ " warnings.warn(\n"
+ ]
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "tokenizer_config.json: 0%| | 0.00/1.29k [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "85b43ab19cda4f72a77fdfd5dc096496"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "tokenizer.model: 0%| | 0.00/500k [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "4d32eb49961e48f1958fc9bc6a494766"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "tokenizer.json: 0%| | 0.00/1.84M [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "793fbbc4eb60486ca9a8dd3a466d30d4"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "special_tokens_map.json: 0%| | 0.00/551 [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "48a1153f99d1431aa1eed392c2dd73ee"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "Downloading readme: 0%| | 0.00/5.01k [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "618b14b62e9240adb441445ef43dc187"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "Downloading data: 0%| | 0.00/43.1k [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "b9741487d3254932a3d7ada3eeefc595"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "Generating train split: 0%| | 0/492 [00:00, ? examples/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "c462787a34024ffd84a048697906a5da"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "Map: 0%| | 0/393 [00:00, ? examples/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "62c66fdfd5f945d4b31103b14324c2e5"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "Map: 0%| | 0/99 [00:00, ? examples/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "300d1ed9cf5a48d0939c44efaeaa4f0a"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
"text": [
- "Downloading: 100%|โโโโโโโโโโ| 540M/540M [00:36<00:00, 14.9MiB/s]\n"
+ "Using CPU\n"
]
},
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "config.json: 0%| | 0.00/608 [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "30b358e1d97442c48fd139910d7bb18d"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "model.safetensors: 0%| | 0.00/2.20G [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "bedbc8d020414d10946648ec8fb35b98"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "generation_config.json: 0%| | 0.00/124 [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "7153c9902f874e87a02ee45d2bb5c3e2"
+ }
+ },
+ "metadata": {}
+ },
{
"output_type": "stream",
"name": "stdout",
"text": [
- "\n",
- "Directory downloaded successfully and saved to /content/sf_trained_model/5d55d46a-7793-4c06-9cef-279f03a0f953.zip\n",
- "Model unzipped successfully to /content/sf_trained_model\n",
- "Deleted the zip file at /content/sf_trained_model/5d55d46a-7793-4c06-9cef-279f03a0f953.zip\n",
- "Model downloaded, unzipped, and zip file deleted successfully!\n"
+ "train data set is: Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 393\n",
+ "}), eval dataset is Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 99\n",
+ "})\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:289: UserWarning: You didn't pass a `max_seq_length` argument to the SFTTrainer, this will default to 1024\n",
+ " warnings.warn(\n",
+ "/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:505: UserWarning: You passed a dataset that is already processed (contains an `input_ids` field) together with a valid formatting function. Therefore `formatting_func` will be ignored.\n",
+ " warnings.warn(\n",
+ "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.\n",
+ "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.\n",
+ " warnings.warn(\n"
+ ]
+ },
+ {
+ "output_type": "error",
+ "ename": "KeyboardInterrupt",
+ "evalue": "",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 26\u001b[0m \u001b[0mmodel_name\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'TinyLlama/TinyLlama-1.1B-Chat-v1.0'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 27\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 28\u001b[0;31m train_engine.sft_train(model_name=model_name, dataset_name=dataset_name,\n\u001b[0m\u001b[1;32m 29\u001b[0m \u001b[0msft_config\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msft_config\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msft_prompt_config\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msft_prompt_config\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 30\u001b[0m \u001b[0muse_zero\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0muse_ddp\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/simplifine_alpha/train_engine.py\u001b[0m in \u001b[0;36msft_train\u001b[0;34m(model_name, dataset_name, hf_token, dataset_config_name, data_from_hf, do_split, split_ratio, use_peft, lora_config, sft_config, data, wandb_config, use_ddp, use_zero, sft_prompt_config)\u001b[0m\n\u001b[1;32m 842\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmakedirs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutput_dir_final\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexist_ok\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 843\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 844\u001b[0;31m \u001b[0mtrainer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 845\u001b[0m \u001b[0mtrainer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msave_model\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutput_dir_final\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 846\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py\u001b[0m in \u001b[0;36mtrain\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 449\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmodel\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_trl_activate_neftune\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmodel\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 450\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 451\u001b[0;31m \u001b[0moutput\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msuper\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 452\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 453\u001b[0m \u001b[0;31m# After training we make sure to retrieve back the original forward pass method\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/transformers/trainer.py\u001b[0m in \u001b[0;36mtrain\u001b[0;34m(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)\u001b[0m\n\u001b[1;32m 1930\u001b[0m \u001b[0mhf_hub_utils\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0menable_progress_bars\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1931\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1932\u001b[0;31m return inner_training_loop(\n\u001b[0m\u001b[1;32m 1933\u001b[0m \u001b[0margs\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1934\u001b[0m \u001b[0mresume_from_checkpoint\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mresume_from_checkpoint\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/transformers/trainer.py\u001b[0m in \u001b[0;36m_inner_training_loop\u001b[0;34m(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)\u001b[0m\n\u001b[1;32m 2266\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2267\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maccelerator\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maccumulate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodel\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2268\u001b[0;31m \u001b[0mtr_loss_step\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtraining_step\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2269\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2270\u001b[0m if (\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/transformers/trainer.py\u001b[0m in \u001b[0;36mtraining_step\u001b[0;34m(***failed resolving arguments***)\u001b[0m\n\u001b[1;32m 3322\u001b[0m \u001b[0mscaled_loss\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3323\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 3324\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maccelerator\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mloss\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3325\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3326\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mloss\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdetach\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgradient_accumulation_steps\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py\u001b[0m in \u001b[0;36mbackward\u001b[0;34m(self, loss, **kwargs)\u001b[0m\n\u001b[1;32m 2149\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlomo_backward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mloss\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlearning_rate\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2150\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2151\u001b[0;31m \u001b[0mloss\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2152\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2153\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mset_trigger\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/torch/_tensor.py\u001b[0m in \u001b[0;36mbackward\u001b[0;34m(self, gradient, retain_graph, create_graph, inputs)\u001b[0m\n\u001b[1;32m 523\u001b[0m \u001b[0minputs\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0minputs\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 524\u001b[0m )\n\u001b[0;32m--> 525\u001b[0;31m torch.autograd.backward(\n\u001b[0m\u001b[1;32m 526\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgradient\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minputs\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0minputs\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 527\u001b[0m )\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py\u001b[0m in \u001b[0;36mbackward\u001b[0;34m(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)\u001b[0m\n\u001b[1;32m 265\u001b[0m \u001b[0;31m# some Python versions print out the first line of a multi-line function\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 266\u001b[0m \u001b[0;31m# calls in the traceback and some print out the last line\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 267\u001b[0;31m _engine_run_backward(\n\u001b[0m\u001b[1;32m 268\u001b[0m \u001b[0mtensors\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 269\u001b[0m \u001b[0mgrad_tensors_\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py\u001b[0m in \u001b[0;36m_engine_run_backward\u001b[0;34m(t_outputs, *args, **kwargs)\u001b[0m\n\u001b[1;32m 742\u001b[0m \u001b[0munregister_hooks\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_register_logging_hooks_on_whole_graph\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mt_outputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 743\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 744\u001b[0;31m return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass\n\u001b[0m\u001b[1;32m 745\u001b[0m \u001b[0mt_outputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 746\u001b[0m ) # Calls into the C++ engine to run the backward pass\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py\u001b[0m in \u001b[0;36mapply\u001b[0;34m(self, *args)\u001b[0m\n\u001b[1;32m 299\u001b[0m )\n\u001b[1;32m 300\u001b[0m \u001b[0muser_fn\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mvjp_fn\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mvjp_fn\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mFunction\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvjp\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0mbackward_fn\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 301\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0muser_fn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 302\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 303\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mapply_jvp\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py\u001b[0m in \u001b[0;36mbackward\u001b[0;34m(ctx, *args)\u001b[0m\n\u001b[1;32m 318\u001b[0m \u001b[0;34m\" this checkpoint() is not necessary\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 319\u001b[0m )\n\u001b[0;32m--> 320\u001b[0;31m \u001b[0mtorch\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mautograd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutputs_with_grad\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0margs_with_grad\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 321\u001b[0m grads = tuple(\n\u001b[1;32m 322\u001b[0m \u001b[0minp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgrad\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minp\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtorch\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mTensor\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py\u001b[0m in \u001b[0;36mbackward\u001b[0;34m(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)\u001b[0m\n\u001b[1;32m 265\u001b[0m \u001b[0;31m# some Python versions print out the first line of a multi-line function\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 266\u001b[0m \u001b[0;31m# calls in the traceback and some print out the last line\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 267\u001b[0;31m _engine_run_backward(\n\u001b[0m\u001b[1;32m 268\u001b[0m \u001b[0mtensors\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 269\u001b[0m \u001b[0mgrad_tensors_\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py\u001b[0m in \u001b[0;36m_engine_run_backward\u001b[0;34m(t_outputs, *args, **kwargs)\u001b[0m\n\u001b[1;32m 742\u001b[0m \u001b[0munregister_hooks\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_register_logging_hooks_on_whole_graph\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mt_outputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 743\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 744\u001b[0;31m return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass\n\u001b[0m\u001b[1;32m 745\u001b[0m \u001b[0mt_outputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 746\u001b[0m ) # Calls into the C++ engine to run the backward pass\n",
+ "\u001b[0;31mKeyboardInterrupt\u001b[0m: "
]
}
]
@@ -3310,62 +6031,72 @@
{
"cell_type": "markdown",
"source": [
- "Finally, we test loading the model!"
+        "### ☁️ Training the Model on Cloud Servers\n",
+        "\n",
+        "In this section, we're moving from local training to cloud-based training on Simplifine's cloud infrastructure. This lets you use powerful GPUs such as the A100 for more intensive tasks, making it easier to handle larger models and datasets.\n",
+        "\n",
+        "1. **Importing the `train_utils` Module:**\n",
+        "   - We start by importing the `train_utils` module from the `Simplifine` library. This module provides the utilities for interacting with Simplifine's cloud servers.\n",
+        "\n",
+        "2. **Model and API Configuration:**\n",
+        "   - We select a different model for cloud training: `'microsoft/Phi-3-mini-4k-instruct'`. This model is more powerful and well suited to cloud GPUs.\n",
+        "   - The `simplifine_api_key` is your unique key for accessing Simplifine's cloud services. Make sure you have it ready; a short sketch for keeping the key out of the notebook follows at the end of this section.\n",
+        "   - The `gpu_type` is set to `'a100'`, which specifies the type of GPU used in the cloud. The A100 is a high-performance GPU well suited to deep learning workloads.\n",
+        "\n",
+        "   ### 🔑 Need an API Key?\n",
+        "   If you don't have an API key yet, you can [**request one here for free**](https://www.simplifine.com/api-key-interest). The turnaround time is just 24 hours, so you'll be up and running in no time!\n",
+        "\n",
+        "3. **Client Initialization:**\n",
+        "   - We create a `Client` object from the API key and GPU type. This client handles all communication with Simplifine's cloud infrastructure, managing the training job on your behalf.\n",
+        "\n",
+        "4. **Defining the Training Job:**\n",
+        "   - The `job_name` is set to `'fake_news_english_phi3'`, which uniquely identifies this training task.\n",
+        "   - We then call the `sft_train_cloud` method on our `client` object. This method submits the training job to the cloud, using the model and configurations defined earlier.\n",
+        "\n",
+        "5. **Cloud Training Setup:**\n",
+        "   - We set `use_zero=True` to enable DeepSpeed's ZeRO optimization, which shards model states across GPUs so that larger models can be trained effectively.\n",
+        "   - We disable Distributed Data Parallel (DDP) for this job, since ZeRO is already managing the distributed training.\n",
+        "\n",
+        "Running this cell submits the training job to Simplifine's cloud servers, offloading the heavy lifting to powerful cloud hardware. This is ideal when working with larger models or when your local resources are insufficient.\n",
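+        "\n",
+        "Hard-coding an API key into a notebook is easy to leak. As a minimal sketch (the variable name `SIMPLIFINE_API_KEY` is our own convention, not something the library requires), you can read it from an environment variable instead:\n",
+        "\n",
+        "```python\n",
+        "import os\n",
+        "\n",
+        "# read the key from the environment so it never appears in the notebook itself\n",
+        "simplifine_api_key = os.environ.get('SIMPLIFINE_API_KEY')\n",
+        "assert simplifine_api_key, 'Set the SIMPLIFINE_API_KEY environment variable first'\n",
+        "```\n"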
],
"metadata": {
- "id": "67SHOrw0gUhM"
+ "id": "oehMA7hwRky5"
}
},
{
"cell_type": "code",
"source": [
- "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
- "\n",
- "path = '/content/sf_trained_model'\n",
- "sf_model = AutoModelForCausalLM.from_pretrained(path)\n",
- "sf_tokenizer = AutoTokenizer.from_pretrained(path)\n",
+ "from simplifine_alpha import train_utils\n",
"\n",
- "input_example = '''### TITLE: title 1\\n ### ABSTRACT: abstract 1\\n ###EXPLANATION: '''\n",
+        "# switch the model to Phi-3 (microsoft/Phi-3-mini-4k-instruct)\n",
+ "model_name = 'microsoft/Phi-3-mini-4k-instruct'\n",
+ "simplifine_api_key = 'PUT YOUR OWN API KEY PROVIDED BY SIMPLIFINE'\n",
+ "gpu_type = 'a100'\n",
+ "client = train_utils.Client(simplifine_api_key, gpu_type)\n",
"\n",
- "input_example = sf_tokenizer(input_example, return_tensors='pt')\n",
+ "job_name = 'fake_news_english_phi3'\n",
"\n",
- "output = sf_model.generate(input_example['input_ids'],\n",
- " attention_mask=input_example['attention_mask'],\n",
- " max_length=30,eos_token_id=sf_tokenizer.eos_token_id,\n",
- " early_stopping=True,\n",
- " pad_token_id=sf_tokenizer.eos_token_id\n",
- ")\n",
"\n",
- "print(sf_tokenizer.decode(output[0]))"
+        "client.sft_train_cloud(job_name=job_name, model_name=model_name, dataset_name=dataset_name,\n",
+        "                       sft_config=sft_config, sft_prompt_config=sft_prompt_config,\n",
+        "                       use_zero=True, use_ddp=False)"
],
"metadata": {
+ "id": "O1zdn8r85n-o",
"colab": {
"base_uri": "https://localhost:8080/"
},
- "id": "IsUWweRdgVgZ",
- "outputId": "ac088e5b-a57d-4640-d8ab-74bdbe631dbc"
+ "outputId": "d2510f4d-5246-4631-df37-8a741cf92240"
},
- "execution_count": 9,
+ "execution_count": null,
"outputs": [
- {
- "output_type": "stream",
- "name": "stderr",
- "text": [
- "Some weights of the model checkpoint at /content/sf_trained_model were not used when initializing GPTNeoXForCausalLM: ['module.embed_out.weight', 'module.gpt_neox.embed_in.weight', 'module.gpt_neox.final_layer_norm.bias', 'module.gpt_neox.final_layer_norm.weight', 'module.gpt_neox.layers.0.attention.dense.bias', 'module.gpt_neox.layers.0.attention.dense.weight', 'module.gpt_neox.layers.0.attention.query_key_value.bias', 'module.gpt_neox.layers.0.attention.query_key_value.weight', 'module.gpt_neox.layers.0.input_layernorm.bias', 'module.gpt_neox.layers.0.input_layernorm.weight', 'module.gpt_neox.layers.0.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.0.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.0.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.0.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.0.post_attention_layernorm.bias', 'module.gpt_neox.layers.0.post_attention_layernorm.weight', 'module.gpt_neox.layers.1.attention.dense.bias', 'module.gpt_neox.layers.1.attention.dense.weight', 'module.gpt_neox.layers.1.attention.query_key_value.bias', 'module.gpt_neox.layers.1.attention.query_key_value.weight', 'module.gpt_neox.layers.1.input_layernorm.bias', 'module.gpt_neox.layers.1.input_layernorm.weight', 'module.gpt_neox.layers.1.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.1.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.1.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.1.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.1.post_attention_layernorm.bias', 'module.gpt_neox.layers.1.post_attention_layernorm.weight', 'module.gpt_neox.layers.10.attention.dense.bias', 'module.gpt_neox.layers.10.attention.dense.weight', 'module.gpt_neox.layers.10.attention.query_key_value.bias', 'module.gpt_neox.layers.10.attention.query_key_value.weight', 'module.gpt_neox.layers.10.input_layernorm.bias', 'module.gpt_neox.layers.10.input_layernorm.weight', 'module.gpt_neox.layers.10.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.10.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.10.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.10.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.10.post_attention_layernorm.bias', 'module.gpt_neox.layers.10.post_attention_layernorm.weight', 'module.gpt_neox.layers.11.attention.dense.bias', 'module.gpt_neox.layers.11.attention.dense.weight', 'module.gpt_neox.layers.11.attention.query_key_value.bias', 'module.gpt_neox.layers.11.attention.query_key_value.weight', 'module.gpt_neox.layers.11.input_layernorm.bias', 'module.gpt_neox.layers.11.input_layernorm.weight', 'module.gpt_neox.layers.11.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.11.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.11.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.11.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.11.post_attention_layernorm.bias', 'module.gpt_neox.layers.11.post_attention_layernorm.weight', 'module.gpt_neox.layers.2.attention.dense.bias', 'module.gpt_neox.layers.2.attention.dense.weight', 'module.gpt_neox.layers.2.attention.query_key_value.bias', 'module.gpt_neox.layers.2.attention.query_key_value.weight', 'module.gpt_neox.layers.2.input_layernorm.bias', 'module.gpt_neox.layers.2.input_layernorm.weight', 'module.gpt_neox.layers.2.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.2.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.2.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.2.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.2.post_attention_layernorm.bias', 'module.gpt_neox.layers.2.post_attention_layernorm.weight', 
'module.gpt_neox.layers.3.attention.dense.bias', 'module.gpt_neox.layers.3.attention.dense.weight', 'module.gpt_neox.layers.3.attention.query_key_value.bias', 'module.gpt_neox.layers.3.attention.query_key_value.weight', 'module.gpt_neox.layers.3.input_layernorm.bias', 'module.gpt_neox.layers.3.input_layernorm.weight', 'module.gpt_neox.layers.3.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.3.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.3.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.3.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.3.post_attention_layernorm.bias', 'module.gpt_neox.layers.3.post_attention_layernorm.weight', 'module.gpt_neox.layers.4.attention.dense.bias', 'module.gpt_neox.layers.4.attention.dense.weight', 'module.gpt_neox.layers.4.attention.query_key_value.bias', 'module.gpt_neox.layers.4.attention.query_key_value.weight', 'module.gpt_neox.layers.4.input_layernorm.bias', 'module.gpt_neox.layers.4.input_layernorm.weight', 'module.gpt_neox.layers.4.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.4.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.4.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.4.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.4.post_attention_layernorm.bias', 'module.gpt_neox.layers.4.post_attention_layernorm.weight', 'module.gpt_neox.layers.5.attention.dense.bias', 'module.gpt_neox.layers.5.attention.dense.weight', 'module.gpt_neox.layers.5.attention.query_key_value.bias', 'module.gpt_neox.layers.5.attention.query_key_value.weight', 'module.gpt_neox.layers.5.input_layernorm.bias', 'module.gpt_neox.layers.5.input_layernorm.weight', 'module.gpt_neox.layers.5.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.5.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.5.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.5.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.5.post_attention_layernorm.bias', 'module.gpt_neox.layers.5.post_attention_layernorm.weight', 'module.gpt_neox.layers.6.attention.dense.bias', 'module.gpt_neox.layers.6.attention.dense.weight', 'module.gpt_neox.layers.6.attention.query_key_value.bias', 'module.gpt_neox.layers.6.attention.query_key_value.weight', 'module.gpt_neox.layers.6.input_layernorm.bias', 'module.gpt_neox.layers.6.input_layernorm.weight', 'module.gpt_neox.layers.6.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.6.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.6.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.6.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.6.post_attention_layernorm.bias', 'module.gpt_neox.layers.6.post_attention_layernorm.weight', 'module.gpt_neox.layers.7.attention.dense.bias', 'module.gpt_neox.layers.7.attention.dense.weight', 'module.gpt_neox.layers.7.attention.query_key_value.bias', 'module.gpt_neox.layers.7.attention.query_key_value.weight', 'module.gpt_neox.layers.7.input_layernorm.bias', 'module.gpt_neox.layers.7.input_layernorm.weight', 'module.gpt_neox.layers.7.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.7.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.7.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.7.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.7.post_attention_layernorm.bias', 'module.gpt_neox.layers.7.post_attention_layernorm.weight', 'module.gpt_neox.layers.8.attention.dense.bias', 'module.gpt_neox.layers.8.attention.dense.weight', 'module.gpt_neox.layers.8.attention.query_key_value.bias', 'module.gpt_neox.layers.8.attention.query_key_value.weight', 'module.gpt_neox.layers.8.input_layernorm.bias', 'module.gpt_neox.layers.8.input_layernorm.weight', 
'module.gpt_neox.layers.8.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.8.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.8.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.8.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.8.post_attention_layernorm.bias', 'module.gpt_neox.layers.8.post_attention_layernorm.weight', 'module.gpt_neox.layers.9.attention.dense.bias', 'module.gpt_neox.layers.9.attention.dense.weight', 'module.gpt_neox.layers.9.attention.query_key_value.bias', 'module.gpt_neox.layers.9.attention.query_key_value.weight', 'module.gpt_neox.layers.9.input_layernorm.bias', 'module.gpt_neox.layers.9.input_layernorm.weight', 'module.gpt_neox.layers.9.mlp.dense_4h_to_h.bias', 'module.gpt_neox.layers.9.mlp.dense_4h_to_h.weight', 'module.gpt_neox.layers.9.mlp.dense_h_to_4h.bias', 'module.gpt_neox.layers.9.mlp.dense_h_to_4h.weight', 'module.gpt_neox.layers.9.post_attention_layernorm.bias', 'module.gpt_neox.layers.9.post_attention_layernorm.weight']\n",
- "- This IS expected if you are initializing GPTNeoXForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
- "- This IS NOT expected if you are initializing GPTNeoXForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
- "Some weights of GPTNeoXForCausalLM were not initialized from the model checkpoint at /content/sf_trained_model and are newly initialized: ['embed_in.weight', 'embed_out.weight', 'final_layer_norm.bias', 'final_layer_norm.weight', 'layers.0.attention.dense.bias', 'layers.0.attention.dense.weight', 'layers.0.attention.query_key_value.bias', 'layers.0.attention.query_key_value.weight', 'layers.0.input_layernorm.bias', 'layers.0.input_layernorm.weight', 'layers.0.mlp.dense_4h_to_h.bias', 'layers.0.mlp.dense_4h_to_h.weight', 'layers.0.mlp.dense_h_to_4h.bias', 'layers.0.mlp.dense_h_to_4h.weight', 'layers.0.post_attention_layernorm.bias', 'layers.0.post_attention_layernorm.weight', 'layers.1.attention.dense.bias', 'layers.1.attention.dense.weight', 'layers.1.attention.query_key_value.bias', 'layers.1.attention.query_key_value.weight', 'layers.1.input_layernorm.bias', 'layers.1.input_layernorm.weight', 'layers.1.mlp.dense_4h_to_h.bias', 'layers.1.mlp.dense_4h_to_h.weight', 'layers.1.mlp.dense_h_to_4h.bias', 'layers.1.mlp.dense_h_to_4h.weight', 'layers.1.post_attention_layernorm.bias', 'layers.1.post_attention_layernorm.weight', 'layers.10.attention.dense.bias', 'layers.10.attention.dense.weight', 'layers.10.attention.query_key_value.bias', 'layers.10.attention.query_key_value.weight', 'layers.10.input_layernorm.bias', 'layers.10.input_layernorm.weight', 'layers.10.mlp.dense_4h_to_h.bias', 'layers.10.mlp.dense_4h_to_h.weight', 'layers.10.mlp.dense_h_to_4h.bias', 'layers.10.mlp.dense_h_to_4h.weight', 'layers.10.post_attention_layernorm.bias', 'layers.10.post_attention_layernorm.weight', 'layers.11.attention.dense.bias', 'layers.11.attention.dense.weight', 'layers.11.attention.query_key_value.bias', 'layers.11.attention.query_key_value.weight', 'layers.11.input_layernorm.bias', 'layers.11.input_layernorm.weight', 'layers.11.mlp.dense_4h_to_h.bias', 'layers.11.mlp.dense_4h_to_h.weight', 'layers.11.mlp.dense_h_to_4h.bias', 'layers.11.mlp.dense_h_to_4h.weight', 'layers.11.post_attention_layernorm.bias', 'layers.11.post_attention_layernorm.weight', 'layers.2.attention.dense.bias', 'layers.2.attention.dense.weight', 'layers.2.attention.query_key_value.bias', 'layers.2.attention.query_key_value.weight', 'layers.2.input_layernorm.bias', 'layers.2.input_layernorm.weight', 'layers.2.mlp.dense_4h_to_h.bias', 'layers.2.mlp.dense_4h_to_h.weight', 'layers.2.mlp.dense_h_to_4h.bias', 'layers.2.mlp.dense_h_to_4h.weight', 'layers.2.post_attention_layernorm.bias', 'layers.2.post_attention_layernorm.weight', 'layers.3.attention.dense.bias', 'layers.3.attention.dense.weight', 'layers.3.attention.query_key_value.bias', 'layers.3.attention.query_key_value.weight', 'layers.3.input_layernorm.bias', 'layers.3.input_layernorm.weight', 'layers.3.mlp.dense_4h_to_h.bias', 'layers.3.mlp.dense_4h_to_h.weight', 'layers.3.mlp.dense_h_to_4h.bias', 'layers.3.mlp.dense_h_to_4h.weight', 'layers.3.post_attention_layernorm.bias', 'layers.3.post_attention_layernorm.weight', 'layers.4.attention.dense.bias', 'layers.4.attention.dense.weight', 'layers.4.attention.query_key_value.bias', 'layers.4.attention.query_key_value.weight', 'layers.4.input_layernorm.bias', 'layers.4.input_layernorm.weight', 'layers.4.mlp.dense_4h_to_h.bias', 'layers.4.mlp.dense_4h_to_h.weight', 'layers.4.mlp.dense_h_to_4h.bias', 'layers.4.mlp.dense_h_to_4h.weight', 'layers.4.post_attention_layernorm.bias', 'layers.4.post_attention_layernorm.weight', 'layers.5.attention.dense.bias', 'layers.5.attention.dense.weight', 'layers.5.attention.query_key_value.bias', 
'layers.5.attention.query_key_value.weight', 'layers.5.input_layernorm.bias', 'layers.5.input_layernorm.weight', 'layers.5.mlp.dense_4h_to_h.bias', 'layers.5.mlp.dense_4h_to_h.weight', 'layers.5.mlp.dense_h_to_4h.bias', 'layers.5.mlp.dense_h_to_4h.weight', 'layers.5.post_attention_layernorm.bias', 'layers.5.post_attention_layernorm.weight', 'layers.6.attention.dense.bias', 'layers.6.attention.dense.weight', 'layers.6.attention.query_key_value.bias', 'layers.6.attention.query_key_value.weight', 'layers.6.input_layernorm.bias', 'layers.6.input_layernorm.weight', 'layers.6.mlp.dense_4h_to_h.bias', 'layers.6.mlp.dense_4h_to_h.weight', 'layers.6.mlp.dense_h_to_4h.bias', 'layers.6.mlp.dense_h_to_4h.weight', 'layers.6.post_attention_layernorm.bias', 'layers.6.post_attention_layernorm.weight', 'layers.7.attention.dense.bias', 'layers.7.attention.dense.weight', 'layers.7.attention.query_key_value.bias', 'layers.7.attention.query_key_value.weight', 'layers.7.input_layernorm.bias', 'layers.7.input_layernorm.weight', 'layers.7.mlp.dense_4h_to_h.bias', 'layers.7.mlp.dense_4h_to_h.weight', 'layers.7.mlp.dense_h_to_4h.bias', 'layers.7.mlp.dense_h_to_4h.weight', 'layers.7.post_attention_layernorm.bias', 'layers.7.post_attention_layernorm.weight', 'layers.8.attention.dense.bias', 'layers.8.attention.dense.weight', 'layers.8.attention.query_key_value.bias', 'layers.8.attention.query_key_value.weight', 'layers.8.input_layernorm.bias', 'layers.8.input_layernorm.weight', 'layers.8.mlp.dense_4h_to_h.bias', 'layers.8.mlp.dense_4h_to_h.weight', 'layers.8.mlp.dense_h_to_4h.bias', 'layers.8.mlp.dense_h_to_4h.weight', 'layers.8.post_attention_layernorm.bias', 'layers.8.post_attention_layernorm.weight', 'layers.9.attention.dense.bias', 'layers.9.attention.dense.weight', 'layers.9.attention.query_key_value.bias', 'layers.9.attention.query_key_value.weight', 'layers.9.input_layernorm.bias', 'layers.9.input_layernorm.weight', 'layers.9.mlp.dense_4h_to_h.bias', 'layers.9.mlp.dense_4h_to_h.weight', 'layers.9.mlp.dense_h_to_4h.bias', 'layers.9.mlp.dense_h_to_4h.weight', 'layers.9.post_attention_layernorm.bias', 'layers.9.post_attention_layernorm.weight']\n",
- "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
- "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
- ]
- },
{
"output_type": "stream",
"name": "stdout",
"text": [
- "### TITLE: title 1\n",
- " ### ABSTRACT: abstract 1\n",
- " ###EXPLANATION: rugu stretmediate complains GermanServ\n"
+ "[2024-08-07 18:34:35,105] [WARNING] [real_accelerator.py:162:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.\n",
+ "[2024-08-07 18:34:35,110] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cpu (auto detect)\n"
]
}
]
@@ -3373,68 +6104,80 @@
{
"cell_type": "markdown",
"source": [
- "### Using ZeRO\n",
- "ZeRO is a strong tool when a model cannot fit on GPU memory, so it is sharded across them (parameters, gradients and activations). Further memory reduction could be by enabling fp16/bf16, and gradient_checkpointing."
+        "### 📊 Checking the Status of Your Training Jobs\n",
+        "\n",
+        "After submitting your training job to Simplifine's cloud servers, it's important to monitor its status to make sure everything is running smoothly. In this section, we'll check the status of the most recent job.\n",
+        "\n",
+        "1. **Retrieving Job Status:**\n",
+        "   - We call the `get_all_jobs` method on our `client` object. This method returns a list of all jobs associated with your API key, including their current statuses.\n",
+        "\n",
+        "2. **Displaying the Latest Job:**\n",
+        "   - We take the most recent job from the list and print its status, giving a quick overview of how the latest training job is progressing.\n",
+        "\n",
+        "3. **Understanding Job Statuses:**\n",
+        "   - A job can have one of the following statuses:\n",
+        "     - `pending`: The job has been submitted and is waiting to start.\n",
+        "     - `in progress`: The job is currently running.\n",
+        "     - `stopped`: The job was stopped before completion, either manually or due to an error.\n",
+        "     - `completed`: The job has finished successfully.\n",
+        "\n",
+        "Running this cell will display the status of your most recent job, helping you keep track of your training tasks on Simplifine's cloud servers. If you want to block until the job finishes, see the sketch below.\n",
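+        "\n",
+        "A minimal polling sketch, assuming only what this section already shows (`get_all_jobs` returning a list of dicts with a `'status'` key):\n",
+        "\n",
+        "```python\n",
+        "import time\n",
+        "\n",
+        "# check the most recent job every 30 seconds until it reaches a terminal state\n",
+        "while True:\n",
+        "    latest = client.get_all_jobs()[-1]\n",
+        "    print(latest['status'])\n",
+        "    if latest['status'] in ('completed', 'stopped'):\n",
+        "        break\n",
+        "    time.sleep(30)\n",
+        "```\n"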
],
"metadata": {
- "id": "os5pt22OgZc3"
+ "id": "W88J_Ef7yaYG"
}
},
{
"cell_type": "code",
"source": [
- "# This time, we just change the use_zero arg to True, and opposite to use_ddp.\n",
- "client.sft_train_cloud(model_name = model_name, from_hf=from_hf, dataset_name=dataset_name,\n",
- " keys = keys, data = data,\n",
- " template = template, job_name='zero_example_cloud',\n",
- " response_template=response_template, use_zero=True, use_ddp=False)"
- ],
- "metadata": {
- "id": "m3LGu5ZYga2y"
- },
- "execution_count": 10,
- "outputs": []
- },
- {
- "cell_type": "code",
- "source": [
- "# repeat the same step of extracting jobs and ids\n",
"status = client.get_all_jobs()\n",
- "\n",
- "for num,i in enumerate(status[-5:]):\n",
- " print(f'Number {num} status: {i}\\n')"
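+        "# status is a list of job dicts; [-1:] keeps only the most recent one\n",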
+        "for num, i in enumerate(status[-1:]):\n",
+        "    print(f'Job {num}: {i}')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
- "id": "poAxZn-bgcnC",
- "outputId": "a78dede0-ddbd-4d1a-f80d-4f9b06b16a90"
+ "id": "l70vZyPV6_AC",
+ "outputId": "b32db3fe-e353-4105-e8b7-63a772d7ccde"
},
- "execution_count": 11,
+ "execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
- "Number 0 status: {'job_id': 'bde91132-9776-41ae-89f9-855dfb116a91', 'job_name': 'ddp_job', 'status': 'completed'}\n",
- "\n",
- "Number 1 status: {'job_id': 'a1ff54dd-5ee2-4e35-9e78-6868f63dad37', 'job_name': 'zero_example_cloud', 'status': 'completed'}\n",
- "\n",
- "Number 2 status: {'job_id': '543d3bc3-3ce4-4af6-9f9a-6c0823dcc9b0', 'job_name': 'ddp_job', 'status': 'completed'}\n",
- "\n",
- "Number 3 status: {'job_id': '5d55d46a-7793-4c06-9cef-279f03a0f953', 'job_name': 'job_1', 'status': 'completed'}\n",
- "\n",
- "Number 4 status: {'job_id': '42d965c0-773f-4b45-8dfb-a4f310e6606e', 'job_name': 'zero_example_cloud', 'status': 'in progress'}\n",
- "\n"
+ "Job 0: {'job_id': '183c65ad-2b4e-4d11-b2a5-d66232d5b15b', 'job_name': 'fake_news_english_phi3', 'status': 'completed'}\n"
]
}
]
},
+ {
+ "cell_type": "markdown",
+ "source": [
+        "### 📜 Retrieving and Viewing Training Logs\n",
+        "\n",
+        "After checking the status of your training job, you may want to dig into the details by viewing the training logs. These logs give insight into the training process, including any issues and updates on progress.\n",
+        "\n",
+        "1. **Getting the `job_id`:**\n",
+        "   - We start by extracting the `job_id` of the last job in the status list. The `job_id` is a unique identifier for each training job, which we'll use to retrieve its logs.\n",
+        "\n",
+        "2. **Retrieving Logs:**\n",
+        "   - We call the `get_train_logs` method on our `client` object, passing in the `job_id`. This fetches the detailed logs for the specified job, giving you access to the complete training history.\n",
+        "\n",
+        "3. **Viewing the Logs:**\n",
+        "   - Finally, we print the `response` field of the logs, which contains detailed information about the training run: updates, errors, and any other relevant messages.\n",
+        "\n",
+        "Running this cell will display the logs for your most recent job, so you can monitor and troubleshoot the training process effectively. A short sketch for saving the logs to disk follows.\n",
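+        "\n",
+        "Since these logs can get long, it can help to keep a copy on disk. A minimal sketch (the file name `train_logs.txt` is arbitrary; it uses the `response` field described above):\n",
+        "\n",
+        "```python\n",
+        "# save the raw log text so it survives the Colab session\n",
+        "with open('train_logs.txt', 'w') as f:\n",
+        "    f.write(logs['response'])\n",
+        "```\n"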
+ ],
+ "metadata": {
+ "id": "BDe93gbayl_n"
+ }
+ },
{
"cell_type": "code",
"source": [
- "# extracting logs again\n",
+ "# getting the job_id of the last job\n",
"job_id = status[-1]['job_id']\n",
"\n",
"logs = client.get_train_logs(job_id)\n",
@@ -3444,64 +6187,76 @@
"colab": {
"base_uri": "https://localhost:8080/"
},
- "id": "8zHTeTBmgzm7",
- "outputId": "d5ada91a-76c1-48ac-df4b-ed35bf38661d"
+ "id": "jt35FPNn8ADK",
+ "outputId": "1de668ed-718e-452d-eb85-0632d7652008"
},
- "execution_count": 12,
+ "execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
- "W0728 18:16:44.514000 133239404900480 torch/distributed/run.py:779] \n",
- "W0728 18:16:44.514000 133239404900480 torch/distributed/run.py:779] *****************************************\n",
- "W0728 18:16:44.514000 133239404900480 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. \n",
- "W0728 18:16:44.514000 133239404900480 torch/distributed/run.py:779] *****************************************\n",
- "[2024-07-28 18:16:49,912] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
- "[2024-07-28 18:16:49,967] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
+ "W0806 18:14:41.510000 129132731527296 torch/distributed/run.py:779] \n",
+ "W0806 18:14:41.510000 129132731527296 torch/distributed/run.py:779] *****************************************\n",
+ "W0806 18:14:41.510000 129132731527296 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. \n",
+ "W0806 18:14:41.510000 129132731527296 torch/distributed/run.py:779] *****************************************\n",
+ "[2024-08-06 18:14:46,878] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
+ "[2024-08-06 18:14:46,910] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
"\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
+ "[2024-08-06 18:14:46,961] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
"\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
"\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
"\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
"\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
- "[2024-07-28 18:16:50,049] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
"\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
"\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
"\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
- "[2024-07-28 18:16:50,075] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
- "[2024-07-28 18:16:50,082] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
"\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
+ "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
+ "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
+ "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
+ "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
+ "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n",
+ "[2024-08-06 18:14:47,065] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
+ "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
+ " @autocast_custom_fwd\n",
+ "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
+ " @autocast_custom_bwd\n",
"\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
"\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n",
+ "[2024-08-06 18:14:47,135] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
+ "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
+ "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
+ " @autocast_custom_fwd\n",
+ "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
+ " @autocast_custom_bwd\n",
"\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
"\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
"\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
- "[2024-07-28 18:16:50,149] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
- "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
- "[2024-07-28 18:16:50,153] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
+ "[2024-08-06 18:14:47,158] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
+ "[2024-08-06 18:14:47,172] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
+ "[2024-08-06 18:14:47,194] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
"\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
+ "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
+ "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n",
"\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
"\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
"\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
+ "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
"\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
"\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
"\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
- "[2024-07-28 18:16:50,168] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
+ "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
" @autocast_custom_fwd\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
" @autocast_custom_bwd\n",
- "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
- "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n",
- "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
- "\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
- "\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
- "\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
- "\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
"\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
"\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
"\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
"\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
+ "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
+ "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n",
"\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
"\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
"\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
@@ -3523,14 +6278,6 @@
" @autocast_custom_fwd\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
" @autocast_custom_bwd\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_fwd\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_bwd\n",
- "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
- "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n",
- "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
- "\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n",
"\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
"\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
@@ -3541,85 +6288,232 @@
" @autocast_custom_fwd\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
" @autocast_custom_bwd\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_fwd\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
- " @autocast_custom_bwd\n",
+ "[2024-08-06 18:14:48,688] [INFO] [comm.py:637:init_distributed] cdb=None\n",
+ "[2024-08-06 18:14:48,695] [INFO] [comm.py:637:init_distributed] cdb=None\n",
+ "[2024-08-06 18:14:48,785] [INFO] [comm.py:637:init_distributed] cdb=None\n",
+ "[2024-08-06 18:14:48,850] [INFO] [comm.py:637:init_distributed] cdb=None\n",
+ "[2024-08-06 18:14:48,890] [INFO] [comm.py:637:init_distributed] cdb=None\n",
+ "Destroying existing process group\n",
+ "Destroying existing process group\n",
+ "[2024-08-06 18:14:48,922] [INFO] [comm.py:637:init_distributed] cdb=None\n",
+ "[2024-08-06 18:14:48,946] [INFO] [comm.py:637:init_distributed] cdb=None\n",
+ "[2024-08-06 18:14:48,947] [INFO] [comm.py:637:init_distributed] cdb=None\n",
+ "Destroying existing process group\n",
+ "Destroying existing process group\n",
+ "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
+ "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
+ "Destroying existing process group\n",
+ "Destroying existing process group\n",
+ "Destroying existing process group\n",
+ "Destroying existing process group\n",
+ "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
+ "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
+ "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
"Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
+ "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
+ "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
+ "\n",
+ "Map: 0%| | 0/393 [00:00, ? examples/s]\n",
+ "Map: 0%| | 0/393 [00:00, ? examples/s]\n",
+ "Map: 0%| | 0/393 [00:00, ? examples/s]\n",
+ "Map: 0%| | 0/393 [00:00, ? examples/s]\n",
+ "Map: 0%| | 0/393 [00:00, ? examples/s]\n",
+ "Map: 0%| | 0/393 [00:00, ? examples/s]\n",
+ "Map: 0%| | 0/393 [00:00, ? examples/s]\n",
+ "Map: 0%| | 0/393 [00:00, ? examples/s]\n",
+        "Map: 100%|██████████| 393/393 [00:00<00:00, 662.32 examples/s]\n",
+        "Map: 100%|██████████| 393/393 [00:00<00:00, 670.82 examples/s]\n",
+        "Map: 100%|██████████| 393/393 [00:00<00:00, 655.96 examples/s]\n",
+        "Map: 100%|██████████| 393/393 [00:00<00:00, 654.94 examples/s]\n",
+        "Map: 100%|██████████| 393/393 [00:00<00:00, 670.00 examples/s]\n",
+        "Map: 100%|██████████| 393/393 [00:00<00:00, 674.59 examples/s]\n",
+        "Map: 100%|██████████| 393/393 [00:00<00:00, 697.91 examples/s]\n",
+        "Map: 100%|██████████| 393/393 [00:00<00:00, 635.21 examples/s]\n",
+        "\n",
+        "Map: 100%|██████████| 393/393 [00:00<00:00, 637.09 examples/s]\n",
+        "\n",
+        "Map: 100%|██████████| 393/393 [00:00<00:00, 624.12 examples/s]\n",
+        "\n",
+        "Map: 100%|██████████| 393/393 [00:00<00:00, 622.66 examples/s]\n",
+ "\n",
+ "Map: 0%| | 0/99 [00:00, ? examples/s]\n",
+ "Map: 0%| | 0/99 [00:00, ? examples/s]\n",
+        "Map: 100%|██████████| 393/393 [00:00<00:00, 626.62 examples/s]\n",
+ "\n",
+ "Map: 0%| | 0/99 [00:00, ? examples/s]\n",
+        "Map: 100%|██████████| 393/393 [00:00<00:00, 645.62 examples/s]\n",
"\n",
- "Map: 0%| | 0/480 [00:00, ? examples/s]\n",
-        "Map: 100%|██████████| 480/480 [00:00<00:00, 28000.14 examples/s]\n",
+        "Map: 100%|██████████| 393/393 [00:00<00:00, 616.65 examples/s]\n",
"\n",
- "Map: 0%| | 0/120 [00:00, ? examples/s]\n",
-        "Map: 100%|██████████| 120/120 [00:00<00:00, 15290.94 examples/s]\n",
- "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
+ "Map: 0%| | 0/99 [00:00, ? examples/s]\n",
+        "Map: 100%|██████████| 393/393 [00:00<00:00, 697.04 examples/s]\n",
+ "Map: 0%| | 0/99 [00:00, ? examples/s]\n",
+ "Map: 0%| | 0/99 [00:00, ? examples/s]\n",
+ "Map: 0%| | 0/99 [00:00, ? examples/s]\n",
+        "Map: 100%|██████████| 393/393 [00:00<00:00, 663.68 examples/s]\n",
"\n",
- "Map: 0%| | 0/480 [00:00, ? examples/s]Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
+ "Map: 0%| | 0/99 [00:00, ? examples/s]\n",
+        "Map: 100%|██████████| 99/99 [00:00<00:00, 835.60 examples/s]\n",
+        "Map: 100%|██████████| 99/99 [00:00<00:00, 794.46 examples/s]\n",
"\n",
-        "Map: 100%|██████████| 480/480 [00:00<00:00, 27097.90 examples/s]\n",
- "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
+        "Map: 100%|██████████| 99/99 [00:00<00:00, 837.59 examples/s]Using CUDA\n",
"\n",
- "Map: 0%| | 0/120 [00:00, ? examples/s]Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
+        "Map: 100%|██████████| 99/99 [00:00<00:00, 845.48 examples/s]\n",
+        "Map: 100%|██████████| 99/99 [00:00<00:00, 796.62 examples/s]\n",
"\n",
-        "Map: 100%|██████████| 120/120 [00:00<00:00, 14466.03 examples/s]\n",
+        "Map: 100%|██████████| 99/99 [00:00<00:00, 804.56 examples/s]\n",
+ "Using CUDA\n",
+ "Using CUDA\n",
"\n",
- "Map: 0%| | 0/480 [00:00, ? examples/s]\n",
-        "Map: 100%|██████████| 480/480 [00:00<00:00, 24955.57 examples/s]\n",
+        "Map: 100%|██████████| 99/99 [00:00<00:00, 853.92 examples/s]\n",
+        "Map: 100%|██████████| 99/99 [00:00<00:00, 811.99 examples/s]\n",
+ "Using CUDA\n",
"\n",
- "Map: 0%| | 0/480 [00:00, ? examples/s]\n",
- "Map: 0%| | 0/480 [00:00, ? examples/s]\n",
-        "Map: 100%|██████████| 480/480 [00:00<00:00, 26237.63 examples/s]\n",
+        "Map: 100%|██████████| 99/99 [00:00<00:00, 852.33 examples/s]\n",
+        "Map: 100%|██████████| 99/99 [00:00<00:00, 810.34 examples/s]\n",
+ "Using CUDA\n",
"\n",
- "Map: 0%| | 0/120 [00:00, ? examples/s]\n",
-        "Map: 100%|██████████| 480/480 [00:00<00:00, 23259.42 examples/s]\n",
+        "Map: 100%|██████████| 99/99 [00:00<00:00, 862.42 examples/s]\n",
+        "Map: 100%|██████████| 99/99 [00:00<00:00, 820.15 examples/s]\n",
"\n",
-        "Map: 100%|██████████| 120/120 [00:00<00:00, 13279.77 examples/s]\n",
+        "Map: 100%|██████████| 99/99 [00:00<00:00, 847.22 examples/s]Using CUDA\n",
"\n",
- "Map: 0%| | 0/120 [00:00, ? examples/s]\n",
-        "Map: 100%|██████████| 120/120 [00:00<00:00, 13359.07 examples/s]\n",
+        "Map: 100%|██████████| 99/99 [00:00<00:00, 802.41 examples/s]\n",
+ "Using CUDA\n",
"\n",
- "Map: 0%| | 0/120 [00:00, ? examples/s]\n",
-        "Map: 100%|██████████| 120/120 [00:00<00:00, 14175.93 examples/s]\n",
- "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
- "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
- "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
+        "Map: 100%|██████████| 99/99 [00:00<00:00, 871.67 examples/s]\n",
+        "Map: 100%|██████████| 99/99 [00:00<00:00, 828.15 examples/s]\n",
+ "Using CUDA\n",
"\n",
- "Map: 0%| | 0/480 [00:00, ? examples/s]\n",
- "Map: 0%| | 0/480 [00:00, ? examples/s]\n",
-        "Map: 100%|██████████| 480/480 [00:00<00:00, 27207.77 examples/s]\n",
+ "Loading checkpoint shards: 0%| | 0/2 [00:00, ?it/s]\n",
+ "Loading checkpoint shards: 0%| | 0/2 [00:00, ?it/s]\n",
+ "Loading checkpoint shards: 0%| | 0/2 [00:00, ?it/s]\n",
+ "Loading checkpoint shards: 0%| | 0/2 [00:00, ?it/s]\n",
+ "Loading checkpoint shards: 0%| | 0/2 [00:00, ?it/s]\n",
+ "Loading checkpoint shards: 0%| | 0/2 [00:00, ?it/s]\n",
+ "Loading checkpoint shards: 0%| | 0/2 [00:00, ?it/s]\n",
+ "Loading checkpoint shards: 0%| | 0/2 [00:00, ?it/s]\n",
+        "Loading checkpoint shards:  50%|█████     | 1/2 [00:06<00:06, 6.07s/it]\n",
+        "Loading checkpoint shards:  50%|█████     | 1/2 [00:06<00:06, 6.10s/it]\n",
+        "Loading checkpoint shards:  50%|█████     | 1/2 [00:06<00:06, 6.11s/it]\n",
+        "Loading checkpoint shards:  50%|█████     | 1/2 [00:06<00:06, 6.09s/it]\n",
+        "Loading checkpoint shards:  50%|█████     | 1/2 [00:06<00:06, 6.07s/it]\n",
+        "Loading checkpoint shards:  50%|█████     | 1/2 [00:06<00:06, 6.11s/it]\n",
+        "Loading checkpoint shards:  50%|█████     | 1/2 [00:06<00:06, 6.09s/it]\n",
+        "Loading checkpoint shards:  50%|█████     | 1/2 [00:06<00:06, 6.15s/it]\n",
+        "Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.44s/it]\n",
+        "Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.68s/it]\n",
"\n",
- "Map: 0%| | 0/480 [00:00, ? examples/s]\n",
-        "Map: 100%|██████████| 480/480 [00:00<00:00, 23907.40 examples/s]\n",
+        "Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.45s/it]\n",
+        "Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.70s/it]\n",
"\n",
-        "Map: 100%|██████████| 480/480 [00:00<00:00, 26135.45 examples/s]\n",
+        "Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.44s/it]\n",
+        "Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.69s/it]\n",
"\n",
- "Map: 0%| | 0/120 [00:00, ? examples/s]\n",
-        "Map: 100%|██████████| 120/120 [00:00<00:00, 14730.64 examples/s]\n",
+        "Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.45s/it]\n",
+        "Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.70s/it]\n",
"\n",
- "Map: 0%| | 0/120 [00:00, ? examples/s]\n",
- "Map: 0%| | 0/120 [00:00, ? examples/s]\n",
-        "Map: 100%|██████████| 120/120 [00:00<00:00, 14727.62 examples/s]\n",
+        "Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.46s/it]\n",
+        "Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.70s/it]\n",
"\n",
-        "Map: 100%|██████████| 120/120 [00:00<00:00, 14245.34 examples/s]\n",
- "Using CUDA\n",
- "[2024-07-28 18:16:52,649] [INFO] [comm.py:637:init_distributed] cdb=None\n",
- "[2024-07-28 18:16:52,649] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl\n",
- "Using CUDA\n",
- "[2024-07-28 18:16:52,752] [INFO] [comm.py:637:init_distributed] cdb=None\n",
+        "Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.50s/it]\n",
+        "Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.74s/it]\n",
+ "\n",
+        "Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.50s/it]\n",
+        "Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.74s/it]\n",
+ "\n",
+        "Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.53s/it]\n",
+        "Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.78s/it]\n",
+ "Initializing process group for DDP\n",
+ "using ZeRO optimization\n",
+ "train data set is: Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 393\n",
+ "}), eval dataset is Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 99\n",
+ "})\n",
+ "/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:278: UserWarning: You didn't pass a `max_seq_length` argument to the SFTTrainer, this will default to 1024\n",
+ " warnings.warn(\n",
+ "Initializing process group for DDP\n",
+ "using ZeRO optimization\n",
+ "train data set is: Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 393\n",
+ "}), eval dataset is Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 99\n",
+ "})\n",
+ "/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:278: UserWarning: You didn't pass a `max_seq_length` argument to the SFTTrainer, this will default to 1024\n",
+ " warnings.warn(\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:494: UserWarning: You passed a dataset that is already processed (contains an `input_ids` field) together with a valid formatting function. Therefore `formatting_func` will be ignored.\n",
" warnings.warn(\n",
- "Using CUDA\n",
- "[2024-07-28 18:16:52,836] [INFO] [comm.py:637:init_distributed] cdb=None\n",
- "Using CUDA\n",
- "Using CUDA\n",
- "[2024-07-28 18:16:52,861] [INFO] [comm.py:637:init_distributed] cdb=None\n",
- "[2024-07-28 18:16:52,863] [INFO] [comm.py:637:init_distributed] cdb=None\n",
- "Using CUDA\n",
- "[2024-07-28 18:16:52,942] [INFO] [comm.py:637:init_distributed] cdb=None\n",
- "Using CUDA\n",
- "Using CUDA\n",
- "[2024-07-28 18:16:52,963] [INFO] [comm.py:637:init_distributed] cdb=None\n",
- "[2024-07-28 18:16:52,965] [INFO] [comm.py:637:init_distributed] cdb=None\n",
+ "Initializing process group for DDP\n",
+ "using ZeRO optimization\n",
+ "train data set is: Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 393\n",
+ "}), eval dataset is Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 99\n",
+ "})\n",
+ "/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:278: UserWarning: You didn't pass a `max_seq_length` argument to the SFTTrainer, this will default to 1024\n",
+ " warnings.warn(\n",
+ "Initializing process group for DDP\n",
+ "using ZeRO optimization\n",
+ "train data set is: Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 393\n",
+ "}), eval dataset is Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 99\n",
+ "})\n",
+ "/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:278: UserWarning: You didn't pass a `max_seq_length` argument to the SFTTrainer, this will default to 1024\n",
+ " warnings.warn(\n",
+ "Initializing process group for DDP\n",
+ "using ZeRO optimization\n",
+ "train data set is: Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 393\n",
+ "}), eval dataset is Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 99\n",
+ "})\n",
+ "/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:278: UserWarning: You didn't pass a `max_seq_length` argument to the SFTTrainer, this will default to 1024\n",
+ " warnings.warn(\n",
+ "Initializing process group for DDP\n",
+ "using ZeRO optimization\n",
+ "train data set is: Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 393\n",
+ "}), eval dataset is Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 99\n",
+ "})\n",
+ "/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:278: UserWarning: You didn't pass a `max_seq_length` argument to the SFTTrainer, this will default to 1024\n",
+ " warnings.warn(\n",
+ "Initializing process group for DDP\n",
+ "using ZeRO optimization\n",
+ "train data set is: Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 393\n",
+ "}), eval dataset is Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 99\n",
+ "})\n",
+ "/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:278: UserWarning: You didn't pass a `max_seq_length` argument to the SFTTrainer, this will default to 1024\n",
+ " warnings.warn(\n",
+ "Initializing process group for DDP\n",
+ "using ZeRO optimization\n",
+ "train data set is: Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 393\n",
+ "}), eval dataset is Dataset({\n",
+ " features: ['input_ids', 'attention_mask'],\n",
+ " num_rows: 99\n",
+ "})\n",
+ "/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:278: UserWarning: You didn't pass a `max_seq_length` argument to the SFTTrainer, this will default to 1024\n",
+ " warnings.warn(\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:494: UserWarning: You passed a dataset that is already processed (contains an `input_ids` field) together with a valid formatting function. Therefore `formatting_func` will be ignored.\n",
" warnings.warn(\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:494: UserWarning: You passed a dataset that is already processed (contains an `input_ids` field) together with a valid formatting function. Therefore `formatting_func` will be ignored.\n",
@@ -3635,8 +6529,8 @@
"/home/ubuntu/mlenv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:494: UserWarning: You passed a dataset that is already processed (contains an `input_ids` field) together with a valid formatting function. Therefore `formatting_func` will be ignored.\n",
" warnings.warn(\n",
"Installed CUDA version 12.0 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination\n",
- "Installed CUDA version 12.0 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination\n",
"Using /home/ubuntu/.cache/torch_extensions/py312_cu121 as PyTorch extensions root...\n",
+ "Installed CUDA version 12.0 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination\n",
"Using /home/ubuntu/.cache/torch_extensions/py312_cu121 as PyTorch extensions root...\n",
"Installed CUDA version 12.0 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination\n",
"Using /home/ubuntu/.cache/torch_extensions/py312_cu121 as PyTorch extensions root...\n",
@@ -3655,44 +6549,55 @@
"Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)\n",
"ninja: no work to do.\n",
"Loading extension module cpu_adam...\n",
- "Time to load cpu_adam op: 2.366258382797241 seconds\n",
+ "Time to load cpu_adam op: 2.368800640106201 seconds\n",
"Loading extension module cpu_adam...\n",
- "Time to load cpu_adam op: 2.4117283821105957 seconds\n",
+ "Time to load cpu_adam op: 2.408421277999878 seconds\n",
"Loading extension module cpu_adam...\n",
"Loading extension module cpu_adam...\n",
"Loading extension module cpu_adam...\n",
+ "Time to load cpu_adam op: 2.411547899246216 seconds\n",
+ "Time to load cpu_adam op: 2.41204571723938 seconds\n",
+ "Time to load cpu_adam op: 2.405897855758667 seconds\n",
"Loading extension module cpu_adam...\n",
"Loading extension module cpu_adam...\n",
- "Time to load cpu_adam op: 2.411327600479126 seconds\n",
- "Time to load cpu_adam op: 2.415825843811035 seconds\n",
- "Time to load cpu_adam op: 2.4111907482147217 seconds\n",
- "Time to load cpu_adam op: 2.4148175716400146 seconds\n",
- "Time to load cpu_adam op: 2.411813497543335 seconds\n",
+ "Time to load cpu_adam op: 2.413489818572998 seconds\n",
+ "Time to load cpu_adam op: 2.417717218399048 seconds\n",
"Loading extension module cpu_adam...\n",
- "Time to load cpu_adam op: 2.4180257320404053 seconds\n",
+ "Time to load cpu_adam op: 2.420786142349243 seconds\n",
+ "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
+ "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
+ "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
"`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py:600: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.\n",
" return fn(*args, **kwargs)\n",
"`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
+ "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
+ "You are not running the flash-attention implementation, expect numerical differences.\n",
+ "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
+ "\n",
+ " 0%| | 0/24 [00:00, ?it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py:600: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.\n",
" return fn(*args, **kwargs)\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py:600: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.\n",
" return fn(*args, **kwargs)\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
+ "You are not running the flash-attention implementation, expect numerical differences.\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py:600: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.\n",
" return fn(*args, **kwargs)\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py:600: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.\n",
" return fn(*args, **kwargs)\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py:600: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.\n",
" return fn(*args, **kwargs)\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
+ "You are not running the flash-attention implementation, expect numerical differences.\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py:600: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.\n",
" return fn(*args, **kwargs)\n",
- "\n",
- " 0%| | 0/15 [00:00, ?it/s]/home/ubuntu/mlenv/lib/python3.12/site-packages/torch/utils/checkpoint.py:295: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.\n",
+ "/home/ubuntu/mlenv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py:600: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.\n",
+ " return fn(*args, **kwargs)\n",
+ "You are not running the flash-attention implementation, expect numerical differences.\n",
+ "You are not running the flash-attention implementation, expect numerical differences.\n",
+ "You are not running the flash-attention implementation, expect numerical differences.\n",
+ "You are not running the flash-attention implementation, expect numerical differences.\n",
+ "You are not running the flash-attention implementation, expect numerical differences.\n",
+ "/home/ubuntu/mlenv/lib/python3.12/site-packages/torch/utils/checkpoint.py:295: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.\n",
" with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs): # type: ignore[attr-defined]\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/torch/utils/checkpoint.py:295: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.\n",
" with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs): # type: ignore[attr-defined]\n",
@@ -3702,81 +6607,46 @@
" with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs): # type: ignore[attr-defined]\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/torch/utils/checkpoint.py:295: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.\n",
" with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs): # type: ignore[attr-defined]\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "/home/ubuntu/mlenv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py:600: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.\n",
- " return fn(*args, **kwargs)\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/torch/utils/checkpoint.py:295: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.\n",
" with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs): # type: ignore[attr-defined]\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/torch/utils/checkpoint.py:295: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.\n",
" with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs): # type: ignore[attr-defined]\n",
"/home/ubuntu/mlenv/lib/python3.12/site-packages/torch/utils/checkpoint.py:295: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.\n",
" with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs): # type: ignore[attr-defined]\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
"\n",
- " 7%|โ | 1/15 [00:02<00:28, 2.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
+ " 4%|โ | 1/24 [00:29<11:14, 29.32s/it]\n",
+ " 8%|โ | 2/24 [00:53<09:38, 26.31s/it]\n",
+ " 12%|โโ | 3/24 [01:17<08:54, 25.45s/it]\n",
+ " 17%|โโ | 4/24 [01:42<08:25, 25.27s/it]\n",
+ " 21%|โโ | 5/24 [02:10<08:13, 26.00s/it]\n",
+ " 25%|โโโ | 6/24 [02:34<07:35, 25.30s/it]\n",
+ " 29%|โโโ | 7/24 [02:58<07:02, 24.88s/it]\n",
+ " 33%|โโโโ | 8/24 [03:23<06:40, 25.02s/it]\n",
+ " 38%|โโโโ | 9/24 [03:47<06:10, 24.70s/it]\n",
+ " 42%|โโโโโ | 10/24 [04:11<05:42, 24.43s/it]\n",
+ " 46%|โโโโโ | 11/24 [04:34<05:13, 24.14s/it]\n",
+ " 50%|โโโโโ | 12/24 [04:59<04:50, 24.22s/it]\n",
+ " 54%|โโโโโโ | 13/24 [05:23<04:26, 24.23s/it]\n",
+ " 58%|โโโโโโ | 14/24 [05:47<04:00, 24.04s/it]\n",
+ " 62%|โโโโโโโ | 15/24 [06:09<03:33, 23.67s/it]\n",
+ " 67%|โโโโโโโ | 16/24 [06:32<03:07, 23.44s/it]\n",
+ " 71%|โโโโโโโ | 17/24 [06:56<02:44, 23.56s/it]\n",
+ " 75%|โโโโโโโโ | 18/24 [07:19<02:20, 23.47s/it]\n",
+ " 79%|โโโโโโโโ | 19/24 [07:43<01:58, 23.62s/it]\n",
+ " 83%|โโโโโโโโโ | 20/24 [08:07<01:34, 23.60s/it]\n",
+ " \n",
+ "{'loss': 1.3721, 'grad_norm': 0.6817618012428284, 'learning_rate': 4.0909090909090915e-06, 'epoch': 1.6}\n",
"\n",
- " 13%|โโ | 2/15 [00:02<00:17, 1.35s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
- "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
+ " 83%|โโโโโโโโโ | 20/24 [08:07<01:34, 23.60s/it]\n",
+ " 88%|โโโโโโโโโ | 21/24 [08:30<01:10, 23.57s/it]\n",
+ " 92%|โโโโโโโโโโ| 22/24 [08:53<00:46, 23.38s/it]\n",
+ " 96%|โโโโโโโโโโ| 23/24 [09:17<00:23, 23.53s/it]\n",
+ "100%|โโโโโโโโโโ| 24/24 [09:41<00:00, 23.53s/it]\n",
+ " \n",
+ "{'train_runtime': 613.9336, 'train_samples_per_second': 1.28, 'train_steps_per_second': 0.039, 'train_loss': 1.1531557595978181, 'epoch': 1.92}\n",
+ "\n",
+ "100%|โโโโโโโโโโ| 24/24 [10:13<00:00, 23.53s/it]\n",
+ "100%|โโโโโโโโโโ| 24/24 [10:13<00:00, 25.58s/it]\n",
"\n"
]
}
@@ -3785,36 +6655,50 @@
{
"cell_type": "markdown",
"source": [
- "As above, now we download the model."
+ "### ๐ Downloading and Saving the Trained Model\n",
+ "\n",
+ "Once your training job is completed, the next step is to download the trained model so you can use it locally or for further fine-tuning.\n",
+ "\n",
+ "1. **Creating a Directory for the Model:**\n",
+ " - We begin by creating a new folder called `sf_trained_model_zero_phi`. This folder will serve as the destination for the downloaded model files.\n",
+ "\n",
+ "2. **Downloading the Model:**\n",
+ " - We use the `download_model` method on our `client` object to download the trained model from the cloud. The `job_id` is passed to specify which model to download, and we extract the files to the newly created directory.\n",
+ " \n",
+ " - **Tip:** This process might take some time depending on the size of the model, so feel free to take a break or grab a coffee while you wait! โ\n",
+ "\n",
+ "Running this cell will download your trained model and save it in the specified directory, making it ready for use in your next project or analysis.\n"
],
"metadata": {
- "id": "hUDHmgiWhIMz"
+ "id": "koKpp2XNU-y1"
}
},
{
"cell_type": "code",
"source": [
+ "import os\n",
+ "\n",
"# creating a folder to store the model\n",
- "os.mkdir('sf_trained_model_ZeRO')\n",
+ "os.mkdir('sf_trained_model_zero_phi')\n",
"\n",
"# download and save the model to it.\n",
"# This might take some time, have a sip of that coffee! :)\n",
- "client.download_model(job_id=job_id, extract_to='/content/sf_trained_model_ZeRO')"
+ "client.download_model(job_id=job_id, extract_to='/content/sf_trained_model_zero_phi')"
],
"metadata": {
+ "id": "DNv9XkFv80d7",
"colab": {
"base_uri": "https://localhost:8080/"
},
- "id": "u4Y5GAM7hImM",
- "outputId": "b7d64f54-f9ab-421b-a7ed-eb16f084df85"
+ "outputId": "b88812a9-8f64-464e-d4c5-d2ace8814f08"
},
- "execution_count": 13,
+ "execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
- "Downloading: 100%|โโโโโโโโโโ| 295M/295M [00:20<00:00, 14.1MiB/s]\n"
+ "Downloading: 100%|โโโโโโโโโโ| 6.99G/6.99G [00:42<00:00, 166MiB/s]\n"
]
},
{
@@ -3822,9 +6706,9 @@
"name": "stdout",
"text": [
"\n",
- "Directory downloaded successfully and saved to /content/sf_trained_model_ZeRO/42d965c0-773f-4b45-8dfb-a4f310e6606e.zip\n",
- "Model unzipped successfully to /content/sf_trained_model_ZeRO\n",
- "Deleted the zip file at /content/sf_trained_model_ZeRO/42d965c0-773f-4b45-8dfb-a4f310e6606e.zip\n",
+ "Directory downloaded successfully and saved to /content/sf_trained_model_zero_phi/183c65ad-2b4e-4d11-b2a5-d66232d5b15b.zip\n",
+ "Model unzipped successfully to /content/sf_trained_model_zero_phi\n",
+ "Deleted the zip file at /content/sf_trained_model_zero_phi/183c65ad-2b4e-4d11-b2a5-d66232d5b15b.zip\n",
"Model downloaded, unzipped, and zip file deleted successfully!\n"
]
}
@@ -3833,61 +6717,288 @@
{
"cell_type": "markdown",
"source": [
- "Next we test this model trained with ZeRO for generation."
+ "### ๐ Loading the Trained Model and Tokenizer\n",
+ "\n",
+ "Now that we've successfully downloaded the trained model, the next step is to load it into our environment so we can use it for inference or further fine-tuning.\n",
+ "\n",
+ "1. **Importing Required Libraries:**\n",
+ " - We import `AutoModelForCausalLM` and `AutoTokenizer` from the `transformers` library. These classes are used to load the model and tokenizer from the saved files.\n",
+ "\n",
+ "2. **Setting the Path:**\n",
+ " - We set the `path` variable to point to the directory where we saved the trained model (`'/content/sf_trained_model_zero_phi'`).\n",
+ "\n",
+ "3. **Loading the Model:**\n",
+ " - We use `AutoModelForCausalLM.from_pretrained(path)` to load the trained model from the specified path. This initializes the model so itโs ready for use.\n",
+ "\n",
+ "4. **Loading the Tokenizer:**\n",
+ " - Similarly, we load the tokenizer using `AutoTokenizer.from_pretrained(path)`. The tokenizer is essential for processing text input into a format that the model can understand.\n",
+ "\n",
+ "Running this cell will load both the trained model and tokenizer into your environment, allowing you to start generating text or continue fine-tuning with your freshly trained model."
],
"metadata": {
- "id": "y-CnHmX9hN1p"
+ "id": "mQ1fk9tJVJKy"
}
},
{
"cell_type": "code",
"source": [
"from transformers import AutoModelForCausalLM, AutoTokenizer\n",
+ "import torch\n",
"\n",
- "path = '/content/sf_trained_model_ZeRO'\n",
+ "path = '/content/sf_trained_model_zero_phi'\n",
"sf_model = AutoModelForCausalLM.from_pretrained(path)\n",
- "sf_tokenizer = AutoTokenizer.from_pretrained(path)\n",
+ "sf_tokenizer = AutoTokenizer.from_pretrained(path)"
+ ],
+ "metadata": {
+ "id": "WOte32R79n9j",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 67,
+ "referenced_widgets": [
+ "172440174eb14be1b3333a21ef8692b2",
+ "1629c9cba045401190598dc2243724f0",
+ "7578cec5b89f48038cef8ced523158a0",
+ "98e224ecc1a341288c0c8ba7a1a692a4",
+ "251fc1ddf7ea4a869bbe39baceb5b57a",
+ "c02e899b67bc45239ec779cb1cdd7027",
+ "e41470729e534f2399d37a2ddfee5e03",
+ "033e947a4e3e4e07b86f54fd40963676",
+ "2f341eb6d5b94ddba4a9ce8667b8326c",
+ "fe72e6426a844a36a87dda60bc761aca",
+ "0d80f492ce4d4198997e89267c724d9c"
+ ]
+ },
+ "outputId": "9b14f28f-3376-45bc-82cb-c6b09a31aa6c"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "Loading checkpoint shards: 0%| | 0/2 [00:00, ?it/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "172440174eb14be1b3333a21ef8692b2"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### ๐ Loading the Dataset\n",
+ "\n",
+ "Before we can use our trained model for inference or further fine-tuning, we need to load the dataset that weโve been working with.\n",
+ "\n",
+ "1. **Importing the Datasets Library:**\n",
+ " - We start by importing the `datasets` library, which provides easy access to a wide range of datasets, including the one we've been using for training.\n",
+ "\n",
+ "2. **Loading the Dataset:**\n",
+ " - We load the dataset using the `load_dataset` function from the `datasets` library. The `dataset_name` variable contains the name of the dataset we specified earlier in our code.\n",
+ "\n",
+ "Running this cell will load the dataset into your environment, making it ready for evaluation, inference,"
+ ],
+ "metadata": {
+ "id": "UZ-1si0bVOMC"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import datasets\n",
+ "dataset = datasets.load_dataset(dataset_name)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 240,
+ "referenced_widgets": [
+ "9ba289a862cd4330a4f5e3f8c8681311",
+ "6c27d4944f494524a46a7f6feb636f36",
+ "a5fc5c047bdd465baa1c7df93d5edb46",
+ "89b905cc3ff440579761665991715fc4",
+ "cb7433385706439ab15819dc12be97f7",
+ "e6b2ce696dcb4b71bf12ac6f909167a3",
+ "05806584dc2445a59cfe07ce47d9228e",
+ "0abe106dfc11463aaacec340ee291c4f",
+ "bfb88ed6e8174675bb50041d22742e3b",
+ "24b14faf20e84ab8a4113a2b134752cd",
+ "db83f9b3e01c4b87a1250dbf32ca305f",
+ "abad1992c21a4a8493c8609835dbc376",
+ "65ea4601453943f3acc6721158122c20",
+ "c494b3018813477d8a3a4881a4761505",
+ "d5c6f34792ba4611aa58436618fcbf72",
+ "416e016e019d48e2b6aede9e236507d8",
+ "dba04c28a4a849bb97c6edf252143e80",
+ "be4a71aec5184d00ad31015709109608",
+ "662ec0e63fb2414282305c41620aa652",
+ "be882239f69a48f5a0a7fbb8e6368d54",
+ "cf3e1b5dae9245efae387219b4c66625",
+ "a857225fdbe94e5f97631dadc381d9fa",
+ "455c0cdca6f244e8865093ebe04e83ee",
+ "c0ad1e180f964f4abe5589296b008700",
+ "a832769eeed842468471536a67eeb0d6",
+ "346a547009bb4d0d888964c790db24ad",
+ "cf88fb83f22946ed9a2174e0ce1c2bdc",
+ "9accff4ab2f0481b87573188add0e44d",
+ "52b0f027bb414f1c8b3fb8d644ce54e6",
+ "87c5ec19a652467998b63203855c22d9",
+ "dddd99644ee94fddb4f8b3b436a5f3c4",
+ "028492fb8b4d483cb13d45491353fd66",
+ "0fed45330b9b471185fe6d427ecf51f6"
+ ]
+ },
+ "id": "Orm2RTPh1s-s",
+ "outputId": "34794037-e2bb-4e64-cf52-445e61a7aaf6"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: \n",
+ "The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
+ "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
+ "You will be able to reuse this secret in all of your notebooks.\n",
+ "Please note that authentication is recommended but still optional to access public models or datasets.\n",
+ " warnings.warn(\n"
+ ]
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "Downloading readme: 0%| | 0.00/5.01k [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "9ba289a862cd4330a4f5e3f8c8681311"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "Downloading data: 0%| | 0.00/43.1k [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "abad1992c21a4a8493c8609835dbc376"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "Generating train split: 0%| | 0/492 [00:00, ? examples/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "455c0cdca6f244e8865093ebe04e83ee"
+ }
+ },
+ "metadata": {}
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### ๐ง Generating Text with the Trained Model\n",
+ "\n",
+ "Now that we've loaded both the model and the dataset, itโs time to generate some text using our trained model. In this section, weโll configure the generation settings and produce some sample outputs.\n",
"\n",
- "input_example = '''### TITLE: title 1\\n ### ABSTRACT: abstract 1\\n ###EXPLANATION: '''\n",
+ "1. **Importing Inference Tools:**\n",
+ " - We import `inference_tools` from the `simplifine_alpha` library. This module provides the necessary tools to generate text using the model weโve fine-tuned.\n",
"\n",
- "input_example = sf_tokenizer(input_example, return_tensors='pt')\n",
+ "2. **Configuring Text Generation:**\n",
+ " - We create a `GenerationConfig` object to define how the model should generate text. This configuration includes:\n",
+ " - `prompt_template` and `response_template`: Templates for how the inputs and outputs are formatted.\n",
+ " - `keys`: Specifies the data keys used in the templates.\n",
+ " - `train_type`: Indicates that we're using supervised fine-tuning (`sft`).\n",
+ " - `max_length`: The maximum length of the generated sequences.\n",
+ " - `num_return_sequences`: How many sequences to generate.\n",
+ " - `do_sample`, `top_k`, `top_p`, `temperature`: Parameters that control the randomness and diversity of the generated text.\n",
+ "\n",
+ "3. **Generating Text:**\n",
+ " - We call `generate_from_pretrained` using our fine-tuned model, tokenizer, and the generation configuration. We also pass in a small sample of the dataset to generate text based on the training data.\n",
+ " \n",
+ " - **Note:** Weโre using only the first three examples from the training dataset (`dataset['train'][:3]`) for quick testing.\n",
+ "\n",
+ "4. **Displaying the Generated Text:**\n",
+ " - Finally, we print the generated text, which provides a glimpse into how well the model has learned to detect fake news.\n",
+ "\n",
+ "Running this cell will generate text using your trained model, showcasing its ability to produce outputs based on the fine-tuned dataset. This is where you can see the real impact of your training efforts!"
+ ],
+ "metadata": {
+ "id": "tHGpRwU6VVav"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from simplifine_alpha import inference_tools\n",
"\n",
- "output = sf_model.generate(input_example['input_ids'],\n",
- " attention_mask=input_example['attention_mask'],\n",
- " max_length=30,eos_token_id=sf_tokenizer.eos_token_id,\n",
- " early_stopping=True,\n",
- " pad_token_id=sf_tokenizer.eos_token_id\n",
+ "config = inference_tools.GenerationConfig(\n",
+ " prompt_template=sft_prompt_config.template,\n",
+ " response_template=sft_prompt_config.response_template,\n",
+ " keys=sft_prompt_config.keys,\n",
+ " train_type='sft',\n",
+ " max_length=110,\n",
+ " num_return_sequences=1,\n",
+ " do_sample=True,\n",
+ " top_k=50,\n",
+ " top_p=0.9,\n",
+ " temperature=0.99\n",
")\n",
"\n",
- "print(sf_tokenizer.decode(output[0]))"
+ "generated_text = inference_tools.generate_from_pretrained(sf_model, sf_tokenizer, config, data=dataset['train'][:3])\n",
+ "print(generated_text)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
- "id": "ZlmBeXHhhNfv",
- "outputId": "665d598b-21f5-49f8-c90d-41a6fa927ba4"
+ "id": "8KWnTV9w1OMQ",
+ "outputId": "e78d14ca-9b91-4412-8d16-ece24b3ffe7d"
},
- "execution_count": 14,
+ "execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
- "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
+ "You are not running the flash-attention implementation, expect numerical differences.\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
- "### TITLE: title 1\n",
- " ### ABSTRACT: abstract 1\n",
- " ###EXPLANATION: explanation 1\n",
- " ### QUE\n"
+ "[['###URL: http://www.redflagnews.com/headlines-2016/cdc-proposes-rule-to-apprehend-and-detain-anyone-anywhere-at-any-time-for-any-duration-without-due-process-or-right-of-appeal-and-administer-forced-vaccinations-or-medical-treatment-without-consent-or-parens. \\n###'], ['###URL: http://www.redflagnews.com/headlines-2016/-outrage-what-obama-just-did-to-the-white-house-logo-will-make-you-sick-128097.html \\n###CLS: 0'], ['###URL: http://www.redflagnews.com/headlines-2016/white-house-cancels-all-obama-appearances-at-hillary-campaign-events-as-he-navigates-mandatory-divorce-june-28-2016-1651142.html \\n###CLS: 1']]\n"
]
}
]
}
]
-}
\ No newline at end of file
+}
diff --git a/examples/cont_emb_ft.py b/examples/cont_emb_ft.py
deleted file mode 100644
index bba863f..0000000
--- a/examples/cont_emb_ft.py
+++ /dev/null
@@ -1,15 +0,0 @@
-from simplifine_alpha import train_engine
-
-# model name
-# this should be a sentence transformer model
-model_name = 'sentence-transformers/paraphrase-MiniLM-L6-v2'
-
-# your huggingface token
-hf_token = ''
-
-# data
-queries = ['the weather is good', 'the weather is bad', 'the weather is okay']
-positive = ['nice day', 'bad day', 'okay day']
-negative = ['bad day', 'nice day', 'bad day']
-
-train_engine.hf_finetune_embedder_contrastive(model_name=model_name, queries=queries, positive=positive, negative=negative, hf_token=hf_token)
diff --git a/examples/fake_url_detection.ipynb b/examples/fake_url_detection.ipynb
index 23cbc1a..1d8f0d0 100644
--- a/examples/fake_url_detection.ipynb
+++ b/examples/fake_url_detection.ipynb
@@ -4,9 +4,7 @@
"metadata": {
"colab": {
"provenance": [],
- "machine_shape": "hm",
- "authorship_tag": "ABX9TyMbgzp4WjwBkEWl8qaCpYHi",
- "include_colab_link": true
+ "machine_shape": "hm"
},
"kernelspec": {
"name": "python3",
@@ -5495,17 +5493,25 @@
"cells": [
{
"cell_type": "markdown",
- "metadata": {
- "id": "view-in-github",
- "colab_type": "text"
- },
"source": [
- ""
- ]
+ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/simplifine-llm/Simplifine/blob/main/examples/fake_url_detection.ipynb)",
+ "\n",
+ "### ๐ฆ Installing Required Libraries\n",
+ "\n",
+ "Before we begin fine-tuning our fake news detector, we need to install the necessary libraries. In this step, weโre installing the `Simplifine` library, which provides tools to streamline the fine-tuning process for large language models. Weโre also installing the `datasets` library, which allows us to easily access and manage datasets from Hugging Face.\n",
+ "\n",
+ "- The `Simplifine` library helps in making the fine-tuning process more efficient, whether you're working locally or in the cloud.\n",
+ "- The `datasets` library is essential for loading and processing the dataset we'll be using for this project.\n",
+ "\n",
+ "Running this cell will install both libraries quietly in the background.\n"
+ ],
+ "metadata": {
+ "id": "0SClYIzAQrpD"
+ }
},
{
"cell_type": "code",
- "execution_count": 1,
+ "execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@@ -5560,6 +5566,41 @@
"!pip install datasets -q"
]
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### ๐ ๏ธ Setting Up for Local Training\n",
+ "\n",
+ "In this section, weโre preparing to fine-tune our fake news detector model using Google Colabโs resources. The steps below outline how to configure and initiate the training process.\n",
+ "\n",
+ "1. **Importing Libraries:**\n",
+ " - We import `train_engine` from the `Simplifine` library, which provides the necessary functions to handle the fine-tuning process.\n",
+ " - We also import `SFTConfig` from the `trl` library, which allows us to configure the supervised fine-tuning parameters.\n",
+ "\n",
+ "2. **Dataset Selection:**\n",
+ " - We define the dataset name as `'community-datasets/fake_news_english'`. This dataset contains examples of fake news articles that we will use to fine-tune our model.\n",
+ "\n",
+ "3. **Prompt Configuration:**\n",
+ " - We create a `sftPromptConfig` object to specify how the training data is formatted.\n",
+ " - The `template` parameter defines the input format, and the `response_template` specifies how the model should generate outputs.\n",
+ " - The `use_chat_template` flag is set to `True` to format the inputs in a conversational style, which can be effective for chat-based models.\n",
+ "\n",
+ "4. **Training Configuration:**\n",
+ " - We define the training settings using `SFTConfig`. This includes parameters like batch size, learning rate, and the number of epochs.\n",
+ " - We also enable `fp16` (16-bit floating-point) training for faster computation and set `gradient_checkpointing` to save memory during training.\n",
+ "\n",
+ "5. **Model Selection:**\n",
+ " - The model weโre fine-tuning is `'TinyLlama/TinyLlama-1.1B-Chat-v1.0'`. This is a smaller, efficient model suitable for demonstration purposes on Colab.\n",
+ "\n",
+ "6. **Training the Model:**\n",
+ " - Finally, we call `sft_train` to start the fine-tuning process. This step will take a while to complete, as weโre training the model from scratch without any optimizations like quantization or LoRA.\n",
+ "\n",
+ "Running this cell will fine-tune the model locally on Colab, using the configurations weโve set up. This is ideal for quick experiments or when cloud resources are not available."
+ ],
+ "metadata": {
+ "id": "C0dDwmg4Rb3N"
+ }
+ },
{
"cell_type": "code",
"source": [
@@ -5738,7 +5779,7 @@
"id": "uKH1cxpkxFAr",
"outputId": "bae79adb-9ed2-49f6-c618-efdb66923cc3"
},
- "execution_count": 2,
+ "execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -5987,6 +6028,41 @@
}
]
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### โ๏ธ Training the Model on Cloud Servers\n",
+ "\n",
+ "In this section, weโre moving from local training to cloud-based training using Simplifineโs cloud infrastructure. This allows you to leverage powerful GPUs like the A100 for more intensive tasks, making it easier to handle larger models and datasets.\n",
+ "\n",
+ "1. **Importing the `train_utils` Module:**\n",
+ " - We start by importing the `train_utils` module from the `Simplifine` library. This module provides utilities to interact with Simplifine's cloud servers.\n",
+ "\n",
+ "2. **Model and API Configuration:**\n",
+ " - We select a different model for this cloud training: `'microsoft/Phi-3-mini-4k-instruct'`. This model is more powerful and well-suited for deployment on cloud GPUs.\n",
+ " - The `simplifine_api_key` is your unique key to access Simplifineโs cloud services. Ensure you have it ready.\n",
+ " - The `gpu_type` is set to `'a100'`, which specifies the type of GPU to be used in the cloud. The A100 is a high-performance GPU ideal for deep learning tasks.\n",
+ "\n",
+ " ### ๐ Need an API Key?\n",
+ " If you don't have an API key yet, you can [**request one here for free**](https://www.simplifine.com/api-key-interest). The turnaround time is just 24 hours, so you'll be up and running in no time!\n",
+ "\n",
+ "3. **Client Initialization:**\n",
+ " - We create a `Client` object using the API key and GPU type. This client will handle the communication with Simplifineโs cloud infrastructure, managing the training job on your behalf.\n",
+ "\n",
+ "4. **Defining the Training Job:**\n",
+ " - The `job_name` is set to `'fake_news_english_phi3'`, which uniquely identifies this training task.\n",
+ " - We then call the `sft_train_cloud` method on our `client` object. This method sends the training job to the cloud, using the model and configurations weโve defined earlier.\n",
+ "\n",
+ "5. **Cloud Training Setup:**\n",
+ " - We enable `use_zero=True` to utilize DeepSpeed's ZeRO optimization, allowing the model to scale effectively across multiple GPUs.\n",
+ " - We disable Distributed Data Parallel (DDP) for this job, which is appropriate when ZeRO is handling the distribution of data.\n",
+ "\n",
+ "Running this cell will initiate the training process on Simplifineโs cloud servers, allowing you to offload the heavy lifting to a powerful cloud infrastructure. This is ideal when working with larger models or when your local resources are insufficient.\n"
+ ],
+ "metadata": {
+ "id": "oehMA7hwRky5"
+ }
+ },
{
"cell_type": "code",
"source": [
@@ -5994,7 +6070,7 @@
"\n",
"# change name to phi 3\n",
"model_name = 'microsoft/Phi-3-mini-4k-instruct'\n",
- "simplifine_api_key = ''\n",
+ "simplifine_api_key = 'PUT YOUR OWN API KEY PROVIDED BY SIMPLIFINE'\n",
"gpu_type = 'a100'\n",
"client = train_utils.Client(simplifine_api_key, gpu_type)\n",
"\n",
@@ -6013,7 +6089,7 @@
},
"outputId": "d2510f4d-5246-4631-df37-8a741cf92240"
},
- "execution_count": 2,
+ "execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -6028,13 +6104,24 @@
{
"cell_type": "markdown",
"source": [
- "You can check the status of your job. The status can be any of the following:\n",
+ "### ๐ Checking the Status of Your Training Jobs\n",
+ "\n",
+ "After submitting your training job to Simplifineโs cloud servers, itโs important to monitor its status to ensure everything is running smoothly. In this section, weโll check the status of your most recent job.\n",
+ "\n",
+ "1. **Retrieving Job Status:**\n",
+ " - We call the `get_all_jobs` method on our `client` object. This method returns a list of all jobs associated with your API key, including their current statuses.\n",
"\n",
+ "2. **Displaying the Latest Job:**\n",
+ " - We loop through the latest job in the list and print its status. This gives you a quick overview of how your most recent training job is progressing.\n",
"\n",
- "```\n",
- "pending | in progress | stopped | completed\n",
- "```\n",
- "\n"
+ "3. **Understanding Job Statuses:**\n",
+ " - Your job can have one of the following statuses:\n",
+ " - `pending`: The job has been submitted and is waiting to start.\n",
+ " - `in progress`: The job is currently running.\n",
+ " - `stopped`: The job was stopped before completion, either manually or due to an error.\n",
+ " - `completed`: The job has successfully finished.\n",
+ "\n",
+ "Running this cell will display the status of your most recent job, helping you keep track of your training tasks on Simplifineโs cloud servers.\n"
],
"metadata": {
"id": "W88J_Ef7yaYG"
@@ -6054,7 +6141,7 @@
"id": "l70vZyPV6_AC",
"outputId": "b32db3fe-e353-4105-e8b7-63a772d7ccde"
},
- "execution_count": 3,
+ "execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -6068,7 +6155,20 @@
{
"cell_type": "markdown",
"source": [
- "To see how things are going, you can take a look at the logs, to see if there were any errors or what not."
+ "### ๐ Retrieving and Viewing Training Logs\n",
+ "\n",
+ "After checking the status of your training job, you might want to dive deeper into the details by viewing the training logs. These logs provide insights into the training process, including any issues or updates on the progress.\n",
+ "\n",
+ "1. **Getting the `job_id`:**\n",
+ " - We start by extracting the `job_id` of the last job from the status list. The `job_id` is a unique identifier for each training job, which weโll use to retrieve its logs.\n",
+ "\n",
+ "2. **Retrieving Logs:**\n",
+ " - We call the `get_train_logs` method on our `client` object, passing in the `job_id`. This method fetches the detailed logs for the specified job, giving you access to the complete training history.\n",
+ "\n",
+ "3. **Viewing the Logs:**\n",
+ " - Finally, we print the `response` from the logs, which contains detailed information about the training process. This includes updates, errors, and any other relevant messages from the training run.\n",
+ "\n",
+ "Running this cell will display the logs for your most recent job, allowing you to monitor and troubleshoot the training process effectively.\n"
],
"metadata": {
"id": "BDe93gbayl_n"
@@ -6090,7 +6190,7 @@
"id": "jt35FPNn8ADK",
"outputId": "1de668ed-718e-452d-eb85-0632d7652008"
},
- "execution_count": 4,
+ "execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -6552,6 +6652,27 @@
}
]
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### ๐ Downloading and Saving the Trained Model\n",
+ "\n",
+ "Once your training job is completed, the next step is to download the trained model so you can use it locally or for further fine-tuning.\n",
+ "\n",
+ "1. **Creating a Directory for the Model:**\n",
+ " - We begin by creating a new folder called `sf_trained_model_zero_phi`. This folder will serve as the destination for the downloaded model files.\n",
+ "\n",
+ "2. **Downloading the Model:**\n",
+ " - We use the `download_model` method on our `client` object to download the trained model from the cloud. The `job_id` is passed to specify which model to download, and we extract the files to the newly created directory.\n",
+ " \n",
+ " - **Tip:** This process might take some time depending on the size of the model, so feel free to take a break or grab a coffee while you wait! โ\n",
+ "\n",
+ "Running this cell will download your trained model and save it in the specified directory, making it ready for use in your next project or analysis.\n"
+ ],
+ "metadata": {
+ "id": "koKpp2XNU-y1"
+ }
+ },
{
"cell_type": "code",
"source": [
@@ -6571,7 +6692,7 @@
},
"outputId": "b88812a9-8f64-464e-d4c5-d2ace8814f08"
},
- "execution_count": 5,
+ "execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -6593,6 +6714,31 @@
}
]
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### ๐ Loading the Trained Model and Tokenizer\n",
+ "\n",
+ "Now that we've successfully downloaded the trained model, the next step is to load it into our environment so we can use it for inference or further fine-tuning.\n",
+ "\n",
+ "1. **Importing Required Libraries:**\n",
+ " - We import `AutoModelForCausalLM` and `AutoTokenizer` from the `transformers` library. These classes are used to load the model and tokenizer from the saved files.\n",
+ "\n",
+ "2. **Setting the Path:**\n",
+ " - We set the `path` variable to point to the directory where we saved the trained model (`'/content/sf_trained_model_zero_phi'`).\n",
+ "\n",
+ "3. **Loading the Model:**\n",
+ " - We use `AutoModelForCausalLM.from_pretrained(path)` to load the trained model from the specified path. This initializes the model so itโs ready for use.\n",
+ "\n",
+ "4. **Loading the Tokenizer:**\n",
+ " - Similarly, we load the tokenizer using `AutoTokenizer.from_pretrained(path)`. The tokenizer is essential for processing text input into a format that the model can understand.\n",
+ "\n",
+ "Running this cell will load both the trained model and tokenizer into your environment, allowing you to start generating text or continue fine-tuning with your freshly trained model."
+ ],
+ "metadata": {
+ "id": "mQ1fk9tJVJKy"
+ }
+ },
{
"cell_type": "code",
"source": [
@@ -6624,7 +6770,7 @@
},
"outputId": "9b14f28f-3376-45bc-82cb-c6b09a31aa6c"
},
- "execution_count": 6,
+ "execution_count": null,
"outputs": [
{
"output_type": "display_data",
@@ -6649,6 +6795,25 @@
}
]
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### ๐ Loading the Dataset\n",
+ "\n",
+ "Before we can use our trained model for inference or further fine-tuning, we need to load the dataset that weโve been working with.\n",
+ "\n",
+ "1. **Importing the Datasets Library:**\n",
+ " - We start by importing the `datasets` library, which provides easy access to a wide range of datasets, including the one we've been using for training.\n",
+ "\n",
+ "2. **Loading the Dataset:**\n",
+ " - We load the dataset using the `load_dataset` function from the `datasets` library. The `dataset_name` variable contains the name of the dataset we specified earlier in our code.\n",
+ "\n",
+ "Running this cell will load the dataset into your environment, making it ready for evaluation, inference,"
+ ],
+ "metadata": {
+ "id": "UZ-1si0bVOMC"
+ }
+ },
{
"cell_type": "code",
"source": [
@@ -6698,7 +6863,7 @@
"id": "Orm2RTPh1s-s",
"outputId": "34794037-e2bb-4e64-cf52-445e61a7aaf6"
},
- "execution_count": 8,
+ "execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -6756,6 +6921,39 @@
}
]
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### ๐ง Generating Text with the Trained Model\n",
+ "\n",
+ "Now that we've loaded both the model and the dataset, itโs time to generate some text using our trained model. In this section, weโll configure the generation settings and produce some sample outputs.\n",
+ "\n",
+ "1. **Importing Inference Tools:**\n",
+ " - We import `inference_tools` from the `simplifine_alpha` library. This module provides the necessary tools to generate text using the model weโve fine-tuned.\n",
+ "\n",
+ "2. **Configuring Text Generation:**\n",
+ " - We create a `GenerationConfig` object to define how the model should generate text. This configuration includes:\n",
+ " - `prompt_template` and `response_template`: Templates for how the inputs and outputs are formatted.\n",
+ " - `keys`: Specifies the data keys used in the templates.\n",
+ " - `train_type`: Indicates that we're using supervised fine-tuning (`sft`).\n",
+ " - `max_length`: The maximum length of the generated sequences.\n",
+ " - `num_return_sequences`: How many sequences to generate.\n",
+ " - `do_sample`, `top_k`, `top_p`, `temperature`: Parameters that control the randomness and diversity of the generated text.\n",
+ "\n",
+ "3. **Generating Text:**\n",
+ " - We call `generate_from_pretrained` using our fine-tuned model, tokenizer, and the generation configuration. We also pass in a small sample of the dataset to generate text based on the training data.\n",
+ " \n",
+ " - **Note:** Weโre using only the first three examples from the training dataset (`dataset['train'][:3]`) for quick testing.\n",
+ "\n",
+ "4. **Displaying the Generated Text:**\n",
+ " - Finally, we print the generated text, which provides a glimpse into how well the model has learned to detect fake news.\n",
+ "\n",
+ "Running this cell will generate text using your trained model, showcasing its ability to produce outputs based on the fine-tuned dataset. This is where you can see the real impact of your training efforts!"
+ ],
+ "metadata": {
+ "id": "tHGpRwU6VVav"
+ }
+ },
{
"cell_type": "code",
"source": [
@@ -6784,7 +6982,7 @@
"id": "8KWnTV9w1OMQ",
"outputId": "e78d14ca-9b91-4412-8d16-ece24b3ffe7d"
},
- "execution_count": 11,
+ "execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -6803,4 +7001,4 @@
]
}
]
-}
\ No newline at end of file
+}
|