Commit

add/lamini test
nitya committed Apr 15, 2024
1 parent 2a8340b commit 2c741d7
Showing 6 changed files with 294 additions and 111 deletions.
3 changes: 3 additions & 0 deletions .env.sample
@@ -11,5 +11,8 @@ AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT='<add your embeddings model name here>'
## Hugging Face
HUGGING_FACE_API_KEY='<add your HuggingFace API or token here>'

## Lamini
LAMINI_API_KEY='<add your Lamini API key here>'

## GitHub Personal Access Token
30DAYSOF_PAT='<add your GitHub Personal Access Token here>'
160 changes: 160 additions & 0 deletions notebooks/400/400-00-llm-setup.ipynb
@@ -467,6 +467,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## 3. Hugging Face\n",
"\n",
"We should use the [latest documentation](https://huggingface.co/docs) and focus initially on the [Text Generation Inference](https://huggingface.co/docs/text-generation-inference/index) capability. We can use this in two ways:\n",
@@ -831,6 +833,164 @@
")\n",
"print(\"Response:\\n \", data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## 4. Lamini\n",
"\n",
"Lamini is an LLM platform [optimized for enterprise fine tuning](https://lamini-ai.github.io/about/).\n",
" - Explore the [Lamini SDK](https://github.com/lamini-ai/lamini-sdk/)\n",
" - Explore tools for [better inference](https://lamini-ai.github.io/inference/quick_tour/) \n",
" - Exolore tools for [better training](https://lamini-ai.github.io/training/quick_tour/)\n",
"\n",
"To get started:\n",
" - Create an account and get an API key\n",
" - Validate the key works with sample questions as shown\n",
"\n",
"Note: The free account only gives you 200 calls _total_ (no refresh) so use it wisely.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Install the lamini package\n",
"# !pip install --upgrade lamini\n",
"\n",
"## Configure the API key\n",
"import lamini\n",
"import os\n",
"lamini.api_key = os.getenv(\"LAMINI_API_KEY\")"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-04-15:03:33:11,976 INFO [lamini.py:33] Using 3.10 InferenceQueue Interface\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"status code: 513 https://api.lamini.ai/v1/completions\n"
]
},
{
"ename": "APIError",
"evalue": "API error {'detail': \"error_id: 243549526076307102879929981439376352577: Downloading the 'Intel/neural-chat-7b-v3-1' model. Please try again in a few minutes.\"}",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mHTTPError\u001b[0m Traceback (most recent call last)",
"File \u001b[0;32m~/.python/current/lib/python3.10/site-packages/lamini/api/rest_requests.py:132\u001b[0m, in \u001b[0;36mmake_web_request\u001b[0;34m(key, url, http_method, json)\u001b[0m\n\u001b[1;32m 131\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 132\u001b[0m \u001b[43mresp\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mraise_for_status\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 133\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m requests\u001b[38;5;241m.\u001b[39mexceptions\u001b[38;5;241m.\u001b[39mHTTPError \u001b[38;5;28;01mas\u001b[39;00m e:\n",
"File \u001b[0;32m~/.local/lib/python3.10/site-packages/requests/models.py:1021\u001b[0m, in \u001b[0;36mResponse.raise_for_status\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 1020\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m http_error_msg:\n\u001b[0;32m-> 1021\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m HTTPError(http_error_msg, response\u001b[38;5;241m=\u001b[39m\u001b[38;5;28mself\u001b[39m)\n",
"\u001b[0;31mHTTPError\u001b[0m: 513 Server Error: for url: https://api.lamini.ai/v1/completions",
"\nDuring handling of the above exception, another exception occurred:\n",
"\u001b[0;31mAPIError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[23], line 22\u001b[0m\n\u001b[1;32m 17\u001b[0m \u001b[38;5;66;03m## Option 1: Use a named model to get an endpoint for requests\u001b[39;00m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;66;03m## Models may not be pre-loaded in HF inference service - you will then see this error, so retry:\u001b[39;00m\n\u001b[1;32m 19\u001b[0m \u001b[38;5;66;03m## Downloading the 'Intel/neural-chat-7b-v3-1' model. \u001b[39;00m\n\u001b[1;32m 20\u001b[0m \u001b[38;5;66;03m## Please try again in a few minutes.\u001b[39;00m\n\u001b[1;32m 21\u001b[0m llm \u001b[38;5;241m=\u001b[39m lamini\u001b[38;5;241m.\u001b[39mLamini(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mIntel/neural-chat-7b-v3-1\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m---> 22\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[43mllm\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mgenerate\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mHow to convert inches to centimeters? Answer in 2 sentences\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m)\n",
"File \u001b[0;32m~/.python/current/lib/python3.10/site-packages/lamini/api/lamini.py:71\u001b[0m, in \u001b[0;36mLamini.generate\u001b[0;34m(self, prompt, model_name, output_type, max_tokens, max_new_tokens, callback, metadata)\u001b[0m\n\u001b[1;32m 63\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(prompt, \u001b[38;5;28mstr\u001b[39m) \u001b[38;5;129;01mor\u001b[39;00m (\u001b[38;5;28misinstance\u001b[39m(prompt, \u001b[38;5;28mlist\u001b[39m) \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(prompt) \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m1\u001b[39m):\n\u001b[1;32m 64\u001b[0m req_data \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmake_llm_req_map(\n\u001b[1;32m 65\u001b[0m prompt\u001b[38;5;241m=\u001b[39mprompt,\n\u001b[1;32m 66\u001b[0m model_name\u001b[38;5;241m=\u001b[39mmodel_name \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmodel_name,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 69\u001b[0m max_new_tokens\u001b[38;5;241m=\u001b[39mmax_new_tokens,\n\u001b[1;32m 70\u001b[0m )\n\u001b[0;32m---> 71\u001b[0m result \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcompletion\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mgenerate\u001b[49m\u001b[43m(\u001b[49m\u001b[43mreq_data\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 72\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m output_type \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 73\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(prompt, \u001b[38;5;28mlist\u001b[39m) \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(prompt) \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m1\u001b[39m:\n",
"File \u001b[0;32m~/.python/current/lib/python3.10/site-packages/lamini/api/utils/completion.py:15\u001b[0m, in \u001b[0;36mCompletion.generate\u001b[0;34m(self, params)\u001b[0m\n\u001b[1;32m 14\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mgenerate\u001b[39m(\u001b[38;5;28mself\u001b[39m, params):\n\u001b[0;32m---> 15\u001b[0m resp \u001b[38;5;241m=\u001b[39m \u001b[43mmake_web_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 16\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mapi_key\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mapi_prefix\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mcompletions\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mpost\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mparams\u001b[49m\n\u001b[1;32m 17\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m resp\n",
"File \u001b[0;32m~/.python/current/lib/python3.10/site-packages/lamini/api/rest_requests.py:183\u001b[0m, in \u001b[0;36mmake_web_request\u001b[0;34m(key, url, http_method, json)\u001b[0m\n\u001b[1;32m 181\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m description \u001b[38;5;241m==\u001b[39m {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mdetail\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m\"\u001b[39m}:\n\u001b[1;32m 182\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m APIError(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m500 Internal Server Error\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m--> 183\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m APIError(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mAPI error \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mdescription\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 185\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m resp\u001b[38;5;241m.\u001b[39mjson()\n",
"\u001b[0;31mAPIError\u001b[0m: API error {'detail': \"error_id: 243549526076307102879929981439376352577: Downloading the 'Intel/neural-chat-7b-v3-1' model. Please try again in a few minutes.\"}"
]
}
],
"source": [
"## Validate setup with a named model from Hugging Face \n",
"## By default the free-tier user has support for these base models (identified in error message)\n",
"'''\n",
"'hf-internal-testing/tiny-random-gpt2', \n",
"'EleutherAI/pythia-70m', 'EleutherAI/pythia-70m-deduped', 'EleutherAI/pythia-70m-v0', \n",
"'EleutherAI/pythia-70m-deduped-v0', 'EleutherAI/neox-ckpt-pythia-70m-deduped-v0', 'EleutherAI/neox-ckpt-pythia-70m-v1', \n",
"'EleutherAI/neox-ckpt-pythia-70m-deduped-v1', 'EleutherAI/gpt-neo-125m', 'EleutherAI/pythia-160m', \n",
"'EleutherAI/pythia-160m-deduped', 'EleutherAI/pythia-160m-deduped-v0', 'EleutherAI/neox-ckpt-pythia-70m', \n",
"'EleutherAI/neox-ckpt-pythia-160m', 'EleutherAI/neox-ckpt-pythia-160m-deduped-v1', 'EleutherAI/pythia-2.8b', \n",
"'EleutherAI/pythia-410m', 'EleutherAI/pythia-410m-v0', 'EleutherAI/pythia-410m-deduped', \n",
"'EleutherAI/pythia-410m-deduped-v0', 'EleutherAI/neox-ckpt-pythia-410m', 'EleutherAI/neox-ckpt-pythia-410m-deduped-v1', \n",
"'cerebras/Cerebras-GPT-111M', 'cerebras/Cerebras-GPT-256M', 'meta-llama/Llama-2-7b-hf', \n",
"'meta-llama/Llama-2-7b-chat-hf', 'meta-llama/Llama-2-13b-chat-hf', 'meta-llama/Llama-2-70b-chat-hf', \n",
"'Intel/neural-chat-7b-v3-1', 'mistralai/Mistral-7B-Instruct-v0.1', 'microsoft/phi-2'\n",
"'''\n",
"\n",
"## Option 1: Use a named model to get an endpoint for requests\n",
"## Models may not be pre-loaded in HF inference service - you will then see this error, so retry:\n",
"## Downloading the 'cerebras/Cerebras-GPT-111M' model. \n",
"## Please try again in a few minutes.\n",
"llm = lamini.Lamini(\"cerebras/Cerebras-GPT-111M\")\n",
"print(llm.generate(\"How to convert inches to centimeters? Answer in 2 sentences\"))"
]
},
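{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 513 error above is transient: the endpoint returns it while the requested model is still being downloaded. Below is a minimal retry sketch (not part of the Lamini SDK - `generate_with_retry` is a hypothetical helper, and it matches the error loosely on its message text rather than importing a specific exception class):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Hypothetical retry helper for the transient 'model is downloading' (513) error\n",
"import time\n",
"\n",
"def generate_with_retry(model_name, prompt, retries=5, wait_secs=60):\n",
"    llm = lamini.Lamini(model_name)\n",
"    for attempt in range(1, retries + 1):\n",
"        try:\n",
"            return llm.generate(prompt)\n",
"        except Exception as e:\n",
"            # Only retry the 'Downloading the ... model. Please try again' case\n",
"            if \"try again\" not in str(e).lower():\n",
"                raise\n",
"            print(f\"Model still loading (attempt {attempt}/{retries}); waiting {wait_secs}s...\")\n",
"            time.sleep(wait_secs)\n",
"    raise TimeoutError(f\"{model_name} was not ready after {retries} attempts\")\n",
"\n",
"# print(generate_with_retry(\"Intel/neural-chat-7b-v3-1\", \"How to convert inches to centimeters? Answer in 2 sentences\"))"
]
},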
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-04-15:03:29:20,700 INFO [lamini.py:33] Using 3.10 InferenceQueue Interface\n"
]
},
{
"data": {
"text/plain": [
"' To convert inches to centimeters, you can multiply the number of inches by 2.54. For example, 1 inch is equal to 2.54 centimeters.'"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"## Option 2: Use pre-defined Mistral runner\n",
"llm = lamini.MistralRunner()\n",
"llm(\"How to convert inches to centimeters? Answer in 2 sentences\")"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-04-15:03:29:35,899 INFO [lamini.py:33] Using 3.10 InferenceQueue Interface\n"
]
},
{
"data": {
"text/plain": [
"' Of course! To convert inches to centimeters, you can use the following conversion factor: 1 inch = 2.54 centimeters. Therefore, if you want to convert a measurement in inches to centimeters, you can simply multiply it by 2.54.'"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"## Option 3: Use pre-defined LLama-2 runner\n",
"llama = lamini.LlamaV2Runner()\n",
"llama(\"How to convert inches to centimeters? Answer in 2 sentences\")"
]
}
],
"metadata": {
127 changes: 127 additions & 0 deletions notebooks/400/400-02-dl-fine-tuning.ipynb
@@ -0,0 +1,127 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 400.9 | Fine Tuning Large Language Models\n",
"\n",
" **This notebook is for my personal use only** - all sources are cited. If you are following the same learning journey please reference original sources instead. Key Resources used include:\n",
"1. [Fine Tuning Large Language Models](https://www.deeplearning.ai/short-courses/finetuning-large-language-models/), _DeepLearning.AI_ (2024)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## Learn: Concepts\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What is Fine Tuning?\n",
"\n",
"Process of turning a general-purpose pre-trained model into a speciliazed version suited for a particular task. Analogy: a general practitioner (GP) vs. a specialist (cardiologist).\n",
"\n",
"Both fine-tuning and prompt-engineering are techniques to improve the quality of a model's response to a user request but differ in cost and context:\n",
" - Prompt engineering is easier to implement, has less upfront cost\n",
" - Prompt engineering has data limitations (fewer examples), more hallucinations\n",
" - Fine-tuning is more effective but has upfront compute & data processing costs\n",
" - Fine-tuning requires high-quality data and more expertise in model training\n",
"\n",
"### Why Fine Tune?\n",
"1. More cost-effective - (per-request) frees up space used by examples, context\n",
"1. More consistent outputs - understands app requirements, response formats\n",
"1. Reduce hallucinations - grounded in relevant data, critical for enterprise\n",
"1. Improve data privacy - reduce breaches, data leakage in training\n",
"1. Better performance - reliability, lower latency, better moderation options"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## Apply: Tasks\n",
"\n",
"> The following exercises should help walk through the entire process of fine-tuning a large language model using a specific provider and model endpoint."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"### T1: Setup Dev Environment\n",
"\n",
"To explore ideas in practice, we need access to a relevant Large Language Model (LLM) and provider-hosted endpoint (API). Use the [LLM Setup](./400-00-aoia-intro.ipynb) notebook to configure environment variables and validate setup for supported providers including:\n",
" - Open AI\n",
" - Azure Open AI\n",
" - Hugging Face\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### T2: Lamini Example\n",
"\n",
"The example from the DeepLearning.AI course uses the following libraries:\n",
"- PyTorch (Meta) - lowest level\n",
"- Transformers (Hugging Face) - abstracts PyTorch for easier use\n",
"- Llama (Lamini) - abstracts working with LLama models\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Install the lamini package\n",
"import lamini\n",
"import os\n",
"lamini.api_key = os.getenv(\"LAMINI_API_KEY\")\n",
"\n",
"# Test the installation\n",
"from llama import BasicModelRunner\n",
"non_ft_model = BasicModelRunner(\"meta-llama/LLama-3-7b-hf\")\n",
"print(non_ft_model(\"Oh say can you see\"))"
]
},
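{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what instruction tuning buys us, we can compare the base model's completion above with a chat-tuned variant. A minimal sketch, assuming `meta-llama/Llama-2-7b-chat-hf` (from the supported-models list in the setup notebook) is available on this account tier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compare against an instruction-tuned (chat) variant of the same base model\n",
"# Assumption: this chat model is available on the current account tier\n",
"ft_model = BasicModelRunner(\"meta-llama/Llama-2-7b-chat-hf\")\n",
"print(ft_model(\"Oh say can you see\"))"
]
},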
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}