Commit 0668f8a
Update docs for LLM app deployment.
qiuosier committed Oct 25, 2024
1 parent ad34d42
Showing 4 changed files with 180 additions and 127 deletions.

##################################
Deploy LLM Applications and Agents
##################################

Oracle ADS supports the deployment of LLM applications and agents, including LangChain applications, to OCI Data Science Model Deployment.

.. versionadded:: 2.9.1

.. admonition:: IAM Policies
    :class: note

    Ensure that you have configured the necessary `policies for model deployments <https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-policies-auth.htm#model_dep_policies_auth>`_.
    For example, the following policy allows the dynamic group to use ``resource_principal`` to create model deployments.

    .. code-block:: shell

        allow dynamic-group <dynamic-group-name> to manage data-science-model-deployments in compartment <compartment-name>

The process of deploying LLM apps and agents involves:

* Preparing your applications as a model artifact
* Registering the model artifact with the OCI Data Science Model Catalog
* Building a container image with the dependencies, and pushing the image to OCI Container Registry
* Deploying the model artifact using the container image with OCI Data Science Model Deployment

To get you started, we provide templates for `model artifacts <https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/LLM/deployment/model_artifacts>`_ and a `container image <https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/LLM/deployment/container>`_, so that you can focus on building your applications and agents.

.. figure:: figures/workflow.png
    :width: 800

Prepare Model Artifacts
***********************

You can prepare your model artifact based on the `model artifact template <https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/LLM/deployment/model_artifacts>`_.

First, create a template folder locally containing the `score.py <https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/LLM/deployment/model_artifacts/score.py>`_ file. For example, we can call it ``llm_apps_template``.

.. code-block:: text

    llm_apps_template
    ├── score.py

The ``score.py`` file serves as the entry point for invoking your application with a JSON payload.
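
While the actual template in the repository is more elaborate, the following hypothetical sketch illustrates the kind of dispatch logic ``score.py`` implements: load the requested module and call its ``invoke()`` entrypoint (the names and details here are illustrative, not the template itself).

.. code-block:: python

    # Hypothetical, simplified sketch of the score.py dispatch logic.
    import importlib
    import os
    import traceback
    import uuid

    def predict(data: dict) -> dict:
        """Invoke the requested application module with the given inputs."""
        module_name = data.get("module", os.environ.get("DEFAULT_MODULE", ""))
        request_id = str(uuid.uuid4())
        try:
            # Import e.g. "translate.py" as module "translate" and call invoke().
            module = importlib.import_module(module_name.removesuffix(".py"))
            outputs = module.invoke(data.get("inputs"))
            return {"outputs": outputs, "error": None, "traceback": None, "id": request_id}
        except Exception as exc:
            return {"outputs": None, "error": str(exc), "traceback": traceback.format_exc(), "id": request_id}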

Next, you can use ADS to create a generic model and save a copy of the template to another folder (e.g. ``my_apps``), which will be uploaded as the model artifact.

.. code-block:: python

    from ads.model.generic_model import GenericModel

    llm_app = GenericModel.from_model_artifact(
        uri="llm_apps_template",  # Contains the model artifact templates
        artifact_dir="my_apps",   # Location for the new model artifacts
        model_input_serializer="cloudpickle",
    )
    llm_app.reload_runtime_info()

Then, you can add your own applications to the ``my_apps`` folder. Here are some requirements:
* Each application should be a Python module.
* Each module should have an ``invoke()`` function as the entrypoint.
* The ``invoke()`` function should take a dictionary and return another dictionary.
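
For instance, a minimal module satisfying this contract could look like the following sketch (the ``echo.py`` name and behavior are purely illustrative, not part of the template repository):

.. code-block:: python

    # echo.py -- a hypothetical, minimal application module.
    def invoke(inputs: dict) -> dict:
        # The value of the "inputs" key from the request payload arrives here;
        # the return value is placed under "outputs" in the response.
        return {"echo": inputs}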

For example, the following is a LangChain application that translates English into French using a prompt template and an output parser. In this example, we save it as ``translate.py`` under the ``my_apps`` folder:

.. code-block:: python

    import os

    import ads
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.output_parsers import StrOutputParser
    from ads.llm import ChatOCIModelDeploymentVLLM

    ads.set_auth(auth="resource_principal")

    llm = ChatOCIModelDeploymentVLLM(
        model="odsc-llm",
        # The LLM_ENDPOINT environment variable should be set to a model deployment endpoint.
        endpoint=os.environ["LLM_ENDPOINT"],
        # Optionally, you can specify additional keyword arguments for the model, e.g. temperature.
        temperature=0.1,
    )

    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "human",
                "You are a helpful assistant to translate English into French. Respond with only the translation.\n"
                "{input}",
            ),
        ]
    )

    chain = prompt | llm | StrOutputParser()

    def invoke(message):
        return chain.invoke({"input": message})

The ``llm`` model in this example uses a chat model deployed with `AI Quick Actions <https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/ai-quick-actions/model-deployment-tips.md>`_.
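
Since each application is a plain Python module, you can also exercise it directly in a local Python session before packaging it (a hypothetical smoke test, assuming ``LLM_ENDPOINT`` is set and the referenced model deployment is reachable):

.. code-block:: python

    # Run from the my_apps folder so the module can be imported.
    from translate import invoke

    print(invoke("Hello!"))  # Expected output similar to: "Bonjour!"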

You can find a few example applications in the `model artifact template <https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/LLM/deployment/model_artifacts>`_, including `tool calling with OCI generative AI <https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/LLM/deployment/model_artifacts/exchange_rate.py>`_ and `LangGraph multi-agent example <https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/LLM/deployment/model_artifacts/graph.py>`_.

Once you have added your applications, you can call the ``verify()`` function to test or debug them locally:

.. code-block:: python

    llm_app.verify({
        "inputs": "Hello!",
        "module": "translate.py"
    })

Note that with the default ``score.py`` template, you will invoke your application with two keys:

* ``module``: The module in the model artifact (the ``my_apps`` folder) containing the application to be invoked. Here we are using the ``translate.py`` example. You can specify a default module using the ``DEFAULT_MODULE`` environment variable.
* ``inputs``: The payload for your application module. This example uses a string, but you can use a list or any other JSON payload for your application.

The response will have the following format:

.. code-block:: python

    {
        "outputs": "The outputs returned by invoking your app/agent",
        "error": "Error message, if any.",
        "traceback": "Traceback, if any.",
        "id": "The ID for identifying the request.",
    }

If there is an error when invoking your app/agent, the ``error`` message along with the ``traceback`` will be returned in the response.

Register the Model Artifact
***************************

Once your apps and agents are ready, you need to save them to the OCI Data Science Model Catalog before deployment:

.. code-block:: python3

    llm_app.save(display_name="LLM Apps", ignore_introspection=True)

Build Container Image
*********************

Before deploying the model, you will need to build a container image with the dependencies for your apps and agents.

To configure your environment for pushing images to OCI Container Registry (OCIR), refer to the OCIR documentation for `Pushing Images Using the Docker CLI <https://docs.oracle.com/en-us/iaas/Content/Registry/Tasks/registrypushingimagesusingthedockercli.htm>`_.

The `container image template <https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/LLM/deployment/container>`_ contains files for building a container image for the OCI Data Science Model Deployment service. You can add your dependencies to the ``requirements.txt`` file. You may also modify the ``Dockerfile`` if you need to add system libraries.

.. code-block:: bash

    docker build -t <image-name:tag> .

Once the image is built, you can push it to OCI Container Registry:

.. code-block:: bash

    docker push <image-name:tag>
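
Note that pushing to OCIR requires the image tag to include the registry hostname for your region and your tenancy namespace; the region key, namespace, and repository name below are placeholders to adjust for your tenancy:

.. code-block:: bash

    # Authenticate to OCIR using an auth token as the password.
    docker login <region-key>.ocir.io

    # Tag the local image with the full OCIR path, then push it.
    docker tag <image-name:tag> <region-key>.ocir.io/<tenancy-namespace>/<repo-name>:<tag>
    docker push <region-key>.ocir.io/<tenancy-namespace>/<repo-name>:<tag>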

Deploy as Model Deployment
**************************

To deploy the model, simply call the ``deploy()`` function with your settings:

* For most applications, a CPU shape is sufficient.
* Specify the log group OCID and log OCID to enable logging for the deployment.
* `Custom networking <https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-create-cus-net.htm>`_ with internet access is required for accessing external APIs or OCI Generative AI APIs in a different region.
* Add environment variables as needed by your application, including any API keys or endpoints.
* You may set the ``DEFAULT_MODULE`` environment variable to specify the application module to invoke by default.

.. code-block:: python3

    import os

    llm_app.deploy(
        display_name="LLM Apps",
        deployment_instance_shape="VM.Standard.E4.Flex",
        deployment_log_group_id="<log_group_ocid>",
        deployment_predict_log_id="<log_ocid>",
        deployment_access_log_id="<log_ocid>",
        deployment_image="<image-name:tag>",
        # Custom networking with internet access is needed for external API calls.
        deployment_instance_subnet_id="<subnet_ocid>",
        # Add environment variables as needed by your application.
        # The following are just examples.
        environment_variables={
            "TAVILY_API_KEY": os.environ["TAVILY_API_KEY"],
            "PROJECT_COMPARTMENT_OCID": os.environ["PROJECT_COMPARTMENT_OCID"],
            "LLM_ENDPOINT": os.environ["LLM_ENDPOINT"],
            "DEFAULT_MODULE": "translate.py",
        },
    )
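
Once the deployment completes, you can read the predict endpoint from the deployment object. A brief sketch, assuming the ``model_deployment`` attribute exposed by the ADS ``GenericModel`` (verify against your ADS version):

.. code-block:: python

    # Assumption: after deploy(), ADS exposes the deployment on the model object.
    endpoint = llm_app.model_deployment.url + "/predict"
    print(endpoint)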

Invoking the Deployment
***********************

Once the deployment is active, you can invoke the application with HTTP requests. For example:

.. code-block:: python3

    import oci
    import requests

    # The predict endpoint of your model deployment (see the deploy step).
    endpoint = "<model_deployment_url>/predict"

    response = requests.post(
        endpoint,
        json={
            "inputs": "Hello!",
        },
        auth=oci.auth.signers.get_resource_principals_signer(),
    )
    response.json()

The response will be similar to the following:

.. code-block:: python3

    {
        'error': None,
        'id': 'fa3d7111-326f-4736-a8f4-ed5b21654534',
        'outputs': 'Bonjour!',
        'traceback': None
    }

Alternatively, you can use the OCI CLI to invoke the model deployment. Remember to replace ``<model_deployment_url>`` with the actual model deployment URL, which you can find in the output from the deploy step.

.. code-block:: shell

    oci raw-request --http-method POST --target-uri <model_deployment_url>/predict --request-body '{"inputs": "Hello!"}' --auth resource_principal