diff --git a/articles/ai-services/cognitive-services-container-support.md b/articles/ai-services/cognitive-services-container-support.md index 9f3376cdde..c74846de3e 100644 --- a/articles/ai-services/cognitive-services-container-support.md +++ b/articles/ai-services/cognitive-services-container-support.md @@ -15,7 +15,7 @@ keywords: on-premises, Docker, container, Kubernetes # What are Azure AI containers? -Azure AI services provides several [Docker containers](https://www.docker.com/what-container) that let you use the same APIs that are available in Azure, on-premises. Using these containers gives you the flexibility to bring Azure AI services closer to your data for compliance, security or other operational reasons. Container support is currently available for a subset of Azure AI services. +Azure AI services provide several [Docker containers](https://www.docker.com/what-container) that let you use the same APIs that are available in Azure, on-premises. Using these containers gives you the flexibility to bring Azure AI services closer to your data for compliance, security or other operational reasons. Container support is currently available for a subset of Azure AI services. > [!VIDEO https://www.youtube.com/embed/hdfbn4Q8jbo] @@ -48,7 +48,7 @@ Azure AI containers provide the following set of Docker containers, each of whic | Service | Container | Description | Availability | |--|--|--|--| | [LUIS][lu-containers] | **LUIS** ([image](https://mcr.microsoft.com/product/azure-cognitive-services/language/luis/about)) | Loads a trained or published Language Understanding model, also known as a LUIS app, into a docker container and provides access to the query predictions from the container's API endpoints. You can collect query logs from the container and upload these back to the [LUIS portal](https://www.luis.ai) to improve the app's prediction accuracy. | Generally available.
This container can also [run in disconnected environments](containers/disconnected-containers.md). | -| [Language service][ta-containers-keyphrase] | **Key Phrase Extraction** ([image](https://mcr.microsoft.com/product/azure-cognitive-services/textanalytics/keyphrase/about)) | Extracts key phrases to identify the main points. For example, for the input text "The food was delicious and there were wonderful staff", the API returns the main talking points: "food" and "wonderful staff". | Generally available.
This container can also [run in disconnected environments](containers/disconnected-containers.md). | +| [Language service][ta-containers-keyphrase] | **Key Phrase Extraction** ([image](https://mcr.microsoft.com/product/azure-cognitive-services/textanalytics/keyphrase/about)) | Extracts key phrases to identify the main points. For example, for the input text "The food was delicious and there were wonderful staff," the API returns the main talking points: "food" and "wonderful staff". | Generally available.
This container can also [run in disconnected environments](containers/disconnected-containers.md). |
| [Language service][ta-containers-language] | **Text Language Detection** ([image](https://mcr.microsoft.com/product/azure-cognitive-services/textanalytics/language/about)) | For up to 120 languages, detects which language the input text is written in and reports a single language code for every document submitted on the request. The language code is paired with a score indicating the confidence of the detection. | Generally available.

This container can also [run in disconnected environments](containers/disconnected-containers.md). | | [Language service][ta-containers-sentiment] | **Sentiment Analysis** ([image](https://mcr.microsoft.com/product/azure-cognitive-services/textanalytics/sentiment/about)) | Analyzes raw text for clues about positive or negative sentiment. This version of sentiment analysis returns sentiment labels (for example *positive* or *negative*) for each document and sentence within it. | Generally available.
This container can also [run in disconnected environments](containers/disconnected-containers.md). | | [Language service][ta-containers-health] | **Text Analytics for health** ([image](https://mcr.microsoft.com/product/azure-cognitive-services/textanalytics/healthcare/about))| Extract and label medical information from unstructured clinical text. | Generally available | diff --git a/articles/ai-services/openai/assistants-reference-messages.md b/articles/ai-services/openai/assistants-reference-messages.md index 2d95c2088f..dd15ce091e 100644 --- a/articles/ai-services/openai/assistants-reference-messages.md +++ b/articles/ai-services/openai/assistants-reference-messages.md @@ -21,7 +21,7 @@ This article provides reference documentation for Python and REST for the new As ## Create message ```http -POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages?api-version=2024-05-01-preview +POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages?api-version=2024-08-01-preview ``` Create a message. @@ -38,7 +38,7 @@ Create a message. |--- |--- |--- |--- | | `role` | string | Required | The role of the entity that is creating the message. Can be `user` or `assistant`. `user` indicates the message is sent by an actual user and should be used in most cases to represent user-generated messages. `assistant` indicates the message is generated by the assistant. Use this value to insert messages from the assistant into the conversation. | | `content` | string | Required | The content of the message. | -| `file_ids` | array | Optional | A list of File IDs that the message should use. There can be a maximum of 10 files attached to a message. Useful for tools like retrieval and code_interpreter that can access and use files. | +| `attachments` | array | Optional | A list of files attached to the message, and the tools they should be added to. | | `metadata` | map | Optional | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. | ### Returns @@ -54,7 +54,7 @@ from openai import AzureOpenAI client = AzureOpenAI( api_key=os.getenv("AZURE_OPENAI_API_KEY"), - api_version="2024-05-01-preview", + api_version="2024-08-01-preview", azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") ) @@ -69,7 +69,7 @@ print(thread_message) # [REST](#tab/rest) ```console -curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages?api-version=2024-05-01-preview \ +curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages?api-version=2024-08-01-preview \ -H "api-key: $AZURE_OPENAI_API_KEY" \ -H 'Content-Type: application/json' \ -d '{ @@ -83,7 +83,7 @@ curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/mess ## List messages ```http -GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages?api-version=2024-05-01-preview +GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages?api-version=2024-08-01-preview ``` Returns a list of messages for a given thread. @@ -102,6 +102,7 @@ Returns a list of messages for a given thread. | `limit` | integer | Optional - Defaults to 20 |A limit on the number of objects to be returned. 
Limit can range between 1 and 100, and the default is 20.|
| `order` | string | Optional - Defaults to desc |Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order.|
| `after` | string | Optional | A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.|
+| `run_id` | string | Optional | Filter messages by the run ID that generated them. |
| `before` | string | Optional | A cursor for use in pagination. before is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list.|

### Returns

@@ -117,7 +118,7 @@ from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
-    api_version="2024-05-01-preview",
+    api_version="2024-08-01-preview",
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
    )

thread_messages = client.beta.threads.messages.list("thread_abc123")
print(thread_messages.data)

@@ -129,7 +130,7 @@ print(thread_messages.data)
# [REST](#tab/rest)

```console
-curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages?api-version=2024-05-01-preview \
+curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages?api-version=2024-08-01-preview \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H 'Content-Type: application/json'
```

@@ -139,7 +140,7 @@ curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/mess
## Retrieve message

```http
-GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages/{message_id}?api-version=2024-05-01-preview
+GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages/{message_id}?api-version=2024-08-01-preview
```

Retrieves a message.

@@ -180,7 +181,7 @@ print(message)
# [REST](#tab/rest)

```console
-curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages/{message_id}?api-version=2024-05-01-preview \
+curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages/{message_id}?api-version=2024-08-01-preview \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H 'Content-Type: application/json'
```

@@ -190,7 +191,7 @@ curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/mess
## Modify message

```http
-POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages/{message_id}?api-version=2024-05-01-preview
+POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages/{message_id}?api-version=2024-08-01-preview
```

Modifies a message.

@@ -206,7 +207,7 @@ Modifies a message.

|Parameter| Type | Required | Description |
|---|---|---|---|
-| metadata | map| Optional | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.|
+| `metadata` | map| Optional | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. 

Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.|

### Returns

@@ -219,7 +220,7 @@ from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
-    api_version="2024-05-01-preview",
+    api_version="2024-08-01-preview",
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
    )

@@ -237,7 +238,7 @@ print(message)
# [REST](#tab/rest)

```console
-curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages/{message_id}?api-version=2024-05-01-preview
+curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages/{message_id}?api-version=2024-08-01-preview \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H 'Content-Type: application/json' \
@@ -256,7 +257,7 @@ curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/mess

```http
-DELETE https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages/{message_id}?api-version=2024-05-01-preview
+DELETE https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages/{message_id}?api-version=2024-08-01-preview
```

Deletes a message.

@@ -278,7 +279,7 @@ The deletion status of the [message](#message-object) object.
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
-    api_version="2024-05-01-preview",
+    api_version="2024-08-01-preview",
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
    )

deleted_message = client.beta.threads.messages.delete(
  message_id="msg_abc12",
  thread_id="thread_abc123",
)
print(deleted_message)
```

# [REST](#tab/rest)

```console
-curl -x DELETE https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages/{message_id}?api-version=2024-05-01-preview \
+curl -X DELETE https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/messages/{message_id}?api-version=2024-08-01-preview \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H 'Content-Type: application/json'
```
diff --git a/articles/ai-services/openai/assistants-reference-runs.md b/articles/ai-services/openai/assistants-reference-runs.md
index 977bdad81f..2f5514b8a8 100644
--- a/articles/ai-services/openai/assistants-reference-runs.md
+++ b/articles/ai-services/openai/assistants-reference-runs.md
@@ -5,9 +5,9 @@ description: Learn how to use Azure OpenAI's Python & REST API runs with Assista
manager: nitinme
ms.service: azure-ai-openai
ms.topic: conceptual
-ms.date: 04/16/2024
-author: mrbullwinkle
-ms.author: mbullwin
+ms.date: 09/17/2024
+author: aahill
+ms.author: aahi
recommendations: false
ms.custom: devx-track-python
---
@@ -21,7 +21,7 @@ This article provides reference documentation for Python and REST for the new As
## Create run

```http
-POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs?api-version=2024-05-01-preview
+POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs?api-version=2024-08-01-preview
```

Create a run.

@@ -39,8 +39,19 @@ Create a run.
| `assistant_id` | string | Required | The ID of the assistant to use to execute this run. |
| `model` | string or null | Optional | The model deployment name to be used to execute this run. If a value is provided here, it will override the model deployment name associated with the assistant. If not, the model deployment name associated with the assistant will be used. |
| `instructions` | string or null | Optional | Overrides the instructions of the assistant. This is useful for modifying the behavior on a per-run basis. 

|
+| `additional_instructions` | string | Optional | Appends additional instructions at the end of the instructions for the run. This is useful for modifying the behavior on a per-run basis without overriding other instructions. |
+| `additional_messages` | array | Optional | Adds additional messages to the thread before creating the run. |
| `tools` | array or null | Optional | Override the tools the assistant can use for this run. This is useful for modifying the behavior on a per-run basis. |
| `metadata` | map | Optional | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. |
+| `temperature` | number | Optional | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Default is 1. |
+| `top_p` | number | Optional | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. Default is 1. |
+| `stream` | boolean | Optional | If `true`, returns a stream of events that happen during the Run as server-sent events, terminating when the Run enters a terminal state with a `data: [DONE]` message. |
+| `max_prompt_tokens` | integer | Optional | The maximum number of prompt tokens that might be used over the course of the run. The run will make a best effort to use only the number of prompt tokens specified, across multiple turns of the run. If the run exceeds the number of prompt tokens specified, the run will end with status `incomplete`. |
+| `max_completion_tokens` | integer | Optional | The maximum number of completion tokens that might be used over the course of the run. The run will make a best effort to use only the number of completion tokens specified, across multiple turns of the run. If the run exceeds the number of completion tokens specified, the run will end with status `incomplete`. |
+| `truncation_strategy` | [truncationObject](#truncation-object) | Optional | Controls how a thread will be truncated prior to the run. Use this to control the initial context window of the run. |
+| `tool_choice` | string or object | Optional | Controls which (if any) tool is called by the model. A `none` value means the model won't call any tools and instead generates a message. `auto` is the default value and means the model can pick between generating a message or calling a tool. Specifying a particular tool like `{"type": "file_search"}` or `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that tool. |
+| `response_format` | string or object | Optional | Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since `gpt-3.5-turbo-1106`.
Setting to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON.
**Important**: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model might generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Also note that the message content might be partially cut off if `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. |
+

### Returns

@@ -55,7 +66,7 @@ from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
-    api_version="2024-05-01-preview",
+    api_version="2024-08-01-preview",
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
    )

@@ -69,7 +80,7 @@ print(run)
# [REST](#tab/rest)

```console
-curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs?api-version=2024-05-01-preview \
+curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs?api-version=2024-08-01-preview \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "assistant_id": "asst_abc123"
  }'
```

@@ -82,7 +93,7 @@ curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs
## Create thread and run

```http
-POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/runs?api-version=2024-05-01-preview
+POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/runs?api-version=2024-08-01-preview
```

Create a thread and run it in a single request.

@@ -97,6 +108,14 @@ Create a thread and run it in a single request.
| `instructions` | string or null | Optional | Override the default system message of the assistant. This is useful for modifying the behavior on a per-run basis.|
| `tools` | array or null | Optional | Override the tools the assistant can use for this run. This is useful for modifying the behavior on a per-run basis.|
| `metadata` | map | Optional | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.|
+| `temperature` | number | Optional | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Default is 1. |
+| `top_p` | number | Optional | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. Default is 1. |
+| `stream` | boolean | Optional | If `true`, returns a stream of events that happen during the Run as server-sent events, terminating when the Run enters a terminal state with a `data: [DONE]` message. |
+| `max_prompt_tokens` | integer | Optional | The maximum number of prompt tokens that might be used over the course of the run. The run will make a best effort to use only the number of prompt tokens specified, across multiple turns of the run. If the run exceeds the number of prompt tokens specified, the run will end with status `incomplete`. |
+| `max_completion_tokens` | integer | Optional | The maximum number of completion tokens that might be used over the course of the run. 

The run will make a best effort to use only the number of completion tokens specified, across multiple turns of the run. If the run exceeds the number of completion tokens specified, the run will end with status `incomplete`. |
+| `truncation_strategy` | [truncationObject](#truncation-object) | Optional | Controls how a thread will be truncated prior to the run. Use this to control the initial context window of the run. |
+| `tool_choice` | string or object | Optional | Controls which (if any) tool is called by the model. A `none` value means the model won't call any tools and instead generates a message. `auto` is the default value and means the model can pick between generating a message or calling a tool. Specifying a particular tool like `{"type": "file_search"}` or `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that tool. |
+| `response_format` | string or object | Optional | Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since `gpt-3.5-turbo-1106`.
Setting to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON.
**Important**: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model might generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Also note that the message content might be partially cut off if `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. | ### Returns @@ -111,7 +130,7 @@ from openai import AzureOpenAI client = AzureOpenAI( api_key=os.getenv("AZURE_OPENAI_API_KEY"), - api_version="2024-05-01-preview", + api_version="2024-08-01-preview", azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") ) @@ -128,7 +147,7 @@ run = client.beta.threads.create_and_run( # [REST](#tab/rest) ```console -curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/runs?api-version=2024-05-01-preview \ +curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/runs?api-version=2024-08-01-preview \ -H "api-key: $AZURE_OPENAI_API_KEY" \ -H 'Content-Type: application/json' \ -d '{ @@ -146,7 +165,7 @@ curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/runs?api-version ## List runs ```http -GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs?api-version=2024-05-01-preview +GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs?api-version=2024-08-01-preview ``` Returns a list of runs belonging to a thread. @@ -179,7 +198,7 @@ from openai import AzureOpenAI client = AzureOpenAI( api_key=os.getenv("AZURE_OPENAI_API_KEY"), - api_version="2024-05-01-preview", + api_version="2024-08-01-preview", azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") ) @@ -202,7 +221,7 @@ curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs ## List run steps ```http -GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}/steps?api-version=2024-05-01-preview +GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}/steps?api-version=2024-08-01-preview ``` Returns a list of steps belonging to a run. 
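
Each run step records one unit of work the assistant performed during the run. As a minimal sketch (assuming the same `client`, thread ID, and run ID as the example that follows; attribute names mirror the run step object returned by the OpenAI Python client), you can inspect each step's `step_details` to see whether it created a message or called a tool:

```python
run_steps = client.beta.threads.runs.steps.list(
    thread_id="thread_abc123",
    run_id="run_abc123"
)

# step_details.type is either "message_creation" or "tool_calls".
for step in run_steps.data:
    print(step.id, step.status, step.step_details.type)
```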
@@ -236,7 +255,7 @@ from openai import AzureOpenAI client = AzureOpenAI( api_key=os.getenv("AZURE_OPENAI_API_KEY"), - api_version="2024-05-01-preview", + api_version="2024-08-01-preview", azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") ) @@ -250,7 +269,7 @@ print(run_steps) # [REST](#tab/rest) ```console -curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}/steps?api-version=2024-05-01-preview \ +curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}/steps?api-version=2024-08-01-preview \ -H "api-key: $AZURE_OPENAI_API_KEY" \ -H 'Content-Type: application/json' ``` @@ -276,7 +295,7 @@ print(run) # [REST](#tab/rest) ```http -curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}?api-version=2024-05-01-preview +curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}?api-version=2024-08-01-preview -H "api-key: $AZURE_OPENAI_API_KEY" \ -H 'Content-Type: application/json' ``` @@ -305,7 +324,7 @@ from openai import AzureOpenAI client = AzureOpenAI( api_key=os.getenv("AZURE_OPENAI_API_KEY"), - api_version="2024-05-01-preview", + api_version="2024-08-01-preview", azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") ) @@ -319,7 +338,7 @@ print(run) # [REST](#tab/rest) ```console -curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}?api-version=2024-05-01-preview \ +curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}?api-version=2024-08-01-preview \ -H "api-key: $AZURE_OPENAI_API_KEY" \ -H 'Content-Type: application/json' ``` @@ -329,7 +348,7 @@ curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs ## Retrieve run step ```http -GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}/steps/{step_id}?api-version=2024-05-01-preview +GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}/steps/{step_id}?api-version=2024-08-01-preview ``` Retrieves a run step. @@ -355,7 +374,7 @@ from openai import AzureOpenAI client = AzureOpenAI( api_key=os.getenv("AZURE_OPENAI_API_KEY"), - api_version="2024-05-01-preview", + api_version="2024-08-01-preview", azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") ) @@ -370,7 +389,7 @@ print(run_step) # [REST](#tab/rest) ```console -curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}/steps/{step_id}?api-version=2024-05-01-preview \ +curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}/steps/{step_id}?api-version=2024-08-01-preview \ -H "api-key: $AZURE_OPENAI_API_KEY" \ -H 'Content-Type: application/json' ``` @@ -380,7 +399,7 @@ curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs ## Modify run ```http -POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}?api-version=2024-05-01-preview +POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}?api-version=2024-08-01-preview ``` Modifies a run. 
@@ -411,7 +430,7 @@ from openai import AzureOpenAI client = AzureOpenAI( api_key=os.getenv("AZURE_OPENAI_API_KEY"), - api_version="2024-05-01-preview", + api_version="2024-08-01-preview", azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") ) @@ -426,7 +445,7 @@ print(run) # [REST](#tab/rest) ```console -curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}?api-version=2024-05-01-preview \ +curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}?api-version=2024-08-01-preview \ -H "api-key: $AZURE_OPENAI_API_KEY" \ -H 'Content-Type: application/json' -d '{ @@ -441,7 +460,7 @@ curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs ## Submit tool outputs to run ```http -POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}/submit_tool_outputs?api-version=2024-05-01-preview +POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}/submit_tool_outputs?api-version=2024-08-01-preview ``` When a run has the status: "requires_action" and required_action.type is submit_tool_outputs, this endpoint can be used to submit the outputs from the tool calls once they're all completed. All outputs must be submitted in a single request. @@ -458,6 +477,7 @@ When a run has the status: "requires_action" and required_action.type is submit_ |Name | Type | Required | Description | |--- |--- |--- |--- | | `tool_outputs` | array | Required | A list of tools for which the outputs are being submitted. | +| `stream` | boolean | Optional | If `true`, returns a stream of events that happen during the Run as server-sent events, terminating when the Run enters a terminal state with a `data: [DONE]` message. | ### Returns @@ -472,7 +492,7 @@ from openai import AzureOpenAI client = AzureOpenAI( api_key=os.getenv("AZURE_OPENAI_API_KEY"), - api_version="2024-05-01-preview", + api_version="2024-08-01-preview", azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") ) @@ -492,7 +512,7 @@ print(run) # [REST](#tab/rest) ```console -curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}/submit_tool_outputs?api-version=2024-05-01-preview \ +curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}/submit_tool_outputs?api-version=2024-08-01-preview \ -H "api-key: $AZURE_OPENAI_API_KEY" \ -H 'Content-Type: application/json' \ -d '{ @@ -511,7 +531,7 @@ curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs ## Cancel a run ```http -POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}/cancel?api-version=2024-05-01-preview +POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}/cancel?api-version=2024-08-01-preview ``` Cancels a run that is in_progress. 
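
Cancellation is asynchronous, so the run can report a `cancelling` status before it reaches a terminal status such as `cancelled`. The following is a minimal polling sketch (assuming the same `client`, thread ID, and run ID as the examples that follow), not a definitive implementation:

```python
import time

run = client.beta.threads.runs.cancel(
    thread_id="thread_abc123",
    run_id="run_abc123"
)

# Poll until the run reaches a terminal status.
while run.status not in ("cancelled", "completed", "failed", "expired"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id="thread_abc123",
        run_id="run_abc123"
    )

print(run.status)
```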
@@ -536,7 +556,7 @@ from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
-    api_version="2024-05-01-preview",
+    api_version="2024-08-01-preview",
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
    )

run = client.beta.threads.runs.cancel(
  thread_id="thread_abc123",
  run_id="run_abc123"
)
print(run)

# [REST](#tab/rest)

```console
-curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}/cancel?api-version=2024-05-01-preview \
+curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}/runs/{run_id}/cancel?api-version=2024-08-01-preview \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST
@@ -586,7 +606,9 @@ Represents an execution run on a thread.
| `max_prompt_tokens` | integer or null | The maximum number of prompt tokens specified to have been used over the course of the run. |
| `max_completion_tokens` | integer or null | The maximum number of completion tokens specified to have been used over the course of the run. |
| `usage` | object or null | Usage statistics related to the run. This value will be null if the run is not in a terminal state (for example `in_progress`, `queued`). |
-
+| `truncation_strategy` | object | Controls how a thread will be truncated prior to the run. |
+| `response_format` | string | The format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since `gpt-3.5-turbo-1106`. |
+| `tool_choice` | string | Controls which (if any) tool is called by the model. `none` means the model won't call any tools and instead generates a message. `auto` is the default value and means the model can pick between generating a message or calling a tool. |

## Run step object

@@ -663,6 +685,14 @@ with client.beta.threads.runs.stream(
  stream.until_done()
```

+## Truncation object
+
+Controls how a thread will be truncated prior to the run. Use this to control the initial context window of the run.
+
+| Name | Type | Description | Required |
+|--- |--- |--- |--- |
+| `type` | string | The truncation strategy to use for the thread. The default is `auto`. If set to `last_messages`, the thread will be truncated to the n most recent messages in the thread. When set to `auto`, messages in the middle of the thread will be dropped to fit the context length of the model, `max_prompt_tokens`. | Yes |
+| `last_messages` | integer | The number of most recent messages from the thread when constructing the context for the run. | No |

## Message delta object

@@ -720,4 +750,4 @@ Events are emitted whenever a new object is created, transitions to a new state,
| `thread.message.completed` | `data` is a message. | Occurs when a message is completed. |
| `thread.message.incomplete` | `data` is a message. | Occurs when a message ends before it is completed. |
| `error` | `data` is an error. | Occurs when an error occurs. This can happen due to an internal server error or a timeout. |
-| `done` | `data` is `[DONE]` | Occurs when a stream ends. | \ No newline at end of file
+| `done` | `data` is `[DONE]` | Occurs when a stream ends. 

| diff --git a/articles/ai-services/openai/assistants-reference-threads.md b/articles/ai-services/openai/assistants-reference-threads.md index cd6416f604..06eb0a258f 100644 --- a/articles/ai-services/openai/assistants-reference-threads.md +++ b/articles/ai-services/openai/assistants-reference-threads.md @@ -5,9 +5,9 @@ description: Learn how to use Azure OpenAI's Python & REST API threads with Assi manager: nitinme ms.service: azure-ai-openai ms.topic: conceptual -ms.date: 05/20/2024 -author: mrbullwinkle -ms.author: mbullwin +ms.date: 09/17/2024 +author: aahill +ms.author: aahi recommendations: false ms.custom: devx-track-python --- @@ -21,7 +21,7 @@ This article provides reference documentation for Python and REST for the new As ## Create a thread ```http -POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads?api-version=2024-05-01-preview +POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads?api-version=2024-08-01-preview ``` Create a thread. @@ -32,6 +32,22 @@ Create a thread. |--- |--- |--- |--- | |`messages`|array| Optional | A list of messages to start the thread with. | |`metadata`| map | Optional | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. | +| `tool_resources` | [object](#tool_resources-properties) | Optional | A set of resources that are made available to the assistant's tools in this thread. The resources are specific to the type of tool. For example, the `code_interpreter` tool requires a list of file IDs, while the `file_search` tool requires a list of vector store IDs. | + +### tool_resources properties + +**code_interpreter** + +| Name | Type | Description | Default | +|--- |--- |--- |--- | +| `file_ids` | array | A list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool. | `[]` | + +**file_search** + +| Name | Type | Description | Default | +|--- |--- |--- |--- | +| `vector_store_ids` | array | The vector store attached to this thread. There can be a maximum of 1 vector store attached to the thread. | `[]` | +| `vector_stores` | array | A helper to create a vector store with file_ids and attach it to this thread. There can be a maximum of 1 vector store attached to the thread. | `[]` | ### Returns @@ -46,7 +62,7 @@ from openai import AzureOpenAI client = AzureOpenAI( api_key=os.getenv("AZURE_OPENAI_API_KEY"), - api_version="2024-05-01-preview", + api_version="2024-08-01-preview", azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") ) @@ -57,7 +73,7 @@ print(empty_thread) # [REST](#tab/rest) ```console -curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads?api-version=2024-05-01-preview \ +curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads?api-version=2024-08-01-preview \ -H "api-key: $AZURE_OPENAI_API_KEY" \ -H 'Content-Type: application/json' \ -d '' @@ -68,7 +84,7 @@ curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads?api-version=2024 ## Retrieve thread ```http -GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}?api-version=2024-05-01-preview +GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}?api-version=2024-08-01-preview ``` Retrieves a thread. 
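
The returned thread object also includes any `tool_resources` configured when the thread was created or last modified. A minimal sketch of inspecting them (assuming the `client` from the example that follows; attribute names mirror the thread object returned by the OpenAI Python client):

```python
my_thread = client.beta.threads.retrieve("thread_abc123")

# tool_resources is empty unless code_interpreter file IDs or a file_search
# vector store were attached to the thread.
print(my_thread.id)
print(my_thread.tool_resources)
```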
@@ -93,7 +109,7 @@ from openai import AzureOpenAI client = AzureOpenAI( api_key=os.getenv("AZURE_OPENAI_API_KEY"), - api_version="2024-05-01-preview", + api_version="2024-08-01-preview", azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") ) @@ -104,7 +120,7 @@ print(my_thread) # [REST](#tab/rest) ```console -curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}?api-version=2024-05-01-preview \ +curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}?api-version=2024-08-01-preview \ -H "api-key: $AZURE_OPENAI_API_KEY" \ -H 'Content-Type: application/json' ``` @@ -114,7 +130,7 @@ curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}?api- ## Modify thread ```http -POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}?api-version=2024-05-01-preview +POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}?api-version=2024-08-01-preview ``` Modifies a thread. @@ -129,7 +145,8 @@ Modifies a thread. |Name | Type | Required | Description | |--- |--- |--- |--- | -| metadata| map | Optional | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.| +| `metadata` | map | Optional | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.| +| `tool_resources` | [object](#tool_resources-properties) | Optional | A set of resources that are made available to the assistant's tools in this thread. The resources are specific to the type of tool. For example, the `code_interpreter` tool requires a list of file IDs, while the `file_search` tool requires a list of vector store IDs. | ### Returns @@ -144,7 +161,7 @@ from openai import AzureOpenAI client = AzureOpenAI( api_key=os.getenv("AZURE_OPENAI_API_KEY"), - api_version="2024-05-01-preview", + api_version="2024-08-01-preview", azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") ) @@ -161,7 +178,7 @@ print(my_updated_thread) # [REST](#tab/rest) ```console -curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}?api-version=2024-05-01-preview \ +curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/threads/{thread_id}?api-version=2024-08-01-preview \ -H "api-key: $AZURE_OPENAI_API_KEY" \ -H 'Content-Type: application/json' \ -d '{ diff --git a/articles/ai-services/openai/assistants-reference.md b/articles/ai-services/openai/assistants-reference.md index e731c8bfdf..077596d72a 100644 --- a/articles/ai-services/openai/assistants-reference.md +++ b/articles/ai-services/openai/assistants-reference.md @@ -5,9 +5,9 @@ description: Learn how to use Azure OpenAI's Python & REST API with Assistants. manager: nitinme ms.service: azure-ai-openai ms.topic: conceptual -ms.date: 07/25/2024 -author: mrbullwinkle -ms.author: mbullwin +ms.date: 09/17/2024 +author: aahill +ms.author: aahi recommendations: false ms.custom: devx-track-python --- @@ -36,11 +36,11 @@ Create an assistant with a model and instructions. | description| string or null | Optional | The description of the assistant. The maximum length is 512 characters.| | instructions | string or null | Optional | The system instructions that the assistant uses. 
The maximum length is 256,000 characters.| | tools | array | Optional | Defaults to []. A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can currently be of types `code_interpreter`, or `function`. A `function` description can be a maximum of 1,024 characters. | -| file_ids | array | Optional | Defaults to []. A list of file IDs attached to this assistant. There can be a maximum of 20 files attached to the assistant. Files are ordered by their creation date in ascending order.| | metadata | map | Optional | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.| | temperature | number or null | Optional | Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. | | top_p | number or null | Optional | Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. | | response_format | string or object | Optional | Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if you use `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. | +| tool_resources | object | Optional | A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the `code_interpreter` tool requires a list of file IDs, while the `file_search` tool requires a list of vector store IDs. 
| ### Returns @@ -55,7 +55,7 @@ from openai import AzureOpenAI client = AzureOpenAI( api_key=os.getenv("AZURE_OPENAI_API_KEY"), - api_version="2024-05-01-preview", + api_version="2024-08-01-preview", azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") ) @@ -69,7 +69,7 @@ assistant = client.beta.assistants.create( # [REST](#tab/rest) ```console -curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants?api-version=2024-05-01-preview \ +curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants?api-version=2024-08-01-preview \ -H "api-key: $AZURE_OPENAI_API_KEY" \ -H 'Content-Type: application/json' \ -d '{ @@ -113,7 +113,7 @@ from openai import AzureOpenAI client = AzureOpenAI( api_key=os.getenv("AZURE_OPENAI_API_KEY"), - api_version="2024-05-01-preview", + api_version="2024-08-01-preview", azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") ) @@ -128,7 +128,7 @@ print(my_assistants.data) # [REST](#tab/rest) ```console -curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants?api-version=2024-05-01-preview \ +curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants?api-version=2024-08-01-preview \ -H "api-key: $AZURE_OPENAI_API_KEY" \ -H 'Content-Type: application/json' ``` @@ -139,7 +139,7 @@ curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants?api-version=2 ## Retrieve assistant ```http -GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-05-01-preview +GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-08-01-preview ``` Retrieves an assistant. @@ -161,7 +161,7 @@ The [assistant](#assistant-object) object matching the specified ID. ```python client = AzureOpenAI( api_key=os.getenv("AZURE_OPENAI_API_KEY"), - api_version="2024-05-01-preview", + api_version="2024-08-01-preview", azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") ) @@ -172,7 +172,7 @@ print(my_assistant) # [REST](#tab/rest) ```console -curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant-id}?api-version=2024-05-01-preview \ +curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant-id}?api-version=2024-08-01-preview \ -H "api-key: $AZURE_OPENAI_API_KEY" \ -H 'Content-Type: application/json' ``` @@ -182,7 +182,7 @@ curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant-id ## Modify assistant ```http -POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-05-01-preview +POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-08-01-preview ``` Modifies an assistant. @@ -202,8 +202,11 @@ Modifies an assistant. | `description` | string or null | Optional | The description of the assistant. The maximum length is 512 characters. | | `instructions` | string or null | Optional | The system instructions that the assistant uses. The maximum length is 32768 characters. | | `tools` | array | Optional | Defaults to []. A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types code_interpreter, or function. A `function` description can be a maximum of 1,024 characters. | -| `file_ids` | array | Optional | Defaults to []. A list of File IDs attached to this assistant. There can be a maximum of 20 files attached to the assistant. Files are ordered by their creation date in ascending order. 
If a file was previously attached to the list but does not show up in the list, it will be deleted from the assistant. |
| `metadata` | map | Optional | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. |
+| `temperature` | number or null | Optional | Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |
+| `top_p` | number or null | Optional | Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. |
+| `response_format` | string or object | Optional | Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if you use `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. |
+| `tool_resources` | object | Optional | A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the `code_interpreter` tool requires a list of file IDs, while the `file_search` tool requires a list of vector store IDs. |

**Returns**

The modified [assistant object](#assistant-object).

### Examples

# [Python 1.x](#tab/python)

```python
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
-    api_version="2024-05-01-preview",
+    api_version="2024-08-01-preview",
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
    )

@@ -226,7 +229,6 @@ my_updated_assistant = client.beta.assistants.update(
  name="HR Helper",
  tools=[{"type": "code_interpreter"}],
  model="gpt-4", #model = model deployment name
-  file_ids=["assistant-abc123", "assistant-abc456"],
)

print(my_updated_assistant)
```

# [REST](#tab/rest)

```console
-curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant-id}?api-version=2024-05-01-preview \
+curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant-id}?api-version=2024-08-01-preview \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
      "instructions": "You are an HR bot, and you have access to files to answer employee questions about company policies. 

Always respond with info from either of the files.",
      "tools": [{"type": "code_interpreter"}],
-      "model": "gpt-4",
-      "file_ids": ["assistant-abc123", "assistant-abc456"]
+      "model": "gpt-4"
    }'
```

---

## Delete assistant

```http
-DELETE https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-05-01-preview
+DELETE https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-08-01-preview
```

Delete an assistant.

@@ -273,7 +274,7 @@ Deletion status.

```python
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
-    api_version="2024-05-01-preview",
+    api_version="2024-08-01-preview",
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
    )

response = client.beta.assistants.delete("asst_abc123")
print(response)

# [REST](#tab/rest)

```console
-curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant-id}?api-version=2024-05-01-preview \
+curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant-id}?api-version=2024-08-01-preview \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H 'Content-Type: application/json' \
  -X DELETE
@@ -309,5 +310,8 @@ Assistants use the [same API for file upload as fine-tuning](/rest/api/azureopen
| `model` | string | The model deployment name to use.|
| `instructions` | string or null | The system instructions that the assistant uses. The maximum length is 32768 characters.|
| `tools` | array | A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types code_interpreter, or function. A `function` description can be a maximum of 1,024 characters.|
-| `file_ids` | array | A list of file IDs attached to this assistant. There can be a maximum of 20 files attached to the assistant. Files are ordered by their creation date in ascending order.|
| `metadata` | map | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.|
+| `temperature` | number or null | Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |
+| `top_p` | number or null | Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. |
+| `response_format` | string or object | Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. 

Additionally, the message content may be partially cut off if you use `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. |
+| `tool_resources` | object | A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the `code_interpreter` tool requires a list of file IDs, while the `file_search` tool requires a list of vector store IDs. |
diff --git a/articles/ai-services/openai/concepts/use-your-data.md b/articles/ai-services/openai/concepts/use-your-data.md
index fb14c6a740..61161d5387 100644
--- a/articles/ai-services/openai/concepts/use-your-data.md
+++ b/articles/ai-services/openai/concepts/use-your-data.md
@@ -21,12 +21,25 @@ Use this article to learn about Azure OpenAI On Your Data, which makes it easier

Azure OpenAI On Your Data enables you to run advanced AI models such as GPT-35-Turbo and GPT-4 on your own enterprise data without needing to train or fine-tune models. You can chat on top of and analyze your data with greater accuracy. You can specify sources to support the responses based on the latest information available in your designated data sources. You can access Azure OpenAI On Your Data using a REST API, via the SDK or the web-based interface in the [Azure OpenAI Studio](https://oai.azure.com/). You can also create a web app that connects to your data to enable an enhanced chat solution or deploy it directly as a copilot in the Copilot Studio (preview).

-## Get started
+## Developing with Azure OpenAI On Your Data

-To get started, [connect your data source](../use-your-data-quickstart.md) using Azure OpenAI Studio and start asking questions and chatting on your data.
+:::image type="content" source="../media/use-your-data/workflow-diagram.png" alt-text="A diagram showing an example workflow.":::

-> [!NOTE]
-> To get started, you need to already have been approved for [Azure OpenAI access](../overview.md#how-do-i-get-access-to-azure-openai) and have an [Azure OpenAI Service resource](../how-to/create-resource.md) deployed in a [supported region](#regional-availability-and-model-support) with either the gpt-35-turbo or the gpt-4 models.
+Typically, the development process you'd use with Azure OpenAI On Your Data is:

+1. **Ingest**: Upload files using either Azure OpenAI Studio or the ingestion API. This enables your data to be cracked, chunked, and embedded into an Azure AI Search instance that can be used by Azure OpenAI models. If you have an existing [supported data source](#supported-data-sources), you can also connect it directly.
+
+1. **Develop**: After trying Azure OpenAI On Your Data, begin developing your application using the available REST API and SDKs, which are available in several languages. Your application will create prompts and search intents to pass to the Azure OpenAI service.
+
+1. **Inference**: After your application is deployed in your preferred environment, it will send prompts to Azure OpenAI, which will perform several steps before returning a response:
+    1. **Intent generation**: The service determines the intent of the user's prompt so it can shape a proper response.
+
+    1. **Retrieval**: The service retrieves relevant chunks of available data from the connected data source by querying it, for example, by using semantic or vector search. [Parameters](#runtime-parameters) such as strictness and the number of documents to retrieve influence the retrieval.
+
+    1. 

**Filtration and reranking**: Search results from the retrieval step are improved by ranking and filtering data to refine relevance.
+
+    1. **Response generation**: The resulting data is submitted along with other information, like the system message, to the Large Language Model (LLM), and the response is sent back to the application.
+
To get started, [connect your data source](../use-your-data-quickstart.md) using Azure OpenAI Studio and start asking questions and chatting on your data.

## Azure Role-based access controls (Azure RBAC) for adding data sources

@@ -184,7 +197,6 @@ To keep your Azure AI Search index up-to-date with your latest data, you can sche

After the data ingestion is set to a cadence other than once, Azure AI Search indexers will be created with a schedule equivalent to `0.5 * the cadence specified`. This means that at the specified cadence, the indexers will pull, reprocess, and index the documents that were added or modified from the storage container. This process ensures that the updated data gets preprocessed and indexed in the final index at the desired cadence automatically. To update your data, you only need to upload the additional documents from the Azure portal. From the portal, select **Storage Account** > **Containers**. Select the name of the original container, then **Upload**. The index will pick up the files automatically after the scheduled refresh period. The intermediate assets created in the Azure AI Search resource won't be cleaned up after ingestion to allow for future runs. These assets are:
   - `{Index Name}-index`
   - `{Index Name}-indexer`
-   - `{Index Name}-indexer-chunk`
   - `{Index Name}-datasource`
   - `{Index Name}-skillset`
diff --git a/articles/ai-services/openai/media/use-your-data/workflow-diagram.png b/articles/ai-services/openai/media/use-your-data/workflow-diagram.png
new file mode 100644
index 0000000000..c1731928b9
Binary files /dev/null and b/articles/ai-services/openai/media/use-your-data/workflow-diagram.png differ
diff --git a/articles/ai-studio/how-to/deploy-models-cohere-command.md b/articles/ai-studio/how-to/deploy-models-cohere-command.md
index 6a7ec5cea5..0989b9fff7 100644
--- a/articles/ai-studio/how-to/deploy-models-cohere-command.md
+++ b/articles/ai-studio/how-to/deploy-models-cohere-command.md
@@ -5,7 +5,7 @@ description: Learn how to use Cohere Command chat models with Azure AI Studio.
ms.service: azure-ai-studio
manager: scottpolly
ms.topic: how-to
-ms.date: 08/08/2024
+ms.date: 09/23/2024
ms.reviewer: shubhiraj
reviewer: shubhirajMsft
ms.author: mopeakande
@@ -29,14 +29,52 @@ The Cohere family of models includes various models optimized for different use

The Cohere Command chat models include the following models:

+# [Cohere Command R+ 08-2024](#tab/cohere-command-r-plus-08-2024)
+
+Command R+ 08-2024 is a generative large language model optimized for various use cases, including reasoning, summarization, and question answering.
+
+* **Model Architecture**: Command R+ 08-2024 is an autoregressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
+* **Languages covered**: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic. 

+* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian. +* **Context length:** Command R+ 08-2024 supports a context length of 128 K. +* **Input:** Text only. +* **Output:** Text only. + +We recommend using Command R+ 08-2024 for those workflows that lean on complex retrieval augmented generation (RAG) functionality, multi-step tool use (agents), and structured outputs. + + +The following models are available: + +* [Cohere-command-r-plus-08-2024](https://aka.ms/azureai/landing/Cohere-command-r-plus-08-2024) + + +# [Cohere Command R 08-2024](#tab/cohere-command-r-08-2024) + +Command R 08-2024 is a large language model optimized for various use cases, including reasoning, summarization, and question answering. + +* **Model Architecture**: Command R 08-2024 is an autoregressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. +* **Languages covered**: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic. +* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian. +* **Context length:** Command R 08-2024 supports a context length of 128 K. +* **Input:** Text only. +* **Output:** Text only. + + +The following models are available: + +* [Cohere-command-r-08-2024](https://aka.ms/azureai/landing/Cohere-command-r-08-2024) + + # [Cohere Command R+](#tab/cohere-command-r-plus) Command R+ is a generative large language model optimized for various use cases, including reasoning, summarization, and question answering. -* **Model Architecture**: Both Command R and Command R+ are autoregressive language models that use an optimized transformer architecture. After pre-training, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. -* **Languages covered**: The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic. +* **Model Architecture**: Command R+ is an autoregressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. +* **Languages covered**: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic. * **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian. -* **Context length:** Command R and Command R+ support a context length of 128 K. +* **Context length:** Command R+ supports a context length of 128 K. +* **Input:** Text only. +* **Output:** Text only. We recommend using Command R+ for those workflows that lean on complex retrieval augmented generation (RAG) functionality and multi-step tool use (agents). 
@@ -50,10 +88,10 @@ The following models are available:

Command R is a large language model optimized for various use cases, including reasoning, summarization, and question answering.

-* **Model Architecture**: Both Command R and Command R+ are autoregressive language models that use an optimized transformer architecture. After pre-training, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
-* **Languages covered**: The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic.
+* **Model Architecture**: Command R is an autoregressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
+* **Languages covered**: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic.
* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
-* **Context length:** Command R and Command R+ support a context length of 128 K.
+* **Context length:** Command R supports a context length of 128 K.

Command R is great for simpler retrieval augmented generation (RAG) and single-step tool use tasks. It's also great for use in applications where price is a major consideration.

@@ -141,7 +179,7 @@ print("Model provider name:", model_info.model_provider_name)
```

```console
-Model name: Cohere-command-r-plus
+Model name: Cohere-command-r-plus-08-2024
Model type: chat-completions
Model provider name: Cohere
```

@@ -175,7 +213,7 @@ print("\tCompletion tokens:", response.usage.completion_tokens)

```console
Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
-Model: Cohere-command-r-plus
+Model: Cohere-command-r-plus-08-2024
Usage:
	Prompt tokens: 19
	Total tokens: 91
@@ -244,7 +282,7 @@ response = client.complete(
    stop=["<|endoftext|>"],
    temperature=0,
    top_p=1,
-    response_format=ChatCompletionsResponseFormatText(),
+    response_format={ "type": ChatCompletionsResponseFormatText() },
)
```

@@ -264,7 +302,7 @@ response = client.complete(
                      " the following format: { ""answer"": ""response"" }."),
        UserMessage(content="How many languages are in the world?"),
    ],
-    response_format=ChatCompletionsResponseFormatJSON()
+    response_format={ "type": ChatCompletionsResponseFormatJSON() }
)
```

@@ -332,7 +370,7 @@ def get_flight_info(loc_origin: str, loc_destination: str):
```

> [!NOTE]
-> Cohere-command-r-plus and Cohere-command-r require a tool's responses to be a valid JSON content formatted as a string. When constructing messages of type *Tool*, ensure the response is a valid JSON string.
+> Cohere-command-r-plus-08-2024, Cohere-command-r-08-2024, Cohere-command-r-plus, and Cohere-command-r require a tool's responses to be valid JSON content formatted as a string. When constructing messages of type *Tool*, ensure the response is a valid JSON string.
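+
+For instance, the tool result can be built as a dictionary and serialized with `json.dumps` before it's attached to the message. This is a minimal, hypothetical sketch — the `tool_call_id` value and the payload are placeholders:
+
+```python
+import json
+
+from azure.ai.inference.models import ToolMessage
+
+# Serialize the tool's output so the message content is valid JSON
+# formatted as a string, as these models require.
+flight_info = {"info": "There are no flights available. You should take a train."}
+
+tool_message = ToolMessage(
+    tool_call_id="abc0dEFgH",  # the ID from the model's tool call
+    content=json.dumps(flight_info),
+)
+```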

Prompt the model to book flights with the help of this function:

@@ -464,14 +502,52 @@ except HttpResponseError as ex:

The Cohere Command chat models include the following models:

+# [Cohere Command R+ 08-2024](#tab/cohere-command-r-plus-08-2024)
+
+Command R+ 08-2024 is a generative large language model optimized for various use cases, including reasoning, summarization, and question answering.
+
+* **Model Architecture**: Command R+ 08-2024 is an autoregressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
+* **Languages covered**: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic.
+* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
+* **Context length:** Command R+ 08-2024 supports a context length of 128 K.
+* **Input:** Text only.
+* **Output:** Text only.
+
+We recommend using Command R+ 08-2024 for those workflows that lean on complex retrieval augmented generation (RAG) functionality, multi-step tool use (agents), and structured outputs.
+
+
+The following models are available:
+
+* [Cohere-command-r-plus-08-2024](https://aka.ms/azureai/landing/Cohere-command-r-plus-08-2024)
+
+
+# [Cohere Command R 08-2024](#tab/cohere-command-r-08-2024)
+
+Command R 08-2024 is a large language model optimized for various use cases, including reasoning, summarization, and question answering.
+
+* **Model Architecture**: Command R 08-2024 is an autoregressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
+* **Languages covered**: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic.
+* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
+* **Context length:** Command R 08-2024 supports a context length of 128 K.
+* **Input:** Text only.
+* **Output:** Text only.
+
+
+The following models are available:
+
+* [Cohere-command-r-08-2024](https://aka.ms/azureai/landing/Cohere-command-r-08-2024)
+
+
# [Cohere Command R+](#tab/cohere-command-r-plus)

Command R+ is a generative large language model optimized for various use cases, including reasoning, summarization, and question answering.

-* **Model Architecture**: Both Command R and Command R+ are autoregressive language models that use an optimized transformer architecture. After pre-training, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
-* **Languages covered**: The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic.
+* **Model Architecture**: Command R+ is an autoregressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. +* **Languages covered**: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic. * **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian. -* **Context length:** Command R and Command R+ support a context length of 128 K. +* **Context length:** Command R+ supports a context length of 128 K. +* **Input:** Text only. +* **Output:** Text only. We recommend using Command R+ for those workflows that lean on complex retrieval augmented generation (RAG) functionality and multi-step tool use (agents). @@ -485,10 +561,10 @@ The following models are available: Command R is a large language model optimized for various use cases, including reasoning, summarization, and question answering. -* **Model Architecture**: Both Command R and Command R+ are autoregressive language models that use an optimized transformer architecture. After pre-training, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. -* **Languages covered**: The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic. +* **Model Architecture**: Command R is an autoregressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. +* **Languages covered**: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic. * **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian. -* **Context length:** Command R and Command R+ support a context length of 128 K. +* **Context length:** Command R supports a context length of 128 K. Command R is great for simpler retrieval augmented generation (RAG) and single-step tool use tasks. It's also great for use in applications where price is a major consideration. @@ -574,7 +650,7 @@ console.log("Model provider name: ", model_info.body.model_provider_name) ``` ```console -Model name: Cohere-command-r-plus +Model name: Cohere-command-r-plus-08-2024 Model type: chat-completions Model provider name: Cohere ``` @@ -614,7 +690,7 @@ console.log("\tCompletion tokens:", response.body.usage.completion_tokens); ```console Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred. 
-Model: Cohere-command-r-plus
+Model: Cohere-command-r-plus-08-2024
Usage:
	Prompt tokens: 19
	Total tokens: 91
@@ -788,7 +864,7 @@ function get_flight_info(loc_origin, loc_destination) {
```

> [!NOTE]
-> Cohere-command-r-plus and Cohere-command-r require a tool's responses to be a valid JSON content formatted as a string. When constructing messages of type *Tool*, ensure the response is a valid JSON string.
+> Cohere-command-r-plus-08-2024, Cohere-command-r-08-2024, Cohere-command-r-plus, and Cohere-command-r require a tool's responses to be valid JSON content formatted as a string. When constructing messages of type *Tool*, ensure the response is a valid JSON string.

Prompt the model to book flights with the help of this function:

@@ -913,14 +989,52 @@ catch (error) {

The Cohere Command chat models include the following models:

+# [Cohere Command R+ 08-2024](#tab/cohere-command-r-plus-08-2024)
+
+Command R+ 08-2024 is a generative large language model optimized for various use cases, including reasoning, summarization, and question answering.
+
+* **Model Architecture**: Command R+ 08-2024 is an autoregressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
+* **Languages covered**: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic.
+* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
+* **Context length:** Command R+ 08-2024 supports a context length of 128 K.
+* **Input:** Text only.
+* **Output:** Text only.
+
+We recommend using Command R+ 08-2024 for those workflows that lean on complex retrieval augmented generation (RAG) functionality, multi-step tool use (agents), and structured outputs.
+
+
+The following models are available:
+
+* [Cohere-command-r-plus-08-2024](https://aka.ms/azureai/landing/Cohere-command-r-plus-08-2024)
+
+
+# [Cohere Command R 08-2024](#tab/cohere-command-r-08-2024)
+
+Command R 08-2024 is a large language model optimized for various use cases, including reasoning, summarization, and question answering.
+
+* **Model Architecture**: Command R 08-2024 is an autoregressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
+* **Languages covered**: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic.
+* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
+* **Context length:** Command R 08-2024 supports a context length of 128 K.
+* **Input:** Text only.
+* **Output:** Text only.
+
+
+The following models are available:
+
+* [Cohere-command-r-08-2024](https://aka.ms/azureai/landing/Cohere-command-r-08-2024)
+
+
# [Cohere Command R+](#tab/cohere-command-r-plus)

Command R+ is a generative large language model optimized for various use cases, including reasoning, summarization, and question answering.
-* **Model Architecture**: Both Command R and Command R+ are autoregressive language models that use an optimized transformer architecture. After pre-training, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. -* **Languages covered**: The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic. +* **Model Architecture**: Command R+ is an autoregressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. +* **Languages covered**: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic. * **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian. -* **Context length:** Command R and Command R+ support a context length of 128 K. +* **Context length:** Command R+ supports a context length of 128 K. +* **Input:** Text only. +* **Output:** Text only. We recommend using Command R+ for those workflows that lean on complex retrieval augmented generation (RAG) functionality and multi-step tool use (agents). @@ -934,10 +1048,10 @@ The following models are available: Command R is a large language model optimized for various use cases, including reasoning, summarization, and question answering. -* **Model Architecture**: Both Command R and Command R+ are autoregressive language models that use an optimized transformer architecture. After pre-training, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. -* **Languages covered**: The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic. +* **Model Architecture**: Command R is an autoregressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. +* **Languages covered**: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic. * **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian. -* **Context length:** Command R and Command R+ support a context length of 128 K. +* **Context length:** Command R supports a context length of 128 K. Command R is great for simpler retrieval augmented generation (RAG) and single-step tool use tasks. It's also great for use in applications where price is a major consideration. @@ -969,7 +1083,7 @@ Deployment to a serverless API endpoint doesn't require quota from your subscrip ### The inference package installed -You can consume predictions from this model by using the `Azure.AI.Inference` package from [Nuget](https://www.nuget.org/). 
To install this package, you need the following prerequisites:
+You can consume predictions from this model by using the `Azure.AI.Inference` package from [NuGet](https://www.nuget.org/). To install this package, you need the following prerequisites:

* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
@@ -995,7 +1109,7 @@ using Azure.Identity;
using Azure.AI.Inference;
```

-This example also use the following namespaces but you may not always need them:
+This example also uses the following namespaces, but you may not always need them:

```csharp
@@ -1042,7 +1156,7 @@ Console.WriteLine($"Model provider name: {modelInfo.Value.ModelProviderName}");
```

```console
-Model name: Cohere-command-r-plus
+Model name: Cohere-command-r-plus-08-2024
Model type: chat-completions
Model provider name: Cohere
```

@@ -1077,7 +1191,7 @@ Console.WriteLine($"\tCompletion tokens: {response.Value.Usage.CompletionTokens}

```console
Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
-Model: Cohere-command-r-plus
+Model: Cohere-command-r-plus-08-2024
Usage:
	Prompt tokens: 19
	Total tokens: 91
@@ -1260,7 +1374,7 @@ static string getFlightInfo(string loc_origin, string loc_destination)
```

> [!NOTE]
-> Cohere-command-r-plus and Cohere-command-r require a tool's responses to be a valid JSON content formatted as a string. When constructing messages of type *Tool*, ensure the response is a valid JSON string.
+> Cohere-command-r-plus-08-2024, Cohere-command-r-08-2024, Cohere-command-r-plus, and Cohere-command-r require a tool's responses to be valid JSON content formatted as a string. When constructing messages of type *Tool*, ensure the response is a valid JSON string.

Prompt the model to book flights with the help of this function:

@@ -1387,14 +1501,52 @@ catch (RequestFailedException ex)

The Cohere Command chat models include the following models:

+# [Cohere Command R+ 08-2024](#tab/cohere-command-r-plus-08-2024)
+
+Command R+ 08-2024 is a generative large language model optimized for various use cases, including reasoning, summarization, and question answering.
+
+* **Model Architecture**: Command R+ 08-2024 is an autoregressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
+* **Languages covered**: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic.
+* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
+* **Context length:** Command R+ 08-2024 supports a context length of 128 K. +* **Input:** Text only. +* **Output:** Text only. + +We recommend using Command R+ 08-2024 for those workflows that lean on complex retrieval augmented generation (RAG) functionality, multi-step tool use (agents), and structured outputs. + + +The following models are available: + +* [Cohere-command-r-plus-08-2024](https://aka.ms/azureai/landing/Cohere-command-r-plus-08-2024) + + +# [Cohere Command R 08-2024](#tab/cohere-command-r-08-2024) + +Command R 08-2024 is a large language model optimized for various use cases, including reasoning, summarization, and question answering. + +* **Model Architecture**: Command R 08-2024 is an autoregressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. +* **Languages covered**: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic. +* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian. +* **Context length:** Command R 08-2024 supports a context length of 128 K. +* **Input:** Text only. +* **Output:** Text only. + + +The following models are available: + +* [Cohere-command-r-08-2024](https://aka.ms/azureai/landing/Cohere-command-r-08-2024) + + # [Cohere Command R+](#tab/cohere-command-r-plus) Command R+ is a generative large language model optimized for various use cases, including reasoning, summarization, and question answering. -* **Model Architecture**: Both Command R and Command R+ are autoregressive language models that use an optimized transformer architecture. After pre-training, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. -* **Languages covered**: The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic. +* **Model Architecture**: Command R+ is an autoregressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. +* **Languages covered**: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic. * **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian. -* **Context length:** Command R and Command R+ support a context length of 128 K. +* **Context length:** Command R+ supports a context length of 128 K. +* **Input:** Text only. +* **Output:** Text only. We recommend using Command R+ for those workflows that lean on complex retrieval augmented generation (RAG) functionality and multi-step tool use (agents). @@ -1408,10 +1560,10 @@ The following models are available: Command R is a large language model optimized for various use cases, including reasoning, summarization, and question answering. 
-* **Model Architecture**: Both Command R and Command R+ are autoregressive language models that use an optimized transformer architecture. After pre-training, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. -* **Languages covered**: The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic. +* **Model Architecture**: Command R is an autoregressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. +* **Languages covered**: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic. * **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian. -* **Context length:** Command R and Command R+ support a context length of 128 K. +* **Context length:** Command R supports a context length of 128 K. Command R is great for simpler retrieval augmented generation (RAG) and single-step tool use tasks. It's also great for use in applications where price is a major consideration. @@ -1475,7 +1627,7 @@ The response is as follows: ```json { - "model_name": "Cohere-command-r-plus", + "model_name": "Cohere-command-r-plus-08-2024", "model_type": "chat-completions", "model_provider_name": "Cohere" } @@ -1508,7 +1660,7 @@ The response is as follows, where you can see the model's usage statistics: "id": "0a1234b5de6789f01gh2i345j6789klm", "object": "chat.completion", "created": 1718726686, - "model": "Cohere-command-r-plus", + "model": "Cohere-command-r-plus-08-2024", "choices": [ { "index": 0, @@ -1565,7 +1717,7 @@ You can visualize how streaming generates content: "id": "23b54589eba14564ad8a2e6978775a39", "object": "chat.completion.chunk", "created": 1718726371, - "model": "Cohere-command-r-plus", + "model": "Cohere-command-r-plus-08-2024", "choices": [ { "index": 0, @@ -1588,7 +1740,7 @@ The last message in the stream has `finish_reason` set, indicating the reason fo "id": "23b54589eba14564ad8a2e6978775a39", "object": "chat.completion.chunk", "created": 1718726371, - "model": "Cohere-command-r-plus", + "model": "Cohere-command-r-plus-08-2024", "choices": [ { "index": 0, @@ -1639,7 +1791,7 @@ Explore other parameters that you can specify in the inference client. For a ful "id": "0a1234b5de6789f01gh2i345j6789klm", "object": "chat.completion", "created": 1718726686, - "model": "Cohere-command-r-plus", + "model": "Cohere-command-r-plus-08-2024", "choices": [ { "index": 0, @@ -1689,7 +1841,7 @@ Cohere Command chat models can create JSON outputs. Set `response_format` to `js "id": "0a1234b5de6789f01gh2i345j6789klm", "object": "chat.completion", "created": 1718727522, - "model": "Cohere-command-r-plus", + "model": "Cohere-command-r-plus-08-2024", "choices": [ { "index": 0, @@ -1778,7 +1930,7 @@ The following code example creates a tool definition that is able to look from f In this example, the function's output is that there are no flights available for the selected route, but the user should consider taking a train. 
> [!NOTE]
-> Cohere-command-r-plus and Cohere-command-r require a tool's responses to be a valid JSON content formatted as a string. When constructing messages of type *Tool*, ensure the response is a valid JSON string.
+> Cohere-command-r-plus-08-2024, Cohere-command-r-08-2024, Cohere-command-r-plus, and Cohere-command-r require a tool's responses to be valid JSON content formatted as a string. When constructing messages of type *Tool*, ensure the response is a valid JSON string.

Prompt the model to book flights with the help of this function:

@@ -1833,7 +1985,7 @@ You can inspect the response to find out if a tool needs to be called. Inspect t
    "id": "0a1234b5de6789f01gh2i345j6789klm",
    "object": "chat.completion",
    "created": 1718726007,
-    "model": "Cohere-command-r-plus",
+    "model": "Cohere-command-r-plus-08-2024",
    "choices": [
        {
            "index": 0,
@@ -1972,7 +2124,7 @@ The following example shows how to handle events when the model detects harmful

## More inference examples

-For more examples of how to use Cohere, see the following examples and tutorials:
+For more examples of how to use Cohere models, see the following examples and tutorials:

| Description | Language | Sample |
|-------------------------------------------|-------------------|-----------------------------------------------------------------|
@@ -1995,7 +2147,7 @@
| Command R+ tool/function calling, using LangChain | `cohere`, `langchain`, `langchain_cohere` | [command_tools-langchain.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/command_tools-langchain.ipynb) |

-## Cost and quota considerations for Cohere family of models deployed as serverless API endpoints
+## Cost and quota considerations for Cohere models deployed as serverless API endpoints

Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.

@@ -2012,4 +2164,4 @@ For more information on how to track costs, see [Monitor costs for models offere
* [Deploy models as serverless APIs](deploy-models-serverless.md)
* [Consume serverless API endpoints from a different Azure AI Studio project or hub](deploy-models-serverless-connect.md)
* [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
-* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
\ No newline at end of file
+* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
diff --git a/articles/ai-studio/how-to/model-catalog-overview.md b/articles/ai-studio/how-to/model-catalog-overview.md
index 8e362064a6..7e5e1fe8fd 100644
--- a/articles/ai-studio/how-to/model-catalog-overview.md
+++ b/articles/ai-studio/how-to/model-catalog-overview.md
@@ -66,7 +66,7 @@ Model | Managed compute | Serverless API (pay-as-you-go)
--|--|--
Llama family models | Llama-2-7b<br>
Llama-2-7b-chat
Llama-2-13b
Llama-2-13b-chat
Llama-2-70b
Llama-2-70b-chat
Llama-3-8B-Instruct
Llama-3-70B-Instruct
Llama-3-8B
Llama-3-70B | Llama-3-70B-Instruct
Llama-3-8B-Instruct
Llama-2-7b
Llama-2-7b-chat
Llama-2-13b
Llama-2-13b-chat
Llama-2-70b
Llama-2-70b-chat Mistral family models | mistralai-Mixtral-8x22B-v0-1
mistralai-Mixtral-8x22B-Instruct-v0-1
mistral-community-Mixtral-8x22B-v0-1
mistralai-Mixtral-8x7B-v01
mistralai-Mistral-7B-Instruct-v0-2
mistralai-Mistral-7B-v01
mistralai-Mixtral-8x7B-Instruct-v01
mistralai-Mistral-7B-Instruct-v01 | Mistral-large (2402)
Mistral-large (2407)
Mistral-small
Mistral-NeMo -Cohere family models | Not available | Cohere-command-r-plus
Cohere-command-r
Cohere-embed-v3-english
Cohere-embed-v3-multilingual
Cohere-rerank-v3-english
Cohere-rerank-v3-multilingual +Cohere family models | Not available | Cohere-command-r-plus-08-2024
Cohere-command-r-08-2024
Cohere-command-r-plus
Cohere-command-r
Cohere-embed-v3-english
Cohere-embed-v3-multilingual
Cohere-rerank-v3-english
Cohere-rerank-v3-multilingual JAIS | Not available | jais-30b-chat Phi-3 family models | Phi-3-mini-4k-Instruct
Phi-3-mini-128k-Instruct
Phi-3-small-8k-Instruct
Phi-3-small-128k-Instruct
Phi-3-medium-4k-instruct
Phi-3-medium-128k-instruct
Phi-3-vision-128k-Instruct
Phi-3.5-mini-Instruct
Phi-3.5-vision-Instruct
Phi-3.5-MoE-Instruct | Phi-3-mini-4k-Instruct
Phi-3-mini-128k-Instruct
Phi-3-small-8k-Instruct
Phi-3-small-128k-Instruct
Phi-3-medium-4k-instruct
Phi-3-medium-128k-instruct

Phi-3.5-mini-Instruct
Phi-3.5-vision-Instruct
Phi-3.5-MoE-Instruct Nixtla | Not available | TimeGEN-1 diff --git a/articles/ai-studio/includes/region-availability-maas.md b/articles/ai-studio/includes/region-availability-maas.md index 366dc01e0a..94df325a2e 100644 --- a/articles/ai-studio/includes/region-availability-maas.md +++ b/articles/ai-studio/includes/region-availability-maas.md @@ -15,8 +15,10 @@ ms.custom: include, references_regions |Model |Offer Availability Region | Hub/Project Region for Deployment | Hub/Project Region for Fine tuning | |---------|---------|---------|---------| -Cohere Command R | [Microsoft Managed Countries](/partner-center/marketplace/tax-details-marketplace#microsoft-managed-countriesregions)
Japan
Qatar | East US
East US 2
North Central US
South Central US
Sweden Central
West US
West US 3 | Not available | +Cohere Command R+ 08-2024 | [Microsoft Managed Countries](/partner-center/marketplace/tax-details-marketplace#microsoft-managed-countriesregions) | East US<br>
East US 2
North Central US
South Central US
Sweden Central
West US
West US 3 | Not available | +Cohere Command R 08-2024 | [Microsoft Managed Countries](/partner-center/marketplace/tax-details-marketplace#microsoft-managed-countriesregions) | East US<br>
East US 2
North Central US
South Central US
Sweden Central
West US
West US 3 | Not available | Cohere Command R+ | [Microsoft Managed Countries](/partner-center/marketplace/tax-details-marketplace#microsoft-managed-countriesregions)
Japan
Qatar |East US
East US 2
North Central US
South Central US
Sweden Central
West US
West US 3 | Not available | +Cohere Command R | [Microsoft Managed Countries](/partner-center/marketplace/tax-details-marketplace#microsoft-managed-countriesregions)
Japan
Qatar | East US
East US 2
North Central US
South Central US
Sweden Central
West US
West US 3 | Not available | Cohere Rerank 3 - English | [Microsoft Managed Countries](/partner-center/marketplace/tax-details-marketplace#microsoft-managed-countriesregions)
Japan
Qatar | East US
East US 2
North Central US
South Central US
Sweden Central
West US
West US 3 | Not available | Cohere Rerank 3 - Multilingual | [Microsoft Managed Countries](/partner-center/marketplace/tax-details-marketplace#microsoft-managed-countriesregions)
Japan
Qatar | East US
East US 2
North Central US
South Central US
Sweden Central
West US
West US 3 | Not available | Cohere Embed 3 - English | [Microsoft Managed Countries](/partner-center/marketplace/tax-details-marketplace#microsoft-managed-countriesregions)
Japan
Qatar |East US
East US 2
North Central US
South Central US
Sweden Central
West US
West US 3 | Not available | diff --git a/articles/machine-learning/how-to-setup-vs-code.md b/articles/machine-learning/how-to-setup-vs-code.md index f436f4ebf1..3e7df5c67e 100644 --- a/articles/machine-learning/how-to-setup-vs-code.md +++ b/articles/machine-learning/how-to-setup-vs-code.md @@ -1,5 +1,5 @@ --- -title: Set up Visual Studio Code desktop with the Azure Machine Learning extension (preview) +title: Set up Visual Studio Code desktop with the Azure Machine Learning extension titleSuffix: Azure Machine Learning description: Learn how to set up the Azure Machine Learning Visual Studio Code extension. services: machine-learning @@ -14,7 +14,7 @@ ms.custom: devplatv2, build-2023 monikerRange: 'azureml-api-1 || azureml-api-2' --- -# Set up Visual Studio Code desktop with the Azure Machine Learning extension (preview) +# Set up Visual Studio Code desktop with the Azure Machine Learning extension Learn how to set up the Azure Machine Learning Visual Studio Code extension for your machine learning workflows. You only need to do this setup when using the VS Code desktop application. If you use VS Code for the Web, this is handled for you. @@ -26,8 +26,6 @@ The Azure Machine Learning extension for VS Code provides a user interface to: - Debug machine learning experiments locally - Schema-based language support, autocompletion and diagnostics for specification file authoring -[!INCLUDE [machine-learning-preview-generic-disclaimer](includes/machine-learning-preview-generic-disclaimer.md)] - ## Prerequisites - Azure subscription. If you don't have one, sign up to try the [free or paid version of Azure Machine Learning](https://azure.microsoft.com/free/).