
Overview

The project implements the AI DIAL API for language models from AWS Bedrock.

Supported models

Chat completion models

The following models support the POST SERVER_URL/openai/deployments/DEPLOYMENT_NAME/chat/completions endpoint, along with optional support for the POST /tokenize and POST /truncate_prompt endpoints:

Note that a model supports the /truncate_prompt endpoint if and only if it supports the max_prompt_tokens request parameter.

| Vendor | Model | Deployment name | Modality | /tokenize | /truncate_prompt, max_prompt_tokens | tools/functions |
|---|---|---|---|---|---|---|
| Anthropic | Claude 3.5 Sonnet | [us.\|eu.]anthropic.claude-3-5-sonnet-20240620-v1:0 | (text/image)-to-text | 🟡 | 🟡 | |
| Anthropic | Claude 3.5 Sonnet 2.0 | [us.]anthropic.claude-3-5-sonnet-20241022-v2:0 | (text/image)-to-text | 🟡 | 🟡 | |
| Anthropic | Claude 3 Sonnet | [us.\|eu.]anthropic.claude-3-sonnet-20240229-v1:0 | (text/image)-to-text | 🟡 | 🟡 | |
| Anthropic | Claude 3 Haiku | [us.\|eu.]anthropic.claude-3-haiku-20240307-v1:0 | (text/image)-to-text | 🟡 | 🟡 | |
| Anthropic | Claude 3.5 Haiku | [us.]anthropic.claude-3-5-haiku-20241022-v1:0 | text-to-text | 🟡 | 🟡 | |
| Anthropic | Claude 3 Opus | [us.]anthropic.claude-3-opus-20240229-v1:0 | (text/image)-to-text | 🟡 | 🟡 | |
| Anthropic | Claude 2.1 | anthropic.claude-v2:1 | text-to-text | | | |
| Anthropic | Claude 2 | anthropic.claude-v2 | text-to-text | | | |
| Anthropic | Claude Instant 1.2 | anthropic.claude-instant-v1 | text-to-text | 🟡 | 🟡 | |
| Meta | Llama 3.2 90B Instruct | us.meta.llama3-2-90b-instruct-v1:0 | (text/image)-to-text | 🟡 | 🟡 | |
| Meta | Llama 3.2 11B Instruct | us.meta.llama3-2-11b-instruct-v1:0 | (text/image)-to-text | 🟡 | 🟡 | |
| Meta | Llama 3.2 3B Instruct | us.meta.llama3-2-3b-instruct-v1:0 | text-to-text | 🟡 | 🟡 | |
| Meta | Llama 3.2 1B Instruct | us.meta.llama3-2-1b-instruct-v1:0 | text-to-text | 🟡 | 🟡 | |
| Meta | Llama 3.1 405B Instruct | meta.llama3-1-405b-instruct-v1:0 | text-to-text | 🟡 | 🟡 | |
| Meta | Llama 3.1 70B Instruct | meta.llama3-1-70b-instruct-v1:0 | text-to-text | 🟡 | 🟡 | |
| Meta | Llama 3.1 8B Instruct | meta.llama3-1-8b-instruct-v1:0 | text-to-text | 🟡 | 🟡 | |
| Meta | Llama 3 Chat 70B Instruct | meta.llama3-70b-instruct-v1:0 | text-to-text | 🟡 | 🟡 | |
| Meta | Llama 3 Chat 8B Instruct | meta.llama3-8b-instruct-v1:0 | text-to-text | 🟡 | 🟡 | |
| Stability AI | SDXL 1.0 | stability.stable-diffusion-xl-v1 | text-to-image | 🟡 | | |
| Stability AI | SD3 Large 1.0 | stability.sd3-large-v1:0 | (text/image)-to-image | 🟡 | | |
| Stability AI | Stable Image Ultra 1.0 | stability.stable-image-ultra-v1:0 | text-to-image | 🟡 | | |
| Stability AI | Stable Image Core 1.0 | stability.stable-image-core-v1:0 | text-to-image | 🟡 | | |
| Amazon | Titan Text G1 - Express | amazon.titan-tg1-large | text-to-text | 🟡 | 🟡 | |
| Amazon | Nova Pro | amazon.nova-pro-v1:0 | (text/image/document)-to-text | 🟡 | 🟡 | |
| Amazon | Nova Lite | amazon.nova-lite-v1:0 | (text/image/document)-to-text | 🟡 | 🟡 | |
| Amazon | Nova Micro | amazon.nova-micro-v1:0 | text-to-text | 🟡 | 🟡 | |
| AI21 Labs | Jurassic-2 Ultra | ai21.j2-jumbo-instruct | text-to-text | 🟡 | 🟡 | |
| AI21 Labs | Jurassic-2 Ultra v1 | ai21.j2-ultra-v1 | text-to-text | 🟡 | 🟡 | |
| AI21 Labs | Jurassic-2 Mid | ai21.j2-grande-instruct | text-to-text | 🟡 | 🟡 | |
| AI21 Labs | Jurassic-2 Mid v1 | ai21.j2-mid-v1 | text-to-text | 🟡 | 🟡 | |
| Cohere | Command | cohere.command-text-v14 | text-to-text | 🟡 | 🟡 | |
| Cohere | Command Light | cohere.command-light-text-v14 | text-to-text | 🟡 | 🟡 | |
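
Following the OpenAI-compatible chat completions format, a request body for one of the deployments above might look like this (the message content and token limit are illustrative values):

```json
{
  "messages": [
    {"role": "user", "content": "Hello! What can you do?"}
  ],
  "max_prompt_tokens": 1000
}
```

It is sent as a POST to SERVER_URL/openai/deployments/DEPLOYMENT_NAME/chat/completions; max_prompt_tokens is honored only by models that support the /truncate_prompt endpoint.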

✅, 🟡, and ❌ denote the degree of support for a given feature:

| | /tokenize, /truncate_prompt, max_prompt_tokens | tools/functions |
|---|---|---|
| ✅ | Fully supported via an official tokenization algorithm | Fully supported via a native tools API or official prompts that enable tools |
| 🟡 | Partially supported, because the tokenization algorithm wasn't made public by the model vendor. An approximate tokenization algorithm is used instead: it conservatively counts every byte in the UTF-8 encoding of a string as a single token. | Partially supported, because the model doesn't support tools natively. Prompt engineering is used instead to emulate tools, which may not be very reliable. |
| ❌ | Not supported | Not supported |
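
The byte-count approximation described above can be sketched as follows (a minimal illustration of the idea, not the Adapter's actual code):

```python
def approx_token_count(text: str) -> int:
    # Conservative estimate: every byte of the UTF-8 encoding of the string
    # counts as a single token, so the estimate never undercounts for
    # tokenizers whose tokens are at least one byte long.
    return len(text.encode("utf-8"))
```

For example, approx_token_count("héllo") returns 6, since "é" occupies two bytes in UTF-8.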

Embedding models

The following models support the SERVER_URL/openai/deployments/DEPLOYMENT_NAME/embeddings endpoint:

| Model | Deployment name | Modality |
|---|---|---|
| Titan Multimodal Embeddings Generation 1 (G1) | amazon.titan-embed-image-v1 | image/text-to-embedding |
| Amazon Titan Text Embeddings V2 | amazon.titan-embed-text-v2:0 | text-to-embedding |
| Titan Embeddings G1 – Text v1.2 | amazon.titan-embed-text-v1 | text-to-embedding |
| Cohere Embed English | cohere.embed-english-v3 | text-to-embedding |
| Cohere Multilingual | cohere.embed-multilingual-v3 | text-to-embedding |
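
A minimal request body in the OpenAI-compatible embeddings format, sent as a POST to SERVER_URL/openai/deployments/DEPLOYMENT_NAME/embeddings (the input text is illustrative):

```json
{
  "input": "The quick brown fox jumps over the lazy dog"
}
```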

Developer environment

This project requires Python >= 3.11 and uses Poetry >= 1.6.1 as a dependency manager.

Check out Poetry's documentation on how to install it on your system before proceeding.

To install requirements:

poetry install

This will install all requirements for running the package, linting, formatting and tests.

IDE configuration

The recommended IDE is VSCode. Open the project in VSCode and install the recommended extensions.

VSCode is configured to use Black, a PEP 8-compatible formatter.

Alternatively, you can use PyCharm.

Set up the Black formatter for PyCharm manually or install PyCharm >= 2023.2 with built-in Black support.

Run

Run the development server:

make serve

Open localhost:5001/docs to make sure the server is up and running.

Environment Variables

Copy .env.example to .env and customize it for your environment:

| Variable | Default | Description |
|---|---|---|
| AWS_ACCESS_KEY_ID | NA | AWS credentials with access to the Bedrock service |
| AWS_SECRET_ACCESS_KEY | NA | AWS credentials with access to the Bedrock service |
| AWS_DEFAULT_REGION | | AWS region, e.g. us-east-1 |
| AWS_ASSUME_ROLE_ARN | | AWS assume role ARN, e.g. arn:aws:iam::123456789012:role/RoleName |
| LOG_LEVEL | INFO | Log level. Use DEBUG for dev purposes and INFO in prod |
| AIDIAL_LOG_LEVEL | WARNING | AI DIAL SDK log level |
| DIAL_URL | | URL of the core DIAL server. If defined, images generated by Stability models are uploaded to the DIAL file storage and attachments are returned with URLs pointing to the images. Otherwise, the images are returned as base64-encoded strings. |
| WEB_CONCURRENCY | 1 | Number of workers for the server |
| COMPATIBILITY_MAPPING | {} | A JSON dictionary that maps Bedrock deployments that aren't supported by the Adapter to Bedrock deployments that are supported by it (see the Supported models section). Find more details in the compatibility mode section. |
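
For example, a minimal .env for local development might look like this (all values are placeholders):

```
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
AWS_DEFAULT_REGION=us-east-1
LOG_LEVEL=DEBUG
```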

Running unsupported Bedrock models in the compatibility mode

The Adapter supports a predefined list of AWS Bedrock deployments, listed in the Supported models section. These models can be accessed via the /openai/deployments/{deployment_name}/(chat/completions|embeddings) endpoints. The Adapter won't recognize any other deployment name and will return a 404 error.

Now, suppose AWS Bedrock released a new version of a model, e.g. anthropic.claude-3-5-sonnet-20250210-v3:0 which is a better version of an older anthropic.claude-3-5-sonnet-20241022-v2:0 model.

Immediately after the release, the former model is unsupported by the Adapter, while the latter is supported. Therefore, a request to openai/deployments/anthropic.claude-3-5-sonnet-20250210-v3:0/chat/completions will result in a 404 error.

It will take some time for the Adapter to catch up with AWS Bedrock, add support for the v3 model, and publish a release with the fix.

What to do in the meantime? Presumably, the v3 model is backward compatible with v2, so we may try to run v3 in compatibility mode: that is, convince the Adapter to process a v3 request as if it were a v2 request, with the only difference that the final upstream request to AWS Bedrock targets v3 rather than v2.

The COMPATIBILITY_MAPPING env variable enables exactly this scenario.

When it's defined like this:

COMPATIBILITY_MAPPING={"anthropic.claude-3-5-sonnet-20250210-v3:0": "anthropic.claude-3-5-sonnet-20241022-v2:0"}

the Adapter will be able to handle requests to the anthropic.claude-3-5-sonnet-20250210-v3:0 deployment. Such requests will be processed by the same pipeline as anthropic.claude-3-5-sonnet-20241022-v2:0 requests, but the call to AWS Bedrock will use the anthropic.claude-3-5-sonnet-20250210-v3:0 deployment name.

Naturally, this will only work if the APIs of v2 and v3 deployments are compatible:

  1. Requests utilizing the modalities supported by both v2 and v3 will work just fine.
  2. However, requests with modalities that are supported by v3 (e.g. audio) but not by v2 won't be processed correctly. You will have to wait until the Adapter supports the v3 deployment natively.

When a version of the Adapter supporting the v3 model is released, you may migrate to it and safely remove the entry from the COMPATIBILITY_MAPPING dictionary.
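
Conceptually, the mapping is a plain dictionary lookup. Below is a hypothetical sketch of how a deployment name could be resolved to a processing pipeline (not the Adapter's actual code; resolve_pipeline is an invented name):

```python
import json
import os

# COMPATIBILITY_MAPPING maps unsupported deployment names
# to supported ones whose pipeline should handle the request.
mapping = json.loads(os.environ.get("COMPATIBILITY_MAPPING", "{}"))

def resolve_pipeline(deployment: str) -> str:
    # A mapped deployment is processed by the pipeline of the deployment
    # it points to; anything else resolves to itself. The upstream call
    # to AWS Bedrock still uses the original deployment name.
    return mapping.get(deployment, deployment)
```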

Note that a mapping such as this one would be ineffectual:

COMPATIBILITY_MAPPING={"anthropic.claude-3-5-sonnet-20250210-v3:0": "stability.stable-image-ultra-v1:0"}

since the APIs and capabilities of these two models are drastically different.

Load balancing

If you use the DIAL Core load balancing mechanism, you can provide the extraData upstream setting with different AWS account credentials/regions to use different model deployments:

{
  "upstreams": [
    {
      "extraData": {
        "region": "eu-west-1",
        "aws_access_key_id": "key_id_1",
        "aws_secret_access_key": "access_key_1"
      }
    },
    {
      "extraData": {
        "region": "eu-west-1",
        "aws_access_key_id": "key_id_2",
        "aws_secret_access_key": "access_key_2"
      }
    },
    {
      "extraData": {
        "region": "eu-west-1",
        "aws_assume_role_arn": "arn:aws:iam::123456789012:role/BedrockAccessAdapterRoleName"
      }
    }
  ]
}

Supported extraData fields:

  • region
  • aws_access_key_id
  • aws_secret_access_key
  • aws_assume_role_arn

Docker

Run the server in Docker:

make docker_serve

Lint

Run the linting before committing:

make lint

To auto-fix formatting issues run:

make format

Test

Run unit tests locally:

make test

Run unit tests in Docker:

make docker_test

Run integration tests locally:

make integration_tests

Clean

To remove the virtual environment and build artifacts:

make clean