Update pricing.mdx + add container_types.mdx #94

Merged · 7 commits · Aug 23, 2024
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -51,4 +51,6 @@
    title: Inference Endpoints Version
  - local: others/serialization
    title: Serialization & Deserialization for Requests
  - local: others/container_types
    title: Inference Endpoints Container Types
  title: Others
3 changes: 3 additions & 0 deletions docs/source/faq.mdx
@@ -101,6 +101,9 @@ A: This is now possible in the UI, or via the API:

A: Yes! Please check out our [TGI documentation](https://huggingface.co/docs/text-generation-inference/index) and this [video](https://www.youtube.com/watch?v=jlMAX2Oaht0) on TGI deploys.

### Q: I'm sometimes running into a 503 error on a running endpoint in production. What can I do?

A: To help mitigate service interruptions on an Endpoint that needs to be highly available, make sure to run at least two replicas, i.e., set min replicas to 2.
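
As a sketch of how this can be done programmatically, the example below uses the `huggingface_hub` client to raise the replica count on an existing Endpoint; the Endpoint name and the `max_replica` value are illustrative:

```python
from huggingface_hub import update_inference_endpoint

# Keep at least two replicas running so one can keep serving traffic
# while the other is rescheduled or updated.
endpoint = update_inference_endpoint(
    "my-production-endpoint",  # hypothetical Endpoint name
    min_replica=2,
    max_replica=4,  # allow scaling up under load
)
print(endpoint.status)
```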

### Q: What’s the difference between Dedicated and Serverless Endpoints?

32 changes: 32 additions & 0 deletions docs/source/others/container_types.mdx
@@ -0,0 +1,32 @@
# Inference Endpoints Container Types

When you create an Endpoint, you can choose from a variety of container types.

## Default

The default container type is the easiest way to deploy Endpoints and is flexible thanks to [custom Inference Handlers](https://huggingface.co/docs/inference-endpoints/guides/custom_handler). The Hugging Face Inference Toolkit is now public at https://github.com/huggingface/huggingface-inference-toolkit.
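
For reference, a custom Inference Handler is a `handler.py` at the root of the model repository that exposes an `EndpointHandler` class. The minimal sketch below follows the interface described in the custom handler guide; the text-classification pipeline is an illustrative choice:

```python
# handler.py — minimal custom Inference Handler sketch
from typing import Any, Dict, List

from transformers import pipeline


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points to the local copy of the model repository;
        # load the model once when the Endpoint starts.
        self.pipeline = pipeline("text-classification", model=path)

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # `data` is the deserialized request body.
        inputs = data["inputs"]
        return self.pipeline(inputs)
```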

## Custom

Select the custom container type if you'd like to bring your own image and deploy it as a [custom container](https://huggingface.co/docs/inference-endpoints/guides/custom_container).
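
As a hedged sketch of deploying a custom image with the `huggingface_hub` client — the Endpoint name, model repository, image URL, and environment variables below are all illustrative:

```python
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "my-custom-endpoint",            # hypothetical Endpoint name
    repository="user/my-model",      # hypothetical model repository
    framework="pytorch",
    task="custom",
    accelerator="cpu",
    vendor="aws",
    region="us-east-1",
    instance_size="x4",
    instance_type="intel-spr",
    custom_image={
        "health_route": "/health",
        "url": "registry.example.com/my-image:latest",  # hypothetical image
        "env": {"MAX_BATCH_SIZE": "8"},                 # hypothetical env var
    },
)
endpoint.wait()  # block until the Endpoint is ready to serve requests
```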

## NVIDIA NIM (no longer available in UI)

*Beginning October 1st, 2024, the NIM container type will no longer be officially supported in Inference Endpoints, including for already existing Endpoints.*

Select the NIM container type for models supported by NVIDIA. This option is no longer shown in the UI when creating new Endpoints.

## Text Embeddings Inference

Select the Text Embeddings Inference container type to gain all the benefits of [TEI](https://huggingface.co/docs/text-embeddings-inference/en/index) for your Endpoint. You'll see this option in the UI if supported for that model.
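
For example, a running TEI Endpoint can be queried with `InferenceClient`; the Endpoint URL below is a placeholder:

```python
from huggingface_hub import InferenceClient

# Replace with the URL shown on your Endpoint overview page.
client = InferenceClient("https://your-endpoint.endpoints.huggingface.cloud")
embedding = client.feature_extraction("Inference Endpoints with TEI")
print(embedding.shape)  # embedding dimensions depend on the model
```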

## Text Generation Inference

Select the Text Generation Inference container type to gain all the benefits of [TGI](https://huggingface.co/docs/text-generation-inference/index) for your Endpoint. You'll see this option in the UI if supported for that model.
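
Similarly, a running TGI Endpoint can be queried with `InferenceClient.text_generation`; the Endpoint URL below is a placeholder:

```python
from huggingface_hub import InferenceClient

# Replace with the URL shown on your Endpoint overview page.
client = InferenceClient("https://your-endpoint.endpoints.huggingface.cloud")
output = client.text_generation("What are Inference Endpoints?", max_new_tokens=64)
print(output)
```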

## Text Generation Inference (INF2)

Select the Text Generation Inference Inferentia2 Neuron container type for models you'd like to deploy with TGI on an AWS Inferentia2 instance. You'll see this option in the UI if supported for that model.

## Text Generation Inference (TPU)

Select the Text Generation Inference TPU container type for models you'd like to deploy with TGI on a Google Cloud TPU instance. You'll see this option in the UI if supported for that model.
36 changes: 18 additions & 18 deletions docs/source/pricing.mdx
@@ -23,24 +23,24 @@ You can find the hourly pricing for all available instances for 🤗 Inference Endpoints

The table below shows currently available CPU instances and their hourly pricing. If the instance type cannot be selected in the application, you need to [request a quota](mailto:api-enterprise@huggingface.co?subject=Quota%20increase%20HF%20Endpoints&body=Hello,%0D%0A%0D%0AI%20would%20like%20to%20request%20access/quota%20increase%20for%20[INSTANCE%20TYPE]%20for%20the%20following%20account%20[HF%20ACCOUNT].) to use it.

**Removed:**

| Provider | Instance Type | Instance Size | Hourly rate | vCPUs | Memory | Architecture |
| -------- | ------------- | ------------- | ----------- | ----- | ------ | --------------------- |
| aws | intel-icl | x1 | $0.032 | 1 | 2 GB | Intel Ice Lake | (Soon to be fully deprecated)
| aws | intel-icl | x2 | $0.064 | 2 | 4 GB | Intel Ice Lake | (Soon to be fully deprecated)
| aws | intel-icl | x4 | $0.128 | 4 | 8 GB | Intel Ice Lake | (Soon to be fully deprecated)
| aws | intel-icl | x8 | $0.256 | 8 | 16 GB | Intel Ice Lake | (Soon to be fully deprecated)
| aws | intel-spr | x1 | $0.033 | 1 | 2 GB | Intel Sapphire Rapids |
| aws | intel-spr | x2 | $0.067 | 2 | 4 GB | Intel Sapphire Rapids |
| aws | intel-spr | x4 | $0.134 | 4 | 8 GB | Intel Sapphire Rapids |
| aws | intel-spr | x8 | $0.268 | 8 | 16 GB | Intel Sapphire Rapids |
| azure | intel-xeon | x1 | $0.060 | 1 | 2 GB | Intel Xeon |
| azure | intel-xeon | x2 | $0.120 | 2 | 4 GB | Intel Xeon |
| azure | intel-xeon | x4 | $0.240 | 4 | 8 GB | Intel Xeon |
| azure | intel-xeon | x8 | $0.480 | 8 | 16 GB | Intel Xeon |
| gcp | intel-spr | x1 | $0.070 | 1 | 2 GB | Intel Sapphire Rapids |
| gcp | intel-spr | x2 | $0.140 | 2 | 4 GB | Intel Sapphire Rapids |
| gcp | intel-spr | x4 | $0.280 | 4 | 8 GB | Intel Sapphire Rapids |
| gcp | intel-spr | x8 | $0.560 | 8 | 16 GB | Intel Sapphire Rapids |

**Added:**

| Provider | Instance Type | Instance Size | Hourly rate | vCPUs | Memory | Architecture |
| -------- | ------------- | ------------- | ----------- | ----- | ------ | ---------------------------------------------- |
| aws | intel-icl | x1 | $0.032 | 1 | 2 GB | Intel Ice Lake *(soon to be fully deprecated)* |
| aws | intel-icl | x2 | $0.064 | 2 | 4 GB | Intel Ice Lake *(soon to be fully deprecated)* |
| aws | intel-icl | x4 | $0.128 | 4 | 8 GB | Intel Ice Lake *(soon to be fully deprecated)* |
| aws | intel-icl | x8 | $0.256 | 8 | 16 GB | Intel Ice Lake *(soon to be fully deprecated)* |
| aws | intel-spr | x1 | $0.033 | 1 | 2 GB | Intel Sapphire Rapids |
| aws | intel-spr | x2 | $0.067 | 2 | 4 GB | Intel Sapphire Rapids |
| aws | intel-spr | x4 | $0.134 | 4 | 8 GB | Intel Sapphire Rapids |
| aws | intel-spr | x8 | $0.268 | 8 | 16 GB | Intel Sapphire Rapids |
| azure | intel-xeon | x1 | $0.060 | 1 | 2 GB | Intel Xeon |
| azure | intel-xeon | x2 | $0.120 | 2 | 4 GB | Intel Xeon |
| azure | intel-xeon | x4 | $0.240 | 4 | 8 GB | Intel Xeon |
| azure | intel-xeon | x8 | $0.480 | 8 | 16 GB | Intel Xeon |
| gcp | intel-spr | x1 | $0.070 | 1 | 2 GB | Intel Sapphire Rapids |
| gcp | intel-spr | x2 | $0.140 | 2 | 4 GB | Intel Sapphire Rapids |
| gcp | intel-spr | x4 | $0.280 | 4 | 8 GB | Intel Sapphire Rapids |
| gcp | intel-spr | x8 | $0.560 | 8 | 16 GB | Intel Sapphire Rapids |
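
As a rough illustration of how these rates translate into a bill — assuming the Endpoint runs continuously (~730 hours per month) with the two-replica minimum recommended in the FAQ:

```python
# Back-of-the-envelope monthly cost for a CPU Endpoint.
hourly_rate = 0.134    # aws intel-spr x4, from the table above
replicas = 2           # minimum recommended for high availability
hours_per_month = 730  # ~24 * 365 / 12

monthly_cost = hourly_rate * replicas * hours_per_month
print(f"~${monthly_cost:.2f} per month")  # ~$195.64 per month
```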

## GPU Instances
