Update pricing.mdx + add container_types.mdx #94

Merged · 7 commits · Aug 23, 2024
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -51,4 +51,6 @@
    title: Inference Endpoints Version
  - local: others/serialization
    title: Serialization & Deserialization for Requests
  - local: others/container_types
    title: Inference Endpoints Container Types
  title: Others
3 changes: 3 additions & 0 deletions docs/source/faq.mdx
@@ -101,6 +101,9 @@ A: This is now possible in the UI, or via the API:

A: Yes! Please check out our [TGI documentation](https://huggingface.co/docs/text-generation-inference/index) and this [video](https://www.youtube.com/watch?v=jlMAX2Oaht0) on TGI deploys.

### Q: I'm sometimes running into a 503 error on a running endpoint in production. What can I do?

A: To help mitigate service interruptions on an Endpoint that needs to be highly available, make sure to run at least two replicas, i.e., set min replicas to 2.
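
As a sketch of how this can be done programmatically, the example below uses the `huggingface_hub` client to raise the replica count on an existing Endpoint; the Endpoint name and the `max_replica` value are illustrative:

```python
from huggingface_hub import update_inference_endpoint

# Keep at least two replicas running so one can keep serving traffic
# while the other is rescheduled or updated.
endpoint = update_inference_endpoint(
    "my-production-endpoint",  # hypothetical Endpoint name
    min_replica=2,
    max_replica=4,  # allow scaling up under load
)
print(endpoint.status)
```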

### Q: What’s the difference between Dedicated and Serverless Endpoints?

32 changes: 32 additions & 0 deletions docs/source/others/container_types.mdx
@@ -0,0 +1,32 @@
# Inference Endpoints Container Types

When you create an Endpoint, you can choose from a variety of container types.

## Default

The default container type is the easiest way to deploy Endpoints and is flexible thanks to [custom Inference Handlers](https://huggingface.co/docs/inference-endpoints/guides/custom_handler). The Hugging Face Inference Toolkit is now public at https://github.com/huggingface/huggingface-inference-toolkit.
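
For reference, a custom Inference Handler is a `handler.py` at the root of the model repository that exposes an `EndpointHandler` class. The minimal sketch below follows the interface described in the custom handler guide; the text-classification pipeline is an illustrative choice:

```python
# handler.py — minimal custom Inference Handler sketch
from typing import Any, Dict, List

from transformers import pipeline


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points to the local copy of the model repository;
        # load the model once when the Endpoint starts.
        self.pipeline = pipeline("text-classification", model=path)

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # `data` is the deserialized request body.
        inputs = data["inputs"]
        return self.pipeline(inputs)
```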

## Custom

Select the custom container type if you'd like to bring your own image and deploy it as a [custom container](https://huggingface.co/docs/inference-endpoints/guides/custom_container).
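
As a hedged sketch of deploying a custom image with the `huggingface_hub` client — the Endpoint name, model repository, image URL, and environment variables below are all illustrative:

```python
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "my-custom-endpoint",            # hypothetical Endpoint name
    repository="user/my-model",      # hypothetical model repository
    framework="pytorch",
    task="custom",
    accelerator="cpu",
    vendor="aws",
    region="us-east-1",
    instance_size="x4",
    instance_type="intel-spr",
    custom_image={
        "health_route": "/health",
        "url": "registry.example.com/my-image:latest",  # hypothetical image
        "env": {"MAX_BATCH_SIZE": "8"},                 # hypothetical env var
    },
)
endpoint.wait()  # block until the Endpoint is ready to serve requests
```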

## NVIDIA NIM (no longer available in UI)

*Beginning October 1st, 2024, the NIM container type will no longer be officially supported in Inference Endpoints, including for already existing Endpoints.*

Select the NIM container type for models supported by NVIDIA. This option is no longer shown in the UI when creating new Endpoints.

## Text Embeddings Inference

Select the Text Embeddings Inference container type to gain all the benefits of [TEI](https://huggingface.co/docs/text-embeddings-inference/en/index) for your Endpoint. You'll see this option in the UI if supported for that model.
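
For example, a running TEI Endpoint can be queried with `InferenceClient`; the Endpoint URL below is a placeholder:

```python
from huggingface_hub import InferenceClient

# Replace with the URL shown on your Endpoint overview page.
client = InferenceClient("https://your-endpoint.endpoints.huggingface.cloud")
embedding = client.feature_extraction("Inference Endpoints with TEI")
print(embedding.shape)  # embedding dimensions depend on the model
```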

## Text Generation Inference

Select the Text Generation Inference container type to gain all the benefits of [TGI](https://huggingface.co/docs/text-generation-inference/index) for your Endpoint. You'll see this option in the UI if supported for that model.
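
Similarly, a running TGI Endpoint can be queried with `InferenceClient.text_generation`; the Endpoint URL below is a placeholder:

```python
from huggingface_hub import InferenceClient

# Replace with the URL shown on your Endpoint overview page.
client = InferenceClient("https://your-endpoint.endpoints.huggingface.cloud")
output = client.text_generation("What are Inference Endpoints?", max_new_tokens=64)
print(output)
```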

## Text Generation Inference (INF2)

Select the Text Generation Inference Inferentia2 Neuron container type for models you'd like to deploy with TGI on an AWS Inferentia2 instance. You'll see this option in the UI if supported for that model.

## Text Generation Inference (TPU)

Select the Text Generation Inference TPU container type for models you'd like to deploy with TGI on a Google Cloud TPU instance. You'll see this option in the UI if supported for that model.
36 changes: 18 additions & 18 deletions docs/source/pricing.mdx
@@ -23,24 +23,24 @@ You can find the hourly pricing for all available instances for 🤗 Inference Endpoints

The table below shows currently available CPU instances and their hourly pricing. If the instance type cannot be selected in the application, you need to [request a quota](mailto:api-enterprise@huggingface.co?subject=Quota%20increase%20HF%20Endpoints&body=Hello,%0D%0A%0D%0AI%20would%20like%20to%20request%20access/quota%20increase%20for%20[INSTANCE%20TYPE]%20for%20the%20following%20account%20[HF%20ACCOUNT].) to use it.

**Removed:**

| Provider | Instance Type | Instance Size | Hourly rate | vCPUs | Memory | Architecture |
| -------- | ------------- | ------------- | ----------- | ----- | ------ | --------------------- |
| aws | intel-icl | x1 | $0.032 | 1 | 2 GB | Intel Ice Lake | (Soon to be fully deprecated)
| aws | intel-icl | x2 | $0.064 | 2 | 4 GB | Intel Ice Lake | (Soon to be fully deprecated)
| aws | intel-icl | x4 | $0.128 | 4 | 8 GB | Intel Ice Lake | (Soon to be fully deprecated)
| aws | intel-icl | x8 | $0.256 | 8 | 16 GB | Intel Ice Lake | (Soon to be fully deprecated)
| aws | intel-spr | x1 | $0.033 | 1 | 2 GB | Intel Sapphire Rapids |
| aws | intel-spr | x2 | $0.067 | 2 | 4 GB | Intel Sapphire Rapids |
| aws | intel-spr | x4 | $0.134 | 4 | 8 GB | Intel Sapphire Rapids |
| aws | intel-spr | x8 | $0.268 | 8 | 16 GB | Intel Sapphire Rapids |
| azure | intel-xeon | x1 | $0.060 | 1 | 2 GB | Intel Xeon |
| azure | intel-xeon | x2 | $0.120 | 2 | 4 GB | Intel Xeon |
| azure | intel-xeon | x4 | $0.240 | 4 | 8 GB | Intel Xeon |
| azure | intel-xeon | x8 | $0.480 | 8 | 16 GB | Intel Xeon |
| gcp | intel-spr | x1 | $0.070 | 1 | 2 GB | Intel Sapphire Rapids |
| gcp | intel-spr | x2 | $0.140 | 2 | 4 GB | Intel Sapphire Rapids |
| gcp | intel-spr | x4 | $0.280 | 4 | 8 GB | Intel Sapphire Rapids |
| gcp | intel-spr | x8 | $0.560 | 8 | 16 GB | Intel Sapphire Rapids |

**Added:**

| Provider | Instance Type | Instance Size | Hourly rate | vCPUs | Memory | Architecture |
| -------- | ------------- | ------------- | ----------- | ----- | ------ | ---------------------------------------------- |
| aws | intel-icl | x1 | $0.032 | 1 | 2 GB | Intel Ice Lake *(soon to be fully deprecated)* |
| aws | intel-icl | x2 | $0.064 | 2 | 4 GB | Intel Ice Lake *(soon to be fully deprecated)* |
| aws | intel-icl | x4 | $0.128 | 4 | 8 GB | Intel Ice Lake *(soon to be fully deprecated)* |
| aws | intel-icl | x8 | $0.256 | 8 | 16 GB | Intel Ice Lake *(soon to be fully deprecated)* |
| aws | intel-spr | x1 | $0.033 | 1 | 2 GB | Intel Sapphire Rapids |
| aws | intel-spr | x2 | $0.067 | 2 | 4 GB | Intel Sapphire Rapids |
| aws | intel-spr | x4 | $0.134 | 4 | 8 GB | Intel Sapphire Rapids |
| aws | intel-spr | x8 | $0.268 | 8 | 16 GB | Intel Sapphire Rapids |
| azure | intel-xeon | x1 | $0.060 | 1 | 2 GB | Intel Xeon |
| azure | intel-xeon | x2 | $0.120 | 2 | 4 GB | Intel Xeon |
| azure | intel-xeon | x4 | $0.240 | 4 | 8 GB | Intel Xeon |
| azure | intel-xeon | x8 | $0.480 | 8 | 16 GB | Intel Xeon |
| gcp | intel-spr | x1 | $0.070 | 1 | 2 GB | Intel Sapphire Rapids |
| gcp | intel-spr | x2 | $0.140 | 2 | 4 GB | Intel Sapphire Rapids |
| gcp | intel-spr | x4 | $0.280 | 4 | 8 GB | Intel Sapphire Rapids |
| gcp | intel-spr | x8 | $0.560 | 8 | 16 GB | Intel Sapphire Rapids |
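
As a rough illustration of how these rates translate into a bill — assuming the Endpoint runs continuously (~730 hours per month) with the two-replica minimum recommended in the FAQ:

```python
# Back-of-the-envelope monthly cost for a CPU Endpoint.
hourly_rate = 0.134    # aws intel-spr x4, from the table above
replicas = 2           # minimum recommended for high availability
hours_per_month = 730  # ~24 * 365 / 12

monthly_cost = hourly_rate * replicas * hours_per_month
print(f"~${monthly_cost:.2f} per month")  # ~$195.64 per month
```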

## GPU Instances
