Gradio UI for RAG using Hugging Face Text Generation Inference server and Redis

This is a simple UI example for a RAG-based chatbot using Gradio, a Hugging Face TGI server, and Redis as a vector database.

You can refer to the accompanying notebooks in this repository to get a better understanding of the flow.

Requirements

  • A Hugging Face Text Generation Inference server with a deployed LLM. This example is based on Llama2, but depending on your LLM, you may need to adapt the prompt.
  • A Redis installation with a database. See here for deployment instructions.
  • An index in the Redis database that you have populated with documents. See here for an example.

Deployment on OpenShift

A pre-built container image of the application is available at: quay.io/rh-aiservices-bu/gradio-hftgi-rag-redis:latest
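
If you want to try the application locally before deploying it, you can run the same image with a container runtime. A minimal sketch, assuming the application listens on Gradio's default port 7860 and using placeholder values for the environment variables described further below:

```bash
# Run the pre-built image locally (sketch: the port and all the values
# below are assumptions/placeholders, adapt them to your environment)
podman run --rm -p 7860:7860 \
  -e INFERENCE_SERVER_URL=https://my-tgi-server.example.com \
  -e REDIS_URL=redis://my-redis-host:6379 \
  -e REDIS_INDEX=docs \
  quay.io/rh-aiservices-bu/gradio-hftgi-rag-redis:latest
```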

In the deployment folder, you will find the files necessary to deploy the application:

  • cm_redis_schema.yaml: A ConfigMap containing the schema for the database you have created in Redis (see the ingestion Notebook).
  • deployment.yaml: You must provide the URL of your inference server in the placeholder on line 54, and the Redis connection information on lines 56 and 58. Feel free to modify other parameters as you see fit.
  • service.yaml
  • route.yaml
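
Once you have filled in those values, the manifests can be applied in order. A minimal sketch, assuming you are logged in with oc and have selected the target project:

```bash
# Apply the manifests from the deployment folder (sketch)
oc apply -f deployment/cm_redis_schema.yaml
oc apply -f deployment/deployment.yaml
oc apply -f deployment/service.yaml
oc apply -f deployment/route.yaml
```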

The parameters you can or must pass as environment variables in the deployment are:

  • INFERENCE_SERVER_URL - mandatory
  • REDIS_URL - mandatory
  • REDIS_INDEX - mandatory
  • MAX_NEW_TOKENS - optional, default: 512
  • TOP_K - optional, default: 10
  • TOP_P - optional, default: 0.95
  • TYPICAL_P - optional, default: 0.95
  • TEMPERATURE - optional, default: 0.01
  • REPETITION_PENALTY - optional, default: 1.03
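
Instead of editing deployment.yaml by hand, you can also set or update these variables on an existing Deployment with the OpenShift CLI. A sketch; the deployment name gradio-hftgi-rag-redis is an assumption, use the name defined in your deployment.yaml:

```bash
# Set the mandatory variables on the running Deployment (the deployment
# name is an assumption, the values are placeholders)
oc set env deployment/gradio-hftgi-rag-redis \
  INFERENCE_SERVER_URL=https://my-tgi-server.example.com \
  REDIS_URL=redis://my-redis-host:6379 \
  REDIS_INDEX=docs
```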

The number of replicas in the Deployment is initially set to 0 to let you properly fill in those parameters first. Don't forget to scale it up if you want to see something 😉!
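
For example, again assuming the Deployment is named gradio-hftgi-rag-redis:

```bash
# Scale the application up once the parameters are filled in
oc scale deployment/gradio-hftgi-rag-redis --replicas=1
```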