This is a simple UI example of a RAG-based chatbot using Gradio, a Hugging Face TGI server, and Redis as a vector database.
You can refer to these different notebooks to get a better understanding of the flow:
- Data Ingestion to Redis with Langchain
- Redis querying with Langchain
- Full RAG example with Redis and Langchain
To run the application, you will need:
- A Hugging Face Text Generation Inference server with a deployed LLM. This example is based on Llama2, but depending on your LLM you may need to adapt the prompt.
- A Redis installation with a database. See here for deployment instructions.
- An index in the Redis database that you have populated with documents. See here for an example.
A pre-built container image of the application is available at `quay.io/rh-aiservices-bu/gradio-hftgi-rag-redis:latest`.
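If you want to try the image locally before deploying it, a minimal sketch with Podman could look like this. Port 7860 is only an assumption based on Gradio's default, and all the values below are placeholders you must replace with your own:

```bash
# Run the pre-built image locally
# (port 7860 is assumed from Gradio's default; all values are placeholders)
podman run --rm -p 7860:7860 \
  -e INFERENCE_SERVER_URL=https://your-inference-server \
  -e REDIS_URL=redis://default:yourpassword@your-redis-host:6379 \
  -e REDIS_INDEX=docs \
  quay.io/rh-aiservices-bu/gradio-hftgi-rag-redis:latest
```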
In the `deployment` folder, you will find the files necessary to deploy the application (see the example after this list):

- `cm_redis_schema.yaml`: a ConfigMap containing the schema for the database you have created in Redis (see the ingestion notebook).
- `deployment.yaml`: you must provide the URL of your inference server in the placeholder on L54, and the Redis information on L56 and L58. Feel free to modify other parameters as you see fit.
- `service.yaml`
- `route.yaml`
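For example, once the placeholders in `deployment.yaml` are filled in, the manifests could be applied from the root of the repository with `oc` (assuming you are logged in to your OpenShift cluster and in the right project):

```bash
# Apply the manifests; edit deployment.yaml first to set your
# inference server URL and Redis connection information
oc apply -f deployment/cm_redis_schema.yaml
oc apply -f deployment/deployment.yaml
oc apply -f deployment/service.yaml
oc apply -f deployment/route.yaml
```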
The different parameters you can or must pass as environment variables in the deployment are listed below (an example command follows the list):
- INFERENCE_SERVER_URL - mandatory
- REDIS_URL - mandatory
- REDIS_INDEX - mandatory
- MAX_NEW_TOKENS - optional, default: 512
- TOP_K - optional, default: 10
- TOP_P - optional, default: 0.95
- TYPICAL_P - optional, default: 0.95
- TEMPERATURE - optional, default: 0.01
- REPETITION_PENALTY - optional, default: 1.03
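As a sketch, the mandatory variables could also be set or updated after deployment with `oc set env`. The Deployment name `gradio-hftgi-rag-redis` is an assumption here; check the actual name in `deployment.yaml`, and replace the placeholder values with your own:

```bash
# Set the mandatory variables (and optionally override a default)
# on the Deployment; the name and all values are placeholders
oc set env deployment/gradio-hftgi-rag-redis \
  INFERENCE_SERVER_URL=https://your-inference-server \
  REDIS_URL=redis://default:yourpassword@your-redis-host:6379 \
  REDIS_INDEX=docs \
  MAX_NEW_TOKENS=512
```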
The number of replicas in the deployment is initially set to 0 so that you can properly fill in those parameters first. Don't forget to scale it up if you want to see something 😉!
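For example, still assuming the Deployment name used above:

```bash
# Scale the Deployment to one replica once the parameters are filled in
oc scale deployment/gradio-hftgi-rag-redis --replicas=1
```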