Why do this?
Data is not a business asset unless it is actually used. A ChatGPT-like experience is one way to encourage that use: people can ask simple, natural language questions of the data to inform and accelerate their workflows.
What do we want to evidence?
We want to create a ChatGPT-like experience for querying, and getting answers from, a corpus of first-party data. Technically, we want to learn how word embeddings, vector databases and natural language queries come together to create this application.
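A minimal sketch of how these pieces could fit together, written in Python as an assumption rather than taken from the project's code. It substitutes a toy hashed bag-of-words function (`embed`, hypothetical) for a real embedding model and an in-memory class (`InMemoryVectorStore`, hypothetical) for a vector database. It stops at building a grounded prompt (`build_prompt`, hypothetical) instead of calling a language model. The example chunks are placeholders, not text from the report.

```python
import math
import re
import zlib
from collections import Counter
from dataclasses import dataclass, field

DIM = 1024  # assumed vector size for the toy embedder; real models fix their own


def embed(text: str, dim: int = DIM) -> list[float]:
    """Toy stand-in for an embedding model: a hashed bag of words, L2-normalised.

    A real system would call an embedding model here instead.
    """
    vec = [0.0] * dim
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for token, count in Counter(tokens).items():
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


@dataclass
class InMemoryVectorStore:
    """Minimal substitute for a vector database: stores (text, vector) pairs."""
    texts: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        """Return the k stored texts most similar to the query, best first."""
        q = embed(query)
        # Both vectors are unit length, so the dot product is the cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, v)), text)
            for v, text in zip(self.vectors, self.texts)
        ]
        return sorted(scored, reverse=True)[:k]


def build_prompt(question: str, store: InMemoryVectorStore, k: int = 3) -> str:
    """Ground the question in retrieved chunks; the result would be sent to an LLM."""
    context = "\n".join(f"- {text}" for _, text in store.search(question, k))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    store = InMemoryVectorStore()
    # Placeholder chunks for illustration only; not quoted from the report.
    for chunk in [
        "Placeholder chunk about scaling up vaccine manufacturing capacity.",
        "Placeholder chunk about regulatory approval timelines.",
        "Placeholder chunk about distributing doses to clinics.",
    ]:
        store.add(chunk)
    print(build_prompt("How can manufacturing capacity be scaled?", store, k=1))
```

Swapping in a real embedding model and vector database would change `embed` and the store class, but the flow would likely stay the same: embed the document chunks once, embed each question, retrieve the nearest chunks by cosine similarity, and pass them to the language model as context.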
NB:
- The corpus is a publicly available PDF called "Delivering Pandemic Vaccines in 100 Days".
- A more advanced proof of concept (POC) is available, showing how the application can be deployed on Streamlit. DM me for access to the code.