SheetSimplify with RAG LLMs

The aim of this project is to simplify data retrieval from Excel sheets using RAG LLMs, hence the name! Many organizations store decades' worth of data in Excel sheets. However, retrieving information from these sheets is quite difficult for users without a technical background. Natural Language Querying (NLQ) aims to solve exactly this problem by letting users ask a model simple questions and receive appropriate, well-reasoned responses. NLQ can be achieved using RAG (Retrieval-Augmented Generation) LLMs, which is what we aim to build in this project.

The approach

Instead of fine-tuning the model on the relevant data, which consumes significant resources, we shall use prompt engineering to make the LLM answer based on context retrieved from the dataset and supplied in the prompt. This is the basic idea behind RAG.
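The following is a minimal, dependency-free sketch of the retrieve-then-prompt idea. It is not the code in the repo's model folder. It assumes the sheet has already been loaded as a list of row dictionaries; in practice this could come from `pandas.read_excel(...).to_dict(orient="records")`. The helpers `retrieve_rows` and `build_prompt` are hypothetical names. The naive keyword-overlap scoring is a stand-in for the embedding-based retrieval a production RAG pipeline would more typically use. The returned prompt would then be passed to whichever LLM is being invoked.

```python
import re
from typing import Dict, List, Set

Row = Dict[str, object]  # one spreadsheet row: column name -> cell value


def _tokens(text: str) -> Set[str]:
    """Lower-case alphanumeric tokens used for naive keyword matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def row_to_text(row: Row) -> str:
    """Serialize one row as 'column: value' pairs so the LLM can read it."""
    return ", ".join(f"{col}: {val}" for col, val in row.items())


def retrieve_rows(question: str, rows: List[Row], k: int = 3) -> List[Row]:
    """Hypothetical retriever: up to k rows sharing the most keywords with the question."""
    q = _tokens(question)
    scored = [(len(q & _tokens(row_to_text(r))), i, r) for i, r in enumerate(rows)]
    # Highest overlap first; ties keep the original sheet order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    # Drop rows with no keyword overlap at all.
    return [r for score, _, r in scored[:k] if score > 0]


def build_prompt(question: str, rows: List[Row], k: int = 3) -> str:
    """Assemble a grounded prompt: retrieved rows as context, then the question."""
    context = "\n".join(row_to_text(r) for r in retrieve_rows(question, rows, k))
    return (
        "Answer the question using only the spreadsheet rows below. "
        "If the answer is not in the rows, say you don't know.\n\n"
        f"Rows:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    sheet = [
        {"region": "North", "year": 2021, "sales": 120},
        {"region": "South", "year": 2021, "sales": 95},
        {"region": "North", "year": 2022, "sales": 140},
    ]
    print(build_prompt("What were North sales in 2022?", sheet, k=2))
```

Only the most relevant rows are placed in the prompt rather than the whole sheet. An LLM's context window is limited, and sheets holding decades of data will not fit into it.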

Once the model is ready, we shall expose a simple API endpoint that responds with a summary of the key information in the data. Since users will likely want to query the data further, we also provide a Streamlit app.

Since we would like the whole application to be easy to distribute and deploy, we shall 'dockerize' it.

The repo

The repo structure is based on a standard template for production ML projects [3].

Notebooks: Typically used for data analysis and exploration. Since we are dealing with LLMs, I have used the notebooks here to document my understanding of the relevant concepts.
Model: Contains the Python file that prepares and invokes the LLM.
API: Contains the Flask file that exposes endpoints for making simple calls to the LLM. The endpoints exposed so far are:
- "/" - the home page, which simply returns "Welcome to Sheet Simplify!"
- "/v1/summary" - returns an LLM-generated summary of the provided data
streamlit: Contains the code to set up the Streamlit app. This folder is not part of the original template.
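Flask applications are WSGI applications, so the routing behind the two endpoints above can be sketched using only the standard library. This is an illustrative stand-in, not the repo's app.py. `summarize_data` is a hypothetical placeholder for the call into the model code, and the JSON response shape is an assumption.

```python
import json
from typing import Callable, Iterable


def summarize_data() -> str:
    """Hypothetical placeholder for the LLM call made by the model code."""
    return "Summary of key information in the sheet."


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI app mirroring the two routes listed above."""
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        status, ctype = "200 OK", "text/plain; charset=utf-8"
        body = b"Welcome to Sheet Simplify!"
    elif path == "/v1/summary":
        status, ctype = "200 OK", "application/json"
        body = json.dumps({"summary": summarize_data()}).encode("utf-8")
    else:
        status, ctype = "404 Not Found", "text/plain; charset=utf-8"
        body = b"Not Found"
    start_response(status, [("Content-Type", ctype), ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serve on localhost:5000, the same default port Flask's development server uses.
    with make_server("127.0.0.1", 5000, app) as server:
        server.serve_forever()
```

In Flask, each branch would instead be its own view function decorated with `@app.route("/")` or `@app.route("/v1/summary")`. Flask also supplies the 404 handling automatically.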

Several other folders from the original template were not relevant to this project and have been omitted.

Usage Tips

Before running any of the scripts in this repo, install the project itself with pip so that its internal packages can import one another. From the root directory, run:
pip install .
Regardless of which script you want to run, always execute it from the root directory. For example, run llm.py as a module from the root directory. Note that -m takes a dotted module path, not a file path:
python -m model.llm
For the Streamlit app, run the following command, which starts the Streamlit web frontend on localhost:
streamlit run streamlit/streamlit_app.py
You can also start the Flask app to hit the API endpoints. The following command launches the Flask server on localhost:
python -m api.app
Remember to append /v1/summary to the server URL in the browser to hit that endpoint. You can, of course, also call the endpoints from other API tools such as Postman.
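Besides the browser or Postman, the summary endpoint can be called from a script. The following is a minimal standard-library sketch. It assumes the Flask server is running locally on Flask's default port 5000; adjust `base_url` if the app is configured with a different host or port. `summary_url` and `fetch_summary` are hypothetical helper names.

```python
from urllib.request import urlopen

DEFAULT_BASE = "http://127.0.0.1:5000"  # assumption: Flask's default dev-server address


def summary_url(base_url: str = DEFAULT_BASE) -> str:
    """Build the full URL of the /v1/summary endpoint, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/v1/summary"


def fetch_summary(base_url: str = DEFAULT_BASE, timeout: float = 60.0) -> str:
    """GET the summary endpoint and return the response body as text.

    LLM-backed endpoints can be slow, so the timeout is deliberately generous.
    The body is decoded as UTF-8, which is an assumption about the server.
    """
    with urlopen(summary_url(base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(fetch_summary())
```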

Repository: sivadhulipala1999/SheetSimplify_with_RAG