Chat-Duck (behkamfallah/Chat-Duck): a 'Chat-with-your-PDF' project using the retrieval-augmented generation (RAG) approach.

About the Project

Chat with your PDF

This repository is a 'Chat-with-your-PDF' project with two different implementations: Light and Enterprise. @Pardis-Rahbarsooreh and I worked on this project.

Prerequisites

Make sure you have installed the libraries listed in ./source/requirements.txt. You can install them by running this command from a terminal in the repository root:

pip install -r ./source/requirements.txt

If you get a "recursive_guard" error while running the code, try using Python 3.11.

If you fork the repository, be sure to create a .env file in ./source and put your API keys in it. The following keys are needed to run the code fully:

OPENAI_API_KEY='...'
ELASTIC_API_KEY='...'
ELASTIC_CLOUD_ID='...'
ELASTIC_END_POINT='...'
UNSTRUCTURED_API_KEY='...'
UNSTRUCTURED_SERVER_URL='...'
PINECONE_API_KEY='...'
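The README does not say how the code reads these keys. As a hedged, stdlib-only sketch, the script below checks that a .env file defines every key listed above and flags any still set to the '...' placeholder. The functions parse_env and missing_keys are hypothetical helpers, not part of the repository, and the parser only handles simple KEY='value' lines.

```python
from pathlib import Path

# Keys listed in this README; all are needed to run every component.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "ELASTIC_API_KEY",
    "ELASTIC_CLOUD_ID",
    "ELASTIC_END_POINT",
    "UNSTRUCTURED_API_KEY",
    "UNSTRUCTURED_SERVER_URL",
    "PINECONE_API_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY='value' lines; skips blank lines and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or left as '...'."""
    return [k for k in REQUIRED_KEYS if env.get(k, "") in ("", "...")]


if __name__ == "__main__":
    # Hypothetical usage: assumes you run this from the repository root.
    env_path = Path("source") / ".env"
    if env_path.exists():
        missing = missing_keys(parse_env(env_path.read_text()))
        print("Missing keys:", missing or "none")
    else:
        print(f"{env_path} not found")
```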

Files and Folders

The repository's main folders are:

./data is where you should put your PDF file.
./source contains the Python (.py) files, which are used as follows:
1. To insert data into the databases, use these files:
  1. data_to_ElasticCloud.py
  2. data_to_Pinecone.py
  In each of these files, specify your file on line 12, then run the file.
2. To run the whole application on Streamlit, use streamlit_app.py. Open a terminal, change directory to ./source, and then type:
```
streamlit run streamlit_app.py
```
3. document_loader.py is responsible for loading PDFs. You can create an instance of the LoadDocument class implemented in this file.
4. chunker.py is responsible for chunking the data. It is used only for data that will be indexed into the Pinecone database.
5. pinecone_handler.py manages the client and connection to the Pinecone servers. It also retrieves data.
6. elasticsearchhandler.py handles the client and connection to Elastic Cloud.
7. unstructured_io_handler.py handles the connection to the Unstructured.io servers and retrieves results from them.
8. light_model.py contains the chain for the Light model.
9. enterprise_model.py contains the chain for the Enterprise model.
10. test_synthetic_data.py tests the app against benchmarks. Before running it, remember to change the context window of the Light model and to use enterprise_model_for_test.py instead of enterprise_model.py.
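This README does not describe the strategy chunker.py uses. For illustration only, a common approach is fixed-size chunking with overlap, sketched below under the assumption of character-based chunks; chunk_text and its chunk_size and overlap parameters are hypothetical names, not the repository's API.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence
    crossing a boundary still appears whole in at least one chunk
    (as long as it is shorter than the overlap).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new chunk advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

The overlap trades some duplicated storage in the index for better recall when relevant text straddles a chunk boundary.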
