POC gradio app for a RAG bot based on arXiv articles.

apiraccini/capra

CAPRA - Context AI Powered Research Assistant 🐑

Concept:

  • search for articles on arxiv
  • load each article into a corpus in chunks and obtain embeddings
  • when asked a question, retrieve relevant context from the corpus and answer using an LLM (RAG)
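
The chunk, embed, and retrieve steps above can be sketched as follows. This is a minimal, dependency-free illustration and not the code used in this repository: the function names (`chunk_text`, `embed`, `retrieve`, `build_prompt`) are hypothetical, and the word-count "embedding" is only a stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into overlapping chunks of at most `chunk_size` words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text):
    """Toy embedding (assumption): a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, corpus, k=2):
    """Return the k chunks most similar to the question.

    `corpus` is a list of (chunk_text, embedding) pairs.
    """
    q = embed(question)
    ranked = sorted(corpus, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble the prompt that would be sent to the LLM."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In a real pipeline, `embed` would call an embedding model and the string returned by `build_prompt` would be passed to the LLM; only the retrieval logic is shown here.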

The PDF articles are processed with the Nougat model, first proposed in "Nougat: Neural Optical Understanding for Academic Documents" and accessible via HuggingFace transformers, to extract their text as markdown.

Notes

  • As of now, the arXiv API seems unreliable; a direct URL GET call, instead of the Python wrapper for the API, may be worth trying.
  • You will need to create your own .env file inside the root project directory, with your OpenAI API key inside.
  • Docker containers will not be used until Nougat is included in a stable version of transformers and a suitable solution for the arXiv API problem is found.
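
The direct GET call mentioned in the notes could look roughly like the sketch below, which queries the public arXiv endpoint with the standard library and parses the Atom feed it returns. It has not been checked against the reliability problem described above, and the helper names (`build_query_url`, `parse_feed`, `search_arxiv`) are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace of the Atom feed


def build_query_url(terms, max_results=5):
    """Build the query URL for a search over all fields."""
    params = {
        "search_query": f"all:{terms}",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_feed(xml_text):
    """Extract id, title and PDF link from each entry of an Atom feed."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.iter(f"{ATOM}entry"):
        pdf_url = None
        for link in entry.findall(f"{ATOM}link"):
            if link.get("title") == "pdf":
                pdf_url = link.get("href")
        title = entry.findtext(f"{ATOM}title", default="")
        results.append({
            "id": entry.findtext(f"{ATOM}id", default=""),
            "title": " ".join(title.split()),  # collapse line breaks
            "pdf_url": pdf_url,
        })
    return results


def search_arxiv(terms, max_results=5, timeout=30):
    """Perform the GET request and return the parsed entries."""
    url = build_query_url(terms, max_results)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_feed(response.read().decode("utf-8"))
```

Calling `search_arxiv("retrieval augmented generation", max_results=3)` would return a list of dictionaries whose `pdf_url` values can then be downloaded and passed to Nougat.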
