CAPRA - Context AI Powered Research Assistant 🐑

Concept:

  • search for articles on arXiv
  • split each article into chunks, compute an embedding for each chunk, and store them in a corpus
  • when questioned, retrieve the relevant context from the corpus and answer using an LLM (RAG)
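
The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's actual code: the function names (`chunk_text`, `toy_embed`, `retrieve`, `build_prompt`) are hypothetical, and a simple word-count vector stands in for real embeddings so that the example runs without any model or API key.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into overlapping chunks of at most chunk_size words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, stop, step)]


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, corpus, k=2):
    """Return the k chunks most similar to the question.

    corpus is a list of (chunk_text, embedding) pairs.
    """
    q = toy_embed(question)
    ranked = sorted(corpus, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to the LLM."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    article = "Nougat converts scanned academic PDF pages into markdown text."
    corpus = [(c, toy_embed(c)) for c in chunk_text(article)]
    question = "What does Nougat produce?"
    print(build_prompt(question, retrieve(question, corpus, k=1)))
```

In a real pipeline, `toy_embed` would be replaced by calls to an embedding model and the prompt would be passed to the LLM; the retrieval logic stays the same.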

The PDF articles are converted to markdown text with Nougat, a model first proposed in Nougat: Neural Optical Understanding for Academic Documents and available through HuggingFace transformers.

Notes

  • As of now, the arXiv API seems unreliable (maybe try a direct URL GET call instead of using the Python wrapper for the API?).
  • You will need to create your own .env file inside the root project directory, with your OpenAI API key inside.
  • The project will not make use of Docker containers until Nougat is included in a stable version of transformers and a suitable solution for the arXiv API problem is found.
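
The direct GET call suggested in the first note could look roughly like the sketch below. It uses only the Python standard library and the public arXiv query endpoint (`http://export.arxiv.org/api/query`), which returns an Atom feed. The helper names (`build_query_url`, `fetch_feed`, `parse_feed`) and the retry settings are assumptions made for illustration; whether this is more reliable than the wrapper has not been verified here.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"


def build_query_url(query, start=0, max_results=5):
    """Build an arXiv API URL that searches all fields for the query."""
    params = {
        "search_query": f"all:{query}",
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def fetch_feed(url, retries=3, delay=3.0, timeout=30):
    """GET the feed as bytes, retrying a few times (hypothetical settings)."""
    last_error = None
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.read()
        except OSError as error:  # URLError and socket timeouts derive from OSError
            last_error = error
            time.sleep(delay * (attempt + 1))
    raise RuntimeError(f"arXiv request failed after {retries} attempts") from last_error


def parse_feed(xml_data):
    """Extract id, title and PDF link from each entry of the Atom feed."""
    root = ET.fromstring(xml_data)
    results = []
    for entry in root.findall(f"{ATOM}entry"):
        pdf_url = None
        for link in entry.findall(f"{ATOM}link"):
            if link.get("title") == "pdf":
                pdf_url = link.get("href")
        title = " ".join(entry.findtext(f"{ATOM}title", "").split())
        results.append(
            {"id": entry.findtext(f"{ATOM}id"), "title": title, "pdf_url": pdf_url}
        )
    return results


if __name__ == "__main__":
    url = build_query_url("retrieval augmented generation", max_results=3)
    for paper in parse_feed(fetch_feed(url)):
        print(paper["title"], paper["pdf_url"])
```

Keeping URL construction, fetching, and parsing in separate functions means the parsing step can be tested offline against a saved feed.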