The following are the components of this project:
- modified_llama
  Llama2 modified to allow extraction of the context vectors.
- generate_context_vectors.py
  Use modified_llama to extract the context vectors from articles and store them using the cv_storage library (see below).
  Check the arguments to the main function for the available options, such as input files and output folders.
- wikipedia_parser
  Read files generated by https://github.com/mlabs-haskell/wikipedia_parser/
- cv_storage
  Efficiently store context vectors, queryable by article and section names.
- cv_library
  Generate lower-fidelity versions of a context vector for fast searching, analogous to mipmaps in 3D rendering (https://en.wikipedia.org/wiki/Mipmap). See the downsampling sketch after this list.
- cv_hierarchical_storage
  Use the lower-fidelity versions of the context vectors to quickly compare an input context vector against the stored ones and find the closest match. See the search sketch after this list.
- query_generator
  Generate LLM prompts from an article to help with context vector generation.
- bin_storage
  Functions to read/write primitives used by the cv_storage and cv_hierarchical_storage libraries.
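The mipmap analogy used by cv_library can be made concrete with a short sketch. The code below is only an illustration, not the library's implementation: it assumes a context vector is a 1-D float array and halves it by average pooling at each level, the way a mipmap chain halves an image.

```python
import numpy as np

def build_pyramid(context_vector, min_len=8):
    """Build successively lower-fidelity copies of a context vector by
    averaging adjacent pairs of elements, the 1-D analogue of a mipmap chain.
    Illustrative only; not the actual cv_library code.
    """
    levels = [np.asarray(context_vector, dtype=np.float32)]
    while len(levels[-1]) // 2 >= min_len:
        v = levels[-1]
        if len(v) % 2:                      # pad odd lengths so pairs line up
            v = np.append(v, v[-1])
        levels.append(v.reshape(-1, 2).mean(axis=1))
    return levels                           # levels[0] is the full-fidelity vector

# A 4096-dimensional vector yields levels of length 4096, 2048, ..., 8.
pyramid = build_pyramid(np.random.rand(4096))
print([len(level) for level in pyramid])
```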
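Likewise, the coarse-to-fine matching done by cv_hierarchical_storage can be sketched as a two-stage search. Everything here (cosine distance, a single coarse level of length 64, the article names) is an assumption made for the example rather than the library's actual API.

```python
import numpy as np

def cosine_distance(a, b):
    """Smaller is closer; the epsilon guards against zero-length vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def coarse(v, length=64):
    """Average-pool a vector down to `length` elements (assumes len(v) >= length)."""
    v = np.asarray(v, dtype=np.float32)
    return v[: len(v) - len(v) % length].reshape(length, -1).mean(axis=1)

def find_closest(query, stored, keep=4):
    """Two-stage nearest-match search: rank every stored vector on its cheap
    coarse version first, then re-rank only the `keep` best candidates on the
    full vectors. Illustrative sketch, not the cv_hierarchical_storage API.
    """
    q_coarse = coarse(query)
    shortlist = sorted(
        stored, key=lambda name: cosine_distance(q_coarse, coarse(stored[name]))
    )[:keep]
    return min(shortlist, key=lambda name: cosine_distance(query, stored[name]))

# Example with hypothetical article names and random stand-in vectors.
stored = {f"article_{i}": np.random.rand(4096) for i in range(100)}
print("closest match:", find_closest(np.random.rand(4096), stored))
```

In the library itself the lower-fidelity copies would presumably be precomputed and stored alongside the full vectors, so only the final re-ranking needs to touch full-fidelity data.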