The following are the components of this project:
- modified_llama
  Llama2 modified to allow extraction of the context vectors.
- generate_context_vectors.py
  Use modified_llama to extract the context vectors from articles and store them using the cv_storage library (see below).
  Check the arguments to the main function for the available options, such as input files and output folders.
- wikipedia_parser
  Read files generated by https://github.com/mlabs-haskell/wikipedia_parser/
- cv_storage
  Efficiently store context vectors, queryable by article and section names.
- cv_library
  Generate lower-fidelity versions of a context vector for fast searching, analogous to mipmaps in 3D rendering (https://en.wikipedia.org/wiki/Mipmap). See the downsampling sketch after this list.
- cv_hierarchical_storage
  Use the lower-fidelity versions of the context vectors to quickly compare an input context vector against the stored ones and find the closest match. See the search sketch after this list.
- query_generator
  Generate LLM prompts from an article to help with context vector generation.
- bin_storage
  Functions to read/write primitives used by the cv_storage and cv_hierarchical_storage libraries.
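The mipmap analogy used by cv_library can be made concrete with a short sketch. The code below is only an illustration, not the library's implementation: it assumes a context vector is a 1-D float array and halves it by average pooling at each level, the way a mipmap chain halves an image.

```python
import numpy as np

def build_pyramid(context_vector, min_len=8):
    """Build successively lower-fidelity copies of a context vector by
    averaging adjacent pairs of elements, the 1-D analogue of a mipmap chain.
    Illustrative only; not the actual cv_library code.
    """
    levels = [np.asarray(context_vector, dtype=np.float32)]
    while len(levels[-1]) // 2 >= min_len:
        v = levels[-1]
        if len(v) % 2:                      # pad odd lengths so pairs line up
            v = np.append(v, v[-1])
        levels.append(v.reshape(-1, 2).mean(axis=1))
    return levels                           # levels[0] is the full-fidelity vector

# A 4096-dimensional vector yields levels of length 4096, 2048, ..., 8.
pyramid = build_pyramid(np.random.rand(4096))
print([len(level) for level in pyramid])
```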
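Likewise, the coarse-to-fine matching done by cv_hierarchical_storage can be sketched as a two-stage search. Everything here (cosine distance, a single coarse level of length 64, the article names) is an assumption made for the example rather than the library's actual API.

```python
import numpy as np

def cosine_distance(a, b):
    """Smaller is closer; the epsilon guards against zero-length vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def coarse(v, length=64):
    """Average-pool a vector down to `length` elements (assumes len(v) >= length)."""
    v = np.asarray(v, dtype=np.float32)
    return v[: len(v) - len(v) % length].reshape(length, -1).mean(axis=1)

def find_closest(query, stored, keep=4):
    """Two-stage nearest-match search: rank every stored vector on its cheap
    coarse version first, then re-rank only the `keep` best candidates on the
    full vectors. Illustrative sketch, not the cv_hierarchical_storage API.
    """
    q_coarse = coarse(query)
    shortlist = sorted(
        stored, key=lambda name: cosine_distance(q_coarse, coarse(stored[name]))
    )[:keep]
    return min(shortlist, key=lambda name: cosine_distance(query, stored[name]))

# Example with hypothetical article names and random stand-in vectors.
stored = {f"article_{i}": np.random.rand(4096) for i in range(100)}
print("closest match:", find_closest(np.random.rand(4096), stored))
```

In the library itself the lower-fidelity copies would presumably be precomputed and stored alongside the full vectors, so only the final re-ranking needs to touch full-fidelity data.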