resources for search-relevance

datasets

  • quora: Pairs of Quora questions; the task is to determine whether the two questions in a pair are paraphrases of each other (have the same meaning).
  • wiki: From the DeepLearning.AI course Building Applications with Vector Databases.
    !wget -q -O lesson2-wiki.csv.zip "https://www.dropbox.com/scl/fi/yxzmsrv2sgl249zcspeqb/lesson2-wiki.csv.zip?rlkey=paehnoxjl3s5x53d1bedt4pmc&dl=0"
    !unzip lesson2-wiki.csv.zip
    
  • news: From the DeepLearning.AI course Building Applications with Vector Databases.
    !wget -q --show-progress -O all-the-news-3.zip "https://www.dropbox.com/scl/fi/wruzj2bwyg743d0jzd7ku/all-the-news-3.zip?rlkey=rgwtwpeznbdadpv3f01sznwxa&dl=1"
    !unzip all-the-news-3.zip
    
  • ashraq/fashion-product-images-small: Multimodal dataset of fashion products (images with product metadata).
  • family-photos: From the DeepLearning.AI course Building Applications with Vector Databases.
    !wget -q --show-progress -O family_photos.zip "https://www.dropbox.com/scl/fi/yg0f2ynbzzd2q4nsweti5/family_photos.zip?rlkey=00oeuiii3jgapz2b1bfj0vzys&dl=0"
    !unzip -q family_photos.zip
    
  • cisco-logs: From the DeepLearning.AI course Building Applications with Vector Databases.
    !wget -q --show-progress -O training.tar.zip "https://www.dropbox.com/scl/fi/rihfngx4ju5pzjzjj7u9z/lesson6.tar.zip?rlkey=rct9a9bo8euqgshrk8wiq2orh&dl=1"
    !tar -xzvf training.tar.zip
    !tar -xzvf lesson6.tar
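
The `!wget` / `!unzip` commands above assume a notebook environment. Outside a notebook, the same download-and-extract step for the zip archives can be sketched with the Python standard library alone. This is a minimal sketch, not part of the course material: the helper name `fetch_and_extract` and its parameters are hypothetical, and it only handles zip files (not the `.tar` archive used for cisco-logs).

```python
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_extract(url: str, archive_path: str, dest_dir: str = ".") -> list[str]:
    """Download a zip archive from `url` and extract it into `dest_dir`.

    Hypothetical helper: returns the names of the archive members.
    """
    archive = Path(archive_path)

    # Rough equivalent of `wget -O <archive_path> <url>`.
    urllib.request.urlretrieve(url, archive)

    # A share link can return an HTML page instead of the file itself,
    # so verify that the download really is a zip archive before extracting.
    if not zipfile.is_zipfile(archive):
        raise ValueError(f"{archive} is not a zip archive; check the download URL")

    # Rough equivalent of `unzip <archive_path>`.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()
```

For example, `fetch_and_extract(url, "lesson2-wiki.csv.zip")` with the wiki URL listed above would play the role of the two notebook commands for that dataset. The `is_zipfile` check is there because some clients may receive a preview page rather than the archive from a share link, in which case failing early is easier to debug than a corrupt extraction.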
    

tools

  • DSPy: For programming, rather than prompting, LLMs
  • jinja2: For creating prompt templates
  • python-dotenv: For loading .env variables
  • poetry: For Python packaging and dependency management
  • uv: Python package installer written in Rust
  • ruff: Fast Python linter and code formatter
  • sentence-transformers: Pretrained models for text and image embeddings

models