Home Depot product data is downloaded from https://data.world/
You can check out my project @ Data.World
For the purposes of this tutorial, MongoDB is used to store the embeddings along with the rest of the data.
sudo docker volume create mongodb_data
sudo docker-compose up -d
version: '3.7'
services:
  mongodb_container:
    image: mongo:latest
    environment:
      MONGO_INITDB_ROOT_USERNAME: username
      MONGO_INITDB_ROOT_PASSWORD: password
    ports:
      - 27017:27017
    volumes:
      - mongodb_data:/data/db
volumes:
  mongodb_data:
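Once the container is up, a client needs a connection URI that matches the compose settings above. The following is a minimal sketch, assuming the database is reached on `localhost` at the published port 27017 with the root credentials from the compose file; the helper name `build_mongo_uri` is hypothetical and not part of the project.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host="localhost", port=27017):
    """Return a MongoDB connection URI with percent-encoded credentials."""
    # Credentials are escaped so that characters such as '@' or ':' do not
    # break the URI structure.
    return f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{host}:{port}/"


# Values taken from the docker-compose file above (assumed unchanged).
MONGO_URI = build_mongo_uri("username", "password")
print(MONGO_URI)
```

If `pymongo` is installed, the resulting string can be passed to `pymongo.MongoClient(MONGO_URI)`.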
A PostgreSQL server with the pgvector extension is used. Vector storage and indexing capabilities are not covered in this tutorial.
sudo docker volume create postgresql_data
sudo docker-compose up -d
services:
  db:
    hostname: db
    image: ankane/pgvector
    ports:
      - 5432:5432
    restart: always
    environment:
      - POSTGRES_DB=vectordb
      - POSTGRES_USER=testuser
      - POSTGRES_PASSWORD=testpwd
      - POSTGRES_HOST_AUTH_METHOD=trust
    volumes:
      - postgresql_data:/var/lib/postgresql/data
volumes:
  postgresql_data:
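The `environment` entries in this compose file map directly onto the parameters a client uses to connect. As a rough sketch, the snippet below turns those `KEY=VALUE` lines into connection keyword arguments. It assumes the server is reached on `localhost` at the published port 5432; `COMPOSE_ENV` and `connection_kwargs` are illustrative names, not part of the project.

```python
# KEY=VALUE lines copied from the compose file above (assumed unchanged).
COMPOSE_ENV = [
    "POSTGRES_DB=vectordb",
    "POSTGRES_USER=testuser",
    "POSTGRES_PASSWORD=testpwd",
]


def connection_kwargs(env_lines, host="localhost", port=5432):
    """Map docker-compose POSTGRES_* settings to libpq-style keyword arguments."""
    # Split on the first '=' only, so values containing '=' stay intact.
    env = dict(line.split("=", 1) for line in env_lines)
    return {
        "host": host,
        "port": port,
        "dbname": env["POSTGRES_DB"],
        "user": env["POSTGRES_USER"],
        "password": env["POSTGRES_PASSWORD"],
    }


PG_KWARGS = connection_kwargs(COMPOSE_ENV)
print(PG_KWARGS)
```

If `psycopg2` is installed, a connection can then be opened with `psycopg2.connect(**PG_KWARGS)`.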
This notebook performs the initial setup of the PostgreSQL data source and shows a sample use of the embedding service in the ETL process that stores product data and embeddings. Embeddings are generated for the product names.
- Initialize Databases - ✅
- Create embeddings for product names and create collection - ✅
- Extending SimpleEmbeddingService to handle images - ✅
- Create image embeddings for product images and update collection - ✅
- Automation of feature store updates with Prefect - ✅
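The "create embeddings for product names" step can be pictured as attaching a vector to each product row before it is written to the database. The sketch below is only an illustration of that shape: `toy_embed` is a hash-based placeholder, not a real embedding model and not the project's embedding service, and the field names `product_id`, `name`, and `name_embedding`, along with the sample rows, are made up for the example.

```python
import hashlib
from typing import Callable, Dict, List


def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Placeholder embedding: a deterministic vector derived from a hash.

    It carries no semantic meaning; a real model would be swapped in here.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Scale the first `dim` bytes of the digest into the range [0, 1].
    return [byte / 255.0 for byte in digest[:dim]]


def build_records(products: List[Dict], embed: Callable[[str], List[float]]) -> List[Dict]:
    """Attach a name embedding to every product row."""
    return [
        {
            "product_id": product["product_id"],
            "name": product["name"],
            "name_embedding": embed(product["name"]),
        }
        for product in products
    ]


# Made-up sample rows, for illustration only.
SAMPLE_PRODUCTS = [
    {"product_id": 1, "name": "Cordless drill"},
    {"product_id": 2, "name": "Garden hose"},
]

records = build_records(SAMPLE_PRODUCTS, toy_embed)
print(len(records), len(records[0]["name_embedding"]))
```

Passing the embedding function as an argument keeps the record-building step independent of the model, so a text or image embedder can be substituted without changing the rest of the pipeline.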