AGRMCR - Adapting Graph Reasoning for Explainable Cold Start Recommendation on Multi-Round Conversation Recommendation
1. Requirements
pip install -r requirements.txt
2. Docker Compose
For those who prefer containerization, Docker offers an isolated and consistent environment. Ensure Docker is installed on your system by following the official Docker installation guide.
- Start the Application with Docker Compose:
docker compose up -d
If you've made changes and want them to reflect, append --build to the command above.
- Stopping the Application:
To stop and remove all running containers, execute:
docker-compose down
Four Amazon datasets (Amazon_Beauty, Amazon_CDs, Amazon_Cellphones, Amazon_Clothing) are available in the "JRL/raw_data/" directory, and the split is consistent with [1] and [2]. All four datasets used in this paper, each consisting of metadata and 5-core reviews, can be downloaded here.
Statistics of dataset
Summary statistics of datasets.
Entity Statistics for E-commerce Datasets
#Entities | CDs | Cloth. | Cell. | Beauty |
---|---|---|---|---|
User | 75k | 39k | 27k | 22k |
Product | 64k | 23k | 10k | 12k |
Word | 202k | 21k | 22k | 22k |
Brand | 1.4k | 1.1k | 955 | 2k |
Category | 770 | 1.1k | 206 | 248 |
Relation Statistics for E-commerce Datasets
#Relations | CDs | Cloth. | Cell. | Beauty |
---|---|---|---|---|
User | 1.1M | 278k | 194k | 198k |
User | 191M | 17M | 18M | 18M |
User | 192k | 60k | 90k | 132k |
User | 192k | 60k | 90k | 132k |
User | 2.0M | 949k | 288k | 354k |
Product | 191M | 17M | 18M | 18M |
Product | 466k | 154k | 36k | 49k |
Product | 64k | 23k | 10k | 12k |
Product | 3.6M | 1.4M | 590k | 891k |
Product | 78k | 147k | 22k | 155k |
Product | 78k | 28k | 12k | 14k |
Entities and Relations
Head | Relation | Tail |
---|---|---|
USER | INTERACT | ITEM |
USER | MENTION | WORD |
USER | LIKE** | BRAND |
USER | INTERESTED_IN** | CATEGORY |
ITEM | DESCRIBED_BY | WORD |
ITEM | BELONG_TO** | CATEGORY (FEATURE) |
ITEM | PRODUCED_BY** | BRAND (FEATURE) |
ITEM | ALSO_BUY | ITEM |
ITEM | ALSO_VIEW | ITEM |
ITEM | BOUGHT_TOGETHER | ITEM |
** denotes a relation used to integrate cold users or cold items into the KG.
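The schema in the table above can be written down as a small lookup structure. The following is a minimal sketch, not the repository's implementation: the names `KG_SCHEMA`, `COLD_START_RELATIONS`, `build_kg`, and `cold_start_edges` are hypothetical, and the nested-dictionary layout is an assumption chosen for illustration.

```python
from collections import defaultdict

# (head type, relation) -> tail type, copied from the table above.
KG_SCHEMA = {
    ("USER", "INTERACT"): "ITEM",
    ("USER", "MENTION"): "WORD",
    ("USER", "LIKE"): "BRAND",
    ("USER", "INTERESTED_IN"): "CATEGORY",
    ("ITEM", "DESCRIBED_BY"): "WORD",
    ("ITEM", "BELONG_TO"): "CATEGORY",
    ("ITEM", "PRODUCED_BY"): "BRAND",
    ("ITEM", "ALSO_BUY"): "ITEM",
    ("ITEM", "ALSO_VIEW"): "ITEM",
    ("ITEM", "BOUGHT_TOGETHER"): "ITEM",
}

# Relations marked with ** in the table.
COLD_START_RELATIONS = {"LIKE", "INTERESTED_IN", "BELONG_TO", "PRODUCED_BY"}


def build_kg(triples):
    """Build a nested dict: (head_type, head_id) -> relation -> list of tails.

    `triples` is an iterable of ((head_type, head_id), relation, (tail_type, tail_id)).
    Triples that do not match KG_SCHEMA raise ValueError.
    """
    kg = defaultdict(lambda: defaultdict(list))
    for (head_type, head_id), relation, (tail_type, tail_id) in triples:
        if KG_SCHEMA.get((head_type, relation)) != tail_type:
            raise ValueError(f"invalid triple: {head_type} {relation} {tail_type}")
        kg[(head_type, head_id)][relation].append((tail_type, tail_id))
    return kg


def cold_start_edges(kg, node):
    """Return only the edges of `node` whose relation is marked ** in the table."""
    edges = kg.get(node, {})
    return {rel: tails for rel, tails in edges.items() if rel in COLD_START_RELATIONS}
```

For example, a user with one LIKE edge and one INTERACT edge would have only the LIKE edge returned by `cold_start_edges`, since INTERACT is not marked with ** in the table.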
- Index datasets
- Split datasets for training and test
- Extract gzip to txt
- Matching Relations brands, categories, related products
- Matching Feature
source 01-JRL/preprocessing_data.sh
source 01-JRL/clone_to_pr.sh
Details code
Description
index_and_filter_review_file.py
This script processes the review data to generate various entity files.
- vocab.txt: Contains a list of unique words from the reviews.
- user.txt: Contains a list of unique user IDs.
- product.txt: Contains a list of unique product IDs.
- review_text.txt: Contains the text of the reviews.
- review_u_p.txt: Maps reviews to users and products.
- review_id.txt: Contains unique review IDs.
- train.txt
- test.txt
- validation.txt
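To illustrate the kind of indexing described above, here is a minimal sketch that assigns integer indices to users, products, and words. It is an assumption-laden illustration, not the actual `index_and_filter_review_file.py`: the function name `index_reviews`, the `(user_id, product_id, text)` input tuples, and the lowercase whitespace tokenization are all hypothetical, and any filtering the real script performs is omitted.

```python
def index_reviews(reviews):
    """Assign integer indices to users, products, and words in order of first appearance.

    `reviews` is an iterable of (user_id, product_id, text) tuples (assumed input shape).
    Returns a dict whose keys loosely mirror the entity files listed above.
    """
    users, products, vocab = {}, {}, {}
    review_u_p, review_text = [], []
    for user_id, product_id, text in reviews:
        # setdefault returns the existing index, or stores and returns the next free one.
        user_index = users.setdefault(user_id, len(users))
        product_index = products.setdefault(product_id, len(products))
        word_indices = []
        for word in text.lower().split():
            word_indices.append(vocab.setdefault(word, len(vocab)))
        review_u_p.append((user_index, product_index))
        review_text.append(word_indices)
    return {
        "user": list(users),        # position in the list is the user index
        "product": list(products),  # position in the list is the product index
        "vocab": list(vocab),       # position in the list is the word index
        "review_u_p": review_u_p,
        "review_text": review_text,
    }
```

Keeping the index equal to the position in each list means a file with one entry per line can later be read back without storing the index explicitly.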
split_train_test.py
gzip -d *.txt.gz
match_cate_brand_related.py
This script processes the data to generate relation files, which describe various relationships between entities such as products, brands, and categories.
- also_bought_p_p.txt: Contains pairs of products that are often bought together.
- also_view_p_p.txt: Contains pairs of products that are often viewed together.
- bought_together_p_p.txt: Contains pairs of products that are frequently bought together.
- brand_p_b.txt: Maps products to their respective brands.
- category_p_c.txt: Maps products to their respective categories.
- brand.txt: Contains a list of unique brands.
- category.txt: Contains a list of unique categories.
- related_product.txt: Contains a list of unique related product IDs.
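As a rough sketch of how such relation files might be consumed, the function below parses lines into a product-to-tails mapping. The file layout is an assumption that the source does not confirm: line i is taken to hold the space-separated integer indices of the tails related to product i, with an empty line meaning no relation. The name `parse_relation_lines` is hypothetical.

```python
def parse_relation_lines(lines):
    """Map each product index (its line number) to the tail indices on that line.

    Assumed layout: one line per product, space-separated integer indices,
    empty line when the product has no tail for this relation.
    """
    relation = {}
    for product_index, line in enumerate(lines):
        tails = [int(token) for token in line.split()]
        if tails:
            relation[product_index] = tails
    return relation
```

With a real file this could be called as `parse_relation_lines(open(path, encoding="utf-8"))`, where `path` points to one of the relation files listed above.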
- Process original files
- Dataset Split, Cold users/items, and Knowledge Graph Creation
- Train the Knowledge Graph Embeddings
- Training RL Agent
- Evaluation RL Agent
- Inference User Preference
source run_unicorn.sh
source inference_cold_start.sh
source clone_to_grec.sh
Details code
Description
STEP 1 : Training RL Agent RL_model.py
This script trains the RL policy network. Given
STEP 2 : Evaluation RL Agent evaluate.py
This script evaluates the RL policy network. Given
STEP 3 : Inference User Preference evaluate.py
This script runs inference on cold-start users to construct their preferences.
- Train the RL agent
- Evaluation
source 02-GRECS/run_grec.sh
source 02-GRECS/clone_to_mcr.sh
Details code
Description
This script processes the review data to generate various entity files.
- like_u_b.txt
- like_u_b_rating.txt
- dislike_u_b_rating.txt
- mentioned_by_u_w.txt
- described_as_p_w.txt
- purchases.txt
- interested_in_u_c.txt
This script processes purchase.txt to generate the (user, item) pairs for train/test/validation.txt.
- cold_start_users.json
- cold_start_items.json
- train_dataset.pkl
- test_dataset.pkl
- valiation_dataset.pkl
- train_kg.pkl
- test_kg.pkl
- validation_kg.pkl
- train_label.pkl
- test_label.pkl
- validation_label.pkl
- train_transe_model/transe_model_sd_epoch_{}.ckpt: original embeddings
- train_transe_model.pkl: null/avg translation train embeddings
- test_transe_model.pkl: null/avg translation test embeddings
- validation_transe_embed.pkl: null/avg translation validation embeddings
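The "avg translation" label above suggests that cold-start embeddings are derived from TransE [3], where a triple (head, relation, tail) is modeled as head + relation ≈ tail. A minimal sketch under that assumption: a cold head entity is estimated as the mean of tail − relation over its known edges. The function name `average_translation` and its arguments are hypothetical, and the zero-vector fallback is only one possible reading of the "null" option, not something the source confirms.

```python
def average_translation(known_edges, entity_emb, relation_emb, dim):
    """Estimate a cold head embedding as the mean of (tail - relation).

    known_edges:  list of (relation_name, tail_name) pairs for the cold entity.
    entity_emb:   dict mapping tail_name -> list of floats of length `dim`.
    relation_emb: dict mapping relation_name -> list of floats of length `dim`.
    Returns a zero vector when no edge is known (assumed fallback).
    """
    if not known_edges:
        return [0.0] * dim
    total = [0.0] * dim
    for relation_name, tail_name in known_edges:
        relation_vec = relation_emb[relation_name]
        tail_vec = entity_emb[tail_name]
        for i in range(dim):
            # TransE: head + relation ≈ tail, so head ≈ tail - relation.
            total[i] += tail_vec[i] - relation_vec[i]
    return [value / len(known_edges) for value in total]
```

A cold user with only LIKE and INTERESTED_IN edges would thus receive an embedding placed near where those brands and categories "expect" their head to be.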
Overall, how does our technique compare to SOTA techniques?
source 02-GREC/run_basline.sh
Details code
echo "------------- 1 : Process the files for Recbole -------------"
# Process the processed files for RecBole (after processing the original files for Graph Reasoning)
echo "-------------- Formatting Beauty --------------------------"
python3 src/baselines/format_beauty.py \
--config config_default/beauty/baselines/format.json
echo "-------------- Formatting CDs --------------------------"
python3 src/baselines/format_cds.py \
--config config_default/cds/baselines/format.json
echo "-------------- Formatting Cellphones -------------------"
python3 src/baselines/format_cellphones.py \
--config config_default/cellphones/baselines/format.json
echo "-------------- Formatting Clothing ---------------------"
python3 src/baselines/format_clothing.py \
--config config_default/clothing/baselines/format.json
echo "--------------------------------------------------------"
# python3 src/baselines/format_coco.py \
# --config config_default/coco/baselines/format.json
# After this process, all the files from the four datasets have been standardized into the format needed by RecBole.
echo "------------- 2 : Run the baselines -------------"
# To run a baseline on a dataset, choose a yaml config file in config_default/<dataset>/baselines.
# The loop below runs every baseline on every dataset.
DATASET_NAMES=("beauty" "cds" "cellphones" "clothing")
# DATASET_NAME=beauty
for DATASET_NAME in "${DATASET_NAMES[@]}"; do
    python3 src/baselines/baseline.py \
        --config config_default/${DATASET_NAME}/baselines/Pop.yaml
    python3 src/baselines/baseline.py \
        --config config_default/${DATASET_NAME}/baselines/ItemKNN.yaml
    python3 src/baselines/baseline.py \
        --config config_default/${DATASET_NAME}/baselines/BPR.yaml
    python3 src/baselines/baseline.py \
        --config config_default/${DATASET_NAME}/baselines/NeuMF.yaml
    python3 src/baselines/baseline.py \
        --config config_default/${DATASET_NAME}/baselines/CFKG.yaml
    python3 src/baselines/baseline.py \
        --config config_default/${DATASET_NAME}/baselines/KGCN.yaml
    python3 src/baselines/baseline.py \
        --config config_default/${DATASET_NAME}/baselines/MKR.yaml
    python3 src/baselines/baseline.py \
        --config config_default/${DATASET_NAME}/baselines/SpectralCF.yaml
done
# You can ignore the warning "command line args [--config config_default/baselines/Pop.yaml] will not be used in RecBole". The argument is used properly.
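The dataset and baseline grid above can also be enumerated from Python, for example to run a subset or to log which configurations were launched. This is a small sketch rather than part of the repository: `baseline_commands` is a hypothetical helper, and it only builds the argument lists using the script path and config layout shown in the shell script.

```python
from itertools import product

DATASETS = ["beauty", "cds", "cellphones", "clothing"]
BASELINES = ["Pop", "ItemKNN", "BPR", "NeuMF", "CFKG", "KGCN", "MKR", "SpectralCF"]


def baseline_commands(datasets=DATASETS, baselines=BASELINES):
    """Return one argument list per (dataset, baseline) pair, datasets in the outer loop."""
    return [
        [
            "python3",
            "src/baselines/baseline.py",
            "--config",
            f"config_default/{dataset}/baselines/{baseline}.yaml",
        ]
        for dataset, baseline in product(datasets, baselines)
    ]
```

Each returned list can be passed to `subprocess.run(command, check=True)` from the repository root to launch the corresponding run.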
Todsavad Tangtortan, Pranisaa Charnparttaravanit, Akraradet Sinsamersuk, Chaklam Silpasuwanchai. 2024. Adapting Graph Reasoning for Explainable Cold Start Recommendation on Multi-Round Conversation Recommendation (AGRMCR).
[1] Yongfeng Zhang, Qingyao Ai, Xu Chen, and W. Bruce Croft. 2017. Joint Representation Learning for Top-N Recommendation with Heterogeneous Information Sources. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM '17). Association for Computing Machinery, New York, NY, USA, 1449–1458. https://doi.org/10.1145/3132847.3132892
[2] Yukuo Cen, Jianwei Zhang, Xu Zou, Chang Zhou, Hongxia Yang, and Jie Tang. 2020. Controllable Multi-Interest Framework for Recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '20). Association for Computing Machinery, New York, NY, USA, 2942–2951. https://doi.org/10.1145/3394486.3403344
[3] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Durán, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2 (NIPS'13). Curran Associates Inc., Red Hook, NY, USA, 2787–2795.
[4] Yang Deng, Yaliang Li, Fei Sun, Bolin Ding, and Wai Lam. 2021. Unified Conversational Recommendation Policy Learning via Graph-based Reinforcement Learning. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21). Association for Computing Machinery, New York, NY, USA, 1431–1441. https://doi.org/10.1145/3404835.3462913
[5] Yikun Xian, Zuohui Fu, S. Muthukrishnan, Gerard de Melo, and Yongfeng Zhang. 2019. Reinforcement Knowledge Graph Reasoning for Explainable Recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'19). Association for Computing Machinery, New York, NY, USA, 285–294. https://doi.org/10.1145/3331184.3331203
[6] Jibril Frej, Neel Shah, Marta Knezevic, Tanya Nazaretsky, and Tanja Käser. 2024. Finding Paths for Explainable MOOC Recommendation: A Learner Perspective. In Proceedings of the 14th Learning Analytics and Knowledge Conference (LAK '24). Association for Computing Machinery, New York, NY, USA, 426–437. https://doi.org/10.1145/3636555.3636898
[7] Jibril Frej, Marta Knezevic, and Tanja Käser. 2024. Graph Reasoning for Explainable Cold Start Recommendation. arXiv preprint arXiv:2406.07420.