AGRMCR - Adapting Graph Reasoning for Explainable Cold Start Recommendation on Multi-Round Conversation Recommendation

Environment Setup

1. Requirements

pip install -r requirements.txt

2. Docker Compose

For those who prefer containerization, Docker offers an isolated and consistent environment. Ensure Docker is installed on your system by following the official Docker installation guide.

Start the Application with Docker Compose:
```
docker compose up -d 
```
To rebuild the images after making changes, append --build to the command above.
Stopping the Application: To stop and remove the application's containers, execute:
```
docker compose down
```

Data Preparation

Four Amazon datasets (Amazon_Beauty, Amazon_CDs, Amazon_Cellphones, Amazon_Clothing) are available in the "JRL/raw_data/" directory; the split is consistent with [1] and [2]. All four datasets used in this paper, each consisting of metadata and 5-core reviews, can be downloaded here.

Statistics of dataset

Summary statistics of datasets.

Entity Statistics for E-commerce Datasets

| #Entities | CDs | Cloth. | Cell. | Beauty |
|---|---|---|---|---|
| User | 75k | 39k | 27k | 22k |
| Product | 64k | 23k | 10k | 12k |
| Word | 202k | 21k | 22k | 22k |
| Brand | 1.4k | 1.1k | 955 | 2k |
| Category | 770 | 1.1k | 206 | 248 |

Relation Statistics for E-commerce Datasets

| #Relations | CDs | Cloth. | Cell. | Beauty |
|---|---|---|---|---|
| User $\xrightarrow{\text{purchase}}$ Product | 1.1M | 278k | 194k | 198k |
| User $\xrightarrow{\text{mention}}$ Word | 191M | 17M | 18M | 18M |
| User $\xrightarrow{\text{like}}$ Brand | 192k | 60k | 90k | 132k |
| User $\xrightarrow{\text{dislike}}$ Brand | 192k | 60k | 90k | 132k |
| User $\xrightarrow{\text{interested in}}$ Category | 2.0M | 949k | 288k | 354k |
| Product $\xrightarrow{\text{described by}}$ Word | 191M | 17M | 18M | 18M |
| Product $\xrightarrow{\text{belong to}}$ Category | 466k | 154k | 36k | 49k |
| Product $\xrightarrow{\text{produced by}}$ Brand | 64k | 23k | 10k | 12k |
| Product $\xrightarrow{\text{also bought}}$ Product | 3.6M | 1.4M | 590k | 891k |
| Product $\xrightarrow{\text{also viewed}}$ Product | 78k | 147k | 22k | 155k |
| Product $\xrightarrow{\text{bought together}}$ Product | 78k | 28k | 12k | 14k |

Entities and Relations

| Head | Relation | Tail |
|---|---|---|
| USER | INTERACT | ITEM |
| USER | MENTION | WORD |
| USER | LIKE** | BRAND |
| USER | INTERESTED_IN** | CATEGORY |
| ITEM | DESCRIBED_BY | WORD |
| ITEM | BELONG_TO** | CATEGORY (FEATURE) |
| ITEM | PRODUCED_BY** | BRAND (FEATURE) |
| ITEM | ALSO_BUY | ITEM |
| ITEM | ALSO_VIEW | ITEM |
| ITEM | BOUGHT_TOGETHER | ITEM |

** denotes a relation that is used to integrate cold users or cold items into the KG.
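
As a rough illustration, the schema above can be written down as a small Python structure. The `Relation` type, the `cold_start` flag name, and the helper function are hypothetical names introduced here for the sketch; only the rows themselves come from the table.

```python
from typing import List, NamedTuple


class Relation(NamedTuple):
    head: str
    name: str
    tail: str
    cold_start: bool  # True for the relations marked ** in the table


# Rows copied from the Entities and Relations table.
KG_SCHEMA = [
    Relation("USER", "INTERACT", "ITEM", False),
    Relation("USER", "MENTION", "WORD", False),
    Relation("USER", "LIKE", "BRAND", True),
    Relation("USER", "INTERESTED_IN", "CATEGORY", True),
    Relation("ITEM", "DESCRIBED_BY", "WORD", False),
    Relation("ITEM", "BELONG_TO", "CATEGORY", True),
    Relation("ITEM", "PRODUCED_BY", "BRAND", True),
    Relation("ITEM", "ALSO_BUY", "ITEM", False),
    Relation("ITEM", "ALSO_VIEW", "ITEM", False),
    Relation("ITEM", "BOUGHT_TOGETHER", "ITEM", False),
]


def cold_start_relations(head: str) -> List[str]:
    """Names of the **-marked relations whose head entity type is `head`."""
    return [r.name for r in KG_SCHEMA if r.head == head and r.cold_start]


print(cold_start_relations("USER"))  # ['LIKE', 'INTERESTED_IN']
print(cold_start_relations("ITEM"))  # ['BELONG_TO', 'PRODUCED_BY']
```

Read this way, a cold user can only be attached to the KG through brands and categories, and a cold item through its category and brand features.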

How to run the code

JRL - Preprocessing dataset

1. Index datasets
2. Split datasets for training and test
3. Extract gzip to txt
4. Match relations (brands, categories, related products)
5. Match features

source 01-JRL/preprocessing_data.sh
source 01-JRL/clone_to_pr.sh

Details code

Description

STEP 1 : Index datasets (Entity)

index_and_filter_review_file.py

This script processes the review data to generate various entity files.

Generated Files:

vocab.txt : Contains a list of unique words from the reviews.
user.txt : Contains a list of unique user IDs.
product.txt : Contains a list of unique product IDs.
review_text.txt : Contains the text of the reviews.
review_u_p.txt : Maps reviews to users and products.
review_id.txt : Contains unique review IDs.
train.txt :
test.txt :
validation.txt :
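
For readers who want to inspect the entity files, the following is a minimal sketch of loading one of them and building an ID-to-index map. It assumes each file holds one entry per line and that an entry's index is its 0-based line position; that layout is an assumption, not something this README states, and both function names are hypothetical.

```python
from typing import Dict, List


def load_entities(path: str) -> List[str]:
    """Read one entity per line (assumed layout), keeping the line order."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def build_index(entities: List[str]) -> Dict[str, int]:
    """Map each entity to its 0-based line position."""
    return {entity: idx for idx, entity in enumerate(entities)}
```

For example, `build_index(load_entities("user.txt"))` would give a dictionary from user ID to row number under that assumption.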

STEP 2 : Split datasets for training and test

split_train_test.py

STEP 3 : Extract gzip to txt

gzip -d *.txt.gz
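
If `gzip` is not available on the command line, the same step can be sketched with the Python standard library. The function name `gunzip_all` is hypothetical; like `gzip -d`, the sketch writes the decompressed file next to the archive and then removes the archive.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def gunzip_all(directory: str) -> List[Path]:
    """Decompress every *.txt.gz in `directory` to *.txt and delete the archive."""
    written = []
    for gz_path in sorted(Path(directory).glob("*.txt.gz")):
        txt_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
        with gzip.open(gz_path, "rb") as src, open(txt_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        gz_path.unlink()  # gzip -d removes the compressed file by default
        written.append(txt_path)
    return written
```

Calling `gunzip_all(".")` from the directory that holds the archives mirrors the shell command above.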

STEP 4 : Matching Relations

match_cate_brand_related.py

This script processes the data to generate relation files, which describe various relationships between entities such as products, brands, and categories.

Generated Files:

also_bought_p_p.txt: Contains pairs of products where customers who bought one also bought the other.
also_view_p_p.txt: Contains pairs of products where customers who viewed one also viewed the other.
bought_together_p_p.txt: Contains pairs of products that are frequently bought together.
brand_p_b.txt: Maps products to their respective brands.
category_p_c.txt: Maps products to their respective categories.
brand.txt: Contains a list of unique brands.
category.txt: Contains a list of unique categories.
related_product.txt: Contains a list of unique related product IDs.

STEP 5 : Clone preprocessed dataset to Path Reasoning

Translational Embedding (TransE)

Process original files
Dataset Split, Cold users/items, and Knowledge Graph Creation
Train the Knowledge Graph Embeddings

Details code

Description

UNICORN - Multi-round Conversation Recommendation (MCR)

Training RL Agent
Evaluation RL Agent
Inference User Preference

source run_unicorn.sh
source inference_cold_start.sh
source clone_to_grec.sh

Details code

Description

STEP 1 : Training RL Agent `RL_model.py`

This script trains the RL policy network. Given $p_0$, the agent decides which items to recommend.

STEP 2 : Evaluation RL Agent `evaluate.py`

This script evaluates the trained RL policy network. Given $p_0$, the agent decides which items to recommend.

STEP 3 : Inference User Preference `evaluate.py`

This script runs inference for cold-start users to construct their user preferences.

GRECS - Graph Reasoning (GR)

Train the RL agent
Evaluation

source 02-GRECS/run_grec.sh
source 02-GRECS/clone_to_mcr.sh

Details code

Description

STEP 1 : Preprocessing `preprocess/domain.py`

This script processes the review data to generate various relation files.

Generated Files:

like_u_b.txt :
like_u_b_rating.txt :
dislike_u_b_rating.txt :
mentioned_by_u_w.txt :
described_as_p_w.txt :
purchases.txt :
interested_in_u_c.txt :

STEP 2 : Make dataset `make_dataset.py`

This script processes purchases.txt to generate the (user, item) pairs for the train, test, and validation sets.

Generated Files:

cold_start_users.json :
cold_start_items.json :
train_dataset.pkl :
test_dataset.pkl :
validation_dataset.pkl :
train_kg.pkl :
test_kg.pkl :
validation_kg.pkl :
train_label.pkl :
test_label.pkl :
validation_label.pkl :
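
To make the idea of this step concrete, here is a minimal sketch of splitting (user, item) pairs and finding cold entities. It is not the logic of `make_dataset.py`: the 60/20/20 ratios, the random shuffle, the function names, and the definition of "cold" as "appears in the evaluation pairs but never in the training pairs" are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Set, Tuple

Pair = Tuple[str, str]  # (user, item)


def split_pairs(
    pairs: Sequence[Pair],
    ratios: Tuple[float, float, float] = (0.6, 0.2, 0.2),  # assumed ratios
    seed: int = 0,
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Shuffle the pairs and cut them into train, validation, and test lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_valid = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


def cold_entities(train: Sequence[Pair], evaluation: Sequence[Pair], position: int) -> Set[str]:
    """Entities at `position` (0 = user, 1 = item) seen in `evaluation` but never in `train`."""
    seen = {pair[position] for pair in train}
    return {pair[position] for pair in evaluation} - seen
```

Under these assumptions, `cold_entities(train, test, 0)` plays the role of the cold-start user list and `cold_entities(train, test, 1)` the cold-start item list.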

STEP 3 : Translational Embedding (TransE) [3] `train_transe_model.py`

Generated Files:

train_transe_model/transe_model_sd_epoch_{}.ckpt : original embeddings
train_transe_model.pkl : train embeddings with null/avg translation
test_transe_model.pkl : test embeddings with null/avg translation
validation_transe_embed.pkl : validation embeddings with null/avg translation
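
For orientation, the sketch below shows the TransE distance from [3] and one possible reading of "avg translation": estimating the embedding of a cold head entity (for example, a cold user linked to brands through LIKE) as the mean of t - r over its known (tail, relation) pairs. This reading is an assumption, the function names are hypothetical, and the "null" variant is not covered; the code in `train_transe_model.py` may differ.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def transe_distance(head: Vector, relation: Vector, tail: Vector) -> float:
    """TransE distance ||h + r - t||_2; a smaller value means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def average_translation_head(neighbors: Sequence[Tuple[Vector, Vector]]) -> Vector:
    """Estimate a head embedding as the mean of (tail - relation) over its known neighbors."""
    dim = len(neighbors[0][0])
    total = [0.0] * dim
    for tail, relation in neighbors:
        for i in range(dim):
            total[i] += tail[i] - relation[i]
    return [value / len(neighbors) for value in total]


# Toy 2-dimensional example: a cold user who likes two brands.
like = [0.5, 0.5]
brands = [[1.0, 2.0], [3.0, 4.0]]
cold_user = average_translation_head([(brand, like) for brand in brands])
print(cold_user)  # [1.5, 2.5]
```

The estimated embedding sits at the same distance from both brands once the LIKE translation is applied, which is what lets a cold entity take part in path reasoning without having been trained.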

STEP 4 : Clone TransE embeddings to Multi-round Conversation

STEP 5 : Train RL agent `train_agent.py`

Generated Files:

STEP 6 : Evaluation RL agent `test_agent.py`

Generated Files:

Run the baselines

Overall, how does our technique compare to SOTA techniques?

source 02-GRECS/run_basline.sh

Details code

echo "------------- 1 : Process the files for Recbole -------------"
# Process the processed files for RecBole (after processing the original files for Graph Reasoning) 
echo "-------------- Formatting Beauty --------------------------"
python3 src/baselines/format_beauty.py \
    --config config_default/beauty/baselines/format.json 
echo "-------------- Formatting CDs --------------------------"
python3 src/baselines/format_cds.py \
    --config config_default/cds/baselines/format.json
echo "-------------- Formatting Cellphones -------------------"
python3 src/baselines/format_cellphones.py \
    --config config_default/cellphones/baselines/format.json
echo "-------------- Formatting Clothing ---------------------"
python3 src/baselines/format_clothing.py \
    --config config_default/clothing/baselines/format.json
echo "--------------------------------------------------------"
# python3 src/baselines/format_coco.py \
#     --config config_default/coco/baselines/format.json
# After this step, the files of all four datasets have been standardized into the format needed by RecBole.

echo "------------- 2 : Run the baselines -------------"
# Each baseline is configured by a yaml file in config_default/<dataset>/baselines.
DATASET_NAMES=("beauty" "cds" "cellphones" "clothing")

# DATASET_NAME=beauty
for DATASET_NAME in "${DATASET_NAMES[@]}"; do
    python3 src/baselines/baseline.py \
        --config config_default/${DATASET_NAME}/baselines/Pop.yaml
    python3 src/baselines/baseline.py \
        --config config_default/${DATASET_NAME}/baselines/ItemKNN.yaml
    python3 src/baselines/baseline.py \
        --config config_default/${DATASET_NAME}/baselines/BPR.yaml
    python3 src/baselines/baseline.py \
        --config config_default/${DATASET_NAME}/baselines/NeuMF.yaml
    python3 src/baselines/baseline.py \
        --config config_default/${DATASET_NAME}/baselines/CFKG.yaml
    python3 src/baselines/baseline.py \
        --config config_default/${DATASET_NAME}/baselines/KGCN.yaml
    python3 src/baselines/baseline.py \
        --config config_default/${DATASET_NAME}/baselines/MKR.yaml
    python3 src/baselines/baseline.py \
        --config config_default/${DATASET_NAME}/baselines/SpectralCF.yaml
done
# The loop above runs all eight baselines on each of the four datasets.
# You can ignore the warning "command line args [--config config_default/baselines/Pop.yaml] will not be used in RecBole". The argument is used properly.
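
The loop in the script above can also be sketched in Python, which makes it easy to run a subset of datasets or baselines. The dataset names, baseline names, script path, and config paths are taken from the script; the function name `baseline_commands` is hypothetical, and the sketch only builds the commands without running them.

```python
from itertools import product
from typing import List

DATASETS = ["beauty", "cds", "cellphones", "clothing"]
BASELINES = ["Pop", "ItemKNN", "BPR", "NeuMF", "CFKG", "KGCN", "MKR", "SpectralCF"]


def baseline_commands(datasets=DATASETS, baselines=BASELINES) -> List[List[str]]:
    """Build one command per (dataset, baseline) pair, in the same order as the shell loop."""
    return [
        [
            "python3",
            "src/baselines/baseline.py",
            "--config",
            f"config_default/{dataset}/baselines/{baseline}.yaml",
        ]
        for dataset, baseline in product(datasets, baselines)
    ]


print(len(baseline_commands()))  # 32
```

Each returned command can be passed to `subprocess.run(cmd, check=True)`; for example, `baseline_commands(["beauty"], ["Pop"])` yields only the Pop baseline on Beauty.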

Citation

Todsavad Tangtortan, Pranisaa Charnparttaravanit, Akraradet Sinsamersuk, Chaklam Silpasuwanchai. 2024. Adapting Graph Reasoning for Explainable Cold Start Recommendation on Multi-Round Conversation Recommendation (AGRMCR).

References

[1] Yongfeng Zhang, Qingyao Ai, Xu Chen, and W. Bruce Croft. 2017. Joint Representation Learning for Top-N Recommendation with Heterogeneous Information Sources. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM '17). Association for Computing Machinery, New York, NY, USA, 1449–1458. https://doi.org/10.1145/3132847.3132892

[2] Yukuo Cen, Jianwei Zhang, Xu Zou, Chang Zhou, Hongxia Yang, and Jie Tang. 2020. Controllable Multi-Interest Framework for Recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '20). Association for Computing Machinery, New York, NY, USA, 2942–2951. https://doi.org/10.1145/3394486.3403344

[3] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Durán, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2 (NIPS'13). Curran Associates Inc., Red Hook, NY, USA, 2787–2795.

[4] Yang Deng, Yaliang Li, Fei Sun, Bolin Ding, and Wai Lam. 2021. Unified Conversational Recommendation Policy Learning via Graph-based Reinforcement Learning. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21). Association for Computing Machinery, New York, NY, USA, 1431–1441. https://doi.org/10.1145/3404835.3462913

[5] Yikun Xian, Zuohui Fu, S. Muthukrishnan, Gerard de Melo, and Yongfeng Zhang. 2019. Reinforcement Knowledge Graph Reasoning for Explainable Recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'19). Association for Computing Machinery, New York, NY, USA, 285–294. https://doi.org/10.1145/3331184.3331203

[6] Jibril Frej, Neel Shah, Marta Knezevic, Tanya Nazaretsky, and Tanja Käser. 2024. Finding Paths for Explainable MOOC Recommendation: A Learner Perspective. In Proceedings of the 14th Learning Analytics and Knowledge Conference (LAK '24). Association for Computing Machinery, New York, NY, USA, 426–437. https://doi.org/10.1145/3636555.3636898

[7] Jibril Frej, Marta Knezevic, and Tanja Käser. 2024. Graph Reasoning for Explainable Cold Start Recommendation. arXiv preprint arXiv:2406.07420.
