❗❗ This repo will no longer be maintained, please visit https://github.com/milvus-io/bootcamp ❗ ❗
This project combines Milvus and BERT to implement an item-based text recommendation system.
This project uses a public dataset from ArXiv, from which more than 3 million records were downloaded. The dataset is a metadata file in JSON format that contains the following fields for each paper:
- id: ArXiv ID (can be used to access the paper, see below)
- submitter: Who submitted the paper
- authors: Authors of the paper
- title: Title of the paper
- comments: Additional info, such as number of pages and figures
- journal-ref: Information about the journal the paper was published in
- doi: Digital object identifier
- abstract: The abstract of the paper
- categories: Categories / tags in the ArXiv system
- versions: A version history
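As a rough illustration of how these entries might be read, the sketch below assumes the metadata file stores one JSON object per line. That layout is an assumption; the description above only says the file is in JSON format. The function name `iter_papers`, the subset of fields kept, and the sample records are hypothetical and are not taken from `load.py`.

```python
import json
from typing import Dict, Iterable, Iterator

# Fields kept for illustration; the project's own loader may use a different subset.
FIELDS = ("id", "title", "abstract", "categories")


def iter_papers(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per paper from JSON-lines text (assumed file layout)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # A missing or null field becomes an empty string instead of raising KeyError.
        yield {name: (record.get(name) or "") for name in FIELDS}


# Made-up records that only mimic the field names listed above.
sample = [
    '{"id": "0704.0001", "title": "Example title", "abstract": "Example abstract.", "categories": "hep-ph"}',
    "",
    '{"id": "0704.0002", "title": "Another title", "categories": "math.CO cs.CG"}',
]
papers = list(iter_papers(sample))
print(len(papers), papers[1]["categories"].split())
```

With a real file, the same generator could be fed an open file handle, for example `iter_papers(open(path, encoding="utf-8"))`, so the whole dataset never has to sit in memory at once.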
You can access each paper directly on ArXiv using these links:
- https://arxiv.org/abs/{id}: Page for this paper, including its abstract and further links
- https://arxiv.org/pdf/{id}: Direct link to download the PDF
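The two URL patterns can be filled in from a paper's `id` field. The helper names `abs_url` and `pdf_url` below are hypothetical and are only meant to show the substitution.

```python
ARXIV_BASE = "https://arxiv.org"


def abs_url(paper_id: str) -> str:
    """Return the abstract page URL for an ArXiv ID."""
    return f"{ARXIV_BASE}/abs/{paper_id}"


def pdf_url(paper_id: str) -> str:
    """Return the direct PDF download URL for an ArXiv ID."""
    return f"{ARXIV_BASE}/pdf/{paper_id}"


print(abs_url("0704.0001"))  # https://arxiv.org/abs/0704.0001
print(pdf_url("0704.0001"))  # https://arxiv.org/pdf/0704.0001
```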
Download Data
To download the original data, please refer to arxiv-public-datasets.
Only the article metadata portion of arxiv-public-datasets is used in this project.
This project contains two parts, service and webclient.
service provides the code of the back-end service. webclient provides scripts for the front-end interface.
The configuration file config.py in service defines the following parameters:
Parameter | Description | Default |
---|---|---|
MILVUS_HOST | Milvus service IP | 127.0.0.1 |
MILVUS_PORT | Milvus service port | 19530 |
BERT_HOST | BERT service IP | 127.0.0.1 |
BERT_PORT | BERT service port | 5555 |
MYSQL_HOST | MySQL service IP | 127.0.0.1 |
MYSQL_PORT | MySQL service port | 3306 |
MYSQL_USER | MySQL user name | root |
MYSQL_PASSWORD | MySQL password | 123456 |
MYSQL_DATABASE | MySQL database name | mysql |
TABLE_NAME | Default table name | recommend |
batch_size | Batch data size | 10000 |
temp_file_path | Temporary data text | temp.csv |
categories_num | Number of categories displayed on the homepage | 50 |
texts_num | Number of texts displayed in each category | 100 |
collection_param | Parameters of the Milvus collection | default |
search_param | Parameters of Milvus search | 16 |
top_k | Number of recommended texts | 10 |
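A minimal sketch of how these defaults could be expressed in Python is shown below. It is not the contents of the project's `config.py`: the `load_config` helper and the idea of overriding values through environment variables are assumptions added for illustration. `collection_param` and `search_param` are left out because the table does not spell out their full values.

```python
import os
from typing import Any, Dict, Mapping

# Defaults copied from the parameter table; MYSQL_PASSWORD is kept as a string.
DEFAULTS: Dict[str, Any] = {
    "MILVUS_HOST": "127.0.0.1",
    "MILVUS_PORT": 19530,
    "BERT_HOST": "127.0.0.1",
    "BERT_PORT": 5555,
    "MYSQL_HOST": "127.0.0.1",
    "MYSQL_PORT": 3306,
    "MYSQL_USER": "root",
    "MYSQL_PASSWORD": "123456",
    "MYSQL_DATABASE": "mysql",
    "TABLE_NAME": "recommend",
    "batch_size": 10000,
    "temp_file_path": "temp.csv",
    "categories_num": 50,
    "texts_num": 100,
    "top_k": 10,
}


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Return the defaults, overridden by any same-named entry in `env`.

    Hypothetical helper: the environment override is an assumption,
    not a documented feature of this project.
    """
    config: Dict[str, Any] = {}
    for key, default in DEFAULTS.items():
        raw = env.get(key)
        # Cast an override to the type of its default (int or str).
        config[key] = default if raw is None else type(default)(raw)
    return config


config = load_config({"MILVUS_HOST": "10.0.0.5", "MILVUS_PORT": "19531"})
print(config["MILVUS_HOST"], config["MILVUS_PORT"], config["top_k"])
```

Passing the environment in as a mapping keeps the helper easy to exercise without touching the real process environment.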
- Install Milvus 0.10.4.
- Install MySQL.
- Clone project
git clone https://github.com/milvus-io/bootcamp.git
cd bootcamp/solution/item_based_recommend
- Install dependencies
pip3 install -r requirement.txt
- Start the BERT service
# Download model
mkdir model
cd model
wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
unzip uncased_L-12_H-768_A-12.zip
# Start service
bert-serving-start -model_dir uncased_L-12_H-768_A-12 -num_worker=12
- Import data
python load.py -p ../data/test.json
- Start service
cd service
uvicorn main:app
You can access http://127.0.0.1:8000/docs to learn about the interfaces provided by the service.
- Start the client
docker run -d -p 9999:80 -e API_URL=http://127.0.0.1:8000 tumao/paper-recommend-demo:latest
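The `/docs` page of the back-end service suggests a FastAPI application, which by default also publishes a machine-readable schema at `/openapi.json`. That is an assumption about this service, not something stated here. Under that assumption, the sketch below lists the available routes without opening a browser. `fetch_schema` and `list_routes` are hypothetical helper names.

```python
import json
from typing import Any, Dict, List, Tuple
from urllib.request import urlopen

SERVICE_URL = "http://127.0.0.1:8000"  # address the back-end service listens on

# Keys of an OpenAPI path item that denote HTTP operations.
HTTP_METHODS = {"get", "post", "put", "delete", "patch", "head", "options", "trace"}


def fetch_schema(base_url: str = SERVICE_URL) -> Dict[str, Any]:
    """Download the OpenAPI schema (assumes the FastAPI default path)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


def list_routes(schema: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema."""
    routes = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Skip non-operation keys such as "parameters" or "summary".
            if key.lower() in HTTP_METHODS:
                routes.append((key.upper(), path))
    return sorted(routes)
```

With the service running, `list_routes(fetch_schema())` would return the method and path of each endpoint. If the schema is served elsewhere, only the URL in `fetch_schema` needs to change.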
Show categories
Show papers
Similar articles