elephant-sense

Evaluates the quality of the content itself using machine learning.

Setup

Get a Qiita API token and set it as an environment variable.

$ export QiitaToken=xxx

(Only the read_qiita scope is required.)

Then build the image from the Dockerfile and run it!

For Training the Model

Data Preparation

Place the Qiita posts in data/raw/items.
- You can get Qiita posts through the Qiita API.
- Each post is one JSON file named after its post ID (like 0a0000aa0a0000a00aa0.json).
Place the annotated file labeled_qiita_posts.csv in data/raw.
- Its format is No,url,Title, followed by annotator1, annotator2... (you can name the columns as you like).
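
As a rough illustration of the layout described above, the following sketch fetches one page of posts from the Qiita API v2 and writes each post to its own JSON file named after the post ID. It is not part of this repository: the function names (fetch_posts, save_post, download_page) are hypothetical, and it assumes the token is available in the QiitaToken environment variable as set in Setup.

```python
import json
import os
import urllib.request
from pathlib import Path

# Qiita API v2 endpoint that lists posts (items).
API_URL = "https://qiita.com/api/v2/items"


def fetch_posts(token, page=1, per_page=20):
    """Fetch one page of posts and return them as a list of dicts."""
    request = urllib.request.Request(
        f"{API_URL}?page={page}&per_page={per_page}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def save_post(post, out_dir="data/raw/items"):
    """Write one post to <out_dir>/<post id>.json and return the file path."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    file_path = out_path / f"{post['id']}.json"
    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(post, f, ensure_ascii=False)
    return file_path


def download_page(page=1, out_dir="data/raw/items"):
    """Hypothetical helper: fetch one page and save every post in it."""
    token = os.environ["QiitaToken"]  # assumed to be set as described in Setup
    return [save_post(post, out_dir) for post in fetch_posts(token, page=page)]
```

Calling download_page() once would store up to 20 posts. Which posts to collect, and how many, depends on your annotated CSV, so treat this only as a starting point.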

Data Preprocessing

Run the following script.

python scripts/data/make_data.py

Then, the labeled JSON files are stored in data/processed/items.

Next, execute preprocessing.

python scripts/data/preprocessing.py

posts.json will be created in data/processed/.
posts.json includes the split tokens of each post. You can use this to get the words in the posts.
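
To get the words from posts.json, something like the sketch below may help. The exact structure of posts.json is not documented here, so this assumes it is a JSON list of objects in which each object holds its tokens as a list under a key; the key name "tokens" and the function names are hypothetical and should be adjusted to match the actual file.

```python
import json
from collections import Counter


def load_posts(path="data/processed/posts.json"):
    """Load the preprocessed posts (assumed to be a JSON list of objects)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_words(posts, tokens_key="tokens"):
    """Count how often each token appears across all posts.

    tokens_key is a hypothetical field name; posts without it are skipped.
    """
    counter = Counter()
    for post in posts:
        counter.update(post.get(tokens_key, []))
    return counter
```

For example, count_words(load_posts()).most_common(20) would list the 20 most frequent tokens, provided the assumed structure holds.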
