yinzhang809/zhihucrawler

Function

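  • Crawl the comments under Zhihu answers and form single-round conversations from them. The answers are taken from the Zhihu collection "赞同超过1000的回答" ("answers with more than 1,000 upvotes").
  • Assign an emotion label to each sentence in the post-response (QA) pairs, using the NLPCC emotion-labeled Weibo data.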
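A minimal sketch of the pairing step, assuming each crawled comment is a dict with an `id`, a `content` string and an optional `reply_to` id naming the comment it answers. All three field names are hypothetical; the crawler's real schema is not documented here. A reply and the comment it answers become one single-round post/response pair:

```python
from typing import Dict, List


def build_pairs(comments: List[Dict]) -> List[Dict]:
    """Pair each reply with the comment it answers (hypothetical schema)."""
    # Index comments by their (hypothetical) "id" field for parent lookup.
    by_id = {c["id"]: c for c in comments}
    pairs = []
    for c in comments:
        parent = by_id.get(c.get("reply_to"))
        if parent is None:
            # Top-level comment, or its parent was not crawled: no pair.
            continue
        # Using the running pair index as the conversation id is an assumption.
        pairs.append({"post": parent["content"], "res": c["content"], "id": len(pairs)})
    return pairs


if __name__ == "__main__":
    comments = [
        {"id": 1, "content": "这个回答写得真好"},
        {"id": 2, "content": "谢谢支持", "reply_to": 1},
        {"id": 3, "content": "同意", "reply_to": 99},
    ]
    print(build_pairs(comments))
    # [{'post': '这个回答写得真好', 'res': '谢谢支持', 'id': 0}]
```

With this approach a reply chain A, B, C yields two pairs, (A, B) and (B, C), so every pair stays a single round.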

Sample Data

The data is stored in JSON format as a list of dictionaries. Each dictionary contains three fields: "post", "res" (the response), and the conversation ID.
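Files in this format can be written and read with Python's standard `json` module. This is a minimal sketch: "post" and "res" come from this README, but the name of the conversation-ID key is not given, so `ID_KEY = "id"` is a hypothetical placeholder. Writing with `ensure_ascii=False` keeps the Chinese text readable in the file:

```python
import json
import os
import tempfile
from typing import Dict, List

# "post" and "res" are documented; the conversation-ID key name is assumed.
ID_KEY = "id"
REQUIRED_KEYS = ("post", "res", ID_KEY)


def save_conversations(records: List[Dict], path: str) -> None:
    """Write the list of conversation dicts as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes Chinese characters as-is, not as \uXXXX escapes.
        json.dump(records, f, ensure_ascii=False, indent=2)


def load_conversations(path: str) -> List[Dict]:
    """Read the file back and check that every record has the expected keys."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of dictionaries")
    for i, record in enumerate(records):
        missing = [k for k in REQUIRED_KEYS if k not in record]
        if missing:
            raise ValueError(f"record {i} is missing keys: {missing}")
    return records


if __name__ == "__main__":
    demo = [{"post": "这个观点很有意思", "res": "谢谢", ID_KEY: 0}]
    path = os.path.join(tempfile.mkdtemp(), "conversations.json")
    save_conversations(demo, path)
    print(load_conversations(path) == demo)  # True
```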

Requirements

  • zhihu_oauth
  • torch
  • pandas
  • jieba
  • numpy

Usage

  • Fill in your Zhihu account in login.py and run it to generate token.pkl.
  • Run crawler.py to generate the conversations, then run preprocessing.py to clean and format the data.
  • Download the NLPCC 2017 Weibo data and place it in the ./data folder.
  • Run Classify/cnn_pytorch.py to generate the emotion labels for the sentences.
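This README does not say exactly what preprocessing.py cleans. As an illustration only, a cleaning pass over crawled post/response pairs might strip leftover HTML tags and URLs, collapse whitespace, and drop pairs that end up empty or too long. The rules, the function names and the `max_len` threshold below are assumptions, not the repository's actual logic:

```python
import re
from typing import Dict, List

# Assumed cleaning rules: HTML tags, URLs and runs of whitespace.
TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+")
WS_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Replace tags and URLs with spaces, then collapse and trim whitespace."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip()


def clean_pairs(pairs: List[Dict], max_len: int = 100) -> List[Dict]:
    """Clean "post"/"res" and drop pairs that become empty or exceed max_len chars."""
    cleaned = []
    for pair in pairs:
        post, res = clean_text(pair["post"]), clean_text(pair["res"])
        if not post or not res or len(post) > max_len or len(res) > max_len:
            continue
        # Keep any other fields (e.g. the conversation id) unchanged.
        cleaned.append({**pair, "post": post, "res": res})
    return cleaned


if __name__ == "__main__":
    print(clean_text("<p>好文<br>推荐</p>"))  # 好文 推荐
```

Note that `\S+` in the URL pattern also swallows any text glued directly to the end of a URL, so real Zhihu data may need a tighter pattern.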
