
dremeika/cug-pageclass


Page Function Classifier for Vilnius Clojure Dojo

This code is ready to be modified to improve the page classifier's accuracy and precision.

Usage

Unzip labeled data

unzip dataset.zip

Generate page features

lein generate-features features.tsv

Cross-validate classifier

lein cross-validate features.tsv
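Cross-validation estimates how well the classifier generalizes. It splits the labeled examples into k folds, trains on all folds but one, tests on the fold left out, and repeats this k times. The Clojure sketch below shows only the splitting step. `k-fold-splits` is a hypothetical helper, not part of the pageclass sources, and `lein cross-validate` may split the data differently.

```clojure
;; Hypothetical helper, not from the pageclass sources:
;; shows how k-fold cross-validation partitions the labeled rows.
;; Rows must be non-nil (keep-indexed drops nil results).
(defn k-fold-splits
  "Splits `rows` into `k` folds (row i goes to fold i mod k) and returns
  k [train test] pairs; each fold serves as the test set exactly once."
  [rows k]
  (let [folds (vec (for [i (range k)]
                     (keep-indexed (fn [idx row]
                                     (when (= i (mod idx k)) row))
                                   rows)))]
    (for [i (range k)]
      [(apply concat (for [j (range k) :when (not= i j)] (folds j)))
       (folds i)])))

;; Example: 10 rows, 3 folds -> test folds of sizes 4, 3 and 3.
(doseq [[train test] (k-fold-splits (map str (range 10)) 3)]
  (println (count train) "train /" (count test) "test"))
```

Assigning rows round-robin keeps fold sizes within one row of each other. A fuller setup would usually also shuffle the rows and stratify the folds by label.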

Train classifier

lein train features.tsv

Evaluate classifier (optional; use data that was not used for training)

lein evaluate test.tsv
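If the evaluation data overlaps the training data, the score will be optimistic. One way to produce a separate test.tsv is a seeded holdout split of the generated features, sketched below in Clojure. It assumes features.tsv holds one example per line with no header row. `holdout-split` and `split-tsv-file!` are hypothetical helpers, and the 20% test fraction and seed 42 are arbitrary choices.

```clojure
(require '[clojure.string :as str])
(import '(java.util ArrayList Collections Random))

;; Hypothetical helpers, not from the pageclass sources.
;; Assumption: one example per line and no header row.
(defn holdout-split
  "Shuffles `rows` with a fixed `seed` and returns [train test], where
  test holds about `test-fraction` of the rows."
  [rows test-fraction seed]
  (let [shuffled (ArrayList. ^java.util.Collection rows)
        n-test   (Math/round (double (* test-fraction (count shuffled))))]
    ;; Same seed gives the same shuffle, so the split is reproducible.
    (Collections/shuffle shuffled (Random. seed))
    [(vec (drop n-test shuffled)) (vec (take n-test shuffled))]))

(defn split-tsv-file!
  "Writes a train file and a test file from one TSV file (20% test, seed 42)."
  [in train-out test-out]
  (let [[train test] (holdout-split (str/split-lines (slurp in)) 0.2 42)]
    (spit train-out (str (str/join "\n" train) "\n"))
    (spit test-out (str (str/join "\n" test) "\n"))))
```

For example, `(split-tsv-file! "features.tsv" "train.tsv" "test.tsv")` followed by `lein train train.tsv` and `lein evaluate test.tsv` keeps the two sets disjoint.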

Test classifier on a real page

lein classify "http://..."

Add page to dataset (optional; the first argument is a label code from Page labels)

lein add-page "L" "http://..."

Used Tools

Main points of modification

  • Use a different classifier: src/pageclass/train.clj. The available classifiers are those that extend AbstractClassifier.
  • Generate more meaningful features: src/pageclass/features.clj
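As an illustration only, the Clojure sketch below turns raw HTML into a map of tag counts. Counts of forms and inputs might help separate Form (F) pages, and many links might hint at Listing (L) pages. `html-tag-features` is hypothetical, does not reflect the actual contents of src/pageclass/features.clj, and uses regular expressions as a crude stand-in for a proper HTML parser.

```clojure
(require '[clojure.string :as str])

;; Hypothetical feature extractor, not the contents of src/pageclass/features.clj.
;; Regex counting is only a crude stand-in for parsing the HTML properly.
(defn count-matches
  "Number of matches of regex `re` in string `s`."
  [re s]
  (count (re-seq re s)))

(defn html-tag-features
  "Counts a few structural tags in raw HTML (case-insensitive)."
  [html]
  (let [h (str/lower-case html)]
    {:links  (count-matches #"<a[\s>]" h)        ; many links: maybe a listing (L)
     :forms  (count-matches #"<form[\s>]" h)     ; forms and inputs: maybe a form page (F)
     :inputs (count-matches #"<input[\s>/]" h)
     :images (count-matches #"<img[\s>/]" h)
     :videos (count-matches #"<video[\s>]" h)})) ; embedded video: maybe media (M)
```

Raw counts grow with page size, so ratios (for example links per image) may be worth trying as well.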

Page labels

  • A - Article
  • D - Discussion, Forum
  • F - Form
  • H - Home page
  • L - Listing
  • I - Single item or product page in an e-shop
  • M - Media
  • Z - Contacts page
  • X - Unknown
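The label codes can be kept in one map, for example to check the label passed to `lein add-page` before a page is added. This is a hypothetical Clojure sketch; the README does not say whether add-page validates its label argument, and `valid-label?` and `describe-label` are illustrative names.

```clojure
;; Hypothetical helpers built from the page label list above.
(def page-labels
  {"A" "Article"
   "D" "Discussion, Forum"
   "F" "Form"
   "H" "Home page"
   "L" "Listing"
   "I" "Single item or product page in an e-shop"
   "M" "Media"
   "Z" "Contacts page"
   "X" "Unknown"})

(defn valid-label?
  "True when `code` is one of the known label codes."
  [code]
  (contains? page-labels code))

(defn describe-label
  "Returns the description for `code`, or throws for an unknown code."
  [code]
  (or (get page-labels code)
      (throw (ex-info (str "Unknown page label: " code) {:label code}))))
```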
