Text-Preprocess

Text preprocessing pipeline for my graduation project. The pipeline includes sentence boundary detection, a sentence tokenizer, a stemmer, a morphological disambiguator, and a POS tagger. It uses the Turkish NLP library zemberek-nlp by Ahmet A. Akın and Turkish Deasciifier for Java by Ahmet Alp Balkan.
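
To give a rough idea of what the first two stages do, here is a minimal sketch of sentence boundary detection followed by tokenization. It is not the project's code and does not call zemberek-nlp: it swaps in the JDK's `java.text.BreakIterator` as a stand-in, and the class and method names (`PipelineSketch`, `splitSentences`, `tokenize`) are hypothetical. Stemming, disambiguation, and POS tagging are left out because they depend on the library's morphological analysis.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the first two pipeline stages.
// Uses the JDK's BreakIterator, NOT zemberek-nlp.
class PipelineSketch {
    private static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Stage 1: split a paragraph into trimmed, non-empty sentences.
    static List<String> splitSentences(String paragraph) {
        BreakIterator it = BreakIterator.getSentenceInstance(TURKISH);
        return collect(it, paragraph);
    }

    // Stage 2: split one sentence into tokens (words and punctuation),
    // dropping whitespace-only segments.
    static List<String> tokenize(String sentence) {
        BreakIterator it = BreakIterator.getWordInstance(TURKISH);
        return collect(it, sentence);
    }

    // Walk the boundaries reported by the iterator and keep non-blank pieces.
    private static List<String> collect(BreakIterator it, String text) {
        List<String> parts = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end).trim();
            if (!piece.isEmpty()) {
                parts.add(piece);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Example review text, made up for illustration.
        String review = "Fiyat uygun. Kargo hemen geldi.";
        for (String sentence : splitSentences(review)) {
            System.out.println(sentence + " -> " + tokenize(sentence));
        }
    }
}
```

Each sentence is tokenized separately, so later stages can work on one sentence at a time.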

Dataset

| Type     | Number of Reviews |
|----------|-------------------|
| Positive | 220,284           |
| Negative | 14,881            |

Requirements

  • Java 8
  • Maven