Text processing with OpenNLP.
Assignment report here.
- Detect sentences with SentenceDetectorME
- Extract tokens with SimpleTokenizer, WhitespaceTokenizer, and TokenizerME, and compare their performance
- Detect parts-of-speech with POSTaggerME
- Find named entities with the generic entity finder NameFinderME, initialized with persons (en-ner-person.bin), locations (en-ner-location.bin), money/currencies (en-ner-money.bin), and percentages (en-ner-percentage.bin).
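
To see why the tokenizers in the list above can give different results, here is a minimal plain-Python sketch. It is not OpenNLP code: `whitespace_tokenize` and `char_class_tokenize` are hypothetical helpers that only approximate the splitting rules of WhitespaceTokenizer (split on whitespace) and SimpleTokenizer (split where the character class changes). As a simplification, every punctuation character becomes its own token. TokenizerME is not imitated here because it relies on a trained maximum entropy model.

```python
import re


def whitespace_tokenize(text):
    # Split on runs of whitespace only; punctuation stays attached to words.
    return text.split()


# Assumed character classes: letters, digits, and single "other" characters.
_CHAR_CLASS_TOKEN = re.compile(r"[^\W\d_]+|\d+|[^\w\s]|_")


def char_class_tokenize(text):
    # A new token starts whenever the character class changes.
    return _CHAR_CLASS_TOKEN.findall(text)


sentence = "Mr. Smith paid $3.50."
print(whitespace_tokenize(sentence))
print(char_class_tokenize(sentence))
```

The whitespace variant keeps "$3.50." as one token, while the character-class variant breaks it into currency sign, digits, and punctuation, which matters for later steps such as finding money entities.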
Document classification using Weka.
Assignment report here (the README file).
- Create ARFF train and test files from a plain text file (already tokenized and stemmed)
- Use Weka's StringToWordVector to create word vectors and FilteredClassifier to apply the same filter to the train and test datasets
- Use Weka's AttributeSelection to select attributes (words) from the text, to fine-tune the classifiers
- Compare the NaiveBayesMultinomial and LibSVM classifiers
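
The first step above, writing an ARFF file, could look roughly like the sketch below. It assumes the documents are already available as (text, label) pairs in memory; the `to_arff` helper, the relation name, and the attribute names `text` and `category` are all made up for illustration and are not taken from the assignment code. The text is written as a single string attribute so that StringToWordVector can turn it into word vectors later.

```python
def quote(value):
    # ARFF strings with spaces must be quoted; escape backslashes and quotes.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def to_arff(relation, documents):
    # documents: list of (text, label) pairs; labels are assumed to be
    # simple tokens without spaces or commas.
    labels = sorted({label for _, label in documents})
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        "@attribute category {" + ",".join(labels) + "}",
        "",
        "@data",
    ]
    lines += [f"{quote(text)},{label}" for text, label in documents]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [("good food", "pos"), ("it's bad", "neg")]
    print(to_arff("reviews", sample), end="")
```

Calling `to_arff` once with the training pairs and once with the test pairs gives two files with the same header, provided both sets contain the same labels.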
Sentiment analysis with TextBlob.
Assignment report here.
Compare the performance of PatternAnalyzer and NaiveBayesAnalyzer in sentiment analysis of restaurant reviews.
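
A comparison like this can be sketched as an accuracy check over labeled reviews. The code below is an assumption about how such a comparison might be set up, not the code from the report: the `accuracy` helper, the "pos"/"neg" label names, and the polarity threshold of 0 are all illustrative choices. The two TextBlob wrappers need `pip install textblob`, and NaiveBayesAnalyzer additionally needs the NLTK movie_reviews corpus, so they import lazily and are not called here.

```python
def accuracy(classify, labeled_reviews):
    # Fraction of reviews for which classify(text) returns the gold label.
    correct = sum(1 for text, gold in labeled_reviews if classify(text) == gold)
    return correct / len(labeled_reviews)


def pattern_classify(text):
    # PatternAnalyzer is TextBlob's default analyzer; it returns a polarity
    # score in [-1, 1]. Mapping polarity >= 0 to "pos" is an assumption.
    from textblob import TextBlob
    return "pos" if TextBlob(text).sentiment.polarity >= 0 else "neg"


def make_naive_bayes_classify():
    # Blobber shares one analyzer across texts, so the Naive Bayes model
    # is trained once rather than once per review.
    from textblob import Blobber
    from textblob.sentiments import NaiveBayesAnalyzer
    blobber = Blobber(analyzer=NaiveBayesAnalyzer())
    return lambda text: blobber(text).sentiment.classification
```

With TextBlob installed, `accuracy(pattern_classify, reviews)` and `accuracy(make_naive_bayes_classify(), reviews)` can be compared on the same list of (text, label) pairs.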
TensorFlow introduction and applications for natural language processing (NLP).
Introduction here and slide deck used for presentation here.