MSc-Y1-S1-W10-Thu-Lang-Eng-Python-12-2h-Lecture | Summary attempt
NLTK
Tokenisation
To find the page:
Step 1:
Step 2: click through to lexical analysis, which links to the relevant section of that page:
- "A lexical token is a string with an assigned and thus identified meaning, in contrast to the probabilistic token used in large language models." Lexical analysis > Lexical token and lexical tokenization | Wikipedia
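As a rough illustration of lexical tokenisation, the sketch below splits a string into word and punctuation tokens using only the standard library. The pattern and the name `tokenise` are assumptions made for this example, not something given in the lecture; NLTK's own `nltk.word_tokenize` handles far more cases and needs its tokenizer data downloaded first.

```python
import re

# Hypothetical pattern for illustration: a run of word characters
# (optionally followed by an apostrophe part, as in "don't"),
# or a single non-space, non-word character such as punctuation.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenise(text):
    """Return the list of tokens found in text, in order of appearance."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenise("Don't panic: tokens aren't just words.")
print(tokens)
```

Punctuation is kept as separate tokens here rather than discarded, since later steps (such as sentence splitting) may need it.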
Stemming - "reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form." Stemming | Wikipedia
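A minimal sketch of the idea, assuming a toy rule set: strip one common English suffix if enough of the word remains. The suffix list, the minimum stem length, and the name `toy_stem` are hypothetical choices for illustration only. This is not the Porter algorithm; for real work NLTK provides `nltk.stem.PorterStemmer`, whose `stem()` method applies a much more careful set of rules.

```python
# Hypothetical suffix list, checked in order; longer suffixes come first.
SUFFIXES = ("ing", "ed", "es", "s")

# Assumed threshold: leave the word alone if stripping would
# leave fewer than this many characters.
MIN_STEM_LENGTH = 3


def toy_stem(word):
    """Strip at most one known suffix from word and return the result."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


for w in ["walking", "walked", "walks", "studies", "sing"]:
    print(w, "->", toy_stem(w))
```

Note that "studies" becomes "studi", which is not a dictionary word: a stem only needs to be a shared written form, not a valid word in its own right.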
Lemmatization - "the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form." Lemmatization | Wikipedia
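To make the "grouping" part of that definition concrete, here is a small sketch that maps inflected forms to a lemma with a hand-written lookup table and then groups words by lemma. The table contents and the names `toy_lemmatise` and `group_by_lemma` are hypothetical and purely illustrative; a real lemmatizer such as NLTK's `nltk.stem.WordNetLemmatizer` relies on a full dictionary (the WordNet corpus, which must be downloaded) and usually on part-of-speech information as well.

```python
# Hypothetical, hand-written lookup from inflected form to lemma.
LEMMA_TABLE = {
    "am": "be", "is": "be", "are": "be", "was": "be", "were": "be",
    "better": "good", "best": "good",
    "mice": "mouse",
    "went": "go", "gone": "go", "going": "go",
}


def toy_lemmatise(word):
    """Return the lemma for word, or the lowercased word if it is unknown."""
    key = word.lower()
    return LEMMA_TABLE.get(key, key)


def group_by_lemma(words):
    """Group the given words under their shared lemma."""
    groups = {}
    for w in words:
        groups.setdefault(toy_lemmatise(w), []).append(w)
    return groups


print(group_by_lemma(["was", "were", "mice", "is", "went"]))
```

Unlike the suffix stripping used in stemming, the lookup can relate forms with no shared spelling ("went" and "go"), and the result is always a dictionary form.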
Language Engineering Module