
Key words

Le Thu Nguyen edited this page Apr 20, 2018 · 4 revisions


Term frequency

The number of times that a word or term occurs in a document
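A minimal sketch of counting term frequency in Python. The tokenisation (lowercase the text, then keep runs of letters and digits) is an assumption chosen for illustration. Real indexers may tokenise differently.

```python
import re
from collections import Counter


def term_frequencies(document: str) -> Counter:
    # Assumed tokenisation: lowercase, then take runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", document.lower())
    # Counter maps each term to the number of times it occurs in the document.
    return Counter(tokens)


tf = term_frequencies("To be, or not to be: that is the question.")
print(tf["be"], tf["to"], tf["question"], tf["answer"])  # 2 2 1 0
```

A `Counter` returns 0 for terms that never occur, which matches the definition of term frequency for absent terms.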

Stemming

Description

  • Chop off the ends of the words
  • Reduce inflectional forms of words
  • Decrease the size of the vocabulary

Examples

automation, automatic, automates → automat

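Porter's algorithm (sample suffix rules):

  • sses → ss (caresses → caress)
  • ies → i (ponies → poni)
  • ational → ate (relational → relate)
  • tional → tion (conditional → condition)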
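The four rules listed above can be sketched as a longest-suffix-first rewrite. This is a deliberate simplification, not Porter's full algorithm: the real algorithm applies several steps in sequence (sses/ies belong to step 1a, ational/tional to step 2), and the step 2 rules fire only when the remaining stem has measure m > 0. That condition is omitted here, so this toy would wrongly turn "rational" into "rate". The name `apply_rules` is hypothetical.

```python
# Only the four rules listed above (a simplification of Porter's algorithm).
# Ordered longest suffix first, so "ational" is tried before "tional".
RULES = [
    ("ational", "ate"),
    ("tional", "tion"),
    ("sses", "ss"),
    ("ies", "i"),
]


def apply_rules(word: str) -> str:
    # Apply at most one rule: the first (longest) suffix that matches.
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word


for w in ["caresses", "ponies", "relational", "conditional", "cat"]:
    print(w, "->", apply_rules(w))
```

Trying "ational" before "tional" matters: otherwise "relational" would become "relation" instead of "relate".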

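Positive

Improves recall: a query also matches documents that contain other forms of the same word

Negative

Can harm precision: unrelated words may be reduced to the same stem (e.g. Porter's algorithm maps both universe and university to univers)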
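A small worked example of this trade-off. The three documents, the relevance judgements and the suffix list in `toy_stem` are all hypothetical, chosen only to show the effect. `toy_stem` is a crude suffix stripper, not Porter's algorithm. For the query "university", exact matching finds only one of the two relevant documents, while stemming finds both but also pulls in the document about the universe.

```python
# Hypothetical suffix list (not Porter's rules); the first matching suffix is stripped.
SUFFIXES = ("ities", "ity", "ing", "es", "e", "s")


def toy_stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Keep at least three characters so short words like "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical collection and relevance judgements for the query "university".
DOCS = {
    "d1": "university admissions",
    "d2": "universities in europe",
    "d3": "the universe is expanding",
}
RELEVANT = {"d1", "d2"}


def retrieve(query, normalize):
    # Return the documents containing a token that normalises to the query term.
    q = normalize(query.lower())
    return {doc_id for doc_id, text in DOCS.items()
            if q in {normalize(tok) for tok in text.lower().split()}}


def precision_recall(retrieved, relevant):
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


for name, normalize in [("no stemming", lambda w: w), ("toy stemming", toy_stem)]:
    retrieved = retrieve("university", normalize)
    p, r = precision_recall(retrieved, RELEVANT)
    print(f"{name}: retrieved={sorted(retrieved)} precision={p:.2f} recall={r:.2f}")
# no stemming: retrieved=['d1'] precision=1.00 recall=0.50
# toy stemming: retrieved=['d1', 'd2', 'd3'] precision=0.67 recall=1.00
```

Recall rises from 0.50 to 1.00 because "universities" now matches, while precision drops from 1.00 to 0.67 because "universe" is conflated with "university".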

Lemmatization

Description

Reduce each word to its dictionary form (its lemma), taking its syntactic category (part of speech) into account

Examples

  • verb + ing → verb
  • noun + s → noun
  • am, are, is →be
  • car, cars, car's, cars' → car
  • The boy’s cars are different colors → lemmatization → the boy car be different color
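A toy lemmatizer that reproduces the examples above, assuming the part-of-speech tags are already known. The irregular-form table, the tag names (`NOUN`, `VERB`, ...) and the rules are hypothetical and cover only these examples. A real lemmatizer relies on a full lexicon and morphological analysis. The rules show why the syntactic category matters: "meeting" tagged as a verb becomes "meet", while as a noun it stays "meeting".

```python
# Hypothetical irregular-form table keyed by (word, part of speech).
IRREGULAR = {("am", "VERB"): "be", ("are", "VERB"): "be", ("is", "VERB"): "be"}


def lemmatize(word: str, pos: str) -> str:
    w = word.lower().replace("\u2019", "'")  # normalise curly apostrophes
    if (w, pos) in IRREGULAR:
        return IRREGULAR[(w, pos)]
    if pos == "NOUN":
        # Possessives: car's -> car, cars' -> cars (the plural is handled next).
        if w.endswith("'s"):
            w = w[:-2]
        elif w.endswith("'"):
            w = w[:-1]
        # Regular plural: cars -> car, leaving words such as "glass" alone.
        if w.endswith("s") and not w.endswith("ss"):
            w = w[:-1]
    elif pos == "VERB" and w.endswith("ing") and len(w) > 5:
        # Toy rule: walking -> walk; short words like "sing" are left untouched.
        w = w[:-3]
    return w


tagged = [("The", "DET"), ("boy\u2019s", "NOUN"), ("cars", "NOUN"),
          ("are", "VERB"), ("different", "ADJ"), ("colors", "NOUN")]
print(" ".join(lemmatize(w, p) for w, p in tagged))
# the boy car be different color
print(lemmatize("meeting", "VERB"), lemmatize("meeting", "NOUN"))
# meet meeting
```

The "-ing" rule is only a sketch: it would turn "running" into "runn", which is the kind of case a lexicon-based lemmatizer handles correctly.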

Stop words

Extremely common words that appear to be of little value in helping select documents, and are therefore excluded from the index vocabulary.

They are mostly function words that carry little information, such as prepositions, articles and pronouns, along with some adverbs, adjectives and other very frequent words (of, in, about, which, although, and so on). They are not added to the index.

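A minimal sketch of dropping stop words before terms are added to the index. The stop list is hypothetical, built from the words mentioned above plus a few articles. Production systems use longer, language-specific lists, and some modern search engines keep stop words entirely.

```python
import re

# Hypothetical stop list built from the examples above.
STOP_WORDS = {"a", "an", "the", "of", "in", "about", "which", "although", "and"}


def index_terms(text: str) -> list[str]:
    # Lowercase, split into alphabetic tokens, then drop stop words.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(index_terms("The history of the index and the query"))
# ['history', 'index', 'query']
```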
