A tool to assess semantic similarity between English words

zoobereq/semantic_similarities

Semantic Word Similarity

Motivation

Computing word similarity is a fundamental problem in NLP, with applications such as plagiarism detection, question answering, and surveying diachronic language change.

Method

The program implements and evaluates several methods of computing semantic word similarity:

  • WordNet shortest-path similarity
  • Wu-Palmer WordNet semantic depth similarity
  • Word embeddings cosine similarity
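
The two WordNet-based measures can be illustrated on a tiny hand-built taxonomy. This is a minimal sketch, not the program's own code: the `PARENT` table and every node name in it are hypothetical stand-ins for WordNet synsets, and the formulas follow the common formulations (shortest-path similarity as 1 / (1 + path length in edges), Wu-Palmer as 2 * depth(LCS) / (depth(a) + depth(b)), with the root at depth 1).

```python
# Toy is-a taxonomy (hypothetical, NOT real WordNet data): child -> parent.
PARENT = {
    "animal": "entity",
    "artifact": "entity",
    "feline": "animal",
    "cat": "feline",
    "big_cat": "feline",
    "jaguar_animal": "big_cat",
    "car": "artifact",
    "jaguar_car": "car",
}


def path_to_root(node):
    """Return the chain [node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward from b, so the first hit is the deepest
        if node in ancestors_a:
            return node
    raise ValueError("nodes share no ancestor")


def path_similarity(a, b):
    """1 / (1 + number of edges on the shortest path between a and b)."""
    lcs = lowest_common_subsumer(a, b)
    edges = (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))
    return 1.0 / (1.0 + edges)


def wu_palmer_similarity(a, b):
    """2 * depth(LCS) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


if __name__ == "__main__":
    for pair in [("jaguar_animal", "cat"), ("jaguar_car", "car"), ("cat", "car")]:
        print(pair, path_similarity(*pair), wu_palmer_similarity(*pair))
```

In a tree the shortest path between two nodes always passes through their lowest common subsumer, which is why both measures can be derived from depths alone. Real WordNet is a graph with multiple inheritance, so a library implementation has to search more carefully.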

Code

The program first computes semantic similarity between the following six word pairs:

  • jaguar : cat
  • jaguar : car
  • king : queen
  • king : rook
  • tiger : zoo
  • tiger : cat

For both the shortest-path and Wu-Palmer measures, the score for a word pair is the highest similarity found across all pairs of the two words' WordNet senses. Cosine similarity is computed on dense vector representations derived from GloVe Wiki Gigaword 50. Users are free to substitute a different word embedding model.
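
The sense-selection step and the cosine computation described above might be sketched as follows. This is an illustration under stated assumptions rather than the repository's code: `max_sense_similarity`, `TOY_SENSES`, `TOY_SCORES`, and `toy_sim` are hypothetical names, and the scores in the table are made up to keep the example self-contained.

```python
import math
from itertools import product


def max_sense_similarity(senses_a, senses_b, sim_fn):
    """Highest score over all sense pairs; None if no pair is comparable."""
    scores = [sim_fn(a, b) for a, b in product(senses_a, senses_b)]
    scores = [s for s in scores if s is not None]
    return max(scores) if scores else None


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical sense inventory and made-up scores, standing in for WordNet.
TOY_SENSES = {
    "jaguar": ["jaguar.animal", "jaguar.car"],
    "cat": ["cat.feline", "cat.person"],
}
TOY_SCORES = {
    ("jaguar.animal", "cat.feline"): 0.25,
    ("jaguar.animal", "cat.person"): 0.10,
    ("jaguar.car", "cat.feline"): 0.08,
    # ("jaguar.car", "cat.person") is deliberately missing: not comparable.
}


def toy_sim(a, b):
    """Look up a made-up score; None means the pair has no score."""
    return TOY_SCORES.get((a, b))


if __name__ == "__main__":
    print(max_sense_similarity(TOY_SENSES["jaguar"], TOY_SENSES["cat"], toy_sim))
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

If the senses come from NLTK's WordNet interface (an assumption; the text does not name the library), the same function could be called with `wn.synsets(word)` for each word and a `sim_fn` that wraps `Synset.path_similarity` or `Synset.wup_similarity`. Filtering out `None` matters there because some sense pairs have no connecting path.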

The resulting similarity scores are then compared against human ratings taken from the WordSimilarity-353 Test Collection. Here again, users are free to substitute their own baseline.

Evaluation

The correlation between machine and human scores is measured with Spearman's rank correlation coefficient, first for the six word pairs listed above, and then for 203 word pairs extracted from the WordSimilarity-353 Test Collection.
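
Spearman's coefficient is the Pearson correlation of the two score lists after each has been converted to ranks. The sketch below is a dependency-free illustration, not the program's own evaluation code; the function names are hypothetical and the example scores are invented, not results from this project.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    machine = [0.10, 0.40, 0.35, 0.80]  # invented similarity scores
    human = [1.0, 5.0, 6.0, 9.0]        # invented human ratings
    print(spearman(machine, human))
```

Because only ranks matter, the machine scores (typically between 0 and 1) and the human ratings can be compared directly without rescaling. In practice a library routine such as `scipy.stats.spearmanr` does the same job and also reports a p-value.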