Determining the Semantic Similarity (SS) between word pairs is an important component in several research fields such as artificial intelligence, information retrieval, natural language processing and the biomedical domain. Most SS measures are assessed using the lexical database WordNet.
The WNetSS (WordNet Semantic Similarity) API makes it possible to reproduce a wide range of SS measures from different categories, including taxonomic-based, feature-based and IC-based (Information Content) measures. The API extracts the topological parameters of the WordNet “is a” taxonomy, which are used to express the semantics of concepts. It also provides the different ways of expressing two of these topological parameters: the depth and the hyponyms’ subgraph. In addition, an evaluation module assesses the accuracy of the measures, which can be evaluated and compared against several widely used benchmarks by means of correlation coefficients.
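To illustrate what "different ways of expressing" the depth and the hyponyms' subgraph can mean, here is a minimal Java sketch over a toy "is a" graph. It does not use the WNetSS API: the class `TopologySketch`, its methods and the five concepts are hypothetical names invented for this illustration, and the convention that the root has depth 0 is an assumption. When a concept has several hypernyms, its depth can be taken along the shortest or the longest path to the root.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TopologySketch {
    // Toy "is a" graph: concept -> direct hypernyms. Hypothetical data, not read from WordNet.
    static final Map<String, List<String>> HYPERNYMS = new HashMap<>();

    static {
        HYPERNYMS.put("entity", Collections.<String>emptyList());
        HYPERNYMS.put("object", Arrays.asList("entity"));
        HYPERNYMS.put("animal", Arrays.asList("object"));
        HYPERNYMS.put("pet", Arrays.asList("entity"));
        HYPERNYMS.put("dog", Arrays.asList("animal", "pet")); // two hypernyms
    }

    // Depth along the shortest path to the root (root depth assumed to be 0).
    static int depthMin(String c) {
        List<String> parents = HYPERNYMS.get(c);
        if (parents.isEmpty()) return 0;
        int best = Integer.MAX_VALUE;
        for (String p : parents) best = Math.min(best, depthMin(p) + 1);
        return best;
    }

    // Depth along the longest path to the root.
    static int depthMax(String c) {
        int best = 0;
        for (String p : HYPERNYMS.get(c)) best = Math.max(best, depthMax(p) + 1);
        return best;
    }

    // True if 'ancestor' is reachable from 'c' by following hypernym links.
    static boolean isAncestor(String ancestor, String c) {
        for (String p : HYPERNYMS.get(c)) {
            if (p.equals(ancestor) || isAncestor(ancestor, p)) return true;
        }
        return false;
    }

    // Hyponyms' subgraph of c, here reduced to the set of all concepts below c.
    static Set<String> hyponyms(String c) {
        Set<String> result = new HashSet<>();
        for (String n : HYPERNYMS.keySet()) {
            if (isAncestor(c, n)) result.add(n);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("depthMin(dog) = " + depthMin("dog"));
        System.out.println("depthMax(dog) = " + depthMax("dog"));
        System.out.println("hyponyms(entity) = " + hyponyms("entity"));
    }
}
```

In this toy graph "dog" is reached through both "animal" and "pet", so the two depth definitions give different values. That difference is why a measure's results can depend on which definition is chosen.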
The WNetSS API can be downloaded: link
The IC-based similarity measure was first introduced by Resnik. The basic idea of Information Content (IC) is that general and abstract entities found in a discourse carry less IC than more concrete and specialized ones. This principle is inspired by the work of Shannon: the more probable the appearance of a concept, the less information it conveys. The approach has since been modified and extended by several authors to include other methods, all of which commonly rely on IC values assigned to the concepts in the ontology. IC-based measures are defined by a pair (IC computing method, IC measure). The IC computing methods follow two strategies: statistical analysis of corpora, and the intrinsic computing method, which exploits only the topological parameters of the “is a” taxonomy.
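The pair (IC computing method, IC measure) can be made concrete with a small Java sketch. It does not call the WNetSS API: the class `IcSketch`, its methods, the four-concept taxonomy and the probabilities are all hypothetical values chosen for illustration. The corpus-based method uses IC(c) = -log p(c), the intrinsic method shown is the hyponym-based formulation of Seco et al. (one known intrinsic method, used here only as an example), and the measure is Resnik's, which takes the IC of the most informative common subsumer.

```java
import java.util.HashMap;
import java.util.Map;

public class IcSketch {
    // Toy "is a" tree: child -> parent. Hypothetical data, not read from WordNet.
    static final Map<String, String> PARENT = new HashMap<>();
    // Hypothetical corpus probabilities p(c); a concept's mass includes its descendants.
    static final Map<String, Double> PROB = new HashMap<>();

    static {
        PARENT.put("animal", "entity");
        PARENT.put("dog", "animal");
        PARENT.put("cat", "animal");
        PROB.put("entity", 1.0);
        PROB.put("animal", 0.5);
        PROB.put("dog", 0.125);
        PROB.put("cat", 0.25);
    }

    // True if 'ancestor' is c itself or lies on the path from c to the root.
    static boolean subsumes(String ancestor, String c) {
        for (String n = c; n != null; n = PARENT.get(n)) {
            if (n.equals(ancestor)) return true;
        }
        return false;
    }

    // Corpus-based IC: the more probable a concept, the less information it conveys.
    static double corpusIc(String c) {
        return -Math.log(PROB.get(c));
    }

    // Number of concepts strictly below c in the toy tree.
    static int hyponymCount(String c) {
        int count = 0;
        for (String n : PROB.keySet()) {
            if (!n.equals(c) && subsumes(c, n)) count++;
        }
        return count;
    }

    // Intrinsic IC (Seco et al.): 1 - log(hypo(c) + 1) / log(N), with N the number of concepts.
    static double intrinsicIc(String c) {
        return 1.0 - Math.log(hyponymCount(c) + 1) / Math.log(PROB.size());
    }

    // Resnik measure: highest corpus-based IC among the common subsumers of a and b.
    static double resnik(String a, String b) {
        double best = 0.0;
        for (String n = a; n != null; n = PARENT.get(n)) {
            if (subsumes(n, b)) best = Math.max(best, corpusIc(n));
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("corpus IC(animal)    = " + corpusIc("animal"));
        System.out.println("intrinsic IC(animal) = " + intrinsicIc("animal"));
        System.out.println("resnik(dog, cat)     = " + resnik("dog", "cat"));
    }
}
```

In both computing methods the root "entity" receives an IC of zero and the leaves receive the highest values, which matches the principle that abstract concepts carry less information than specialized ones.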
To use the API, follow these steps:
- Install MySQL.
- Install the English WordNet (2.1 and/or 3.0).
- Copy the file "file_properties.xml", which contains the configuration for accessing the WordNet data, into your work folder.
- Process the WordNet "is a" taxonomy (verbal or nominal) as shown in Example0.java (creating the WordNet database) and Example1.java (extracting the parameters).
- Use the semantic similarity measures as presented in the provided examples.
Ten examples using the WNetSS API are provided to help developers.
- Example0: Creating the database and loading WordNet data.
- Example1: Extracting Topological Parameters of Nominal WordNet "is a" taxonomy.
- Example2: WordNet Semantic Similarity Taxonomic Measures.
- Example3: WordNet Semantic Similarity Information Content Approach.
- Example4: WordNet Semantic Similarity Features Approach.
- Example5: Studying the accuracy of semantic measures through the nominal benchmarks.
- Example6: Extracting Topological Parameters of Verbal WordNet "is a" taxonomy.
- Example7: Studying the accuracy of semantic measures through the verbal benchmarks.
- Example8: WordNet "is a" taxonomy - Topological parameters.
- Example9: WordNet "is a" taxonomy - Shortest Path Length Similarity Measure.
To download all the examples, follow this link: download examples
Mohamed Ben Aouicha, Mohamed Ali Hadj Taieb, Abdelmajid Ben Hamadou: SISR: System for integrating semantic relatedness and similarity measures. Soft Comput. 22(6): 1855-1879 (2018) LINK