Skip to content

tobyberster/BSP-WSD

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Binary Spatter Code - Word Sense Disambiguation (BSP-WSD)

Java tool that utilizes the Semantic Vectors Package to disambiguate word senses from the NLM Disambiguation data set.

What does it do?

Performs word sense disambiguation on a set of 50 words stored in a lucene index.

You may change the vector size and the context around the ambiguous term in the code to change the behavior of the disambiguation algorithm.

Warning: This code is not yet optimized and was created only for research purposes. If you do find anything wrong with it, please let me know so I can update it accordingly.

Requirements

Data

  • NLM - WSD data set: Included as lucene index in tarball format in the data folder. Untar and use.
  • Stopword List: Included as *.txt file in the data folder.

Usage

java BuildDisambiguationIndex PATH_TO_DISAMBIGUATION_INDEX

Dont forget to set the memory limit quite high, or else it will crash right away. For my purposes I usually set it to the following: -Xmx16G

Tests

No unit tests have been created for this project. Three separate researchers looked over the code and found no logical issues within the code. Please feel free to make changes and share them with the community.

Author

Toby Berster (@wurzelgogerer) toby.berster@gmail.com

License

Copyright (c) 2012 Toby Berster

Licensed under the New BSD License.

About

Binary Spatter Code - Word Sense Disambiguation

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages