Use modified version of WordNet #7

Open
mrmechko opened this issue Nov 25, 2019 · 2 comments

Comments

@mrmechko

If I wanted to use my own modified version of WordNet, or perhaps a different hierarchy with this system, where would I start?

I notice that the java code uses JWI, but I'm trying to figure out if the core system actually needs the full wordnet hierarchy or just the tags.

@loic-vial
Contributor

Hi!

So, it's true that we currently rely a lot on the WordNet hierarchy. If you want to use another sense inventory, here are some tips:

  • No need to change the Python code; everything WordNet-related lives in the Java files.
  • Four Java classes need to be changed: NeuralWSDPrepare (the main class) and NeuralDataPreparator prepare the training data and configure the neural network, while NeuralWSDDecode and NeuralDisambiguator use a trained neural network to decode new text.
  • Track the usages of the WordnetHelper and WordnetUtils classes; they handle everything WordNet-related.
  • In general, we use WordNet for two things: listing the possible senses of a word given its lemma (so the neural network only predicts a sense among these possibilities), and converting senses to synsets and/or compressed synsets (as in our article).

I know it would be great to have a clear interface for plugging in any sense inventory, and it's not too difficult, but I don't have the time to make the changes right now. It's planned for 2020 (after I finish my PhD, actually ^^).
If you want to work on it, I would be glad to take pull requests :) I think the best way to achieve this would be to replace all "WordNetStuff" with a generic "SenseInventoryStuff", so the code stays globally the same, and we would then provide different implementations of the SenseInventory.
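
To make the idea concrete, here is a rough sketch of what such an interface might look like. The `SenseInventory` name comes from the suggestion above, but the method names, signatures, key format, and the `MapSenseInventory` class are all hypothetical and are not code from the repository. The three methods simply mirror the uses of WordNet listed earlier: listing candidate senses, mapping senses to synsets, and mapping synsets to compressed synsets.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical interface: names and signatures are assumptions, not the project's API. */
interface SenseInventory {
    /** Candidate sense keys for a lemma and part of speech (empty if unknown). */
    List<String> sensesOf(String lemma, String pos);

    /** Synset (or equivalent concept) identifier for a sense key, or null if unknown. */
    String synsetOf(String senseKey);

    /** Compressed class for a synset; returns the synset itself if it has no compressed form. */
    String compress(String synsetId);
}

/** Toy in-memory implementation backed by plain maps, for illustration only. */
final class MapSenseInventory implements SenseInventory {
    private final Map<String, List<String>> sensesByLemmaPos; // key format "lemma%pos" is an assumption
    private final Map<String, String> synsetBySense;
    private final Map<String, String> compressedBySynset;

    MapSenseInventory(Map<String, List<String>> sensesByLemmaPos,
                      Map<String, String> synsetBySense,
                      Map<String, String> compressedBySynset) {
        this.sensesByLemmaPos = sensesByLemmaPos;
        this.synsetBySense = synsetBySense;
        this.compressedBySynset = compressedBySynset;
    }

    @Override
    public List<String> sensesOf(String lemma, String pos) {
        return sensesByLemmaPos.getOrDefault(lemma + "%" + pos, Collections.emptyList());
    }

    @Override
    public String synsetOf(String senseKey) {
        return synsetBySense.get(senseKey);
    }

    @Override
    public String compress(String synsetId) {
        return compressedBySynset.getOrDefault(synsetId, synsetId);
    }
}
```

With something along these lines, a WordNet-backed implementation could presumably wrap the existing WordnetHelper calls, and another hierarchy would only need its own implementation of the same three methods.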

@mrmechko
Author

Hi, fellow PhD student here, hoping to finish in 2020 too.

I decided to sidestep the issue for now by using sense compression. The TRIPS ontology has mappings from WordNet, so I'm just replacing the hypernym compression algorithm with TRIPS compression. That does violate the invariant described in the paper (that no compression should result in losing a unique word sense), but it seems to be working pretty well.
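
For anyone trying something similar, a minimal sketch of how violations of that invariant could be counted. It assumes two plain maps, one from each lemma to its distinct sense keys and one from each sense key to its compressed class (for example, a TRIPS type); the `CompressionCheck` class, its method, and the sample identifiers are hypothetical and not taken from the repository or from TRIPS.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: reports lemmas whose senses become indistinguishable after compression. */
final class CompressionCheck {
    /**
     * Returns the lemmas for which at least two senses map to the same compressed class.
     * Senses without a mapping are treated as their own class. Each sense list is
     * assumed to contain no duplicates.
     */
    static Set<String> lemmasWithCollapsedSenses(Map<String, List<String>> sensesByLemma,
                                                 Map<String, String> classBySense) {
        Set<String> collapsed = new TreeSet<>();
        for (Map.Entry<String, List<String>> entry : sensesByLemma.entrySet()) {
            Set<String> seenClasses = new HashSet<>();
            for (String sense : entry.getValue()) {
                String compressedClass = classBySense.getOrDefault(sense, sense);
                // add() returns false when this class was already produced by another sense
                if (!seenClasses.add(compressedClass)) {
                    collapsed.add(entry.getKey());
                    break;
                }
            }
        }
        return collapsed;
    }
}
```

An empty result would mean the compression keeps every sense of every lemma distinguishable; a non-empty one lists the lemmas where the network can no longer tell some senses apart.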
