Let's turn multilingual! #18

Open
diegopaucarv opened this issue Jun 16, 2022 · 1 comment

diegopaucarv commented Jun 16, 2022

Okay, so, I have decided that I want to use this amazing, IMPRESSIVE work of art for predicting sentiment in Spanish. I noticed that the SentimentAnalysis tool runs on spaCy and uses an English news training dataset. I just can't find the folder where spaCy or the dataset is loaded. Is changing these parameters enough? Any further ideas?

EDIT: I forked the code, edited dataset.py, and added a simple way to request a different spaCy model from the caller's input. I changed the order in FXBaseModel so that RoBERTa is required before the XLNet model. I plan to make new training datasets (but I just don't know how yet).
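
For anyone trying the same change, here is a minimal sketch of what "requesting a different spaCy model from the caller's input" could look like. It is not the code from the fork: `SPACY_MODELS`, `resolve_spacy_model`, and `load_nlp` are hypothetical names, and the only real spaCy call used is `spacy.load`. The pipeline names are spaCy's published small models for English, Spanish, and Norwegian Bokmål, which have to be downloaded separately.

```python
# Hypothetical defaults: language code -> spaCy pipeline name.
SPACY_MODELS = {
    "en": "en_core_web_sm",
    "es": "es_core_news_sm",
    "nb": "nb_core_news_sm",
}


def resolve_spacy_model(language="en", model_name=None):
    """Return the name of the spaCy pipeline to load.

    An explicit model_name wins; otherwise the per-language default is used.
    """
    if model_name:
        return model_name
    try:
        return SPACY_MODELS[language]
    except KeyError:
        raise ValueError(
            f"No default spaCy model configured for {language!r}"
        ) from None


def load_nlp(language="en", model_name=None):
    """Load the resolved pipeline (assumes it is already installed)."""
    # Imported lazily so the resolver works without spaCy installed.
    import spacy

    return spacy.load(resolve_spacy_model(language, model_name))
```

Calling `load_nlp("es")` would then load the Spanish pipeline. Note that this only swaps the tokenizer and tagger; the coreference step and the classifier weights are still English-specific.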

egilron commented Oct 28, 2022

I too want to see if I can adapt it, in my case to Norwegian. Here are my thoughts:
spaCy is used with neuralcoref. I once tried to adapt that framework to Norwegian but gave up. Maybe you can find an OK coreference resolver for Spanish outside of neuralcoref. Or maybe you will do better than I did at adapting it to another language.
For the dataset, I saw that the recent SemEval task on SSA, which includes TSC (or TSA, Targeted Sentiment Analysis, as I have been calling it), also has Spanish data (OpeNER; Agerri et al., 2013).
If you have annotated TSC data with individual in-sentence sentiment targets, and put a layer of coreference resolution on top, then you are getting closer to a solution for the sentiment conveyed by a document towards an entity. Two problems with previous TSC annotations mentioned in the NewsMTSC paper are indirect sentiment and choice of words. I think indirect sentiment would be annotated OK, while choice of words is probably mostly overlooked. These are just thoughts, based on working with TSC annotations in general.
I recently studied the difference between sentence-based TSA and document-level sentiment towards entities for Norwegian professional reviews.
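
To make the "TSC plus coreference" idea concrete, here is a rough sketch of the aggregation step only. It assumes you already have per-mention labels from some sentence-level TSC model and a mention-to-entity mapping from some coreference resolver. The function name, the input shapes, and the majority-vote rule are all assumptions chosen for illustration, not something taken from this repository or from NewsMTSC.

```python
from collections import Counter, defaultdict


def document_sentiment(mention_predictions, mention_to_entity):
    """Aggregate per-mention sentiment labels into one label per entity.

    mention_predictions: iterable of (mention_id, label) pairs, e.g. the
        output of a sentence-level TSC model (hypothetical input shape).
    mention_to_entity: dict mapping mention_id to the id of the entity
        (coreference cluster) that the mention belongs to.
    """
    votes = defaultdict(Counter)
    for mention_id, label in mention_predictions:
        entity = mention_to_entity.get(mention_id)
        if entity is None:
            continue  # mention was not linked to any entity
        votes[entity][label] += 1
    # Majority vote per entity (an assumed rule; ties are not handled specially).
    return {
        entity: counts.most_common(1)[0][0]
        for entity, counts in votes.items()
    }


predictions = [
    ("m1", "negative"),
    ("m2", "negative"),
    ("m3", "positive"),
    ("m4", "positive"),
]
links = {"m1": "e1", "m2": "e1", "m3": "e1", "m4": "e2"}
result = document_sentiment(predictions, links)
```

A plain majority vote is probably too crude for real documents (one strongly negative sentence may matter more than three neutral ones), so treat the voting rule as a placeholder.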
