
How are sentence embeddings calculated from the corpus using the Sent2vec command? #86

Open
tqx94 opened this issue Sep 12, 2019 · 0 comments

tqx94 commented Sep 12, 2019

Hi,

When using the sent2vec command, a model is produced via a CBOW-style training procedure.
According to the paper, sent2vec averages the word vectors based on the weights learned during the corpus training phase.
But how does CBOW initialise and update the weights, and what n-grams are used?
For instance, when training on a Wikipedia corpus, what goes on under the hood to calculate the different weights and dimensions for the sentence 'I ate my breakfast in the morning'?
What are the unigrams and bigrams that get averaged here? How is the initialisation of the weights done? And what is the target/source word in the sentence above? Thanks
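For concreteness, here is a rough sketch of what I assume happens at inference time, based on my reading of the paper: the sentence embedding is just the average of the unigram and bigram vectors of the sentence. The lookup table, function names, and the uniform initialisation below are my own assumptions for illustration, not the actual C++ implementation:

```python
import numpy as np

DIM = 700  # embedding dimension, e.g. what -dim is set to at training time

# Hypothetical lookup table: one learned vector per unigram and per bigram.
# Here I just initialise them uniformly (fastText-style) for illustration;
# in a trained model these would be the learned input vectors.
vectors = {token: np.random.uniform(-1.0 / DIM, 1.0 / DIM, DIM)
           for token in [
               "i", "ate", "my", "breakfast", "in", "the", "morning",
               "i ate", "ate my", "my breakfast", "breakfast in",
               "in the", "the morning",
           ]}

def extract_ngrams(sentence, word_ngrams=2):
    """Unigrams plus contiguous n-grams up to word_ngrams (bigrams here)."""
    words = sentence.lower().split()
    ngrams = list(words)  # unigrams
    for n in range(2, word_ngrams + 1):
        ngrams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return ngrams

def sentence_embedding(sentence):
    """Average of the unigram and bigram vectors, as I understand the paper."""
    grams = extract_ngrams(sentence)
    return np.mean([vectors[g] for g in grams], axis=0)

emb = sentence_embedding("I ate my breakfast in the morning")
print(emb.shape)  # (700,)
```

And my understanding of training is that each word of the sentence in turn acts as the target, the remaining unigrams/bigrams of the sentence are averaged to form the source/context (like CBOW with the window being the whole sentence), and the vectors are updated with negative sampling. Is that right?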
