
Word Embeddings for the town of La Solana (Ciudad Real)


mariagabv/lasolana-embeddings


In a Vector Space of La Mancha, whose Position I Do Wish to Recall... 🍇


This repository presents a comprehensive evaluation of Spanish word embeddings, with an accompanying paper and code. We trained word vectors on a corpus of texts about La Solana using the fastText library. You can explore them in the TensorFlow Embedding Projector by following this link. Try searching for 'la_casota', 'ofrecimiento' or 'moje'. Do you know what these words mean in La Solana?
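The Embedding Projector loads embeddings from a tab-separated file with one vector per row, plus an optional metadata file whose row i labels vector i. The following is a minimal sketch of producing those two files from an in-memory word-to-vector mapping. The function name `projector_tsv` is hypothetical, and this is not necessarily how the linked projection was generated.

```python
from typing import Dict, List, Tuple


def projector_tsv(vectors: Dict[str, List[float]]) -> Tuple[str, str]:
    """Return (vectors_tsv, metadata_tsv) contents for the TensorFlow Embedding Projector.

    Row i of the vectors file holds the components of the i-th word, tab-separated.
    Row i of the metadata file holds that word. Because the metadata has a single
    column, no header row is written.
    """
    vec_rows: List[str] = []
    meta_rows: List[str] = []
    for word, vec in vectors.items():  # dict order keeps the two files aligned
        vec_rows.append("\t".join(str(x) for x in vec) + "\n")
        meta_rows.append(word + "\n")
    return "".join(vec_rows), "".join(meta_rows)
```

Write the two returned strings to, for example, `vectors.tsv` and `metadata.tsv` and load them with the projector's "Load" button. With 29,682 words of 150 dimensions each, the vectors file will hold 29,682 rows of 150 columns.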

Summary and links for lasolana-embeddings:

| Corpus | Size (words) | Algorithm | Vectors | Dimensions |
|---|---|---|---|---|
| Collected corpus | 4.1M | fastText | 29,682 | 150 |

La Solana fastText embeddings

Links to the embeddings (#dimensions=150, #vectors=29,682):
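The following is a minimal pure-Python sketch for loading the vectors and querying cosine nearest neighbours, for example for 'moje'. It assumes the vectors are distributed in fastText's plain-text `.vec` layout: a header line with the vector count and dimension, then one word followed by its components per line. That layout is an assumption, and the function names `load_vec`, `cosine` and `nearest` are illustrative.

```python
import math
from typing import Dict, Iterable, List, Tuple


def load_vec(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse a text .vec file: a 'count dim' header, then 'word x1 ... xdim' rows."""
    it = iter(lines)
    _count, dim = (int(tok) for tok in next(it).split())
    vectors: Dict[str, List[float]] = {}
    for line in it:
        parts = line.split()  # whitespace split also tolerates trailing spaces
        if len(parts) != dim + 1:
            continue  # skip blank or malformed rows
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(vectors: Dict[str, List[float]], query: str, k: int = 5) -> List[Tuple[str, float]]:
    """Return the k words most cosine-similar to `query`, excluding the query itself."""
    q = vectors[query]
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Pass an open file handle to `load_vec`, e.g. `load_vec(open(path, encoding="utf-8"))`, where `path` is the location of the downloaded file. A brute-force scan over about 30,000 vectors is fine for occasional queries. NumPy or the `fasttext` model's own `get_nearest_neighbors` would be faster for many queries.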

Corpus

All the digitized text about La Solana that we had access to:

  • Size: over 4 million words
  • Preprocessing: explained in the training_eval notebook

Algorithm

Implementation: fastText skip-gram model without subword (character n-gram) information
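For reference, the standard skip-gram objective with negative sampling is given below. This is the textbook formulation, not taken from the project's notebook. The model learns an input vector $\mathbf{v}_w$ and an output vector $\mathbf{u}_w$ for every word. For a target word $w_t$ and one context word $w_c$ inside the window, the loss is:

$$
\ell(w_t, w_c) = -\log \sigma\left(\mathbf{u}_{w_c}^{\top}\mathbf{v}_{w_t}\right) - \sum_{n=1}^{K} \log \sigma\left(-\mathbf{u}_{w_n}^{\top}\mathbf{v}_{w_t}\right), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}
$$

Here $K$ is the number of negative samples (`neg`), and $w_1, \dots, w_K$ are words drawn from a smoothed unigram distribution. In fastText, the context positions $t + j$ satisfy $1 \le |j| \le b$, with $b$ sampled uniformly from $\{1, \dots, \texttt{ws}\}$ for each target. With subwords enabled, $\mathbf{v}_{w_t}$ would be built from the word's own vector together with its character n-gram vectors. Because both n-gram bounds are 0 here, $\mathbf{v}_{w_t}$ is just the word's own vector.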

Hyperparameters

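  • minn (minimum subword n-gram length) = 0
  • maxn (maximum subword n-gram length) = 0
  • neg (number of negative samples) = 5
  • ws (context window size) = 5
  • epoch (number of training epochs) = 5
  • dim (vector dimension) = 150
  • all other parameters left at their defaults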
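The following minimal sketch shows how the hyperparameters above map onto the `fasttext` Python package's `train_unsupervised` function. Several details are assumptions: the corpus path, the helper names `train` and `save_vec`, and the input format (a plain-text file of whitespace-separated tokens). The actual preprocessing and training code live in the training_eval notebook.

```python
from typing import Any, Dict

# Hyperparameters listed above, as fastText keyword arguments; everything else stays default.
LASOLANA_PARAMS: Dict[str, Any] = {
    "model": "skipgram",
    "dim": 150,
    "ws": 5,
    "epoch": 5,
    "neg": 5,
    "minn": 0,  # minn = maxn = 0 disables character n-grams (no subwords)
    "maxn": 0,
}


def train(corpus_path: str):
    """Train skip-gram vectors on a whitespace-tokenized plain-text corpus (hypothetical path)."""
    import fasttext  # imported lazily so the parameter mapping can be inspected without it

    return fasttext.train_unsupervised(corpus_path, **LASOLANA_PARAMS)


def save_vec(model, out_path: str) -> None:
    """Write a text .vec file: a 'count dim' header, then each word and its components."""
    words = model.words
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(f"{len(words)} {model.get_dimension()}\n")
        for w in words:
            vec = model.get_word_vector(w)
            # five decimal places per component, separated by single spaces
            f.write(w + " " + " ".join(f"{x:.5f}" for x in vec) + "\n")
```

Keeping the parameters in one dictionary makes it easy to check them against the list above and to reuse them across training runs.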
