In a Vector Space of La Mancha, whose Position I Do Wish to Recall... 🍇

This repository presents a comprehensive evaluation of word embeddings in Spanish, with paper and code. We trained word vectors on a corpus of texts about La Solana using the fastText library. You can visualize them by following this link to the TensorFlow Embedding Projector. Try searching for 'la_casota', 'ofrecimiento' or 'moje'. Do you have any idea what these words mean in La Solana?

Summary and links for lasolana-embeddings:

Corpus	Size (words)	Algorithm	Vectors	Dimensions
Collected corpus	4.1M	FastText	29,682	150

La Solana FastText embeddings

Links to the embeddings (#dimensions=150, #vectors=29,682):

Vector format (.vec) (34.5 MB)
Binary format (.bin) (34.5 MB)
TSV format (.tsv) (49.5 MB)
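
As a rough illustration of how the text (.vec) file can be read without any dependencies, here is a minimal sketch. It assumes the standard fastText text layout (a header line with the vector count and dimensions, then one word per line followed by its components). The function names `load_vec`, `cosine` and `nearest` are hypothetical helpers, and the sample vectors are made-up toy numbers, not values from the released embeddings.

```python
import io
import math


def load_vec(handle):
    """Parse fastText's text (.vec) layout from an open text file.

    Assumed layout: first line "<vector count> <dimensions>", then one
    line per word: the word followed by its space-separated components.
    """
    _count, dim = (int(x) for x in handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(vectors, word, k=5):
    """Return the k words whose vectors are most similar to `word`."""
    query = vectors[word]
    scored = [
        (cosine(query, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    return [other for _, other in sorted(scored, reverse=True)[:k]]


# Toy data in the same layout (2 dimensions instead of 150; numbers invented).
sample = io.StringIO(
    "3 2\n"
    "moje 1.0 0.0\n"
    "ofrecimiento 0.9 0.1\n"
    "la_casota 0.0 1.0\n"
)
vectors, dim = load_vec(sample)
print(nearest(vectors, "moje", k=1))
```

With the real download, the same functions should work on `open(path, encoding="utf-8")`, where `path` is wherever you saved the .vec file. A pure-Python scan over 29,682 vectors is slow but workable; a NumPy matrix would be the natural next step.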

Corpus

All the digitized texts about La Solana that we have access to:

Gaceta de La Solana until December 2023
Noticias de La Solana
Julián Simón's blog
García Gallego, José María. El legado Bustillo de La Solana: su fundación, visicitudes, intervención de Costa, asesinato del cura Torrijos, situación actual. (1935)

Corpus size: over 4 million words
Preprocessing: explained in the training_eval notebook.

Algorithm

Implementation: FastText with the skip-gram model and subword n-grams disabled

Hyperparameters

min subword-ngram = 0
max subword-ngram = 0
neg = 5
ws = 5
epochs = 5
dim = 150
all other parameters set as default
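
For readers who want to reproduce a comparable model, the settings above could be passed to the fasttext Python bindings roughly as follows. This is a sketch, not the authors' training script: the mapping of the listed names onto the keyword arguments `minn`, `maxn` and `epoch` is our assumption, and `corpus_path` and `output_path` are hypothetical placeholders for a preprocessed plain-text corpus and a destination file.

```python
# Hyperparameters from the list above, keyed by the argument names of
# fasttext.train_unsupervised (assumed mapping).
HYPERPARAMS = {
    "model": "skipgram",
    "minn": 0,    # min subword n-gram
    "maxn": 0,    # max subword n-gram (0 disables subwords)
    "neg": 5,     # number of negative samples
    "ws": 5,      # context window size
    "epoch": 5,   # passes over the corpus
    "dim": 150,   # vector dimensions
}


def train(corpus_path, output_path=None):
    """Train skip-gram vectors on a plain-text corpus file.

    `corpus_path` and `output_path` are hypothetical placeholders.
    """
    import fasttext  # imported lazily; requires `pip install fasttext`

    model = fasttext.train_unsupervised(input=corpus_path, **HYPERPARAMS)
    if output_path:
        model.save_model(output_path)  # writes the binary (.bin) model
    return model
```

Every argument not listed in `HYPERPARAMS` keeps the library default, matching the last item in the list.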
