class iWord2Vec(c=5, e=64, epochs=1, source=None, destination=None, seed=15)
This class implements an iWord2Vec model.
-
c : int, optional (default=5)
The size of the context window.
-
e : int, optional (default=64)
The size of the word embeddings.
-
epochs : int, optional (default=1)
The number of training epochs.
-
source : str or None, optional (default=None)
The source file to load a pre-trained model from.
-
destination : str or None, optional (default=None)
The destination file to save the trained model.
-
seed : int, optional (default=15)
The random seed for reproducibility.
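To make `c` concrete, here is a minimal sketch of how a symmetric context window is commonly defined in word2vec-style models: each target word is paired with up to `c` neighbours on either side. The helper `context_pairs` is hypothetical and not part of iWord2Vec, and the library's internal windowing may differ (some word2vec implementations, for example, shrink the window at random).

```python
def context_pairs(sentence, c=5):
    """Return (target, context) pairs using a fixed symmetric window of size c."""
    pairs = []
    for i, target in enumerate(sentence):
        lo = max(0, i - c)                   # leftmost index inside the window
        hi = min(len(sentence), i + c + 1)   # one past the rightmost index
        for j in range(lo, hi):
            if j != i:                       # a word is not its own context
                pairs.append((target, sentence[j]))
    return pairs


sentence = ["a", "b", "c", "d"]
pairs = context_pairs(sentence, c=1)
```

With `c=1`, each word is paired only with its immediate neighbours. A larger `c` produces more pairs per word, up to the sentence length.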
train(corpus, save=False)
Train the iWord2Vec model on the given corpus.
-
corpus : list of list of str
A list of sentences, where each sentence is a list of words.
-
save : bool, optional (default=False)
Whether to save the trained model.
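The `corpus` shape can be sketched as follows. The helpers `to_corpus` and `is_valid_corpus` are hypothetical and not part of iWord2Vec, and the IP-like tokens are only illustrative (suggested by the `ips` parameter name used elsewhere in this reference). Any strings work as words.

```python
def to_corpus(lines):
    """Split each non-blank line on whitespace into a list of word tokens."""
    return [line.split() for line in lines if line.strip()]


def is_valid_corpus(corpus):
    """Check the list-of-list-of-str shape described for `corpus`."""
    return isinstance(corpus, list) and all(
        isinstance(sentence, list) and all(isinstance(w, str) for w in sentence)
        for sentence in corpus
    )


raw_lines = ["10.0.0.1 10.0.0.2 10.0.0.3", "", "10.0.0.2 10.0.0.4"]
corpus = to_corpus(raw_lines)  # blank lines are dropped, not kept as empty sentences
```

A flat list of strings such as `["a b", "c d"]` does not match the documented shape. Each sentence must itself be a list of words.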
load_model()
Load a pre-trained iWord2Vec model from a file.
get_embeddings(ips=None, emb_path=None)
Get word embeddings for specific words or all words.
-
ips : list of str or None, optional (default=None)
A list of words to retrieve embeddings for. If None, embeddings for all words are returned.
-
emb_path : str or None, optional (default=None)
The file path to save the embeddings to as a CSV file.
-
embeddings : pd.DataFrame
Return value. A DataFrame containing the requested word embeddings.
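The `ips=None` convention (None selects every word, a list selects a subset) can be illustrated without the library. This is a sketch only: it uses a plain dict in place of the returned DataFrame, `select_embeddings` is a hypothetical helper, and skipping unknown words is an assumption, since this reference does not say how iWord2Vec handles them.

```python
def select_embeddings(table, ips=None):
    """Return all rows when ips is None, otherwise only the requested words."""
    if ips is None:
        return dict(table)
    # Assumption: unknown words are skipped; the library may behave differently.
    return {word: table[word] for word in ips if word in table}


# Toy two-dimensional vectors, made up purely for illustration.
table = {
    "10.0.0.1": [0.1, 0.2],
    "10.0.0.2": [0.3, 0.4],
    "10.0.0.3": [0.5, 0.6],
}
everything = select_embeddings(table)
subset = select_embeddings(table, ips=["10.0.0.2", "10.0.0.9"])
```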
update(corpus, save=False)
Update the iWord2Vec model with additional training on a new corpus.
-
corpus : list of list of str
A list of sentences, where each sentence is a list of words.
-
save : bool, optional (default=False)
Whether to save the updated model.
del_embeddings(to_drop, mname=None)
Delete word embeddings for specific words.
-
to_drop : list of str
A list of words to delete from the embeddings.
-
mname : str or None, optional (default=None)
The destination file to save the model after removing embeddings.
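The methods above might be combined as in the following sketch. `run_workflow` is a hypothetical helper, and `model` is assumed to be an iWord2Vec instance built elsewhere, for example with `iWord2Vec(c=5, e=64, epochs=1, seed=15)`. The import path is not stated in this reference, so it is left out. Only the method names and keyword arguments documented here are used.

```python
def run_workflow(model, corpus, new_corpus, to_drop):
    """Call the documented methods in one plausible order on an existing model."""
    model.train(corpus, save=False)        # initial training on the first corpus
    model.update(new_corpus, save=False)   # additional training on new sentences
    model.del_embeddings(to_drop)          # remove words that are no longer needed
    return model.get_embeddings()          # ips=None: embeddings for all remaining words
```

Passing the model in as an argument keeps the sketch independent of how the class is imported or constructed.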