src.models.nlp.iWord2Vec

class iWord2Vec(c=5, e=64, epochs=1, source=None, destination=None, seed=15)

This class implements an iWord2Vec model.

Parameters

  • c : int, optional (default=5)

    The size of the context window.

  • e : int, optional (default=64)

    The size of the word embeddings.

  • epochs : int, optional (default=1)

    The number of training epochs.

  • source : str or None, optional (default=None)

    The source file to load a pre-trained model from.

  • destination : str or None, optional (default=None)

    The destination file to save the trained model to.

  • seed : int, optional (default=15)

    The random seed for reproducibility.
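
To make the role of `c` concrete, here is a minimal sketch of how a context window yields (center, context) training pairs. It assumes the usual Word2Vec convention that the window counts up to `c` tokens on each side of the center word; this page does not state whether iWord2Vec follows exactly that rule. The helper `context_pairs` is hypothetical and not part of the class.

```python
from typing import List, Tuple


def context_pairs(sentence: List[str], c: int = 5) -> List[Tuple[str, str]]:
    """Return (center, context) pairs using up to c tokens on each side.

    Hypothetical helper for illustration only; not part of iWord2Vec.
    """
    pairs = []
    for i, center in enumerate(sentence):
        lo = max(0, i - c)                  # first index inside the window
        hi = min(len(sentence), i + c + 1)  # one past the last index inside the window
        for j in range(lo, hi):
            if j != i:                      # a word is not its own context
                pairs.append((center, sentence[j]))
    return pairs


pairs = context_pairs(["a", "b", "c", "d"], c=1)
print(pairs)
```

A larger `c` pairs each word with more distant neighbours, which increases the number of training pairs per sentence.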

Methods

train(corpus, save=False)

Train the iWord2Vec model on the given corpus.

Parameters

  • corpus : list of list of str

    A list of sentences, where each sentence is a list of words.

  • save : bool, optional (default=False)

    Whether to save the trained model.
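
The `corpus` argument is a list of token lists. As a rough sketch of preparing data in that shape, the hypothetical helpers below split raw lines on whitespace and check the resulting structure. Whitespace tokenisation is an assumption; how sentences should be tokenised for iWord2Vec is not specified here.

```python
from typing import List


def to_corpus(lines: List[str]) -> List[List[str]]:
    """Split each non-empty line on whitespace into a list of tokens.

    Hypothetical helper; whitespace splitting is an assumed tokenisation.
    """
    return [line.split() for line in lines if line.strip()]


def is_valid_corpus(corpus) -> bool:
    """Check that corpus is a list of lists of strings."""
    return isinstance(corpus, list) and all(
        isinstance(sentence, list) and all(isinstance(w, str) for w in sentence)
        for sentence in corpus
    )


corpus = to_corpus(["alpha beta gamma", "", "  delta epsilon "])
print(corpus)
print(is_valid_corpus(corpus))
```

A corpus built this way has the shape that `train` expects, and `update` documents the same shape for its own `corpus` argument.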


load_model()

Load a pre-trained iWord2Vec model from a file.


get_embeddings(ips=None, emb_path=None)

Get word embeddings for specific words or all words.

Parameters

  • ips : list of str or None, optional (default=None)

    A list of words to retrieve embeddings for. If None, embeddings for all words are retrieved.

  • emb_path : str or None, optional (default=None)

    The file path to save the embeddings to as a CSV file.

Returns

  • embeddings : pd.DataFrame

    A DataFrame containing the word embeddings.
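
The selection rule described for `ips` (a list of words, or None for everything) can be illustrated with a plain dictionary standing in for the model's embedding table. This is only a sketch: the real method returns a `pd.DataFrame`, and both the CSV layout (word first, then one column per dimension) and the handling of unknown words shown here are assumptions. `select_embeddings` and `to_csv_text` are hypothetical names.

```python
import csv
import io
from typing import Dict, List, Optional

Table = Dict[str, List[float]]


def select_embeddings(table: Table, words: Optional[List[str]] = None) -> Table:
    """Return vectors for the requested words, or for all words if None."""
    if words is None:
        return dict(table)
    # Assumption: words missing from the table are skipped silently.
    return {w: table[w] for w in words if w in table}


def to_csv_text(table: Table) -> str:
    """Serialise the table as CSV rows: the word, then its vector components."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for word, vec in table.items():
        writer.writerow([word, *vec])
    return buf.getvalue()


table = {"alpha": [0.1, 0.2], "beta": [0.3, 0.4]}
subset = select_embeddings(table, ["beta", "gamma"])
print(subset)
print(to_csv_text(subset))
```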


update(corpus, save=False)

Update the iWord2Vec model with additional training on a new corpus.

Parameters

  • corpus : list of list of str

    A list of sentences, where each sentence is a list of words.

  • save : bool, optional (default=False)

    Whether to save the updated model.


del_embeddings(to_drop, mname=None)

Delete word embeddings for specific words.

Parameters

  • to_drop : list of str

    A list of words to delete from the embeddings.

  • mname : str or None, optional (default=None)

    The destination file to save the model to after removing the embeddings.
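
Conceptually, deleting embeddings means removing the listed words from the vocabulary-to-vector mapping. The sketch below shows that idea on a plain dictionary. It is a stand-in and not the class's internals, and ignoring words that are absent from the table is an assumption. `drop_words` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def drop_words(table: Dict[str, List[float]], to_drop: Iterable[str]) -> Dict[str, List[float]]:
    """Return a copy of the table without the listed words.

    Assumption: words in to_drop that are not in the table are ignored.
    """
    blocked = set(to_drop)
    return {word: vec for word, vec in table.items() if word not in blocked}


table = {"alpha": [0.1, 0.2], "beta": [0.3, 0.4], "gamma": [0.5, 0.6]}
remaining = drop_words(table, ["beta", "unknown"])
print(sorted(remaining))
```

Returning a new mapping leaves the original table untouched, so it can still be inspected or saved under a different name.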