Releases: finalfusion/finalfusion-python

Finalfusion in Python

05 Jun 07:30

This release marks a major change to finalfusion-python: the entire package has been rewritten in Python and is no longer a wrapper around finalfusion-rust.

The API is now almost on par with finalfusion-rust and in some places goes beyond it.

  • Vocab, Storage, Metadata and Norms are now accessible as properties on Embeddings
  • Any of the chunks above can be loaded by themselves from a finalfusion file
  • All chunks can be constructed from within Python
    • It's possible to add, remove or change embeddings
  • Storage types integrate directly with numpy arrays
  • Reading and writing all common embedding formats (word2vec, GloVe, fastText) is supported
  • The API for vocabularies and subword indexers has been made more ergonomic:
    • vocab words and the word -> index mapping are accessible as properties
    • SubwordVocabs expose the subword indexer through vocab.subword_indexer

In addition to the overhauled API, finalfusion-python now comes with executables:

  • ffp-convert to convert between embedding formats
  • ffp-similar and ffp-analogy for similarity and analogy queries
  • ffp-bucket-to-explicit to convert from bucket subword to explicit subword embeddings
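
As a rough illustration of what similarity and analogy queries compute, here is a minimal pure-Python sketch. It is not the code behind ffp-similar or ffp-analogy: the helper names (most_similar, analogy) and the toy vectors are made up for this example, and it assumes the common cosine-similarity and vector-offset (b - a + c) formulations.

```python
import math


def normalize(vec):
    """Return vec scaled to unit l2 length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity of two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def most_similar(embeddings, query_vec, skip=(), k=1):
    """Rank words by cosine similarity to query_vec, ignoring words in skip."""
    scored = [
        (word, cosine(vec, query_vec))
        for word, vec in embeddings.items()
        if word not in skip
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def analogy(embeddings, a, b, c, k=1):
    """Answer 'a is to b as c is to ?' using the offset b - a + c."""
    target = [
        y - x + z
        for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # The three query words are excluded from the candidates.
    return most_similar(embeddings, target, skip={a, b, c}, k=k)


# Made-up 2-dimensional toy vectors, for illustration only.
toy = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 3.0],
    "apple": [-1.0, 0.5],
}

similar_to_king = most_similar(toy, toy["king"], skip={"king"}, k=1)
analogy_result = analogy(toy, "man", "woman", "king", k=1)
```

With these toy vectors, the nearest neighbour of "king" is "man" and the analogy man : woman :: king : ? resolves to "queen". Real embeddings have hundreds of dimensions, and the actual executables may differ in details such as normalization and tie handling.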

Check out the documentation at https://finalfusion-python.readthedocs.io for more information!

0.6.2

08 Mar 12:47
Bump version to 0.6.2

0.6.1

18 Nov 11:49
Bump the version to 0.6.1

0.6.0

15 Nov 18:29
Bump the version to 0.6.0

Support for fastText, word2vec, and text embeddings

10 Sep 09:00

The largest change in this release is support for reading fastText, word2vec, and text embeddings, in addition to finalfusion embeddings.

  • Add support for reading fastText (Embeddings.read_fasttext()), text (Embeddings.read_text()), textdims (Embeddings.read_text_dims()), and word2vec (Embeddings.read_word2vec()) formats.
  • Each of these newly-supported formats provides a keyword argument lossy. If set, the embeddings will be read lossily, permitting invalid UTF-8 in words.
  • Add the embedding_similarity method, which looks up words that are similar to a given embedding. The method for traditional word-based lookups has been renamed from similarity to word_similarity.
  • In previous releases, iterating over embeddings returned (word, embedding) tuples. Iteration now returns instances of the Embedding class, which provide word, embedding, and norm properties. norm is the l2 norm that the embedding had before it was normalized.
  • Add support for memory mapping quantized embedding matrices.
  • Add the ngram_indices and subword_indices methods to the Vocab class. These methods return the subword indices for a given word, which can be used to retrieve the subword embeddings individually. The ngram_indices method returns each subword with its index, whereas subword_indices returns only the indices.
  • Update to pyo3 0.8.
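
To make the difference between ngram_indices and subword_indices concrete, here is a minimal standalone sketch of bucket-style subword indexing. It only mirrors the method names from the notes above and is not the library's implementation. The angle-bracket padding, the n-gram lengths 3 to 6, the bucket count, and the FNV-1a hash are all assumptions chosen for illustration; the hash function and index layout actually used by finalfusion may differ.

```python
def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash, used here only as a stand-in hash function."""
    h = 0xCBF29CE484222325
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


def ngrams(word, min_n=3, max_n=6):
    """All character n-grams of the bracketed word, e.g. '<cat>'."""
    bracketed = f"<{word}>"
    return [
        bracketed[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(bracketed) - n + 1)
    ]


def ngram_indices(word, n_buckets=2**10):
    """Pair each n-gram with the bucket index it hashes to."""
    return [(g, fnv1a_64(g.encode("utf-8")) % n_buckets) for g in ngrams(word)]


def subword_indices(word, n_buckets=2**10):
    """Only the bucket indices, in the same order as ngram_indices."""
    return [idx for _, idx in ngram_indices(word, n_buckets)]


pairs = ngram_indices("cat")      # list of (n-gram, index) tuples
indices = subword_indices("cat")  # list of indices only
```

For "cat", the n-grams are "<ca", "cat", "at>", "<cat", "cat>", and "<cat>". Each index could then be used to look up one subword embedding, which is the use case the release notes describe.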

travis-0.5.0-rebuild

10 Sep 11:39
CI: Fix crate name in Travis-CI builds

0.4.0

24 Jul 17:10
Bump version to 0.4.0

0.3.1

14 Jun 11:12
Bump version to 0.3.1

New convenience methods

12 Jun 07:14

This release has the following changes:

  • Add the matrix_copy method to get a numpy array copy of the embedding matrix.
  • Add the vocab method to get a Vocab instance, which provides the item_to_indices method to get the indices or subword indices of a word. Vocab also provides indexing to look up the word corresponding to an index (e.g. vocab[3823]).
  • Upgrade to finalfusion 0.6.

Switch to numpy arrays

24 Apr 11:06
  • Return numpy arrays rather than Python lists.
  • Update to pyo3 0.6.
  • Switch from rust2vec to the finalfusion crate.