Releases: elixir-nx/tokenizers
v0.5.1
v0.5.0
Release v0.5.0
v0.4.0
Added
- Support for training a tokenizer from scratch. See Tokenizers.Tokenizer.train_from_files/3 and Tokenizers.Model for available models.
- Support for changing tokenizer configuration, such as Tokenizers.Tokenizer.set_padding/2 and Tokenizers.Tokenizer.set_truncation/2. See the "Configuration" functions group in Tokenizers.Tokenizer.
- Support for applying multiple encoding transformations without additional data copies; see Tokenizers.Encoding.Transformation. Transformations can be passed to Tokenizers.Tokenizer.encode/3 via :encoding_transformations or applied via Tokenizers.Encoding.transform/2.
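A minimal sketch of the configuration and transformation features, assuming a script run with network access. The model name "bert-base-cased", the sample text, and the length of 8 are illustrative choices rather than anything taken from the release notes, and the option names (:max_length, :strategy) follow the package documentation as best understood here.

```elixir
# Assumption: run as a standalone script; Mix.install fetches the package.
Mix.install([{:tokenizers, "~> 0.4.0"}])

alias Tokenizers.{Tokenizer, Encoding}
alias Tokenizers.Encoding.Transformation

# "bert-base-cased" is only an example identifier.
{:ok, tokenizer} = Tokenizer.from_pretrained("bert-base-cased")

# Option 1: store truncation and padding settings on the tokenizer itself.
configured_tokenizer =
  tokenizer
  |> Tokenizer.set_truncation(max_length: 8)
  |> Tokenizer.set_padding(strategy: {:fixed, 8})

{:ok, configured} = Tokenizer.encode(configured_tokenizer, "Hello there!")

# Option 2: pass transformations per call via :encoding_transformations.
{:ok, transformed} =
  Tokenizer.encode(tokenizer, "Hello there!",
    encoding_transformations: [Transformation.truncate(8), Transformation.pad(8)]
  )

# Option 3: apply transformations to an encoding that already exists.
{:ok, plain} = Tokenizer.encode(tokenizer, "Hello there!")
padded = Encoding.transform(plain, [Transformation.pad(8)])

IO.inspect(Encoding.get_ids(configured), label: "configured")
IO.inspect(Encoding.get_ids(transformed), label: "transformed")
IO.inspect(Encoding.get_ids(padded), label: "padded")
```

Storing the settings on the tokenizer suits a fixed pipeline, while per-call transformations suit cases where the target length changes between calls.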
Changed
- (Breaking) Tokenizers.Tokenizer.encode/3 no longer accepts a batch of inputs; to encode a batch, use Tokenizers.Tokenizer.encode_batch/3 instead.
- (Breaking) Tokenizers.Tokenizer.decode/3 no longer accepts a batch of inputs; to decode a batch, use Tokenizers.Tokenizer.decode_batch/3 instead.
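The breaking changes above can be sketched as follows: single inputs go through encode/3 and decode/3, lists of inputs through encode_batch/3 and decode_batch/3. The model name and sample sentences are assumptions made for illustration, and the script needs network access to fetch the pretrained tokenizer.

```elixir
# Assumption: run as a standalone script; Mix.install fetches the package.
Mix.install([{:tokenizers, "~> 0.4.0"}])

alias Tokenizers.{Tokenizer, Encoding}

# "bert-base-cased" is only an example identifier.
{:ok, tokenizer} = Tokenizer.from_pretrained("bert-base-cased")

# A single input uses encode/3 (the options argument is omitted here).
{:ok, encoding} = Tokenizer.encode(tokenizer, "Hello there!")

# A list of inputs must now use encode_batch/3.
{:ok, encodings} = Tokenizer.encode_batch(tokenizer, ["Hello there!", "How are you?"])

ids = Encoding.get_ids(encoding)
batch_ids = Enum.map(encodings, &Encoding.get_ids/1)

# The same split applies when decoding token ids back to text.
{:ok, text} = Tokenizer.decode(tokenizer, ids)
{:ok, texts} = Tokenizer.decode_batch(tokenizer, batch_ids)

IO.inspect(text, label: "single")
IO.inspect(texts, label: "batch")
```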
Full Changelog: v0.3.2...v0.4.0
Checksums
SHA256 list:
2d04e0b62b8d23b515ad33bf8a6fec4c8a01aa87f79516fee1b57ac39e96ec20 ex_tokenizers-v0.4.0-nif-2.15-x86_64-pc-windows-gnu.dll.tar.gz
dee3196c4908b8f56cb3080af38a7ed955873c685a2d8cbddde8fbc96e466220 ex_tokenizers-v0.4.0-nif-2.15-x86_64-pc-windows-msvc.dll.tar.gz
3615b939766fe439ff0d3fa939ad37b6a6059a8347f06a05657185eb2e0e5b51 ex_tokenizers-v0.4.0-nif-2.16-x86_64-pc-windows-gnu.dll.tar.gz
310c032a19d088520bd4d4644ae9f19039099a2d4143523771a7a5571f2f7117 ex_tokenizers-v0.4.0-nif-2.16-x86_64-pc-windows-msvc.dll.tar.gz
bbf8f8324804c40346cb5cd7a5addc31a931fc710ca4b42e01dd07d23b8eca10 libex_tokenizers-v0.4.0-nif-2.15-aarch64-apple-darwin.so.tar.gz
3bbf7a63ed4bda9a4390b5acba2d60e2ebd744f1d3f1d754f8ec4cc4f5eca6ff libex_tokenizers-v0.4.0-nif-2.15-aarch64-unknown-linux-gnu.so.tar.gz
7dd6b547c95482518a15fb2a4bacdd751537a75ba7d8285ff5145904d55c4449 libex_tokenizers-v0.4.0-nif-2.15-aarch64-unknown-linux-musl.so.tar.gz
e1014519eb978cbc20649bb122a6c1f834629579ea0650a79bf8e816f58da6dd libex_tokenizers-v0.4.0-nif-2.15-arm-unknown-linux-gnueabihf.so.tar.gz
a6456c8719c5b5914068378198a43d080a87221dd37ff2ccf1da05371ff028d7 libex_tokenizers-v0.4.0-nif-2.15-riscv64gc-unknown-linux-gnu.so.tar.gz
9007181e446c00a9113993b43b6522b6b28a098ae1154f3f964ba8fa3cbe0b8c libex_tokenizers-v0.4.0-nif-2.15-x86_64-apple-darwin.so.tar.gz
061cef149b7e91f4556090c1df2672827c4e9d59e30befacd5167e4fd28c7098 libex_tokenizers-v0.4.0-nif-2.15-x86_64-unknown-linux-gnu.so.tar.gz
b2d9e65ccb6aeaaa4bbcf43d034c7af13e257770eaef55a0eb70f58b5e28ebc1 libex_tokenizers-v0.4.0-nif-2.15-x86_64-unknown-linux-musl.so.tar.gz
dfe7755fbad8a3409f3b5258e810220f2b0f179d8569dc7abcc489e65804d26e libex_tokenizers-v0.4.0-nif-2.16-aarch64-apple-darwin.so.tar.gz
3bc9e5e23afaa2ed15680afd8f8e843434de804e8e708a5ca00c6a413a3f6be2 libex_tokenizers-v0.4.0-nif-2.16-aarch64-unknown-linux-gnu.so.tar.gz
987a81d6babb13b2baa9bfb8c602691766a6319548f4f129e0c99d56c6a155ae libex_tokenizers-v0.4.0-nif-2.16-aarch64-unknown-linux-musl.so.tar.gz
b973a15035c495c27aa74e525342b2aff3be66c8f1f75f23776ea5f8dce59ba3 libex_tokenizers-v0.4.0-nif-2.16-arm-unknown-linux-gnueabihf.so.tar.gz
3a1cef66172ab7c82a255933a4d957d21d6b5fb26f6deea968de2598fd477f48 libex_tokenizers-v0.4.0-nif-2.16-riscv64gc-unknown-linux-gnu.so.tar.gz
e4a57f3e7a1dd29933fedab12b7c57d9a9cb1454d8fd6415bd82938d8cf64e3b libex_tokenizers-v0.4.0-nif-2.16-x86_64-apple-darwin.so.tar.gz
6578bcaf43c24c449997354ac0439b0be5e4a27bf18de13e3c33929d514fe129 libex_tokenizers-v0.4.0-nif-2.16-x86_64-unknown-linux-gnu.so.tar.gz
043a3b36e463ea354cae656347b01b1988cae416fa46014c7c39b118fa9b36d0 libex_tokenizers-v0.4.0-nif-2.16-x86_64-unknown-linux-musl.so.tar.gz
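Each line in the list pairs a SHA256 digest with the name of a precompiled archive. As a rough sketch, a downloaded archive could be checked against its entry using only the Erlang/Elixir standard library. The ChecksumCheck module and its function names are hypothetical and not part of the tokenizers package.

```elixir
defmodule ChecksumCheck do
  @moduledoc "Hypothetical helper for checking a file against a SHA256 list entry."

  # Returns the lowercase hex SHA256 digest of a binary.
  def sha256_hex(data) when is_binary(data) do
    :sha256 |> :crypto.hash(data) |> Base.encode16(case: :lower)
  end

  # Splits a "<digest> <filename>" line into a {digest, filename} tuple.
  def parse_line(line) do
    [hex, filename] = line |> String.trim() |> String.split(~r/\s+/, parts: 2)
    {String.downcase(hex), filename}
  end

  # Reads the whole file into memory and compares its digest with the expected one.
  def verify_file(path, expected_hex) do
    actual = path |> File.read!() |> sha256_hex()

    if actual == String.downcase(expected_hex) do
      :ok
    else
      {:error, {:checksum_mismatch, actual}}
    end
  end
end
```

For example, `ChecksumCheck.parse_line/1` applied to one line of the list yields the expected digest and file name, and `ChecksumCheck.verify_file/2` returns `:ok` only when the downloaded file hashes to that digest.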
v0.3.2
What's Changed
Full Changelog: v0.3.1...v0.3.2
Checksums
SHA256 list:
0beb6a4514b33bd830ffb323b43aaca52adace65c3cb41d0101b9ea7c97fbe92 ex_tokenizers-v0.3.2-nif-2.15-x86_64-pc-windows-gnu.dll.tar.gz
b34c34dba1b1531f88d3e9ab2538b525724cf4e337f34c8a412f4e8e2c662079 ex_tokenizers-v0.3.2-nif-2.15-x86_64-pc-windows-msvc.dll.tar.gz
16b2c9b49d4e07da1c0563b44a491e759dbccb869ad019e43f41743604740f37 ex_tokenizers-v0.3.2-nif-2.16-x86_64-pc-windows-gnu.dll.tar.gz
b9f0a888f930d19022849fc67087fa34888f4e1add4eb6d7927a78b31af0ab33 ex_tokenizers-v0.3.2-nif-2.16-x86_64-pc-windows-msvc.dll.tar.gz
1f24f3c83ff4c80b4ba359c856a5cf399713576da3a7358f322d653c728511c5 libex_tokenizers-v0.3.2-nif-2.15-aarch64-apple-darwin.so.tar.gz
9ac8d641873c7effe2b004c2714769bb1084816b8ca785155ec91c7d8bcdb414 libex_tokenizers-v0.3.2-nif-2.15-aarch64-unknown-linux-gnu.so.tar.gz
2ee6be226626c39a86aa14857256984b0ee04a8ea8d26ea044db381373bf25ab libex_tokenizers-v0.3.2-nif-2.15-aarch64-unknown-linux-musl.so.tar.gz
407cab208bd66b1aa244359daf90d76c06571e17707708fab3324614ca7cddb3 libex_tokenizers-v0.3.2-nif-2.15-arm-unknown-linux-gnueabihf.so.tar.gz
3cbae917a12934828d5c7b2f33a2e96cb49c01bc8d43dd95b1c8173efd615d44 libex_tokenizers-v0.3.2-nif-2.15-riscv64gc-unknown-linux-gnu.so.tar.gz
23246e6f4e40c3b456450c6ea9c8eaf3c2b4854e8e384c3d38dc387e7cb2490a libex_tokenizers-v0.3.2-nif-2.15-x86_64-apple-darwin.so.tar.gz
237573833e93a49bc0ed1ba88bec9eaee91e5090b1db218bfbc0272670fd6502 libex_tokenizers-v0.3.2-nif-2.15-x86_64-unknown-linux-gnu.so.tar.gz
a9d07c4c2b6336dfaeda7ff915d4ba1500ac0fc7e30bfc6a18cf46f1e660e83b libex_tokenizers-v0.3.2-nif-2.15-x86_64-unknown-linux-musl.so.tar.gz
ad4cf83c29299bd045898ddb86f0cdc1e39d7ca70a320d93e0b91179da707251 libex_tokenizers-v0.3.2-nif-2.16-aarch64-apple-darwin.so.tar.gz
c144c7e9bef8427bffac2b2744cebe7ca293454edcad003a6e37ce051e0778b7 libex_tokenizers-v0.3.2-nif-2.16-aarch64-unknown-linux-gnu.so.tar.gz
f8c6a0ffa01f6873649d01f71a17ad771934a3ee36249f11cdfef57ac564b4ee libex_tokenizers-v0.3.2-nif-2.16-aarch64-unknown-linux-musl.so.tar.gz
c3e4156c7b7007807064ecac3392dba696252c1838658f3fbe7c453d943d4c3b libex_tokenizers-v0.3.2-nif-2.16-arm-unknown-linux-gnueabihf.so.tar.gz
21d70308a4906397243f50eaefa2b0da2035047f82e5a973916d4fff7a38cacc libex_tokenizers-v0.3.2-nif-2.16-riscv64gc-unknown-linux-gnu.so.tar.gz
c458f5d75509edcb1822796777a1c679a9dcb45228d55b92436bed81bffe89f7 libex_tokenizers-v0.3.2-nif-2.16-x86_64-apple-darwin.so.tar.gz
6eaf8d44a29bc8674e236e3812c40a9682896e021c7fc5cc50b99535846f8864 libex_tokenizers-v0.3.2-nif-2.16-x86_64-unknown-linux-gnu.so.tar.gz
aad218c6de546475ab51767802357c9d50bbc9b8ff4197006b47e44b01f405b4 libex_tokenizers-v0.3.2-nif-2.16-x86_64-unknown-linux-musl.so.tar.gz
v0.3.1
Added
- Add binary variants for accessing encoding data. This way we can convert encoding data to tensors without additional allocations. The following functions were added:
  - get_u32_ids/1
  - get_u32_attention_mask/1
  - get_u32_type_ids/1
  - get_u32_special_tokens_mask/1
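To illustrate what a binary variant carries, the sketch below decodes a packed binary of unsigned 32-bit integers back into a list. The binary is built by hand as a stand-in for the result of get_u32_ids/1, the id values are made up, and native byte order is an assumption. In practice such a binary would more likely be handed straight to a tensor library (for instance Nx.from_binary/2 with the :u32 type) rather than unpacked like this.

```elixir
# Hand-built stand-in for the binary a get_u32_* function would return.
# Assumption: values are packed as unsigned 32-bit integers in native byte order.
ids_binary =
  <<101::unsigned-native-32, 8667::unsigned-native-32, 102::unsigned-native-32>>

# Each id occupies 4 bytes, so the token count follows from the byte size.
token_count = div(byte_size(ids_binary), 4)

# A binary comprehension unpacks the ids, 32 bits at a time.
ids = for <<id::unsigned-native-32 <- ids_binary>>, do: id

IO.inspect(token_count, label: "token count")
IO.inspect(ids, label: "ids")
```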
Pull requests
- Add binary variants for accessing encoding data by @jonatanklosko in #32
- Use updated GitHub Actions by @philss in #33
Full Changelog: v0.3.0...v0.3.1
Checksums
Here are the SHA256 checksums of each precompiled file:
d27eed0e8395f4065721f970a7a6f109d3f54d1b36c850d0a14844082ab90c8d ex_tokenizers-v0.3.1-nif-2.15-x86_64-pc-windows-gnu.dll.tar.gz
0f5981b8affa087be4384f0fc9de444271556662b289333748e7a4d25f2d00c7 ex_tokenizers-v0.3.1-nif-2.15-x86_64-pc-windows-msvc.dll.tar.gz
d110ebf8a1c7e43487eb9c224dae456e72596618c0c614eb617990a47f203804 ex_tokenizers-v0.3.1-nif-2.16-x86_64-pc-windows-gnu.dll.tar.gz
7d140750c9b927434b179ddf33efd8bb77428f17556ebd8d1d8d07212c19c76a ex_tokenizers-v0.3.1-nif-2.16-x86_64-pc-windows-msvc.dll.tar.gz
6f15a001864f3ad5911c9a75b178ead798ed66c2325fb825588bde3b36ea5c72 libex_tokenizers-v0.3.1-nif-2.15-aarch64-apple-darwin.so.tar.gz
5ba97d7fa56763b5d5235a77bfc26169baabb5aa3c7a04e2583f9b1ec92e84a3 libex_tokenizers-v0.3.1-nif-2.15-aarch64-unknown-linux-gnu.so.tar.gz
c86df80018eef4fd74174bc484401a71a06a50bacdd26478129a51d1750555f4 libex_tokenizers-v0.3.1-nif-2.15-aarch64-unknown-linux-musl.so.tar.gz
e5b7f21328f5c203fa9da3c140f10e34e034cc623f1d6c681193fd5b7724738a libex_tokenizers-v0.3.1-nif-2.15-arm-unknown-linux-gnueabihf.so.tar.gz
9de202cf8ee208ab6f43f7474e8614918fd2a434b725444ec1741ded4a8e78bc libex_tokenizers-v0.3.1-nif-2.15-riscv64gc-unknown-linux-gnu.so.tar.gz
58288eb84284a6c692df5789e3ca238f89a3f7858ddce33095c0a3cee47e2060 libex_tokenizers-v0.3.1-nif-2.15-x86_64-apple-darwin.so.tar.gz
64ce2cda9b2631968ba3aeee86574e62f2e33077cd8aa65137a700d6a03de6de libex_tokenizers-v0.3.1-nif-2.15-x86_64-unknown-linux-gnu.so.tar.gz
32334ac268196e708c79dea0d5ee4c4adbc59716693454fe5d88fb302b5bfd30 libex_tokenizers-v0.3.1-nif-2.15-x86_64-unknown-linux-musl.so.tar.gz
2fefb0ef7cc98438332a8effd7788b258f245f7f2e7e9dfe13c4f61b5ce23e01 libex_tokenizers-v0.3.1-nif-2.16-aarch64-apple-darwin.so.tar.gz
a0761c7e6dc035180ea42fb15d4a6ad957bd33e5ae363a4ef04ed444bc523bb8 libex_tokenizers-v0.3.1-nif-2.16-aarch64-unknown-linux-gnu.so.tar.gz
6eb0686cf9dc77df45dbd284d7b63be4b62798b2dd930b53cc7ca0723da883aa libex_tokenizers-v0.3.1-nif-2.16-aarch64-unknown-linux-musl.so.tar.gz
4aebed75ee0ad016877d8c64f1c841b25d9e473132dc7fa80842b01a3a42b8fc libex_tokenizers-v0.3.1-nif-2.16-arm-unknown-linux-gnueabihf.so.tar.gz
615099e92676e1470f9b88aff8aa00f21fee1c568c80df57c26a934b55d042db libex_tokenizers-v0.3.1-nif-2.16-riscv64gc-unknown-linux-gnu.so.tar.gz
0b3b366e06b87cc1252d90f66af73482d89e2fa4c1a7faad235ac342d83490e3 libex_tokenizers-v0.3.1-nif-2.16-x86_64-apple-darwin.so.tar.gz
b1dd8be7b57f1929ada728b1049e652b2c1ab5c100f766ae53dcd30093926a3b libex_tokenizers-v0.3.1-nif-2.16-x86_64-unknown-linux-gnu.so.tar.gz
a0d3ed11d8287b674279695dd9ea61f10f61e9661eaa5d5837b90bce6efed612 libex_tokenizers-v0.3.1-nif-2.16-x86_64-unknown-linux-musl.so.tar.gz
v0.3.0
Added
- Add option to use cache when downloading pretrained files. We check the ETag of the file before trying to download it. This introduces the :use_cache and :cache_dir options to the Tokenizers.from_pretrained/2 function.
- Support adding special tokens when creating a tokenizer. This allows a pretrained tokenizer to be loaded with additional special tokens. This change adds the :additional_special_tokens option to the Tokenizers.from_pretrained/2 function.
- Add support for the riscv64gc-unknown-linux-gnu target, which is useful for Nerves projects running on 64-bit RISC-V computers. This means that we are precompiling the project to run on those machines.
Changed
- Change minimum required version of Rustler Precompiled to ~> 0.6. With this, we have aarch64-unknown-linux-musl and riscv64gc-unknown-linux-gnu as default targets. However, we also drop support for NIF version 2.14.
Pull requests
- Add option to use cache when download pretrained files by @philss in #23
- Fix typos by @kianmeng in #24
- Update rustler_precompiled to v0.5.5 by @philss in #25
- Feature/add special tokens by @travismorton in #26
- Add riscv64gc-unknown-linux-gnu for RISC-V Nerves devices by @fhunleth in #27
- Simplify release workflow by @philss in #29
- Allow usage of castore 1.0 by @edennis in #30
- Updates to release v0.3.0 by @seanmor5 in #31
New Contributors
- @kianmeng made their first contribution in #24
- @travismorton made their first contribution in #26
- @fhunleth made their first contribution in #27
- @edennis made their first contribution in #30
Full Changelog: v0.2.0...v0.3.0
Official changelog: https://github.com/elixir-nx/tokenizers/blob/main/CHANGELOG.md
Checksums
Here is the list of SHA256 checksums of the precompiled files:
e73178ccbea2e63b7b86afcbcff1a01a10e2b69901f424e703e8a38ff74c1dcf ex_tokenizers-v0.3.0-nif-2.15-x86_64-pc-windows-gnu.dll.tar.gz
a562cac8feb8b3964860a461897be64d27a2a999f8cb237334f552ad1e14ff8a ex_tokenizers-v0.3.0-nif-2.15-x86_64-pc-windows-msvc.dll.tar.gz
da32956b0346021376fd14e2c484c1150b1ed197577d5c8f99193b3b1c815ae2 ex_tokenizers-v0.3.0-nif-2.16-x86_64-pc-windows-gnu.dll.tar.gz
aaf9fbd3ffbfced33e7871f3f036067156433fd935f0b42778f47982aeee4717 ex_tokenizers-v0.3.0-nif-2.16-x86_64-pc-windows-msvc.dll.tar.gz
9937b4a50fbd03e48483484b09aa95446a4b4a67eb07e74f480193eaac73087a libex_tokenizers-v0.3.0-nif-2.15-aarch64-apple-darwin.so.tar.gz
56038e1045c674c3a321a5bade5467c0cb599ede1a58df4fd1428f9eefe979b9 libex_tokenizers-v0.3.0-nif-2.15-aarch64-unknown-linux-gnu.so.tar.gz
bd1b27a026f3f8f5b0d60a7a2415bb19823fc5a726b173da24ca8fc1534e49f4 libex_tokenizers-v0.3.0-nif-2.15-aarch64-unknown-linux-musl.so.tar.gz
52e527a66d2806321c2297362fbac27429fb8e60f29386637c66ce78e8dadcf3 libex_tokenizers-v0.3.0-nif-2.15-arm-unknown-linux-gnueabihf.so.tar.gz
f43190147eafbc812607b61c9459a6c47aa526c492c13e2157cd613e382ba35e libex_tokenizers-v0.3.0-nif-2.15-riscv64gc-unknown-linux-gnu.so.tar.gz
36a6d6691a3b3fa6d56ddf37fa2f696ae79bf2b991c17ee7cccbb2b9dda1719b libex_tokenizers-v0.3.0-nif-2.15-x86_64-apple-darwin.so.tar.gz
f48f95b1a5373f75a555a78899916acb48d1c003795a02b0ee0001b7c92c8a51 libex_tokenizers-v0.3.0-nif-2.15-x86_64-unknown-linux-gnu.so.tar.gz
e79bac2f303dafdf7b8c7946c77a5f7be2e90db40f3102d1bdd3929d309fc949 libex_tokenizers-v0.3.0-nif-2.15-x86_64-unknown-linux-musl.so.tar.gz
5fdc9b12dcdc0eaf6ac7ba8a3608af33abe902299f81ec953fd7eac6de8477de libex_tokenizers-v0.3.0-nif-2.16-aarch64-apple-darwin.so.tar.gz
0e40ed777f14a41df64526b50224392723eabd83188fa79910c6baceca4c72f1 libex_tokenizers-v0.3.0-nif-2.16-aarch64-unknown-linux-gnu.so.tar.gz
287d5896f09562c25527105998f1468f97649fb7a8fec60a04ad53d113a5d9df libex_tokenizers-v0.3.0-nif-2.16-aarch64-unknown-linux-musl.so.tar.gz
dc8fdaa04935d32ab6a0cc00d06c1f3b747eb08eb09ab8b9d2143157c2de436e libex_tokenizers-v0.3.0-nif-2.16-arm-unknown-linux-gnueabihf.so.tar.gz
db79c2f522a4b39940d98550b29b47e9f8ba45c1d3f3a00a19278a3b977c65b7 libex_tokenizers-v0.3.0-nif-2.16-riscv64gc-unknown-linux-gnu.so.tar.gz
b343d1dd3e2467e4e54624bbd8ff9dcefe48dd23ddd59f8ab54e808cc19a5865 libex_tokenizers-v0.3.0-nif-2.16-x86_64-apple-darwin.so.tar.gz
3850d19c6b2e635e46475a0ee0cc6c3dc62b0325a8c232fdf1137cc98e48af9d libex_tokenizers-v0.3.0-nif-2.16-x86_64-unknown-linux-gnu.so.tar.gz
bdfeb7816dec218b04a024d2163f1fe7080eaca6e9b18823c3ac8f9cef3ecfbc libex_tokenizers-v0.3.0-nif-2.16-x86_64-unknown-linux-musl.so.tar.gz
v0.2.0
Adds a minimal http server to avoid problems with openssl
v0.1.2
v0.1.1
v0.1.0
First release.