v2.7.0
Release 2.7.0
Major Features and Improvements
- Added a new tokenizer, FastWordpieceTokenizer, which is considerably faster than the original WordpieceTokenizer (see the usage sketch after this list)
- WhitespaceTokenizer was rewritten to increase speed and reduce kernel size
- Ability to convert WhitespaceTokenizer & FastWordpieceTokenizer to TF Lite
- Added Keras layers for tokenizers: UnicodeScript, Whitespace, & Wordpiece
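As a quick illustration of the new tokenizer, here is a minimal sketch using FastWordpieceTokenizer with a toy in-memory vocabulary; the vocab and sample sentence are made up for illustration, and real models would load a full WordPiece vocab.

```python
import tensorflow as tf
import tensorflow_text as tf_text

# Toy vocabulary for illustration only.
vocab = ["[UNK]", "the", "fast", "token", "##izer", "is", "here"]

# FastWordpieceTokenizer accepts an in-memory vocab list; with the 2.7.0
# defaults it runs pre-tokenization (whitespace/punctuation splitting) first.
tokenizer = tf_text.FastWordpieceTokenizer(vocab=vocab, token_out_type=tf.string)

tokens = tokenizer.tokenize(["the fast tokenizer is here"])
print(tokens)  # A tf.RaggedTensor of subword tokens.
```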
Bug Fixes and Other Changes
- (Generated change) Update tf.Text versions and/or docs.
- Tiny change to a variable name in the transformer tutorial
- Update nmt_with_attention.ipynb
- Add vocab_size to the WordPiece tokenizer for consistency with SentencePiece.
- General cleanup of the build files. The previous tf_deps paradigm was confusing; encapsulating everything into a single library call should make it easier to understand and follow.
- This adds the builder for the new WhitespaceTokenizer config cache. This is the first in a series of changes to update the WST for mobile.
- C++ API for new WhitespaceTokenizer. The updated API is more useful (accepts strings instead of ints), faster, and smaller in size.
- Adds pywrap for WhitespaceTokenizer config builder.
- Simplify configure.bzl. Since every platform builds with C++14, default to it across the board; this should be easier to understand and maintain.
- Remove most of the default oss deps for kernels as they are no longer required for building.
- Update the BERT tutorial to use model subclassing, which is easier for students to hack on.
- Adds kernels for TF & TFLite for the new WhitespaceTokenizer.
- Fix a problem with the WST template that was causing members to be exported as undefined symbols. After this change they become a unique global symbol in the shared object file.
- Update whitespace op to use new kernel. This change still allows for building the old kernel as well so current users can continue to use it, even though we cannot make new calls to it.
- Convert the TFLite kernel for ngram with STRING_JOIN mode to use tfshim so the same code is now used for TF and TFLite kernels.
- fix: masked_ids -> masked_lm_ids
- Save the transformer.
- Remove the sentencepiece patch in OSS
- Fix: the vocab_table arg was not used in bert_pretrain_preprocess().
- Disable TSAN for one more tutorial test that may run for >900 sec when TSAN is enabled.
- internal
- (Generated change) Update tf.Text versions and/or docs.
- Update deps to fix broken build.
- Remove --gen_report flag.
- Fix a small typo.
- Explain that all heads are handled with a single Dense layer
- Internal change; should be a no-op in GitHub.
- Create the TF Lite registrar and add TF Lite tests for mobile ops.
- Fix nmt_with_attention start_index
- Export LD_LIBRARY_PATH when configuring for build.
- Update the TF Lite test to use the function directly rather than globally sharing the linked library symbols so the interpreter can find the name, since that approach is only available on Linux.
- Temporarily switch to the definition of REGISTER_TF_OP_SHIM while it is updated.
- Update REGISTER_TF_OP_SHIM macro to remove unnecessary parameter.
- Remove temporary code and set back to using the op shim macro.
- Updated import statement
- Internal change
- Push back the forward compatibility date for tf_text.WhitespaceTokenizer.
- Add .gitignore
- The --keep_going flag makes bazel run all tests instead of stopping at the first failure.
- Add missing blank line between test and doctest.
- Adds a regression test for model server for the replaced WST op. This ensures that current models using the old kernel will continue to work.
- Fix the build by adding a new dependency required by TF to kernel targets.
- Add sentencepiece detokenize op to stateful allowlist.
- Fix broken build. This occurred because of a change on TF that updated the compiler infra version (tensorflow/tensorflow@e0940f2).
- Clean up code now that the build horizon has passed.
- Add pywrap dependency for tflite ops.
- Update TextVectorization layer
- Allow an overridden get_selectable to be used.
- Fix: masked_input_ids was not used in bert_pretrain_preprocess().
- Update word_embeddings.ipynb
- Fix a value where the training accuracy was shown instead of the validation accuracy.
- Mark old SP targets
- Create a single SELECT_TFTEXT_OPS for registering all of the TF Text ops with the TF Lite interpreter, and add a single build target for them.
- Add TF Lite op for RaggedTensorToTensor.
- Add a new guide for using select TF Text ops in TF Lite models for mobile (see the conversion sketch after this list).
- Switch FastWordpieceTokenizer to default to running pre-tokenization, and rename the end_to_end parameter to no_pretokenization. This should be a no-op; the flatbuffer is unchanged so as not to affect any models already using FastWordpieceTokenizer. Only the Python API is updated.
- Update version
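For the new TF Lite path (the convertible WhitespaceTokenizer, the TF Lite registrar, and the mobile guide mentioned above), here is a rough conversion sketch modeled on the new guide; the registrar attribute (tf_text.tflite_registrar.SELECT_TFTEXT_OPS) and the InterpreterWithCustomOps import path are assumptions to verify against your install.

```python
import tensorflow as tf
import tensorflow_text as tf_text
from tensorflow.lite.python import interpreter as interpreter_lib

# A minimal model wrapping the new, TF Lite-convertible WhitespaceTokenizer.
class TokenizerModel(tf.keras.Model):
  def __init__(self, **kwargs):
    super().__init__(**kwargs)
    self.tokenizer = tf_text.WhitespaceTokenizer()

  @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.string)])
  def call(self, inputs):
    return {"tokens": self.tokenizer.tokenize(inputs).flat_values}

model = TokenizerModel()

# Convert, allowing custom ops so the TF Text kernels are preserved.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
converter.allow_custom_ops = True
tflite_model = converter.convert()

# Register the TF Text ops with the interpreter via the bundled registrar.
interp = interpreter_lib.InterpreterWithCustomOps(
    model_content=tflite_model,
    custom_op_registerers=tf_text.tflite_registrar.SELECT_TFTEXT_OPS)
interp.allocate_tensors()
```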
Thanks to our Contributors
This release contains contributions from many people at Google, as well as:
Aaron Siddhartha Mondal, Abhijeet Manhas, Dominik Schlösser, jaymessina3, Mao, Xiaoquan Kong, Yasir Modak, Olivier Bacs, Tharaka De Silva