generated from caikit/caikit-template
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Embeddings fix for truncation without room for begin/end and for batch truncation

* Attempting truncate_input_tokens=2 (or 1) caused a strange error (or misbehavior), because meaningful results need at least 3 tokens: [CLS] TOK [SEP].
* The truncate value now generally means the number of tokens, not including begin/end.
* At the maximum, the 2 special tokens are allowed to consume 2 from the limit.
* Batch embedding processing returned odd/misordered results when combined with truncation. Added a repeated tokenize() call to avoid sending the overflow tokens as features to be processed.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
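The truncation rules in the commit message can be illustrated with a small sketch. This is not the code from the commit: `effective_max_length`, `encode_truncated`, `encode_batch`, and the `model_max_length` parameter are hypothetical names, and tokens are plain strings rather than real tokenizer output. It assumes the limit is applied as `min(truncate_input_tokens + 2, model_max_length)` and that each input yields exactly one output row, with overflow tokens dropped.

```python
from typing import List

# Hypothetical begin/end markers, standing in for a real tokenizer's special tokens.
CLS, SEP = "[CLS]", "[SEP]"
NUM_SPECIAL = 2  # one begin token plus one end token


def effective_max_length(truncate_input_tokens: int, model_max_length: int) -> int:
    """Total allowed length, counting content tokens and both special tokens.

    Assumption: the truncate value counts content tokens only, so 2 is added
    for [CLS]/[SEP]; at the model maximum the special tokens instead consume
    2 from the limit.
    """
    if truncate_input_tokens < 1:
        raise ValueError("this sketch requires truncate_input_tokens >= 1")
    return min(truncate_input_tokens + NUM_SPECIAL, model_max_length)


def encode_truncated(
    tokens: List[str], truncate_input_tokens: int, model_max_length: int
) -> List[str]:
    """Keep only the content tokens that fit, then wrap them in [CLS] ... [SEP]."""
    total = effective_max_length(truncate_input_tokens, model_max_length)
    content = tokens[: total - NUM_SPECIAL]  # overflow tokens are dropped here
    return [CLS, *content, SEP]


def encode_batch(
    batch: List[List[str]], truncate_input_tokens: int, model_max_length: int
) -> List[List[str]]:
    """One output row per input, in input order; overflow never becomes extra rows."""
    return [encode_truncated(t, truncate_input_tokens, model_max_length) for t in batch]
```

With these assumptions, a truncate value of 1 still leaves room for one content token between the two markers, and a batch of N inputs always produces N rows in the original order.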
Showing 2 changed files with 90 additions and 20 deletions.