Commit

Used wrong default clean fn in SimpleTokenizer, put lower case back
rwightman committed Oct 12, 2023
1 parent 2f568cd commit 2c396d2
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/open_clip/tokenizer.py
@@ -143,7 +143,7 @@ def __init__(
         if canonicalize:
             self.clean_fn = _canonicalize_basic_clean
         else:
-            self.clean_fn = _whitespace_basic_clean
+            self.clean_fn = _lower_whitespace_basic_clean
         self.vocab_size = len(self.encoder)
         self.all_special_ids = [self.encoder[t] for t in special_tokens]
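
The effect of this one-line change can be illustrated with a minimal sketch. The diff does not show the bodies of `_whitespace_basic_clean` or `_lower_whitespace_basic_clean`, so the functions below are hypothetical stand-ins written only from what the names and the commit message suggest: both normalize whitespace, and only the second lower-cases the text. The real helpers in `tokenizer.py` may do more than this.

```python
import html
import re


def basic_clean_sketch(text: str) -> str:
    # Hypothetical stand-in: unescape HTML entities and trim the ends.
    return html.unescape(html.unescape(text)).strip()


def whitespace_clean_sketch(text: str) -> str:
    # Collapse every run of whitespace into a single space, then trim.
    return re.sub(r"\s+", " ", text).strip()


def whitespace_basic_clean_sketch(text: str) -> str:
    # Assumed behavior of the old default: whitespace cleanup, case preserved.
    return whitespace_clean_sketch(basic_clean_sketch(text))


def lower_whitespace_basic_clean_sketch(text: str) -> str:
    # Assumed behavior of the restored default: same cleanup, then lower-case.
    return whitespace_clean_sketch(basic_clean_sketch(text)).lower()


sample = "  A Photo of a  DOG &amp; a Cat\n"
without_lower = whitespace_basic_clean_sketch(sample)
with_lower = lower_whitespace_basic_clean_sketch(sample)
print(repr(without_lower))
print(repr(with_lower))
```

Under these assumptions, the two defaults return different strings for any input containing upper-case characters, so the tokenizer would look up different pieces in its vocabulary depending on which clean function is active. That is consistent with the commit message describing the non-lower-casing default as the wrong one.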
