Update tokenizers to 0.15.0 #55
a56bd16 to e228f7d
LGTM. If you want to add the option, that would be great!
Okay @jonatanklosko, exposed the option <3
🐑
let byte_fallback = match options
    .iter()
    .find(|opt| matches!(opt, UnigramOption::ByteFallback(_)))
`iter.find` on each opt is … Probably it's better to find another solution.
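For reference, here is a minimal sketch of the find-based lookup under discussion, completed with the `_ => None` arm that appears further down the diff. The `UnigramOption` enum, including the `UnkId` variant and both payload types, is a hypothetical stand-in for the PR's type. It shows the cost being questioned: every option extracted this way needs its own linear scan over `options`.

```rust
// Hypothetical stand-in for the PR's `UnigramOption`; the variants and
// payload types are assumptions, not copied from ex_tokenizers.
#[derive(Debug)]
pub enum UnigramOption {
    UnkId(usize),
    ByteFallback(bool),
}

/// Returns the first `ByteFallback` value, scanning `options` once.
pub fn find_byte_fallback(options: &[UnigramOption]) -> Option<bool> {
    match options
        .iter()
        .find(|opt| matches!(opt, UnigramOption::ByteFallback(_)))
    {
        // `find` yields `Option<&UnigramOption>`, so `flag` is a `&bool`.
        Some(UnigramOption::ByteFallback(flag)) => Some(*flag),
        // No `ByteFallback` option was passed.
        _ => None,
    }
}

/// Same pattern for `UnkId`: a second, independent scan of `options`.
pub fn find_unk_id(options: &[UnigramOption]) -> Option<usize> {
    match options
        .iter()
        .find(|opt| matches!(opt, UnigramOption::UnkId(_)))
    {
        Some(UnigramOption::UnkId(id)) => Some(*id),
        _ => None,
    }
}
```

With only two options, this amounts to two short scans.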
With two options it's fine, but we could also do the same as here:
tokenizers/native/ex_tokenizers/src/models.rs
Lines 116 to 125 in 170ceac
struct Opts {
    prefix: Option<String>,
}

// Default values
let mut opts = Opts { prefix: None };

options.into_iter().for_each(|option| match option {
    ModelSaveOption::Prefix(prefix) => opts.prefix = Some(prefix),
});
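Applied to the Unigram options, the `Opts` pattern might look like the sketch below. It assumes hypothetical `UnkId(usize)` and `ByteFallback(bool)` variants and a default of `byte_fallback: false`; none of these come from the PR. All options are resolved in a single pass, and because the `match` is exhaustive, adding a new variant later forces this code to handle it.

```rust
// Hypothetical option enum; variants and payloads are illustrative.
#[derive(Debug)]
pub enum UnigramOption {
    UnkId(usize),
    ByteFallback(bool),
}

/// Resolved settings; field names and defaults are assumptions.
#[derive(Debug, PartialEq)]
pub struct UnigramOpts {
    pub unk_id: Option<usize>,
    pub byte_fallback: bool,
}

pub fn resolve_options(options: Vec<UnigramOption>) -> UnigramOpts {
    // Default values, overwritten by any option that is present.
    let mut opts = UnigramOpts {
        unk_id: None,
        byte_fallback: false,
    };

    // Single pass: each option updates exactly one field.
    options.into_iter().for_each(|option| match option {
        UnigramOption::UnkId(id) => opts.unk_id = Some(id),
        UnigramOption::ByteFallback(flag) => opts.byte_fallback = flag,
    });

    opts
}
```

One behavioural difference from the `find` version: if the same option is passed twice, the last occurrence wins here, whereas `find` returns the first.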
Yeah, I mean we're talking about an incredibly negligible bit of performance, but mutating an opts struct is probably more performant :).
    _ => None,
};
let byte_fallback = match options
Are we matching the same thing twice (here and inside `matches!`)? Is that optimised away by the compiler?
Nah, the outer match is on the element that `find` returns, while the inner `matches!` runs on each element.
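To make the distinction concrete, this sketch (again with a hypothetical `UnigramOption`) counts how often the `find` predicate runs. The inner `matches!` is evaluated once per element until the first hit. The outer `match` runs once, on the `Option` returned by `find`, and re-checks the variant of that single element. Whether the optimiser removes that one redundant check is not something this sketch shows.

```rust
// Hypothetical stand-in for the PR's option enum.
#[derive(Debug)]
pub enum UnigramOption {
    UnkId(usize),
    ByteFallback(bool),
}

/// Returns the byte-fallback flag together with the number of times the
/// `find` predicate (the inner `matches!`) was evaluated.
pub fn byte_fallback_with_count(options: &[UnigramOption]) -> (Option<bool>, usize) {
    let mut predicate_calls = 0;
    let flag = match options.iter().find(|opt| {
        // Inner match: runs once per element, stopping at the first hit.
        predicate_calls += 1;
        matches!(opt, UnigramOption::ByteFallback(_))
    }) {
        // Outer match: runs exactly once, on the `Option` returned by `find`.
        Some(UnigramOption::ByteFallback(flag)) => Some(*flag),
        _ => None,
    };
    (flag, predicate_calls)
}
```

For a slice like `[UnkId(0), ByteFallback(false), UnkId(1)]`, the predicate runs twice and the outer `match` once.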
@@ -125,7 +125,7 @@ fn apply_load_options(mut tokenizer: ExTokenizerImpl, options: Vec<LoadOption>)
 }
 if opts.disable_truncation {
-    tokenizer.with_truncation(None);
+    tokenizer.with_truncation(None).unwrap();
Has the API changed to return an error? Why? I'm not sure `unwrap` handles it properly (maybe it does).
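The added `.unwrap()` indicates that `with_truncation` now returns a `Result`. The sketch below uses a hypothetical `MockTokenizer` rather than the real tokenizers type, and its validation rule (stride must be smaller than `max_length`) is an assumption made for illustration. It contrasts `unwrap`, which panics on `Err`, with propagating the error through `?` so the caller decides how to report it.

```rust
/// Hypothetical truncation settings; field names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MockTruncation {
    pub max_length: usize,
    pub stride: usize,
}

#[derive(Debug, PartialEq)]
pub struct TruncationError(pub String);

/// Hypothetical stand-in for the real tokenizer type.
#[derive(Debug, Default)]
pub struct MockTokenizer {
    pub truncation: Option<MockTruncation>,
}

impl MockTokenizer {
    /// Fallible setter returning `&mut Self` on success. The validation rule
    /// (stride < max_length) is assumed for illustration only.
    pub fn with_truncation(
        &mut self,
        trunc: Option<MockTruncation>,
    ) -> Result<&mut Self, TruncationError> {
        if let Some(t) = trunc {
            if t.stride >= t.max_length {
                return Err(TruncationError(format!(
                    "stride {} must be smaller than max_length {}",
                    t.stride, t.max_length
                )));
            }
        }
        self.truncation = trunc;
        Ok(self)
    }
}

/// Disables truncation, propagating any error with `?` instead of
/// panicking the way `.unwrap()` would.
pub fn disable_truncation(tokenizer: &mut MockTokenizer) -> Result<(), TruncationError> {
    tokenizer.with_truncation(None)?;
    Ok(())
}
```

In this mock, passing `None` can never fail, so `unwrap` would never panic on that path. Propagating with `?` only matters if the real method can fail for the arguments actually passed.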
Does what it says on the tin. This gets us various improvements and bug fixes, as detailed in the release notes.