PoC: add chat template heuristics #1283
base: concedo_experimental
Conversation
The fallback chat template adapter of Vicuna is not ideal in some cases (e.g. a test against a sub-portion of the BBC news classification task on Kaggle gave 82% accuracy with Vicuna and 88% with the official ChatML format for a q4_k_m Qwen 2.5 3B-Instruct gguf). This PR adds a simple proof-of-concept heuristic which looks at the chat template and upgrades the adapter when it is able to. An alternative approach would be to expose the llama chat template mechanism and use that.
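As a rough illustration of the idea (a minimal sketch, not the PR's actual code; the function name and the exact marker strings checked are assumptions), the heuristic boils down to substring checks against the model's embedded template:

```python
def guess_adapter_from_template(chat_template: str):
    """Hypothetical helper: return an adapter name when the GGUF's
    embedded Jinja chat template matches a known format, else None."""
    # ChatML (used by e.g. Qwen 2.5 Instruct) wraps turns in <|im_start|>/<|im_end|>
    if "<|im_start|>" in chat_template:
        return "ChatML"
    # Llama-2/Mistral-style templates wrap user turns in [INST] ... [/INST]
    if "[INST]" in chat_template:
        return "Mistral"
    return None  # unrecognized: keep the existing fallback adapter
```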
The problem is that the chat template cannot be trusted. It is set by unknown third parties, and is very often straight up incorrect or misleading - that's the whole reason why I didn't use the Jinja template to begin with. I'm alright with frontends using the … The reason why the default is Alpaca (not Vicuna, in fact) has to do with the fact that …
I see. When I was evaluating a bunch of different models against a test, I was completely unaware that they were all using Alpaca (sorry, I mixed them up), and when I switched, there was a significant increase in accuracy across the board. I understand not wanting to trust third parties willy-nilly. Perhaps there could be a flag that lets users choose whether or not they want to trust it?

Edit: We can also add built-in guard rails to deal with fuckery like unknown tokens, i.e. for every chat template profile, we include a list of tokens that must be present in the tokenizer. If any are missing, the chat template is not adopted.
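A minimal sketch of that guard rail (the names `REQUIRED_TOKENS` and `template_is_safe` are illustrative, not anything in this PR):

```python
# Tokens each profile needs before its template is trusted (illustrative).
REQUIRED_TOKENS = {
    "ChatML": ["<|im_start|>", "<|im_end|>"],
    "Llama-2": ["[INST]", "[/INST]"],
}

def template_is_safe(profile: str, tokenizer_vocab: set) -> bool:
    """Adopt a chat template profile only when every required token
    is actually present in the model's tokenizer vocabulary."""
    return all(tok in tokenizer_vocab for tok in REQUIRED_TOKENS.get(profile, []))
```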
Well, I think there's a nice way to do it. What I can suggest instead is that we create a new dummy adapter file called AutoGuess.json, which the user can select (they can also simply type AutoGuess). Then the heuristics will apply.
Sounds good.
I added two new prints on startup, which come after the text model load.
(force-pushed from 9de98ca to b45380f)
Giving this some thought, I think this would do well as a JSON file mapping a search-string array to a chat template name plus params, with a for loop in the code that checks each entry; see the sketch below. We could probably even use the AutoGuess.json file for this.
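Something along these lines, perhaps (a sketch assuming the file holds a list of search/name/adapter entries as adopted below; whether every search string or just one must match is an open design choice, "all" is shown here):

```python
import json

def match_adapter(chat_template: str, path: str = "AutoGuess.json"):
    """Hypothetical lookup: scan the JSON entries and adopt the first
    one whose search strings all appear in the chat template."""
    with open(path, "r", encoding="utf-8") as f:
        entries = json.load(f)
    for entry in entries:
        if all(s in chat_template for s in entry["search"]):
            return entry["name"], entry["adapter"]
    return None  # no match: fall back to the default adapter
```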
Better! Any adapter JSON file can now be a list of dicts with search, name, and adapter keys, which will be searched. Not sure how useful that is, but hey. This seems like a pretty seamless integration.
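For concreteness, one entry in such a file might look like this (the search/name/adapter keys follow the comment above; the field names inside adapter are illustrative of a kcpp-style instruct adapter, not necessarily the PR's exact schema):

```json
[
  {
    "search": ["<|im_start|>"],
    "name": "ChatML",
    "adapter": {
      "system_start": "<|im_start|>system\n",
      "system_end": "<|im_end|>\n",
      "user_start": "<|im_start|>user\n",
      "user_end": "<|im_end|>\n",
      "assistant_start": "<|im_start|>assistant\n",
      "assistant_end": "<|im_end|>\n"
    }
  }
]
```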