Inference server, lots of related changes #42
Conversation
238ab22 -> mapped_tokens are retrieved from HF's added_tokens (special_tokens_map.json). TODO: …xample, newline replace
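As a loose illustration of retrieving such tokens from a Hugging Face `special_tokens_map.json` (a hypothetical helper sketched here, not the actual code from 238ab22):

```python
import json
from pathlib import Path

def load_mapped_tokens(model_dir: str) -> dict[str, str]:
    # special_tokens_map.json ships with HF models and maps roles
    # (bos_token, eos_token, ...) to plain strings or {"content": ...} dicts
    path = Path(model_dir) / "special_tokens_map.json"
    special_tokens = json.loads(path.read_text())
    mapped_tokens = {}
    for role, token in special_tokens.items():
        if isinstance(token, list):  # e.g. additional_special_tokens; skipped here
            continue
        mapped_tokens[role] = token["content"] if isinstance(token, dict) else token
    return mapped_tokens
```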
We can probably merge this. The server itself works. It needs some improvement (GPU/memory model management, error handling, etc.), but all that can be added iteratively.
d2fd18f aligns the behaviour of converted and trained models: transforms_configs of a trained model are adapted to facilitate loading of the corresponding artifacts.
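The diff itself is not shown here, but one plausible shape of such an adaptation, assuming artifact paths in `transforms_configs` are rewritten to point inside the model directory (the `_path` key convention is an assumption, not taken from the commit):

```python
from pathlib import Path

def adapt_transforms_configs(transforms_configs: dict, model_dir: str) -> dict:
    # Rewrite artifact paths (e.g. subword models) so a trained model's
    # transforms load the same way a converted model's do.
    adapted = {}
    for name, config in transforms_configs.items():
        adapted[name] = {
            key: str(Path(model_dir) / Path(value).name)
            if isinstance(value, str) and key.endswith("_path")
            else value
            for key, value in config.items()
        }
    return adapted
```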
fe8e8d7 -> when calling the …
This is a very first draft of a simple FastAPI-based inference server. Not much yet, but it will be a first base to iterate on.
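As a rough sketch of what such a FastAPI-based server could look like (the endpoint name, request fields, and placeholder body are all assumptions, not the PR's actual API):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InferenceRequest(BaseModel):
    # decoding settings mirror the renamed flags described below
    inputs: list[str]
    temperature: float = 1.0
    top_k: int = 0
    top_p: float = 0.0

class InferenceResponse(BaseModel):
    outputs: list[str]

@app.post("/infer", response_model=InferenceResponse)
def infer(request: InferenceRequest) -> InferenceResponse:
    # placeholder: a real server would run predictions with a loaded model here
    return InferenceResponse(outputs=request.inputs)
```

Run with `uvicorn server:app` (assuming the file is named `server.py`).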
Key concepts/changes

- `transforms` and `transforms_configs` are saved in an `inference.json` config file within the model directory, for transparent loading + tentative adaptation of `convert_HF` to grab everything transparently;
- introduction of `DecodingConfig`;
- renaming of some settings (`random_sampling_topk/p` -> `top_k/p`, `random_sampling_temp` -> `temperature`) and homogenization across the code (see the sketch after this list);
- dropping the `gpu` flag in `PredictConfig`, a duplicate of `world_size`/`gpu_ranks` (might still be improved though).
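To make the renamings above concrete, a minimal sketch of how the grouped decoding settings and the saved `inference.json` might look (the pydantic model, field defaults, and file structure are assumptions for illustration):

```python
import json
from pydantic import BaseModel

class DecodingConfig(BaseModel):
    # renamed settings from the list above; defaults here are assumptions
    top_k: int = 0            # formerly random_sampling_topk
    top_p: float = 0.0        # formerly random_sampling_topp
    temperature: float = 1.0  # formerly random_sampling_temp

# what an inference.json saved in the model directory might contain
inference_config = {
    "transforms": ["sentencepiece"],
    "transforms_configs": {"sentencepiece": {"src_subword_model": "spm.model"}},
    **DecodingConfig().model_dump(),
}

with open("inference.json", "w") as f:
    json.dump(inference_config, f, indent=2)
```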
Some short-term TODOs

Some nice-to-haves