Is your feature request related to a problem? Please describe.
Inference results are not recorded in the log as a single entry, so it is difficult to compare or search the history of inference times for each model. Currently, the trace output is spread across many categories (e.g., HTTP_RECV_START, INFER_RESPONSE_COMPLETE, HTTP_SEND_END), and extra post-processing (such as JSON parsing; see the sketch below) is required to calculate the total time.
The --log-verbose option also produces far too many log lines for a single request.
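For illustration, here is a minimal sketch of the post-processing currently needed to recover a per-request total time from a trace file. It assumes the trace was written as a JSON array of per-request objects, each with an id and a timestamps list of {"name", "ns"} entries; the file name and the exact schema are assumptions based on how Triton's trace summary tooling reads traces, so adjust for your setup.

```python
import json

# Hypothetical trace file name; the actual path depends on --trace-file.
TRACE_FILE = "trace.json"

def total_latency_ms(trace):
    """Compute end-to-end latency from HTTP_RECV_START to HTTP_SEND_END.

    Assumes each trace entry carries a "timestamps" list of
    {"name": ..., "ns": ...} objects; adjust if your schema differs.
    """
    ts = {t["name"]: t["ns"] for t in trace.get("timestamps", [])}
    if "HTTP_RECV_START" in ts and "HTTP_SEND_END" in ts:
        return (ts["HTTP_SEND_END"] - ts["HTTP_RECV_START"]) / 1e6
    return None

with open(TRACE_FILE) as f:
    traces = json.load(f)  # expected: a JSON array of per-request traces

for trace in traces:
    latency = total_latency_ms(trace)
    if latency is not None:
        print(f"id={trace.get('id')} total_time_ms={latency:.2f}")
```

Even this simple calculation has to be re-derived by every user, which is the pain point described above.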
Describe the solution you'd like
I would like each request ID to produce a single log line containing input tokens / output tokens / inference time / output text, similar to what vLLM and TGI emit.
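For illustration, a hypothetical one-line format (every field name here is invented for this example; it is not an existing Triton, vLLM, or TGI output):

```
INFO request_id=42 model=my-model input_tokens=128 output_tokens=256 inference_time_ms=812.4 output_text="..."
```

Something like this would make grep-based comparison of per-model latencies trivial.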
Additional context