I built an LLM inference topology with preprocessing, inference, and postprocessing nodes. On each iteration the inference node is supposed to output only the latest token_id to the postprocessing node, but sometimes the postprocessing node receives a large batch of token_ids at once, for example:
[8908, 8908, 234, 8908, 8908, 234, 114, 8908, 8908, 234, 8908, 8908, 234, 114, 103081, 8908, 8908, 234, 8908, 8908, 234, 114, 8908, 8908, 234, 8908, 8908, 234, 114, 103081, 99662, 8908, 8908, 234, 8908, 8908, 234, 114, 8908, 8908, 234, 8908, 8908, 234, 114, 103081, 8908, 8908, 234, 8908, 8908, 234, 114, 8908, 8908, 234, 8908, 8908, 234, 114, 103081, 99662, 99808, 8908, 8908, 234, 8908, 8908, 234, 114, 8908, 8908, 234, 8908, 8908, 234, 114, 103081, 8908, 8908, 234, 8908, 8908, 234, 114, 8908, 8908, 234, 8908, 8908, 234, 114, 103081, 99662, 8908, 8908, 234, 8908, 8908, 234, 114, 8908, 8908, 234, 8908, 8908, 234, 114, 103081, 8908, 8908, 234, 8908, 8908, 234, 114, 8908, 8908, 234, 8908, 8908, 234, 114, 103081, 99662, 99808, 99219, 9909]
When I request the inference node on its own, I never see a response like this. The behavior looks like memory duplication: the number of token_ids received by postprocessing roughly doubles on every model iteration, so it would eventually grow into a token_id buffer with billions of elements.
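For what it's worth, the payload above has an exact structure: each iteration's output is the previous output concatenated with itself, followed by one genuinely new token (S_n = S_{n-1} + S_{n-1} + [t_n], giving lengths 1, 3, 7, 15, ... = 2^n - 1). A minimal, framework-free Python sketch (all names hypothetical, just to illustrate the suspected duplication) reproduces this growth:

```python
def simulate(tokens, buggy):
    """Simulate what the postprocessing node receives on each iteration."""
    buffer, payloads = [], []
    for t in tokens:
        if buggy:
            # Suspected bug: the output buffer is concatenated with itself
            # (e.g. a shared/duplicated memory region that is never cleared)
            # before the new token is appended, so the payload roughly
            # doubles on every step: len = 2^n - 1.
            buffer = buffer + buffer + [t]
        else:
            # Intended behavior: send only the newest token_id.
            buffer = [t]
        payloads.append(list(buffer))
    return payloads

# First few "new" tokens taken from the payload above.
for step, p in enumerate(simulate([8908, 234, 114, 103081, 99662], buggy=True), 1):
    print(step, len(p), p)
# Lengths: 1, 3, 7, 15, 31 — and the step-3 payload
# [8908, 8908, 234, 8908, 8908, 234, 114] matches the prefix of the
# list received by the postprocessing node.
```

If this reading is right, it suggests the inference node's output tensor (or the buffer between the two nodes) is being appended to itself rather than reset between iterations.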