
The input dimensions received by subsequent nodes in ensemble mode are incorrect #7383

Open
SeibertronSS opened this issue Jun 27, 2024 · 0 comments

Comments

I built an LLM inference topology consisting of preprocessing, inference, and postprocessing nodes. On each decoding step the inference node outputs only the latest token_id to the postprocessing node, but sometimes the postprocessing node receives many token_ids at once, for example:

[8908, 8908, 234, 8908, 8908, 234, 114, 8908, 8908, 234, 8908, 8908, 234, 114, 103081, 8908, 8908, 234, 8908, 8908, 234, 114, 8908, 8908, 234, 8908, 8908, 234, 114, 103081, 99662, 8908, 8908, 234, 8908, 8908, 234, 114, 8908, 8908, 234, 8908, 8908, 234, 114, 103081, 8908, 8908, 234, 8908, 8908, 234, 114, 8908, 8908, 234, 8908, 8908, 234, 114, 103081, 99662, 99808, 8908, 8908, 234, 8908, 8908, 234, 114, 8908, 8908, 234, 8908, 8908, 234, 114, 103081, 8908, 8908, 234, 8908, 8908, 234, 114, 8908, 8908, 234, 8908, 8908, 234, 114, 103081, 99662, 8908, 8908, 234, 8908, 8908, 234, 114, 8908, 8908, 234, 8908, 8908, 234, 114, 103081, 8908, 8908, 234, 8908, 8908, 234, 114, 8908, 8908, 234, 8908, 8908, 234, 114, 103081, 99662, 99808, 99219, 9909]

When I request the inference node alone, I never receive a similar response. The behavior looks like the payload memory is being duplicated: the dimension of the token_id tensor received by postprocessing doubles with each model iteration, so it eventually grows to billions of elements.
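To make the growth pattern concrete, here is a small self-contained Python sketch (the function names and the recurrence are my own guess at what is happening, not Triton code). It reproduces the shape of the list above: at each step the previous payload appears duplicated, followed by one new token, so the received dimension roughly doubles per iteration:

```python
# Hypothetical sketch of the suspected duplication (names are mine, not Triton's).
# Expected: the postprocessing node sees one new token_id per decoding step.
# Observed: the payload looks as if each step forwards the previous payload
# twice plus the newest token, so its length follows len_k = 2 * len_{k-1} + 1.

def expected_stream(token_ids):
    """What postprocessing should receive: one token_id per step."""
    return [[t] for t in token_ids]

def buggy_stream(token_ids):
    """Simulates the suspected duplication: the payload doubles every step."""
    payload, received = [], []
    for t in token_ids:
        payload = payload + payload + [t]  # previous payload twice + new token
        received.append(list(payload))
    return received

steps = buggy_stream([8908, 234, 114, 103081])
print([len(r) for r in steps])  # -> [1, 3, 7, 15], doubling each iteration
print(steps[-1][:15])           # matches the first 15 token_ids I received
```

With thirty-odd decoding steps this recurrence already exceeds a billion elements, which matches the blow-up I observe.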
