Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add back 24.05 response sending path to fix performance #381

Merged
merged 14 commits into from
Oct 8, 2024

Conversation

krishung5
Copy link
Contributor

@krishung5 krishung5 commented Oct 2, 2024

(Continuing from #378)
This PR restores the response sending path where the response batch is used to send responses from the stub to the backend, which was the method used in 24.05. In this PR, the method was changed to use response_sender.Send() to send responses back to the backend from execute(), with the goal of consolidating the two methods. However, we are observing performance regressions due to an extra IPC message in the response_sender.Send() path. Since the stub must send the response_batch to the backend regardless, the response returned from execute() can simply be appended to the response_batch, avoiding the extra IPC round.

Results:
Before

"infbench_summary": {
            "Requests": 1000,
            "Requests/s": 38.98603983854408,
            "E2E Latency(s)": 25.650207205999322,
            "Avg Latency(s)": 22.21363853594903,
            "Total Input Tokens": 127000,
            "Total Output Tokens": 128000,
            "Avg. Input Tokens / Request": 127.0,
            "Avg. Output Tokens / Request": 128.0,
            "Stddev Output Tokens / Request": 0.0,
            "Input Tokens/s": 4951.227059495098,
            "Output Tokens/s": 4990.213099333642,
            "Request Issues/s": 2361.583116145965,
            "Avg TTFT(s)": 3.2621611083069912,
            "Avg TPOT(s)": 0.14922423171371685
        }

After

"infbench_summary": {
            "Requests": 1000,
            "Requests/s": 44.054850715509474,
            "E2E Latency(s)": 22.69897602099809,
            "Avg Latency(s)": 21.284687425723977,
            "Total Input Tokens": 127000,
            "Total Output Tokens": 127950,
            "Avg. Input Tokens / Request": 127.0,
            "Avg. Output Tokens / Request": 127.95,
            "Stddev Output Tokens / Request": 0.5191338940966963,
            "Input Tokens/s": 5594.966040869703,
            "Output Tokens/s": 5636.818149049437,
            "Request Issues/s": 2459.541871853629,
            "Avg TTFT(s)": 3.136820318606013,
            "Avg TPOT(s)": 0.14295287205291818,
            "empty_chunks": 50
        }

24.05

"infbench_summary": {
            "Requests": 1000,
            "Requests/s": 42.78907614953764,
            "E2E Latency(s)": 23.37045082500117,
            "Avg Latency(s)": 21.546290700555048,
            "Total Input Tokens": 127000,
            "Total Output Tokens": 128000,
            "Avg. Input Tokens / Request": 127.0,
            "Avg. Output Tokens / Request": 128.0,
            "Stddev Output Tokens / Request": 0.0,
            "Input Tokens/s": 5434.212670991281,
            "Output Tokens/s": 5477.001747140818,
            "Request Issues/s": 2550.7080439825672,
            "Avg TTFT(s)": 3.256834254478021,
            "Avg TPOT(s)": 0.14401146807934667
        }

@krishung5 krishung5 marked this pull request as ready for review October 4, 2024 00:43
@krishung5 krishung5 requested review from kthui and Tabrizian October 4, 2024 00:56
src/python_be.cc Outdated Show resolved Hide resolved
Copy link
Member

@Tabrizian Tabrizian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kris 🙏

src/python_be.cc Outdated Show resolved Hide resolved
src/response_sender.cc Show resolved Hide resolved
src/pb_stub.cc Outdated Show resolved Hide resolved
@krishung5 krishung5 requested a review from Tabrizian October 4, 2024 19:39
@alexmsettle
Copy link

Is there anything holding up this PR? Looks like it should be ready to merge. We need this to intercept the 0.14 release of TRT-LLM. Code freeze just started today.

@krishung5 krishung5 merged commit a2564ea into main Oct 8, 2024
3 checks passed
krishung5 added a commit that referenced this pull request Oct 9, 2024
* Add back 24.05 response sender path

* Improve perf

* Fix cleanup

* Review comments

* Fix up

* Fix up

* Fix response factory cleanup

* Fix segfault

* Fix error handling

* Remove extra logs

* Fix up, add comments

* Address comment

* Fix up

---------

Co-authored-by: Iman Tabrizian <itabrizian@nvidia.com>
pvijayakrish pushed a commit that referenced this pull request Oct 15, 2024
* Add back 24.05 response sender path

* Improve perf

* Fix cleanup

* Review comments

* Fix up

* Fix up

* Fix response factory cleanup

* Fix segfault

* Fix error handling

* Remove extra logs

* Fix up, add comments

* Address comment

* Fix up

---------

Co-authored-by: Iman Tabrizian <itabrizian@nvidia.com>
@krishung5 krishung5 deleted the krish-fix-perf branch October 22, 2024 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants