Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GPU] Fix passing of key-value store handle from client to compiler. #19237

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sergachev
Copy link
Contributor

No description provided.

Copy link
Member

@PatriosTheGreat PatriosTheGreat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix!

Just our of curiosity, which PJRT client you are using that goes directly to PjRtStreamExecutorClient::Compile?

I recently added the same fix for JAX to PyClient::CompileIfrtProgram: f548642

I'm curious if we need to keep only one of these fixes.

copybara-service bot pushed a commit that referenced this pull request Nov 12, 2024
… compiler.

Imported from GitHub PR #19237

Copybara import of the project:

--
27a2ca3 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19237 from openxla:fix_kv_store 27a2ca3
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 12, 2024
… compiler.

Imported from GitHub PR openxla/xla#19237

Copybara import of the project:

--
27a2ca351957b8c5f5ab2b9105c6a897bc5f7957 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19237 from openxla:fix_kv_store 27a2ca351957b8c5f5ab2b9105c6a897bc5f7957
PiperOrigin-RevId: 695629316
@sergachev
Copy link
Contributor Author

sergachev commented Nov 12, 2024

Just our of curiosity, which PJRT client you are using that goes directly to PjRtStreamExecutorClient::Compile?

JAX uses this in the reproducer you provided for example. Not directly, but eventually it goes also through this method. Without this fix any attempt to use sharded autotuning fails at head.

copybara-service bot pushed a commit that referenced this pull request Nov 12, 2024
… compiler.

Imported from GitHub PR #19237

Copybara import of the project:

--
27a2ca3 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19237 from openxla:fix_kv_store 27a2ca3
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit that referenced this pull request Nov 12, 2024
… compiler.

Imported from GitHub PR #19237

Copybara import of the project:

--
27a2ca3 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19237 from openxla:fix_kv_store 27a2ca3
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 12, 2024
… compiler.

Imported from GitHub PR openxla/xla#19237

Copybara import of the project:

--
27a2ca351957b8c5f5ab2b9105c6a897bc5f7957 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19237 from openxla:fix_kv_store 27a2ca351957b8c5f5ab2b9105c6a897bc5f7957
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit that referenced this pull request Nov 12, 2024
… compiler.

Imported from GitHub PR #19237

Copybara import of the project:

--
27a2ca3 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19237 from openxla:fix_kv_store 27a2ca3
PiperOrigin-RevId: 695629316
@sergachev sergachev force-pushed the fix_kv_store branch 2 times, most recently from 8080bd9 to 5b731fc Compare November 12, 2024 22:19
@sergachev
Copy link
Contributor Author

While this isn't merged, I realized, there is a better way to fix this: setting the key-value store in CompileInternal() is better because all paths merge at it. Added a 2nd test case for that.

copybara-service bot pushed a commit that referenced this pull request Nov 12, 2024
… compiler.

Imported from GitHub PR #19237

Copybara import of the project:

--
8080bd9 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19237 from openxla:fix_kv_store 8080bd9
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit that referenced this pull request Nov 12, 2024
… compiler.

Imported from GitHub PR #19237

Copybara import of the project:

--
8080bd9 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19237 from openxla:fix_kv_store 8080bd9
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 12, 2024
… compiler.

Imported from GitHub PR openxla/xla#19237

Copybara import of the project:

--
8080bd9fb8d169bf2add2c079650e4cad2e9fdff by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19237 from openxla:fix_kv_store 8080bd9fb8d169bf2add2c079650e4cad2e9fdff
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit that referenced this pull request Nov 13, 2024
… compiler.

Imported from GitHub PR #19237

Copybara import of the project:

--
8080bd9 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19237 from openxla:fix_kv_store 8080bd9
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 13, 2024
… compiler.

Imported from GitHub PR openxla/xla#19237

Copybara import of the project:

--
8080bd9fb8d169bf2add2c079650e4cad2e9fdff by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19237 from openxla:fix_kv_store 8080bd9fb8d169bf2add2c079650e4cad2e9fdff
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit that referenced this pull request Nov 13, 2024
… compiler.

Imported from GitHub PR #19237

Copybara import of the project:

--
5b731fc by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19237 from openxla:fix_kv_store 5b731fc
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 13, 2024
… compiler.

Imported from GitHub PR openxla/xla#19237

Copybara import of the project:

--
5b731fca5d64d894f3b860fece4d85731c3984f5 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19237 from openxla:fix_kv_store 5b731fca5d64d894f3b860fece4d85731c3984f5
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit that referenced this pull request Nov 13, 2024
… compiler.

Imported from GitHub PR #19237

Copybara import of the project:

--
5b731fc by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19237 from openxla:fix_kv_store 5b731fc
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit that referenced this pull request Nov 13, 2024
… compiler.

Imported from GitHub PR #19237

Copybara import of the project:

--
5b731fc by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19237 from openxla:fix_kv_store 5b731fc
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 13, 2024
… compiler.

Imported from GitHub PR openxla/xla#19237

Copybara import of the project:

--
5b731fca5d64d894f3b860fece4d85731c3984f5 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19237 from openxla:fix_kv_store 5b731fca5d64d894f3b860fece4d85731c3984f5
PiperOrigin-RevId: 695629316
@sergachev
Copy link
Contributor Author

Rebased and resolved a conflict.

copybara-service bot pushed a commit that referenced this pull request Nov 15, 2024
… compiler.

Imported from GitHub PR #19237

Copybara import of the project:

--
99336a6 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19237 from openxla:fix_kv_store 99336a6
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 15, 2024
… compiler.

Imported from GitHub PR openxla/xla#19237

Copybara import of the project:

--
99336a672505d7efa9aacfea492585afb22a5dd2 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19237 from openxla:fix_kv_store 99336a672505d7efa9aacfea492585afb22a5dd2
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit that referenced this pull request Nov 15, 2024
… compiler.

Imported from GitHub PR #19237

Copybara import of the project:

--
99336a6 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19237 from openxla:fix_kv_store 99336a6
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit that referenced this pull request Nov 15, 2024
… compiler.

Imported from GitHub PR #19237

Copybara import of the project:

--
99336a6 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19237 from openxla:fix_kv_store 99336a6
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit that referenced this pull request Nov 15, 2024
… compiler.

Imported from GitHub PR #19237

Copybara import of the project:

--
99336a6 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19237 from openxla:fix_kv_store 99336a6
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit that referenced this pull request Nov 15, 2024
… compiler.

Imported from GitHub PR #19237

Copybara import of the project:

--
99336a6 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19237 from openxla:fix_kv_store 99336a6
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 15, 2024
… compiler.

Imported from GitHub PR openxla/xla#19237

Copybara import of the project:

--
99336a672505d7efa9aacfea492585afb22a5dd2 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19237 from openxla:fix_kv_store 99336a672505d7efa9aacfea492585afb22a5dd2
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit that referenced this pull request Nov 15, 2024
… compiler.

Imported from GitHub PR #19237

Copybara import of the project:

--
99336a6 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19237 from openxla:fix_kv_store 99336a6
PiperOrigin-RevId: 695629316
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 15, 2024
… compiler.

Imported from GitHub PR openxla/xla#19237

Copybara import of the project:

--
99336a672505d7efa9aacfea492585afb22a5dd2 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix passing of key-value store handle from client to compiler.

Merging this change closes #19237

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19237 from openxla:fix_kv_store 99336a672505d7efa9aacfea492585afb22a5dd2
PiperOrigin-RevId: 695629316
@reedwm
Copy link
Member

reedwm commented Nov 16, 2024

I'm seeing internal test failures for tests //xla/pjrt/gpu:se_gpu_pjrt_client_test and //xla/pjrt:pjrt_stream_executor_client_test, but they don't fail externally. I'm also seeing a third internal-only test fail.

The issue with pjrt_stream_executor_client_test and the internal test seems to be calling key_value_store() in pjrt_stream_executor_client.cc without checking there is a value in the std::optional. I see the internal test using the InterpreterDevice which does not override key_value_store. Not sure about pjrt_stream_executor_client_test though, and why it passes externally. In any case, can you fix the case where key_value_store() is nullopt?

se_gpu_pjrt_client_test is failing internally with the error

INTERNAL: RET_CHECK failure (third_party/tensorflow/compiler/xla/pjrt/gpu/se_gpu_pjrt_client_test.cc:1886) absl::StrContains( executable->GetHloModules()->front()->ToString(), "triton_gemm")

Any ideas what the issue is? If not, I can investigate internally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants