[Ray Data] Improve the design and documentation of _MapWorker #46323

Open
marwan116 opened this issue Jun 28, 2024 · 0 comments
Labels
data Ray Data-related issues enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks

Comments

marwan116 (Contributor) commented Jun 28, 2024

Description

Background: _MapWorker instances are actors created by ActorPoolMapOperator, which is used when users call map-like APIs with a callable class UDF.
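
For context, here is a minimal sketch of the kind of pipeline that takes this path. It assumes a recent Ray release in which `map_batches` accepts a callable class together with a `concurrency` argument. The `AddOne` and `Double` classes and the `run_pipeline` helper are made up for illustration and are not part of the original report. With two class-based UDFs, each stage gets its own actor pool.

```python
class AddOne:
    """Callable class UDF (hypothetical). Ray Data constructs it once per actor."""

    def __init__(self):
        # Stand-in for expensive one-time setup, such as loading a model.
        self.offset = 1

    def __call__(self, batch):
        # Assumption: default batch format, a dict of column name -> NumPy array.
        return {"id": batch["id"] + self.offset}


class Double:
    """Second callable class UDF (hypothetical), served by a separate actor pool."""

    def __call__(self, batch):
        return {"id": batch["id"] * 2}


def run_pipeline():
    # Imported lazily so the UDF classes can be exercised without a cluster.
    import ray

    ds = (
        ray.data.range(8)  # produces a single "id" column
        .map_batches(AddOne, concurrency=2)
        .map_batches(Double, concurrency=2)
    )
    return ds.take_all()
```

Calling `run_pipeline()` on a cluster should start actors for both stages. As described in this issue, those actors are listed only under the generic _MapWorker name.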

Use case

I have a Ray cluster that is stuck attempting to schedule a _MapWorker. The cluster is executing a complex Ray Data pipeline that contains more than one UDF, so a few problems arise:

  1. There is no way to tell which of the map UDF operations is stuck, because the actor appears only as a generic _MapWorker. The user has to work out the "optimal ray data DAG execution" in their head to guess what the issue might be.
  2. Currently, one has to know the Ray Data implementation specifics to understand what _MapWorker does, because the name on its own is not self-explanatory.

See the attached screenshot for more details
image

Improvement suggestions:

  • Update the Ray Data docs to explain what the _MapWorker construct (or whatever we end up naming it) does, and what its methods, such as __init__ and get_location, do.
  • Rename _MapWorker to something more public facing like LaunchWorker
  • Always show the LaunchWorker under a subgroup of the MapWorker it is trying to create
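
One possible way to make the worker identifiable per UDF is sketched below in plain Python. It assumes that tooling labels an actor by its class name. The `MapWorker` class here is a hypothetical stand-in rather than Ray's actual `_MapWorker`, and `named_worker_class` and `Tokenize` are likewise illustrative names. The idea is to derive a subclass whose name embeds the UDF class name.

```python
def named_worker_class(base_cls, udf_cls):
    """Return a subclass of base_cls whose __name__ embeds the UDF class name."""
    name = f"{base_cls.__name__}({udf_cls.__name__})"
    return type(name, (base_cls,), {})


class MapWorker:
    """Hypothetical stand-in for the worker actor class, not Ray's implementation."""

    def __init__(self, udf_cls):
        # Build the UDF once, as an actor would at startup.
        self.udf = udf_cls()

    def submit(self, batch):
        return self.udf(batch)


class Tokenize:
    """Example callable class UDF (hypothetical)."""

    def __call__(self, batch):
        return [s.split() for s in batch]


worker_cls = named_worker_class(MapWorker, Tokenize)
worker = worker_cls(Tokenize)
print(worker_cls.__name__)  # MapWorker(Tokenize)
```

With a name like this, a stuck worker could be traced back to its UDF without reconstructing the execution plan by hand.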
@marwan116 marwan116 added enhancement Request for new feature and/or capability triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jun 28, 2024
@scottjlee scottjlee added P1 Issue that should be fixed within a few weeks data Ray Data-related issues and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jun 28, 2024
@marwan116 marwan116 changed the title from "[Ray Data] Improve the design and of _MapWorker" to "[Ray Data] Improve the design and documentation of _MapWorker" Aug 23, 2024