WIP: Add RayCliHook and RayCliOperator #37

kyle-hamlin · 2021-08-31T18:54:28Z

Hi team,

This was a first attempt at integrating Airflow and Ray using a wrapper for the Ray CLI. I wasn't a big fan of having to fetch the cluster_config YAML and script file, so I abandoned this approach in favor of using the KubernetesPodOperator to keep Airflow and Ray decoupled from one another. That being said, I understand there are many use cases out there and some may be completely happy using this kind of operator.

Aside from my deployment thoughts, I do think it would be better to convert this operator to be a wrapper for the Ray SDK rather than the CLI as it would make a number of tasks quite a bit easier. Here is some seriously rough pseudo-code for how we might go about structuring the hook and ultimately a series of operators:

from ray.autoscaler.sdk import (
    create_or_update_cluster,
    run_on_cluster,
    teardown_cluster,
)

from typing import Union

class RaySDKHook:

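    def __init__(
        self,
        cluster_config: Union[dict, str],
        ...
    ):
        # Store the raw value first; _get_cluster_config() reads it.
        self.cluster_config = cluster_config
        self.cluster_config = self._get_cluster_config()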


    def _get_cluster_config(self):
        if isinstance(self.cluster_config, dict):
            return self.cluster_config
        elif isinstance(self.cluster_config, str):
            if self.cluster_config.startswith("s3://"):
                # Get from s3
                # Could also apply templating here
            elif self.cluster_config.startswith("gs://"):
                # Get from gs
                # Could also apply templating here
            else:
                # Get from local file system
                # Could also apply templating here


    def start_cluster(self):
        ...
        create_or_update_cluster(cluster_config=self.cluster_config)
        ...
    

    def stop_cluster(self):
        ...
        teardown_cluster(cluster_config=self.cluster_config)
        ...
    

    def submit_job(self):
        ...
        run_on_cluster(cluster_config=self.cluster_config)
        ...


class RayStartClusterOperator:

    def execute(self, context):
        hook = RaySDKHook()
        hook.start_cluster()


class RayStopClusterOperator:

    def execute(self, context):
        hook = RaySDKHook()
        hook.stop_cluster()


class RaySubmitJobOperator:

    def execute(self, context):
        hook = RaySDKHook()
        hook.submit_job()
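
The dict / s3:// / gs:// / local-file branching in `_get_cluster_config` can be made concrete without committing to any storage client. What follows is a minimal sketch only: `classify_config_source` is a hypothetical helper (not part of Ray or this PR) that decides where a config would be loaded from and performs no I/O. The actual download and templating steps are deliberately left out.

```python
from typing import Union
from urllib.parse import urlparse


def classify_config_source(cluster_config: Union[dict, str]) -> str:
    """Return where a cluster config would be loaded from.

    Hypothetical helper for illustration; it mirrors the dict / s3:// /
    gs:// / local-file branches of the pseudo-code and does no I/O.
    """
    if isinstance(cluster_config, dict):
        # Already an in-memory config, nothing to fetch.
        return "inline"
    if not isinstance(cluster_config, str):
        raise TypeError("cluster_config must be a dict or a str")
    scheme = urlparse(cluster_config).scheme
    if scheme in ("s3", "gs"):
        return scheme
    # Anything else is treated as a path on the local file system.
    return "local"
```

Keeping the dispatch separate from the fetching would let each storage backend be tested, or swapped out, on its own.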

dimberman · 2021-09-02T18:41:13Z

This looks great so far. @Mokubyow @grechaw would this just require that we pip install ray on the machine? I imagine the ray CLI doesn't need any extras, yes?

dimberman · 2021-09-02T18:43:20Z

I think having a Ray start and ray stop operator makes a LOT of sense here. Even for the decorator it would be great to allow people to start a ray cluster at the start of a DAG and then shut it down at the end of a DAG.
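
The start-at-the-beginning, stop-at-the-end lifecycle can be illustrated outside of Airflow with a context manager. This is only a sketch of the guarantee being asked for: `cluster_lifecycle` and the `start`/`stop` callables are hypothetical stand-ins, not operators from this PR. In an actual DAG these would be separate tasks, and the teardown task would likely need a trigger rule such as `all_done` so that it still runs when an upstream task fails.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def cluster_lifecycle(
    start: Callable[[], None], stop: Callable[[], None]
) -> Iterator[None]:
    """Run `start`, yield to the caller, and always run `stop` afterwards."""
    start()
    try:
        yield
    finally:
        # Teardown runs even if the body raised.
        stop()


events: List[str] = []
try:
    with cluster_lifecycle(
        lambda: events.append("start"), lambda: events.append("stop")
    ):
        events.append("work")
        raise RuntimeError("simulated task failure")
except RuntimeError:
    events.append("failed")
```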

dimberman

Few small things but overall looks really good!

dimberman · 2021-09-04T00:17:53Z

ray_provider/operators/ray_cli.py

+
+    def execute(self, context):
+        self.hook = self.get_hook()
+        self.hook.run()


Should we not return self.hook.run()? It seems like that function is returning stdout.
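
For context, Airflow pushes the value returned from an operator's `execute` to XCom by default, so returning the hook output would make it available to downstream tasks. A minimal sketch of that change, using hypothetical stand-in classes (`FakeCliHook`, `ReturningOperator`) rather than the PR's actual hook and operator:

```python
class FakeCliHook:
    """Hypothetical stand-in for the PR's hook; `run` returns captured stdout."""

    def run(self) -> str:
        return "cluster started\n"


class ReturningOperator:
    """Hypothetical operator whose `execute` passes the hook output through."""

    def get_hook(self) -> FakeCliHook:
        return FakeCliHook()

    def execute(self, context: dict) -> str:
        self.hook = self.get_hook()
        # Returning the value lets the caller (in Airflow, the XCom
        # machinery) see what the hook produced.
        return self.hook.run()


result = ReturningOperator().execute(context={})
```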

ray_provider/hooks/ray_cli.py

kyle-hamlin · 2021-09-09T17:41:44Z

@dimberman Thanks for the feedback! I'd actually like to get some feedback from the Ray team on using the SDK methods as I outlined above vs. the CLI wrapper approach before proceeding. As I mentioned, I think it will be a bit easier to write/extend and create robust operator/s that way. @richardliaw @worldveil @grechaw Thoughts?

grechaw · 2021-09-09T21:50:19Z

I think it looks good. I have my usual concerns about it not being OOTB supported on Anyscale.

It's not like we have a ready way to make this work with Anyscale anyhow, but your approach will make that simple to graft in when we prioritize it.

dimberman · 2021-09-13T17:45:12Z

@Mokubyow I spoke with the Anyscale team and it sounds like there's no real way to do this via the SDK. However, the fact that you're using Popen instead of using Bash directly should be sufficient to prevent injections so should be good from a security standpoint.
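
The injection point can be illustrated with the standard library alone. The sketch below does not reproduce the PR's hook; it only shows that when `subprocess.Popen` is given an argument list and no shell is involved, shell metacharacters in a value are passed through as literal text. The child command here is a throwaway Python one-liner chosen for portability, not the Ray CLI.

```python
import subprocess
import sys

# Untrusted-looking value containing shell metacharacters.
user_value = "demo; echo injected"

# Each list element becomes exactly one argv entry; no shell parses it.
proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", user_value],
    stdout=subprocess.PIPE,
    text=True,
)
stdout, _ = proc.communicate()
```

This covers shell injection only; the wrapped CLI still decides how it interprets each argument it receives.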

kyle-hamlin · 2021-09-19T22:17:48Z

@dimberman @worldveil @grechaw I've pushed a WIP implementation of what using the SDK could look like vs wrapping the CLI directly. I'm partial to the SDK approach, as I've expressed, so I believe much of the decision on which route to pursue will come down to how stable the ray team thinks the SDK will be. Anyway, looking forward to chatting about both approaches and getting something out to the community!

WIP: Add RayCliHook and RayCliOperator

7a91077

richardliaw assigned dimberman Aug 31, 2021

dimberman requested changes Sep 4, 2021

Kyle Hamlin added 2 commits September 19, 2021 16:09

Add WIP hook and operators for SDK implementation.

0e97ae9

Add WIP hook and operators for SDK implementation.

39117dc

Kyle Hamlin added 2 commits September 20, 2021 16:38

Remove SDK hook, Use CLI hook for start, stop, submit operators.

a8304c3

Add sphinx docs, Add verbose param.

3bbdc58

kyle-hamlin requested a review from dimberman September 21, 2021 16:02

Fill in descriptions

3c36d8d

kyle-hamlin marked this pull request as ready for review September 22, 2021 23:00

dimberman approved these changes Sep 24, 2021
