-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Add RayCliHook and RayCliOperator #37
base: main
Are you sure you want to change the base?
Conversation
This looks great so far. @Mokubyow @grechaw would this just require that we pip install ray on the machine? I imagine the ray CLI doesn't need any extras, yes? |
I think having a Ray start and ray stop operator makes a LOT of sense here. Even for the decorator it would be great to allow people to start a ray cluster at the start of a DAG and then shut it down at the end of a DAG. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few small things but overall looks really good!
ray_provider/operators/ray_cli.py
Outdated
|
||
def execute(self, context): | ||
self.hook = self.get_hook() | ||
self.hook.run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not return self.hook.run()? Seems like that function is returning stdoout.
@dimberman Thanks for the feedback! I'd actually like to get some feedback from the Ray team on using the SDK methods as I outlined above vs. the CLI wrapper approach before proceeding. As I mentioned, I think it will be a bit easier to write/extend and create robust operator/s that way. @richardliaw @worldveil @grechaw Thoughts? |
I think it looks good. I have my usual concerns about it not being OOTB supported on Anyscale. It's not like we have a ready way to make this work with Anyscale anyhow, but your approach will make that simple to graft in when we prioritize it. |
@Mokubyow I spoke with the Anyscale team and it sounds like there's no real way to do this via the SDK. However, the fact that you're using Popen instead of using Bash directly should be sufficient to prevent injections so should be good from a security standpoint. |
@dimberman @worldveil @grechaw I've pushed a WIP implementation of what using the SDK could look like vs wrapping the CLI directly. I'm partial to the SDK approach, as I've expressed, so I believe much of the decision on which route to pursue will come down to how stable the ray team thinks the SDK will be. Anyway, looking forward to chatting about both approaches and getting something out to the community! |
Hi team,
This was a first attempt at integrating Airflow and Ray together using a wrapper for the Ray CLI. I wasn't a big fan of having to fetch the cluster_config YAML and script file so I abandoned this approach in favor of using the KubernetePodOperator so as to keep Airflow and Ray decoupled from one another. That being said, I understand there are many use cases out there and some may be completely happy using this kind of operator.
Aside from my deployment thoughts, I do think it would be better to convert this operator to be a wrapper for the Ray SDK rather than the CLI as it would make a number of tasks quite a bit easier. Here is some seriously rough pseudo-code for how we might go about structuring the hook and ultimately a series of operators: