-
Does Rocketry support a master/worker architecture in which a master submits tasks to a data store and multiple workers concurrently fetch tasks from that store and execute them?
-
This is a question I have been asked relatively frequently while working on the project (and it is a totally valid question). The following is purely my personal opinion, and unfortunately I'm not an expert in any of the alternatives. Feel free to correct me, and you are free to hold a different opinion (which you can state below). I'm biased and I hope you forgive me.
Of course, there are several other options, including Crontab, Schedule, APScheduler and Airflow.
I think Rocketry is clearly better than any of those except Airflow. The other options are much more limited in terms of scheduling, pipelining, parametrization and parallel execution of tasks. In my opinion, Rocketry is the most readable by far and offers much more than the alternatives, again except Airflow. But Airflow is aimed at enterprises and its focus is on data engineering, whereas Rocketry's focus is on being the scheduling backend for Python applications.
Most scheduling alternatives ask you to express your schedule in Cron-like terms. Rocketry just asks you to form a statement that is true when you want the task to run. This approach allows much more complex scheduling strategies out of the box.
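To make the "statement that is true when the task should run" idea concrete, here is a minimal standard-library sketch. It is not Rocketry's implementation or API: the `Condition` class and the `time_between` and `on_weekdays` helpers are hypothetical names invented for illustration. It only shows how small conditions can be combined with `&`, `|` and `~` into one statement.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Callable


@dataclass(frozen=True)
class Condition:
    """Hypothetical wrapper around a predicate on the current time."""
    check: Callable[[datetime], bool]

    def __call__(self, now: datetime) -> bool:
        return self.check(now)

    def __and__(self, other: "Condition") -> "Condition":
        return Condition(lambda now: self(now) and other(now))

    def __or__(self, other: "Condition") -> "Condition":
        return Condition(lambda now: self(now) or other(now))

    def __invert__(self) -> "Condition":
        return Condition(lambda now: not self(now))


def time_between(start: str, end: str) -> Condition:
    """True when the time of day is in [start, end); both given as HH:MM."""
    lo, hi = time.fromisoformat(start), time.fromisoformat(end)
    return Condition(lambda now: lo <= now.time() < hi)


def on_weekdays(*days: int) -> Condition:
    """True on the given weekdays (Monday is 0, Sunday is 6)."""
    return Condition(lambda now: now.weekday() in days)


# "Run between 08:00 and 10:00 on weekdays, but not on Fridays."
should_run = (
    time_between("08:00", "10:00")
    & on_weekdays(0, 1, 2, 3, 4)
    & ~on_weekdays(4)
)
```

A scheduler loop would then simply evaluate `should_run(datetime.now())` on each tick and start the task when it returns `True`. Expressing the same rule as a single Cron line is possible, but the combined statement stays readable as more conditions are added.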
Here is a quick run-through of some comparisons:
Crontab VS Rocketry
Cron:
Rocketry:
I think Rocketry is the clear winner here. Crontab is more minimal, but I don't think the saving is large enough to justify the loss of readability in most cases. In more sophisticated examples, Crontab falls short.
Schedule VS Rocketry
Schedule:
Rocketry:
These are actually quite similar, and which one is cleaner is a matter of opinion. However, Schedule fails on more sophisticated examples. It does not support running one task after another or properly parametrizing the tasks (at least not at the level provided by Rocketry). Furthermore, it's much more limited in terms of the available conditions and the ability to extend them.
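As a rough illustration of what "running one task after another" means, here is a small standard-library sketch. It is not how Rocketry or Schedule works internally: the `Pipeline` class and its `task`, `after_success` and `run_cycle` methods are hypothetical names made up for this example. The point is that a dependent task is just a task whose condition becomes true once another task has succeeded.

```python
from typing import Callable, List, Set, Tuple


class Pipeline:
    """Hypothetical registry that runs tasks once their condition holds."""

    def __init__(self) -> None:
        self.tasks: List[Tuple[str, Callable[[], bool], Callable[[], None]]] = []
        self.succeeded: Set[str] = set()

    def task(self, name: str, condition: Callable[[], bool]):
        """Decorator that registers a function under a name and a condition."""
        def decorator(func: Callable[[], None]) -> Callable[[], None]:
            self.tasks.append((name, condition, func))
            return func
        return decorator

    def after_success(self, name: str) -> Callable[[], bool]:
        """Condition that is true once the named task has succeeded."""
        return lambda: name in self.succeeded

    def run_cycle(self) -> List[str]:
        """Run each task at most once; return names in order of success."""
        order: List[str] = []
        attempted: Set[str] = set()
        progressed = True
        while progressed:  # repeat until no further task becomes runnable
            progressed = False
            for name, condition, func in self.tasks:
                if name in attempted or not condition():
                    continue
                attempted.add(name)
                progressed = True
                try:
                    func()
                except Exception:
                    continue  # a failed task never unlocks its dependants
                self.succeeded.add(name)
                order.append(name)
        return order


pipeline = Pipeline()


@pipeline.task("load", pipeline.after_success("extract"))
def load() -> None:
    pass  # placeholder for the dependent step


@pipeline.task("extract", lambda: True)
def extract() -> None:
    pass  # placeholder for the first step


order = pipeline.run_cycle()  # "extract" runs first, then "load"
```

Note that `load` is registered before `extract` but still runs second, because its condition only becomes true after `extract` succeeds. If `extract` raised an exception, `load` would not run at all.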
APScheduler VS Rocketry
APScheduler:
Rocketry:
I personally prefer Rocketry's syntax, but again that may be a matter of opinion. However, APScheduler is also missing many of the features Rocketry has: it has far fewer conditions, lacks logical conditions, and does not support multiple processes, pipelining and so on.
Airflow VS Rocketry
This is probably the hot topic, and I think the answer is that it depends. Airflow is much more mature than Rocketry, it is much more thought-out and it has much more built into it. If you are building data pipelines for enterprises, you probably should choose Airflow.
However, Rocketry is way easier to set up than Airflow. Take a look at Airflow's tutorial vs Rocketry's. And again, it seems Rocketry offers more scheduling conditions, and an easier way to extend them, than Airflow.
But I think this competition comes down to what you need. If you are a data engineer or are thinking of scheduling batched tasks in an enterprise setting, you probably should choose Airflow. If you are creating a Python application that also needs a scheduler, that's where Rocketry shines. If you need something quick or simple, Rocketry is probably the better choice as well. I also think Airflow dictates how you should do things more than Rocketry does.
The core idea of Rocketry is to be a Pythonic scheduling framework for Python applications. It's a bit like the web backends Flask or FastAPI, but instead of web apps, it's a backend for automated tasks.