Open to PRs? #18
Hi Johnny! Welcome! We can't always promise an immediate turnaround, but we're absolutely open to PRs! We were actually already considering introducing a Puma-style forking model, so consider my interest piqued 😄. Given that it's not exactly a small addition, I'd love to dig in a little more and, ideally before you do a bunch of work (because I hear you about having PRs go unmerged 🥲), discuss the proposed design/implementation and the use cases it targets, and get your general thoughts on how to align the feature with the philosophy of this gem.

To start the conversation, I'd like to share a few reasons why we didn't go with a forking model originally (when we added the existing concurrency support). Here are three, in no particular order:

1. The job claim/pickup query is usually the bottleneck. We determined that the job claim/pickup query (and the strain it places on a DB with a high-throughput jobs table) is one of the most common bottlenecks in determining an upper bound for worker counts, so when exploring our options for concurrency we wanted to avoid an implementation that would result in additional/concurrent pickup queries. That's why we first went with a multithreading approach (via our …).

2. We observed a lack of CPU-bound jobs (most jobs are DB- or network-bound). Because of the GIL, a single Ruby process can only utilize one CPU core (no matter how many threads it spins up), so a forking model seems like an obvious performance win. However, we generally weren't seeing very many CPU-bound jobs that would benefit from workers utilizing multiple cores, so this further lowered the priority of a forking approach for us. Obviously this is going to vary from app to app, but knowing that Ractors are on their way as a potential path forward, we decided to keep things simpler and avoid introducing...

3. The need for process-management capabilities. A forking process manager requires a whole new layer of process-management concerns (i.e. what should be done when a child process dies, uses too much memory, or becomes otherwise unresponsive). Thus far, we've avoided baking in our own process manager and have instead been able to rely on our deployment infrastructure for process management/scaling (we use a Kubernetes-backed platform, so it works well, but Heroku dynos work similarly). So the philosophy of this gem has been to follow "The Process Model": some outside process manager (k8s, Heroku, systemd, etc.) should be in charge of scaling out the processes (in containers resourced to fit their needs) and of restarting crashed or unhealthy processes (and notifying us, or logging the occurrence, accordingly).

Given the above (particularly point number 3), we decided to remove …

I'd also like to think more about what Puma-style forking could get us that we couldn't already get by scaling out additional K8s pods (or dynos, etc.). I think there's a great case for better utilizing all of the resources available to larger containers -- e.g. if you've provisioned very large dynos for your workers, a Puma-like forking model might help you make the best use of those resources. (Is that the kind of use case you had in mind for this?) Would love to get your thoughts on all of this! 🙏
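For context on point 1, here is a rough, generic sketch of the kind of claim/pickup query being described. The model name, column names, and locking strategy below are illustrative assumptions, not this gem's actual implementation:

```ruby
require "active_record"

# Generic sketch of a SQL-backed job pickup query (illustrative only).
class Job < ActiveRecord::Base; end

def claim_next_job(worker_name)
  Job.transaction do
    job = Job.where(locked_at: nil)
             .where("run_at <= ?", Time.now.utc)
             .order(:priority, :run_at)
             .lock("FOR UPDATE SKIP LOCKED") # Postgres row lock to dodge contention
             .first
    job&.update!(locked_at: Time.now.utc, locked_by: worker_name)
    job
  end
end
```

Every additional worker process runs its own copy of a query like this against the same hot table, which is part of why a multithreaded, single-process approach was preferred here.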
Hi, thanks for the response! There's a lot of ground to cover here... First, more background: in terms of Puma, we run on K8s, and with Puma we like to be able to tune both threads and processes. We run large K8s pods, e.g. 16 cores with 1 process per core and 5-20 threads per process. AFAIK the GIL prevents Ruby from effectively using multiple cores in a multi-threaded app, so you really need multiple processes to make use of multiple cores. I've run my multi-process fork of Delayed Job in production at TableCheck for 6+ months; we've served billions of jobs each month, so it's demonstrably robust/stable. To respond to your points:
1. I use MongoDB and I've managed to optimize this. The problem I observed when using many different queues was "whiffing" on queries: for example, I'd have 1,000,000 jobs from queue A at rest, and a worker for queue B would scan all 1,000,000 of them looking for queue B jobs and not find any. The trick was to shard my different job queues into different database tables (a rough sketch of this follows after this comment). (*By the way, I'd need to add MongoDB support to your repo.)
2. I have many Ruby-heavy, CPU-bound computation jobs.
3. Not needed; we can live without this in the first iteration. Puma implements inter-process communication (IPC) via UNIX pipes, and I started porting it to DelayedJob; it's pretty nifty. (A rough sketch of the fork-and-pipe pattern follows after this comment.)
I was going to propose adding this back :) If we get into process management etc., I think you'd want to do it without Rake, but we can of course keep Rake. This is another thing I refactored in Puma (not merged).
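To illustrate the per-queue sharding described above, a hypothetical Mongoid-flavored sketch; the class names, collection names, and fields are made up, not the actual TableCheck code:

```ruby
require "mongoid"

# Give each logical queue its own collection so a queue-B worker never has to
# scan queue A's backlog at all (illustrative sketch).
class QueueAJob
  include Mongoid::Document
  store_in collection: "jobs_queue_a"
  field :run_at,    type: Time
  field :locked_at, type: Time
end

class QueueBJob
  include Mongoid::Document
  store_in collection: "jobs_queue_b"
  field :run_at,    type: Time
  field :locked_at, type: Time
end

# A queue-B worker's pickup query only ever touches the (small) queue-B collection:
QueueBJob.where(locked_at: nil, :run_at.lte => Time.now.utc).first
```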
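And a minimal, generic Ruby sketch of the fork-and-pipe supervision pattern mentioned above; this is not Puma's or DelayedJob's actual code, just the general shape of pipe-based IPC between a parent and forked workers:

```ruby
# Parent forks N workers and watches a UNIX pipe per child for heartbeats
# (illustrative sketch of the supervision pattern, not real project code).
WORKERS = 4

children = WORKERS.times.map do
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    loop do
      # ... claim and perform jobs here ...
      writer.puts "heartbeat #{Process.pid} #{Time.now.to_i}" # report liveness to parent
      sleep 5
    end
  end

  writer.close
  { pid: pid, pipe: reader }
end

# Parent loop: a worker that stops writing (or whose pipe hits EOF) can be
# killed and re-forked; memory checks and graceful shutdown would also live here.
loop do
  ready, = IO.select(children.map { |c| c[:pipe] }, nil, nil, 10)
  (ready || []).each { |pipe| pipe.gets } # drain heartbeats
end
```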
Ah, makes sense. So your aim is to get as much as you can out of fewer, more heavily-resourced pods, and the pickup query is perhaps less of a bottleneck for you given your datastore & table sharding strategy. And given your CPU-bound jobs, you do benefit from utilizing multiple cores, beyond just the improved-on-net memory efficiency. Out of curiosity, in a perfect world where Ractors are stable everywhere, would you consider going back to a non-forking model?
As it stands, we're not looking to support anything other than SQL RDBMSes (and ActiveRecord). More broadly, we'd like to keep this gem focused on workloads that we can test and maintain in-house, so at least for now that will probably mean a much stronger focus on non-compute-bound workloads, which deemphasizes the need for Puma-style forking. Our current aim is really to introduce only as much complexity as is necessary to meet our own scale needs, so I'm having trouble reconciling our goals for this project with the additions you're proposing. 🙁 That's not to say that we don't want to support multi-core worker processes (I think, at least eventually, we do), but your need for MongoDB support would be a non-starter for us at this time. I hope that makes sense.
Yes, if we had Ractors everywhere, then theoretically one Ruby process would be able to use multiple cores (one core per Ractor), with each Ractor having its own interpreter lock and multiple threads. We would have to benchmark to see whether multi-process might still provide better performance. It is reasonable to consider multi-process as a stepping stone to multi-Ractor. It would be good to see Puma or another major server adopt Ractors first; then I could port it.
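As a tiny illustration of that idea, a hypothetical Ractor sketch (Ractors are still experimental, and this is not proposed API for this gem; the workload below is just a CPU-bound stand-in):

```ruby
# One Ractor per slice of work; each Ractor has its own interpreter lock,
# so CPU-bound payloads can run in parallel within a single process.
payloads = (1..8).map { Array.new(200_000) { rand(1_000) } }

ractors = payloads.each_slice(2).map do |slice|
  Ractor.new(slice) do |work|
    work.map { |nums| nums.sum { |n| Math.sqrt(n) } } # stand-in for heavy job code
  end
end

results = ractors.flat_map(&:take) # block until each Ractor finishes
```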
It's actually easy to do. All the MongoDB (Mongoid) logic can be contained in one model file that has the same interface as the ActiveRecord one. I'm a maintainer of the current DelayedJob MongoDB plugin (as well as 20+ other MongoDB-related Ruby libs) and am happy to support the MongoDB code here. If you won't accept a MongoDB PR and I can't find some way of patching the library, it won't be worth my time to contribute here. I may have to fork and release separately :(
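To make the one-model-file idea concrete, here is a hedged sketch; the method names are loosely modeled on the classic delayed_job backend contract and are assumptions, not this gem's actual internal interface:

```ruby
# Hypothetical Mongoid-backed Job exposing the same class-level interface the
# worker would otherwise call on the ActiveRecord model (names illustrative).
class MongoidJob
  include Mongoid::Document

  field :priority,  type: Integer, default: 0
  field :attempts,  type: Integer, default: 0
  field :handler,   type: String
  field :run_at,    type: Time
  field :locked_at, type: Time
  field :locked_by, type: String

  # Atomically claim the next runnable job (max_run_time handling omitted).
  def self.reserve(worker_name, _max_run_time)
    where(locked_at: nil, :run_at.lte => db_time_now)
      .order_by(priority: :asc, run_at: :asc)
      .find_one_and_update(
        { "$set" => { locked_at: db_time_now, locked_by: worker_name } },
        return_document: :after
      )
  end

  def self.db_time_now
    Time.now.utc
  end
end
```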
I hear you on how compatible the interface is! Unfortunately, we removed the multi-backend config, and bringing it back does not align with the goals of this project, at the very least because we're looking to extend/change that very interface in the future, and we won't be able to support tech that we don't use in-house or vouch for its alignment with the guarantees we'd like to make. We're also reshaping this gem's role primarily as a backend for ActiveJob, and will likely continue to remove features that aren't in support of that. (We want to make it easy to reach for the ActiveJob queue adapter that is right for the job, without any additional layers of configuration.) When we did the work to pull in …
ActiveJob is backend agnostic. The decision to use ActiveJob's job interface is not related to using ActiveRecord vs. Mongoid for persistence.
This doesn't make sense. DelayedJob's core functionality is the heavy part; DelayedJob Mongoid is ~100 lines of code. As a compromise, would you accept it if I make a …
👋 I hear you, I really do. The Ruby background job ecosystem is something that I care a lot about, and I want to see continued development and iteration in this space, so I'm going to do my best to explain why I don't see this particular proposal working out for us in the long run.
As a user of ActiveJob, the decision of which queue adapter to use absolutely relates to the datastore that will back it. For example, setting … As such, our focus with this gem is on providing a very targeted experience for folks who specifically want their queue adapter backed by their app's own SQL datastore, and we have provided examples of how this informs enqueuing & operating jobs, as well as how to monitor the queue. The Continuous Monitoring feature (…)
I completely understand that, regardless of your datastore, many of the pieces of … As an example, … But if spinning out a standalone gem is not an appealing option for you, it makes me think that, despite any challenges you've had in engaging with …
MongoDB is not Redis or Redis-like. For all intents and purposes, MongoDB does everything that a SQL database does. In addition, by design, the Mongoid gem has a 95% similar interface to ActiveRecord. I completely understand why Redis would be out of scope; I understand well the limitations of Sidekiq/Redis, and I am not asking to have a Redis, Memcached, etc. queue adapter in this gem. Would you be willing to keep an open mind if I submit a PR which creates a basic API for a pluggable "SQL/ActiveRecord-like" backend, and consider merging it if it doesn't significantly increase code complexity? The PR would not contain any MongoDB/Mongoid-specific code. I'd really like to collaborate on this gem, and I feel I/my team at TableCheck can add a lot of value, given what we've already built on Delayed Job.
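For what it's worth, in classic DelayedJob the pluggable layer amounted to little more than a backend setter, so a re-added seam could plausibly stay small. A sketch with illustrative constant names, not a claim about this gem's planned API:

```ruby
# A minimal pluggable-backend seam: any class implementing the same interface
# as the bundled ActiveRecord-backed Job can be swapped in (illustrative only).
module Delayed
  module Backend
    class << self
      attr_accessor :job_class
    end
  end
end

Delayed::Backend.job_class = Delayed::Job        # default: bundled ActiveRecord model
Delayed::Backend.job_class = DelayedMongoid::Job # hypothetical out-of-tree Mongoid backend
```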
Hi Johnny, I appreciate you dialing back some of the stronger typography in the initial version of your most recent comment. Please remember that Nathan and others have put a lot of thought into the philosophy behind delayed, and the design decisions aren't based on shallow misunderstandings of the differences between data stores, so if you want to engage on whether we should flex those decisions, it'll be important to work to understand them more deeply. If you read carefully, you'll note that Nathan is not making the argument that we chose not to support MongoDB because MongoDB is like Redis. Also, just wanted to give you a heads up that Nathan is on vacation right now, and I don't know how online he plans to be, but he may follow up with more thoughts.
@jmileham when the team forked DelayedJob, they copied in the ActiveRecord adapter Job class and removed ~50 lines of code related to pluggable backends. That removal specifically does not reduce the complexity of how the core of Delayed/DelayedJob works; they've simply gone from N adapters to 1 adapter. Other improvements in Delayed over DelayedJob (e.g. multi-threading) are certainly great, but the removal (or re-addition) of pluggable backends specifically does not affect that good stuff. As the structure has not fundamentally changed since DelayedJob, the work to add the pluggable-backends code back is minimal. But if the team doesn't want to accept a PR for it, that's fine; my team will just maintain our own fork.
Hi there, I've raised several PRs to DelayedJob which aren't getting merged :(
I'm wondering if the owners of this repo are open to me porting some of the work here:
Let me know and I'll start raising PRs soon.