Export job start/end times #3
Hi, thank you for your suggestion. How do you think these job start/end times should be exposed? I'm not a Prometheus expert, but I think this level of detail about the jobs may be more suitable for another monitoring system; for example, check out:
But I haven't checked whether these projects expose the job start/end times.
I think there have been two kinds of monitoring systems I've seen so far for RQ:
The start/end time isn't as important as the job duration, like you suggested. The use case I'm thinking of would be:
So, kind of an average job duration view per queue name. I'm not a Prometheus expert either though, so happy to hear your thoughts.
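To make the "average job duration per queue" view concrete, here is a minimal stdlib-only sketch of how such a metric behaves. It mimics a Prometheus summary's `_sum` and `_count` series per queue label (the queue names and function names are made up for illustration; a real exporter would use a Prometheus client library):

```python
from collections import defaultdict

# Hypothetical per-queue accumulators mirroring a Prometheus summary's
# _sum and _count series, labeled by queue name.
duration_sum = defaultdict(float)
duration_count = defaultdict(int)

def observe(queue_name, duration_seconds):
    """Record one finished job's duration for its queue."""
    duration_sum[queue_name] += duration_seconds
    duration_count[queue_name] += 1

def average_duration(queue_name):
    """Average job duration for a queue, i.e. sum / count."""
    count = duration_count[queue_name]
    return duration_sum[queue_name] / count if count else 0.0

observe("default", 2.0)
observe("default", 4.0)
observe("high", 1.0)
print(average_duration("default"))  # 3.0
```

In Prometheus itself the same average would come from dividing the rate of the sum series by the rate of the count series, so only the raw observations need to be exported.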
Actually, I thought we could calculate the duration from the start and end times of the job. As per the Prometheus documentation on histograms and summaries:
The thing is that RQ is not like Celery, which uses events. The RQ exporter queries Redis on each Prometheus scrape to get the jobs and workers at that moment, so if, for example, a job finished and was deleted before the scrape, we won't know anything about it. It also means we would need to fetch all the jobs in the finished and failed registries and calculate their durations from their start and end times on every scrape. That's why I can't think of a good way to track job duration without events dispatched from the jobs.
Hmm, BaseRegistry.get_job_ids() does accept a start/end range, calling ZRANGE under the hood. Thoughts on scraping the last 10 jobs or so from each queue.finished_job_registry? I'm thinking it'd have the same effect for this use case with a histogram, though it could get expensive very fast: one call to get the IDs per finished registry, plus another per job to get the start/end times. If I remember correctly, the finished_job_registry includes the failed_job_registry.
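A rough sketch of the last-N idea, using a stub in place of rq's Job so it stays self-contained. In real rq you would call something like `registry.get_job_ids(-n, -1)` (ZRANGE underneath) and then fetch each job to read its times; the stub class and helper names here are illustrative, not the exporter's API:

```python
from datetime import datetime, timedelta

class FakeJob:
    """Stand-in for rq's Job, which exposes started_at and ended_at."""
    def __init__(self, started_at, ended_at):
        self.started_at = started_at
        self.ended_at = ended_at

def last_n_durations(jobs, n=10):
    """Durations in seconds of the most recent n jobs.

    Mirrors scraping a start/end range from finished_job_registry,
    then computing ended_at - started_at per job. Jobs missing either
    timestamp are skipped.
    """
    return [(j.ended_at - j.started_at).total_seconds()
            for j in jobs[-n:]
            if j.started_at and j.ended_at]

t0 = datetime(2020, 1, 1, 12, 0, 0)
jobs = [FakeJob(t0, t0 + timedelta(seconds=s)) for s in (1, 2, 3)]
print(last_n_durations(jobs, n=2))  # [2.0, 3.0]
```

The cost concern raised above is visible here: in the real version, each element of the list implies one extra Redis round trip per job per scrape.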
I could be wrong, but I still don't think we can record this data accurately using a histogram/summary metric, and as you said, this operation could get very expensive. About the job registries: the finished and failed registries hold the successful and failed jobs respectively. Honestly, I really don't have time to start working on this feature. Of course, pull requests are welcome, but if the operation is expensive you'll need to implement it behind a flag and disable it by default.
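A minimal sketch of the "behind a flag, disabled by default" request, assuming an environment-variable toggle. The variable name `RQ_EXPORTER_JOB_DURATIONS` is hypothetical, not an existing exporter option:

```python
import os

def duration_collection_enabled(environ=os.environ):
    """Return True only if the (hypothetical) flag is explicitly set.

    The expensive per-scrape duration collection stays off unless the
    operator opts in, as suggested in the thread.
    """
    value = environ.get("RQ_EXPORTER_JOB_DURATIONS", "false")
    return value.lower() in ("1", "true", "yes")

print(duration_collection_enabled({}))  # False: disabled by default
```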
That works, I'll have to take a closer look at a good way to record this. Thanks for the insight. We'll have a use case for this in Q4 this year, so I'll open a PR if it works out.
Firstly, RQ Exporter has been excellent!
Do you think it'd be possible to include an option to export job start/end times? I believe these should be accessible at job.started_at and job.ended_at per the rq docs.
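Since the Prometheus convention is to export times as unix seconds, the `started_at`/`ended_at` datetimes would need a `.timestamp()` conversion before export. A small sketch with stub values (no real rq Job involved):

```python
from datetime import datetime, timezone

# Stub values standing in for job.started_at / job.ended_at, which
# rq exposes as datetimes (and which may be None for unstarted jobs).
started_at = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
ended_at = datetime(2020, 1, 1, 12, 0, 5, tzinfo=timezone.utc)

# Guard against None before converting to unix seconds for export.
start_ts = started_at.timestamp() if started_at else None
end_ts = ended_at.timestamp() if ended_at else None
print(end_ts - start_ts)  # 5.0
```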