
Batched kombu consumer #483

Open · wants to merge 69 commits into base: develop
Conversation

@gfmio gfmio commented Aug 25, 2020

This PR adds support for batch processing of messages in kombu consumer servers.

A typical use case for this would be a queue consumer that has better performance characteristics when performing batched processing, for example if performing a batch of database mutations in a single transaction is significantly faster than performing individual writes.

The KombuQueueConsumerFactory now accepts additional parameters to configure the batch processing. If the batch_size is None (default case), messages are consumed individually, so the behaviour is the same as before. If the batch_size is set, the messages will be processed in batches.

The kombu pump worker class KombuConsumerWorker accepts the same parameters and performs the actual batching. If batch_size is set, it stores new messages in a batch and processes them once the batch_size is reached or a timeout occurs.

The PR also adds a KombuBatchMessageHandler, which handles message sequences rather than individual messages. The KombuQueueConsumerFactory will return the correct type of message handler depending on whether batch_size is set.

The timeout in the KombuBatchQueueConsumer is implemented using a new Timer helper class, which internally uses the schedule library to schedule the work. To prevent blocking, the default scheduler is extended to run on a separate thread.

The Timer and KombuConsumerWorker have safeguards in place for requeueing unprocessed messages in the event of a server shutdown or errors during batch processing.
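The size-or-timeout batching described above can be sketched independently of kombu and of Baseplate's actual classes. All names in this sketch are illustrative, not the PR's real API; it only demonstrates the flush-on-size-or-deadline pattern, including the shutdown safeguard for a partial batch:

```python
import queue
import time
from typing import Any, Callable, List

STOP = object()  # illustrative sentinel used here to signal shutdown


def consume_in_batches(
    source: "queue.Queue[Any]",
    handle_batch: Callable[[List[Any]], None],
    batch_size: int,
    flush_timeout: float,
) -> None:
    """Flush a batch when it reaches batch_size, or when flush_timeout
    seconds have passed since the batch's first message arrived."""
    batch: List[Any] = []
    deadline = None
    while True:
        # Block indefinitely when no batch is open; otherwise wait only
        # until the current batch's flush deadline.
        timeout = None if deadline is None else max(0.0, deadline - time.monotonic())
        try:
            message = source.get(timeout=timeout)
        except queue.Empty:
            message = None
        if message is STOP:
            break
        if message is not None:
            if not batch:
                deadline = time.monotonic() + flush_timeout
            batch.append(message)
        if batch and (len(batch) >= batch_size or time.monotonic() >= deadline):
            handle_batch(batch)
            batch, deadline = [], None
    if batch:
        # Mirror the PR's shutdown safeguard: don't drop a partial batch.
        handle_batch(batch)
```
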

Review threads on:
baseplate/frameworks/queue_consumer/kombu.py
baseplate/server/queue_consumer.py
baseplate/lib/timer.py
@spladug spladug added this to the 1.4 milestone Aug 27, 2020
@gfmio gfmio requested a review from pacejackson August 29, 2020 11:41
gfmio commented Aug 29, 2020

@pacejackson I've made the requested changes. The pump worker is now handling the batching. However, I had to introduce these rather ugly @overload declarations to address the type changes. Do you have an idea how to make this nicer?

@pacejackson (Contributor) left a comment:

In addition to the comments about using separate classes, I think we can simplify things a bit further by putting the actual batch management into its own object rather than having it be in the pump worker itself:

import threading
import time
from copy import copy
from datetime import timedelta
from queue import Queue
from typing import Any, Callable, List, Optional


class BatchedQueue:
    def __init__(self, work_queue: Queue, batch_size: int, flush_interval: timedelta):
        assert batch_size >= 1
        assert work_queue is not None
        self._work_queue = work_queue
        self._batch: List[Any] = []
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._flusher: Optional["DelayedWorker"] = None

    def put(self, data: Any) -> None:
        self._batch.append(data)
        if len(self._batch) >= self._batch_size:
            self._flush()
        else:
            self._queue_flush()

    def _flush(self) -> None:
        # The flusher may be absent, e.g. when the batch fills before a
        # delayed flush was ever scheduled.
        if self._flusher is not None:
            self._flusher.stop()
            self._flusher = None
        batch = copy(self._batch)
        self._batch = []
        self._work_queue.put(batch)

    def _queue_flush(self) -> None:
        if self._flusher is not None:
            return
        self._flusher = DelayedWorker(delay=self._flush_interval, fn=self._flush)


class DelayedWorker:
    def __init__(self, delay: timedelta, fn: Callable[[], None]):
        self._delay = delay
        self._fn = fn
        self._stopped = False
        # start(), not run(): run() would execute the target synchronously
        # on the calling thread instead of spawning a new one.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stopped = True

    def _run(self) -> None:
        time.sleep(self._delay.total_seconds())
        if self._stopped:
            return
        self._fn()

    def __del__(self):
        self.stop()

So then the pump consumer is just putting the message into a BatchedQueue rather than directly into a queue.Queue; you might even be able to make it easy to pass either one to the same PumpQueue class constructor?

Note, this would also remove the ThreadedScheduler class and the dependency on schedule, and replace the Timer class with a DelayedWorker object.
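As an aside, the delayed single-shot callback that DelayedWorker provides is close to what the stdlib's threading.Timer already offers. A minimal flush-after-delay sketch on top of it, with a lock added since the timer callback fires on another thread (illustrative only, not part of the PR):

```python
import queue
import threading
from typing import Any, List, Optional


class TimerFlushedBatch:
    """Accumulate items and flush them to work_queue after flush_interval
    seconds, using threading.Timer for the delayed callback."""

    def __init__(self, work_queue: "queue.Queue[List[Any]]", flush_interval: float):
        self._work_queue = work_queue
        self._flush_interval = flush_interval
        self._lock = threading.Lock()  # _flush runs on the timer's thread
        self._batch: List[Any] = []
        self._timer: Optional[threading.Timer] = None

    def put(self, item: Any) -> None:
        with self._lock:
            self._batch.append(item)
            if self._timer is None:
                # Schedule exactly one pending flush per open batch.
                self._timer = threading.Timer(self._flush_interval, self._flush)
                self._timer.daemon = True
                self._timer.start()

    def _flush(self) -> None:
        with self._lock:
            batch, self._batch = self._batch, []
            self._timer = None
        if batch:
            self._work_queue.put(batch)
```
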

Review threads on:
baseplate/frameworks/queue_consumer/kombu.py
baseplate/lib/threaded_scheduler.py
gfmio commented Sep 4, 2020

@pacejackson Thank you for your recommendations. I've adapted them slightly to be generic and to merge the ideas of the DelayedWorker and Timer.

I do think that either the pump worker or the queue consumer should requeue batched messages in this design. Since we didn't want to modify the QueueConsumer class, the PumpWorker seems like the only good option to me.

@pacejackson (Contributor) left a comment:

This is looking really good! There are still a few things to fix, but I think it's just about there (or at least to the point where you can feel comfortable writing tests).

Review threads on:
baseplate/lib/batched_queue.py
baseplate/lib/timer.py
baseplate/frameworks/queue_consumer/kombu.py
gfmio commented Apr 28, 2021

Sorry for taking 7 months to add the requested changes to this PR ;-)

@spladug spladug self-requested a review May 20, 2021 20:44
@spladug (Contributor) left a comment:

LGTM! Thank you!

@spladug spladug requested a review from pacejackson May 21, 2021 17:11
@pacejackson (Contributor) left a comment:

Some small things around naming of arguments

Review threads on:
baseplate/frameworks/queue_consumer/kombu.py
gfmio and others added 4 commits June 22, 2021 16:04
Co-authored-by: Andrew Boyle <pacejackson@users.noreply.github.com>
Co-authored-by: Andrew Boyle <pacejackson@users.noreply.github.com>
Co-authored-by: Andrew Boyle <pacejackson@users.noreply.github.com>
Co-authored-by: Andrew Boyle <pacejackson@users.noreply.github.com>
gfmio commented Jun 23, 2021

@spladug Ready to merge ♻️

spladug commented Jun 23, 2021

@gfmio it looks like it's failing lint (non-lazy string formatting in a logger call)

@spladug (Contributor) left a comment:

just for the lint failure

gfmio commented Sep 15, 2021

@spladug The linter and tests are now all green 💚

@gfmio gfmio requested a review from spladug September 15, 2021 00:59
Comment on lines +132 to +139
    logger.warning(
        "%s %s %s %s %s",
        "KombuBatchConsumerWorker is stopping and requeuing messages in the",
        "unprocessed batch. A message in this unprocessed batch had already",
        "been acknowledged, so it is not being requeued. However, this indicates",
        "an inconsistency in the batch processing logic which should be",
        "investigated.",
    )

What's going on here? I'm guessing it was complaining about concatenating the strings with +? It should be possible to concatenate them without this formatting stuff by just putting them next to each other with no operators:

Suggested change:

    # before
    logger.warning(
        "%s %s %s %s %s",
        "KombuBatchConsumerWorker is stopping and requeuing messages in the",
        "unprocessed batch. A message in this unprocessed batch had already",
        "been acknowledged, so it is not being requeued. However, this indicates",
        "an inconsistency in the batch processing logic which should be",
        "investigated.",
    )

    # after
    logger.warning(
        "KombuBatchConsumerWorker is stopping and requeuing messages in the "
        "unprocessed batch. A message in this unprocessed batch had already "
        "been acknowledged, so it is not being requeued. However, this indicates "
        "an inconsistency in the batch processing logic which should be "
        "investigated."
    )
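The suggestion relies on two Python features: adjacent string literals are concatenated at compile time (no + needed), and passing arguments to the logger instead of pre-formatting the string defers the formatting until the record is actually emitted. A quick illustration (the message text here is just an example):

```python
import logging

# Adjacent string literals concatenate at compile time, no `+` needed:
message = (
    "first part "
    "second part"
)
assert message == "first part second part"

logger = logging.getLogger("demo")

# Lazy: %-formatting happens inside the logging machinery, and only if
# the record actually passes the level check.
logger.warning("processed %d messages in %s", 42, "batch-mode")

# Eager (what the linter flags): the string is built up front, even when
# the log level would discard the record.
logger.warning("processed %d messages in %s" % (42, "batch-mode"))
```
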
