deps: python3-kafka sync our package with upstream [SRE-7606] #24

Open · wants to merge 5 commits into main

Conversation

rushidave
Collaborator

We need to sync with upstream, as the python3-kafka package is failing due to this import error. The fix we need has been merged upstream (https://github.com/dpkp/kafka-python/pull/2398/files), so we need to sync it into our package.

orange-kao and others added 5 commits January 3, 2024 10:55
After a stop/start of the Kafka service, kafka-python may use 100% CPU caused by
busy-retrying while the socket is closed. This fixes the issue by sleeping 0.1
seconds if the fd is negative.
Implement support for SOCKS5 proxies. Implement a new proxy wrapper
that handles SOCKS5 connection, authentication and requesting
connections to the actual Kafka broker endpoint.

The proxy can be configured via a new keyword argument `socks5_proxy`
to consumers, producers or admin client. The value is a URL with an optional
username and password, e.g.
`socks5://user:secret@proxy.example.com:10800`

The implementation is done as a state machine that emulates repeated
calls to connect_ex. The rationale for this design is to require minimal
changes to the actual BrokerConnection object.
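
Assuming the sync lands as described, configuring the proxy would look roughly like the sketch below. The `socks5_proxy` keyword and URL format come from the commit message above; the topic and broker addresses are placeholders.

    from kafka import KafkaConsumer

    # socks5_proxy takes a URL with optional username/password credentials.
    consumer = KafkaConsumer(
        "example-topic",                               # placeholder topic
        bootstrap_servers="kafka.example.com:9092",    # placeholder broker
        socks5_proxy="socks5://user:secret@proxy.example.com:10800",
    )
    for message in consumer:
        print(message.value)
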
If the value `_controller_id` is out-of-date and the node was removed
from the cluster, `_send_request_to_node` would enter an infinite loop.
A call to `maybe_connect` can be performed while the cluster metadata is
being updated. If that happens, the assumption that every entry in
`_connecting` has metadata won't hold. The existing assert will then
raise on every subsequent call to `poll`, rendering the client instance
unusable.

This fixes the issue by ignoring connection requests to nodes that no
longer have metadata available.
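
A minimal sketch of the guard described above, assuming kafka-python's `ClusterMetadata.broker_metadata` lookup and a set-valued `_connecting`; the body is illustrative, not the exact upstream patch.

    def _maybe_connect(self, node_id):
        """Skip connection attempts for nodes that vanished from the metadata."""
        broker = self.cluster.broker_metadata(node_id)
        if broker is None:
            # Metadata was refreshed while node_id sat in self._connecting;
            # dropping the request avoids the assert that previously broke
            # every subsequent poll() call.
            self._connecting.discard(node_id)
            return False
        # ... proceed with the normal connection setup ...
        return True
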
@rushidave rushidave self-assigned this Jan 3, 2024
@rushidave rushidave changed the title from "Rushidave merge upstream" to "Sync our package with upstream" Jan 3, 2024
@rushidave rushidave requested review from ivanyu and tvainika January 3, 2024 17:24
@rushidave rushidave changed the title from "Sync our package with upstream" to "deps: python3-kafka sync our package with upstream [SRE-7606]" Jan 18, 2024

@edward-evans-aiven edward-evans-aiven left a comment


Just the one note; not sure whether it needs to change or is OK as-is. Most of the sync seems fine.

Returns a future that may be polled for status and results.

:param node_id: The broker id to which to send the message.
:param request: The message to send.
:param wakeup: Optional flag to disable thread-wakeup.


We seem to be missing the wakeup parameter.


@aiven-anton aiven-anton left a comment


Merging upstream into our fork is not the right action to take; rather, we should make sure our commits are applied on top of upstream, i.e. essentially a rebase.

@aiven-anton

@rushidave To expand on my comment, I'd expect this to be carried out something like this:

# Setup access to the two repositories and fetch.
$ git remote add fork git@github.com:aiven/kafka-python.git
$ git remote add upstream git@github.com:dpkp/kafka-python.git
$ git fetch -p fork
$ git fetch -p upstream
# Checkout main and reset it to the fork's main (make sure to not have anything uncommitted before doing this)
$ git checkout main
$ git reset fork/main --hard
# Rebase main on upstream/main. This means any commit made on the fork will be reapplied on top of
# the new upstream/main commits.
$ git rebase upstream/main
# Force push our rebased main to the fork.
# --force-with-lease makes sure we don't accidentally overwrite something that was committed while
# we were working.
$ git push fork main --force-with-lease

@rushidave
Collaborator Author

rushidave commented Feb 28, 2024

Would love to get another set of eyes on this before pushing. Also, we have a backup branch: rushidave-main-backup-feb-2024.

git log: main is 8 commits ahead of upstream.
commit c662d941b13f11620f60fc48580f2f6643051ca6 (HEAD -> main)
Author: Jonas Keeling <jonas.keeling@aiven.io>
Date:   Mon Feb 26 14:36:42 2024 +0100

    bugfix: catch KeyError in check_version and retry

commit 1a6bf016fe5c47c597afa081c6e103ca0d283d14
Author: Jonas Keeling <jonas.keeling@gmail.com>
Date:   Mon Feb 26 10:29:17 2024 +0100

    Revert "bugfix: raise error in check_version if broker is unavailable"

commit 919f61da9efdfe1cf760b94585d72b16d414cdda
Author: Jonas Keeling <jonas.keeling@aiven.io>
Date:   Thu Feb 22 23:00:34 2024 +0100

    bugfix: raise error in check_version if broker is unavailable
    
    This fixes an issue in check_version where a KeyError is raised if the broker is unavailable or an invalid node_id is used. Instead, it will return BrokerNotAvailableError.

commit 4ca2f45994ff71da76211ca28b8978fbc2278b17
Author: Augusto F. Hack <hack.augusto@gmail.com>
Date:   Tue Jan 5 13:20:38 2021 +0100

    bugfix: race among _connecting and cluster metadata (#2189)
    
    A call to `maybe_connect` can be performed while the cluster metadata is
    being updated. If that happens, the assumption that every entry in
    `_connecting` has metadata won't hold. The existing assert will then
    raise on every subsequent call to `poll`, rendering the client instance
    unusable.

    This fixes the issue by ignoring connection requests to nodes that no
    longer have metadata available.

commit 6985761d2f981ace1943e728a33781afc454552c
Author: Augusto F. Hack <hack.augusto@gmail.com>
Date:   Thu Jan 14 17:50:28 2021 +0100

    bugfix: infinite loop when send msgs to controller (#2194)
    
    If the value `_controller_id` is out-of-date and the node was removed
    from the cluster, `_send_request_to_node` would enter an infinite loop.

commit a927ff28a2d78fdf45f8892ad7b67c66ce6276fc
Author: Augusto F. Hack <hack.augusto@gmail.com>
Date:   Thu Jan 14 15:18:05 2021 +0100

    bugfix: fix infinite loop on KafkaAdminClient (#2194)
    
    An infinite loop may happen with the following pattern:
    
        self._send_request_to_node(self._client.least_loaded_node(), request)
    
    The problem happens when `self._client`'s cluster metadata is out-of-date, and the
    result of `least_loaded_node()` is a node that has been removed from the cluster but
    the client is unaware of it. When this happens, `_send_request_to_node` will wait
    forever for the chosen node to become available, which won't happen, resulting in an
    infinite loop.
    
    This commit introduces a new method named `_send_request_to_least_loaded_node` which
    handles the case above. This is done by regularly checking if the target node is
    available in the cluster metadata, and if not, a new node is chosen.
    
    Notes:
    
    - This does not yet cover every call site to `_send_request_to_node`; there are some
      other places where similar race conditions may happen.
    - The code above does not guarantee that the request itself will be successful, since
      it is still possible for the target node to exit; however, it does remove the
      infinite loop which can render client code unusable.

commit 6efff5251b887306f46c1aeff8984d6292390378
Author: Heikki Nousiainen <htn@aiven.io>
Date:   Tue Sep 29 21:45:27 2020 +0300

    Support connecting through SOCKS5 proxies (#2169)
    
    Implement support for SOCKS5 proxies. Implement a new proxy wrapper
    that handles SOCKS5 connection, authentication and requesting
    connections to the actual Kafka broker endpoint.
    
    The proxy can be configured via a new keyword argument `socks5_proxy`
    to consumers, producers or admin client. The value is a URL with an optional
    username and password, e.g.
    `socks5://user:secret@proxy.example.com:10800`

    The implementation is done as a state machine that emulates repeated
    calls to connect_ex. The rationale for this design is to require minimal
    changes to the actual BrokerConnection object.

commit ce7d853198b4207fa52d6bbf4ef0f5426ae102bf
Author: Orange Kao <orange@aiven.io>
Date:   Tue Aug 4 01:42:47 2020 +0000

    Avoid 100% CPU usage while socket is closed (sleep)
    
    After a stop/start of the Kafka service, kafka-python may use 100% CPU caused by
    busy-retrying while the socket is closed. This fixes the issue by sleeping 0.1
    seconds if the fd is negative.

commit b68f61d49556377bf111bebb82f8f2bd360cc6f7 (upstream/master)
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Wed Dec 13 21:11:04 2023 -0500

    Bump actions/setup-java from 3 to 4 (#2417)
    
    Bumps [actions/setup-java](https://github.com/actions/setup-java) from 3 to 4.
    - [Release notes](https://github.com/actions/setup-java/releases)
    - [Commits](https://github.com/actions/setup-java/compare/v3...v4)
    
    ---
    updated-dependencies:
    - dependency-name: actions/setup-java
      dependency-type: direct:production
      update-type: version-update:semver-major
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
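
Two of the fixes in the log above may be easier to review with a sketch. First, the branch-tip change (c662d94) catches the KeyError that check_version hits when the probed broker drops out of the cluster metadata and retries. Roughly, and only as an illustration built on the public `KafkaClient.check_version` call rather than the actual patch:

    import time
    from kafka.errors import NoBrokersAvailable

    def check_version_with_retry(client, node_id=None, timeout=10.0, backoff=0.5):
        """Illustrative wrapper: retry with a fresh node if the probed node
        disappears from cluster metadata and the lookup raises KeyError."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                return client.check_version(node_id=node_id)
            except KeyError:
                node_id = None          # let the client pick another node
                time.sleep(backoff)     # back off before retrying
        raise NoBrokersAvailable()      # illustrative: give up after the deadline

Second, the `_send_request_to_least_loaded_node` fix (a927ff2) re-picks a node whenever the previous choice disappears from cluster metadata; again a rough sketch using public `KafkaClient` methods, not the private method itself:

    def send_to_least_loaded_node(client, request, poll_ms=100):
        """Re-pick the least-loaded node whenever the previous choice drops
        out of the cluster metadata, instead of waiting on it forever."""
        while True:
            node_id = client.least_loaded_node()
            if node_id is None or client.cluster.broker_metadata(node_id) is None:
                client.poll(timeout_ms=poll_ms)   # refresh metadata, then retry
                continue
            if client.ready(node_id):
                return client.send(node_id, request)
            client.poll(timeout_ms=poll_ms)       # wait for the connection
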

@aiven-anton

aiven-anton commented Feb 29, 2024

@rushidave I've verified:

I would consider omitting 1a6bf016fe5c47c597afa081c6e103ca0d283d14 and 919f61da9efdfe1cf760b94585d72b16d414cdda as one reverts the other, but no strong opinion either way. This LGTM.

@aiven-anton

@rushidave From your comment here.

Discussion here to remove branch protection

Yes, I think you can ask in access management if you can get admin permissions for this repository.
