Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PYTHON-4476 Separate data and IO classes more effectively #1678

Merged
merged 16 commits into from
Jun 26, 2024

Conversation

NoahStapp
Copy link
Contributor

@NoahStapp NoahStapp commented Jun 12, 2024

This PR refactors a significant amount of our codebase to reduce code duplication and more clearly separate network IO from data handling and business logic. There's more to be done, but breaking up future changes into smaller, more manageable PRs is a much better and more maintainable strategy. The changes here are largely circular in nature with import and dependency graphs, making it difficult to separate them out into multiple PRs.

Core changes:

  • Moved client_options.py, collation.py, common.py, compression_support.py, encryption_options.py, event_loggers.py, hello.py, logger.py, max_staleness_selectors.py, message.py, monitoring.py, operations.py, read_preferences.py, response.py, server_description.py, server_selectors.py, srv_resolver.py, topology_description.py, uri_parser.py out of asynchronous and into the top-level package. This significantly reduces the amount of code duplication done by synchronization, and should result in only files with actual async def methods staying inside asynchronous.
  • Separated PoolOptions and most of the helpers and constants out from pool.py into their own shared pool_options.py file.
  • Moved many of the helpers, constants, and data classes of the remaining asynchronous files into NAME_shared.py files inside the top-level package.
  • Removed all network IO from message.py. _Query, _GetMore, _BulkWriteContext, _EncryptedBulkWriteContext are all now pure data classes that rely on their users (bulk.py and server.py) to perform network IO.

The coloring nature of asynchronous code encourages very clear separation of network IO from the rest of the codebase. Going forward, I recommend we try to avoid bundling IO inside classes in order to reduce code duplication in the synchronization process.

Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+8,955 −19,746

nice. Just one quick comment. I'll need to spend more time to review this in the next day or two.

gridfs/synchronous/grid_file.py Outdated Show resolved Hide resolved
test/utils_selection_tests.py Outdated Show resolved Hide resolved
@@ -616,7 +616,7 @@ def get_lsid_for_session(self, session_name):
session = self[session_name]
if not isinstance(session, ClientSession):
self.test.fail(
f"Expected entity {session_name} to be of type ClientSession, got {type(session)}"
f"Expected entity {session_name} to be of type AsyncClientSession, got {type(session)}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should be ClientSession, right?

@@ -39,13 +39,13 @@
wait_until,
)

from bson import encode
from bson import RawBSONDocument, encode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this file ever get run? Seems like a duplicate of test/test_collection.py

Copy link
Contributor Author

@NoahStapp NoahStapp Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is generated from test/asynchronous/test_collection.py, left as an example result for our future test asynchronization work

@@ -40,21 +40,17 @@
wait_until,
)

from bson import encode
from bson import RawBSONDocument, encode
Copy link
Member

@ShaneHarvey ShaneHarvey Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does test/asynchronous/ ever get run in CI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, left in as an example for our future test asynchronization work

pymongo/message.py Outdated Show resolved Hide resolved
pymongo/message.py Outdated Show resolved Hide resolved
pymongo/message.py Outdated Show resolved Hide resolved
@@ -1,1903 +0,0 @@
# Copyright 2015-present MongoDB, Inc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come this file doesn't show up as a rename like message.py does? Any way we can have git recognize it to minimize code churn?

I have the same question for many of the other files as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it's just git being confused. Let me do some commit reworking, I think I can significantly improve it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong sadly, there isn't a good way to do this. Because we are essentially removing a lot of the asynchronous and synchronous files and putting a single version into the top-level pymongo package, it complicates the history a lot for git. Git doesn't track renames/moves really, just the changes, so a "rename" just means it had few enough changes along with the location move to qualify.

@@ -121,26 +135,26 @@ def run_operation(
cursors.
Can raise ConnectionFailure, OperationFailure, etc.

:param conn: A Connection instance.
:param conn: An AsyncConnection instance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"An AsyncConnection" is incorrect here.

:param operation: A _Query or _GetMore object.
:param read_preference: The read preference to use.
:param listeners: Instance of _EventListeners or None.
:param unpack_res: A callable that decodes the wire protocol response.
:param client: An AsyncMongoClient instance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AsyncMongoClient is incorrect here.

@@ -180,7 +91,7 @@ def __init__(
comment: Optional[str] = None,
let: Optional[Any] = None,
) -> None:
"""Initialize a _Bulk instance."""
"""Initialize a _AsyncBulk instance."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_Bulk -> _AsyncBulk

@@ -407,7 +509,23 @@ async def _execute_command(
if self.ordered and "writeErrors" in result:
break
else:
to_send = await bwc.execute_unack(cmd, ops, client)
if self.is_encrypted:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be refactored into a helper method? Something like:

                    result, to_send = await self._execute_batch_unack(cmd, ops, client)

We can also use the new helper in execute_op_msg_no_results down below.

@@ -387,7 +475,21 @@ async def _execute_command(

# Run as many ops as possible in one command.
if write_concern.acknowledged:
result, to_send = await bwc.execute(cmd, ops, client)
if self.is_encrypted:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be refactored into a helper method? Something like:

                    result, to_send = await self._execute_batch(cmd, ops, client)

result = await self.write_command(
bwc, cmd, request_id, msg, to_send, client
)
await client._process_response(result, bwc.session)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't client._process_response be called in the encrypted case as well?

Copy link
Contributor

@Jibola Jibola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great so far! Got as far as all of the non-test files.

I decided I'd break it off and take the tests in a separate review!

Comment on lines +1507 to +1515
class _CursorAddress(tuple):
"""The server address (host, port) of a cursor, with namespace property."""

__namespace: Any

def __new__(cls, address: _Address, namespace: str) -> _CursorAddress:
self = tuple.__new__(cls, address)
self.__namespace = namespace
return self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope for this PR, but why is this a __new__. call instead of an __init__?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, maybe something to do with tuple?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's a tuple thing.

if TYPE_CHECKING:
from pymongo.pyopenssl_context import SSLContext

_IS_SYNC = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is _IS_SYNC needed for a file like this?

Also, I think the file is synchronous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Those are all artifacts from moving these files out of pymongo/asynchronous, fixed.

s.min_wire_version,
common.MAX_SUPPORTED_WIRE_VERSION,
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you know if this if statements also needs a break?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes a difference in behavior, as if the if block is executed the elif block won't be.

@NoahStapp NoahStapp requested a review from Jibola June 25, 2024 22:08
Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, not a fan of all the type ignores but I can't think of a better option right now. Nice work!

Copy link
Contributor

@Jibola Jibola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last comment!

@@ -1,4 +1,4 @@
# Copyright 2024-present MongoDB, Inc.
# Copyright 2016-present MongoDB, Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Copyright 2016-present MongoDB, Inc.
# Copyright 2016-present MongoDB, Inc.

What's the criteria for the year changes? I see some read 2014, 2020, etc. How are we determining what year to set per-file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should reflect the year the original, pre-async file was committed.

@NoahStapp NoahStapp requested a review from Jibola June 26, 2024 17:03
Copy link
Contributor

@Jibola Jibola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@NoahStapp NoahStapp merged commit ffa6555 into mongodb:master Jun 26, 2024
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants