-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PYTHON-4476 Separate data and IO classes more effectively #1678
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+8,955 −19,746
nice. Just one quick comment. I'll need to spend more time to review this in the next day or two.
test/unified_format.py
Outdated
@@ -616,7 +616,7 @@ def get_lsid_for_session(self, session_name): | |||
session = self[session_name] | |||
if not isinstance(session, ClientSession): | |||
self.test.fail( | |||
f"Expected entity {session_name} to be of type ClientSession, got {type(session)}" | |||
f"Expected entity {session_name} to be of type AsyncClientSession, got {type(session)}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one should be ClientSession, right?
test/synchronous/test_collection.py
Outdated
@@ -39,13 +39,13 @@ | |||
wait_until, | |||
) | |||
|
|||
from bson import encode | |||
from bson import RawBSONDocument, encode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this file ever get run? Seems like a duplicate of test/test_collection.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is generated from test/asynchronous/test_collection.py
, left as an example result for our future test asynchronization work
test/asynchronous/test_collection.py
Outdated
@@ -40,21 +40,17 @@ | |||
wait_until, | |||
) | |||
|
|||
from bson import encode | |||
from bson import RawBSONDocument, encode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does test/asynchronous/ ever get run in CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, left in as an example for our future test asynchronization work
@@ -1,1903 +0,0 @@ | |||
# Copyright 2015-present MongoDB, Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come this file doesn't show up as a rename like message.py does? Any way we can have git recognize it to minimize code churn?
I have the same question for many of the other files as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like it's just git being confused. Let me do some commit reworking, I think I can significantly improve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wrong sadly, there isn't a good way to do this. Because we are essentially removing a lot of the asynchronous
and synchronous
files and putting a single version into the top-level pymongo
package, it complicates the history a lot for git. Git doesn't track renames/moves really, just the changes, so a "rename" just means it had few enough changes along with the location move to qualify.
pymongo/synchronous/server.py
Outdated
@@ -121,26 +135,26 @@ def run_operation( | |||
cursors. | |||
Can raise ConnectionFailure, OperationFailure, etc. | |||
|
|||
:param conn: A Connection instance. | |||
:param conn: An AsyncConnection instance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"An AsyncConnection" is incorrect here.
pymongo/synchronous/server.py
Outdated
:param operation: A _Query or _GetMore object. | ||
:param read_preference: The read preference to use. | ||
:param listeners: Instance of _EventListeners or None. | ||
:param unpack_res: A callable that decodes the wire protocol response. | ||
:param client: An AsyncMongoClient instance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AsyncMongoClient is incorrect here.
pymongo/synchronous/bulk.py
Outdated
@@ -180,7 +91,7 @@ def __init__( | |||
comment: Optional[str] = None, | |||
let: Optional[Any] = None, | |||
) -> None: | |||
"""Initialize a _Bulk instance.""" | |||
"""Initialize a _AsyncBulk instance.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_Bulk -> _AsyncBulk
pymongo/asynchronous/bulk.py
Outdated
@@ -407,7 +509,23 @@ async def _execute_command( | |||
if self.ordered and "writeErrors" in result: | |||
break | |||
else: | |||
to_send = await bwc.execute_unack(cmd, ops, client) | |||
if self.is_encrypted: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be refactored into a helper method? Something like:
result, to_send = await self._execute_batch_unack(cmd, ops, client)
We can also use the new helper in execute_op_msg_no_results down below.
pymongo/asynchronous/bulk.py
Outdated
@@ -387,7 +475,21 @@ async def _execute_command( | |||
|
|||
# Run as many ops as possible in one command. | |||
if write_concern.acknowledged: | |||
result, to_send = await bwc.execute(cmd, ops, client) | |||
if self.is_encrypted: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be refactored into a helper method? Something like:
result, to_send = await self._execute_batch(cmd, ops, client)
pymongo/asynchronous/bulk.py
Outdated
result = await self.write_command( | ||
bwc, cmd, request_id, msg, to_send, client | ||
) | ||
await client._process_response(result, bwc.session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't client._process_response be called in the encrypted case as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great so far! Got as far as all of the non-test files.
I decided I'd break it off and take the tests in a separate review!
class _CursorAddress(tuple): | ||
"""The server address (host, port) of a cursor, with namespace property.""" | ||
|
||
__namespace: Any | ||
|
||
def __new__(cls, address: _Address, namespace: str) -> _CursorAddress: | ||
self = tuple.__new__(cls, address) | ||
self.__namespace = namespace | ||
return self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of scope for this PR, but why is this a __new__
. call instead of an __init__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, maybe something to do with tuple
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's a tuple thing.
pymongo/uri_parser.py
Outdated
if TYPE_CHECKING: | ||
from pymongo.pyopenssl_context import SSLContext | ||
|
||
_IS_SYNC = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is _IS_SYNC needed for a file like this?
Also, I think the file is synchronous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Those are all artifacts from moving these files out of pymongo/asynchronous
, fixed.
s.min_wire_version, | ||
common.MAX_SUPPORTED_WIRE_VERSION, | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you know if this if statements also needs a break?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes a difference in behavior, as if the if
block is executed the elif
block won't be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, not a fan of all the type ignores but I can't think of a better option right now. Nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last comment!
@@ -1,4 +1,4 @@ | |||
# Copyright 2024-present MongoDB, Inc. | |||
# Copyright 2016-present MongoDB, Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Copyright 2016-present MongoDB, Inc. | |
# Copyright 2016-present MongoDB, Inc. |
What's the criteria for the year changes? I see some read 2014, 2020, etc. How are we determining what year to set per-file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They should reflect the year the original, pre-async file was committed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR refactors a significant amount of our codebase to reduce code duplication and more clearly separate network IO from data handling and business logic. There's more to be done, but breaking up future changes into smaller, more manageable PRs is a much better and more maintainable strategy. The changes here are largely circular in nature with import and dependency graphs, making it difficult to separate them out into multiple PRs.
Core changes:
client_options.py, collation.py, common.py, compression_support.py, encryption_options.py, event_loggers.py, hello.py, logger.py, max_staleness_selectors.py, message.py, monitoring.py, operations.py, read_preferences.py, response.py, server_description.py, server_selectors.py, srv_resolver.py, topology_description.py, uri_parser.py
out ofasynchronous
and into the top-level package. This significantly reduces the amount of code duplication done by synchronization, and should result in only files with actualasync def
methods staying insideasynchronous
.PoolOptions
and most of the helpers and constants out frompool.py
into their own sharedpool_options.py
file.asynchronous
files intoNAME_shared.py
files inside the top-level package.message.py
._Query, _GetMore, _BulkWriteContext, _EncryptedBulkWriteContext
are all now pure data classes that rely on their users (bulk.py
andserver.py
) to perform network IO.The coloring nature of asynchronous code encourages very clear separation of network IO from the rest of the codebase. Going forward, I recommend we try to avoid bundling IO inside classes in order to reduce code duplication in the synchronization process.