Client side routing implementation #205
Hey, thanks for this PR, this helps a lot! I went over the changes and don't have anything to add, but I also didn't look in detail at the actual routing protocol and algorithms. I'm OK with merging something before dealing with all the limitations. In particular, I think it's fine to start without support for the routing context or bookmarks, with only partial 4.3 support, and with only one routing strategy. We don't need to have everything figured out upfront and can do this in steps across multiple PRs.
I've been thinking about the testing a bit (or a lot) lately; I don't like how many integration tests we have compared to what the unit tests cover. Especially when testing Bolt protocol details of a non-happy path, it can be either impossible or very difficult to set up. I have half an idea to rewrite the core parts with the connection IO and pooling abstracted out. I would like to be able to do more simulation testing, where exact IO operations can be simulated in the tests without having to set up an actual db. I'd also like to move async out from the core to some outer layer. Maybe this goes in the sans-io direction, maybe it'll look a bit different.
Thanks for the reply @knutwalker, I will remove the implementation of the client-side routing for the stable protocol and leave it only when the |
great!
There shouldn't be a need for any scheme. The first connection uses the scheme to determine the initial routing and the encryption to use. When connecting to a server from the routing table, you use the same encryption settings (and, when supported, routing context) and connect directly to that socket address, without going through the bolt/neo4j scheme parsing again. That is what the product drivers do, at least. It looks like this requires some changes to the pool creation, essentially using the same ConnectionInfo with only the host/port replaced.
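To make the "same ConnectionInfo, only the host/port replaced" idea concrete, here is a minimal sketch. The struct below is a hypothetical, simplified stand-in and not the driver's actual `ConnectionInfo`; the field names and the `with_address` helper are assumptions made for illustration only.

```rust
// Hypothetical, simplified stand-in for the driver's connection settings.
// The real ConnectionInfo in the crate has different fields.
#[derive(Clone, Debug, PartialEq)]
struct ConnectionInfo {
    host: String,
    port: u16,
    user: String,
    // Decided once, from the scheme of the initial URI.
    encrypted: bool,
}

impl ConnectionInfo {
    /// Returns a copy that targets `address` ("host:port") and keeps every
    /// other setting from the initial connection. No scheme is parsed here.
    fn with_address(&self, address: &str) -> Option<ConnectionInfo> {
        // Split on the last ':' so the port is always the final component.
        let (host, port) = address.rsplit_once(':')?;
        let port = port.parse::<u16>().ok()?;
        Some(ConnectionInfo {
            host: host.to_string(),
            port,
            ..self.clone()
        })
    }
}
```

With something like this, each address returned in the routing table could be turned into pool settings via `initial.with_address("core-2.example:7688")`, so the encryption choice made for the first connection carries over unchanged.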
Yes, sorry... I realized that too.
@knutwalker please take a look at my last commit where I had to modify the main Graph interface. I had essentially to expose the |
@knutwalker I have finished my work on this first implementation of client-side routing. I tested it on a simple local cluster and I am now running it on a k8s cluster, where it seems to work properly. I still don't have any data on the performance impact of the locking mechanism on the routing table that the driver holds after initialization.
The 4.4 IT failure could be related to #209
This is very good, thanks for adding all this. I do have a few comments I'd like to see resolved before merging, but none of them are major (I hope), and some are more ideas and thoughts than suggestions.
- Use round-robin balancing
- Store routing table in a DashMap to allow fast concurrency
- Perform the update of the routing table using a simple mutex on the TTL of the table itself
- read/write mode is propagated into the RUN command
- Bolt scheme in pools to allow direct connections with no routing to provided servers
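As an illustration of the round-robin balancing item, here is a minimal sketch using only the standard library. It is not the code from this PR: the `RoundRobin` type and its `pick` method are hypothetical names, and the PR keeps its servers in a DashMap rather than a `Vec`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical round-robin picker over a fixed list of server addresses.
struct RoundRobin {
    servers: Vec<String>,
    next: AtomicUsize,
}

impl RoundRobin {
    fn new(servers: Vec<String>) -> Self {
        RoundRobin {
            servers,
            next: AtomicUsize::new(0),
        }
    }

    /// Returns the next server in rotation, or `None` if the list is empty.
    fn pick(&self) -> Option<&str> {
        if self.servers.is_empty() {
            return None;
        }
        // fetch_add wraps on overflow, and the modulo keeps the index in bounds.
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.servers.len();
        Some(self.servers[i].as_str())
    }
}
```

An atomic counter lets concurrent callers share the picker through `&self` without taking a lock; the rotation is only approximately fair under contention, which is usually acceptable for load balancing.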
You said you are already planning a follow up for the update in the background, so we can also leave it as is until then.
…On Thu, Jan 9, 2025, at 5:23 PM, Pierpaolo Follia wrote:
In lib/src/routing/connection_registry.rs <#205 (comment)>:
> + registry.retain(|k, _| servers.contains(k));
+ let _ = self
+ .ttl
+ .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |_ttl| {
+ Some(routing_table.ttl)
+ })
+ .unwrap();
+ debug!(
+ "Registry updated. New size is {} with TTL {}s",
+ registry.len(),
+ routing_table.ttl
+ );
+ *guard = now;
+ }
+ } else {
+ debug!("Routing table is not expired");
Blocking the connection is going to slow down all the connections that arrive during the fetch and update process. I used a try_lock to reduce this slowdown, even though I understand it's a minimal improvement. The solution that uses a separate thread should actually solve this issue.
I can change this to a lock and see how much it affects performance in a real environment.
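The trade-off discussed here can be sketched with the standard library alone. This is a simplified illustration, not the code in `connection_registry.rs`: `TableClock`, `refresh_if_expired` and the `refresh` callback are hypothetical names, and the callback stands in for sending ROUTE and rebuilding the registry.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical registry state: when the table was last fetched and its TTL.
struct TableClock {
    fetched_at: Mutex<Instant>,
    ttl_secs: AtomicU64,
}

impl TableClock {
    /// Runs `refresh` if the table has expired and no other caller is already
    /// refreshing it. Returns true only when this call performed the refresh.
    /// `refresh` returns the TTL (in seconds) of the newly fetched table.
    fn refresh_if_expired<F: FnOnce() -> u64>(&self, refresh: F) -> bool {
        // try_lock returns immediately; a caller that loses the race keeps
        // using the current table instead of waiting for the refresh.
        // (A poisoned mutex is also treated as "skip" in this sketch.)
        let Ok(mut fetched_at) = self.fetched_at.try_lock() else {
            return false;
        };
        let ttl = Duration::from_secs(self.ttl_secs.load(Ordering::Relaxed));
        if fetched_at.elapsed() < ttl {
            return false; // routing table is not expired
        }
        let new_ttl = refresh();
        self.ttl_secs.store(new_ttl, Ordering::Relaxed);
        *fetched_at = Instant::now();
        true
    }
}
```

Swapping `try_lock()` for `lock()` would make every concurrent caller wait until the refresh finishes, which guarantees they see the new table at the cost of the stall described above.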
- Lock the creation_time
Direct sounds good, I like it!
…On Thu, Jan 9, 2025, at 5:13 PM, Pierpaolo Follia wrote:
In lib/src/graph.rs <#205 (comment)>:
> };
+use backoff::{Error, ExponentialBackoff};
+use std::time::Duration;
+
+#[derive(Clone)]
+enum ConnectionPoolManager {
+ #[cfg(feature = "unstable-bolt-protocol-impl-v2")]
+ Routed(RoutedConnectionManager),
+ Normal(ConnectionPool),
What about `Direct`?
This is great, thanks!
The 4.4 test failures are persistent enough that they might actually be related to this change 🤔
lib/src/version.rs
    }

    impl Version {
        pub fn add_supported_versions(bytes: &mut BytesMut) {
            bytes.reserve(16);
            bytes.put_u32(0x0404); // V4_4
            bytes.put_u32(0x0304); // V4_3
            bytes.put_u32(0x0104); // V4_1
            bytes.put_u32(0x0004); // V4
            bytes.put_u32(0);
You need to remove the two put_u32(0) calls. The initial message only expects 4 versions. I guess that on 5.x the server drains any additional bytes until the next message, while on 4.4 those 0 values become part of the next message, effectively sending a few empty chunks.
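For reference, the Bolt handshake is a 4-byte magic preamble followed by exactly four 4-byte version slots, so the version list occupies 16 bytes and no more. The sketch below builds that message with the standard library only; the `handshake` function is a hypothetical helper, and it writes the preamble itself, whereas the function in this PR only appends the version list.

```rust
/// Bolt handshake preamble, sent before the version proposals.
const BOLT_MAGIC: [u8; 4] = [0x60, 0x60, 0xB0, 0x17];

/// Builds the 20-byte handshake: the magic preamble followed by exactly four
/// big-endian u32 version slots. Unused slots are sent as zero.
fn handshake(versions: [u32; 4]) -> Vec<u8> {
    let mut out = Vec::with_capacity(20);
    out.extend_from_slice(&BOLT_MAGIC);
    for v in versions {
        out.extend_from_slice(&v.to_be_bytes());
    }
    out
}
```

Taking a fixed-size `[u32; 4]` makes it impossible to append a fifth or sixth slot by accident, which is the class of bug described in the comment above.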
This PR is a draft implementation of client-side routing for Neo4j clusters.
The PR implements the ROUTE message for the unstable version of the protocol (v2).
- [ ] Bookmarks are not supported at the moment (the array is always sent empty)
- `imp_user` should be added to the transaction begin message
Additional Notes:
- Extra fields in the route message are not implemented yet (part of v4.4)
- Protocol version is now 4.4
- The `skip_field` method is now rendering a `null` value