Nats SuperCluster with Gateways Adding Jetstream #5317

wilstoff · 2024-04-16T23:30:17Z


Hello again, we currently have a 3-data-center, 9-node NATS super cluster connected via gateways, as well as a 3-node STAN cluster in each data center (with some custom, eventually consistent topic replication across data centers). We're looking to upgrade to JetStream, and I have some performance questions before we design the cluster configuration.

If we were to add an intra-datacenter-only JetStream cluster configuration on each of our current NATS nodes, which are connected via gateways, how does replication or Raft consensus work between data centers, if at all? Would JetStream's Raft consensus need to go across data centers to achieve consensus? Or could you even subscribe to stream A in datacenter 1 and publish to stream A in datacenter 2, and have NATS know those are the same subjects? Or is it eventually consistent replication across the gateways? Ideally we're looking for eventually consistent replication, because we don't want to need consensus across data centers and take the performance hit when the majority of our streams are intra-datacenter.

I read the docs and found issue #3089, which seems to cover this, but I wasn't able to find exactly which mechanism it uses. If there are docs and I missed them, please just point me to them.

Below is our current NATS configuration for the 3 nodes in one data center; the other 2 data centers use the same kind of configuration:

```
listen: 10.10.151.212:4222

logtime: true
log_file: "/var/log/nats/nats1.log"
http_port: 8222
debug: False
trace: False
max_pending: 268435456
max_payload: 10000000
write_deadline: 5s
cluster: {
  listen: 10.10.151.212:5222
  routes: [nats-route://chivmprdnats001:5222, nats-route://chivmprdnats002:5222, nats-route://chivmprdnats003:5222]
}
gateway: {
  name: riv
  listen: 10.10.151.212:7222
  reject_unknown: true
  gateways: [{'name': 'riv', 'urls': ['nats://chivmprdnats001:7222', 'nats://chivmprdnats002:7222', 'nats://chivmprdnats003:7222']}, {'name': 'eqx', 'urls': ['nats://eqxvmprdnats001:7222', 'nats://eqxvmprdnats002:7222', 'nats://eqxvmprdnats003:7222']}, {'name': 'ny4', 'urls': ['nats://ny4vmprdnats001:7222', 'nats://ny4vmprdnats002:7222', 'nats://ny4vmprdnats003:7222']}]

}
```

```
listen: 10.10.151.213:4222

logtime: true
log_file: "/var/log/nats/nats2.log"
http_port: 8222
debug: False
trace: False
max_pending: 268435456
max_payload: 10000000
write_deadline: 5s
cluster: {
  listen: 10.10.151.213:5222
  routes: [nats-route://chivmprdnats001:5222, nats-route://chivmprdnats002:5222, nats-route://chivmprdnats003:5222]
}
gateway: {
  name: riv
  listen: 10.10.151.213:7222
  reject_unknown: true
  gateways: [{'name': 'riv', 'urls': ['nats://chivmprdnats001:7222', 'nats://chivmprdnats002:7222', 'nats://chivmprdnats003:7222']}, {'name': 'eqx', 'urls': ['nats://eqxvmprdnats001:7222', 'nats://eqxvmprdnats002:7222', 'nats://eqxvmprdnats003:7222']}, {'name': 'ny4', 'urls': ['nats://ny4vmprdnats001:7222', 'nats://ny4vmprdnats002:7222', 'nats://ny4vmprdnats003:7222']}]

}
```

```
listen: 10.10.151.214:4222

logtime: true
log_file: "/var/log/nats/nats3.log"
http_port: 8222
debug: False
trace: False
max_pending: 268435456
max_payload: 10000000
write_deadline: 5s
cluster: {
  listen: 10.10.151.214:5222
  routes: [nats-route://chivmprdnats001:5222, nats-route://chivmprdnats002:5222, nats-route://chivmprdnats003:5222]
}
gateway: {
  name: riv
  listen: 10.10.151.214:7222
  reject_unknown: true
  gateways: [{'name': 'riv', 'urls': ['nats://chivmprdnats001:7222', 'nats://chivmprdnats002:7222', 'nats://chivmprdnats003:7222']}, {'name': 'eqx', 'urls': ['nats://eqxvmprdnats001:7222', 'nats://eqxvmprdnats002:7222', 'nats://eqxvmprdnats003:7222']}, {'name': 'ny4', 'urls': ['nats://ny4vmprdnats001:7222', 'nats://ny4vmprdnats002:7222', 'nats://ny4vmprdnats003:7222']}]

}
```
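Before enabling JetStream on this layout, two config invariants are worth checking. As far as I know, clustered JetStream requires every server to set a `server_name` that is unique across the whole super cluster, and each `cluster` name should match that data center's `gateway` name (nats-server takes the cluster name from the gateway name when only the gateway one is set, and rejects a conflicting pair). Because the three data centers reuse the same layout (e.g. `nats1.log` in each), it is easy to reuse `server_name` values too. The Go sketch below is a hypothetical pre-flight check over a made-up `nodeConf` type; it is not a nats-server API and only checks these two rules.

```go
package main

import "fmt"

// nodeConf is a hypothetical, simplified view of one node's config.
// It is NOT a nats-server type; it only holds what this sketch checks.
type nodeConf struct {
	ServerName  string // server_name
	ClusterName string // cluster { name: ... }; empty means "inherit from gateway"
	GatewayName string // gateway { name: ... }
}

// preflight reports problems found by two simple checks. It is not
// exhaustive: nats-server validates far more at startup.
func preflight(nodes []nodeConf) []string {
	var problems []string
	seen := make(map[string]bool)
	for _, n := range nodes {
		switch {
		case n.ServerName == "":
			problems = append(problems, "a node has no server_name")
		case seen[n.ServerName]:
			problems = append(problems, fmt.Sprintf("duplicate server_name %q", n.ServerName))
		}
		seen[n.ServerName] = true
		if n.ClusterName != "" && n.ClusterName != n.GatewayName {
			problems = append(problems, fmt.Sprintf("%s: cluster name %q does not match gateway name %q",
				n.ServerName, n.ClusterName, n.GatewayName))
		}
	}
	return problems
}

// superCluster builds the 3x3 layout; when qualify is true the server names
// are region-qualified ("riv-nats1"), otherwise they are reused ("nats1").
func superCluster(qualify bool) []nodeConf {
	var nodes []nodeConf
	for _, region := range []string{"riv", "eqx", "ny4"} {
		for i := 1; i <= 3; i++ {
			name := fmt.Sprintf("nats%d", i)
			if qualify {
				name = region + "-" + name
			}
			nodes = append(nodes, nodeConf{ServerName: name, ClusterName: region, GatewayName: region})
		}
	}
	return nodes
}

func main() {
	fmt.Println("region-qualified names:", preflight(superCluster(true)))
	for _, p := range preflight(superCluster(false)) {
		fmt.Println("reused names:", p)
	}
}
```

With region-qualified names the check passes; with names reused per data center it reports six duplicates (every eqx and ny4 node).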

wilstoff (Author) · 2024-04-17T03:21:20Z

After reading through more docs, I found that streams define the number of replicas they want and can also specify a particular cluster to be stored in. I found this interesting example: https://natsbyexample.com/examples/use-cases/cross-region-streams-supercluster/cli which seems to do something similar, but I'm curious what the goal of the cross-site cluster definition is. If I were to create a consumer on the stream EVENTS while connected to one of the regional clusters, would I still get data from publishes to the cross-site cluster because I'm in a super cluster? Does this cross-site cluster then need consensus across all 3 data centers/regions?

I guess I'm trying to decide whether I should design a specific cluster configuration for what I'd call local regional streams that (eventually) aggregate streams from other data centers, or whether I should define that at the stream level with some combination of the --source flag.

To put it another way: I want local consensus for publishes within one region/data center, with aggregated data from the other regions' streams with the same name (or a similar name with a helpful prefix). I'm OK with queue groups not working across regions, and with other side effects of the 3 clusters independently deciding the state of their local streams.


ripienaar (Collaborator) · 2024-04-17T06:36:31Z

There are a number of Raft groups:

- One that manages the super cluster. All nodes belong to it, and it coordinates asset placement, resource availability, etc.
- Streams, as you point out, live in one cluster only, so there are per-stream and per-consumer Raft groups to replicate the data.

So the bulk of the replication happens inside your data center and needs roughly LAN latencies. You can still access a stream from any data center.
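To make that split concrete, here is a toy Go model of the two kinds of Raft group described above (server names like `riv-1` are hypothetical). The meta group contains every server and therefore spans all three data centers, while an R3 stream placed in one cluster draws all its voters from that cluster, so committing a write only needs acks from inside that data center. The peer selection is deliberately naive; real placement also considers tags, available resources and existing load.

```go
package main

import (
	"fmt"
	"sort"
)

// server is a JetStream-enabled node and the cluster (data center) it lives in.
type server struct {
	Name, Cluster string
}

// quorum is the Raft majority for a group of n voters.
func quorum(n int) int { return n/2 + 1 }

// streamPeers picks the replicas for a stream placed in one cluster.
// Simplification: it takes the first matching servers instead of running
// the real placement logic.
func streamPeers(all []server, cluster string, replicas int) ([]server, error) {
	var peers []server
	for _, s := range all {
		if s.Cluster == cluster {
			peers = append(peers, s)
			if len(peers) == replicas {
				return peers, nil
			}
		}
	}
	return nil, fmt.Errorf("cluster %q has fewer than %d servers", cluster, replicas)
}

// clusters lists, sorted, the distinct clusters a Raft group spans.
func clusters(group []server) []string {
	set := map[string]bool{}
	for _, s := range group {
		set[s.Cluster] = true
	}
	var out []string
	for c := range set {
		out = append(out, c)
	}
	sort.Strings(out)
	return out
}

func main() {
	var all []server
	for _, c := range []string{"riv", "eqx", "ny4"} {
		for i := 1; i <= 3; i++ {
			all = append(all, server{Name: fmt.Sprintf("%s-%d", c, i), Cluster: c})
		}
	}
	// The meta group contains every server, so it spans all three data centers.
	fmt.Println("meta group:", len(all), "voters, quorum", quorum(len(all)), "spans", clusters(all))

	// An R3 stream placed in riv only replicates inside riv.
	peers, err := streamPeers(all, "riv", 3)
	if err != nil {
		panic(err)
	}
	fmt.Println("stream group:", len(peers), "voters, quorum", quorum(len(peers)), "spans", clusters(peers))
}
```

For the 3x3 layout in this thread it reports a 9-voter meta group (quorum 5) spanning all three data centers and a 3-voter stream group (quorum 2) confined to `riv`.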

The problem is what to do when the data center holding the stream is unreachable. There are a number of strategies around mirrors, sources, etc., but those are read-only in the event of a split or if the cluster holding the main stream loses quorum. One can even do circular replication, which can maintain reads and writes during outages.

Another option is to create what looks like a super cluster but is in reality a normal cluster, just with long-latency connections (at most around 100 to 150 ms), and use that for streams that should be available everywhere all the time. This is the "cross region" example you found: a very advanced use case, and to be used as a last resort imo.

Each strategy has a long list of pros and cons, so give careful consideration to how and when you wish to access data in other clusters and how failures manifest.

7 replies

wilstoff Apr 17, 2024
Author

Awesome! Any example of a super cluster config with circular replication would help. Whether it's the stream config or the server config was the big question I had. Follow-up question: if we end up with 3 clusters connected via gateways, with 3 nodes each, is the meta consensus group all 9 nodes? Can a node going down in data center eqx impact local-only message publishes, requests, or new connections in ny4?

ripienaar Apr 18, 2024
Collaborator

All 9 nodes belong to the meta group. It needs a Raft majority (5 of 9) to function, so you can lose up to 4 nodes before things start breaking.

When the meta group is unhealthy (no quorum), iirc this is the situation:

- Publishing to existing streams works
- Consuming from existing consumers works
- Can’t make new streams
- Can’t make new consumers
- Can’t do many operational things, like moving streams
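The quorum rule and the list above can be expressed as a small Go sketch. It assumes standard Raft majority arithmetic for the meta group and encodes the recalled behaviour as a coarse, hypothetical classification of operations. It also assumes each stream's and consumer's own Raft group is healthy, since those groups are separate from the meta group. On the eqx/ny4 question: core NATS publishes, requests and client connections do not go through Raft at all, so a single eqx node going down (leaving 8 of 9 meta voters) should not block local traffic in ny4.

```go
package main

import "fmt"

// metaHasQuorum reports whether a meta group of `total` servers still has a
// Raft majority while `down` servers are unreachable.
func metaHasQuorum(total, down int) bool {
	return total-down >= total/2+1
}

// operation is a coarse, hypothetical classification of JetStream actions.
type operation string

const (
	publishExisting operation = "publish to existing stream"
	consumeExisting operation = "consume from existing consumer"
	createStream    operation = "create stream"
	createConsumer  operation = "create consumer"
	moveStream      operation = "move stream"
)

// allowed encodes the recalled behaviour: data-path operations on existing
// assets keep working (assuming their own Raft groups are healthy), while
// anything that changes metadata needs the meta group.
func allowed(op operation, metaOK bool) bool {
	switch op {
	case publishExisting, consumeExisting:
		return true
	default:
		return metaOK
	}
}

func main() {
	for _, down := range []int{1, 3, 4, 5} {
		ok := metaHasQuorum(9, down)
		fmt.Printf("%d of 9 down: meta quorum=%v, create stream allowed=%v, publish allowed=%v\n",
			down, ok, allowed(createStream, ok), allowed(publishExisting, ok))
	}
}
```

With 9 servers the meta group keeps quorum with up to 4 down (so losing all of eqx is survivable) and loses it at 5, at which point only the data-path operations remain.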

wilstoff May 9, 2024
Author

Does anyone have that explicit example of circularly replicated streams? I just want to make sure I'm not wasting my time on a configuration that won't do what I want. Thanks.

MauriceVanVeen May 9, 2024
Collaborator

There was a nice blog post some time ago; not sure if you've seen it already?

It has example configs for circularly replicated streams as well.

https://www.synadia.com/blog/multi-cluster-consistency-models

wilstoff May 9, 2024
Author

Thank you very much, this is exactly what I needed.
