Using Nats Jetstream as an event store #3772

cloudcompute · 2023-01-08T14:07:30Z

cloudcompute
Jan 8, 2023

a. Can we use NATS JetStream for Event Sourcing? Does it have the capability to store millions of events at a time without affecting the speed and performance of an application?

b. I read that we can archive the event logs to cold storage. But I have not come across an example that demonstrates how we can move the logs to, let's say, S3 storage. Is this "cold storage" feature as powerful as Pulsar's Tiered Storage?

c. Are there any applications that use it as an Event Store in production? If yes, any feedback?

Thank you all for making such great software and making it available to all of us.

Answered by bruth


bruth · 2023-01-08T19:48:07Z

bruth
Jan 8, 2023
Maintainer

Hi @cloudcompute. Answers in order:

a. The short answer is yes. A stream in NATS (JetStream) is very well suited for event sourcing since every event can be published/appended to a subject that represents an aggregate/entity/consistency boundary of your choosing. For example, take a stream called orders with the subject orders.* bound to it. All publishes/appends to orders.1, orders.2, etc. would then be appended to that stream.

You can enforce optimistic concurrency on a stream-level or a per-subject level within a stream using a message header, Nats-Expected-Last-Sequence or Nats-Expected-Last-Subject-Sequence, respectively, whose value is the expected sequence. Of course, if the sequence in the published message does not match what the server has for the stream or subject, then the publish will be rejected.

This allows for concurrent appends across subjects without contention, linearizability on a per-subject basis (entity event stream), while still gaining a total order of events across all subjects within a stream for consumption.
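To make the per-subject check concrete, here is a minimal in-memory sketch in Python. It does not use a NATS client: the `Stream` class, the `publish` signature, and the `WrongLastSequence` error are hypothetical stand-ins, and the sketch assumes an expected sequence of 0 means "no events on this subject yet".

```python
class WrongLastSequence(Exception):
    """Raised when the expected sequence does not match the stored one."""


class Stream:
    """Toy in-memory stand-in for a stream; not the NATS client API."""

    def __init__(self):
        self.messages = []         # (stream_seq, subject, data) in total order
        self.last_by_subject = {}  # subject -> stream sequence of its last message

    def publish(self, subject, data, expected_last_subject_seq=None):
        last = self.last_by_subject.get(subject, 0)
        # Models the idea behind Nats-Expected-Last-Subject-Sequence:
        # reject the append if another writer got there first.
        if expected_last_subject_seq is not None and expected_last_subject_seq != last:
            raise WrongLastSequence(
                f"{subject}: expected {expected_last_subject_seq}, have {last}"
            )
        seq = len(self.messages) + 1
        self.messages.append((seq, subject, data))
        self.last_by_subject[subject] = seq
        return seq


stream = Stream()
first = stream.publish("orders.1", "OrderPlaced", expected_last_subject_seq=0)
# A different subject does not contend with orders.1.
stream.publish("orders.2", "OrderPlaced", expected_last_subject_seq=0)
stream.publish("orders.1", "OrderPaid", expected_last_subject_seq=first)

try:
    # A stale writer still believes the last event on orders.1 is `first`.
    stream.publish("orders.1", "OrderCancelled", expected_last_subject_seq=first)
    conflict = False
except WrongLastSequence:
    conflict = True
```

The stale write is rejected, while the sequences in `stream.messages` still give one total order across both subjects.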

Subjects are indexed within a stream, so the OCC check does not add overhead, and a stream in general can grow as large as you have resources to support it. On replay to build the state of the aggregate to accept a new event, if a consumer is filtered to a specific subject, since the index is present, it only performs a linear scan over the blocks between the earliest and latest events for that subject.
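The replay described above can be sketched roughly as follows. This is a toy model, not the server's storage layer: the `index` here only remembers the first and last position per subject, and the event shapes (`OrderPlaced`, `ItemAdded`, `OrderPaid`) are made up for illustration.

```python
# Events in stream order: (stream_seq, subject, event). Sample data is made up.
log = [
    (1, "orders.1", {"type": "OrderPlaced", "total": 40}),
    (2, "orders.2", {"type": "OrderPlaced", "total": 15}),
    (3, "orders.1", {"type": "ItemAdded", "amount": 10}),
    (4, "orders.1", {"type": "OrderPaid"}),
]

# Toy subject index: subject -> (first_pos, last_pos) in the log.
index = {}
for pos, (_, subject, _) in enumerate(log):
    first, _last = index.get(subject, (pos, pos))
    index[subject] = (first, pos)


def apply(state, event):
    """Pure fold step: return the new aggregate state after one event."""
    state = dict(state)
    if event["type"] == "OrderPlaced":
        state["total"] = event["total"]
    elif event["type"] == "ItemAdded":
        state["total"] += event["amount"]
    elif event["type"] == "OrderPaid":
        state["paid"] = True
    return state


def load(subject):
    """Rebuild an aggregate by scanning only between its first and last event."""
    state = {"total": 0, "paid": False, "version": 0}
    if subject not in index:
        return state
    first, last = index[subject]
    for seq, subj, event in log[first:last + 1]:
        if subj == subject:
            state = apply(state, event)
            state["version"] = seq  # usable as the expected sequence for the next append
    return state


order_1 = load("orders.1")
order_2 = load("orders.2")
```

The `version` field is what a command handler would pass as the expected last subject sequence when appending the next event.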

If applying CQRS, many consumers with optional subject-based filtering can be used to derive desired read models.
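As a rough illustration of filtered consumers feeding read models, the sketch below matches subjects against NATS-style wildcards (`*` for exactly one token, `>` for one or more trailing tokens) and folds the matching events. The `project` helper and the sample events are hypothetical; a real consumer would receive messages from the server rather than loop over a list.

```python
def subject_matches(pattern, subject):
    """Match a subject against a filter using '*' and '>' wildcards."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last token and needs at least one token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


# Sample (subject, event type) pairs in stream order; made up for illustration.
events = [
    ("orders.1", "OrderPlaced"),
    ("orders.2", "OrderPlaced"),
    ("orders.1", "OrderPaid"),
    ("payments.1", "PaymentReceived"),
]


def project(filter_subject, fold, initial):
    """Build one read model from the events that pass the subject filter."""
    state = initial
    for subject, event in events:
        if subject_matches(filter_subject, subject):
            state = fold(state, subject, event)
    return state


# Read model 1: how many orders have been placed, across all order subjects.
placed_count = project("orders.*", lambda n, s, e: n + int(e == "OrderPlaced"), 0)
# Read model 2: the history of a single order.
order_1_history = project("orders.1", lambda h, s, e: h + [e], [])
```

Each read model is independent, so adding a new one means adding a consumer with its own filter rather than changing the write side.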

b. Currently, tiered storage support is not built in as an option to a stream's retention policy. This has been discussed several times, but needs to be prioritized. Depending on whether snapshotting is being used in conjunction with event streams, a separate consumer that archives event blocks should be suitable. If there is a desire to have transparent infinite retention, then tiered storage would be ideal; otherwise this could be abstracted in Rita, for example.
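One way to picture the archiving consumer is the sketch below. Everything in it is an assumption for illustration: `archive_blocks`, the key layout, and the dict standing in for an object store such as S3 are hypothetical, and no NATS or S3 API is used.

```python
import json


def archive_blocks(events, block_size, cold_store):
    """Write full blocks of events to cold storage.

    events: list of (seq, subject, payload) in stream order.
    cold_store: dict standing in for an object store bucket.
    Returns the last archived stream sequence (0 if nothing was archived).
    """
    last_archived = 0
    # Only complete blocks are archived; the tail stays in the stream for now.
    for start in range(0, len(events) - block_size + 1, block_size):
        block = events[start:start + block_size]
        key = f"orders/{block[0][0]:010d}-{block[-1][0]:010d}.json"
        cold_store[key] = json.dumps(block)
        last_archived = block[-1][0]
    return last_archived


events = [
    (1, "orders.1", "OrderPlaced"),
    (2, "orders.2", "OrderPlaced"),
    (3, "orders.1", "ItemAdded"),
    (4, "orders.1", "OrderPaid"),
    (5, "orders.2", "OrderPaid"),
]
bucket = {}
last_archived = archive_blocks(events, block_size=2, cold_store=bucket)
```

Whether events up to `last_archived` can then be dropped from the stream depends on whether snapshots cover them, which is the trade-off mentioned above.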

c. There certainly has been an increasing amount of interest from the DDD/ES/CQRS community, however I don't have a list of folks using it in production today. To my knowledge @pavelnikolov has used NATS for event sourcing (based on his talk), but I will let him share his experience and whether it was in a production scenario.

What I will say is that all of these features that are required for a "proper event store" are being used in other use cases and there are many, many people using NATS/JetStream in production.

If there is a question about scale or performance, the NATS team will address any concerns or questions you have.

4 replies

yordis Apr 9, 2023

How could you read the stream backward?

gedw99 Apr 22, 2023

How could you read the stream backward?

With a consumer that receives the events that were put in by the producer.

The consumer then materialises the data, or the data is pipelined onto another stream where another consumer produces further materialised output.

Hope that makes sense?

Each consumer up the chain can put materialised data into anything: a SQL DB, a file system, an HTML page. Basically whatever the outputs of a system need.

matthewadams May 31, 2023

I'm coming a little late to the party. @bruth or @pavelnikolov, if they exist, can you point me to some examples of an ES+CQRS "hello world" built on NATS/JetStream/Rita? That would be an invaluable start for me (and others, I think).

matthewadams Jun 2, 2023

I'm coming a little late to the party. @bruth or @pavelnikolov, if they exist, can you point me to some examples of an ES+CQRS "hello world" built on NATS/JetStream/Rita? That would be an invaluable start for me (and others, I think).

Answering my own question, I found https://github.com/bruth/kmm

cloudcompute · 2023-01-09T14:31:31Z

cloudcompute
Jan 9, 2023
Author

@bruth Thanks for the huge write-up, this is a great response.

Certainly, ES/CQRS is picking up as companies have started thinking about moving to microservices. I'd like to try implementing event sourcing with NATS, but I know it would be a huge task to bring it into production.

Just one more question: Kafka and Pulsar use Bookkeeper/Zookeeper to store the events data which are optimised for real-time workloads. What does NATS use? Is it a home-grown key-value store? If yes, why not something built on top of a key-value store like RocksDB?

Regards

4 replies

bruth Jan 9, 2023
Maintainer

I will do a brief summary here, but I recommend checking out a very thorough slide deck that compares Kafka and NATS by @jnmoyne.

NATS has its own custom storage layer that supports both in-memory and file-based persistence. It uses a traditional append-only, log-style structure to make writes fast, but it also supports some traditional "data store" operations, such as getting an individual message or marking a message for deletion. As noted above, indexing is maintained to support subject-based server-side filtering for consumers. There are also a handful of retention policies which the storage layer needs to support.

Kafka and Pulsar use Bookkeeper/Zookeeper to store the events data which are optimised for real-time workloads.

NATS has a very different architecture from Kafka and Pulsar. Many people compare a Kafka topic with a NATS stream, but this is not correct. It would be more accurate to say that a single partition is comparable to a NATS stream, since it is the unit of total ordering and replication. The slide deck above calls out the slew of other capabilities a stream has compared to a partition/topic, so I won't enumerate them here.

Coming back to your question, based on some recent benchmarking efforts (which will be published in the future so I am omitting many details), write and read performance of a NATS stream (both throughput and latency) are better than a single Kafka partition in nearly every way. Many of the incredibly high throughput numbers being reported with Kafka and Pulsar are due to topics that have many partitions each of which support parallel writes and reads and with quite a lot of infra behind it.

If you do want to emulate a multi-partition subject space, this can be done using subject mapping.
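The idea behind partitioning a subject space can be sketched as a deterministic rewrite of the subject. This is only an illustration: `partition_subject` is a hypothetical helper, `zlib.crc32` is a stand-in hash (the server's own mapping function may compute partitions differently), and the `orders.<partition>.<id>` layout is an assumption.

```python
import zlib


def partition_subject(subject, partitions):
    """Rewrite 'orders.<id>' to 'orders.<partition>.<id>' deterministically."""
    prefix, entity_id = subject.split(".", 1)
    # Same entity id always hashes to the same partition number.
    partition = zlib.crc32(entity_id.encode()) % partitions
    return f"{prefix}.{partition}.{entity_id}"


mapped = [partition_subject(f"orders.{i}", 3) for i in range(1, 7)]
```

Each partition can then be captured by its own stream (for example one bound to `orders.0.*`), so an entity's events stay ordered within one stream while writes spread across several.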

So, in terms of "real-time-ness" a NATS stream has better latency characteristics, which, in my opinion is preferred especially for event sourcing and CQRS. In addition, NATS supports stateless messaging (N:M pub-sub and request-reply) for communication between components that do not require any persistence.

cloudcompute Jan 9, 2023
Author

write and read performance of a NATS stream (both throughput and latency) are better than a single Kafka partition.. So, in terms of "real-time-ness" a NATS stream has better latency characteristics

For this reason, I asked the question. If this is so, good enough !

cloudcompute Jan 9, 2023
Author

One more point: I have read a lot about Kafka, there is a lot of hype surrounding it. Confluent has published so many materials on its website (including a book), their diagrams (figures) to explain the topics are so well-drawn and very attractive. Probably, because of this advertising, Kafka is still popular.

Definitely, both NATS and Pulsar are technically superior to Kafka. But the materials, real-world examples, lists of companies using them, and use cases don't match what is available for Kafka.

bruth Jan 9, 2023
Maintainer

I agree. Confluent has done a great job at marketing, pushing content, and building up an ecosystem. This provides a lot of inertia for people assuming Kafka is the appropriate/superior choice.

shafqatevo · 2023-02-14T15:49:19Z

shafqatevo
Feb 14, 2023

Hi @bruth same use case as OP here - thanks for your elaborate answer and further explanations!

Are there any limits on the maximum number of streams in JetStream? Or on the maximum number of subscribers, concurrent consumers, etc.?

Are there any other such limits, bottlenecks, or potential issues that would stand in the way of linear scalability for a large-scale event sourcing use case?

3 replies

bruth Mar 1, 2023
Maintainer

Hi @shafqatevo, sorry for the delay on this! There are two theoretical upper bounds on how much linear scale is possible within a single cluster. However, you can join clusters together to further expand the overall surface area of scale. There is an upper limit to that as well, but in both of these cases, we have yet to see large-scale users (or customers on the Synadia side) hit these limits in practice. And as noted above, if a legitimate performance issue is observed, the maintainers are quick to address it.

The first theoretical upper bound is based on Raft traffic. At a steady state, this would be heartbeats among all the Raft groups. Each stream and consumer with replicas > 1 has an associated Raft group, so there is a small amount of traffic overhead per asset created. A theoretical upper bound may be on the order of hundreds of thousands of assets, which could likely saturate the network and CPU of the servers within a given cluster.
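A back-of-the-envelope version of this estimate might look like the sketch below. The model is an assumption, not a description of the NATS Raft implementation: it supposes each group's leader sends one heartbeat to each follower per interval, and the 1 second interval is a placeholder rather than the server's actual setting.

```python
def raft_groups(asset_replicas):
    """Count assets (streams and consumers) that need a Raft group.

    asset_replicas: iterable of replica counts, one per asset.
    """
    return sum(1 for replicas in asset_replicas if replicas > 1)


def heartbeats_per_second(groups, replicas, interval_s):
    """Steady-state heartbeat messages per second across the cluster.

    Assumes one heartbeat from the leader to each follower per interval.
    """
    return groups * (replicas - 1) / interval_s


# Mixed example: only the assets with more than one replica form a group.
groups_in_example = raft_groups([1, 3, 3, 1, 5])

# 100,000 three-replica assets with a hypothetical 1 second interval.
steady_state = heartbeats_per_second(100_000, replicas=3, interval_s=1.0)
```

Under these assumptions the steady-state load grows linearly with the number of replicated assets, which is why the bound is expressed in asset count.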

From a cluster size perspective, since all servers form a full mesh, this means they are all chatting with each other at different times to share information. This would also have an upper bound in the sense that a cluster can only grow so large before saturation occurs.
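The full-mesh point comes down to simple arithmetic: with n servers there are n(n-1)/2 server pairs. A small sketch, where `mesh_connections` is a hypothetical helper that counts pairs rather than actual connections:

```python
def mesh_connections(servers):
    """Number of server-to-server pairs in a full mesh of `servers` nodes."""
    return servers * (servers - 1) // 2


pairs = {n: mesh_connections(n) for n in (3, 5, 10, 50)}
```

Going from 5 to 50 servers multiplies the server count by 10 but the number of pairs by more than 100, which is the quadratic growth behind the saturation concern.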

I would love to learn more about your specific use case to understand which dimensions of scale you could run into. Feel free to write here or DM me on NATS Slack.

shafqatevo Mar 16, 2023

Thanks @bruth for the insights... will understand the NATS scalability more and come back with questions if any...

gedw99 Apr 22, 2023

Thanks for this discussion …

Say you have v1 of a type going into a stream. Tomorrow you (or some actor) change the type structure to v2 and start pumping that v2 type into the stream.

Consumers up the chain take the data, see that the type changed, and can react if need be. Because you're using protobufs, it all still works without compiling the consumer code again. Standard stuff.

Just for fun, I am trying out pumping schema change notifications into the system. The actor that does this is something that observes that the schema of the types has changed. This allows the NATS control plane to be manipulated so that it versions the streams. So your streams are segregated by version of the same type. The consumers are also versioned.

This results in the pipeline having schema version affinity, sort of like session affinity.

The reason I did this is that it's exactly like how you create new microservices and leave the old ones running. The control plane can see up the producer chain and the consumer chain, work out when no one is pushing data of a certain schema version, and then kill off those streams and consumers. Sort of like garbage collection.
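The garbage-collection idea could be sketched roughly like this. It is a guess at the shape of such a control plane, not the author's code: `VersionRegistry` and its methods are hypothetical, and a version is treated as retirable once it has no active producers and no unconsumed events.

```python
class VersionRegistry:
    """Tracks producers and unconsumed events per schema version."""

    def __init__(self):
        self.producers = {}  # version -> set of active producer ids
        self.pending = {}    # version -> count of events not yet consumed

    def _track(self, version):
        self.producers.setdefault(version, set())
        self.pending.setdefault(version, 0)

    def producer_started(self, version, producer_id):
        self._track(version)
        self.producers[version].add(producer_id)

    def producer_stopped(self, version, producer_id):
        self._track(version)
        self.producers[version].discard(producer_id)

    def published(self, version):
        self._track(version)
        self.pending[version] += 1

    def consumed(self, version):
        self.pending[version] -= 1

    def retirable(self):
        """Versions with no producers left and nothing waiting to be consumed."""
        return sorted(
            v for v in self.pending
            if not self.producers[v] and self.pending[v] == 0
        )


registry = VersionRegistry()
registry.producer_started(1, "svc-a")
registry.published(1)
registry.producer_started(2, "svc-b")
registry.published(2)
registry.producer_stopped(1, "svc-a")  # v1 producer is gone...
before = registry.retirable()          # ...but one v1 event is still pending
registry.consumed(1)
after = registry.retirable()
```

Version 1 only becomes eligible for cleanup after its last event is consumed, which mirrors leaving an old microservice running until its traffic has drained.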

cloudcompute · 2024-12-08T16:35:51Z

cloudcompute
Dec 8, 2024
Author

Hi @Fizmath

Thank you for sharing your work. I have a few concerns/questions about the architecture.

a. Each microservice has its own PostgreSQL. Probably you are using CDC to stream the changes to NATS. I don't know whether you are using NATS (JetStream) as 'an event store' and as the single source of truth. If yes, I don't think there is any need for an intermediary PostgreSQL. The events can be inserted into JetStream straight away.

b. As Payment, Customers, etc. are microservices, what is the term "monolith" about?

c. Ideally speaking, we should strive for an architecture where microservices can be built using any language of our choice, particularly if we are using an event-driven approach. But I don't know exactly how we can achieve this with little development effort. I know we can use the NATS SDKs available for different languages, but I have no idea whether all SDKs support features like backpressure.

Is there any NATS proxy?

With Regards

0 replies
