What are the scaling characteristics as the number of shapes and the number of transactions increases? #2146
Replies: 3 comments 6 replies
-
We create an index for common where clauses so it's very fast to do shape checks.
The graph shows the end-to-end time for the client to receive the write (5-6 ms), but the actual check is a lot faster: around 0.5 μs for checking each transaction. So 2,000 transactions take around ~1 ms to match against however many shapes you have. There's more work to do, of course, if there's a matching shape, but a 2,000 transactions/second scenario should fit very comfortably in Electric. @robacourt only just landed the optimization last week (#2076), so I'm not sure if he's tried yet to max out the tx/sec rate with the new code path.
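The per-second matching cost quoted above can be sanity-checked with a quick back-of-envelope calculation (an illustrative calculation only, not Electric's code; the constants are the figures cited in this thread):

```python
# Back-of-envelope check of the matching cost cited above.
CHECK_TIME_US = 0.5   # ~0.5 microseconds to match one transaction (figure from this thread)
TX_PER_SEC = 2000     # the 2,000 tx/sec scenario discussed above

# Total matching work per second of transactions, in milliseconds.
total_ms = TX_PER_SEC * CHECK_TIME_US / 1000
print(total_ms)  # 1.0 -> about 1 ms of matching work per second
```

So at this rate, shape matching consumes roughly 0.1% of each second, leaving plenty of headroom.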
-
Our general goal is to be able to go as fast as Postgres can go, which of course depends on a lot of factors, but can be upwards of 20k tx/second.
-
The situation you are talking about @scottmessinger, with 1,000 clients, each with 10 shapes and each writing 1 transaction per second, would indeed lead to Electric having to make 10,000 shape checks per transaction, but only if the where clauses are not optimised. If the where clauses are optimised, then Electric can check all 10,000 shapes at once in 0.2 milliseconds. What are optimised where clauses? They're where clauses that Electric can build an index for (the index for common where clauses mentioned above), so matching a transaction becomes a lookup across all shapes rather than a per-shape check.
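To make the indexing idea concrete, here is a minimal sketch of how equality where clauses can be matched in one lookup instead of one check per shape. This is an illustration of the general technique, not Electric's actual implementation; the `ShapeIndex` class and its methods are hypothetical names for this example:

```python
from collections import defaultdict

class ShapeIndex:
    """Hypothetical index over shapes whose where clause is `field = constant`."""

    def __init__(self):
        # (field, value) -> list of shape ids with that where clause
        self.by_field_value = defaultdict(list)

    def add_shape(self, shape_id, field, value):
        # Register a shape whose where clause is `field = value`.
        self.by_field_value[(field, value)].append(shape_id)

    def matching_shapes(self, row):
        # One dict lookup per column in the changed row, regardless of
        # how many shapes are registered.
        matched = []
        for field, value in row.items():
            matched.extend(self.by_field_value.get((field, value), []))
        return matched

# 1,000 clients x 10 shapes each, all filtering on a hypothetical user_id column.
index = ShapeIndex()
for client in range(1000):
    for s in range(10):
        index.add_shape(f"shape-{client}-{s}", "user_id", client)

# A write to user 42's row touches only that user's 10 shapes,
# not all 10,000 of them.
print(len(index.matching_shapes({"user_id": 42, "status": "done"})))  # 10
```

The key point is that the lookup cost depends on the width of the changed row, not on the number of registered shapes, which is why the optimised path can check all shapes at once.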
The simple answer is yes, but it depends on your where clause. What I would say is: if you notice poor write throughput with Electric, let us know what your where clause is and we may be able to optimise it or suggest how to make it more optimal. Thanks for these questions @scottmessinger. They've prompted me to add our write throughput benchmarks and an explanation of optimised where clauses to our benchmarking page.
-
Apologies in advance if the subject is unclear -- I'm trying to understand how Electric checks whether a shape is invalidated by a write, and the implications of that approach.
If I'm following the code correctly, it appears that each shape is compared against each transaction. So, if we have 1,000 clients and each has 10 shape subscriptions, each transaction would be checked against 10,000 shapes. If 1,000 clients each send 1 txn/sec, that would be 1,000 txn/sec * 10,000 shapes = 10,000,000 shape checks/sec. If we add 1,000 more clients (each with 10 shape subscriptions, each sending 1 txn/sec), we'd have 2,000 txn/sec * 20,000 shapes = 40,000,000 shape checks/sec.
Is that correct? If so, it seems like shape checking scales quadratically with the number of clients.
If that's wrong, could you clarify how it works? I've looked at your benchmarks page, but it doesn't seem to address this question (or I'm completely misreading the relevant graph).
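The arithmetic in the question can be checked directly. The sketch below (an illustrative model of the naive one-check-per-shape scheme, not Electric's code) shows that doubling the client count quadruples the work:

```python
def naive_checks_per_sec(clients, shapes_per_client, tx_per_client_per_sec):
    """Shape checks per second if every transaction is checked against every shape."""
    shapes = clients * shapes_per_client
    tx_per_sec = clients * tx_per_client_per_sec
    return tx_per_sec * shapes

print(naive_checks_per_sec(1000, 10, 1))  # 10000000
print(naive_checks_per_sec(2000, 10, 1))  # 40000000
# Doubling clients quadruples the checks: quadratic growth, not exponential.
```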