Modular synchronizer #2065

arnaubennassar · 2023-05-02T14:29:07Z

arnaubennassar
May 2, 2023
Maintainer

This discussion suggest changes to be done in the synchronizer component for it to become more flexible, and enable alt implementations and/or different functionalities

Split by source

The first design principle is split the synchronizer into many synchronizers, according to the source (where the data is synchronized from).

The cmd should then spin up different synchronizers, each synchronizer should be autonomus, in a way that it shouldn't need to interact directly with other synchronizers. On the other hand this coordination among synchronizers can happen through DB as it already happens with other components

The current synchronizer should be splitted in the following synchronizers

Genesis

Split the logic to create the genesis state. Note that this is not a synchronizer as it doesn't download any data from an external source. Instead the data is provided manually by the node operator. However the result of building the genesis is contrasted with L1 as a sanity check

It could be interesting to consider moving this logic to some other package although it's not necessary. Keep in mind that there is genesis logic spread through some packages:

config/network.go: load genesis configuration either from file or pre-defined networks (hardcoded)
etherman/etherman.go:VerifyGenBlockNumber: check the block in which SCs where deployed
state/genesis.go: type definitions related to genesis
state/state.go:SetGenesis: function to actually set leafs into the merkletree
synchronizer/synchronizer.go:Sync before starting the main loop, it checks if the genesis is already added into the state. If it is not, it creates the genesis

Proposal:

a) Implement this as part of the L1 Sync (as it is right now)

b) Create a separated genesis package to encapsulate all this logic (except the config). cmd should call this before starting the actual synchronizers

L1 Sync (`synchronizer/l1sync`)

Consume events and does polling to read the latest changes on the L1 SCs, it captures the following information:

Last virtual batch number
Last verified batch number
Build virtual state (re-orging the trusted state if needed)
Mark batches as verified (halting the sync if a root missmatch happens)
Register forced batches
Register GERs
Register ForkIDs
...

This part of the synchronizer is higly coupled with the etherman, the downside is that any changes on the L1 SCs can break this synchronizer, not to mention an escenario where many contracts are used to accomodate different networks. Therefore it's needed to build some interfaces to make this synchronizer flexible. The idea is to have a generic L1 synchronizer that it's agnostic to the events and the functions to poll latest changes, instead this synchronizers will take care of calling this two types of functions, in a pattern of handler registration

// cmd package

// ...
	ethMan, _ := etherman.NewClient()
	eh := l1eventhandlers.New()
	p := l1poll.New()
	l1Sync := l1sync.New(p.LatestPollers(), eh.EventHandlers(), *ethMan)
	l1Sync.Start()

	trustedSync := trustedsync.New()
	trustedSync.Start()
// ...

// -----
//
// ==> NOTE: Etherman becomes agnostic to the event content, just returns an array of Logs!!
// We should separate the Etherman that does generic Ethereum interactions from the SC specific, this way any implementation can reuese the "generic" Etherman.
// Proposal: one SC : one package
// As for this specific proposal for the synchronizer, the L1Sync should use the generic Etherman, while l1eventhandlers and l1poll the SC specific packages generated from abi
package etherman


func (etherMan *Client) GetEventsByBlockRange(ctx context.Context, fromBlock uint64, toBlock *uint64) ([]types.Log, error) {
	// Filter query
	query := ethereum.FilterQuery{
		FromBlock: new(big.Int).SetUint64(fromBlock),
		Addresses: etherMan.SCAddresses,
	}
	if toBlock != nil {
		query.ToBlock = new(big.Int).SetUint64(*toBlock)
	}
	blocks, blocksOrder, err := etherMan.readEvents(ctx, query)
	if err != nil {
		return nil, nil, err
	}
	return etherMan.EthClient.FilterLogs(ctx, query)
}

// -----
//
// ==> NOTE: l1sync is agnostic to underlaying SCs  l1poll and l1eventhandlers are the actual implementations that are aware of the SC, and l1sync can receive handlers from many packages that implement the logic, in a way that there can be a mix & match of different implementations. Specially in the case of events, this allows to reuse functions implemented for other SC as long as the events follow the same format and logic
//

package l1sync // synchronizer/l1sync/l1sync.go

// latestPoller queries L1 SC to get the latest value of the SC state
// if the func returns error, the block associated updates will rollback
type LatestPollers []func(dbTx pgx.Tx) error

type EventHandlers map[common.Hash]func(log types.Log, dbTx pgx.Tx) error

type L1Sync struct {
	latestPollers LatestPollers
	eventHandlers EventHandlers
	etherMan      etherman.Client
}

func New(
	latestPollers LatestPollers,
	eventHandlers EventHandlers,
	etherMan etherman.Client,

	// Used to creat dbTx
	// st stateInterface,
	// ...
) L1Sync {
	return L1Sync{
		// ...
	}
}

func (s *L1Sync) Start() {
	// go loop to periodically call the latestPollers
	go s.pollLatestSCChanges()
	// go loop to sync events by block range
	go s.syncEvents()
}

func (s *L1Sync) pollLatestSCChanges() {
	for {
		dbTx, err := s.state.BeginStateTransaction(s.ctx)
		for _, f := range s.latestPollers {
			err := f(dbTx)
			if err != nil {
				// Rollback dbTx
			}
		}
		// Commit dbTx
	}
}

func (s *L1Sync) syncEvents() {
	for {
		dbTx, err := s.state.BeginStateTransaction(s.ctx)
		events, _ := s.etherMan.GetEventsByBlockRange(context.TODO(), 0, nil)
		for _, e := range events {
			f, ok := s.eventHandlers[e.Topics[0]]
			if !ok {
				log.Warn("unsupported event")
			} else {
				log.Info("processing event ...")
				err := f(e, dbTx) // NOTE: events are processed in order, so no need for the Block struct
				if err != nil {
					// Rollback dbTx
				}
			}
		}
		// Commit dbTx
	}
}

// -----

package l1eventhandlers // synchronizer/l1sync/l1eventhandlers/l1eventhandlers.go

import (
	"github.com/arnaubennassar/sync-playground/synchronizer/l1sync"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/crypto"
	"github.com/jackc/pgx/v4"
)

type EventHandler struct {
	// state stateInterface
	// etherman
	// ...
}

func New() EventHandler {
	return EventHandler{}
}

func (eh EventHandler) EventHandlers() l1sync.EventHandlers {
	handlers := make(l1sync.EventHandlers)
	handlers[Event1SignatureHash] = eh.Event1Handler
	handlers[Event2SignatureHash] = eh.Event2Handler
	return handlers
}

var Event1SignatureHash = crypto.Keccak256Hash([]byte("Event1(bytes32,bytes32)"))

func (e *EventHandler) Event1Handler(log types.Log, dbTx pgx.Tx) error {
	// do some parsing with the log
	// store something into the DB
	return nil
}

var Event2SignatureHash = crypto.Keccak256Hash([]byte("Event2(bytes32,bytes32)"))

func (e *EventHandler) Event2Handler(log types.Log, dbTx pgx.Tx) error {
	// do some parsing with the log
	// store something into the DB
	return nil
}

// -----

package l1poll // synchronizer/l1sync/l1poll/l1poll.go

import (
	"github.com/arnaubennassar/sync-playground/synchronizer/l1sync"
	"github.com/jackc/pgx/v4"
)

type L1Poll struct {
	// state
	// etherman
	// ...
}

func New( /*...*/ ) L1Poll {
	return L1Poll{}
}

func (lp *L1Poll) LatestPollers() l1sync.LatestPollers {
	return l1sync.LatestPollers{
		lp.FetchItem1,
	}
}

func (lp *L1Poll) FetchItem1(dbTx pgx.Tx) error {
	// Query SC for item 1
	// Store item 1
	return nil
}

Trusted Sequencer JSON RPC sync (`synchronizer/trustedsync`)

Does polling on the JSON RPC of the trusted sequencer in order to build the trusted state

This synchronizer can be understood as the client side of the trusted sequencer server, and it needs to be developed accordingly. Therefore here the only change required is to actually decouple this from the monolithic synchronizer, and additionally add some coordination mechanisms:

In order to coordinate with the l1sync, the trustedsync needs to be aware of the latest batch seen on Ethereum vs the last synchronized. If last batch on DB < last batch on Ethereum the trsutedsync will be halted
Inserts need to be atomic: it's needed to detect if the l1sync has discarded some batches before proceeding with the insert, in other words, only insert if the last state root used to call the executor is still the same when inserting

arnaubennassar · 2023-05-05T21:24:29Z

arnaubennassar
May 5, 2023
Maintainer Author

Reggarding how to perform the insert atomics, this can be achieved at least using two techniques from Postgres:

Solution using isolation

PROS:

easy to implement, just needs to:
- instantiate the dbTx like this: BeginTx(ctx, pgx.TxOptions{IsoLevel: pgx.Serializable})
- and handle the potential error due to failed to serialize

CONS:

potential performance impact
could create many retries attempts for the virtual state sync (as this introduces necessary error handling), but this should only be a problem in case of trusted reorg
trusted sync needs to do an artifical select to invalidate the insert if the selected content has changed

Solution using explicit locking

as said in the doc:

FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). Within a REPEATABLE READ or SERIALIZABLE transaction, however, an error will be thrown if a row to be locked has changed since the transaction started

PROS:

easy to reason about: just select the content that you want to make the tx block
granular and concious locking
txs will get blocked and wait until the other finishes without error handling

CONS:

harder to implement as it needs specific select ... for update queries for locking

Example

Given the following test cases:

Case 1:

Trusted sync init tx
L1 sync init tx
Trusted sync reads latest state root, so the dbTx get's invalidated if changed
Trusted sync open batch N+1
Trusted sync closes batch N+1
L1 sync reorgs from batch N-1
L1 sync commits
Trusted sync commits

Expected:

For isolation strategy: Last batch is N-1, trusted sync commit fails due to change on the readed value
For explicit locking strategy: Last batch is N-1, trusted sync commit detects that the last batch num has changed and doesn't do the insert

Case 2:

Trusted sync init tx
L1 sync init tx
Trusted sync reads latest state root, so the dbTx get's invalidated if changed
Trusted sync open batch N+1
Trusted sync closes batch N+1
L1 sync reorgs from batch N-1
Trusted sync commits
L1 sync commits

Expected:

For isolation strategy: Last batch is N+1, L1 sync commit fails due to change on the readed value
For isolation strategy: Last batch is N-1, L1 sync waits until Trusted sync does the insert and then does the reorg (also reorging the just inserted batch from the trusted)

package state

import (
	"context"
	"errors"
	"math/big"
	"sync"
	"testing"

	"github.com/0xPolygonHermez/zkevm-node/db"
	"github.com/0xPolygonHermez/zkevm-node/log"
	"github.com/0xPolygonHermez/zkevm-node/test/dbutils"
	"github.com/ethereum/go-ethereum/common"
	"github.com/jackc/pgconn"
	"github.com/jackc/pgx/v4"
	"github.com/jackc/pgx/v4/pgxpool"
	"github.com/stretchr/testify/require"
)

func TestConcurrentSync(t *testing.T) {
	// Setup
	stateDBCfg := dbutils.NewStateConfigFromEnv()
	if err := dbutils.InitOrResetState(stateDBCfg); err != nil {
		panic(err)
	}
	stateDB, err := db.NewSQLDB(stateDBCfg)
	state := NewPostgresStorage(stateDB)
	require.NoError(t, err)
	defer stateDB.Close()

	ctx := context.Background()
	setupDBTx, err := stateDB.Begin(ctx)
	require.NoError(t, err)

	// Insert N trusted batches
	nTrusteBatches := 5
	for i := 1; i <= nTrusteBatches; i++ {
		require.NoError(t, state.openBatch(ctx, ProcessingContext{BatchNumber: uint64(i)}, setupDBTx))
		require.NoError(t, state.closeBatch(ctx, ProcessingReceipt{
			BatchNumber: uint64(i),
			StateRoot:   common.BigToHash(big.NewInt(int64(i))),
		}, setupDBTx))
	}
	require.NoError(t, setupDBTx.Commit(ctx))

	// Using isolation
	//	Case 1
	trustedDBTx, l1DBTx := doDBInteractionsUsingIsolation(t, nTrusteBatches, state, stateDB)
	expectedLastBatch := uint64(nTrusteBatches - 1)
	// L1 sync commits first
	assertsIsolation(t, expectedLastBatch, l1DBTx, trustedDBTx, state, stateDB)
	nTrusteBatches = int(expectedLastBatch)

	//	Case 2
	trustedDBTx, l1DBTx = doDBInteractionsUsingIsolation(t, nTrusteBatches, state, stateDB)
	expectedLastBatch = uint64(nTrusteBatches + 1)
	// Trusted sync commits first
	assertsIsolation(t, expectedLastBatch, trustedDBTx, l1DBTx, state, stateDB)
	nTrusteBatches = int(expectedLastBatch)


	// Using isolation
	//	Case 1
	var wg sync.WaitGroup
	trustedDBTx, l1DBTx = doDBInteractionsUsingLocks(t, nTrusteBatches, true, state, stateDB, &wg)
	expectedLastBatch = uint64(nTrusteBatches - 1)
	// L1 sync commits first
	assertsLocks(t, expectedLastBatch, l1DBTx, trustedDBTx, state, stateDB, &wg)
	nTrusteBatches = int(expectedLastBatch)

	//	Case 2
	trustedDBTx, l1DBTx = doDBInteractionsUsingLocks(t, nTrusteBatches, false, state, stateDB, &wg)
	expectedLastBatch = uint64(nTrusteBatches - 1)
	// Trusted sync commits first
	assertsLocks(t, expectedLastBatch, trustedDBTx, l1DBTx, state, stateDB, &wg)
}

func doDBInteractionsUsingIsolation(t *testing.T, nTrusteBatches int, state *PostgresStorage, stateDB *pgxpool.Pool) (trustedDBTx, l1DBTx pgx.Tx) {
	ctx := context.Background()
	var err error
	trustedDBTx, err = stateDB.BeginTx(ctx, pgx.TxOptions{IsoLevel: pgx.Serializable})
	require.NoError(t, err)
	l1DBTx, err = stateDB.BeginTx(ctx, pgx.TxOptions{IsoLevel: pgx.Serializable})
	require.NoError(t, err)

	// Trusted sync interactions
	var actualRoot string
	err = trustedDBTx.QueryRow(ctx, "SELECT state_root FROM state.batch ORDER BY batch_num DESC LIMIT 1").Scan(&actualRoot)
	require.NoError(t, err)
	require.Equal(t, common.BigToHash(big.NewInt(int64(nTrusteBatches))).Hex(), actualRoot)
	require.NoError(t, state.openBatch(ctx, ProcessingContext{BatchNumber: uint64(nTrusteBatches + 1)}, trustedDBTx))
	require.NoError(t, state.closeBatch(ctx, ProcessingReceipt{BatchNumber: uint64(nTrusteBatches + 1)}, trustedDBTx))

	// L1 sync interactions
	const resetSQL = "DELETE FROM state.batch WHERE batch_num > $1"
	_, err = l1DBTx.Exec(ctx, resetSQL, nTrusteBatches-1)
	require.NoError(t, err)

	return
}

func assertsIsolation(t *testing.T, expectedLastBatchNum uint64, firstCommiter, secondCommiter pgx.Tx, state *PostgresStorage, stateDB *pgxpool.Pool) {
	ctx := context.Background()
	require.NoError(t, firstCommiter.Commit(ctx))
	// https://github.com/jackc/pgx/wiki/Error-Handling
	err := secondCommiter.Commit(ctx)
	require.NotNil(t, err)
	var pgErr *pgconn.PgError
	require.True(t, errors.As(err, &pgErr))
	require.Equal(t, "40001", pgErr.Code)
	bn, err := state.GetLastBatchNumber(ctx, nil)
	require.NoError(t, err)
	require.Equal(t, expectedLastBatchNum, bn)
}

func doDBInteractionsUsingLocks(t *testing.T, nTrusteBatches int, l1GoesFirst bool, state *PostgresStorage, stateDB *pgxpool.Pool, wg *sync.WaitGroup) (trustedDBTx, l1DBTx pgx.Tx) {
	wg.Add(2)
	log.Debug("INIT doDBInteractionsUsingLocks")
	ctx := context.Background()
	var err error
	trustedDBTx, err = stateDB.Begin(ctx)
	require.NoError(t, err)
	l1DBTx, err = stateDB.Begin(ctx)
	require.NoError(t, err)

	// Lock
	const lockQuery = "SELECT batch_num FROM state.batch ORDER BY batch_num DESC LIMIT 1 FOR UPDATE"
	trustedInteractions := func() {
		// Trusted sync interactions
		var actualBatchNum int
		require.NoError(t, trustedDBTx.QueryRow(ctx, lockQuery).Scan(&actualBatchNum))
		log.Warnf("trustedInteractions reads actualBatchNum: %d", actualBatchNum)
		if actualBatchNum == nTrusteBatches {
			log.Warn("Trusted Interactions executing")
			// L1 Sync has not reorged yet, let's insert next trusted batch
			require.NoError(t, state.openBatch(ctx, ProcessingContext{BatchNumber: uint64(nTrusteBatches + 1)}, trustedDBTx))
			require.NoError(t, state.closeBatch(ctx, ProcessingReceipt{BatchNumber: uint64(nTrusteBatches + 1)}, trustedDBTx))
			log.Warn("Trusted Interactions DONE executing")
		}
		wg.Done()
	}
	l1Interactions := func() {
		// L1 sync interactions
		var actualBatchNum int
		require.NoError(t, l1DBTx.QueryRow(ctx, lockQuery).Scan(&actualBatchNum))
		log.Warnf("l1Interactions reads actualBatchNum: %d", actualBatchNum)
		log.Warn("L1 Interactions executing")
		/*
			NOTE: If the trusted sync locks first, actualBatchNum will still be nTrusteBatches
			instead of nTrusteBatches + 1. This is because nTrusteBatches doesn't get modified.
			However this is not a problem in reality, on the countrary, this is the desired behaviour:
			1. Trusted sync adds the next batch atomically
			2. L1 Sync waits until Trusted sync is done
			3. Trusted sync finishes inserting nTrusteBatches + 1
			4. L1 Sync gets unlocked and deletes nTrusteBatches and nTrusteBatches + 1

			This is not how the test using isolation behaves, as the test finish when the error gets detected, but right after that
			the L1 Sync should re-try the reorg query and achieve the same result
		*/
		const resetSQL = "DELETE FROM state.batch WHERE batch_num > $1"
		_, err = l1DBTx.Exec(ctx, resetSQL, nTrusteBatches-1)
		require.NoError(t, err)
		log.Warn("L1 Interactions DONE executing")
		wg.Done()
	}
	if l1GoesFirst {
		l1Interactions()
		go trustedInteractions()
	} else {
		trustedInteractions()
		go l1Interactions()
	}
	return
}

func assertsLocks(t *testing.T, expectedLastBatchNum uint64, firstCommiter, secondCommiter pgx.Tx, state *PostgresStorage, stateDB *pgxpool.Pool, wg *sync.WaitGroup) {
	ctx := context.Background()
	require.NoError(t, firstCommiter.Commit(ctx))
	wg.Wait()
	require.NoError(t, secondCommiter.Commit(ctx))
	bn, err := state.GetLastBatchNumber(ctx, nil)
	require.NoError(t, err)
	require.Equal(t, expectedLastBatchNum, bn)
}

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Modular synchronizer #2065

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment

{{title}}

Select a reply

Modular synchronizer #2065

arnaubennassar May 2, 2023 Maintainer

Split by source

Genesis

L1 Sync (synchronizer/l1sync)

Trusted Sequencer JSON RPC sync (synchronizer/trustedsync)

Replies: 1 comment

arnaubennassar May 5, 2023 Maintainer Author

Solution using isolation

Solution using explicit locking

Example

Case 1:

Case 2:

arnaubennassar
May 2, 2023
Maintainer

L1 Sync (`synchronizer/l1sync`)

Trusted Sequencer JSON RPC sync (`synchronizer/trustedsync`)

arnaubennassar
May 5, 2023
Maintainer Author