Marlowe Runtime Chain Index Architecture #180
-
Interface summary:
querySynchronizationStatus :: ClientM SynchronizationStatus
queryContractRoster :: ContractFilter -> ClientM ContractRoster
queryTransactions :: MonadIO m => Set TransactionId -> ClientM (Producer TransactionResult m ())
streamChainEvents :: MonadIO m => ClientM (Producer ChainEvent m ())
-- pure helpers
applyFilter :: ContractFilter -> ContractRoster -> ContractRoster
Note that [...] In this context, [...]
-
Architecture diagram:
graph
node((node socket))
consumer(("consumer(s)"))
db[(Database)]
config[(Config)]
qport((Query Port))
sport((Streaming Port))
subgraph Chain Index
epoch[Marlowe Epoch]
intersect[Intersection Points]
env[Environment]
sync[Chain Sync Client]
dbcon[DB connection]
status[Synchronization Status]
persist[Chain Persister]
sapi[Streaming API]
qapi[Query API]
qsock[Query Socket]
ssock[Streaming Socket]
end
sync-->node
config-->db
sync-->epoch
epoch-->env
qapi-->qsock
qapi-->status
qapi-->dbcon
qsock-->env
qsock--->qport
sapi-->ssock
sapi-->status
sapi-->sync
ssock-->env
ssock--->sport
status-->sync
persist-->status
sync-->intersect
intersect-->dbcon
persist-->dbcon
env-->config
dbcon-->env
dbcon-->db
persist-->sync
consumer-->qport
consumer-->sport
-
I'm realizing that the consumer has no way to apply a contract filter to a contract roster with the information it contains. I think that [...]
-
Interface draft v2:
querySynchronizationStatus :: ClientM SynchronizationStatus
queryContractRoster :: ContractFilter -> ClientM ContractRoster
queryTransactions :: MonadIO m => Set TransactionId -> ClientM (Producer TransactionResult m ())
streamChainEvents :: MonadIO m => ContractFilter -> ClientM (Producer ChainEvent m ())
The filter passed to [...]
-
I'm also realizing that the [...]
querySynchronizationStatus :: ClientM SynchronizationStatus
queryTransactions :: ContractFilter -> ClientM (Producer Transaction m ())
streamChainEvents :: MonadIO m => ContractFilter -> ClientM (Producer ChainEvent m ())
data ChainEvent
  = SynchronizationComplete
  | RollForward ChainTip [Transaction]
  | RollBackward ChainTip
This should dramatically simplify the API and, consequently, the consumer.
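To illustrate how a consumer might handle this event type, here is a minimal sketch that folds a list of events into local state. The `ConsumerState` record, `applyEvent`, `replay`, and the cut-down `ChainTip` and `Transaction` stand-ins are assumptions made for the example, not part of the proposed API. The rollback rule (drop every transaction in a slot after the rollback tip) is likewise an assumed interpretation.

```haskell
import Data.List (foldl')

-- Hypothetical stand-ins: the real ChainTip and Transaction types are richer.
newtype ChainTip = ChainTip Int -- slot number only
  deriving (Eq, Show)

data Transaction = Transaction
  { txSlot :: Int -- slot of the block that contains the transaction
  , txId :: String
  }
  deriving (Eq, Show)

-- The event type from the draft above.
data ChainEvent
  = SynchronizationComplete
  | RollForward ChainTip [Transaction]
  | RollBackward ChainTip
  deriving (Eq, Show)

-- Hypothetical consumer-side state.
data ConsumerState = ConsumerState
  { synced :: Bool
  , currentTip :: Maybe ChainTip
  , transactions :: [Transaction] -- newest first
  }
  deriving (Eq, Show)

initialState :: ConsumerState
initialState = ConsumerState False Nothing []

-- Apply a single event to the consumer state.
applyEvent :: ConsumerState -> ChainEvent -> ConsumerState
applyEvent st SynchronizationComplete = st {synced = True}
applyEvent st (RollForward tip txs) =
  st {currentTip = Just tip, transactions = reverse txs ++ transactions st}
applyEvent st (RollBackward tip@(ChainTip slot)) =
  st
    { currentTip = Just tip
    , -- assumed rule: forget everything after the rollback point
      transactions = filter ((<= slot) . txSlot) (transactions st)
    }

-- Replay a finite list of events from the initial state.
replay :: [ChainEvent] -> ConsumerState
replay = foldl' applyEvent initialState
```

With this shape the consumer is a single pure fold, so rollback handling can be tested without a running chain index.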
-
@jhbertra, is the "Roster" mechanism something standard? Do you have a reference to an explanation of it? It might be good to add it to the discussion. I saw you wrote a rationale for using it, but it would be useful to have more information about what a Roster is, how it works, etc.
-
@jhbertra, is it the case that the chain index has no Marlowe-specific knowledge aside from (1) the roster knows the opaque [...]
-
Query/Filtering Requirements
Transaction Information
The Marlowe-relevant information in [...]
This information is essential for reconstructing the [...]
Query Patterns
References
-
In Functional Requirements 2.: "Given the synchronization status of the chain index is not Synchronized, the response should indicate that the roster is possibly incomplete." This could be expressed with phantom types, if that makes sense with how the roster is to be used.
Then the compiler can keep us from mixing synchronized and unsynchronized rosters accidentally. And we can define operations [...]
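A minimal sketch of the phantom-type idea, under stated assumptions: the `Roster` representation (contract IDs paired with transaction IDs), the tag names, and every function below are hypothetical and only illustrate the technique.

```haskell
-- Phantom tags: they have no values and exist only at the type level.
data Synchronized
data Synchronizing

-- Hypothetical roster: contract IDs paired with their transaction IDs.
-- The 'status' parameter is phantom; it does not appear on the right.
newtype Roster status = Roster [(String, [String])]
  deriving (Eq, Show)

-- Smart constructors that fix the tag. In a real module the raw
-- 'Roster' constructor would not be exported.
partialRoster :: [(String, [String])] -> Roster Synchronizing
partialRoster = Roster

completeRoster :: [(String, [String])] -> Roster Synchronized
completeRoster = Roster

-- Safe on any roster: lists whatever contracts are known so far.
contractIds :: Roster status -> [String]
contractIds (Roster entries) = map fst entries

-- Only meaningful on a complete roster: a contract missing from a
-- partial roster may simply not have been indexed yet.
isAbsent :: String -> Roster Synchronized -> Bool
isAbsent contractId roster = contractId `notElem` contractIds roster
```

Here `isAbsent "c2" (partialRoster [])` is rejected by the type checker, while `contractIds` accepts either kind of roster.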
-
@palas @bwbush @dino-iog Can we explore a different idea for a minute? I would like to break out of the weeds of this one proposed solution and consider a different approach to clear our thinking a bit. What if our chain index consisted of:
In this sense, it would be quite similar to what we already have (the PAB implementation) with a couple of crucial differences:
-
This is a modification to the previous proposal which was motivated by @palas raising concerns about consistency / transactionality. The idea is that the downstream consumers of the chain index may need to aggregate data across multiple queries that should all come from a consistent view of the ledger. If the API of the chain index requires consumers to perform one call per query, we would need to do extra work to ensure the results are consistent with one another. Instead, what we could do is only expose a single API which accepts a composite query and runs them all against a consistent view of the data, returning the results in bulk. Here is an example of the proposed API:
-- Previous proposal
queryUtxosAtAddress :: MonadChainIndex m => AddressAny -> m [TxOut]
queryTxOut :: MonadChainIndex m => TxOutRef -> m (Maybe TxOut)
...
-- New proposal
data ChainIndexQuery a where
  GetUtxosAtAddress :: AddressAny -> ChainIndexQuery [TxOut]
  GetTxOut :: TxOutRef -> ChainIndexQuery (Maybe TxOut)
  ...
  GetProduct :: ChainIndexQuery a -> ChainIndexQuery b -> ChainIndexQuery (a, b)
queryChainIndex :: MonadChainIndex m => ChainIndexQuery a -> m a
This would take care of consistent queries, as it transforms multiple calls into a single call. It is also more efficient as it avoids multiple network round-trips.
We would also need to consider synchronization between receiving a callback and performing a query as a follow-up. When awaiting an on-chain event, we will often want to perform one or more queries after we receive the callback. In the time between when the chain index calls us back and when it receives our follow-up query, there is a chance that the chain state has changed. This is especially true when the chain index is synchronizing. We can employ a similar technique to the previous one to bring the two together into the same call, but it requires a bit more work, because we need to consider how to use the data associated with event results as parameters for the queries. This will likely involve some complex type-level machinery and I want to get a sense of what people think of this approach before diving too deep into design here.
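To make the composite-query idea concrete, here is a small self-contained sketch of the GADT with a pure interpreter. The `Snapshot` type, `runQuery`, and the simplified `Address`, `TxOutRef`, and `TxOut` stand-ins are assumptions for illustration; a real implementation would run inside `MonadChainIndex` against the database.

```haskell
{-# LANGUAGE GADTs #-}

import Data.List (find)

-- Hypothetical, simplified stand-ins for the ledger types.
type Address = String

data TxOutRef = TxOutRef String Int
  deriving (Eq, Show)

data TxOut = TxOut
  { outRef :: TxOutRef
  , outAddress :: Address
  , outLovelace :: Integer
  }
  deriving (Eq, Show)

-- One immutable view of the UTxO set (an assumption for the sketch).
newtype Snapshot = Snapshot [TxOut]

data ChainIndexQuery a where
  GetUtxosAtAddress :: Address -> ChainIndexQuery [TxOut]
  GetTxOut :: TxOutRef -> ChainIndexQuery (Maybe TxOut)
  GetProduct :: ChainIndexQuery a -> ChainIndexQuery b -> ChainIndexQuery (a, b)

-- Every sub-query of a composite runs against the same snapshot,
-- which is what makes the combined result consistent.
runQuery :: Snapshot -> ChainIndexQuery a -> a
runQuery snapshot@(Snapshot utxos) query = case query of
  GetUtxosAtAddress address -> filter ((== address) . outAddress) utxos
  GetTxOut ref -> find ((== ref) . outRef) utxos
  GetProduct left right -> (runQuery snapshot left, runQuery snapshot right)
```

A caller can then write `runQuery snapshot (GetProduct (GetUtxosAtAddress "addr1") (GetTxOut ref))` and get back a pair whose components were both read from one snapshot.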
-
One observation that I've made is that different components of the runtime require radically different query patterns. The history processor is predominantly push-driven, in that it needs to traverse transactions and receive notifications when the chain is updated. Transaction balancing, on the other hand, is entirely pull-driven, in that it needs to query existing UTxOs, but is unconcerned about updates to the chain. This is part of what makes designing this component so challenging: the design needs to satisfy both sets of requirements.
-
I'd like to consider yet another approach for this component that side-steps some of the complexity we've been considering by pushing it downstream. The approach would be to make the chain index basically act like a smart proxy for the chain sync protocol.
NOTE: If you are not already familiar with the chain sync protocol, I highly recommend you familiarize yourself with it before reading this proposal. A good example of its use can be seen here.
The responsibilities of the chain index would be:
This would allow downstream components to be built as state machines that pull the next information in the chain they need and do with it as they see fit. Benefits of this approach include:
Drawbacks of the approach include:
Here is a draft of the protocol types, adapted from [...]
newtype FilteredChainSyncClient (query :: Type -> Type) point tip m a = FilteredChainSyncClient
  { runFilteredChainSyncClient :: m (ClientStIdle query point tip m a) }
data ClientStIdle query point tip m a where
  SendMsgRequestNext
    :: query result -- a query that extracts a result from a block
    -> ClientStNext result query point tip m a -- handler to run if the server responds immediately
    -> m (ClientStNext result query point tip m a) -- handler to run if the server tells us to wait
    -> ClientStIdle query point tip m a
  SendMsgFindIntersect
    :: [point]
    -> ClientStIntersect query point tip m a
    -> ClientStIdle query point tip m a
  SendMsgDone
    :: a
    -> ClientStIdle query point tip m a
data ClientStNext result query point tip m a = ClientStNext
  { recvMsgRollForward :: result -- the result of the query
                       -> tip -- information about the tip of the chain
                       -> FilteredChainSyncClient query point tip m a -- continuation client
  , recvMsgRollBackward :: point -- rollback point
                        -> tip -- information about the tip of the chain
                        -> FilteredChainSyncClient query point tip m a -- continuation client
  }
data ClientStIntersect query point tip m a = ClientStIntersect
  { recvMsgIntersectFound :: point -- found intersection point
                          -> tip -- information about the tip of the chain
                          -> FilteredChainSyncClient query point tip m a -- continuation client
  , recvMsgIntersectNotFound :: tip -- information about the tip of the chain
                             -> FilteredChainSyncClient query point tip m a -- continuation client
  }
The main difference between this and the regular [...]
The concrete argument for the [...]
To summarize, whereas the regular chain sync protocol allows you to define a state machine that handles each block in the chain sequentially, the filtered chain sync protocol allows you to define a similar state machine that handles a sequence of chain events which are at the discretion of the client to specify.
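As a rough illustration of that summary, the sketch below runs a heavily cut-down version of the protocol against an in-memory list of blocks. Everything here is an assumption made for the example: the `Client`/`Idle` types drop intersection, rollback, and await handling, `Block` and `TxQuery` are invented, and `serve` is a toy stand-in for the chain index that skips blocks until the client's current query matches.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Identity (Identity, runIdentity)

-- Cut-down client: only "request next" and "done".
newtype Client query tip m a = Client {runClient :: m (Idle query tip m a)}

data Idle query tip m a where
  RequestNext
    :: query result -- what the client wants to hear about next
    -> (result -> tip -> Client query tip m a) -- continuation on a match
    -> Idle query tip m a
  Done :: a -> Idle query tip m a

-- Hypothetical block and query types.
data Block = Block {blockSlot :: Int, blockTxIds :: [String]}

data TxQuery result where
  -- Matches the block containing the transaction and yields its slot.
  FindTx :: String -> TxQuery Int

evalQuery :: TxQuery result -> Block -> Maybe result
evalQuery (FindTx txId) block
  | txId `elem` blockTxIds block = Just (blockSlot block)
  | otherwise = Nothing

-- Toy server: returns Nothing if the chain ends before the client is done.
serve
  :: Monad m
  => (forall r. query r -> Block -> Maybe r)
  -> [Block]
  -> Client query Int m a
  -> m (Maybe a)
serve eval blocks client = do
  idle <- runClient client
  case idle of
    Done a -> pure (Just a)
    RequestNext q k -> scan eval q k blocks

-- Skip blocks until the query matches, then resume the client.
scan
  :: Monad m
  => (forall r. query r -> Block -> Maybe r)
  -> query result
  -> (result -> Int -> Client query Int m a)
  -> [Block]
  -> m (Maybe a)
scan _ _ _ [] = pure Nothing
scan eval q k (b : rest) = case eval q b of
  Just r -> serve eval rest (k r (blockSlot b))
  Nothing -> scan eval q k rest

requestNext
  :: Applicative m
  => query result
  -> (result -> tip -> Client query tip m a)
  -> Client query tip m a
requestNext q k = Client (pure (RequestNext q k))

-- A client that waits for two transactions in order and reports their slots.
findBoth :: Client TxQuery Int Identity (Int, Int)
findBoth =
  requestNext (FindTx "tx-a") $ \slotA _ ->
    requestNext (FindTx "tx-b") $ \slotB _ ->
      Client (pure (Done (slotA, slotB)))

example :: Maybe (Int, Int)
example = runIdentity (serve evalQuery chain findBoth)
  where
    chain = [Block 1 ["x"], Block 3 ["tx-a"], Block 5 [], Block 7 ["tx-b", "y"]]
```

The client never sees blocks 1 and 5: the server advances past them because they do not satisfy the query the client sent, which is the sense in which the event sequence is at the client's discretion.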
-
@jhbertra, which of the following are still under consideration in the design for the chain index?
-
At this point, we've discussed four separate proposals, which I'd like to summarize here:
1. Roster Sync
The Chain Index pushes roster changes to clients. The Roster is a map that shows A) which Marlowe contracts exist and B) what transactions are associated with each Marlowe contract. The Roster only includes identifiers, both for contracts and transactions. The client can control which contracts the Roster contains via a [...]
The Chain Index also provides a bulk download API for transactions. When the client sees new transaction IDs in its Roster, it can use this API to download the transactions.
Benefits
Drawbacks
2. Queries and Callbacks
The Chain Index exposes a set of queries and a set of event callbacks via its API. Clients use queries to load data from the blockchain and event callbacks to wait for new transactions. Clients receive a synchronization token from event callbacks. This token describes the client's "position" in the blockchain as of the event occurrence. The client then provides this token to subsequent queries and event requests. The Chain Index uses the token to give the client a consistent view of the blockchain. It also detects when the client needs to be informed of a rollback.
Benefits
Drawbacks
3. Aggregated Queries and Callbacks
This option is a variation of the second one. Clients send batch queries to the Chain Index and receive batch results instead of threading a synchronization token through multiple queries. Continuation queries accompany event callbacks so the Chain Index can ensure consistency between event results and query results.
Benefits
Drawbacks
4. Extended Chain Sync
Downstream clients communicate with the Chain Index via a modified version of the Chain Sync protocol. The Filtered Chain Sync protocol extends the base Chain Sync protocol by adding a [...]
Benefits
Drawbacks
-
Recap of discussion from video call:
-
I am going to write a couple of what-if questions here, as we discussed, to better understand and test how option 4 works. I will post them as separate replies to make them easier to discuss:
-
Marlowe Runtime Chain Index Entity-Relationship Diagram v1
erDiagram
Block ||--o{ Transaction : contains
Transaction ||--|{ TransactionInput : receives
Transaction ||--|{ TransactionOutput : produces
TransactionOutput ||--o{ AssetTransfer : transfers
TransactionOutput ||--o| TransactionInput : feeds
TransactionInput ||--o| PlutusScriptWitness : provides
AssetClass ||--o{ AssetTransfer : classifies
AssetClass ||--o{ AssetMint : classifies
Transaction ||--o{ AssetMint : mints
Block {
bytea headerHash PK
bigint slotNo "INDEX"
bigint blockNo
}
Transaction {
bytea id PK
bytea headerHash FK
bigint validityLowerBound "NULL"
bigint validityUpperBound "NULL"
bytea metadataKey1564 "NULL"
}
TransactionOutput {
uuid id PK
bytea transactionId FK
int index "INDEX"
bytea datum "NULL"
bytea address "INDEX"
bigint lovelace
}
TransactionInput {
uuid id PK
uuid transactionOutputId FK
bytea transactionId FK
}
PlutusScriptWitness {
uuid transactionInputId
bytea datum
bytea redeemer
}
AssetClass {
uuid id PK
bytea policyId FK
text name "INDEX"
}
AssetTransfer {
uuid transactionOutputId FK
uuid assetClassId FK
bigint quantity
}
AssetMint {
bytea transactionId FK
uuid assetClassId FK
bigint quantity
}
-
Filtered Chain Sync Protocol State Diagram v1
Note the omission of the [...]
stateDiagram-v2
[*] --> Idle
Idle --> Next : QueryNext
Idle --> [*] : Done
state Next {
[*] --> CanAwait
CanAwait --> MustAwait : AwaitReply
CanAwait --> [*] : RollForward
CanAwait --> [*] : RollBackward
MustAwait --> [*] : RollForward
MustAwait --> [*] : RollBackward
}
Next --> Idle : RollForward
Next --> Idle : RollBackward
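The diagram can also be read as a transition function. The sketch below is one possible encoding, not an agreed design: it flattens the nested `Next` state into its `CanAwait` and `MustAwait` sub-states, and the names `State`, `Message`, `step`, `run`, and `MsgDone` (renamed to avoid clashing with the terminal `Done` state) are assumptions.

```haskell
import Control.Monad (foldM)

-- Protocol states, with the composite Next state flattened.
data State = Idle | CanAwait | MustAwait | Done
  deriving (Eq, Show)

-- Messages that label the transitions in the diagram.
data Message = QueryNext | AwaitReply | RollForward | RollBackward | MsgDone
  deriving (Eq, Show)

-- Nothing means the message is not allowed in the current state.
step :: State -> Message -> Maybe State
step Idle QueryNext = Just CanAwait
step Idle MsgDone = Just Done
step CanAwait AwaitReply = Just MustAwait
step CanAwait RollForward = Just Idle
step CanAwait RollBackward = Just Idle
step MustAwait RollForward = Just Idle
step MustAwait RollBackward = Just Idle
step _ _ = Nothing

-- Check a whole conversation, starting from Idle.
run :: [Message] -> Maybe State
run = foldM step Idle
```

A conversation such as `[QueryNext, AwaitReply, RollForward, MsgDone]` is accepted, while one that begins with `AwaitReply` is rejected because the server cannot ask the client to wait before a request has been made.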
-
I propose that from now on, we refer to this component as the "Marlowe Chain Sync Engine" instead of the "Marlowe Chain Index." It is a more descriptive and accurate name for what we plan to build.
-
I've started this thread as a place to discuss the design for the chain index component of the runtime. I'm tentatively using the term "chain index" as opposed to "chain follower" or "chain sync client" because we may be using more than just the chain sync mini protocol, so "chain index" is more general.
Before we dive into the internals of the component, we should settle on an initial set of requirements that we feel comfortable with and describe the interface.
Here is a first draft of the specification for discussion:
High-level and Rationale
I'm beginning to think that we want the chain index to actually be a separate process, because of the potentially large volume of data it will store. It may make sense in some cases to deploy it on a different physical computer, or to provide hosted instances for users who don't want to host either the Cardano Node or the database themselves. For the same reason, it certainly makes sense to segregate the chain index's database from other databases in the runtime.
The reason for introducing the intermediate contract roster structure is to support use cases in which only a subset of contracts are required with minimal data transfer, particularly in the streaming API. Sending only transaction IDs makes the message payload smaller, and enables the streaming API to send everything, simplifying its semantics. This also allows consumers to maintain a lightweight index that they can cheaply update and hold in memory, while actual transactions can be loaded lazily and optionally kept in a transient caching layer.
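A minimal sketch of the lazy-loading idea, assuming a roster is just a list of contract IDs paired with transaction IDs. The type aliases and both functions are hypothetical; they only show how a consumer could work out which transactions it still needs to fetch.

```haskell
import Data.List (nub, (\\))

type ContractId = String
type TransactionId = String

-- Hypothetical roster shape: identifiers only, no transaction bodies.
type ContractRoster = [(ContractId, [TransactionId])]

-- All distinct transaction IDs mentioned anywhere in the roster.
rosterTransactionIds :: ContractRoster -> [TransactionId]
rosterTransactionIds = nub . concatMap snd

-- IDs that appear in the roster but are not yet in the local cache;
-- these are the ones the consumer would request in bulk.
missingTransactions :: [TransactionId] -> ContractRoster -> [TransactionId]
missingTransactions cached roster = rosterTransactionIds roster \\ cached
```

Because only identifiers are compared, this check stays cheap even when the transactions themselves are large.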
Definitions
Functional requirements
[...] Synchronized, the response should indicate that the roster is possibly incomplete.
[...] Synchronizing.
[...] Synchronized.
[...] Disconnected.
[...] Synchronized, the response should indicate that the transaction set is possibly incomplete.
[...] Synchronizing, the stream should not send any events.
[...] Synchronized, the first event the stream should send should be a SynchronizationComplete.
[...] Disconnected, the connection request should be rejected.
[...] Disconnected, the connection should be severed.
[...] Synchronized, the first event the stream should send should be a SynchronizationComplete.
Non-functional requirements
Usage notes
[...] SynchronizationComplete on the streaming connection, it should request the contract roster.