Reusable Diffusion Investigation

Armando Santos edited this page Nov 12, 2024 · 2 revisions

Reusable Diffusion

This document outlines the findings of an investigation into the Cardano diffusion layer and explores how to make it reusable for third-party users. The goal is to turn the current diffusion layer into a library that enables others to build overlay networks, leveraging the existing self-balancing, self-optimizing, and self-healing capabilities of the current Cardano Node network. Users will be able to diffuse any data across their network by running custom protocols, defining their own targets for connectivity, and setting their own governance policies (such as churning and monitoring jobs).

Note that this project is not meant to be a fork of the current ouroboros-network repository, but rather a major refactoring of it.

Current State

In the current architecture, the Cardano node initializes the consensus layer, which in turn initializes the diffusion layer. The node parses configuration files and passes diffusion parameters to the consensus layer via RunNodeArgs and StdRunNodeArgs. Consensus then uses these parameters to initialise the diffusion layer.

Additionally, the consensus layer is responsible for providing the diffusion layer with the versioned applications (mini-protocols) bundle, which manages peer connections and promotions. This is important for protocols like the handshake protocol and codecs. The consensus layer also supplies the LedgerPeersConsensusInterface, which informs the diffusion layer about the latest slot numbers, ledger state judgments, and the ledger peers.

Currently, the consensus layer depends only on the ouroboros-network and ouroboros-network-api libraries. ouroboros-network provides the top-level integration of all network components (i.e. the diffusion layer), while ouroboros-network-api provides the API shared between the network and consensus layers.

The goal of a reusable design is to allow third-party users to leverage the ouroboros-network diffusion layer for their own applications by decoupling it from Cardano node-specific implementation details. Currently, the only client for the ouroboros-network is the Cardano node, which, being part of a monolithic stack, has tightly coupled many implementation specifics to the diffusion layer. This coupling makes it difficult to reuse the diffusion layer for other purposes.

Current Design Overview:

flowchart TB
    subgraph CN[Cardano Node]
        C[Consensus]
        subgraph Diffusion["Diffusion (ouroboros-network)"]
            direction TB
            DN2C[N2C]
            DN2N[N2N]
        end
        C -- "LedgerPeersConsensusInterface" --> Diffusion
    end

Target Design:

flowchart TB
    subgraph CN["Cardano Node (Haskell)"]
        C[Consensus]
        subgraph CNDiff["Diffusion (o-network)"]
            CNN2C[N2C]
            CNN2N[N2N]
        end
        C -- "LedgerPeersConsensusInterface" --> CNDiff
    end
    subgraph CA["Custom App (Haskell)"]

        subgraph CADiff["Diffusion (o-network)"]
            CAN2C[N2C]
            CAN2N[N2N]
        end
    CNN2C -- "LedgerPeersConsensusInterface RPC" --> CAN2C
    end
    subgraph CAE["Custom App Logic (X Lang)"]
        CAE2C[N2C]
    end
    CAN2C <--> CAE2C

The main objective is to abstract and extend the LedgerPeersConsensusInterface so that third-party diffusion layers can interact with a full Cardano node and obtain all the information they need to operate. This will be facilitated through a node-to-client RPC interface. Users will also be able to configure all diffusion parameters, such as PeerMetrics for churn or PeerSelectionTargets for the peer selection governor, and to introduce their own application-specific parameters as needed.
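As a rough illustration of this idea, the sketch below shows how a third-party application might back the interface with values received over an N2C RPC connection. It rests on several assumptions: the types are simplified stand-ins for the real ouroboros-network types (plain STM instead of STM m, lists instead of NonEmpty, Maybe instead of WithOrigin), and RpcCache, newRpcCache, rpcBackedInterface and onRpcUpdate are hypothetical names that do not exist in the library.

```haskell
import Control.Concurrent.STM
  (STM, TVar, atomically, newTVarIO, readTVar, writeTVar)

-- Hypothetical, simplified stand-ins for the real types.
newtype SlotNo = SlotNo Int deriving (Eq, Show)
type RelayAccessPoint = String
type PoolStake        = Rational

-- Simplified version of the interface, with an extension point ('api')
-- for application-specific methods.
data LedgerPeersConsensusInterface api = LedgerPeersConsensusInterface
  { lpGetLatestSlot  :: STM (Maybe SlotNo)
  , lpGetLedgerPeers :: STM [(PoolStake, [RelayAccessPoint])]
  , lpExtraAPI       :: api
  }

-- Hypothetical cache that an N2C RPC client thread keeps up to date.
data RpcCache = RpcCache
  { cacheSlot  :: TVar (Maybe SlotNo)
  , cachePeers :: TVar [(PoolStake, [RelayAccessPoint])]
  }

newRpcCache :: IO RpcCache
newRpcCache = RpcCache <$> newTVarIO Nothing <*> newTVarIO []

-- The diffusion layer only reads from the cache; it does not need to know
-- whether the values come from local TVars or from a remote node.
rpcBackedInterface :: RpcCache -> api -> LedgerPeersConsensusInterface api
rpcBackedInterface cache extra = LedgerPeersConsensusInterface
  { lpGetLatestSlot  = readTVar (cacheSlot cache)
  , lpGetLedgerPeers = readTVar (cachePeers cache)
  , lpExtraAPI       = extra
  }

-- Called by the (hypothetical) RPC client whenever the node sends an update.
onRpcUpdate :: RpcCache -> SlotNo -> [(PoolStake, [RelayAccessPoint])] -> IO ()
onRpcUpdate cache slot peers = atomically $ do
  writeTVar (cacheSlot cache) (Just slot)
  writeTVar (cachePeers cache) peers

main :: IO ()
main = do
  cache <- newRpcCache
  let iface = rpcBackedInterface cache ()
  onRpcUpdate cache (SlotNo 42) [(1 / 2, ["relay.example:3001"])]
  atomically (lpGetLatestSlot iface)  >>= print
  atomically (lpGetLedgerPeers iface) >>= print
```

With this shape, the consensus-backed and RPC-backed variants differ only in who writes to the underlying variables.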

Current Stakeholders

This section describes and analyses the current stakeholders (i.e. third-party users) that will benefit from using the ouroboros-network stack as a reusable API. The decisions in this document are guided by the properties identified in the stakeholders' use cases.

Mithril Node

Mithril is a protocol based on a Stake-based Threshold Multi-signature scheme, which leverages Cardano's Stake Pool Operators (SPOs) to certify data from the Cardano blockchain in a trustless manner. Currently, Mithril is used within the Cardano ecosystem to enable fast bootstrapping of full nodes and secure light wallets.

The Mithril protocol coordinates the collection of individual signatures from the signers (run by SPOs), which are then aggregated into multi-signatures and certificates by the aggregators. To achieve full decentralization, Mithril must operate over a decentralized peer-to-peer network. Building such a network from scratch would require substantial time, effort, and investment. Moreover, since SPOs, representing Cardano's active stake, will need to adopt and operate Mithril nodes alongside their Cardano nodes, a more efficient solution is to leverage the existing Cardano network layer. This approach will simplify Mithril's development while minimizing its impact on the Cardano network and reducing the maintenance overhead for SPOs.

Mithril will be an early adopter of the proposed design in this document, serving as a use case and illustrative example.

Mithril Processes:
  • mithril-node
  • mithril-signer (developed in Rust by the Mithril team)
  • cardano-node
architecture-beta
    group MithrilNode[Mithril Node]

    group OuroborosNetworkMithril[Mithril Ouroboros Network] in MithrilNode
    service NodeToClientMithril[Mithril NodeToClient] in OuroborosNetworkMithril
    service NodeToNodeMithril[Mithril NodeToNode] in OuroborosNetworkMithril
    service NodeToClientCardanoClient[NodeToClient Cardano Client] in OuroborosNetworkMithril

    group CardanoNode[Cardano Node]

    group OuroborosNetworkCardano[Cardano Ouroboros Network] in CardanoNode
    service NodeToClientCardano[Cardano NodeToClient] in OuroborosNetworkCardano
    service NodeToNodeCardano[Cardano NodeToNode] in OuroborosNetworkCardano

    group MithrilSigner[Mithril Signer]
    service NodeToClientMithrilSigner[Mithril NodeToClient] in MithrilSigner

    NodeToClientCardanoClient:L -- R:NodeToClientCardano
    NodeToClientMithrilSigner:B -- T:NodeToClientMithril

The Mithril node must operate alongside the Cardano node, communicating through UNIX sockets and a custom Node-to-Client (N2C) RPC protocol. This enables the Mithril diffusion layer to access the necessary ledger information to establish a resilient overlay network. Additionally, the Mithril node's diffusion layer will need to support custom protocols to facilitate communication with both other Mithril nodes and the Mithril signer nodes, which provide signatures for diffusion. These signer nodes, or any application-specific logic processes, can be implemented in any suitable programming language, as long as both the Mithril node and the signer node communicate using the same protocol.

Others

Other protocols in the Cardano ecosystem, such as Leios and Peras (and probably other protocols in the future), also need the capability to diffuse messages originating from block producers in a decentralized fashion. However, in the Leios and Peras cases, the Cardano node itself is both a producer and a consumer of these messages. The design takes this need for a generic solution into account.

Configurable Parameters

Currently, the following diffusion-specific parameters can be configured manually, via configuration files, or programmatically (these belong to the Diffusion.run function):

  1. Tracers (Common interface between P2P and Non-P2P)
  2. Tracers Extra (Additional tracers for P2P diffusion)
  3. Arguments (Common interface between P2P and Non-P2P)
  4. Arguments Extra (Additional arguments for P2P diffusion)
  5. Applications (Common interface for mini-protocol bundles)
  6. Applications Extra (Additional settings for P2P applications)

NOTE: Once we remove non-p2p, we can merge Tracers & TracersExtra, Arguments & ArgumentsExtra and Applications & ApplicationsExtra.

1. Tracers: Includes tracers for (Local) Mux, (Local) Handshake, and the diffusion tracer. These tracers are likely to remain unchanged until the Non-P2P stack is fully removed.

2. Tracers Extra: Includes tracers for P2P components such as (Local) Public Root Peers, Ledger Peers, Peer Selection, (Churn) Peer Selection Counters, (Local) Connection Manager, (Local) Server, and (Local) Inbound Governor. Third-party users will need to implement their own tracers to monitor the diagnostics of their diffusion layer.

3. Arguments: Includes settings for IPv4/IPv6 addresses, rate limits, and whether the diffusion layer should be initiated in either initiator-only or initiator-responder mode. These configurations are required from third-party users.

4. Arguments Extra: Covers P2P-specific configurations, including Peer Targets, Consensus Mode, Minimum Big Ledger Peers for Trusted State, Peer Sharing, Protocol Idle Timeouts, TIME_WAIT timeouts, Churn Deadline Interval, and Bulk Churn Interval. It also requires STM actions to handle data such as (Local) Public Root Peers, Bootstrap Peers flags, Ledger Peer Snapshots, and whether to use Ledger Peers. Some parameters here are specific to the Cardano node and may not be relevant to third-party applications, so they should be abstracted from the user.

5. Applications: This includes versioned mini-protocol bundles to run based on the connection mode: Initiator Mode, Initiator-Responder Mode, or Local Responder (Non-P2P). These bundles are polymorphic on the N2N/N2C versions and respective data, enabling third-party users to define their own configurations. However, the user must provide the LedgerPeersConsensusInterface, which serves as the only communication point between the diffusion and consensus layers. In the current system, the consensus layer provides this interface by directly accessing the required TVars. To support third-party usage, the diffusion layer will need to communicate with a Cardano node using an N2C (RPC) protocol. A callback informing whether the node is connected to local roots or external peers is also required, though this may not be relevant for third-party users.

6. Applications Extra: This includes additional configuration for application behaviors such as the (Local) Rethrow Policy, Return Policy, Peer Metrics TVar for Peer Selection, Block Fetch Mode, and Peer Sharing Registry. As with other parameters, not all of these settings may be relevant for third-party users.

The following better illustrates the dependencies between components:

flowchart TB
    subgraph CN[Cardano Node]
        subgraph Diffusion Arguments
            Tr[Tracers]
            TrE[Tracers Extra]
            Args[Arguments]
            ArgsE[Arguments Extra]
        subgraph Apps[Applications]
            LCI[LedgerPeersConsensusInterface]
        end
            AppsE[Applications Extra]
        end
        subgraph ON[Ouroboros Network]
            OG[Outbound Governor]
            CG[Churn Governor]
            IG[Inbound Governor]
            Etc[Others: Connection Manager, Local Root Peers, Public Root Peers, etc..]
        end
        OC[Ouroboros Consensus]
    end
    OC --> LCI
    Tr --> Etc
    TrE --> OG
    TrE --> CG
    TrE --> IG
    TrE --> Etc
    Args --> Etc
    Args --> IG
    ArgsE --> OG
    ArgsE --> CG
    ArgsE --> IG
    ArgsE --> Etc
    Apps --> Etc
    AppsE --> OG
    AppsE --> CG
    AppsE --> Etc

Note that all parameters, except for the LedgerPeersConsensusInterface, are static. The LedgerPeersConsensusInterface enables dynamic interaction with the consensus layer, providing real-time values that the diffusion layer relies on to function correctly.
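The difference between static and dynamic parameters can be made concrete with a small sketch. Everything here is simplified and hypothetical: StaticArgs, DynamicInterface and currentActiveTarget are made-up names, the numeric targets are arbitrary illustrative values, and the mapping from ledger state judgement to a set of targets is an assumption made for the example.

```haskell
import Control.Concurrent.STM
  (STM, atomically, newTVarIO, readTVar, writeTVar)

data LedgerStateJudgement = YoungEnough | TooOld
  deriving (Eq, Show)

-- Static: fixed once, when the diffusion layer is started.
data StaticArgs = StaticArgs
  { saDeadlineTarget :: Int  -- active-peer target when caught up (illustrative)
  , saSyncTarget     :: Int  -- active-peer target while syncing (illustrative)
  }

-- Dynamic: an STM action that is re-evaluated every time it is needed.
newtype DynamicInterface = DynamicInterface
  { diGetLedgerStateJudgement :: STM LedgerStateJudgement }

-- The static values never change, but which one applies depends on the
-- value read from the dynamic interface at that moment.
currentActiveTarget :: StaticArgs -> DynamicInterface -> STM Int
currentActiveTarget args iface = do
  lsj <- diGetLedgerStateJudgement iface
  pure $ case lsj of
    YoungEnough -> saDeadlineTarget args
    TooOld      -> saSyncTarget args

main :: IO ()
main = do
  judgementVar <- newTVarIO TooOld
  let args  = StaticArgs { saDeadlineTarget = 20, saSyncTarget = 30 }
      iface = DynamicInterface (readTVar judgementVar)
  atomically (currentActiveTarget args iface) >>= print  -- prints 30
  -- The consensus side (or an RPC client) updates the judgement later on.
  atomically (writeTVar judgementVar YoungEnough)
  atomically (currentActiveTarget args iface) >>= print  -- prints 20
```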

Proposed Configuration Structure

To improve the current configuration scheme and provide a cleaner API for third-party users, a new diffusion configuration structure is proposed. The goal is to abstract and hide irrelevant components specific to the Cardano node application (e.g., Block Fetch Mode, Ledger Snapshot, Bootstrap Peers) while allowing users to configure the essential parts of the diffusion layer.

  1. Tracers: Remains unchanged.

  2. Tracers Extra: These will need to be made extensible so that third-party users can add their own tracers if needed.

  3. Arguments: Remains unchanged.

  4. Arguments Extra: These will be exposed so that third-party users can define and instantiate their own arguments, for example:

    data CardanoArgumentsExtra m =
      CardanoArgumentsExtra {
        caePeerTargets                      :: ConsensusModePeerTargets
      , caeReadUseBootstrapPeers            :: STM m UseBootstrapPeers
      , caeMinBigLedgerPeersForTrustedState :: MinBigLedgerPeersForTrustedState
      , caeConsensusMode                    :: ConsensusMode
      }
    
    data ArgumentsExtra extraArgs m =
      ArgumentsExtra {
        daPeerTargets            :: PeerSelectionTargets
      , daOwnPeerSharing         :: PeerSharing
      , daProtocolIdleTimeout    :: DiffTime
      , daTimeWaitTimeout        :: DiffTime
      , daDeadlineChurnInterval  :: DiffTime
      , daBulkChurnInterval      :: DiffTime
      , caeReadUseLedgerPeers    :: STM m UseLedgerPeers
      , caeReadUseBootstrapPeers :: UseBootstrapPeers
      , caeMinBigLedgerPeersForTrustedState :: MinBigLedgerPeersForTrustedState
      , daReadLocalRootPeers     :: STM m (LocalRootPeers.Config RelayAccessPoint)
      , daReadPublicRootPeers    :: STM m (Map RelayAccessPoint PeerAdvertise)
      , daReadLedgerPeerSnapshot :: STM m (Maybe LedgerPeerSnapshot)
      , daAPIArgs                :: extraArgs
      }
  5. Applications: These remain mostly unchanged, except for LedgerPeersConsensusInterface, which will require changes to support all RPC methods required by the N2C communication protocols. This will require an implementation of the protocols on the Cardano node side, and a new handshake protocol as well. Since there is currently a Cardano (Genesis) specific callback (daUpdateOutboundConnectionsState), it would be best to abstract over extra consensus callbacks, for example:

    data CardanoLedgerPeersConsensusInterface m =
      CardanoLedgerPeersConsensusInterface {
        clpciGetLedgerStateJudgement        :: STM m LedgerStateJudgement
      , clpciUpdateOutboundConnectionsState :: OutboundConnectionsState -> m ()
      }
    
    data LedgerPeersConsensusInterface api m =
      LedgerPeersConsensusInterface {
        lpGetLatestSlot           :: STM m (WithOrigin SlotNo)
      , lpGetLedgerPeers          :: STM m [(PoolStake, NonEmpty RelayAccessPoint)]
      -- ... other RPC methods
      , lpExtraAPI                :: api
      }

    Here daUpdateOutboundConnectionsState could always be filled with a no-op such as \_ -> pure (), but it is better to avoid leaking Cardano-specific details into the general interface. An extension point is also more versatile.

  6. Applications Extra: The only questionable value in this record is daBlockFetchMode, as it is specific to the Block Fetch mini-protocol, and the Churn Governor logic is not decoupled from this and other Cardano Node-specific parameters. If we have two separate implementations for Churn, as mentioned below in Churn Governor, then having an extension point is a viable option.
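The extension-point pattern used throughout the list above can be shown in isolation. The following is a minimal sketch, not the real records: the generic record is reduced to two fields, timeouts are plain Double seconds instead of DiffTime, the values are arbitrary, and MithrilArgumentsExtra with its maeSignerSocketPath field is a made-up third-party example.

```haskell
-- Generic part, owned by the diffusion library (heavily simplified).
data ArgumentsExtra extraArgs = ArgumentsExtra
  { daProtocolIdleTimeout :: Double
  , daBulkChurnInterval   :: Double
  , daAPIArgs             :: extraArgs  -- extension point
  }

data ConsensusMode = PraosMode | GenesisMode
  deriving (Eq, Show)

-- Cardano-specific part, owned by the Cardano node.
newtype CardanoArgumentsExtra = CardanoArgumentsExtra
  { caeConsensusMode :: ConsensusMode }

-- Hypothetical third-party part (name and field are invented).
newtype MithrilArgumentsExtra = MithrilArgumentsExtra
  { maeSignerSocketPath :: FilePath }

-- Library code is polymorphic in extraArgs, so it cannot depend on
-- Cardano-specific details.
genericTimeouts :: ArgumentsExtra extraArgs -> (Double, Double)
genericTimeouts args = (daProtocolIdleTimeout args, daBulkChurnInterval args)

cardanoArgs :: ArgumentsExtra CardanoArgumentsExtra
cardanoArgs = ArgumentsExtra 5 300 (CardanoArgumentsExtra GenesisMode)

mithrilArgs :: ArgumentsExtra MithrilArgumentsExtra
mithrilArgs = ArgumentsExtra 5 300 (MithrilArgumentsExtra "/tmp/signer.sock")

main :: IO ()
main = do
  print (genericTimeouts cardanoArgs)
  print (genericTimeouts mithrilArgs)
  print (caeConsensusMode (daAPIArgs cardanoArgs))
  putStrLn (maeSignerSocketPath (daAPIArgs mithrilArgs))
```

Only code that knows the concrete extraArgs type can look inside daAPIArgs, which keeps application-specific logic out of the shared library.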

Main P2P Components

Cardano Node Implementation Specific Details

The P2P components that carry the most Cardano Node implementation details, and thus cannot easily be made fully general, are the Outbound Governor, the Churn Governor and the Local/Public Root Peers Provider. The best way to separate concerns and provide a general API for third-party users is to factor the Cardano-specific types and parameters of these components out into separate modules. With a good enough module structure, we can provide a clean and simple API for diffusion instantiation just by importing the right modules.

A good way to do this would be to:

  • Add all extension points mentioned above in Cardano.{Diffusion, Node, PeerSelection}

  • Move P2P diffusion instantiation to one of these folders/modules

  • Extend Ouroboros.Network.Diffusion to accommodate third-party diffusion instantiations, for example:

    data P2P = P2P        -- ^ General P2P mode. Can be instantiated with custom
                          -- data types
             | NonP2P     -- ^ Cardano non-P2P mode. Deprecated
             | P2PCardano -- ^ Cardano P2P mode.

    -- | Tracers which depend on p2p mode.
    --
    data ExtraTracers (p2p :: P2P) extraState extraFlags extraPeers m where
      P2PTracers
        :: Common.TracersExtra
             RemoteAddress NodeToNodeVersion   NodeToNodeVersionData
             LocalAddress  NodeToClientVersion NodeToClientVersionData
             IOException extraState extraState extraFlags extraPeers m
        -> ExtraTracers 'P2P extraState extraFlags extraPeers m

      P2PCardanoTracers
        :: Common.TracersExtra
             RemoteAddress NodeToNodeVersion   NodeToNodeVersionData
             LocalAddress  NodeToClientVersion NodeToClientVersionData
             IOException CardanoPeerSelectionState CardanoPeerSelectionState
             PeerTrustable (CardanoPublicRootPeers RemoteAddress) m
        -> ExtraTracers 'P2PCardano CardanoPeerSelectionState PeerTrustable
                        (CardanoPublicRootPeers RemoteAddress) m

      NonP2PTracers
        :: NonP2P.TracersExtra
        -> ExtraTracers 'NonP2P extraState extraFlags extraPeers m

    -- | Diffusion arguments which depend on p2p mode.
    --
    data ArgumentsExtra (p2p :: P2P) extraArgs extraPeers m where
      P2PArguments
        :: Common.ArgumentsExtra extraArgs extraPeers m
        -> ArgumentsExtra 'P2P extraArgs extraPeers m

      P2PCardanoArguments
        :: Common.ArgumentsExtra (CardanoArgumentsExtra m) PeerTrustable m
        -> ArgumentsExtra 'P2PCardano (CardanoArgumentsExtra m) PeerTrustable m

      NonP2PArguments
        :: NonP2P.ArgumentsExtra
        -> ArgumentsExtra 'NonP2P extraArgs extraPeers m

    -- | Application data which depend on p2p mode.
    --
    data Applications (p2p :: P2P) extraAPI m a where
      -- The general P2P case is parameterised over the user's own extraAPI.
      P2PApplications
        :: Common.Applications
             RemoteAddress NodeToNodeVersion   NodeToNodeVersionData
             LocalAddress  NodeToClientVersion NodeToClientVersionData
             extraAPI m a
        -> Applications 'P2P extraAPI m a

      P2PCardanoApplications
        :: Common.Applications
             RemoteAddress NodeToNodeVersion   NodeToNodeVersionData
             LocalAddress  NodeToClientVersion NodeToClientVersionData
             (CardanoLedgerPeersConsensusInterface m) m a
        -> Applications 'P2PCardano (CardanoLedgerPeersConsensusInterface m) m a

      NonP2PApplications
        :: Common.Applications
             RemoteAddress NodeToNodeVersion   NodeToNodeVersionData
             LocalAddress  NodeToClientVersion NodeToClientVersionData
             () m a
        -> Applications 'NonP2P () m a

    -- | Extra application data which depend on p2p mode.
    --
    data ApplicationsExtra (p2p :: P2P) ntnAddr m a where
      P2PApplicationsExtra
        :: Common.ApplicationsExtra ntnAddr m a
        -> ApplicationsExtra 'P2P ntnAddr m a

      P2PCardanoApplicationsExtra
        :: Common.ApplicationsExtra ntnAddr m a
        -> ApplicationsExtra 'P2PCardano ntnAddr m a

      NonP2PApplicationsExtra
        :: NonP2P.ApplicationsExtra
        -> ApplicationsExtra 'NonP2P ntnAddr m a

    -- | Run data diffusion in either 'P2P' or 'NonP2P' mode.
    --
    run :: forall (p2p :: P2P) extraArgs extraState extraFlags extraPeers extraAPI a.
           Tracers RemoteAddress NodeToNodeVersion
                   LocalAddress  NodeToClientVersion
                   IO
        -> ExtraTracers p2p extraState extraFlags extraPeers IO
        -> Arguments IO Socket RemoteAddress LocalSocket LocalAddress
        -> ArgumentsExtra p2p extraArgs extraFlags IO
        -> Applications p2p extraAPI IO a
        -> ApplicationsExtra p2p RemoteAddress IO a
        -> IO ()
    run _ (P2PTracers _) _ (P2PArguments _)
        (P2PApplications _) (P2PApplicationsExtra _) =
      ThirdParty.run ...
    run tracers (P2PCardanoTracers tracersExtra)
        args (P2PCardanoArguments argsExtra)
        (P2PCardanoApplications apps)
        (P2PCardanoApplicationsExtra appsExtra) =
      void $
        P2P.run tracers tracersExtra args argsExtra apps appsExtra
    run tracers (NonP2PTracers tracersExtra)
        args (NonP2PArguments argsExtra)
        (NonP2PApplications apps)
        (NonP2PApplicationsExtra appsExtra) = do
      NonP2P.run tracers tracersExtra args argsExtra apps appsExtra
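The mode-indexed dispatch in run can be reproduced in a few self-contained lines. This is a toy sketch under stated assumptions: the records are reduced to almost nothing, String and () stand in for the real payload types, and run returns a description instead of starting a diffusion layer.

```haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE GADTs          #-}
{-# LANGUAGE KindSignatures #-}

data P2P = P2P | NonP2P | P2PCardano

-- Each constructor fixes the mode index, so values built for different
-- modes have different types.
data ArgumentsExtra (p2p :: P2P) extraArgs where
  P2PArguments        :: extraArgs -> ArgumentsExtra 'P2P extraArgs
  P2PCardanoArguments :: String    -> ArgumentsExtra 'P2PCardano String
  NonP2PArguments     :: ArgumentsExtra 'NonP2P ()

data Applications (p2p :: P2P) where
  P2PApplications        :: Applications 'P2P
  P2PCardanoApplications :: Applications 'P2PCardano
  NonP2PApplications     :: Applications 'NonP2P

-- Both arguments share the index p2p, so only matching pairs type-check
-- and three clauses cover every well-typed call.
run :: ArgumentsExtra p2p extraArgs -> Applications p2p -> String
run (P2PArguments _)        P2PApplications        = "third-party P2P"
run (P2PCardanoArguments _) P2PCardanoApplications = "Cardano P2P"
run NonP2PArguments         NonP2PApplications     = "Cardano non-P2P"

main :: IO ()
main = do
  putStrLn (run (P2PArguments ()) P2PApplications)
  putStrLn (run (P2PCardanoArguments "genesis") P2PCardanoApplications)
  putStrLn (run NonP2PArguments NonP2PApplications)
  -- run (P2PArguments ()) P2PCardanoApplications  -- rejected by the type checker
```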


Outbound Governor (Peer Selection Governor)

The peer selection governor manages the discovery and selection of upstream peers. It operates based on a set of targets (PeerSelectionTargets) and attempts to meet them through a series of monitoring actions. For example, if the governor is below its target for established peers, it will select a peer from its known set to connect to. Conversely, if the governor exceeds its target for hot peers, it will demote peers according to predefined metrics.
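The target-driven behaviour described above can be illustrated with a small pure function. This is a deliberately simplified sketch, not the governor's actual code: Targets, Counts, Decision and decide are hypothetical names, only the established and hot (active) targets are modelled, and the real governor picks individual peers through its policy rather than returning counts.

```haskell
-- Hypothetical, simplified targets and current peer counts.
data Targets = Targets
  { targetEstablished :: Int
  , targetActive      :: Int
  }

data Counts = Counts
  { numEstablished :: Int
  , numActive      :: Int
  }

data Decision
  = PromoteColdToWarm Int  -- connect to this many known peers
  | DemoteHotToWarm   Int  -- demote this many hot peers
  deriving (Eq, Show)

-- Compare the current counts with the targets and emit the corrective
-- decisions; an empty list means all modelled targets are met.
decide :: Targets -> Counts -> [Decision]
decide targets counts = concat
  [ [ PromoteColdToWarm (targetEstablished targets - numEstablished counts)
    | numEstablished counts < targetEstablished targets ]
  , [ DemoteHotToWarm (numActive counts - targetActive targets)
    | numActive counts > targetActive targets ]
  ]

main :: IO ()
main = do
  let targets = Targets { targetEstablished = 5, targetActive = 2 }
  print (decide targets (Counts { numEstablished = 3, numActive = 4 }))
  print (decide targets (Counts { numEstablished = 5, numActive = 2 }))
```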

Currently, the Outbound Governor is coupled with Cardano node-specific parameters like ConsensusMode, ConsensusModePeerTargets, LedgerPeerSnapshot, and LedgerStateJudgement. These parameters are scattered across LocalRootPeers, PublicRootPeers, PeerSelectionActions, PeerSelectionInterfaces, and PeerSelectionState. However, by abstracting these into an additional record, we can separate application-specific logic:

data LocalRootPeers extraFlags peeraddr =
   LocalRootPeers
     -- | Here extraFlags allow 3rd party users to enhance their local peers
     (Map peeraddr (PeerAdvertise, PeerTrustable, extraFlags))

     [(HotValency, WarmValency, Set peeraddr)]

data PublicRootPeers peeraddr =
  PublicRootPeers {
    -- | Configuration aka Public Config Peers should not be needed anymore
    -- getPublicConfigPeers :: !(Map peeraddr PeerAdvertise)
    getLedgerPeers       :: !(Set peeraddr)
  , getBootstrapPeers    :: !(Set peeraddr)
  , getBigLedgerPeers    :: !(Set peeraddr)
  }

-- This moves readUseLedgerPeers from PeerSelectionInterfaces to here
data CardanoPeerSelectionActions m =
  CardanoPeerSelectionActions {
    cpsaReadLedgerPeerSnapshot :: STM m (Maybe LedgerPeerSnapshot)
  , cpsaPeerTargets            :: ConsensusModePeerTargets
  }

type Config extraLocalRootPeersFlags peeraddr =
   [(HotValency, WarmValency, Map peeraddr (PeerAdvertise, extraLocalRootPeersFlags))]

data PeerSelectionActions extraActions consensusAPI extraLocalRootPeersFlags
                          peeraddr peerconn m =
  PeerSelectionActions {
    readPeerSelectionTargets :: STM m PeerSelectionTargets
  , readLocalRootPeers       :: STM m (LocalRootPeers.Config extraLocalRootPeersFlags
                                                             peeraddr)
  , readInboundPeers         :: m (Map peeraddr PeerSharing)
  , peerSharing              :: PeerSharing
  , peerConnToPeerSharing    :: peerconn -> PeerSharing
  , requestPublicRootPeers   :: LedgerPeersKind
                             -> Int
                             -> m (PublicRootPeers peeraddr, DiffTime)
  , requestPeerShare         :: PeerSharingAmount -> peeraddr -> m (PeerSharingResult peeraddr)
  , peerStateActions         :: PeerStateActions peeraddr peerconn m
  , getLedgerStateCtx        :: LedgerPeersConsensusInterface consensusAPI m
  , readUseBootstrapPeers    :: STM m UseBootstrapPeers
  , readUseLedgerPeers       :: STM m UseLedgerPeers
  , getExtraActions          :: extraActions
  }

-- readUseLedgerPeers was moved to CardanoPeerSelectionActions
data PeerSelectionInterfaces peeraddr peerconn m =
  PeerSelectionInterfaces {
    countersVar    :: StrictTVar m PeerSelectionCounters
  , publicStateVar :: StrictTVar m (PublicPeerSelectionState peeraddr)
  , debugStateVar  :: StrictTVar m (PeerSelectionState peeraddr peerconn)
  }

data CardanoPeerSelectionState =
  CardanoPeerSelectionState {
    ledgerStateJudgement  :: !LedgerStateJudgement
  , consensusMode         :: !ConsensusMode
  , hasOnlyBootstrapPeers :: !Bool
  , ledgerPeerSnapshot    :: Maybe LedgerPeerSnapshot
  }

data PeerSelectionState extraState extraLocalRootPeersFlags
                        peeraddr peerconn =
  PeerSelectionState {
    targets                     :: !PeerSelectionTargets
  , localRootPeers              :: !(LocalRootPeers extraLocalRootPeersFlags peeraddr)
  , publicRootPeers             :: !(PublicRootPeers peeraddr)
  , knownPeers                  :: !(KnownPeers peeraddr)
  , establishedPeers            :: !(EstablishedPeers peeraddr peerconn)
  , activePeers                 :: !(Set peeraddr)
  , publicRootBackoffs          :: !Int
  , publicRootRetryTime         :: !Time
  , inProgressPublicRootsReq    :: !Bool
  , bigLedgerPeerBackoffs       :: !Int
  , bigLedgerPeerRetryTime      :: !Time
  , inProgressBigLedgerPeersReq :: !Bool
  , inProgressPeerShareReqs     :: !Int
  , inProgressPromoteCold       :: !(Set peeraddr)
  , inProgressPromoteWarm       :: !(Set peeraddr)
  , inProgressDemoteWarm        :: !(Set peeraddr)
  , inProgressDemoteHot         :: !(Set peeraddr)
  , inProgressDemoteToCold      :: !(Set peeraddr)
  , stdGen                      :: !StdGen
  , inboundPeersRetryTime       :: !Time
  , bootstrapPeersTimeout       :: !(Maybe Time)
  , bootstrapPeersFlag          :: !UseBootstrapPeers
  , minBigLedgerPeersForTrustedState :: MinBigLedgerPeersForTrustedState
  , extraState                  :: !extraState
  }

data CardanoPeerSelectionArguments =
  CardanoPeerSelectionArguments {
    cnpsaConsensusMode :: ConsensusMode
  }

-- extraLocalRootPeersFlags is a type parameter here because it is used by
-- psaPeerSelectionActions below.
data PeerSelectionArguments extraArgs extraActions consensusAPI
                            extraLocalRootPeersFlags peeraddr peerconn m =
  PeerSelectionArguments {
    psaPeerSelectionTracer         :: Tracer m (TracePeerSelection peeraddr)
  , psaDebugPeerSelectionTracer    :: Tracer m (DebugPeerSelection peeraddr)
  , psaPeerSelectionCountersTracer :: Tracer m PeerSelectionCounters
  , psaFuzzRng                     :: StdGen
  , psaPeerSelectionActions        :: PeerSelectionActions extraActions consensusAPI
                                                           extraLocalRootPeersFlags
                                                           peeraddr peerconn m
  , psaPeerSelectionPolicy         :: PeerSelectionPolicy  peeraddr m
  , psaPeerSelectionInterfaces     :: PeerSelectionInterfaces peeraddr peerconn m
  , psaMinBigLedgerPeersForTrustedState :: MinBigLedgerPeersForTrustedState
  , psaExtraArgs                   :: extraArgs
  }

The PeerSelectionPolicy record should perhaps also be made extensible, so that third-party users can write and plug in their own policies.

peerSelectionGovernor
    :: ( Alternative (STM m)
       , MonadAsync m
       , MonadDelay m
       , MonadLabelledSTM m
       , MonadMask m
       , MonadTimer m
       , Ord peeraddr
       , Show peerconn
       , Hashable peeraddr
       )
    => PeerSelectionArguments extraArgs extraActions consensusAPI
                             extraLocalRootPeersFlags peeraddr peerconn m
    -> m Void
peerSelectionGovernor
    PeerSelectionArguments {
        psaPeerSelectionTracer
      , psaDebugPeerSelectionTracer
      , psaPeerSelectionCountersTracer
      , psaFuzzRng
      , psaPeerSelectionActions
      , psaPeerSelectionPolicy
      , psaPeerSelectionInterfaces
      , psaMinBigLedgerPeersForTrustedState
      , psaExtraArgs
      } =
  JobPool.withJobPool $ \jobPool ->
    peerSelectionGovernorLoop
      psaPeerSelectionTracer
      psaDebugPeerSelectionTracer
      psaPeerSelectionCountersTracer
      psaPeerSelectionActions
      psaPeerSelectionPolicy
      psaPeerSelectionInterfaces
      psaMinBigLedgerPeersForTrustedState
      jobPool
      (emptyPeerSelectionState psaFuzzRng psaExtraArgs)

Monitoring Actions

The Peer Selection Governor consists of a series of guarded decisions known as monitoring actions. These actions, either blocking or non-blocking, guide the governor's decision-making process. The order of execution is crucial. Although the current governor has Cardano-specific actions, third-party users may need to customize their monitoring actions.

Below is a minimal (without Cardano-specific actions) set of monitoring actions, sorted by execution order:

  • Blocking decisions:

    1. connections
    2. jobs
    3. targetPeers
    4. localRoots
  • Non-blocking decisions:

    5. BigLedgerPeers.belowTarget
    6. BigLedgerPeers.aboveTarget
    7. RootPeers.belowTarget
    8. KnownPeers.belowTarget
    9. KnownPeers.aboveTarget
    10. EstablishedPeers.belowTarget
    11. EstablishedPeers.aboveTarget
    12. ActivePeers.belowTarget
    13. ActivePeers.aboveTarget

The minimal set of monitoring actions above is not fully decoupled from Cardano node-specific details. Below is a list of actions that depend on Cardano-specific details:

  • Blocking decisions:

    3. targetPeers: Requires access to ledgerStateJudgement, and specific ConsensusModePeerTargets (for Genesis implementation). These help decide which set of targets to switch to.
    4. localRoots: Requires access to ledgerStateJudgement, to decide when to skip this action and maintain Genesis invariants regarding local and trusted peers.

  • Non-blocking decisions:

    5. BigLedgerPeers.belowTarget: Requires access to ledgerStateJudgement to determine when to skip this action.
    8. KnownPeers.belowTarget: Requires ledgerStateJudgement for action skipping.
    10. EstablishedPeers.belowTarget: Requires ledgerStateJudgement for action skipping.
    12. ActivePeers.belowTarget: Depends on ledgerStateJudgement to manage the belowTargetBigLedgerPeers action.

The importance of targetPeers and localRoots could be reconsidered, allowing third-party users to decide how to manage their own targets and local roots, since these are the only two actions whose logic depends on Cardano-specific details. The other actions are coupled only externally, through a guard.

Since the general case is not obvious, providing a clear and simple outbound governor API for third-party users is not easy. One option is to do just enough to support the Cardano node case, which would only require a way to add extra guard conditions to the belowTarget actions. However, if third-party users require finer control over their monitoring actions, a more customizable approach might be beneficial, allowing full control through a simple API:

-- | A single monitoring action.
type MonitoringAction extraState extraActions consensusAPI
                      extraLocalRootPeersFlags peeraddr peerconn m =
     PeerSelectionPolicy peeraddr m
  -> PeerSelectionActions extraActions consensusAPI extraLocalRootPeersFlags
                          peeraddr peerconn m
  -> PeerSelectionState extraState extraLocalRootPeersFlags peeraddr peerconn
  -> Guarded (STM m) (TimedDecision m peeraddr peerconn)

-- | Here 'policy', 'actions' and 'jobPool' are assumed to be in scope from
-- the enclosing governor loop.
guardedDecisions :: PeerSelectionState extraState extraLocalRootPeersFlags
                                       peeraddr peerconn
                 -> [MonitoringAction extraState extraActions consensusAPI
                                      extraLocalRootPeersFlags peeraddr peerconn
                                      m]
                 -> Guarded (STM m) (TimedDecision m peeraddr peerconn)
guardedDecisions st monitoringActions =
     Monitor.jobs jobPool st
  <> foldMap (\a -> a policy actions st) monitoringActions

This API would grant the user full control, with a caveat: they must take care not to introduce errors. On the other hand, it would make debugging easier, since the user controls the governor's code. A minimal API could be provided with the essential monitoring actions as a baseline, allowing customization while avoiding redundancy.
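To illustrate why the order of monitoring actions matters, the sketch below models guarded decisions with the First monoid from Data.Monoid, so that the first action whose guard fires wins. This is an assumption made for illustration: First over Maybe stands in for Guarded (STM m), and GovState, Decision, userPause, belowTarget and aboveTarget are hypothetical names.

```haskell
import Data.Monoid (First (..))

-- Hypothetical, heavily simplified governor state.
data GovState = GovState
  { gsEstablished :: Int
  , gsTarget      :: Int
  , gsPaused      :: Bool
  }

data Decision = Wait | Promote | Demote
  deriving (Eq, Show)

-- A monitoring action either fires (Just) or is skipped (Nothing).
type MonitoringAction = GovState -> First Decision

guardWith :: Bool -> Decision -> First Decision
guardWith cond decision = First (if cond then Just decision else Nothing)

userPause, belowTarget, aboveTarget :: MonitoringAction
userPause   st = guardWith (gsPaused st) Wait
belowTarget st = guardWith (gsEstablished st < gsTarget st) Promote
aboveTarget st = guardWith (gsEstablished st > gsTarget st) Demote

-- Earlier actions take priority: foldMap combines left to right and
-- First keeps the leftmost Just.
guardedDecisions :: [MonitoringAction] -> GovState -> Maybe Decision
guardedDecisions actions st = getFirst (foldMap ($ st) actions)

main :: IO ()
main = do
  let st = GovState { gsEstablished = 1, gsTarget = 3, gsPaused = True }
  print (guardedDecisions [userPause, belowTarget, aboveTarget] st)  -- Just Wait
  print (guardedDecisions [belowTarget, aboveTarget, userPause] st)  -- Just Promote
```

The same state yields different decisions depending only on the order of the list, which is the kind of error a user with full control over the list has to watch for.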

Alternatively, an inversion-of-control approach could be implemented. The minimal set of essential operations would still be defined, but users could extend it by providing their own callbacks without compromising the core actions. This would complicate the API, however, and it is unclear how flexible the approach would be for future needs.

To extend monitoring actions, users could insert them either before or after the essential blocking and non-blocking actions:

data ExtraGuardedDecisions extraState extraActions consensusAPI
                           extraLocalRootPeersFlags peeraddr peerconn m =
  ExtraGuardedDecisions {

    -- | This list of guarded decisions will come before all default possibly
    -- blocking decisions. The order matters, making the first decisions
    -- have priority over the later ones.
    --
    -- Note that these actions should be blocking.
    preBlocking      :: [MonitoringAction extraState extraActions
                                          consensusAPI extraLocalRootPeersFlags
                                          peeraddr peerconn m]

    -- | This list of guarded decisions will come after all preBlocking and
    -- default possibly blocking decisions. The order matters, making the
    -- first decisions have priority over the later ones.
    --
    -- Note that these actions should be blocking.
  , postBlocking     :: [MonitoringAction extraState extraActions
                                          consensusAPI extraLocalRootPeersFlags
                                          peeraddr peerconn m]

    -- | This list of guarded decisions will come before all default non-blocking
    -- decisions. The order matters, making the first decisions have priority over
    -- the later ones.
    --
    -- Note that these actions should not be blocking.
  , preNonBlocking   :: [MonitoringAction extraState extraActions
                                          consensusAPI extraLocalRootPeersFlags
                                          peeraddr peerconn m]

    -- | This list of guarded decisions will come after all preNonBlocking and
    -- default non-blocking decisions. The order matters, making the first
    -- decisions have priority over the later ones.
    --
    -- Note that these actions should not be blocking.
  , postNonBlocking  :: [MonitoringAction extraState extraActions
                                          consensusAPI extraLocalRootPeersFlags
                                          peeraddr peerconn m]
  }

An example of Cardano node monitoring actions could look like this:

cardanoNodeMonitoringActions
  :: ExtraGuardedDecisions extraState extraActions consensusAPI
                           extraLocalRootPeersFlags peeraddr peerconn m
cardanoNodeMonitoringActions = ExtraGuardedDecisions {
    preBlocking     = [ \_ psa pst -> monitorBootstrapPeersFlag   psa pst
                      , \_ psa pst -> monitorLedgerStateJudgement psa pst
                      , \_ _   pst -> waitForSystemToQuiesce          pst
                      ]
  , postBlocking    = [ \_ psa pst -> ledgerPeerSnapshotChange   pst psa
                      ]
  , preNonBlocking  = []
  , postNonBlocking = []
}

-- 'policy', 'actions' and 'jobPool' are assumed to be in scope here.
-- The '{..}' pattern requires the RecordWildCards extension.
guardedDecisions :: Time
                 -> PeerSelectionState extraState extraLocalRootPeersFlags
                                       peeraddr peerconn
                 -> Map peeraddr PeerSharing
                 -> ExtraGuardedDecisions extraState extraActions consensusAPI
                                          extraLocalRootPeersFlags peeraddr
                                          peerconn m
                 -> Guarded (STM m) (TimedDecision m peeraddr peerconn)
guardedDecisions blockedAt st inboundPeers ExtraGuardedDecisions {..} =
  -- Make sure preBlocking set is in the right place
     foldMap (\a -> a policy actions st) preBlocking
  <> Monitor.connections          actions st
  <> Monitor.jobs                 jobPool st
  <> Monitor.targetPeers          actions st
  <> Monitor.localRoots           actions st
  -- Make sure postBlocking set is in the right place
  <> foldMap (\a -> a policy actions st) postBlocking

  -- Make sure preNonBlocking set is in the right place
  <> foldMap (\a -> a policy actions st) preNonBlocking
  <> BigLedgerPeers.belowTarget   actions blockedAt        st
  <> BigLedgerPeers.aboveTarget                     policy st
  <> RootPeers.belowTarget        actions blockedAt           st
  <> KnownPeers.belowTarget       actions blockedAt
                                          inboundPeers policy st
  <> KnownPeers.aboveTarget                            policy st
  <> EstablishedPeers.belowTarget actions              policy st
  <> EstablishedPeers.aboveTarget actions              policy st
  <> ActivePeers.belowTarget      actions              policy st
  <> ActivePeers.aboveTarget      actions              policy st
  -- Make sure postNonBlocking set is in the right place
  <> foldMap (\a -> a policy actions st) postNonBlocking
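The priority rule described in the comments above (earlier decisions win over later ones) can be tried out in isolation. The following is only a toy model, not the governor's real `Guarded` type: it assumes that a decision is either ready or blocked and uses `First` from `Data.Monoid` as a left-biased stand-in. All names in it (`ToyAction`, `ExtraDecisions`, `belowTarget`, `noPeersAlarm`, `heartbeat`, `decide`) are hypothetical and exist only for illustration.

```haskell
import Data.Monoid (First (..))

-- Toy stand-in for a guarded decision: 'First' keeps the left-most decision
-- that is ready (Just) and skips blocked ones (Nothing).
type Decision = First String

-- Toy state: the number of known peers.
type ToyState = Int

-- A monitoring action inspects the state and may produce a decision.
type ToyAction = ToyState -> Decision

-- Simplified counterpart of ExtraGuardedDecisions with only two slots.
data ExtraDecisions = ExtraDecisions
  { preBlocking  :: [ToyAction]
  , postBlocking :: [ToyAction]
  }

-- A default action: ask for more peers when below a target of 5.
belowTarget :: ToyAction
belowTarget n
  | n < 5     = First (Just "default: request more peers")
  | otherwise = First Nothing

-- Hypothetical user action: ready only when there are no peers at all.
noPeersAlarm :: ToyAction
noPeersAlarm 0 = First (Just "extra: no peers, raise alarm")
noPeersAlarm _ = First Nothing

-- Hypothetical user action: always ready, so it only wins when nothing
-- placed before it is ready.
heartbeat :: ToyAction
heartbeat _ = First (Just "extra: heartbeat")

extras :: ExtraDecisions
extras = ExtraDecisions [noPeersAlarm] [heartbeat]

-- Same shape as the sketch above: extras are folded in before and after the
-- defaults, and the left-most ready decision is selected.
decide :: ExtraDecisions -> ToyState -> Maybe String
decide (ExtraDecisions pre post) st = getFirst $
     foldMap (\a -> a st) pre
  <> belowTarget st
  <> foldMap (\a -> a st) post
```

In GHCi, `decide extras 0` selects the pre-blocking alarm, `decide extras 3` falls through to the default action, and `decide extras 7` reaches the post-blocking heartbeat. With no extras and no ready default, `decide (ExtraDecisions [] []) 7` is `Nothing`, which corresponds to the governor blocking.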

The options presented above are not entirely conclusive in determining the best course of action. As the project evolves and more information is gathered, particularly regarding the stakeholders and their specific requirements, greater clarity will emerge. With this additional insight, it will become easier to assess the trade-offs and make more informed decisions about the most suitable approach to follow.

Churn Governor

The Churn Governor is responsible for rotating peers in the diffusion layer. This process is not modular and has Cardano node-specific dependencies like BlockFetchMode and ConsensusMode. Ideally, the Churn Governor should be decoupled from these specifics, and a simplified API could be offered to third-party users who just need basic churning.

Interestingly, PeerMetrics, which is a field of PeerChurnArgs, is not used in the churn logic. The peer metrics are used by simplePeerSelectionPolicy, which the peer selection governor uses for policyPickHotPeersToDemote, so they only matter for hot demotion and have nothing to do with churn. Since that field of PeerChurnArgs is never used, it can and should be refactored away as well.

Here is an abstraction that separates the Cardano-specific churn arguments from the general PeerChurnArgs:

data CardanoPeerChurnArgs m =
  CardanoPeerChurnArgs {
    cpcaModeVar          :: StrictTVar m ChurnMode
  , cpcaReadFetchMode    :: STM m FetchMode
  , cpcaPeerTargets      :: ConsensusModePeerTargets
  , cpcaReadUseBootstrap :: STM m UseBootstrapPeers
  , cpcaConsensusMode    :: ConsensusMode
  }

data PeerChurnArgs m extraArgs extraDebugState extraFlags extraPeers extraAPI peeraddr =
  PeerChurnArgs {
    pcaPeerSelectionTracer :: Tracer m (TracePeerSelection extraDebugState extraFlags extraPeers peeraddr)
  , pcaChurnTracer         :: Tracer m ChurnCounters
  , pcaDeadlineInterval    :: DiffTime
  , pcaBulkInterval        :: DiffTime
  , pcaPeerRequestTimeout  :: DiffTime
  , pcaMetrics             :: PeerMetrics m peeraddr
  , pcaRng                 :: StdGen
  , pcaPeerSelectionVar    :: StrictTVar m PeerSelectionTargets
  , pcaReadCounters        :: STM m PeerSelectionCounters
  , getLedgerStateCtx      :: LedgerPeersConsensusInterface extraAPI m
  , getLocalRootHotTarget  :: STM m HotValency
  , getExtraArgs           :: extraArgs
  }

By abstracting Cardano-specific parameters, we can provide two implementations: one tightly coupled with CardanoPeerChurnArgs and another for basic churn functionality.

-- | Promoted data type (requires DataKinds).
data ChurnType = BasicChurn | CardanoChurn

-- | GADT indexed by the churn type. PeerChurnArgs' is assumed to denote the
-- record above, parameterised by its extra arguments.
data PeerChurnArgs (churnType :: ChurnType) m consensusAPI peeraddr where
  BasicChurnArgs   :: PeerChurnArgs' m () consensusAPI peeraddr
                   -> PeerChurnArgs BasicChurn m consensusAPI peeraddr
  CardanoChurnArgs :: PeerChurnArgs' m (CardanoPeerChurnArgs m) consensusAPI peeraddr
                   -> PeerChurnArgs CardanoChurn m consensusAPI peeraddr

peerChurnGovernor :: forall churnType m consensusAPI peeraddr.
                     ( MonadDelay m
                     , Alternative (STM m)
                     , MonadTimer m
                     , MonadCatch m
                     )
                  => PeerChurnArgs churnType m consensusAPI peeraddr
                  -> m Void
-- One clause per churn type; matching on the constructor fixes 'churnType'.
peerChurnGovernor (CardanoChurnArgs (PeerChurnArgs { ... })) = ...
peerChurnGovernor (BasicChurnArgs   (PeerChurnArgs { ... })) = ...
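As a compilable illustration of the indexed-arguments idea, here is a minimal sketch with heavily simplified types. It is not the proposed API: `CommonArgs`, `CardanoArgs`, `ChurnArgs`, `describeChurn` and `bootstrapEnabled` are hypothetical stand-ins, and the interval is a plain `Int` rather than a `DiffTime`.

```haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE GADTs          #-}
{-# LANGUAGE KindSignatures #-}

-- | Used at the type level through DataKinds.
data ChurnType = BasicChurn | CardanoChurn

-- | Simplified stand-in for the common churn arguments, parameterised by
-- the extra, application-specific arguments.
data CommonArgs extra = CommonArgs
  { deadlineInterval :: Int    -- seconds, illustrative only
  , extraArgs        :: extra
  }

-- | Hypothetical stand-in for the Cardano-specific arguments.
newtype CardanoArgs = CardanoArgs { useBootstrapPeers :: Bool }

-- | The type index records which kind of extra arguments is carried.
data ChurnArgs (churnType :: ChurnType) where
  BasicChurnArgs   :: CommonArgs ()          -> ChurnArgs 'BasicChurn
  CardanoChurnArgs :: CommonArgs CardanoArgs -> ChurnArgs 'CardanoChurn

-- | Works for any churn type; each clause sees the matching extra arguments.
describeChurn :: ChurnArgs churnType -> String
describeChurn (BasicChurnArgs common) =
  "basic churn every " ++ show (deadlineInterval common) ++ "s"
describeChurn (CardanoChurnArgs common) =
  "cardano churn every " ++ show (deadlineInterval common)
    ++ "s, bootstrap peers: " ++ show (useBootstrapPeers (extraArgs common))

-- | Only accepts Cardano arguments; the BasicChurnArgs case is ruled out
-- by the type index, so a single clause is total.
bootstrapEnabled :: ChurnArgs 'CardanoChurn -> Bool
bootstrapEnabled (CardanoChurnArgs common) = useBootstrapPeers (extraArgs common)
```

The trade-off of this design is that every consumer has to pattern match on the constructor, whereas two separate governor implementations keep the types simpler at the cost of some duplication.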

A more straightforward approach is to have two implementations, a general one and a Cardano node-specific one, and to pick the right one when instantiating diffusion, as mentioned in Cardano Node Implementation Specific Details.

Peer Selection Actions

The Peer Selection Actions module provides the withPeerSelectionActions function, which starts the ledger peers and local root provider threads and creates a PeerSelectionActions record. This record, which includes the requestPublicRootPeers function along with other fields, is primarily used by the Outbound Governor. Notably, requestPublicRootPeers contains Cardano node-specific implementation details that should be abstracted so that it works with any PublicRootPeers of type extraPeers.

One key observation is that the PeerSelectionActionsArgs and PeerSelectionActions records contain overlapping information and are only used within this context, without dependencies from other components. Therefore, these can be streamlined into a single record structure.

data PeerSelectionActions extraActions extraPeers extraFlags extraAPI peeraddr peerconn m =
  PeerSelectionActions {
       readPeerSelectionTargets   :: STM m PeerSelectionTargets,

       -- | Provides the initial configuration of locally or privately known root peers.
       --
       -- Sourced from 'ArgumentsExtra' during Diffusion initialization.
       --
       readOriginalLocalRootPeers :: STM m (LocalRootPeers.Config extraFlags RelayAccessPoint),

       readLocalRootPeers         :: STM m (LocalRootPeers.Config extraFlags peeraddr),
       requestPublicRootPeers     :: LedgerPeersKind -> Int -> m (PublicRootPeers extraPeers peeraddr, DiffTime),
       peerStateActions           :: PeerStateActions peeraddr peerconn m,
       getLedgerStateCtx          :: LedgerPeersConsensusInterface extraAPI m,
       readLedgerPeerSnapshot     :: STM m (Maybe LedgerPeerSnapshot),

       -- | Extension point for third-party users to incorporate additional actions.
       --
       extraActions               :: extraActions
     }

Since requestPublicRootPeers relies on a function provided by ledgerPeerThread, the type of withPeerSelectionActions needs to change so that it takes a callback which receives that function and builds the PeerSelectionActions record. This accommodates the dependency and generalizes the implementation:

withPeerSelectionActions
  :: forall extraActions extraPeers extraFlags extraAPI peeraddr peerconn resolver exception m a.
     ( Alternative (STM m)
     , MonadAsync m
     , MonadDelay m
     , MonadThrow m
     , Ord peeraddr
     , Exception exception
     , Eq extraFlags
     )
  => Tracer m (TraceLocalRootPeers extraFlags peeraddr exception)
  -> StrictTVar m (Config extraFlags peeraddr)
  -> PeerActionsDNS peeraddr resolver exception m
  -> ( (NumberOfPeers -> LedgerPeersKind -> m (Maybe (Set peeraddr, DiffTime)))
     -> PeerSelectionActions extraActions extraPeers extraFlags extraAPI peeraddr peerconn m)
  -> WithLedgerPeersArgs extraAPI m
  -> (   (Async m Void, Async m Void)
      -> PeerSelectionActions extraActions extraPeers extraFlags extraAPI peeraddr peerconn m
      -> m a)
  -> m a
withPeerSelectionActions
  localTracer
  localRootsVar
  paDNS
  getPeerSelectionActions
  ledgerPeersArgs
  k = do
    withLedgerPeers
      paDNS
      ledgerPeersArgs
      (\getLedgerPeers lpThread -> do
          let peerSelectionActions@PeerSelectionActions
                { readOriginalLocalRootPeers
                } = getPeerSelectionActions getLedgerPeers
          withAsync
            (localRootPeersProvider
              localTracer
              paDNS
              DNS.defaultResolvConf
              readOriginalLocalRootPeers
              localRootsVar)
            (\lrppThread -> k (lpThread, lrppThread) peerSelectionActions))
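The callback shape used above can be reduced to a small, self-contained sketch. It assumes plain `IO`, lists of strings instead of peer address sets, and no threads. `GetLedgerPeers`, `Actions`, `withActions` and `mkActionsWithFallback` are hypothetical names, and this `withLedgerPeers` is a toy stand-in with a different signature from the real one.

```haskell
-- Accessor that the ledger peers component hands out once it is set up.
type GetLedgerPeers = Int -> IO [String]

-- Simplified stand-in for PeerSelectionActions.
newtype Actions = Actions { requestPublicRootPeers :: Int -> IO [String] }

-- Toy stand-in for withLedgerPeers: sets up the resource and passes the
-- accessor to the continuation.
withLedgerPeers :: [String] -> (GetLedgerPeers -> IO a) -> IO a
withLedgerPeers ledger k = k (\n -> pure (take n ledger))

-- Same shape as withPeerSelectionActions: the caller supplies a function
-- that builds the record from the accessor, plus the final continuation.
withActions :: [String]
            -> (GetLedgerPeers -> Actions)
            -> (Actions -> IO a)
            -> IO a
withActions ledger mkActions k =
  withLedgerPeers ledger (\getLedgerPeers -> k (mkActions getLedgerPeers))

-- Hypothetical third-party instantiation: prefer ledger peers and fall back
-- to a fixed list when the ledger yields nothing.
mkActionsWithFallback :: [String] -> GetLedgerPeers -> Actions
mkActionsWithFallback fallback getLedgerPeers = Actions $ \n -> do
  peers <- getLedgerPeers n
  pure (if null peers then take n fallback else peers)
```

For example, `withActions ["a", "b", "c"] (mkActionsWithFallback ["x"]) (\acts -> requestPublicRootPeers acts 2)` returns `["a", "b"]`, while the same call with an empty ledger list returns `["x"]`. The point of the pattern is that the record is constructed by user code, yet it can still close over a function that only exists after the library has started its own components.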

To further increase flexibility, this module should define general getPublicRootPeers and getPeerShare functions, enabling both Cardano and third-party applications to utilize these functionalities for their specific requirements.

getPublicRootPeers
  :: ( Monad m
     , Monoid extraPeers
     )
  => (NumberOfPeers -> m (extraPeers, DiffTime))
  -> (NumberOfPeers -> LedgerPeersKind -> m (Maybe (Set peeraddr, DiffTime)))
  -> LedgerPeersKind
  -> Int
  -> m (PublicRootPeers extraPeers peeraddr, DiffTime)
getPublicRootPeers getExtraPeers getLedgerPeers ledgerPeersKind n = do
  mbLedgerPeers <- getLedgerPeers (NumberOfPeers $ fromIntegral n) ledgerPeersKind
  case mbLedgerPeers of
    Nothing -> do
      (extraPeers, dt) <- getExtraPeers (NumberOfPeers $ fromIntegral n)
      pure (PublicRootPeers.empty extraPeers, dt)
    Just (ledgerPeers, dt) ->
      case ledgerPeersKind of
        AllLedgerPeers ->
          pure (PublicRootPeers.fromLedgerPeers ledgerPeers, dt)
        BigLedgerPeers ->
          pure (PublicRootPeers.fromBigLedgerPeers ledgerPeers, dt)
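To show how a third party might instantiate such a function, here is a simplified, self-contained version together with an example peer source. The `PublicRootPeers` record below is a reduced stand-in for the real type, the time-to-live is a plain `Int`, and `seedPeers`, `noLedger` and `someLedger` are hypothetical helpers invented for this sketch.

```haskell
import           Data.Set (Set)
import qualified Data.Set as Set

data LedgerPeersKind = AllLedgerPeers | BigLedgerPeers

-- Reduced stand-in for PublicRootPeers.
data PublicRootPeers extraPeers peeraddr = PublicRootPeers
  { ledgerPeers    :: Set peeraddr
  , bigLedgerPeers :: Set peeraddr
  , extraPeers     :: extraPeers
  } deriving (Eq, Show)

-- Ledger peers are preferred; the extra peers source is only consulted when
-- the ledger cannot provide any peers.
getPublicRootPeers
  :: (Monad m, Monoid extraPeers)
  => (Int -> m (extraPeers, Int))
  -> (Int -> LedgerPeersKind -> m (Maybe (Set peeraddr, Int)))
  -> LedgerPeersKind
  -> Int
  -> m (PublicRootPeers extraPeers peeraddr, Int)
getPublicRootPeers getExtraPeers getLedgerPeers kind n = do
  mbLedgerPeers <- getLedgerPeers n kind
  case mbLedgerPeers of
    Nothing -> do
      (extra, ttl) <- getExtraPeers n
      pure (PublicRootPeers Set.empty Set.empty extra, ttl)
    Just (peers, ttl) ->
      case kind of
        AllLedgerPeers -> pure (PublicRootPeers peers Set.empty mempty, ttl)
        BigLedgerPeers -> pure (PublicRootPeers Set.empty peers mempty, ttl)

-- Hypothetical third-party source: a fixed list of seed peers.
seedPeers :: Int -> IO (Set String, Int)
seedPeers n = pure (Set.fromList (take n ["seed-1", "seed-2", "seed-3"]), 60)

-- Hypothetical ledger sources: one that is unavailable, one that answers.
noLedger :: Int -> LedgerPeersKind -> IO (Maybe (Set String, Int))
noLedger _ _ = pure Nothing

someLedger :: Int -> LedgerPeersKind -> IO (Maybe (Set String, Int))
someLedger _ _ = pure (Just (Set.fromList ["relay-1"], 30))
```

With `noLedger`, a call such as `getPublicRootPeers seedPeers noLedger AllLedgerPeers 2` yields only the two seed peers as extra peers. With `someLedger`, the ledger peers are returned and the extra peers stay empty.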

Inbound Governor, Connection Manager, etc.

The other components that are initialized by the diffusion layer do not depend on any Cardano-specific details and can all be customized by providing the diffusion arguments at the top level, so these should stay the same.

Preliminary Steps for Decoupling and Generalizing the Diffusion Layer

To fully decouple the diffusion layer and make it reusable for third-party use cases, several important steps are necessary.

  1. Stakeholder Identification and Collaboration

    • Objective: Identify key stakeholders and gather input on design requirements. This will help in making decisions that may not be straightforward, such as defining protocol features for compatibility and extensibility.
    • Actions: Engage with teams such as Mithril to gather feedback, understand integration requirements, and identify any needed enhancements for both Cardano and third-party compatibility.
  2. Documentation and API Specification

    • Objective: Before restructuring begins, produce comprehensive documentation that describes the current diffusion layer architecture, including its dependencies, interaction points with the consensus layer, and configurable parameters.
    • Actions: Create a detailed API reference for the diffusion layer, covering its dependencies, parameter configurations, and modular extension points. This document will serve as a baseline reference for both Cardano developers and external adopters.
  3. Network Reorganization, Unit Testing and Modularity

    • Objective: Modularize the ouroboros-network to support different configurations and implementations for both Cardano-specific and third-party uses.
    • Actions:
      • Refactor the ouroboros-network to allow users to specify which parameters they want to instantiate their own diffusion with.
      • Refactor the ouroboros-network to allow users to specify which protocols they want to run. Redesign the Node-to-Client (N2C) client to retrieve the data needed for diffusion, expanding the N2C protocol where needed.
      • Run the full test suite and ensure modular test coverage in the redesigned network. Testing should guarantee that at least the Cardano node configuration is validated.
      • Work closely with the performance team to verify that all structural changes maintain high performance for the Cardano node, addressing any performance bottlenecks as they are identified.
  4. Abstracting Consensus-Related Logic

    • Objective: Define a flexible, clear interface between the consensus and diffusion layers.
    • Actions: Abstract consensus interactions (e.g., the N2C RPC protocol) into a standalone interface, minimizing tight coupling with Cardano’s infrastructure. Work with stakeholders to determine the minimal API that should be required.
  5. Mithril-Specific Protocol and Network Development

    • Objective: Implement protocols specific to the Mithril Network, to allow Mithril to run independently on the new diffusion layer.
    • Actions:
      • Implement a Mithril-specific N2C protocol, including versioning, handshake processes, and the mini-protocol for signature submission, following the designs specified in the CIP.
      • Design a Node-to-Node protocol for Mithril that supports custom validation rules and a dedicated mempool for signature submission.
      • Develop the Mithril churn metric to monitor peer behavior and activity.
      • Implement the executable, covering configuration parsing, KES key integration, and support for custom diffusion configuration options, such as peer targets and topology files.
      • Explore reusability of Cardano-node code, potentially exposing customizable libraries with APIs that are extensible to fit third-party node requirements.
  6. Performance Validation and Final Integration Testing

    • Objective: Ensure that the redesigned diffusion layer and ouroboros-network maintain stable performance and operational reliability for both Cardano-specific and third-party uses.
    • Actions: Conduct a comprehensive performance analysis in collaboration with the performance team to validate efficiency under both Cardano and generalized configurations. Once stable, perform final integration testing across the Cardano node and Mithril configurations, addressing any regressions or performance challenges to maintain high standards.