MADSci Developer Bible

Guidelines when developing for MADSci.

Managers and Clients

Functionality of the system should be broken up into Managers and Clients.
Managers should be central arbiters of their respective functionality, with Clients used by other parts of the system to interact with the Managers.
- For example, the ResourceManager is the central authority for the resources it manages, while other parts of the system use the ResourceClient to interact with the resources via the manager.
Managers should play nice with distributed systems: they should use a language- and system-agnostic communications interface that can be used over the network, such as REST HTTP APIs.
Managers should be solely responsible for interfacing with backing services such as databases.
Clients should never interact with databases directly.
Clients should be dependency-light and standardized.
Managers should be modular: a user should be able to write and use their own manager implementation in place of a default implementation, so long as it implements a compliant version of the existing manager's API.
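The split between a manager that owns its data and a client that only talks HTTP can be sketched with the Python standard library alone. This is a minimal illustration, not MADSci's actual API: the endpoints (`POST /resource`, `GET /resource/{resource_id}`), the `add_resource`/`get_resource` methods, and the `start_manager` helper are all hypothetical. An in-memory dict stands in for the manager's database, so only the manager ever touches the store.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ResourceManagerHandler(BaseHTTPRequestHandler):
    """Hypothetical manager: the sole owner of the resource store."""

    # A new handler instance is created per request, so the store lives on the class.
    # This dict stands in for the manager's database.
    store: dict = {}

    def _send_json(self, status: int, payload) -> None:
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        resource_id = self.path.rsplit("/", 1)[-1]  # "/resource/plate_1" -> "plate_1"
        if resource_id in self.store:
            self._send_json(200, self.store[resource_id])
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        resource = json.loads(self.rfile.read(length))
        self.store[resource["resource_id"]] = resource
        self._send_json(201, resource)

    def log_message(self, fmt: str, *args) -> None:
        pass  # silence per-request logging for the example


def start_manager(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Serve the manager in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), ResourceManagerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


class ResourceClient:
    """Hypothetical dependency-light client: stdlib HTTP only, no database access."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        # Ignore proxy environment variables so manager URLs are reached directly.
        self._opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def add_resource(self, resource: dict) -> dict:
        request = urllib.request.Request(
            f"{self.base_url}/resource",
            data=json.dumps(resource).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with self._opener.open(request) as response:
            return json.loads(response.read())

    def get_resource(self, resource_id: str) -> dict:
        # Raises urllib.error.HTTPError if the manager answers 404.
        with self._opener.open(f"{self.base_url}/resource/{resource_id}") as response:
            return json.loads(response.read())
```

Because the client depends only on a base URL and the HTTP contract, a user-written manager that serves the same endpoints can replace the default one without any change to the client.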

Where possible, components of a MADSci lab should be defined using Definition YAML Files.
Definition files should be named according to the form instance_name.component.yaml.
- For instance, a lab definition for the lab TestLab should be named test_lab.lab.yaml, an example PCR workflow definition should be named pcr_example.workflow.yaml, etc.
In general, the system should only make changes to definition files at the direction of the user, e.g. via an explicit CLI command or by using the Dashboard to edit a component's definition.
- If the user is making a change to the definition of some component of the system at runtime, they should be given the choice to update the definition file with that change or not in situations where they might reasonably want to make a temporary change.
  - For instance, a user might reasonably want to add a node to a workcell only temporarily (e.g. for debugging). By contrast, if they change a workcell's default resources, it's reasonable to assume they intend to update the workcell's definition, since the default resources only really matter in the definition.
- The one general exception to this rule is the automatic assignment of unique IDs. If the system generates a unique ID at runtime, the definition file should be updated to include it. This allows relationships between related items to persist across restarts.
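The naming convention and the ID exception can be sketched as two small helpers, assuming instance names may arrive in CamelCase (as with `TestLab`) and should be snake-cased. `definition_filename` and `ensure_unique_id` are hypothetical helpers, and `uuid4` stands in for whatever ID scheme MADSci actually uses; writing the updated definition back to YAML is left to the caller's YAML library.

```python
import re
import uuid


def definition_filename(instance_name: str, component: str) -> str:
    """Build an instance_name.component.yaml filename, snake-casing the instance name."""
    name = re.sub(r"[\s\-]+", "_", instance_name.strip())  # spaces/hyphens -> "_"
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)  # "PCRExample" -> "PCR_Example"
    name = re.sub(r"([a-z\d])([A-Z])", r"\1_\2", name)  # "TestLab" -> "Test_Lab"
    return f"{name.lower()}.{component.lower()}.yaml"


def ensure_unique_id(definition: dict, id_field: str) -> bool:
    """Assign a generated ID if the definition lacks one.

    Returns True when an ID was added, signalling that the caller should write
    the definition back to its file so the ID persists across restarts.
    """
    if definition.get(id_field):
        return False
    definition[id_field] = uuid.uuid4().hex  # stand-in for the real ID scheme
    return True
```

Returning a flag rather than writing the file directly keeps the "only write on user direction, except for generated IDs" decision visible at the call site.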

Components of the system that need configuration, such as Managers and Nodes, should support the following:

- A Config section in their Definition File that allows the user/maintainer to define configuration parameters, their default value if any, whether or not they're required, and whether they can be set after startup.
- CLI arguments to set the values of configuration parameters at runtime.
- Setting configuration values via their client, with a set_config method.
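A component could support all three configuration mechanisms roughly as follows. The `ConfigParam` fields mirror what the Config section is described as holding (default, required, settable after startup), but the schema and the `ConfigurableComponent` class are hypothetical; in practice the parameter list would be loaded from the component's definition file.

```python
import argparse
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ConfigParam:
    """One entry from a component's Config section (hypothetical schema)."""

    name: str
    default: Any = None
    required: bool = False
    mutable: bool = False  # may the value change after startup?


class ConfigurableComponent:
    """Hypothetical base for a Manager or Node that accepts config from CLI and client."""

    def __init__(self, params: list[ConfigParam], argv: Optional[list[str]] = None) -> None:
        self.params = {p.name: p for p in params}
        parser = argparse.ArgumentParser()
        for p in params:
            parser.add_argument(
                f"--{p.name}",
                default=p.default,
                # Naive coercion from the default's type; booleans would need special handling.
                type=type(p.default) if p.default is not None else str,
                required=p.required and p.default is None,
            )
        # CLI arguments override the defaults from the definition file.
        self.config: dict[str, Any] = vars(parser.parse_args(argv))
        self.started = False

    def start(self) -> None:
        self.started = True

    def set_config(self, name: str, value: Any) -> None:
        """Server-side handler behind a client's set_config call."""
        param = self.params[name]  # raises KeyError for unknown parameters
        if self.started and not param.mutable:
            raise PermissionError(f"'{name}' cannot be changed after startup")
        self.config[name] = value
```

Checking `mutable` inside `set_config` keeps the "can it be set after startup" rule in one place, regardless of whether the change arrives from a client or from code.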

Hard-coded absolute paths are to be avoided at all costs.
Instead, use relative paths anchored to some known fixed point, such as a configurable working directory or the current file.
- If you know the target will always be in a certain location relative to the current file, use the path relative to the current file.
- If you know the target will always be in a certain location relative to some directory, use the relative path from that directory and allow the user to configure the working directory without changes to source code (e.g. via a command line argument).
The only place where hard-coded paths are acceptable is for resources like drivers or DLLs that are automatically installed in fixed locations. In such cases:
- The hard-coded path can be used as the default value for a configurable parameter that can be overridden without changes to source code (e.g. via a command line argument).
- If the hard-coded path points to a location in a user's home directory, it should be built with Path.home() or an equivalent, to avoid fragile assumptions such as a specific user running the code.
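The three kinds of anchor can be sketched with `pathlib` and `argparse`. The `--working-dir` and `--driver-path` flags, the `VendorSDK/driver.dll` location, and the helper names are illustrative assumptions, not real MADSci options.

```python
import argparse
from pathlib import Path
from typing import Optional


def package_file(relative: str) -> Path:
    """Locate a file that always ships next to this module (anchored to the current file)."""
    return Path(__file__).resolve().parent / relative


# Hypothetical vendor install location, used only as an overridable default.
# Path.home() avoids assuming which user is running the code.
DEFAULT_DRIVER_PATH = Path.home() / "VendorSDK" / "driver.dll"


def parse_path_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--working-dir", type=Path, default=Path.cwd(),
                        help="Directory that data paths are resolved against")
    parser.add_argument("--driver-path", type=Path, default=DEFAULT_DRIVER_PATH,
                        help="Override the vendor driver location")
    return parser.parse_args(argv)


def data_file(args: argparse.Namespace, relative: str) -> Path:
    """Resolve a data path relative to the configurable working directory."""
    return (args.working_dir / relative).resolve()
```

Every location here can be changed from the command line, so moving the lab to a new machine or user account requires no source edits.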

Use the following rules of thumb when considering how to store data or state across your lab:

If the data is ephemeral or frequently changed and doesn't need to be shared between different parts of the system, it can be stored in memory as a variable.
If the data is ephemeral or frequently changed and needs to be shared, consider using a caching database like Redis to allow distributed access to the data.
If the data is ephemeral, doesn't need to be frequently changed or shared, and is large, consider using a temporary file.
If the data needs persistence, isn't changed frequently, and doesn't need to be shared, it can be stored in a file.
If the data needs persistence, isn't changed frequently, and needs to be shared, it should be stored in a relational database, such as PostgreSQL.
If the data needs persistence, isn't changed frequently, needs to be shared, and is large, consider using object/cloud storage like Globus Transfer or S3.
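The persistent, unshared case needs nothing beyond the standard library. This is a minimal sketch assuming JSON-serializable state; the helper names and the write-then-rename pattern are choices made for illustration. The Redis, PostgreSQL, and object-storage cases require their own client libraries and are not shown.

```python
import json
import os
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Persist small, unshared state to a file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = path.with_name(path.name + ".tmp")
    tmp_path.write_text(json.dumps(state))
    # Write-then-rename so a crash mid-write never leaves a truncated state file.
    os.replace(tmp_path, path)


def load_state(path: Path, default: dict) -> dict:
    """Load persisted state, falling back to a copy of the default if none exists yet."""
    if not path.exists():
        return dict(default)
    return json.loads(path.read_text())
```

For the large, ephemeral case, `tempfile.TemporaryDirectory()` provides a scratch location that is removed automatically when its `with` block exits.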