Use workflow management library #101

Open
morcuended opened this issue Feb 2, 2022 · 5 comments
Labels
question Further information is requested

Comments

@morcuended
Member

A workflow management library is needed in order to properly handle the data workflow. There are several options: (Sci)Luigi, Snakemake, Airflow, etc.

In SciLuigi, the workflow can be defined through classes with requires, run and output methods, which makes it possible to build pipelines. Besides, it integrates with the SLURM scheduler.
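For readers unfamiliar with this model, here is a minimal pure-Python sketch of the Luigi-style requires/output/run pattern. It does not use Luigi or SciLuigi, whose real APIs differ (for example, Luigi's output() returns a Target object rather than a path). The Task base class, the build() function, the task names and the file names are all hypothetical. The point it illustrates: a run-wise merge task declares the subrun tasks it requires, so it only runs once all of their outputs exist, and tasks whose output already exists are skipped.

```python
import os
import tempfile


class Task:
    """Hypothetical stand-in for a Luigi-style task (not the real Luigi API)."""

    def requires(self):
        # Upstream tasks that must be complete before this one runs
        return []

    def output(self):
        # Path of the file this task produces
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def complete(self):
        # A task counts as done once its output file exists
        return os.path.exists(self.output())


def build(task):
    """Depth-first resolution: build missing dependencies, then the task itself."""
    for dep in task.requires():
        build(dep)
    if not task.complete():
        task.run()


class SubrunJob(Task):
    # Hypothetical subrun-wise job (stands in for e.g. one R0 -> DL1 step)
    def __init__(self, workdir, run_id, subrun):
        self.workdir, self.run_id, self.subrun = workdir, run_id, subrun

    def output(self):
        return os.path.join(self.workdir, f"run{self.run_id}_subrun{self.subrun:04d}.txt")

    def run(self):
        with open(self.output(), "w") as f:
            f.write(f"subrun {self.subrun}\n")


class RunMerge(Task):
    # Hypothetical run-wise merge: only runs after every subrun output exists
    def __init__(self, workdir, run_id, n_subruns):
        self.workdir, self.run_id, self.n_subruns = workdir, run_id, n_subruns

    def requires(self):
        return [SubrunJob(self.workdir, self.run_id, s) for s in range(self.n_subruns)]

    def output(self):
        return os.path.join(self.workdir, f"run{self.run_id}_merged.txt")

    def run(self):
        with open(self.output(), "w") as merged:
            for dep in self.requires():
                with open(dep.output()) as part:
                    merged.write(part.read())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        merge = RunMerge(workdir, run_id=1, n_subruns=3)
        build(merge)
        with open(merge.output()) as f:
            print(f.read(), end="")
```

Calling build() a second time is a no-op, because every task's output already exists. This "output exists means done" rule is what lets such frameworks resume a partially failed pipeline.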

In the following flowchart, the desired data flow is defined: subrun-wise jobs are shown in orange, run-wise jobs and files in light purple. Currently, to merge files on a run basis, we have to check that all previous jobs were successfully completed. A workflow management system would handle this dependency naturally.

[Flowchart: desired data flow (mermaid-diagram-20220201212804)]
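To illustrate the manual bookkeeping that merging currently involves, here is a hypothetical helper that decides whether a run can be merged, based on the SLURM job state of each subrun job (as one might collect with sacct). The function name and the set of states treated as failures are assumptions. The state strings themselves are standard SLURM job states.

```python
from typing import Dict, List, Tuple

# SLURM job states treated here as "this subrun will never complete" (assumption)
FAILED_STATES = {"FAILED", "TIMEOUT", "CANCELLED", "OUT_OF_MEMORY", "NODE_FAIL"}


def check_subruns(states: Dict[int, str], n_subruns: int) -> Tuple[bool, List[int], List[int]]:
    """Return (ready_to_merge, unfinished_subruns, failed_subruns).

    states maps subrun index -> SLURM job state string; sacct may report
    e.g. "CANCELLED by <uid>", so only the first word is compared.
    """
    failed = sorted(s for s, st in states.items() if st.split()[0] in FAILED_STATES)
    unfinished = sorted(
        s for s in range(n_subruns)
        if states.get(s) != "COMPLETED" and s not in failed
    )
    ready = not failed and not unfinished
    return ready, unfinished, failed
```

A workflow manager makes this kind of check unnecessary, since the merge task simply declares the subrun tasks as its dependencies.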

It is also possible to use the SLURM job dependency option (which is currently used in some parts of the code).
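A minimal sketch of the SLURM-only alternative, assuming each subrun job is submitted with `sbatch --parsable` (which prints the job ID) and the run-wise merge job is submitted with `--dependency=afterok:<id>:<id>...`. With this setup, the merge job starts only after every subrun job has finished with exit code 0. The helper names and script names are hypothetical.

```python
import subprocess
from typing import List, Sequence


def sbatch_cmd(script: str, depends_on: Sequence[str] = ()) -> List[str]:
    """Build an sbatch command line; with deps, the job waits for all of them to succeed."""
    cmd = ["sbatch", "--parsable"]
    if depends_on:
        # afterok: start only if every listed job completed with exit code 0
        cmd.append("--dependency=afterok:" + ":".join(depends_on))
    cmd.append(script)
    return cmd


def submit(cmd: List[str]) -> str:
    """Submit a job and return its ID (--parsable prints 'jobid' or 'jobid;cluster')."""
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return out.strip().split(";")[0]


# Usage on a SLURM cluster (hypothetical script names):
#   subrun_ids = [submit(sbatch_cmd(f"subrun_{s:04d}.sh")) for s in range(3)]
#   merge_id = submit(sbatch_cmd("merge_run.sh", subrun_ids))
```

One caveat of this approach: if a subrun job fails, the merge job's dependency can never be satisfied. Depending on the cluster configuration (see sbatch's `--kill-on-invalid-dep`), it may then stay pending rather than being cancelled, so failures still need to be tracked separately.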

@moralejo

moralejo commented Feb 2, 2022

There are three input arrows to the "DL1 datacheck subrun wise" box. The two on the left actually correspond to a single DL1 file, the one which comes out of lstchain_dl1ab, correct? (even though the DL1a data it contains is a mere copy of the DL1a data produced in the r0_to_dl1 step). This is a bit confusing, because the output files of r0_to_dl1 are not processed by the data check. I think the sketch would be clearer if the central incoming arrow in the "DL1 datacheck subrun wise" box were removed.

@morcuended
Member Author

morcuended commented Feb 2, 2022

The two on the left actually correspond to a single DL1 file, the one which comes out of lstchain_dl1ab, correct? (even though the DL1a data it contains is a mere copy of the DL1a data produced in the r0_to_dl1 step).

You're right. I've just added the corrected sketch.
[Corrected flowchart (mermaid-diagram-20220202030913)]

@vuillaut
Member

vuillaut commented Feb 2, 2022

You may also consider the Common Workflow Language (CWL), which integrates with several tools that can use SLURM. On their page I see Arvados, Toil and StreamFlow.
Please let us know if you start working towards an actual workflow manager; I think it would be the right time to converge with lstmcpipe.

@morcuended
Member Author

@vuillaut any preference on which framework we should go for? I agree that this would be a good time for joining forces.

@vuillaut
Member

I am currently beta-testing CWL.
Maybe we could review our needs and some frameworks to make an educated choice?

@morcuended morcuended added the question Further information is requested label Mar 11, 2022

3 participants