% Template for PLoS
%DIF LATEXDIFF DIFFERENCE FILE
%DIF DEL ./original_submission/wrroc.tex Tue Jul 9 11:04:57 2024
%DIF ADD wrroc.tex Tue Jul 9 10:33:29 2024
% Version 3.6 Aug 2022
%
% % % % % % % % % % % % % % % % % % % % % %
%
% -- IMPORTANT NOTE
%
% This template contains comments intended
% to minimize problems and delays during our production
% process. Please follow the template instructions
% whenever possible.
%
% % % % % % % % % % % % % % % % % % % % % % %
%
% Once your paper is accepted for publication,
% PLEASE REMOVE ALL TRACKED CHANGES in this file
% and leave only the final text of your manuscript.
% PLOS recommends the use of latexdiff to track changes during review, as this will help to maintain a clean tex file.
% Visit https://www.ctan.org/pkg/latexdiff?lang=en for info or contact us at latex@plos.org.
%
%
% There are no restrictions on package use within the LaTeX files except that no packages listed in the template may be deleted.
%
% Please do not include colors or graphics in the text.
%
% The manuscript LaTeX source should be contained within a single file (do not use \input, \externaldocument, or similar commands).
%
% % % % % % % % % % % % % % % % % % % % % % %
%
% -- FIGURES AND TABLES
%
% Please include tables/figure captions directly after the paragraph where they are first cited in the text.
%
% DO NOT INCLUDE GRAPHICS IN YOUR MANUSCRIPT
% - Figures should be uploaded separately from your manuscript file.
% - Figures generated using LaTeX should be extracted and removed from the PDF before submission.
% - Figures containing multiple panels/subfigures must be combined into one image file before submission.
% For figure citations, please use "Fig" instead of "Figure".
% See http://journals.plos.org/plosone/s/figures for PLOS figure guidelines.
%
% Tables should be cell-based and may not contain:
% - spacing/line breaks within cells to alter layout or alignment
% - do not nest tabular environments (no tabular environments within tabular environments)
% - no graphics or colored text (cell background color/shading OK)
% See http://journals.plos.org/plosone/s/tables for table guidelines.
%
% For tables that exceed the width of the text column, use the adjustwidth environment as illustrated in the example table in text below.
%
% % % % % % % % % % % % % % % % % % % % % % % %
%
% -- EQUATIONS, MATH SYMBOLS, SUBSCRIPTS, AND SUPERSCRIPTS
%
% IMPORTANT
% Below are a few tips to help format your equations and other special characters according to our specifications. For more tips to help reduce the possibility of formatting errors during conversion, please see our LaTeX guidelines at http://journals.plos.org/plosone/s/latex
%
% For inline equations, please be sure to include all portions of an equation in the math environment. For example, x$^2$ is incorrect; this should be formatted as $x^2$ (or $\mathrm{x}^2$ if the romanized font is desired).
%
% Do not include text that is not math in the math environment. For example, CO2 should be written as CO\textsubscript{2} instead of CO$_2$.
%
% Please add line breaks to long display equations when possible in order to fit size of the column.
%
% For inline equations, please do not include punctuation (commas, etc) within the math environment unless this is part of the equation.
%
% When adding superscript or subscripts outside of brackets/braces, please group using {}. For example, change "[U(D,E,\gamma)]^2" to "{[U(D,E,\gamma)]}^2".
%
% Do not use \cal for caligraphic font. Instead, use \mathcal{}
%
% % % % % % % % % % % % % % % % % % % % % % % %
%
% Please contact latex@plos.org with any questions.
%
% % % % % % % % % % % % % % % % % % % % % % % %
\documentclass[10pt,letterpaper]{article}
\usepackage[top=0.85in,left=2.75in,footskip=0.75in]{geometry}
% amsmath and amssymb packages, useful for mathematical formulas and symbols
\usepackage{amsmath,amssymb}
% Use adjustwidth environment to exceed column width (see example table in text)
\usepackage{changepage}
% textcomp package and marvosym package for additional characters
\usepackage{textcomp,marvosym}
% cite package, to clean up citations in the main text. Do not remove.
\usepackage{cite}
%% Let's match PeerJ Style:
%\renewcommand\citepunct{; }
%\renewcommand\citeleft{(}
%\renewcommand\citeright{)}
%\renewcommand\citemid{, }
% Use nameref to cite supporting information files (see Supporting Information section for more info)
\usepackage{nameref,hyperref}
% line numbers
\usepackage[right]{lineno}
% ligatures disabled
\usepackage[nopatch=eqnum]{microtype}
\DisableLigatures[f]{encoding = *, family = * }
% color can be used to apply background shading to table cells only
\usepackage[table]{xcolor}
\hypersetup{
colorlinks,
linkcolor={red!50!black},
citecolor={blue!50!black},
urlcolor={blue!80!black}
}
% array package and thick rules for tables
\usepackage{array}
% create "+" rule type for thick vertical lines
\newcolumntype{+}{!{\vrule width 2pt}}
% create \thickcline for thick horizontal lines of variable length
\newlength\savedwidth
\newcommand\thickcline[1]{%
\noalign{\global\savedwidth\arrayrulewidth\global\arrayrulewidth 2pt}%
\cline{#1}%
\noalign{\vskip\arrayrulewidth}%
\noalign{\global\arrayrulewidth\savedwidth}%
}
% \thickhline command for thick horizontal lines that span the table
\newcommand\thickhline{\noalign{\global\savedwidth\arrayrulewidth\global\arrayrulewidth 2pt}%
\hline
\noalign{\global\arrayrulewidth\savedwidth}}
% Remove comment for double spacing
%\usepackage{setspace}
%\doublespacing
% Text layout
\raggedright
\setlength{\parindent}{0.5cm}
\textwidth 5.25in
\textheight 8.75in
% Bold the 'Figure #' in the caption and separate it from the title/caption with a period
% Captions will be left justified
\usepackage[aboveskip=1pt,labelfont=bf,labelsep=period,justification=raggedright,singlelinecheck=off]{caption}
\renewcommand{\figurename}{Fig}
% Use the PLoS provided BiBTeX style
\bibliographystyle{plos2015}
% Remove brackets from numbering in List of References
\makeatletter
\renewcommand{\@biblabel}[1]{\quad#1.}
\makeatother
%DIF 158a158
\usepackage{listings} %DIF >
%DIF -------
%DIF 159d160
%DIF <
%DIF -------
% Header and Footer with logo
\usepackage{lastpage,fancyhdr,graphicx}
\usepackage{epstopdf}
%\pagestyle{myheadings}
\pagestyle{fancy}
\fancyhf{}
%\setlength{\headheight}{27.023pt}
%\lhead{\includegraphics[width=2.0in]{PLOS-submission.eps}}
\rfoot{\thepage/\pageref{LastPage}}
\renewcommand{\headrulewidth}{0pt}
\renewcommand{\footrule}{\hrule height 2pt \vspace{2mm}}
\fancyheadoffset[L]{2.25in}
\fancyfootoffset[L]{2.25in}
\lfoot{\today}
%% Include all macros below
%DIF 177a177-181
\usepackage[inline]{enumitem} %DIF >
%DIF >
\newlist{inlineenum}{enumerate*}{1} %DIF >
\setlist[inlineenum]{label=\roman*)} %DIF >
%DIF >
%DIF -------
\newcommand{\lorem}{{\bf LOREM}}
\newcommand{\ipsum}{{\bf IPSUM}}
%DIF 179a184-190
%DIF >
%DIF >
% Macros to insert prefixed terms as hyperlinks %DIF >
\newcommand{\termsorg}[1]{\href{https://schema.org/#1}{\color{black}{\emph{s:#1}}}} %DIF >
\newcommand{\termbioschemas}[1]{\href{https://bioschemas.org/#1}{\color{black}{\emph{bioschemas:#1}}}} %DIF >
\newcommand{\termbsp}[1]{\href{https://bioschemas.org/properties/#1}{\color{black}{\emph{bsp:#1}}}} %DIF >
\newcommand{\termwfrun}[1]{\href{https://w3id.org/ro/terms/workflow-run\##1}{\color{black}{\emph{wfrun:#1}}}} %DIF >
%DIF -------
%% END MACROS SECTION
%DIF PREAMBLE EXTENSION ADDED BY LATEXDIFF
%DIF UNDERLINE PREAMBLE %DIF PREAMBLE
\RequirePackage[normalem]{ulem} %DIF PREAMBLE
\RequirePackage{color}\definecolor{RED}{rgb}{1,0,0}\definecolor{BLUE}{rgb}{0,0,1} %DIF PREAMBLE
\providecommand{\DIFaddtex}[1]{{\protect\color{blue}\uwave{#1}}} %DIF PREAMBLE
\providecommand{\DIFdeltex}[1]{{\protect\color{red}\sout{#1}}} %DIF PREAMBLE
%DIF SAFE PREAMBLE %DIF PREAMBLE
\providecommand{\DIFaddbegin}{} %DIF PREAMBLE
\providecommand{\DIFaddend}{} %DIF PREAMBLE
\providecommand{\DIFdelbegin}{} %DIF PREAMBLE
\providecommand{\DIFdelend}{} %DIF PREAMBLE
\providecommand{\DIFmodbegin}{} %DIF PREAMBLE
\providecommand{\DIFmodend}{} %DIF PREAMBLE
%DIF FLOATSAFE PREAMBLE %DIF PREAMBLE
\providecommand{\DIFaddFL}[1]{\DIFadd{#1}} %DIF PREAMBLE
\providecommand{\DIFdelFL}[1]{\DIFdel{#1}} %DIF PREAMBLE
\providecommand{\DIFaddbeginFL}{} %DIF PREAMBLE
\providecommand{\DIFaddendFL}{} %DIF PREAMBLE
\providecommand{\DIFdelbeginFL}{} %DIF PREAMBLE
\providecommand{\DIFdelendFL}{} %DIF PREAMBLE
%DIF HYPERREF PREAMBLE %DIF PREAMBLE
\providecommand{\DIFadd}[1]{\texorpdfstring{\DIFaddtex{#1}}{#1}} %DIF PREAMBLE
\providecommand{\DIFdel}[1]{\texorpdfstring{\DIFdeltex{#1}}{}} %DIF PREAMBLE
\newcommand{\DIFscaledelfig}{0.5}
%DIF HIGHLIGHTGRAPHICS PREAMBLE %DIF PREAMBLE
\RequirePackage{settobox} %DIF PREAMBLE
\RequirePackage{letltxmacro} %DIF PREAMBLE
\newsavebox{\DIFdelgraphicsbox} %DIF PREAMBLE
\newlength{\DIFdelgraphicswidth} %DIF PREAMBLE
\newlength{\DIFdelgraphicsheight} %DIF PREAMBLE
% store original definition of \includegraphics %DIF PREAMBLE
\LetLtxMacro{\DIFOincludegraphics}{\includegraphics} %DIF PREAMBLE
\newcommand{\DIFaddincludegraphics}[2][]{{\color{blue}\fbox{\DIFOincludegraphics[#1]{#2}}}} %DIF PREAMBLE
\newcommand{\DIFdelincludegraphics}[2][]{% %DIF PREAMBLE
\sbox{\DIFdelgraphicsbox}{\DIFOincludegraphics[#1]{#2}}% %DIF PREAMBLE
\settoboxwidth{\DIFdelgraphicswidth}{\DIFdelgraphicsbox} %DIF PREAMBLE
\settoboxtotalheight{\DIFdelgraphicsheight}{\DIFdelgraphicsbox} %DIF PREAMBLE
\scalebox{\DIFscaledelfig}{% %DIF PREAMBLE
\parbox[b]{\DIFdelgraphicswidth}{\usebox{\DIFdelgraphicsbox}\\[-\baselineskip] \rule{\DIFdelgraphicswidth}{0em}}\llap{\resizebox{\DIFdelgraphicswidth}{\DIFdelgraphicsheight}{% %DIF PREAMBLE
\setlength{\unitlength}{\DIFdelgraphicswidth}% %DIF PREAMBLE
\begin{picture}(1,1)% %DIF PREAMBLE
\thicklines\linethickness{2pt} %DIF PREAMBLE
{\color[rgb]{1,0,0}\put(0,0){\framebox(1,1){}}}% %DIF PREAMBLE
{\color[rgb]{1,0,0}\put(0,0){\line( 1,1){1}}}% %DIF PREAMBLE
{\color[rgb]{1,0,0}\put(0,1){\line(1,-1){1}}}% %DIF PREAMBLE
\end{picture}% %DIF PREAMBLE
}\hspace*{3pt}}} %DIF PREAMBLE
} %DIF PREAMBLE
\LetLtxMacro{\DIFOaddbegin}{\DIFaddbegin} %DIF PREAMBLE
\LetLtxMacro{\DIFOaddend}{\DIFaddend} %DIF PREAMBLE
\LetLtxMacro{\DIFOdelbegin}{\DIFdelbegin} %DIF PREAMBLE
\LetLtxMacro{\DIFOdelend}{\DIFdelend} %DIF PREAMBLE
\DeclareRobustCommand{\DIFaddbegin}{\DIFOaddbegin \let\includegraphics\DIFaddincludegraphics} %DIF PREAMBLE
\DeclareRobustCommand{\DIFaddend}{\DIFOaddend \let\includegraphics\DIFOincludegraphics} %DIF PREAMBLE
\DeclareRobustCommand{\DIFdelbegin}{\DIFOdelbegin \let\includegraphics\DIFdelincludegraphics} %DIF PREAMBLE
\DeclareRobustCommand{\DIFdelend}{\DIFOaddend \let\includegraphics\DIFOincludegraphics} %DIF PREAMBLE
\LetLtxMacro{\DIFOaddbeginFL}{\DIFaddbeginFL} %DIF PREAMBLE
\LetLtxMacro{\DIFOaddendFL}{\DIFaddendFL} %DIF PREAMBLE
\LetLtxMacro{\DIFOdelbeginFL}{\DIFdelbeginFL} %DIF PREAMBLE
\LetLtxMacro{\DIFOdelendFL}{\DIFdelendFL} %DIF PREAMBLE
\DeclareRobustCommand{\DIFaddbeginFL}{\DIFOaddbeginFL \let\includegraphics\DIFaddincludegraphics} %DIF PREAMBLE
\DeclareRobustCommand{\DIFaddendFL}{\DIFOaddendFL \let\includegraphics\DIFOincludegraphics} %DIF PREAMBLE
\DeclareRobustCommand{\DIFdelbeginFL}{\DIFOdelbeginFL \let\includegraphics\DIFdelincludegraphics} %DIF PREAMBLE
\DeclareRobustCommand{\DIFdelendFL}{\DIFOaddendFL \let\includegraphics\DIFOincludegraphics} %DIF PREAMBLE
%DIF COLORLISTINGS PREAMBLE %DIF PREAMBLE
\RequirePackage{listings} %DIF PREAMBLE
\RequirePackage{color} %DIF PREAMBLE
\lstdefinelanguage{DIFcode}{ %DIF PREAMBLE
%DIF DIFCODE_UNDERLINE %DIF PREAMBLE
moredelim=[il][\color{red}\sout]{\%DIF\ <\ }, %DIF PREAMBLE
moredelim=[il][\color{blue}\uwave]{\%DIF\ >\ } %DIF PREAMBLE
} %DIF PREAMBLE
\lstdefinestyle{DIFverbatimstyle}{ %DIF PREAMBLE
language=DIFcode, %DIF PREAMBLE
basicstyle=\ttfamily, %DIF PREAMBLE
columns=fullflexible, %DIF PREAMBLE
keepspaces=true %DIF PREAMBLE
} %DIF PREAMBLE
\lstnewenvironment{DIFverbatim}{\lstset{style=DIFverbatimstyle}}{} %DIF PREAMBLE
\lstnewenvironment{DIFverbatim*}{\lstset{style=DIFverbatimstyle,showspaces=true}}{} %DIF PREAMBLE
%DIF END PREAMBLE EXTENSION ADDED BY LATEXDIFF
\begin{document}
\vspace*{0.2in}
% Title must be 250 characters or less.
\begin{flushleft}
{\Large
\textbf\newline{Recording provenance of workflow runs with RO-Crate} % Please use "sentence case" for title and headings (capitalize only the first word in a title (or heading), the first word in a subtitle (or subheading), and any proper nouns).
}
\newline
% Insert author names, affiliations and corresponding author email (do not include titles, positions, or degrees).
\\
Simone Leo\textsuperscript{1*},
Michael R. Crusoe\textsuperscript{2,3,4},
Laura Rodríguez-Navas\textsuperscript{5},
Raül Sirvent\textsuperscript{5},
Alexander Kanitz\textsuperscript{6,7},
Paul De Geest\textsuperscript{8},
Rudolf Wittner\textsuperscript{9,10,11},
Luca Pireddu\textsuperscript{1},
Daniel Garijo\textsuperscript{12},
José M. Fernández\textsuperscript{5},
Iacopo Colonnelli\textsuperscript{13},
Matej Gallo\textsuperscript{9},
Tazro Ohta\textsuperscript{14,15},
Hirotaka Suetake\textsuperscript{16},
Salvador Capella-Gutierrez\textsuperscript{5},
Renske de Wit\textsuperscript{2},
Bruno P. Kinoshita\textsuperscript{5},
Stian Soiland-Reyes\textsuperscript{17,18}
\\
\bigskip
\textbf{1} Center for Advanced Studies, Research, and Development in Sardinia (CRS4), Pula (CA), Italy
\\
\textbf{2} Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
\\
\textbf{3} DTL Projects, The Netherlands
\\
\textbf{4} Forschungszentrum Jülich, Germany
\\
\textbf{5} Barcelona Supercomputing Center, Barcelona, Spain
\\
\textbf{6} Biozentrum, University of Basel, Basel, Switzerland
\\
\textbf{7} Swiss Institute of Bioinformatics, Lausanne, Switzerland
\\
\textbf{8} VIB Data Core, Gent, Belgium
\\
\textbf{9} Faculty of Informatics, Masaryk University, Brno, Czech Republic
\\
\textbf{10} Institute of Computer Science, Masaryk University, Brno, Czech Republic
\\
\textbf{11} BBMRI-ERIC, Graz, Austria
\\
\textbf{12} Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
\\
\textbf{13} Computer Science \DIFdelbegin \DIFdel{Dept.}\DIFdelend \DIFaddbegin \DIFadd{Department}\DIFaddend , Università degli Studi di Torino, Torino, Italy
\\
\textbf{14} Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Shizuoka, Japan
\\
\textbf{15} Institute for Advanced Academic Research, Chiba University, Chiba, Japan
\\
\textbf{16} Sator, \DIFdelbegin \DIFdel{Inc.}\DIFdelend \DIFaddbegin \DIFadd{Incorporated}\DIFaddend , Tokyo, Japan
\\
\textbf{17} Department of Computer Science, The University of Manchester, Manchester, United Kingdom
\\
\textbf{18} Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
\\
\bigskip
% Insert additional author notes using the symbols described below. Insert symbol callouts after author names as necessary.
%
% Remove or comment out the author notes below if they aren't used.
%
% Primary Equal Contribution Note
%\Yinyang These authors contributed equally to this work.
% Additional Equal Contribution Note
% Also use this double-dagger symbol for special authorship notes, such as senior authorship.
%\ddag These authors also contributed equally to this work.
% Current address notes
%\textcurrency Current Address: Dept/Program/Center, Institution Name, City, State, Country % change symbol to "\textcurrency a" if more than one current address note
% \textcurrency b Insert second current address
% \textcurrency c Insert third current address
% Deceased author note
%\dag Deceased
% Group/Consortium Author Note
%\textpilcrow Membership list can be found in the Acknowledgments section.
% Use the asterisk to denote corresponding authorship and provide email address in note below.
* simone.leo@crs4.it \DIFaddbegin \DIFadd{(SL)
}\DIFaddend
\end{flushleft}
% Please keep the abstract below 300 words
\section*{Abstract}
Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products.
Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing.
However, existing approaches tend to lack interoperable adoption across workflow management systems.
In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated \DIFdelbegin \DIFdel{products }\DIFdelend \DIFaddbegin \DIFadd{objects }\DIFaddend (inputs, outputs, code, etc.).
The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects.
Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems.
We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems.
Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain.
% Disable for preprint
\linenumbers
% \emph{The below is a snapshot as of ``Overleaf 2023-12-07'' from
% \url{https://docs.google.com/document/d/1rq22Vu_lmmRLkmnZivsKVdRidq4aoePs-l20gHFYpu0/edit}}
\section{Introduction}\label{introduction}
A crucial part of scientific research is recording the provenance of its outputs.
The W3C PROV standard defines provenance as ``a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing''\DIFaddbegin \DIFadd{~}\DIFaddend \cite{Moreau 2013}.
Provenance is instrumental to activities such as traceability, reproducibility,
accountability, and quality assessment\DIFaddbegin \DIFadd{~}\DIFaddend \cite{Herschel 2017}.
The constantly growing size and complexity of scientific datasets and the analysis that is required to extract useful information from them have made science increasingly dependent on advanced automated processing techniques in order to get from experimental data to final results~\cite{Himanen 2019, Gauthier 2019, Huntingford 2019}.
Consequently, a large part of the provenance information for scientific outputs consists of descriptions of complex computer-aided data processing steps. This data processing is often expressed as workflows \DIFdelbegin \DIFdel{, }\DIFdelend \DIFaddbegin \DIFadd{-- }\DIFaddend i.e., high-level applications that coordinate multiple tools and manage intermediate outputs in order to produce the final results.
In order to homogenise the collection and interchange of provenance records, the W3C consortium proposed \DIFdelbegin \DIFdel{the }\DIFdelend \DIFaddbegin \DIFadd{a standard for representing provenance in the Web (PROV~\mbox{%DIFAUXCMD
\cite{Moreau 2013}}\hskip0pt%DIFAUXCMD
), along with the PROV ontology (}\DIFaddend PROV-O\DIFdelbegin \DIFdel{standard}\DIFdelend \DIFaddbegin \DIFadd{)}\DIFaddend ~\cite{Lebo 2013}, an OWL\DIFaddbegin \DIFadd{~}\DIFaddend \cite{W3C OWL Working Group 2012} representation of PROV\DIFdelbegin \DIFdel{for provenance in the Web. }\DIFdelend \DIFaddbegin \DIFadd{. %DIF > , an representation of PROV for provenance in the Web.
}\DIFaddend PROV-O has been widely extended for workflows (\DIFaddbegin \DIFadd{e.g., }\DIFaddend D-PROV\DIFaddbegin \DIFadd{~}\DIFaddend \cite{Missier 2013}, ProvONE\DIFaddbegin \DIFadd{~}\DIFaddend \cite{Cuevas-Vicenttin 2016}, OPMW\DIFdelbegin \DIFdel{\mbox{%DIFAUXCMD
\cite{Garijo 2011}}\hskip0pt%DIFAUXCMD
}\DIFdelend \DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{Garijo 2011} }\hskip0pt%DIFAUXCMD
(Open Provenance Model for Workflows)}\DIFaddend , P-PLAN\DIFaddbegin \DIFadd{~}\DIFaddend \cite{Garijo 2012}), where provenance information is collected in two main forms: prospective and retrospective~\cite{Freire 2008}. \emph{Prospective provenance} -- the execution plan -- is essentially the workflow itself: it includes a machine-readable specification with the processing steps to be performed and the data and software dependencies to carry out each computation.
\emph{Retrospective provenance} refers to what actually happened during an execution \DIFdelbegin \DIFdel{, }\DIFdelend \DIFaddbegin \DIFadd{-- }\DIFaddend i.e.~what were the values of the input parameters, which outputs were produced, which tools were executed, how much time did the execution take, whether the execution was successful or not, etc.
Retrospective provenance \DIFdelbegin \DIFdel{can also }\DIFdelend \DIFaddbegin \DIFadd{may }\DIFaddend be represented at different levels of abstraction\DIFdelbegin \DIFdel{depending on available computing resources: for instance, by the workflow execution becoming a single activity which produces results,
by specifying the }\DIFdelend \DIFaddbegin \DIFadd{, depending on the information that is available and/or required: a workflow execution may be interpreted
}\begin{inlineenum}
\item \DIFadd{as a single end-to-end activity,
}\item \DIFadd{as a set of }\DIFaddend individual executions of \DIFdelbegin \DIFdel{each workflow step, or
}\DIFdelend \DIFaddbegin \DIFadd{workflow steps, or
}\item \DIFaddend by going a step further and indicating how each step is divided into sub-processes when a workflow is deployed in a cluster.
\DIFdelbegin %DIFDELCMD <
%DIFDELCMD < %%%
\DIFdel{Different workflow systems have adopted and extended PROV (}\DIFdelend \DIFaddbegin \end{inlineenum}
\DIFadd{Various workflow management systems, such as WINGS~\mbox{%DIFAUXCMD
\cite{Gil 2011} }\hskip0pt%DIFAUXCMD
(Workflow INstance Generation and Specialization) and VisTrails~\mbox{%DIFAUXCMD
\cite{Scheidegger 2008,Costa 2013}}\hskip0pt%DIFAUXCMD
, have adopted PROV }\DIFaddend and its PROV-O representation \DIFdelbegin \DIFdel{) to the workflow domain (WINGS \mbox{%DIFAUXCMD
\cite{Gil 2011, Garijo 2014}}\hskip0pt%DIFAUXCMD
, VisTrails \mbox{%DIFAUXCMD
\cite{Scheidegger 2008,Costa 2013}}\hskip0pt%DIFAUXCMD
), in order to ease the }\DIFdelend \DIFaddbegin \DIFadd{to lift the }\DIFaddend burden of provenance collection from tool \DIFdelbegin \DIFdel{developers to workflow management systems (WMS) }\DIFdelend \DIFaddbegin \DIFadd{users and developers~}\DIFaddend \cite{Atkinson 2017,Perez 2018}.
D-PROV, ProvONE, \DIFdelbegin \DIFdel{OPMW-PROV, P-Plan }\DIFdelend \DIFaddbegin \DIFadd{OPMW, P-PLAN }\DIFaddend propose representations of workflow plans and their respective executions, taking into account the features of the workflow systems implementing them (e.g., hierarchical representations, sub-processes, etc.).
Other data models\DIFdelbegin \DIFdel{like }\DIFdelend \DIFaddbegin \DIFadd{, such as }\DIFaddend \emph{wfprov} and \emph{wfdesc}\DIFdelbegin \DIFdel{\mbox{%DIFAUXCMD
\cite{Belhajjame 2015} }\hskip0pt%DIFAUXCMD
}\DIFdelend \DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{Belhajjame 2015}}\hskip0pt%DIFAUXCMD
, }\DIFaddend go a step further by considering not only the link between plans and executions, but \DIFaddbegin \DIFadd{also }\DIFaddend how to package the various artefacts as a Research Object (RO)~\cite{Bechhofer 2013} \DIFdelbegin \DIFdel{in order to ease portability while keeping }\DIFdelend \DIFaddbegin \DIFadd{to improve metadata interoperability and document }\DIFaddend the context of a digital experiment.
However, while these models address some workflow provenance representation issues, they have two main limitations: \DIFdelbegin \DIFdel{firstly}\DIFdelend \DIFaddbegin \DIFadd{first}\DIFaddend , the extensions of PROV are not directly interoperable because of differences in \DIFdelbegin \DIFdel{granularity }\DIFdelend \DIFaddbegin \DIFadd{their granularities }\DIFaddend or different assumptions in their workflow representations; \DIFdelbegin \DIFdel{secondly}\DIFdelend \DIFaddbegin \DIFadd{second}\DIFaddend , their support from \DIFdelbegin \DIFdel{WMS }\DIFdelend \DIFaddbegin \DIFadd{Workflow Management Systems (WMS) }\DIFaddend is typically one system per model. An early approach to unify and integrate workflow provenance traces across \DIFdelbegin \DIFdel{WMS was WEST (}\DIFdelend \DIFaddbegin \DIFadd{WMSs was the }\DIFaddend Workflow Ecosystems through STandards \DIFdelbegin \DIFdel{) \mbox{%DIFAUXCMD
\cite{Garijo 2014}}\hskip0pt%DIFAUXCMD
, through the use of WINGS \mbox{%DIFAUXCMD
\cite{Gil 2011} }\hskip0pt%DIFAUXCMD
}\DIFdelend \DIFaddbegin \DIFadd{(WEST)~\mbox{%DIFAUXCMD
\cite{Garijo 2014}}\hskip0pt%DIFAUXCMD
, which used WINGS }\DIFaddend to build workflow templates and different converters. In all of these workflow provenance models, the emphasis is on the workflow execution structure as a directed graph, with only partial references for the data items.
The REPRODUCE-ME ontology\DIFaddbegin \DIFadd{~}\DIFaddend \cite{Samuel 2022} extended PROV and \DIFdelbegin \DIFdel{P-Plan }\DIFdelend \DIFaddbegin \DIFadd{P-PLAN }\DIFaddend to explain the overall scientific process with the experimental context including real life objects (e.g. instruments, specimens) and human activities (e.g. lab protocols, screening), demonstrating provenance of individual Jupyter Notebook cells\DIFdelbegin \DIFdel{(}%DIFDELCMD < \url{https://sheeba-samuel.github.io/REPRODUCE-ME/research/provbook.html}%%%
\DIFdel{) }\DIFdelend \DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{Samuel 2018} }\hskip0pt%DIFAUXCMD
}\DIFaddend and highlighting the need for provenance also where there is no workflow management system.
More recently, interoperability \DIFdelbegin \DIFdel{have }\DIFdelend \DIFaddbegin \DIFadd{has }\DIFaddend been partially addressed by Common \DIFdelbegin \DIFdel{Worlflow }\DIFdelend \DIFaddbegin \DIFadd{Workflow }\DIFaddend Language Prov (CWLProv)~\cite{Khan 2019}, which represents workflow enactments as \DIFdelbegin \DIFdel{ROs }\DIFdelend \DIFaddbegin \DIFadd{research objects }\DIFaddend serialised according to the Big Data Bag \DIFdelbegin \DIFdel{(BDBag) }\DIFdelend approach~\cite{Chard 2016}.
The resulting format is a folder containing several data and metadata files~\cite{Soiland-Reyes 2018}, expanding on the \DIFdelbegin \DIFdel{RO }\DIFdelend \DIFaddbegin \DIFadd{Research Object }\DIFaddend Bundle approach of Taverna\DIFaddbegin \DIFadd{~}\DIFaddend \cite{Soiland-Reyes 2016}.
CWLProv also extends PROV with a representation of executed processes (activities), their inputs and outputs (entities) and their executors (agents), together with their Common Workflow Language \DIFdelbegin \DIFdel{specification
}\DIFdelend \DIFaddbegin \DIFadd{(CWL) specification~}\DIFaddend \cite{Crusoe 2022} -- a standard workflow specification adopted by at least a dozen different workflow systems\DIFdelbegin \DIFdel{(}%DIFDELCMD < \url{https://www.commonwl.org/implementations/}%%%
\DIFdel{)}\DIFdelend \DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{cwl-implementations}}\hskip0pt%DIFAUXCMD
}\DIFaddend . Although CWLProv includes prospective provenance as a \emph{plan}
within PROV (based on the \emph{wfdesc} model), in practice its implementation does not include tool definitions or file formats\DIFdelbegin \DIFdel{, as proposed by the wfdesc extension Roterms (}%DIFDELCMD < \url{https://wf4ever.github.io/ro/2016-01-28/roterms}%%%
\DIFdel{).In order }\DIFdelend \DIFaddbegin \DIFadd{.%DIF > , as proposed by the wfdesc extension Roterms~\cite{Soiland-Reyes 2015}.
Thus, }\DIFaddend for CWLProv consumers to reconstruct the full prospective provenance for understanding the workflow, they would also need to inspect the separate workflow definition in the native language of the \DIFdelbegin \DIFdel{WMS}\DIFdelend \DIFaddbegin \DIFadd{workflow management system}\DIFaddend .
Additionally, the CWLProv RO may include several other metadata files and PROV serialisations conforming to different formats, complicating its generation and consumption.
As for granularity, CWLProv \DIFdelbegin \DIFdel{proposed }\DIFdelend \DIFaddbegin \DIFadd{proposes }\DIFaddend multiple levels of provenance\DIFdelbegin \DIFdel{\mbox{%DIFAUXCMD
\cite[figure 2]{Khan 2019}}\hskip0pt%DIFAUXCMD
}\DIFdelend \DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite[Figure 2]{Khan 2019}}\hskip0pt%DIFAUXCMD
}\DIFaddend , from Level 0 (capturing workflow definition) to Level 3 (domain-specific annotations).
In practice, the CWL reference implementation \emph{cwltool}\DIFaddbegin \DIFadd{~}\DIFaddend \cite{Amstutz 2023} and the corresponding CWLProv specification\DIFdelbegin \DIFdel{\mbox{%DIFAUXCMD
\cite{Soiland-Reyes 2018} }\hskip0pt%DIFAUXCMD
records }\DIFdelend \DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{Soiland-Reyes 2018} }\hskip0pt%DIFAUXCMD
record }\DIFaddend provenance details of all task executions together with the intermediate data and any nested workflows (CWLProv level 2)\DIFdelbegin \DIFdel{, a granularity level that }\DIFdelend \DIFaddbegin \DIFadd{. This level of granularity }\DIFaddend requires substantial support from the \DIFdelbegin \DIFdel{WMS.
This approach is }\DIFdelend \DIFaddbegin \DIFadd{workflow management system implementing the CWL specification, and is }\DIFaddend appropriate for workflow languages where the execution plan, including its distribution among the various tasks, is well known in advance\DIFdelbegin \DIFdel{(such as CWL)}\DIFdelend .
However, it can be at odds with other systems where the execution is more dynamic, depending on the verification of specific runtime conditions, such as the size and distribution of the data (e.g., COMPSs~\cite{Lordan 2014}).
This \DIFaddbegin \DIFadd{design }\DIFaddend makes the implementation of CWLProv challenging, \DIFdelbegin \DIFdel{as shown by the fact that }\DIFdelend \DIFaddbegin \DIFadd{which the authors suspect may be one of the main causes for the low adoption of CWLProv (}\DIFaddend at the time of writing the format is supported only by cwltool\DIFaddbegin \DIFadd{)}\DIFaddend .
Finally, being based on the PROV model, CWLProv is highly focused on the interaction between agents, processes and related entities, while support for contextual metadata (such as workflow authors, licence or creation date) in the Research Object Bundle is limited\DIFdelbegin \DIFdel{(}%DIFDELCMD < \url{https://w3id.org/bundle/context}%%%
\DIFdel{) }\DIFdelend \DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{rob-context} }\hskip0pt%DIFAUXCMD
}\DIFaddend and stored in a separate manifest file, \DIFdelbegin \DIFdel{that }\DIFdelend \DIFaddbegin \DIFadd{which }\DIFaddend includes the data identifier mapping to filenames.
A project that uses serialised \DIFdelbegin \DIFdel{ROs }\DIFdelend \DIFaddbegin \DIFadd{Research Objects }\DIFaddend similar to those used by CWLProv is Whole Tale\DIFaddbegin \DIFadd{~}\DIFaddend \cite{Chard 2019}, a web platform with a focus on the narrative around scientific studies and their reproducibility, where the serialised ROs are used to export data and metadata from the platform. In contrast, our work is primarily focused on the ability to capture the provenance of computational workflow execution including its data and executable workflow definitions.
RO-Crate~\cite{Soiland-Reyes 2022a} is \DIFdelbegin \DIFdel{a recent approach to }\DIFdelend \DIFaddbegin \DIFadd{an approach for }\DIFaddend packaging research data together with their metadata \DIFdelbegin \DIFdel{; it }\DIFdelend \DIFaddbegin \DIFadd{and associated resources. RO-Crate }\DIFaddend extends Schema.org~\cite{Guha 2015}, a popular vocabulary for describing resources on the Web.
In its simplest form, an RO-Crate is a directory structure that contains a single JSON-LD\DIFaddbegin \DIFadd{~}\DIFaddend \cite{w3-json-ld} metadata file at the top level.
The metadata file describes all entities stored in the RO-Crate along with their relationships\DIFdelbegin \DIFdel{; }\DIFdelend \DIFaddbegin \DIFadd{, and }\DIFaddend it is both machine-readable and human-readable.
RO-Crate is general enough to describe any dataset, but can also be made as specific as needed through the use of extensions called \emph{profiles}. \DIFdelbegin \DIFdel{At the same time, the }\DIFdelend \DIFaddbegin \DIFadd{Profiles describe ``a set of conventions, types and properties that one minimally can require and expect to be present in that subset of RO-Crates''~\mbox{%DIFAUXCMD
\cite{profiles-ro-crate}}\hskip0pt%DIFAUXCMD
.
The }\DIFaddend broad set of types and properties from Schema.org, complemented by a few additional terms from other vocabularies, makes the RO-Crate model \DIFdelbegin \DIFdel{capable of }\DIFdelend \DIFaddbegin \DIFadd{a candidate for }\DIFaddend expressing a wide range of contextual information that complements and enriches the core information specified by the profile.
This \DIFaddbegin \DIFadd{information }\DIFaddend may include, among others, the workflow authors and their affiliations, associated publications, licensing information and related software.
This \DIFdelbegin \DIFdel{is an approach }\DIFdelend \DIFaddbegin \DIFadd{approach is }\DIFaddend used by WorkflowHub\DIFaddbegin \DIFadd{~}\DIFaddend \cite{Goble 2021}, a \DIFdelbegin \DIFdel{workflow system agnostic workflow }\DIFdelend \DIFaddbegin \DIFadd{workflow-system-agnostic workflow }\DIFaddend registry which specifies a Workflow RO-Crate profile\DIFaddbegin \DIFadd{~}\DIFaddend \cite{Bacall 2022} to gather the workflow definition with such metadata in an archived RO-Crate.
In this work, we present \textbf{Workflow Run RO-Crate} (WRROC), an extension of RO-Crate for representing computational workflow execution provenance.
Our main contributions \DIFdelbegin \DIFdel{are the following:
}%DIFDELCMD <
%DIFDELCMD < %%%
\DIFdelend \DIFaddbegin \DIFadd{include:
%DIF >
}\DIFaddend \begin{itemize}
\item \DIFdelbegin \DIFdel{A }\DIFdelend \DIFaddbegin \DIFadd{a }\DIFaddend collection of RO-Crate profiles to represent and package both the prospective and the retrospective provenance of a computational workflow run in a way that is machine-actionable\DIFdelbegin \DIFdel{\mbox{%DIFAUXCMD
\cite{Batista 2022}}\hskip0pt%DIFAUXCMD
, independent }\DIFdelend \DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{Batista 2022}}\hskip0pt%DIFAUXCMD
, independently }\DIFaddend of the specific workflow language or execution system, and including support for \DIFdelbegin \DIFdel{reexecution.
}\DIFdelend \DIFaddbegin \DIFadd{re-execution;
}\DIFaddend \item \DIFdelbegin \DIFdel{Implementations of the }\DIFdelend \DIFaddbegin \DIFadd{implementations of this new }\DIFaddend model in six workflow management systems and \DIFaddbegin \DIFadd{in }\DIFaddend one conversion tool\DIFaddbegin \DIFadd{;
}\DIFaddend \item \DIFdelbegin \DIFdel{A }\DIFdelend \DIFaddbegin \DIFadd{a }\DIFaddend mapping of our profiles against the W3C PROV-O Standard using the Simple Knowledge Organisation System (SKOS)\DIFdelbegin \DIFdel{\mbox{%DIFAUXCMD
\cite{Isaac 2009}
}\hskip0pt%DIFAUXCMD
}\DIFdelend \DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{Isaac 2009}}\hskip0pt%DIFAUXCMD
.
}\DIFaddend \end{itemize}
To foster usability, the profiles are characterised by different levels of detail, and the set of mandatory metadata items is kept to a minimum in order to ease the implementation.
This flexible approach increases the model's adaptability to the diverse landscape of \DIFdelbegin \DIFdel{WMS }\DIFdelend \DIFaddbegin \DIFadd{WMSs }\DIFaddend used in practice.
The base profile, in particular, is applicable to any kind of computational process, not necessarily described in a formal workflow language.
All profiles are supported and sustained by the Workflow Run RO-Crate community, which meets regularly to discuss extensions, issues and new implementations.
The rest of this work is organised as follows: we first describe the Workflow Run RO-Crate profiles \DIFaddbegin \DIFadd{in Section~\ref{the-workflow-run-ro-crate-profiles}}\DIFaddend ; we then illustrate implementations \DIFaddbegin \DIFadd{in Section~\ref{implementations} }\DIFaddend and usage examples \DIFdelbegin \DIFdel{; this is followed by a discussion and }\DIFdelend \DIFaddbegin \DIFadd{in Section~\ref{exemplary-use-cases}; finally, we include a discussion in Section~\ref{discussion} and we conclude the paper with our }\DIFaddend plans for future work \DIFaddbegin \DIFadd{in Section~\ref{conclusion}}\DIFaddend .
%%
\section{The Workflow Run RO-Crate profiles}\label{the-workflow-run-ro-crate-profiles}
RO-Crate profiles are extensions of the base RO-Crate specification that describe how to represent the \DIFdelbegin \DIFdel{entities }\DIFdelend \DIFaddbegin \DIFadd{classes }\DIFaddend and relationships that appear in a specific domain or use case.
An RO-Crate conforming to a profile is not just machine-readable, but also machine-actionable\DIFaddbegin \DIFadd{, }\DIFaddend as a digital object whose type is represented by the profile itself~\cite{Soiland-Reyes 2022b}.
The Workflow Run RO-Crate profiles are the main outcome of the activities of the Workflow Run RO-Crate Community\DIFdelbegin \DIFdel{(}%DIFDELCMD < \url{https://www.researchobject.org/workflow-run-crate}%%%
\DIFdel{)}\DIFdelend \DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{wrroc-site}}\hskip0pt%DIFAUXCMD
}\DIFaddend , an open working group that includes workflow users and developers, WMS users and developers, and researchers and software engineers interested in workflow execution provenance and Findable, Accessible, Interoperable and Reusable (FAIR) approaches for data and software.
\DIFdelbegin \DIFdel{In order to develop the }\DIFdelend %DIF > In order to develop the Workflow-Run RO-Crate profiles, one of the first community efforts was to compile a list of requirements in the form of competency questions~\cite{wrroc-cqs} to be addressed by the model.
%DIF >
\DIFaddbegin \DIFadd{One of the first steps in the development of the }\DIFaddend Workflow-Run RO-Crate profiles \DIFdelbegin \DIFdel{, one of the first community efforts }\DIFdelend was to compile a list of requirements \DIFaddbegin \DIFadd{to be addressed by the model from all interested participants, }\DIFaddend in the form of \DIFdelbegin \DIFdel{competency questions (}%DIFDELCMD < \url{https://www.researchobject.org/workflow-run-crate/requirements}%%%
\DIFdel{)to be addressed by the model. }\DIFdelend \DIFaddbegin \textit{\DIFadd{competency questions}}\DIFadd{~(CQs)~\mbox{%DIFAUXCMD
\cite{wrroc-cqs}}\hskip0pt%DIFAUXCMD
.
%DIF >
The process also included reviewing existing state-of-the-art models, such as wfprov~\mbox{%DIFAUXCMD
\cite{Belhajjame 2015}}\hskip0pt%DIFAUXCMD
, ProvONE~\mbox{%DIFAUXCMD
\cite{Cuevas-Vicenttin 2016} }\hskip0pt%DIFAUXCMD
or OPMW~\mbox{%DIFAUXCMD
\cite{Garijo 2011}}\hskip0pt%DIFAUXCMD
. The result was the definition of 11 CQs capturing requirements which span a broad application scope and consider different levels of provenance granularity.
%DIF >
}\DIFaddend Each requirement was \DIFdelbegin \DIFdel{backed up }\DIFdelend \DIFaddbegin \DIFadd{supported }\DIFaddend by a rationale and linked to a GitHub issue to drive the public discussion forward. When a requirement was addressed, related changes were integrated into the profiles and the relevant issue was closed. \DIFdelbegin \DIFdel{Many of }\DIFdelend \DIFaddbegin \DIFadd{All }\DIFaddend the original issues are now closed, and the profiles have had \DIFdelbegin \DIFdel{four }\DIFdelend \DIFaddbegin \DIFadd{five }\DIFaddend official releases on Zenodo\DIFdelbegin \DIFdel{.
}\DIFdelend \DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{WRROC 2024a, WRROC 2024b, WRROC 2024c}}\hskip0pt%DIFAUXCMD
.
%DIF >
The target of several of the original CQs evolved during profile development, as the continuous discussion within the community highlighted the main points to be addressed. This continuous process is reflected in the corresponding issues and pull requests in the community's GitHub repository. The final implementation of the CQs in the profiles is validated with SPARQL queries that can be run on RO-Crate metadata samples, also available on the GitHub repository~\mbox{%DIFAUXCMD
\cite{cqs-sparql-queries}}\hskip0pt%DIFAUXCMD
.
}\DIFaddend
As requirements were being defined, it became apparent that a single profile would not be sufficient to cater for all possible usage scenarios.
In particular, while some use cases required a detailed description of all computations orchestrated by the workflow, others were only concerned with a ``black box'' representation of the workflow and its execution as a whole (i.e., whether the \DIFdelbegin \DIFdel{execution }\DIFdelend \DIFaddbegin \DIFadd{workflow execution as a whole }\DIFaddend was successful and which results were obtained).
Additionally, some computations involve a data flow across multiple applications that are executed without the aid of a WMS and thus are not formally described in a standard workflow language.
These observations led to the development of three profiles:
\DIFdelbegin \DIFdel{(1) Process Run Crate (}%DIFDELCMD < \url{https://w3id.org/ro/wfrun/process}%%%
\DIFdel{)
}\DIFdelend \DIFaddbegin \begin{enumerate}
\item \textit{\DIFadd{Process Run Crate}}\DIFadd{,
}\DIFaddend to describe the execution of one or more tools that contribute to a computation;
\DIFdelbegin \DIFdel{(2) Workflow Run Crate (}%DIFDELCMD < \url{https://w3id.org/ro/wfrun/workflow}%%%
\DIFdel{)
}\DIFdelend \DIFaddbegin \item \textit{\DIFadd{Workflow Run Crate}}\DIFadd{,
}\DIFaddend to describe a computation orchestrated by a predefined workflow;
\DIFdelbegin \DIFdel{(3) Provenance Run Crate (}%DIFDELCMD < \url{https://w3id.org/ro/wfrun/provenance}%%%
\DIFdel{)
}\DIFdelend \DIFaddbegin \item \textit{\DIFadd{Provenance Run Crate}}\DIFadd{,
}\DIFaddend to describe a workflow computation including the internal details of individual step executions.
\DIFaddbegin \end{enumerate}
\DIFaddend
In the rest of this section we describe each of \DIFdelbegin \DIFdel{the above }\DIFdelend \DIFaddbegin \DIFadd{these }\DIFaddend profiles in detail. We use \DIFaddbegin \DIFadd{the term ``class'' to refer to a type as defined in RDF(s) and ``entity'' to refer to an instance of a class. We use }\DIFaddend italics to denote the \DIFdelbegin \DIFdel{types and properties describing entities and their relationships}\DIFdelend \DIFaddbegin \DIFadd{properties and classes in each profile}\DIFaddend : these are defined in the RO-Crate JSON-LD context\DIFdelbegin \DIFdel{(}%DIFDELCMD < \url{https://www.researchobject.org/ro-crate/1.1/context.jsonld}%%%
\DIFdel{)}\DIFdelend \DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{roc-context}}\hskip0pt%DIFAUXCMD
}\DIFaddend , which extends Schema.org with terms from the Bioschemas\DIFaddbegin \DIFadd{~}\DIFaddend \cite{Gray 2017} ComputationalWorkflow profile\DIFdelbegin \DIFdel{(}%DIFDELCMD < \url{https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE}%%%
\DIFdel{) }\DIFdelend \DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{computational-workflow-profile} }\hskip0pt%DIFAUXCMD
}\DIFaddend and other vocabularies.
\DIFdelbegin \DIFdel{More specifically, from Bioschemas we use the }\emph{\DIFdel{ComputationalWorkflow}} %DIFAUXCMD
\DIFdel{and }\emph{\DIFdel{FormalParameter}} %DIFAUXCMD
\DIFdel{types as well as the }\emph{\DIFdel{input}} %DIFAUXCMD
\DIFdel{and }\emph{\DIFdel{output}} %DIFAUXCMD
\DIFdel{properties.
Note that these terms , though }\DIFdelend \DIFaddbegin \DIFadd{Note that terms }\DIFaddend coming from Bioschemas \DIFdelbegin \DIFdel{, }\DIFdelend are not specific to the life sciences.
We also developed a \DIFdelbegin \DIFdel{context extension through a dedicated ``workflow-run'' namespace (}%DIFDELCMD < \url{https://w3id.org/ro/terms/workflow-run\#}%%%
\DIFdel{) }\DIFdelend \DIFaddbegin \DIFadd{dedicated term set~\mbox{%DIFAUXCMD
\cite{wrroc-terms} }\hskip0pt%DIFAUXCMD
}\DIFaddend to represent concepts that are not captured by terms in the RO-Crate context. \DIFaddbegin \DIFadd{New terms are defined in RDF(s) following Schema.org guidelines (i.e., using }\emph{\DIFadd{domainIncludes}} \DIFadd{and }\emph{\DIFadd{rangeIncludes}} \DIFadd{to define domains and ranges of properties).
In the rest of the text and images, the following prefixes are used to represent the corresponding namespaces:
}\begin{tabular}{rcl}
\emph{\DIFadd{s:}} & \DIFadd{$\rightarrow$ }& \url{https://schema.org/} \\
\emph{\DIFadd{bioschemas:}}& \DIFadd{$\rightarrow$ }& \url{https://bioschemas.org/} \\
\emph{\DIFadd{bsp:}} & \DIFadd{$\rightarrow$ }& \url{https://bioschemas.org/properties/} \\
\emph{\DIFadd{wfrun:}} & \DIFadd{$\rightarrow$ }& \url{https://w3id.org/ro/terms/workflow-run\#} \\
\end{tabular}
\DIFaddend
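As a small illustration of how the prefixes in the table above map to full IRIs, the following Python sketch expands a prefixed term into its namespace form. The function name \texttt{expand} and the dictionary layout are assumptions made for this example and are not part of the profiles or of any RO-Crate library.

```python
# Prefix-to-namespace mapping copied from the table above.
PREFIXES = {
    "s": "https://schema.org/",
    "bioschemas": "https://bioschemas.org/",
    "bsp": "https://bioschemas.org/properties/",
    "wfrun": "https://w3id.org/ro/terms/workflow-run#",
}


def expand(term):
    """Expand a prefixed term such as 's:CreateAction' into a full IRI.

    Hypothetical helper for illustration only.
    """
    prefix, sep, local = term.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in term: {term!r}")
    return PREFIXES[prefix] + local


if __name__ == "__main__":
    for term in ("s:CreateAction", "bioschemas:ComputationalWorkflow", "bsp:input"):
        print(term, "->", expand(term))
```

Note that RO-Crate metadata files normally rely on the JSON-LD context to resolve terms, so a helper of this kind is only relevant when reading the figures or writing queries against the expanded graph.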
\subsection{Process Run Crate}\label{process-run-crate}
The Process Run Crate profile\DIFdelbegin \DIFdel{\mbox{%DIFAUXCMD
\cite{WRROC 2023a} }\hskip0pt%DIFAUXCMD
contains specifications on describing }\DIFdelend \DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{WRROC 2024a} }\hskip0pt%DIFAUXCMD
contains specifications to describe }\DIFaddend the execution of one or more software applications that contribute to the same overall computation, but are not necessarily coordinated by a top-level workflow or script \DIFdelbegin \DIFdel{.
For instance, they could be }\DIFdelend \DIFaddbegin \DIFadd{(e.g. when }\DIFaddend executed manually by a human\DIFdelbegin \DIFdel{agent}\DIFdelend , one after the other as intermediate datasets become available\DIFdelbegin \DIFdel{, as shown in the process run crate (}%DIFDELCMD < \url{https://w3id.org/ro/doi/10.5281/zenodo.6913045}%%%
\DIFdel{)from \mbox{%DIFAUXCMD
\cite{Meurisse 2023}}\hskip0pt%DIFAUXCMD
).
}\DIFdelend \DIFaddbegin \DIFadd{).
%DIF > as shown in the process run crate (\url{https://w3id.org/ro/doi/10.5281/zenodo.6913045}) from~\cite{Meurisse 2023}).
}\DIFaddend
\DIFdelbegin \DIFdel{Being }\DIFdelend \DIFaddbegin \DIFadd{The Process Run Crate is }\DIFaddend the basis for all profiles in the WRROC collection\DIFdelbegin \DIFdel{, Process Run Crate }\DIFdelend \DIFaddbegin \DIFadd{. It }\DIFaddend specifies how to describe the fundamental \DIFdelbegin \DIFdel{entities }\DIFdelend \DIFaddbegin \DIFadd{classes }\DIFaddend involved in a computational run: \DIFaddbegin \begin{inlineenum}
\item \DIFaddend a software application \DIFdelbegin \DIFdel{(}\DIFdelend represented by a \DIFdelbegin \emph{\DIFdel{SoftwareApplication}}%DIFAUXCMD
\DIFdel{, }\emph{\DIFdel{SoftwareSourceCode}} %DIFAUXCMD
\DIFdel{or }\emph{\DIFdel{ComputationalWorkflow}} %DIFAUXCMD
\DIFdel{entity) and
its execution (}\DIFdelend \DIFaddbegin \termsorg{SoftwareApplication}\DIFadd{, }\termsorg{SoftwareSourceCode} \DIFadd{or }\termbioschemas{ComputationalWorkflow} \DIFadd{class; and
}\item \DIFadd{its execution, }\DIFaddend represented by a \DIFdelbegin \emph{\DIFdel{CreateAction}} %DIFAUXCMD
\DIFdel{entity), with the latter }\DIFdelend \DIFaddbegin \termsorg{CreateAction} \DIFadd{class, and }\DIFaddend linking to the \DIFdelbegin \DIFdel{former via the }\emph{\DIFdel{instrument}} %DIFAUXCMD
\DIFdel{property.
}\DIFdelend \DIFaddbegin \DIFadd{application via the }\termsorg{instrument} \DIFadd{property.
}\end{inlineenum}
\DIFaddend Other important properties of the
\DIFdelbegin \emph{\DIFdel{CreateAction}} %DIFAUXCMD
\DIFdel{entity are }\emph{\DIFdel{object}}%DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{CreateAction} \DIFadd{class are }\termsorg{object}\DIFaddend , which links to the action's inputs, and \DIFdelbegin \emph{\DIFdel{result}}%DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{result}\DIFaddend , which links to its outputs.
The time the execution started and ended can be provided, respectively, via the
\DIFdelbegin \emph{\DIFdel{startTime}} %DIFAUXCMD
\DIFdel{and }\emph{\DIFdel{endTime}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{startTime} \DIFadd{and }\termsorg{endTime} \DIFaddend properties.
The \DIFdelbegin \emph{\DIFdel{Person}} %DIFAUXCMD
\DIFdel{or
}\emph{\DIFdel{Organization}} %DIFAUXCMD
\DIFdel{entity }\DIFdelend \DIFaddbegin \termsorg{Person} \DIFadd{or
}\termsorg{Organization} \DIFadd{class }\DIFaddend that performed the action is \DIFdelbegin \DIFdel{referred to via the }\emph{\DIFdel{agent}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \DIFadd{specified via the }\termsorg{agent} \DIFaddend property.
Fig~\ref{fig:process_crate_er} shows the \DIFdelbegin \DIFdel{entities }\DIFdelend \DIFaddbegin \DIFadd{classes }\DIFaddend used in Process Run Crate together with their relationships.
\begin{figure}[!h]
%DIF < \includegraphics[width=\textwidth]{image1.png}
%DIF < \includegraphics[width=26em]{wrroc-figure1.drawio.pdf}
%DIF > figure-process-rc-uml
%\includegraphics[width=26em]{Fig1.eps}
\caption{{\bf UML class diagram for Process Run Crate.}
The central \DIFdelbeginFL \DIFdelFL{entity }\DIFdelendFL \DIFaddbeginFL \DIFaddFL{class }\DIFaddendFL is the \DIFdelbeginFL \emph{\DIFdelFL{CreateAction}}%DIFAUXCMD
\DIFdelendFL \DIFaddbeginFL \termsorg{CreateAction}\DIFaddendFL , which represents the execution of an application.
It \DIFdelbeginFL \DIFdelFL{relates with }\DIFdelendFL \DIFaddbeginFL \DIFaddFL{links to }\DIFaddendFL the application itself via \DIFdelbeginFL \emph{\DIFdelFL{instrument}}%DIFAUXCMD
\DIFdelendFL \DIFaddbeginFL \termsorg{instrument}\DIFaddendFL , \DIFdelbeginFL \DIFdelFL{with }\DIFdelendFL \DIFaddbeginFL \DIFaddFL{to }\DIFaddendFL the entity that executed it via \DIFdelbeginFL \emph{\DIFdelFL{agent}} %DIFAUXCMD
\DIFdelendFL \DIFaddbeginFL \termsorg{agent}\DIFaddFL{, }\DIFaddendFL and \DIFdelbeginFL \DIFdelFL{with }\DIFdelendFL \DIFaddbeginFL \DIFaddFL{to }\DIFaddendFL its inputs and outputs via \DIFdelbeginFL \emph{\DIFdelFL{object}}
%DIFAUXCMD
\DIFdelendFL \DIFaddbeginFL \termsorg{object}
\DIFaddendFL and \DIFdelbeginFL \emph{\DIFdelFL{result}}%DIFAUXCMD
\DIFdelendFL \DIFaddbeginFL \termsorg{result}\DIFaddendFL , respectively.
\DIFdelbeginFL \emph{\DIFdelFL{File}} %DIFAUXCMD
\DIFdelFL{is an RO-Crate alias for Schema}\DIFdelendFL \DIFaddbeginFL \DIFaddFL{In this and following figures, classes and properties are shown with prefixes to indicate their origin}\DIFaddendFL . \DIFdelbeginFL \DIFdelFL{org's }\emph{\DIFdelFL{MediaObject}}%DIFAUXCMD
\DIFdelFL{.
}\DIFdelendFL %DIF > , note however that the WRROC and RO-Crate JSON-LD contexts map them without needing prefixes. %DG: They do use context, so no prefix is needed...
%DIF > \emph{File} is a mapping in the RO-Crate context to Schema.org's \termsorg{MediaObject}. %SL: we don't mention File anymore
Some inputs (and, less commonly, outputs) \DIFdelbeginFL \DIFdelFL{, however, }\DIFdelendFL are not stored as files or directories, but passed to the application (e.g., via a command line interface) as values of various types (e.g., a number or string). In this case, the profile recommends a representation via \DIFdelbeginFL \emph{\DIFdelFL{PropertyValue}}%DIFAUXCMD
\DIFdelendFL \DIFaddbeginFL \termsorg{PropertyValue}\DIFaddendFL .
For simplicity, we left out the rest of the RO-Crate structure (e.g. the root \DIFdelbeginFL \emph{\DIFdelFL{Dataset}}%DIFAUXCMD
\DIFdelendFL \DIFaddbeginFL \termsorg{Dataset}\DIFaddendFL )\DIFaddbeginFL \DIFaddFL{, and attributes (e.g. }\termsorg{startTime}\DIFaddFL{, }\termsorg{endTime}\DIFaddFL{, }\termsorg{description}\DIFaddFL{, }\termsorg{actionStatus}\DIFaddFL{)}\DIFaddendFL .
In this UML class notation\DIFaddbeginFL \DIFaddFL{, }\DIFaddendFL diamond $\Diamond$ arrows indicate aggregation and regular arrows indicate references, $*$ indicates \DIFdelbeginFL \DIFdelFL{multiple instances}\DIFdelendFL \DIFaddbeginFL \DIFaddFL{zero or more occurrences}\DIFaddendFL , $1$ indicates a single \DIFdelbeginFL \DIFdelFL{instance}\DIFdelendFL \DIFaddbeginFL \DIFaddFL{occurrence}\DIFaddendFL .
%DIF > Prefix and namespace is \emph{s:} \url{https://schema.org/} %already defined in the table above
}
\label{fig:process_crate_er}
\end{figure}
As an example,
suppose a user \DIFdelbegin \DIFdel{called }\DIFdelend \DIFaddbegin \DIFadd{named }\DIFaddend John Doe runs the \DIFdelbegin \texttt{\DIFdel{head}} %DIFAUXCMD
\DIFdel{UNIX command }\DIFdelend \DIFaddbegin \DIFadd{UNIX command }\texttt{\DIFadd{head}} \DIFaddend to extract the first ten lines of an input file named \texttt{lines.txt}, storing the result in another file called \texttt{selection.txt}.
John then runs the \texttt{sort}
\DIFaddbegin \DIFadd{UNIX }\DIFaddend command on \texttt{selection.txt}, storing the sorted output in a new file named \texttt{sorted\_selection.txt}.
\DIFaddbegin
\DIFaddend Fig~\ref{fig:head_sort} contains a diagram of the two actions and their relationships to the other \DIFdelbegin \DIFdel{entities involved }\DIFdelend \DIFaddbegin \DIFadd{involved entities}\DIFaddend .
Note how the actions are connected by the fact that the output of ``Run Head'' is also the input of ``Run Sort'': they form an ``implicit workflow'', whose steps have been executed manually rather than by a software tool.
\begin{figure}[!ht]
%DIF < \includegraphics[width=29em]{image2.png}
%DIF < \includegraphics[width=29em]{wrroc-figure-example.drawio.pdf}
%DIF > figure-example.eps
%\includegraphics[width=29em]{Fig2.eps}
\caption{{\bf Diagram of a simple workflow} where the \texttt{head} and \texttt{sort} programs were run manually by a user.
The executions of the individual software programs are connected by the fact that the file output by \texttt{head} was used as input for \texttt{sort}, documenting the computational flow in an implicit way.
Such executions can be represented with Process Run Crate.
%DIF > Prefix and namespace: \emph{s:} \url{https://schema.org/}
}
\label{fig:head_sort}
\end{figure}
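The \texttt{head} and \texttt{sort} example can be sketched in Python as a list of JSON-LD entities, using only the types and properties named in this section (\emph{CreateAction}, \emph{instrument}, \emph{agent}, \emph{object}, \emph{result}). This is a minimal sketch rather than a complete RO-Crate: the root \emph{Dataset} and the metadata file descriptor are left out, and all \texttt{\#}-prefixed identifiers, as well as the helper \texttt{implicit\_links}, are hypothetical names chosen for illustration.

```python
import json


def build_head_sort_graph():
    """Return JSON-LD entities describing the two manual runs.

    The "#..." identifiers are illustrative assumptions, not prescribed
    by the profile.
    """
    return [
        {"@id": "#head", "@type": "SoftwareApplication", "name": "head"},
        {"@id": "#sort", "@type": "SoftwareApplication", "name": "sort"},
        {"@id": "#john-doe", "@type": "Person", "name": "John Doe"},
        {"@id": "lines.txt", "@type": "File"},
        {"@id": "selection.txt", "@type": "File"},
        {"@id": "sorted_selection.txt", "@type": "File"},
        {
            "@id": "#run-head",
            "@type": "CreateAction",
            "name": "Run Head",
            "instrument": {"@id": "#head"},
            "agent": {"@id": "#john-doe"},
            "object": [{"@id": "lines.txt"}],
            "result": [{"@id": "selection.txt"}],
        },
        {
            "@id": "#run-sort",
            "@type": "CreateAction",
            "name": "Run Sort",
            "instrument": {"@id": "#sort"},
            "agent": {"@id": "#john-doe"},
            "object": [{"@id": "selection.txt"}],
            "result": [{"@id": "sorted_selection.txt"}],
        },
    ]


def implicit_links(graph):
    """List (producer, consumer, file) triples for every file that is the
    result of one action and the object of another."""
    actions = [e for e in graph if e.get("@type") == "CreateAction"]
    links = []
    for producer in actions:
        produced = {ref["@id"] for ref in producer.get("result", [])}
        for consumer in actions:
            if consumer is producer:
                continue
            for ref in consumer.get("object", []):
                if ref["@id"] in produced:
                    links.append((producer["@id"], consumer["@id"], ref["@id"]))
    return links


if __name__ == "__main__":
    graph = build_head_sort_graph()
    print(json.dumps(graph, indent=2))
    print(implicit_links(graph))
```

The \texttt{implicit\_links} helper recovers the ``implicit workflow'' purely from the metadata: the only connection it finds is the file that \texttt{head} wrote and \texttt{sort} read.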
Process Run Crate extends the RO-Crate guidelines on representing software used to create files with additional requirements and conventions.
This arrangement is typical of the RO-Crate approach, where the base specification provides general recommendations to allow for high flexibility, while profiles -- being more concerned with the representation of specific domains and machine actionability -- provide more detailed and structured definitions.
Nevertheless, in order to be broadly applicable, profiles also need to avoid the specification of too many strict requirements, trying to strike a good trade-off between flexibility and actionability.
\DIFdelbegin \DIFdel{One of the implications of this approach is that consumers need to code defensively, avoiding unwarranted assumptions -- e.g. by verifying that a value exists for an optional property before trying to retrieve it and use it.
}\DIFdelend %DIF > One of the implications of this approach is that consumers need to code defensively, avoiding unwarranted assumptions -- e.g. by verifying that a value exists for an optional property before trying to retrieve it and use it.
\subsection{Workflow Run Crate}\label{workflow-run-crate}
The Workflow Run Crate profile\DIFdelbegin \DIFdel{\mbox{%DIFAUXCMD
\cite{WRROC 2023b} }\hskip0pt%DIFAUXCMD
}\DIFdelend \DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{WRROC 2024b} }\hskip0pt%DIFAUXCMD
}\DIFaddend combines the Process Run Crate and WorkflowHub's Workflow RO-Crate\DIFaddbegin \DIFadd{~}\DIFaddend \cite{Bacall 2022} profiles to describe the execution of \DIFdelbegin \DIFdel{``proper'' }\DIFdelend computational workflows managed by a WMS.
Such workflows are typically written in a \DIFdelbegin \DIFdel{special-purpose }\DIFdelend \DIFaddbegin \DIFadd{domain-specific }\DIFaddend language, such as CWL or Snakemake
\cite{Koster 2012}, and run by one or more WMSs (e.g., StreamFlow\DIFaddbegin \DIFadd{~}\DIFaddend \cite{Colonnelli 2021}, Galaxy~\cite{Galaxy 2022}).
\DIFaddbegin \DIFadd{Fig~\ref{fig:workflow_crate_er} illustrates the classes used in this profile together with their relationships.
%DIF >
}\DIFaddend As in Process Run Crate, the execution is described by a \DIFdelbegin \emph{\DIFdel{CreateAction}}
%DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{CreateAction}
\DIFaddend that links to the application via \DIFdelbegin \emph{\DIFdel{instrument}}%DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{instrument}\DIFaddend , but in this case the application must be a workflow, as prescribed by Workflow RO-Crate.
More specifically, Workflow RO-Crate states that the RO-Crate must contain a main workflow typed as \emph{File} \DIFdelbegin \DIFdel{, }\emph{\DIFdel{SoftwareSourceCode}}
%DIFAUXCMD
\DIFdel{and }\emph{\DIFdel{ComputationalWorkflow}}%DIFAUXCMD
\DIFdelend \DIFaddbegin \DIFadd{(an RO-Crate mapping to }\termsorg{MediaObject}\DIFadd{), }\termsorg{SoftwareSourceCode}
\DIFadd{and }\termbioschemas{ComputationalWorkflow}\DIFaddend .
The execution of the individual workflow steps, instead, is not represented: that is left to the more detailed Provenance Run Crate profile (described in the next section).
The Workflow Run \DIFdelbegin \DIFdel{RO-Crate }\DIFdelend \DIFaddbegin \DIFadd{Crate }\DIFaddend profile also contains recommendations on how to represent the workflow's input and output parameters, based on the \DIFdelbegin \DIFdel{aforementioned Bioschemas ~\mbox{%DIFAUXCMD
\cite{Gray 2017} }\hskip0pt%DIFAUXCMD
}\DIFdelend \DIFaddbegin \DIFadd{Bioschemas }\DIFaddend ComputationalWorkflow profile.
All these elements are represented via the \DIFdelbegin \emph{\DIFdel{FormalParameter}} %DIFAUXCMD
\DIFdel{entity }\DIFdelend \DIFaddbegin \termbioschemas{FormalParameter} \DIFadd{class }\DIFaddend and are referenced from the main workflow via the \DIFdelbegin \emph{\DIFdel{input}} %DIFAUXCMD
\DIFdel{and
}\emph{\DIFdel{output}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \termbsp{input} \DIFadd{and
}\termbsp{output} \DIFaddend properties.
While the \DIFdelbegin \DIFdel{entities referenced from
}\emph{\DIFdel{object}} %DIFAUXCMD
\DIFdel{and }\emph{\DIFdel{result}} %DIFAUXCMD
\DIFdel{in the }\emph{\DIFdel{CreateAction}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \DIFadd{classes referenced from
}\termsorg{object} \DIFadd{and }\termsorg{result} \DIFadd{in the }\termsorg{CreateAction} \DIFaddend represent data entities and argument values that were actually used in the workflow execution, the ones referenced from \DIFdelbegin \emph{\DIFdel{input}} %DIFAUXCMD
\DIFdel{and
}\emph{\DIFdel{output}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \termbsp{input} \DIFadd{and
}\termbsp{output} \DIFaddend correspond to formal parameters, which acquire a value when the workflow is run (see Fig\DIFdelbegin \DIFdel{.}\DIFdelend ~\ref{fig:workflow_crate_er}).
In the profile, the relationship between an actual value and the corresponding formal parameter is expressed through the \DIFdelbegin \emph{\DIFdel{exampleOfWork}} %DIFAUXCMD
\DIFdel{property-- the downloadable file is a realisation of the formal parameter definition}\DIFdelend \DIFaddbegin \termsorg{exampleOfWork} \DIFadd{property}\DIFaddend .
For instance, in the following JSON-LD snippet a formal parameter (\texttt{\#annotations}) is illustrated together with a corresponding \texttt{final-annotations.tsv} file:
\DIFdelbegin %DIFDELCMD <
%DIFDELCMD < %%%
\DIFdelend %DIF >
\begin{verbatim}
{
"@id": "#annotations",
"@type": "FormalParameter",
"additionalType": "File",
"encodingFormat": "text/tab-separated-values",
"valueRequired": "True",
"name": "annotations"
},
{
"@id": "final-annotations.tsv",
"@type": "File",
"contentSize": "14784",
"exampleOfWork": {"@id": "#annotations"}
}
\end{verbatim}
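A consumer could follow the \emph{exampleOfWork} links to find which data entities realise each formal parameter. The Python sketch below does this for the two entities in the snippet above; the helper names (\texttt{as\_list}, \texttt{has\_type}, \texttt{values\_by\_parameter}) are assumptions introduced for this example and do not come from any RO-Crate library.

```python
# The two entities from the JSON-LD snippet, as Python dictionaries.
GRAPH = [
    {
        "@id": "#annotations",
        "@type": "FormalParameter",
        "additionalType": "File",
        "encodingFormat": "text/tab-separated-values",
        "valueRequired": "True",
        "name": "annotations",
    },
    {
        "@id": "final-annotations.tsv",
        "@type": "File",
        "contentSize": "14784",
        "exampleOfWork": {"@id": "#annotations"},
    },
]


def as_list(value):
    """Normalise a JSON-LD value that may be absent, single or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def has_type(entity, type_name):
    """Check whether an entity carries the given type (possibly among several)."""
    return type_name in as_list(entity.get("@type"))


def values_by_parameter(graph):
    """Map each formal parameter name to the @ids of entities that
    reference it through exampleOfWork."""
    params = {e["@id"]: e for e in graph if has_type(e, "FormalParameter")}
    mapping = {p.get("name", pid): [] for pid, p in params.items()}
    for entity in graph:
        for ref in as_list(entity.get("exampleOfWork")):
            # References are expected as {"@id": ...}; skip anything else.
            if not isinstance(ref, dict):
                continue
            param = params.get(ref.get("@id"))
            if param is not None:
                mapping[param.get("name", param["@id"])].append(entity["@id"])
    return mapping


if __name__ == "__main__":
    print(values_by_parameter(GRAPH))
```

Treating \texttt{@type} and \emph{exampleOfWork} as possibly multi-valued keeps the sketch usable on crates where an entity has several types or realises more than one parameter.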
\DIFdelbegin \DIFdel{The derivation of Workflow Run Crate from Workflow RO-Crate makes RO-Crates that conform to this profile compatible with the WorkflowHub workflow registry by also conforming to its Workflow RO-Crate profile.
Thus, users of a WMS that implements this profile (or Provenance Run Crate, which inherits it) are able to register their workflows in WorkflowHub -- together with an execution trace -- by simply running them and uploading the resulting RO-Crates.
Additionally, the inheritance mechanism allows to reuse the specifications already developed for Workflow RO-Crate, which form part of the guidelines on representing the prospective provenance.
}\DIFdelend %DIF > % This paragraph now moved to Discussion
%DIF > The derivation of Workflow Run Crate from Workflow RO-Crate makes RO-Crates that conform to this profile compatible with the WorkflowHub workflow registry by also conforming to its Workflow RO-Crate profile.
%DIF > Thus, users of a WMS that implements this profile (or Provenance Run Crate, which inherits it) are able to register their workflows in WorkflowHub -- together with an execution trace -- by simply running them and uploading the resulting RO-Crates.
%DIF > Additionally, the inheritance mechanism allows to reuse the specifications already developed for Workflow RO-Crate, which form part of the guidelines on representing the prospective provenance.
\DIFdelbegin \DIFdel{Fig~\ref{fig:workflow_crate_er} shows the entities used in Workflow Run Crate together with their relationships.
}%DIFDELCMD <
%DIFDELCMD < \begin{figure}[!h]
%DIFDELCMD < %%%
%DIF < \includegraphics[width=26em]{wrroc-figure2.drawio.pdf}
\DIFdelendFL \DIFaddbeginFL \begin{figure}[!htb]
%DIF > figure-workflow-rc-uml
\DIFaddendFL %\includegraphics[width=26em]{Fig3.eps}
\caption{{\bf UML class diagram for Workflow Run Crate.}
The main differences with Process Run Crate are the representation of formal parameters and the fact that the \DIFdelbeginFL \DIFdelFL{application }\DIFdelendFL \DIFaddbeginFL \DIFaddFL{workflow }\DIFaddendFL is expected to be an entity with types \DIFaddbeginFL \termsorg{MediaObject} \DIFaddFL{(}\DIFaddendFL \emph{File} \DIFaddbeginFL \DIFaddFL{in RO-Crate JSON-LD)}\DIFaddendFL , \DIFdelbeginFL \emph{\DIFdelFL{SoftwareSourceCode}} %DIFAUXCMD
\DIFdelendFL \DIFaddbeginFL \termsorg{SoftwareSourceCode} \DIFaddendFL and \DIFdelbeginFL \emph{\DIFdelFL{ComputationalWorkflow}}%DIFAUXCMD
\DIFdelendFL \DIFaddbeginFL \termbioschemas{ComputationalWorkflow}\DIFaddendFL .
Effectively, the \DIFdelbeginFL \DIFdelFL{entity }\DIFdelendFL \DIFaddbeginFL \DIFaddFL{workflow }\DIFaddendFL belongs to all three types, and its properties are the union of the properties of the individual types.
\DIFaddbeginFL \DIFaddFL{In this profile, the execution history (retrospective provenance) is augmented by a (prospective) workflow definition, giving a high-level overview of the workflow and its input and output parameter definitions (}\termbioschemas{FormalParameter}\DIFaddFL{). }\DIFaddendFL The \DIFaddbeginFL \DIFaddFL{inner structure of the workflow is not represented in this profile.
In the provenance part, individual files (}\termsorg{MediaObject}\DIFaddFL{) or arguments (}\termsorg{PropertyValue}\DIFaddFL{) are then connected to the parameters they realise. Most workflow systems can consume and produce multiple files, and this mechanism helps to declare each file's role in the workflow execution.
The }\DIFaddendFL filled diamond $\blacklozenge$ indicates composition, empty diamond $\Diamond$ aggregation, and other arrows relations.
%DIF > Prefixes and namespaces are
%DIF > \emph{s:} \url{https://schema.org/}\hspace{1ex}
%DIF > \emph{bioschemas:} \url{https://bioschemas.org/}\hspace{1ex}
%DIF > \emph{bsp:} \url{https://bioschemas.org/properties/}
%DIF > DG: already added in the table before figs, I think it's not needed to add them again
}
\label{fig:workflow_crate_er}
\end{figure}
\subsection{Provenance Run Crate}\label{provenance-run-crate}
The Provenance Run Crate profile\DIFdelbegin \DIFdel{\mbox{%DIFAUXCMD
\cite{WRROC 2023c} }\hskip0pt%DIFAUXCMD
}\DIFdelend \DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{WRROC 2024c} }\hskip0pt%DIFAUXCMD
}\DIFaddend extends Workflow Run Crate by adding new concepts to describe the internal details of a workflow run, including individual tool executions, intermediate outputs and related parameters.
Individual tool executions are represented by additional \DIFdelbegin \emph{\DIFdel{CreateAction}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{CreateAction} \DIFaddend instances that refer to the tool itself via \DIFdelbegin \emph{\DIFdel{instrument}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{instrument} \DIFaddend -- analogously to its use in Process Run Crate.
The workflow is required to refer to the tools it orchestrates through the \DIFdelbegin \emph{\DIFdel{hasPart}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{hasPart} \DIFaddend property, as suggested in the Bioschemas ComputationalWorkflow profile, though in the latter it is only a recommendation.
To represent the logical steps defined by the workflow, this profile uses \DIFdelbegin \emph{\DIFdel{HowToStep}} %DIFAUXCMD
\DIFdel{i.e., ``}\DIFdelend \DIFaddbegin \termsorg{HowToStep} \DIFadd{-- i.e., “}\DIFaddend A step in the instructions for how to achieve a result\DIFdelbegin \DIFdel{'' (}%DIFDELCMD < \url{https://schema.org/HowToStep}%%%
\DIFdel{)}\DIFdelend \DIFaddbegin \DIFadd{”~\mbox{%DIFAUXCMD
\cite{howtostep-def}}\hskip0pt%DIFAUXCMD
}\DIFaddend .
Steps point to the corresponding tools via the \DIFdelbegin \emph{\DIFdel{workExample}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{workExample} \DIFaddend property and are referenced from the workflow via the \DIFdelbegin \emph{\DIFdel{step}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{step} \DIFaddend property; the execution of a step is represented by a \DIFdelbegin \emph{\DIFdel{ControlAction}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{ControlAction} \DIFaddend pointing to the
\DIFdelbegin \emph{\DIFdel{HowToStep}} %DIFAUXCMD
\DIFdel{via }\emph{\DIFdel{instrument}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{HowToStep} \DIFadd{via }\termsorg{instrument} \DIFaddend and to the \DIFdelbegin \emph{\DIFdel{CreateAction}}
%DIFAUXCMD
\DIFdel{instance(s) }\DIFdelend \DIFaddbegin \termsorg{CreateAction}
\DIFadd{entities }\DIFaddend that represent the corresponding tool execution(s) via
\DIFdelbegin \emph{\DIFdel{object}}%DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{object}\DIFaddend .
Note that a step execution does not coincide with a tool execution: the distinction is apparent, for example, when a step maps to multiple executions of the same tool over a list of inputs (e.g., the ``scattering'' feature in CWL).
An RO-Crate following this profile can also represent the execution of the WMS itself (e.g., cwltool) via
\DIFdelbegin \emph{\DIFdel{OrganizeAction}}%DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{OrganizeAction}\DIFaddend , pointing to a representation of the WMS via
\DIFdelbegin \emph{\DIFdel{instrument}}%DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{instrument}\DIFaddend , to the steps via \DIFdelbegin \emph{\DIFdel{object}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{object} \DIFaddend and to the workflow run via \DIFdelbegin \emph{\DIFdel{result}}%DIFAUXCMD
\DIFdel{.
The }\emph{\DIFdel{object}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{result}\DIFadd{.
The }\termsorg{object} \DIFaddend attribute of the
\DIFdelbegin \emph{\DIFdel{OrganizeAction}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \termsorg{OrganizeAction} \DIFaddend can additionally point to a configuration file containing a description of the settings that affected the behaviour of the WMS during the execution.
\DIFdelbegin %DIFDELCMD <
%DIFDELCMD < %%%
\DIFdelend %DIF >
Fig~\ref{fig:provenance_crate_er} \DIFdelbegin \DIFdel{shows the various entities }\DIFdelend \DIFaddbegin \DIFadd{illustrates the various classes }\DIFaddend involved in the representation of a workflow run via Provenance Run Crate together with their relationships.
\DIFdelbegin %DIFDELCMD < \begin{figure}[!h]
%DIFDELCMD < %%%
%DIF < \includegraphics[width=21em]{image4.png}
%DIF < \includegraphics[width=\textwidth]{wrroc-figure3.drawio.pdf}
\DIFdelendFL \DIFaddbeginFL \begin{figure}[!htb]
%DIF > figure-provenance-rc-uml
\DIFaddendFL %\includegraphics[width=\textwidth]{Fig4.eps}
\caption{{\bf UML class diagram for Provenance Run Crate.}
In addition to the workflow run, this profile represents the execution of individual steps and their related tools.
\DIFaddbeginFL \DIFaddFL{The prospective side (the execution plan) is shown by the workflow listing a series of }\termsorg{HowToStep}\DIFaddFL{s, each linking to the }\termsorg{SoftwareApplication} \DIFaddFL{that is to be executed. The }\termbsp{input} \DIFaddFL{and }\termbsp{output} \DIFaddFL{parameters for each tool are described in a similar way to the overall workflow parameter in Fig~\ref{fig:workflow_crate_er}.
The retrospective provenance side of this profile includes each tool execution as an additional }\termsorg{CreateAction} \DIFaddFL{with similar mapping to the realised parameters as }\termsorg{MediaObject} \DIFaddFL{or }\termsorg{PropertyValue}\DIFaddFL{, allowing intermediate values to be included in the RO-Crate even if they are not workflow outputs.
The workflow execution is described in the same way as in the Workflow Run Crate profile, with an overall }\termsorg{CreateAction} \DIFaddFL{(the workflow outputs will typically also appear as outputs from inner tool executions). An additional }\termsorg{OrganizeAction} \DIFaddFL{represents the workflow engine execution, which orchestrated the steps from the workflow plan through corresponding }\termsorg{ControlAction}\DIFaddFL{s that spawned the tool executions (}\termsorg{CreateAction}\DIFaddFL{). A single workflow step may have multiple such executions (e.g. array iterations). Not shown in the figure: }\termsorg{actionStatus} \DIFaddFL{and }\termsorg{error}\DIFaddFL{, which indicate step/workflow execution status.
The filled diamond $\blacklozenge$ indicates composition, empty diamond $\Diamond$ aggregation, and other arrows relations.
%DIF > Prefixes and namespaces are
%DIF > \emph{s:} \url{https://schema.org/}\hspace{1ex}
%DIF > \emph{bioschemas:} \url{https://bioschemas.org/}\hspace{1ex}
%DIF > \emph{bsp:} \url{https://bioschemas.org/properties/}
}\DIFaddendFL }
\label{fig:provenance_crate_er}
\end{figure}
\DIFdelbegin \DIFdel{This profile also includes specifications on }\DIFdelend \DIFaddbegin \DIFadd{Additionally, this profile specifies }\DIFaddend how to describe connections between parameters\DIFdelbegin \DIFdel{.
Parameter connections }\DIFdelend \DIFaddbegin \DIFadd{,
through }\textit{\DIFadd{parameter connections}} \DIFaddend -- a fundamental feature of computational workflows\DIFdelbegin \DIFdel{-- describe}\DIFdelend \DIFaddbegin \DIFadd{.
Specifically, parameter connections describe: }\DIFaddend (i) how tools \DIFdelbegin \DIFdel{take }\DIFdelend \DIFaddbegin \DIFadd{consume }\DIFaddend as input the intermediate outputs generated by other tools\DIFaddbegin \DIFadd{; }\DIFaddend and (ii) how workflow-level parameters are mapped to tool-level parameters.
\DIFdelbegin \DIFdel{For instance}\DIFdelend \DIFaddbegin \DIFadd{As an example}\DIFaddend , consider again the workflow depicted in Fig\DIFdelbegin \DIFdel{. }\DIFdelend ~\ref{fig:head_sort},
and suppose it is implemented in a workflow language such as CWL\DIFdelbegin \DIFdel{. The }\DIFdelend \DIFaddbegin \DIFadd{: the }\DIFaddend workflow-level input (a text file) is \DIFdelbegin \DIFdel{connected }\DIFdelend \DIFaddbegin \DIFadd{linked through a parameter connection }\DIFaddend to the input of the \DIFdelbegin \DIFdel{``head'' }\DIFdelend \DIFaddbegin \texttt{\DIFadd{head}} \DIFaddend tool wrapper, and \DIFdelbegin \DIFdel{the output of the latter is connected }\DIFdelend \DIFaddbegin \DIFadd{then a second parameter connection links this tool's output }\DIFaddend to the input of the \DIFdelbegin \DIFdel{``sort'' }\DIFdelend \DIFaddbegin \texttt{\DIFadd{sort}} \DIFaddend tool wrapper.
\DIFdelbegin %DIFDELCMD <
%DIFDELCMD < %%%
\DIFdelend %DIF >
A representation of parameter connections is particularly useful for traceability, since it \DIFdelbegin \DIFdel{allows }\DIFdelend \DIFaddbegin \DIFadd{provides the means }\DIFaddend to document the inputs and tools on which workflow outputs depend.
Since the current RO-Crate context has no suitable terms for the description of such relationships,
we added appropriate ones to the aforementioned \DIFdelbegin \DIFdel{``workflow-run'' context extension (the }%DIFDELCMD < \url{https://w3id.org/ro/terms/workflow-run\#} %%%
\DIFdel{namespace):
a }\emph{\DIFdel{ParameterConnection}} %DIFAUXCMD
\DIFdel{type with
}\emph{\DIFdel{sourceParameter}} %DIFAUXCMD
\DIFdel{and }\emph{\DIFdel{targetParameter}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \DIFadd{dedicated term set~\mbox{%DIFAUXCMD
\cite{wrroc-terms}}\hskip0pt%DIFAUXCMD
:
a }\termwfrun{ParameterConnection} \DIFadd{type with
}\termwfrun{sourceParameter} \DIFadd{and }\termwfrun{targetParameter} \DIFaddend attributes that respectively map to the source and target formal parameters, and a
\DIFdelbegin \emph{\DIFdel{connection}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \termwfrun{connection} \DIFaddend property to link from the relevant step or workflow to the \DIFdelbegin \emph{\DIFdel{ParameterConnection}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \termwfrun{ParameterConnection} \DIFaddend instances.
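
For the two-step example above, the parameter connections might be sketched in JSON-LD as follows. The identifiers (\texttt{workflow.cwl}, \texttt{headtool.cwl}, \texttt{sorttool.cwl}, \texttt{\#conn-input-to-head}, \texttt{\#conn-head-to-sort}) are hypothetical, and the snippet assumes that each connection is attached to the step whose tool consumes the value:
\begin{verbatim}
{
    "@id": "workflow.cwl#main/head",
    "@type": "HowToStep",
    "workExample": {"@id": "workflow.cwl#headtool.cwl"},
    "connection": {"@id": "#conn-input-to-head"}
},
{
    "@id": "#conn-input-to-head",
    "@type": "ParameterConnection",
    "sourceParameter": {"@id": "workflow.cwl#main/input"},
    "targetParameter": {"@id": "workflow.cwl#headtool.cwl/input"}
},
{
    "@id": "workflow.cwl#main/sort",
    "@type": "HowToStep",
    "workExample": {"@id": "workflow.cwl#sorttool.cwl"},
    "connection": {"@id": "#conn-head-to-sort"}
},
{
    "@id": "#conn-head-to-sort",
    "@type": "ParameterConnection",
    "sourceParameter": {"@id": "workflow.cwl#headtool.cwl/output"},
    "targetParameter": {"@id": "workflow.cwl#sorttool.cwl/input"}
}
\end{verbatim}
Following the source and target parameters from one connection to the next recovers the chain from the workflow-level input, through \texttt{head}, to the input of \texttt{sort}.
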
\DIFdelbegin \DIFdel{This profile }\DIFdelend \DIFaddbegin \DIFadd{In our set of profiles, Provenance Run Crate }\DIFaddend is the most detailed \DIFdelbegin \DIFdel{of the three, }\DIFdelend \DIFaddbegin \DIFadd{one }\DIFaddend and offers the highest level of granularity\DIFdelbegin \DIFdel{. Fig.~\ref{fig:profile_venn}shows the relationship between the specifications of the profiles as a Venn diagram. }\DIFdelend \DIFaddbegin \DIFadd{; its specification is a superset of Workflow Run Crate, which in turn is a superset of Process Run Crate. This relationship between the three profiles is illustrated in Fig~\ref{fig:profile_venn} as a Venn diagram.
In principle, all computational provenance information could be represented through the Provenance Run Crate profile alone (possibly relaxing some requirements), since it inherits from the other two. In practice, though, this choice would require the use of the most complex model even for simple use cases. Having three separate profiles provides a way to represent information at different levels of granularity, while keeping all RO-Crates generated with them interoperable. This approach gives a straightforward path to representing computational provenance in simpler use cases, such as simple command executions, which are covered by Process Run Crate. Additionally, it lowers the barrier to implementation in WMSs, as developers may choose to implement only the more basic profiles at first, with reduced effort and complexity, and gradually move to more detailed representations. This encourages the adoption of WRROC across the diverse landscape of use cases and WMSs.
}\DIFaddend
\DIFdelbegin %DIFDELCMD < \begin{figure}[!h]
%DIFDELCMD < %%%
%DIF < \includegraphics[width=21em]{venn.png}
%DIF < \includegraphics[width=26em]{wrroc-venn.drawio.pdf}
\DIFdelendFL \DIFaddbeginFL \begin{figure}[htb]
%DIF > figure-venn.eps
\DIFaddendFL %\includegraphics[width=26em]{Fig5.eps}
\caption{{\bf Venn diagram of the specifications for the various RO-Crate profiles.}
\DIFaddbeginFL \DIFaddFL{Process Run Crate specifies how to describe the fundamental classes involved in a computational run, and thus is the basis for all profiles in the WRROC collection.
}\DIFaddendFL Workflow Run Crate inherits the specifications of both Process Run Crate and Workflow RO-Crate. Provenance Run Crate, in turn, inherits the specifications of Workflow Run Crate \DIFaddbeginFL \DIFaddFL{(and in a sense includes multiple Process Runs for each step execution, but within a single Crate)}\DIFaddendFL .
}
\label{fig:profile_venn}
\end{figure}
\DIFaddbegin \subsection{\DIFadd{Profile formats}}\label{profile-formats}
\DIFadd{The WRROC profiles are available in both human-readable (HTML) and machine-readable (RO-Crate) formats. The human-readable profiles are at:
%DIF >
}\begin{itemize}
\item \url{https://w3id.org/ro/wfrun/process/0.5}
\item \url{https://w3id.org/ro/wfrun/workflow/0.5}
\item \url{https://w3id.org/ro/wfrun/provenance/0.5}
\end{itemize}
%DIF >
\DIFadd{The corresponding machine-readable profiles are at:
%DIF >
}\begin{itemize}
\item \url{https://doi.org/10.5281/zenodo.12158562}
\item \url{https://doi.org/10.5281/zenodo.12159311}
\item \url{https://doi.org/10.5281/zenodo.12160782}
\end{itemize}
%DIF >
\DIFadd{The RO-Crate metadata files for the machine-readable profiles can be retrieved using the same URLs as the human-readable ones, but with JSON-LD content negotiation: this is done by setting the }\texttt{\DIFadd{Accept: application/ld+json}} \DIFadd{header in the HTTP request.
}
\DIFadd{The new terms we defined to represent concepts that could not be expressed with existing Schema.org ones are at:
%DIF >
}\begin{itemize}
\item \url{https://w3id.org/ro/terms/workflow-run}
\end{itemize}
%DIF >
\DIFadd{These terms are available in multiple formats with content negotiation, as explained at the above link.
}
\DIFaddend %%
\section{Implementations}\label{implementations}
Support for the Workflow Run RO-Crate profiles presented in this work has been implemented in a number of systems, showing support from the community and demonstrating their usability in practice.
We describe seven of these implementations (one in a conversion tool and six in WMSs) in the following sections.
\DIFaddbegin \DIFadd{Table~\ref{implementation_summary_table} provides an overview of the implementations, along with the respective profile implemented, and links to the implementation itself and to an example RO-Crate.
%DIF >
}\DIFaddend These tools have been developed in parallel by different teams, and independently from each other.
RO-Crate has a strong ecosystem of tools\DIFaddbegin \DIFadd{~}\DIFaddend \cite{Soiland-Reyes 2022a}, and \DIFdelbegin \DIFdel{these }\DIFdelend \DIFaddbegin \DIFadd{the WRROC }\DIFaddend implementations have either re-used these or added their own approach to the standards.
\subsection{Runcrate}\label{runcrate}
Runcrate\DIFdelbegin \DIFdel{(}%DIFDELCMD < \url{https://github.com/ResearchObject/runcrate}%%%
\DIFdel{) }\DIFdelend \DIFaddbegin \DIFadd{~}\DIFaddend \cite{runcrate} is a Workflow Run RO-Crate toolkit which also serves as a reference implementation of the proposed profiles.
It consists of a Python package with a command line interface, providing a straightforward path to integration in Python software and other workflows.
The runcrate toolkit includes functionality to convert CWLProv ROs to RO-Crates conforming to the Provenance Run Crate profile (\DIFdelbegin \emph{\DIFdel{runcrate convert}}%DIFAUXCMD
\DIFdelend \DIFaddbegin \texttt{\DIFadd{runcrate convert}}\DIFaddend ), effectively providing an indirect implementation of the format for cwltool.
Indeed, the CWLProv model provided a basis for the Provenance Run Crate profile, and the implementation of a conversion tool in runcrate at times drove the improvement and extension of the profile as new requirements or gaps in the old designs emerged.
Runcrate converts both the retrospective provenance part of the CWLProv RO (the RDF graph of the workflow's execution) and the prospective provenance part (the CWL files, including the workflow itself).
Both parts are thus converted into a single, \DIFdelbegin \DIFdel{workflow language-agnostic }\DIFdelend \DIFaddbegin \DIFadd{workflow-language-agnostic }\DIFaddend metadata resource.
Another functionality offered by the runcrate package is \DIFdelbegin \emph{\DIFdel{runcrate report}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \texttt{\DIFadd{runcrate report}}\DIFadd{, }\DIFaddend which reports on the various executions described in an input RO-Crate, listing their starting and ending times, the values of the various parameters, etc.
Runcrate report demonstrates how the provenance profiles presented in this work enable comparison of runs interoperably across different workflow languages or different implementations of the same language.
This functionality has also been used as a lightweight validator for the various implementations.
\DIFdelbegin \DIFdel{We also added a }\emph{\DIFdel{run}} %DIFAUXCMD
\DIFdelend \DIFaddbegin \DIFadd{Runcrate also includes a }\texttt{\DIFadd{run}} \DIFaddend subcommand to re-execute the computation described by an input Workflow Run Crate or Provenance Run Crate where CWL \DIFdelbegin \DIFdel{was }\DIFdelend \DIFaddbegin \DIFadd{is }\DIFaddend used as a workflow language.
It works by mapping the RO-Crate description of input parameters and their values (the workflow's