<!DOCTYPE html>
<html xmlns:Ownership="http://www.w3.org/1999/xhtml">
<head>
<meta charset='utf-8'>
<script src='https://www.w3.org/Tools/respec/respec-w3c' async class='remove'></script>
<script class='remove'>
// All config options at https://respec.org/docs/
var respecConfig = {
// Working Groups ids at https://respec.org/w3c/groups/
// group: "Semantic Data Products Working Group",
latestVersion: "https://ekgf.github.io/data-product-spec/dprod" ,
specStatus: "base",
editors: [
{
name: "Tony Seale",
url: "https://www.linkedin.com/in/tonyseale/",
},
{
name: "Natasa Varytimou",
url: "https://www.linkedin.com/in/natasavaritimou/",
},
{
name: "Andrea Gioia",
url: "https://www.linkedin.com/in/andreagioia/",
}
],
github: {
branch: "main",
repoURL: "https://github.com/EKGF/data-product",
},
//license: "w3c-software-doc",
logos: [ {
src: "./dprod.jpg",
alt: "DPROD",
width: 200,
},
],
};
</script>
</head>
<body>
<h1 id="title">Data Product Vocabulary (DPROD)</h1>
<section id='abstract'>
<p>
The <a href="https://ekgf.github.io/data-product-spec/dprod.ttl">Data Product (DPROD)</a> specification is a profile of the <a href="https://www.w3.org/TR/vocab-dcat-3/">Data Catalog (DCAT) Vocabulary</a>, designed to describe Data Products. This document defines the schema and provides examples for its use.
</p>
<p>
DPROD extends DCAT to enable publishers to describe Data Products and data services in a decentralized way. By using a standard model and vocabulary, DPROD facilitates the consumption and aggregation of metadata from multiple Data Marketplaces. This approach increases the discoverability of products and services, supports decentralized data publishing, and enables federated search across multiple sites using a uniform query mechanism and structure.
</p>
<p>
The namespace for DPROD terms is <a href="https://ekgf.github.io/data-product-spec/dprod">https://ekgf.github.io/data-product-spec/dprod</a>
</p>
<p>
The suggested prefix for the DPROD namespace is <code>dprod</code>
</p>
<br>
<p><strong>DPROD follows two basic principles:</strong></p>
<p>
🔵 Decentralize Data Ownership: To make data integration more efficient, tasks should be shared among multiple teams. DCAT helps by offering a standard way to publish datasets in a decentralized manner.
</p>
<p>
🔵 Harmonize Data Schemas: Using shared schemas helps unify different data formats. For instance, the <a href="https://ekgf.github.io/data-product-spec/dprod.ttl">DPROD</a> specification provides a common set of rules for defining a <a href="#dataproduct">Data Product</a>. You can extend this schema as needed.
</p>
<br>
<p>
The DPROD specification builds on DCAT by connecting <a href="https://www.w3.org/TR/vocab-dcat-3/#Class:Data_Service">DCAT Data Services</a> to DPROD <a href="#dataproduct">Data Products</a> using <a href="#inputport">input</a> and <a href="#outputport">output</a> ports. These ports are used to publish and consume data from a Data Product. DPROD treats ports as <a href="https://www.w3.org/TR/vocab-dcat-3/#Class:Data_Service">DCAT Data Services</a>, so the data exchanged can be described using DCAT's highly expressive metadata around <a href="https://www.w3.org/TR/vocab-dcat-3/#Class:Distribution">distributions</a> and <a href="https://www.w3.org/TR/vocab-dcat-3/#Class:Dataset">datasets</a>.
This approach also allows you to create your own descriptions for the data you are sharing. You can use the <a href="https://www.w3.org/TR/vocab-dcat-3/#Property:resource_conforms_to">conformsTo</a> property from DCAT to link to your own set of rules or guidelines for your data.
</p>
<br>
<p><strong>The DPROD specification has four main aims:</strong> </p>
<p>
🔵 To provide unambiguous and sharable semantics to answer the question: 'What is a <a href="#dataproduct">data product</a>?'
</p>
<p>
🔵 To be simple for anyone to use, but expressive enough to power large data marketplaces
</p>
<p>
🔵 To allow organisations to reuse their existing data catalogues and dataset infrastructure
</p>
<p>
🔵 To share common semantics across different Data Products and promote harmonisation
</p>
</section>
<section id="sotd" class="override">
<h2>Status of this document</h2>
<p>The current version is DRAFT. Feedback and comments are welcome via the GitHub Issues feature.</p>
</section>
<section id='conformance' class="override">
<h2>Conformance</h2>
<p>As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this Profile are non-normative.
Everything else in this Profile is normative.</p>
<p>The keywords MAY, MUST, MUST NOT, RECOMMENDED, SHOULD, and SHOULD NOT are to be interpreted as described in [[!RFC2119]].</p>
</section>
<section>
<h3>Normative namespaces</h3>
<p>Namespaces and prefixes used in normative parts of this Profile are shown in the following table.</p>
<table id="table-namespaces" class="simple">
<thead>
<tr>
<th>Prefix</th>
<th>Namespace IRI</th>
<th>Source</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<code>dcat</code>
</td>
<td>
<code>http://www.w3.org/ns/dcat#</code>
</td>
<td>[[VOCAB-DCAT-3]]</td>
</tr>
<tr>
<td>
<code>dct</code>
</td>
<td>
<code>http://purl.org/dc/terms/</code>
</td>
<td>[[DCTERMS]]</td>
</tr>
<tr>
<td>
<code>odrl</code>
</td>
<td>
<code>http://www.w3.org/ns/odrl/2/</code>
</td>
<td>[[ODRL-VOCAB]]</td>
</tr>
<tr>
<td>
<code>sdo</code>
</td>
<td>
<code>https://schema.org/</code>
</td>
<td>[[SCHEMA-ORG]]</td>
</tr>
</tbody>
</table>
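<p>
As an illustration of how these prefixes are used, the following minimal Python sketch expands a compact name such as <code>dcat:DataService</code> into its full IRI. The <code>expand_curie</code> helper is hypothetical and not part of DPROD; only three of the prefixes from the table are included.
</p>

```python
# Prefixes copied from the namespace table above (sdo omitted for brevity).
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "odrl": "http://www.w3.org/ns/odrl/2/",
}


def expand_curie(curie: str) -> str:
    """Expand a compact name like 'dcat:Dataset' into a full IRI.

    Hypothetical helper for illustration only; an unknown prefix
    raises KeyError.
    """
    prefix, _, local_name = curie.partition(":")
    return PREFIXES[prefix] + local_name


print(expand_curie("dcat:DataService"))
print(expand_curie("dct:conformsTo"))
```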
</section>
<section>
<h2>Data Product (DPROD) Model</h2>
<p>
<a href="https://en.wikipedia.org/wiki/Data_mesh">Data Mesh Architectures</a> use input and output ports to manage how data enters and leaves a Data Product. These ports can handle different formats, schemas, and protocols. Input ports bring in data, while output ports send data to other Data Products for aggregation, reuse, analysis or reporting etc.
</p>
<p>
In the Data Catalog Vocabulary (DCAT) framework, a <a href="https://www.w3.org/TR/vocab-dcat-3/#Class:Data_Service">Data Service</a> is a way to describe services that provide access to data. Data Services give standardized, machine-readable descriptions of how to access one or more datasets or data processing functions.
</p>
<p>
Data Services specify how to access and download the data. In DPROD, Data Services are connected to <a href="https://www.w3.org/TR/vocab-dcat-3/#Class:Distribution">Distributions</a> by a property called <a href="#isaccessserviceof">isAccessServiceOf</a>. On the Distribution you can specify formats (such as CSV or JSON) and provide metadata about the "physical model" of the data. Distributions link to <a href="https://www.w3.org/TR/vocab-dcat-3/#Class:Dataset">Datasets</a>, and DCAT has a very rich vocabulary for describing every aspect of your dataset. Finally, Datasets use the <a href="https://www.w3.org/TR/vocab-dcat-3/#Property:resource_conforms_to">conformsTo</a> property to link to the "logical model", where you can specify rich semantic metadata of your own.
</p>
<p>
By linking Data Product ports to DCAT DataServices, DPROD can describe Data Products in a way that machines can read across the organization. This makes it easier for data teams to build and manage their own data products independently, while still working well with the rest of the organization's data.
</p>
<p>
Using standards like DCAT helps create a strong and clear way to define Data Products. It ensures that as data becomes more complex, the methods for describing, sharing, and using data stay consistent and reliable. It also allows different organizations to share data securely and in a standardized way.
</p>
<figure id="ProfileModel">
<img alt="Information model for the Profile" src="./dprod-model.png">
<figcaption>
Overview of DPROD model and its relationship with DCAT classes
</figcaption>
</figure>
</section>
<p>
The Profile consists of the following classes:
</p>
<ul>
<li> Data Mesh (<code>dcat:Catalog</code>) - The collection of Data Products </li>
<li> Data Product (<code>dprod:DataProduct</code>) - A data product may have input and output ports, code and metadata</li>
<li> Port (<code>dcat:DataService</code>) - A digital interface that provides access to a Dataset. This can be an HTTP URL, a database, a file share, etc.</li>
<li> Distribution (<code>dcat:Distribution</code>) - A specific representation of a dataset (CSV, JSON, ADLS etc) which can conform to a physical model</li>
<li> Dataset (<code>dcat:Dataset</code>) - A collection of related data that can conform to a logical model</li>
</ul>
<p>
As <a href="https://www.w3.org/TR/vocab-dcat-3/#Class:Data_Service">DCAT Data Services</a>, the DPROD <a href="#inputport">input</a> and <a href="#outputport">output</a> ports can specify connection details. They have <a href="https://www.w3.org/TR/vocab-dcat-3/#Class:Distribution">distributions</a> that define formats, and these link to <a href="https://www.w3.org/TR/vocab-dcat-3/#Class:Dataset">datasets</a> that conform to shared schemas. In this example, the UK Bonds Data Product includes an output port, which is a RESTful API. This API delivers JSON data conforming to the shared FIBO specification for callable bonds.
</p>
<pre id="eg12" class="example hljs json">
{
  "@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
  "id": "https://y.com/products/uk-bonds",
  "type": "DataProduct",
  "title": "UK Bonds",
  "description": "UK Bonds is your one-stop-shop for all your bonds!",
  "dataProductOwner": "https://www.linkedin.com/in/tonyseale/",
  "lifecycleStatus": "https://ekgf.github.io/data-product-spec/dprod/data/lifecycle-status/Consume",
  "outputPort": {
    "type": "DataService",
    "endpointURL": "https://y.com/uk-10-year-bonds",
    "isAccessServiceOf": {
      "type": "Distribution",
      "format": "https://www.iana.org/assignments/media-types/application/json",
      "isDistributionOf": {
        "type": "Dataset",
        "id": "https://y.com/products/uk-bonds/datasets/10-year",
        "conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/Bonds/CallableBond"
      }
    }
  }
}
</pre>
<p class="note">
The examples in this document map the type of the above classes to <code>@type</code> in the JSON-LD serialisations. You can use JSON-LD to extend the familiar JSON syntax with the shared semantics defined by DCAT and DPROD.
<br/>
You can copy the JSON above and paste it into <a href="https://json-ld.org/playground/">https://json-ld.org/playground</a> to check that the schema resolves.
</p>
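<p>
To show how a consumer might read the UK Bonds example, here is a minimal Python sketch that treats the JSON as a plain dictionary and follows <code>outputPort</code>, <code>isAccessServiceOf</code> and <code>isDistributionOf</code> to collect the endpoint, format, dataset and logical model. The <code>describe_output_port</code> function is a hypothetical helper, and the sketch assumes a single, fully nested output port as in the example; it does no JSON-LD processing.
</p>

```python
import json

# Trimmed copy of the UK Bonds example above.
UK_BONDS = json.loads("""
{
  "id": "https://y.com/products/uk-bonds",
  "type": "DataProduct",
  "outputPort": {
    "type": "DataService",
    "endpointURL": "https://y.com/uk-10-year-bonds",
    "isAccessServiceOf": {
      "type": "Distribution",
      "format": "https://www.iana.org/assignments/media-types/application/json",
      "isDistributionOf": {
        "type": "Dataset",
        "id": "https://y.com/products/uk-bonds/datasets/10-year",
        "conformsTo": "https://spec.edmcouncil.org/fibo/ontology/SEC/Debt/Bonds/CallableBond"
      }
    }
  }
}
""")


def describe_output_port(product: dict) -> dict:
    """Follow outputPort -> isAccessServiceOf -> isDistributionOf.

    Hypothetical helper: assumes one nested output port, as in the example.
    """
    port = product["outputPort"]
    distribution = port["isAccessServiceOf"]
    dataset = distribution["isDistributionOf"]
    return {
        "endpoint": port["endpointURL"],          # where to call the service
        "format": distribution["format"],         # physical representation
        "dataset": dataset["id"],                 # what data is served
        "logical_model": dataset["conformsTo"],   # shared schema it follows
    }


summary = describe_output_port(UK_BONDS)
for key, value in summary.items():
    print(f"{key}: {value}")
```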
</section>
<section>
<h2>DataProduct</h2>
A data product is a rational, managed, and governed collection of data, with purpose, value and ownership, meeting consumer needs over a planned life-cycle.
<section>
<h2>label</h2>
The name given to the Data Product
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="http://www.w3.org/2000/01/rdf-schema#label">rdfs:label</a></td></tr>
<tr><th>Domain:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/DataProduct">dprod:DataProduct</a></td></tr>
<tr><th>Range:</th><td><a href="http://www.w3.org/2001/XMLSchema#string">xsd:string</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>description</h2>
A free text description of the Data Product
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="http://purl.org/dc/terms/description">dcterms:description</a></td></tr>
<tr><th>Domain:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/DataProduct">dprod:DataProduct</a></td></tr>
<tr><th>Range:</th><td><a href="http://www.w3.org/2001/XMLSchema#string">xsd:string</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>dataProductOwner</h2>
The Agent that is accountable overall for the data product. This includes managing the data product throughout its lifecycle (creation, usage, versioning, deletion).
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/dataProductOwner">dprod:dataProductOwner</a></td></tr>
<tr><th>Label:</th><td>dataProductOwner</td></tr>
<tr><th>Domain:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/DataProduct">dprod:DataProduct</a></td></tr>
<tr><th>Range:</th><td><a href="http://xmlns.com/foaf/0.1/Agent">foaf:Agent</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>domain</h2>
The business or information area supported by the data product.
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/domain">dprod:domain</a></td></tr>
<tr><th>Comment:</th><td>The domain is intended to be a resource in its own right. This specification does not constrain the class to be used.</td></tr>
<tr><th>Domain:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/DataProduct">dprod:DataProduct</a></td></tr>
<tr><th>Range:</th><td><a href=""></a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>inputPort</h2>
An input port describes a set of services exposed by a data product to collect its source data and make it available for further internal transformation. An input port can receive data from one or more upstream sources in push mode (i.e. asynchronous subscription) or pull mode (i.e. synchronous query). Each data product may have one or more input ports.
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/inputPort">dprod:inputPort</a></td></tr>
<tr><th>Label:</th><td>inputPort</td></tr>
<tr><th>Domain:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/DataProduct">dprod:DataProduct</a></td></tr>
<tr><th>Range:</th><td><a href="http://www.w3.org/ns/dcat#DataService">dcat:DataService</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>outputPort</h2>
An output port describes a set of services exposed by a data product to share the generated data in a way that can be understood and trusted.
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/outputPort">dprod:outputPort</a></td></tr>
<tr><th>Label:</th><td>outputPort</td></tr>
<tr><th>Domain:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/DataProduct">dprod:DataProduct</a></td></tr>
<tr><th>Range:</th><td><a href="http://www.w3.org/ns/dcat#DataService">dcat:DataService</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>inputDataset</h2>
The source data made available to the data product through input data services. Depending on the lifecycle of the data product, this may be a stated or inferred relationship aligned with the input ports.
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/inputDataset">dprod:inputDataset</a></td></tr>
<tr><th>Label:</th><td>input Dataset</td></tr>
<tr><th>Domain:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/DataProduct">dprod:DataProduct</a></td></tr>
<tr><th>Range:</th><td><a href="http://www.w3.org/ns/dcat#Dataset">dcat:Dataset</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>outputDataset</h2>
The data that is exposed by the data product through output data services in a way that can be understood and trusted. Depending on the lifecycle of the data product, this may be a stated or inferred relationship aligned with the output ports.
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/outputDataset">dprod:outputDataset</a></td></tr>
<tr><th>Label:</th><td>output Dataset</td></tr>
<tr><th>Domain:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/DataProduct">dprod:DataProduct</a></td></tr>
<tr><th>Range:</th><td><a href="http://www.w3.org/ns/dcat#Dataset">dcat:Dataset</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>purpose</h2>
A description of the objectives and intended usage of the data product.
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/purpose">dprod:purpose</a></td></tr>
<tr><th>Domain:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/DataProduct">dprod:DataProduct</a></td></tr>
<tr><th>Range:</th><td><a href="http://www.w3.org/2001/XMLSchema#string">xsd:string</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>hasPolicy</h2>
An ODRL conformant policy expressing the rights associated with the data product. This is an inferred relationship based on the rights expressed on the individual datasets of the data product
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="http://www.w3.org/ns/odrl/2/hasPolicy">odrl:hasPolicy</a></td></tr>
<tr><th>Domain:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/DataProduct">dprod:DataProduct</a></td></tr>
<tr><th>Range:</th><td><a href="http://www.w3.org/ns/odrl/2/Policy">odrl:Policy</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>lifecycleStatus</h2>
The lifecycle status of the Data Product, taken from a controlled list (Ideation, Design, Build, Deploy, Consume).
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/lifecycleStatus">dprod:lifecycleStatus</a></td></tr>
<tr><th>Label:</th><td>lifecycleStatus</td></tr>
<tr><th>Domain:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/DataProduct">dprod:DataProduct</a></td></tr>
<tr><th>Range:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/DataProductLifecycleStatus">dprod:DataProductLifecycleStatus</a></td></tr>
</tbody>
</table>
</section>
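<p>
As a small illustration of the lifecycle status list, the following Python sketch checks that a <code>lifecycleStatus</code> value names one of the five listed statuses. The IRI prefix is an assumption inferred from the UK Bonds example earlier in this document, and <code>lifecycle_status_name</code> is a hypothetical helper rather than anything defined by DPROD.
</p>

```python
# Status names taken from the list in the lifecycleStatus definition.
LIFECYCLE_STATUSES = ("Ideation", "Design", "Build", "Deploy", "Consume")

# Assumed IRI prefix, inferred from the UK Bonds example in this document.
STATUS_BASE = (
    "https://ekgf.github.io/data-product-spec/dprod/data/lifecycle-status/"
)


def lifecycle_status_name(status_iri: str) -> str:
    """Return the status name for an IRI, or raise ValueError if unknown."""
    if not status_iri.startswith(STATUS_BASE):
        raise ValueError(f"not a lifecycle status IRI: {status_iri}")
    name = status_iri[len(STATUS_BASE):]
    if name not in LIFECYCLE_STATUSES:
        raise ValueError(f"unknown lifecycle status: {name}")
    return name


print(lifecycle_status_name(STATUS_BASE + "Consume"))
```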
</section>
<section>
<h2>DataService</h2>
A collection of operations that provides access to one or more datasets or data processing functions.
<section>
<h2>isAccessServiceOf</h2>
The dataset distribution that is being offered through this Data Service
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/isAccessServiceOf">dprod:isAccessServiceOf</a></td></tr>
<tr><th>Label:</th><td>is Access Service Of</td></tr>
<tr><th>Domain:</th><td><a href="http://www.w3.org/ns/dcat#DataService">dcat:DataService</a></td></tr>
<tr><th>Range:</th><td><a href="http://www.w3.org/ns/dcat#Distribution">dcat:Distribution</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>protocol</h2>
A protocol (possibly one of many options) used to communicate with this Data Service
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/protocol">dprod:protocol</a></td></tr>
<tr><th>Domain:</th><td><a href="http://www.w3.org/ns/dcat#DataService">dcat:DataService</a></td></tr>
<tr><th>Range:</th><td><a href="http://www.w3.org/ns/dcat#Protocol">dcat:Protocol</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>securitySchemaType</h2>
The security schema type used for authentication and communication with this Data Service
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/securitySchemaType">dprod:securitySchemaType</a></td></tr>
<tr><th>Domain:</th><td><a href="http://www.w3.org/ns/dcat#DataService">dcat:DataService</a></td></tr>
<tr><th>Range:</th><td><a href="http://www.w3.org/ns/dcat#SecuritySchemaType">dcat:SecuritySchemaType</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>endpointURL</h2>
The root location or primary endpoint of the service
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="http://www.w3.org/ns/dcat#endpointURL">dcat:endpointURL</a></td></tr>
<tr><th>Domain:</th><td><a href="http://www.w3.org/ns/dcat#DataService">dcat:DataService</a></td></tr>
<tr><th>Range:</th><td><a href=""></a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>endpointDescription</h2>
A description of the services available via the end-points, including their operations, parameters etc
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="http://www.w3.org/ns/dcat#endpointDescription">dcat:endpointDescription</a></td></tr>
<tr><th>Domain:</th><td><a href="http://www.w3.org/ns/dcat#DataService">dcat:DataService</a></td></tr>
<tr><th>Range:</th><td><a href=""></a></td></tr>
</tbody>
</table>
</section>
</section>
<section>
<h2>Distribution</h2>
A specific representation of a dataset. A dataset might be available in multiple serializations that may differ in various ways, including natural language, media-type or format, schematic organization, temporal and spatial resolution, level of detail or profiles (which might specify any or all of the above).
<section>
<h2>accessService</h2>
A data service that gives access to the distribution of the dataset
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="http://www.w3.org/ns/dcat#accessService">dcat:accessService</a></td></tr>
<tr><th>Domain:</th><td><a href="http://www.w3.org/ns/dcat#Distribution">dcat:Distribution</a></td></tr>
<tr><th>Range:</th><td><a href="http://www.w3.org/ns/dcat#DataService">dcat:DataService</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>conformsTo</h2>
The schema that the distribution conforms to that is format and technology dependent
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="http://purl.org/dc/terms/conformsTo">dcterms:conformsTo</a></td></tr>
<tr><th>Domain:</th><td><a href="http://www.w3.org/ns/dcat#Distribution">dcat:Distribution</a></td></tr>
<tr><th>Range:</th><td><a href=""></a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>isDistributionOf</h2>
The dataset that this distribution makes available
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/isDistributionOf">dprod:isDistributionOf</a></td></tr>
<tr><th>Label:</th><td>isDistributionOf</td></tr>
<tr><th>Domain:</th><td><a href="http://www.w3.org/ns/dcat#Distribution">dcat:Distribution</a></td></tr>
<tr><th>Range:</th><td><a href="http://www.w3.org/ns/dcat#Dataset">dcat:Dataset</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>format</h2>
The file format of the distribution
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="http://purl.org/dc/terms/format">dcterms:format</a></td></tr>
<tr><th>Domain:</th><td><a href="http://www.w3.org/ns/dcat#Distribution">dcat:Distribution</a></td></tr>
<tr><th>Range:</th><td><a href=""></a></td></tr>
</tbody>
</table>
</section>
</section>
<section>
<h2>Dataset</h2>
A collection of data, published or curated by a single agent, and available for access or download in one or more representations.
<section>
<h2>label</h2>
The name given to the dataset
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="http://www.w3.org/2000/01/rdf-schema#label">rdfs:label</a></td></tr>
<tr><th>Domain:</th><td><a href="http://www.w3.org/ns/dcat#Dataset">dcat:Dataset</a></td></tr>
<tr><th>Range:</th><td><a href="http://www.w3.org/2001/XMLSchema#string">xsd:string</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>description</h2>
Free text description of the dataset
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="http://purl.org/dc/terms/description">dcterms:description</a></td></tr>
<tr><th>Domain:</th><td><a href="http://www.w3.org/ns/dcat#Dataset">dcat:Dataset</a></td></tr>
<tr><th>Range:</th><td><a href="http://www.w3.org/2001/XMLSchema#string">xsd:string</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>type</h2>
The type or genre of the Dataset
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="http://purl.org/dc/terms/type">dcterms:type</a></td></tr>
<tr><th>Domain:</th><td><a href="http://www.w3.org/ns/dcat#Dataset">dcat:Dataset</a></td></tr>
<tr><th>Range:</th><td><a href=""></a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>distribution</h2>
An available distribution of the dataset
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="http://www.w3.org/ns/dcat#distribution">dcat:distribution</a></td></tr>
<tr><th>Domain:</th><td><a href="http://www.w3.org/ns/dcat#Dataset">dcat:Dataset</a></td></tr>
<tr><th>Range:</th><td><a href="http://www.w3.org/ns/dcat#Distribution">dcat:Distribution</a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>conformsTo</h2>
A model, schema, ontology, view or profile that the dataset conforms to
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="http://purl.org/dc/terms/conformsTo">dcterms:conformsTo</a></td></tr>
<tr><th>Domain:</th><td><a href="http://www.w3.org/ns/dcat#Dataset">dcat:Dataset</a></td></tr>
<tr><th>Range:</th><td><a href=""></a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>hasPolicy</h2>
An ODRL conformant policy expressing the rights associated with the resource
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="http://www.w3.org/ns/odrl/2/hasPolicy">odrl:hasPolicy</a></td></tr>
<tr><th>Domain:</th><td><a href="http://www.w3.org/ns/dcat#Dataset">dcat:Dataset</a></td></tr>
<tr><th>Range:</th><td><a href=""></a></td></tr>
</tbody>
</table>
</section>
<section>
<h2>informationSensitivityClassification</h2>
The relationship to a taxonomy that defines the different levels of control and protection that must be applied to the dataset. This is a more granular relationship of the classification of a dataset that includes other classification concepts
<table class="def propdef">
<tbody>
<tr><th>Identifier:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/informationSensitivityClassification">dprod:informationSensitivityClassification</a></td></tr>
<tr><th>Label:</th><td>information Sensitivity Classification</td></tr>
<tr><th>Domain:</th><td><a href="http://www.w3.org/ns/dcat#Dataset">dcat:Dataset</a></td></tr>
<tr><th>Range:</th><td><a href="https://ekgf.github.io/data-product-spec/dprod/InformationSensitivityClassification">dprod:InformationSensitivityClassification</a></td></tr>
</tbody>
</table>
</section>
</section>
<section>
<h2>SecuritySchemaType</h2>
A security schema type used for authentication and communication.
</section>
<section>
<h2>InformationSensitivityClassification</h2>
The shape of Information Sensitivity Classification as defined in the dprod schema
</section>
<section>
<h2>DataProductLifecycleStatus</h2>
The shape of Data Product Lifecycle Status
</section>
<section>
<h2>Protocol</h2>
A protocol, possibly including a specific version, used for communicating with a service
</section>
<section class="Examples">
<h2>Worked Examples</h2>
<p>Here are some worked examples of how to use DPROD for some common use cases</p>
<section>
<h2>Core Data Product Extensions</h2>
<p>For real world data products, the core data product details will be part of a wider set of metadata that allows the data and data product to be used effectively. Below is an example of extending the DPROD data product, specifically by adding an agreement to a data product.</p>
<p>In this example, a Data Product Agreement is defined as a subclass of FIBO Agreement. </p>
<p><em>Definition of a simple Agreement based on FIBO:</em></p>
<pre>
<code>
[
  {
    "@context": [
      "https://ekgf.github.io/data-product-spec/dprod.jsonld",
      {
        "fibo": "http://spec.edmcouncil.org/fibo/ontology/FND/Agreements/MetadataFNDAgreements/#",
        "ex": "http://example.org/dp#"
      }
    ],
    "@id": "ex:isSubjectToAgreement",
    "@type": "rdf:Property",
    "rdfs:label": "Data Product is Subject To FIBO Agreement",
    "rdfs:domain": {
      "@id": "DataProduct"
    },
    "rdfs:range": {
      "@id": "ex:DataProductAgreement"
    }
  },
  {
    "@id": "ex:DataProductAgreement",
    "@type": "rdfs:Class",
    "rdfs:label": "DataProductAgreement",
    "rdfs:subClassOf": {
      "@id": "fibo:Agreement"
    }
  }
]
</code>
</pre>
<p>A full definition of agreements for data products is likely to be more complex than a single class and may use other information models or their profiles (such as ODRL Policy) or create dedicated definitions.</p>
<p>Below is an example of a Data Product with an associated Data Product Agreement with an effective date.</p>
<p><em>Using the agreement:</em></p>
<pre>
<code>
{
  "@context": [
    "https://ekgf.github.io/data-product-spec/dprod.jsonld",
    {
      "fibo": "http://spec.edmcouncil.org/fibo/ontology/FND/Agreements/MetadataFNDAgreements/#",
      "ex": "http://example.org/dp#"
    }
  ],
  "dataProducts": [
    {
      "id": "https://y.com/data-product/company-sales",
      "type": "DataProduct",
      "outputPort": {
        "id": "https://y.com/data-product/company-sales/port/2025-sales",
        "type": "DataService",
        "label": "Sales",
        "endpointURL": "https://y.com/data-product/company-sales/port/2025-sales",
        "isAccessServiceOf": {
          "type": "Distribution",
          "format": "https://www.iana.org/assignments/media-types/application/json",
          "isDistributionOf": {
            "type": "Dataset",
            "label": "Sales",
            "id": "https://y.com/data-product/company-sales/dataset/2025-sales",
            "conformsTo": "https://y.com/schema/Sale"
          }
        }
      },
      "ex:isSubjectToAgreement": {
        "@id": "ex:VVSimpleAgreement",
        "@type": "ex:DataProductAgreement"
      }
    }
  ],
  "agreements": [
    {
      "@id": "ex:VVSimpleAgreement",
      "@type": "ex:DataProductAgreement",
      "rdfs:label": "Very Simple Data Product Agreement",
      "fibo:hasEffectiveDate": {
        "@type": "xsd:date",
        "@value": "2024-08-31"
      }
    }
  ]
}
</code>
</pre>
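<p>As a rough illustration of how a consumer might use this metadata, the Python sketch below checks whether an agreement is in effect on a given date. It assumes the <code>agreements</code> array has the shape shown above; the function name <code>is_in_effect</code> is hypothetical and not part of DPROD or FIBO.</p>

```python
from datetime import date

# Abbreviated copy of the "agreements" array from the example above.
agreements = [
    {
        "@id": "ex:VVSimpleAgreement",
        "@type": "ex:DataProductAgreement",
        "fibo:hasEffectiveDate": {"@type": "xsd:date", "@value": "2024-08-31"},
    }
]


def is_in_effect(agreement_id, on_date, agreements):
    """Hypothetical helper: True if the agreement's effective date has been reached."""
    for agreement in agreements:
        if agreement["@id"] == agreement_id:
            # xsd:date values use the ISO 8601 calendar date format.
            effective = date.fromisoformat(agreement["fibo:hasEffectiveDate"]["@value"])
            return on_date >= effective
    raise KeyError(agreement_id)


print(is_in_effect("ex:VVSimpleAgreement", date(2024, 9, 1), agreements))
```

<p>A real agreement model would usually carry more than an effective date (for example an end date or parties), so this check is only a starting point.</p>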
</section>
<section>
<h2>Data Lineage</h2>
<p>It is important to be able to trace the lineage of data. Within DPROD, this can be done in two ways: at a high level from one data product to another and, if you want, at a more detailed level of the underlying datasets.</p>
<h3>High Level Lineage: Between Data Products</h3>
<p>Data products have input and output ports, and one data product’s input port will point to another data product’s output port.</p>
<p>This allows a user to query the lineage. Data products are identified by URLs and are connected to each other through their properties, so you can walk from one data product to the upstream data products that feed it.</p>
<p>You can follow the path that leads from one data product to another like this:</p>
<pre>
<code>
Data Product >> inputPort >> isAccessServiceOf >> isDistributionOf >> Input Data Product
</code>
</pre>
<p>Let's look at some example data with three data products that connect to each other through their input and output ports: </p>
<pre>
<code>
{
"@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
"dataProducts": [
{
"id": "https://y.com/data-product/company-finance",
"type": "DataProduct",
"inputPort": [
{
"id": "https://y.com/data-product/company-sales/port/2025-sales",
"type": "DataService"
},
{
"id": "https://y.com/data-product/company-hr/port/2025-payroll",
"type": "DataService"
}
],
"outputPort": {
"id": "https://y.com/data-product/company-sales/port/2025-balance-sheet",
"type": "DataService",
"label": "Balance Sheet",
"endpointURL": "https://y.com/data-product/company-sales/port/2025-c",
"isAccessServiceOf": {
"type": "Distribution",
"format": "https://www.iana.org/assignments/media-types/application/json",
"isDistributionOf": {
"type": "Dataset",
"id": "https://y.com/data-product/company-sales/dataset/2025-balance-sheet",
"conformsTo": "https://y.com/schema/BalanceSheet"
}
}
}
},
{
"id": "https://y.com/data-product/company-sales",
"type": "DataProduct",
"outputPort": {
"id": "https://y.com/data-product/company-sales/port/2025-sales",
"type": "DataService",
"label": "Sales",
"endpointURL": "https://y.com/data-product/company-sales/port/2025-sales",
"isAccessServiceOf": {
"type": "Distribution",
"format": "https://www.iana.org/assignments/media-types/application/json",
"isDistributionOf": {
"type": "Dataset",
"label": "Sales",
"id": "https://y.com/data-product/company-sales/dataset/2025-sales",
"conformsTo": "https://y.com/schema/Sale"
}
}
}
},
{
"id": "https://y.com/data-product/company-hr",
"type": "DataProduct",
"outputPort": {
"id": "https://y.com/data-product/company-hr/port/2025-payroll",
"type": "DataService",
"label": "Payroll",
"endpointURL": "https://y.com/data-product/company-hr/port/2025-payroll",
"isAccessServiceOf": {
"type": "Distribution",
"format": "https://www.iana.org/assignments/media-types/text/csv",
"isDistributionOf": {
"type": "Dataset",
"label": "Payroll",
"id": "https://y.com/data-product/company-hr/dataset/2025-payroll",
"conformsTo": "https://y.com/schema/Payroll"
}
}
}
}
]
}
</code>
</pre>
<p>Given this example data, if we started at the data product <code>https://y.com/data-product/company-finance</code>, we could walk the relationships to find the input data products that feed it:</p>
<pre>
<code>
https://y.com/data-product/company-finance >> :inputPort >> :isAccessServiceOf >> :isDistributionOf >> [https://y.com/data-product/company-sales , https://y.com/data-product/company-hr]
</code>
</pre>
<p>In Linked Data, this walk is expressed as a SPARQL query. The query below follows each input port to its dataset and returns the dataset's label:</p>
<pre>
<code>
PREFIX dcat: &lt;http://www.w3.org/ns/dcat#&gt;
PREFIX dprod: &lt;https://ekgf.github.io/data-product-spec/dprod/&gt;
PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX : &lt;https://y.com/data-product/&gt;
SELECT DISTINCT ?input
WHERE
{
  :company-finance dprod:inputPort ?inputPort .
  ?inputPort dprod:isAccessServiceOf/dprod:isDistributionOf/rdfs:label ?input .
}
</code>
</pre>
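<p>The same high-level walk can also be sketched outside a triple store. The minimal Python example below works on an abbreviated copy of the example data (only the <code>id</code>, <code>inputPort</code> and <code>outputPort</code> keys are kept) and assumes that each input port identifier equals the output port identifier of the data product that exposes it. The helper names <code>as_list</code> and <code>upstream_products</code> are illustrative and not part of DPROD.</p>

```python
# Abbreviated copy of the example data: only ids and ports are kept.
BASE = "https://y.com/data-product/"

data_products = [
    {
        "id": BASE + "company-finance",
        "inputPort": [
            {"id": BASE + "company-sales/port/2025-sales"},
            {"id": BASE + "company-hr/port/2025-payroll"},
        ],
    },
    {
        "id": BASE + "company-sales",
        "outputPort": {"id": BASE + "company-sales/port/2025-sales"},
    },
    {
        "id": BASE + "company-hr",
        "outputPort": {"id": BASE + "company-hr/port/2025-payroll"},
    },
]


def as_list(value):
    """A JSON-LD value may be a single object or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def upstream_products(product_id, products):
    """Return ids of products whose output port is an input port of product_id."""
    # Index every output port id by the product that exposes it.
    owner_of_port = {}
    for product in products:
        for port in as_list(product.get("outputPort")):
            owner_of_port[port["id"]] = product["id"]

    target = next(p for p in products if p["id"] == product_id)
    found = []
    for port in as_list(target.get("inputPort")):
        owner = owner_of_port.get(port["id"])
        if owner is not None and owner not in found:
            found.append(owner)
    return found


print(upstream_products(BASE + "company-finance", data_products))
```

<p>Matching on port identifiers mirrors the idea that one data product's input port points to another data product's output port. Repeating the call on each result would walk further upstream.</p>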
<h3>Detailed Level: Between Datasets</h3>
<p>If you wish to track lineage at a more granular level, you can also use <a href="https://www.w3.org/TR/prov-o/">PROV-O</a> at the dataset level.</p>
<pre>
<code>
dap:atnf-P366-2003SEPT
  rdf:type dcat:Dataset ;
  dcterms:bibliographicCitation "Burgay, M; McLaughlin, M; Kramer, M; Lyne, A; Joshi, B; Pearce, G; D'Amico, N; Possenti, A; Manchester, R; Camilo, F (2017): Parkes observations for project P366 semester 2003SEPT. v1. CSIRO. Data Collection. https://doi.org/10.4225/08/598dc08d07bb7" ;
  dcterms:title "Parkes observations for project P366 semester 2003SEPT"@en ;
  dcat:landingPage &lt;https://data.csiro.au/dap/landingpage?pid=csiro:P366-2003SEPT&gt; ;
  prov:wasGeneratedBy dap:P366 ;
.
dap:P366
  rdf:type prov:Activity ;
  dcterms:type &lt;http://dbpedia.org/resource/Observation&gt; ;
  prov:startedAtTime "2000-11-01"^^xsd:date ;
  prov:used dap:Parkes-radio-telescope ;
  prov:wasInformedBy dap:ATNF ;
  rdfs:label "P366 - Parkes multibeam high-latitude pulsar survey"@en ;
  rdfs:seeAlso &lt;https://doi.org/10.1111/j.1365-2966.2006.10100.x&gt; ;
.
</code>
</pre>
<p>See: <a href="https://www.w3.org/TR/vocab-dcat-3/#examples-dataset-provenance">https://www.w3.org/TR/vocab-dcat-3/#examples-dataset-provenance</a>.</p>
</section>
<section>
<h2>Data Quality</h2>
<ul>
<li>The Data Quality Vocabulary (DQV) is a standard vocabulary for describing data quality. It is proposed alongside the DCAT vocabulary to measure the quality of Datasets.</li>
<li>It is based on three core entities: Dimensions, the Metrics that belong to each Dimension, and the Measurements of each Metric.</li>
<li>Data quality can be computed at different levels, e.g. at the data product, dataset, table or column level.</li>
<li>Usually, data quality metrics are measured at the Dataset level and can inform higher-level quality metrics at the data product level.</li>
</ul>
<pre class="example hljs json">[
{
"@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
"id": "https://y.com/derived-quality-measurementA",
"@type": "QualityMeasurement",
"value": 1,
"computedOn": {
"@type": "DataProduct",
"@id": "https://y.com/products/uk-bonds"
},
"isMeasurementOf": {
"@type": "Metric",
"label": "Number of stale datasets"
}
}
,
{
"@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
"@id": "https://y.com/quality-measurement-B",
"@type": "QualityMeasurement",
"value": "false",
"computedOn": {
"@type": "Dataset",
"@id": "https://y.com/products/uk-bonds/yearlyPrices"
},
"isMeasurementOf": {
"@type": "Metric",
"label": "Expected distribution frequency achieved"
}
}
,
{
"@context": "https://ekgf.github.io/data-product-spec/dprod.jsonld",
"id": "https://y.com/products/uk-bonds",
"type": "DataProduct",
"outputPort": {
"type": "DataService",
"endpointURL": "https://y.com/uk-bonds/quality-report",
"isAccessServiceOf": {
"type": "Distribution",
"isDistributionOf": {
"type": "Dataset",
"conformsTo": "https://www.w3.org/TR/vocab-dqv/#dqv:QualityMeasurement"
}
}
}
}
]
</pre>
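<p>To illustrate how dataset-level measurements can inform a data-product-level metric, the Python sketch below derives a "Number of stale datasets" measurement from "Expected distribution frequency achieved" measurements. It rests on several assumptions that are not part of DPROD or DQV: measurement values are booleans, a dataset counts as stale when its expected frequency was not achieved, each dataset identifier is prefixed by its data product identifier, and the <code>dailyPrices</code> dataset is invented for the example.</p>

```python
# Dataset-level measurements, loosely modelled on the JSON example above.
# Assumptions: boolean values, and dataset ids prefixed by the product id.
# The "dailyPrices" dataset is hypothetical.
FREQUENCY_METRIC = "Expected distribution frequency achieved"

dataset_measurements = [
    {
        "computedOn": "https://y.com/products/uk-bonds/yearlyPrices",
        "metric": FREQUENCY_METRIC,
        "value": False,
    },
    {
        "computedOn": "https://y.com/products/uk-bonds/dailyPrices",
        "metric": FREQUENCY_METRIC,
        "value": True,
    },
]


def count_stale_datasets(product_id, measurements):
    """Derive a product-level measurement from dataset-level measurements."""
    # Collect the distinct datasets of this product that missed their frequency.
    stale = {
        m["computedOn"]
        for m in measurements
        if m["computedOn"].startswith(product_id + "/")
        and m["metric"] == FREQUENCY_METRIC
        and m["value"] is False
    }
    return {
        "@type": "QualityMeasurement",
        "value": len(stale),
        "computedOn": {"@type": "DataProduct", "@id": product_id},
        "isMeasurementOf": {"@type": "Metric", "label": "Number of stale datasets"},
    }


derived = count_stale_datasets("https://y.com/products/uk-bonds", dataset_measurements)
print(derived["value"])
```

<p>The returned dictionary has the same shape as the first measurement in the example above, so it could be serialised alongside the dataset-level measurements.</p>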
</section>
<section>
<h2>Data Rights</h2>
<p><a href="https://www.w3.org/TR/odrl-model/">ODRL</a> is a W3C standard for describing rights and entitlements.</p>
<p>Using ODRL, data product and dataset publishers can describe their policies in a consistent, standard and machine-readable manner. Policies contain permissions and prohibitions on specific actions, which stakeholders are required to respect.</p>
<p>In addition, policies may be limited by constraints (e.g. temporal or geographical constraints), and duties (e.g. payments) may be imposed on the permissions.</p>
<p>Policies and their permitted or prohibited actions can be described at different levels, e.g. a Policy can target a Data Product, a Dataset, a Data Service or even a Column.</p>
<p>Policy engines should interpret and enforce the ODRL policies at the appropriate level, e.g.:</p>
<pre>
<code>
examplePolicyA odrl:target exampleProduct:ProductA .
examplePolicyB odrl:target exampleDataset:DatasetA1 .
</code>
</pre>
<p>The following example Policy grants permission to distribute the data only inside a specific region:</p>
<pre>
<code>