\documentclass[11pt,a4paper]{ivoa}
\input tthdefs
\input gitmeta
\usepackage{todonotes}
\lstloadlanguages{XML,python}
\lstset{flexiblecolumns=true,tagstyle=\ttfamily, showstringspaces=False,
basicstyle=\footnotesize}
\definecolor{termcolor}{rgb}{0.6,0.1,0.1}
\iftth
\def\vocterm#1{\emph{\color{termcolor}#1}}
\else
\def\vocterm{\startvocterm\realvocterm}
\def\realvocterm#1{\emph{\color{termcolor}#1}\endvocterm}
\begingroup
\gdef\breakablecolon{:\hskip0pt}
\catcode`\:=\active
\gdef\startvocterm{\begingroup
\catcode`\:=\active\let:=\breakablecolon}
\gdef\endvocterm{\endgroup}
\endgroup
\fi
\newcommand{\vepitem}[1]{\emph{#1}}
\title{Vocabularies in the VO}
% see ivoatexDoc for what group names to use here
\ivoagroup{Semantics}
\author[https://wiki.ivoa.net/twiki/bin/view/IVOA/MarkusDemleitner]{Markus
Demleitner}
\author[https://wiki.ivoa.net/twiki/bin/view/IVOA/NormanGray]{Norman
Gray}
\author[https://wiki.ivoa.net/twiki/bin/view/IVOA/MarkTaylor]{Mark
Taylor}
\editor{Markus Demleitner}
\previousversion[https://ivoa.net/documents/Vocabularies/20220801/]
{PR-20220801}
\previousversion[https://ivoa.net/documents/Vocabularies/20220516/]
{WD-20220516}
\previousversion[https://ivoa.net/documents/Vocabularies/20210525/]
{REC-2.0}
\previousversion[https://ivoa.net/documents/Vocabularies/20210114/]
{PR-20210114}
\previousversion[https://ivoa.net/documents/Vocabularies/20200612/]
{WD-20200612}
\previousversion[https://ivoa.net/documents/Vocabularies/20200326/]
{WD-20200326}
\previousversion[http://ivoa.net/documents/Vocabularies/20190905/]
{WD-20190905}
\begin{document}
\begin{abstract}
In this document, we discuss practices related to the use of RDF-based
consensus vocabularies in the Virtual Observatory, that is the creation,
publication, maintenance, and consumption of
hierarchical word lists agreed upon within the IVOA.
To cover the wide range of use cases envisioned, we define different
vocabulary types for informal knowledge organisation on the
one hand, and strict hierarchies of classes and properties on the other.
While the framework rests on the solid foundations of W3C RDF,
provisions are made to facilitate using IVOA vocabularies without
specific RDF tooling.
Non-normative appendices detail the current vocabulary-related tooling.
\end{abstract}
\section*{Acknowledgments}
While this is a complete rewrite of the specification of how vocabularies
are treated in the VO, we gratefully acknowledge the groundbreaking work
of the authors of version 1 of Vocabularies in the VO, S\'ebastien
Derriere, Alasdair Gray, Norman Gray, Frederic Hessmann, Tony Linde,
Andrea Preite Martinez, Rob Seaman, and Brian Thomas.
In particular, the vocabulary for datalink semantics done by Norman Gray
was formative for many aspects of what is specified here.
\section*{Conformance-related definitions}
The words ``MUST'', ``SHALL'', ``SHOULD'', ``MAY'', ``RECOMMENDED'', and
``OPTIONAL'' (in upper or lower case) used in this document are to be
interpreted as described in IETF standard RFC2119 \citep{std:RFC2119}.
The \emph{Virtual Observatory (VO)} is a
general term for a collection of federated resources that can be used
to conduct astronomical research, education, and outreach.
The \href{http://www.ivoa.net}{International
Virtual Observatory Alliance (IVOA)} is a global
collaboration of separately funded projects to develop standards and
infrastructure that enable VO applications.
\section{Introduction}
The W3C's Resource Description Framework RDF \citep{note:rdfprimer} is a powerful
and very generic means to represent, transmit, and reason on highly
structured, ``semantic'' information. With both its power and
generality, however, comes a high complexity for consumers of this
information if no further conventions are in force. Also, the generic
W3C standards understandably do not cover how semantic resources (e.g.,
vocabularies or ontologies) are to be managed, let alone developed
within organisations like the IVOA.
For many applications, even within the VO, the significant
complexity and the lack of defined management processes are acceptable.
However, for several other use cases -- in particular those given in
sect.~\ref{sect:usecases} -- extra conventions
help with implementability and interoperability.
Based on requirements derived from these use cases
(sect.~\ref{sect:requirements}), this standard will therefore define
conventions for vocabularies based on either SKOS \citep{std:skos} or
RDFS \citep{std:rdfs} in
sect.~\ref{sect:voccontent}. Where these vocabularies -- and hence, in
particular, the permanent URIs of their RDF resources (``terms'')
-- are managed by the
IVOA, they need to be reviewed, and consensus must be reached on them. A process to
ensure this is described in
sect.~\ref{sect:management}. In order
to provide certain guarantees to clients, sect.~\ref{sect:deployment}
defines minimal standards for how IVOA-managed vocabularies must be made
available. To help adopters who are just looking for simple
vocabulary-related recipes, sect.~\ref{sect:withoutrdf} discusses how IVOA
vocabularies can be used without knowledge of RDF.
The non-normative appendices~\ref{app:tools} and \ref{app:curtech}
describe the tooling
currently used or recommended for building and managing vocabularies in the
IVOA.
\subsection{Role within the VO Architecture}
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{role_diagram.pdf}
\caption{Architecture diagram for this document}
\label{fig:archdiag}
\end{figure}
Fig.~\ref{fig:archdiag} shows the role the Vocabularies in the VO standard
plays within the IVOA architecture \citep{2021ivoa.spec.1101D}.
This standard defines a set of conventions and procedures on
top of several W3C standards that can be adopted by other VO standards
that require interoperable, consensus vocabularies, such as:
\begin{bigdescription}
\item[Datalink \citep{2015ivoa.spec.0617D}] Datalink includes a
vocabulary letting clients work out the kind of artefact a row pertains
to.
\item[VOResource \citep{2018ivoa.spec.0625P}] VOResource 1.1 comes with
several (rather flat) vocabularies enumerating, for instance, the types
of relationships between VO resources, their intended audiences, or
classes of actions performed on them.
\item[VOEvent \citep{2011ivoa.spec.0711S}] VOEvent defines \emph{Why}
and \emph{What} elements. While their content is not formally required
to be drawn from a specific vocabulary in VOEvent's version
2.0, it certainly becomes significantly more useful if it is.
\item[VOTable \citep{2019ivoa.spec.1021O}] VOTable, in its version 1.4,
introduces vocabularies for time scales and reference positions.
\item[UCDs \citep{2005ivoa.spec.0819D}] UCDs are related to vocabularies in
that they provide machine-readable semantics. Because the terms listed
in the document can be combined and have an underlying grammar, however,
they go beyond standard RDF. Hence, no attempt is being made to
integrate them into the framework proposed here at this time. The
UCD atoms might be organised in an RDF vocabulary, though, and doing so
might be considered in the future.
\end{bigdescription}
Not all VO standards need these normative constraints, though.
In situations where the use cases do not require extra management and definition,
or where more complex structures such as full ontologies are needed,
authors are encouraged to use W3C standards without the extra requirements listed here.
An example of a direct use of SKOS
without adoption of the present document is the Simulation Data Model
SimDM \citep{2012ivoa.spec.0503L}, where several fields constrain their
values to be \vocterm{skos:narrower} than certain top-level concepts.
\subsection{Relationship to Vocabularies in the VO Version 1}
\label{sect:version1rel}
Published in 2009, version 1.19 of the IVOA Recommendation on
Vocabularies in the VO had an outlook fairly different from the present
document: the big use case was VOEvent's Why and What, and so its focus
was on large, general-purpose vocabularies, of which several existed even
back then. Meanwhile, an overhaul of a thesaurus of general astronomical
terms approved by the IAU in 1993 was underway as part of IVOA's
activities. Mapping between vocabularies maintained by different VO
and non-VO parties seemed to be the way to ensure interoperability and
therefore played a large role in the document. Also, the use cases
called for ``soft'' relations, which is why the standard confined itself
to SKOS as the vocabulary formalism.
In contrast, today ``the'' large astronomy thesaurus is being maintained
outside of the IVOA (the UAT\footnote{\url{http://astrothesaurus.org}}).
It seems likely that its uptake will be broad enough that general clients
will not have to map between it and, say, legacy journal keyword
systems.
Instead, in 2010, a fairly formal vocabulary of what
should be properties (in the RDF sense) rather than \vocterm{skos:Concept}-s
was required during the development of the datalink standard. The
vocabulary was (and still is) small in comparison to, say, the UAT. In
contrast to the expectations of Vocabularies~1, the plan had been that
most data providers would work with this small vocabulary, and terms
from external vocabularies would only be used as temporary stand-ins
until the consensus vocabulary was updated. Of course, this required a
process for managing such vocabularies. The lack of such a process
became even more noticeable when VOResource 1.1 and VOTable 1.4
introduced vocabularies of their own, similar in size and scope to the
datalink vocabulary.
On the other hand, we are not aware of a single attempt to map
between different vocabularies in a VO context, and the SKOS versions of
some vocabularies that Vocabularies 1 declared as normative in its
section~4 were largely unused and have been unmaintained for a while now.
Since large parts of the original specification turned out to be
irrelevant or unsustainable as the VO ecosystem evolved,
while some core requirements found later
were not addressed, it was decided to prepare a new major version of the
Vocabularies in the VO standard.
\subsection{Reading Guide}
We hope that software authors or annotators just wanting to consume IVOA
vocabularies, or use them to annotate documents, will be able to
do so after reading just section~\ref{sect:withoutrdf}. In particular, no
deeper understanding of RDF should be necessary.
Persons intending to participate in vocabulary evolution should skim
sect.~\ref{sect:voccontent}, in particular the subsection on the kind of
vocabulary they want to modify, and must study
sect.~\ref{sect:management}.
Readers unfamiliar with RDF should read \citet{local:normanspaper} before
reading anything outside of section~\ref{sect:withoutrdf}.
In particular, we assume familiarity with all RDF
terminology discussed there. Concepts not covered by Gray's
essay will be informally introduced here. Of course, the
underlying W3C standards are normative where applicable.
\subsection{Terminology, Conventions, Typography}
When we speak of \emph{term} here, that either means a \vocterm{skos:Concept}
in SKOS vocabularies, an \vocterm{rdfs:Class} in RDF class vocabularies,
or an \vocterm{rdf:Property} in RDF property vocabularies. We also use
\emph{term} for ``the string after the hash character in
the RDF resource URI'', i.e., the machine-readable string typically used
in annotation. It is rarely necessary to distinguish between the two
meanings.
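As a non-normative illustration of the second meaning, the following
Python sketch recovers the term from a term URI by splitting at the
first hash character. The helper name \texttt{split\_term\_uri} and the
example URI are made up for this illustration; the sketch assumes that
the URI actually carries a fragment and is not an interface defined by
this specification.
\begin{lstlisting}[language=Python]
def split_term_uri(uri):
    """Split a term URI into (vocabulary URI, term).

    Hypothetical helper: the term is taken to be everything
    after the first hash character.
    """
    vocabulary, sep, term = uri.partition("#")
    if not sep or not term:
        raise ValueError("no term fragment in " + uri)
    return vocabulary, term

# Made-up example URI, for illustration only.
print(split_term_uri("http://www.ivoa.net/rdf/example#someterm"))
# prints ('http://www.ivoa.net/rdf/example', 'someterm')
\end{lstlisting}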
We refer to classes and properties by CURIEs \citep{std:curie}, i.e.,
URIs shortened by replacing long strings with compact prefixes and a
colon. The prefixes in this
document correspond to the following base URIs:
\begin{compactitem}
\item dc -- \url{http://purl.org/dc/terms/}
\item rdf -- \url{http://www.w3.org/1999/02/22-rdf-syntax-ns#}
\item rdfs -- \url{http://www.w3.org/2000/01/rdf-schema#}
\item owl -- \url{http://www.w3.org/2002/07/owl#}
\item skos -- \url{http://www.w3.org/2004/02/skos/core#}
\item ivoasem -- \url{http://www.ivoa.net/rdf/ivoasem#}
\end{compactitem}
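For readers wishing to resolve such CURIEs programmatically, a minimal,
non-normative Python sketch using the prefix table above might look as
follows. It deliberately ignores finer points of the CURIE syntax (such
as default prefixes or bracketed safe CURIEs); the names
\texttt{PREFIXES} and \texttt{expand\_curie} are illustrative only.
\begin{lstlisting}[language=Python]
# Prefix table as given in this section.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ivoasem": "http://www.ivoa.net/rdf/ivoasem#",
}

def expand_curie(curie, prefixes=PREFIXES):
    """Expand a CURIE such as 'skos:Concept' into a full URI."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError("unknown prefix in " + curie)
    # The base URIs already end in '#' or '/', so plain
    # concatenation yields the full resource URI.
    return prefixes[prefix] + reference

print(expand_curie("rdfs:Class"))
# prints http://www.w3.org/2000/01/rdf-schema#Class
\end{lstlisting}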
Vocabulary terms are written in italics (e.g., \vocterm{rdfs:Class})
and, where supported, in a reddish hue. As is common in IVOA
specifications, XML element and attribute names are written in
typewriter italic (e.g., \xmlel{img}).
\section{Derivation of Requirements (Non-Normative)}
\subsection{Use Cases}
\label{sect:usecases}
The normative content of this document is guided by a set of
requirements derived from the following use cases.
\subsubsection{Controlled Vocabulary in VOResource}
\label{uc:simplevoc}
In certain VOResource use cases, clients have to find services that
publish a given data collection. This is effected by linking the resource
records for service and data with a
DataCite-compatible \vocterm{isServedBy} relationship.
Its concrete literal needs to be reliably defined in order to let
clients find such relationships by a simple string comparison in RegTAP
queries.
A related use case is that validators can flag errors (or at least
warnings) when resource records use terms that are not part of some
controlled vocabulary (e.g., content levels or types of events in a
resource's history). Typically, such out-of-vocabulary terms
indicate small oversights on the part of the resource record author that
will lead to hard-to-debug problems in data discovery.
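A validator check of this kind essentially reduces to set membership
tests. The following non-normative Python sketch assumes the validator
already has the vocabulary's terms as plain strings; the function name
\texttt{out\_of\_vocabulary} and the terms used in the example are
placeholders chosen for this illustration, not content of an actual
IVOA vocabulary.
\begin{lstlisting}[language=Python]
def out_of_vocabulary(used_terms, vocabulary_terms):
    """Return used terms missing from the controlled vocabulary.

    Each offending term is reported once, in order of first use,
    so that warnings follow document order.  Comparison is by
    exact string match.
    """
    known = set(vocabulary_terms)
    flagged = []
    for term in used_terms:
        if term not in known and term not in flagged:
            flagged.append(term)
    return flagged

# Placeholder vocabulary and record content, for illustration only.
vocabulary = ["general", "university", "research"]
record_terms = ["research", "univeristy"]
for term in out_of_vocabulary(record_terms, vocabulary):
    print("WARNING: term not in vocabulary:", term)
# prints WARNING: term not in vocabulary: univeristy
\end{lstlisting}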
\subsubsection{Controlled Vocabularies in VOTable}
\label{uc:votvoc}
VOTable 1.4 constrains two attributes of TIMESYS elements
-- reference positions and time
scales -- using vocabularies.
With time scales, the situation is not fundamentally
different from the VOResource case discussed in
use case~\ref{uc:simplevoc}: a simple enumeration of agreed-upon strings
is enough to uniquely determine what operations need to be performed to
combine times given in different time scales. With
reference positions, however, even if a client does
not exactly know the location of, say, the Hubble Space Telescope at any
given time, several important use cases can already be satisfied if a
client knows it is in low Earth orbit (e.g., assuming a reference
position Geocenter and adjusting the systematic error estimates). For
this, a client needs information of the type ``\vocterm{HST}
\vocterm{is-close-to} \vocterm{GEOCENTER\/}'' (or similar).
There is also another difference between this and at least the
VOResource relationship vocabulary from use case~\ref{uc:simplevoc}.
The latter is property-like, as
in ``Resource-1 \vocterm{isServedBy} Resource-2\/''. In contrast with
this, a time scale would be used like ``Time-coordinate
\vocterm{is-given-in} \vocterm{TT\/}''. In RDFS terminology, time scales
are therefore better modelled as classes rather than properties.
\subsubsection{Datalink Link Selection}
\label{uc:links}
In Datalink, clients receive a set of links
to pieces of information (e.g., previews, additional metadata,
progenitors, or
derived data) and need to present to the user only those items
relevant to the task at hand. For instance, in a discovery phase, only
previews should be offered, while scientific exploitation would call for
cutout services, alternate formats, or derived data. For debugging,
progenitors should be made accessible, and so on.
Operators of datalink services, on the other hand, want to be precise in
their annotation of datasets. For instance, they may want to discern
between a dark frame and a flat field in calibration data.
Clients should, however, still be able to work out that both sorts of
artefacts are progenitors.
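The client-side logic this use case calls for can be sketched in a few
lines of Python. The hierarchy below, in which \vocterm{dark} and
\vocterm{flat} are narrower than \vocterm{calibration}, which in turn is
narrower than \vocterm{progenitor}, is an assumption made for this
non-normative example and is not meant to reproduce the actual datalink
vocabulary; \texttt{WIDER} and \texttt{is\_a} are hypothetical names.
\begin{lstlisting}[language=Python]
# Hypothetical is-a hierarchy: each term maps to its wider term.
WIDER = {
    "dark": "calibration",
    "flat": "calibration",
    "calibration": "progenitor",
    "preview-image": "preview",
}

def is_a(term, ancestor, wider=WIDER):
    """True if term equals ancestor or lies below it."""
    seen = set()
    while term is not None and term not in seen:
        if term == ancestor:
            return True
        seen.add(term)           # guards against cyclic input
        term = wider.get(term)   # None once we pass the root
    return False

links = ["dark", "flat", "preview-image"]
print([t for t in links if is_a(t, "progenitor")])
# prints ['dark', 'flat']
\end{lstlisting}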
\subsubsection{VOEvent Filtering, Query Expansion}
\label{uc:filtering}
In VOEvent, an event stream can contain a classification of what the
observers believe was observed, for instance ``supernova Ia explosion''.
While an event stream from one project might provide a classification on
that level for some event, it might not (yet) be able to do that in
another event, and a different event stream might not be able to
distinguish between different sorts of supernovae at all.
In this situation, an event broker looking for supernovae of type Ia
will filter out anything not related to supernovae. However, since a Ia
supernova might be tagged only as ``supernova'',
it will want to widen its filter somewhat. Some backend process
might then prioritise events classified as Ia upstream over those only tagged
as a generic supernova, and those, again, over those tagged explicitly
as some different type of supernova.
Similar use cases exist, for instance, in the discovery of simulations
and possibly for subjects of VO resources.
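The prioritisation described above can be sketched as follows in
Python. The concept names and the \texttt{BROADER} mapping (standing in
for \vocterm{skos:broader} relations) are hypothetical, and the
three-level ranking is just one possible reading of the behaviour
described, not a prescribed algorithm.
\begin{lstlisting}[language=Python]
# Hypothetical stand-in for skos:broader relations between concepts.
BROADER = {
    "supernova-Ia": "supernova",
    "supernova-II": "supernova",
    "quasar": "agn",
}

def priority(classification, target, broader=BROADER):
    """Rank an event classification relative to a target concept.

    0: the target itself; 1: the target's immediate broader concept
    (e.g., a generic "supernova"); 2: a sibling of the target (another
    kind of supernova); None: unrelated, hence filtered out.
    """
    if classification == target:
        return 0
    parent = broader.get(target)
    if parent is None:
        return None
    if classification == parent:
        return 1
    if broader.get(classification) == parent:
        return 2
    return None

events = ["supernova", "quasar", "supernova-II", "supernova-Ia"]
target = "supernova-Ia"
kept = [e for e in events if priority(e, target) is not None]
print(sorted(kept, key=lambda e: priority(e, target)))
# prints ['supernova-Ia', 'supernova', 'supernova-II']
\end{lstlisting}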
\subsubsection{Vocabulary Updates in VOResource}
\label{uc:deprecation}
In VOResource 1.0 \citep{2008ivoa.spec.0222P}, relationship types
like \vocterm{served-by} or
\vocterm{service-for} were defined. Later, DataCite defined equivalent
terms \vocterm{IsServedBy} and \vocterm{IsServiceFor}. Arguably, the VO should,
as far as sensible, take up standards in the wider data management
community, and so VOResource 1.1 adopts the DataCite terms. In a minor
version, it cannot forbid the old terms. It can, however, say not only
``\vocterm{served-by\/} is the same as \vocterm{isServedBy\/}'' but also
``Use the latter term in preference to the former''. If this information is
available machine-readably, validators can warn against the use of
deprecated terms and user interfaces can transparently replace
deprecated terms with current ones. This latter use case is
already specified in RegTAP 1.1 \citep{2019ivoa.spec.1011D}.
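Once deprecation information is available machine-readably, the
replacement itself is straightforward. In the following non-normative
Python sketch, the mapping from deprecated to preferred terms is simply
hard-coded from the VOResource example above; how such information is
actually obtained is not shown here, and the names \texttt{PREFERRED}
and \texttt{normalise} are hypothetical.
\begin{lstlisting}[language=Python]
# Deprecated term -> preferred term, hard-coded for illustration.
PREFERRED = {
    "served-by": "IsServedBy",
    "service-for": "IsServiceFor",
}

def normalise(term, preferred=PREFERRED, warn=print):
    """Return the preferred form of term.

    If term is deprecated, report this through warn (print by
    default) and return its replacement; otherwise return term as is.
    """
    replacement = preferred.get(term)
    if replacement is None:
        return term
    warn(f"'{term}' is deprecated; use '{replacement}' instead")
    return replacement

print(normalise("served-by"))
# first prints the deprecation warning, then IsServedBy
\end{lstlisting}
A validator would typically pass its own reporting function as
\texttt{warn}, while a user interface might pass a no-op function and
replace deprecated terms silently.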
Another use case in the context of VOResource and vocabulary updating
is the definition of content levels. In VOResource 1.0, a list of
terms was adopted that was far too fine-grained in the area of public
outreach, distinguishing, for instance, ``Middle School'' from
``Secondary Education''. While this granularity was useful for the
original realm of the list of terms, in the VO it resulted in extremely
inhomogeneous annotation. Obviously, persons employed in research
institutions can hardly be expected to assess needs and capabilities of
middle school versus elementary school educators. Eventually, for
VOResource 1.1 a three-term list was drawn up and is now actually used.
To avoid a repetition of such an experience, we want to enable small
initial vocabularies that can easily be extended as new terms are actually
needed and as the use of the existing terms becomes well understood.
\subsubsection{Vocabularies in VO-DML}
The modelling language VO-DML \citep{2018ivoa.spec.0910L} lets model
designers constrain attribute values using external resources defined
through a vocabulary URI and possibly a top concept. The standard
mentions both SKOS -- inspired by version 1 of this document -- and RDFS
as possible technologies for such constraints.
Depending on the nature of the attributes constrained, modellers might
foresee the need for having these vocabularies managed by the IVOA. Of
course, that is up to the modeller: There are certainly many cases in
which there is no need for the overhead this specification brings with
it, be it because vocabularies are externally defined or because the
concrete application profits from less-constrained vocabularies.
\subsubsection{Discovering Meanings}
\label{uc:discovering}
Software developers or researchers want to work out
what some term mentioned ``means'' (where we are agnostic as to what
``means'' should mean here). If the term URI alone is insufficient,
they can simply paste the resource URI of the term into a web browser
and read (at least) its description and perhaps find out even more using
relationships between terms.
\subsubsection{Simple Review Process}
\label{uc:simplereview}
As vocabularies evolve, new terms are being added to
vocabularies. To facilitate their review and enable rapid uptake
of the proposed terms, it is desirable that new terms and even
new vocabularies are immediately visible to users and tools.
Note that since terms under review might be modified or removed later,
this use case is somewhat in conflict with the basic requirement
of stable vocabularies (i.e., a document valid once will not
become invalid later because of changes in vocabularies).
\subsubsection{Understanding Vocabulary Evolution}
\label{uc:understanding}
When a question comes up, such as what \vocterm{calibration} actually means
in the datalink core vocabulary, and the (legacy) description is not
sufficiently clear, people can go back to the discussions that led up
to the addition of that term. This will also help clarify existing
usage that might have begun at the time of the initial definition.
\subsubsection{Offline Operation}
\label{uc:offline}
A system doing, say, coordinate transformations might run without an internet
connection but still needs to use semantic resources on frames and
reference positions (e.g., figure out that a given space probe is in L1
and use that as reference position). To do that, it wants to use a
previously downloaded copy of the vocabulary.
\subsubsection{UAT in VOResource}
\label{uc:uat}
VOResource 1.1, in the description of the \xmlel{subject} element, says
that its content ``should be drawn from the Unified Astronomy
Thesaurus''. This is intended to later facilitate interactive topic
navigation within the Registry or semantic expansion of Registry queries
(``include narrower terms'').
\subsection{Requirements}
\label{sect:requirements}
\subsubsection{Lists of Terms}
\label{req:lists}
We need to be able to represent simple lists of terms even for the most
basic use case~\ref{uc:simplevoc}. As per
use case~\ref{uc:votvoc}, we will have to represent instances of both
\vocterm{rdf:Property} and \vocterm{rdfs:Class} (though not necessarily
in one vocabulary). In order to not break existing practices (e.g.,
use cases \ref{uc:simplevoc}, \ref{uc:votvoc}, \ref{uc:links}), the
machine-readable terms must be allowed to follow existing patterns of
essentially human-readable identifiers (against external best practices
of using non-informative URI forms). In general, in essentially all use
cases discussed, making the machine-readable terms intelligible to
humans is an advantage.
\subsubsection{Hierarchies of Terms}
\label{req:hierarchy}
Both use case~\ref{uc:links} and use case~\ref{uc:filtering} require a hierarchy
of terms, where clients can find wider and narrower terms
relative to an original one. There is a difference,
however: in the datalink use case, strict \vocterm{is-a} relationships
are what clients need (e.g., ``give me all kinds of previews''). In the
VOEvent case, however, a somewhat softer sort of hierarchy is required.
For instance, a filter for accretion disks might very well expand to
match both quasars and cataclysmic variables. Hence, we want to
be able to represent strict class hierarchies as well as thesaurus-like
soft knowledge structures.
\subsubsection{Tree-like Hierarchies}
\label{req:tree}
Where we expect some sort of semi-formal inference to take place on the
vocabularies, the hierarchy should be a tree in order to facilitate
traversal and controlled query expansion. In other words, outside of
SKOS we do not support multiple inheritance. Use cases requiring
something equivalent would have to resort to supporting multiple terms
on the annotation level.
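With a tree-shaped hierarchy, both a consistency check and controlled
query expansion are simple. The following non-normative Python sketch
assumes the hierarchy is given as (term, wider term) pairs; it rejects
terms with more than one wider term and then collects a term together
with everything below it. All term names and function names are
hypothetical. Since the single-parent check alone does not exclude
cycles, the expansion also guards against revisiting terms.
\begin{lstlisting}[language=Python]
from collections import defaultdict

def build_tree(pairs):
    """Index narrower terms from (term, wider_term) pairs.

    Raises ValueError if a term has more than one wider term,
    i.e., if the hierarchy is not a tree.
    """
    wider = {}
    narrower = defaultdict(list)
    for term, parent in pairs:
        if term in wider:
            if wider[term] != parent:
                raise ValueError(f"{term} has several wider terms")
            continue  # duplicate pair, already recorded
        wider[term] = parent
        narrower[parent].append(term)
    return narrower

def expand(term, narrower):
    """Return term and all terms below it, depth first."""
    result, seen, stack = [], set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # protects against cyclic input
        seen.add(current)
        result.append(current)
        # reversed() keeps siblings in their original order
        stack.extend(reversed(narrower.get(current, [])))
    return result

tree = build_tree([
    ("preview-image", "preview"),
    ("preview-plot", "preview"),
    ("thumbnail", "preview-image"),
])
print(expand("preview", tree))
# prints ['preview', 'preview-image', 'thumbnail', 'preview-plot']
\end{lstlisting}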
\subsubsection{Consensus Vocabularies}
\label{req:consensus}
Essentially all our use cases will be much easier to implement if
clients can work through simple string comparisons. Therefore,
wherever feasible IVOA standards should build on IVOA-sanctioned,
consensus vocabularies.
\subsubsection{Deprecating Terms}
\label{req:deprecating}
While we believe at this point that terms once approved by the IVOA
should never disappear -- for instance, because validators might
otherwise flag previously valid instance documents as invalid --, use
case~\ref{uc:deprecation} shows that some way of declaring
deprecations must be foreseen.
\subsubsection{Public Availability of Machine-Readable Vocabularies}
\label{req:machine}
In particular in use cases~\ref{uc:links} and \ref{uc:filtering},
clients can flexibly incorporate vocabulary updates without code
changes, perhaps even without re-deployment, if vocabularies are
available at constant, public URIs. Using these, clients must be able to
retrieve vocabulary data in formats reasonably easy to parse.
Use case~\ref{uc:discovering} implies that at least one representation
of the vocabulary should be human-readable.
\subsubsection{Minimal Term Metadata}
\label{req:mtm}
To support use case~\ref{uc:discovering}, all terms in IVOA vocabularies
must come with a non-trivial description.
\subsubsection{Simple Cases do not Require RDF Tooling}
\label{req:nordf}
(Not derived from any specific use case). Since libraries implementing
(some subset of) RDF tend to be rather massive and thus appear
disproportionate when all a client wants is an up-to-date list of terms
with their descriptions, at least the basic use cases must not require
specific RDF tooling. Indeed, simple uses should not require an
understanding of RDF in the first place.
\subsubsection{Vocabulary Evolution}
\label{req:evolution}
Most use cases make it desirable that terms can be added to existing
vocabularies; this is very clear for the reference positions in
use case~\ref{uc:votvoc}, where new instruments would imply new
terms. The history of content level annotation in VOResource mentioned
in use case~\ref{uc:deprecation} illustrates the desirability of a
simple process that invites standard authors to start with minimal
vocabularies, relying on later extensions.
\subsubsection{Traceable Provenance}
\label{req:traceable}
To satisfy use case~\ref{uc:understanding}, the considerations that led
to the adoption or modification of a term must be documented publicly
in sufficient detail. It is clearly an advantage if a brief, accessible
summary of these considerations can easily be found without, say,
resorting to version control logs.
\subsubsection{Preliminary Vocabularies and Terms}
\label{req:preliminary}
In use case~\ref{uc:simplereview}, it is desirable to admit
``preliminary'' vocabularies and terms. For these, both humans
and machines must be able to discern a temporary status, and
their use implies that the general rule ``once valid, always
valid'' does not apply. Validators and similar software could
then add notices to that effect in their outputs.
\subsubsection{Vocabulary Files are Usable Stand-Alone}
\label{req:standalone}
Vocabulary files need to be cacheable without applications having to
manage extra metadata (e.g., the URL from which the file was obtained)
in order to easily satisfy use case~\ref{uc:offline} (or other scenarios
in which vocabulary content cannot be retrieved from the IVOA
site for each session).
\subsubsection{Externally Curated Vocabularies and VO Tooling}
\label{req:external}
Regrettably, VOResource does not explain what use case~\ref{uc:uat} would
look like in actual documents, and the example given in the document
clearly does not use UAT concepts.
The first difficulty in a straightforward uptake is that UAT URIs look
like \url{http://astrothesaurus.org/uat/1774}. Given that, should
publishers have such URIs in \xmlel{subject}? Or should they rather use
just the last URI segment for conciseness? Or perhaps the preferred
labels, in keeping with the style of existing subject content and its
use by clients (which typically look for natural language in subject),
even though the labels are not considered stable?
Regardless of how VOResource clarifies this matter, UAT artefacts (e.g.,
SKOS files) do not match some of our other requirements. In particular,
the human-readable URIs from requirement~\ref{req:lists}, the specific way we
satisfy requirement~\ref{req:machine}, and the non-RDF requirement~\ref{req:nordf} are
not immediately satisfied by the UAT as distributed at the time of
writing.
For simple, uniform use of such externally curated vocabularies, it
should be possible to have some sort of endorsement process and then
distribute the vocabularies in a form compliant with this specification.
This will entail IVOA-specific concept URIs, and we must be able to
express that these resources have the same meaning as their externally
maintained counterparts.
\subsection{Non-Requirement}
This specification is not called ``Semantics in the VO'' or the like
because we do \emph{not} intend to prescribe ways to turn any VO
artefact into RDF triples\footnote{i.e., basic statements of the form
(subject, predicate, object) within the
RDF; see page 8 of \citet{local:normanspaper} for a less terse
definition.}.
Indeed, for many existing vocabularies, it
is left open what exactly the domain or range of properties might be or
what subject and predicate the classes or concepts should be used with.
This is partly because specifying these would substantially complicate the
generation of vocabularies, which would then quickly turn into proper
ontologies. Another consideration is that the information encoded by
triples generated in this way has traditionally been expressed
using techniques developed
by the Data Models working group in the VO.
Particularly with a view to later use in linked data scenarios,
vocabulary authors should nevertheless take care that, given appropriate
properties or annotation tools, the vocabularies \emph{could} be used in
meaningful RDF triples.
Conversely, this specification is written with future ``deeper''
semantics in the VO in mind; tools restricting their operations to the ones
discussed here should not break when future specifications enrich
existing vocabularies towards full ontologies.
\section{Using IVOA Vocabularies without RDF Tooling}
\label{sect:withoutrdf}
RDF is a
powerful system for expressing a wide range of semantics and enriching
various documents with semantic information in a globally distributed
fashion. Due to its generality, handling its artefacts is relatively
involved and in general requires special tooling, non-negligible
investment in understanding RDF, and non-trivial management of URIs and
prefix mappings.
To lower the barrier to adopting IVOA vocabularies
[requirement~\ref{req:nordf}], they are also provided in
two formats usable without RDF tooling or, indeed, deeper knowledge of
RDF. This section discusses these formats.
\subsection{Choosing Terms From IVOA Vocabularies (non-normative)}
Resource annotators can usually treat IVOA vocabularies as simple lists
of (case-sensitive) strings with human-readable labels and definitions.
These lists can be inspected with a simple web browser.
Each IVOA vocabulary has an associated URI starting with
\url{http://www.ivoa.net/rdf}. Dereferencing that URI yields a list of
the vocabularies approved or under review.
An individual vocabulary has a
URI like \url{http://www.ivoa.net/rdf/refposition}. Dereferencing this URI
with a web browser (or, indeed, any user agent indicating it prefers
text/html media) redirects to a tabular representation of the vocabulary,
giving:
\begin{itemize}
\item \emph{terms} -- i.e., the strings actually used in annotation,
\item \emph{labels} -- i.e., strings that should be presented to humans instead of
the slightly formalised terms, and
\item \emph{descriptions}, which should
be sufficiently precise to allow someone with a certain amount
of domain expertise to decide whether a certain ``thing'' is or is not
covered by the term (or more precisely, the underlying concept).
\end{itemize}
Some terms may be marked as deprecated, in which case they should no
longer be used in new annotations. In most cases, deprecated terms will
come with information about what to use instead.
Some terms may be marked as preliminary. Such terms might disappear
without further notice. Casual users should avoid using such
terms; if they find they need them, the semantics working group
asks to be notified on its mailing list, since such use is clearly
relevant to the term's adoption process.
Once a term is located within the HTML page, annotators can usually
use it directly in instance documents. For instance, continuing the
refposition example, the string \texttt{BARYCENTER} found in the
vocabulary is used verbatim as the value of the \texttt{refposition}
attribute of VOTable's TIMESYS element.
Some applications (Datalink being the prime example) instead use URIs
relative to the vocabulary URI. In practical terms, this just means
that a hash sign is prepended to the term (e.g., \texttt{\#progenitor}).
This latter practice builds on the property of IVOA vocabularies that if
one appends the term as a fragment to the vocabulary URI (e.g.,
\url{http://www.ivoa.net/rdf/refposition#BARYCENTER}), that URI is the full,
RDF-compliant resource identifier of the concept. When used in
HTML-aware user agents (such as a web browser), dereferencing this URI
(i.e., opening it) will give the table of terms with the chosen term
highlighted. How exactly this is represented depends on the user agent.
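The correspondence between bare terms, hash-prefixed relative URIs, and
full concept URIs can be illustrated with a short Python sketch using
only the standard library's \texttt{urllib.parse}. The function names
\lstinline{full_term_uri} and \lstinline{split_term_uri} are
illustrative assumptions and not part of any IVOA-defined API.

\begin{lstlisting}[language=python]
from urllib.parse import urldefrag, urljoin

def full_term_uri(voc_uri, term_ref):
    """Turn a bare term ("BARYCENTER") or a relative URI
    ("#BARYCENTER") into the full concept URI."""
    if not term_ref.startswith("#"):
        term_ref = "#" + term_ref
    # a fragment-only reference keeps the base URI and sets its fragment
    return urljoin(voc_uri, term_ref)

def split_term_uri(concept_uri):
    """Split a full concept URI into (vocabulary URI, term)."""
    voc_uri, term = urldefrag(concept_uri)
    return voc_uri, term
\end{lstlisting}

For instance,
\lstinline{full_term_uri("http://www.ivoa.net/rdf/refposition", "BARYCENTER")}
yields \url{http://www.ivoa.net/rdf/refposition#BARYCENTER}, and
\lstinline{split_term_uri} reverses this.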
\subsection{Semantic Operations Without RDF Tooling}
\label{sect:desise}
Many VO components need a machine-readable representation of the
entire vocabulary, for instance in order to
(cf.~sect.~\ref{sect:usecases}):
\begin{compactitem}
\item display labels and descriptions for terms to users,
\item perform query expansion or similar exploitation of hierarchical
relationships, or
\item validate annotated instances for the use of correct and current
terms.
\end{compactitem}
\subsubsection{Vocabularies in desise}
To let VO programs perform such tasks with minimal technical overhead,
in addition to the RDF artefacts described in
sect.~\ref{sect:deployment}, IVOA vocabularies are also available in an
ad-hoc format defined here for VO-internal use, nicknamed ``desise''
(``dead simple semantics''). Clients can retrieve vocabularies in
desise by requesting the vocabulary URI with the HTTP accept header set
to \texttt{application/x-desise+json}.
What is returned is a JSON-encoded \citep{std:JSON} mapping (``object''
in JSON terms)
containing the following keys (all mandatory):
\begin{description}
\item[uri] The vocabulary URI. All terms occurring in desise documents
can be turned into full, RDF-compliant resource URIs by prefixing them
with this URI and a hash character.
\item[flavour] The flavour of the vocabulary (can generally be ignored;
see sect.~\ref{sect:voccontent}).
\item[terms] A JSON object mapping the (machine-readable) terms to a
JSON object giving the term's properties as described below.
The keys in \textit{terms} are the strings used in
machine-readable data.
\end{description}
The JSON objects present as values in the terms object can have the
following keys:
\begin{description}
\item[label] (mandatory)
A human-readable label for display purposes; clients should
always try to display this rather than the raw term.
\item[description] (mandatory) A human-readable definition of the underlying
concept.
\item[deprecated] present and mapped to an arbitrary value if the term is
deprecated and should no longer be used; validators will warn against
its use.
\item[preliminary] present and mapped to an arbitrary value if the term
is preliminary, meaning that in contrast to the other, ``eternal'' terms
it can disappear again; validators should qualify a validation as
preliminary if a document uses such a term.
\item[wider] (mandatory) A JSON array
of ``wider'' terms. Most IVOA vocabularies are
tree-like, and for them, this array contains at most one term: the
parent node, i.e., the hypernym of the current term.
In SKOS-flavoured vocabularies, multiple terms can be present, and the
meaning of ``wider'' is a bit less clear-cut. The \textit{wider} list
is empty for top-level terms.
\item[narrower] (mandatory) A JSON array
of ``narrower'' terms. In SKOS-flavoured
vocabularies, that is just a list of all terms that list the current
term as wider. Otherwise, the vocabularies are tree-like and
\textit{narrower} is a list of all terms on the term's branch and below
it in the tree (it is the ``transitive closure of the inverse of
wider''). This is much more easily understood in an example, which we
give below in the discussion on addressing use case~\ref{uc:links}.
\end{description}
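For tree-like vocabularies, the relationship between the two keys
(\textit{narrower} being the transitive closure of the inverse of
\textit{wider}) can be made concrete in a few lines of Python. This is a
sketch assuming a genuine tree (no cycles, at most one parent per term);
the function name \lstinline{narrower_closure} and the terms in the
usage line are made up for illustration.

\begin{lstlisting}[language=python]
def narrower_closure(wider):
    """Given {term: [parent terms]} for a tree-like vocabulary, return
    {term: sorted list of all terms on its branch below it}."""
    # direct children: the plain inverse of wider
    children = {term: [] for term in wider}
    for term, parents in wider.items():
        for parent in parents:
            children.setdefault(parent, []).append(term)

    def descendants(term):
        # depth-first walk; terminates because a tree has no cycles
        found = []
        for child in children[term]:
            found.append(child)
            found.extend(descendants(child))
        return found

    return {term: sorted(descendants(term)) for term in children}

# usage, with made-up terms: b is below a; c and d are below b
print(narrower_closure({"a": [], "b": ["a"], "c": ["b"], "d": ["b"]}))
# {'a': ['b', 'c', 'd'], 'b': ['c', 'd'], 'c': [], 'd': []}
\end{lstlisting}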
Note that, while \textit{wider} and \textit{narrower} are mandatory
keys, their values can of course be empty lists.
See appendix~\ref{app:desiseexample} for an example of a vocabulary
represented in desise.
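A client receiving a desise document might want to check its structure
before relying on it. The following Python sketch reports missing
mandatory keys as defined above; the additional check that every term
listed in \textit{wider} or \textit{narrower} is itself a key of
\textit{terms} is an assumption of this sketch rather than a rule stated
here, and the name \lstinline{desise_problems} is hypothetical.

\begin{lstlisting}[language=python]
MANDATORY_TOP_KEYS = ("uri", "flavour", "terms")
MANDATORY_TERM_KEYS = ("label", "description", "wider", "narrower")

def desise_problems(voc):
    """List structural problems in a parsed desise document;
    an empty list means none were found."""
    problems = [f"missing top-level key: {key}"
                for key in MANDATORY_TOP_KEYS if key not in voc]
    terms = voc.get("terms", {})
    for term, props in terms.items():
        for key in MANDATORY_TERM_KEYS:
            if key not in props:
                problems.append(f"{term}: missing key {key}")
        for key in ("wider", "narrower"):
            related = props.get(key, [])
            if not isinstance(related, list):
                problems.append(f"{term}: {key} is not a JSON array")
                continue
            # assumption of this sketch: related terms are in the same vocabulary
            for other in related:
                if other not in terms:
                    problems.append(
                        f"{term}: {key} refers to unknown term {other}")
    return problems
\end{lstlisting}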
\subsubsection{Working with desise (non-normative)}
For illustration, here are recipes showing how to address
the various use cases in Python:
\paragraph{Load a vocabulary} Using the popular requests module:\\
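\begin{lstlisting}[language=python]
import requests

# ask for desise rather than HTML or RDF via content negotiation
response = requests.get(
    "http://www.ivoa.net/rdf/uat",
    headers={"accept": "application/x-desise+json"})
response.raise_for_status()   # fail early on HTTP errors
voc = response.json()
\end{lstlisting}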
Note, however, that non-trivial clients should cache files retrieved in
this way for a reasonable time span; IVOA vocabularies typically remain
unchanged for months at a time.
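The following Python sketch shows one way to implement such caching with
only the standard library. The cache file naming scheme, the 30-day
default lifetime, and the fallback to a stale copy when the network is
unavailable (cf.~use case~\ref{uc:offline}) are assumptions of this
sketch, not requirements of this specification; the \lstinline{fetch}
parameter merely allows replacing the network access, e.g., in tests.

\begin{lstlisting}[language=python]
import json
import re
import time
import urllib.request
from pathlib import Path

DESISE_TYPE = "application/x-desise+json"

def fetch_desise(voc_uri):
    """Retrieve a vocabulary in desise via HTTP content negotiation."""
    request = urllib.request.Request(voc_uri, headers={"Accept": DESISE_TYPE})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()

def load_vocabulary(voc_uri, cache_dir, max_age=30 * 86400, fetch=fetch_desise):
    """Return the parsed desise document for voc_uri, re-fetching it
    only when the cached copy is older than max_age seconds."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # derive a file name from the vocabulary URI; the desise document
    # itself contains "uri", so no other metadata has to be stored
    cache_file = cache_dir / (re.sub(r"[^A-Za-z0-9]+", "_", voc_uri) + ".json")
    if cache_file.exists() and time.time() - cache_file.stat().st_mtime < max_age:
        return json.loads(cache_file.read_text(encoding="utf-8"))
    try:
        payload = fetch(voc_uri)
    except OSError:
        # offline or server trouble: fall back to a stale copy if present
        if cache_file.exists():
            return json.loads(cache_file.read_text(encoding="utf-8"))
        raise
    voc = json.loads(payload)  # parse first so broken data is not cached
    cache_file.write_bytes(payload)
    return voc
\end{lstlisting}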
\paragraph{See if a term is in the vocabulary} (\ref{uc:simplevoc},
\ref{uc:votvoc})\\ \lstinline{term in voc["terms"]}
\paragraph{See if a term is deprecated} (\ref{uc:deprecation})\\
\lstinline{"deprecated" in voc["terms"][term]}
\paragraph{Find a human-readable label for a term}
(\ref{uc:discovering})\\
\lstinline{voc["terms"][term]["label"]}
\paragraph{Find a human-readable description for a term}
(\ref{uc:discovering})\\
\lstinline{voc["terms"][term]["description"]}
\paragraph{Find out if a term is preliminary} (\ref{uc:simplereview})\\
\lstinline{"preliminary" in voc["terms"][term]}
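\paragraph{Query expansion: select branch} (in \ref{uc:links}, select all
progenitors, including flat fields, dark frames, etc.)
\begin{lstlisting}[language=python]
base_term = "progenitor"
# the term itself plus its entire branch (narrower is already
# the transitive closure in tree-like vocabularies)
expanded_terms = {base_term, *voc["terms"][base_term]["narrower"]}
# datalink_row: a row of a Datalink response; its semantics column
# holds a relative URI like "#progenitor", so drop the hash
is_match = datalink_row["semantics"][1:] in expanded_terms
\end{lstlisting}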
\paragraph{SKOS-type query expansion by neighbouring terms}
(\ref{uc:filtering})
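\begin{lstlisting}[language=python]
# this expansion only makes sense for SKOS-flavoured vocabularies
assert voc["flavour"] == "SKOS"
# base_term: the term searched for; add its direct neighbours
# in both directions
expanded_terms = {
    base_term,
    *voc["terms"][base_term]["narrower"],
    *voc["terms"][base_term]["wider"]}
# keyword_found: a term found in the annotation being matched
is_match = keyword_found in expanded_terms
\end{lstlisting}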
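The individual recipes above can be combined into the kind of per-term
report a validator might produce. The following Python sketch is
illustrative only: the function name \lstinline{check_term}, the message
wording, and the severity prefixes are assumptions. Note that the
membership test is case-sensitive, as are the terms of IVOA
vocabularies.

\begin{lstlisting}[language=python]
def check_term(voc, term):
    """Return validator-style messages for one annotation term checked
    against a parsed desise document voc; an empty list means no issues."""
    # dictionary lookup is case-sensitive, as are IVOA vocabulary terms
    if term not in voc["terms"]:
        return [f"ERROR: {term!r} is not a term of {voc['uri']}"]
    props = voc["terms"][term]
    messages = []
    # deprecated and preliminary may map to arbitrary values; only
    # their presence matters
    if "deprecated" in props:
        messages.append(f"WARNING: {term!r} is deprecated")
    if "preliminary" in props:
        messages.append(
            f"NOTE: {term!r} is preliminary; the validation result is preliminary, too")
    return messages
\end{lstlisting}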
\section{Vocabulary Content}
\label{sect:voccontent}
IVOA vocabularies MUST be based on W3C's Resource Description Framework.
Details on required serialisations are given in
sect.~\ref{sect:deployment}. This section deals with what kinds of
statements users of IVOA vocabularies SHOULD evaluate to ensure
interoperability. Statements of other types are legal in IVOA
vocabularies but are not expected to be interpreted interoperably.
Clients MAY ignore them.
In IVOA vocabularies, the concept URI MUST begin with
\url{http://www.ivoa.net/rdf}\footnote{In retrospect, the unnecessary
``www'' in this URI is somewhat regrettable, but existing vocabularies
have used URIs including it, and it seems a small price to pay for
having uniform URIs.}. It is recommended to not introduce
additional hierarchy levels, i.e., vocabulary URIs SHOULD be direct children
of \texttt{rdf}\footnote{Some existing vocabularies do not follow this
rule; since vocabulary URI changes will break certain usage scenarios,
their URIs are still retained.}.
Since all vocabularies specified here are
single-file, the full term (i.e., RDF resource)
URI is formed by appending a hash sign
and a fragment identifier. In IVOA vocabularies, this fragment
identifier MUST consist of ASCII letters, numbers, underscores and
dashes exclusively [for requirement~\ref{req:machine}].
The fragment identifiers in the vocabulary URIs SHOULD be
human-readable, usually by suitably contracting the
preferred label. In the IVOA, we do \emph{not} use
natural-language-neutral concept identifiers (e.g., numeric ones) but instead expect that domain
experts will already have an impression of a term's meaning from looking
at its URI.
Examples of URIs in the recommended form include:
\begin{itemize}
\item \url{http://www.ivoa.net/rdf/ivoasem#preliminary} for the
term this specification defines to mark terms as preliminary.
\item \url{http://www.ivoa.net/rdf/timescale#TT} for the Terrestrial Time
time scale.
\item \url{http://www.ivoa.net/rdf/uat#active-galactic-nuclei} for the
concept ``Active Galactic Nuclei''.
\end{itemize}
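The syntactic rules above lend themselves to a mechanical check. The
following Python sketch tests a concept URI against the MUST rules on
the prefix and on the fragment identifier and against the SHOULD rule
that vocabulary URIs be direct children of \texttt{rdf}. The function
name \lstinline{concept_uri_problems} and the message texts are
hypothetical, and the sketch does not try to judge whether a fragment is
human-readable.

\begin{lstlisting}[language=python]
import re

# vocabularies live below this URI; the trailing slash is included
# because every vocabulary is a child of rdf
IVOA_RDF_PREFIX = "http://www.ivoa.net/rdf/"
# ASCII letters, digits, underscores and dashes only
FRAGMENT_PATTERN = re.compile(r"[A-Za-z0-9_-]+")

def concept_uri_problems(uri):
    """Return a list of rule violations for a concept URI."""
    problems = []
    voc_uri, hash_sign, fragment = uri.partition("#")
    if not voc_uri.startswith(IVOA_RDF_PREFIX):
        problems.append("MUST: URI does not begin with http://www.ivoa.net/rdf")
    elif "/" in voc_uri[len(IVOA_RDF_PREFIX):].strip("/"):
        problems.append("SHOULD: vocabulary URI is not a direct child of rdf")
    if not hash_sign or not FRAGMENT_PATTERN.fullmatch(fragment):
        problems.append("MUST: fragment identifier missing or not restricted to "
                        "ASCII letters, digits, underscores and dashes")
    return problems
\end{lstlisting}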
In this specification, we distinguish three different ``flavours'' of
vocabularies. Each covers a particular domain of problems and is
therefore subject to different requirements.
Although the requirements are largely compatible, each vocabulary must
be clearly identified as \emph{either} giving SKOS concepts, RDFS
classes or RDF properties so clients know how to extract word lists and
hierarchies; see sect.~\ref{sect:genprop}
for details.
\subsection{SKOS Vocabularies}
\label{sect:skosvoc}
SKOS vocabularies should be used where terms are organised
in informal (i.e., not necessarily strict is-a)
hierarchies. The classic use case here is query expansion, where, for
instance, a search for ``AGN'' might be expanded to include matches for
``accretion disk'' (under certain circumstances).
The terms in SKOS vocabularies have the RDF type \vocterm{skos:Concept}.
\subsubsection{Properties in SKOS Vocabularies}
\label{sect:skosvoc-prop}
IVOA SKOS vocabularies use the following properties:
\begin{itemize}
\item \vocterm{skos:broader} -- interpreted in the standard SKOS sense.
The reverse property, \vocterm{skos:narrower}, MAY be given, but clients
MUST NOT depend on its presence [this satisfies
requirement~\ref{req:hierarchy}].
\item \vocterm{skos:prefLabel} -- all concepts MUST have an
English-language preferred label, which is an RDF plain literal [by
requirement~\ref{req:mtm}]. No RDF language tag is allowed on the
literal, and only one preferred label is permitted
[these help requirement~\ref{req:nordf}].
\item \vocterm{skos:definition} -- all concepts MUST have a non-trivial
English-language definition. It is obviously impossible to define
``non-trivial'' in a rigorous way; a suggested criterion is that a
domain expert would, given the definition, presumably arrive at a
similar preferred label, and recursive definitions (i.e., those using
the label itself) should be avoided whenever possible. Definitions in
non-English languages are not permitted, and only one definition is
permitted [again, this helps requirement~\ref{req:mtm}].
\item \vocterm{skos:exactMatch} -- for externally managed vocabularies
the IVOA has endorsed (see sect.~\ref{sect:externally-managed}), this
property links the IVOA term (subject) to the external RDF resource
(object) [mostly for requirement~\ref{req:external}].
\item General properties discussed in sect.~\ref{sect:genprop} [this is
for requirements~\ref{req:deprecating} and
\ref{req:preliminary}]. The \vocterm{ivoasem:vocflavour} of these
vocabularies is \verb|SKOS|.
\end{itemize}
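Because \vocterm{skos:narrower} may be missing, a client working on the
RDF itself has to derive the downward direction from
\vocterm{skos:broader}. The following Python sketch does this and also
checks the cardinality rules for \vocterm{skos:prefLabel} and
\vocterm{skos:definition}. It assumes the vocabulary has already been
parsed into (subject, predicate, object) tuples of plain strings (for
instance by an RDF library), so it cannot see language tags; the
function names are hypothetical.

\begin{lstlisting}[language=python]
SKOS = "http://www.w3.org/2004/02/skos/core#"

def skos_hierarchy(triples):
    """Derive wider/narrower mappings from skos:broader statements only."""
    wider, narrower = {}, {}
    for subject, predicate, obj in triples:
        if predicate == SKOS + "broader":
            wider.setdefault(subject, set()).add(obj)
            # build narrower as the inverse of broader rather than
            # relying on skos:narrower statements being present
            narrower.setdefault(obj, set()).add(subject)
    return wider, narrower

def label_problems(triples, concepts):
    """Report concepts that lack exactly one skos:prefLabel or
    exactly one skos:definition."""
    problems = []
    for concept in concepts:
        for prop in ("prefLabel", "definition"):
            count = sum(1 for subject, predicate, _ in triples
                        if subject == concept and predicate == SKOS + prop)
            if count != 1:
                problems.append(
                    f"{concept}: {count} skos:{prop} statements, expected 1")
    return problems
\end{lstlisting}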
This specification does not include requirements on the use or the
interpretation of \vocterm{skos:related},
\vocterm{skos:closeMatch}, \vocterm{skos:broadMatch},
\vocterm{skos:narrowMatch}, \vocterm{skos:ConceptScheme},
\vocterm{skos:inScheme}, \vocterm{skos:hasTopConcept},
\vocterm{skos:altLabel}, and \vocterm{skos:hiddenLabel}. If use cases
are found that require those, this specification will be amended. Until
then, vocabulary authors SHOULD NOT use them in order to avoid creating
practices that might conflict with later usage patterns.
This specification does not include requirements on the use or the
interpretation of the transitive SKOS properties
(\vocterm{skos:broaderTransitive}, \vocterm{skos:narrowerTransitive}).
At this point, we believe that applications requiring this type of
reasoning-friendly semantics should preferably use RDF class
vocabularies.
\subsubsection{Example (non-normative)}