Material for a SPRINT on Versioning
This page collects preparatory material for group discussion about Versioning. It is a work in progress, and it does not describe an official view of the group.
Dataset (and other DCAT) versioning provides a discussion of the issues about Versioning, which might need to be refreshed.
A short compendium of existing vocabularies in respect to Versioning.
The pros and cons are a starting point to help the group compare existing solutions, and to understand whether one specific vocabulary can be adopted for modeling versioning or whether we need to provide new terms in DCAT.
The following table lists terms from relevant vocabularies, tentatively grouped by versioning aspects. Terms on the same row are not necessarily equivalent or linked via a super-/sub-property relationship; rather, they provide different options for describing a versioning aspect. Moreover, the listed versioning aspects are just tentative, and meant for discussion - including whether they should be actually considered as related to "versioning" (e.g., the "derivation" aspect may not fit into this category, as it is very generic, and it covers also cases that are usually not considered in scope with versioning).
References: https://www.w3.org/TR/owl-ref/#versionInfo-def
OWL offers a set of properties for versioning. The ontology-import construct owl:imports and the ontology-versioning constructs owl:priorVersion, owl:backwardCompatibleWith and owl:incompatibleWith are defined in the OWL vocabulary as instances of the OWL built-in class owl:OntologyProperty. Instances of owl:OntologyProperty must have the class owl:Ontology as their domain and range.
The only property that does not assume owl:Ontology as its domain is owl:versionInfo:
- owl:versionInfo: An owl:versionInfo statement generally has as its object a string giving information about this version. This statement does not contribute to the logical meaning of the ontology other than that given by the RDF(S) model theory. Although this property is typically used to make statements about ontologies, it may be applied to any OWL construct. For example, one could attach an owl:versionInfo statement to an OWL class.
- (cons) With the exception of owl:versionInfo, the versioning properties are meant for ontologies.
- (pros) owl:versionInfo comes from a solid Recommendation and is already suggested for representing version information in a wider context (see, for example, the versioning examples in Data on the Web Best Practices).
DCTerms provides the following terms to express information pertaining to Versioning:
- dcterms:hasVersion links a related resource that is a version, edition, or adaptation of the described resource. Changes in version imply substantive changes in content rather than differences in format. This property is intended to be used with non-literal values. This property is an inverse property of dcterms:isVersionOf.
- dcterms:replaces links a related resource that is supplanted, displaced, or superseded by the described resource. This property is intended to be used with non-literal values. This property is an inverse property of dcterms:isReplacedBy.
All the above are sub-property of
- (Pros) DCTERMS is a prominent, long-lived vocabulary that is widely adopted and extended.
- (Pros) A non-normative mapping with PROV exists, which might also provide a bridge to qualified relations in PROV.
- (Cons) dcterms:hasVersion does not seem to distinguish between an adaptation and a version in the sense of a "new release". As a result, the mapping of DCTERMS to PROV defines dcterms:isVersionOf as a super-property of prov:wasRevisionOf, since a version can be something other than a revision (e.g., "West Side Story" is a version (adaptation) of "Romeo and Juliet").
- (cons) missing terms for version information
PROV provides the following terms to express information pertaining to Versioning:
- prov:wasRevisionOf indicates a revision, which is a derivation for which the resulting entity is a revised version of some original. The implication here is that the resulting entity contains substantial content from the original. Revision is a particular case of derivation.
- It is a subproperty of prov:wasDerivedFrom and the inverse of prov:hadRevision;
- it can be qualified by means of prov:Revision and prov:qualifiedRevision.
- (Pros) PROV is a well-known W3C Recommendation;
- (Pros) PROV offers prov:wasRevisionOf, which can also be qualified by using prov:Revision and prov:qualifiedRevision. It is coherently defined as a subproperty of prov:wasDerivedFrom, which is in turn a subproperty of prov:wasInfluencedBy (qualified via prov:qualifiedInfluence);
- (Pros) A non-normative mapping with DCTERMS exists;
- (???) Perhaps we need to discuss how prov:wasRevisionOf fits with dcat:Relationships?
Resources:
- Ciccarese, P., Soiland-Reyes, S., Belhajjame, K., Gray, A.J.G., Goble, C., Clark, T.: PAV ontology: provenance, authoring and versioning. 1–22 (2013).
- https://pav-ontology.github.io/pav/
PAV distinguishes between long-living unversioned resources and more specific "versioned resources".
For versioned resources, PAV provides the following properties:
- pav:previousVersion indicates the previous version of a resource in a lineage. For instance, a news article updated to correct factual information would point to the previous version of the article with pav:previousVersion. If however, the content has significantly changed so that the two resources no longer share lineage (say a new article that talks about the same facts), they can instead be related using pav:derivedFrom.
- pav:hasEarlierVersion indicates that a versioned resource has an earlier version. It is a superproperty of pav:previousVersion, which SHOULD be used if the earlier version is the direct ancestor of this version. pav:hasEarlierVersion is transitive, so it should not be necessary to repeat the earlier versions of an earlier version. A chain of previous versions can be declared using the subproperty pav:previousVersion, implying that the previous version is also an earlier version. It might however still be useful to declare an earlier version explicitly, for instance because it is an earlier version of high relevance or because the complete chain of pav:previousVersion is not available.
- pav:derivedFrom indicates derivation from a different resource. Derivation concerns itself with derived knowledge. If this resource has the same content as the other resource, but has simply been transcribed to fit a different model (like XML -> RDF or SQL -> CSV), use pav:importedFrom. If a resource was simply retrieved, use pav:retrievedFrom. If the content has however been further refined or modified, pav:derivedFrom should be used.
- pav:version to provide a resource with a version number.
pav:previousVersion is normally used in a functional way, although PAV does not formally restrict this. Earlier versions that are not direct ancestors of this resource may instead be provided using the superproperty pav:hasEarlierVersion. A version number of this resource can be provided using the data property pav:version. To indicate that this version is a snapshot of a more general, non-versioned resource, e.g. "Weather Today" vs. "Weather Today on 2013-12-07", see pav:hasVersion. Note that it might be confusing to indicate pav:previousVersion from a resource that also has pav:hasVersion or pav:hasCurrentVersion relations, as such resources are intended to be long-living and "unversioned", while pav:previousVersion is intended for use between permalink-like "snapshots" arranged in a linear history.
For unversioned long-living resources, PAV provides the following properties:
- pav:hasVersion links a more specific, versioned resource. This property is intended for relating a non-versioned or abstract resource to several versioned resources, e.g. snapshots.
- pav:hasCurrentVersion links a more specific, versioned resource with equivalent content. (???)
pav:hasCurrentVersion is intended for relating a non-versioned or abstract resource to a single snapshot that can be used as a permalink to indicate the current version of the content.
For instance, if today is 2013-12-25, then a News page can indicate a corresponding snapshot resource which will refer to the news as it was on 2013-12-25.
<http://news.example.com/> pav:hasCurrentVersion <http://news.example.com/2013-12-25/> .
"Equivalent content" is a loose definition, for instance the snapshot resource might include additional information to indicate it is a snapshot, and is not required to be immutable.
Other versioned resources indicating the content at earlier times MAY be indicated with the super property pav:hasVersion, one of which MAY be related to the current version using pav:hasCurrentVersion:
<http://news.example.com/2013-12-25/> pav:previousVersion <http://news.example.com/2013-12-24/> .
<http://news.example.com/> pav:hasVersion <http://news.example.com/2013-12-23/> .
Note that it might be confusing to also indicate pav:previousVersion from a resource that has pav:hasCurrentVersion relations, as such a resource is intended to be a long-living "unversioned" resource. The PAV ontology does not, however, formally restrict this, to cater for more complex scenarios with multiple abstraction levels.
Similarly, it would normally be incorrect to indicate a pav:hasCurrentVersion from an older version; instead the current version would be found by finding the non-versioned resource that the particular resource is a version of, and then its current version.
This property is normally used in a functional way, although PAV does not formally restrict this.
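The lookup described above (from an older snapshot, go up to the non-versioned resource, then follow pav:hasCurrentVersion) can be sketched in a few lines. This is only an illustration, assuming triples are held as plain Python tuples and using the example URIs from this section; the function name `current_versions` is hypothetical and not part of PAV. It returns a set because PAV does not formally make pav:hasCurrentVersion functional.

```python
PAV = "http://purl.org/pav/"
HAS_VERSION = PAV + "hasVersion"
HAS_CURRENT_VERSION = PAV + "hasCurrentVersion"
NEWS = "http://news.example.com/"

# (subject, predicate, object) tuples mirroring the three statements above.
TRIPLES = {
    (NEWS, HAS_CURRENT_VERSION, NEWS + "2013-12-25/"),
    (NEWS, HAS_VERSION, NEWS + "2013-12-23/"),
    (NEWS + "2013-12-25/", PAV + "previousVersion", NEWS + "2013-12-24/"),
}


def current_versions(snapshot, triples):
    """Return the current version(s) of the resource that `snapshot` is a version of."""
    # Step 1: find the non-versioned resource(s) pointing at this snapshot.
    # pav:hasCurrentVersion is a subproperty of pav:hasVersion, so both links count.
    version_links = (HAS_VERSION, HAS_CURRENT_VERSION)
    parents = {s for s, p, o in triples if o == snapshot and p in version_links}
    # Step 2: follow pav:hasCurrentVersion from those resources.
    return {o for s, p, o in triples if s in parents and p == HAS_CURRENT_VERSION}
```

With the data above, the 2013-12-23 snapshot resolves to the 2013-12-25 snapshot. The 2013-12-24 snapshot resolves to nothing, because no pav:hasVersion link to it has been stated; this is one reason it may be worth asserting pav:hasVersion for every snapshot.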
- (pros) It takes PROV and DCTERMS into account and relates to them.
- (pros/cons) It ideally divides long-living, unversioned resources from versioned resources. pav:hasVersion and pav:hasCurrentVersion relate the unversioned resource to the versioned ones, and are sub-properties of dcterms:hasVersion. It is not clear that this division makes full sense.
- (cons) The distinction between the PAV properties (pav:previousVersion, pav:derivedFrom) and their PROV counterparts is not very clear. What do the PAV properties add to PROV's? Also, PAV has no qualified counterparts.
- (???) PAV provides different kinds of roles for contributors, which are embedded in specific properties (e.g. pav:authoredBy, pav:curatedBy). I wonder whether these roles can conflict with roles provided by other metadata standards, as to some extent the roles are community-dependent.
- (cons) Due to the previous assumption, it ignores dcterms:replaces, which links a related resource that is supplanted, displaced, or superseded by the described resource.
- (cons) It is not a "standard", and its adoption is uncertain. Can we suggest it as a non-normative reference? The namespace is provided by purl; perhaps we might replicate the pattern in DCAT, offering a stable home for the relations and defining equivalence or sub-property links to the originals?
- (cons) We might use PAV for versioning without being very interested in its other features, such as the PAV roles. Taking some parts of PAV and not others might result in a quite confusing picture for adopters.
ADMS provides these terms to express information pertaining to Versioning:
- adms:prev: A link to the previous version of the Asset; subproperty of xhv:prev, range rdfs:Resource
- adms:next: A link to the next version of the Asset; subproperty of xhv:next, range rdfs:Resource
- adms:last: A link to the current or latest version of the Asset; subproperty of xhv:last, range rdfs:Resource
- adms:versionNotes: A description of changes between this version and the previous version of the Asset; range rdfs:Literal
- (Pros) It is simple, and it provides adms:versionNotes, which might provide a base for meeting Issue #89.
- (Pros) It is offered in the W3C family, which implies the same stability that we have for DCAT.
- (Pros) DCAT already uses some ADMS terms for identifiers.
- (Cons) It is not a REC but a Working Group Note, so we might not want to have ADMS terms in the normative part.
- (Cons) For the part related to versioning, it does not directly relate to DCTERMS and PROV.
- ...
References: https://vocab.org/frbr/core#
FRBR relies on a conceptual model that consists of four levels (Work, Expression, Manifestation, Item), represented as four disjoint classes:
- frbr:Work: An abstract notion of an artistic or intellectual creation.
- frbr:Expression: A realization of a single work usually in a physical form.
- frbr:Manifestation: The physical embodiment of one or more expressions.
- frbr:Item: An exemplar of a single manifestation.
Applying certain versioning properties to an object implies that it belongs to a certain class.
Examples of properties that imply being a frbr:Expression:
- frbr:translation: Having a frbr:translation implies being something that, amongst other things, is a frbr:Expression. So does its inverse, frbr:translationOf.
- frbr:revision: Having a frbr:revision implies being something that, amongst other things, is a frbr:Expression. So does its inverse, frbr:revisionOf (https://vocab.org/frbr/core#term-revisionOf).
Examples of properties that imply being a frbr:Manifestation:
- frbr:alternate: Having a frbr:alternate implies being something that, amongst other things, is a frbr:Manifestation. So does its inverse, frbr:alternateOf.
Examples of properties that imply being either a frbr:Work or a frbr:Expression:
- frbr:successor: Having a frbr:successor implies being something that, amongst other things, is a frbr:Work or a frbr:Expression.
- (??) FRBR is for bibliographic records.
- (??) It relies on a four-level structure (Work, Expression, Manifestation, Item) which may be subject to interpretation when implementing it. For example, there is more than one mapping in the case of sensor-based scientific data (e.g., see the RDA document on principles and best practices for data versioning). How do these levels map to DCAT? Is there one possible mapping to DCAT, or more than one? Might the mapping depend on the specific application context and data management guidelines?
References: https://github.com/UKGovLD/registry-core/wiki/Principles-and-concepts
DataCite Metadata Working Group: DataCite Metadata Schema Documentation for the Publication and Citation of Research Data. Version 4.3. (2019). https://schema.datacite.org/meta/kernel-4.3/
DataCite provides these terms to express information pertaining to Versioning:
- HasVersion: indicates A has a version (B). The registered resource such as a software package or code repository has a versioned instance (indicates A has the instance B) e.g. it may be used to relate an un-versioned code repository to one of its specific software versions.
- IsVersionOf: indicates A is a version of B. The registered resource is an instance of a target resource (indicates that A is an instance of B) e.g. it may be used to relate a specific version of a software package to its software code repository.
- IsNewVersionOf: indicates A is a new edition of B, where the new edition has been modified or updated.
- IsPreviousVersionOf: indicates A is a previous edition of B.
Other terms that might be relevant:
- IsVariantFormOf
- IsOriginalFormOf
- IsObsoletedBy
- Obsoletes
- (pros) It relates in some way to DCTERMS and PAV, though I haven't found an explicit mapping (do the same relation names imply the same semantics?)
- (cons) It is expressed in XML, not in RDF
- (cons) There is no mapping with PROV
The following mappings are copied from https://www.w3.org/TR/prov-dc/#provenance-in-dublin-core
- dcterms:hasVersion rdfs:subPropertyOf prov:hadRevision. Inverse property of dcterms:isVersionOf.
- prov:wasRevisionOf rdfs:subPropertyOf dcterms:isVersionOf. prov:wasRevisionOf is more restrictive in the sense that it refers to a revised version of a resource, while dcterms:isVersionOf involves versions, editions or adaptations of the original resource. As an example, "West Side Story" is a version (adaptation) of "Romeo and Juliet", but not a revision.
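To make the direction of the second mapping concrete, here is a minimal sketch of how an rdfs:subPropertyOf statement is applied. It assumes triples are held as plain Python tuples with prefixed names as strings; the resource names (`ex:report-v2` and so on) and the function `apply_subproperties` are hypothetical, and a real deployment would use an RDFS reasoner instead.

```python
# Sub-property mapping taken from the list above: sub-property -> super-property.
SUBPROPERTY_OF = {
    "prov:wasRevisionOf": "dcterms:isVersionOf",
}


def apply_subproperties(triples):
    """Return the triples plus those entailed by the sub-property mapping."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in SUBPROPERTY_OF:
            # Every use of the sub-property is also a use of its super-property.
            inferred.add((subject, SUBPROPERTY_OF[predicate], obj))
    return inferred


# Hypothetical data: a revision and an adaptation.
data = {
    ("ex:report-v2", "prov:wasRevisionOf", "ex:report-v1"),
    ("ex:WestSideStory", "dcterms:isVersionOf", "ex:RomeoAndJuliet"),
}
result = apply_subproperties(data)
```

The revision gains a dcterms:isVersionOf statement, while the adaptation does not gain a prov:wasRevisionOf statement: the entailment only goes from the more restrictive property to the more general one.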
There is a relation between two resources when the former replaces or displaces the latter. However, we can't always assume the replacement is derived from the former resource, because the replacement could have existed and been generated independently from the original (for example, if a catalog replaces a book entitled "Introduction to provenance" with one entitled "Provenance in a nutshell"). Therefore the "replace" Activity uses a specialization of the replaced entity (_:oldEntity) and generates a specialization of the replacement (_:newEntity). These specializations model the aspect of the resource which is the subject of replacement; thus, _:newEntity was derived from _:oldEntity.
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dcterms: <http://purl.org/dc/terms/>

CONSTRUCT {
  ?document a prov:Entity .
  ?document2 a prov:Entity .
  _:activity a prov:Activity, prov:Replace ;
    prov:used _:oldEntity .
  # The "input": the aspect of the replaced resource
  _:oldEntity a prov:Entity ;
    prov:specializationOf ?document2 .
  # The "output": the aspect of the replacing resource
  _:newEntity a prov:Entity ;
    prov:specializationOf ?document ;
    prov:wasGeneratedBy _:activity ;
    prov:wasDerivedFrom _:oldEntity ;
    prov:alternateOf _:oldEntity .
} WHERE {
  ?document dcterms:replaces ?document2 .
}
The term dcterms:isReplacedBy would produce a similar mapping, inverting the roles of document and document2.
DCAT 2 already acknowledges that:
- Versioning can be applied to any of the first-class DCAT resources, including Catalogs, Datasets, and Distributions.
- The notion of version is very much related to community practices, the data management policy and the workflows in place. It is up to data providers to decide when and why a new version should be released. For this reason, DCAT refrains from providing definitions or rules about when changes in a resource should result in a new release of it.
- Versioning may be understood as involving relationships between datasets, which is supported by dcat:qualifiedRelation and described in § 13.2 Relationships between datasets and other resources. The class dcat:Relationship supports providing information about the relationship and could be extended for versioning information.
Ideally, DCAT 3 might want to consider further desiderata, for example:
- Desiderata 1: DCAT should acknowledge and interoperate with vocabularies (DCTERMS, ADMS, PAV, PROV) which are already in use.
- Desiderata 2: Both unqualified and qualified relations might be required for versioning. The idea could be that unqualified relations provide a quick way to assert minimal facts about versioning, while qualified relations allow more extensive descriptions involving n-ary relations.
- Desiderata 3: DCAT 3 should provide feasible "mechanics" (possibly the simplest implementable within SPARQL or OWL reasoning?!?) to answer the following competency questions:
- Is X a more updated version than Y?
- What is the latest version of Y? Is X the most updated version of Y?
- What is the difference between version X1 and X2?
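As a rough feasibility check, the three competency questions can be answered with very little machinery once each version links to its direct predecessor (as pav:previousVersion or adms:prev would) and carries change notes (in the spirit of adms:versionNotes). The sketch below is an illustration only, not a proposed DCAT mechanism: it assumes a linear history, and the identifiers, notes and function names are all hypothetical.

```python
# Hypothetical linear history: each version maps to its direct predecessor.
previous = {
    "ex:dataset-v3": "ex:dataset-v2",
    "ex:dataset-v2": "ex:dataset-v1",
}
# Hypothetical change notes attached to each version.
notes = {
    "ex:dataset-v2": "Added 2019 observations.",
    "ex:dataset-v3": "Corrected units in column 4.",
}


def ancestors(version):
    """Yield all earlier versions of `version`, nearest first."""
    seen = set()
    current = previous.get(version)
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        yield current
        current = previous.get(current)


def is_more_recent(x, y):
    """Is X a more updated version than Y?"""
    return y in ancestors(x)


def latest_version(y):
    """What is the latest version of Y? Walk the predecessor links backwards."""
    successor = {old: new for new, old in previous.items()}
    seen = {y}
    current = y
    while current in successor and successor[current] not in seen:
        current = successor[current]
        seen.add(current)
    return current


def changes_between(older, newer):
    """What is the difference between two versions? Collect notes, oldest first."""
    if not is_more_recent(newer, older):
        raise ValueError("the first argument must be an earlier version of the second")
    path = [newer]
    for version in ancestors(newer):
        if version == older:
            break
        path.append(version)
    return [notes[v] for v in reversed(path) if v in notes]
```

The same walks could be written as SPARQL property paths over the chosen predecessor property; the point here is only that a single "previous version" link plus per-version notes is enough to cover the three questions for linear histories. Branching histories would need more than this.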
We might want to discuss and build upon distinct technical solutions:
Solution A: Use PAV as it is, though there are many cons to discuss and overcome.
Solution B: Cherry-pick from different vocabularies. DCAT could mint terms modelled on existing vocabularies and harmonize them. For example, we could create three DCAT properties equivalent to PAV ones (dcat:previousVersion, dcat:hasEarlierVersion, dcat:derivedFrom) and one dcat:versionNotes which mimics adms:versionNotes. This would solve the issue of terms from vocabularies that are not standard or stable enough, in case we want to include terms for versioning in the normative part of DCAT, and it might also keep the best of each approach.
Solution C: Use PROV properties (prov:wasRevisionOf, prov:wasDerivedFrom), which have a qualified counterpart, as a starting point, and complement them with light guidelines and references about how to express semantic versioning (e.g., by owl:versionInfo), differences between versions (e.g., by adms:versionNotes), etc.
In any case, in order to meet Desiderata 1, we might consider providing mappings to relate the chosen DCAT solution to other highly adopted vocabularies.
For Issue #90 (Version definition): this is not DCAT's job, since the definition is domain- and community-dependent, as already stated in DCAT 2. Perhaps some general suggestions might be provided, relying on DWBP.
For Issue #92 (Version identifier): depending on the vocabulary chosen, we can adopt owl:versionInfo or pav:version to express a version string in line with semantic versioning.
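One practical consequence of storing the version identifier as a string (whether in owl:versionInfo or pav:version) is that plain string comparison does not order versions correctly. A minimal sketch, assuming identifiers of the plain MAJOR.MINOR.PATCH form only (pre-release and build suffixes are deliberately not handled); the function name `parse_version` is hypothetical.

```python
def parse_version(text):
    """Parse a 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    parts = text.strip().split(".")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError("not a MAJOR.MINOR.PATCH version string: %r" % text)
    return tuple(int(part) for part in parts)


# Hypothetical version strings as they might appear as literal values.
versions = ["1.9.3", "1.10.0", "1.2.0"]

# Sorting on the parsed tuples compares numerically, component by component.
ordered = sorted(versions, key=parse_version)

# Comparing the raw strings instead would place "1.10.0" before "1.9.3".
naive = sorted(versions)
```

Any guideline that recommends semantic versioning strings might therefore also want to say how consumers are expected to compare them.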