Material for a SPRINT on Versioning

Riccardo Albertoni edited this page May 27, 2020 · 35 revisions

This page collects preparatory material for the group discussion about versioning. All pertinent issues are included in the versioning GitHub project. This is a work in progress and does not describe an official view of the group.

The page "Dataset (and other DCAT) versioning" discusses the issues around versioning, although it might need a refresh.

1 - Overview of existing vocabularies

A short compendium of existing vocabularies with respect to versioning.

The pros and cons are a starting point to help the group compare existing solutions and understand whether one specific vocabulary can be adopted for modeling versioning or whether new terms need to be provided in DCAT.

1.1 Summary table

The following table lists terms from relevant vocabularies, tentatively grouped by versioning aspects. Terms on the same row are not necessarily equivalent or linked via a super-/sub-property relationship; rather, they provide different options for describing a versioning aspect. Moreover, the listed versioning aspects are just tentative, and meant for discussion - including whether they should be actually considered as related to "versioning" (e.g., the "derivation" aspect may not fit into this category, as it is very generic, and it covers also cases that are usually not considered in scope with versioning).

Versioning aspect OWL DCTERMS PROV-O PAV ADMS FRBR (TBD) Registry / Version DataCite
Version information owl:versionInfo pav:version owl:versionInfo owl:versionInfo Version
adms:versionNotes
Resource status / lifecycle owl:DeprecatedClass, owl:DeprecatedProperty adms:status reg:status
dcterms:dateAccepted dcterms:dateAccepted Date/Accepted
dcterms:available Date/Available
dcterms:dateCopyrighted Date/Copyrighted
dcterms:created Date/Created
dcterms:issued dcterms:issued Date/Created
dcterms:dateSubmitted dcterms:dateSubmitted Date/Submitted
dcterms:modified dcterms:modified dcterms:modified Date/Updated
dcterms:valid version:interval Date/Valid
Date/Collected
Date/Withdrawn
Variation prov:alternateOf prov:alternateOf adms:translation frbr:translation
frbr:translationOf
frbr:alternateOf IsVariantFormOf
frbr:alternate IsOriginalFormOf
Earlier/later versions dcterms:hasVersion prov:hadRevision dcterms:hasVersion frbr:revision HasVersion
prov:hadRevision
pav:hasVersion
pav:hasCurrentVersion version:currentVersion
pav:hasEarlierVersion
owl:priorVersion pav:previousVersion adms:prev IsNewVersionOf
dcterms:isVersionOf prov:wasRevisionOf prov:wasRevisionOf frbr:revisionOf dcterms:isVersionOf IsVersionOf
adms:next IsPreviousVersionOf
adms:last
Compatibility owl:backwardCompatibleWith
owl:incompatibleWith
Supersession dcterms:replaces dcterms:replaces Obsoletes
reg:predecessor
dcterms:isReplacedBy dcterms:isReplacedBy IsObsoletedBy
reg:successor
Derivation dcterms:source prov:wasDerivedFrom pav:derivedFrom IsDerivedFrom
prov:hadDerivation IsSourceOf

1.2 OWL

References: https://www.w3.org/TR/owl-ref/#versionInfo-def

OWL offers a set of properties for versioning: owl:imports, owl:priorVersion, owl:backwardCompatibleWith and owl:incompatibleWith. The ontology-import construct owl:imports and the ontology-versioning constructs owl:priorVersion, owl:backwardCompatibleWith and owl:incompatibleWith are defined in the OWL vocabulary as instances of the OWL built-in class owl:OntologyProperty. Instances of owl:OntologyProperty must have the class owl:Ontology as their domain and range.

The only property that does not assume owl:Ontology as its domain is owl:versionInfo:

  • owl:versionInfo: An owl:versionInfo statement generally has as its object a string giving information about this version. This statement does not contribute to the logical meaning of the ontology other than that given by the RDF(S) model theory. Although this property is typically used to make statements about ontologies, it may be applied to any OWL construct. For example, one could attach an owl:versionInfo statement to an OWL class.

Pros and cons

  • (cons) With the exception of owl:versionInfo, the properties for versioning are meant for ontologies.
  • (pros) owl:versionInfo comes from a solid Recommendation and is already suggested for representing version information in a wider context (see, for example, the versioning examples in the Data on the Web Best Practices).

1.3 DCTerms

DCTerms provides the following terms to express information pertaining to Versioning:

  • dcterms:hasVersion links a related resource that is a version, edition, or adaptation of the described resource. Changes in version imply substantive changes in content rather than differences in format. This property is intended to be used with non-literal values. This property is an inverse property of dcterms:isVersionOf.
  • dcterms:replaces links a related resource that is supplanted, displaced, or superseded by the described resource. This property is intended to be used with non-literal values. This property is an inverse property of dcterms:isReplacedBy.

All the above are sub-property of

Pros and Cons:

  • (Pros) DCTERMS is a prominent, long-standing vocabulary that is widely adopted and extended.
  • (Pros) A non-normative mapping to PROV exists, which might also provide a bridge to the qualified relations in PROV.
  • (Cons) dcterms:hasVersion does not seem to distinguish between an adaptation and a version in the sense of a "new release". As a result, the mapping of DCTERMS to PROV defines its inverse, dcterms:isVersionOf, as a super-property of prov:wasRevisionOf, since a version can be something other than a revision (e.g., "West Side Story" is a version (adaptation) of "Romeo and Juliet").
  • (Cons) Terms for version information are missing.

1.4 PROV

PROV provides the following terms to express information pertaining to Versioning:

  • prov:wasRevisionOf indicates a revision, which is a derivation for which the resulting entity is a revised version of some original. The implication here is that the resulting entity contains substantial content from the original. Revision is a particular case of derivation.

Pros and Cons

1.5 PAV

Resources:

  • Ciccarese, P., Soiland-Reyes, S., Belhajjame, K., Gray, A.J.G., Goble, C., Clark, T.: PAV ontology: provenance, authoring and versioning. 1–22 (2013).
  • https://pav-ontology.github.io/pav/

PAV distinguishes between long-living, unversioned resources and more specific "versioned resources".

For versioned resources, PAV provides the following properties:

  • pav:previousVersion indicates the previous version of a resource in a lineage. For instance, a news article updated to correct factual information would point to the previous version of the article with pav:previousVersion. If however, the content has significantly changed so that the two resources no longer share lineage (say a new article that talks about the same facts), they can instead be related using pav:derivedFrom.
  • pav:hasEarlierVersion indicates that a versioned resource has an earlier version. It is a superproperty of pav:previousVersion, which SHOULD be used if the earlier version is the direct ancestor of this version. pav:hasEarlierVersion is transitive, so it should not be necessary to repeat the earlier versions of an earlier version. A chain of previous versions can be declared using the subproperty pav:previousVersion, implying that the previous version is also an earlier version. It might however still be useful to declare an earlier version explicitly, for instance, because it is an earlier version of high relevance or because the complete chain of pav:previousVersion is not available.
  • pav:derivedFrom indicates derivation from a different resource. Derivation concerns itself with derived knowledge. If this resource has the same content as the other resource, but has simply been transcribed to fit a different model (like XML -> RDF or SQL -> CSV), use pav:importedFrom. If a resource was simply retrieved, use pav:retrievedFrom. If the content has however been further refined or modified, pav:derivedFrom should be used.
  • pav:version provides a resource with a version number.
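
The relationship between pav:previousVersion and the transitive pav:hasEarlierVersion can be illustrated with a small sketch. It is not part of PAV: triples are modelled as plain Python tuples, the ex: identifiers are hypothetical, and a real implementation would more likely rely on an RDF library or an OWL reasoner.

```python
from collections import deque

PREVIOUS = "pav:previousVersion"
EARLIER = "pav:hasEarlierVersion"


def earlier_versions(resource, triples):
    """Return all versions reachable from `resource` through
    pav:previousVersion or pav:hasEarlierVersion links."""
    # pav:previousVersion is a subproperty of pav:hasEarlierVersion,
    # so both predicates contribute "earlier version" edges.
    edges = {}
    for subject, predicate, obj in triples:
        if predicate in (PREVIOUS, EARLIER):
            edges.setdefault(subject, set()).add(obj)

    # Breadth-first traversal computes the transitive closure.
    found = set()
    queue = deque([resource])
    while queue:
        node = queue.popleft()
        for earlier in edges.get(node, ()):
            if earlier != resource and earlier not in found:
                found.add(earlier)
                queue.append(earlier)
    return found


# Hypothetical snapshots; only the direct links are asserted.
triples = [
    ("ex:news-12-25", PREVIOUS, "ex:news-12-24"),
    ("ex:news-12-24", PREVIOUS, "ex:news-12-23"),
    ("ex:news-12-23", EARLIER, "ex:news-12-01"),
]
print(sorted(earlier_versions("ex:news-12-25", triples)))
```

Only the direct links need to be stated; the remaining earlier versions follow from transitivity.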

pav:previousVersion is normally used in a functional way, although PAV does not formally restrict this. Earlier versions that are not direct ancestors of this resource may instead be provided using the superproperty pav:hasEarlierVersion. A version number of this resource can be provided using the data property pav:version. To indicate that this version is a snapshot of a more general, non-versioned resource, e.g. "Weather Today" vs. "Weather Today on 2013-12-07", see pav:hasVersion. Note that it might be confusing to indicate pav:previousVersion from a resource that also has pav:hasVersion or pav:hasCurrentVersion relations, as such resources are intended to be long-living and "unversioned", while pav:previousVersion is intended for use between permalink-like "snapshots" arranged in a linear history.

For unversioned long-living resources, PAV provides the following properties:

  • pav:hasVersion links a more specific, versioned resource. This property is intended for relating a non-versioned or abstract resource to several versioned resources, e.g. snapshots.
  • pav:hasCurrentVersion links a more specific, versioned resource with equivalent content. (???)

pav:hasCurrentVersion is intended for relating a non-versioned or abstract resource to a single snapshot that can be used as a permalink to indicate the current version of the content.

For instance, if today is 2013-12-25, then a News page can indicate a corresponding snapshot resource which will refer to the news as they were of 2013-12-25.

<http://news.example.com/> pav:hasCurrentVersion <http://news.example.com/2013-12-25/> .

"Equivalent content" is a loose definition, for instance the snapshot resource might include additional information to indicate it is a snapshot, and is not required to be immutable.

Other versioned resources indicating the content at earlier times MAY be indicated with the super property pav:hasVersion, one of which MAY be related to the current version using pav:hasCurrentVersion:

<http://news.example.com/2013-12-25/> pav:previousVersion <http://news.example.com/2013-12-24/> .
<http://news.example.com/> pav:hasVersion <http://news.example.com/2013-12-23/> .

Note that it might be confusing to also indicate pav:previousVersion from a resource that has pav:hasCurrentVersion relations, as such a resource is intended to be a long-living "unversioned" resource. The PAV ontology does however not formally restrict this, to cater for more complex scenarios with multiple abstraction levels.

Similarly, it would normally be incorrect to indicate a pav:hasCurrentVersion from an older version; instead the current version would be found by finding the non-versioned resource that the particular resource is a version of, and then its current version.
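
The two-step lookup described above (from a snapshot to its non-versioned resource, then to that resource's current version) can be sketched as follows. This is an illustration only: triples are plain tuples, the function name is hypothetical, and the data reuses the news example from this section.

```python
HAS_VERSION = "pav:hasVersion"
HAS_CURRENT = "pav:hasCurrentVersion"


def current_versions_for(snapshot, triples):
    """Find the current version(s) of the unversioned resource(s)
    that `snapshot` is a version of."""
    # Step 1: find the unversioned parents. pav:hasCurrentVersion is a
    # subproperty of pav:hasVersion, so either predicate identifies one.
    parents = {
        s for s, p, o in triples
        if o == snapshot and p in (HAS_VERSION, HAS_CURRENT)
    }
    # Step 2: follow pav:hasCurrentVersion from each parent.
    return {
        o for s, p, o in triples
        if s in parents and p == HAS_CURRENT
    }


triples = [
    ("http://news.example.com/", HAS_CURRENT, "http://news.example.com/2013-12-25/"),
    ("http://news.example.com/", HAS_VERSION, "http://news.example.com/2013-12-23/"),
]
print(current_versions_for("http://news.example.com/2013-12-23/", triples))
```

The function returns a set because PAV does not formally require pav:hasCurrentVersion to be functional.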

This property is normally used in a functional way, although PAV does not formally restrict this.

Pros and cons

  • (pros) It considers and relates with Prov and DCterms
  • (pros/cons) In principle, it separates long-living, unversioned resources from versioned resources. pav:hasVersion and pav:hasCurrentVersion relate the unversioned resource to the versioned ones and are sub-properties of dcterms:hasVersion. It is not clear that this division fully makes sense.
  • (cons) The distinction between the PAV properties (pav:previousVersion, pav:derivedFrom) and their PROV counterparts is not very clear. What do the PAV properties add to PROV's? Also, PAV has no qualified counterparts.
  • (???) PAV provides different kinds of roles for contributors, which are embedded in specific properties (e.g. pav:authoredBy, pav:curatedBy). I wonder if these roles can conflict with roles provided by other metadata standards, as to some extent the roles are community-dependent.
  • (cons) Due to the previous assumption, it ignores dcterms:replaces, which links a related resource that is supplanted, displaced, or superseded by the described resource.
  • (cons) It is not a "standard", its adoption is uncertain, and its namespace is provided by purl. Can we suggest it as a non-normative reference? Perhaps we might replicate the pattern in DCAT, offering a stable home for the relations and defining equivalences or sub-property relations to the original terms?
  • (cons) We might use PAV for versioning without being interested in its other features, such as the PAV roles. Taking some parts of PAV and not others might result in a confusing picture for adopters.

1.6 ADMS

ADMS provides these terms to express information pertaining to Versioning:

  • adms:prev: A link to the previous version of the Asset; subproperty of xhv:prev, with range rdfs:Resource
  • adms:next: A link to the next version of the Asset; subproperty of xhv:next, with range rdfs:Resource
  • adms:last: A link to the current or latest version of the Asset; subproperty of xhv:last, with range rdfs:Resource
  • adms:versionNotes: A description of changes between this version and the previous version of the Asset, with range rdfs:Literal
  • adms:status links the status of the Asset or Asset Distribution in the context of a particular workflow process. No domain is specified for this property, so the property can be applied to entities without inferring that they are adms:Asset(s). The range is skos:Concept. As far as I understand, ADMS does not define a built-in code list for status, so the concept scheme can be chosen by adopters.

Pros and cons:

  • (Pros) It is simple, and it provides adms:versionNotes, which might provide a base for meeting Issue #89
  • (Pros) It provides adms:status, which might be a starting point for solving Issue #1238
  • (Pros) It is offered in the W3C family, which implies the same stability that we have for DCAT
  • (Pros) DCAT already uses some ADMS terms for identifiers
  • (Cons) It is not a REC but a Working Group Note, so we might not want to have terms from ADMS in the normative part.
  • (Cons) It does not directly relate to DCTERMS and PROV for the part related to versioning
  • ...

1.7 Functional Requirements for Bibliographic Records (FRBR)

References: https://vocab.org/frbr/core#

FRBR relies on a four-level conceptual model (Work, Expression, Manifestation, Item), represented as four disjoint classes:

  • frbr:Work: An abstract notion of an artistic or intellectual creation.
  • frbr:Expression: A realization of a single work usually in a physical form.
  • frbr:Manifestation: The physical embodiment of one or more expressions.
  • frbr:Item: An exemplar of a single manifestation.

Applying some versioning properties to an object implies belonging to a certain class.

Examples of properties that imply being a frbr:Expression:

Examples of properties that imply being a frbr:Manifestation:

  • frbr:alternate: Having a frbr:alternate implies being something that, amongst other things, is a frbr:Manifestation. So does its inverse frbr:alternateOf.

Examples of properties that imply being either a frbr:Work or a frbr:Expression:

  • frbr:successor: Having a frbr:successor implies being something that, amongst other things, is a frbr:Work or a frbr:Expression.

Pros and cons:

  • (??) FRBR is for bibliographic records.
  • (??) It relies on a four-level structure (Work, Expression, Manifestation, Item), which may be subject to interpretation when implemented outside artistic and intellectual creations. For example, there is more than one mapping in the case of sensor-based scientific data (e.g., see the RDA document on principles and best practices for data versioning). How do these levels map to DCAT? Is there one possible mapping to DCAT or more than one? Might the mapping depend on the specific application context and data management guidelines?

1.8 Registry / Version

References: https://github.com/UKGovLD/registry-core/wiki/Principles-and-concepts

Pros and Cons

  • (pros/cons) Interesting solution that distinguishes between VersionedThing(s) and their Version(s), using an OWL Full version vocabulary. Properties of a versioned thing that are essential to its nature (e.g. its type) are termed rigid, and the version vocabulary provides a version:rigidProperty annotation to declare the rigid properties of a class of versioned things. The rigid properties MAY be stored on the base version:VersionedThing since they don't change. All the properties of a particular version, including the rigid properties, are materialized on each version:Version instance. (see documentation)
  • (pros) It reuses DCTERMS terms such as dct:replaces, dct:isReplacedBy and dct:isVersionOf
  • (pros) It provides a property (reg:status) to record the status and life cycle of items, whose range is the set of skos:Concept values defined in the reg:StatusScheme.
  • (pros) Customized life cycle: It is possible to customize the status life cycle by adding status values and declaring which status values can succeed or precede them.
  • (cons) The versioning vocabulary is in OWL Full?!
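
The customizable life cycle mentioned above, in which each status declares which statuses may follow it, can be pictured with a small sketch. The status names and transitions below are hypothetical placeholders rather than the values defined by the registry vocabulary, and the function names are illustrative.

```python
# Hypothetical status values and allowed transitions; a real registry
# would take them from its own status scheme.
ALLOWED_SUCCESSORS = {
    "ex:submitted": {"ex:accepted", "ex:rejected"},
    "ex:accepted": {"ex:superseded", "ex:retired"},
    "ex:superseded": set(),
    "ex:retired": set(),
    "ex:rejected": set(),
}


def can_follow(current, proposed, allowed=ALLOWED_SUCCESSORS):
    """True if `proposed` is declared as a valid successor of `current`."""
    return proposed in allowed.get(current, set())


def is_valid_history(statuses, allowed=ALLOWED_SUCCESSORS):
    """True if every consecutive pair of statuses is an allowed transition."""
    return all(can_follow(a, b, allowed) for a, b in zip(statuses, statuses[1:]))


print(is_valid_history(["ex:submitted", "ex:accepted", "ex:retired"]))
print(is_valid_history(["ex:retired", "ex:accepted"]))
```

Adding a status then amounts to adding one entry to the table and listing it among the successors of the statuses that may precede it.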

1.9 DataCite

References: DataCite Metadata Working Group: DataCite Metadata Schema Documentation for the Publication and Citation of Research Data. Version 4.3. (2019). https://schema.datacite.org/meta/kernel-4.3/

DataCite provides these terms to express information pertaining to Versioning:

  • HasVersion: indicates A has a version (B). The registered resource such as a software package or code repository has a versioned instance (indicates A has the instance B) e.g. it may be used to relate an un-versioned code repository to one of its specific software versions.
  • IsVersionOf: indicates A is a version of B. The registered resource is an instance of a target resource (indicates that A is an instance of B) e.g. it may be used to relate a specific version of a software package to its software code repository.
  • IsNewVersionOf: indicates A is a new edition of B, where the new edition has been modified or updated.
  • IsPreviousVersionOf: indicates A is a previous edition of B.

Other terms might be relevant:

  • IsVariantFormOf
  • IsOriginalFormOf
  • IsObsoletedBy
  • Obsoletes

Pros and cons

  • (pros) It relates to some extent to DCTERMS and PAV, though I haven't found an explicit mapping (do the same relation names imply the same semantics?)
  • (cons) It is defined in XML, not in RDF (though http://www.sparontologies.net/ontologies/datacite provides an independent RDF translation of a previous version, 3.1)
  • (cons) There is no mapping to PROV

1.10 Mappings DCTerms - Prov

The following mappings are copied from https://www.w3.org/TR/prov-dc/#provenance-in-dublin-core

Mapping dcterms:isVersionOf and dcterms:hasVersion with Prov

  • dcterms:hasVersion rdfs:subPropertyOf prov:hadRevision. It is the inverse property of dcterms:isVersionOf.

  • prov:wasRevisionOf rdfs:subPropertyOf dcterms:isVersionOf. prov:wasRevisionOf is more restrictive in the sense that it refers to a revised version of a resource, while dcterms:isVersionOf involves versions, editions or adaptations of the original resource. As an example, "West Side Story" is a version (adaptation) of "Romeo and Juliet", but not a revision.
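
The practical effect of the sub-property mapping prov:wasRevisionOf rdfs:subPropertyOf dcterms:isVersionOf can be shown with a minimal entailment sketch. It models triples as tuples and implements only the sub-property rule; the ex: resources and the function name are hypothetical.

```python
# The mapping discussed above, as a child -> parent property table.
SUBPROPERTY_OF = {"prov:wasRevisionOf": "dcterms:isVersionOf"}


def with_superproperties(triples, subproperty_of=SUBPROPERTY_OF):
    """Add the triples entailed by rdfs:subPropertyOf: if (s, p, o) holds
    and p is a sub-property of q, then (s, q, o) also holds."""
    result = set(triples)
    for s, p, o in triples:
        seen = {p}
        # Walk up the property hierarchy, guarding against cycles.
        while p in subproperty_of and subproperty_of[p] not in seen:
            p = subproperty_of[p]
            seen.add(p)
            result.add((s, p, o))
    return result


triples = {
    ("ex:report-v2", "prov:wasRevisionOf", "ex:report-v1"),
    ("ex:west-side-story", "dcterms:isVersionOf", "ex:romeo-and-juliet"),
}
for triple in sorted(with_superproperties(triples)):
    print(triple)
```

The inference runs in one direction only: every revision is also reported as a version, whereas the adaptation is not turned into a revision.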

Mapping dcterms:replaces with Prov

There is a relation between two resources when the former replaces or displaces the latter. However, we can't always assume the replacement is derived from the former resource, because the replacement could have existed and been generated independently from the original (for example, if a catalog replaces a book entitled "Introduction to provenance" with one entitled "Provenance in a nutshell"). Therefore the "replace" Activity uses a specialization of the replaced entity (_:oldEntity) and generates a specialization of the replacement (_:newEntity). These specializations model the aspect of the resource which is the subject of replacement; thus, _:newEntity was derived from _:oldEntity.

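PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dcterms: <http://purl.org/dc/terms/>

CONSTRUCT {
  ?document a prov:Entity .
  ?document2 a prov:Entity .

  # The replacement activity
  _:activity a prov:Activity, prov:Replace ;
    prov:used _:oldEntity .

  # The "input": a specialization of the replaced resource
  _:oldEntity a prov:Entity ;
    prov:specializationOf ?document2 .

  # The "output": a specialization of the replacing resource
  _:newEntity a prov:Entity ;
    prov:specializationOf ?document ;
    prov:wasGeneratedBy _:activity ;
    prov:wasDerivedFrom _:oldEntity ;
    prov:alternateOf _:oldEntity .
} WHERE {
  ?document dcterms:replaces ?document2 .
}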

The term dcterms:isReplacedBy would produce a similar mapping, inverting the roles of document and document2.
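
To make the shape of the generated graph easier to follow, the same expansion can be sketched outside SPARQL. This is an illustration under simplifying assumptions: triples are plain tuples, blank nodes are simulated with numbered labels, and the function name and ex: resources are hypothetical.

```python
from itertools import count


def expand_replaces(triples):
    """Expand each dcterms:replaces statement into the PROV pattern of
    the mapping: an activity using a specialization of the old resource
    and generating a specialization of the new one."""
    fresh = count(1)
    out = set()
    for new_doc, predicate, old_doc in sorted(triples):
        if predicate != "dcterms:replaces":
            continue
        n = next(fresh)  # one set of fresh blank nodes per statement
        activity = f"_:activity{n}"
        old_entity = f"_:oldEntity{n}"
        new_entity = f"_:newEntity{n}"
        out |= {
            (new_doc, "rdf:type", "prov:Entity"),
            (old_doc, "rdf:type", "prov:Entity"),
            (activity, "rdf:type", "prov:Activity"),
            (activity, "rdf:type", "prov:Replace"),
            (activity, "prov:used", old_entity),
            (old_entity, "rdf:type", "prov:Entity"),
            (old_entity, "prov:specializationOf", old_doc),
            (new_entity, "rdf:type", "prov:Entity"),
            (new_entity, "prov:specializationOf", new_doc),
            (new_entity, "prov:wasGeneratedBy", activity),
            (new_entity, "prov:wasDerivedFrom", old_entity),
            (new_entity, "prov:alternateOf", old_entity),
        }
    return out


source = {("ex:nutshell", "dcterms:replaces", "ex:introduction")}
for triple in sorted(expand_replaces(source)):
    print(triple)
```

Note that the derivation is asserted between the two specializations, not between the documents themselves, which matches the caveat that the replacement may have been created independently.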

2 Design Considerations

2.1 Desiderata

DCAT 2 already acknowledges that:

  • Versioning can be applied to any first-class DCAT resource, including Catalogs, Datasets and Distributions.
  • The notion of version is very much related to community practices, data management policies and the workflows in place. It is up to data providers to decide when and why a new version should be released. For this reason, DCAT refrains from providing definitions or rules about when changes in a resource should result in a new release of it.
  • Versioning may be understood as involving relationships between datasets, which is supported by the dcat:qualifiedRelation and described in § 13.2 Relationships between datasets and other resources. The class dcat:Relationship supports providing information about the relationship and could be extended for versioning information.

Ideally, DCAT 3 might want to consider further desiderata, for example

  • Desiderata 1: DCAT should acknowledge and interoperate with vocabularies that are already in use (DCTERMS, ADMS, PAV, PROV), possibly providing a sort of lingua franca for versioning.
  • Desiderata 2: Both unqualified and qualified relations might be required for versioning. The idea could be that unqualified relations provide a quick way to assert minimal facts about versioning, while qualified relations allow more extensive descriptions involving n-ary relations.
  • Desiderata 3: DCAT 3 should provide feasible "mechanics" (possibly the simplest implementable within SPARQL or OWL reasoning?!?) to answer the following competency questions:
    • Is X a more updated version than Y?
    • What is the latest version of Y? Is X the most updated version of Y?
    • What is the difference between version X1 and X2?
      • ...
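
As a rough feasibility check for the first two competency questions, the sketch below answers them over a chain of pav:previousVersion links. It assumes that such a chain is available and models triples as tuples; the predicate choice, function names and ex: identifiers are illustrative assumptions, not a proposed DCAT design.

```python
PREVIOUS = "pav:previousVersion"


def predecessors(resource, triples):
    """All versions reachable from `resource` via previous-version links."""
    previous = {}
    for s, p, o in triples:
        if p == PREVIOUS:
            previous.setdefault(s, set()).add(o)
    seen, stack = set(), [resource]
    while stack:
        node = stack.pop()
        for earlier in previous.get(node, ()):
            if earlier not in seen:
                seen.add(earlier)
                stack.append(earlier)
    return seen


def is_more_recent(x, y, triples):
    """Is X a more updated version than Y?"""
    return y in predecessors(x, triples)


def latest_versions(y, triples):
    """What is the latest version of Y? Returns the versions in Y's
    lineage that no other version points to as its previous version."""
    later = {s for s, p, o in triples
             if p == PREVIOUS and y in predecessors(s, triples)}
    has_successor = {o for s, p, o in triples if p == PREVIOUS}
    return (later | {y}) - has_successor


triples = [
    ("ex:dataset-v3", PREVIOUS, "ex:dataset-v2"),
    ("ex:dataset-v2", PREVIOUS, "ex:dataset-v1"),
]
print(is_more_recent("ex:dataset-v3", "ex:dataset-v1", triples))
print(latest_versions("ex:dataset-v1", triples))
```

The third question, about the difference between two versions, is not covered here; it would depend on descriptive properties such as adms:versionNotes rather than on the version graph.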

2.2 Design solutions to explore

We might want to discuss and build upon distinct technical solutions:

Solution A: Use PAV as it is, although there are many cons to discuss and overcome.

Solution B: Cherry-pick from different vocabularies. DCAT could mint terms modeled on existing vocabularies and harmonize them. For example, we could create three DCAT properties equivalent to PAV ones (dcat:previousVersion, dcat:hasEarlierVersion, dcat:derivedFrom) and one dcat:versionNotes that mimics adms:versionNotes. This would address the concern about terms from vocabularies that are not standard or stable enough, in case we want to include terms for versioning in the normative part of DCAT, and it might also keep the best of each approach.

Solution C: Use the PROV properties that have a qualified counterpart (prov:wasRevisionOf, prov:wasDerivedFrom) as a starting point, and complement them with light guidelines on how to express semantic versioning (e.g., with owl:versionInfo), differences between versions (e.g., with adms:versionNotes), etc.

Anyway, in order to meet desiderata 1, we might consider providing mappings to relate the chosen DCAT solution with other highly adopted vocabularies.

For Issue #90 (Version definition): this is not DCAT's job, as the definition is domain- and community-dependent, which is already stated in DCAT 2. Perhaps some general suggestions might be provided, relying on DWBP.

For Issue #92 (Version identifier): depending on the vocabulary chosen, we can adopt owl:versionInfo or pav:version to express a version string in line with semantic versioning.
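
If version strings follow a MAJOR.MINOR.PATCH pattern, they have to be compared numerically rather than as plain strings. The sketch below shows the idea under that assumption; it deliberately ignores pre-release and build-metadata suffixes, and the function names are hypothetical.

```python
def parse_version(text):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of integers.

    Simplified on purpose: pre-release and build suffixes are not handled.
    """
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def latest(versions):
    """Return the highest version string according to numeric ordering."""
    return max(versions, key=parse_version)


# String comparison would rank "1.9.0" above "1.10.0"; tuples do not.
print(latest(["1.9.0", "1.10.0", "1.2.3"]))
```

Whichever property carries the string (owl:versionInfo or pav:version), the ordering logic stays on the consumer side, because both properties hold plain literals.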