From 2b7de8b408c0bb701b56ae94f98135e19bcf7102 Mon Sep 17 00:00:00 2001 From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com> Date: Mon, 4 Nov 2024 18:28:42 +0100 Subject: [PATCH 1/8] move old documentation for disambiguity --- docs/data/{datamodel.md => current-datamodel.md} | 2 +- docs/index.md | 2 +- mkdocs.yml | 6 +++--- 3 files changed, 5 insertions(+), 5 deletions(-) rename docs/data/{datamodel.md => current-datamodel.md} (99%) diff --git a/docs/data/datamodel.md b/docs/data/current-datamodel.md similarity index 99% rename from docs/data/datamodel.md rename to docs/data/current-datamodel.md index cd1c7c85..1372fb6a 100644 --- a/docs/data/datamodel.md +++ b/docs/data/current-datamodel.md @@ -1,4 +1,4 @@ -# Data Model +# Current Data Model All metadata are modelled according to the model as described in the following. diff --git a/docs/index.md b/docs/index.md index af6cb9f5..d5e22889 100644 --- a/docs/index.md +++ b/docs/index.md @@ -10,7 +10,7 @@ as well as the code of the [DSP Metadata Browser](https://meta.dasch.swiss). If you are interested in viewing the metadata in human-readable form, you can visit the [DSP Metadata Browser](https://meta.dasch.swiss). -If you are interested in re-using our metadata, you can find extensive documentation [here](data/datamodel.md). +If you are interested in re-using our metadata, you can find extensive documentation [here](data/current-datamodel.md). The metadata itself can be found [here](https://github.com/dasch-swiss/dsp-meta/tree/main/data/json) or requested over the API as described [here](data/api.md). 
diff --git a/mkdocs.yml b/mkdocs.yml index 27e3f62a..f5544ebf 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -4,7 +4,7 @@ nav: - DSP-META: index.md - Consuming Metadata: - Metadata API: data/api.md - - Data Model: data/datamodel.md + - Current Data Model: data/current-datamodel.md - Adding Metadata: adding-metadata.md - Code Documentation: - Overview: code/overview.md @@ -33,8 +33,8 @@ theme: name: Switch to light mode features: - search.suggest - - navigation.tabs - - navigation.sections + # - navigation.tabs + # - navigation.sections markdown_extensions: - admonition From 92ead29623a243c867a71879cdd126b89e9846b5 Mon Sep 17 00:00:00 2001 From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com> Date: Tue, 5 Nov 2024 16:56:51 +0100 Subject: [PATCH 2/8] set up new model based on the old one --- docs/data/provisional-datamodel.md | 601 +++++++++++++++++++++++++++++ mkdocs.yml | 1 + 2 files changed, 602 insertions(+) create mode 100644 docs/data/provisional-datamodel.md diff --git a/docs/data/provisional-datamodel.md b/docs/data/provisional-datamodel.md new file mode 100644 index 00000000..b5218120 --- /dev/null +++ b/docs/data/provisional-datamodel.md @@ -0,0 +1,601 @@ +# Provisional Data Model + +!!! warning + This document does _not_ represent the current state of the metadata model. + It is a working document for planned upcoming changes to the metadata model. + +!!! note + This model is an idealized version of the metadata model. + With the current implementation that is entirely separate from the DSP, + it is not feasible to implement metadata on the record level. + Such a system may be implemented in the archive in the future, + but for now, we will keep the metadata on the dataset level. + A separate, simplified model for applying some of these changes, + while remaining compatible with the current implementation, + should be created alongside this model. + +## Overview + +The metadata model is a hierarchical structure of metadata elements. 
+ +```mermaid + +flowchart TD + hyper-project[Hyper-Project /
Uber-Project /
Meta-Project /
Compound Project] -->|1-n| project[Project /
Research Project] + project -->|1-n| dataset[Dataset] + dataset -->|1-n| record[Record /
Resource] + project -->|0-n| collection[Collection] + collection --> collection + hyper-project -->|0-n| collection + collection --> record +``` + +- A `Compound Project` is optional and collects one or more `Research Projects`. + It is typically of an institutional nature, + not directly tied to a specific funding grant, + and may be long-lived. + Examples are EKWS/CAS, BEOL or LIMC. +- A `Research Project` is the main entity of the metadata model. + It corresponds to a `project` in the DSP. + It is typically tied to a specific funding grant, + and hence has a limited lifetime of ~3-5 years; + multiple funding rounds and a longer lifetime are possible. + A `Research Project` is part of 0-1 `Compound Project`; + it has 1-n `Datasets` and 0-n `Collections`. +- A `Dataset` is a collection of `Records` within a `Research Project`. + It is mostly meant for system-internal and technical use, + and should not have particular semantics or a "historical meaning" in the context of the project. + A `Dataset` is part of exactly 1 `Research Project` + and contains 1-n `Records`. +- A `Collection` is also a collection of `Records` within a `Research Project`. + It is meant for semantic grouping of `Records` within a `Research Project`, + and may have a "historical meaning" in the context of the project. + Examples may be physical collections such as a person's "Nachlass" in an archive, + or groupings of records based on a specific research question within a project. + A `Collection` is part of at least 1 `Research Project`, `Compound Project` or `Collection`, + but can be part of multiple. It may either contain 0-n `Collections` or 1-n `Records`. +- A `Record` is a single resource within a `Dataset`. + It represents a single entity and is the smallest unit that can meaningfully have an identifier. + It maps to a `knora-base:Resource` (DSP-API) or an `Asset` (SIPI/Ingest) in the DSP. + A `Record` is part of exactly 1 `Dataset` and may be part of 0-n `Collections`.
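The cardinalities in the list above could be checked mechanically. The following is only a minimal sketch under stated assumptions: the class names, the `record_ids` and `collection_ids` fields, and the function `check_hierarchy` are hypothetical and not part of the model (in particular, the model does not yet define a record reference on `Dataset`), and "either 0-n `Collections` or 1-n `Records`" is read here as mutually exclusive.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    id: str
    # Hypothetical field: the provisional model does not (yet) define
    # a reference from a dataset to its records.
    record_ids: list[str]


@dataclass
class Collection:
    id: str
    record_ids: list[str] = field(default_factory=list)
    collection_ids: list[str] = field(default_factory=list)


def check_hierarchy(datasets: list[Dataset], collections: list[Collection]) -> list[str]:
    """Return human-readable violations of the cardinalities sketched above."""
    errors: list[str] = []
    # A Research Project has 1-n Datasets.
    if not datasets:
        errors.append("a Research Project needs at least one Dataset")
    owner: dict[str, str] = {}  # record id -> id of the dataset that contains it
    for ds in datasets:
        # A Dataset contains 1-n Records.
        if not ds.record_ids:
            errors.append(f"dataset {ds.id} contains no records")
        for rid in ds.record_ids:
            # A Record is part of exactly 1 Dataset.
            if rid in owner:
                errors.append(f"record {rid} is in datasets {owner[rid]} and {ds.id}")
            else:
                owner[rid] = ds.id
    for col in collections:
        # Assumed reading: a Collection holds either Collections or Records, not both.
        if col.record_ids and col.collection_ids:
            errors.append(f"collection {col.id} mixes records and collections")
        for rid in col.record_ids:
            if rid not in owner:
                errors.append(f"collection {col.id} refers to unknown record {rid}")
    return errors
```

An empty result means that none of the checked rules is violated; the sketch deliberately leaves out `Compound Project` and the nesting of collections.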
+ +Additionally, there are the entities `Person` and `Organization`: +`Person` and `Organization` are entities that are independent of the `Research Project` hierarchy, +and may be related to various entities within the hierarchy. + + +## Top Level + +A set of metadata consists of the following top-level elements: + +- Compound Project +- Project +- Dataset +- Collection +- Record +- Person +- Organization + +Each of these elements is an entity identified by a unique identifier. +Other elements can refer to these entities by their identifier. + +Any other metadata element may itself be a complex object, +but it is always part of one of the top-level elements. +Such elements do not have an identifier, +but are identified by their position in the hierarchy. + +| Field | Type | Cardinality | +| ----------------- | --------------- | ----------- | +| `$schema` | string | 0-1 | +| `compoundProject` | compoundProject | 0-1 | +| `project` | project | 1 | +| `datasets` | dataset[] | 1-n | +| `collections` | collection[] | 0-n | +| `records` | record[] | 0-n | +| `persons` | person[] | 0-n | +| `organizations` | organization[] | 0-n | + + +## Types + +### Entity Types + +#### Compound Project + +| Field | Type | Cardinality | Restrictions | Remarks | +| ------------------------ | ------------------- | ----------- | ------------------------------------------------------------ | ------------------ | +| `__type` | string | 1 | Literal 'CompoundProject' | | +| `name` | string | 1 | | | +| `url` | url | 1 | | | +| `howToCite` | string | 1 | | Needed? | +| `projects` | id[] | 1-n | String containing the identifier of a project | | +| `description` | lang_string | 0-1 | | Optional? | +| `contactPoint` | id | 0-1 | String containing the identifier of a person or organization | Optional? | +| `keywords` | lang_string[] | 0-n | | Needed? | +| `disciplines` | lang_string / url[] | 0-n | | Needed? | +| `temporalCoverage` | lang_string / url[] | 0-n | | Needed? 
| +| `spatialCoverage` | url[] | 0-n | | Needed? | +| `funders` | id[] | 0-n | String containing the identifier of a person | Needed? | +| `publications` | publication[] | 0-n | | Needed? | +| `grants` | grant[] | 0-n | | Needed? | +| `alternativeNames` | lang_string[] | 0-n | | Needed? | +| `consistingInstitutions` | id[] | 0-n | String containing the identifier of an organization | Makes sense? Name? | + +!!! question + This raises the question of how to deal with multiple projects in a compound project. + We probably want to keep one entry per project, + so this leaves us with either duplicating the compound project metadata for each project, + or keeping the compound project metadata separate and only linking it from the project. + The latter seems preferable, + but then the question arises of who gets to edit the compound project metadata. + For a first implementation, we could simply duplicate the metadata for each project, + and later factor it out. + +!!! important + The properties for `Compound Project` were invented by me on the fly. + That does not mean they are correct or useful.
+ + +#### Project + +| Field | Type | Cardinality | Restrictions | Remarks | +| -------------------- | ------------------- | ----------- | ------------------------------------------------------------ | --------------------- | +| `__type` | string | 1 | Literal "Project" | | +| `shortcode` | string | 1 | 4 char hexadecimal | | +| `status` | string | 1 | Literal "Ongoing" or "Finished" | | +| `name` | string | 1 | | | +| `description` | lang_string | 1 | | | +| `startDate` | date | 1 | String of format "YYYY-MM-DD" | | +| `teaserText` | string | 1 | | | +| `url` | url | 1 | | | +| `howToCite` | string | 1 | | | +| `datasets` | id[] | 1-n | String containing the identifier of a dataset | | +| `keywords` | lang_string[] | 1-n | | | +| `disciplines` | lang_string / url[] | 1-n | | | +| `temporalCoverage` | lang_string / url[] | 1-n | | | +| `spatialCoverage` | url[] | 1-n | | | +| `funders` | id[] | 1-n | String containing the identifier of a person or organization | Does this make sense? | +| `endDate` | date | 0-1 | String of format "YYYY-MM-DD" | | +| `secondaryURL` | url | 0-1 | | | +| `dataManagementPlan` | dmp | 0-1 | | | +| `contactPoint` | id | 0-1 | String containing the identifier of a person or organization | | +| `publications` | publication[] | 0-n | | | +| `grants` | grant[] | 0-n | | Does this make sense? | +| `alternativeNames` | lang_string[] | 0-n | | | + +!!! question + If we can have copyright/license on dataset level, + do we want to have it on project level as well? + +!!! question + Do we still need funders if we have grants? + +!!! question + What about projects that do not have funding? 
+ + +#### Dataset + +| Field | Type | Cardinality | Restrictions | Remarks | +| ------------------- | ----------------- | ----------- | ------------------------------------------------------- | ----------------------------------- | +| `__id` | string | 1 | | | +| `__type` | string | 1 | Literal "Dataset" | | +| `title` | string | 1 | | | +| `accessConditions` | string | 1 | Literal "open", "restricted" or "closed" | change to proper terms | +| `howToCite` | string | 1 | | | +| `status` | string | 1 | Literal "In Planning", "Ongoing", "On hold", "Finished" | not aligned with project status | +| `abstract` | lang_string / url | 1-n | | naming: maybe 'description'? | +| `typeOfData` | string[] | 1-n | Literal "XML", "Text", "Image", "Video", "Audio" | does this still make sense? | +| `licenses` | license[] | 1-n | | should be computed from the records | +| `copyright` | string[] | 1-n | | computed along with license | +| `languages` | lang_string[] | 1-n | | does this make sense? | +| `attributions` | attribution[] | 1-n | | can this be calculated? | +| `datePublished` | date | 0-1 | | | +| `dateCreated` | date | 0-1 | | | +| `dateModified` | date | 0-1 | | | +| `distribution` | url | 0-1 | | does this make sense? | +| `alternativeTitles` | lang_string[] | 0-n | | | +| `urls` | url[] | 0-n | | | +| `additional` | lang_string / url | 0-n | | | + +!!! question + Do we consider datasets something merely "internal"? + If so, do metadata on datasets even make sense at all? Should we even "expose" datasets publicly? + +!!! question + Do we need to store the license on the dataset level, + or can we compute it from the records? + If we store it on the dataset level, + how do we deal with datasets that contain records with different licenses? + +!!! question + Do we need to store the language on the dataset level, + or can we compute it from the records? + If we store it on the dataset level, + how do we deal with datasets that contain records in different languages? + +!!! 
question + Do we need to store the attribution on the dataset level, + or can we compute it from the records? + If we store it on the dataset level, + how do we deal with datasets that contain records with different attributions? + +!!! question + Do we need a reference to the records in the dataset? + + +#### Collection + +| Field | Type | Cardinality | Restrictions | Remarks | +| ------------------ | ----------------- | ----------- | ------------------------------------------------ | -------------------------------------------------------- | +| `__id` | string | 1 | | | +| `__type` | string | 1 | Literal 'Collection' | | +| `name` | string | 1 | | | +| `accessConditions` | string | 1 | Literal "open", "restricted" or "closed" | copied from dataset; change to proper terms | +| `provenance` | string | 0-1 | | | +| `datePublished` | date | 0-1 | | copied from dataset; do we still need those? | +| `dateCreated` | date | 0-1 | | copied from dataset; do we still need those? | +| `dateModified` | date | 0-1 | | copied from dataset; do we still need those? | +| `distribution` | url | 0-1 | | copied from dataset; does this make sense? | +| `records` | id[] | 0-n | Record IDs | can be 0 in case it points to a collection | +| `collections` | id[] | 0-n | Collection IDs | | +| `alternativeNames` | lang_string[] | 0-n | | | +| `keywords` | lang_string[] | 0-n | | does this make sense? | +| `urls` | url[] | 0-n | | copied from dataset; | +| `additional` | lang_string / url | 0-n | | copied from dataset; | +| `description` | string / url | 1-n | | | +| `typeOfData` | string[] | 1-n | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; does this still make sense? | +| `licenses` | license[] | 1-n | | copied from dataset; should be computed from the records | +| `copyright` | string[] | 1-n | | computed along with license | +| `languages` | lang_string[] | 1-n | | copied from dataset; does this make sense? 
| +| `attributions` | attribution[] | 1-n | | copied from dataset; can this be calculated? | + + +!!! important + The properties for `Collection` were invented by me on the fly. + That does not mean they are correct or useful. + + +!!! question + Do we need a reference to the records in the collection? + + +#### Record + +| Field | Type | Cardinality | Restrictions | Remarks | +| ------------------ | ----------- | ----------- | ------------------------------------------------ | -------------------------------------------------------- | +| `__id` | string | 1 | | | +| `__type` | string | 1 | Literal 'Record' | | +| `pid` | id | 1 | | or `ARK`? | +| `label` | lang_string | 1 | | do we want this, or does it go too far? | +| `accessConditions` | string | 1 | Literal "open", "restricted" or "closed" | copied from dataset; change to proper terms | +| `license` | license | 1 | | copied from dataset; should be computed from the records | +| `copyright` | string | 1 | | computed along with license | +| `attribution` | attribution | 1 | | do we want this, or does it go too far? | +| `provenance` | string | 0-1 | | do we want this, or does it go too far? | +| `datePublished` | date | 0-1 | | copied from dataset; do they make sense? | +| `dateCreated` | date | 0-1 | | copied from dataset; do they make sense? | +| `dateModified` | date | 0-1 | | copied from dataset; do they make sense? | +| `typeOfData` | string | 0-1 | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; wanted? what values? | + +!!! important + The properties for `Record` were invented by me on the fly. + That does not mean they are correct or useful. + +!!! question + How granular do we want to be with the metadata on the record level? + +!!! question + If we have copyright, what is the purpose of attribution?
+ + +#### Person + +| Field | Type | Cardinality | Restrictions | Remarks | +| ---------------- | -------- | ----------- | -------------------------------------- | ------- | +| `__id` | string | 1 | | | +| `__type` | string | 1 | Literal 'Person' | | +| `givenNames` | string[] | 1-n | | | +| `familyNames` | string[] | 1-n | | | +| `jobTitles` | string[] | 0-n | | | +| `affiliations` | id[] | 0-n | Organization IDs | | +| `address` | address | 0-1 | | | +| `email` | string | 0-1 | | | +| `secondaryEmail` | string | 0-1 | | | +| `authorityRefs` | url[] | 0-n | References to external authority files | | + + +#### Organization + +| Field | Type | Cardinality | Restrictions | Remarks | +| ----------------- | ----------- | ----------- | -------------------------------------- | ------- | +| `__id` | string | 1 | | | +| `__type` | string | 1 | Literal 'Organization' | | +| `name` | string | 1 | | | +| `url` | url | 1 | | | +| `address` | address | 0-1 | | | +| `email` | string | 0-1 | | | +| `alternativeName` | lang_string | 0-1 | | | +| `authorityRefs` | url[] | 0-n | References to external authority files | | + + +### Value Types + +#### String with Language Tag (`lang_string`) + +Object with an ISO language code as key and a string as value. + +```json +{ + "en": "Lorem ipsum in English.", + "de": "Lorem ipsum auf Deutsch." +} +``` + + +#### Date + +String with the format `YYYY-MM-DD`. + + +#### URL + +An object representing a URL. +Depending on the `type` field, +the URL may be a generic URL +or a more specific link, like a PID +or a reference to a resource in an external authority file. 
+ + +| Field | Type | Cardinality | Restrictions | +| -------- | ------ | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------- | +| `__type` | string | 1 | Literal 'URL' | +| `type` | string | 1 | Literal 'URL', 'Geonames', 'Pleiades', 'Skos', 'Periodo', 'Chronontology', 'GND', 'VIAF', 'Grid', 'ORCID', 'Creative Commons', 'DOI', 'ARK' | +| `url` | string | 1 | | +| `text` | string | 0-1 | | + +!!! question + Can we model different types of URLs in a more sensible way? + + +#### Data Management Plan (`dmp`) + +| Field | Type | Cardinality | Restrictions | +| ----------- | ------- | ----------- | ---------------------------- | +| `__type` | string | 1 | Literal 'DataManagementPlan' | +| `available` | boolean | 0-1 | | +| `url` | url | 0-1 | | + + +!!! question + Does the model for `Data Management Plan` still make sense? + Could it be a string? + Is "available" useful information? + How do we ensure that either `available` or `url` is set? + + +#### Publication + +| Field | Type | Cardinality | Restrictions | +| ------ | ------ | ----------- | ------------ | +| `text` | string | 1 | | +| `url` | url | 0-1 | | + + +#### Address + +| Field | Type | Cardinality | Restrictions | +| ------------ | ------ | ----------- | ----------------- | +| `__type` | string | 1 | Literal 'Address' | +| `street` | string | 1 | | +| `postalCode` | string | 1 | | +| `locality` | string | 1 | | +| `country` | string | 1 | | +| `canton` | string | 0-1 | | +| `additional` | string | 0-1 | | + + +#### License + +| Field | Type | Cardinality | Restrictions | +| --------- | ------ | ----------- | ----------------- | +| `__type` | string | 1 | Literal 'License' | +| `license` | url | 1 | | +| `date` | date | 1 | | +| `details` | string | 0-1 | | + +!!! question + Is this model up to date with our current understanding of licenses? + Is `details` ever used? + What is the purpose of `date` here?
+ How does it relate to a copyright statement? + + +#### Attribution + +| Field | Type | Cardinality | Restrictions | Remark | +| -------- | ------ | ----------- | ------------------------- | --------------------------- | +| `__type` | string | 1 | Literal 'Attribution' | | +| `agent` | id | 1 | Person or Organization ID | Or can this only be person? | +| `roles` | string | 1-n | | | + + +#### Grant + +| Field | Type | Cardinality | Restrictions | +| --------- | ------ | ----------- | -------------------------- | +| `__type` | string | 1 | Literal 'Grant' | +| `funders` | id[] | 1-n | Person or Organization IDs | +| `number` | string | 0-1 | | +| `name` | string | 0-1 | | +| `url` | url | 0-1 | | + + +## Entity-Relationship Diagram + +```mermaid +erDiagram + compoundProject |o--|{ project : projects + project ||--|{ dataset : datasets + project ||--|| person : contactPoint + project ||--|| organization : contactPoint + project ||--|{ person : funders + project ||--|{ organization : funders + project |o--|{ collection : collections + dataset ||--|{ record : records + collection |o--o{ collection : collections + collection |o--o{ record : records + person ||--|{ organization : affiliations + + compoundProject { + string __type "1; Literal 'CompoundProject'" + string name "1" + url url "1" + string howToCite "1" + lang_string description "0-1" + id contactPoint "0-1" + id[] projects "1-n; Project IDs" + lang_string[] keywords "0-n" + lang_string_or_url[] disciplines "0-n" + lang_string_or_url[] temporalCoverage "0-n" + url[] spatialCoverage "0-n" + id[] funders "0-n; Person or Organization IDs" + publication[] publications "0-n" + grant[] grants "0-n" + lang_string[] alternativeNames "0-n" + id[] consistingInstitutions "0-n; Organization IDs" + } + + project { + string __type "1; Literal 'Project'" + string shortcode "1" + string status "1; Literal 'Ongoing' or 'Finished'" + string name "1" + lang_string description "1" + date startDate "1" + string teaserText "1" + 
url url "1" + string howToCite "1" + id[] datasets "1-n; Dataset IDs" + lang_string[] keywords "1-n" + lang_string_or_url[] disciplines "1-n" + lang_string_or_url[] temporalCoverage "1-n" + url[] spatialCoverage "1-n" + id[] funders "1-n; Person or Organization IDs" + date endDate "0-1" + url secondaryURL "0-1" + dmp dataManagementPlan "0-1" + id contactPoint "0-1" + publication[] publications "0-n" + grant[] grants "0-n" + lang_string[] alternativeNames "0-n" + } + + dataset { + string __id "1" + string __type "1; Literal 'Dataset'" + string title "1" + string accessConditions "1; Literal 'open', 'restricted' or 'closed'" + string howToCite "1" + string status "1; Literal 'In Planning', 'Ongoing', 'On hold', 'Finished'" + lang_string_or_url[] abstract "1-n" + string[] typeOfData "1-n; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'" + license[] licenses "1-n" + string[] copyright "1-n" + lang_string[] languages "1-n" + attribution[] attributions "1-n" + date datePublished "0-1" + date dateCreated "0-1" + date dateModified "0-1" + url distribution "0-1" + lang_string[] alternativeTitles "0-n" + url[] urls "0-n" + lang_string_or_url[] additional "0-n" + } + + collection { + string __id "1" + string __type "1; Literal 'Collection'" + string name "1" + string accessConditions "1; Literal 'open', 'restricted' or 'closed'" + string provenance "0-1" + date datePublished "0-1" + date dateCreated "0-1" + date dateModified "0-1" + url distribution "0-1" + id[] records "0-n; Record IDs" + id[] collections "0-n; Collection IDs" + lang_string[] alternativeNames "0-n" + lang_string[] keywords "0-n" + url[] urls "0-n" + lang_string_or_url[] additional "0-n" + lang_string_or_url[] description "1-n" + string[] typeOfData "1-n; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'" + license[] licenses "1-n" + string[] copyright "1-n" + lang_string[] languages "1-n" + attribution[] attributions "1-n" + } + + record { + string __id "1" + string __type "1; Literal 'Record'" + string pid 
"1" + lang_string label "1" + string accessConditions "1; Literal 'open', 'restricted' or 'closed'" + license license "1" + string copyright "1" + attribution attribution "1" + string provenance "0-1" + date datePublished "0-1" + date dateCreated "0-1" + date dateModified "0-1" + string typeOfData "0-1; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'" + } + + person { + string __id "1" + string __type "1; Literal 'Person'" + string[] givenNames "1-n" + string[] familyNames "1-n" + string[] jobTitles "0-n" + id[] affiliations "0-n; Organization IDs" + address address "0-1" + string email "0-1" + string secondaryEmail "0-1" + url[] authorityRefs "0-n" + } + + organization { + string __id "1" + string __type "1; Literal 'Organization'" + string name "1" + url url "1" + address address "0-1" + string email "0-1" + lang_string alternativeName "0-1" + url[] authorityRefs "0-n" + } +``` + + + +## Change Log + +### Changes + +- Make `Grant` a value type and remove it from the top level. +- Added entity `compoundProject` to the top level. +- Added entity `collection` to the top level. +- Added entity `record` to the top level. +- Added `copyright` to `dataset`. + +### Implementation/migration Notes + +- inline grant in project +- add/remove entities and properties accordingly + + +### Mapping Old -> New + +TODO: Add mapping from old to new model. 
diff --git a/mkdocs.yml b/mkdocs.yml index f5544ebf..f8ced1ff 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -5,6 +5,7 @@ nav: - Consuming Metadata: - Metadata API: data/api.md - Current Data Model: data/current-datamodel.md + - Provisional Data Model: data/provisional-datamodel.md - Adding Metadata: adding-metadata.md - Code Documentation: - Overview: code/overview.md From 888e444767f45c5be5e22a8d027278c6b52fb0e9 Mon Sep 17 00:00:00 2001 From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com> Date: Tue, 5 Nov 2024 17:10:36 +0100 Subject: [PATCH 3/8] remove broken markdown linting rule --- .markdownlint.yml | 8 +++++--- docs/data/provisional-datamodel.md | 2 +- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/.markdownlint.yml b/.markdownlint.yml index a4c058ac..c467d2fe 100644 --- a/.markdownlint.yml +++ b/.markdownlint.yml @@ -1,7 +1,7 @@ # Config file for https://github.com/igorshubovych/markdownlint-cli # MD007/ul-indent - Unordered list indentation -MD007: +MD007: # Whether to indent the first level of the list start_indented: false # By how many spaces every next level must be indented. The default of 2 is not compatible with mkdocs! @@ -14,7 +14,7 @@ MD009: false MD012: false # MD013/line-length - Line length -MD013: +MD013: line_length: 120 heading_line_length: 120 code_block_line_length: 120 @@ -30,8 +30,10 @@ MD013: # Stern length checking stern: false +MD018: false + # MD033/no-inline-html - Inline HTML -MD033: +MD033: allowed_elements: [br, center] # MD041/first-line-heading/first-line-h1 - First line in a file should be a top-level heading diff --git a/docs/data/provisional-datamodel.md b/docs/data/provisional-datamodel.md index b5218120..cb8ff6ec 100644 --- a/docs/data/provisional-datamodel.md +++ b/docs/data/provisional-datamodel.md @@ -596,6 +596,6 @@ erDiagram - add/remove entities and properties accordingly -### Mapping Old -> New +### Mapping Old -> New TODO: Add mapping from old to new model. 
From 4b57f6a986213cb0cc1e1e4aab3eb3bdf8a33b65 Mon Sep 17 00:00:00 2001 From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com> Date: Tue, 5 Nov 2024 17:12:14 +0100 Subject: [PATCH 4/8] maybe fix linting issue? --- .markdownlint.yml | 2 -- docs/data/provisional-datamodel.md | 2 +- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/.markdownlint.yml b/.markdownlint.yml index c467d2fe..8960ebe8 100644 --- a/.markdownlint.yml +++ b/.markdownlint.yml @@ -30,8 +30,6 @@ MD013: # Stern length checking stern: false -MD018: false - # MD033/no-inline-html - Inline HTML MD033: allowed_elements: [br, center] diff --git a/docs/data/provisional-datamodel.md b/docs/data/provisional-datamodel.md index cb8ff6ec..880d558c 100644 --- a/docs/data/provisional-datamodel.md +++ b/docs/data/provisional-datamodel.md @@ -596,6 +596,6 @@ erDiagram - add/remove entities and properties accordingly -### Mapping Old -> New +### Mapping (Old to New) TODO: Add mapping from old to new model. From 7bce4ce356bf20335ca0b0ec3edd85d5752020a8 Mon Sep 17 00:00:00 2001 From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com> Date: Tue, 5 Nov 2024 17:27:51 +0100 Subject: [PATCH 5/8] Update provisional-datamodel.md --- docs/data/provisional-datamodel.md | 134 ++++++++++++++++++++++++++++- 1 file changed, 133 insertions(+), 1 deletion(-) diff --git a/docs/data/provisional-datamodel.md b/docs/data/provisional-datamodel.md index 880d558c..c65e51be 100644 --- a/docs/data/provisional-datamodel.md +++ b/docs/data/provisional-datamodel.md @@ -104,6 +104,7 @@ but are identified by their position in the hierarchy. 
| Field | Type | Cardinality | Restrictions | Remarks | | ------------------------ | ------------------- | ----------- | ------------------------------------------------------------ | ------------------ | +| `__id` | string | 1 | | | | `__type` | string | 1 | Literal 'CompoundProject' | | | `name` | string | 1 | | | | `url` | url | 1 | | | @@ -448,6 +449,7 @@ erDiagram person ||--|{ organization : affiliations compoundProject { + string __id "1" string __type "1; Literal 'CompoundProject'" string name "1" url url "1" @@ -598,4 +600,134 @@ erDiagram ### Mapping (Old to New) -TODO: Add mapping from old to new model. +#### Compound Project + +- `compoundProject.__id` : new +- `compoundProject.__type` : new +- `compoundProject.name`: new +- `compoundProject.url`: new +- `compoundProject.howToCite`: new +- `compoundProject.description`: new +- `compoundProject.contactPoint`: new +- `compoundProject.keywords`: new +- `compoundProject.disciplines`: new +- `compoundProject.temporalCoverage`: new +- `compoundProject.spatialCoverage`: new +- `compoundProject.funders`: new +- `compoundProject.publications`: new +- `compoundProject.grants`: new +- `compoundProject.alternativeNames`: new +- `compoundProject.consistingInstitutions`: new + +This entity is new and does not have a direct mapping from the old model. +All values need to be defined and added manually. 
+ +#### Project + +- `project.__type`: unchanged +- `project.shortcode`: unchanged +- `project.status`: unchanged +- `project.name`: unchanged +- `project.description`: unchanged +- `project.startDate`: unchanged +- `project.teaserText`: unchanged +- `project.url`: unchanged +- `project.howToCite`: unchanged +- `project.datasets`: unchanged +- `project.keywords`: unchanged +- `project.disciplines`: unchanged +- `project.temporalCoverage`: unchanged +- `project.spatialCoverage`: unchanged +- `project.funders`: unchanged +- `project.endDate`: unchanged +- `project.secondaryURL`: unchanged +- `project.dataManagementPlan`: unchanged +- `project.contactPoint`: unchanged +- `project.publications`: unchanged +- `project.grants`: inlined from top level to project +- `project.alternativeNames`: unchanged + +#### Dataset + +- `dataset.__id`: unchanged +- `dataset.__type`: unchanged +- `dataset.title`: unchanged +- `dataset.accessConditions`: unchanged +- `dataset.howToCite`: unchanged +- `dataset.status`: unchanged +- `dataset.abstract`: unchanged +- `dataset.typeOfData`: unchanged +- `dataset.licenses`: unchanged +- `dataset.copyright`: newly added +- `dataset.languages`: unchanged +- `dataset.attributions`: unchanged +- `dataset.datePublished`: unchanged +- `dataset.dateCreated`: unchanged +- `dataset.dateModified`: unchanged +- `dataset.distribution`: unchanged +- `dataset.alternativeTitles`: unchanged +- `dataset.urls`: unchanged +- `dataset.additional`: unchanged + +#### Collection + +- `collection.__id`: new +- `collection.__type`: new +- `collection.name`: new +- `collection.accessConditions`: new +- `collection.provenance`: new +- `collection.datePublished`: new +- `collection.dateCreated`: new +- `collection.dateModified`: new +- `collection.distribution`: new +- `collection.records`: new +- `collection.collections`: new +- `collection.alternativeNames`: new +- `collection.keywords`: new +- `collection.urls`: new +- `collection.additional`: new +- 
`collection.description`: new +- `collection.typeOfData`: new +- `collection.licenses`: new +- `collection.copyright`: new +- `collection.languages`: new +- `collection.attributions`: new + +#### Record + +- `record.__id`: new +- `record.__type`: new +- `record.pid`: new +- `record.label`: new +- `record.accessConditions`: new +- `record.license`: new +- `record.attribution`: new +- `record.provenance`: new +- `record.datePublished`: new +- `record.dateCreated`: new +- `record.dateModified`: new +- `record.typeOfData`: new + +#### Person + +- `person.__id`: unchanged +- `person.__type`: unchanged +- `person.givenNames`: unchanged +- `person.familyNames`: unchanged +- `person.jobTitles`: unchanged +- `person.affiliations`: unchanged +- `person.address`: unchanged +- `person.email`: unchanged +- `person.secondaryEmail`: unchanged +- `person.authorityRefs`: unchanged + +#### Organization + +- `organization.__id`: unchanged +- `organization.__type`: unchanged +- `organization.name`: unchanged +- `organization.url`: unchanged +- `organization.address`: unchanged +- `organization.email`: unchanged +- `organization.alternativeName`: unchanged +- `organization.authorityRefs`: unchanged From 88323d1a20dcd9b59e80d820a72034161538d06e Mon Sep 17 00:00:00 2001 From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com> Date: Tue, 5 Nov 2024 17:28:19 +0100 Subject: [PATCH 6/8] revert unrelated changes --- .markdownlint.yml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/.markdownlint.yml b/.markdownlint.yml index 8960ebe8..a4c058ac 100644 --- a/.markdownlint.yml +++ b/.markdownlint.yml @@ -1,7 +1,7 @@ # Config file for https://github.com/igorshubovych/markdownlint-cli # MD007/ul-indent - Unordered list indentation -MD007: +MD007: # Whether to indent the first level of the list start_indented: false # By how many spaces every next level must be indented. The default of 2 is not compatible with mkdocs! 
@@ -14,7 +14,7 @@ MD009: false MD012: false # MD013/line-length - Line length -MD013: +MD013: line_length: 120 heading_line_length: 120 code_block_line_length: 120 @@ -31,7 +31,7 @@ MD013: stern: false # MD033/no-inline-html - Inline HTML -MD033: +MD033: allowed_elements: [br, center] # MD041/first-line-heading/first-line-h1 - First line in a file should be a top-level heading From 3ae0b5c19c9630e7d4012f20d364f314fe6ebea8 Mon Sep 17 00:00:00 2001 From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com> Date: Tue, 5 Nov 2024 18:11:24 +0100 Subject: [PATCH 7/8] Update .markdownlint.yml --- .markdownlint.yml | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/.markdownlint.yml b/.markdownlint.yml index a4c058ac..3a503fca 100644 --- a/.markdownlint.yml +++ b/.markdownlint.yml @@ -1,7 +1,7 @@ # Config file for https://github.com/igorshubovych/markdownlint-cli # MD007/ul-indent - Unordered list indentation -MD007: +MD007: # Whether to indent the first level of the list start_indented: false # By how many spaces every next level must be indented. The default of 2 is not compatible with mkdocs! 
@@ -14,7 +14,7 @@ MD009: false MD012: false # MD013/line-length - Line length -MD013: +MD013: line_length: 120 heading_line_length: 120 code_block_line_length: 120 @@ -30,8 +30,11 @@ MD013: # Stern length checking stern: false +MD024: + siblings_only: true + # MD033/no-inline-html - Inline HTML -MD033: +MD033: allowed_elements: [br, center] # MD041/first-line-heading/first-line-h1 - First line in a file should be a top-level heading From 4973ce74cae675a5c6116d2d375b7c26b27c9674 Mon Sep 17 00:00:00 2001 From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com> Date: Wed, 6 Nov 2024 18:17:47 +0100 Subject: [PATCH 8/8] changes according to discussion --- docs/data/provisional-datamodel.md | 408 ++++++++++------------------- 1 file changed, 137 insertions(+), 271 deletions(-) diff --git a/docs/data/provisional-datamodel.md b/docs/data/provisional-datamodel.md index c65e51be..ca104cb6 100644 --- a/docs/data/provisional-datamodel.md +++ b/docs/data/provisional-datamodel.md @@ -21,7 +21,7 @@ The metadata model is a hierarchical structure of metadata elements. ```mermaid flowchart TD - hyper-project[Hyper-Project /
Uber-Project /
Meta-Project /
Compound Project] -->|1-n| project[Project /
Research Project] + hyper-project[Umbrella Project] -->|1-n| project[Research Project] project -->|1-n| dataset[Dataset] dataset -->|1-n| record[Record /
Resource] project -->|0-n| collection[Collection] @@ -30,7 +30,7 @@ flowchart TD collection --> record ``` -- A `Compound Project` is optional and collects one or more `Research Projects`. +- An `Umbrella Project` is optional and collects one or more `Research Projects`. It is typically of institutional nature, not directly tied to a specific funding grant, and may be long-lived. @@ -40,7 +40,7 @@ flowchart TD It is typically tied to a specific funding grant, and hence has a limited lifetime of ~3-5 years; multiple funding rounds and a longer lifetime are possible. - A `Research Project` is part of 0-1 `Compound Project`, + A `Research Project` is part of 0-1 `Umbrella Project`, it has 1-n `Datasets` and 0-n `Collections`. - A `Dataset` is a collection of `Records` within a `Research Project`. It is mostly meant for system-internal and technical use, @@ -52,7 +52,7 @@ flowchart TD and may have a "historical meaning" in the context of the project. Examples may be physical collections such as p person's "Nachlass" in an archive, or groupings of records based on a specific research question within a project. - A `Collection` is part of at least 1 `Research Project`, `Compound Project` or `Collection`, + A `Collection` is part of at least 1 `Research Project`, `Umbrella Project` or `Collection`, but can be part of multiple. It may either contain 0-n `Collections` or 1-n `Records`. - A `Record` is a single resource within a `Dataset`. It represents a single entity, and the smallest unit that can meaningfully have an identifier. @@ -68,7 +68,7 @@ and may be related to various entities within the hierarchy. A set of metadata consists of the following top-level elements: -- Compound Project +- Umbrella Project - Project - Dataset - Collection @@ -87,7 +87,7 @@ but are identified by their position in the hierarchy. 
| Field | Type | Cardinality | | ----------------- | --------------- | ----------- | | `$schema` | string | 0-1 | -| `compoundProject` | compoundProject | 0-1 | +| `umbrellaProject` | umbrellaProject | 0-1 | | `project` | project | 1 | | `datasets` | dataset[] | 1-n | | `collections` | collection[] | 0-n | @@ -96,77 +96,89 @@ but are identified by their position in the hierarchy. | `organizations` | organization[] | 0-n | +!!! question + Do we consider "permissions" as metadata? + (Not as they are in the DSP, but as they will be in the archive; + that is: "open", "restricted", "embargo", "metadata only".) + If so, this should be added on each level, I suppose. + + ## Types ### Entity Types -#### Compound Project - -| Field | Type | Cardinality | Restrictions | Remarks | -| ------------------------ | ------------------- | ----------- | ------------------------------------------------------------ | ------------------ | -| `__id` | string | 1 | | | -| `__type` | string | 1 | Literal 'CompoundProject' | | -| `name` | string | 1 | | | -| `url` | url | 1 | | | -| `howToCite` | string | 1 | | Needed? | -| `projects` | id[] | 1-n | String containing the identifier of a project | | -| `description` | lang_string | 0-1 | | Optional? | -| `contactPoint` | id | 0-1 | String containing the identifier of a person or organization | Optional? | -| `keywords` | lang_string[] | 0-n | | Needed? | -| `disciplines` | lang_string / url[] | 0-n | | Needed? | -| `temporalCoverage` | lang_string / url[] | 0-n | | Needed? | -| `spatialCoverage` | url[] | 0-n | | Needed? | -| `funders` | id[] | 0-n | String containing the identifier of a person | Needed? | -| `publications` | publication[] | 0-n | | Needed? | -| `grants` | grant[] | 0-n | | Needed? | -| `alternativeNames` | lang_string[] | 0-n | | Needed? | -| `consistingInstitutions` | id[] | 0-n | String containing the identifier of an organization | Makes sense? Name? | +#### Umbrella Project + +| Field | Type | Card. 
| Restrictions | +| ---------------------- | ------------- | ----- | ------------------------------------------------------------ | +| `__id` | string | 1 | | +| `__type` | string | 1 | Literal 'UmbrellaProject' | +| `name` | string | 1 | | +| `projects` | id[] | 1-n | String containing the identifier of a project | +| `description` | lang_string | 0-1 | | +| `alternativeNames` | lang_string[] | 0-n | | +| `url` | url | 0-1 | | +| `contactPoint` | id | 0-1 | String containing the identifier of a person or organization | +| `institutionalPartner` | id[] | 0-n | String containing the identifier of an organization | !!! question - This opens up the questions of how to deal with multiple projects in a compound project. + This opens up the question of how to deal with multiple projects in an umbrella project. We probably want to keep one entry per project, - so this leaves us with either duplicating the compound project metadata for each project, - or having compound project metadata separately and only linking it from the project. + so this leaves us with either duplicating the umbrella project metadata for each project, + or having umbrella project metadata separately and only linking it from the project. The latter seems preferable, - but then the question arises who gets to edit the compound project metadata. + but then the question arises who gets to edit the umbrella project metadata. For a first implementation, we could simply duplicate the metadata for each project, and later factor it out. -!!! important - The properties for `Compound Project` were invented by me on the fly. - That does not mean they are correct or useful. +!!! question + What is the best name for `institutionalPartner`? + AI suggested: + - Affiliated Institution + - Associated Body + - Supporting Organization + - Institutional Partner + +!!! question + How do we capture the time aspect of the data provenance and genesis in this context? Should this be here? 
+ Concretely, an umbrella project is often like a "timeline" of projects, or the "history" of a series of projects. + +To make the model of this entity as flexible as possible, +most of the fields are optional. #### Project -| Field | Type | Cardinality | Restrictions | Remarks | -| -------------------- | ------------------- | ----------- | ------------------------------------------------------------ | --------------------- | -| `__type` | string | 1 | Literal "Project" | | -| `shortcode` | string | 1 | 4 char hexadecimal | | -| `status` | string | 1 | Literal "Ongoing" or "Finished" | | -| `name` | string | 1 | | | -| `description` | lang_string | 1 | | | -| `startDate` | date | 1 | String of format "YYYY-MM-DD" | | -| `teaserText` | string | 1 | | | -| `url` | url | 1 | | | -| `howToCite` | string | 1 | | | -| `datasets` | id[] | 1-n | String containing the identifier of a dataset | | -| `keywords` | lang_string[] | 1-n | | | -| `disciplines` | lang_string / url[] | 1-n | | | -| `temporalCoverage` | lang_string / url[] | 1-n | | | -| `spatialCoverage` | url[] | 1-n | | | -| `funders` | id[] | 1-n | String containing the identifier of a person or organization | Does this make sense? | -| `endDate` | date | 0-1 | String of format "YYYY-MM-DD" | | -| `secondaryURL` | url | 0-1 | | | -| `dataManagementPlan` | dmp | 0-1 | | | -| `contactPoint` | id | 0-1 | String containing the identifier of a person or organization | | -| `publications` | publication[] | 0-n | | | -| `grants` | grant[] | 0-n | | Does this make sense? 
| -| `alternativeNames` | lang_string[] | 0-n | | | +| Field | Type | Cardinality | Restrictions | +| -------------------- | ------------------- | ----------- | ------------------------------------------------------------ | +| `__type` | string | 1 | Literal "Project" | +| `shortcode` | string | 1 | 4 char hexadecimal | +| `status` | string | 1 | Literal "Ongoing" or "Finished" | +| `name` | string | 1 | | +| `description` | lang_string | 1 | | +| `startDate` | date | 1 | String of format "YYYY-MM-DD" | +| `teaserText` | string | 1 | | +| `url` | url | 1 | | +| `howToCite` | string | 1 | | +| `datasets` | id[] | 1-n | String containing the identifier of a dataset | +| `keywords` | lang_string[] | 1-n | | +| `disciplines` | lang_string / url[] | 1-n | | +| `temporalCoverage` | lang_string / url[] | 1-n | | +| `spatialCoverage` | url[] | 1-n | | +| `funders` | id[] | 1-n | String containing the identifier of a person or organization | +| `attributions` | attribution[] | 1-n | | +| `endDate` | date | 0-1 | String of format "YYYY-MM-DD" | +| `secondaryURL` | url | 0-1 | | +| `dataManagementPlan` | dmp | 0-1 | | +| `contactPoint` | id | 0-1 | String containing the identifier of a person or organization | +| `publications` | publication[] | 0-n | | +| `grants` | grant[] | 0-n | | +| `alternativeNames` | lang_string[] | 0-n | | !!! question If we can have copyright/license on dataset level, - do we want to have it on project level as well? + do we want to have it on project level as well? + In any case, it should be computed from the datasets/records. !!! question Do we still need funders if we have grants? @@ -174,34 +186,33 @@ but are identified by their position in the hierarchy. !!! question What about projects that do not have funding? +!!! question + Do we want my proposed `attributions` field on project? + +!!! question + Should we have an `abstract` field in the project, like we used to have in the dataset? 
+ #### Dataset -| Field | Type | Cardinality | Restrictions | Remarks | -| ------------------- | ----------------- | ----------- | ------------------------------------------------------- | ----------------------------------- | -| `__id` | string | 1 | | | -| `__type` | string | 1 | Literal "Dataset" | | -| `title` | string | 1 | | | -| `accessConditions` | string | 1 | Literal "open", "restricted" or "closed" | change to proper terms | -| `howToCite` | string | 1 | | | -| `status` | string | 1 | Literal "In Planning", "Ongoing", "On hold", "Finished" | not aligned with project status | -| `abstract` | lang_string / url | 1-n | | naming: maybe 'description'? | -| `typeOfData` | string[] | 1-n | Literal "XML", "Text", "Image", "Video", "Audio" | does this still make sense? | -| `licenses` | license[] | 1-n | | should be computed from the records | -| `copyright` | string[] | 1-n | | computed along with license | -| `languages` | lang_string[] | 1-n | | does this make sense? | -| `attributions` | attribution[] | 1-n | | can this be calculated? | -| `datePublished` | date | 0-1 | | | -| `dateCreated` | date | 0-1 | | | -| `dateModified` | date | 0-1 | | | -| `distribution` | url | 0-1 | | does this make sense? | -| `alternativeTitles` | lang_string[] | 0-n | | | -| `urls` | url[] | 0-n | | | -| `additional` | lang_string / url | 0-n | | | +| Field | Type | Cardinality | Restrictions | Remarks | +| -------------- | ------------- | ----------- | ------------------------------------------------ | ------------------------------------------------------- | +| `__id` | string | 1 | | | +| `__type` | string | 1 | Literal "Dataset" | | +| `title` | string | 1 | | may be auto-generated? | +| `typeOfData` | string[] | 1-n | Literal "XML", "Text", "Image", "Video", "Audio" | does this still make sense? should it be cardinality 1? 
| +| `licenses` | license[] | 1-n | | should be computed from the records | +| `copyright` | string[] | 1-n | | computed along with license | +| `attributions` | attribution[] | 1-n | | can this be computed? | +| `howToCite` | string | 0-1 | | still wanted? | +| `description` | lang_string | 0-1 | | | +| `dateCreated` | date | 0-1 | | | -!!! question - Do we conssider datasets something merely "internal"? - If so, do metadata on datasets even make sense at all? Should we even "expose" datasets publicly? +!!! note + If we think of a dataset as something internal, + we should limit the metadata to what is necessary for the system to work. + Additionally, we may want to have some minimal descriptive metadata for the dataset, + (like for the use case that a project once a year grabs a box of archival material and digitizes it). !!! question Do we need to store the license on the dataset level, @@ -224,6 +235,17 @@ but are identified by their position in the hierarchy. !!! question Do we need a reference to the records in the dataset? +!!! question + Does `dateCreated` suffice here? There were more date properties in the old model. + +Datasets are for internal use, +they serve to partition the data into manageable chunks. +This is done both by type of data (RDF vs. assets), and by size. + +In some cases, there may be a "logical" grouping constituting a dataset, +e.g. if data is digitized in a batch and there is a temporal separation between the batches. +In these cases, the project may make use of the descriptive metadata of the dataset. +But normally, the dataset is just a technical entity, and should not carry semantic information. #### Collection @@ -232,11 +254,13 @@ but are identified by their position in the hierarchy. 
| `__id` | string | 1 | | | | `__type` | string | 1 | Literal 'Collection' | | | `name` | string | 1 | | | -| `accessConditions` | string | 1 | Literal "open", "restricted" or "closed" | copied from dataset; change to proper terms | +| `description` | string / url | 1-n | | | +| `typeOfData` | string[] | 1-n | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; does this still make sense? | +| `licenses` | license[] | 1-n | | copied from dataset; should be computed from the records | +| `copyright` | string[] | 1-n | | computed along with license | +| `languages` | lang_string[] | 1-n | | copied from dataset; does this make sense? | +| `attributions` | attribution[] | 1-n | | copied from dataset; can this be calculated? | | `provenance` | string | 0-1 | | | -| `datePublished` | date | 0-1 | | copied from dataset; do we still need those? | -| `dateCreated` | date | 0-1 | | copied from dataset; do we still need those? | -| `dateModified` | date | 0-1 | | copied from dataset; do we still need those? | | `distribution` | url | 0-1 | | copied from dataset; does this make sense? | | `records` | id[] | 0-n | Record IDs | can be 0 in case it points to a collection | | `collections` | id[] | 0-n | Collection IDs | | @@ -244,17 +268,6 @@ but are identified by their position in the hierarchy. | `keywords` | lang_string[] | 0-n | | does this make sense? | | `urls` | url[] | 0-n | | copied from dataset; | | `additional` | lang_string / url | 0-n | | copied from dataset; | -| `description` | string / url | 1-n | | | -| `typeOfData` | string[] | 1-n | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; does this still make sense? | -| `licenses` | license[] | 1-n | | copied from dataset; should be computed from the records | -| `copyright` | string[] | 1-n | | computed along with license | -| `languages` | lang_string[] | 1-n | | copied from dataset; does this make sense? 
| -| `attributions` | attribution[] | 1-n | | copied from dataset; can this be calculated? | - - -!!! important - The properties for `Compound Project` were invented by me on the fly. - That does not mean they are correct or useful. !!! question @@ -279,10 +292,6 @@ but are identified by their position in the hierarchy. | `dateModified` | date | 0-1 | | copied from dataset; do they make sense? | | `typeOfData` | string | 0-1 | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; wanted? what values? | -!!! important - The properties for `Record` were invented by me on the fly. - That does not mean they are correct or useful. - !!! question How granular do we want to be with the metadata on the record level? @@ -436,7 +445,7 @@ or a reference to a resource in an external authority file. ```mermaid erDiagram - compoundProject |o--|{ project : projects + umbrellaProject |o--|{ project : projects project ||--|{ dataset : datasets project ||--|| person : contactPoint project ||--|| organization : contactPoint @@ -448,30 +457,23 @@ erDiagram collection |o--o{ record : records person ||--|{ organization : affiliations - compoundProject { + umbrellaProject { string __id "1" - string __type "1; Literal 'CompoundProject'" + string __type "1; Literal 'UmbrellaProject'" string name "1" - url url "1" - string howToCite "1" - lang_string description "0-1" - id contactPoint "0-1" id[] projects "1-n; Project IDs" - lang_string[] keywords "0-n" - lang_string_or_url[] disciplines "0-n" - lang_string_or_url[] temporalCoverage "0-n" - url[] spatialCoverage "0-n" - id[] funders "0-n; Person or Organization IDs" - publication[] publications "0-n" - grant[] grants "0-n" + lang_string description "0-1" lang_string[] alternativeNames "0-n" - id[] consistingInstitutions "0-n; Organization IDs" + url url "0-1" + id contactPoint "0-1" + id[] institutionalPartner "0-n; Organization IDs" } project { + string __id "1" string __type "1; Literal 'Project'" string shortcode "1" - 
string status "1; Literal 'Ongoing' or 'Finished'" + string status "1; Literal 'Ongoing', 'Finished'" string name "1" lang_string description "1" date startDate "1" @@ -484,6 +486,7 @@ erDiagram lang_string_or_url[] temporalCoverage "1-n" url[] spatialCoverage "1-n" id[] funders "1-n; Person or Organization IDs" + attribution[] attributions "1-n" date endDate "0-1" url secondaryURL "0-1" dmp dataManagementPlan "0-1" @@ -497,22 +500,13 @@ erDiagram string __id "1" string __type "1; Literal 'Dataset'" string title "1" - string accessConditions "1; Literal 'open', 'restricted' or 'closed'" - string howToCite "1" - string status "1; Literal 'In Planning', 'Ongoing', 'On hold', 'Finished'" - lang_string_or_url[] abstract "1-n" string[] typeOfData "1-n; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'" license[] licenses "1-n" string[] copyright "1-n" - lang_string[] languages "1-n" attribution[] attributions "1-n" - date datePublished "0-1" + string howToCite "0-1" + lang_string description "0-1" date dateCreated "0-1" - date dateModified "0-1" - url distribution "0-1" - lang_string[] alternativeTitles "0-n" - url[] urls "0-n" - lang_string_or_url[] additional "0-n" } collection { @@ -584,150 +578,22 @@ erDiagram ## Change Log -### Changes - Make `Grant` a value type and remove it from the top level. -- Added entity `compoundProject` to the top level. +- Added entity `umbrellaProject` to the top level. - Added entity `collection` to the top level. - Added entity `record` to the top level. - Added `copyright` to `dataset`. 
- -### Implementation/migration Notes - -- inline grant in project -- add/remove entities and properties accordingly - - -### Mapping (Old to New) - -#### Compound Project - -- `compoundProject.__id` : new -- `compoundProject.__type` : new -- `compoundProject.name`: new -- `compoundProject.url`: new -- `compoundProject.howToCite`: new -- `compoundProject.description`: new -- `compoundProject.contactPoint`: new -- `compoundProject.keywords`: new -- `compoundProject.disciplines`: new -- `compoundProject.temporalCoverage`: new -- `compoundProject.spatialCoverage`: new -- `compoundProject.funders`: new -- `compoundProject.publications`: new -- `compoundProject.grants`: new -- `compoundProject.alternativeNames`: new -- `compoundProject.consistingInstitutions`: new - -This entity is new and does not have a direct mapping from the old model. -All values need to be defined and added manually. - -#### Project - -- `project.__type`: unchanged -- `project.shortcode`: unchanged -- `project.status`: unchanged -- `project.name`: unchanged -- `project.description`: unchanged -- `project.startDate`: unchanged -- `project.teaserText`: unchanged -- `project.url`: unchanged -- `project.howToCite`: unchanged -- `project.datasets`: unchanged -- `project.keywords`: unchanged -- `project.disciplines`: unchanged -- `project.temporalCoverage`: unchanged -- `project.spatialCoverage`: unchanged -- `project.funders`: unchanged -- `project.endDate`: unchanged -- `project.secondaryURL`: unchanged -- `project.dataManagementPlan`: unchanged -- `project.contactPoint`: unchanged -- `project.publications`: unchanged -- `project.grants`: inlined from top level to project -- `project.alternativeNames`: unchanged - -#### Dataset - -- `dataset.__id`: unchanged -- `dataset.__type`: unchanged -- `dataset.title`: unchanged -- `dataset.accessConditions`: unchanged -- `dataset.howToCite`: unchanged -- `dataset.status`: unchanged -- `dataset.abstract`: unchanged -- `dataset.typeOfData`: unchanged -- 
`dataset.licenses`: unchanged -- `dataset.copyright`: newly added -- `dataset.languages`: unchanged -- `dataset.attributions`: unchanged -- `dataset.datePublished`: unchanged -- `dataset.dateCreated`: unchanged -- `dataset.dateModified`: unchanged -- `dataset.distribution`: unchanged -- `dataset.alternativeTitles`: unchanged -- `dataset.urls`: unchanged -- `dataset.additional`: unchanged - -#### Collection - -- `collection.__id`: new -- `collection.__type`: new -- `collection.name`: new -- `collection.accessConditions`: new -- `collection.provenance`: new -- `collection.datePublished`: new -- `collection.dateCreated`: new -- `collection.dateModified`: new -- `collection.distribution`: new -- `collection.records`: new -- `collection.collections`: new -- `collection.alternativeNames`: new -- `collection.keywords`: new -- `collection.urls`: new -- `collection.additional`: new -- `collection.description`: new -- `collection.typeOfData`: new -- `collection.licenses`: new -- `collection.copyright`: new -- `collection.languages`: new -- `collection.attributions`: new - -#### Record - -- `record.__id`: new -- `record.__type`: new -- `record.pid`: new -- `record.label`: new -- `record.accessConditions`: new -- `record.license`: new -- `record.attribution`: new -- `record.provenance`: new -- `record.datePublished`: new -- `record.dateCreated`: new -- `record.dateModified`: new -- `record.typeOfData`: new - -#### Person - -- `person.__id`: unchanged -- `person.__type`: unchanged -- `person.givenNames`: unchanged -- `person.familyNames`: unchanged -- `person.jobTitles`: unchanged -- `person.affiliations`: unchanged -- `person.address`: unchanged -- `person.email`: unchanged -- `person.secondaryEmail`: unchanged -- `person.authorityRefs`: unchanged - -#### Organization - -- `organization.__id`: unchanged -- `organization.__type`: unchanged -- `organization.name`: unchanged -- `organization.url`: unchanged -- `organization.address`: unchanged -- `organization.email`: unchanged 
-- `organization.alternativeName`: unchanged -- `organization.authorityRefs`: unchanged +- Changed type of `abstract`/`description` in `dataset` to `lang_string`. +- Changed cardinality of `abstract`/`description` in `dataset` to 1. +- Changed cardinality of `howToCite` in `dataset` to 0-1. +- Changed cardinality of `description` in `dataset` to 0-1. +- Removed `accessConditions` from `dataset`. +- Removed `status` from `dataset`. +- Renamed `abstract` to `description` in `dataset`. +- Removed `languages` from `dataset`. +- Removed `datePublished` and `dateModified` from `dataset`. +- Removed `distribution` from `dataset`. +- Removed `additional` from `dataset`. +- Removed `alternativeTitles` from `dataset`. +- Removed `urls` from `dataset`.
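
The cardinalities in the new Umbrella Project table lend themselves to a mechanical check. The following is a minimal Python sketch, assuming the metadata is available as a plain dictionary parsed from JSON. The names `UMBRELLA_CARDINALITIES`, `count_values` and `check_umbrella_project` are hypothetical helpers for illustration and are not part of dsp-meta; only the field names and cardinalities are taken from the table in this patch.

```python
# Cardinalities copied from the Umbrella Project table in this patch.
# Each entry is (min, max); max=None means unbounded ("n").
UMBRELLA_CARDINALITIES = {
    "__id": (1, 1),
    "__type": (1, 1),
    "name": (1, 1),
    "projects": (1, None),
    "description": (0, 1),
    "alternativeNames": (0, None),
    "url": (0, 1),
    "contactPoint": (0, 1),
    "institutionalPartner": (0, None),
}


def count_values(value):
    """Count the values a field carries: absent -> 0, list -> its length, else 1."""
    if value is None:
        return 0
    if isinstance(value, list):
        return len(value)
    return 1


def check_umbrella_project(entity):
    """Return a list of problems; an empty list means no violation was found."""
    problems = []
    for field, (low, high) in UMBRELLA_CARDINALITIES.items():
        n = count_values(entity.get(field))
        if n < low or (high is not None and n > high):
            bound = str(low) if low == high else f"{low}-{'n' if high is None else high}"
            problems.append(f"{field}: expected cardinality {bound}, found {n}")
    # The table restricts __type to a single literal value.
    if entity.get("__type") != "UmbrellaProject":
        problems.append("__type: expected literal 'UmbrellaProject'")
    # Flag fields that the provisional table does not list.
    for field in entity:
        if field not in UMBRELLA_CARDINALITIES:
            problems.append(f"{field}: not part of the provisional model")
    return problems


if __name__ == "__main__":
    # Hypothetical example values; the identifiers are made up.
    example = {
        "__id": "umbrella-1",
        "__type": "UmbrellaProject",
        "name": "Example Umbrella",
        "projects": ["0001"],
    }
    print(check_umbrella_project(example))
```

The check only covers presence and counts. It does not validate that `projects` or `contactPoint` actually resolve to existing entities, which would need the full metadata set.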