From 9734c1d3f7a43db788e939e2ac8782a6eccdb8c5 Mon Sep 17 00:00:00 2001
From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com>
Date: Tue, 3 Dec 2024 08:43:12 +0100
Subject: [PATCH] continue

---
 docs/data/future-datamodel.md | 116 +++++++++++++++++-----------------
 1 file changed, 57 insertions(+), 59 deletions(-)

diff --git a/docs/data/future-datamodel.md b/docs/data/future-datamodel.md
index aed6d1d..8f614f8 100644
--- a/docs/data/future-datamodel.md
+++ b/docs/data/future-datamodel.md
@@ -184,6 +184,10 @@ most of the fields are optional.
     Do permissions need to be a complex object?
     Embargo probably needs the date when it ends.

+!!! question
+    `permissions` should be renamed to `rights` or `accessRights`
+    (datacite uses `rights` and openAIRE and COAR use `accessRights`).
+

 #### Dataset

@@ -217,6 +221,9 @@ most of the fields are optional.
 !!! answer
     What is the meaning of `dateCreated` in this context?

+!!! question
+    Should `rights`/`accessRights` be added?
+
 A project can have more than one dataset if it's the project's wish and if it provides meaningful grouping of the records e.g., 2 researchers worked one one part of the data and the 2 other researchers on the other part of the data, EKWS digitizing different boxes and each box becomes a dataset.
@@ -224,25 +231,24 @@ A record can only be part of one dataset.

 #### Collection

-| Field | Type | Cardinality | Restrictions | Remarks |
-| ------------------ | ----------------- | ----------- | ------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------- |
-| `__id` | string | 1 | | |
-| `__type` | string | 1 | Literal 'Collection' | |
-| `pid` | id | 1 | | |
-| `name` | string | 1 | | |
-| `description` | string / url | 1-n | | |
-| `typeOfData` | string[] | 1-n | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; does this still make sense? -> Maybe not. -> should it be optional? or removed? |
-| `licenses` | license[] | 1-n | | computed from the records if available and optionally added manually |
-| `copyright` | string[] | 1-n | | computed from the records if available and optionally added manually |
-| `languages` | lang_string[] | 1-n | | copied from dataset; does this make sense? -> computed if available and optionally added manually. -> ? |
-| `attributions` | attribution[] | 1-n | | computed from the records if available and optionally added manually |
-| `provenance` | string | 0-1 | | see: [openAIRE Guidelines](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_source.html#dc-source) |
-| `records` | id[] | 0-n | Record IDs | can be 0 in case it points to a collection |
-| `collections` | id[] | 0-n | Collection IDs | |
-| `alternativeNames` | lang_string[] | 0-n | | |
-| `keywords` | lang_string[] | 0-n | | does this make sense? -> Interesting for the search. |
-| `urls` | url[] | 0-n | | copied from dataset; |
-| `additional` | lang_string / url | 0-n | | copied from dataset; -> Probably not needed. |
+| Field | Type | Cardinality | Restrictions | Remarks |
+| ------------------ | ------------- | ----------- | ------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------- |
+| `__id` | string | 1 | | |
+| `__type` | string | 1 | Literal 'Collection' | |
+| `pid` | id | 1 | | |
+| `name` | string | 1 | | |
+| `description` | string / url | 1-n | | |
+| `typeOfData` | string[] | 1-n | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; does this still make sense? -> Maybe not. -> should it be optional? or removed? |
+| `licenses` | license[] | 1-n | | computed from the records if available and optionally added manually |
+| `copyright` | string[] | 1-n | | computed from the records if available and optionally added manually |
+| `languages` | lang_string[] | 1-n | | copied from dataset; does this make sense? -> computed if available and optionally added manually. -> ? |
+| `attributions` | attribution[] | 1-n | | computed from the records if available and optionally added manually |
+| `provenance` | string | 0-1 | | see: [openAIRE Guidelines](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_source.html#dc-source) |
+| `records` | id[] | 0-n | Record IDs | can be 0 in case it points to a collection |
+| `collections` | id[] | 0-n | Collection IDs | |
+| `alternativeNames` | lang_string[] | 0-n | | |
+| `keywords` | lang_string[] | 0-n | | |
+| `urls` | url[] | 0-n | | |

 !!! note
     In the long term (not for now), we need to reference the records in the collection.
@@ -250,42 +256,44 @@ A record can only be part of one dataset.
 !!! question
     Is it correct, that collections are completely unuseable, as long as we don't have metadata on the record level?

+!!! question
+    Should `rights`/`accessRights` be added?
+
 #### Record

-| Field | Type | Cardinality | Restrictions | Remarks |
-| ------------------- | ----------- | ----------- | ------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `__id` | string | 1 | | |
-| `__type` | string | 1 | Literal 'Record' | |
-| `pid` | id | 1 | | or `ARK`? -> probably use `pid` |
-| `label` | lang_string | 1 | | do we want this, or does it go too far? -> We want to keep it because it's the "name" of the record. But we can think about renaming it. |
-| `accessConditions` | string | 1 | Literal "open", "restricted" or "closed" | copied from dataset; change to proper terms -> open, restricted, embargoed, metadata-only and renaming `accessConditions` to `rights` to be in line with openAIRE. |
-| `embargoPeriodDate` | date | 0-1 | | -> needs to be added to be in line with openAIRE, e.g., ``` 2011-12-01 2012-12-01 ``` |
-| `publisher` | string | 1 | | should be DaSCH |
-| `license` | license | 1 | | copied from dataset; should be computed from the records -> No, you have to indicate the license here. Computation is not possible. |
-| `copyright` | string | 1 | | computed along with license -> -> No, you have to indicate the copyright here. Computation is not possible. |
-| `attribution` | attribution | 1 | | do we want this, or does it go too far? -> Yes |
-| `provenance` | string | 0-1 | | do we want this, or does it go too far? -> Yes, [openAIRE data-source](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_source.html#dc-source) |
-| `datePublished` | date | 0-1 | | copied from dataset; do they make sense? -> Yes |
-| `dateCreated` | date | 0-1 | | copied from dataset; do they make sense? -> Yes |
-| `dateModified` | date | 0-1 | | copied from dataset; do they make sense? -> Yes |
-| `typeOfData` | string | 0-1 | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; wanted? what values? -> Yes, type is computed and should represent: [openAIRE Resource Type](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_publicationtype.html#aire-resourcetype) |
-| `size` | string | 0-1 | | needs to be added, see: [openAIRE Size](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_size.html#dci-size) |
-| `audience` | string | 0-n | | needs to be added, see: [openAIRE Audience](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_audience.html#dct-audience) |
+| Field | Type | Cardinality | Restrictions | Remarks |
+| ------------------- | ----------- | ----------- | ---------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `__id` | string | 1 | | |
+| `__type` | string | 1 | Literal 'Record' | |
+| `pid` | id | 1 | | |
+| `label` | lang_string | 1 | | do we want this, or does it go too far? -> We want to keep it because it's the "name" of the record. But we can think about renaming it. |
+| `accessConditions` | string | 1 | Literal "open", "restricted", "embargo" or "metadata only" | copied from dataset; change to proper terms -> open, restricted, embargoed, metadata-only and renaming `accessConditions` to `rights` to be in line with openAIRE. |
+| `embargoPeriodDate` | date | 0-1 | | -> needs to be added to be in line with openAIRE, e.g., ``` 2011-12-01 2012-12-01 ``` |
+| `publisher` | string | 1 | | should be DaSCH |
+| `license` | license | 1 | | copied from dataset; should be computed from the records -> No, you have to indicate the license here. Computation is not possible. |
+| `copyright` | string | 1 | | computed along with license -> -> No, you have to indicate the copyright here. Computation is not possible. |
+| `attribution` | attribution | 1 | | do we want this, or does it go too far? -> Yes |
+| `provenance` | string | 0-1 | | do we want this, or does it go too far? -> Yes, [openAIRE data-source](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_source.html#dc-source) |
+| `datePublished` | date | 0-1 | | copied from dataset; do they make sense? -> Yes |
+| `dateCreated` | date | 0-1 | | copied from dataset; do they make sense? -> Yes |
+| `dateModified` | date | 0-1 | | copied from dataset; do they make sense? -> Yes |
+| `typeOfData` | string | 0-1 | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; wanted? what values? -> Yes, type is computed and should represent: [openAIRE Resource Type](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_publicationtype.html#aire-resourcetype) |
+| `size` | string | 0-1 | | needs to be added, see: [openAIRE Size](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_size.html#dci-size) |
+| `audience` | string | 0-n | | needs to be added, see: [openAIRE Audience](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_audience.html#dct-audience) |
+
+!!! question
+    rename `accessConditions` to `rights` or `accessRights`?

 !!! question
-How granular do we want to be with the metadata on the record level?
+    How granular do we want to be with the metadata on the record level?

 !!! answer
-We need provenance,
-see: [openAIRE Source](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_source.html#dc-source)
+    We need provenance,
+    see: [openAIRE Source](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_source.html#dc-source)

 !!! question
-If we have copyright, what is the purpose of attribution?
+    Not sure what to make of that.

-!!! answer
-Copyright doesn't have anything to do with attribution. Attribution is who did something with the data. Copyright is
-person/organization who holds the right to this record and can give others the permission to do something with this
-record aka license.

 #### Person

@@ -427,6 +435,8 @@ License are depending on dates. It doesn't relate to a copyright statement.

 ## Entity-Relationship Diagram

+
+
 ```mermaid
 erDiagram
     umbrellaProject |o--|{ project : projects
@@ -558,19 +568,6 @@ erDiagram
     }
 ```

-## Notes
-
-- [ ] Permissions: open, restricted, embargo, metadata only
-
-- [ ] ...
-
 ## Change Log

 - Make `Grant` a value type and remove it from the top level.
@@ -597,3 +594,4 @@ erDiagram
 - Removed `additional` from `dataset`.
 - Removed `alternativeTitles` from `dataset`.
 - Removed `urls` from `dataset`.
+- Changed options of `rights` etc. to "open", "restricted", "embargo", "metadata only".