From 2b7de8b408c0bb701b56ae94f98135e19bcf7102 Mon Sep 17 00:00:00 2001
From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com>
Date: Mon, 4 Nov 2024 18:28:42 +0100
Subject: [PATCH 1/8] move old documentation for disambiguity

---
 docs/data/{datamodel.md => current-datamodel.md} | 2 +-
 docs/index.md                                    | 2 +-
 mkdocs.yml                                       | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)
 rename docs/data/{datamodel.md => current-datamodel.md} (99%)
diff --git a/docs/data/datamodel.md b/docs/data/current-datamodel.md
similarity index 99%
rename from docs/data/datamodel.md
rename to docs/data/current-datamodel.md
index cd1c7c85..1372fb6a 100644
--- a/docs/data/datamodel.md
+++ b/docs/data/current-datamodel.md
@@ -1,4 +1,4 @@
-# Data Model
+# Current Data Model
 
 All metadata are modelled according to the model as described in the following.
 
diff --git a/docs/index.md b/docs/index.md
index af6cb9f5..d5e22889 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -10,7 +10,7 @@ as well as the code of the [DSP Metadata Browser](https://meta.dasch.swiss).
 If you are interested in viewing the metadata in human-readable form, 
 you can visit the [DSP Metadata Browser](https://meta.dasch.swiss).
 
-If you are interested in re-using our metadata, you can find extensive documentation [here](data/datamodel.md).
+If you are interested in re-using our metadata, you can find extensive documentation [here](data/current-datamodel.md).
 
 The metadata itself can be found [here](https://github.com/dasch-swiss/dsp-meta/tree/main/data/json)
 or requested over the API as described [here](data/api.md).
diff --git a/mkdocs.yml b/mkdocs.yml
index 27e3f62a..f5544ebf 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -4,7 +4,7 @@ nav:
     - DSP-META: index.md
     - Consuming Metadata:
           - Metadata API: data/api.md
-          - Data Model: data/datamodel.md
+          - Current Data Model: data/current-datamodel.md
     - Adding Metadata: adding-metadata.md
     - Code Documentation:
           - Overview: code/overview.md
@@ -33,8 +33,8 @@ theme:
               name: Switch to light mode
     features:
         - search.suggest
-        - navigation.tabs
-        - navigation.sections
+        # - navigation.tabs
+        # - navigation.sections
 
 markdown_extensions:
     - admonition

From 92ead29623a243c867a71879cdd126b89e9846b5 Mon Sep 17 00:00:00 2001
From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com>
Date: Tue, 5 Nov 2024 16:56:51 +0100
Subject: [PATCH 2/8] set up new model based on the old one

---
 docs/data/provisional-datamodel.md | 601 +++++++++++++++++++++++++++++
 mkdocs.yml                         |   1 +
 2 files changed, 602 insertions(+)
 create mode 100644 docs/data/provisional-datamodel.md

diff --git a/docs/data/provisional-datamodel.md b/docs/data/provisional-datamodel.md
new file mode 100644
index 00000000..b5218120
--- /dev/null
+++ b/docs/data/provisional-datamodel.md
@@ -0,0 +1,601 @@
+# Provisional Data Model
+
+!!! warning
+    This document does _not_ represent the current state of the metadata model.  
+    It is a working document for planned upcoming changes to the metadata model.
+
+!!! note
+    This model is an idealized version of the metadata model. 
+    With the current implementation that is entirely separate from the DSP,
+    it is not feasible to implement metadata on the record level.  
+    Such a system may be implemented in the archive in the future,
+    but for now, we will keep the metadata on the dataset level.  
+    A separate, simplified model for applying some of these changes, 
+    while remaining compatible with the current implementation,
+    should be created alongside this model.
+
+## Overview
+
+The metadata model is a hierarchical structure of metadata elements. 
+
+```mermaid
+
+flowchart TD
+    hyper-project[Hyper-Project /<br/>Uber-Project /<br/>Meta-Project /<br/>Compound Project] -->|1-n| project[Project /<br/>Research Project]
+    project -->|1-n| dataset[Dataset]
+    dataset -->|1-n| record[Record /<br/>Resource]
+    project -->|0-n| collection[Collection]
+    collection --> collection
+    hyper-project -->|0-n| collection
+    collection --> record
+```
+
+- A `Compound Project` is optional and collects one or more `Research Projects`.  
+  It is typically of institutional nature, 
+  not directly tied to a specific funding grant, 
+  and may be long-lived.  
+  Examples are EKWS/CAS, BEOL or LIMC.
+- A `Research Project` is the main entity of the metadata model.  
+  It corresponds to a `project` in the DSP.
+  It is typically tied to a specific funding grant, 
+  and hence has a limited lifetime of ~3-5 years;
+  multiple funding rounds and a longer lifetime are possible.  
+  A `Research Project` is part of 0-1 `Compound Project`
+  and has 1-n `Datasets` and 0-n `Collections`.
+- A `Dataset` is a collection of `Records` within a `Research Project`.  
+  It is mostly meant for system-internal and technical use,
+  and should not have particular semantics or a "historical meaning" in the context of the project.  
+  A `Dataset` is part of exactly 1 `Research Project`
+  and contains 1-n `Records`.
+- A `Collection` is also a collection of `Records` within a `Research Project`.  
+  It is meant for semantic grouping of `Records` within a `Research Project`,
+  and may have a "historical meaning" in the context of the project.  
+  Examples may be physical collections such as a person's "Nachlass" in an archive,
+  or groupings of records based on a specific research question within a project.  
+  A `Collection` is part of at least 1 `Research Project`, `Compound Project` or `Collection`, 
+  but can be part of multiple. It may either contain 0-n `Collections` or 1-n `Records`.
+- A `Record` is a single resource within a `Dataset`.  
+  It represents a single entity, and the smallest unit that can meaningfully have an identifier. 
+  It maps to a `knora-base:Resource` (DSP-API) or an `Asset` (SIPI/Ingest) in the DSP.  
+  A `Record` is part of exactly 1 `Dataset` and may be part of 0-n `Collections`. 
+
+Additionally, there are the entities `Person` and `Organization`:  
+`Person` and `Organization` are entities that are independent of the `Research Project` hierarchy,
+and may be related to various entities within the hierarchy.  
+
+
+## Top Level
+
+A set of metadata consists of the following top-level elements:
+
+- Compound Project
+- Project
+- Dataset
+- Collection
+- Record
+- Person
+- Organization
+
+Each of these elements is an entity identified by a unique identifier. 
+Other elements can refer to these entities by their identifier.
+
+Any other metadata element may itself be a complex object,
+but it is always part of one of the top-level elements.
+Such elements do not have an identifier, 
+but are identified by their position in the hierarchy.
+
+| Field             | Type            | Cardinality |
+| ----------------- | --------------- | ----------- |
+| `$schema`         | string          | 0-1         |
+| `compoundProject` | compoundProject | 0-1         |
+| `project`         | project         | 1           |
+| `datasets`        | dataset[]       | 1-n         |
+| `collections`     | collection[]    | 0-n         |
+| `records`         | record[]        | 0-n         |
+| `persons`         | person[]        | 0-n         |
+| `organizations`   | organization[]  | 0-n         |
+
+
+## Types
+
+### Entity Types
+
+#### Compound Project
+
+| Field                    | Type                | Cardinality | Restrictions                                                 | Remarks            |
+| ------------------------ | ------------------- | ----------- | ------------------------------------------------------------ | ------------------ |
+| `__type`                 | string              | 1           | Literal 'CompoundProject'                                    |                    |
+| `name`                   | string              | 1           |                                                              |                    |
+| `url`                    | url                 | 1           |                                                              |                    |
+| `howToCite`              | string              | 1           |                                                              | Needed?            |
+| `projects`               | id[]                | 1-n         | String containing the identifier of a project                |                    |
+| `description`            | lang_string         | 0-1         |                                                              | Optional?          |
+| `contactPoint`           | id                  | 0-1         | String containing the identifier of a person or organization | Optional?          |
+| `keywords`               | lang_string[]       | 0-n         |                                                              | Needed?            |
+| `disciplines`            | lang_string / url[] | 0-n         |                                                              | Needed?            |
+| `temporalCoverage`       | lang_string / url[] | 0-n         |                                                              | Needed?            |
+| `spatialCoverage`        | url[]               | 0-n         |                                                              | Needed?            |
+| `funders`                | id[]                | 0-n         | String containing the identifier of a person or organization | Needed?            |
+| `publications`           | publication[]       | 0-n         |                                                              | Needed?            |
+| `grants`                 | grant[]             | 0-n         |                                                              | Needed?            |
+| `alternativeNames`       | lang_string[]       | 0-n         |                                                              | Needed?            |
+| `consistingInstitutions` | id[]                | 0-n         | String containing the identifier of an organization          | Makes sense? Name? |
+
+!!! question
+    This raises the question of how to deal with multiple projects in a compound project. 
+    We probably want to keep one entry per project, 
+    so this leaves us with either duplicating the compound project metadata for each project,
+    or having compound project metadata separately and only linking it from the project.
+    The latter seems preferable, 
+    but then the question arises of who gets to edit the compound project metadata.  
+    For a first implementation, we could simply duplicate the metadata for each project, 
+    and later factor it out.
+
+!!! important
+    The properties for `Compound Project` were invented by me on the fly. 
+    That does not mean they are correct or useful.
+
+
+#### Project
+
+| Field                | Type                | Cardinality | Restrictions                                                 | Remarks               |
+| -------------------- | ------------------- | ----------- | ------------------------------------------------------------ | --------------------- |
+| `__type`             | string              | 1           | Literal "Project"                                            |                       |
+| `shortcode`          | string              | 1           | 4 char hexadecimal                                           |                       |
+| `status`             | string              | 1           | Literal "Ongoing" or "Finished"                              |                       |
+| `name`               | string              | 1           |                                                              |                       |
+| `description`        | lang_string         | 1           |                                                              |                       |
+| `startDate`          | date                | 1           | String of format "YYYY-MM-DD"                                |                       |
+| `teaserText`         | string              | 1           |                                                              |                       |
+| `url`                | url                 | 1           |                                                              |                       |
+| `howToCite`          | string              | 1           |                                                              |                       |
+| `datasets`           | id[]                | 1-n         | String containing the identifier of a dataset                |                       |
+| `keywords`           | lang_string[]       | 1-n         |                                                              |                       |
+| `disciplines`        | lang_string / url[] | 1-n         |                                                              |                       |
+| `temporalCoverage`   | lang_string / url[] | 1-n         |                                                              |                       |
+| `spatialCoverage`    | url[]               | 1-n         |                                                              |                       |
+| `funders`            | id[]                | 1-n         | String containing the identifier of a person or organization | Does this make sense? |
+| `endDate`            | date                | 0-1         | String of format "YYYY-MM-DD"                                |                       |
+| `secondaryURL`       | url                 | 0-1         |                                                              |                       |
+| `dataManagementPlan` | dmp                 | 0-1         |                                                              |                       |
+| `contactPoint`       | id                  | 0-1         | String containing the identifier of a person or organization |                       |
+| `publications`       | publication[]       | 0-n         |                                                              |                       |
+| `grants`             | grant[]             | 0-n         |                                                              | Does this make sense? |
+| `alternativeNames`   | lang_string[]       | 0-n         |                                                              |                       |
+
+!!! question
+    If we can have copyright/license on dataset level,
+    do we want to have it on project level as well?
+
+!!! question
+    Do we still need funders if we have grants?
+
+!!! question
+    What about projects that do not have funding?
+
+
+#### Dataset
+
+| Field               | Type              | Cardinality | Restrictions                                            | Remarks                             |
+| ------------------- | ----------------- | ----------- | ------------------------------------------------------- | ----------------------------------- |
+| `__id`              | string            | 1           |                                                         |                                     |
+| `__type`            | string            | 1           | Literal "Dataset"                                       |                                     |
+| `title`             | string            | 1           |                                                         |                                     |
+| `accessConditions`  | string            | 1           | Literal "open", "restricted" or "closed"                | change to proper terms              |
+| `howToCite`         | string            | 1           |                                                         |                                     |
+| `status`            | string            | 1           | Literal "In Planning", "Ongoing", "On hold", "Finished" | not aligned with project status     |
+| `abstract`          | lang_string / url | 1-n         |                                                         | naming: maybe 'description'?        |
+| `typeOfData`        | string[]          | 1-n         | Literal "XML", "Text", "Image", "Video", "Audio"        | does this still make sense?         |
+| `licenses`          | license[]         | 1-n         |                                                         | should be computed from the records |
+| `copyright`         | string[]          | 1-n         |                                                         | computed along with license         |
+| `languages`         | lang_string[]     | 1-n         |                                                         | does this make sense?               |
+| `attributions`      | attribution[]     | 1-n         |                                                         | can this be calculated?             |
+| `datePublished`     | date              | 0-1         |                                                         |                                     |
+| `dateCreated`       | date              | 0-1         |                                                         |                                     |
+| `dateModified`      | date              | 0-1         |                                                         |                                     |
+| `distribution`      | url               | 0-1         |                                                         | does this make sense?               |
+| `alternativeTitles` | lang_string[]     | 0-n         |                                                         |                                     |
+| `urls`              | url[]             | 0-n         |                                                         |                                     |
+| `additional`        | lang_string / url | 0-n         |                                                         |                                     |
+
+!!! question
+    Do we consider datasets something merely "internal"?  
+    If so, do metadata on datasets even make sense at all? Should we even "expose" datasets publicly?
+
+!!! question
+    Do we need to store the license on the dataset level, 
+    or can we compute it from the records?  
+    If we store it on the dataset level, 
+    how do we deal with datasets that contain records with different licenses?
+
+!!! question
+    Do we need to store the language on the dataset level, 
+    or can we compute it from the records?  
+    If we store it on the dataset level, 
+    how do we deal with datasets that contain records in different languages?
+
+!!! question
+    Do we need to store the attribution on the dataset level, 
+    or can we compute it from the records?  
+    If we store it on the dataset level, 
+    how do we deal with datasets that contain records with different attributions?
+
+!!! question
+    Do we need a reference to the records in the dataset?
+
+
+#### Collection
+
+| Field              | Type              | Cardinality | Restrictions                                     | Remarks                                                  |
+| ------------------ | ----------------- | ----------- | ------------------------------------------------ | -------------------------------------------------------- |
+| `__id`             | string            | 1           |                                                  |                                                          |
+| `__type`           | string            | 1           | Literal 'Collection'                             |                                                          |
+| `name`             | string            | 1           |                                                  |                                                          |
+| `accessConditions` | string            | 1           | Literal "open", "restricted" or "closed"         | copied from dataset; change to proper terms              |
+| `provenance`       | string            | 0-1         |                                                  |                                                          |
+| `datePublished`    | date              | 0-1         |                                                  | copied from dataset; do we still need those?             |
+| `dateCreated`      | date              | 0-1         |                                                  | copied from dataset; do we still need those?             |
+| `dateModified`     | date              | 0-1         |                                                  | copied from dataset; do we still need those?             |
+| `distribution`     | url               | 0-1         |                                                  | copied from dataset; does this make sense?               |
+| `records`          | id[]              | 0-n         | Record IDs                                       | can be 0 in case it points to a collection               |
+| `collections`      | id[]              | 0-n         | Collection IDs                                   |                                                          |
+| `alternativeNames` | lang_string[]     | 0-n         |                                                  |                                                          |
+| `keywords`         | lang_string[]     | 0-n         |                                                  | does this make sense?                                    |
+| `urls`             | url[]             | 0-n         |                                                  | copied from dataset;                                     |
+| `additional`       | lang_string / url | 0-n         |                                                  | copied from dataset;                                     |
+| `description`      | string / url      | 1-n         |                                                  |                                                          |
+| `typeOfData`       | string[]          | 1-n         | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; does this still make sense?         |
+| `licenses`         | license[]         | 1-n         |                                                  | copied from dataset; should be computed from the records |
+| `copyright`        | string[]          | 1-n         |                                                  | computed along with license                              |
+| `languages`        | lang_string[]     | 1-n         |                                                  | copied from dataset; does this make sense?               |
+| `attributions`     | attribution[]     | 1-n         |                                                  | copied from dataset; can this be calculated?             |
+
+
+!!! important
+    The properties for `Collection` were invented by me on the fly. 
+    That does not mean they are correct or useful.
+
+
+!!! question
+    Do we need a reference to the records in the collection?
+
+
+#### Record
+
+| Field              | Type        | Cardinality | Restrictions                                     | Remarks                                                  |
+| ------------------ | ----------- | ----------- | ------------------------------------------------ | -------------------------------------------------------- |
+| `__id`             | string      | 1           |                                                  |                                                          |
+| `__type`           | string      | 1           | Literal 'Record'                                 |                                                          |
+| `pid`              | id          | 1           |                                                  | or `ARK`?                                                |
+| `label`            | lang_string | 1           |                                                  | do we want this, or does it go too far?                  |
+| `accessConditions` | string      | 1           | Literal "open", "restricted" or "closed"         | copied from dataset; change to proper terms              |
+| `license`          | license     | 1           |                                                  | copied from dataset; should be computed from the records |
+| `copyright`        | string      | 1           |                                                  | computed along with license                              |
+| `attribution`      | attribution | 1           |                                                  | do we want this, or does it go too far?                  |
+| `provenance`       | string      | 0-1         |                                                  | do we want this, or does it go too far?                  |
+| `datePublished`    | date        | 0-1         |                                                  | copied from dataset; do they make sense?                 |
+| `dateCreated`      | date        | 0-1         |                                                  | copied from dataset; do they make sense?                 |
+| `dateModified`     | date        | 0-1         |                                                  | copied from dataset; do they make sense?                 |
+| `typeOfData`       | string      | 0-1         | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; wanted? what values?                |
+
+!!! important
+    The properties for `Record` were invented by me on the fly.
+    That does not mean they are correct or useful.
+
+!!! question
+    How granular do we want to be with the metadata on the record level?
+
+!!! question
+    If we have copyright, what is the purpose of attribution?
+
+
+#### Person
+
+| Field            | Type     | Cardinality | Restrictions                           | Remarks |
+| ---------------- | -------- | ----------- | -------------------------------------- | ------- |
+| `__id`           | string   | 1           |                                        |         |
+| `__type`         | string   | 1           | Literal 'Person'                       |         |
+| `givenNames`     | string[] | 1-n         |                                        |         |
+| `familyNames`    | string[] | 1-n         |                                        |         |
+| `jobTitles`      | string[] | 0-n         |                                        |         |
+| `affiliations`   | id[]     | 0-n         | Organization IDs                       |         |
+| `address`        | address  | 0-1         |                                        |         |
+| `email`          | string   | 0-1         |                                        |         |
+| `secondaryEmail` | string   | 0-1         |                                        |         |
+| `authorityRefs`  | url[]    | 0-n         | References to external authority files |         |
+
+
+#### Organization
+
+| Field             | Type        | Cardinality | Restrictions                           | Remarks |
+| ----------------- | ----------- | ----------- | -------------------------------------- | ------- |
+| `__id`            | string      | 1           |                                        |         |
+| `__type`          | string      | 1           | Literal 'Organization'                 |         |
+| `name`            | string      | 1           |                                        |         |
+| `url`             | url         | 1           |                                        |         |
+| `address`         | address     | 0-1         |                                        |         |
+| `email`           | string      | 0-1         |                                        |         |
+| `alternativeName` | lang_string | 0-1         |                                        |         |
+| `authorityRefs`   | url[]       | 0-n         | References to external authority files |         |
+
+
+### Value Types
+
+#### String with Language Tag (`lang_string`)
+
+Object with an ISO language code as key and a string as value.
+
+```json
+{
+    "en": "Lorem ipsum in English.",
+    "de": "Lorem ipsum auf Deutsch."
+}
+```
+
+
+#### Date
+
+String with the format `YYYY-MM-DD`.
+
+
+#### URL
+
+An object representing a URL. 
+Depending on the `type` field,
+the URL may be a generic URL
+or a more specific link, like a PID
+or a reference to a resource in an external authority file.
+
+
+| Field    | Type   | Cardinality | Restrictions                                                                                                                                |
+| -------- | ------ | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
+| `__type` | string | 1           | Literal 'URL'                                                                                                                               |
+| `type`   | string | 1           | Literal 'URL', 'Geonames', 'Pleiades', 'Skos', 'Periodo', 'Chronontology', 'GND', 'VIAF', 'Grid', 'ORCID', 'Creative Commons', 'DOI', 'ARK' |
+| `url`    | string | 1           |                                                                                                                                             |
+| `text`   | string | 0-1         |                                                                                                                                             |
+
+!!! question
+    Can we model different types of URLs in a more sensible way?
+
+
+#### Data Management Plan (`dmp`)
+
+| Field       | Type    | Cardinality | Restrictions                 |
+| ----------- | ------- | ----------- | ---------------------------- |
+| `__type`    | string  | 1           | Literal 'DataManagementPlan' |
+| `available` | boolean | 0-1         |                              |
+| `url`       | url     | 0-1         |                              |
+
+
+!!! question
+    Does the model for `Data Management Plan` still make sense? 
+    Could it be a string? 
+    Is "available" useful information? 
+    How do we ensure that either `available` or `url` is set?
+
+
+#### Publication
+
+| Field  | Type   | Cardinality | Restrictions |
+| ------ | ------ | ----------- | ------------ |
+| `text` | string | 1           |              |
+| `url`  | url    | 0-1         |              |
+
+
+#### Address
+
+| Field        | Type   | Cardinality | Restrictions      |
+| ------------ | ------ | ----------- | ----------------- |
+| `__type`     | string | 1           | Literal 'Address' |
+| `street`     | string | 1           |                   |
+| `postalCode` | string | 1           |                   |
+| `locality`   | string | 1           |                   |
+| `country`    | string | 1           |                   |
+| `canton`     | string | 0-1         |                   |
+| `additional` | string | 0-1         |                   |
+
+
+#### License
+
+| Field     | Type   | Cardinality | Restrictions      |
+| --------- | ------ | ----------- | ----------------- |
+| `__type`  | string | 1           | Literal 'License' |
+| `license` | url    | 1           |                   |
+| `date`    | date   | 1           |                   |
+| `details` | string | 0-1         |                   |
+
+!!! question
+    Is this model up to date with our current understanding of licenses? 
+    Is `details` ever used? 
+    What is the purpose of `date` here? 
+    How does it relate to a copyright statement?
+
+
+#### Attribution
+
+| Field    | Type     | Cardinality | Restrictions              | Remarks                     |
+| -------- | -------- | ----------- | ------------------------- | --------------------------- |
+| `__type` | string   | 1           | Literal 'Attribution'     |                             |
+| `agent`  | id       | 1           | Person or Organization ID | Or can this only be person? |
+| `roles`  | string[] | 1-n         |                           |                             |
+
+
+#### Grant
+
+| Field     | Type   | Cardinality | Restrictions               |
+| --------- | ------ | ----------- | -------------------------- |
+| `__type`  | string | 1           | Literal 'Grant'            |
+| `funders` | id[]   | 1-n         | Person or Organization IDs |
+| `number`  | string | 0-1         |                            |
+| `name`    | string | 0-1         |                            |
+| `url`     | url    | 0-1         |                            |
+
+
+## Entity-Relationship Diagram
+
+```mermaid
+erDiagram
+    compoundProject |o--|{ project : projects
+    project ||--|{ dataset : datasets
+    project ||--|| person : contactPoint
+    project ||--|| organization : contactPoint
+    project ||--|{ person : funders
+    project ||--|{ organization : funders
+    project |o--|{ collection : collections
+    dataset ||--|{ record : records
+    collection |o--o{ collection : collections
+    collection |o--o{ record : records
+    person ||--|{ organization : affiliations
+
+    compoundProject {
+        string __type "1; Literal 'CompoundProject'"
+        string name "1"
+        url url "1"
+        string howToCite "1"
+        lang_string description "0-1"
+        id contactPoint "0-1"
+        id[] projects "1-n; Project IDs"
+        lang_string[] keywords "0-n"
+        lang_string_or_url[] disciplines "0-n"
+        lang_string_or_url[] temporalCoverage "0-n"
+        url[] spatialCoverage "0-n"
+        id[] funders "0-n; Person or Organization IDs"
+        publication[] publications "0-n"
+        grant[] grants "0-n"
+        lang_string[] alternativeNames "0-n"
+        id[] consistingInstitutions "0-n; Organization IDs"
+    }
+    
+    project {
+        string __type "1; Literal 'Project'"
+        string shortcode "1"
+        string status "1; Literal 'Ongoing' or 'Finished'"
+        string name "1"
+        lang_string description "1"
+        date startDate "1"
+        string teaserText "1"
+        url url "1"
+        string howToCite "1"
+        id[] datasets "1-n; Dataset IDs"
+        lang_string[] keywords "1-n"
+        lang_string_or_url[] disciplines "1-n"
+        lang_string_or_url[] temporalCoverage "1-n"
+        url[] spatialCoverage "1-n"
+        id[] funders "1-n; Person or Organization IDs"
+        date endDate "0-1"
+        url secondaryURL "0-1"
+        dmp dataManagementPlan "0-1"
+        id contactPoint "0-1"
+        publication[] publications "0-n"
+        grant[] grants "0-n"
+        lang_string[] alternativeNames "0-n"
+    }
+
+    dataset {
+        string __id "1"
+        string __type "1; Literal 'Dataset'"
+        string title "1"
+        string accessConditions "1; Literal 'open', 'restricted' or 'closed'"
+        string howToCite "1"
+        string status "1; Literal 'In Planning', 'Ongoing', 'On hold', 'Finished'"
+        lang_string_or_url[] abstract "1-n"
+        string[] typeOfData "1-n; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'"
+        license[] licenses "1-n"
+        string[] copyright "1-n"
+        lang_string[] languages "1-n"
+        attribution[] attributions "1-n"
+        date datePublished "0-1"
+        date dateCreated "0-1"
+        date dateModified "0-1"
+        url distribution "0-1"
+        lang_string[] alternativeTitles "0-n"
+        url[] urls "0-n"
+        lang_string_or_url[] additional "0-n"
+    }
+
+    collection {
+        string __id "1"
+        string __type "1; Literal 'Collection'"
+        string name "1"
+        string accessConditions "1; Literal 'open', 'restricted' or 'closed'"
+        string provenance "0-1"
+        date datePublished "0-1"
+        date dateCreated "0-1"
+        date dateModified "0-1"
+        url distribution "0-1"
+        id[] records "0-n; Record IDs"
+        id[] collections "0-n; Collection IDs"
+        lang_string[] alternativeNames "0-n"
+        lang_string[] keywords "0-n"
+        url[] urls "0-n"
+        lang_string_or_url[] additional "0-n"
+        lang_string_or_url[] description "1-n"
+        string[] typeOfData "1-n; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'"
+        license[] licenses "1-n"
+        string[] copyright "1-n"
+        lang_string[] languages "1-n"
+        attribution[] attributions "1-n"
+    }
+
+    record {
+        string __id "1"
+        string __type "1; Literal 'Record'"
+        string pid "1"
+        lang_string label "1"
+        string accessConditions "1; Literal 'open', 'restricted' or 'closed'"
+        license license "1"
+        string copyright "1"
+        attribution attribution "1"
+        string provenance "0-1"
+        date datePublished "0-1"
+        date dateCreated "0-1"
+        date dateModified "0-1"
+        string typeOfData "0-1; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'"
+    }
+
+    person {
+        string __id "1"
+        string __type "1; Literal 'Person'"
+        string[] givenNames "1-n"
+        string[] familyNames "1-n"
+        string[] jobTitles "0-n"
+        id[] affiliations "0-n; Organization IDs"
+        address address "0-1"
+        string email "0-1"
+        string secondaryEmail "0-1"
+        url[] authorityRefs "0-n"
+    }
+
+    organization {
+        string __id "1"
+        string __type "1; Literal 'Organization'"
+        string name "1"
+        url url "1"
+        address address "0-1"
+        string email "0-1"
+        lang_string alternativeName "0-1"
+        url[] authorityRefs "0-n"
+    }
+```
+
+
+
+## Change Log
+
+### Changes
+
+- Make `Grant` a value type and remove it from the top level.
+- Added entity `compoundProject` to the top level.
+- Added entity `collection` to the top level.
+- Added entity `record` to the top level.
+- Added `copyright` to `dataset`.
+
+### Implementation/migration Notes
+
+- inline grant in project
+- add/remove entities and properties accordingly
+
+
+### Mapping Old -> New
+
+TODO: Add mapping from old to new model.
diff --git a/mkdocs.yml b/mkdocs.yml
index f5544ebf..f8ced1ff 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -5,6 +5,7 @@ nav:
     - Consuming Metadata:
           - Metadata API: data/api.md
           - Current Data Model: data/current-datamodel.md
+          - Provisional Data Model: data/provisional-datamodel.md
     - Adding Metadata: adding-metadata.md
     - Code Documentation:
           - Overview: code/overview.md

From 888e444767f45c5be5e22a8d027278c6b52fb0e9 Mon Sep 17 00:00:00 2001
From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com>
Date: Tue, 5 Nov 2024 17:10:36 +0100
Subject: [PATCH 3/8] remove broken markdown linting rule

---
 .markdownlint.yml                  | 8 +++++---
 docs/data/provisional-datamodel.md | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/.markdownlint.yml b/.markdownlint.yml
index a4c058ac..c467d2fe 100644
--- a/.markdownlint.yml
+++ b/.markdownlint.yml
@@ -1,7 +1,7 @@
 # Config file for https://github.com/igorshubovych/markdownlint-cli
 
 # MD007/ul-indent - Unordered list indentation
-MD007: 
+MD007:
     # Whether to indent the first level of the list
     start_indented: false
     # By how many spaces every next level must be indented. The default of 2 is not compatible with mkdocs!
@@ -14,7 +14,7 @@ MD009: false
 MD012: false
 
 # MD013/line-length - Line length
-MD013: 
+MD013:
     line_length: 120
     heading_line_length: 120
     code_block_line_length: 120
@@ -30,8 +30,10 @@ MD013:
     # Stern length checking
     stern: false
 
+MD018: false
+
 # MD033/no-inline-html - Inline HTML
-MD033: 
+MD033:
     allowed_elements: [br, center]
 
 # MD041/first-line-heading/first-line-h1 - First line in a file should be a top-level heading
diff --git a/docs/data/provisional-datamodel.md b/docs/data/provisional-datamodel.md
index b5218120..cb8ff6ec 100644
--- a/docs/data/provisional-datamodel.md
+++ b/docs/data/provisional-datamodel.md
@@ -596,6 +596,6 @@ erDiagram
 - add/remove entities and properties accordingly
 
 
-### Mapping Old -> New
+### Mapping Old -> New
 
 TODO: Add mapping from old to new model.

From 4b57f6a986213cb0cc1e1e4aab3eb3bdf8a33b65 Mon Sep 17 00:00:00 2001
From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com>
Date: Tue, 5 Nov 2024 17:12:14 +0100
Subject: [PATCH 4/8] maybe fix linting issue?

---
 .markdownlint.yml                  | 2 --
 docs/data/provisional-datamodel.md | 2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/.markdownlint.yml b/.markdownlint.yml
index c467d2fe..8960ebe8 100644
--- a/.markdownlint.yml
+++ b/.markdownlint.yml
@@ -30,8 +30,6 @@ MD013:
     # Stern length checking
     stern: false
 
-MD018: false
-
 # MD033/no-inline-html - Inline HTML
 MD033:
     allowed_elements: [br, center]
diff --git a/docs/data/provisional-datamodel.md b/docs/data/provisional-datamodel.md
index cb8ff6ec..880d558c 100644
--- a/docs/data/provisional-datamodel.md
+++ b/docs/data/provisional-datamodel.md
@@ -596,6 +596,6 @@ erDiagram
 - add/remove entities and properties accordingly
 
 
-### Mapping Old -> New
+### Mapping (Old to New)
 
 TODO: Add mapping from old to new model.

From 7bce4ce356bf20335ca0b0ec3edd85d5752020a8 Mon Sep 17 00:00:00 2001
From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com>
Date: Tue, 5 Nov 2024 17:27:51 +0100
Subject: [PATCH 5/8] Update provisional-datamodel.md

---
 docs/data/provisional-datamodel.md | 134 ++++++++++++++++++++++++++++-
 1 file changed, 133 insertions(+), 1 deletion(-)
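
Note on the `project.grants` entry in the mapping below ("inlined from top level to project"): the following is only a minimal sketch of what that migration step could look like, not the actual migration code. It assumes the old layout stores grants in a top-level `grants` array of objects with an `__id`, and that `project.grants` holds references to those IDs. The function name `inline_grants` is hypothetical.

```python
from copy import deepcopy


def inline_grants(old: dict) -> dict:
    """Return a copy of `old` with top-level grants inlined into the project.

    Assumption: `old["grants"]` is a list of grant objects carrying `__id`,
    and `old["project"]["grants"]` is a list of those IDs.
    """
    new = deepcopy(old)
    # Remove the top-level list; Grant is a value type in the new model.
    grants_by_id = {g["__id"]: g for g in new.pop("grants", [])}

    inlined = []
    for grant_id in new["project"].get("grants", []):
        # A KeyError here signals a dangling reference in the old data.
        grant = dict(grants_by_id[grant_id])
        # The provisional Grant table lists no `__id` field, so drop it.
        grant.pop("__id", None)
        inlined.append(grant)

    if inlined:
        new["project"]["grants"] = inlined
    return new
```

The input is copied rather than mutated so that old and new files can be compared after the migration.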

diff --git a/docs/data/provisional-datamodel.md b/docs/data/provisional-datamodel.md
index 880d558c..c65e51be 100644
--- a/docs/data/provisional-datamodel.md
+++ b/docs/data/provisional-datamodel.md
@@ -104,6 +104,7 @@ but are identified by their position in the hierarchy.
 
 | Field                    | Type                | Cardinality | Restrictions                                                 | Remarks            |
 | ------------------------ | ------------------- | ----------- | ------------------------------------------------------------ | ------------------ |
+| `__id`                   | string              | 1           |                                                              |                    |
 | `__type`                 | string              | 1           | Literal 'CompoundProject'                                    |                    |
 | `name`                   | string              | 1           |                                                              |                    |
 | `url`                    | url                 | 1           |                                                              |                    |
@@ -448,6 +449,7 @@ erDiagram
     person ||--|{ organization : affiliations
 
     compoundProject {
+        string __id "1"
         string __type "1; Literal 'CompoundProject'"
         string name "1"
         url url "1"
@@ -598,4 +600,134 @@ erDiagram
 
 ### Mapping (Old to New)
 
-TODO: Add mapping from old to new model.
+#### Compound Project
+
+- `compoundProject.__id` : new
+- `compoundProject.__type` : new
+- `compoundProject.name`: new
+- `compoundProject.url`: new
+- `compoundProject.howToCite`: new
+- `compoundProject.description`: new
+- `compoundProject.contactPoint`: new
+- `compoundProject.keywords`: new
+- `compoundProject.disciplines`: new
+- `compoundProject.temporalCoverage`: new
+- `compoundProject.spatialCoverage`: new
+- `compoundProject.funders`: new
+- `compoundProject.publications`: new
+- `compoundProject.grants`: new
+- `compoundProject.alternativeNames`: new
+- `compoundProject.consistingInstitutions`: new
+
+This entity is new and does not have a direct mapping from the old model. 
+All values need to be defined and added manually.
+
+#### Project
+
+- `project.__type`: unchanged
+- `project.shortcode`: unchanged
+- `project.status`: unchanged
+- `project.name`: unchanged
+- `project.description`: unchanged
+- `project.startDate`: unchanged
+- `project.teaserText`: unchanged
+- `project.url`: unchanged
+- `project.howToCite`: unchanged
+- `project.datasets`: unchanged
+- `project.keywords`: unchanged
+- `project.disciplines`: unchanged
+- `project.temporalCoverage`: unchanged
+- `project.spatialCoverage`: unchanged
+- `project.funders`: unchanged
+- `project.endDate`: unchanged
+- `project.secondaryURL`: unchanged
+- `project.dataManagementPlan`: unchanged
+- `project.contactPoint`: unchanged
+- `project.publications`: unchanged
+- `project.grants`: inlined from top level to project
+- `project.alternativeNames`: unchanged
+
+#### Dataset
+
+- `dataset.__id`: unchanged
+- `dataset.__type`: unchanged
+- `dataset.title`: unchanged
+- `dataset.accessConditions`: unchanged
+- `dataset.howToCite`: unchanged
+- `dataset.status`: unchanged
+- `dataset.abstract`: unchanged
+- `dataset.typeOfData`: unchanged
+- `dataset.licenses`: unchanged
+- `dataset.copyright`: newly added
+- `dataset.languages`: unchanged
+- `dataset.attributions`: unchanged
+- `dataset.datePublished`: unchanged
+- `dataset.dateCreated`: unchanged
+- `dataset.dateModified`: unchanged
+- `dataset.distribution`: unchanged
+- `dataset.alternativeTitles`: unchanged
+- `dataset.urls`: unchanged
+- `dataset.additional`: unchanged
+
+#### Collection
+
+- `collection.__id`: new
+- `collection.__type`: new
+- `collection.name`: new
+- `collection.accessConditions`: new
+- `collection.provenance`: new
+- `collection.datePublished`: new
+- `collection.dateCreated`: new
+- `collection.dateModified`: new
+- `collection.distribution`: new
+- `collection.records`: new
+- `collection.collections`: new
+- `collection.alternativeNames`: new
+- `collection.keywords`: new
+- `collection.urls`: new
+- `collection.additional`: new
+- `collection.description`: new
+- `collection.typeOfData`: new
+- `collection.licenses`: new
+- `collection.copyright`: new
+- `collection.languages`: new
+- `collection.attributions`: new
+
+#### Record
+
+- `record.__id`: new
+- `record.__type`: new
+- `record.pid`: new
+- `record.label`: new
+- `record.accessConditions`: new
+- `record.license`: new
+- `record.attribution`: new
+- `record.provenance`: new
+- `record.datePublished`: new
+- `record.dateCreated`: new
+- `record.dateModified`: new
+- `record.typeOfData`: new
+
+#### Person
+
+- `person.__id`: unchanged
+- `person.__type`: unchanged
+- `person.givenNames`: unchanged
+- `person.familyNames`: unchanged
+- `person.jobTitles`: unchanged
+- `person.affiliations`: unchanged
+- `person.address`: unchanged
+- `person.email`: unchanged
+- `person.secondaryEmail`: unchanged
+- `person.authorityRefs`: unchanged
+
+#### Organization
+
+- `organization.__id`: unchanged
+- `organization.__type`: unchanged
+- `organization.name`: unchanged
+- `organization.url`: unchanged
+- `organization.address`: unchanged
+- `organization.email`: unchanged
+- `organization.alternativeName`: unchanged
+- `organization.authorityRefs`: unchanged

From 88323d1a20dcd9b59e80d820a72034161538d06e Mon Sep 17 00:00:00 2001
From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com>
Date: Tue, 5 Nov 2024 17:28:19 +0100
Subject: [PATCH 6/8] revert unrelated changes

---
 .markdownlint.yml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/.markdownlint.yml b/.markdownlint.yml
index 8960ebe8..a4c058ac 100644
--- a/.markdownlint.yml
+++ b/.markdownlint.yml
@@ -1,7 +1,7 @@
 # Config file for https://github.com/igorshubovych/markdownlint-cli
 
 # MD007/ul-indent - Unordered list indentation
-MD007:
+MD007: 
     # Whether to indent the first level of the list
     start_indented: false
     # By how many spaces every next level must be indented. The default of 2 is not compatible with mkdocs!
@@ -14,7 +14,7 @@ MD009: false
 MD012: false
 
 # MD013/line-length - Line length
-MD013:
+MD013: 
     line_length: 120
     heading_line_length: 120
     code_block_line_length: 120
@@ -31,7 +31,7 @@ MD013:
     stern: false
 
 # MD033/no-inline-html - Inline HTML
-MD033:
+MD033: 
     allowed_elements: [br, center]
 
 # MD041/first-line-heading/first-line-h1 - First line in a file should be a top-level heading

From 3ae0b5c19c9630e7d4012f20d364f314fe6ebea8 Mon Sep 17 00:00:00 2001
From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com>
Date: Tue, 5 Nov 2024 18:11:24 +0100
Subject: [PATCH 7/8] Update .markdownlint.yml

---
 .markdownlint.yml | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/.markdownlint.yml b/.markdownlint.yml
index a4c058ac..3a503fca 100644
--- a/.markdownlint.yml
+++ b/.markdownlint.yml
@@ -1,7 +1,7 @@
 # Config file for https://github.com/igorshubovych/markdownlint-cli
 
 # MD007/ul-indent - Unordered list indentation
-MD007: 
+MD007:
     # Whether to indent the first level of the list
     start_indented: false
     # By how many spaces every next level must be indented. The default of 2 is not compatible with mkdocs!
@@ -14,7 +14,7 @@ MD009: false
 MD012: false
 
 # MD013/line-length - Line length
-MD013: 
+MD013:
     line_length: 120
     heading_line_length: 120
     code_block_line_length: 120
@@ -30,8 +30,11 @@ MD013:
     # Stern length checking
     stern: false
 
+MD024:
+    siblings_only: true
+
 # MD033/no-inline-html - Inline HTML
-MD033: 
+MD033:
     allowed_elements: [br, center]
 
 # MD041/first-line-heading/first-line-h1 - First line in a file should be a top-level heading

From 4973ce74cae675a5c6116d2d375b7c26b27c9674 Mon Sep 17 00:00:00 2001
From: Balduin Landolt <33053745+BalduinLandolt@users.noreply.github.com>
Date: Wed, 6 Nov 2024 18:17:47 +0100
Subject: [PATCH 8/8] changes according to discussion

---
 docs/data/provisional-datamodel.md | 408 ++++++++++-------------------
 1 file changed, 137 insertions(+), 271 deletions(-)

diff --git a/docs/data/provisional-datamodel.md b/docs/data/provisional-datamodel.md
index c65e51be..ca104cb6 100644
--- a/docs/data/provisional-datamodel.md
+++ b/docs/data/provisional-datamodel.md
@@ -21,7 +21,7 @@ The metadata model is a hierarchical structure of metadata elements.
 ```mermaid
 
 flowchart TD
-    hyper-project[Hyper-Project /<br/>Uber-Project /<br/>Meta-Project /<br/>Compound Project] -->|1-n| project[Project /<br/>Research Project]
+    hyper-project[Umbrella Project] -->|1-n| project[Research Project]
     project -->|1-n| dataset[Dataset]
     dataset -->|1-n| record[Record /<br/>Resource]
     project -->|0-n| collection[Collection]
@@ -30,7 +30,7 @@ flowchart TD
     collection --> record
 ```
 
-- A `Compound Project` is optional and collects one or more `Research Projects`.  
+- An `Umbrella Project` is optional and collects one or more `Research Projects`.  
   It is typically of institutional nature, 
   not directly tied to a specific funding grant, 
   and may be long-lived.  
@@ -40,7 +40,7 @@ flowchart TD
   It is typically tied to a specific funding grant, 
   and hence has a limited lifetime of ~3-5 years;
   multiple funding rounds and a longer lifetime are possible.  
-  A `Research Project` is part of 0-1 `Compound Project`,
+  A `Research Project` is part of 0-1 `Umbrella Project`,
   it has 1-n `Datasets` and 0-n `Collections`.
 - A `Dataset` is a collection of `Records` within a `Research Project`.  
   It is mostly meant for system-internal and technical use,
@@ -52,7 +52,7 @@ flowchart TD
   and may have a "historical meaning" in the context of the project.  
   Examples may be physical collections such as p person's "Nachlass" in an archive,
   or groupings of records based on a specific research question within a project.  
-  A `Collection` is part of at least 1 `Research Project`, `Compound Project` or `Collection`, 
+  A `Collection` is part of at least 1 `Research Project`, `Umbrella Project` or `Collection`, 
   but can be part of multiple. It may either contain 0-n `Collections` or 1-n `Records`.
 - A `Record` is a single resource within a `Dataset`.  
   It represents a single entity, and the smallest unit that can meaningfully have an identifier. 
@@ -68,7 +68,7 @@ and may be related to various entities within the hierarchy.
 
 A set of metadata consists of the following top-level elements:
 
-- Compound Project
+- Umbrella Project
 - Project
 - Dataset
 - Collection
@@ -87,7 +87,7 @@ but are identified by their position in the hierarchy.
 | Field             | Type            | Cardinality |
 | ----------------- | --------------- | ----------- |
 | `$schema`         | string          | 0-1         |
-| `compoundProject` | compoundProject | 0-1         |
+| `umbrellaProject` | umbrellaProject | 0-1         |
 | `project`         | project         | 1           |
 | `datasets`        | dataset[]       | 1-n         |
 | `collections`     | collection[]    | 0-n         |
@@ -96,77 +96,89 @@ but are identified by their position in the hierarchy.
 | `organizations`   | organization[]  | 0-n         |
 
 
+!!! question
+    Do we consider "permissions" as metadata?  
+    (Not as they are in the DSP, but as they will be in the archive; 
+    that is: "open", "restricted", "embargo", "metadata only".)  
+    If so, this should be added on each level, I suppose.
+
+
 ## Types
 
 ### Entity Types
 
-#### Compound Project
-
-| Field                    | Type                | Cardinality | Restrictions                                                 | Remarks            |
-| ------------------------ | ------------------- | ----------- | ------------------------------------------------------------ | ------------------ |
-| `__id`                   | string              | 1           |                                                              |                    |
-| `__type`                 | string              | 1           | Literal 'CompoundProject'                                    |                    |
-| `name`                   | string              | 1           |                                                              |                    |
-| `url`                    | url                 | 1           |                                                              |                    |
-| `howToCite`              | string              | 1           |                                                              | Needed?            |
-| `projects`               | id[]                | 1-n         | String containing the identifier of a project                |                    |
-| `description`            | lang_string         | 0-1         |                                                              | Optional?          |
-| `contactPoint`           | id                  | 0-1         | String containing the identifier of a person or organization | Optional?          |
-| `keywords`               | lang_string[]       | 0-n         |                                                              | Needed?            |
-| `disciplines`            | lang_string / url[] | 0-n         |                                                              | Needed?            |
-| `temporalCoverage`       | lang_string / url[] | 0-n         |                                                              | Needed?            |
-| `spatialCoverage`        | url[]               | 0-n         |                                                              | Needed?            |
-| `funders`                | id[]                | 0-n         | String containing the identifier of a person                 | Needed?            |
-| `publications`           | publication[]       | 0-n         |                                                              | Needed?            |
-| `grants`                 | grant[]             | 0-n         |                                                              | Needed?            |
-| `alternativeNames`       | lang_string[]       | 0-n         |                                                              | Needed?            |
-| `consistingInstitutions` | id[]                | 0-n         | String containing the identifier of an organization          | Makes sense? Name? |
+#### Umbrella Project
+
+| Field                  | Type          | Card. | Restrictions                                                 |
+| ---------------------- | ------------- | ----- | ------------------------------------------------------------ |
+| `__id`                 | string        | 1     |                                                              |
+| `__type`               | string        | 1     | Literal 'UmbrellaProject'                                    |
+| `name`                 | string        | 1     |                                                              |
+| `projects`             | id[]          | 1-n   | String containing the identifier of a project                |
+| `description`          | lang_string   | 0-1   |                                                              |
+| `alternativeNames`     | lang_string[] | 0-n   |                                                              |
+| `url`                  | url           | 0-1   |                                                              |
+| `contactPoint`         | id            | 0-1   | String containing the identifier of a person or organization |
+| `institutionalPartner` | id[]          | 0-n   | String containing the identifier of an organization          |
 
 !!! question
-    This opens up the questions of how to deal with multiple projects in a compound project. 
+    This opens up the question of how to deal with multiple projects in an umbrella project.
     We probably want to keep one entry per project, 
-    so this leaves us with either duplicating the compound project metadata for each project,
-    or having compound project metadata separately and only linking it from the project.
+    so this leaves us with either duplicating the umbrella project metadata for each project,
+    or having umbrella project metadata separately and only linking it from the project.
     The latter seems preferable, 
-    but then the question arises who gets to edit the compound project metadata.  
+    but then the question arises who gets to edit the umbrella project metadata.  
     For a first implementation, we could simply duplicate the metadata for each project, 
     and later factor it out.
 
-!!! important
-    The properties for `Compound Project` were invented by me on the fly. 
-    That does not mean they are correct or useful.
+!!! question
+    What is the best name for `institutionalPartner`?  
+    AI suggested:  
+    - Affiliated Institution  
+    - Associated Body  
+    - Supporting Organization  
+    - Institutional Partner
+
+!!! question
+    How do we capture the time aspect of the data provenance and genesis in this context? Should this be here?  
+    Concretely, an umbrella project is often like a "timeline" of projects, or the "history" of a series of projects.
+
+To make the model of this entity as flexible as possible,
+most of the fields are optional.
 
 
 #### Project
 
-| Field                | Type                | Cardinality | Restrictions                                                 | Remarks               |
-| -------------------- | ------------------- | ----------- | ------------------------------------------------------------ | --------------------- |
-| `__type`             | string              | 1           | Literal "Project"                                            |                       |
-| `shortcode`          | string              | 1           | 4 char hexadecimal                                           |                       |
-| `status`             | string              | 1           | Literal "Ongoing" or "Finished"                              |                       |
-| `name`               | string              | 1           |                                                              |                       |
-| `description`        | lang_string         | 1           |                                                              |                       |
-| `startDate`          | date                | 1           | String of format "YYYY-MM-DD"                                |                       |
-| `teaserText`         | string              | 1           |                                                              |                       |
-| `url`                | url                 | 1           |                                                              |                       |
-| `howToCite`          | string              | 1           |                                                              |                       |
-| `datasets`           | id[]                | 1-n         | String containing the identifier of a dataset                |                       |
-| `keywords`           | lang_string[]       | 1-n         |                                                              |                       |
-| `disciplines`        | lang_string / url[] | 1-n         |                                                              |                       |
-| `temporalCoverage`   | lang_string / url[] | 1-n         |                                                              |                       |
-| `spatialCoverage`    | url[]               | 1-n         |                                                              |                       |
-| `funders`            | id[]                | 1-n         | String containing the identifier of a person or organization | Does this make sense? |
-| `endDate`            | date                | 0-1         | String of format "YYYY-MM-DD"                                |                       |
-| `secondaryURL`       | url                 | 0-1         |                                                              |                       |
-| `dataManagementPlan` | dmp                 | 0-1         |                                                              |                       |
-| `contactPoint`       | id                  | 0-1         | String containing the identifier of a person or organization |                       |
-| `publications`       | publication[]       | 0-n         |                                                              |                       |
-| `grants`             | grant[]             | 0-n         |                                                              | Does this make sense? |
-| `alternativeNames`   | lang_string[]       | 0-n         |                                                              |                       |
+| Field                | Type                | Cardinality | Restrictions                                                 |
+| -------------------- | ------------------- | ----------- | ------------------------------------------------------------ |
+| `__type`             | string              | 1           | Literal "Project"                                            |
+| `shortcode`          | string              | 1           | 4 char hexadecimal                                           |
+| `status`             | string              | 1           | Literal "Ongoing" or "Finished"                              |
+| `name`               | string              | 1           |                                                              |
+| `description`        | lang_string         | 1           |                                                              |
+| `startDate`          | date                | 1           | String of format "YYYY-MM-DD"                                |
+| `teaserText`         | string              | 1           |                                                              |
+| `url`                | url                 | 1           |                                                              |
+| `howToCite`          | string              | 1           |                                                              |
+| `datasets`           | id[]                | 1-n         | String containing the identifier of a dataset                |
+| `keywords`           | lang_string[]       | 1-n         |                                                              |
+| `disciplines`        | lang_string / url[] | 1-n         |                                                              |
+| `temporalCoverage`   | lang_string / url[] | 1-n         |                                                              |
+| `spatialCoverage`    | url[]               | 1-n         |                                                              |
+| `funders`            | id[]                | 1-n         | String containing the identifier of a person or organization |
+| `attributions`       | attribution[]       | 1-n         |                                                              |
+| `endDate`            | date                | 0-1         | String of format "YYYY-MM-DD"                                |
+| `secondaryURL`       | url                 | 0-1         |                                                              |
+| `dataManagementPlan` | dmp                 | 0-1         |                                                              |
+| `contactPoint`       | id                  | 0-1         | String containing the identifier of a person or organization |
+| `publications`       | publication[]       | 0-n         |                                                              |
+| `grants`             | grant[]             | 0-n         |                                                              |
+| `alternativeNames`   | lang_string[]       | 0-n         |                                                              |
 
 !!! question
     If we can have copyright/license on dataset level,
-    do we want to have it on project level as well?
+    do we want to have it on project level as well?  
+    In any case, it should be computed from the datasets/records.
 
 !!! question
     Do we still need funders if we have grants?
@@ -174,34 +186,33 @@ but are identified by their position in the hierarchy.
 !!! question
     What about projects that do not have funding?
 
+!!! question
+    Do we want my proposed `attributions` field on the project?
+
+!!! question
+    Should we have an `abstract` field in the project, like we used to have in the dataset?
+
 
 #### Dataset
 
-| Field               | Type              | Cardinality | Restrictions                                            | Remarks                             |
-| ------------------- | ----------------- | ----------- | ------------------------------------------------------- | ----------------------------------- |
-| `__id`              | string            | 1           |                                                         |                                     |
-| `__type`            | string            | 1           | Literal "Dataset"                                       |                                     |
-| `title`             | string            | 1           |                                                         |                                     |
-| `accessConditions`  | string            | 1           | Literal "open", "restricted" or "closed"                | change to proper terms              |
-| `howToCite`         | string            | 1           |                                                         |                                     |
-| `status`            | string            | 1           | Literal "In Planning", "Ongoing", "On hold", "Finished" | not aligned with project status     |
-| `abstract`          | lang_string / url | 1-n         |                                                         | naming: maybe 'description'?        |
-| `typeOfData`        | string[]          | 1-n         | Literal "XML", "Text", "Image", "Video", "Audio"        | does this still make sense?         |
-| `licenses`          | license[]         | 1-n         |                                                         | should be computed from the records |
-| `copyright`         | string[]          | 1-n         |                                                         | computed along with license         |
-| `languages`         | lang_string[]     | 1-n         |                                                         | does this make sense?               |
-| `attributions`      | attribution[]     | 1-n         |                                                         | can this be calculated?             |
-| `datePublished`     | date              | 0-1         |                                                         |                                     |
-| `dateCreated`       | date              | 0-1         |                                                         |                                     |
-| `dateModified`      | date              | 0-1         |                                                         |                                     |
-| `distribution`      | url               | 0-1         |                                                         | does this make sense?               |
-| `alternativeTitles` | lang_string[]     | 0-n         |                                                         |                                     |
-| `urls`              | url[]             | 0-n         |                                                         |                                     |
-| `additional`        | lang_string / url | 0-n         |                                                         |                                     |
+| Field          | Type          | Cardinality | Restrictions                                     | Remarks                                                 |
+| -------------- | ------------- | ----------- | ------------------------------------------------ | ------------------------------------------------------- |
+| `__id`         | string        | 1           |                                                  |                                                         |
+| `__type`       | string        | 1           | Literal "Dataset"                                |                                                         |
+| `title`        | string        | 1           |                                                  | may be auto-generated?                                  |
+| `typeOfData`   | string[]      | 1-n         | Literal "XML", "Text", "Image", "Video", "Audio" | does this still make sense? should it be cardinality 1? |
+| `licenses`     | license[]     | 1-n         |                                                  | should be computed from the records                     |
+| `copyright`    | string[]      | 1-n         |                                                  | computed along with license                             |
+| `attributions` | attribution[] | 1-n         |                                                  | can this be computed?                                   |
+| `howToCite`    | string        | 0-1         |                                                  | still wanted?                                           |
+| `description`  | lang_string   | 0-1         |                                                  |                                                         |
+| `dateCreated`  | date          | 0-1         |                                                  |                                                         |
 
-!!! question
-    Do we conssider datasets something merely "internal"?  
-    If so, do metadata on datasets even make sense at all? Should we even "expose" datasets publicly?
+!!! note
+    If we think of a dataset as something internal, 
+    we should limit the metadata to what is necessary for the system to work.  
+    Additionally, we may want to have some minimal descriptive metadata for the dataset
+    (e.g. for the use case where a project, once a year, grabs a box of archival material and digitizes it).
 
 !!! question
     Do we need to store the license on the dataset level, 
@@ -224,6 +235,17 @@ but are identified by their position in the hierarchy.
 !!! question
     Do we need a reference to the records in the dataset?
 
+!!! question
+    Does `dateCreated` suffice here? There were more date properties in the old model.
+
+Datasets are for internal use;
+they serve to partition the data into manageable chunks. 
+This is done both by type of data (RDF vs. assets), and by size.
+
+In some cases, there may be a "logical" grouping constituting a dataset, 
+e.g. if data is digitized in a batch and there is a temporal separation between the batches.  
+In these cases, the project may make use of the descriptive metadata of the dataset. 
+But normally, the dataset is just a technical entity, and should not carry semantic information.
 
 #### Collection
 
@@ -232,11 +254,13 @@ but are identified by their position in the hierarchy.
 | `__id`             | string            | 1           |                                                  |                                                          |
 | `__type`           | string            | 1           | Literal 'Collection'                             |                                                          |
 | `name`             | string            | 1           |                                                  |                                                          |
-| `accessConditions` | string            | 1           | Literal "open", "restricted" or "closed"         | copied from dataset; change to proper terms              |
+| `description`      | string / url      | 1-n         |                                                  |                                                          |
+| `typeOfData`       | string[]          | 1-n         | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; does this still make sense?         |
+| `licenses`         | license[]         | 1-n         |                                                  | copied from dataset; should be computed from the records |
+| `copyright`        | string[]          | 1-n         |                                                  | computed along with license                              |
+| `languages`        | lang_string[]     | 1-n         |                                                  | copied from dataset; does this make sense?               |
+| `attributions`     | attribution[]     | 1-n         |                                                  | copied from dataset; can this be calculated?             |
 | `provenance`       | string            | 0-1         |                                                  |                                                          |
-| `datePublished`    | date              | 0-1         |                                                  | copied from dataset; do we still need those?             |
-| `dateCreated`      | date              | 0-1         |                                                  | copied from dataset; do we still need those?             |
-| `dateModified`     | date              | 0-1         |                                                  | copied from dataset; do we still need those?             |
 | `distribution`     | url               | 0-1         |                                                  | copied from dataset; does this make sense?               |
 | `records`          | id[]              | 0-n         | Record IDs                                       | can be 0 in case it points to a collection               |
 | `collections`      | id[]              | 0-n         | Collection IDs                                   |                                                          |
@@ -244,17 +268,6 @@ but are identified by their position in the hierarchy.
 | `keywords`         | lang_string[]     | 0-n         |                                                  | does this make sense?                                    |
 | `urls`             | url[]             | 0-n         |                                                  | copied from dataset;                                     |
 | `additional`       | lang_string / url | 0-n         |                                                  | copied from dataset;                                     |
-| `description`      | string / url      | 1-n         |                                                  |                                                          |
-| `typeOfData`       | string[]          | 1-n         | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; does this still make sense?         |
-| `licenses`         | license[]         | 1-n         |                                                  | copied from dataset; should be computed from the records |
-| `copyright`        | string[]          | 1-n         |                                                  | computed along with license                              |
-| `languages`        | lang_string[]     | 1-n         |                                                  | copied from dataset; does this make sense?               |
-| `attributions`     | attribution[]     | 1-n         |                                                  | copied from dataset; can this be calculated?             |
-
-
-!!! important
-    The properties for `Compound Project` were invented by me on the fly. 
-    That does not mean they are correct or useful.
 
 
 !!! question
@@ -279,10 +292,6 @@ but are identified by their position in the hierarchy.
 | `dateModified`     | date        | 0-1         |                                                  | copied from dataset; do they make sense?                 |
 | `typeOfData`       | string      | 0-1         | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; wanted? what values?                |
 
-!!! important
-    The properties for `Record` were invented by me on the fly.
-    That does not mean they are correct or useful.
-
 !!! question
     How granular do we want to be with the metadata on the record level?
 
@@ -436,7 +445,7 @@ or a reference to a resource in an external authority file.
 
 ```mermaid
 erDiagram
-    compoundProject |o--|{ project : projects
+    umbrellaProject |o--|{ project : projects
     project ||--|{ dataset : datasets
     project ||--|| person : contactPoint
     project ||--|| organization : contactPoint
@@ -448,30 +457,23 @@ erDiagram
     collection |o--o{ record : records
     person ||--|{ organization : affiliations
 
-    compoundProject {
+    umbrellaProject {
         string __id "1"
-        string __type "1; Literal 'CompoundProject'"
+        string __type "1; Literal 'UmbrellaProject'"
         string name "1"
-        url url "1"
-        string howToCite "1"
-        lang_string description "0-1"
-        id contactPoint "0-1"
         id[] projects "1-n; Project IDs"
-        lang_string[] keywords "0-n"
-        lang_string_or_url[] disciplines "0-n"
-        lang_string_or_url[] temporalCoverage "0-n"
-        url[] spatialCoverage "0-n"
-        id[] funders "0-n; Person or Organization IDs"
-        publication[] publications "0-n"
-        grant[] grants "0-n"
+        lang_string description "0-1"
         lang_string[] alternativeNames "0-n"
-        id[] consistingInstitutions "0-n; Organization IDs"
+        url url "0-1"
+        id contactPoint "0-1"
+        id[] institutionalPartner "0-n; Organization IDs"
     }
     
     project {
+        string __id "1"
         string __type "1; Literal 'Project'"
         string shortcode "1"
-        string status "1; Literal 'Ongoing' or 'Finished'"
+        string status "1; Literal 'Ongoing', 'Finished'"
         string name "1"
         lang_string description "1"
         date startDate "1"
@@ -484,6 +486,7 @@ erDiagram
         lang_string_or_url[] temporalCoverage "1-n"
         url[] spatialCoverage "1-n"
         id[] funders "1-n; Person or Organization IDs"
+        attribution[] attributions "1-n"
         date endDate "0-1"
         url secondaryURL "0-1"
         dmp dataManagementPlan "0-1"
@@ -497,22 +500,13 @@ erDiagram
         string __id "1"
         string __type "1; Literal 'Dataset'"
         string title "1"
-        string accessConditions "1; Literal 'open', 'restricted' or 'closed'"
-        string howToCite "1"
-        string status "1; Literal 'In Planning', 'Ongoing', 'On hold', 'Finished'"
-        lang_string_or_url[] abstract "1-n"
         string[] typeOfData "1-n; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'"
         license[] licenses "1-n"
         string[] copyright "1-n"
-        lang_string[] languages "1-n"
         attribution[] attributions "1-n"
-        date datePublished "0-1"
+        string howToCite "0-1"
+        lang_string description "0-1"
         date dateCreated "0-1"
-        date dateModified "0-1"
-        url distribution "0-1"
-        lang_string[] alternativeTitles "0-n"
-        url[] urls "0-n"
-        lang_string_or_url[] additional "0-n"
     }
 
     collection {
@@ -584,150 +578,22 @@ erDiagram
 
 ## Change Log
 
-### Changes
 
 - Make `Grant` a value type and remove it from the top level.
-- Added entity `compoundProject` to the top level.
+- Added entity `umbrellaProject` to the top level.
 - Added entity `collection` to the top level.
 - Added entity `record` to the top level.
 - Added `copyright` to `dataset`.
-
-### Implementation/migration Notes
-
-- inline grant in project
-- add/remove entities and properties accordingly
-
-
-### Mapping (Old to New)
-
-#### Compound Project
-
-- `compoundProject.__id` : new
-- `compoundProject.__type` : new
-- `compoundProject.name`: new
-- `compoundProject.url`: new
-- `compoundProject.howToCite`: new
-- `compoundProject.description`: new
-- `compoundProject.contactPoint`: new
-- `compoundProject.keywords`: new
-- `compoundProject.disciplines`: new
-- `compoundProject.temporalCoverage`: new
-- `compoundProject.spatialCoverage`: new
-- `compoundProject.funders`: new
-- `compoundProject.publications`: new
-- `compoundProject.grants`: new
-- `compoundProject.alternativeNames`: new
-- `compoundProject.consistingInstitutions`: new
-
-This entity is new and does not have a direct mapping from the old model. 
-All values need to be defined and added manually.
-
-#### Project
-
-- `project.__type`: unchanged
-- `project.shortcode`: unchanged
-- `project.status`: unchanged
-- `project.name`: unchanged
-- `project.description`: unchanged
-- `project.startDate`: unchanged
-- `project.teaserText`: unchanged
-- `project.url`: unchanged
-- `project.howToCite`: unchanged
-- `project.datasets`: unchanged
-- `project.keywords`: unchanged
-- `project.disciplines`: unchanged
-- `project.temporalCoverage`: unchanged
-- `project.spatialCoverage`: unchanged
-- `project.funders`: unchanged
-- `project.endDate`: unchanged
-- `project.secondaryURL`: unchanged
-- `project.dataManagementPlan`: unchanged
-- `project.contactPoint`: unchanged
-- `project.publications`: unchanged
-- `project.grants`: inlined from top level to project
-- `project.alternativeNames`: unchanged
-
-#### Dataset
-
-- `dataset.__id`: unchanged
-- `dataset.__type`: unchanged
-- `dataset.title`: unchanged
-- `dataset.accessConditions`: unchanged
-- `dataset.howToCite`: unchanged
-- `dataset.status`: unchanged
-- `dataset.abstract`: unchanged
-- `dataset.typeOfData`: unchanged
-- `dataset.licenses`: unchanged
-- `dataset.copyright`: newly added
-- `dataset.languages`: unchanged
-- `dataset.attributions`: unchanged
-- `dataset.datePublished`: unchanged
-- `dataset.dateCreated`: unchanged
-- `dataset.dateModified`: unchanged
-- `dataset.distribution`: unchanged
-- `dataset.alternativeTitles`: unchanged
-- `dataset.urls`: unchanged
-- `dataset.additional`: unchanged
-
-#### Collection
-
-- `collection.__id`: new
-- `collection.__type`: new
-- `collection.name`: new
-- `collection.accessConditions`: new
-- `collection.provenance`: new
-- `collection.datePublished`: new
-- `collection.dateCreated`: new
-- `collection.dateModified`: new
-- `collection.distribution`: new
-- `collection.records`: new
-- `collection.collections`: new
-- `collection.alternativeNames`: new
-- `collection.keywords`: new
-- `collection.urls`: new
-- `collection.additional`: new
-- `collection.description`: new
-- `collection.typeOfData`: new
-- `collection.licenses`: new
-- `collection.copyright`: new
-- `collection.languages`: new
-- `collection.attributions`: new
-
-#### Record
-
-- `record.__id`: new
-- `record.__type`: new
-- `record.pid`: new
-- `record.label`: new
-- `record.accessConditions`: new
-- `record.license`: new
-- `record.attribution`: new
-- `record.provenance`: new
-- `record.datePublished`: new
-- `record.dateCreated`: new
-- `record.dateModified`: new
-- `record.typeOfData`: new
-
-#### Person
-
-- `person.__id`: unchanged
-- `person.__type`: unchanged
-- `person.givenNames`: unchanged
-- `person.familyNames`: unchanged
-- `person.jobTitles`: unchanged
-- `person.affiliations`: unchanged
-- `person.address`: unchanged
-- `person.email`: unchanged
-- `person.secondaryEmail`: unchanged
-- `person.authorityRefs`: unchanged
-
-#### Organization
-
-- `organization.__id`: unchanged
-- `organization.__type`: unchanged
-- `organization.name`: unchanged
-- `organization.url`: unchanged
-- `organization.address`: unchanged
-- `organization.email`: unchanged
-- `organization.alternativeName`: unchanged
-- `organization.authorityRefs`: unchanged
+- Changed type of `abstract`/`description` in `dataset` to `lang_string`.
+- Changed cardinality of `abstract`/`description` in `dataset` to 1.
+- Changed cardinality of `howToCite` in `dataset` to 0-1.
+- Changed cardinality of `description` in `dataset` to 0-1.
+- Removed `accessConditions` from `dataset`.
+- Removed `status` from `dataset`.
+- Renamed `abstract` to `description` in `dataset`.
+- Removed `languages` from `dataset`.
+- Removed `datePublished` and `dateModified` from `dataset`.
+- Removed `distribution` from `dataset`.
+- Removed `additional` from `dataset`.
+- Removed `alternativeTitles` from `dataset`.
+- Removed `urls` from `dataset`.