From 0094fbe232f50d3e5377e43228073383704cc27e Mon Sep 17 00:00:00 2001
From: Ivan Subotic <400790+subotic@users.noreply.github.com>
Date: Tue, 26 Nov 2024 13:10:55 +0100
Subject: [PATCH] docs: extend documentation for future metadata model (#283)

Co-authored-by: danielasubotic <48174709+danielasubotic@users.noreply.github.com>
---
 docs/{ => data}/adding-metadata.md |   0
 docs/data/future-datamodel.md      | 670 +++++++++++++++++++++++++++++
 docs/data/provisional-datamodel.md | 599 --------------------------
 docs/index.md                      |  37 +-
 mkdocs.yml                         |   4 +-
 5 files changed, 703 insertions(+), 607 deletions(-)
 rename docs/{ => data}/adding-metadata.md (100%)
 create mode 100644 docs/data/future-datamodel.md
 delete mode 100644 docs/data/provisional-datamodel.md

diff --git a/docs/adding-metadata.md b/docs/data/adding-metadata.md
similarity index 100%
rename from docs/adding-metadata.md
rename to docs/data/adding-metadata.md
diff --git a/docs/data/future-datamodel.md b/docs/data/future-datamodel.md
new file mode 100644
index 00000000..680b3d09
--- /dev/null
+++ b/docs/data/future-datamodel.md
@@ -0,0 +1,670 @@
+# Future Data Model
+
+!!! warning
+This document does _not_ represent the current state of the metadata model.  
+It is a working document for planned upcoming changes to the metadata model.
+
+!!! note
+This model is an idealized version of the metadata model.
+With the current implementation that is entirely separate from the DSP,
+it is not feasible to implement metadata on the record level.  
+Such a system may be implemented in the archive in the future,
+but for now, we will keep the metadata on the dataset level.  
+A separate, simplified model for applying some of these changes,
+while remaining compatible with the current implementation,
+should be created alongside this model.
+
+The enhancements to the DSP metadata model are designed to better accommodate
+the inherent complexity of humanities projects, while still being flexible enough to
+support simpler project structures.
+
+One of the key improvements is the introduction of an additional hierarchical level above
+the research project, which we refer to as the umbrella project. This allows for a more
+accurate representation of overarching initiatives that span multiple research projects
+over extended periods. Additionally, we introduce collections and subcollections
+to facilitate more precise referencing and organization of different parts of the data.
+
+By expanding our metadata model in this way, we aim to provide a more robust framework
+that supports the integrity and longevity of humanities research data. This evolution
+reflects our commitment to capturing the rich, nuanced histories of research projects
+with greater accuracy and detail.
+
+## Overview
+
+The metadata model is a hierarchical structure of metadata elements.
+
+```mermaid
+
+flowchart TD
+    hyper-project[Umbrella Project] -->|1-n| project[Research Project]
+    project -->|1-n| dataset[Dataset]
+    dataset -->|1-n| record[Record]
+    project -->|0-n| collection[Collection]
+    collection --> collection
+    hyper-project -->|0-n| collection
+    collection --> record
+```
+
+- An `Umbrella Project` is optional and collects one or more `Research Projects`.  
+  It is typically of institutional nature,
+  not directly tied to a specific funding grant,
+  and may be long-lived.  
+  Examples are EKWS/CAS, BEOL or LIMC.
+- A `Research Project` is the main entity of the metadata model.  
+  It corresponds to a `project` in the DSP.
+  It is typically tied to a specific funding grant,
+  and hence has a limited lifetime of ~3-5 years;
+  multiple funding rounds and a longer lifetime are possible.  
+  A `Research Project` is part of 0-1 `Umbrella Project`,
+  it has 1-n `Datasets` and 0-n `Collections`.
+- A `Dataset` is a collection of `Records` within a `Research Project`.  
+  It is mostly meant for system-internal and technical use,
+  and should not have particular semantics or a "historical meaning" in the context of the project.  
+  A `Dataset` is part of exactly 1 `Research Project`
+  and contains 1-n `Records`.
+- A `Collection` is also a collection of `Records` within a `Research Project`.  
+  It is meant for semantic grouping of `Records` within a `Research Project`,
+  and may have a "historical meaning" in the context of the project.  
+  Examples may be physical collections such as a person's "Nachlass" in an archive,
+  or groupings of records based on a specific research question within a project.  
+  A `Collection` is part of at least 1 `Research Project`, `Umbrella Project` or `Collection`,
+  but can be part of multiple. It may either contain 0-n `Collections` or 1-n `Records`.
+- A `Record` is a single entry within a `Dataset`.  
+  It represents a single entity, and the smallest unit that can meaningfully have an identifier.
+  It maps to a `knora-base:Resource` (DSP-API) or an `Asset` (SIPI/Ingest) in the DSP.  
+  A `Record` is part of exactly 1 `Dataset` and may be part of 0-n `Collections`.
+
+Additionally, there are the entities `Person` and `Organization`.  
+These are independent of the `Research Project` hierarchy
+and may be related to various entities within it.
+
+## Top Level
+
+A set of metadata consists of the following top-level elements:
+
+- Umbrella Project
+- Project
+- Dataset
+- Collection
+- Record
+- Person
+- Organization
+
+Each of these elements is an entity identified by a unique identifier.
+Other elements can refer to these entities by their identifier.
+
+Any other metadata element may itself be a complex object,
+but it is always part of one of the top-level elements.
+Such elements do not have an identifier,
+but are identified by their position in the hierarchy.
+
+| Field             | Type            | Cardinality |
+|-------------------|-----------------|-------------|
+| `$schema`         | string          | 0-1         |
+| `umbrellaProject` | umbrellaProject | 0-1         |
+| `project`         | project         | 1           |
+| `datasets`        | dataset[]       | 1-n         |
+| `collections`     | collection[]    | 0-n         |
+| `records`         | record[]        | 0-n         |
+| `persons`         | person[]        | 0-n         |
+| `organizations`   | organization[]  | 0-n         |
+
+!!! question
+Do we consider "permissions" as metadata?  
+(Not as they are in the DSP, but as they will be in the archive;
+that is: "open", "restricted", "embargo", "metadata only".)  
+If so, this should be added on each level, I suppose.
+
+!!! answer
+Yes, as COAR indicates; see [COAR Access Rights](https://vocabularies.coar-repositories.org/access_rights/).
+
+## Types
+
+### Entity Types
+
+#### Umbrella Project
+
+| Field                  | Type          | Card. | Restrictions                                                 |
+|------------------------|---------------|-------|--------------------------------------------------------------|
+| `pid`                  | id            | 1     | or `ARK`? -> probably use `pid`                              |
+| `__id`                 | string        | 1     |                                                              |
+| `__type`               | string        | 1     | Literal 'UmbrellaProject'                                    |
+| `name`                 | string        | 1     |                                                              |
+| `projects`             | id[]          | 1-n   | String containing the identifier of a project                |
+| `description`          | lang_string   | 0-1   |                                                              |
+| `alternativeNames`     | lang_string[] | 0-n   |                                                              |
+| `url`                  | url           | 0-1   |                                                              |
+| `contactPoint`         | id            | 0-1   | String containing the identifier of a person or organization |
+| `institutionalPartner` | id[]          | 0-n   | String containing the identifier of an organization          |
+
+!!! question
+This raises the question of how to deal with multiple projects in an umbrella project.
+We probably want to keep one entry per project,
+so this leaves us with either duplicating the umbrella project metadata for each project,
+or having umbrella project metadata separately and only linking it from the project.
+The latter seems preferable,
+but then the question arises who gets to edit the umbrella project metadata.  
+For a first implementation, we could simply duplicate the metadata for each project,
+and later factor it out.
+
+!!! question
+What is the best name for `institutionalPartner`?  
+AI suggested:
+
+- Affiliated Institution
+- Associated Body
+- Supporting Organization
+- Institutional Partner
+
+!!! answer
+We don't need `institutionalPartner` since `contactPoint` can be an organization or a person.
+
+!!! question
+How do we capture the time aspect of the data provenance and genesis in this context? Should this be here?  
+Concretely, an umbrella project is often like a "timeline" of projects, or the "history" of a series of projects.
+
+!!! answer
+We don't need this information here on this level. The umbrella project needs to know what projects are under it and
+then it's a matter of displaying the timeline.
+
+To make the model of this entity as flexible as possible,
+most of the fields are optional.
+
+#### Project
+
+| Field                | Type                | Cardinality | Restrictions                                                 |
+|----------------------|---------------------|-------------|--------------------------------------------------------------|
+| `pid`                | id                  | 1           | or `ARK`? -> probably use `pid`                              |
+| `__type`             | string              | 1           | Literal "Project"                                            |
+| `shortcode`          | string              | 1           | 4 char hexadecimal                                           |
+| `status`             | string              | 1           | Literal "Ongoing" or "Finished"                              |
+| `name`               | string              | 1           |                                                              |
+| `description`        | lang_string         | 1           |                                                              |
+| `startDate`          | date                | 1           | String of format "YYYY-MM-DD"                                |
+| `teaserText`         | string              | 1           |                                                              |
+| `url`                | url                 | 1           |                                                              |
+| `howToCite`          | string              | 1           |                                                              |
+| `datasets`           | id[]                | 1-n         | String containing the identifier of a dataset                |
+| `keywords`           | lang_string[]       | 1-n         |                                                              |
+| `disciplines`        | lang_string / url[] | 1-n         |                                                              |
+| `temporalCoverage`   | lang_string / url[] | 1-n         |                                                              |
+| `spatialCoverage`    | url[]               | 1-n         |                                                              |
+| `funders`            | id[]                | 1-n         | String containing the identifier of a person or organization |
+| `attributions`       | attribution[]       | 1-n         |                                                              |
+| `endDate`            | date                | 0-1         | String of format "YYYY-MM-DD"                                |
+| `secondaryURL`       | url                 | 0-1         |                                                              |
+| `dataManagementPlan` | dmp                 | 0-1         |                                                              |
+| `contactPoint`       | id                  | 0-1         | String containing the identifier of a person or organization |
+| `publications`       | publication[]       | 0-n         |                                                              |
+| `grants`             | grant[]             | 0-n         |                                                              |
+| `alternativeNames`   | lang_string[]       | 0-n         |                                                              |
+
+!!! question
+If we can have copyright/license on dataset level,
+do we want to have it on project level as well?
+
+!!! answer
+Since we have copyright/license on a record level, everything above should be a computed field if available and
+optionally added manually. And then it's a matter of displaying it.
+
+!!! question
+Do we still need funders if we have grants?
+
+!!! answer
+No, we don't need funders.
+
+!!! question
+What about projects that do not have funding?
+
+!!! answer
+Then it's self-funded.
+
+!!! question
+Do we want my proposed `attributions` field on the project?
+
+!!! answer
+Yes, but it should be a computed field if available and optionally added manually.
+
+!!! question
+Should we have an `abstract` field in the project, like we used to have in the dataset?
+
+!!! answer
+We should only have it in the project but not in the dataset anymore.
+
+#### Dataset
+
+| Field          | Type          | Cardinality | Restrictions                                     | Remarks                                                                                                                                                                                         |
+|----------------|---------------|-------------|--------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `pid`          | id            | 1           |                                                  | or `ARK`? -> probably use `pid`                                                                                                                                                                 |
+| `__id`         | string        | 1           |                                                  |                                                                                                                                                                                                 |
+| `__type`       | string        | 1           | Literal "Dataset"                                |                                                                                                                                                                                                 |
+| `title`        | string        | 1           |                                                  | may be auto-generated? -> No                                                                                                                                                                    |
+| `typeOfData`   | string[]      | 1-n         | Literal "XML", "Text", "Image", "Video", "Audio" | does this still make sense? should it be cardinality 1? -> This does still make sense but it should be computed if available and optionally added manually. And the cardinality needs to be 1-n |
+| `licenses`     | license[]     | 1-n         |                                                  | should be computed from the records if available and optionally added manually.                                                                                                                 |
+| `copyright`    | string[]      | 1-n         |                                                  | computed along with license -> should be computed from the records if available and optionally added manually.                                                                                  |
+| `attributions` | attribution[] | 1-n         |                                                  | can this be computed? -> Yes, if available and optionally added manually.                                                                                                                       |
+| `howToCite`    | string        | 0-1         |                                                  | still wanted? -> A generated field along with the ARK.                                                                                                                                          |
+| `description`  | lang_string   | 0-1         |                                                  |                                                                                                                                                                                                 |
+| `dateCreated`  | date          | 0-1         |                                                  |                                                                                                                                                                                                 |
+
+!!! question
+Are PIDs missing for umbrella-project, dataset and collection? Are generated how-to-cites missing for them as well?
+
+!!! answer
+Yes, we need PIDs for all levels (umbrella-project, dataset and collection).
+
+!!! note
+If we think of a dataset as something internal,
+we should limit the metadata to what is necessary for the system to work.  
+Additionally, we may want some minimal descriptive metadata for the dataset
+(for example, for the use case where a project takes a box of archival material once a year and digitizes it).
+
+!!! question
+Do we need to store the license on the dataset level,
+or can we compute it from the records?  
+If we store it on the dataset level,
+how do we deal with datasets that contain records with different licenses?
+
+!!! answer
+We compute it if available; optionally, it can be added manually. When there are different licenses, we display all of them.
+
+!!! question
+Do we need to store the language on the dataset level,
+or can we compute it from the records?  
+If we store it on the dataset level,
+how do we deal with datasets that contain records in different languages?
+
+!!! answer
+We compute it if available; optionally, it can be added manually. When there are different languages, we display all of them.
+
+!!! question
+Do we need to store the attribution on the dataset level,
+or can we compute it from the records?  
+If we store it on the dataset level,
+how do we deal with datasets that contain records with different attributions?
+
+!!! answer
+We compute it if available; optionally, it can be added manually. When there are different attributions, we display
+all of them. We need to keep in mind that we don't include ourselves in these computations.
+
+!!! question
+Do we need a reference to the records in the dataset?
+
+!!! answer
+Not at the moment. Unsure; to be decided.
+
+!!! question
+Does `dateCreated` suffice here? There were more date properties in the old model.
+
+!!! answer
+What is the meaning of `dateCreated` in this context?
+
+~~Datasets are for internal use; they serve to partition the data into manageable chunks. This is done both by type of
+data (RDF vs. assets), and by size.~~
+
+~~In some cases, there may be a "logical" grouping constituting a dataset, e.g. if data is digitized in a batch and there
+is a temporal separation between the batches. In these cases, the project may make use of the descriptive metadata of
+the dataset. But normally, the dataset is just a technical entity, and should not carry semantic information.~~
+
+A project can have more than one dataset if the project wishes and if this provides a meaningful grouping of the
+records, e.g., two researchers worked on one part of the data and two other researchers on another part,
+or EKWS digitizes different boxes and each box becomes a dataset.
+A record can only be part of one dataset.
+
+#### Collection
+
+| Field              | Type              | Cardinality | Restrictions                                     | Remarks                                                                                                                                                    |
+|--------------------|-------------------|-------------|--------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `pid`              | id                | 1           |                                                  | or `ARK`? -> probably use `pid`                                                                                                                            |
+| `__id`             | string            | 1           |                                                  |                                                                                                                                                            |
+| `__type`           | string            | 1           | Literal 'Collection'                             |                                                                                                                                                            |
+| `name`             | string            | 1           |                                                  |                                                                                                                                                            |
+| `description`      | string / url      | 1-n         |                                                  |                                                                                                                                                            |
+| `typeOfData`       | string[]          | 1-n         | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; does this still make sense? -> Maybe not.                                                                                             |
+| `licenses`         | license[]         | 1-n         |                                                  | copied from dataset; should be computed from the records if available and optionally added manually.                                                       |
+| `copyright`        | string[]          | 1-n         |                                                  | computed along with license -> should be computed from the records if available and optionally added manually.                                             |
+| `languages`        | lang_string[]     | 1-n         |                                                  | copied from dataset; does this make sense? -> computed if available and optionally added manually.                                                         |
+| `attributions`     | attribution[]     | 1-n         |                                                  | copied from dataset; can this be calculated? -> Yes, if available and optionally added manually.                                                           |
+| `provenance`       | string            | 0-1         |                                                  | -> needed, see: [openAIRE Guidelines](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_source.html#dc-source) |
+| `distribution`     | url               | 0-1         |                                                  | copied from dataset; does this make sense? -> not needed                                                                                                   |
+| `records`          | id[]              | 0-n         | Record IDs                                       | can be 0 in case it points to a collection                                                                                                                 |
+| `collections`      | id[]              | 0-n         | Collection IDs                                   |                                                                                                                                                            |
+| `alternativeNames` | lang_string[]     | 0-n         |                                                  |                                                                                                                                                            |
+| `keywords`         | lang_string[]     | 0-n         |                                                  | does this make sense? -> Interesting for the search.                                                                                                       |
+| `urls`             | url[]             | 0-n         |                                                  | copied from dataset;                                                                                                                                       |
+| `additional`       | lang_string / url | 0-n         |                                                  | copied from dataset;  -> Probably not needed.                                                                                                              |
+
+!!! question
+Do we need a reference to the records in the collection?
+
+!!! answer
+Yes, we would need that.
+
+#### Record
+
+| Field               | Type        | Cardinality | Restrictions                                     | Remarks                                                                                                                                                                                                                                                    |
+|---------------------|-------------|-------------|--------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `__id`              | string      | 1           |                                                  |                                                                                                                                                                                                                                                            |
+| `__type`            | string      | 1           | Literal 'Record'                                 |                                                                                                                                                                                                                                                            |
+| `pid`               | id          | 1           |                                                  | or `ARK`? -> probably use `pid`                                                                                                                                                                                                                            |
+| `label`             | lang_string | 1           |                                                  | do we want this, or does it go too far? -> We want to keep it because it's the "name" of the record. But we can think about renaming it.                                                                                                                   |
+| `accessConditions`  | string      | 1           | Literal "open", "restricted" or "closed"         | copied from dataset; change to proper terms -> open, restricted, embargoed, metadata-only and renaming  `accessConditions` to `rights` to be in line with openAIRE.                                                                                        |
+| `embargoPeriodDate` | date        | 0-1         |                                                  | -> needs to be added to be in line with openAIRE, e.g., ```<datacite:dates> <datacite:date dateType="Accepted">2011-12-01</datacite:date> <datacite:date dateType="Available">2012-12-01</datacite:date> </datacite:dates>```                              |
+| `publisher`         | string      | 1           |                                                  | should be DaSCH                                                                                                                                                                                                                                            |
+| `license`           | license     | 1           |                                                  | copied from dataset; should be computed from the records -> No, you have to indicate the license here. Computation is not possible.                                                                                                                        |
+| `copyright`         | string      | 1           |                                                  | computed along with license -> -> No, you have to indicate the copyright here. Computation is not possible.                                                                                                                                                |
+| `attribution`       | attribution | 1           |                                                  | do we want this, or does it go too far? -> Yes                                                                                                                                                                                                             |
+| `provenance`        | string      | 0-1         |                                                  | do we want this, or does it go too far? -> Yes, [openAIRE data-source](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_source.html#dc-source)                                                                |
+| `datePublished`     | date        | 0-1         |                                                  | copied from dataset; do they make sense? -> Yes                                                                                                                                                                                                            |
+| `dateCreated`       | date        | 0-1         |                                                  | copied from dataset; do they make sense?  -> Yes                                                                                                                                                                                                           |
+| `dateModified`      | date        | 0-1         |                                                  | copied from dataset; do they make sense?   -> Yes                                                                                                                                                                                                          |
+| `typeOfData`        | string      | 0-1         | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; wanted? what values?    -> Yes, type is computed and should represent: [openAIRE Resource Type](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_publicationtype.html#aire-resourcetype) |
+| `size`              | string      | 0-1         |                                                  | needs to be added, see: [openAIRE Size](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_size.html#dci-size)                                                                                                  |
+| `audience`          | string      | 0-n         |                                                  | needs to be added, see: [openAIRE Audience](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_audience.html#dct-audience)                                                                                      |
+
+!!! question
+How granular do we want to be with the metadata on the record level?
+
+!!! answer
+We need provenance,
+see: [openAIRE Source](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_source.html#dc-source)
+
+!!! question
+If we have copyright, what is the purpose of attribution?
+
+!!! answer
+Copyright and attribution are independent of each other. Attribution states who did something with the data. Copyright
+names the person or organization that holds the rights to this record and can grant others permission to use it;
+that permission is the license.
+
+#### Person
+
+| Field            | Type     | Cardinality | Restrictions                           | Remarks |
+|------------------|----------|-------------|----------------------------------------|---------|
+| `__id`           | string   | 1           |                                        |         |
+| `__type`         | string   | 1           | Literal 'Person'                       |         |
+| `givenNames`     | string[] | 1-n         |                                        |         |
+| `familyNames`    | string[] | 1-n         |                                        |         |
+| `jobTitles`      | string[] | 0-n         |                                        |         |
+| `affiliations`   | id[]     | 0-n         | Organization IDs                       |         |
+| `address`        | address  | 0-1         |                                        |         |
+| `email`          | string   | 0-1         |                                        |         |
+| `secondaryEmail` | string   | 0-1         |                                        |         |
+| `authorityRefs`  | url[]    | 0-n         | References to external authority files |         |
+
+#### Organization
+
+| Field             | Type        | Cardinality | Restrictions                           | Remarks |
+|-------------------|-------------|-------------|----------------------------------------|---------|
+| `__id`            | string      | 1           |                                        |         |
+| `__type`          | string      | 1           | Literal 'Organization'                 |         |
+| `name`            | string      | 1           |                                        |         |
+| `url`             | url         | 1           |                                        |         |
+| `address`         | address     | 0-1         |                                        |         |
+| `email`           | string      | 0-1         |                                        |         |
+| `alternativeName` | lang_string | 0-1         |                                        |         |
+| `authorityRefs`   | url[]       | 0-n         | References to external authority files |         |
+
+### Value Types
+
+#### String with Language Tag (`lang_string`)
+
+Object with an ISO language code as key and a string as value.
+
+```json
+{
+  "en": "Lorem ipsum in English.",
+  "de": "Lorem ipsum auf Deutsch."
+}
+```
+
+#### Date
+
+String with the format `YYYY-MM-DD`.
+
+#### URL
+
+An object representing a URL.
+Depending on the `type` field,
+the URL may be a generic URL
+or a more specific link, like a PID
+or a reference to a resource in an external authority file.
+
+| Field    | Type   | Cardinality | Restrictions                                                                                                                                |
+|----------|--------|-------------|---------------------------------------------------------------------------------------------------------------------------------------------|
+| `__type` | string | 1           | Literal 'URL'                                                                                                                               |
+| `type`   | string | 1           | Literal 'URL', 'Geonames', 'Pleiades', 'Skos', 'Periodo', 'Chronontology', 'GND', 'VIAF', 'Grid', 'ORCID', 'Creative Commons', 'DOI', 'ARK' |
+| `url`    | string | 1           |                                                                                                                                             |
+| `text`   | string | 0-1         |                                                                                                                                             |
+
+!!! question
+Can we model different types of URLs in a more sensible way?
+
+!!! answer
+In the medium term, we should untangle this mess of URLs, ARKs, Geonames, etc.
+
+#### Data Management Plan (`dmp`)
+
+| Field       | Type    | Cardinality | Restrictions                 |
+|-------------|---------|-------------|------------------------------|
+| `__type`    | string  | 1           | Literal 'DataManagementPlan' |
+| `available` | boolean | 0-1         |                              |
+| `url`       | url     | 0-1         |                              |
+
+!!! question
+Does the model for `Data Management Plan` still make sense?
+Could it be a string?
+Is "available" useful information?
+How do we ensure that either `available` or `url` is set?
+
+!!! answer
+If we cannot upload the DMP or provide a reference to a published one, then we don't need this.
+
+#### Publication
+
+| Field  | Type   | Cardinality | Restrictions |
+|--------|--------|-------------|--------------|
+| `text` | string | 1           |              |
+| `url`  | url    | 0-1         |              |
+
+#### Address
+
+| Field        | Type   | Cardinality | Restrictions      |
+|--------------|--------|-------------|-------------------|
+| `__type`     | string | 1           | Literal 'Address' |
+| `street`     | string | 1           |                   |
+| `postalCode` | string | 1           |                   |
+| `locality`   | string | 1           |                   |
+| `country`    | string | 1           |                   |
+| `canton`     | string | 0-1         |                   |
+| `additional` | string | 0-1         |                   |
+
+#### License
+
+| Field     | Type   | Cardinality | Restrictions      |
+|-----------|--------|-------------|-------------------|
+| `__type`  | string | 1           | Literal 'License' |
+| `license` | url    | 1           |                   |
+| `date`    | date   | 1           |                   |
+| `details` | string | 0-1         |                   |
+
+!!! question
+Is this model up to date with our current understanding of licenses?
+Is `details` ever used?
+What is the purpose of `date` here?
+How does it relate to a copyright statement?
+
+!!! answer
+Licenses depend on dates. This does not relate to a copyright statement.
+
+#### Attribution
+
+| Field    | Type   | Cardinality | Restrictions              | Remarks                             |
+|----------|--------|-------------|---------------------------|-------------------------------------|
+| `__type` | string | 1           | Literal 'Attribution'     |                                     |
+| `agent`  | id     | 1           | Person or Organization ID | Or can this only be a person? -> No |
+| `roles`  | string | 1-n         |                           |                                     |
+
+#### Grant
+
+| Field     | Type   | Cardinality | Restrictions               |
+|-----------|--------|-------------|----------------------------|
+| `__type`  | string | 1           | Literal 'Grant'            |
+| `funders` | id[]   | 1-n         | Person or Organization IDs |
+| `number`  | string | 0-1         |                            |
+| `name`    | string | 0-1         |                            |
+| `url`     | url    | 0-1         |                            |
+
+## Entity-Relationship Diagram
+
+```mermaid
+erDiagram
+    umbrellaProject |o--|{ project : projects
+    project ||--|{ dataset : datasets
+    project ||--|| person : contactPoint
+    project ||--|| organization : contactPoint
+    project ||--|{ person : funders
+    project ||--|{ organization : funders
+    project |o--|{ collection : collections
+    dataset ||--|{ record : records
+    collection |o--o{ collection : collections
+    collection |o--o{ record : records
+    person ||--|{ organization : affiliations
+
+    umbrellaProject {
+        string __id "1"
+        string __type "1; Literal 'UmbrellaProject'"
+        string name "1"
+        id[] projects "1-n; Project IDs"
+        lang_string description "0-1"
+        lang_string[] alternativeNames "0-n"
+        url url "0-1"
+        id contactPoint "0-1"
+        id[] institutionalPartner "0-n; Organization IDs"
+    }
+
+    project {
+        string __id "1"
+        string __type "1; Literal 'Project'"
+        string shortcode "1"
+        string status "1; Literal 'Ongoing', 'Finished'"
+        string name "1"
+        lang_string description "1"
+        date startDate "1"
+        string teaserText "1"
+        url url "1"
+        string howToCite "1"
+        id[] datasets "1-n; Dataset IDs"
+        lang_string[] keywords "1-n"
+        lang_string_or_url[] disciplines "1-n"
+        lang_string_or_url[] temporalCoverage "1-n"
+        url[] spatialCoverage "1-n"
+        id[] funders "1-n; Person or Organization IDs"
+        attribution[] attributions "1-n"
+        date endDate "0-1"
+        url secondaryURL "0-1"
+        dmp dataManagementPlan "0-1"
+        id contactPoint "0-1"
+        publication[] publications "0-n"
+        grant[] grants "0-n"
+        lang_string[] alternativeNames "0-n"
+    }
+
+    dataset {
+        string __id "1"
+        string __type "1; Literal 'Dataset'"
+        string title "1"
+        string[] typeOfData "1-n; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'"
+        license[] licenses "1-n"
+        string[] copyright "1-n"
+        attribution[] attributions "1-n"
+        string howToCite "0-1"
+        lang_string description "0-1"
+        date dateCreated "0-1"
+    }
+
+    collection {
+        string __id "1"
+        string __type "1; Literal 'Collection'"
+        string name "1"
+        string accessConditions "1; Literal 'open', 'restricted' or 'closed'"
+        string provenance "0-1"
+        date datePublished "0-1"
+        date dateCreated "0-1"
+        date dateModified "0-1"
+        url distribution "0-1"
+        id[] records "0-n; Record IDs"
+        id[] collections "0-n; Collection IDs"
+        lang_string[] alternativeNames "0-n"
+        lang_string[] keywords "0-n"
+        url[] urls "0-n"
+        lang_string_or_url[] additional "0-n"
+        lang_string_or_url[] description "1-n"
+        string[] typeOfData "1-n; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'"
+        license[] licenses "1-n"
+        string[] copyright "1-n"
+        lang_string[] languages "1-n"
+        attribution[] attributions "1-n"
+    }
+
+    record {
+        string __id "1"
+        string __type "1; Literal 'Record'"
+        string pid "1"
+        lang_string label "1"
+        string accessConditions "1; Literal 'open', 'restricted' or 'closed'"
+        license license "1"
+        string copyright "1"
+        attribution attribution "1"
+        string provenance "0-1"
+        date datePublished "0-1"
+        date dateCreated "0-1"
+        date dateModified "0-1"
+        string typeOfData "0-1; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'"
+    }
+
+    person {
+        string __id "1"
+        string __type "1; Literal 'Person'"
+        string[] givenNames "1-n"
+        string[] familyNames "1-n"
+        string[] jobTitles "0-n"
+        id[] affiliations "0-n; Organization IDs"
+        address address "0-1"
+        string email "0-1"
+        string secondaryEmail "0-1"
+        url[] authorityRefs "0-n"
+    }
+
+    organization {
+        string __id "1"
+        string __type "1; Literal 'Organization'"
+        string name "1"
+        url url "1"
+        address address "0-1"
+        string email "0-1"
+        lang_string alternativeName "0-1"
+        url[] authorityRefs "0-n"
+    }
+```
+
+## Change Log
+
+- Made `Grant` a value type and removed it from the top level.
+- Added entity `umbrellaProject` to the top level.
+- Added entity `collection` to the top level.
+- Added entity `record` to the top level.
+- Added `copyright` to `dataset`.
+- Changed type of `abstract`/`description` in `dataset` to `lang_string`.
+- Changed cardinality of `abstract`/`description` in `dataset` to 1.
+- Changed cardinality of `howToCite` in `dataset` to 0-1.
+- Changed cardinality of `description` in `dataset` to 0-1.
+- Removed `accessConditions` from `dataset`.
+- Removed `status` from `dataset`.
+- Renamed `abstract` to `description` in `dataset`.
+- Removed `languages` from `dataset`.
+- Removed `datePublished` and `dateModified` from `dataset`.
+- Removed `distribution` from `dataset`.
+- Removed `additional` from `dataset`.
+- Removed `alternativeTitles` from `dataset`.
+- Removed `urls` from `dataset`.
diff --git a/docs/data/provisional-datamodel.md b/docs/data/provisional-datamodel.md
deleted file mode 100644
index ca104cb6..00000000
--- a/docs/data/provisional-datamodel.md
+++ /dev/null
@@ -1,599 +0,0 @@
-# Provisional Data Model
-
-!!! warning
-    This document does _not_ represent the current state of the metadata model.  
-    It is a working document for planned upcoming changes to the metadata model.
-
-!!! note
-    This model is an idealized version of the metadata model. 
-    With the current implementation that is entirely separate from the DSP,
-    it is not feasible to implement metadata on the record level.  
-    Such a system may be implemented in the archive in the future,
-    but for now, we will keep the metadata on the dataset level.  
-    A separate, simplified model for applying some of these changes, 
-    while remaining compatible with the current implementation,
-    should be created alongside this model.
-
-## Overview
-
-The metadata model is a hierarchical structure of metadata elements. 
-
-```mermaid
-
-flowchart TD
-    hyper-project[Umbrella Project] -->|1-n| project[Research Project]
-    project -->|1-n| dataset[Dataset]
-    dataset -->|1-n| record[Record /<br/>Resource]
-    project -->|0-n| collection[Collection]
-    collection --> collection
-    hyper-project -->|0-n| collection
-    collection --> record
-```
-
-- A `Umbrella Project` is optional and collects one or more `Research Projects`.  
-  It is typically of institutional nature, 
-  not directly tied to a specific funding grant, 
-  and may be long-lived.  
-  Examples are EKWS/CAS, BEOL or LIMC.
-- A `Research Project` is the main entity of the metadata model.  
-  It corresponds to a `project` in the DSP.
-  It is typically tied to a specific funding grant, 
-  and hence has a limited lifetime of ~3-5 years;
-  multiple funding rounds and a longer lifetime are possible.  
-  A `Research Project` is part of 0-1 `Umbrella Project`,
-  it has 1-n `Datasets` and 0-n `Collections`.
-- A `Dataset` is a collection of `Records` within a `Research Project`.  
-  It is mostly meant for system-internal and technical use,
-  and should not have particular semantics or a "historical meaning" in the context of the project.  
-  A `Dataset` is part of exactly 1 `Research Project`
-  and contains 1-n `Records`.
-- A `Collection` is also a collection of `Records` within a `Research Project`.  
-  It is meant for semantic grouping of `Records` within a `Research Project`,
-  and may have a "historical meaning" in the context of the project.  
-  Examples may be physical collections such as p person's "Nachlass" in an archive,
-  or groupings of records based on a specific research question within a project.  
-  A `Collection` is part of at least 1 `Research Project`, `Umbrella Project` or `Collection`, 
-  but can be part of multiple. It may either contain 0-n `Collections` or 1-n `Records`.
-- A `Record` is a single resource within a `Dataset`.  
-  It represents a single entity, and the smallest unit that can meaningfully have an identifier. 
-  It maps to a `knora-base:Resource` (DSP-API) or an `Asset` (SIPI/Ingest) in the DSP.  
-  A `Record` is part of exactly 1 `Dataset` and may be part of 0-n `Collections`. 
-
-Additionally, there are the entities `Person` and `Organization`:  
-`Person` and `Organization` are entities that are independent of the `Research Project` hierarchy,
-and may be related to various entities within the hierarchy.  
-
-
-## Top Level
-
-A set of metadata consists of the following top-level elements:
-
-- Umbrella Project
-- Project
-- Dataset
-- Collection
-- Record
-- Person
-- Organization
-
-Each of these elements is an entity identified by a unique identifier. 
-Other elements can refer to these entities by their identifier.
-
-Any other metadata element may itself be a complex object,
-but it is always part of one of the top-level elements.
-Such elements do not have an identifier, 
-but are identified by their position in the hierarchy.
-
-| Field             | Type            | Cardinality |
-| ----------------- | --------------- | ----------- |
-| `$schema`         | string          | 0-1         |
-| `umbrellaProject` | umbrellaProject | 0-1         |
-| `project`         | project         | 1           |
-| `datasets`        | dataset[]       | 1-n         |
-| `collections`     | collection[]    | 0-n         |
-| `records`         | record[]        | 0-n         |
-| `persons`         | person[]        | 0-n         |
-| `organizations`   | organization[]  | 0-n         |
-
-
-!!! question
-    Do we consider "permissions" as metadata?  
-    (Not as they are in the DSP, but as they will be in the archive; 
-    that is: "open", "restricted", "embargo", "metadata only".)  
-    If so, this should be added on each level, I suppose.
-
-
-## Types
-
-### Entity Types
-
-#### Unbrella Project
-
-| Field                  | Type          | Card. | Restrictions                                                 |
-| ---------------------- | ------------- | ----- | ------------------------------------------------------------ |
-| `__id`                 | string        | 1     |                                                              |
-| `__type`               | string        | 1     | Literal 'UmbrellaProject'                                    |
-| `name`                 | string        | 1     |                                                              |
-| `projects`             | id[]          | 1-n   | String containing the identifier of a project                |
-| `description`          | lang_string   | 0-1   |                                                              |
-| `alternativeNames`     | lang_string[] | 0-n   |                                                              |
-| `url`                  | url           | 0-1   |                                                              |
-| `contactPoint`         | id            | 0-1   | String containing the identifier of a person or organization |
-| `institutionalPartner` | id[]          | 0-n   | String containing the identifier of an organization          |
-
-!!! question
-    This opens up the questions of how to deal with multiple projects in a umbrella project. 
-    We probably want to keep one entry per project, 
-    so this leaves us with either duplicating the umbrella project metadata for each project,
-    or having umbrella project metadata separately and only linking it from the project.
-    The latter seems preferable, 
-    but then the question arises who gets to edit the umbrella project metadata.  
-    For a first implementation, we could simply duplicate the metadata for each project, 
-    and later factor it out.
-
-!!! question
-    what is the best name for `institutionalPartner`?  
-    AI suggested:  
-    - Affiliated Institution  
-    - Associated Body  
-    - Supporting Organization  
-    - Institutional Partner
-
-!!! question
-    How do we capture the time aspect of the data provenance and genesis in this context? Should this be here?  
-    Concretely, an umbrella project is often like a "timeline" of projects, or the "history" of a series of projects.
-
-To make the model of this entity as flexible as possible,
-most of the fields are optional.
-
-
-#### Project
-
-| Field                | Type                | Cardinality | Restrictions                                                 |
-| -------------------- | ------------------- | ----------- | ------------------------------------------------------------ |
-| `__type`             | string              | 1           | Literal "Project"                                            |
-| `shortcode`          | string              | 1           | 4 char hexadecimal                                           |
-| `status`             | string              | 1           | Literal "Ongoing" or "Finished"                              |
-| `name`               | string              | 1           |                                                              |
-| `description`        | lang_string         | 1           |                                                              |
-| `startDate`          | date                | 1           | String of format "YYYY-MM-DD"                                |
-| `teaserText`         | string              | 1           |                                                              |
-| `url`                | url                 | 1           |                                                              |
-| `howToCite`          | string              | 1           |                                                              |
-| `datasets`           | id[]                | 1-n         | String containing the identifier of a dataset                |
-| `keywords`           | lang_string[]       | 1-n         |                                                              |
-| `disciplines`        | lang_string / url[] | 1-n         |                                                              |
-| `temporalCoverage`   | lang_string / url[] | 1-n         |                                                              |
-| `spatialCoverage`    | url[]               | 1-n         |                                                              |
-| `funders`            | id[]                | 1-n         | String containing the identifier of a person or organization |
-| `attributions`       | attribution[]       | 1-n         |                                                              |
-| `endDate`            | date                | 0-1         | String of format "YYYY-MM-DD"                                |
-| `secondaryURL`       | url                 | 0-1         |                                                              |
-| `dataManagementPlan` | dmp                 | 0-1         |                                                              |
-| `contactPoint`       | id                  | 0-1         | String containing the identifier of a person or organization |
-| `publications`       | publication[]       | 0-n         |                                                              |
-| `grants`             | grant[]             | 0-n         |                                                              |
-| `alternativeNames`   | lang_string[]       | 0-n         |                                                              |
-
-!!! question
-    If we can have copyright/license on dataset level,
-    do we want to have it on project level as well?  
-    In any case, it should be computed from the datasets/records.
-
-!!! question
-    Do we still need funders if we have grants?
-
-!!! question
-    What about projects that do not have funding?
-
-!!! question
-    Do we want my proposed `attributions` field n project?
-
-!!! question
-    Should we have an `abstract` field in the project, like we used to have in the dataset?
-
-
-#### Dataset
-
-| Field          | Type          | Cardinality | Restrictions                                     | Remarks                                                 |
-| -------------- | ------------- | ----------- | ------------------------------------------------ | ------------------------------------------------------- |
-| `__id`         | string        | 1           |                                                  |                                                         |
-| `__type`       | string        | 1           | Literal "Dataset"                                |                                                         |
-| `title`        | string        | 1           |                                                  | may be auto-generated?                                  |
-| `typeOfData`   | string[]      | 1-n         | Literal "XML", "Text", "Image", "Video", "Audio" | does this still make sense? should it be cardinality 1? |
-| `licenses`     | license[]     | 1-n         |                                                  | should be computed from the records                     |
-| `copyright`    | string[]      | 1-n         |                                                  | computed along with license                             |
-| `attributions` | attribution[] | 1-n         |                                                  | can this be computed?                                   |
-| `howToCite`    | string        | 0-1         |                                                  | still wanted?                                           |
-| `description`  | lang_string   | 0-1         |                                                  |                                                         |
-| `dateCreated`  | date          | 0-1         |                                                  |                                                         |
-
-!!! note
-    If we think of a dataset as something internal, 
-    we should limit the metadata to what is necessary for the system to work.  
-    Additionally, we may want to have some minimal descriptive metadata for the dataset, 
-    (like for the use case that a project once a year grabs a box of achrival material and digitizes it). 
-
-!!! question
-    Do we need to store the license on the dataset level, 
-    or can we compute it from the records?  
-    If we store it on the dataset level, 
-    how do we deal with datasets that contain records with different licenses?
-
-!!! question
-    Do we need to store the language on the dataset level, 
-    or can we compute it from the records?  
-    If we store it on the dataset level, 
-    how do we deal with datasets that contain records in different languages?
-
-!!! question
-    Do we need to store the attribution on the dataset level, 
-    or can we compute it from the records?  
-    If we store it on the dataset level, 
-    how do we deal with datasets that contain records with different attributions?
-
-!!! question
-    Do we need a reference to the records in the dataset?
-
-!!! question
-    Does `dateCreated` suffice here? There were more date properties in the old model.
-
-Data sets arefor internal use, 
-they serve to partition the data into manageable chunks. 
-This is done both by type of data (RDF vs. assets), and by size.
-
-In some cases, there may be a "logical" grouping consisting a dataset, 
-e.g. if data is digitized in a batch and there is a temporal separation between the batches.  
-In these cases, the project may make use of the descriptive metadata of the dataset. 
-But normally, the dataset is just a technical entity, and should not carry semantic information.
-
-#### Collection
-
-| Field              | Type              | Cardinality | Restrictions                                     | Remarks                                                  |
-| ------------------ | ----------------- | ----------- | ------------------------------------------------ | -------------------------------------------------------- |
-| `__id`             | string            | 1           |                                                  |                                                          |
-| `__type`           | string            | 1           | Literal 'Collection'                             |                                                          |
-| `name`             | string            | 1           |                                                  |                                                          |
-| `description`      | string / url      | 1-n         |                                                  |                                                          |
-| `typeOfData`       | string[]          | 1-n         | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; does this still make sense?         |
-| `licenses`         | license[]         | 1-n         |                                                  | copied from dataset; should be computed from the records |
-| `copyright`        | string[]          | 1-n         |                                                  | computed along with license                              |
-| `languages`        | lang_string[]     | 1-n         |                                                  | copied from dataset; does this make sense?               |
-| `attributions`     | attribution[]     | 1-n         |                                                  | copied from dataset; can this be calculated?             |
-| `provenance`       | string            | 0-1         |                                                  |                                                          |
-| `distribution`     | url               | 0-1         |                                                  | copied from dataset; does this make sense?               |
-| `records`          | id[]              | 0-n         | Record IDs                                       | can be 0 in case it points to a collection               |
-| `collections`      | id[]              | 0-n         | Collection IDs                                   |                                                          |
-| `alternativeNames` | lang_string[]     | 0-n         |                                                  |                                                          |
-| `keywords`         | lang_string[]     | 0-n         |                                                  | does this make sense?                                    |
-| `urls`             | url[]             | 0-n         |                                                  | copied from dataset;                                     |
-| `additional`       | lang_string / url | 0-n         |                                                  | copied from dataset;                                     |
-
-
-!!! question
-    Do we need a reference to the records in the collection?
-
-
-#### Record
-
-| Field              | Type        | Cardinality | Restrictions                                     | Remarks                                                  |
-| ------------------ | ----------- | ----------- | ------------------------------------------------ | -------------------------------------------------------- |
-| `__id`             | string      | 1           |                                                  |                                                          |
-| `__type`           | string      | 1           | Literal 'Record'                                 |                                                          |
-| `pid`              | id          | 1           |                                                  | or `ARK`?                                                |
-| `label`            | lang_string | 1           |                                                  | do we want this, or does it go too far?                  |
-| `accessConditions` | string      | 1           | Literal "open", "restricted" or "closed"         | copied from dataset; change to proper terms              |
-| `license`          | license     | 1           |                                                  | copied from dataset; should be computed from the records |
-| `copyright`        | string      | 1           |                                                  | computed along with license                              |
-| `attribution`      | attribution | 1           |                                                  | do we want this, or does it go too far?                  |
-| `provenance`       | string      | 0-1         |                                                  | do we want this, or does it go too far?                  |
-| `datePublished`    | date        | 0-1         |                                                  | copied from dataset; do they make sense?                 |
-| `dateCreated`      | date        | 0-1         |                                                  | copied from dataset; do they make sense?                 |
-| `dateModified`     | date        | 0-1         |                                                  | copied from dataset; do they make sense?                 |
-| `typeOfData`       | string      | 0-1         | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; wanted? what values?                |
-
-!!! question
-    How granular do we want to be with the metadata on the record level?
-
-!!! question
-    If we have copyright, what is the purpose of attribution?
-
-
-#### Person
-
-| Field            | Type     | Cardinality | Restrictions                           | Remarks |
-| ---------------- | -------- | ----------- | -------------------------------------- | ------- |
-| `__id`           | string   | 1           |                                        |         |
-| `__type`         | string   | 1           | Literal 'Person'                       |         |
-| `givenNames`     | string[] | 1-n         |                                        |         |
-| `familyNames`    | string[] | 1-n         |                                        |         |
-| `jobTitles`      | string[] | 0-n         |                                        |         |
-| `affiliations`   | id[]     | 0-n         | Organization IDs                       |         |
-| `address`        | address  | 0-1         |                                        |         |
-| `email`          | string   | 0-1         |                                        |         |
-| `secondaryEmail` | string   | 0-1         |                                        |         |
-| `authorityRefs`  | url[]    | 0-n         | References to external authority files |         |
-
-
-#### Organization
-
-| Field             | Type        | Cardinality | Restrictions                           | Remarks |
-| ----------------- | ----------- | ----------- | -------------------------------------- | ------- |
-| `__id`            | string      | 1           |                                        |         |
-| `__type`          | string      | 1           | Literal 'Organization'                 |         |
-| `name`            | string      | 1           |                                        |         |
-| `url`             | url         | 1           |                                        |         |
-| `address`         | address     | 0-1         |                                        |         |
-| `email`           | string      | 0-1         |                                        |         |
-| `alternativeName` | lang_string | 0-1         |                                        |         |
-| `authorityRefs`   | url[]       | 0-n         | References to external authority files |         |
-
-
-### Value Types
-
-#### String with Language Tag (`lang_string`)
-
-Object with an ISO language code as key and a string as value.
-
-```json
-{
-    "en": "Lorem ipsum in English.",
-    "de": "Lorem ipsum auf Deutsch."
-}
-```
-
-
-#### Date
-
-String with the format `YYYY-MM-DD`.
-
-
-#### URL
-
-An object representing a URL. 
-Depending on the `type` field,
-the URL may be a generic URL
-or a more specific link, like a PID
-or a reference to a resource in an external authority file.
-
-
-| Field    | Type   | Cardinality | Restrictions                                                                                                                                |
-| -------- | ------ | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
-| `__type` | string | 1           | Literal 'URL'                                                                                                                               |
-| `type`   | string | 1           | Literal 'URL', 'Geonames', 'Pleiades', 'Skos', 'Periodo', 'Chronontology', 'GND', 'VIAF', 'Grid', 'ORCID', 'Creative Commons', 'DOI', 'ARK' |
-| `url`    | string | 1           |                                                                                                                                             |
-| `text`   | string | 0-1         |                                                                                                                                             |
-
-!!! question
-    can we model different types of URLs in a more sensible way?
-
-
-#### Data Management Plan (`dmp`)
-
-| Field       | Type    | Cardinality | Restrictions                 |
-| ----------- | ------- | ----------- | ---------------------------- |
-| `__type`    | string  | 1           | Literal 'DataManagementPlan' |
-| `available` | boolean | 0-1         |                              |
-| `url`       | url     | 0-1         |                              |
-
-
-!!! question
-    Does the model for `Data Management Plan` still make sense? 
-    Could it be a string? 
-    Is "available" useful information? 
-    How do we ensure that either `available` or `url` is set?
-
-
-#### Publication
-
-| Field  | Type   | Cardinality | Restrictions |
-| ------ | ------ | ----------- | ------------ |
-| `text` | string | 1           |              |
-| `url`  | url    | 0-1         |              |
-
-
-#### Address
-
-| Field        | Type   | Cardinality | Restrictions      |
-| ------------ | ------ | ----------- | ----------------- |
-| `__type`     | string | 1           | Literal 'Address' |
-| `street`     | string | 1           |                   |
-| `postalCode` | string | 1           |                   |
-| `locality`   | string | 1           |                   |
-| `country`    | string | 1           |                   |
-| `canton`     | string | 0-1         |                   |
-| `additional` | string | 0-1         |                   |
-
-
-#### License
-
-| Field     | Type   | Cardinality | Restrictions      |
-| --------- | ------ | ----------- | ----------------- |
-| `__type`  | string | 1           | Literal 'License' |
-| `license` | url    | 1           |                   |
-| `date`    | date   | 1           |                   |
-| `details` | string | 0-1         |                   |
-
-!!! question
-    Is this model up to date with our current understanding of licenses? 
-    Is `details` ever used? 
-    What is the purpose of `date` here? 
-    How does it relate to a copyright statement?
-
-
-#### Attribution
-
-| Field    | Type   | Cardinality | Restrictions              | Remark                      |
-| -------- | ------ | ----------- | ------------------------- | --------------------------- |
-| `__type` | string | 1           | Literal 'Attribution'     |                             |
-| `agent`  | id     | 1           | Person or Organization ID | Or can this only be person? |
-| `roles`  | string | 1-n         |                           |                             |
-
-
-#### Grant
-
-| Field     | Type   | Cardinality | Restrictions               |
-| --------- | ------ | ----------- | -------------------------- |
-| `__type`  | string | 1           | Literal 'Grant'            |
-| `funders` | id[]   | 1-n         | Person or Organization IDs |
-| `number`  | string | 0-1         |                            |
-| `name`    | string | 0-1         |                            |
-| `url`     | url    | 0-1         |                            |
-
-
-## Entity-Relationship Diagram
-
-```mermaid
-erDiagram
-    umbrellaProject |o--|{ project : projects
-    project ||--|{ dataset : datasets
-    project ||--|| person : contactPoint
-    project ||--|| organization : contactPoint
-    project ||--|{ person : funders
-    project ||--|{ organization : funders
-    project |o--|{ collection : collections
-    dataset ||--|{ record : records
-    collection |o--o{ collection : collections
-    collection |o--o{ record : records
-    person ||--|{ organization : affiliations
-
-    umbrellaProject {
-        string __id "1"
-        string __type "1; Literal 'UmbrellaProject'"
-        string name "1"
-        id[] projects "1-n; Project IDs"
-        lang_string description "0-1"
-        lang_string[] alternativeNames "0-n"
-        url url "0-1"
-        id contactPoint "0-1"
-        id[] institutionalPartner "0-n; Organization IDs"
-    }
-    
-    project {
-        string __id "1"
-        string __type "1; Literal 'Project'"
-        string shortcode "1"
-        string status "1; Literal 'Ongoing', 'Finished'"
-        string name "1"
-        lang_string description "1"
-        date startDate "1"
-        string teaserText "1"
-        url url "1"
-        string howToCite "1"
-        id[] datasets "1-n; Dataset IDs"
-        lang_string[] keywords "1-n"
-        lang_string_or_url[] disciplines "1-n"
-        lang_string_or_url[] temporalCoverage "1-n"
-        url[] spatialCoverage "1-n"
-        id[] funders "1-n; Person or Organization IDs"
-        attribution[] attributions "1-n"
-        date endDate "0-1"
-        url secondaryURL "0-1"
-        dmp dataManagementPlan "0-1"
-        id contactPoint "0-1"
-        publication[] publications "0-n"
-        grant[] grants "0-n"
-        lang_string[] alternativeNames "0-n"
-    }
-
-    dataset {
-        string __id "1"
-        string __type "1; Literal 'Dataset'"
-        string title "1"
-        string[] typeOfData "1-n; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'"
-        license[] licenses "1-n"
-        string[] copyright "1-n"
-        attribution[] attributions "1-n"
-        string howToCite "0-1"
-        lang_string description "0-1"
-        date dateCreated "0-1"
-    }
-
-    collection {
-        string __id "1"
-        string __type "1; Literal 'Collection'"
-        string name "1"
-        string accessConditions "1; Literal 'open', 'restricted' or 'closed'"
-        string provenance "0-1"
-        date datePublished "0-1"
-        date dateCreated "0-1"
-        date dateModified "0-1"
-        url distribution "0-1"
-        id[] records "0-n; Record IDs"
-        id[] collections "0-n; Collection IDs"
-        lang_string[] alternativeNames "0-n"
-        lang_string[] keywords "0-n"
-        url[] urls "0-n"
-        lang_string_or_url[] additional "0-n"
-        lang_string_or_url[] description "1-n"
-        string[] typeOfData "1-n; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'"
-        license[] licenses "1-n"
-        string[] copyright "1-n"
-        lang_string[] languages "1-n"
-        attribution[] attributions "1-n"
-    }
-
-    record {
-        string __id "1"
-        string __type "1; Literal 'Record'"
-        string pid "1"
-        lang_string label "1"
-        string accessConditions "1; Literal 'open', 'restricted' or 'closed'"
-        license license "1"
-        string copyright "1"
-        attribution attribution "1"
-        string provenance "0-1"
-        date datePublished "0-1"
-        date dateCreated "0-1"
-        date dateModified "0-1"
-        string typeOfData "0-1; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'"
-    }
-
-    person {
-        string __id "1"
-        string __type "1; Literal 'Person'"
-        string[] givenNames "1-n"
-        string[] familyNames "1-n"
-        string[] jobTitles "0-n"
-        id[] affiliations "0-n; Organization IDs"
-        address address "0-1"
-        string email "0-1"
-        string secondaryEmail "0-1"
-        url[] authorityRefs "0-n"
-    }
-
-    organization {
-        string __id "1"
-        string __type "1; Literal 'Organization'"
-        string name "1"
-        url url "1"
-        address address "0-1"
-        string email "0-1"
-        lang_string alternativeName "0-1"
-        url[] authorityRefs "0-n"
-    }
-```
-
-
-
-## Change Log
-
-
-- Make `Grant` a value type and remove it from the top level.
-- Added entity `umbrellaProject` to the top level.
-- Added entity `collection` to the top level.
-- Added entity `record` to the top level.
-- Added `copyright` to `dataset`.
-- Changed type of `abstract`/`description` in `dataset` to `lang_string`.
-- Changed cardinality of `abstract`/`description` in `dataset` to 1.
-- Changed cardinality of `howToCite` in `dataset` to 0-1.
-- Changed cardinality of `description` in `dataset` to 0-1.
-- Removed `accessConditions` from `dataset`.
-- Removed `status` from `dataset`.
-- Renamed `abstract` to `description` in `dataset`.
-- Removed `languages` from `dataset`.
-- Removed `datePublished`, and `dateModified` from `dataset`.
-- Removed `distribution` from `dataset`.
-- Removed `additional` from `dataset`.
-- Removed `alternativeTitles` from `dataset`.
-- Removed `urls` from `dataset`.
diff --git a/docs/index.md b/docs/index.md
index d5e22889..8cb5cd09 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,23 +1,48 @@
-This page provides documentation for the DSP-META repository.
+# DSP Metadata
 
-The repository contains all metadata to projects deposited on the DaSCH Service Platform (DSP),
-as well as the code of the [DSP Metadata Browser](https://meta.dasch.swiss).
+The dsp-meta repository contains the code of the [DSP Metadata Browser](https://meta.dasch.swiss),
+as well as all metadata from projects deposited on the DaSCH Service Platform (DSP).
 
 ## DSP Metadata
 
+This documentation provides an overview of the metadata model used by the DSP to manage and describe
+research data in the humanities. Our vision is to fully capture the provenance of research data, detailing
+its origins, how it was created, and how it has been used over time.
+
+Humanities research projects are inherently diverse and often span multiple years or even decades.
+Many of these projects receive funding from various grants and different funders throughout their lifecycle.
+Additionally, the researchers involved in creating and reusing the data may change over time, reflecting
+the evolving nature of academic collaboration.
+
+Understanding the complex history of research data is crucial for transparency, reproducibility, and future scholarship.
+The DSP metadata model is designed to accommodate this complexity by meticulously recording the provenance of data. It
+tracks:
+
+- Funding Sources: Documenting the multiple grants and funders that have supported the project over time.
+- Research Personnel: Keeping a record of all researchers who have contributed to or utilized the data, acknowledging
+  the shifts in team composition.
+- Data Lifecycle: Outlining how the data was created, modified, and reused, providing a comprehensive view of its
+  evolution.
+
+By capturing this rich contextual information, we aim to provide a robust framework that supports the integrity and
+longevity of humanities research data. Whether you are a researcher contributing new data or a scholar exploring
+existing datasets, this documentation will guide you through our metadata practices and help you understand the stories
+behind the data.
+
 ### Consuming Metadata
 
-If you are interested in viewing the metadata in human-readable form, 
+If you are interested in viewing the metadata in human-readable form,
 you can visit the [DSP Metadata Browser](https://meta.dasch.swiss).
 
-If you are interested in re-using our metadata, you can find extensive documentation [here](data/current-datamodel.md).
+If you are interested in re-using our metadata, you can find extensive documentation [here](data/current-datamodel.md),
+and the work-in-progress documentation of our future data model [here](data/future-datamodel.md).
 
 The metadata itself can be found [here](https://github.com/dasch-swiss/dsp-meta/tree/main/data/json)
 or requested over the API as described [here](data/api.md).
 
 ### Adding Metadata
 
-For adding metadata, please see [here](adding-metadata.md).
+For adding metadata, please see [here](data/adding-metadata.md).
 
 ## Code Documentation
 
diff --git a/mkdocs.yml b/mkdocs.yml
index f8ced1ff..d021ce48 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -5,8 +5,8 @@ nav:
     - Consuming Metadata:
           - Metadata API: data/api.md
           - Current Data Model: data/current-datamodel.md
-          - Provisional Data Model: data/provisional-datamodel.md
-    - Adding Metadata: adding-metadata.md
+          - Future Data Model: data/future-datamodel.md
+          - Adding Metadata: data/adding-metadata.md
     - Code Documentation:
           - Overview: code/overview.md
           - Front End: code/front-end.md