From 0094fbe232f50d3e5377e43228073383704cc27e Mon Sep 17 00:00:00 2001
From: Ivan Subotic <400790+subotic@users.noreply.github.com>
Date: Tue, 26 Nov 2024 13:10:55 +0100
Subject: [PATCH] docs: extend documentation for future metadata model (#283)
Co-authored-by: danielasubotic <48174709+danielasubotic@users.noreply.github.com>
---
docs/{ => data}/adding-metadata.md | 0
docs/data/future-datamodel.md | 670 +++++++++++++++++++++++++++++
docs/data/provisional-datamodel.md | 599 --------------------------
docs/index.md | 37 +-
mkdocs.yml | 4 +-
5 files changed, 703 insertions(+), 607 deletions(-)
rename docs/{ => data}/adding-metadata.md (100%)
create mode 100644 docs/data/future-datamodel.md
delete mode 100644 docs/data/provisional-datamodel.md
diff --git a/docs/adding-metadata.md b/docs/data/adding-metadata.md
similarity index 100%
rename from docs/adding-metadata.md
rename to docs/data/adding-metadata.md
diff --git a/docs/data/future-datamodel.md b/docs/data/future-datamodel.md
new file mode 100644
index 00000000..680b3d09
--- /dev/null
+++ b/docs/data/future-datamodel.md
@@ -0,0 +1,670 @@
+# Future Data Model
+
+!!! warning
+This document does _not_ represent the current state of the metadata model.
+It is a working document for planned upcoming changes to the metadata model.
+
+!!! note
+This model is an idealized version of the metadata model.
+With the current implementation that is entirely separate from the DSP,
+it is not feasible to implement metadata on the record level.
+Such a system may be implemented in the archive in the future,
+but for now, we will keep the metadata on the dataset level.
+A separate, simplified model for applying some of these changes,
+while remaining compatible with the current implementation,
+should be created alongside this model.
+
+The planned enhancements to the DSP metadata model are designed to better accommodate
+the inherent complexity of humanities projects, while still being flexible enough to
+support simpler project structures.
+
+One of the key improvements is the introduction of an additional hierarchical level above
+the research project, which we refer to as the umbrella project. This allows for a more
+accurate representation of overarching initiatives that span multiple research projects
+over extended periods. Additionally, we introduce collections and subcollections
+to facilitate more precise referencing and organization of different parts of the data.
+
+By expanding our metadata model in this way, we aim to provide a more robust framework
+that supports the integrity and longevity of humanities research data. This evolution
+reflects our commitment to capturing the rich, nuanced histories of research projects
+with greater accuracy and detail.
+
+## Overview
+
+The metadata model is a hierarchical structure of metadata elements.
+
+```mermaid
+
+flowchart TD
+ hyper-project[Umbrella Project] -->|1-n| project[Research Project]
+ project -->|1-n| dataset[Dataset]
+ dataset -->|1-n| record[Record]
+ project -->|0-n| collection[Collection]
+ collection --> collection
+ hyper-project -->|0-n| collection
+ collection --> record
+```
+
+- An `Umbrella Project` is optional and collects one or more `Research Projects`.
+ It is typically of institutional nature,
+ not directly tied to a specific funding grant,
+ and may be long-lived.
+ Examples are EKWS/CAS, BEOL or LIMC.
+- A `Research Project` is the main entity of the metadata model.
+ It corresponds to a `project` in the DSP.
+ It is typically tied to a specific funding grant,
+ and hence has a limited lifetime of ~3-5 years;
+ multiple funding rounds and a longer lifetime are possible.
+ A `Research Project` is part of 0-1 `Umbrella Project`,
+ it has 1-n `Datasets` and 0-n `Collections`.
+- A `Dataset` is a collection of `Records` within a `Research Project`.
+ It is mostly meant for system-internal and technical use,
+ and should not have particular semantics or a "historical meaning" in the context of the project.
+ A `Dataset` is part of exactly 1 `Research Project`
+ and contains 1-n `Records`.
+- A `Collection` is also a collection of `Records` within a `Research Project`.
+ It is meant for semantic grouping of `Records` within a `Research Project`,
+ and may have a "historical meaning" in the context of the project.
+ Examples may be physical collections such as a person's "Nachlass" in an archive,
+ or groupings of records based on a specific research question within a project.
+ A `Collection` is part of at least 1 `Research Project`, `Umbrella Project` or `Collection`,
+ but can be part of multiple. It may either contain 0-n `Collections` or 1-n `Records`.
+- A `Record` is a single entry within a `Dataset`.
+ It represents a single entity and is the smallest unit that can meaningfully have an identifier.
+ It maps to a `knora-base:Resource` (DSP-API) or an `Asset` (SIPI/Ingest) in the DSP.
+ A `Record` is part of exactly 1 `Dataset` and may be part of 0-n `Collections`.
+
+Additionally, there are the entities `Person` and `Organization`:
+`Person` and `Organization` are entities that are independent of the `Research Project` hierarchy,
+and may be related to various entities within the hierarchy.
+
+## Top Level
+
+A set of metadata consists of the following top-level elements:
+
+- Umbrella Project
+- Project
+- Dataset
+- Collection
+- Record
+- Person
+- Organization
+
+Each of these elements is an entity identified by a unique identifier.
+Other elements can refer to these entities by their identifier.
+
+Any other metadata element may itself be a complex object,
+but it is always part of one of the top-level elements.
+Such elements do not have an identifier,
+but are identified by their position in the hierarchy.
+
+| Field | Type | Cardinality |
+|-------------------|-----------------|-------------|
+| `$schema` | string | 0-1 |
+| `umbrellaProject` | umbrellaProject | 0-1 |
+| `project` | project | 1 |
+| `datasets` | dataset[] | 1-n |
+| `collections` | collection[] | 0-n |
+| `records` | record[] | 0-n |
+| `persons` | person[] | 0-n |
+| `organizations` | organization[] | 0-n |
+
+!!! question
+Do we consider "permissions" as metadata?
+(Not as they are in the DSP, but as they will be in the archive;
+that is: "open", "restricted", "embargo", "metadata only".)
+If so, this should be added on each level, I suppose.
+
+!!! answer
+Yes, as COAR indicates; see [COAR Access Rights](https://vocabularies.coar-repositories.org/access_rights/).
+
+## Types
+
+### Entity Types
+
+#### Umbrella Project
+
+| Field | Type | Card. | Restrictions |
+|------------------------|---------------|-------|--------------------------------------------------------------|
+| `pid` | id | 1 | or `ARK`? -> probably use `pid` |
+| `__id` | string | 1 | |
+| `__type` | string | 1 | Literal 'UmbrellaProject' |
+| `name` | string | 1 | |
+| `projects` | id[] | 1-n | String containing the identifier of a project |
+| `description` | lang_string | 0-1 | |
+| `alternativeNames` | lang_string[] | 0-n | |
+| `url` | url | 0-1 | |
+| `contactPoint` | id | 0-1 | String containing the identifier of a person or organization |
+| `institutionalPartner` | id[] | 0-n | String containing the identifier of an organization |
+
+!!! question
+This raises the question of how to deal with multiple projects in an umbrella project.
+We probably want to keep one entry per project,
+so this leaves us with either duplicating the umbrella project metadata for each project,
+or having umbrella project metadata separately and only linking it from the project.
+The latter seems preferable,
+but then the question arises of who gets to edit the umbrella project metadata.
+For a first implementation, we could simply duplicate the metadata for each project,
+and later factor it out.
+
+!!! question
+What is the best name for `institutionalPartner`?
+AI suggested:
+
+- Affiliated Institution
+- Associated Body
+- Supporting Organization
+- Institutional Partner
+
+!!! answer
+We don't need `institutionalPartner` since `contactPoint` can be an organization or a person.
+
+!!! question
+How do we capture the time aspect of the data provenance and genesis in this context? Should this be here?
+Concretely, an umbrella project is often like a "timeline" of projects, or the "history" of a series of projects.
+
+!!! answer
+We don't need this information here on this level. The umbrella project needs to know what projects are under it and
+then it's a matter of displaying the timeline.
+
+To make the model of this entity as flexible as possible,
+most of the fields are optional.
+
+#### Project
+
+| Field | Type | Cardinality | Restrictions |
+|----------------------|---------------------|-------------|--------------------------------------------------------------|
+| `pid` | id | 1 | or `ARK`? -> probably use `pid` |
+| `__type` | string | 1 | Literal "Project" |
+| `shortcode` | string | 1 | 4 char hexadecimal |
+| `status` | string | 1 | Literal "Ongoing" or "Finished" |
+| `name` | string | 1 | |
+| `description` | lang_string | 1 | |
+| `startDate` | date | 1 | String of format "YYYY-MM-DD" |
+| `teaserText` | string | 1 | |
+| `url` | url | 1 | |
+| `howToCite` | string | 1 | |
+| `datasets` | id[] | 1-n | String containing the identifier of a dataset |
+| `keywords` | lang_string[] | 1-n | |
+| `disciplines` | lang_string / url[] | 1-n | |
+| `temporalCoverage` | lang_string / url[] | 1-n | |
+| `spatialCoverage` | url[] | 1-n | |
+| `funders` | id[] | 1-n | String containing the identifier of a person or organization |
+| `attributions` | attribution[] | 1-n | |
+| `endDate` | date | 0-1 | String of format "YYYY-MM-DD" |
+| `secondaryURL` | url | 0-1 | |
+| `dataManagementPlan` | dmp | 0-1 | |
+| `contactPoint` | id | 0-1 | String containing the identifier of a person or organization |
+| `publications` | publication[] | 0-n | |
+| `grants` | grant[] | 0-n | |
+| `alternativeNames` | lang_string[] | 0-n | |
+
+!!! question
+If we can have copyright/license on dataset level,
+do we want to have it on project level as well?
+
+!!! answer
+Since we have copyright/license on a record level, everything above should be a computed field if available and
+optionally added manually. And then it's a matter of displaying it.
+
+!!! question
+Do we still need funders if we have grants?
+
+!!! answer
+No, we don't need funders.
+
+!!! question
+What about projects that do not have funding?
+
+!!! answer
+Then it's self-funded.
+
+!!! question
+Do we want my proposed `attributions` field in the project?
+
+!!! answer
+Yes, but it should be a computed field if available and optionally added manually.
+
+!!! question
+Should we have an `abstract` field in the project, like we used to have in the dataset?
+
+!!! answer
+We should only have it in the project but not in the dataset anymore.
+
+#### Dataset
+
+| Field | Type | Cardinality | Restrictions | Remarks |
+|----------------|---------------|-------------|--------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `pid` | id | 1 | | or `ARK`? -> probably use `pid` |
+| `__id` | string | 1 | | |
+| `__type` | string | 1 | Literal "Dataset" | |
+| `title` | string | 1 | | may be auto-generated? -> No |
+| `typeOfData` | string[] | 1-n | Literal "XML", "Text", "Image", "Video", "Audio" | does this still make sense? should it be cardinality 1? -> This does still make sense but it should be computed if available and optionally added manually. And the cardinality needs to be 1-n |
+| `licenses` | license[] | 1-n | | should be computed from the records if available and optionally added manually. |
+| `copyright` | string[] | 1-n | | computed along with license -> should be computed from the records if available and optionally added manually. |
+| `attributions` | attribution[] | 1-n | | can this be computed? -> Yes, if available and optionally added manually. |
+| `howToCite` | string | 0-1 | | still wanted? -> A generated field along with the ARK. |
+| `description` | lang_string | 0-1 | | |
+| `dateCreated` | date | 0-1 | | |
+
+!!! question
+Are PIDs missing for umbrella-project, dataset and collection? Are generated how-to-cites missing for them as well?
+
+!!! answer
+Yes, we need PIDs for all levels (umbrella-project, dataset and collection).
+
+!!! note
+If we think of a dataset as something internal,
+we should limit the metadata to what is necessary for the system to work.
+Additionally, we may want to have some minimal descriptive metadata for the dataset,
+(like for the use case that a project once a year grabs a box of archival material and digitizes it).
+
+!!! question
+Do we need to store the license on the dataset level,
+or can we compute it from the records?
+If we store it on the dataset level,
+how do we deal with datasets that contain records with different licenses?
+
+!!! answer
+We compute it where available, with the option to add it manually. When records have different licenses, we display all of them.
+
+!!! question
+Do we need to store the language on the dataset level,
+or can we compute it from the records?
+If we store it on the dataset level,
+how do we deal with datasets that contain records in different languages?
+
+!!! answer
+We compute it where available, with the option to add it manually. When records have different languages, we display all of them.
+
+!!! question
+Do we need to store the attribution on the dataset level,
+or can we compute it from the records?
+If we store it on the dataset level,
+how do we deal with datasets that contain records with different attributions?
+
+!!! answer
+We compute it where available, with the option to add it manually. When records have different attributions, we display
+all of them. We need to keep in mind not to include ourselves in these computations.
+
+!!! question
+Do we need a reference to the records in the dataset?
+
+!!! answer
+Not at the moment. Unsure; to be determined.
+
+!!! question
+Does `dateCreated` suffice here? There were more date properties in the old model.
+
+!!! answer
+What is the meaning of `dateCreated` in this context?
+
+~~Datasets are for internal use; they serve to partition the data into manageable chunks. This is done both by type of
+data (RDF vs. assets), and by size.~~
+
+~~In some cases, there may be a "logical" grouping constituting a dataset, e.g. if data is digitized in a batch and there
+is a temporal separation between the batches. In these cases, the project may make use of the descriptive metadata of
+the dataset. But normally, the dataset is just a technical entity, and should not carry semantic information.~~
+
+A project can have more than one dataset if the project wishes and if this provides a meaningful grouping of the
+records, e.g., two researchers worked on one part of the data and two other researchers on the other part,
+or EKWS digitizes different boxes and each box becomes a dataset.
+A record can only be part of one dataset.
+
+#### Collection
+
+| Field | Type | Cardinality | Restrictions | Remarks |
+|--------------------|-------------------|-------------|--------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `pid` | id | 1 | | or `ARK`? -> probably use `pid` |
+| `__id` | string | 1 | | |
+| `__type` | string | 1 | Literal 'Collection' | |
+| `name` | string | 1 | | |
+| `description` | string / url | 1-n | | |
+| `typeOfData` | string[] | 1-n | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; does this still make sense? -> Maybe not. |
+| `licenses` | license[] | 1-n | | copied from dataset; should be computed from the records if available and optionally added manually. |
+| `copyright` | string[] | 1-n | | computed along with license -> should be computed from the records if available and optionally added manually. |
+| `languages` | lang_string[] | 1-n | | copied from dataset; does this make sense? -> computed if available and optionally added manually. |
+| `attributions` | attribution[] | 1-n | | copied from dataset; can this be calculated? -> Yes, if available and optionally added manually. |
+| `provenance` | string | 0-1 | | -> needed, see: [openAIRE Guidelines](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_source.html#dc-source) |
+| `distribution` | url | 0-1 | | copied from dataset; does this make sense? -> not needed |
+| `records` | id[] | 0-n | Record IDs | can be 0 in case it points to a collection |
+| `collections` | id[] | 0-n | Collection IDs | |
+| `alternativeNames` | lang_string[] | 0-n | | |
+| `keywords` | lang_string[] | 0-n | | does this make sense? -> Interesting for the search. |
+| `urls` | url[] | 0-n | | copied from dataset; |
+| `additional` | lang_string / url | 0-n | | copied from dataset; -> Probably not needed. |
+
+!!! question
+Do we need a reference to the records in the collection?
+
+!!! answer
+Yes, we would need that.
+
+#### Record
+
+| Field | Type | Cardinality | Restrictions | Remarks |
+|---------------------|-------------|-------------|--------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `__id` | string | 1 | | |
+| `__type` | string | 1 | Literal 'Record' | |
+| `pid` | id | 1 | | or `ARK`? -> probably use `pid` |
+| `label` | lang_string | 1 | | do we want this, or does it go too far? -> We want to keep it because it's the "name" of the record. But we can think about renaming it. |
+| `accessConditions` | string | 1 | Literal "open", "restricted" or "closed" | copied from dataset; change to proper terms -> open, restricted, embargoed, metadata-only and renaming `accessConditions` to `rights` to be in line with openAIRE. |
+| `embargoPeriodDate` | date | 0-1 | | -> needs to be added to be in line with openAIRE, e.g., ``` 2011-12-01 2012-12-01 ``` |
+| `publisher` | string | 1 | | should be DaSCH |
+| `license` | license | 1 | | copied from dataset; should be computed from the records -> No, you have to indicate the license here. Computation is not possible. |
+| `copyright` | string | 1 | | computed along with license -> No, you have to indicate the copyright here. Computation is not possible. |
+| `attribution` | attribution | 1 | | do we want this, or does it go too far? -> Yes |
+| `provenance` | string | 0-1 | | do we want this, or does it go too far? -> Yes, [openAIRE data-source](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_source.html#dc-source) |
+| `datePublished` | date | 0-1 | | copied from dataset; do they make sense? -> Yes |
+| `dateCreated` | date | 0-1 | | copied from dataset; do they make sense? -> Yes |
+| `dateModified` | date | 0-1 | | copied from dataset; do they make sense? -> Yes |
+| `typeOfData` | string | 0-1 | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; wanted? what values? -> Yes, type is computed and should represent: [openAIRE Resource Type](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_publicationtype.html#aire-resourcetype) |
+| `size` | string | 0-1 | | needs to be added, see: [openAIRE Size](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_size.html#dci-size) |
+| `audience` | string | 0-n | | needs to be added, see: [openAIRE Audience](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_audience.html#dct-audience) |
+
+!!! question
+How granular do we want to be with the metadata on the record level?
+
+!!! answer
+We need provenance,
+see: [openAIRE Source](https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/field_source.html#dc-source)
+
+!!! question
+If we have copyright, what is the purpose of attribution?
+
+!!! answer
+Copyright doesn't have anything to do with attribution. Attribution records who did something with the data. Copyright
+identifies the person or organization that holds the rights to this record and can grant others permission to do
+something with it, i.e. a license.
+
+#### Person
+
+| Field | Type | Cardinality | Restrictions | Remarks |
+|------------------|----------|-------------|----------------------------------------|---------|
+| `__id` | string | 1 | | |
+| `__type` | string | 1 | Literal 'Person' | |
+| `givenNames` | string[] | 1-n | | |
+| `familyNames` | string[] | 1-n | | |
+| `jobTitles` | string[] | 0-n | | |
+| `affiliations` | id[] | 0-n | Organization IDs | |
+| `address` | address | 0-1 | | |
+| `email` | string | 0-1 | | |
+| `secondaryEmail` | string | 0-1 | | |
+| `authorityRefs` | url[] | 0-n | References to external authority files | |
+
+#### Organization
+
+| Field | Type | Cardinality | Restrictions | Remarks |
+|-------------------|-------------|-------------|----------------------------------------|---------|
+| `__id` | string | 1 | | |
+| `__type` | string | 1 | Literal 'Organization' | |
+| `name` | string | 1 | | |
+| `url` | url | 1 | | |
+| `address` | address | 0-1 | | |
+| `email` | string | 0-1 | | |
+| `alternativeName` | lang_string | 0-1 | | |
+| `authorityRefs` | url[] | 0-n | References to external authority files | |
+
+### Value Types
+
+#### String with Language Tag (`lang_string`)
+
+Object with an ISO language code as key and a string as value.
+
+```json
+{
+ "en": "Lorem ipsum in English.",
+ "de": "Lorem ipsum auf Deutsch."
+}
+```
+
+#### Date
+
+String with the format `YYYY-MM-DD`.
+
+#### URL
+
+An object representing a URL.
+Depending on the `type` field,
+the URL may be a generic URL
+or a more specific link, like a PID
+or a reference to a resource in an external authority file.
+
+| Field | Type | Cardinality | Restrictions |
+|----------|--------|-------------|---------------------------------------------------------------------------------------------------------------------------------------------|
+| `__type` | string | 1 | Literal 'URL' |
+| `type` | string | 1 | Literal 'URL', 'Geonames', 'Pleiades', 'Skos', 'Periodo', 'Chronontology', 'GND', 'VIAF', 'Grid', 'ORCID', 'Creative Commons', 'DOI', 'ARK' |
+| `url` | string | 1 | |
+| `text` | string | 0-1 | |
+
+!!! question
+can we model different types of URLs in a more sensible way?
+
+!!! answer
+In the mid-term we should untangle this mess of URLs, ARKs, Geonames etc.
+
+#### Data Management Plan (`dmp`)
+
+| Field | Type | Cardinality | Restrictions |
+|-------------|---------|-------------|------------------------------|
+| `__type` | string | 1 | Literal 'DataManagementPlan' |
+| `available` | boolean | 0-1 | |
+| `url` | url | 0-1 | |
+
+!!! question
+Does the model for `Data Management Plan` still make sense?
+Could it be a string?
+Is "available" useful information?
+How do we ensure that either `available` or `url` is set?
+
+!!! answer
+If we cannot upload the DMP or provide a reference to a published DMP, then we don't need this.
+
+#### Publication
+
+| Field | Type | Cardinality | Restrictions |
+|--------|--------|-------------|--------------|
+| `text` | string | 1 | |
+| `url` | url | 0-1 | |
+
+#### Address
+
+| Field | Type | Cardinality | Restrictions |
+|--------------|--------|-------------|-------------------|
+| `__type` | string | 1 | Literal 'Address' |
+| `street` | string | 1 | |
+| `postalCode` | string | 1 | |
+| `locality` | string | 1 | |
+| `country` | string | 1 | |
+| `canton` | string | 0-1 | |
+| `additional` | string | 0-1 | |
+
+#### License
+
+| Field | Type | Cardinality | Restrictions |
+|-----------|--------|-------------|-------------------|
+| `__type` | string | 1 | Literal 'License' |
+| `license` | url | 1 | |
+| `date` | date | 1 | |
+| `details` | string | 0-1 | |
+
+!!! question
+Is this model up to date with our current understanding of licenses?
+Is `details` ever used?
+What is the purpose of `date` here?
+How does it relate to a copyright statement?
+
+!!! answer
+Licenses depend on dates. The `date` doesn't relate to a copyright statement.
+
+#### Attribution
+
+| Field | Type | Cardinality | Restrictions | Remark |
+|----------|--------|-------------|---------------------------|-----------------------------------|
+| `__type` | string | 1 | Literal 'Attribution' | |
+| `agent` | id | 1 | Person or Organization ID | Or can this only be person? -> No |
+| `roles` | string | 1-n | | |
+
+#### Grant
+
+| Field | Type | Cardinality | Restrictions |
+|-----------|--------|-------------|----------------------------|
+| `__type` | string | 1 | Literal 'Grant' |
+| `funders` | id[] | 1-n | Person or Organization IDs |
+| `number` | string | 0-1 | |
+| `name` | string | 0-1 | |
+| `url` | url | 0-1 | |
+
+## Entity-Relationship Diagram
+
+```mermaid
+erDiagram
+ umbrellaProject |o--|{ project : projects
+ project ||--|{ dataset : datasets
+ project ||--|| person : contactPoint
+ project ||--|| organization : contactPoint
+ project ||--|{ person : funders
+ project ||--|{ organization : funders
+ project |o--|{ collection : collections
+ dataset ||--|{ record : records
+ collection |o--o{ collection : collections
+ collection |o--o{ record : records
+ person ||--|{ organization : affiliations
+
+ umbrellaProject {
+ string __id "1"
+ string __type "1; Literal 'UmbrellaProject'"
+ string name "1"
+ id[] projects "1-n; Project IDs"
+ lang_string description "0-1"
+ lang_string[] alternativeNames "0-n"
+ url url "0-1"
+ id contactPoint "0-1"
+ id[] institutionalPartner "0-n; Organization IDs"
+ }
+
+ project {
+ string __id "1"
+ string __type "1; Literal 'Project'"
+ string shortcode "1"
+ string status "1; Literal 'Ongoing', 'Finished'"
+ string name "1"
+ lang_string description "1"
+ date startDate "1"
+ string teaserText "1"
+ url url "1"
+ string howToCite "1"
+ id[] datasets "1-n; Dataset IDs"
+ lang_string[] keywords "1-n"
+ lang_string_or_url[] disciplines "1-n"
+ lang_string_or_url[] temporalCoverage "1-n"
+ url[] spatialCoverage "1-n"
+ id[] funders "1-n; Person or Organization IDs"
+ attribution[] attributions "1-n"
+ date endDate "0-1"
+ url secondaryURL "0-1"
+ dmp dataManagementPlan "0-1"
+ id contactPoint "0-1"
+ publication[] publications "0-n"
+ grant[] grants "0-n"
+ lang_string[] alternativeNames "0-n"
+ }
+
+ dataset {
+ string __id "1"
+ string __type "1; Literal 'Dataset'"
+ string title "1"
+ string[] typeOfData "1-n; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'"
+ license[] licenses "1-n"
+ string[] copyright "1-n"
+ attribution[] attributions "1-n"
+ string howToCite "0-1"
+ lang_string description "0-1"
+ date dateCreated "0-1"
+ }
+
+ collection {
+ string __id "1"
+ string __type "1; Literal 'Collection'"
+ string name "1"
+ string accessConditions "1; Literal 'open', 'restricted' or 'closed'"
+ string provenance "0-1"
+ date datePublished "0-1"
+ date dateCreated "0-1"
+ date dateModified "0-1"
+ url distribution "0-1"
+ id[] records "0-n; Record IDs"
+ id[] collections "0-n; Collection IDs"
+ lang_string[] alternativeNames "0-n"
+ lang_string[] keywords "0-n"
+ url[] urls "0-n"
+ lang_string_or_url[] additional "0-n"
+ lang_string_or_url[] description "1-n"
+ string[] typeOfData "1-n; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'"
+ license[] licenses "1-n"
+ string[] copyright "1-n"
+ lang_string[] languages "1-n"
+ attribution[] attributions "1-n"
+ }
+
+ record {
+ string __id "1"
+ string __type "1; Literal 'Record'"
+ string pid "1"
+ lang_string label "1"
+ string accessConditions "1; Literal 'open', 'restricted' or 'closed'"
+ license license "1"
+ string copyright "1"
+ attribution attribution "1"
+ string provenance "0-1"
+ date datePublished "0-1"
+ date dateCreated "0-1"
+ date dateModified "0-1"
+ string typeOfData "0-1; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'"
+ }
+
+ person {
+ string __id "1"
+ string __type "1; Literal 'Person'"
+ string[] givenNames "1-n"
+ string[] familyNames "1-n"
+ string[] jobTitles "0-n"
+ id[] affiliations "0-n; Organization IDs"
+ address address "0-1"
+ string email "0-1"
+ string secondaryEmail "0-1"
+ url[] authorityRefs "0-n"
+ }
+
+ organization {
+ string __id "1"
+ string __type "1; Literal 'Organization'"
+ string name "1"
+ url url "1"
+ address address "0-1"
+ string email "0-1"
+ lang_string alternativeName "0-1"
+ url[] authorityRefs "0-n"
+ }
+```
+
+## Change Log
+
+- Made `Grant` a value type and removed it from the top level.
+- Added entity `umbrellaProject` to the top level.
+- Added entity `collection` to the top level.
+- Added entity `record` to the top level.
+- Added `copyright` to `dataset`.
+- Changed type of `abstract`/`description` in `dataset` to `lang_string`.
+- Changed cardinality of `abstract`/`description` in `dataset` to 1.
+- Changed cardinality of `howToCite` in `dataset` to 0-1.
+- Changed cardinality of `description` in `dataset` to 0-1.
+- Removed `accessConditions` from `dataset`.
+- Removed `status` from `dataset`.
+- Renamed `abstract` to `description` in `dataset`.
+- Removed `languages` from `dataset`.
+- Removed `datePublished` and `dateModified` from `dataset`.
+- Removed `distribution` from `dataset`.
+- Removed `additional` from `dataset`.
+- Removed `alternativeTitles` from `dataset`.
+- Removed `urls` from `dataset`.
diff --git a/docs/data/provisional-datamodel.md b/docs/data/provisional-datamodel.md
deleted file mode 100644
index ca104cb6..00000000
--- a/docs/data/provisional-datamodel.md
+++ /dev/null
@@ -1,599 +0,0 @@
-# Provisional Data Model
-
-!!! warning
- This document does _not_ represent the current state of the metadata model.
- It is a working document for planned upcoming changes to the metadata model.
-
-!!! note
- This model is an idealized version of the metadata model.
- With the current implementation that is entirely separate from the DSP,
- it is not feasible to implement metadata on the record level.
- Such a system may be implemented in the archive in the future,
- but for now, we will keep the metadata on the dataset level.
- A separate, simplified model for applying some of these changes,
- while remaining compatible with the current implementation,
- should be created alongside this model.
-
-## Overview
-
-The metadata model is a hierarchical structure of metadata elements.
-
-```mermaid
-
-flowchart TD
- hyper-project[Umbrella Project] -->|1-n| project[Research Project]
- project -->|1-n| dataset[Dataset]
- dataset -->|1-n| record[Record /
Resource]
- project -->|0-n| collection[Collection]
- collection --> collection
- hyper-project -->|0-n| collection
- collection --> record
-```
-
-- A `Umbrella Project` is optional and collects one or more `Research Projects`.
- It is typically of institutional nature,
- not directly tied to a specific funding grant,
- and may be long-lived.
- Examples are EKWS/CAS, BEOL or LIMC.
-- A `Research Project` is the main entity of the metadata model.
- It corresponds to a `project` in the DSP.
- It is typically tied to a specific funding grant,
- and hence has a limited lifetime of ~3-5 years;
- multiple funding rounds and a longer lifetime are possible.
- A `Research Project` is part of 0-1 `Umbrella Project`,
- it has 1-n `Datasets` and 0-n `Collections`.
-- A `Dataset` is a collection of `Records` within a `Research Project`.
- It is mostly meant for system-internal and technical use,
- and should not have particular semantics or a "historical meaning" in the context of the project.
- A `Dataset` is part of exactly 1 `Research Project`
- and contains 1-n `Records`.
-- A `Collection` is also a collection of `Records` within a `Research Project`.
- It is meant for semantic grouping of `Records` within a `Research Project`,
- and may have a "historical meaning" in the context of the project.
- Examples may be physical collections such as a person's "Nachlass" in an archive,
- or groupings of records based on a specific research question within a project.
- A `Collection` is part of at least 1 `Research Project`, `Umbrella Project` or `Collection`,
- but can be part of multiple. It may either contain 0-n `Collections` or 1-n `Records`.
-- A `Record` is a single resource within a `Dataset`.
- It represents a single entity, and the smallest unit that can meaningfully have an identifier.
- It maps to a `knora-base:Resource` (DSP-API) or an `Asset` (SIPI/Ingest) in the DSP.
- A `Record` is part of exactly 1 `Dataset` and may be part of 0-n `Collections`.
-
-Additionally, there are the entities `Person` and `Organization`.
-They are independent of the `Research Project` hierarchy
-and may be related to various entities within it.
-
-
-## Top Level
-
-A set of metadata consists of the following top-level elements:
-
-- Umbrella Project
-- Project
-- Dataset
-- Collection
-- Record
-- Person
-- Organization
-
-Each of these elements is an entity identified by a unique identifier.
-Other elements can refer to these entities by their identifier.
-
-Any other metadata element may itself be a complex object,
-but it is always part of one of the top-level elements.
-Such elements do not have an identifier,
-but are identified by their position in the hierarchy.
-
-| Field | Type | Cardinality |
-| ----------------- | --------------- | ----------- |
-| `$schema` | string | 0-1 |
-| `umbrellaProject` | umbrellaProject | 0-1 |
-| `project` | project | 1 |
-| `datasets` | dataset[] | 1-n |
-| `collections` | collection[] | 0-n |
-| `records` | record[] | 0-n |
-| `persons` | person[] | 0-n |
-| `organizations` | organization[] | 0-n |
-
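Because entities refer to each other by identifier, a consumer of such a metadata set would typically need to build a lookup table first. The following is a minimal sketch of that idea, assuming the metadata has been parsed into a Python dictionary with the top-level keys from the table above and that collections reference records and collections through `records` and `collections` id lists. The function names and the sample data are hypothetical and not part of the model; `umbrellaProject` and `project` are single objects and are ignored here.

```python
from typing import Any

# Top-level keys that hold lists of identifiable entities (taken from the table above).
ENTITY_LISTS = ("datasets", "collections", "records", "persons", "organizations")


def build_index(metadata: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map each `__id` found in the top-level entity lists to its entity object."""
    index: dict[str, dict[str, Any]] = {}
    for key in ENTITY_LISTS:
        for entity in metadata.get(key, []):
            index[entity["__id"]] = entity
    return index


def dangling_collection_refs(metadata: dict[str, Any]) -> list[str]:
    """Return ids referenced by collections that match no known entity."""
    index = build_index(metadata)
    dangling: list[str] = []
    for collection in metadata.get("collections", []):
        refs = collection.get("records", []) + collection.get("collections", [])
        dangling.extend(ref for ref in refs if ref not in index)
    return dangling


# Hypothetical, heavily abbreviated metadata set; most required fields are omitted.
example = {
    "records": [{"__id": "record-1", "__type": "Record"}],
    "collections": [
        {
            "__id": "collection-1",
            "__type": "Collection",
            "records": ["record-1", "record-2"],
            "collections": [],
        },
    ],
}

print(dangling_collection_refs(example))
```

In this sample, `record-2` is referenced by the collection but never defined, so it is the only id reported.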
-
-!!! question
- Do we consider "permissions" as metadata?
- (Not as they are in the DSP, but as they will be in the archive;
- that is: "open", "restricted", "embargo", "metadata only".)
- If so, this should be added on each level, I suppose.
-
-
-## Types
-
-### Entity Types
-
-#### Umbrella Project
-
-| Field | Type | Card. | Restrictions |
-| ---------------------- | ------------- | ----- | ------------------------------------------------------------ |
-| `__id` | string | 1 | |
-| `__type` | string | 1 | Literal 'UmbrellaProject' |
-| `name` | string | 1 | |
-| `projects` | id[] | 1-n | String containing the identifier of a project |
-| `description` | lang_string | 0-1 | |
-| `alternativeNames` | lang_string[] | 0-n | |
-| `url` | url | 0-1 | |
-| `contactPoint` | id | 0-1 | String containing the identifier of a person or organization |
-| `institutionalPartner` | id[] | 0-n | String containing the identifier of an organization |
-
-!!! question
- This opens up the question of how to deal with multiple projects in an umbrella project.
- We probably want to keep one entry per project,
- so this leaves us with either duplicating the umbrella project metadata for each project,
- or having umbrella project metadata separately and only linking it from the project.
- The latter seems preferable,
- but then the question arises who gets to edit the umbrella project metadata.
- For a first implementation, we could simply duplicate the metadata for each project,
- and later factor it out.
-
-!!! question
- What is the best name for `institutionalPartner`?
- AI suggested:
- - Affiliated Institution
- - Associated Body
- - Supporting Organization
- - Institutional Partner
-
-!!! question
- How do we capture the time aspect of the data provenance and genesis in this context? Should this be here?
- Concretely, an umbrella project is often like a "timeline" of projects, or the "history" of a series of projects.
-
-To make the model of this entity as flexible as possible,
-most of the fields are optional.
-
-
-#### Project
-
-| Field | Type | Cardinality | Restrictions |
-| -------------------- | ------------------- | ----------- | ------------------------------------------------------------ |
-| `__type` | string | 1 | Literal "Project" |
-| `shortcode` | string | 1 | 4 char hexadecimal |
-| `status` | string | 1 | Literal "Ongoing" or "Finished" |
-| `name` | string | 1 | |
-| `description` | lang_string | 1 | |
-| `startDate` | date | 1 | String of format "YYYY-MM-DD" |
-| `teaserText` | string | 1 | |
-| `url` | url | 1 | |
-| `howToCite` | string | 1 | |
-| `datasets` | id[] | 1-n | String containing the identifier of a dataset |
-| `keywords` | lang_string[] | 1-n | |
-| `disciplines` | lang_string / url[] | 1-n | |
-| `temporalCoverage` | lang_string / url[] | 1-n | |
-| `spatialCoverage` | url[] | 1-n | |
-| `funders` | id[] | 1-n | String containing the identifier of a person or organization |
-| `attributions` | attribution[] | 1-n | |
-| `endDate` | date | 0-1 | String of format "YYYY-MM-DD" |
-| `secondaryURL` | url | 0-1 | |
-| `dataManagementPlan` | dmp | 0-1 | |
-| `contactPoint` | id | 0-1 | String containing the identifier of a person or organization |
-| `publications` | publication[] | 0-n | |
-| `grants` | grant[] | 0-n | |
-| `alternativeNames` | lang_string[] | 0-n | |
-
-!!! question
- If we can have copyright/license on dataset level,
- do we want to have it on project level as well?
- In any case, it should be computed from the datasets/records.
-
-!!! question
- Do we still need funders if we have grants?
-
-!!! question
- What about projects that do not have funding?
-
-!!! question
- Do we want my proposed `attributions` field on project?
-
-!!! question
- Should we have an `abstract` field in the project, like we used to have in the dataset?
-
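The `shortcode` and `status` restrictions in the Project table can be checked mechanically. This is a minimal sketch, assuming a project parsed into a Python dictionary; the function name is hypothetical, and accepting both upper- and lowercase hexadecimal digits is an assumption, since the table only says "4 char hexadecimal".

```python
import re

# Assumption: both upper- and lowercase hex digits are acceptable.
SHORTCODE_PATTERN = re.compile(r"[0-9A-Fa-f]{4}")
ALLOWED_STATUS = {"Ongoing", "Finished"}


def project_problems(project: dict) -> list[str]:
    """Return human-readable problems for the `shortcode` and `status` fields."""
    problems: list[str] = []
    if not SHORTCODE_PATTERN.fullmatch(str(project.get("shortcode", ""))):
        problems.append("shortcode must be 4 hexadecimal characters")
    if project.get("status") not in ALLOWED_STATUS:
        problems.append('status must be "Ongoing" or "Finished"')
    return problems


# Hypothetical sample values, not taken from a real project.
print(project_problems({"shortcode": "0803", "status": "Ongoing"}))
print(project_problems({"shortcode": "08G3", "status": "Paused"}))
```

Returning a list of problems rather than raising on the first one lets a caller report everything that is wrong with an entry at once.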
-
-#### Dataset
-
-| Field | Type | Cardinality | Restrictions | Remarks |
-| -------------- | ------------- | ----------- | ------------------------------------------------ | ------------------------------------------------------- |
-| `__id` | string | 1 | | |
-| `__type` | string | 1 | Literal "Dataset" | |
-| `title` | string | 1 | | may be auto-generated? |
-| `typeOfData` | string[] | 1-n | Literal "XML", "Text", "Image", "Video", "Audio" | does this still make sense? should it be cardinality 1? |
-| `licenses` | license[] | 1-n | | should be computed from the records |
-| `copyright` | string[] | 1-n | | computed along with license |
-| `attributions` | attribution[] | 1-n | | can this be computed? |
-| `howToCite` | string | 0-1 | | still wanted? |
-| `description` | lang_string | 0-1 | | |
-| `dateCreated` | date | 0-1 | | |
-
-!!! note
- If we think of a dataset as something internal,
- we should limit the metadata to what is necessary for the system to work.
- Additionally, we may want to have some minimal descriptive metadata for the dataset
- (like for the use case that a project once a year grabs a box of archival material and digitizes it).
-
-!!! question
- Do we need to store the license on the dataset level,
- or can we compute it from the records?
- If we store it on the dataset level,
- how do we deal with datasets that contain records with different licenses?
-
-!!! question
- Do we need to store the language on the dataset level,
- or can we compute it from the records?
- If we store it on the dataset level,
- how do we deal with datasets that contain records in different languages?
-
-!!! question
- Do we need to store the attribution on the dataset level,
- or can we compute it from the records?
- If we store it on the dataset level,
- how do we deal with datasets that contain records with different attributions?
-
-!!! question
- Do we need a reference to the records in the dataset?
-
-!!! question
- Does `dateCreated` suffice here? There were more date properties in the old model.
-
-Datasets are for internal use:
-they serve to partition the data into manageable chunks.
-This is done both by type of data (RDF vs. assets), and by size.
-
-In some cases, there may be a "logical" grouping that constitutes a dataset,
-e.g. if data is digitized in a batch and there is a temporal separation between the batches.
-In these cases, the project may make use of the descriptive metadata of the dataset.
-But normally, the dataset is just a technical entity, and should not carry semantic information.
-
-#### Collection
-
-| Field | Type | Cardinality | Restrictions | Remarks |
-| ------------------ | ----------------- | ----------- | ------------------------------------------------ | -------------------------------------------------------- |
-| `__id` | string | 1 | | |
-| `__type` | string | 1 | Literal 'Collection' | |
-| `name` | string | 1 | | |
-| `description` | string / url | 1-n | | |
-| `typeOfData` | string[] | 1-n | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; does this still make sense? |
-| `licenses` | license[] | 1-n | | copied from dataset; should be computed from the records |
-| `copyright` | string[] | 1-n | | computed along with license |
-| `languages` | lang_string[] | 1-n | | copied from dataset; does this make sense? |
-| `attributions` | attribution[] | 1-n | | copied from dataset; can this be calculated? |
-| `provenance` | string | 0-1 | | |
-| `distribution` | url | 0-1 | | copied from dataset; does this make sense? |
-| `records` | id[] | 0-n | Record IDs | can be 0 in case it points to a collection |
-| `collections` | id[] | 0-n | Collection IDs | |
-| `alternativeNames` | lang_string[] | 0-n | | |
-| `keywords` | lang_string[] | 0-n | | does this make sense? |
-| `urls` | url[] | 0-n | | copied from dataset; |
-| `additional` | lang_string / url | 0-n | | copied from dataset; |
-
-
-!!! question
- Do we need a reference to the records in the collection?
-
-
-#### Record
-
-| Field | Type | Cardinality | Restrictions | Remarks |
-| ------------------ | ----------- | ----------- | ------------------------------------------------ | -------------------------------------------------------- |
-| `__id` | string | 1 | | |
-| `__type` | string | 1 | Literal 'Record' | |
-| `pid` | id | 1 | | or `ARK`? |
-| `label` | lang_string | 1 | | do we want this, or does it go too far? |
-| `accessConditions` | string | 1 | Literal "open", "restricted" or "closed" | copied from dataset; change to proper terms |
-| `license` | license | 1 | | copied from dataset; should be computed from the records |
-| `copyright` | string | 1 | | computed along with license |
-| `attribution` | attribution | 1 | | do we want this, or does it go too far? |
-| `provenance` | string | 0-1 | | do we want this, or does it go too far? |
-| `datePublished` | date | 0-1 | | copied from dataset; do they make sense? |
-| `dateCreated` | date | 0-1 | | copied from dataset; do they make sense? |
-| `dateModified` | date | 0-1 | | copied from dataset; do they make sense? |
-| `typeOfData` | string | 0-1 | Literal "XML", "Text", "Image", "Video", "Audio" | copied from dataset; wanted? what values? |
-
-!!! question
- How granular do we want to be with the metadata on the record level?
-
-!!! question
- If we have copyright, what is the purpose of attribution?
-
-
-#### Person
-
-| Field | Type | Cardinality | Restrictions | Remarks |
-| ---------------- | -------- | ----------- | -------------------------------------- | ------- |
-| `__id` | string | 1 | | |
-| `__type` | string | 1 | Literal 'Person' | |
-| `givenNames` | string[] | 1-n | | |
-| `familyNames` | string[] | 1-n | | |
-| `jobTitles` | string[] | 0-n | | |
-| `affiliations` | id[] | 0-n | Organization IDs | |
-| `address` | address | 0-1 | | |
-| `email` | string | 0-1 | | |
-| `secondaryEmail` | string | 0-1 | | |
-| `authorityRefs` | url[] | 0-n | References to external authority files | |
-
-
-#### Organization
-
-| Field | Type | Cardinality | Restrictions | Remarks |
-| ----------------- | ----------- | ----------- | -------------------------------------- | ------- |
-| `__id` | string | 1 | | |
-| `__type` | string | 1 | Literal 'Organization' | |
-| `name` | string | 1 | | |
-| `url` | url | 1 | | |
-| `address` | address | 0-1 | | |
-| `email` | string | 0-1 | | |
-| `alternativeName` | lang_string | 0-1 | | |
-| `authorityRefs` | url[] | 0-n | References to external authority files | |
-
-
-### Value Types
-
-#### String with Language Tag (`lang_string`)
-
-Object with an ISO language code as key and a string as value.
-
-```json
-{
- "en": "Lorem ipsum in English.",
- "de": "Lorem ipsum auf Deutsch."
-}
-```
-
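A consumer usually has to pick one string from such an object for display. The following is a minimal sketch of a possible selection strategy, assuming the `lang_string` has been parsed into a Python dictionary. The function name, the preference order, and the alphabetical fallback are illustrative assumptions and not part of the model.

```python
def pick_language(value: dict[str, str], preferred: tuple[str, ...] = ("en",)) -> str:
    """Return the string for the first preferred language code that is present."""
    if not value:
        raise ValueError("lang_string must contain at least one entry")
    for code in preferred:
        if code in value:
            return value[code]
    # Assumption: fall back to the alphabetically first code so the result is deterministic.
    return value[min(value)]


text = {
    "en": "Lorem ipsum in English.",
    "de": "Lorem ipsum auf Deutsch.",
}

print(pick_language(text, ("fr", "de")))
```

Here neither French nor any other first choice is available, so the German string is returned as the second preference.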
-
-#### Date
-
-String with the format `YYYY-MM-DD`.
-
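A strict check of this format could look like the sketch below. It assumes Python as the consuming language, and the function name is hypothetical. The regular expression is there because `date.fromisoformat` in recent Python versions also accepts other ISO 8601 forms (for example without hyphens), which the format above does not allow.

```python
import re
from datetime import date

# Exactly four, two and two ASCII digits separated by hyphens.
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_date(value: str) -> date:
    """Parse a `YYYY-MM-DD` string, rejecting other layouts and impossible dates."""
    if not DATE_PATTERN.fullmatch(value):
        raise ValueError(f"not in YYYY-MM-DD format: {value!r}")
    # Raises ValueError for calendar-invalid dates such as 2024-02-30.
    return date.fromisoformat(value)


print(parse_date("2024-11-26"))
```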
-
-#### URL
-
-An object representing a URL.
-Depending on the `type` field,
-the URL may be a generic URL
-or a more specific link, like a PID
-or a reference to a resource in an external authority file.
-
-
-| Field | Type | Cardinality | Restrictions |
-| -------- | ------ | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
-| `__type` | string | 1 | Literal 'URL' |
-| `type` | string | 1 | Literal 'URL', 'Geonames', 'Pleiades', 'Skos', 'Periodo', 'Chronontology', 'GND', 'VIAF', 'Grid', 'ORCID', 'Creative Commons', 'DOI', 'ARK' |
-| `url` | string | 1 | |
-| `text` | string | 0-1 | |
-
-!!! question
- Can we model different types of URLs in a more sensible way?
-
-
-#### Data Management Plan (`dmp`)
-
-| Field | Type | Cardinality | Restrictions |
-| ----------- | ------- | ----------- | ---------------------------- |
-| `__type` | string | 1 | Literal 'DataManagementPlan' |
-| `available` | boolean | 0-1 | |
-| `url` | url | 0-1 | |
-
-
-!!! question
- Does the model for `Data Management Plan` still make sense?
- Could it be a string?
- Is "available" useful information?
- How do we ensure that either `available` or `url` is set?
-
-
-#### Publication
-
-| Field | Type | Cardinality | Restrictions |
-| ------ | ------ | ----------- | ------------ |
-| `text` | string | 1 | |
-| `url` | url | 0-1 | |
-
-
-#### Address
-
-| Field | Type | Cardinality | Restrictions |
-| ------------ | ------ | ----------- | ----------------- |
-| `__type` | string | 1 | Literal 'Address' |
-| `street` | string | 1 | |
-| `postalCode` | string | 1 | |
-| `locality` | string | 1 | |
-| `country` | string | 1 | |
-| `canton` | string | 0-1 | |
-| `additional` | string | 0-1 | |
-
-
-#### License
-
-| Field | Type | Cardinality | Restrictions |
-| --------- | ------ | ----------- | ----------------- |
-| `__type` | string | 1 | Literal 'License' |
-| `license` | url | 1 | |
-| `date` | date | 1 | |
-| `details` | string | 0-1 | |
-
-!!! question
- Is this model up to date with our current understanding of licenses?
- Is `details` ever used?
- What is the purpose of `date` here?
- How does it relate to a copyright statement?
-
-
-#### Attribution
-
-| Field | Type | Cardinality | Restrictions | Remark |
-| -------- | ------ | ----------- | ------------------------- | --------------------------- |
-| `__type` | string | 1 | Literal 'Attribution' | |
-| `agent` | id | 1 | Person or Organization ID | Or can this only be person? |
-| `roles` | string | 1-n | | |
-
-
-#### Grant
-
-| Field | Type | Cardinality | Restrictions |
-| --------- | ------ | ----------- | -------------------------- |
-| `__type` | string | 1 | Literal 'Grant' |
-| `funders` | id[] | 1-n | Person or Organization IDs |
-| `number` | string | 0-1 | |
-| `name` | string | 0-1 | |
-| `url` | url | 0-1 | |
-
-
-## Entity-Relationship Diagram
-
-```mermaid
-erDiagram
- umbrellaProject |o--|{ project : projects
- project ||--|{ dataset : datasets
- project ||--|| person : contactPoint
- project ||--|| organization : contactPoint
- project ||--|{ person : funders
- project ||--|{ organization : funders
- project |o--|{ collection : collections
- dataset ||--|{ record : records
- collection |o--o{ collection : collections
- collection |o--o{ record : records
- person ||--|{ organization : affiliations
-
- umbrellaProject {
- string __id "1"
- string __type "1; Literal 'UmbrellaProject'"
- string name "1"
- id[] projects "1-n; Project IDs"
- lang_string description "0-1"
- lang_string[] alternativeNames "0-n"
- url url "0-1"
- id contactPoint "0-1"
- id[] institutionalPartner "0-n; Organization IDs"
- }
-
- project {
- string __id "1"
- string __type "1; Literal 'Project'"
- string shortcode "1"
- string status "1; Literal 'Ongoing', 'Finished'"
- string name "1"
- lang_string description "1"
- date startDate "1"
- string teaserText "1"
- url url "1"
- string howToCite "1"
- id[] datasets "1-n; Dataset IDs"
- lang_string[] keywords "1-n"
- lang_string_or_url[] disciplines "1-n"
- lang_string_or_url[] temporalCoverage "1-n"
- url[] spatialCoverage "1-n"
- id[] funders "1-n; Person or Organization IDs"
- attribution[] attributions "1-n"
- date endDate "0-1"
- url secondaryURL "0-1"
- dmp dataManagementPlan "0-1"
- id contactPoint "0-1"
- publication[] publications "0-n"
- grant[] grants "0-n"
- lang_string[] alternativeNames "0-n"
- }
-
- dataset {
- string __id "1"
- string __type "1; Literal 'Dataset'"
- string title "1"
- string[] typeOfData "1-n; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'"
- license[] licenses "1-n"
- string[] copyright "1-n"
- attribution[] attributions "1-n"
- string howToCite "0-1"
- lang_string description "0-1"
- date dateCreated "0-1"
- }
-
- collection {
- string __id "1"
- string __type "1; Literal 'Collection'"
- string name "1"
- string accessConditions "1; Literal 'open', 'restricted' or 'closed'"
- string provenance "0-1"
- date datePublished "0-1"
- date dateCreated "0-1"
- date dateModified "0-1"
- url distribution "0-1"
- id[] records "0-n; Record IDs"
- id[] collections "0-n; Collection IDs"
- lang_string[] alternativeNames "0-n"
- lang_string[] keywords "0-n"
- url[] urls "0-n"
- lang_string_or_url[] additional "0-n"
- lang_string_or_url[] description "1-n"
- string[] typeOfData "1-n; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'"
- license[] licenses "1-n"
- string[] copyright "1-n"
- lang_string[] languages "1-n"
- attribution[] attributions "1-n"
- }
-
- record {
- string __id "1"
- string __type "1; Literal 'Record'"
- string pid "1"
- lang_string label "1"
- string accessConditions "1; Literal 'open', 'restricted' or 'closed'"
- license license "1"
- string copyright "1"
- attribution attribution "1"
- string provenance "0-1"
- date datePublished "0-1"
- date dateCreated "0-1"
- date dateModified "0-1"
- string typeOfData "0-1; Literal 'XML', 'Text', 'Image', 'Video', 'Audio'"
- }
-
- person {
- string __id "1"
- string __type "1; Literal 'Person'"
- string[] givenNames "1-n"
- string[] familyNames "1-n"
- string[] jobTitles "0-n"
- id[] affiliations "0-n; Organization IDs"
- address address "0-1"
- string email "0-1"
- string secondaryEmail "0-1"
- url[] authorityRefs "0-n"
- }
-
- organization {
- string __id "1"
- string __type "1; Literal 'Organization'"
- string name "1"
- url url "1"
- address address "0-1"
- string email "0-1"
- lang_string alternativeName "0-1"
- url[] authorityRefs "0-n"
- }
-```
-
-
-
-## Change Log
-
-
-- Make `Grant` a value type and remove it from the top level.
-- Added entity `umbrellaProject` to the top level.
-- Added entity `collection` to the top level.
-- Added entity `record` to the top level.
-- Added `copyright` to `dataset`.
-- Changed type of `abstract`/`description` in `dataset` to `lang_string`.
-- Changed cardinality of `abstract`/`description` in `dataset` to 1.
-- Changed cardinality of `howToCite` in `dataset` to 0-1.
-- Changed cardinality of `description` in `dataset` to 0-1.
-- Removed `accessConditions` from `dataset`.
-- Removed `status` from `dataset`.
-- Renamed `abstract` to `description` in `dataset`.
-- Removed `languages` from `dataset`.
-- Removed `datePublished` and `dateModified` from `dataset`.
-- Removed `distribution` from `dataset`.
-- Removed `additional` from `dataset`.
-- Removed `alternativeTitles` from `dataset`.
-- Removed `urls` from `dataset`.
diff --git a/docs/index.md b/docs/index.md
index d5e22889..8cb5cd09 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,23 +1,48 @@
-This page provides documentation for the DSP-META repository.
+# DSP Metadata
-The repository contains all metadata to projects deposited on the DaSCH Service Platform (DSP),
-as well as the code of the [DSP Metadata Browser](https://meta.dasch.swiss).
+The dsp-meta repository contains the code of the [DSP Metadata Browser](https://meta.dasch.swiss),
+as well as all metadata from projects deposited on the DaSCH Service Platform (DSP).
## DSP Metadata
+This documentation provides an overview of the metadata model used by the DSP to manage and describe
+research data in the humanities. Our vision is to fully capture the provenance of research data—detailing
+its origins, how it was created, and how it has been used over time.
+
+Humanities research projects are inherently diverse and often span multiple years or even decades.
+Many of these projects receive funding from various grants and different funders throughout their lifecycle.
+Additionally, the researchers involved in creating and reusing the data may change over time, reflecting
+the evolving nature of academic collaboration.
+
+Understanding the complex history of research data is crucial for transparency, reproducibility, and future scholarship.
+The DSP metadata model is designed to accommodate this complexity by meticulously recording the provenance of data. It
+tracks:
+
+- Funding Sources: Documenting the multiple grants and funders that have supported the project over time.
+- Research Personnel: Keeping a record of all researchers who have contributed to or utilized the data, acknowledging
+ the shifts in team composition.
+- Data Lifecycle: Outlining how the data was created, modified, and reused, providing a comprehensive view of its
+ evolution.
+
+By capturing this rich contextual information, we aim to provide a robust framework that supports the integrity and
+longevity of humanities research data. Whether you are a researcher contributing new data or a scholar exploring
+existing datasets, this documentation will guide you through our metadata practices and help you understand the stories
+behind the data.
+
### Consuming Metadata
-If you are interested in viewing the metadata in human-readable form,
+If you are interested in viewing the metadata in human-readable form,
you can visit the [DSP Metadata Browser](https://meta.dasch.swiss).
-If you are interested in re-using our metadata, you can find extensive documentation [here](data/current-datamodel.md).
+If you are interested in re-using our metadata, you can find extensive documentation [here](data/current-datamodel.md),
+and the work-in-progress documentation of our future data-model [here](data/future-datamodel.md).
The metadata itself can be found [here](https://github.com/dasch-swiss/dsp-meta/tree/main/data/json)
or requested over the API as described [here](data/api.md).
### Adding Metadata
-For adding metadata, please see [here](adding-metadata.md).
+For adding metadata, please see [here](data/adding-metadata.md).
## Code Documentation
diff --git a/mkdocs.yml b/mkdocs.yml
index f8ced1ff..d021ce48 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -5,8 +5,8 @@ nav:
- Consuming Metadata:
- Metadata API: data/api.md
- Current Data Model: data/current-datamodel.md
- - Provisional Data Model: data/provisional-datamodel.md
- - Adding Metadata: adding-metadata.md
+ - Future Data Model: data/future-datamodel.md
+ - Adding Metadata: data/adding-metadata.md
- Code Documentation:
- Overview: code/overview.md
- Front End: code/front-end.md