
Commit

Merge branch 'dev' into 'main'
Merge dev into main for documentation release v1.0.0

See merge request hmc/hmc-public/unhide/documentation!5
broeder-j committed Oct 11, 2023
2 parents 9df0cf2 + f984f78 commit 139aedd
Showing 85 changed files with 9,186 additions and 46 deletions.
22 changes: 11 additions & 11 deletions README.md
@@ -10,7 +10,7 @@ Volker Hofmann [<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/0

Research across the Helmholtz Association (HGF) depends and thrives on a complex network of inter- and multidisciplinary collaborations which spans across its 18 Centres and beyond.

-However, the (meta)data generated through the HGF's research and operations is typically siloed within institutional infrastructure and often within individual teams. The result is that the wealth of the HGF's (meta)data is stored and maintained in a scattered manner, and cannot be used to its full value to scientists, managers, stratgists, and policy makers.
+However, the (meta)data generated through the HGF's research and operations is typically siloed within institutional infrastructure and often within individual teams. The result is that the wealth of the HGF's (meta)data is stored and maintained in a scattered manner, and cannot be used to its full value by scientists, managers, strategists, and policy makers.

To address this challenge, the Helmholtz Metadata Collaboration (HMC) is launching the **unified Helmholtz Information and Data Exchange (unHIDE)**. This initiative seeks to create a lightweight and sustainable interoperability layer to interlink data infrastructures and provide greater, cross-organisational access to the HGF's (meta)data and information assets. Using proven and globally adopted knowledge graph technology (Box 1), unHIDE will develop a comprehensive association-wide Knowledge Graph (KG) the "Helmholtz-KG": a solution to connect (meta)data, information, and knowledge.

@@ -21,12 +21,12 @@ To address this challenge, the Helmholtz Metadata Collaborati
> - A "knowledge graph" uses such a graph structure to capture knowledge about how a collection of things (represented as nodes) relate to one another (via edges). This helps organisations keep track of their collective knowledge, especially in complex and rapidly changing scenarios.
> - Social networks are perhaps the best-known graphs; they store knowledge about who knows whom and how, what their interests are, what groups they belong to, and what content they create and interact with.
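The node-and-edge idea described in Box 1 can be sketched in a few lines of Python. This is a toy illustration only (the entities and relations are invented, and this is not part of the unHIDE codebase); it shows how labelled edges let us answer questions that siloed tables cannot easily express:

```python
# Toy knowledge graph: each edge is a (subject, relation, object) triple.
triples = [
    ("Alice", "memberOf", "Helmholtz Centre A"),
    ("Alice", "created", "Dataset X"),
    ("Dataset X", "about", "Ocean Temperature"),
    ("Bob", "memberOf", "Helmholtz Centre A"),
]

def neighbours(node, relation=None):
    """Nodes directly reachable from `node`, optionally filtered by edge label."""
    return [o for s, p, o in triples if s == node and (relation is None or p == relation)]

# Traversing edges answers "what is Alice connected to?" and "what did Alice create?".
print(neighbours("Alice"))
print(neighbours("Alice", "created"))
```

Real knowledge graphs store such triples in dedicated databases and query them with languages like SPARQL, but the underlying data model is exactly this simple.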
-With the implementation of the Helmholtz-KG, unHIDE will create substantial additinal value for the Helmholtz digital ecosystem and its interconnectivity:
+With the implementation of the Helmholtz-KG, unHIDE will create substantial additional value for the Helmholtz digital ecosystem and its interconnectivity:

**With the development of the Helmholtz-KG, unHide will:**
- increase discoverability and actionability of HGF data across the whole Association*
- motivate enhancement of (meta)data quality [1] and interoperability
-- provide overviews and diagnositcs of the HGF dataspace and digital assets
+- provide overviews and diagnostics of the HGF dataspace and digital assets
- allow for traceable and reproducible recovery of (meta)data to enhance research
- support connectivity of HGF data to interact with global infrastructures and projects
- act as a central access and distribution point for stakeholders within and beyond the HGF
@@ -47,18 +47,18 @@ The Helmholtz Knowledge Graph (Helmholtz-KG) aims to enhance the HGF's digital c
>
> **[Learn more: https://www.w3.org/2013/data/](https://www.w3.org/2013/data/)**
-To ensure ease of use, the Helmholtz-KG will be based on a lightweight and internationally adopted interoperabiliy architecture based on schema.org semantics and JSON-LD serialisation [2]. This architecture widely used by data producers - including public, private, and governmental data systems - to link and expose scattered, diverse digital assets. By reusing this architecture, unHIDE will ensure that the Helmholtz-KG is able to natively interoperate with global systems.
+To ensure ease of use, the Helmholtz-KG will be built on a lightweight and internationally adopted interoperability architecture based on schema.org semantics and JSON-LD serialisation [2]. This architecture is widely used by data producers - including public, private, and governmental data systems - to link and expose scattered, diverse digital assets. By reusing this architecture, unHIDE will ensure that the Helmholtz-KG can natively interoperate with global systems.
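A JSON-LD record using schema.org semantics is plain JSON with an `@context` and `@type`, so standard tooling handles it directly. The sketch below is hypothetical (the dataset name, creator, and other field values are invented for illustration, not drawn from a real Helmholtz holding):

```python
import json

# Hypothetical schema.org Dataset record; all values are illustrative.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example ocean temperature time series",
    "description": "Illustrative JSON-LD record in the schema.org vocabulary.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Person", "name": "Jane Doe"},
}

# JSON-LD is ordinary JSON, so serialisation needs no special library.
print(json.dumps(dataset, indent=2))
```

Because the serialisation is plain JSON, any producer that can emit JSON can participate in the interoperability layer; the shared `@context` is what makes the records semantically comparable across systems.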

### Modular design & Extensibility
-While the foundation of the Helmholtz-KG will reuse standard web architectural elements and proven, globally adopted conventions, the KG itself is modular by nature: Graphs can be merged, split, independently managed, and readily interfaced with other digital resources without compromising core integrity and functionality. In this manner, Helmholtz data scientists and engineers will be able to propose and test extensions to the graph with minimal overhead, which wll support the ability to extend into existing and well-established systems in the HGF.
+While the foundation of the Helmholtz-KG will reuse standard web architectural elements and proven, globally adopted conventions, the KG itself is modular by nature: Graphs can be merged, split, independently managed, and readily interfaced with other digital resources without compromising core integrity and functionality. In this manner, Helmholtz data scientists and engineers will be able to propose and test extensions to the graph with minimal overhead, which will support the ability to extend into existing and well-established systems in the HGF.

-This modularity (especially the ability to securely and independently manage parts of the overall graph) will also allow to realize different modes of access to digital assets (e.g. respecting sensitivity and confidentiality but also permitting full openness). The initial implementation of the Helmholtz-KG will not contend with sensitive or confidential data, but such capacities (e.g. user management, license recognition across (meta)data holdings, and authentication) can be explored and implemented when the core technology and operational procedures are stabilised.
+This modularity (especially the ability to securely and independently manage parts of the overall graph) will also make it possible to realise different modes of access to digital assets (e.g. respecting sensitivity and confidentiality but also permitting full openness). The initial implementation of the Helmholtz-KG will not contain sensitive or confidential data, but such capacities (e.g. user management, license recognition across (meta)data holdings, and authentication) can be explored and implemented when the core technology and operational procedures are stabilised.

The backbone architecture of the Helmholtz Knowledge Graph will be licensed under [CC0/CCBY](https://creativecommons.org/about/cclicenses/) to enable crosswalks to the outside world and gain visibility as e.g. a sub-cloud of the [Linked Open Data Cloud](https://lod-cloud.net/).

### Inspiration

-The implementation the Helmholtz-KG architecture is inspired by the federation of stakeholders in IOC-UNESCO's Ocean Data and Information System (ODIS), interconnected by the [ODIS Architecture](https://book.oceaninfohub.org/) [2], and rendered into a knowledge graph federating over 50 partners across the globe by the Ocean InfoHub Project (OIH). Personnel from the HMC's Earth and Environment Hub chair the ODIS federation and lead the technical implementation of OIH, offering direct alignment with unHIDE.
+The implementation of the Helmholtz-KG architecture is inspired by the federation of stakeholders in IOC-UNESCO's Ocean Data and Information System (ODIS), interconnected by the [ODIS Architecture](https://book.oceaninfohub.org/) [2], and rendered into a knowledge graph federating over 50 partners across the globe by the Ocean InfoHub Project (OIH). Personnel from the HMC's Earth and Environment Hub chair the ODIS federation and lead the technical implementation of OIH, offering direct alignment with unHIDE.


## Data Sources
@@ -69,11 +69,11 @@ Initial efforts of the Helmholtz-KG implementation will focus on the representat
- published Datasets
- Software
- Institutions
-- Infrastructure & Ressources
+- Infrastructure & Resources
- Researchers & Experts
- Projects

-The representation of these instances will semantically alligned with the [schema.org](https://schema.org/docs/full.html) vocabulary, a globally adopted standard offering a relaxed frame for the representation of heterogeneous data. Following the initial implementation the semantic expresivness of the graph can be increased by integrating domain ontologies such as the HMC developed [Helmholtz Digitization Ontology](https://codebase.helmholtz.cloud/hmc/hmc-public/hob/hdo) (HDO), which provides precise and comprehensive semantics of the concepts and practices used to manage digital assets.
+The representation of these instances will be semantically aligned with the [schema.org](https://schema.org/docs/full.html) vocabulary, a globally adopted standard offering a relaxed frame for the representation of heterogeneous data. Following the initial implementation, the semantic expressiveness of the graph can be increased by integrating domain ontologies such as the HMC-developed [Helmholtz Digitization Ontology](https://codebase.helmholtz.cloud/hmc/hmc-public/hob/hdo) (HDO), which provides precise and comprehensive semantics of the concepts and practices used to manage digital assets.

#### Data Ingestion Process
The Helmholtz-KG will offer multiple options for existing and emerging HGF infrastructures, data providers, and communities to declare their resources and digital assets in the graph for discoverability. We will prioritise the recommended publishing process for structured data on the web (as used by ODIS/OIH and many others): data providers would either 1) provide a sitemap or robots.txt file which will direct harvesting software to a collection of JSON-LD/schema.org documents or 2) expose JSON-LD snippets in the document head element of a web resource (i.e. HTML document). Both approaches are described in the publisher documentation of the Ocean InfoHub Project [3].
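Of the two publishing routes above, option 2 (JSON-LD embedded in the head of an HTML page) can be sketched with the Python standard library alone. This is an illustrative sketch of the general technique, not the actual unHIDE harvester, and the sample record below is invented:

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            self.records.append(json.loads(data))

# Minimal example page with an embedded (invented) schema.org record.
html = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Demo"}
</script>
</head><body></body></html>"""

parser = JSONLDExtractor()
parser.feed(html)
print(parser.records[0]["name"])
```

A real harvester would fetch each URL listed in the provider's sitemap and run an extraction step like this over the response body; production systems typically use dedicated structured-data extraction libraries rather than a hand-rolled parser.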
@@ -91,12 +91,12 @@ The initial implementation will focus on:
- Domain-specific (data) repositories relevant to HGF
- Helmholtz GitLab Instances

-Subsequent efforts will include further ressources such as:
+Subsequent efforts will include further resources such as:
- Helmholtz FAIR digital objects (via HMC)
- Helmholtz Ontology Base (HOB) (via HMC)
- The Helmholtz [software directory](https://helmholtz.software/) (centrally maintained by HIFIS)
- [Helmholtz Data Challenges Platform](https://helmholtz-data-challenges.de/)
-- other ressources of the Helmholtz Metadata Collaboration (HMC)
+- other resources of the Helmholtz Metadata Collaboration (HMC)
- Content management systems (CMS)
- Helmholtz Computing centers (e.g. JSC)
- Helmholtz Federated IT Services (HIFIS)
2 changes: 1 addition & 1 deletion docs/README.md
@@ -8,6 +8,6 @@ jupyter-book build docs
```

Also, ideally the terms should live somewhere else and be included here automatically.
-The same goes for code documenation of the pipelines and their usage.
+The same goes for code documentation of the pipelines and their usage.


9 changes: 8 additions & 1 deletion docs/_config.yml
@@ -14,6 +14,10 @@ exclude_patterns : [_build, Thumbs.db, .DS_Store, "**.ipynb_checkpoin
# Auto-exclude files not in the toc
only_build_toc_files : false


bibtex_bibfiles:
- references.bib

#######################################################################################
# Execution settings
execute:
@@ -81,11 +85,14 @@ repository:
url : https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/documentation # The URL to your book's repository
path_to_book : docs # A path to your book's folder, relative to the repository root.
branch : main # Which branch of the repository should be used when creating links

provider : gitlab

#######################################################################################
# Advanced and power-user settings
sphinx:
extra_extensions : # A list of extra extensions to load by Sphinx (added to those already used by JB).
local_extensions : # A list of local extensions to load by sphinx specified by "name: path" items
recursive_update : false # A boolean indicating whether to overwrite the Sphinx config (true) or recursively update (false)
config : # key-value pairs to directly over-ride the Sphinx configuration
html_theme_options:
repository_provider: custom
82 changes: 78 additions & 4 deletions docs/_toc.yml
@@ -2,7 +2,81 @@
# Learn more at https://jupyterbook.org/customize/toc.html

format: jb-book
-root: intro
-chapters:
-  - file: implementation
-  - file: data_sources
+root: landingpage.md

parts:
- caption: Introduction
numbered: False
chapters:
- file: introduction/about.md
title: "About"
- file: introduction/implementation.md
title: "Implementation overview"
- file: introduction/data_sources.md
- caption: Data in UnHIDE
numbered: True
chapters:
- file: data/overview.md
title: "Overview"
- file: data/dataset.md
title: "Dataset"
- file: data/documents.md
title: "Documents"
- file: data/experts.md
title: "Experts"
- file: data/Institutions.md
title: "Institution"
- file: data/Instruments.md
title: "Instruments"
- file: data/software.md
title: "Software"
- file: data/training.md
title: "Training"
- caption: Interacting with UnHIDE data
numbered: False
chapters:
- file: interfaces/usecases.md
title: "Use case examples"
- file: interfaces/web.md
title: "Web search"
- file: interfaces/sparql.md
title: "SPARQL endpoint"
- file: interfaces/api.md
title: "REST API"
- caption: Related Knowledge
numbered: False
chapters:
- file: knowledge/overview.md
title: "Structured data on the web"
- file: knowledge/tools.md
title: "Tools around Linked Data"
- file: knowledge/other_graphs.md
title: "Other Graphs"
- caption: Technical implementation
numbered: True
chapters:
- file: tech/datapipe.md
title: Data pipeline
sections:
- file: tech/harvesting.md
title: "Data Harvesting"
- file: tech/uplifting.md
title: "Data uplifting"
- file: tech/backend.md
title: "Architecture"
sections:
- file: dev_guide/architecture/01_introduction_and_goals.md
- file: dev_guide/architecture/02_architecture_constraints.md
- file: dev_guide/architecture/03_system_scope_and_context.md
- file: dev_guide/architecture/04_solution_strategy.md
- file: dev_guide/architecture/05_building_block_view.md
- file: dev_guide/architecture/06_runtime_view.md
- file: dev_guide/architecture/07_deployment_view.md
- file: dev_guide/architecture/08_concepts.md
- file: dev_guide/architecture/09_architecture_decisions.md
- file: dev_guide/architecture/10_quality_requirements.md
- file: dev_guide/architecture/11_technical_risks.md
- file: dev_guide/architecture/12_glossary.md



9 changes: 9 additions & 0 deletions docs/data/Institutions.md
@@ -0,0 +1,9 @@
# Institutions

Metadata template for institutions. These are simply schema.org Organizations.
UnHIDE extracts them from individual records and tries to uplift this data with metadata provided
by ROR as well as ORCID.

```{literalinclude} ./graphs/institutionTemplate.json
:linenos:
```
7 changes: 7 additions & 0 deletions docs/data/Instruments.md
@@ -0,0 +1,7 @@
# Instruments

Metadata template for Instruments:

```{literalinclude} ./graphs/instrumentTemplate.json
:linenos:
```
5 changes: 5 additions & 0 deletions docs/data/dataset.md
@@ -0,0 +1,5 @@
# Dataset

```{literalinclude} ./graphs/datasetTemplate.json
:linenos:
```
15 changes: 15 additions & 0 deletions docs/data/documents.md
@@ -0,0 +1,15 @@
# Documents


Metadata template for Documents, which is a combined category from the schema.org terms:
[CreativeWork](https://schema.org/CreativeWork), [DigitalDocument](https://schema.org/DigitalDocument), [Thesis](https://schema.org/Thesis), [Report](https://schema.org/Report), [Article](https://schema.org/Article), ...

All entries of these types will end up in the 'Document' search bucket.


Be aware that this category may also contain entries which are not actually documents, but are typed
in the metadata with the high-level schema.org class 'CreativeWork'.

```{literalinclude} ./graphs/documentTemplate.json
:linenos:
```
8 changes: 8 additions & 0 deletions docs/data/experts.md
@@ -0,0 +1,8 @@
# Experts

The expert category contains Persons and Institutions combined, which are extracted from individual records.


```{literalinclude} ./graphs/expertTemplate.json
:linenos:
```
