
Commit

update
matthiasprobst committed Jan 2, 2024
1 parent 36ed940 commit d346054
Showing 4 changed files with 48 additions and 25 deletions.
50 changes: 33 additions & 17 deletions README.md
@@ -7,17 +7,41 @@

*Note that the project is still under development!*

The "HDF5 Research Data Management Toolbox" (h5RDMtoolbox) is a python package supporting everybody who is working
with HDF5 to achieve a sustainable data lifecycle which follows
the [FAIR (Findable, Accessible, Interoperable, Reusable)](https://www.nature.com/articles/sdata201618)
principles. It specifically supports the five main steps of *planning*, *collecting*, *analyzing*, *sharing* and
*reusing* data. Please visit the [documentation](https://h5rdmtoolbox.readthedocs.io/en/latest/) for detailed
information or try the [quickstart using Colab](#quickstart).

1. **Planning** (defining an internal layout for HDF5, a metadata convention and/or an ontology for attribute usage)
2. **Collecting** data (creating HDF5 files with a convention in place supervising metadata usage)
3. **Analyzing** and processing data (e.g. through the interface with [xarray](https://docs.xarray.dev/en/stable/), ...)
4. **Sharing** data (upload and download to repositories; currently implemented: [Zenodo](https://zenodo.org/))
5. **Reusing** data (map metadata to dedicated databases like [mongoDB](https://www.mongodb.com/) or use local HDF5
   files themselves as a database to search for attributes)
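The "reusing" step above (searching a local HDF5 file by attribute) can be sketched with plain `h5py`; the toolbox wraps and extends this workflow, so the file name and attribute names here are illustrative assumptions, not the toolbox API:

```python
import h5py
import numpy as np

# "Collecting": write a dataset together with its metadata
with h5py.File("example.h5", "w") as f:
    ds = f.create_dataset("u", data=np.random.rand(10))
    ds.attrs["units"] = "m/s"
    ds.attrs["standard_name"] = "x_velocity"  # hypothetical attribute name


# "Reusing": treat the file itself as a small database and
# search for datasets by attribute value
def find_by_attr(h5file, name, value):
    """Return paths of datasets whose attribute `name` equals `value`."""
    hits = []

    def visitor(path, obj):
        if isinstance(obj, h5py.Dataset) and obj.attrs.get(name) == value:
            hits.append(path)

    h5file.visititems(visitor)
    return hits


with h5py.File("example.h5", "r") as f:
    print(find_by_attr(f, "standard_name", "x_velocity"))  # ['u']
```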




## Highlights

- Combining HDF5 and [xarray](https://docs.xarray.dev/en/stable/) to allow easy access to metadata and data during
  analysis and processing (see [here](https://h5rdmtoolbox.readthedocs.io/en/latest/gettingstarted/quickoverview.html#datasets-xarray-interface)).
- Assigning metadata with "globally unique and persistent identifiers" as required by [F1 of the FAIR
  principles](https://www.go-fair.org/fair-principles/f1-meta-data-assigned-globally-unique-persistent-identifiers/).
  This "remove[s] ambiguity in the meaning of your published data...".
- Defining standard attributes through
  [conventions](https://h5rdmtoolbox.readthedocs.io/en/latest/userguide/convention/index.html) and requiring users
  to apply them
- Uploading HDF5 files directly to [repositories](https://h5rdmtoolbox.readthedocs.io/en/latest/userguide/repository/index.html)
  like [Zenodo](https://zenodo.org/) or [using them with NoSQL databases](https://h5rdmtoolbox.readthedocs.io/en/latest/userguide/database/index.html) like
  [mongoDB](https://www.mongodb.com/).
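The first highlight — combining HDF5 and xarray so metadata travels with the data — can be sketched with plain `h5py` and `xarray`; the toolbox automates this coupling, so this is only an assumption-laden illustration of the idea, not its implementation:

```python
import h5py
import numpy as np
import xarray as xr

# Write an HDF5 dataset with a units attribute (file name is made up)
with h5py.File("speed.h5", "w") as f:
    ds = f.create_dataset("u", data=np.linspace(0.0, 1.0, 5))
    ds.attrs["units"] = "m/s"

# Read it back as an xarray.DataArray: the HDF5 attributes become
# DataArray attrs, so they stay attached during analysis
with h5py.File("speed.h5", "r") as f:
    ds = f["u"]
    u = xr.DataArray(ds[()], dims=("time",), attrs=dict(ds.attrs))

print(u.attrs["units"])  # m/s
```

Operations on `u` (slicing, arithmetic via xarray) now carry the metadata along, which is the convenience the highlight refers to.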

## Who is the package for?
For everybody who is ...
@@ -35,13 +35,6 @@
For everybody who ...
- ... has established conventions and management approaches in his or her community



## Package Architecture/structure

The toolbox implements five modules, which are shown below. The numbers refer to their main usage in the stages in
@@ -79,7 +96,6 @@
on the image, which shows the research data lifecycle in the center and the resp

A paper has been published in the journal [inggrid](https://preprints.inggrid.org/repository/view/23/).

<a href="https://h5rdmtoolbox.readthedocs.io/en/latest/"><img src="docs/_static/new_icon_with_text.svg" alt="RDM lifecycle" style="width:600px;"></a>

## Installation

2 changes: 1 addition & 1 deletion codemeta.json
@@ -4,7 +4,7 @@
"license": "https://spdx.org/licenses/MIT",
"codeRepository": "git+https://github.com/matthiasprobst/h5RDMtoolbox.git",
"name": "h5RDMtoolbox",
"version": "1.1.0",
"version": "1.2.0",
"description": "Supporting a FAIR Research Data lifecycle using Python and HDF5.",
"applicationCategory": "Engineering",
"programmingLanguage": [
13 changes: 10 additions & 3 deletions docs/index.rst
@@ -16,10 +16,17 @@
with HDF5 to achieve a sustainable data lifecycle which follows the

Highlights
----------
- Combining HDF5 and [xarray](https://docs.xarray.dev/en/stable/) to allow easy access to metadata and data during
  analysis and processing (see [here](https://h5rdmtoolbox.readthedocs.io/en/latest/gettingstarted/quickoverview.html#datasets-xarray-interface)).
- Assigning metadata with "globally unique and persistent identifiers" as required by [F1 of the FAIR
  principles](https://www.go-fair.org/fair-principles/f1-meta-data-assigned-globally-unique-persistent-identifiers/).
  This "remove[s] ambiguity in the meaning of your published data...".
- Defining standard attributes through
  [conventions](https://h5rdmtoolbox.readthedocs.io/en/latest/userguide/convention/index.html) and requiring users
  to apply them
- Uploading HDF5 files directly to [repositories](https://h5rdmtoolbox.readthedocs.io/en/latest/userguide/repository/index.html)
  like [Zenodo](https://zenodo.org/) or [using them with NoSQL databases](https://h5rdmtoolbox.readthedocs.io/en/latest/userguide/database/index.html) like
  [mongoDB](https://www.mongodb.com/).


.. grid:: 3
8 changes: 4 additions & 4 deletions docs/userguide/wrapper/FAIRAttributes.ipynb
@@ -9,7 +9,7 @@
"\n",
"According to [F1 of the *FAIR Principles*](https://www.go-fair.org/fair-principles/f1-meta-data-assigned-globally-unique-persistent-identifiers/) attributes shall be assigned to globally unique and persistent identifiers.\n",
"\n",
"Here's what www.g-fair.org says about it:\n",
"Here's what www.go-fair.org says about it:\n",
"\n",
"*Globally unique and persistent identifiers remove ambiguity in the meaning of your published data by assigning a unique identifier to every element of metadata and every concept/measurement in your dataset. In this context, identifiers consist of an internet link (e.g., a URL that resolves to a web page that defines the concept such as a particular human protein). Many data repositories will automatically generate globally unique and persistent identifiers to deposited datasets. Identifiers can help other people understand exactly what you mean, and they allow computers to interpret your data in a meaningful way (i.e., computers that are searching for your data or trying to automatically integrate them). Identifiers are essential to the human-machine interoperation that is key to the vision of Open Science. In addition, identifiers will help others to properly cite your work when reusing your data.*\n",
"\n",
@@ -31,9 +31,9 @@
"Then, we can \"explain\" the data in the following way:\n",
"- The *group \"contact\"* <u>is</u> **Person**\n",
"- The *group \"contact\"* <u>has ORCiD</u> **\\<value\\>**\n",
"- The *dataset \"u\"* <u>hasUnit/u> **\"m/s\"**\n",
"- The *dataset \"u\"* <u>hasUnit</u> **\"m/s\"**\n",
"- *\"m/s\"* <u>is</u> **\"https://qudt.org/vocab/unit/M-PER-SEC\"**\n",
"- The *dataset \"u\"* <u>has kind of quantity/u> **Velocity** (defined by qudt)\n",
"- The *dataset \"u\"* <u>has kind of quantity</u> **Velocity** (defined by qudt)\n",
"- etc.\n",
"\n",
"Let's build such a file:"
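A minimal sketch of such a file, written with plain `h5py` rather than the toolbox's own mechanism; the attribute names carrying the IRIs (`units_iri`, `kind_of_quantity`) and the ORCiD value are illustrative placeholders:

```python
import h5py
import numpy as np

with h5py.File("fair_example.h5", "w") as f:
    # The group "contact" is a Person and has an ORCiD (placeholder value)
    contact = f.create_group("contact")
    contact.attrs["type"] = "Person"
    contact.attrs["orcid"] = "https://orcid.org/0000-0000-0000-0000"

    # The dataset "u" has a unit, identified by a QUDT IRI,
    # and a kind of quantity (Velocity, as defined by QUDT)
    u = f.create_dataset("u", data=np.zeros(3))
    u.attrs["units"] = "m/s"
    u.attrs["units_iri"] = "https://qudt.org/vocab/unit/M-PER-SEC"
    u.attrs["kind_of_quantity"] = "Velocity"
```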
@@ -54,7 +54,7 @@
"id": "23f59915-a00a-4f30-b7d0-37f1b58aa6b5",
"metadata": {},
"source": [
"## Attribute-IRI-Association\n",
"## Associate IRI to attributes\n",
"\n",
"An IRI can be assigned during or after attribute creation. Various possibilities are shown below.\n",
"\n",
