super.json documentation #485

CPernet · 2024-08-14T11:00:21Z

Hi guys,

I could not find documentation about the metadata/super.json.
How do you fill it for multiple datasets?

{dataset1},{dataset2}
or
{{dataset1},{dataset2}}
did not work for me

thx


mslw · 2024-08-14T12:53:51Z

AFAIK super.json is just to say which dataset opens as a "main page" (or "landing page") i.e. which ID and version get filled in if you only go to your base URL.

So I think the literal answer is "there can be just one". The next question is: what is your intended purpose in declaring multiple datasets? If it's listing these datasets on the landing page, then the catalog's solution is to add them as subdatasets of your superdataset (they will then be shown in its Datasets tab).
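A minimal Python sketch of that idea. It assumes the catalog-add input is one JSON object per line, and that a dataset record carries type, dataset_id, dataset_version and a subdatasets list whose entries have dataset_id, dataset_version and dataset_path (the fields used later in this thread). The ids and versions are placeholders; check the field names against the catalog schema before relying on them.

```python
import json

def make_superdataset_record(dataset_id, dataset_version, children):
    """Build a dataset-level record that lists `children` as its subdatasets.

    `children` is a list of (dataset_id, dataset_version, dataset_path) tuples.
    Field names follow the examples in this thread (an assumption, not the
    authoritative schema).
    """
    return {
        "type": "dataset",
        "dataset_id": dataset_id,
        "dataset_version": dataset_version,
        "subdatasets": [
            {"dataset_id": cid, "dataset_version": cver, "dataset_path": cpath}
            for cid, cver, cpath in children
        ],
    }

# Placeholder id/version for the dataset that super.json would point to
record = make_superdataset_record(
    "meta", "v1",
    [("PN000003", "v1", "PN000003"), ("PN000004", "v1", "PN000004")],
)
# Serialize as a single line, as in a JSON-lines metadata file
print(json.dumps(record))
```

The landing-page dataset is just an ordinary dataset record; what makes it list other datasets is only its subdatasets property.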

CPernet · 2024-08-14T13:01:32Z

Hi Michał,

yes the goal is to list many datasets, some having sub-datasets too. For instance

- dataset1
- dataset2
|-- subdataset2-1
|-- subdataset2-2

So if I understand your answer correctly, I need to create a metadataset:

-metadataset
|- dataset1
|- dataset2
  |-- subdataset2-1
  |-- subdataset2-2

and super.json only has to point to the metadataset.

The next question is how to make subdatasets (then the nesting becomes obvious). I see type: 'redirect' in the example, but nothing about it in the documentation itself, so I'm uncertain about that.

mslw · 2024-08-14T13:44:08Z

Yes, in terms of catalog inputs, the metadataset needs to have a subdatasets property (dataset1 and dataset2 in your example). I'm afraid the "Datasets" tab might not support showing the whole dataset tree (subdataset2-[1,2]); the only way to see them might be to click through to the dataset2 page.
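To illustrate why only one level shows up, here is a hedged Python sketch that turns the nested layout above into one dataset record per node, each listing only its direct children. The tree dict, the ids, the "v1" version and the use of the child id as dataset_path are all hypothetical placeholders; the record fields follow the examples in this thread.

```python
import json

# Hypothetical nested layout mirroring the example above (placeholder ids)
TREE = {
    "id": "meta",
    "children": [
        {"id": "dataset1", "children": []},
        {"id": "dataset2", "children": [
            {"id": "subdataset2-1", "children": []},
            {"id": "subdataset2-2", "children": []},
        ]},
    ],
}

def flatten(node, version="v1"):
    """Yield one dataset record per node; each lists only its direct children."""
    yield {
        "type": "dataset",
        "dataset_id": node["id"],
        "dataset_version": version,
        "subdatasets": [
            # dataset_path is assumed to be the child's path relative to its parent
            {"dataset_id": c["id"], "dataset_version": version, "dataset_path": c["id"]}
            for c in node["children"]
        ],
    }
    for child in node["children"]:
        yield from flatten(child, version)

records = list(flatten(TREE))
for rec in records:
    print(json.dumps(rec))
```

The metadataset record only mentions dataset1 and dataset2; subdataset2-1 and subdataset2-2 are listed on the dataset2 record, which matches clicking through to the dataset2 page to see them.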

CPernet · 2024-08-14T13:55:24Z

OK, I see the subdatasets, but I'm pretty sure @jsheunis or @mih showed me an example of a catalogue with multiple datasets on the landing page; hopefully that is still possible.

I can already move on with my subdatasets anyway, thx.

CPernet · 2024-08-14T14:21:54Z

Well, actually it's not obvious how to set the id. For instance:

{"type":"dataset","name":"superdataset","dataset_id":"PN000003",...}
{"type":"subdataset","name":"subdataset1","dataset_id":"PN000003",...}
{"type":"file","dataset_id":"PN000003",...}
{"type":"subdataset","name":"subdataset2","dataset_id":"PN000003",...}
{"type":"file","dataset_id":"PN000003",...}

This will not work, since each file should point to the id of its subdataset, unless there is a subdataset_id field we can specify, like:

{"type":"dataset","name":"superdataset","dataset_id":"PN000003",...}
{"type":"subdataset","name":"subdataset1","dataset_id":"PN000003","subdataset_id":"PN000003-1",...}
{"type":"file","dataset_id":"PN000003","subdataset_id":"PN000003-1",...}
{"type":"subdataset","name":"subdataset2","dataset_id":"PN000003","subdataset_id":"PN000003-2",...}
{"type":"file","dataset_id":"PN000003","subdataset_id":"PN000003-2",...}

Is there a list or file where I can look up which values are allowed in the schemas?

thx

jsheunis · 2024-08-14T20:11:25Z

Hey @CPernet

I think there's some confusion about the super-dataset sub-dataset relationship, and the main page of the catalog. Like @mslw said, the super.json file only points to the id and version of the dataset that should be displayed as the homepage of the catalog. This can be set via the command line using:

datalad catalog-set --catalog my-cat --dataset-id mydataset1234 --dataset-version myversion6578 home

This does nothing with regard to subdatasets showing on the main page. The subdatasets that show up in the "datasets" tab on any dataset page (the main page or any other) are all taken from the metadata of that specific dataset (i.e. not from super.json).

So if your metadataset is set as your catalog's homepage, then the homepage will show the subdatasets of the metadataset, i.e. dataset1 and dataset2, in the "Datasets" tab.

Getting those two datasets listed as subdatasets of the metadataset is done via the catalog-add command. Any metadata added via this command has to conform to the catalog schema, and you can see exactly which fields are available and/or required here: https://github.com/datalad/datalad-catalog/tree/main/datalad_catalog/catalog/schema

(I'm sorry about the broken documentation links here: http://docs.datalad.org/projects/catalog/en/latest/catalog_schema.html, I've created an issue for this and will fix it soon: #488)

Specifically: https://github.com/datalad/datalad-catalog/blob/main/datalad_catalog/catalog/schema/jsonschema_dataset.json#L233 shows what format the subdataset property of a metadata item should have if adding a subdataset to that dataset.
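As a rough pre-check before running catalog-add, a stdlib-only Python sketch can flag subdataset entries that lack one of the keys used in this thread's examples. Treating dataset_id, dataset_version and dataset_path as the required keys is an assumption; the linked jsonschema_dataset.json is authoritative, and validating against it with a proper JSON Schema validator is the more thorough option.

```python
# Keys used for subdataset entries in this thread's examples.
# Assumption: confirm against jsonschema_dataset.json before relying on it.
SUBDATASET_KEYS = ("dataset_id", "dataset_version", "dataset_path")

def missing_subdataset_keys(record):
    """Return (index, missing_keys) for every subdataset entry lacking a key."""
    problems = []
    for i, entry in enumerate(record.get("subdatasets", [])):
        missing = [k for k in SUBDATASET_KEYS if k not in entry]
        if missing:
            problems.append((i, missing))
    return problems

record = {
    "type": "dataset",
    "dataset_id": "meta",
    "dataset_version": "v1",
    "subdatasets": [
        {"dataset_id": "PN000003", "dataset_version": "v1", "dataset_path": "PN000003"},
        {"dataset_id": "PN000004", "dataset_path": "PN000004"},  # no version
    ],
}
print(missing_subdataset_keys(record))  # [(1, ['dataset_version'])]
```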

Lastly, yes, there are metadata files that are specified as redirects, although this is not strictly necessary. This functionality was added in PR #430 (see the PR description for a detailed explanation and how to use it). So I would suggest not going that route from the start: just add dataset-level metadata (including a list of its subdatasets) as usual, and only use the script mentioned in the PR later if you want the redirect functionality.

mslw · 2024-08-14T20:13:18Z

(typed at the same time, so I'll post anyway as it seems complementary)

Try something more like:

Meta-dataset (this would be selected in super.json):

{
  "type": "dataset",
  "dataset_id": "meta",
  "subdatasets": [
    {
      "dataset_id": "PN000003",
      "dataset_version": "some",
      "dataset_path": "PN000003"
    },
    {
      "dataset_id": "PN000004",
      "dataset_version": "some",
      "dataset_path": "PN000004"
    }
  ]
}

And each (sub)dataset (PN000003, PN000004 and their children) as a dataset in its own right:

{
  "type": "dataset",
  "dataset_id": "PN000003",
  "subdatasets": [
    {
      "dataset_id": "PN000003-1",
      "dataset_version": "some",
      "dataset_path": "PN000003-1"
    }
  ]
}

(typed by hand, so there will probably be missing properties or errors but you get the point).
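This also answers the subdataset_id question above: since every subdataset is a dataset in its own right, a file record can simply carry the dataset_id of the dataset that directly contains it. A hedged Python sketch that builds such file records and writes them as JSON lines; the "path" key, the "v1" versions and the example file paths are assumptions for illustration, so check jsonschema_file.json for the real fields.

```python
import io
import json

def file_record(dataset_id, dataset_version, path):
    """File-level record naming the dataset that directly contains the file.

    The 'path' key (relative to that dataset) is an assumption; see
    jsonschema_file.json for the authoritative field list.
    """
    return {"type": "file", "dataset_id": dataset_id,
            "dataset_version": dataset_version, "path": path}

records = [
    # PN000003-1 is a dataset in its own right, so its files use its own id
    file_record("PN000003-1", "v1", "sub-01/anat/T1w.nii.gz"),  # hypothetical path
    file_record("PN000003", "v1", "README.md"),                 # hypothetical path
]

def write_jsonl(items, stream):
    """Write one JSON object per line."""
    for item in items:
        stream.write(json.dumps(item) + "\n")

buf = io.StringIO()
write_jsonl(records, buf)
print(buf.getvalue(), end="")
```

Written to a real file instead of a StringIO buffer, the result could be passed to catalog-add together with the dataset records; check datalad catalog-add --help for the exact option that takes the metadata file.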

CPernet · 2024-08-15T05:03:54Z

Thx guys. What I'd prefer is that the home page lists the datasets separately, without landing on any one in particular, but if I understand correctly that is not how this works, i.e. to create a fake meta data of the repo that lists all the datasets as children is the way to do it.

jsheunis · 2024-08-15T07:11:13Z

create a fake meta data of the repo that lists all the datasets as children is the way to do it

Exactly. Here are some examples:

CPernet mentioned this issue on Aug 14, 2024 in super.json Public-nEUro/DataCatalogue#5 (Closed)

CPernet closed this as completed Sep 13, 2024
