CKAN harvester for FAIR Data Point. This extension contains a harvester for FAIR Data Points; in the future, it may also support the FAIR Data Point API.
The harvester runs in three stages, each of which can be customized:
- Gather stage. The gather stage uses the FairDataPointRecordProvider, which implements the IRecordProvider interface, to create a list of identifiers of the objects to include in the harvest. For a FAIR Data Point, this list includes catalogs and datasets. Collections could be added in the future;
- Fetch stage. The fetch stage downloads the actual source data. In this stage, data from other sources may be added so that the result better matches the DCAT profile expected by CKAN;
- Import stage. The import stage performs the actual import. How the RDF from the FAIR Data Point is mapped to CKAN packages and resources is determined by so-called application profiles. If a FAIR Data Point uses custom fields, a profile must be created for it. A profile can be defined as a Python class in the ckanext.fairdatapoint.profiles module (profiles.py). The new profile must be registered in the [ckan.rdf.profiles] entry-point section of setup.py. Which profile a particular harvester uses is determined by its harvester configuration, for example:
{ "profiles": "fairdatapoint_dcat_ap" }
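The three stages described above can be pictured as a small pipeline. The sketch below is a self-contained, hypothetical model of that flow, not the extension's actual code: the RecordProvider protocol, InMemoryProvider, toy_dcat_profile, the PROFILES registry and the stage function names are all illustrative assumptions. It shows the gather stage collecting identifiers, the fetch stage retrieving content per identifier, and the import stage selecting a profile by the name given under "profiles" in the harvester configuration JSON.

```python
import json
from typing import Callable, Dict, Iterable, List, Protocol


class RecordProvider(Protocol):
    """Hypothetical stand-in for the role played by IRecordProvider."""

    def get_record_ids(self) -> Iterable[str]: ...

    def get_record_by_id(self, record_id: str) -> str: ...


class InMemoryProvider:
    """Toy provider serving pre-loaded records (e.g. RDF text keyed by URI)."""

    def __init__(self, records: Dict[str, str]) -> None:
        self._records = records

    def get_record_ids(self) -> Iterable[str]:
        return list(self._records)

    def get_record_by_id(self, record_id: str) -> str:
        return self._records[record_id]


def toy_dcat_profile(record_id: str, content: str) -> dict:
    # Stand-in for an RDF-to-package mapping; a real profile would parse the RDF.
    return {"name": record_id.rsplit("/", 1)[-1], "notes": content}


# Hypothetical registry: profile name -> mapping function.
PROFILES: Dict[str, Callable[[str, str], dict]] = {
    "fairdatapoint_dcat_ap": toy_dcat_profile,
}


def gather(provider: RecordProvider) -> List[str]:
    # Gather stage: collect identifiers of the catalogs and datasets to harvest.
    return list(provider.get_record_ids())


def fetch(provider: RecordProvider, ids: List[str]) -> Dict[str, str]:
    # Fetch stage: retrieve the source data for each identifier.
    return {rid: provider.get_record_by_id(rid) for rid in ids}


def import_stage(fetched: Dict[str, str], harvester_config: str) -> List[dict]:
    # Import stage: map every record with the profile named in the harvester config.
    profile = PROFILES[json.loads(harvester_config)["profiles"]]
    return [profile(rid, content) for rid, content in fetched.items()]
```

For example, with provider = InMemoryProvider({"https://fdp.example.org/dataset/abc": "..."}), calling import_stage(fetch(provider, gather(provider)), '{ "profiles": "fairdatapoint_dcat_ap" }') yields one package dict named "abc". Keeping the profile lookup in the import stage mirrors the idea that the configuration, not the harvester code, decides how RDF is mapped.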
To run the harvester from the command line:
ckan --config=<full path to CKAN ini-file> harvester run-test <id of harvester>
To rebuild the search index if it is not updated automatically after clearing all packages from a harvester:
ckan --config=<full path to CKAN ini-file> search-index rebuild
For more information, see the GDI harvester information.
Compatibility with core CKAN versions:
| CKAN version | Compatible? |
|---|---|
| 2.10 | tested |
To install gdi-userportal-ckanext-fairdatapoint:
- Activate your CKAN virtual environment, for example:
  . /usr/lib/ckan/default/bin/activate
- Clone the source and install it in the virtualenv:
  git clone https://github.com/GenomicDataInfrastructure/gdi-userportal-ckanext-fairdatapoint.git
  cd gdi-userportal-ckanext-fairdatapoint
  pip install -e .
  pip install -r requirements.txt
- Add fairdatapoint to the ckan.plugins setting in your CKAN config file (by default the config file is located at /etc/ckan/default/ckan.ini).
- Restart CKAN. For example, if you've deployed CKAN with Apache on Ubuntu:
  sudo service apache2 reload
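To double-check the plugin step, a small script can read the CKAN ini file and look for fairdatapoint in ckan.plugins. This is a minimal sketch, assuming the setting lives in the [app:main] section (where CKAN config files normally keep it); the function name fairdatapoint_enabled is hypothetical and not part of the extension.

```python
import configparser


def fairdatapoint_enabled(ini_text: str) -> bool:
    """Return True if 'fairdatapoint' is listed in ckan.plugins under [app:main]."""
    # interpolation=None: CKAN ini files may contain %(here)s-style values.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    plugins = parser.get("app:main", "ckan.plugins", fallback="")
    # ckan.plugins is a whitespace-separated list, possibly spread over several lines.
    return "fairdatapoint" in plugins.split()
```

Usage: open /etc/ckan/default/ckan.ini, pass its contents to fairdatapoint_enabled, and restart CKAN only once it returns True.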
There is a setting ckanext.fairdatapoint.harvest_catalogs. The default is false. If it is set to true, CKAN will harvest catalogs as datasets. The setting can be overridden for an individual harvester by setting "harvest_catalogs": "true" or "harvest_catalogs": "false" in the harvester configuration JSON.
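The override rule for harvest_catalogs amounts to a simple precedence: the value in the harvester configuration JSON wins, then the global CKAN setting, then the built-in default (false). The sketch below illustrates that order only; effective_setting and as_bool are hypothetical helpers, and the extension's real lookup may differ. The same pattern would apply to ckanext.fairdatapoint.resolve_labels with a default of true.

```python
import json
from typing import Optional, Union


def as_bool(value: Union[bool, str]) -> bool:
    # Accept the string forms used in ini files and in the harvester JSON ("true"/"false").
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("true", "yes", "on", "1")


def effective_setting(global_value: Optional[str], harvester_config: str,
                      key: str, default: bool) -> bool:
    """Resolve a flag: harvester JSON first, then the global setting, then the default."""
    config = json.loads(harvester_config) if harvester_config else {}
    if key in config:
        return as_bool(config[key])
    if global_value is not None:
        return as_bool(global_value)
    return default
```

For instance, effective_setting("true", '{"harvest_catalogs": "false"}', "harvest_catalogs", False) returns False, because the per-harvester value overrides the global one.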
The harvester can resolve labels for fields whose values are (resolvable) URIs, such as Wikidata entities. This is controlled by the setting ckanext.fairdatapoint.resolve_labels. The default is true, but you can disable it globally by explicitly setting it to false. The setting can be overridden for an individual harvester by setting "resolve_labels": "true" or "resolve_labels": "false" in the harvester configuration JSON.
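To illustrate what resolving a label means for a Wikidata entity, the sketch below turns an entity URI into Wikidata's JSON entity-data URL and reads the label in one language. This is an assumed approach for illustration only: the extension may instead dereference the URI as RDF and read its label, and the helper names and User-Agent string are hypothetical. If resolution fails, the sketch falls back to the URI itself.

```python
import json
import urllib.request
from typing import Optional


def entity_id(entity_uri: str) -> str:
    # e.g. http://www.wikidata.org/entity/Q42 -> Q42
    return entity_uri.rstrip("/").rsplit("/", 1)[-1]


def entity_data_url(entity_uri: str) -> str:
    # Wikidata serves entity data as JSON under Special:EntityData.
    return f"https://www.wikidata.org/wiki/Special:EntityData/{entity_id(entity_uri)}.json"


def extract_label(entity_data: dict, entity_uri: str, lang: str = "en") -> Optional[str]:
    # Expected shape: {"entities": {"Q42": {"labels": {"en": {"value": "..."}}}}}
    entity = entity_data.get("entities", {}).get(entity_id(entity_uri), {})
    label = entity.get("labels", {}).get(lang)
    return label["value"] if label else None


def resolve_label(entity_uri: str, lang: str = "en", timeout: float = 10.0) -> str:
    request = urllib.request.Request(
        entity_data_url(entity_uri),
        headers={"User-Agent": "label-resolver-sketch/0.1"},  # hypothetical UA string
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            data = json.load(response)
    except (OSError, ValueError):
        # Network or JSON errors: keep the URI as the displayed value.
        return entity_uri
    return extract_label(data, entity_uri, lang) or entity_uri
```

Falling back to the URI keeps a harvest from failing when a label source is unreachable, which is also a reasonable behaviour to expect when resolve_labels is set to false.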
To install ckanext-fairdatapoint for development, activate your CKAN virtualenv and do:
git clone https://github.com/GenomicDataInfrastructure/gdi-userportal-ckanext-fairdatapoint.git
cd gdi-userportal-ckanext-fairdatapoint
python setup.py develop
pip install -r dev-requirements.txt
The fairdatapoint plugin depends on ckanext-scheming, ckanext-harvest and ckanext-dcat. Make sure these are installed; otherwise, run:
pip install -e 'git+https://github.com/ckan/ckanext-scheming.git@release-3.0.0#egg=ckanext-scheming[requirements]'
pip install -e 'git+https://github.com/ckan/ckanext-harvest.git@v1.6.0#egg=ckanext-harvest[requirements]'
pip install -e 'git+https://github.com/ckan/ckanext-dcat.git@v2.1.0#egg=ckanext-dcat'
pip install -r https://raw.githubusercontent.com/ckan/ckanext-dcat/v2.1.0/requirements.txt
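Before running the commands above, it can help to check which of the three dependencies are already installed in the active virtualenv. A minimal sketch using the standard library's importlib.metadata, assuming the distribution names match the egg names used in the pip commands; installed_versions is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

REQUIRED = ("ckanext-scheming", "ckanext-harvest", "ckanext-dcat")


def installed_versions(dists: Iterable[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for dist in dists:
        try:
            found[dist] = version(dist)
        except PackageNotFoundError:
            found[dist] = None
    return found
```

Usage: [name for name, v in installed_versions().items() if v is None] lists the dependencies that still need one of the pip install commands above.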
To run the tests, see the GDI harvester test information.
If ckanext-fairdatapoint should be available on PyPI, you can follow these steps to publish a new version:
- Update the version number in the setup.py file. See PEP 440 for how to choose version numbers.
- Make sure you have the latest versions of the necessary packages:
  pip install --upgrade setuptools wheel twine
- Create source and binary distributions of the new version:
  python setup.py sdist bdist_wheel && twine check dist/*
  Fix any errors you get.
- Upload the distributions to PyPI:
  twine upload dist/*
- Commit any outstanding changes:
  git commit -a
  git push
- Tag the new release of the project on GitHub with the version number from the setup.py file. For example, if the version number in setup.py is 0.0.1, then do:
  git tag 0.0.1
  git push --tags
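Because the Git tag must match the version in setup.py, the version can be read from setup.py instead of being retyped. A minimal sketch, assuming version= is passed to setup() as a string literal (it returns None if the version comes from a variable or another file); setup_py_version is a hypothetical helper.

```python
import ast
from typing import Optional


def setup_py_version(source: str) -> Optional[str]:
    """Return the literal version= argument of setup(...) in setup.py source, if any."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Match both setup(...) and setuptools.setup(...).
        name = getattr(node.func, "id", None) or getattr(node.func, "attr", None)
        if name != "setup":
            continue
        for kw in node.keywords:
            if kw.arg == "version" and isinstance(kw.value, ast.Constant):
                return kw.value.value
    return None
```

Usage: read setup.py, pass its text to setup_py_version, and use the returned string in git tag <version> followed by git push --tags.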
This work is licensed under multiple licenses. Because keeping this section up-to-date is challenging, here is a brief summary as of January 2024: