Transformer Exporter for Dataverse

This exporter allows you to have up to 100 exporters using a single pre-built JAR file. You can add new exporters by adding directories into the exporters directory (see the Installation section below) and placing (and editing) the configuration (config.json) and the transformation (transformer.json, transformer.py or transformer.xsl, see also the examples below) files in it.

Supported Dataverse versions: 6.0 - recent.

Installation

If you haven’t already configured it, set the dataverse-spi-exporters-directory configuration value first. Then navigate to the configured directory and download the JAR file together with the examples you want to try out:

# download the jar
wget -O exporter-transformer-1.0.9-jar-with-dependencies.jar https://repo1.maven.org/maven2/io/gdcc/export/exporter-transformer/1.0.9/exporter-transformer-1.0.9-jar-with-dependencies.jar
# download the hello-world example
mkdir hello-world
wget -O hello-world/config.json https://raw.githubusercontent.com/gdcc/exporter-transformer/main/examples/hello-world/config.json
wget -O hello-world/transformer.json https://raw.githubusercontent.com/gdcc/exporter-transformer/main/examples/hello-world/transformer.json
# download the debug example
mkdir debug
wget -O debug/config.json https://raw.githubusercontent.com/gdcc/exporter-transformer/main/examples/debug/config.json
wget -O debug/transformer.json https://raw.githubusercontent.com/gdcc/exporter-transformer/main/examples/debug/transformer.json
# etc.

After restarting the Dataverse, you should be able to use the newly installed exporters (next to the internal exporters):

Each exporter will have at least these files after starting:

All of these files can be edited, if needed. Typically you will only need to edit the config.json and the transformer.json files. If you want to add more exporters, your own or from the provided examples, just add a new configuration directory in your exporters directory with at least the config.json and the transformer.json, transformer.py or transformer.xsl files there. After restarting the servers the newly added exporters should be ready to use.

Examples

The following examples are provided in the examples directory:

Hello World!

Very basic exporter providing always the same output: {"hello":"World!"}.

Debug

This exporter uses only the identity transformation on the provided source document. It lets you to see what fields are available for copying and transforming:

datasetJson: native Dataverse JSON export
datasetORE: ORE Dataverse export
datasetSchemaDotOrg: Schema.org JSON-LD export
datasetFileDetails: file details from the native Dataverse JSON export
preTransformed: JSON-pointer friendly version of the native Dataverse JSON export
config: the content of the config.json

Short example

This exporter copies only the title, the author names and the file download URL to the output.

Javascript transformer

The same exporter as the "Short example", but it uses JavaScript instead of copy transformations.

Croissant

This exporter is entirely based on the Croissant Exporter for Dataverse. It is simply a port of that exporter into JavaScript that is bundled into a ready to use transformer. It is also a great example to start from when writing your own exporters.

Basic RO-Crate

This exporter transforms the output from the Schema.org exporter into an RO-Crate compatible output.

Transformer generated with Python

This exporter is based on the Customizable RO-Crate Metadata Exporter for Dataverse. You can edit the provided CSV file and rerun the Python script to overwrite the default transformer.json:

python3 csv2transformer.py

After copying the resulting transformer.json, together with the provided config.jar, you will have a customized RO-Crate exporter (listed as "CSV RO-Crate" by default).

Debug XML

This exporter outputs the XML version of the native JSON format that can be transformed with XSLT.

Short example XML

This exporter copies only the title, the author names and the filenames of the dataset version, and outputs them in an XML document.

Debug written in Python

This exporter is identical to the Debug example in its output (the only difference is that it is written in py) and it lets you to see what fields are available for copying and transforming:

datasetJson: native Dataverse JSON export
datasetORE: ORE Dataverse export
datasetSchemaDotOrg: Schema.org JSON-LD export
datasetFileDetails: file details from the native Dataverse JSON export
preTransformed: JSON-pointer friendly version of the native Dataverse JSON export
config: the content of the config.json

Short example written in Python

This exporter copies only the title, the author names and the filenames of the dataset version, and outputs them in a JSON document. It is written in Python.

HTML example written in Python

This exporter takes JSON input from a prerequisite exporter (short_example_py by default), and displays it as HTML. It is written in Python. In order to change the JSON input for this exporter, change the prerequisiteFormatName value in the config.json to the format name of the exporter you wish to use as input.

DDI PDF codebook

This exporter is entirely based on the DDI PDF Exporter. It is simply a port of that exporter into Python (Jython). It illustrates how to convert XML input to PDF in an exporter.

ARP RO-Crate

This exporter is entirely based on the Dataverse PR 10086. It is simply a port of that exporter into Python (Jython).

Developer guide

The easiest way to start is to write JavasCript code. You can use the provided Croissant code as the start point. You will need to restart the server after changing that code. Note that the exporters use caching, you will need to either to wait until the cache is expired or delete the cached exporter output manually to see the changes.

The JavaScript supported by the transformer exporter is as provided by the Project Nashorn, you can only use the syntax provided by that project. Additional limitation is that the multiple line statements are not supported. This could be circumvented by using a minimizer, or simply by using only single line statements (empty lines, comments, etc. are fine to include in the JavaScript files). Finally, you can access these Java classes from your scripts:

Map: java.util.LinkedHashMap
Set: java.util.LinkedHashSet
List: java.util.ArrayList
Collectors: java.util.stream.Collectors
JsonValue: jakarta.json.JsonValue

You can also try writing the transformations using the transformation language as described here. It is a preferred way for writing straight-forward exporters, for example, when you only need to add one or more fields to an already existing exporter format. In that case, you could use the identity transformation followed by simple copy transformations. You can also start from an already existing example and add new copy, remove, etc., transformations at the end of the transformer.json file.

You can also write XML transformations in a similar way, but using the XSLT instead of JSON-transformations, as illustrated in the provided XML examples.

Finally, you can also write your transformers as Python code. You can start from the provided example that can also be run as test:

mvn test -Dtest="TransformerExporterTest#testPythonScript"

You can start by changing the code in the transformer.py, shown below, and testing your code until the desired outcome is achieved (see also py-input.json and py-result.json). When you are done, just place the new transformer.py together with a config.json files in a new folder in the exporters directory (make sure that the transformer-exporter JAR file is also placed in the exporters directory). After restarting the server, your new exporter should be ready to use.

res["title"] = x["preTransformed"]["datasetVersion"]["metadataBlocks"]["citation"]["title"]

res["author"] = []
for author in x["preTransformed"]["datasetVersion"]["metadataBlocks"]["citation"]["author"]:
    res["author"].append(author["authorName"])

res["files"] = []
for distribution in x["datasetSchemaDotOrg"]["distribution"]:
    res["files"].append(distribution["contentUrl"])

Note that you can also use Java classes from your Python code, as explained on the Jython website (the library used by this exporter for the Python language interpretation), e.g.:

from java.lang import System # Java import

print('Running on Java version: ' + System.getProperty('java.version'))
print('Unix time from Java: ' + str(System.currentTimeMillis()))

See also the documentation from Jython and the DDI-PDF example for how it is used in practice.

Name		Name	Last commit message	Last commit date
Latest commit History 131 Commits
.github/workflows		.github/workflows
examples		examples
src		src
.gitignore		.gitignore
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md
pom.xml		pom.xml
renovate.json		renovate.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Transformer Exporter for Dataverse

Installation

Examples

Hello World!

Debug

Short example

Javascript transformer

Croissant

Basic RO-Crate

Transformer generated with Python

Debug XML

Short example XML

Debug written in Python

Short example written in Python

HTML example written in Python

DDI PDF codebook

ARP RO-Crate

Developer guide

About

Releases 6

Packages

Contributors 2

Languages

License

gdcc/exporter-transformer

Folders and files

Latest commit

History

Repository files navigation

Transformer Exporter for Dataverse

Installation

Examples

Hello World!

Debug

Short example

Javascript transformer

Croissant

Basic RO-Crate

Transformer generated with Python

Debug XML

Short example XML

Debug written in Python

Short example written in Python

HTML example written in Python

DDI PDF codebook

ARP RO-Crate

Developer guide

About

Resources

License

Stars

Watchers

Forks

Releases 6

Packages 0

Contributors 2

Languages

Packages