JSON Transformer exporter example #14

Merged
merged 41 commits into from
Aug 23, 2024


ErykKul
Contributor

@ErykKul ErykKul commented May 14, 2024

I am not sure how useful this one is. Using the transformer might be harder than just writing Java code. It seems most useful when an exporter needs only minor changes: you could edit the transformer.json and make it work for your installation without rebuilding the jar.

Anyway, I did not find any JSON transformer that could do what was needed here, and I wanted to get to know the jakarta.json specification, so I gave it a try. The result is a JSON transformer that can use expressions (JavaScript scripts, to be more precise) and can handle transformations of complex JSON structures using JSON Pointers.
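
To illustrate the idea of pointer-driven transformations, here is a minimal sketch in Python. It is not the transformer from this PR: the rule format (`from`, `to`, `expression`), the helper names, and the sample field names are all assumptions made for illustration, and a Python callable stands in for the JavaScript expressions. Only the pointer syntax itself follows RFC 6901.

```python
import json


def _tokens(pointer):
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    # "~1" is decoded before "~0" so that "~01" becomes "~1", not "/".
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def resolve(doc, pointer):
    """Return the value that `pointer` addresses inside `doc`."""
    node = doc
    for token in _tokens(pointer):
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def assign(doc, pointer, value):
    """Set `value` at `pointer`, creating intermediate objects as needed.

    This sketch only creates objects (dicts), never arrays.
    """
    tokens = _tokens(pointer)
    if not tokens:
        raise ValueError("cannot assign to the document root")
    node = doc
    for token in tokens[:-1]:
        node = node.setdefault(token, {})
    node[tokens[-1]] = value


def transform(source, rules):
    """Apply each (hypothetical) rule: read `from`, optionally map, write `to`."""
    target = {}
    for rule in rules:
        value = resolve(source, rule["from"])
        expression = rule.get("expression")
        if expression is not None:
            value = expression(value)
        assign(target, rule["to"], value)
    return target


# Made-up input document and rules, for illustration only.
source = {
    "datasetJson": {
        "title": "My dataset",
        "authors": [{"name": "Doe, Jane"}, {"name": "Roe, Rick"}],
    }
}
rules = [
    {"from": "/datasetJson/title", "to": "/name"},
    {
        "from": "/datasetJson/authors",
        "to": "/creator",
        "expression": lambda authors: [a["name"] for a in authors],
    },
    {"from": "/datasetJson/authors/0/name", "to": "/citation/firstAuthor"},
]

result = transform(source, rules)
print(json.dumps(result, indent=2))
```

Keeping the rules as plain data is what would make the "edit a file instead of rebuilding the jar" workflow possible; the expression hook is where the messy cases would end up.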

I think it turned out to be very interesting and may be worth a look.

@poikilotherm
Member

@ErykKul please note that with #16 we will provide a common Parent POM for exporter plugins to simplify setup and releasing. Happy to receive feedback!

As the code you suggest is rather large, it probably isn't an easy example. But I do see value in having such an exporter around. I'm happy to create a repo for you, like gdcc/exporter-json-transform or similar.

@ErykKul
Contributor Author

ErykKul commented Jun 20, 2024

@poikilotherm A new repo would be great! I started experimenting with it on my own fork: https://github.com/ErykKul/dataverse-transformer-exporter. It would be nice if I could move it to gdcc. I also created a shorter example: https://github.com/ErykKul/exporter-ro-crate. I think I will just use that one in the new repo. I will continue working on it next week: improve the documentation, use the new example, see whether I can support multiple configs for one jar so that it can act as multiple exporters (I have a plan for this with minimal impact), etc. I will give a demo of it on 2 July in the tech hours.

@poikilotherm
Member

poikilotherm commented Jun 21, 2024

@ErykKul did you see #15? Maybe it would be good to coordinate your efforts for an RO-Crate exporter with theirs, @okaradeniz @beepsoft?

Speaking of configuration: I have a local branch (not put out there yet) that moves the configuration retrieval functionality into the SPI package (poikilotherm/dataverse@1e6e1ec). Would this be helpful for your exporter?

@ErykKul
Contributor Author

ErykKul commented Jun 21, 2024

Certainly, all cooperation is very welcome! The CSV-based exporter from @okaradeniz has slightly different starting requirements: it is an RO-Crate-specific exporter whose goal is to make writing the transformation as easy as possible for users. Making it specific to RO-Crate allows the needed definitions to be simplified. I guess that simply adding an option to use CSV as input for the transformer-exporter would not cover that. Nevertheless, it would be an interesting exercise to see how much readability and ease of adjusting the transformations is lost when the same exporter is written as a transformer-exporter.

With the exporter from @beepsoft it might be easier, as I think it matters less there how the exporter is implemented or defined? The transformer-exporter should be generic enough to do any transformation, so I assume it would work there too. You can also have "smart" transformations that cope with different structures and fields in the source document. However, depending on the complexity of the transformations, it might get messy. It would be nice to have that one as an example too.

Having multiple examples, even if they are all variants of RO-Crate, would help me a lot with validating the transformer-exporter approach. I was aiming to have several unit tests, so if you can provide me with some examples of expected transformations of a dataset, e.g., from beta.dataverse.org, I can try writing the transformers for them. (That site has a very handy debug exporter that lets you see the result of all of the internal exporters at once; this is also what I use as the input document for the transformer-exporter.) It would be even better if someone were willing to give it a shot and write some transformations themselves: this could result in very welcome feedback that I am desperate to get! Nevertheless, I do not mind writing the first versions myself.

I also do not mind working on other exporters. The transformer-exporter is a not-yet-validated idea at this point; it is still in the experimental phase, and I would not mind if it does not make it to the "mainstream".

As for the MP config, I guess it depends on how having one jar for different export types would work in practice. At this point I am not sure yet, I was looking first in a direction where there would be no impact on the current exporter interface, but that might fail. In that case, assuming we want to explore this transformer-exporters idea further, we could brainstorm on that.

@poikilotherm
Member

Heads up that there is a milestone release of the parent POM now, so you can actually try stuff.

The example shows how to use the parent POM for now, see https://github.com/gdcc/dataverse-exporters/blob/main/example/pom.xml

Note the emptiness of that POM: if you don't need much, you might be just fine with this. It already takes care of the most basic dependencies as well as the build config.

@ErykKul
Contributor Author

ErykKul commented Jun 25, 2024

I have changed the type of this PR: it is now about inclusion in the matrix, as this repo's purpose has also changed. I have also fixed the problem of having only one Transformer Exporter: now you can have up to 100 of these with only one pre-built JAR file. I also ported the Croissant exporter to JavaScript (bundled in a transformer). To try it out, copy the following files to your exporters dir (see also the README file):

If you want to add more exporters, your own or from the provided examples, just add a new configuration directory in your exporters directory with at least the config.json and transformer.json files there, for example (hello-world):

After restarting the server they are all ready to use. How cool is that! Note the emptiness of the entire Hello World! example 😉 (you only need these two files to run it, assuming you already have the jar file in your exporters dir)
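
The directory convention described above (one sub-directory per exporter, holding at least config.json and transformer.json) can be sketched as a small scan. This is only an illustration in Python of how such a layout could be discovered, not how the exporter actually loads its configurations; the function name, the jar file name, and the `incomplete` directory are made up for the example.

```python
import tempfile
from pathlib import Path

# The two files the comment above names as the minimum for one exporter.
REQUIRED_FILES = ("config.json", "transformer.json")


def find_exporter_configs(exporters_dir):
    """Return the names of sub-directories that contain all required files."""
    root = Path(exporters_dir)
    found = []
    for child in sorted(root.iterdir()):
        if child.is_dir() and all((child / name).is_file() for name in REQUIRED_FILES):
            found.append(child.name)
    return found


# Build a throwaway exporters directory to try the scan on.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "transformer-exporter.jar").touch()  # hypothetical jar name
    (root / "hello-world").mkdir()
    (root / "hello-world" / "config.json").write_text("{}")
    (root / "hello-world" / "transformer.json").write_text("{}")
    (root / "incomplete").mkdir()  # missing transformer.json, so it is skipped
    (root / "incomplete" / "config.json").write_text("{}")
    found = find_exporter_configs(root)

print(found)
```

Plain files such as the jar itself are ignored, and a directory with only one of the two files is not treated as an exporter.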

@ErykKul
Contributor Author

ErykKul commented Jun 25, 2024

I decided to add XSLT support. I will look at it later this week. It looks trivial to add, so it should not take long.

@ErykKul
Contributor Author

ErykKul commented Jun 26, 2024

XSLT support added.

@ErykKul
Contributor Author

ErykKul commented Jul 9, 2024

I have extensively tested this exporter and added many examples: HTML, DDI-PDF, CSV-RO-Crate, ARP-RO-Crate, etc. I think it is ready to be released under the gdcc namespace and this mention-PR can be merged. @poikilotherm, what do you think? Can you add the sonatype-maven-central credentials to the exporter-transformer repo? Thanks!

@pdurbin
Member

pdurbin commented Jul 9, 2024

> Can you add the sonatype-maven-central credentials to the exporter-transformer repo?

@ErykKul they are already defined at the org level so you should be able to use them since this repo is under @gdcc.

@ErykKul
Contributor Author

ErykKul commented Jul 9, 2024

I will try to get it working. I see that a test fails in the publish job; I will fix that first.

@ErykKul
Contributor Author

ErykKul commented Jul 9, 2024

It did not work. I did fix the test, but publishing to Maven Central fails (signing too). I think that my repo needs to be granted access to the organization secrets first.

@qqmyers
Member

qqmyers commented Jul 9, 2024

done - your repo should be able to use the secrets now

@pdurbin
Member

pdurbin commented Jul 9, 2024

Oh, right, sorry about that. Thanks for giving access, Jim.

@ErykKul
Contributor Author

ErykKul commented Jul 9, 2024

Thanks! It worked: I made a release and the jar is on Maven Central.

Member

@pdurbin pdurbin left a comment

For context, I'm working on this issue...

... and I just did some minimal testing of the Transformer exporter and it seems to be working great. I'll go ahead and merge this.

@pdurbin pdurbin merged commit 1280828 into gdcc:main Aug 23, 2024
1 check passed