Rserve integration #120

Open
poikilotherm opened this issue Nov 4, 2019 · 15 comments
Labels
integration Everything regarding a Dataverse integration

Comments

@poikilotherm
Member

poikilotherm commented Nov 4, 2019

Some ingest functionality does not work without an Rserve server.

Looks like https://github.com/ubc/r-docker is a trustworthy image, coming from University of British Columbia.

Maybe open an issue over there asking what their plans are on supporting and pushing updated images to Docker Hub: https://hub.docker.com/r/ubcctlt/rserve

@poikilotherm added the "integration" label Nov 4, 2019
@4tikhonov
Collaborator

We've integrated Rserve into the Dataverse Docker module. I don't know whether you want to host a separate Docker image for it:
IQSS/dataverse-docker@973cc9c

@poikilotherm
Member Author

IMHO this should be kept separate. I believe in the UNIX philosophy of "do one thing, do it well". This gives more flexibility to people who might want to run their own services, use special flavors, install a particular set of packages, ...

@4tikhonov
Collaborator

Ok, you should contact people from Rserve then.

@pdurbin
Member

pdurbin commented Nov 4, 2019

If it helps, I've been happily using Rserve on Dataverse spun up by dataverse-ansible since @donsizemore implemented it over the summer: IQSS/dataverse-ansible#87

Data Explorer didn't work properly without it. It takes time to compile all the R modules so I sometimes comment it out if I don't need the functionality.

@poikilotherm
Member Author

@donsizemore
Member

> It takes time to compile all the R modules so I sometimes comment it out if I don't need the functionality.

@pdurbin you may also set rserve.install to false =) the role will still place rserve.host et al. in domain.xml to talk to an external R service.
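For anyone pointing rserve.host at an external R service, a quick reachability check can help before debugging ingest. The sketch below is only an illustration: the function names are made up for this example, and it assumes the service listens on Rserve's default port 6311 and greets clients with an ID string that starts with "Rsrv". It does not authenticate or run any R code.

```python
import socket

RSERVE_DEFAULT_PORT = 6311  # Rserve's default TCP port; adjust if yours differs


def looks_like_rserve(banner: bytes) -> bool:
    """Return True if the greeting bytes start with the Rserve ID marker."""
    return banner[:4] == b"Rsrv"


def rserve_reachable(host: str, port: int = RSERVE_DEFAULT_PORT,
                     timeout: float = 3.0) -> bool:
    """Open a TCP connection and check the first bytes the server sends.

    Hypothetical helper for illustration; returns False on any network error.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            return looks_like_rserve(conn.recv(32))
    except OSError:
        return False
```

Calling `rserve_reachable("rserve.example.org")` (a placeholder hostname) would return False if nothing answers or if the listener is not Rserve.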

@4tikhonov
Collaborator

@donsizemore, at the same time, it's not really sustainable for Dataverse to rely on an external R service for data processing.

@pdurbin
Member

pdurbin commented Nov 4, 2019

On a related note, we've considered splitting the "ingest" service out of the Dataverse monolith and into its own microservice: IQSS/dataverse#2331

Not all installations of Dataverse want ingest (I'm thinking of Pete's structural biology datasets) but I suspect most do. 😄

@donsizemore
Member

@4tikhonov note that Akio's TRSA branch https://github.com/OdumInstitute/trsa-web/tree/jee8line carves ingest out of Dataverse proper and at present makes it optional to the end user. What would you prefer Dataverse use in addition to or instead of R?

@poikilotherm
Member Author

poikilotherm commented Nov 4, 2019

I'd really love to discuss this matter in more depth, but I'm pretty sure this is beyond the scope of this issue.

Maybe some of you guys can open an issue at IQSS/dataverse, so it reaches even more people interested in ingest?

@raprasad

raprasad commented Nov 6, 2019

@pdurbin : Regarding the R script that runs on Rserve and produces metadata summaries:

  • We now have an updated version that is a Python library, which removes the R dependency.
  • @aaron-lebo who works with @vjdorazio has done a lot of work with it--including analyzing all of the tabular files in the Journal of Politics Dataverse.
  • It is available as a PyPI package.
  • Documentation on the JSON output is here: https://tworavens.github.io/TwoRavens/Metadata/
  • We're happy to provide more info on it and invite input on adding useful documentation
    • We wrapped it in a web service a while ago (Django/Celery), but for Dataverse purposes, this could be greatly simplified--a basic endpoint with Flask or something similar in a Docker container

  • Regarding using the output of data as a drop-in replacement for the current Dataverse R script--the JSON has additional data and a slightly different structure--if there's interest, we can include an output flag/function, etc. that outputs the older version.

cc/ @tercer
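To make the "basic endpoint" idea concrete, here is a rough sketch of what such a service could look like. It is not the TwoRavens library or its API: `summarize_csv` is a hypothetical stand-in that only counts values per column, and the standard library's `http.server` is used in place of Flask so the example has no dependencies.

```python
import csv
import io
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_csv(text: str) -> dict:
    """Hypothetical stand-in for the metadata library: per-column counts only."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    variables = {}
    for name in columns:
        values = [row[name] for row in rows]
        present = [v for v in values if v not in ("", None)]
        variables[name] = {
            "valid": len(present),
            "invalid": len(values) - len(present),
            "unique": len(set(present)),
        }
    return {"rows": len(rows), "variables": variables}


class SummaryHandler(BaseHTTPRequestHandler):
    """Accepts a CSV body via POST and answers with a JSON summary."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        payload = json.dumps(summarize_csv(body)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Run the endpoint until interrupted (port 8000 is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), SummaryHandler).serve_forever()
```

Running `serve()` inside a container and POSTing a tabular file to it would return the summary as JSON; a real service would call the actual metadata library where `summarize_csv` is used here.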

@4tikhonov
Collaborator

4tikhonov commented Nov 6, 2019

@raprasad, I really like this solution as a Python microservice. Not because we're "at home" with Python, but because it can be more sustainable in the long term.

@donsizemore
Member

@raprasad wonderful news! Go @aaron-lebo go!

@pdurbin
Member

pdurbin commented Nov 6, 2019

> a slightly different structure

@raprasad is the JSON emitted from your new Python code backward compatible with the JSON emitted from the old/current R code? If not, would it be possible to make it backward compatible? I don't want Data Explorer (my main reason for wanting this JSON) to break if we switch to backward-incompatible JSON produced by new code.

Now that we (finally) have API tests running automatically on "develop" and pull requests (on https://jenkins.dataverse.org thanks to the absolutely heroic efforts of @donsizemore !!! 🎉 🎉 🎉 ), we could start to make assertions on the old/current JSON format so that any backward incompatibilities would be detected. Writing those assertions might be a good first small chunk. If someone wants to create an issue about this at https://github.com/IQSS/dataverse/issues please go ahead! 😄
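As a rough idea of what such assertions could look like, the sketch below compares the structure of two JSON documents and reports every key path that exists in the old output but is missing from the new one. It is an assumption-laden illustration: `missing_paths` is a hypothetical helper, the keys in the example are invented, and it checks structure only, not values.

```python
def missing_paths(old, new, prefix=""):
    """List key paths found in `old` that are absent from `new`.

    Also flags a path when a dict or list in `old` became another type in `new`.
    Extra keys in `new` are allowed, since additions do not break old consumers.
    """
    problems = []
    if isinstance(old, dict):
        if not isinstance(new, dict):
            return [prefix or "<root>"]
        for key, old_value in old.items():
            path = f"{prefix}.{key}" if prefix else key
            if key not in new:
                problems.append(path)
            else:
                problems.extend(missing_paths(old_value, new[key], path))
    elif isinstance(old, list):
        if not isinstance(new, list):
            return [prefix or "<root>"]
        if old and new:
            # Compare the first elements as representatives of the list shape.
            problems.extend(missing_paths(old[0], new[0], prefix + "[]"))
    return problems


# Invented sample documents, not real Dataverse or TwoRavens output.
old_json = {"variables": {"age": {"mean": 41.2, "median": 40.0}}}
new_json = {"variables": {"age": {"mean": 41.2}}, "dataset": {}}

print(missing_paths(old_json, new_json))
```

An API test could then assert that the returned list is empty for a known sample file, so any dropped field would fail the build.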

@raprasad

raprasad commented Nov 8, 2019

@pdurbin We will add backward compatibility to the library. Please add any comments that may be relevant: TwoRavens/raven-metadata-service#205
