
direct upload to s3 store using Dataverse directupload api #136

Open
jmjamison opened this issue Apr 20, 2021 · 10 comments
Labels
pkg:api api related activities prio:high status:confirmed Is a valid issue and will be moved forward soon. type:bug Something isn't working

Comments

@jmjamison

I have been working with the direct upload API (https://guides.dataverse.org/en/5.4/developers/s3-direct-upload-api.html).
It's done in two passes: the first puts the file into temporary S3 storage, and the second adds it to the dataset. As soon as I have a workable script, I'll send it over.
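A minimal sketch of the JSON that the second pass sends, assuming the field names shown in the Dataverse 5.4 direct-upload guide (`storageIdentifier`, `fileName`, `mimeType`, `checksum`). The helper names `sha1_of_file` and `build_add_json` are hypothetical, and SHA-1 is assumed as the checksum type because that is what the guide's example uses; an installation configured for a different algorithm may expect that one instead.

```python
import hashlib
import json
import mimetypes
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """Hex SHA-1 digest of a file, read in chunks so large files are not loaded at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_add_json(path, storage_identifier, description=""):
    """Hypothetical helper: JSON describing a file that is already in S3.

    storage_identifier is the value returned by the first (uploadurls) pass.
    """
    mime_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    data = {
        "description": description,
        "storageIdentifier": storage_identifier,
        "fileName": os.path.basename(path),
        "mimeType": mime_type,
        # Assumption: SHA-1, as in the guide's example payload.
        "checksum": {"@type": "SHA-1", "@value": sha1_of_file(path)},
    }
    return json.dumps(data)
```

As I read the guide, this string is sent as the `jsonData` form field of a multipart POST to `/api/datasets/:persistentId/add?persistentId=...`, authenticated with the `X-Dataverse-key` header.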
I'm a bit confused about the POST request. The pyDataverse documentation shows:
def post_request(self, url, data=None, auth=False, params=None, files=None):
"""Make a POST request.
But if I set auth=True (because I'm using an API key), I get this error:
TypeError: 'bool' object is not callable
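One likely explanation, not confirmed in this thread: the error matches what `requests` does when `auth=True` reaches it directly. `requests` expects `auth` to be a `(user, password)` tuple or a callable auth object and calls it, so a bare `True` fails with `'bool' object is not callable`. Dataverse instead reads the API token from the `X-Dataverse-key` header. A minimal stdlib sketch (the function name `dataverse_request` is hypothetical):

```python
import urllib.request


def dataverse_request(url, api_token, method="GET", data=None):
    """Build (without sending) a request authenticated by a Dataverse API token.

    The token travels in the X-Dataverse-key header; no `auth` argument is used.
    With requests, the equivalent is:
        requests.get(url, headers={"X-Dataverse-key": api_token})
    """
    headers = {"X-Dataverse-key": api_token}
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

Sending it is then `urllib.request.urlopen(req)`; keeping construction separate from sending makes the headers easy to inspect before anything goes over the network.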

I checked my server log and found this:
#|2021-04-19T19:43:25.360+0000|SEVERE|Payara 5.2020.6|javax.enterprise.web.core|_ThreadID=66;_ThreadName=http-thread-pool::http-listener-1(3);_TimeMillis=1618861405360;_LevelValue=1000;_MessageID=AS-WEB-CORE-00037;|
An exception or error occurred in the container during the request processing
java.lang.Exception: Host is not set
at org.glassfish.grizzly.http.server.util.Mapper.map(Mapper.java:865)
at org.apache.catalina.connector.CoyoteAdapter.postParseRequest(CoyoteAdapter.java:496)
at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:309)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:238)
at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:520)
at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:217)
at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:182)
at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:156)
at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:218)
at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:95)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:260)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:177)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:109)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:88)
at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:53)
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:524)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:89)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:94)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:33)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:114)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549)
at java.lang.Thread.run(Thread.java:748)
|#]

Jamie Jamison
UCLA Dataverse
jamison@library.ucla.edu

@jmjamison jmjamison added the status:incoming Newly created issue to be forwarded label Apr 20, 2021
@skasberger skasberger self-assigned this Apr 22, 2021
@skasberger skasberger added pkg:api api related activities prio:high status:confirmed Is a valid issue and will be moved forward soon. type:bug Something isn't working and removed status:incoming Newly created issue to be forwarded labels Apr 22, 2021
@skasberger skasberger added this to the v0.4.0 milestone Apr 22, 2021
@skasberger
Member

@jmjamison Which pyDataverse and Dataverse versions are you working on? And can you also share the code executed for the POST request?

@jmjamison
Author

Dataverse: 5.3 build 286-fcb5ce7
pyDataverse: 0.3.1

@jmjamison
Author

import requests
from pyDataverse.api import NativeApi

# dataverse_server and api_key were set earlier
api = NativeApi(dataverse_server, api_key)
resp = api.get_info_version()
resp.json()

{'status': 'OK', 'data': {'version': '5.3', 'build': '286-fcb5ce7'}}

# url_persistent_id was set earlier; the URL actually requested is echoed in requestUrl below
resp = requests.put(url_persistent_id, data=None, params=None, auth=(), files=None)
resp.json()

{'status': 'ERROR',
'code': 405,
'message': 'API endpoint does not support this method. Consult our API guide at http://guides.dataverse.org.',
'requestUrl': 'https://dataverse.ucla.edu/api/v1/datasets/:persistentId/uploadurls?persistentId=doi:10.25346/S6/T4LHZF&size=10000000',
'requestMethod': 'PUT'}
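Both attempts return 405 rather than 404, which suggests the `uploadurls` endpoint exists but does not accept PUT or POST; the 5.4 direct-upload guide shows it being called with GET. A minimal stdlib sketch of that first-pass request (the function name `upload_urls_request` is hypothetical; `urlencode` also percent-encodes the `:` and `/` in the DOI, which the server decodes):

```python
import urllib.parse
import urllib.request


def upload_urls_request(server_url, persistent_id, size, api_token):
    """First pass: ask Dataverse for pre-signed S3 upload URL(s), using GET."""
    query = urllib.parse.urlencode({"persistentId": persistent_id, "size": str(size)})
    url = f"{server_url.rstrip('/')}/api/datasets/:persistentId/uploadurls?{query}"
    return urllib.request.Request(
        url, headers={"X-Dataverse-key": api_token}, method="GET"
    )
```

With requests, the same call would be `requests.get(url, params={"persistentId": pid, "size": size}, headers={"X-Dataverse-key": api_key})`. According to the guide, the JSON reply carries either a single `url` or, for large files, a `urls` map, plus `partSize` and `storageIdentifier`.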

Also tried:
url_persistent_id = '%s/api/datasets/:persistentId/uploadurls?persistentId=%s&size=%s' % (dataverse_server, persistentId, str(size))
r = requests.post(url_persistent_id,
    # pass the token value itself: "$API_TOKEN" is shell syntax and is not expanded in Python
    headers={"X-Dataverse-key": api_key},
)
r.json()

{'status': 'ERROR',
'code': 405,
'message': 'API endpoint does not support this method. Consult our API guide at http://guides.dataverse.org.',
'requestUrl': 'https://dataverse.ucla.edu/api/v1/datasets/:persistentId/uploadurls?persistentId=doi:10.25346/S6/T4LHZF&size=10000000',
'requestMethod': 'POST'}
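In the guide's flow, the only PUT goes to the pre-signed S3 URL returned by the GET, not to a Dataverse endpoint. A sketch of that step for the single-part case, assuming the response layout described in the 5.4 guide and its `x-amz-tagging: dv-state=temp` header (as I understand it, this marks the object as temporary until the second pass registers it, and may need to be omitted on stores configured without tagging). `s3_put_request` is a hypothetical helper:

```python
import json
import urllib.request


def s3_put_request(upload_urls_json, file_bytes):
    """Build the S3 PUT for a single-part direct upload.

    upload_urls_json is the body returned by the uploadurls GET. Returns the
    request and the storageIdentifier needed for the second pass.
    """
    data = json.loads(upload_urls_json)["data"]
    if "url" not in data:
        raise ValueError("multipart upload (data['urls']) is not handled in this sketch")
    # No Dataverse token here: the pre-signed URL carries its own authorization.
    req = urllib.request.Request(data["url"], data=file_bytes, method="PUT")
    req.add_header("x-amz-tagging", "dv-state=temp")
    return req, data["storageIdentifier"]
```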

@jmjamison
Author

Is there anything else I should add?

@skasberger
Member

@jmjamison Is this still an issue? I am on parental leave until May 2022, so my time for pyDataverse is very, very limited.

@jmjamison
Author

Apologies, I didn't realize you were on parental leave. The issue still exists, but I can use other methods for direct uploads. Enjoy the time with your youngster.

@skasberger
Member

Update: I left AUSSDA, so my funding for pyDataverse development has stopped.

I want to get some basic funding to implement the most urgent updates (PRs, bug fixes, maintenance work). If you can support this, please reach out to me (www.stefankasberger.at). The same goes for feature requests.

Another option would be for someone else to help with development and/or maintenance. If you are interested, please also get in touch with me (or comment here).

@qqmyers
Member

qqmyers commented Oct 28, 2022

FWIW: there was some recent work on Python support for direct upload in IQSS/dataverse.harvard.edu#194 (not multipart yet, and not associated with pyDataverse), but it is possibly useful and possibly something to mine for pyDataverse.
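For the multipart case, the guide describes an uploadurls response with a `urls` map keyed by part number plus a `partSize`. A sketch of splitting a file accordingly, assuming part *n* covers bytes `[(n-1)*partSize, n*partSize)`; `plan_upload` is hypothetical, and the follow-up calls described in the guide (as I understand it, sending the collected ETags to the `complete` URL, or calling `abort` on failure) are not shown.

```python
def plan_upload(upload_data, file_size):
    """Return (part_number, url, start, end) tuples for a direct upload.

    upload_data is the "data" object from the uploadurls response: a single
    "url" for one PUT, or a "urls" dict keyed by part number for multipart.
    Byte ranges are half-open: [start, end).
    """
    if "url" in upload_data:
        return [(1, upload_data["url"], 0, file_size)]
    part_size = int(upload_data["partSize"])
    plan = []
    # Part numbers arrive as string keys; sort them numerically.
    for key in sorted(upload_data["urls"], key=int):
        number = int(key)
        start = (number - 1) * part_size
        end = min(start + part_size, file_size)
        plan.append((number, upload_data["urls"][key], start, end))
    return plan
```

Each tuple can then drive one PUT of `file[start:end]` to its URL, which keeps the single-part and multipart paths in one loop.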

@pdurbin
Member

pdurbin commented Feb 14, 2024

As discussed during the 2024-02-14 meeting of the pyDataverse working group, we are closing old milestones in favor of a new project board at https://github.com/orgs/gdcc/projects/1 and removing issues (like this one) from those old milestones. Please feel free to join the working group! You can find us at https://py.gdcc.io and https://dataverse.zulipchat.com/#narrow/stream/377090-python

@pdurbin pdurbin removed this from the v0.4.0 milestone Feb 14, 2024
@pdurbin
Member

pdurbin commented Feb 14, 2024

Projects
None yet
Development

No branches or pull requests

4 participants