
Releases: J535D165/datahugger

v0.12

27 Jun 12:15
a203837

What's Changed

  • Fix issue with new Dryad REST API format for downloading files #75 by @micafer in #76
  • Remove pandas dependency and fix the checksum argument on the CLI by @J535D165 in #79
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #74
  • Add support for providers in OSF: #69 by @micafer in #77
  • Fix unused, overwritten line of code by @J535D165 in #82
  • Change GitHub Actions to run workflow job per service by @J535D165 in #83

Full Changelog: v0.11...v0.12

Coverage report

The following benchmark was applied to 1000 randomly selected records from DataCite.

Percentages

Percentage of datasets supported: 26.7%
Percentage of datasets not supported: 69.1%
Percentage of datasets with error: 4.2%
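
The three percentages partition the benchmark sample, so they should sum to 100%. As a minimal sketch of that arithmetic: the counts below are back-calculated from the reported percentages and the sample size of 1000, not taken from the benchmark output, and the function name `outcome_percentages` is hypothetical.

```python
def outcome_percentages(counts):
    """Convert per-outcome record counts into percentages, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Assumed counts: derived from 26.7%, 69.1% and 4.2% of 1000 records.
counts = {"supported": 267, "not supported": 691, "error": 42}
percentages = outcome_percentages(counts)

for name, value in percentages.items():
    print(f"Percentage of datasets {name}: {value}%")
```

Rounding each share separately can make the printed values sum to slightly more or less than 100 for other counts, so a small deviation in a report is not necessarily a bug.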

Table with unexpected errors

id type url service error
47 10.58100/ibcr0302rx67ws2 dois http://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=21038&SAM=IBCR0302RX67WS2 nan 503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=21038&SAM=IBCR0302RX67WS2
52 10.18730/v7c2= dois https://glis.fao.org/glis/doi/10.18730/V7C2= nan '10.18730/v7c2=' is not a correct resource identifier (e.g. a URL, DOI, Handle)
73 10.20345/digitue.1029.61 dois http://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141 nan 500 Server Error: Internal Server Error for url: https://opendigi.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141
90 10.6068/dp15afea413954 dois http://statisticaldatasets.data-planet.com/dataplanet/Datasheet_DOI_Servlet?ID=15afea413954&type=gwtdatasheet&version=1 nan HTTPConnectionPool(host='statisticaldatasets.data-planet.com', port=80): Max retries exceeded with url: /dataplanet/Datasheet_DOI_Servlet?ID=15afea413954&type=gwtdatasheet&version=1 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f369f730fb0>, 'Connection to statisticaldatasets.data-planet.com timed out. (connect timeout=3)'))
96 10.17876/plate/dr.2/plates/201_33742 dois https://www.plate-archive.org/objects/dr.2/plates/201_33742 nan 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/plates/201_33742/
119 10.18430/m3.irrmc.4168 dois https://proteindiffraction.org/project/SETDB1-x122 nan 'NoneType' object has no attribute 'find'
129 10.58100/ibcr0310rxocku2 dois http://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=22618&SAM=IBCR0310RXOCKU2 nan 503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=22618&SAM=IBCR0310RXOCKU2
133 10.14469/ch/8676 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/to-8701 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/to-8701 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f3693833500>, 'Connection to spectradspace.lib.imperial.ac.uk timed out. (connect timeout=3)'))
136 10.17614/q4h70857g dois http://pqr.pitt.edu/mol/KFKSYDSVYUWMHK-UHFFFAOYSA-N nan HTTPConnectionPool(host='pqr.pitt.edu', port=80): Max retries exceeded with url: /mol/KFKSYDSVYUWMHK-UHFFFAOYSA-N (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f369fbb6b40>, 'Connection to pqr.pitt.edu timed out. (connect timeout=3)'))
156 10.6068/dp14ba7fada6a81 dois http://statisticaldatasets.data-planet.com/dataplanet/Datasheet_DOI_Servlet?ID=14ba7fada6a81&type=datasheet&version=1 nan HTTPConnectionPool(host='statisticaldatasets.data-planet.com', port=80): Max retries exceeded with url: /dataplanet/Datasheet_DOI_Servlet?ID=14ba7fada6a81&type=datasheet&version=1 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f36938bb440>, 'Connection to statisticaldatasets.data-planet.com timed out. (connect timeout=3)'))
241 10.4233/uuid:51dde3f6-2a38-47a0-b719-420ff74ded5d dois http://resolver.tudelft.nl/uuid:51dde3f6-2a38-47a0-b719-420ff74ded5d nan HTTPSConnectionPool(host='repository.tudelft.nl', port=443): Read timed out. (read timeout=10)
256 10.17171/1-8-2854 dois http://repository.edition-topoi.org/collection/ICG/object/3675 nan HTTPConnectionPool(host='repository.edition-topoi.org', port=80): Max retries exceeded with url: /collection/ICG/object/3675 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f36938b9e80>, 'Connection to repository.edition-topoi.org timed out. (connect timeout=3)'))
259 10.6068/dp15e784c851034 dois http://statisticaldatasets.data-planet.com/dataplanet/Datasheet_DOI_Servlet?ID=15e784c851034&type=datasheet&version=1 nan HTTPConnectionPool(host='statisticaldatasets.data-planet.com', port=80): Max retries exceeded with url: /dataplanet/Datasheet_DOI_Servlet?ID=15e784c851034&type=datasheet&version=1 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f369fcc8050>, 'Connection to statisticaldatasets.data-planet.com timed out. (connect timeout=3)'))
267 10.17614/q4td9p06n dois http://pqr.pitt.edu/mol/HJQMFSDWWCLFTC-TWJUVVLDSA-N nan HTTPConnectionPool(host='pqr.pitt.edu', port=80): Ma...

v0.11

26 Mar 22:15
4935c64

What's Changed

  • Implement checksum checking as a feature of datahugger by @davetromp in #72
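
For readers unfamiliar with checksum checking, the general idea is to hash a downloaded file and compare the digest with the one published by the repository. The following is a minimal sketch using only the Python standard library. It is not datahugger's implementation; the function names `file_checksum` and `verify_checksum` and the choice of SHA-256 as the default are assumptions made for illustration.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected, algorithm="sha256"):
    """Return True if the file's digest matches the expected hex string."""
    return file_checksum(path, algorithm) == expected.strip().lower()
```

A caller would pass the path of a downloaded file together with the digest and algorithm name reported by the data repository, and treat a `False` result as a corrupted or incomplete download.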

Full Changelog: v0.10.4...v0.11

Coverage report

The following benchmark was applied to 1000 randomly selected records from DataCite.

Percentages

Percentage of datasets supported: 26.9%
Percentage of datasets not supported: 69.4%
Percentage of datasets with error: 3.7%

Table with unexpected errors

id type url service error
45 10.17188/1264410 dois http://www.osti.gov/servlets/purl/1264410/ nan HTTPSConnectionPool(host='www.osti.gov', port=443): Max retries exceeded with url: /servlets/purl/1264410/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7faf41f9cad0>, 'Connection to www.osti.gov timed out. (connect timeout=3)'))
47 10.58100/ibcr0302rx67ws2 dois http://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=21038&SAM=IBCR0302RX67WS2 nan 503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=21038&SAM=IBCR0302RX67WS2
52 10.18730/v7c2= dois https://glis.fao.org/glis/doi/10.18730/V7C2= nan '10.18730/v7c2=' is not a correct resource identifier (e.g. a URL, DOI, Handle)
60 10.3929/ethz-a-010147993 dois http://hdl.handle.net/20.500.11850/83547 nan 429 Client Error: Too Many Requests for url: https://www.research-collection.ethz.ch/handle/20.500.11850/83547
73 10.20345/digitue.1029.61 dois http://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141 nan 500 Server Error: Internal Server Error for url: https://opendigi.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141
96 10.17876/plate/dr.2/plates/201_33742 dois https://www.plate-archive.org/objects/dr.2/plates/201_33742 nan 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/plates/201_33742/
119 10.18430/m3.irrmc.4168 dois https://proteindiffraction.org/project/SETDB1-x122 nan 'NoneType' object has no attribute 'find'
129 10.58100/ibcr0310rxocku2 dois http://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=22618&SAM=IBCR0310RXOCKU2 nan 503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=22618&SAM=IBCR0310RXOCKU2
133 10.14469/ch/8676 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/to-8701 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Read timed out. (read timeout=3)
252 10.15781/t2xs4w dois https://repositories.lib.utexas.edu/handle/2152/31647 nan HTTPSConnectionPool(host='repositories.lib.utexas.edu', port=443): Max retries exceeded with url: /handle/2152/31647 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7faf41d4a3c0>, 'Connection to repositories.lib.utexas.edu timed out. (connect timeout=3)'))
362 10.14469/ch/1303 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/to-1328 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Read timed out. (read timeout=3)
367 10.18720/spbpu/2/v18-6126 dois http://elib.spbstu.ru/dl/2/v18-6126.pdf nan HTTPSConnectionPool(host='elib.spbstu.ru', port=443): Read timed out. (read timeout=10)
383 10.14456/scitechasia.2022.12 dois http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14456/scitechasia.2022.12 nan HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Max retries exceeded with url: /?page=resolve_doi&resolve_doi=10.14456/scitechasia.2022.12 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
397 10.17876/plate/dr.2/envelopes/201_50873 dois https://www.plate-archive.org/objects/dr.2/envelopes/201_50873 nan 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/envelopes/201_50873/
400 10.23725/akhp-6959 dois https://ors.datacite.org/doi:/10.23725/akhp-6959 nan HTTPSConnectionPool(host='ors.datacite.org', port=443): Max retries exceeded with url: /doi:/10.23725/akhp-6959 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7faf4e527d10>: Failed to resolve 'ors.datacite.org' ([Errno -2] Name or service not known)"))
403 10.58100/ibcr0381exz5001 dois http://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=26882&SAM=IBCR0381EXZ5001 nan 503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=26882&SAM=IBCR0381EXZ5001
434 10.58100/ibcr0364exxoa01 dois http://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=26567&SAM=IBCR0364EXXOA01 nan 503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=26567&SAM=IBCR0364EXXOA01
452 10.14469/ch/129258 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/134211 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Read timed out. (read timeout=3) ...

v0.10.4

30 Oct 09:36
b38ee94

What's Changed

  • Fix Zenodo service after Zenodo update by @J535D165 in #70

Full Changelog: v0.10.3...v0.10.4

Coverage report

The following benchmark was applied to 1000 randomly selected records from DataCite.

Percentages

Percentage of datasets supported: 27.5%
Percentage of datasets not supported: 68.9%
Percentage of datasets with error: 3.6%

Table with unexpected errors

id type url service error
9 10.48448/kgfs-s492 dois https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus nan 500 Server Error: Internal Server Error for url: https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus
45 10.17188/1264410 dois http://www.osti.gov/servlets/purl/1264410/ nan HTTPSConnectionPool(host='www.osti.gov', port=443): Max retries exceeded with url: /servlets/purl/1264410/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f2d1f08a720>, 'Connection to www.osti.gov timed out. (connect timeout=3)'))
52 10.18730/v7c2= dois https://glis.fao.org/glis/doi/10.18730/V7C2= nan '10.18730/v7c2=' is not a correct resource identifier (e.g. a URL, DOI, Handle)
60 10.3929/ethz-a-010147993 dois http://hdl.handle.net/20.500.11850/83547 nan 429 Client Error: Too Many Requests for url: https://www.research-collection.ethz.ch/handle/20.500.11850/83547
73 10.20345/digitue.1029.61 dois http://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141 nan 500 Server Error: Internal Server Error for url: https://opendigi.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141
81 10.7916/d8-qcx3-yp94 dois https://dlc.library.columbia.edu/resolve/10.7916/d8-qcx3-yp94 nan 500 Server Error: Internal Server Error for url: https://dlc.library.columbia.edu/catalog/10.7916/d8-qcx3-yp94
96 10.17876/plate/dr.2/plates/201_33742 dois https://www.plate-archive.org/objects/dr.2/plates/201_33742 nan 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/plates/201_33742/
119 10.18430/m3.irrmc.4168 dois https://proteindiffraction.org/project/SETDB1-x122 nan 'NoneType' object has no attribute 'find'
133 10.14469/ch/8676 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/to-8701 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/to-8701 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1000)')))
329 10.57907/mirri/c1lede dois http://147.156.5.176:8080/citation?persistentId=doi:10.57907/MIRRI/C1LEDE nan HTTPConnectionPool(host='147.156.5.176', port=8080): Max retries exceeded with url: /citation?persistentId=doi:10.57907/MIRRI/C1LEDE (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2d1f1da090>, 'Connection to 147.156.5.176 timed out. (connect timeout=3)'))
362 10.14469/ch/1303 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/to-1328 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/to-1328 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1000)')))
383 10.14456/scitechasia.2022.12 dois http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14456/scitechasia.2022.12 nan HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Read timed out. (read timeout=10)
397 10.17876/plate/dr.2/envelopes/201_50873 dois https://www.plate-archive.org/objects/dr.2/envelopes/201_50873 nan 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/envelopes/201_50873/
400 10.23725/akhp-6959 dois https://ors.datacite.org/doi:/10.23725/akhp-6959 nan HTTPSConnectionPool(host='ors.datacite.org', port=443): Max retries exceeded with url: /doi:/10.23725/akhp-6959 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f2d1e51bd10>: Failed to resolve 'ors.datacite.org' ([Errno -2] Name or service not known)"))
452 10.14469/ch/129258 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/134211 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/134211 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1000)')))
458 10.14469/ch/41814 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/48213 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/48213 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1000)')))
464 10.3929/ethz-b-000581366 dois http://hdl.handle.net/20.500.11850/581366 nan 429 Client Error: Too Many Requests for url: https://www.research-collection.ethz.ch/handle/20.500.11850/581366
483 10.18730/12n7m$ dois https://glis.fao.org/glis/doi/10.18730/12N7M$ nan '10.18730/12n7m$' is not a correct resource identifier (e.g. a URL, DOI, Handle)
496 10.14457/cmu.the.2009.132 dois http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/CMU.the.2009.132 nan HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Read timed out. (read timeout=10) ...

v0.10.3

22 Sep 14:07
fe3908b

What's Changed

  • Update the benchmark dataset and run a larger benchmark.

Coverage report

The following benchmark was applied to 1000 randomly selected records from DataCite.

Percentages

Percentage of datasets supported: 27.4%
Percentage of datasets not supported: 69.9%
Percentage of datasets with error: 2.7%

Table with unexpected errors

id type url service error
9 10.48448/kgfs-s492 dois https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus nan 500 Server Error: Internal Server Error for url: https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus
52 10.18730/v7c2= dois https://glis.fao.org/glis/doi/10.18730/V7C2= nan '10.18730/v7c2=' is not a correct resource identifier (e.g. a URL, DOI, Handle)
73 10.20345/digitue.1029.61 dois http://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141 nan 500 Server Error: Internal Server Error for url: https://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141
81 10.7916/d8-qcx3-yp94 dois https://dlc.library.columbia.edu/resolve/10.7916/d8-qcx3-yp94 nan 500 Server Error: Internal Server Error for url: https://dlc.library.columbia.edu/catalog/10.7916/d8-qcx3-yp94
96 10.17876/plate/dr.2/plates/201_33742 dois https://www.plate-archive.org/objects/dr.2/plates/201_33742 nan 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/plates/201_33742/
119 10.18430/m3.irrmc.4168 dois https://proteindiffraction.org/project/SETDB1-x122 nan 'NoneType' object has no attribute 'find'
133 10.14469/ch/8676 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/to-8701 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/to-8701 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
201 10.17188/1652700 dois https://www.osti.gov/servlets/purl/1652700/ nan HTTPSConnectionPool(host='www.osti.gov', port=443): Max retries exceeded with url: /servlets/purl/1652700/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f14556469d0>, 'Connection to www.osti.gov timed out. (connect timeout=3)'))
362 10.14469/ch/1303 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/to-1328 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/to-1328 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
373 10.25716/hfmdk-505 dois https://hfmdk.hebis.de/jspui/handle/123456789/507 nan HTTPSConnectionPool(host='hfmdk.hebis.de', port=443): Read timed out. (read timeout=10)
397 10.17876/plate/dr.2/envelopes/201_50873 dois https://www.plate-archive.org/objects/dr.2/envelopes/201_50873 nan 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/envelopes/201_50873/
400 10.23725/akhp-6959 dois https://ors.datacite.org/doi:/10.23725/akhp-6959 nan HTTPSConnectionPool(host='ors.datacite.org', port=443): Max retries exceeded with url: /doi:/10.23725/akhp-6959 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f14556577d0>: Failed to resolve 'ors.datacite.org' ([Errno -2] Name or service not known)"))
452 10.14469/ch/129258 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/134211 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/134211 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
458 10.14469/ch/41814 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/48213 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/48213 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
483 10.18730/12n7m$ dois https://glis.fao.org/glis/doi/10.18730/12N7M$ nan '10.18730/12n7m$' is not a correct resource identifier (e.g. a URL, DOI, Handle)
501 10.14456/stj.2019.4 dois http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14456/stj.2019.4 nan HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Read timed out. (read timeout=10)
505 10.14469/ch/175982 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/180406 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/180406 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
551 10.17876/plate/dr.2/plates/201_35722 dois https://www.plate-archive.org/objects/dr.2/plates/201_35722 nan 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/plates/201_35722/
581 10.48550/arxiv.2309.02963 dois https://arxiv.org/abs/2309.02963 nan HTTPSConnectionPool(host='arxiv.org', port=443): Read timed out. (read timeout=10) ...

v0.10.2

22 Sep 14:04
865d400
Update PyPI badge link

v0.10.1

22 Sep 13:42
865d400

Coverage report

The following benchmark was applied to 500 randomly selected records from DataCite.

Percentages

Percentage of datasets supported: 25.4%
Percentage of datasets not supported: 70.0%
Percentage of datasets with error: 4.6%

Table with unexpected errors

id type url service error
9 10.48448/kgfs-s492 dois https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus nan 500 Server Error: Internal Server Error for url: https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus
44 10.48550/arxiv.2011.14822 dois https://arxiv.org/abs/2011.14822 nan HTTPSConnectionPool(host='arxiv.org', port=443): Read timed out. (read timeout=3)
52 10.18730/v7c2= dois https://glis.fao.org/glis/doi/10.18730/V7C2= nan '10.18730/v7c2=' is not a correct resource identifier (e.g. a URL, DOI, Handle)
56 10.48550/arxiv.1410.8409 dois https://arxiv.org/abs/1410.8409 nan HTTPSConnectionPool(host='arxiv.org', port=443): Read timed out. (read timeout=3)
73 10.20345/digitue.1029.61 dois http://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141 nan 500 Server Error: Internal Server Error for url: https://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141
81 10.7916/d8-qcx3-yp94 dois https://dlc.library.columbia.edu/resolve/10.7916/d8-qcx3-yp94 nan 500 Server Error: Internal Server Error for url: https://dlc.library.columbia.edu/catalog/10.7916/d8-qcx3-yp94
87 10.48550/arxiv.1206.4514 dois https://arxiv.org/abs/1206.4514 nan HTTPSConnectionPool(host='arxiv.org', port=443): Read timed out. (read timeout=3)
96 10.17876/plate/dr.2/plates/201_33742 dois https://www.plate-archive.org/objects/dr.2/plates/201_33742 nan 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/plates/201_33742/
107 10.48550/arxiv.2004.12397 dois https://arxiv.org/abs/2004.12397 nan HTTPSConnectionPool(host='arxiv.org', port=443): Read timed out. (read timeout=3)
112 10.48550/arxiv.1910.08101 dois https://arxiv.org/abs/1910.08101 nan HTTPSConnectionPool(host='arxiv.org', port=443): Read timed out. (read timeout=3)
130 10.48550/arxiv.1907.07968 dois https://arxiv.org/abs/1907.07968 nan HTTPSConnectionPool(host='arxiv.org', port=443): Read timed out. (read timeout=3)
160 10.48550/arxiv.2111.04828 dois https://arxiv.org/abs/2111.04828 nan HTTPSConnectionPool(host='arxiv.org', port=443): Read timed out. (read timeout=3)
174 10.25721/jhzr-0x24 dois https://didomena.ehess.fr/concern/data_sets/r494vn720 nan HTTPSConnectionPool(host='didomena.ehess.fr', port=443): Read timed out. (read timeout=10)
200 10.23725/akhp-6959 dois https://ors.datacite.org/doi:/10.23725/akhp-6959 nan HTTPSConnectionPool(host='ors.datacite.org', port=443): Max retries exceeded with url: /doi:/10.23725/akhp-6959 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f9d8a243d10>: Failed to resolve 'ors.datacite.org' ([Errno -2] Name or service not known)"))
252 10.14469/ch/129258 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/134211 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/134211 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
258 10.14469/ch/41814 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/48213 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/48213 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
283 10.18730/12n7m$ dois https://glis.fao.org/glis/doi/10.18730/12N7M$ nan '10.18730/12n7m$' is not a correct resource identifier (e.g. a URL, DOI, Handle)
296 10.14457/cmu.the.2009.132 dois http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/CMU.the.2009.132 nan HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Read timed out. (read timeout=10)
316 10.14469/ch/90617 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/97675 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/97675 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
321 10.14456/apsr.2022.3 doi...

v0.10

21 Sep 20:03
3ca48ed

What's Changed

Full Changelog: v0.9...v0.10

Coverage report

The following benchmark was applied to 500 randomly selected records from DataCite.

Percentages

Percentage of datasets supported: 26.6%
Percentage of datasets not supported: 70.2%
Percentage of datasets with error: 3.2%

Table with unexpected errors

id type url service error
9 10.48448/kgfs-s492 dois https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus nan 500 Server Error: Internal Server Error for url: https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus
52 10.18730/v7c2= dois https://glis.fao.org/glis/doi/10.18730/V7C2= nan '10.18730/v7c2=' is not a correct resource identifier (e.g. a URL, DOI, Handle)
73 10.20345/digitue.1029.61 dois http://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141 nan 500 Server Error: Internal Server Error for url: https://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141
81 10.7916/d8-qcx3-yp94 dois https://dlc.library.columbia.edu/resolve/10.7916/d8-qcx3-yp94 nan 500 Server Error: Internal Server Error for url: https://dlc.library.columbia.edu/catalog/10.7916/d8-qcx3-yp94
96 10.17876/plate/dr.2/plates/201_33742 dois https://www.plate-archive.org/objects/dr.2/plates/201_33742 nan 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/plates/201_33742/
163 10.34755/irok.2022.72.26.033 dois https://www.elibrary.ru/item.asp?id=48800309&pff=1 nan ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
200 10.23725/akhp-6959 dois https://ors.datacite.org/doi:/10.23725/akhp-6959 nan HTTPSConnectionPool(host='ors.datacite.org', port=443): Max retries exceeded with url: /doi:/10.23725/akhp-6959 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f0981cba3d0>: Failed to resolve 'ors.datacite.org' ([Errno -2] Name or service not known)"))
252 10.14469/ch/129258 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/134211 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/134211 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
258 10.14469/ch/41814 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/48213 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/48213 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
283 10.18730/12n7m$ dois https://glis.fao.org/glis/doi/10.18730/12N7M$ nan '10.18730/12n7m$' is not a correct resource identifier (e.g. a URL, DOI, Handle)
296 10.14457/cmu.the.2009.132 dois http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/CMU.the.2009.132 nan HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Read timed out. (read timeout=10)
316 10.14469/ch/90617 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/97675 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/97675 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
321 10.14456/apsr.2022.3 dois http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14456/apsr.2022.3 nan HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Read timed out. (read timeout=10)
394 10.5287/bodleianjpcy.2 dois https://databank.ora.ox.ac.uk/ww1archives/datasets/ww1-3945?version=2 nan HTTPSConnectionPool(host='databank.ora.ox.ac.uk', port=443): Max retries exceeded with url: /ww1archives/datasets/ww1-3945?version=2 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f0981dc4f90>, 'Connection to databank.ora.ox.ac.uk timed out. (connect timeout=3)'))
481 10.34628/w74t-gn74 dois http://hdl.handle.net/11067/89 nan HTTPConnectionPool(host='repositorio.ulusiada.pt', port=80): Read timed out. (read timeout=10)
493 10.7916/d8-47rs-s759 dois https://dlc.library.columbia.edu/resolve/10.7916/d8-47rs-s759 nan 500 Server Error: Internal Server Error for url: https://dlc.library.columbia.edu/catalog/10.7916/d8-47rs-s759
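
The errors in the table above fall into a few recurring groups: connection and read timeouts, SSL handshake failures, DNS resolution failures, HTTP 4xx/5xx responses, and identifiers rejected as invalid. A minimal sketch of how such messages could be grouped is shown below. The category labels and the `classify_error` helper are hypothetical and not part of datahugger; the patterns assume error strings shaped like those in the table.

```python
import re
from collections import Counter

# Illustrative categories; the first matching pattern wins.
PATTERNS = [
    ("connect timeout", re.compile(r"ConnectTimeoutError")),
    ("read timeout", re.compile(r"Read timed out")),
    ("ssl", re.compile(r"SSLError")),
    ("dns", re.compile(r"NameResolutionError")),
    ("http status", re.compile(r"\b[45]\d\d (?:Client|Server) Error")),
    ("invalid identifier", re.compile(r"is not a correct resource identifier")),
]


def classify_error(message):
    """Map an error message to a coarse category, or 'other' if nothing matches."""
    for label, pattern in PATTERNS:
        if pattern.search(message):
            return label
    return "other"


# Messages copied from rows of the table above (URLs shortened where noted).
messages = [
    "500 Server Error: Internal Server Error for url: https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus",
    "HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Read timed out. (read timeout=10)",
    "'10.18730/v7c2=' is not a correct resource identifier (e.g. a URL, DOI, Handle)",
    "('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))",
]
summary = Counter(classify_error(m) for m in messages)
print(summary)
```

Grouping this way makes it easier to separate transient server-side problems (timeouts, 5xx responses) from issues that may need a code change, such as identifiers the resolver rejects.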

v0.9

15 Sep 10:39
7460ba2

What's Changed

Full Changelog: v0.8.1...v0.9

Coverage report

The following benchmark was applied to 500 randomly selected records from DataCite.

Percentages

Percentage of datasets supported: 26.0%
Percentage of datasets not supported: 71.6%
Percentage of datasets with error: 2.4%

Table with unexpected errors

id type url service error
9 10.48448/kgfs-s492 dois https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus nan 500 Server Error: Internal Server Error for url: https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus
52 10.18730/v7c2= dois https://glis.fao.org/glis/doi/10.18730/V7C2= nan '10.18730/v7c2=' is not a correct resource identifier (e.g. a URL, DOI, Handle)
73 10.20345/digitue.1029.61 dois http://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141 nan 500 Server Error: Internal Server Error for url: https://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141
81 10.7916/d8-qcx3-yp94 dois https://dlc.library.columbia.edu/resolve/10.7916/d8-qcx3-yp94 nan 500 Server Error: Internal Server Error for url: https://dlc.library.columbia.edu/catalog/10.7916/d8-qcx3-yp94
96 10.17876/plate/dr.2/plates/201_33742 dois https://www.plate-archive.org/objects/dr.2/plates/201_33742 nan 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/plates/201_33742/
200 10.23725/akhp-6959 dois https://ors.datacite.org/doi:/10.23725/akhp-6959 nan HTTPSConnectionPool(host='ors.datacite.org', port=443): Max retries exceeded with url: /doi:/10.23725/akhp-6959 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f78942e2d90>: Failed to resolve 'ors.datacite.org' ([Errno -2] Name or service not known)"))
252 10.14469/ch/129258 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/134211 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/134211 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
258 10.14469/ch/41814 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/48213 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/48213 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
283 10.18730/12n7m$ dois https://glis.fao.org/glis/doi/10.18730/12N7M$ nan '10.18730/12n7m$' is not a correct resource identifier (e.g. a URL, DOI, Handle)
316 10.14469/ch/90617 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/97675 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/97675 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
394 10.5287/bodleianjpcy.2 dois https://databank.ora.ox.ac.uk/ww1archives/datasets/ww1-3945?version=2 nan HTTPSConnectionPool(host='databank.ora.ox.ac.uk', port=443): Max retries exceeded with url: /ww1archives/datasets/ww1-3945?version=2 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f7894907610>, 'Connection to databank.ora.ox.ac.uk timed out. (connect timeout=3)'))
493 10.7916/d8-47rs-s759 dois https://dlc.library.columbia.edu/resolve/10.7916/d8-47rs-s759 nan 500 Server Error: Internal Server Error for url: https://dlc.library.columbia.edu/catalog/10.7916/d8-47rs-s759

v0.8.1

14 Sep 23:12
025fd7f

What's Changed

  • Fix resolver with unknown Re3Data software by @J535D165 in #60

Full Changelog: v0.8...v0.8.1

Coverage report

The following benchmark was run on 500 randomly selected records from DataCite.

Percentages

Percentage of datasets supported: 22.4%
Percentage of datasets not supported: 75.0%
Percentage of datasets with error: 2.6%
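
As a quick sanity check, the percentages above can be reproduced from raw counts. The sketch below assumes 112 supported, 375 unsupported, and 13 errored records. These counts are inferred from the reported percentages and the 500-record sample size, not taken from the benchmark output, and the function name `coverage_percentages` is hypothetical.

```python
def coverage_percentages(supported, not_supported, errors):
    """Return each outcome's share of the sample as a percentage (one decimal)."""
    total = supported + not_supported + errors
    if total == 0:
        raise ValueError("sample is empty")
    return {
        "supported": round(100 * supported / total, 1),
        "not_supported": round(100 * not_supported / total, 1),
        "error": round(100 * errors / total, 1),
    }


# Counts inferred from the reported percentages (an assumption, not benchmark output).
result = coverage_percentages(supported=112, not_supported=375, errors=13)
print(result)
```

The inferred error count of 13 agrees with the 13 rows listed in the table of unexpected errors for this release.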

Table with unexpected errors

id type url service error
9 10.48448/kgfs-s492 dois https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus nan 500 Server Error: Internal Server Error for url: https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus
52 10.18730/v7c2= dois https://glis.fao.org/glis/doi/10.18730/V7C2= nan '10.18730/v7c2=' is not a correct resource identifier (e.g. a URL, DOI, Handle)
73 10.20345/digitue.1029.61 dois http://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141 nan 500 Server Error: Internal Server Error for url: https://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141
81 10.7916/d8-qcx3-yp94 dois https://dlc.library.columbia.edu/resolve/10.7916/d8-qcx3-yp94 nan 500 Server Error: Internal Server Error for url: https://dlc.library.columbia.edu/catalog/10.7916/d8-qcx3-yp94
96 10.17876/plate/dr.2/plates/201_33742 dois https://www.plate-archive.org/objects/dr.2/plates/201_33742 nan 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/plates/201_33742/
163 10.34755/irok.2022.72.26.033 dois https://www.elibrary.ru/item.asp?id=48800309&pff=1 nan ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
200 10.23725/akhp-6959 dois https://ors.datacite.org/doi:/10.23725/akhp-6959 nan HTTPSConnectionPool(host='ors.datacite.org', port=443): Max retries exceeded with url: /doi:/10.23725/akhp-6959 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f31d0d2ab90>: Failed to resolve 'ors.datacite.org' ([Errno -2] Name or service not known)"))
252 10.14469/ch/129258 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/134211 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/134211 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
258 10.14469/ch/41814 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/48213 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/48213 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
283 10.18730/12n7m$ dois https://glis.fao.org/glis/doi/10.18730/12N7M$ nan '10.18730/12n7m$' is not a correct resource identifier (e.g. a URL, DOI, Handle)
316 10.14469/ch/90617 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/97675 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Max retries exceeded with url: /dspace/handle/10042/97675 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
394 10.5287/bodleianjpcy.2 dois https://databank.ora.ox.ac.uk/ww1archives/datasets/ww1-3945?version=2 nan HTTPSConnectionPool(host='databank.ora.ox.ac.uk', port=443): Max retries exceeded with url: /ww1archives/datasets/ww1-3945?version=2 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f31d0dd5190>, 'Connection to databank.ora.ox.ac.uk timed out. (connect timeout=3)'))
493 10.7916/d8-47rs-s759 dois https://dlc.library.columbia.edu/resolve/10.7916/d8-47rs-s759 nan 500 Server Error: Internal Server Error for url: https://dlc.library.columbia.edu/catalog/10.7916/d8-47rs-s759
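
The error messages in this table fall into a few recurring kinds: HTTP server errors, TLS handshake failures, connection timeouts, DNS resolution failures, reset connections, and rejected identifiers. A minimal sketch of how such messages could be bucketed by substring matching follows. `classify_error`, `RULES`, and the category names are hypothetical and not part of datahugger; the sample messages are excerpts from the table above.

```python
from collections import Counter

# Ordered (substring, category) rules; the first match wins.
# Category names are illustrative and not defined by datahugger.
RULES = [
    ("Server Error", "http_5xx"),
    ("SSLError", "tls_handshake"),
    ("ConnectTimeoutError", "timeout"),
    ("NameResolutionError", "dns"),
    ("Connection reset by peer", "connection_reset"),
    ("is not a correct resource identifier", "invalid_identifier"),
]


def classify_error(message):
    """Map an error message to a coarse category, or 'other' if no rule matches."""
    for needle, category in RULES:
        if needle in message:
            return category
    return "other"


# Excerpts of error messages from the table above.
messages = [
    "500 Server Error: Internal Server Error for url: "
    "https://dlc.library.columbia.edu/catalog/10.7916/d8-47rs-s759",
    "'10.18730/v7c2=' is not a correct resource identifier (e.g. a URL, DOI, Handle)",
    "('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))",
    "(Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] "
    "sslv3 alert handshake failure (_ssl.c:1006)')))",
    "'10.18730/12n7m$' is not a correct resource identifier (e.g. a URL, DOI, Handle)",
]

counts = Counter(classify_error(m) for m in messages)
print(counts.most_common())
```

Grouping this way separates failures on the remote host (server errors, TLS, DNS, timeouts) from identifiers that are rejected before any request is made.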

v0.8

14 Sep 16:47
2972fca

What's Changed

  • Improve resolve speed and prevent hitting re3data.org servers by @J535D165 in #52
  • Add extensible support for handle systems and metadata by @J535D165 in #56
  • Auto unzip option #53 by @davetromp in #55
  • Fix datahugger errors for CrossRef DOIs by @J535D165 in #58

Full Changelog: v0.7...v0.8

Coverage report

The following benchmark was run on 500 randomly selected records from DataCite.

Percentages

Percentage of datasets supported: 22.4%
Percentage of datasets not supported: 60.6%
Percentage of datasets with error: 17.0%

Table with unexpected errors

id type url service error
7 10.60516/au6956100 dois https://pid.geoscience.gov.au/sample/AU6956100 nan 'unknown'
9 10.48448/kgfs-s492 dois https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus nan 500 Server Error: Internal Server Error for url: https://underline.io/lecture/50210-findings-thai-nested-named-entity-recognition-corpus
12 10.26197/ala.f34c149a-4578-47c5-83fc-52ba63e37cad dois https://doi.ala.org.au/doi/f34c149a-4578-47c5-83fc-52ba63e37cad nan 'other'
13 10.60494/4013-gm34 dois https://pid.geoscience.gov.au/sample/AU9420041 nan 'unknown'
18 10.60516/au4510089 dois https://pid.geoscience.gov.au/sample/AU4510089 nan 'unknown'
20 10.60516/au4857683 dois https://pid.geoscience.gov.au/sample/AU4857683 nan 'unknown'
52 10.18730/v7c2= dois https://glis.fao.org/glis/doi/10.18730/V7C2= nan '10.18730/v7c2=' is not a correct resource identifier (e.g. a URL, DOI, Handle)
53 10.60494/ksh4-h631 dois https://pid.geoscience.gov.au/sample/AU8303563 nan 'unknown'
55 10.13145/bacdive5076.20230509.8 dois https://bacdive.dsmz.de/index.php?site=pdf_view&id=5076&doi=doi:10.13145/bacdive5076.20230509.8 nan 'unknown'
57 10.60516/au1215414 dois https://pid.geoscience.gov.au/sample/AU1215414 nan 'unknown'
68 10.1594/pangaea.318586 dois https://doi.pangaea.de/10.1594/PANGAEA.318586 nan 'other'
73 10.20345/digitue.1029.61 dois http://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141 nan 500 Server Error: Internal Server Error for url: https://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141
75 10.60516/au5331689 dois https://pid.geoscience.gov.au/sample/AU5331689 nan 'unknown' ...