scripting #64
-
Hi Mark, thanks for alerting us to this! I have fixed the link to that sample upload.
-
Thanks, Katie, for the quick catch and fix. Re: query, I haven't looked
at the search webpage code/form, but can a query be built and passed using
the same interface as that? Or is there some authentication token stuff
making it tricky?
…On Tue, Jan 11, 2022 at 12:40 PM Katie Pearson ***@***.***> wrote:
We are currently developing an API for Symbiota software, but it is not
complete. Until then, it is not possible to query to download via a script
unless you're connected to the database.
--
Mark Hereld
-
I’ll take a look at the page source and see if I can figure out anything.
On Jan 11, 2022, at 1:29 PM, Katie Pearson ***@***.***> wrote:
There are sanitization steps that would probably prevent any queries from being passed through the search interface, but maybe if you customized the page itself, or the class? Not exactly sure I understand.
-
Hi Jason!
Thank you! This looks like a good template. Do you have a community of
folks doing this kind of programmatic data access and analysis? Or is
everybody using the portal and its available features to work with these
data?
-- Mark
…On Wed, Jan 26, 2022 at 1:00 PM Jason Best ***@***.***> wrote:
Mark, I made a very simple Python script that might help with your search
query and download. It is at
https://github.com/jbest/symbiota_tools/blob/main/symbiota_download.py
It currently is only set up to query based on state and county parameters.
You'd have to dig into the Symbiota form HTML to get the names of other
fields and values you want to add to the query.
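One way to find those field names is to save the search form's HTML and list the `name` attributes it contains. The following is only a sketch using Python's standard `html.parser`; the sample fragment, its `list.php` action, and the field names in it are made up for illustration and are not taken from a real Symbiota page.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the name attribute of every input, select, and textarea tag."""

    FIELD_TAGS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.field_names = []

    def handle_starttag(self, tag, attrs):
        if tag in self.FIELD_TAGS:
            name = dict(attrs).get("name")
            # Keep first-seen order and skip repeats (e.g. radio buttons).
            if name and name not in self.field_names:
                self.field_names.append(name)


def list_form_fields(html_text):
    """Return the distinct form field names found in html_text."""
    collector = FormFieldCollector()
    collector.feed(html_text)
    return collector.field_names


# Hypothetical fragment standing in for a saved copy of a search form.
sample_form = """
<form action="list.php" method="post">
  <input type="text" name="state">
  <input type="text" name="county">
  <select name="collection"><option value="1">Example</option></select>
</form>
"""
print(list_form_fields(sample_form))
```

Feeding the function the saved source of the real search page should give a starting list of names to try in the query.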
--
Mark Hereld
-
Jason,
The Google group has 214 members now. Thanks for the tip. A question about
that BRIT Archive (apparently available in zip link at url you posted):
How big is it? Would be great to have a general sense of size before
plunging into the download.
…On Thu, Jan 27, 2022 at 1:22 PM Jason Best ***@***.***> wrote:
Mark, I can't point to any one forum for sharing code and techniques, but
this GitHub forum is probably a good start. Discussions are a newish GitHub
feature so there probably isn't a lot of momentum here yet so I'd suggest
you also join the Symbiota Google Group (
https://groups.google.com/g/symbiotagroup) which has 213 members as of
now. You may be aware already, but many collections in Symbiota publish
their full dataset periodically through a Darwin Core Archive which is
accessible on the collection's detail page (eg. for BRIT -
https://portal.torcherbaria.org/portal/collections/misc/collprofiles.php?collid=370).
You can download these if you want a comprehensive dataset without the need
to query but be warned that the data in the Archive may be out of date with
the live data. The Archive data for many collections are aggregated at
iDigBio.org and GBIF.org, both of which have API access which may suit your
needs.
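As a rough illustration of the aggregator route, here is a sketch that builds a request against GBIF's public occurrence search endpoint. The filter values are assumptions: whether BRIT's records are published under the institution code "BRIT" is not confirmed in this thread, so check the dataset page first.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def gbif_search_url(**filters):
    """Build an occurrence search URL from keyword filters."""
    # Sorting keeps the URL stable regardless of argument order.
    return GBIF_SEARCH + "?" + urlencode(sorted(filters.items()))


def fetch_page(url, timeout=60):
    """Fetch one page of results and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # "BRIT" as an institution code is an assumption, not from the thread.
    url = gbif_search_url(institutionCode="BRIT", limit=5)
    print(url)
    # Uncomment to make a real network request:
    # page = fetch_page(url)
    # print(page["count"], "matching records")
```

Results are paged, so a full pull would loop over the `offset` parameter rather than requesting everything at once.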
--
Mark Hereld
-
There don't appear to be any images at all. I was hesitant to download the
wad without a size hint because I imagined that 185K images would be quite
large. I am primarily interested in such large datasets -- including the
images. I'll look through this to get a better feel for it.
…On Fri, Jan 28, 2022 at 11:42 AM Jason Best ***@***.***> wrote:
Matt, the DwC Archive file for the BRIT collection is 37 MB for ~273,000
specimen records with ~185,000 image records. The download itself only took
a few seconds but it took a couple of minutes for the site to respond.
There are other smaller collections you can peruse at
https://portal.torcherbaria.org/portal/collections/index.php and click on
the more info link to see the collection detail page if you want to start
with a smaller dataset. Not every collection will have a published DwC
archive.
--
Mark Hereld
-
Hey Matt, Jason's scripts make use of a non-formalized web service that dynamically packages up a custom query as a DwC-Archive return. This information isn't secret, but we haven't publicly documented this backend functionality (until now, see details below) because we haven't yet added authentication or throttling options to protect against abuse. We are actively developing a new API, which we plan to wrap this functionality into, and which will also include some protections. Until then, we are hoping the community will refrain from writing scripts that run multiple downloads simultaneously.
Base URL: /webservices/dwc/dwcapubhandler.php
Variables:
Return: Darwin Core Archive of matching occurrences, associated images, identification history, and other associated data extensions
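For anyone scripting against this handler, a sketch of a polite client might look like the following. Several things here are assumptions and not confirmed above: that the handler path sits directly under the portal root, that it accepts query-string (GET) parameters, and the `state`/`county` parameter names, which are placeholders only since the variable list is not reproduced here. The point of the sketch is the shape: one request at a time, with a pause between requests.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

HANDLER_PATH = "/webservices/dwc/dwcapubhandler.php"


def handler_url(portal_base, query):
    """Join the portal root, the handler path, and an encoded query."""
    return portal_base.rstrip("/") + HANDLER_PATH + "?" + urlencode(query)


def download_archives(portal_base, queries, pause_seconds=30):
    """Download one archive per query, strictly one after another."""
    saved = []
    for index, query in enumerate(queries):
        if index:
            # Wait between requests so the portal is never hit in parallel.
            time.sleep(pause_seconds)
        with urlopen(handler_url(portal_base, query), timeout=600) as response:
            data = response.read()
        filename = f"archive_{index}.zip"
        with open(filename, "wb") as out:
            out.write(data)
        saved.append(filename)
    return saved


if __name__ == "__main__":
    # Placeholder parameter names; replace with the handler's real variables.
    example = {"state": "Texas", "county": "Tarrant"}
    print(handler_url("https://portal.torcherbaria.org/portal", example))
```

The long timeout reflects the fact that the archive is packaged on demand, so the server may take a while to start responding.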
-
Physical images are often hosted on different servers from where the portal/database is stored. They are often served directly from the owning institution (e.g. BRIT). Thus, the size of the file return is not the only reason physical images are not delivered within the DwC-Archive; it could also take hours to download and compile the images before transferring them to the person requesting the data. Individual image downloads therefore need to be done by your script using the URLs within the DwC-A.
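A minimal sketch of pulling those URLs out of a downloaded archive with the Python standard library follows. The member name `multimedia.csv` and the column name `accessURI` are assumptions about the archive layout; check the archive's `meta.xml` for the actual file and column names before relying on them.

```python
import csv
import io
import zipfile


def image_urls(archive, member="multimedia.csv", url_column="accessURI"):
    """Return the non-empty image URLs listed in one CSV inside the archive.

    archive may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text)
            return [row[url_column] for row in reader if row.get(url_column)]
```

Each URL can then be fetched individually, ideally one at a time and with a pause between requests, since the images are served by the owning institutions and not by the portal.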
-
Ed, thanks for the mini-API summary. That will be very helpful. (I do
believe that I saw mention of preparing for a new API while perusing the code
on GitHub.) --mark
--
Mark Hereld
-
I'm looking into scripted interactions with portal data. At this link:
https://biokic.github.io/symbiota-docs/coll_manager/upload/
there is a reference to an example script called SampleSystemUpload.sh but that link seems to be broken.
Any examples out there on how to query, upload, download via script or other programming interface?
Thanks,
Mark