NEW FEATURES
- New function `bold_identify_taxonomy()` to add taxonomic information to the output of `bold_identify()` and replace `bold_identify_parents()`. Instead of taking the taxon names from the `bold_identify()` output, using `bold_tax_name()` to get the taxonomic IDs, and then passing those to `bold_tax_id()` to get the parent names, we take the process IDs from the `bold_identify()` output and pass them to `bold_specimens()`. This is faster and, more importantly, makes sure the correct taxonomy is returned. The function has fewer arguments since filtering the result is no longer necessary. Since the result now has only one line per row of input, the output is always in 'wide' format (as when using `bold_identify_parents()` with `wide=TRUE`). There is one new argument, `taxOnly`, which is `TRUE` by default and returns only the taxonomic data. However, since `bold_specimens()` also returns other data (habitat, country, image_url, etc.), setting this argument to `FALSE` will also join that data to the input.
- New function `bold_tax_id2()`, which will eventually replace `bold_tax_id()`. The main changes are in the format of the output. For the `dataTypes` 'basic', 'stats', 'images' and 'thirdparty', the output doesn't change. For the `dataTypes` 'sequencinglabs', 'geo' and 'depository', instead of one (sometimes very) wide data.frame, the result is now in 'long' format, with the columns 'input', 'taxid', 'sequencinglabs|country|depository' and 'count'. For the `dataTypes` 'all', or when selecting more than one of the `dataTypes`, the output is a list with one data.frame per data type. When `includeTree` is set to `TRUE`, the parents' data is appended (row-bound) to their respective data.frame. The function also checks that all arguments are of the correct type and that the chosen `dataTypes` are valid.
- The now deprecated `bold_tax_id()` has the same argument checks as `bold_tax_id2()` but throws warnings instead of errors so as not to affect existing workflows. Also, if a chosen `dataTypes` value is invalid, it is removed to avoid unnecessary requests.
- Similarly, the now deprecated `bold_identify_parents()` has new argument checks and throws warnings so as not to affect existing workflows.
- For `bold_tax_id2()` and `bold_tax_name()`, when querying multiple taxa, if one fails, the loop won't break and will instead throw the API error as a warning. The output object also has two new attributes, "errors" and "params", which let you see what errors occurred for which request and what parameters were used for the request.
  To make it easy to retrieve these attributes, three new functions have been created:
  - `bold_get_attr()` will return a list of the two attributes
  - `bold_get_errors()` will return a list of the errors
  - `bold_get_params()` will return a list of the parameters used
- `bold_specimens()` and `bold_seqspec()` have a new parameter `cleanData` which, when set to `TRUE`, replaces empty strings ("") with NAs and strings containing only duplicated values with their unique value (e.g. "COI-5P|COI-5P|COI-5P" becomes "COI-5P").
- New function `bold_read_trace()` to replace `read_trace()`. It can read one or multiple trace files from a `boldtrace` object or from provided file path(s).
- New function `b_sepFasta()` to use after a call to `bold_seqspec()` where `sepFasta` wasn't set to `TRUE`.
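
The `cleanData` behaviour described above for `bold_specimens()` and `bold_seqspec()` can be pictured with a small base R sketch. This is only an illustration of the two rules as stated in this changelog, not the package's implementation: `clean_field()` and the `records` data.frame are hypothetical, and it assumes that "only duplicated values" means every pipe-separated part of the string is identical.

```r
# Hypothetical helper (not part of the bold package): mimics the two
# cleaning rules described for cleanData = TRUE.
clean_field <- function(x) {
  x <- as.character(x)
  # Rule 1: empty strings become NA
  x[!is.na(x) & x == ""] <- NA_character_
  # Rule 2: a pipe-separated string whose parts are all identical
  # collapses to that single value; anything else is left untouched
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    parts <- unique(strsplit(s, "|", fixed = TRUE)[[1]])
    if (length(parts) == 1L) parts else s
  }, character(1), USE.NAMES = FALSE)
}

# Made-up example data, shaped loosely like a specimen table
records <- data.frame(
  markercode = c("COI-5P|COI-5P|COI-5P", "COI-5P|28S", ""),
  country = c("Canada", "", "Peru"),
  stringsAsFactors = FALSE
)

# Apply the cleaning column by column
cleaned <- as.data.frame(lapply(records, clean_field), stringsAsFactors = FALSE)
```

In this sketch, strings that mix different values (such as "COI-5P|28S") are kept as they are, since collapsing them would lose information.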
MINOR IMPROVEMENTS
- made tests for the new functions
- made tests for the `bold_trace()` function
- added tests to existing functions to improve test coverage
- added/completed argument checks for every function
- `bold_specimens()` and `bold_seqspec()` can now also return partial output like `bold_seq()`
- using `data.table` when possible, removed `dplyr` and `reshape` dependencies
- using `stringi` instead of `stringr`, which removed `stringr`'s other dependencies
- added more details to the documentation of some functions
BUG FIXES
- changed how HTTP responses are read so that they throw warnings and return NAs instead of errors. This prevents a long request from stopping and failing, losing the already fetched data. (#74)
- added a note in the documentation of `bold_seq()`, `bold_seqspec()` and `bold_specimens()` that if the `taxon` has no public records and another parameter is also used, the request returns all data for that other parameter. Users can verify the availability of public records with `bold_stats()`. A note was also added in `bold_tax_name()` that the column 'specimenrecords' relates to the records in the taxonomy browser and not in the public data portal. (#76)
- fixed output of `bold_seq()` (#79)
- changed the function used to encode to UTF-8 (#81, #86)
- contacted BOLD so they would fix their typo in 'depository', which prevented fetching related data with `bold_tax_id()` (#83). Added a line in the function to change 'depositories' to 'depository' in case people had been using that.
- added a check for 'name' in `bold_tax_name()` to double escape single quotes; otherwise the API doesn't return the data (#84, #85). Since the issue is on the API side, the data that comes back also contains errors, so I added a function to repair the names of 'taxon', 'taxonrep' and 'parentname' in the returned object. The function is also used in `pipe_params()` (which is used by `bold_seq()`, `bold_seqspec()` and `bold_specimens()`) to repair the `taxon` parameter in case users use results from previous versions.
- changed the way the response of `bold_seqspec()` is read (#87, #88), thanks @cjfields
- added a note in `bold_stats()` documentation to specify that the record counts include all gene markers (#90).
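
The "warn and return NAs instead of failing" approach from the first fix in this list (#74) follows a common R pattern, sketched below in base R. This is a generic illustration rather than the package's actual code: `fake_request()` and `fetch_all()` are hypothetical names, no real API call is made, and the shape of the "errors" attribute here is an assumption.

```r
# Hypothetical stand-in for a single API request; it fails for one input
fake_request <- function(id) {
  if (id == "bad") stop("HTTP 500 for id '", id, "'")
  data.frame(input = id, value = nchar(id))
}

# Loop over all inputs; a failing request is downgraded to a warning and
# replaced by an NA row, so results that were already fetched are kept
fetch_all <- function(ids, request = fake_request) {
  errors <- list()
  results <- lapply(ids, function(id) {
    tryCatch(
      request(id),
      error = function(e) {
        warning(conditionMessage(e), call. = FALSE)
        # Record the message so it can be inspected afterwards
        errors[[id]] <<- conditionMessage(e)
        data.frame(input = id, value = NA_integer_)
      }
    )
  })
  out <- do.call(rbind, results)
  # Attach the collected error messages as an attribute of the output
  attr(out, "errors") <- errors
  out
}

# Warnings are suppressed here only to keep the example output quiet
res <- suppressWarnings(fetch_all(c("abc", "bad", "de")))
```

After this call, `res` has one row per input, with `NA` for the failed one, and `attr(res, "errors")` holds the message for the input that failed.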