In addition to the code generated Swagger API documentation we collect here examples of how to use the ChecklistBank (CLB) API for common use cases. The ChecklistBank API is accessible at https://api.checklistbank.org, but we also provide a development installation for testing: http://api.dev.checklistbank.org
ChecklistBank (CLB) was designed to store many different datasets, often called checklists, that are dealing with scientific names at their core. In order to retain the original record identifiers - which might also occurr in other datasets - identifiers are only unique within a single dataset and compound keys are used to address a record uniquely in CLB.
Most API resources are therefore scoped by a dataset key like this:
- https://api.checklistbank.org/dataset/9910/name
- https://api.checklistbank.org/dataset/9910/taxon/H6
- https://api.checklistbank.org/dataset/9910/reference?q=Wetter
ChecklistBank differes between names and name usages which can be a taxon (=accepted name) or a synonym. Names can exist on their own in a dataset, e.g. if they are taxonomically unplaced or part of a nomenclatural dataset. We refer to these names as bare names.
Knowing the datasetKey of a dataset to work with is important to access it's data. CLB user interface offers a dataset search with various filters which you can also access via the API:
https://api.checklistbank.org/dataset?q=hobern
This finds datasets with Donald Hobern as an author or contributor.
The q
parameter is a weighted full text search, which hits the alias, title, description, but also creators, editors, publisher and other fields of the dataset metadata.
Data itself, e.g. scientific names, are not searched.
Popular dataset filters:
origin=PROJECT
: find project datasets onlytype=LEGAL&type=PHYLOGENETIC
: find datasets of type LEGAL or PHYLOGENETIC.releasedFrom=3
: list only releases of project 3, which is the Catalogue of Life (COL)contributesTo=3
: list only datasets that are sources of project 3, which is the Catalogue of LifegbifKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c
: find dataset with the GBIF key d7dddbf4-2cf0-4f39-9b2a-bb099caae36cgbifPublisherKey=7ce8aef0-9e92-11dc-8738-b8a03c50a862
: find datasets which have been published in GBIF by the given publisher, in this case Plazi
We synchronise ChecklistBank with checklist datasets in GBIF, so you have access to all these GBIF datasets from CLB.
ChecklistBank distinguishes 3 kind of datasets indicated by their origin
property:
external
: datasets which are maintained outside of ChecklistBank and are imported for read accecss only. This is the vast majority of all datasetsproject
: datasets which are maintained inside ChecklistBank and which often include & sync data from other sources. The Catalogue of Life checklist is such a project with datasetKey=3release
: immutable snapshots of a project with stable identifiers
The API also provides some simple magic dataset keys, that will allow you to access some datasets without knowing the latest key:
{KEY}LR
: a substitute for the latest, public release of a project, e.g the latest COL checklist: https://api.checklistbank.org/dataset/3LRCOL{YEAR}
: a substitute for the annual release of the COL checklist in the given year with 4 digits: https://api.checklistbank.org/dataset/COL2023
There are many places in the API where a controlled vocabulary is used. You can find an inventory of all vocabularies here: http://api.checklistbank.org/vocab In order to see all supported values for a vocabulary append its name to the vocab resource, e.g. http://api.checklistbank.org/vocab/taxonomicstatus
The majority of the API is open and can be accessed anonymously.
Writing data often requires authentication and datasets can be private
, i.e. are only visible to authorised users.
ChecklistBank shares user accounts with GBIF, so you need to have a GBIF account to authenticate to the CLB API. Authentication in the API uses simple BasicAuth, but mostly for user interfaces we also provide JWT. Note that BasicAuth in itself is not very secure, so please use it always with the https protocol. CLB will actually decline the use of plain http.
A simple basic authentication using curl would look like this:
curl -s -v --user j.smith:passwd1234xyz "https:api.checklistbank.org/user/me"```
> GET /user/me HTTP/2
> Host: api.checklistbank.org
> authorization: Basic ai5zbWl0aDpwYXNzd2QxMjM0eHl6
> user-agent: curl/7.85.0
> accept: */*
>
< HTTP/2 401
< date: Fri, 30 Jun 2023 07:48:10 GMT
< content-type: application/json
- Authentication problem. Ignoring this.
< www-authenticate: Basic realm="COL"
< www-authenticate: Bearer token_type="JWT" realm="COL"
< cache-control: must-revalidate,no-cache,no-store
< content-length: 52
< x-varnish: 296268277
< age: 0
< via: 1.1 varnish (Varnish/6.0)
< x-cache: pass uncacheable
< vary: Origin
Note that this user does not exist and a 401 is therefore returned.
The API primarily provides RESTful JSON services documented with OpenAPI: http://api.checklistbank.org/openapi. When using it from the terminal curl and jq are good friends, but there are also other advanced clients like Insomnia or RapidAPI that you can use. These rich clients usually allow you to read in our OpenAPI description so they know already about all existing API methods!
In some areas other formats are also supported which can be requested by setting the appropriate http Accept
header.
To simplify usage of the API when you do not have access to modifying request headers, the API also accepts various URL resource suffices which are converted into the matching Accept header:
.xml
:Accept: application/xml
.json
:Accept: application/json
.yaml
:Accept: text/yaml
.txt
:Accept: text/plan
.html
:Accept: text/html
.tsv
:Accept: text/tsv
.csv
:Accept: text/csv
.zip
:Accept: application/zip
.png
:Accept: image/png
.bib
:Accept: application/x-bibtex
.csljs
:Accept: application/vnd.citationstyles.csl+json
For example you can retrieve dataset metadata in JSON, YAML or EML XML that way:
- https://api.checklistbank.org/dataset/9910.json
- https://api.checklistbank.org/dataset/9910.yaml
- https://api.checklistbank.org/dataset/9910.xml
In most searches and list results the API offers a limit
and offset
parameter to support paging.
Not that paging is restricted to a maximum of 100.000 records.
The Catalogue of Life checkist is a project in ChecklistBank with the datasetKey=3. Projects are living datasets that can change at any time, might temporarily have duplicate or bad data and therefore also do not use stable identifiers. For regular use only immutable releases should be used, which are created on a monthly basis for COL. Monthly releases will not change, but they might be deleted at some point after a minimum retention of one year. Once a year the Catalogue of Life also releases an annual checklist with long term support, which will never be deleted.
To search for a name usage in the annual release 2023 you can use the plain search:
You can filter the search by various options:
- rank: https://api.checklistbank.org/dataset/COL2022/nameusage/search?q=Abies&rank=species
- content: what to search on, SCIENTIFIC_NAME, AUTHORSHIP or both: https://api.checklistbank.org/dataset/COL2022/nameusage/search?q=Hobern&content=AUTHORSHIP
- type: the kind of search to use. One of prefix, whole_words or exact: https://api.checklistbank.org/dataset/COL2022/nameusage/search?q=Hobern&content=AUTHORSHIP
- prefix: Matches a search term to the beginning of the epithets within a scientific name. This is the only available search type for the suggest service. Whole-word and exact matching defies the purpose of auto-completion. However, you still have the option of fuzzy/non-fuzzy matching.
- whole_words: Matches a search term to entire epithets within a scientific name.
- exact: Matches the entire search phrase to the entire scientific name. When choosing this type you cannot also opt for fuzzy matching. The "fuzzy" parameter is silently ignored.
- fuzzy: allow for slightly fuzzy matching
You can get the vernacular names for a species, e.g. 4QHKG, through either the full taxon info: http://api.checklistbank.org/dataset/COL2023/taxon/4QHKG/info
or individually through the vernacular name resource: http://api.checklistbank.org/dataset/COL2023/taxon/4QHKG/vernacular
ChecklistBank also provides a basic vernacular search that finds vernacular names in an entire dataset: http://api.checklistbank.org/dataset/COL2023/vernacular?q=Puma http://api.checklistbank.org/dataset/COL2023/vernacular?q=Puma&language=spa
There is a specialised API to support navigating a taxonomic tree and to implement a tree browser. This lists the root taxa of the tree - or better forrests: https://api.checklistbank.org/dataset/COL2023/tree
For a specific taxon you can then list all of its direct children https://api.checklistbank.org/dataset/COL2023/tree/N/children
Some taxa have mixed ranks as their direct children. For these the Tree API offers an option to create virtual placeholder nodes that group them under a single entry: https://api.checklistbank.org/dataset/COL2023/tree/RT/children?insertPlaceholder=true
To load a tree from any given taxon, you can use the following resource that lists the parent classification: https://api.checklistbank.org/dataset/COL2023/tree/33VS
Our reusable React Tree component makes use of this API.
The name matching API in ChecklistBank allows to match against any dataset in CLB which is identified by an integer dataset key. There are different resources for simple & batch matching.
The simple one by one matching resource takes various query parameters, the main one being a q
or alternatively scientificName
for the name:
https://api.checklistbank.org/dataset/3LR/match/nameusage?q=Abies
3LR
stands for the Latest Release of dataset 3 which is the COL project.
So you will get the latest monthly release with that one without knowing it's actual integer key.
You can search for datasets to match against here: https://www.checklistbank.org/dataset
This gives you all releases of the COL checklist: https://www.checklistbank.org/dataset?releasedFrom=3&sortBy=created
Annual releases of COL, which will be kept forever, can also be found by using COL + year as the dataset key, for example the annual release of 2022: https://www.checklistbank.org/dataset/COL2022
Matching with an ambiguous "homonym" gives you no match, but shows the alternative options: https://api.checklistbank.org/dataset/3LR/match/nameusage?q=Oenanthe
Adding an author or some classification helps to disambiguate in such a case: https://api.checklistbank.org/dataset/3LR/match/nameusage?q=Oenanthe&kingdom=Plantae https://api.checklistbank.org/dataset/3LR/match/nameusage?q=Oenanthe&authorship=Linneaus
Alternatively there is also a bulk matching method which creates an asynchroneous job similar to downloads. You must have a user account to use it and will get an email notification once done. CLB user accounts are the same as GBIF accounts, therefore you need to register with GBIF and then log into ChecklistBank with the GBIF credentials once. The bulk matching is only available via the API at this stage, the UI will follow shortly.
Bulk matching accepts different inputs for names:
- upload a
CSV
orTSV
file to supply names for matching or - select a source dataset from ChecklistBank that you want to use to supply names for matching. This can then also be filtered by various parameters to just match a subtree, certain ranks, etc
A bulk matching request could look like this:
curl -s --user USERNAME:PASSWORD -H "Content-Type: text/tsv" --data-binary @match.tsv -X POST "https://api.checklistbank.org/dataset/COL2022/match/nameusage/job"
with a match.tsv
input file such as this one:
ID rank scientificName authorship kingdom
tp phylum Tracheophyta Plantae
1 species Abies alba Mill. Plantae
2 species Poa annua L. Plantae
Query parameters for bulk matches from a source dataset:
format
: CSV or TSV for the final result filesourceDatasetKey
: to request a dataset in CLB as the source of names for matchingtaxonID
: a taxon identifier from the source dataset to restrict names only from the subtree of that taxon, e.g. a selected familylowestRank
: the lowest rank to consider for source names. E.g. to match only species and ignore all infraspecific namessynonyms
: if synonyms should be included, defaults to true
Query params for individual matches and column names in bulk input are called the same:
id
scientificName
(orq
)authorship
code
rank
superkingdom
kingdom
subkingdom
superphylum
phylum
subphylum
superclass
class
subclass
superorder
order
suborder
superfamily
family
subfamily
tribe
subtribe
genus
subgenus
section
species
Authentification in the CLB API works either as plain BasicAuth
for every request or you can request a JWToken
which the UI for example does.
Basic API Docs https://api.checklistbank.org/#/default/match_1
The Names Index (nidx) is a ChecklistBank component that automatically tracks all unique names across all datasets. It is the heart of the name matching, but can also be used directly to find related names. Every name in the index links back to a canonical version of the name which is unranked and does not have any authorship. Otherwise names are considered the same only if the latin name, rank & authorship match up according to a rather strict similarity algorithm. As this algorithm is continuously being improved, entries in the index and their related names cluster differ over time and nidx identifiers should not be stored for a longer time as they are not guaranteed to be stable.
Names Index entries should also not be seen as nomenclaturally scrutinized data. Any name present in any of the CLB datasets will be included as long as their Name.type is not one of:
no_name
: 1234placeholder
: Asteraceae incertae sedisinformal
: Abies spec.
A names index entry can be resolved like this: https://api.checklistbank.org/nidx/2
{
"created":"2023-02-22T18:52:01.575189",
"modified":"2023-02-22T18:52:01.575189",
"canonicalId":1,
"scientificName":"Animalia",
"rank":"kingdom",
"uninomial":"Animalia",
"labelHtml":"Animalia",
"parsed":true,
"id":2,
}
The canonicalId
points to the canonical version of the name:
https://api.checklistbank.org/nidx/1
You can list all index names that share the same canonical name with the group resource: https://api.checklistbank.org/nidx/1/group
The names index id also allows to find all name instances in CheckistBank no matter which dataset they belong to: https://api.checklistbank.org/nameusage?nidx=1
Another feature driven by the names index are exports of ID mappings between different datasets.
Similar to downloads this is an asynchroneous job that will result in a compressed CSV file with all names from the requested datasets
aligned according to their names index match. An optional min
parameter can be given to only include names that appear in at least the given number of datasets.
For example an export of ID mappings between the Catalogue of World Gelechiidae and LepIndex can be triggered with
curl --user USERNAME:PASSWORD -X POST "https://api.checklistbank.org/nidx/export?datasetKey=1018&datasetKey=2362&min=2"
The top of the result file would look like this:
rank scientificName authorship IDdataset1018 IDdataset2362
species Acanthophila piceana Sulcs, 1968 100532 4263-4266
species Acompsia angulifera Walsingham, 1897 98070 2517-2518
species Acompsia dimorpha Petry, 1904 98076 28
species Acompsia fuscella Duponchel, 1844 103303 7819-7824
Github repositories are very well suited to to host source data in ColDP or DwC archives if individual files do not exceed the 100MB limit. Any changes to the data will be versioned and github automatically provides the archive as a zip package for all files.
For datasets managed in github ChecklistBank also offers a webhook that these github repositories can be configured for. If setup correctly and change commit will trigger an import into ChecklistBank so that the data is kept up to date near realtime!
A new dataset can be configure with these steps:
- in ChecklistBank:
- register the dataset in ChecklistBank and remember its DATASET_KEY for later
- configure it's access URL to point to the github repo zip archive, e.g. https://github.com/CatalogueOfLife/data-vespoidea/archive/refs/heads/master.zip
- in Github:
- go to the repository settings and add a new webhook that points the payload URL to http://api.checklistbank.org/importer/{DATASET_KEY}/github and uses the
application/json
Content-Type. - configure github to use a secret that the CLB admin hands over to you confidently. Please request your secret with mdoering {at} gbif.org
- go to the repository settings and add a new webhook that points the payload URL to http://api.checklistbank.org/importer/{DATASET_KEY}/github and uses the