Skip to content
This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

Smithsonian/GBIF-Issues-Explorer

Repository files navigation

GBIF Issues Explorer

Archived

This repo is now archived. For a similar set of features, check https://github.com/Smithsonian/GBIF-Dataset-Explorer

Intro

This Shiny app allows researchers and data/collection managers to navigate the records with issues in a GBIF Darwin Core Archive.

The app can be used to:

  • Determine the source of issues:
    • Researchers can determine if the data is usable for a particular analysis
    • Collection and data managers can check their own database and figure out the source of the problem and fix it in the next update to GBIF
  • Determine if an issue would affect an analysis:
    • For example, a COUNTRY_COORDINATE_MISMATCH could be because the coordinates fall just outside the country borders. Is this an error in the coordinates or an expected result of an occurrence in water?

Occurrence records in GBIF can be tagged with a number of issues that their system has detected. However, like the processing information page indicates:

Not all issues indicate bad data. Some are merley flagging the fact that GBIF has altered values during processing.

This tool allows collection and data managers, as well as researchers, to explore issues in GBIF Darwin Core Archive downloads in an easy web-based interface. Just enter a GBIF download key and the tool will download the zip file, create a local database, and display the issues in the data contained. Once provided with the GBIF key, this tool will:

  1. Download the zip archive
  2. Extract the files
  3. Create a local database
  4. Load the data from the occurrence, verbatim, multimedia, and dataset tables to the database
  5. Generate summary statistics of the issues

To use, just provide the key to a Darwin Core Archive from GBIF. The download key can be requested via the GBIF API or on the website. If your download URL is:

www.gbif.org/occurrence/download/0001419-180824113759888

Then, the last part, '0001419-180824113759888' is the GBIF key you will need to provide this tool. The first time the app is run, it takes some time to create a local database, in particular for large data files. Afterwards, it uses the local database, so it will be faster.

As an alternative, you can copy the zip file to the data folder and run the load_from_DwC_zip.R script. It will run the same steps as above (skipping downloading the file) from the command line.

Then, you can click the 'Explore Issues' tab to see how many records have been tagged with a particular issue.

Once you select an issue, a table will display the rows that have been tagged with that issue. If you click on a row, more details of the occurrence record will be shown, including a map using Leaflet (if the record has coordinates). You can choose to delete the row from the local database.

The 'Explore Data Fields' will show a summary and top data values in all fields of the occurrence.txt file (except for the gbifID field).

Features

  • Load a GBIF DwC download from the web or from a local zip file
  • Navigate the issues in the records
    • Spatial issues are shown with relevant fields and a map
  • Explore the data included in each field
    • How many are null or empty, how many distinct values there are?

Screenshots

Main page, showing the number of records with specific issues:

gbif1

Exploring issues by looking at record details:

GBIF Issues Explorer2

Explore the data fields, see number of records without data, and distinct values (new in version 0.4):

GBIF Issues Explorer3

Testing the app in local computer

To test the app locally, without the need of a server, just install R and Shiny. Then, run a command that will download the source files from Github.

R version 3.3 or better is required. After starting R, copy and paste these commands:

install.packages(
    c("shiny", "DT", "dplyr", 
      "ggplot2", "stringr", "leaflet", 
      "XML", "curl", "data.table", "RSQLite", 
      "jsonlite", "R.utils", "shinyWidgets", 
      "shinycssloaders")
    )

library(shiny)
runGitHub("GBIF-Issues-Explorer", "Smithsonian")

Please note that the installation of the required packages may take a few minutes to download and install. Future versions will try to reduce the number of dependencies.

Winner of 2nd Place Award in GBIF 2018 Ebbe Nielsen Challenge!

The Global Biodiversity Information Facility (GBIF) Secretariat announced today that the Smithsonian Institution’s Digitization Program Office (DPO) was selected by an expert jury as a winner in the 2018 Ebbe Nielsen Challenge. The entry, submitted by DPO Informatics Program Officer Luis J. Villanueva, the GBIF Issues Explorer, won a Second Place award among 23 entries from countries around the world. The Challenge was open to software tools that used GBIF data or tools to promote open science and open biodiversity data.

More details...

Install

The app requires R 3.3, or later, and these packages:

  • shiny
  • DT
  • dplyr
  • ggplot2
  • stringr
  • leaflet
  • XML
  • curl
  • data.table
  • RSQLite
  • jsonlite
  • R.utils
  • shinyWidgets
  • shinycssloaders

To install the required packages:

install.packages(
    c("shiny", "DT", "dplyr", 
      "ggplot2", "stringr", "leaflet", 
      "XML", "curl", "data.table", "RSQLite", 
      "jsonlite", "R.utils", "shinyWidgets", 
      "shinycssloaders")
    )

Please feel free to submit issues, ideas, suggestions, and pull requests.