Publishing Microdata

Mehmood Asghar edited this page Aug 9, 2021 · 2 revisions

This page covers the functions included with NADAR to create and publish Microdata studies in the catalog and set various options.

For publishing Microdata studies, there are two options:

  1. Import DDI codebook 2.5 XML
  2. Create study from scratch using JSON schemas

Import DDI Codebook 2.5 XML

Download example project

Download the demo popstan project from https://github.com/ihsn/ddi-examples/tree/main/demo-popstan-2006

Study files and folder structure

For microdata studies, we need the study DDI, RDF, documentation and data files. You can organize the files in any way you like. We recommend creating a separate folder for each study and using the study IDNo as the folder name. This makes it easier to write scripts that automate data import using NADAR or any other tools. The study folder should include:

  • Study DDI codebook (XML) file
  • External resources file (RDF)
  • Microdata and all other documentation files described in the RDF
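
The recommended layout can be checked from R before anything is uploaded. The following is a minimal sketch, not part of NADAR: find_study_files is a hypothetical helper, and it assumes one folder per study, named after the study IDNo, holding exactly one .xml file and at most one .rdf file.

```r
# Hypothetical helper (not a NADAR function): locate the DDI and RDF files
# in a study folder. Assumes the folder is named after the study IDNo.
find_study_files <- function(study_dir) {
  if (!dir.exists(study_dir)) {
    stop("Study folder not found: ", study_dir)
  }
  xml_files <- list.files(study_dir, pattern = "\\.xml$",
                          ignore.case = TRUE, full.names = TRUE)
  rdf_files <- list.files(study_dir, pattern = "\\.rdf$",
                          ignore.case = TRUE, full.names = TRUE)
  if (length(xml_files) != 1) {
    stop("Expected exactly one DDI XML file in ", study_dir,
         ", found ", length(xml_files))
  }
  list(
    idno = basename(normalizePath(study_dir)),
    xml_file = xml_files[1],
    # The RDF file is optional, so a missing file is returned as NULL
    rdf_file = if (length(rdf_files) >= 1) rdf_files[1] else NULL
  )
}
```

The returned list holds the values that the import functions described below take as xml_file, rdf_file and dataset_idno.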

Create and publish studies in the catalog

1 - Upload a DDI XML file

This is the first step. Uploading a DDI file creates a new study in draft mode (unpublished state).

To upload a DDI file, use the NADAR function import_ddi:

Parameters

  • xml_file: Path to the DDI2 codebook XML file
  • rdf_file: (optional) Dublin Core RDF file for importing external resources
  • repositoryid: Study collection ID
  • access_policy: Data access type. Allowed values are {direct, public, licensed, remote, data_na}
  • data_remote_url: (optional) Link to the repository where the data is available for download. Only required if access_policy is set to remote
  • published: Publish status for the study. Allowed values are {0=draft, 1=publish}
  • overwrite: Set to yes to replace the study if it already exists. Allowed values are {yes, no}
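
The constraints in this list can be checked locally before the API is called. The sketch below is an illustration only: check_import_args is a hypothetical helper, and its default values are assumptions of this example, not documented NADAR defaults.

```r
# Allowed values, taken from the parameter list above
valid_access_policies <- c("direct", "public", "licensed", "remote", "data_na")

# Hypothetical helper (not a NADAR function): validate arguments before
# passing them to nadar::import_ddi. Defaults here are assumptions.
check_import_args <- function(access_policy, data_remote_url = NULL,
                              published = 0, overwrite = "no") {
  if (!access_policy %in% valid_access_policies) {
    stop("Invalid access_policy: ", access_policy)
  }
  # data_remote_url is only required for the remote access policy
  if (access_policy == "remote" &&
      (is.null(data_remote_url) || !nzchar(data_remote_url))) {
    stop("data_remote_url is required when access_policy is 'remote'")
  }
  if (!published %in% c(0, 1)) {
    stop("published must be 0 (draft) or 1 (publish)")
  }
  if (!overwrite %in% c("yes", "no")) {
    stop("overwrite must be 'yes' or 'no'")
  }
  invisible(TRUE)
}
```

Failing early on the client side avoids a round trip to the catalog for mistakes such as a misspelled access policy.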

Example: Import DDI

  xml_file_path='popstan/ihsn-popstan-mics-2000.xml'

  result=nadar::import_ddi(
     xml_file=xml_file_path, 
     published = 1,
     overwrite = "yes", 
     access_policy = "direct")

  #check the status code - any value other than 200 means an error
  if (result$status_code!=200){
    stop(paste0("DDI import failed: ",result$message))
  }
  
  #success, show the response message from the API
  print(result$message)
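
With one folder per study, the same call can be repeated for a whole directory tree. The following is a rough sketch under stated assumptions: import_all_ddi is a hypothetical helper, every .xml file under the root folder is assumed to be a DDI codebook, and the import function is passed in as an argument so the loop can be tried with a stub before it is pointed at nadar::import_ddi.

```r
# Hypothetical helper (not a NADAR function): import every DDI file found
# under root_dir and report the status code returned for each one.
import_all_ddi <- function(root_dir, import_fn, ...) {
  xml_files <- list.files(root_dir, pattern = "\\.xml$",
                          recursive = TRUE, full.names = TRUE)
  results <- lapply(xml_files, function(xml_file) {
    # Extra arguments (published, overwrite, access_policy) are passed through
    result <- import_fn(xml_file = xml_file, ...)
    data.frame(
      xml_file = xml_file,
      status_code = result$status_code,
      ok = result$status_code == 200,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, results)
}

# Possible usage, assuming a local folder named "studies":
# report <- import_all_ddi("studies", nadar::import_ddi,
#                          published = 0, overwrite = "no",
#                          access_policy = "direct")
# report[!report$ok, ]
```

Collecting the status codes in a data frame, instead of stopping at the first failure, lets one run report every study that needs attention.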

2 - Upload RDF file

The RDF file contains the description of all study files (questionnaire, reports, microdata files, etc). The NADAR function can upload the resource descriptions and upload the actual files if they are placed in the same folder or subfolder as the RDF file.

For the popstan study, the RDF file contains relative links to the resource files, so running the R code below creates the resources and uploads the files in one go. If you don't have an RDF file, or your files are not organized in a folder structure, you can still create and upload resources; see step 3 below.

NADAR function external_resources_import:

Parameters

  • dataset_idno: Study IDNo
  • rdf_file: Dublin Core RDF file
  • skip_uploads: Set to TRUE to skip file uploads
  • overwrite: Set to yes to replace existing resources. Allowed values are {yes, no}

Example: Import external resources (RDF)

This example uses the popstan study, whose RDF file includes the relative paths to the resource files. The external_resources_import function imports the descriptions from the RDF, finds the files in the study folder and uploads them to the catalog.

With the skip_uploads parameter set to FALSE, the function will throw warnings/errors if a resource file cannot be found in the study folder.

resource_file_path='popstan/ihsn-popstan-mics-2000.rdf'

nadar::external_resources_import(
    dataset_idno="ihsn-popstan-mics-2000",
    rdf_file=resource_file_path,
    skip_uploads = FALSE,
    overwrite="yes"
  )

The function does not return anything. If there were any errors, you'll see them in the R console. All errors are reported as warnings. To see the errors, call the R function warnings() which will show output similar to this:

Warning messages:
1: In nadar::external_resources_import(dataset_idno = "ihsn-popstan-mics-2000",  :
  Resource file not found: /Users/m2/Downloads/ihsn-popstan-mics-2000/ihsn-popstan-mics-2000-stata.zip
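
Because problems are reported as warnings, a script can also collect them as they are raised instead of calling warnings() afterwards. The sketch below uses base R's withCallingHandlers; collect_warnings is a hypothetical helper, not part of NADAR.

```r
# Hypothetical helper (not a NADAR function): evaluate an expression and
# return its value together with the text of every warning it raised.
collect_warnings <- function(expr) {
  messages <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      messages <<- c(messages, conditionMessage(w))
      # Continue execution without printing the warning to the console
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = messages)
}

# Possible usage with the import call shown above:
# out <- collect_warnings(nadar::external_resources_import(
#   dataset_idno = "ihsn-popstan-mics-2000",
#   rdf_file = resource_file_path,
#   skip_uploads = FALSE,
#   overwrite = "yes"
# ))
# out$warnings
```

The collected messages can then be written to a log file or used to stop a batch job when any resource file is missing.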

3 - Other options for creating and importing external resources and file uploads

If you don't have RDF files for your studies as in step 2, or your folder structure does not allow importing and uploading files in one step, you can use other NADAR functions to build your own workflow.

For external resources, you have two options:

  • Import RDF file
  • Create resource file from scratch

Import RDF file

Use the external_resources_import function and set the parameter skip_uploads to TRUE.

resource_file_path='popstan/ihsn-popstan-mics-2000.rdf'

nadar::external_resources_import(
    dataset_idno="ihsn-popstan-mics-2000",
    rdf_file=resource_file_path,
    skip_uploads = TRUE,
    overwrite="yes"
  )

Create resource file from scratch

You can create external resources without importing RDF files. To create a resource, use the NADAR function external_resources_add.

Parameters

  • idno: Unique ID for the study
  • dctype: Resource document type - see the API documentation for valid options
  • title: Resource title
  • dcformat: Resource file format - see the API documentation for valid options
  • author: Author name
  • dcdate: Date using YYYY-MM-DD format
  • country: Country name
  • language: Language or Language code
  • contributor: Contributor name
  • publisher: Publisher name
  • rights: Rights
  • description: Resource detailed description
  • abstract: Resource abstract
  • toc: Table of contents
  • file_path: File path for uploading
  • overwrite: Overwrite if resource already exists - Accepted values "yes", "no"

Example:

nadar::external_resources_add(
        idno="ihsn-popstan-mics-2000",
        dctype="Administrative document [doc/adm]",
        title= "Resource title",
        dcformat="Application/pdf",
        author="Author name",
        dcdate="2020-01-01",
        country="USA",
        language="english",
        contributor="contributor name",
        publisher="publisher name",
        rights="rights",
        description="resource description",
        abstract="abstract",
        toc="table of contents",
        file_path="path/to/file/publication.pdf",
        overwrite="no")
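
When a study has several resources, their metadata can be kept in a table and added in a loop. The following is a minimal sketch, assuming a data frame whose column names match the parameter names listed above. add_resources is a hypothetical helper, and the function that creates each resource is passed in so the loop can be tried with a stub before it is pointed at nadar::external_resources_add.

```r
# Hypothetical helper (not a NADAR function): add one resource per row of
# a metadata table. Column names are assumed to match the parameter names.
add_resources <- function(idno, resources, add_fn, overwrite = "no") {
  for (i in seq_len(nrow(resources))) {
    args <- as.list(resources[i, , drop = FALSE])
    # Normalize dates to the YYYY-MM-DD format the dcdate parameter expects
    if (!is.null(args$dcdate)) {
      args$dcdate <- format(as.Date(args$dcdate), "%Y-%m-%d")
    }
    do.call(add_fn, c(list(idno = idno, overwrite = overwrite), args))
  }
  invisible(nrow(resources))
}

# Possible usage (titles and file paths are placeholders):
# resources <- data.frame(
#   title = c("Questionnaire", "Final report"),
#   dcdate = c("2020/01/01", "2020-02-15"),
#   file_path = c("path/to/file/questionnaire.pdf",
#                 "path/to/file/report.pdf"),
#   stringsAsFactors = FALSE
# )
# add_resources("ihsn-popstan-mics-2000", resources,
#               nadar::external_resources_add)
```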
        

4 - Set data access, and other options

TODO

5 - Upload a thumbnail

TODO