
Bulk Data Import


Note: The bulk data work on this server is very much a work in progress, and may change as the specification evolves and is tested further.


Bulk data importing is handled via a Redis-based queue library for robust processing of large amounts of data. The functionality also relies on the bulk-data-utilities library to run the export flow, parse the resulting ndjson, and perform reference checking.
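
As a rough illustration of this queueing approach (a sketch only, not the server's actual code), a kickoff handler backed by a Bull-style Redis queue might enqueue an import job like this; the queue name, Redis URL, and job payload are hypothetical:

import Queue from 'bull';

// Jobs are persisted in Redis, so queued imports survive server restarts.
const importQueue = new Queue('bulk-import', 'redis://127.0.0.1:6379');

// Hypothetical kickoff handler: enqueue a job rather than exporting inline.
export async function enqueueImport(exportUrl: string, types: string[]): Promise<string> {
  const job = await importQueue.add({ exportUrl, types });
  return job.id.toString(); // used to build the Content-Location status URL
}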

Standard Bulk Data Import (Ping and Pull)

The server exposes a /$import endpoint that follows the standard Bulk Data Import Ping and Pull (PnP) specification. For example, to import all FHIR Patient, Encounter, and Procedure resources from a FHIR server at http://example.com, the request looks as follows:

Kickoff

Request

POST http://localhost:3000/4_0_1/$import

Headers:

Content-Type: application/json+fhir
Prefer: respond-async

Body:

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "exportUrl",
      "valueUrl": "http://example.com/$export"
    },
    {
      "name": "_type",
      "valueString": "Patient"
    },
    {
      "name": "_type",
      "valueString": "Encounter"
    },
    {
      "name": "_type",
      "valueString": "Procedure"
    }
  ]
}

Response

Status: 202 Accepted
Content-Location: 4_0_1/bulkstatus/<unique-id>
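
A kickoff request like the one above could be sent from a TypeScript client as follows (a minimal sketch using the built-in fetch of Node 18+; the URL, headers, and body mirror the example above, and error handling is elided):

const kickoffRes = await fetch('http://localhost:3000/4_0_1/$import', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json+fhir',
    Prefer: 'respond-async'
  },
  body: JSON.stringify({
    resourceType: 'Parameters',
    parameter: [
      { name: 'exportUrl', valueUrl: 'http://example.com/$export' },
      { name: '_type', valueString: 'Patient' }
    ]
  })
});

// 202 Accepted: the status endpoint to poll comes back in Content-Location.
const statusUrl = kickoffRes.headers.get('Content-Location');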

When the kickoff request is accepted, a new job is added to the bulk data processing queue; that job uses bulk-data-utilities to carry out the ping and pull process.

When bulk-data-utilities completes the export flow, it takes the resulting ndjson files and maps them into individual FHIR Bundle resources, one per Patient, each containing all of the resources specific to that Patient.
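
The grouping logic lives in bulk-data-utilities; the following is only a sketch of the idea, assuming each non-Patient resource references its patient through a subject.reference of the form "Patient/<id>":

interface FhirResource {
  resourceType: string;
  id: string;
  subject?: { reference: string };
}

// Sketch: group exported ndjson lines by patient; each group later becomes
// one FHIR Bundle containing all resources specific to that Patient.
function groupByPatient(ndjson: string): Map<string, FhirResource[]> {
  const byPatient = new Map<string, FhirResource[]>();
  for (const line of ndjson.split('\n').filter(l => l.trim() !== '')) {
    const resource: FhirResource = JSON.parse(line);
    const key =
      resource.resourceType === 'Patient'
        ? `Patient/${resource.id}`
        : resource.subject?.reference ?? 'unmatched';
    const group = byPatient.get(key);
    if (group) {
      group.push(resource);
    } else {
      byPatient.set(key, [resource]);
    }
  }
  return byPatient;
}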

Polling

To poll for the status of the import request, send a GET request to the Content-Location endpoint returned in the kickoff response:

Request

GET http://localhost:3000/4_0_1/bulkstatus/<unique-id>

Response

Status: 202 Accepted
X-Progress: "Retrieving export files"
Retry-After: 120

While the job is still in progress, the client is told to retry after the number of seconds given in the Retry-After header. When the job is finished, subsequent status requests will respond with the output of the bulk import:

Request

GET http://localhost:3000/4_0_1/bulkstatus/<unique-id>

Response

Status: 200 OK

{
  "transactionTime": "<datetime of transaction>",
  "requiresAccessToken": false,
  "outcome": [
    {
      "type": "OperationOutcome",
      "url": "http://localhost:3000/4_0_1/<unique-id>/info_file_1.ndjson"
    }
  ]
}
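
On the client side, this polling loop can be automated. Here is a minimal sketch that retries until the server returns the final manifest, honoring the Retry-After header (the 120-second fallback is taken from the example above):

async function pollImportStatus(statusUrl: string): Promise<unknown> {
  for (;;) {
    const res = await fetch(statusUrl);
    if (res.status === 200) {
      return res.json(); // final manifest: transactionTime, outcome, etc.
    }
    // Still 202 Accepted: wait the server-suggested interval, then retry.
    const retryAfter = Number(res.headers.get('Retry-After') ?? '120');
    await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
  }
}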

The resources will now be present in the server's MongoDB database. To view the OperationOutcome resources generated during the import:

GET http://localhost:3000/4_0_1/<unique-id>/info_file_1.ndjson
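
Since the info file is ndjson, each non-empty line is a standalone OperationOutcome. A client might parse it as follows (a sketch; the fetchOutcomes helper is hypothetical):

async function fetchOutcomes(url: string): Promise<unknown[]> {
  const body = await (await fetch(url)).text();
  // One JSON-encoded OperationOutcome per non-empty line.
  return body
    .split('\n')
    .filter(line => line.trim() !== '')
    .map(line => JSON.parse(line));
}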

eCQM-Specific Bulk Data Import

eCQM-specific bulk import is similar to the standard "Ping and Pull" approach described above. The main differences are as follows:

  • Resource types and any additional parameters are computed from the Data Requirements of the specified Measure rather than supplied by the requester
  • Kickoff request uses an asynchronous version of the $submit-data operation
    • Must send request to $bulk-submit-data endpoint for a specific eCQM
    • Must include a measureReport parameter that is a data-collection MeasureReport referencing the desired Measure by canonical URL

Kickoff

Request

POST http://localhost:3000/4_0_1/Measure/<measure-id>/$bulk-submit-data

Headers:

Content-Type: application/json+fhir
Prefer: respond-async

Body:

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "measureReport",
      "resource": {
        "resourceType": "MeasureReport",
        "measure": "http://example.com/full/url/of/measure-id"
      }
    },
    {
      "name": "exportUrl",
      "valueUrl": "http://example.com/$export"
    }
  ]
}

Response

Status: 202 Accepted
Content-Location: 4_0_1/bulkstatus/<unique-id>

When the kickoff request is accepted, a new job is added to the bulk data processing queue just like the base $import operation. However, the Data Requirements of the Measure are first calculated using fqm-execution and passed along to bulk-data-utilities to identify the resource types that need to be exported from the data provider.
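
As a sketch of that step (assuming fqm-execution's Calculator.calculateDataRequirements API and the fhir4 typings from @types/fhir; the exact result shape may vary across versions), the export types could be derived like so:

import { Calculator } from 'fqm-execution';

async function getExportTypes(measureBundle: fhir4.Bundle): Promise<string[]> {
  const { results } = await Calculator.calculateDataRequirements(measureBundle, {});
  // Each dataRequirement names a FHIR resource type the measure needs;
  // de-duplicate before turning them into _type parameters for the export.
  const types = (results.dataRequirement ?? []).map(dr => dr.type);
  return [...new Set(types)];
}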

Polling

The polling flow is identical to regular $import:

Request

GET http://localhost:3000/4_0_1/bulkstatus/<unique-id>

Response

Status: 202 Accepted
X-Progress: "Retrieving export files"
Retry-After: 120

...

GET http://localhost:3000/4_0_1/bulkstatus/<unique-id>

Status: 200 OK

{
  "transactionTime": "<datetime of transaction>",
  "requiresAccessToken": false,
  "outcome": [
    {
      "type": "OperationOutcome",
      "url": "http://localhost:3000/4_0_1/<unique-id>/info_file_1.ndjson"
    }
  ]
}

Architecture

The following diagram depicts the architecture used for testing bulk import with deqm-test-server as the data consumer:

[Diagram: Bulk Import Architecture]