This repository combines data selection and extraction, pseudonymisation, bundling of the extracted FHIR resources, and the creation of a DocumentReference plus either a Binary resource built from the bundle or the bundle itself, to be sent via the DSF.
Test data is provided as part of this repository in the testdata folder.
To spin up a FHIR server, clone the following repository: https://github.com/medizininformatik-initiative/fhir-server-examples, navigate to the server/blaze folder, set the BASE_URL in the docker-compose file to "http://blaze:8080" and execute `docker-compose -p mii-projectathon up -d`.
This will spin up a Blaze FHIR server and expose it on localhost on port 8082.
You can access your FHIR server under http://localhost:8082/fhir/my-endpoint-here , to see all your currently loaded Patients go to:
http://localhost:8082/fhir/Patient
If you are using a HAPI FHIR server the following environment variable needs to be set in the docker-compose file to allow the uploading of transaction bundles without executing them: HAPI_FHIR_ALLOWED_BUNDLE_TYPES: COLLECTION,DOCUMENT,MESSAGE,TRANSACTION,TRANSACTIONRESPONSE,BATCH,BATCHRESPONSE,HISTORY,SEARCHSET
Once the server is available (this can take a couple of seconds) and you get a response from the Patient URL above you can load your testdata.
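The wait for the server can also be scripted. The following is a minimal sketch, not part of this repository: it assumes the Blaze server is exposed on localhost:8082 as described above, and the function name `wait_for_fhir_server` is hypothetical.

```python
import time
import urllib.error
import urllib.request

# Assumption: the Blaze server from the steps above listens on localhost:8082.
PATIENT_URL = "http://localhost:8082/fhir/Patient"


def wait_for_fhir_server(url, attempts=30, delay_seconds=2.0,
                         opener=urllib.request.urlopen):
    """Poll `url` until it answers with HTTP 200.

    Returns True as soon as the server responds, False if all attempts fail.
    `opener` is injectable so the function can be exercised without a network.
    """
    for _ in range(attempts):
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not reachable yet, try again
        time.sleep(delay_seconds)
    return False
```

Calling `wait_for_fhir_server(PATIENT_URL)` before uploading the test data returns `True` once the Patient endpoint responds.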
To load your test data, execute `upload-testdata.sh`, which uploads all the test data from the testdata folder to your FHIR server.
Once the script is finished access http://localhost:8082/fhir/Patient again to see if your resources have been loaded.
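Instead of inspecting the Patient list by eye, the number of loaded resources can be queried with the standard FHIR search parameter `_summary=count`, which returns a Bundle carrying only a `total`. This is a sketch under the assumption that the server runs on localhost:8082 as above; the helper names are hypothetical.

```python
import json
import urllib.request


def build_count_url(base_url, resource_type):
    """Build a FHIR search URL that asks only for the number of matches."""
    return f"{base_url.rstrip('/')}/{resource_type}?_summary=count"


def total_from_bundle(bundle):
    """Return Bundle.total from a search result, or None if the server omitted it."""
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("expected a FHIR Bundle")
    return bundle.get("total")


def count_resources(base_url, resource_type):
    """Ask the FHIR server how many resources of the given type it holds."""
    request = urllib.request.Request(
        build_count_url(base_url, resource_type),
        headers={"Accept": "application/fhir+json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return total_from_bundle(json.load(response))
```

For example, `count_resources("http://localhost:8082/fhir", "Patient")` should return a non-zero number once the upload has succeeded.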
Create your own config files using the `initialise-env-files.sh` script.
Before you can run the data extraction you will need to change the .env file according to your requirements. For all configuration options see "Configuration Options" below.
If you are using the standard blaze server as described in Step 2 above you should set the env var: MII_DATA_EXTRACTION_FHIR_BASE_URL=http://blaze:8080/fhir
If you are using a self-signed certificate for your FHIR server, add the CA certificate to the "certs" folder and name it `custom-ca-bundle.crt`.
Optionally, you can also change the data extraction, pseudonymisation and bundling by changing the data_extraction_config.json and psd_config.json accordingly. For more information see "Running each script individually" below.
Execute `set_rights.sh` to set the folder rights correctly.
In this repository, execute `docker-compose -p mii-projectathon up`.
This runs the three Python scripts of this repository in sequence:
- data-selection-and-extraction.py (selects and extracts the data according to the data_extraction_config.json - see section data-selection-and-extraction.py below)
- pseudonymisation.py (pseudonymises selected resources according to the psd_config.json - see section pseudonymisation.py below)
- build-transaction-bundle.py (bundles all extracted resources into one single transaction bundle, base64 encodes it and creates another transaction bundle which contains a DocumentReference FHIR resource and (configurable) EITHER a Binary FHIR resource which includes the base64 encoded bundle created before OR a bundle of all extracted resources)
Each script writes its output files to a folder:
(1.) saves all extracted resources in extracted_resources
(2.) saves all pseudonymised resources in pseudonymised_resources
(3.) saves the bundle of extracted resources, its base64 encoding, and the transfer bundle that wraps the base64 encoded bundle in to_send
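The bundling step can be illustrated as follows. This is a simplified sketch, not the code of build-transaction-bundle.py: the function names are hypothetical, and the exact shape of the DocumentReference the repository produces (identifiers, author, project references) is not covered here. It only shows the general pattern of wrapping resources in a transaction bundle, base64 encoding it, and referencing the resulting Binary from a DocumentReference.

```python
import base64
import json
import uuid


def build_transaction_bundle(resources):
    """Wrap resources in a transaction bundle; PUT creates or updates by id."""
    entries = [
        {
            "resource": resource,
            "request": {
                "method": "PUT",
                "url": f"{resource['resourceType']}/{resource['id']}",
            },
        }
        for resource in resources
    ]
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}


def encode_bundle_b64(bundle):
    """Serialise a bundle to JSON and return it as a base64 string."""
    return base64.b64encode(json.dumps(bundle).encode("utf-8")).decode("ascii")


def build_transfer_bundle(extracted_bundle):
    """Create a transaction bundle with a Binary (the encoded bundle) and a
    DocumentReference whose attachment points at that Binary."""
    binary_url = f"urn:uuid:{uuid.uuid4()}"
    binary = {
        "resourceType": "Binary",
        "contentType": "application/fhir+json",
        "data": encode_bundle_b64(extracted_bundle),
    }
    document_reference = {
        "resourceType": "DocumentReference",
        "status": "current",
        "content": [
            {"attachment": {"contentType": "application/fhir+json", "url": binary_url}}
        ],
    }
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
            {"fullUrl": binary_url, "resource": binary,
             "request": {"method": "POST", "url": "Binary"}},
            {"fullUrl": f"urn:uuid:{uuid.uuid4()}", "resource": document_reference,
             "request": {"method": "POST", "url": "DocumentReference"}},
        ],
    }
```

Using a `urn:uuid:` fullUrl lets the server rewrite the attachment reference to the Binary's final id when the transaction is processed.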
To see the available script arguments execute python3 data-selection-and-extraction.py -h
This script selects and extracts data from a FHIR server according to the data_extraction_config.json
field | description | possible values | default value |
---|---|---|---|
name | name of the data extraction - used to specify the filename used for this data extraction | | |
query | query used to extract the data | any FHIR search string which returns only one resource type | |
output_file_path | path where to save the extracted resources | any path to a folder - best left as extracted_resources | |
request_type | how the data should be extracted - with a POST or a GET request | POST, GET | GET |
cohort_extraction | specifies that a cohort should be extracted from this FHIR search extraction - contains two subfields cohort_id_field and cohort_id_prefix, both of which are required if cohort_extraction is used | | |
cohort_id_field | the JSON path to the field which contains the patient ids of this resource | | |
cohort_id_prefix | the id prefix which has to be removed to get the actual ids | | |
cohort_dependence | specifies that an extraction is cohort dependent - contains two subfields chunk_size and cohort_id_selection_field, both of which are required if cohort_dependence is used | | |
chunk_size | size of the chunks into which the cohort ids are split when extracting the resources for the cohort | | |
cohort_id_selection_field | the FHIR search parameter used to select the cohort | | |
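To illustrate how the cohort fields interact, here is a minimal sketch of the chunking logic the table describes. It is not the implementation in data-selection-and-extraction.py; the function names are hypothetical, and joining ids with commas relies on the standard FHIR search convention that comma-separated values are combined with OR.

```python
def strip_prefix(value, prefix):
    """Remove cohort_id_prefix (e.g. 'Patient/') from a reference to get the bare id."""
    return value[len(prefix):] if value.startswith(prefix) else value


def chunk_ids(ids, chunk_size):
    """Split the cohort ids into lists of at most chunk_size entries."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]


def build_cohort_queries(query, selection_field, ids, chunk_size):
    """Build one FHIR search string per chunk, restricting the query to the cohort.

    selection_field plays the role of cohort_id_selection_field.
    """
    separator = "&" if "?" in query else "?"
    return [
        f"{query}{separator}{selection_field}={','.join(chunk)}"
        for chunk in chunk_ids(ids, chunk_size)
    ]


# Hypothetical values for illustration only.
cohort = [strip_prefix(ref, "Patient/") for ref in ["Patient/1", "Patient/2", "Patient/3"]]
queries = build_cohort_queries("Condition", "subject", cohort, chunk_size=2)
# queries == ["Condition?subject=1,2", "Condition?subject=3"]
```

Smaller chunks keep each request short, which matters mainly for GET requests where the ids end up in the URL.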
To see the available script arguments execute python3 pseudonymisation.py -h
This script pseudonymises FHIR resources according to the psd_config.json
field | description | possible values | default value |
---|---|---|---|
psd_name | name of the pseudonymisation - used as the filename under which the pseudonymised data is saved and as the filename from which the resources to be pseudonymised are loaded | | |
input_file_path | path from which to load the resources to be pseudonymised | any path to a folder - best left as extracted_resources | |
psd_file_path | path where pseudonymised resources are to be saved | any path to a folder - best left as pseudonymised_resources | |
select | list of paths to parts of a FHIR resource to be selected into the new pseudonymised resources, e.g. "id" - path logic see below this table | | |
change_id | list of ids to replace - contains two subfields id_pool and path_to_id, both of which are required if change_id is used | | |
id_pool | the pool of pseudonyms the pseudonym is part of - this ensures that the same ids are replaced with the same pseudonymised ids | | |
path_to_id | path to the field in which to replace the id - path logic see below this table | | |
apply_function | applies a pseudonymisation function to a field; note that these functions have to be implemented in pseudonymisation.py to be available here - contains two subfields function_to_apply and path_to_field, both of which are required if apply_function is used | | |
function_to_apply | specifies the function to apply to a field | | |
path_to_field | path to the field to which the function is to be applied - path logic see below this table | | |
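The purpose of id_pool can be shown with a small sketch. This is an illustration under assumptions, not the code of pseudonymisation.py: the class and function names are hypothetical, and random UUIDs stand in for whatever pseudonym scheme the script actually uses. The point is only that ids within one pool map consistently, so a Patient id and a subject.reference to that Patient receive the same pseudonym.

```python
import uuid


class PseudonymPools:
    """Keep one id-to-pseudonym mapping per pool name."""

    def __init__(self):
        self._pools = {}

    def pseudonym_for(self, pool_name, original_id):
        """Return the pseudonym for original_id, creating it on first use."""
        pool = self._pools.setdefault(pool_name, {})
        if original_id not in pool:
            pool[original_id] = str(uuid.uuid4())
        return pool[original_id]


def replace_reference_id(reference, pools, pool_name):
    """Replace the id part of a reference such as 'Patient/123', keeping the type prefix."""
    prefix, separator, original_id = reference.rpartition("/")
    return f"{prefix}{separator}{pools.pseudonym_for(pool_name, original_id)}"
```

With a shared pool named, say, "patient", `pools.pseudonym_for("patient", "123")` and `replace_reference_id("Patient/123", pools, "patient")` resolve to the same pseudonym, which keeps references between pseudonymised resources intact.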
Path logic for pseudonymisation: `<field_name or array>.<field_name or array>.<field_name or array>`. An array segment can be an index such as `[0]`, or `[*]` to apply to all entries in the array. Examples: "id", "resourceType", "diagnosis.[*].use", "serviceType", "diagnosis.[*].condition.reference", "subject.reference", "period"
It is also possible to select array entries based on attribute values of the array entry objects (currently only string values are supported), according to this logic: `[@<attributeName>='<stringValueOfAttribute>']`, example: "code.coding.[@system='http://fhir.de/CodeSystem/bfarm/icd-10-gm']"
Note: more complex paths are currently not supported.
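The path logic above can be sketched as a small resolver. This is an independent illustration, not the parser used in pseudonymisation.py; `select_values` is a hypothetical name, and the sketch accepts both `[*]` and `[]` as the "all entries" segment. Segments are tokenised with a regular expression rather than a plain split on ".", because attribute values such as system URLs contain dots themselves.

```python
import re

# One path segment: attribute filter, all-entries marker, index, or field name.
_SEGMENT = re.compile(r"\[@\w+='[^']*'\]|\[\*?\]|\[\d+\]|[^.\[\]]+")
_FILTER = re.compile(r"\[@(\w+)='([^']*)'\]")


def select_values(resource, path):
    """Return all values in `resource` (a parsed FHIR JSON dict) matched by `path`."""
    current = [resource]
    for segment in _SEGMENT.findall(path):
        matched = []
        attribute_filter = _FILTER.fullmatch(segment)
        for value in current:
            if attribute_filter:
                name, expected = attribute_filter.groups()
                if isinstance(value, list):
                    matched.extend(
                        item for item in value
                        if isinstance(item, dict) and item.get(name) == expected
                    )
            elif segment in ("[*]", "[]"):
                if isinstance(value, list):
                    matched.extend(value)
            elif segment.startswith("["):
                index = int(segment[1:-1])
                if isinstance(value, list) and index < len(value):
                    matched.append(value[index])
            elif isinstance(value, dict) and segment in value:
                matched.append(value[segment])
        current = matched
    return current


# Simplified example resource for illustration only.
encounter = {
    "resourceType": "Encounter",
    "id": "enc-1",
    "diagnosis": [
        {"condition": {"reference": "Condition/c1"}},
        {"condition": {"reference": "Condition/c2"}},
    ],
}
references = select_values(encounter, "diagnosis.[*].condition.reference")
# references == ["Condition/c1", "Condition/c2"]
```

Paths that do not match anything simply yield an empty list, so a missing optional field does not raise an error in this sketch.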
available pseudonymisation functions: obfuscate_date_to_year, obfuscate_date_to_day
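The behaviour of these functions is not documented here. Judging only by their names, they presumably reduce the precision of a FHIR date or dateTime value. The sketch below is an assumption about that behaviour, not the repository's implementation, and relies on FHIR dates being ISO 8601 strings (YYYY, YYYY-MM, YYYY-MM-DD, or a full dateTime).

```python
def obfuscate_date_to_year(value):
    """Assumed behaviour: keep only the year, e.g. '2020-03-15' becomes '2020'."""
    return value[:4]


def obfuscate_date_to_day(value):
    """Assumed behaviour: drop the time part, e.g.
    '2020-03-15T10:30:00+01:00' becomes '2020-03-15'."""
    return value[:10]
```

A new function added to pseudonymisation.py would have to be registered there before it can be named in function_to_apply.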
To see the available script arguments execute python3 build-transaction-bundle.py -h
env var | description | default value |
---|---|---|
MII_DATA_EXTRACTION_FHIR_BASE_URL | Local FHIR server base url e.g. see default value | http://fhir-server:8080/fhir |
MII_DATA_EXTRACTION_FHIR_USER | Basic auth user for local FHIR server | |
MII_DATA_EXTRACTION_FHIR_PW | Basic auth password for local FHIR server | |
MII_DATA_EXTRACTION_FHIR_TOKEN | auth token for local FHIR server | |
MII_DATA_EXTRACTION_FHIR_PROXY_HTTP | HTTP url for proxy if used for local FHIR server | |
MII_DATA_EXTRACTION_FHIR_PROXY_HTTPS | HTTPS url for proxy if used for local FHIR server | |
MII_DATA_EXTRACTION_SHARE_FHIR_BASE_URL | Local share FHIR server base url where the transfer bundle will be uploaded and which the DSF connects to e.g. see default value | http://fhir-server:8080/fhir |
MII_DATA_EXTRACTION_SHARE_FHIR_USER | Basic auth user for share FHIR server | |
MII_DATA_EXTRACTION_SHARE_FHIR_PW | Basic auth password for share FHIR server | |
MII_DATA_EXTRACTION_SHARE_FHIR_TOKEN | auth token for share FHIR server | |
MII_DATA_EXTRACTION_SHARE_FHIR_PROXY_HTTP | HTTP url for proxy if used for share FHIR server | |
MII_DATA_EXTRACTION_SHARE_FHIR_PROXY_HTTPS | HTTPS url for proxy if used for share FHIR server | |
MII_DATA_EXTRACTION_PROJ_IDENT | Ident of the project you are executing these scripts for | |
MII_DATA_EXTRACTION_ORG_IDENT | DSF ident of your organization | |
MII_DATA_EXTRACTION_PSD_NAMES | prefix names of files to be packaged into a bundle - should match the psd_name values in the psd_config.json file for the pseudonymised resources to be bundled | |
MII_DATA_EXTRACTION_STORE_BUNDLE | whether to store the bundle directly on the fhir server, activate by setting env to "--storebundle" | None |
MII_DATA_EXTRACTION_ENCB64 | whether to encode the bundle as base64, activate by setting env to "--encb64" | None |
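As an illustration of how these variables might be consumed, the following sketch reads the settings for one server and collects the optional flags. It is not taken from the repository's scripts: the function names are hypothetical, treating the token as a Bearer token is an assumption, and so is forwarding the flag values as command line arguments to build-transaction-bundle.py.

```python
import base64
import os

DEFAULT_FHIR_BASE_URL = "http://fhir-server:8080/fhir"  # default from the table above


def load_server_settings(prefix, env=None):
    """Read the settings for one server.

    prefix is 'MII_DATA_EXTRACTION_FHIR' or 'MII_DATA_EXTRACTION_SHARE_FHIR'.
    """
    env = os.environ if env is None else env
    return {
        "base_url": env.get(f"{prefix}_BASE_URL") or DEFAULT_FHIR_BASE_URL,
        "user": env.get(f"{prefix}_USER") or None,
        "password": env.get(f"{prefix}_PW") or None,
        "token": env.get(f"{prefix}_TOKEN") or None,
        "proxies": {
            "http": env.get(f"{prefix}_PROXY_HTTP") or None,
            "https": env.get(f"{prefix}_PROXY_HTTPS") or None,
        },
    }


def auth_header(settings):
    """Build an Authorization header; a token takes precedence over basic auth."""
    if settings["token"]:
        return {"Authorization": f"Bearer {settings['token']}"}  # assumption: Bearer scheme
    if settings["user"]:
        credentials = f"{settings['user']}:{settings['password'] or ''}"
        encoded = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
        return {"Authorization": f"Basic {encoded}"}
    return {}


def optional_flags(env=None):
    """Collect the flag values of the two optional switches, skipping unset ones."""
    env = os.environ if env is None else env
    names = ("MII_DATA_EXTRACTION_STORE_BUNDLE", "MII_DATA_EXTRACTION_ENCB64")
    return [env[name] for name in names if env.get(name, "").startswith("--")]
```

Passing an explicit `env` dictionary instead of relying on `os.environ` makes the settings easy to check in isolation.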
If you are using a self-signed certificate for your FHIR server, add the CA certificate to the "certs" folder and name it `custom-ca-bundle.crt`.