Ken Stevens edited this page Apr 1, 2020 · 9 revisions

This document describes how the proposed new hapi-empi feature will work.

Design Principles

Below are some simplifying principles hapi-empi enforces to reduce complexity and ensure data integrity.

  1. When empi is enabled on a hapi-fhir server, any Person resource in the repository that has the "hapi-empi" tag is now effectively read-only via the FHIR endpoint. These Person resources are managed exclusively by hapi-empi. Users can only directly change them via hapi-empi REST operations. In most cases, users indirectly change them by creating and updating Patient and Practitioner ("Pat/Prac") resources. For the rest of this document, assume "Person" refers to a "hapi-empi" tagged Person resource.

  2. Every Pat/Prac resource in the system is linked to a Person resource unless it has the "no-empi" tag or has PROBABLE_MATCH links pending review.

  3. Every Pat/Prac in the system links to at most one Person resource.

  4. The hapi-empi rules define a single identifier system that holds the enterprise id ("EID"). If a Pat/Prac has an EID, then the Person it links to always has the same EID.

  5. Two different Person resources cannot have the same EID.

  6. Pat/Prac resources are only ever compared to Person resources via this EID. For all other matches, Patient resources are only ever compared to Patient resources and Practitioner resources are only ever compared to Practitioner resources.
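
Principles 3 to 5 can be read as invariants over the link data. The following is a minimal Python sketch of such a check, not part of hapi-empi itself. The Link class, the check_invariants function and the dictionary arguments are hypothetical names introduced for illustration, and the sketch assumes that only MATCHED links are passed in.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical stand-in for a MATCHED empi-link record."""
    source_id: str  # Pat/Prac id, e.g. "Patient/1"
    person_id: str  # Person id, e.g. "Person/A"


def check_invariants(links, person_eids, source_eids):
    """Return a list of messages describing violated principles.

    person_eids maps Person id -> EID.
    source_eids maps Pat/Prac id -> EID, only for resources that have one.
    """
    problems = []

    persons_per_source = {}
    for link in links:
        persons_per_source.setdefault(link.source_id, set()).add(link.person_id)

    for source_id, persons in sorted(persons_per_source.items()):
        # Principle 3: a Pat/Prac links to at most one Person.
        if len(persons) > 1:
            problems.append(f"{source_id} links to more than one Person")
        # Principle 4: a Pat/Prac with an EID shares it with its Person.
        eid = source_eids.get(source_id)
        for person_id in sorted(persons):
            if eid is not None and person_eids.get(person_id) != eid:
                problems.append(f"{source_id} and {person_id} have different EIDs")

    # Principle 5: no two Persons share an EID.
    for eid, count in sorted(Counter(person_eids.values()).items()):
        if count > 1:
            problems.append(f"EID {eid} is used by {count} Persons")

    return problems
```

An empty result means the three principles hold for the data supplied.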

Links

  1. hapi-empi manages empi-link records ("links") that link a Pat/Prac resource to a Person resource. When these are changed by matching rules, the links are marked as AUTO. When these links are changed manually, they are marked as MANUAL.

  2. Once a link has been manually assigned as NO_MATCH or MATCHED, the system will not change it.

  3. When a Pat/Prac resource is created or updated, it is compared to the other resources of the same type in the repository. The outcome of each of these comparisons is either NO_MATCH, PROBABLE_MATCH or MATCHED.

  4. Whenever a MATCHED link is established between a Pat/Prac resource and a Person resource, that Pat/Prac is always added to that Person resource's links. All MATCHED links have corresponding Person resource links and all Person resource links have corresponding MATCHED empi-link records. You can think of the fields of the empi-link records as extra metadata associated with each Person.link.target.
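
The link rules above can be sketched as a small data structure. This is an illustrative Python sketch under stated assumptions, not the hapi-empi implementation: the EmpiLink class, its field names and the helper functions are all hypothetical. It shows how a manually assigned NO_MATCH or MATCHED result could be protected from later automatic changes.

```python
from dataclasses import dataclass
from enum import Enum


class MatchResult(Enum):
    NO_MATCH = "NO_MATCH"
    PROBABLE_MATCH = "PROBABLE_MATCH"
    MATCHED = "MATCHED"


class LinkSource(Enum):
    AUTO = "AUTO"      # last changed by the matching rules
    MANUAL = "MANUAL"  # last changed by a user


@dataclass
class EmpiLink:
    """Hypothetical shape of an empi-link record."""
    person_id: str
    target_id: str  # the linked Pat/Prac resource
    result: MatchResult
    source: LinkSource


def is_locked(link):
    """A manual NO_MATCH or MATCHED decision is never changed by the system."""
    return link.source is LinkSource.MANUAL and link.result in (
        MatchResult.NO_MATCH,
        MatchResult.MATCHED,
    )


def apply_auto_result(link, new_result):
    """Apply a rule-derived result; return True if the link was changed."""
    if is_locked(link):
        return False
    link.result = new_result
    link.source = LinkSource.AUTO
    return True


def apply_manual_result(link, new_result):
    """Apply a reviewer's decision and mark the link as MANUAL."""
    link.result = new_result
    link.source = LinkSource.MANUAL
```

In this sketch a PROBABLE_MATCH link stays open to automatic updates until a reviewer resolves it with apply_manual_result.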

Possible rule match outcomes:

When a new Pat/Prac resource is compared with all other resources of that type in the repository, there are four possible cases:

  • CASE 1: No MATCHED and no PROBABLE_MATCH outcomes -> a new Person resource is created and linked to that Pat/Prac. All fields are copied from the Pat/Prac to the Person. If the incoming resource has an EID, it is copied to the Person. Otherwise a new unique UUID is created and used as the EID.

  • CASE 2: All of the MATCHED Pat/Prac resources are already linked to the same Person -> a new Link is created between the new Pat/Prac and that Person and is set to MATCHED.

  • CASE 3: The MATCHED Pat/Prac resources link to more than one Person -> the new Pat/Prac is linked to the Person of the Pat/Prac it had the highest matching score with. All other Person resources are marked as POSSIBLE_DUPLICATE of this first Person. These duplicates are manually reviewed later and either merged or marked as NO_MATCH. Once marked as NO_MATCH, the system no longer considers them a POSSIBLE_DUPLICATE.

  • CASE 4: Only PROBABLE_MATCH outcomes -> In this case, empi-link records are created with PROBABLE_MATCH outcome and await manual assignment to either NO_MATCH or MATCHED. Person resources are not changed.
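
The four cases can be sketched as a single decision function. This is a minimal Python sketch, not the hapi-empi implementation. The Candidate class, the decide function and the action names in the returned dictionary are hypothetical. It assumes every candidate is already linked to a Person, and that MATCHED outcomes take precedence when both MATCHED and PROBABLE_MATCH outcomes are present, which the cases above do not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical result of comparing the new Pat/Prac with an existing one."""
    resource_id: str  # the existing Pat/Prac
    outcome: str      # "NO_MATCH", "PROBABLE_MATCH" or "MATCHED"
    score: float      # overall matching score for this pair
    person_id: str    # Person the existing Pat/Prac is linked to (assumed present)


def decide(candidates):
    """Map comparison outcomes to one of the four cases."""
    matched = [c for c in candidates if c.outcome == "MATCHED"]
    probable = [c for c in candidates if c.outcome == "PROBABLE_MATCH"]

    # CASE 1: nothing matched, so a new Person is created.
    if not matched and not probable:
        return {"case": 1, "action": "CREATE_PERSON"}

    if matched:
        persons = {c.person_id for c in matched}
        # CASE 2: every MATCHED candidate points at the same Person.
        if len(persons) == 1:
            return {"case": 2, "action": "LINK_MATCHED",
                    "person_id": next(iter(persons))}
        # CASE 3: several Persons; link to the one behind the best score
        # and flag the rest for review as possible duplicates.
        best = max(matched, key=lambda c: c.score)
        return {"case": 3, "action": "LINK_MATCHED",
                "person_id": best.person_id,
                "possible_duplicates": sorted(persons - {best.person_id})}

    # CASE 4: only PROBABLE_MATCH outcomes; links await manual review.
    return {"case": 4, "action": "LINK_PROBABLE",
            "person_ids": sorted({c.person_id for c in probable})}
```

Returning a plain description of the action keeps the sketch free of any storage or FHIR client details.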

Rules

hapi-empi rules are managed via a single JSON document that carries a version. empi-links derived from these rules are marked with this version. The following configuration is stored in the rules:

  • resourceSearchParams: These define fields which must have at least one exact match before two resources are considered for matching. This is like a list of "pre-searches" that find potential candidates for matches, to avoid the expensive operation of running a match score calculation on all resources in the system. E.g. you may only wish to consider matching two Patients if they either share at least one identifier in common or have the same birthday.
[ {
    "resourceType" : "Patient",
    "searchParam" : "birthdate"
}, {
    "resourceType" : "Patient",
    "searchParam" : "identifier"
} ]
  • filterSearchParams: When searching for match candidates, only resources that match this filter are considered. E.g. you may wish to only search for Patients for which active=true.
[ {
    "resourceType" : "Patient",
    "searchParam" : "active",
    "fixedValue" : "true"
} ]
  • matchFields: Once the match candidates have been found, each is assigned a match vector that marks which fields match. The match vector is determined by a list of matchFields. Each matchField defines a name, a distance metric, a success threshold, a resource type, and a resource path to check. For example:
{
    "name" : "patient-given",
    "resourceType" : "Patient",
    "resourcePath" : "name.given",
    "metric" : "COSINE",
    "matchThreshold" : 0.8
}
  • weightMap: A map which converts combinations of successful matchFields into a scalar score for overall matching of a given pair of resources.

  • noMatchThreshold, matchThreshold: If the weightMap score is above the matchThreshold, the outcome is MATCHED. If it is between noMatchThreshold and matchThreshold, the outcome is PROBABLE_MATCH. If it is below the noMatchThreshold, the outcome is NO_MATCH.
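
The scoring steps above (match vector, weightMap lookup, threshold comparison) can be sketched as follows. This is an illustrative Python sketch under several assumptions, not the hapi-empi implementation. Resources are simplified to flat dictionaries keyed by resource path. The similarity function uses Python's difflib as a stand-in and is not the COSINE metric named in the example. The weightMap is assumed to be keyed by the set of matching field names, since its actual format is not given here. Scores exactly equal to a threshold are treated as PROBABLE_MATCH, which is also an assumption.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in string similarity in [0, 1]; not the COSINE metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_vector(match_fields, left, right):
    """Return the names of the matchFields that meet their matchThreshold."""
    matched = set()
    for field in match_fields:
        a = left.get(field["resourcePath"])
        b = right.get(field["resourcePath"])
        if a is not None and b is not None:
            if similarity(a, b) >= field["matchThreshold"]:
                matched.add(field["name"])
    return frozenset(matched)


def score(weight_map, vector):
    """Look up the scalar score for a combination of matching fields."""
    return weight_map.get(vector, 0.0)


def classify(value, no_match_threshold, match_threshold):
    """Turn a score into an outcome using the two thresholds."""
    if value > match_threshold:
        return "MATCHED"
    if value < no_match_threshold:
        return "NO_MATCH"
    return "PROBABLE_MATCH"


# Hypothetical rules and resources, for illustration only.
MATCH_FIELDS = [
    {"name": "patient-given", "resourcePath": "name.given", "matchThreshold": 0.8},
    {"name": "patient-family", "resourcePath": "name.family", "matchThreshold": 0.8},
]
WEIGHT_MAP = {
    frozenset({"patient-given"}): 0.3,
    frozenset({"patient-family"}): 0.5,
    frozenset({"patient-given", "patient-family"}): 0.9,
}

incoming = {"name.given": "Jonathan", "name.family": "Smith"}
existing = {"name.given": "Jonathon", "name.family": "Smith"}

vector = match_vector(MATCH_FIELDS, incoming, existing)
outcome = classify(score(WEIGHT_MAP, vector), 0.4, 0.8)
```

Keeping the three steps as separate functions mirrors the way the rules separate matchFields, weightMap and the thresholds, so each can be tuned without touching the others.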