Certain profiled users have features that aren't properly captured in an identity source. This is especially true for users on our periphery such as alumni and residents. For example, ReCiter doesn't perform well for Harold Varmus, because we have no record in a source system for his time at NIH. We would like to give users the opportunity to assert additional characteristics of themselves and have that be used to improve accuracy.
With an API, we can display features in an external interface and ask users to accept or reject these features.
Note that this output would need to occur against the output of the feature generator. It would also require doing a query against the Identity table. The building of the feedback interface would go in a separate issue.
Process
1. Compute and score the suggested articles for a given person.
2. From the computed articles in the Analysis output where score is greater than the minimum threshold for storing articles and the article is not rejected, identify any of the following features.
a. Institutional affiliations (Scopus Author ID and label)
b. Organizational units (label)
- Look up org units in the ScienceMetrixJournalDepartmentCategory.primaryDepartment field using one of the following against affiliation:
- "Department of " + [primaryDepartment]
- "Division of " + [primaryDepartment]
- "Dept of " + [primaryDepartment]
- any of the other patterns defined in org unit matching
- Look for patterns from org unit synonyms
- Some org units in ScienceMetrixJournalDepartmentCategory are substrings of each other. Match to longest unit if possible.
c. Aliases of target author
- Sanitize names using the standard function.
- Identify cases where targetAuthor=TRUE and the name:
  - is not listed among the existing aliases or the primary name, and
  - does not have a first name that is a single initial, or 2-3 initials in all capitals.
- Dedupe substrings, preferring the longer version.
d. Aliases of non-target authors
- Sanitize names using the standard function.
- Identify cases where targetAuthor=FALSE and the name:
  - is not listed among the existing aliases or the primary name, and
  - does not have a first name that is a single initial, or 2-3 initials in all capitals.
- Dedupe substrings, preferring the longer version.
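The alias rules in items c and d can be sketched roughly as follows. This is only an illustration: `sanitize` is a hypothetical stand-in for the standard sanitizing function, the author record fields (`firstName`, `lastName`, `targetAuthor`) are assumed names, and "dedupe substrings" is interpreted here as dropping a first name that is contained in a longer first name with the same last name.

```python
import re

def sanitize(name):
    # Hypothetical stand-in for the standard sanitizing function:
    # drop punctuation (except hyphens) and collapse whitespace.
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s-]", "", name)).strip()

def is_initials_only(first_name):
    # True for a single initial, or 2-3 initials in all capitals.
    return len(first_name) == 1 or bool(re.fullmatch(r"[A-Z]{2,3}", first_name))

def candidate_aliases(authors, known_names, target=True):
    """Return new alias candidates for target (or non-target) authors."""
    known = {sanitize(n).lower() for n in known_names}
    found = []
    for author in authors:
        if author["targetAuthor"] != target:
            continue
        first = sanitize(author["firstName"])
        last = sanitize(author["lastName"])
        if not first or is_initials_only(first):
            continue
        if f"{first} {last}".lower() in known:
            continue
        if (first, last) not in found:
            found.append((first, last))
    # Dedupe substrings: drop a first name contained in a longer one
    # that shares the same last name.
    kept = []
    for first, last in found:
        shadowed = any(
            other_last == last
            and other_first != first
            and first.lower() in other_first.lower()
            for other_first, other_last in found
        )
        if not shadowed:
            kept.append(f"{first} {last}")
    return kept

# Made-up sample input, for illustration only.
authors = [
    {"firstName": "Maria L.", "lastName": "Garcia", "targetAuthor": True},
    {"firstName": "Maria Luisa", "lastName": "Garcia", "targetAuthor": True},
    {"firstName": "M", "lastName": "Garcia", "targetAuthor": True},
    {"firstName": "ML", "lastName": "Garcia", "targetAuthor": True},
    {"firstName": "Maria", "lastName": "Garcia", "targetAuthor": True},
    {"firstName": "J Peter", "lastName": "Smith", "targetAuthor": False},
]
target_aliases = candidate_aliases(authors, ["Maria Garcia"], target=True)
non_target_aliases = candidate_aliases(authors, ["Maria Garcia"], target=False)
```

Passing `target=False` applies the same filter to co-authors, which matches the way items c and d mirror each other.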
CWID shouldn’t be required for importing additional relationships
3. Compute a score for each feature
For each distinct feature, multiply the count of candidate articles associated with that feature by their average score raised to a constant N, e.g., 3 x ((8.1 + 12.2 + 13.1)/3)^N
N is averageArticleScore-DisambiguationExponent. Default is 1.5.
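A minimal sketch of the feature score, reading the worked example as count x (average article score)^N. That reading is an assumption based on the example and on the parameter name; the function and argument names are hypothetical.

```python
def feature_score(article_scores, exponent=1.5):
    """Score one feature from the scores of its candidate articles.

    Assumed formula: count * (mean score) ** exponent, where exponent
    plays the role of averageArticleScore-DisambiguationExponent.
    """
    if not article_scores:
        return 0.0
    count = len(article_scores)
    mean = sum(article_scores) / count
    return count * mean ** exponent

# The example from the text: three articles scored 8.1, 12.2, and 13.1.
example = feature_score([8.1, 12.2, 13.1])
```

With an exponent above 1, a feature backed by a few high-scoring articles can outrank one backed by many weak ones, which may be the reason for making N configurable.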
4. Determine status of each feature
Options are:
assertedInSystemOfRecord (e.g., determine if feature is already located in Identity table)
Create a disambiguation profile API
Mockup
Business need
Certain profiled users have features that aren't properly captured in an identity source. This is especially true for users on our periphery such as alumni and residents. For example, ReCiter doesn't perform well for Harold Varmus, because we have no record in a source system for his time at NIH. We would like to give users the opportunity to assert additional characteristics of themselves and have that be used to improve accuracy.
With an API, we can display features in an external interface and ask users to accept or reject these features.
Note that this output would need to occur against the output of the feature generator. It would also require doing a query against the Identity table. The building of the feedback interface would go in a separate issue.
Process
1. Compute and score the suggested articles for a given person.
2. From the computed articles in the Analysis output where score is greater than the minimum threshold for storing articles and the article is not rejected, identify any of the following features.
a. Institutional affiliations (Scopus Author ID and label)
b. Organizational units (label)
- Look up org units in the ScienceMetrixJournalDepartmentCategory.primaryDepartment field using one of the following against affiliation:
- "Department of " + [primaryDepartment]
- "Division of " + [primaryDepartment]
- "Dept of " + [primaryDepartment]
- any of the other patterns defined in org unit matching
- Look for patterns from org unit synonyms
- Some org units in ScienceMetrixJournalDepartmentCategory are substrings of each other. Match to longest unit if possible.
c. Aliases of target author
d. Aliases of non-target authors
e. Email address(es)
f. ORCID identifier(s)
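The org unit lookup in item b could look something like the sketch below. It covers only the three literal prefixes listed; the other org unit patterns and synonyms are left out, the department values are made-up samples rather than real ScienceMetrixJournalDepartmentCategory rows, and `match_org_unit` is a hypothetical name.

```python
from typing import List, Optional

PREFIXES = ("Department of ", "Division of ", "Dept of ")

def match_org_unit(affiliation: str, departments: List[str]) -> Optional[str]:
    """Return the longest primaryDepartment matched in an affiliation string."""
    text = affiliation.lower()
    # Try longer names first so that a unit which is a substring of
    # another unit does not shadow the more specific match.
    for dept in sorted(departments, key=len, reverse=True):
        if any((prefix + dept).lower() in text for prefix in PREFIXES):
            return dept
    return None

# Made-up sample values, for illustration only.
departments = ["Pathology", "Pathology and Laboratory Medicine", "Medicine"]
longest = match_org_unit(
    "Department of Pathology and Laboratory Medicine, Example University",
    departments,
)
short = match_org_unit("Dept of Medicine, Example University", departments)
missing = match_org_unit("Example University Medicine", departments)
```

Sorting by length before matching is one simple way to satisfy the "match to longest unit" rule; a real implementation might instead collect all matches and pick the longest.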
3. Compute a score for each feature
4. Determine status of each feature
5. Output features
Notes
See here for some up-to-date thoughts on possible features.