Create a new Candidate Article Retrieval API #438

Open
paulalbert1 opened this issue Oct 28, 2020 · 0 comments

Background

Those who wish to do clustering need to use Java to retrieve existing candidate records for a user. To facilitate alternate clustering approaches, we should allow third parties to retrieve these candidate records using an API service.

Requirements

Please create a new API called Candidate Article Retrieval at this URL: /reciter/candidate-article-retrieval/by/uid. It should live in the re-citer-controller.

Parameters should be personIdentifier, useGoldStandard, and retrievalRefreshFlag.

The articles returned should be consistent with what the Feature Generator API returns, except that here we return all candidate records irrespective of score, because the scoring phase hasn't happened yet.

retrievalRefreshFlag and useGoldStandard should behave the same as they do for the Feature Generator API. For useGoldStandard, the value can change whether records stored in the GoldStandard table are also included.

The API should be stateless, meaning that the resulting data are not saved anywhere.

Authentication can be done using the same key that authenticates for Feature Generator API.
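
To make the requirements above concrete, here is a rough sketch of how a third-party client might build a call to the proposed endpoint. Only the path and the three parameter names come from this issue. The GET method, the `api-key` header name, the base URL, and the sample flag values are assumptions for illustration and should be aligned with whatever Feature Generator API actually uses.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Path proposed in this issue.
ENDPOINT_PATH = "/reciter/candidate-article-retrieval/by/uid"


def build_request(base_url, person_identifier, use_gold_standard,
                  retrieval_refresh_flag, api_key):
    """Build (but do not send) a request for a person's candidate articles."""
    query = urlencode({
        "personIdentifier": person_identifier,
        "useGoldStandard": use_gold_standard,
        "retrievalRefreshFlag": retrieval_refresh_flag,
    })
    url = f"{base_url.rstrip('/')}{ENDPOINT_PATH}?{query}"
    # Assumption: the key is sent in an "api-key" header over GET.
    return Request(url, headers={"api-key": api_key}, method="GET")


# Hypothetical host, flag values, and key, used only for illustration.
req = build_request("http://localhost:5000", "meb7002",
                    "AS_EVIDENCE", "FALSE", "example-key")
print(req.full_url)
```

The request can then be passed to `urllib.request.urlopen` and the body decoded as JSON.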

Sample data

Here's the proposed output for the user meb7002, showing only 1 of the 42 articles that would customarily be returned.

{
  "personIdentifier": "meb7002",
  "dateRun": "2020-10-27T22:14:44.038+00:00",
  "countArticles": 42,
  "reCiterArticleFeatures": [
    {
      "pmid": 24694772,
      "pmcid": "PMC4180817",
      "publicationDateDisplay": "2014 Mar 30",
      "publicationDateStandardized": "2014-03-30",
      "datePublicationAddedToEntrez": "2014-04-04",
      "doi": "10.1016/j.jbi.2014.03.013",
      "publicationType": {
        "publicationTypeCanonical": "Academic Article",
        "publicationTypePubMed": [
          "Journal Article"
        ],
        "publicationTypeScopus": {
          "publicationTypeScopusAbbreviation": "ar",
          "publicationTypeScopusLabel": "Article"
        }
      },
      "timesCited": 6,
      "citesCitedBy": {
        "cites": ["25046832","20016547","23362505","20072710","19567789","14728536","20083443"],
        "citedBy": ["25046832","29339930"],
        "type": "pmid"
      },
      "publicationAbstract": "OBJECTIVE: Publications are a key data source for investigator profiles and research networking systems. We developed ReCiter, an algorithm that automatically extracts bibliographies from PubMed using institutional information about the target investigators. METHODS: ReCiter executes a broad query against PubMed, groups the results into clusters that appear to constitute distinct author identities and selects the cluster that best matches the target investigator. Using information about investigators from one of our institutions, we compared ReCiter results to queries based on author name and institution and to citations extracted manually from the Scopus database. Five judges created a gold standard using citations of a random sample of 200 investigators. RESULTS: About half of the 10,471 potential investigators had no matching citations in PubMed, and about 45% had fewer than 70 citations. Interrater agreement (Fleiss' kappa) for the gold standard was 0.81. Scopus achieved the best recall (sensitivity) of 0.81, while name-based queries had 0.78 and ReCiter had 0.69. ReCiter attained the best precision (positive predictive value) of 0.93 while Scopus had 0.85 and name-based queries had 0.31. DISCUSSION: ReCiter accesses the most current citation data, uses limited computational resources and minimizes manual entry by investigators. Generation of bibliographies using named-based queries will not yield high accuracy. Proprietary databases can perform well but requite manual effort. Automated generation with higher recall is possible but requires additional knowledge about investigators.",
      "articleKeywords": [
        {
          "keyword": "Abstracting and Indexing",
          "type": "MESH_MAJOR",
          "count": null
        },
        {
          "keyword": "Algorithms",
          "type": "MESH_MAJOR",
          "count": 199455
        },
        {
          "keyword": "Authorship",
          "type": "MESH_MAJOR",
          "count": 5259
        },
        {
          "keyword": "Data Mining",
          "type": "MESH_MAJOR",
          "count": 12200
        },
        {
          "keyword": "Natural Language Processing",
          "type": "MESH_MAJOR",
          "count": 3223
        },
        {
          "keyword": "Pattern Recognition, Automated",
          "type": "MESH_MAJOR",
          "count": 35284
        },
        {
          "keyword": "PubMed",
          "type": "MESH_MAJOR",
          "count": 39308
        }
      ],
      "journalCategory": {
        "journalCategoryID": 36,
        "journalCategoryLabel": "Medical Informatics"
      },
      "grantIdentifiers": [
        "UL1 RR024996",
        "UL1 TR000040",
        "UL1 TR000457"
      ],
      "scopusDocID": "84907990604",
      "journalTitleVerbose": "Journal of biomedical informatics",
      "issn": [
        {
          "issntype": "Electronic",
          "issn": "1532-0480"
        },
        {
          "issntype": "Linking",
          "issn": "1532-0464"
        }
      ],
      "journalTitleISOabbreviation": "J Biomed Inform",
      "articleTitle": "Automatic generation of investigator bibliographies for institutional research networking systems.",
      "reCiterArticleAuthorFeatures": [
        {
          "rank": 1,
          "lastName": "Johnson",
          "firstName": "Stephen B",
          "initials": "S",
          "affiliations": {
            "affiliationStatementLabel": "Department of Public Health, Weill Cornell Medical College, New York, United States. Electronic address: johnsos@med.cornell.edu.",
            "affiliationStatementLabelSource": "PUBMED",
            "affiliationInstitutions": [
              {
                "affiliationInstitutionLabel": "Weill Cornell Medicine",
                "affiliationInstitutionId": 60007997,
                "affiliationInstitutionSource": "SCOPUS"
              }
            ]
          },
          "email": "johnsos@med.cornell.edu",
          "targetAuthor": false
        },
        {
          "rank": 2,
          "lastName": "Bales",
          "firstName": "Michael E",
          "initials": "M",
          "affiliations": {
            "affiliationStatementLabel": "Department of Biomedical Informatics, Columbia University, New York, United States.",
            "affiliationStatementLabelSource": "PUBMED",
            "affiliationInstitutions": [
              {
                "affiliationInstitutionLabel": "Columbia University in the City of New York",
                "affiliationInstitutionId": 60030162,
                "affiliationInstitutionSource": "SCOPUS"
              }
            ]
          },
          "targetAuthor": true
        },
        {
          "rank": 3,
          "lastName": "Dine",
          "firstName": "Daniel",
          "initials": "D",
          "affiliations": {
            "affiliationStatementLabel": "Department of Biomedical Informatics, Columbia University, New York, United States; The Irving Institute for Clinical and Translational Research, Columbia University, New York, United States.",
            "affiliationStatementLabelSource": "PUBMED",
            "affiliationInstitutions": [
              {
                "affiliationInstitutionLabel": "Columbia University in the City of New York",
                "affiliationInstitutionId": 60030162,
                "affiliationInstitutionSource": "SCOPUS"
              }
            ]
          },
          "targetAuthor": false
        },
        {
          "rank": 4,
          "lastName": "Bakken",
          "firstName": "Suzanne",
          "initials": "S",
          "affiliations": {
            "affiliationStatementLabel": "Department of Biomedical Informatics, Columbia University, New York, United States; The Irving Institute for Clinical and Translational Research, Columbia University, New York, United States.",
            "affiliationStatementLabelSource": "PUBMED",
            "affiliationInstitutions": [
              {
                "affiliationInstitutionLabel": "Columbia University in the City of New York",
                "affiliationInstitutionId": 60030162,
                "affiliationInstitutionSource": "SCOPUS"
              }
            ]
          },
          "targetAuthor": false
        },
        {
          "rank": 5,
          "lastName": "Albert",
          "firstName": "Paul J",
          "initials": "P",
          "affiliations": {
            "affiliationStatementLabel": "Samuel J. Wood Library, Weill Cornell Medical College, New York, United States.",
            "affiliationStatementLabelSource": "PUBMED",
            "affiliationInstitutions": [
              {
                "affiliationInstitutionLabel": "Weill Cornell Medicine",
                "affiliationInstitutionId": 60007997,
                "affiliationInstitutionSource": "SCOPUS"
              }
            ]
          },
          "targetAuthor": false
        },
        {
          "rank": 6,
          "lastName": "Weng",
          "firstName": "Chunhua",
          "initials": "C",
          "affiliations": {
            "affiliationStatementLabel": "Department of Biomedical Informatics, Columbia University, New York, United States; The Irving Institute for Clinical and Translational Research, Columbia University, New York, United States.",
            "affiliationStatementLabelSource": "PUBMED",
            "affiliationInstitutions": [
              {
                "affiliationInstitutionLabel": "Columbia University in the City of New York",
                "affiliationInstitutionId": 60030162,
                "affiliationInstitutionSource": "SCOPUS"
              }
            ]
          },
          "targetAuthor": false
        }
      ],
      "volume": "51",
      "pages": "8-14"
    }
  ]
}
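
As an illustration of how a third party might consume this output for clustering, the sketch below reduces each candidate article to a few fields. The field names are taken from the sample above; the function name `summarize_candidates`, the choice of fields, and the trimmed inline payload are hypothetical and only meant to show the shape of the data.

```python
import json


def summarize_candidates(payload):
    """Reduce each candidate article to a few fields useful for clustering."""
    summaries = []
    for article in payload.get("reCiterArticleFeatures", []):
        authors = article.get("reCiterArticleAuthorFeatures", [])
        # The author flagged as the person being disambiguated, if any.
        target = next((a for a in authors if a.get("targetAuthor")), None)
        summaries.append({
            "pmid": article["pmid"],
            "targetAuthorRank": target["rank"] if target else None,
            "meshMajor": [
                k["keyword"]
                for k in article.get("articleKeywords", [])
                if k.get("type") == "MESH_MAJOR"
            ],
        })
    return summaries


# Trimmed version of the sample payload, for illustration only.
sample = json.loads("""
{
  "personIdentifier": "meb7002",
  "countArticles": 42,
  "reCiterArticleFeatures": [
    {
      "pmid": 24694772,
      "articleKeywords": [
        {"keyword": "Algorithms", "type": "MESH_MAJOR", "count": 199455},
        {"keyword": "PubMed", "type": "MESH_MAJOR", "count": 39308}
      ],
      "reCiterArticleAuthorFeatures": [
        {"rank": 1, "lastName": "Johnson", "targetAuthor": false},
        {"rank": 2, "lastName": "Bales", "targetAuthor": true}
      ]
    }
  ]
}
""")

result = summarize_candidates(sample)
print(result)
```

Because every candidate is returned regardless of score, a client along these lines would apply its own clustering and scoring to the summaries.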