Update the way ReCiter handles books #521

Open
paulalbert1 opened this issue Oct 18, 2023 · 1 comment

paulalbert1 commented Oct 18, 2023

Scope

Approximately 0.1% of records in PubMed are for books, although this share has increased in the past year.

[Screenshot, 2023-10-18 10:43:34 AM]

Data model

Books have a different data model. Key differences include:

| Description | Book XML Attribute | Journal Article XML Attribute |
| --- | --- | --- |
| Publication Type | `<PublicationType>Book [Chapter]</PublicationType>` | `<PublicationType>Journal Article</PublicationType>` |
| Source Title | `<BookTitle>` | `<JournalTitle>` |
| Identifier (ISBN/ISSN) | `<ISBN>` | `<ISSN>` |
| Publisher | `<Publisher>` | N/A |
| Publication Place | `<PlaceOfPublication>` | N/A |
| Authors | `<AuthorList><Author>...</Author></AuthorList>` | same |
| Editors (for books) | `<EditorList><Editor>...</Editor></EditorList>` | N/A |
| Pagination | `<PageRange>` (especially for chapters) | `<MedlinePgn>` |
| Publication Frequency | N/A | Could be inferred from `<JournalIssue><PubFrequency>` |
| DOI | `<ELocationID EIdType="doi">...</ELocationID>` | same |
| Abstract | `<AbstractText>...</AbstractText>` (sometimes omitted) | same |
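
To make the difference concrete, here is a minimal Python sketch that reads the source title and identifier from whichever set of elements is present. It is an illustration only: the `<Record>` wrapper, the sample values, and the `describe_source` helper are made up for this example, and real PubMed records nest these elements more deeply than shown here.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up fragments that use only element names from the table
# above. Real PubMed records nest these elements more deeply.
BOOK_RECORD = """
<Record>
  <PublicationType>Book Chapter</PublicationType>
  <BookTitle>Example Handbook</BookTitle>
  <ISBN>0000000000</ISBN>
  <Publisher>Example Press</Publisher>
</Record>
"""

JOURNAL_RECORD = """
<Record>
  <PublicationType>Journal Article</PublicationType>
  <JournalTitle>Example Journal</JournalTitle>
  <ISSN>0000-0000</ISSN>
</Record>
"""


def describe_source(xml_text):
    """Return the source kind, title, identifier, and publisher for one record."""
    record = ET.fromstring(xml_text)
    book_title = record.findtext(".//BookTitle")
    if book_title is not None:
        # Book branch: title and identifier live in book-specific elements.
        return {
            "kind": "book",
            "source_title": book_title,
            "identifier": record.findtext(".//ISBN"),
            "publisher": record.findtext(".//Publisher"),
        }
    # Journal branch: no publisher element is expected.
    return {
        "kind": "journal",
        "source_title": record.findtext(".//JournalTitle"),
        "identifier": record.findtext(".//ISSN"),
        "publisher": None,
    }
```

A parser that assumes only the journal branch would find no source title or ISSN for a book record, which is the kind of mismatch this issue is concerned with.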

Effect

The inconsistent data model causes chaos. For example, for personIdentifier = tme2002 and PMID = 34818336 (see also API), the wrong authors are listed. What is probably occurring is that the author list is shifting by one.

[Screenshot, 2023-10-18 10:37:35 AM]
[Screenshot, 2023-10-18 10:34:20 AM]

Another example: personIdentifier = mtoth and PMID = 21204454:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=21204454&retmode=xml
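
For anyone who wants to inspect the raw XML for both examples, the efetch URL above can be rebuilt for any PMID. This is a small sketch; `efetch_url` is a helper name made up here and is not part of ReCiter.

```python
from urllib.parse import urlencode

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmid):
    """Build the efetch URL that returns one PubMed record as XML."""
    params = {"db": "pubmed", "id": str(pmid), "retmode": "xml"}
    return f"{EFETCH_BASE}?{urlencode(params)}"


# The two PMIDs mentioned in this issue.
for pmid in (34818336, 21204454):
    print(efetch_url(pmid))
```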

Options

  1. Update ReCiter PubMed Retrieval Tool not to return books. We could do this like so: cole c[au] NOT (booksdocs[Filter])

  2. Update data model across the projects to handle books:

     • ReCiter PubMed Retrieval Tool
     • ReCiter
     • ReCiterDB
     • ReCiter Publication Manager

  3. Exclude books from ReCiter Feature Generator and Article Retrieval output.

     • Approach 1: Exclude cases where PublicationType = Book [Chapter], or
     • Approach 2: Require JournalTitle attribute
     • Include a flag in application.properties to exclude books
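
The query-side and output-side options could look roughly like the Python sketch below. Everything here is an assumption for illustration: the record dictionaries, the `publication_types` and `journal_title` field names, and the `exclude_books` argument (standing in for the application.properties flag, whose name this issue does not specify) are hypothetical, and `BOOK_TYPES` is one reading of "Book [Chapter]". It is not ReCiter's actual code, which is written in Java.

```python
# Assumed reading of "Book [Chapter]": either "Book" or "Book Chapter".
BOOK_TYPES = {"Book", "Book Chapter"}


def exclude_books_from_query(term):
    """Option 1: wrap a PubMed search term so that books are not returned."""
    return f"({term}) NOT (booksdocs[Filter])"


def is_book(record, approach="publication_type"):
    """Option 3: decide whether a record is a book under one of the two approaches."""
    if approach == "publication_type":
        # Approach 1: any book-like publication type marks the record as a book.
        return any(pt in BOOK_TYPES for pt in record.get("publication_types", []))
    if approach == "journal_title":
        # Approach 2: a record without a journal title is treated as a book.
        return not record.get("journal_title")
    raise ValueError(f"unknown approach: {approach}")


def filter_records(records, exclude_books=True, approach="publication_type"):
    """Drop book records from the output when the flag is on."""
    if not exclude_books:
        return list(records)
    return [r for r in records if not is_book(r, approach)]
```

Parenthesizing the original term keeps the NOT clause from binding to only the last part of a longer query. Approach 2 is stricter than Approach 1, since it also drops any record that simply lacks a journal title.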
paulalbert1 changed the title from "Exclude books from output" to "Update the way ReCiter handles books" on Oct 18, 2023
paulalbert1 commented

I'm not sure this is still an issue.
[Screenshot, 2023-10-22 12:00:35 PM]
[Screenshot, 2023-10-22 12:01:07 PM]
