Empty affiliation strings from PMCID #1

trangdata · 2020-08-24T15:05:57Z

I'm trying to extract affiliation information for a given PMCID. For example, for PMC6986235, I tried the following:

from lxml.etree import tostring

art = get_frontmatter_etree_via_api('PMC6986235')
print(tostring(art, encoding='unicode'))

Part of the output contains the affiliation of the corresponding author:

<aff id="A1">Georgetown University, Department of Oncology and Lombardi
Comprehensive Cancer Center, Washington, DC, 20007.</aff>

However, when I tried

extract_authors_from_article(art)

all affiliations are empty:

[{'pmcid': 'PMC6986235',
  'position': 1,
  'fore_name': 'Ziling',
  'last_name': 'Fan',
  'corresponding': 0,
  'reverse_position': 3,
  'affiliations': []},
 {'pmcid': 'PMC6986235',
  'position': 2,
  'fore_name': 'Yuan',
  'last_name': 'Zhou',
  'corresponding': 0,
  'reverse_position': 2,
  'affiliations': []},
 {'pmcid': 'PMC6986235',
  'position': 3,
  'fore_name': 'Habtom W.',
  'last_name': 'Ressom',
  'corresponding': 0,
  'reverse_position': 1,
  'affiliations': []}]

It is possible that we can't extract this information because of the way journals deposited the metadata. I just wanted to make sure that there is not a better alternative than skipping these articles entirely.


dhimmel · 2020-08-28T16:38:01Z

I think the problem is that there is a coded affiliation, A1, but none of the authors is linked to it. If you have the author frontmatter XML handy, we could confirm this.
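One rough way to check this is to list the `<aff>` ids that no `<contrib>` points to. The sketch below assumes the usual JATS convention of `<xref ref-type="aff" rid="...">` inside each `<contrib>`. The function name `unlinked_affiliation_ids` and the `SAMPLE` document are made up for illustration; the sample is not the real PMC6986235 frontmatter.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace prefix such as '{uri}aff' down to 'aff'."""
    return tag.rsplit("}", 1)[-1]


def unlinked_affiliation_ids(frontmatter_xml):
    """Return the ids of <aff> elements that no <contrib> references.

    Hypothetical helper: assumes authors link to affiliations through
    <xref ref-type="aff" rid="..."> elements nested in <contrib>.
    """
    root = ET.fromstring(frontmatter_xml)
    aff_ids = set()
    linked_ids = set()
    for el in root.iter():
        name = _local(el.tag)
        if name == "aff" and el.get("id"):
            aff_ids.add(el.get("id"))
        elif name == "contrib":
            for child in el.iter():
                if _local(child.tag) == "xref" and child.get("ref-type") == "aff":
                    # rid may hold several space-separated ids
                    linked_ids.update(child.get("rid", "").split())
    return aff_ids - linked_ids


# Made-up minimal frontmatter: three authors, one affiliation, no xref links.
SAMPLE = """
<article>
  <front><article-meta>
    <contrib-group>
      <contrib contrib-type="author"><name><surname>Fan</surname></name></contrib>
      <contrib contrib-type="author"><name><surname>Zhou</surname></name></contrib>
      <contrib contrib-type="author"><name><surname>Ressom</surname></name></contrib>
    </contrib-group>
    <aff id="A1">Georgetown University</aff>
  </article-meta></front>
</article>
"""

print(unlinked_affiliation_ids(SAMPLE))  # {'A1'}
```

If this returns a non-empty set for the real frontmatter, the affiliation is present but not linked to any author.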

The only workaround I see: if there's a single affiliation that is not linked to any author, assume it applies to all authors. I'm not sure how many articles this affects. If it affects many, perhaps this is something we could implement.

However, it would stop working if there were multiple affiliations, since we then couldn't match affiliation to author.

If there's only a single author and many affiliations, we could assume all affiliations apply to that author, although there may be situations where this backfires. I don't know.
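A minimal sketch of that fallback, assuming author records shaped like the dicts in the output above and a separate list of affiliation strings pulled from the unlinked `<aff>` elements. The name `fill_unlinked_affiliations` and the `single_author_fallback` flag are hypothetical, not existing functions or options:

```python
def fill_unlinked_affiliations(authors, affiliations, single_author_fallback=False):
    """Guess affiliations only when no author has an explicit link.

    Hypothetical helper, not part of any existing API.
    - One affiliation, any number of authors: assign it to every author.
    - One author, several affiliations: assign all of them, but only if
      single_author_fallback is True, since this guess may backfire.
    - Several authors and several affiliations: leave everything untouched.
    """
    if any(author["affiliations"] for author in authors):
        return authors  # explicit links exist, so do not guess
    one_affiliation = len(affiliations) == 1
    one_author = single_author_fallback and len(authors) == 1
    if one_affiliation or one_author:
        # Return copies so the caller's records are not mutated
        return [{**author, "affiliations": list(affiliations)} for author in authors]
    return authors


authors = [
    {"last_name": "Fan", "affiliations": []},
    {"last_name": "Zhou", "affiliations": []},
    {"last_name": "Ressom", "affiliations": []},
]
filled = fill_unlinked_affiliations(authors, ["Georgetown University"])
print([a["affiliations"] for a in filled])
```

Keeping the single-author case behind a flag makes the riskier assumption opt-in, and the multi-author, multi-affiliation case still returns empty lists rather than a guess.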
