Merge pull request #68 from dvklopfenstein/dev
Update apikey; If rsp is not json, text, or asn.1, return dict from xml format
dvklopfenstein authored Feb 5, 2024
2 parents 6376737 + 1ea0583 commit a6bcf3a
Showing 19 changed files with 680 additions and 592 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,5 +1,7 @@
#
.eutilsrc
/icite
/notebooks/icite

# Word
~*
48 changes: 10 additions & 38 deletions README.md
@@ -21,7 +21,7 @@ Contact: dvklopfenstein@protonmail.com
* [**2) Forward citation search**](https://github.com/dvklopfenstein/pmidcite#2-forward-citation-search): following a paper's *Cited by* links or *Forward snowballing*
* [**3) Backward citation search**](https://github.com/dvklopfenstein/pmidcite#3-backward-citation-search): following the links to a paper's references or *Backward snowballing*
* [**4) Summarize a group of citations**](https://github.com/dvklopfenstein/pmidcite#4-summarize-a-group-of-citations)
* [**5) Search PubMed from the command line**](https://github.com/dvklopfenstein/pmidcite/blob/main/README.md#5-download-citations-for-all-papers-returned-from-a-pubmed-search)
* [**5) Download citations for all papers returned from a PubMed search**](https://github.com/dvklopfenstein/pmidcite/blob/main/README.md#5-download-citations-for-all-papers-returned-from-a-pubmed-search)
* ***Examples in Jupyter notebooks using the *pmidcite* Python library***
* [**1) Download NIH-OCC citation data**](https://github.com/dvklopfenstein/pmidcite/blob/main/notebooks/NIHOCC_data_download_always.ipynb)
* [**2) Download missing or load existing NIH-OCC citation data**](https://github.com/dvklopfenstein/pmidcite/blob/main/notebooks/NIHOCC_data_download_or_import.ipynb)
@@ -77,53 +77,25 @@ or
```$ icite -H; icite 26032263 -r | sort -k6 -r```

## 4) Summarize a group of citations
* 4a) Examine a paper with PMID `30022098`. Print the column headers(`-H`):
`icite -H 30022098`
* 4b) Download the details about each paper(`-c`) that cites `30022098` into a file(`-o goatools_cites.txt`):
`icite 30022098 -c -o goatools_cites.txt`
* 4c) Summarize the overall performance of the 300+ citing papers contained in `goatools_cites.txt`
`summarize_papers goatools_cites.txt -p TOP CIT CLI`

### 4a) Examine a paper with PMID `30022098`. Print the column headers(`-H`):
```
$ icite -H 30022098
COL 2 3 4 5 6 7 8 9 10 au[11](authors)
TYP PMID RP HAMCc % G YEAR cit cli ref au[00](authors) title
TOP 30022098 R. .A..c 100 4 2018 318 1 23 au[14](D V Klopfenstein) GOATOOLS: A Python library for Gene Ontology analyses.
```

Paper with PMID `30022098` is cited by `318`(`cit`) other research papers and `1`(`cli`) clinical study. It has `23` references(`ref`).
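
The column layout described above can be read programmatically. The following is a minimal sketch, not part of *pmidcite*: the function name `parse_icite_line`, the field names, and the regular expression are assumptions made for illustration, based only on the header and the `TOP` line shown above. It also assumes the name in parentheses is a single author name.

```python
import re

# Example line copied from the `icite -H 30022098` output above.
LINE = ("TOP 30022098 R. .A..c 100 4 2018 318 1 23 "
        "au[14](D V Klopfenstein) GOATOOLS: A Python library for Gene Ontology analyses.")

# Hypothetical pattern mirroring the header:
# TYP PMID RP HAMCc % G YEAR cit cli ref au[00](authors) title
ICITE_LINE = re.compile(
    r"^(?P<typ>\S+)\s+(?P<pmid>\d+)\s+(?P<rp>\S+)\s+(?P<hamcc>\S+)\s+"
    r"(?P<perc>-?\d+)\s+(?P<group>\S+)\s+(?P<year>\d{4})\s+"
    r"(?P<cit>\d+)\s+(?P<cli>\d+)\s+(?P<ref>\d+)\s+"
    r"au\[(?P<num_auth>\d+)\]\((?P<author>[^)]*)\)\s+(?P<title>.*)$"
)

def parse_icite_line(line):
    """Return a dict of fields for one data line, or None for header lines."""
    match = ICITE_LINE.match(line.strip())
    if match is None:
        return None
    rec = match.groupdict()
    # Convert the purely numeric columns; `group` stays a string because it may be `i`.
    for key in ("pmid", "perc", "year", "cit", "cli", "ref", "num_auth"):
        rec[key] = int(rec[key])
    return rec

rec = parse_icite_line(LINE)
print(rec["pmid"], rec["cit"], rec["cli"], rec["ref"])  # 30022098 318 1 23
```

Header lines such as `TYP PMID RP ...` do not match the pattern, so the function returns `None` for them and they can be filtered out when reading a whole file.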

### 4b) Download the details about each paper(`-c`) that cites `30022098` into a file(`-o goatools_cites.txt`):
Create a file containing numerous PMIDs annotated with icite info
```
$ icite 30022098 -c -o goatools_cites.txt
WROTE: goatools_cites.txt
```

The requested paper (PMID=`30022098`) is described in one line in `goatools_cites.txt`:
Count the number of lines in the file
```
$ grep TOP goatools_cites.txt
TOP 30022098 R. .A..c 100 4 2018 318 1 23 au[14](D V Klopfenstein) GOATOOLS: A Python library for Gene Ontology analyses.
$ wc -l goatools_cites.txt
468 goatools_cites.txt
```

The paper (PMID=`30022098`) is cited by 318(`CIT`) research papers plus 1(`CLI`) clinical study:
Summarize the papers in "goatools_cites.txt"
```
$ grep CIT goatools_cites.txt | wc -l
318
$ grep CLI goatools_cites.txt | wc -l
1
$ sumpaps goatools_cites.txt
i=026.9% 4=003.0% 3=018.9% 2=028.8% 1=015.9% 0=006.5% 6 years:2018-2024 465 papers goatools_cites.txt
```

### 4c) Summarize all the papers in `goatools_cites.txt`
**NEW FUNCTIONALITY; INPUT REQUESTED: What would you like to see?** [Open an issue](https://github.com/dvklopfenstein/pmidcite/issues) to comment.
```
$ summarize_papers goatools_cites.txt -p TOP CIT CLI
i=033.4% 4=003.4% 3=020.9% 2=021.9% 1=015.9% 0=004.4% 4 years:2018-2022 320 papers goatools_cites.txt
```

* Output is on one line so many files containing sets of PMIDs may be compared. TBD: Add multiline verbose option.
* The output is on one line so many files containing sets of PMIDs may be compared
* The groups are from newest(`i`) to top-performing(`4`), great(`3`), very good(`2`), and overlooked(`1` and `0`)
* The percentages of papers in `goatools_cites.txt` in each group follow the group name
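
Because the summary is a single line, it is straightforward to load into a script when comparing several files. The sketch below is illustrative only: `parse_summary` and its return keys are hypothetical names, and the parsing assumes the exact one-line layout shown in the example output above.

```python
import re

# Example summary line copied from the `summarize_papers` output above.
SUMMARY = ("i=033.4% 4=003.4% 3=020.9% 2=021.9% 1=015.9% 0=004.4% "
           "4 years:2018-2022 320 papers goatools_cites.txt")

def parse_summary(line):
    """Split a one-line summary into group percentages, year span, and paper count."""
    # Groups are `i` (newest) and `4` down to `0`, each followed by `=NNN.N%`.
    groups = {name: float(pct)
              for name, pct in re.findall(r"\b([i0-4])=(\d+\.\d+)%", line)}
    years = re.search(r"years:(\d{4})-(\d{4})", line)
    papers = re.search(r"(\d+) papers", line)
    return {
        "groups": groups,
        "first_year": int(years.group(1)),
        "last_year": int(years.group(2)),
        "num_papers": int(papers.group(1)),
    }

summary = parse_summary(SUMMARY)
print(summary["groups"]["i"], summary["num_papers"])  # 33.4 320
```

Keeping the groups in a dict keyed by group name makes it easy to line up the same group across several summary lines.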


## 5) Download citations for all papers returned from a PubMed search
4 changes: 3 additions & 1 deletion makefile
@@ -4,7 +4,7 @@ install:
pip3 install .

py:
find src -name \*.py
find src -name \*.py | grep -v icite

e:
find src/pmidcite/eutils -name \*.py
@@ -96,6 +96,8 @@ clean_build:

pyc:
find . -name __pycache__ -type d | xargs rm -rf
find . -name .ipynb_checkpoints | xargs rm -rf
rm -rf notebooks/icite; mkdir notebooks/icite

clean:
make pyc
43 changes: 22 additions & 21 deletions notebooks/NIHOCC_data_download_always.ipynb
@@ -40,7 +40,7 @@
"output_type": "stream",
"text": [
"PMID RP HAMCc % G YEAR cit cli ref au[00](authors) title\n",
"22882545 .. .A... 58 2 2013 26 0 24 au[03](P J N de Bruyn) Killer whale ecotypes: is there a global model?\n"
"22882545 .. .A... 64 2 2013 37 0 24 au[03](P J N de Bruyn) Killer whale ecotypes: is there a global model?\n"
]
}
],
@@ -93,9 +93,9 @@
" YEAR/citations/references section:\n",
" ----------------------------------\n",
" YEAR: The year the article was published\n",
" x: Total of all unique articles that have cited the paper, including clinical articles\n",
" y: Number of unique clinical articles that have cited the paper\n",
" z: Number of references\n",
" cit: Total of all unique articles that have cited the paper, including clinical articles\n",
" cli: Number of unique clinical articles that have cited the paper\n",
" ref: Number of references\n",
" au[A]: A is the number of authors\n"
]
}
@@ -120,9 +120,9 @@
"name": "stdout",
"output_type": "stream",
"text": [
"31461780 R. .A... -1 i 2020 1 0 0 au[06](Robert L Pitman) Enigmatic megafauna: type D killer whale in the Southern Ocean.\n",
"22882545 .. .A... 58 2 2013 26 0 24 au[03](P J N de Bruyn) Killer whale ecotypes: is there a global model?\n",
"20050301 R. .A... 72 2 2009 53 0 25 au[05](Andrew D Foote) Ecological, morphological and genetic divergence of sympatric North Atlantic killer whale populations.\n"
"31461780 R. .A... 8 1 2020 1 0 0 au[06](Robert L Pitman) Enigmatic megafauna: type D killer whale in the Southern Ocean.\n",
"22882545 .. .A... 64 2 2013 37 0 24 au[03](P J N de Bruyn) Killer whale ecotypes: is there a global model?\n",
"20050301 R. .A... 71 2 2009 58 0 25 au[05](Andrew D Foote) Ecological, morphological and genetic divergence of sympatric North Atlantic killer whale populations.\n"
]
}
],
@@ -156,30 +156,31 @@
" authors ['P J N de Bruyn', 'Cheryl A Tosh', 'Aleks Terauds']\n",
" journal Biol Rev Camb Philos Soc\n",
" is_research_article False\n",
" relative_citation_ratio 1.23\n",
" nih_percentile 58.2\n",
" relative_citation_ratio 1.47\n",
" nih_percentile 64.4\n",
" human 0.0\n",
" animal 1.0\n",
" molecular_cellular 0.0\n",
" apt 0.05\n",
" is_clinical False\n",
" citation_count 26\n",
" citations_per_year 3.25\n",
"expected_citations_per_year 2.6331107075213147\n",
" field_citation_rate 5.228895376386997\n",
" citation_count 37\n",
" citations_per_year 3.3636363636363638\n",
"expected_citations_per_year 2.29476864413001\n",
" field_citation_rate 5.230408205818978\n",
" provisional False\n",
" x_coord 0.8660254037844386\n",
" y_coord -0.5\n",
" cited_by_clin []\n",
" cited_by [31230140, 25297864, 31215081, 29895580, 31631360, 26937049, 31131963, 30992478, 25244680, 27336480, 30051821, 27147024, 25052415, 29692289, 31120038, 24383934, 27039511, 25883362, 29876075, 28666015, 29272275, 25738698, 27923044, 33798257, 27804965, 28371192]\n",
" cited_by [31230140, 25297864, 35233242, 37055915, 31215081, 29895580, 31631360, 26937049, 34750442, 31131963, 37839906, 37284666, 30992478, 25244680, 27336480, 30051821, 27147024, 25052415, 35815600, 29692289, 35472428, 31120038, 24383934, 27039511, 25883362, 29876075, 28666015, 29272275, 25738698, 27923044, 36917944, 33798257, 37339590, 37591692, 27804965, 28371192, 38179079]\n",
" references [19912451, 15791540, 11729317, 20413674, 17400573, 13679915, 19919590, 19755531, 22073275, 21241391, 18524738, 12137576, 20810427, 18345862, 28313404, 9542159, 21949818, 17395829, 21757487, 22031725, 19451116, 20050301, 14526101, 18481536]\n",
" doi 10.1111/j.1469-185X.2012.00239.x\n",
" last_modified 01/28/2024, 13:07:22\n",
" nih_group 2\n",
" num_auth 3\n",
" num_clin 0\n",
" num_cite 26\n",
" num_cites_all 26\n",
" nih_perc 58\n",
" num_cite 37\n",
" num_cites_all 37\n",
" nih_perc 64\n",
" num_refs 24\n"
]
}
@@ -193,13 +194,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (C) 2019-present, DV Klopfenstein. All rights reserved."
"Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -213,9 +214,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}