Understand if current pipeline is operational #23
Comments
First step fails for me:
This is on a Mac Pro with a local installation of getpapers etc. I suspect I need to raise the limit on the memory that node is permitted to use, or something similar.
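If the failure really is Node running out of heap (the error itself is not shown here, so this is only a guess), one thing to try is raising the V8 heap limit through the NODE_OPTIONS environment variable before calling getpapers. A minimal sketch, assuming getpapers is installed on PATH; the helper names and the 4096 MB default are illustrative, not part of the project:

```python
import os
import shutil
import subprocess


def build_getpapers_call(query, outdir, max_old_space_mb=4096, base_env=None):
    """Return (command, environment) for a getpapers run with a larger Node heap.

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    env = dict(os.environ if base_env is None else base_env)
    # NODE_OPTIONS is read by Node itself; --max-old-space-size is in megabytes.
    heap_flag = f"--max-old-space-size={max_old_space_mb}"
    env["NODE_OPTIONS"] = f"{env.get('NODE_OPTIONS', '')} {heap_flag}".strip()
    # --xml asks getpapers to download full-text XML where available.
    command = ["getpapers", "--query", query, "--outdir", outdir, "--xml"]
    return command, env


def run_getpapers(query, outdir, max_old_space_mb=4096):
    """Run getpapers with the enlarged heap; raises if getpapers is missing."""
    if shutil.which("getpapers") is None:
        raise RuntimeError("getpapers is not on PATH")
    command, env = build_getpapers_call(query, outdir, max_old_space_mb)
    return subprocess.run(command, env=env, check=True)
```

For example, `run_getpapers("github.com", "data")` would repeat the search with a 4 GB heap limit. Whether that is enough for this query is untested.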
Hmmm, sorry @andreww, I'm not sure what to suggest for that.
Notes from May 21:

Step 1: find full-texts containing 'search-phrase'
Use getpapers to mine for 'github.com' and find the full-texts. Naomi's notebook is at https://github.com/softwaresaved/code-cite/blob/master/notebooks/getpapers.md
I did this at the collabw18 hackday (March 28). It was not repeated for this test, but note Andrew's report above. Output data is in the /data folder in the GitHub repo and locally (~/Github/collabw18/code-cite/code-cite/data).

Step 2: process full-texts --> JSON data with 'search-term' URLs
Extract URLs from the full-texts and create a JSON data structure (article DOI, URLs, pubdate). Neil's notebook is at https://github.com/softwaresaved/code-cite/blob/master/notebooks/EuPMCCodeReferences.ipynb
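To make Step 2 concrete, here is a rough sketch of pulling github.com URLs out of a block of full text and packing them into a record. This is not the code in Neil's notebook; the regular expression, the function names and the JSON field names ("doi", "pubdate", "urls") are assumptions chosen to match the description above.

```python
import json
import re

# Assumed pattern: github.com/<owner> with an optional /<repo> segment.
GITHUB_URL = re.compile(
    r"(?:https?://)?(?:www\.)?github\.com/[A-Za-z0-9_.-]+(?:/[A-Za-z0-9_.-]+)?",
    re.IGNORECASE,
)


def extract_github_urls(text):
    """Return unique github.com URLs in order of first appearance."""
    seen = []
    for match in GITHUB_URL.finditer(text):
        # Drop a trailing full stop picked up from the end of a sentence.
        url = match.group(0).rstrip(".")
        if not url.lower().startswith("http"):
            url = "https://" + url
        if url not in seen:
            seen.append(url)
    return seen


def build_record(doi, pubdate, fulltext):
    """Build one JSON-serialisable record (field names are hypothetical)."""
    return {"doi": doi, "pubdate": pubdate, "urls": extract_github_urls(fulltext)}


if __name__ == "__main__":
    sample = "Code is at https://github.com/softwaresaved/code-cite."
    print(json.dumps(build_record("10.0000/example", "2018-03-28", sample), indent=2))
```

The DOI and date in the usage lines are placeholders, not real data.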
Bug (reported June 11, issue #27): if there is a folder without XML or JSON, or a JSON file without a DOI (assumption), the script errors and cannot run past ln[21]:
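A possible guard for the suspected cause, sketched under assumptions: skip any paper folder that lacks the XML or JSON file, or whose JSON has no DOI, and report what was skipped instead of raising. The file names fulltext.xml and eupmc_result.json, the "doi" key (possibly stored as a list) and the function names are assumptions about the getpapers output layout, not taken from the notebook.

```python
import json
from pathlib import Path


def load_paper(folder, xml_name="fulltext.xml", json_name="eupmc_result.json"):
    """Return (doi, xml_path) for one paper folder, or None if it is incomplete."""
    folder = Path(folder)
    xml_path = folder / xml_name
    json_path = folder / json_name
    if not xml_path.is_file() or not json_path.is_file():
        return None
    try:
        metadata = json.loads(json_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return None
    doi = metadata.get("doi") if isinstance(metadata, dict) else None
    # Assumed: the metadata may store the DOI as a one-element list.
    if isinstance(doi, list):
        doi = doi[0] if doi else None
    if not doi:
        return None
    return doi, xml_path


def usable_papers(data_dir):
    """Split the sub-folders of data_dir into usable papers and skipped folder names."""
    papers, skipped = [], []
    for folder in sorted(Path(data_dir).iterdir()):
        if not folder.is_dir():
            continue
        result = load_paper(folder)
        if result is None:
            skipped.append(folder.name)
        else:
            papers.append(result)
    return papers, skipped
```

Returning the skipped folder names makes it easy to check afterwards whether the missing-file and missing-DOI cases are really what triggers the error.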
For the following, I used the JSON output from the subset data.

Step 3: analyse results data
Use the JSON data from step 2 to check whether the URLs resolve and whether the repositories have documentation, license files, etc., using Andrew's notebook at https://github.com/softwaresaved/code-cite/blob/master/notebooks/resolvre_and_check_resources.ipynb

[npscience notes: This requires a GitHub token, generated on GitHub. What scope? I guessed only repo/public_repo. Notebook amended for file location; GitHub token entered manually.]

Results (head):
So this works. How to output this nicely?
/ end May 21 notes

Next: Step 4: Visualise!
Note the web app is at https://github.com/softwaresaved/code-cite-app
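For reference, the Step 3 checks described above could look roughly like this. It is a sketch using only the standard library, not Andrew's notebook: the function names are hypothetical, "documentation" is reduced to "has a README-like file", and the two network functions are untested here. The GitHub contents endpoint and the "Authorization: token ..." header are standard GitHub REST API usage.

```python
import json
import urllib.request
from urllib.parse import urlparse


def parse_github_repo(url):
    """Return (owner, repo) for a github.com repository URL, else None."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    repo = parts[1][:-4] if parts[1].endswith(".git") else parts[1]
    return parts[0], repo


def summarise_files(filenames):
    """Flag README-like and license-like names among top-level file names."""
    lowered = [name.lower() for name in filenames]
    return {
        "has_readme": any(n.startswith("readme") for n in lowered),
        "has_license": any(n.startswith(("license", "licence", "copying")) for n in lowered),
    }


def url_resolves(url, timeout=10):
    """Return True if a HEAD request succeeds (network call)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts are OSError subclasses.
        return False


def list_top_level_files(owner, repo, token, timeout=10):
    """List names at the repository root via the GitHub contents API (network call)."""
    request = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/contents/",
        headers={"Authorization": f"token {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return [entry["name"] for entry in json.load(response)]
```

Keeping the parsing and summarising steps free of network access means they can be checked offline, and only `url_resolves` and `list_top_level_files` need a connection and a token.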
Run through the entire pipeline locally, as it stands, to find bugs and see whether it works.