Tasks, People and Roles
This is a volunteer project, where most people have "day activities" which take precedence. It is understood that contributors give what they can, when they can, and this project and the world are very grateful.
Timescales are unpredictable. We aim to repeat our experience with volunteer-based projects such as the Blue Obelisk (https://en.wikipedia.org/wiki/Blue_Obelisk).
Our aim is to create new Open scientific and medical knowledge relevant to the COVID-19 pandemic that others will use for:
- policy
- other science (especially data-driven)
- education and learning for everyone
With limited, but smart, resources, our main impact will be:
- using knowledge from a wide range of disciplines, not just biomedical: social science, psychology, economics, literature, etc., as well as maths, physics, chemistry, materials and biomedicine.
- welcoming collaborators from round the world
- giving readers power to define the knowledge they want (e.g. creating dictionaries)
- helping citizens demand that all scholarly research be free to everyone
We have common goals:
- to download and index the whole of the world's relevant literature.
- to build crawlers and readers that make this trivial for users
- to transform legacy documents (PDF) into structured XHTML
- to build a distributed dictionary system that covers all relevant subjects
- to search and extract snippets of text (and other knowledge) that can be indexed and aggregated.
- to enhance the reading of the literature through annotation.
- to build reusable resources for search, education, machines
- to interface with other communities (Wikimedia, R, Jupyter, etc.)
We have identified important areas which are self-contained and continuous (i.e. every contribution matters and enhances the project, but none is blocking). This is mainly because we have multiple inputs (e.g. corpora, sites), multiple dictionaries (knowledge facets), multiple outputs, and multiple distribution routes. There are also general tasks (documentation, tutorials, outreach) which are continuous. There is no blame, and we do not rely too heavily on other colleagues' contributions.
This section is highly mutable!
The workflow is basically:
- inputs: crawl and read sources, and normalize them
- transform: search them with dictionaries, possibly including transforms
- outputs: display the results
These stages are largely separable because information flows downwards and is captured in the filestore at each stage. That means that a developer "only" needs to be able to read from one standard file type and write to another.
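The file-based separation described above can be illustrated with a minimal Python sketch. It is not the project's actual code: the function names (`read_inputs`, `search_with_dictionary`, `display_results`), the directory names, the file formats (plain text and JSON) and the sample documents are all assumptions made for illustration. Each stage only reads files written by the previous stage and writes files for the next one.

```python
import json
import tempfile
from pathlib import Path


def read_inputs(raw_texts, workdir):
    """Input stage (hypothetical): normalize raw text and write one .txt file per document."""
    out = workdir / "normalized"
    out.mkdir(parents=True, exist_ok=True)
    for doc_id, text in raw_texts.items():
        # Collapse whitespace and lowercase as a stand-in for real normalization.
        normalized = " ".join(text.split()).lower()
        (out / f"{doc_id}.txt").write_text(normalized, encoding="utf-8")
    return out


def search_with_dictionary(normalized_dir, terms, workdir):
    """Transform stage (hypothetical): count substring matches of each dictionary term."""
    out = workdir / "hits"
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(normalized_dir.glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        counts = {term: text.count(term.lower()) for term in terms}
        # Keep only the terms that actually occur in this document.
        counts = {term: n for term, n in counts.items() if n > 0}
        (out / f"{path.stem}.json").write_text(json.dumps(counts), encoding="utf-8")
    return out


def display_results(hits_dir):
    """Output stage (hypothetical): aggregate per-document hits into (doc, term, count) rows."""
    rows = []
    for path in sorted(hits_dir.glob("*.json")):
        counts = json.loads(path.read_text(encoding="utf-8"))
        for term, n in sorted(counts.items()):
            rows.append((path.stem, term, n))
    return rows


def run_demo():
    """Run the three stages on two made-up documents in a temporary filestore."""
    raw = {
        "doc1": "Masks  reduce\nviral transmission.",
        "doc2": "Economic effects of lockdown.",
    }
    terms = ["mask", "lockdown", "viral"]
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        normalized = read_inputs(raw, workdir)
        hits = search_with_dictionary(normalized, terms, workdir)
        return display_results(hits)


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

Because each stage communicates only through files, any one of the three functions could be replaced (for example, by a different crawler or a different search engine) without touching the others, as long as the file types stay the same.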
Infrastructure: RP, PMR
- EuropePMC, our main workhorse (works): no one
- biorxiv and medrxiv (prototype): PMR
- theses: AJ
- journals / scrapers: LH
- DOAJ abstracts: CD
- AMI: RP, PMR
- Solr: CD, AJ
- R: TS
- display: CD
- creation, maintenance, documentation: RL, PMR
- wikimedia: TS
Everyone :)