Tasks, People and Roles

Remko Popma edited this page Apr 15, 2020 · 9 revisions

This is a volunteer project, where most people have "day activities" which take precedence. It is understood that contributors give what they can, when they can, and this project and the world are very grateful.

Timescales are unpredictable. We aim to replicate our experience with volunteer-based projects such as Blue Obelisk (https://en.wikipedia.org/wiki/Blue_Obelisk).

Vision

To create new Open scientific/medical knowledge relevant to the COVID-19 pandemic that will be used by others for:

  • policy
  • other science (especially data-driven)
  • education and learning for everyone

With limited but smart resources, our main impact will come from:

  • using knowledge from a wide range of disciplines, not just biomedical: social science, psychology, economics, literature, etc., as well as maths, physics, chemistry and materials science
  • welcoming collaborators from round the world
  • giving readers power to define the knowledge they want (e.g. creating dictionaries)
  • helping citizens demand that all scholarly research be free to everyone

How we work

We have common goals:

  • to download and index the whole of the world's relevant literature
  • to build crawlers and readers that make this trivial for users
  • to transform legacy documents (PDF) into structured XHTML
  • to build a distributed dictionary system that covers all relevant subjects
  • to search and extract snippets of text (and other knowledge) that can be indexed and aggregated
  • to enhance the reading of the literature through annotation
  • to build reusable resources for search, education and machines
  • to interface with other communities (Wikimedia, R, Jupyter, etc.)
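
As an illustration of the "search and extract snippets" goal, here is a minimal Python sketch. It is not the project's AMI code: the function names `extract_snippets` and `aggregate`, the `window` parameter and the sample sentence are all assumptions made for this example. It matches dictionary terms as whole words, keeps a little surrounding context for each hit, and counts hits per term so they can be aggregated.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def extract_snippets(text: str, terms: List[str], window: int = 20) -> List[Tuple[str, str]]:
    """Return (term, snippet) pairs for every whole-word, case-insensitive match.

    Each snippet is the matched text plus up to `window` characters of
    context on either side. All names here are hypothetical.
    """
    snippets = []
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            snippets.append((term, text[start:end]))
    return snippets


def aggregate(snippets: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count how many snippets were found for each dictionary term."""
    return dict(Counter(term for term, _ in snippets))
```

A dictionary here is just a list of terms; a real dictionary would probably also carry identifiers and synonyms, which this sketch leaves out.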

We have managed to identify important areas which are self-contained and continuous (i.e. every contribution matters and enhances the project, but none is blocking). This is mainly because we have multiple inputs (e.g. corpora, sites), multiple dictionaries (knowledge facets), multiple outputs and multiple distribution routes. There are also general tasks (documentation, tutorials, outreach) which are continuous. There is no blame, and we do not rely too heavily on other colleagues' contributions.

Current Tasks and People

This section is highly mutable!

Workflow

The workflow is basically:

  • inputs: crawl and read sources, and normalize them
  • transform: search them with dictionaries, possibly including transforms
  • outputs: display the results

These are largely separable, as information flows downwards and is captured in the filestore at each stage. That means that a developer "only" needs to be able to read from one standard file type and write to another.
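
The file-based separation described above can be sketched in Python as three functions that only communicate through directories. This is a toy under stated assumptions, not the project's actual layout: the directory names (`raw`, `normalized`, `hits`), the file types (`.txt`, `.json`) and all function names are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def normalize(raw_dir: Path, norm_dir: Path) -> None:
    """Input stage: read raw text files and write whitespace-normalized copies."""
    norm_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(raw_dir.glob("*.txt")):
        text = " ".join(src.read_text(encoding="utf-8").split())
        (norm_dir / src.name).write_text(text, encoding="utf-8")


def search(norm_dir: Path, hits_dir: Path, terms: List[str]) -> None:
    """Transform stage: count dictionary terms per file and store them as JSON."""
    hits_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(norm_dir.glob("*.txt")):
        lowered = src.read_text(encoding="utf-8").lower()
        counts = {term: lowered.count(term.lower()) for term in terms}
        (hits_dir / (src.stem + ".json")).write_text(json.dumps(counts), encoding="utf-8")


def display(hits_dir: Path) -> List[str]:
    """Output stage: turn the stored JSON hit files into summary lines."""
    lines = []
    for src in sorted(hits_dir.glob("*.json")):
        counts = json.loads(src.read_text(encoding="utf-8"))
        lines.append(f"{src.stem}: {counts}")
    return lines


def run_demo() -> List[str]:
    """Run all three stages on one made-up document in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        raw = root / "raw"
        raw.mkdir()
        (raw / "paper1.txt").write_text(
            "Masks   reduce\nviral  spread. Masks help.", encoding="utf-8"
        )
        normalize(raw, root / "normalized")
        search(root / "normalized", root / "hits", ["masks", "viral"])
        return display(root / "hits")
```

Because each stage reads one directory and writes another, any single function could be replaced (for example by an R or Solr step) without touching the others, as long as the file types on disk stay the same.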

Infrastructure: RP, PMR

Inputs

  • EuropePMC, our main workhorse (works): no one
  • biorxiv and medrxiv (prototype): PMR
  • theses: AJ
  • journals / scrapers: LH
  • DOAJ abstracts: CD

Transforms and searches

  • AMI: RP, PMR
  • Solr: CD, AJ

Outputs

  • R: TS
  • display: CD

Dictionaries

  • creation, maintenance, documentation: RL, PMR
  • wikimedia: TS

Testing

Everyone :)
