Skip to content

RMassBank development task list

meowcat edited this page Jan 10, 2023 · 8 revisions

This page is supposed to be a loose collection of development proposals for RMassBank, which will be discussed in an upcoming dev meeting.

Use a h2 tag for every proposal. Descriptions may be as long or short as needed; make subsections with h3 if required.

Scope:

  • narrow are incremental changes in the code, fixing bugs in a single function, etc.
  • medium is the addition of new functionality, rewrites affecting a small number of functions
  • broad regards changes that affect a large part of the code. This does not mean that this needs to be an all-or-nothing task (e.g. linting can be done incrementally).

Risk:

  • minimal: This will be a Pareto improvement, hurting no one and helping at least one person. Also will not make the package more complex.
  • medium: This is a change, but is unlikely to break existing functionality. Might need additional documentation and clarification
  • high: This could
    • change existing behaviour for the same input
    • change the user interface
    • change an implementation in a way that needs to be verified (and since we don't have unit tests, needs to be done by hand)
    • require changes that "make things worse" before "making things better" later

State:

  • idea: proposal before detailed consideration, something to think about: The author does not necessarily already think this should be done. Mainly for collection / triage.
  • proposal: The author thinks this should be done eventually. This should be discussed at the next dev meeting
  • rejected proposal: Was discussed at a meeting and rejected.
  • goal: Was discussed at a meeting and accepted as something that should eventually be done with no specific priorization.
  • task: Was discussed at a meeting and specifically added to the list of things that are being worked on

After the task state, items should go to issues & milestones.

There is no automatic TOC here apparently, except for the bar at the right side :(

Linting the sourcecode

  • Author: meowcat
  • Scope: broad
  • Risk: minimal
  • State: task to be performed incrementally

Over the years we have amassed a lot of codestyle inconsistencies (and some of the things were bad to start with). We should fix bad practices.

  • I propose to run lintr and fix everything it suggests, with prio 1 to the actual package code (R directory).
  • Default lintr (and tidyverse, and most of modern R) style is snake_case. However Bioconductor style is camelCase for functions (and CamelCaps for S4 classes) and I suggest we stick with this because all our user-facing functions are camelCase. A lot of internal variables are snake_case or some other inconsistent mess, we should clean those up.
  • There is https://lcolladotor.github.io/biocthis/reference/bioc_style.html which might work.

Moving functionality to dplyr and purrr

  • Author: meowcat
  • Scope: broad
  • Risk: medium if done incrementally.
  • State: task to be performed incrementally

An awful lot of RMassBank functionality is a bad implementation of what Tidyverse already does well, very crufty ways of doing the equivalent of mutate, bind_rows, map. We could cut out a lot of code and make the code much more readable if we rewrote the code to use tidyverse functionality.

  • The one drawback is that we would add a dependency, since we technically don't yet depend on dplyr and purrr. However these packages are nearly globally installed nowadays.
  • Ideally some test is added before making the change.

Unit tests

  • Author: meowcat
  • Scope: broad
  • Risk: minimal
  • State: goal

Old topic, but still not done. We should test more of our code.

Adding precursor and fragment EIC extraction / correlation

  • Author: meowcat
  • Scope: medium
  • Risk: medium
  • State: proposal, was IMO not clearly decided upon

At Eawag/Uchem and at ETH, we already use wrapper scripts that extract EICs for all fragments and measure the correlation to the precursor, this is a second criterion in addition to formula assignment to get better discrimination between noise and signal. The current way is very hacky and uses attributes to tack stuff into the existing S4 objects.

I suggest to move this as an option to msmsRead. The data should go directly into the child spectra, probably as a List column in the properties dataframe.

Match collision energy from raw data

  • Scope: narrow
  • Risk: medium
  • Author: meowcat
  • State: proposal, was IMO not clearly decided upon

Existing issue: https://github.com/MassBank/RMassBank/issues/316

Packing issues into milestones

I have good experience with using milestones to get things done and to focus on important issues. Issue hoping is not very efficient.

  • Author: tsufz
  • Scope: broad
  • Risk: low

Use templates to render spectra

  • Scope: medium
  • Risk: low
  • Author: meowcat
  • State: idea

A quick POC shows that we could use jinjar to render spectra using templates rather than by a handmade writer. This is not urgently needed and just one more alternative to using Spectra-based IO, but worth a consideration.