Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RibonanzaNet-SS, folding util extraction #748

Merged
merged 19 commits into from
Jul 8, 2024
Merged

RibonanzaNet-SS, folding util extraction #748

merged 19 commits into from
Jul 8, 2024

Conversation

luxaritas
Copy link
Member

@luxaritas luxaritas commented May 25, 2024

Summary

  • Add support for the RibonanzaNet-SS folding engine
  • Extract some generic folding utilities (pUnpaired, threshknot) into FoldUtil for easier reuse (and readability)

Implementation Notes

  • RibonanzaNet-SS inference setup was based on https://www.kaggle.com/code/shujun717/ribonanzanet-2d-structure-inference/
  • We've introduced a new asset type, onnx (a standard format for ML models). Particularly notable is that our jest setup has to actually have imports of this asset provide its path so that it can be loaded to run in tests. I've changed all generic assets to do this (instead of just being an empty module) in case something else in the future requires the assets to be actually loaded.
  • onnxruntime (responsible for doing the actual model inference) can only be run async, so we've had to ensure all codepaths which trigger folding operations can operate asynchronously (or, in the case of scripts, error when attempting to run synchronously). We use the existing PoseOp mechanism (originally built for allowing multistrand folding to yield to the main thread intermittently when not run from scripts) to do this, with some slight modifications.
  • We have introduced new sometimes-sync-sometimes-async codepaths with function overloads, where a boolean determines whether it is marked async and returns a promise or is not marked async and returns the value directly. The actual implementation is always an async function, however the JS spec specifies that if no await is ever encountered, the function will run synchronously.
  • When dot plot calculations are needed for constraints, instead of triggering the calculation in the constraint evaluate function (which has to run synchronously), we set a property on the constraint so that we can queue the calculation during the folding process (so that the data is already available when evaluate is run). We've also introduced the skipMelt parameter to updateMeltingPointAndDotPlot - this way if we're only running this update for the constraints, we don't have to run the extra fold/dot plot operations needed for the melt plot (which no constraints use)
  • Unfortunately we can't make folding multithreaded, even though onnxruntime supports it (and we'd potentially gain something like a 2x speedup), as that requires a cross origin isolated browsing context, which we can't set up because of YouTube and Vimeo embeds :(

Testing

  • New unit tests
  • Verified a couple samples were consistent with the kaggle notebook
  • Misc interactive testing, including user feedback via the forums

@luxaritas luxaritas changed the title Feat/rnnet ss RibonanzaNet-SS, specbox in puzzlemaker, lib instructions cleanup, folding util extraction May 25, 2024
@luxaritas
Copy link
Member Author

luxaritas commented May 25, 2024

TODOs

RNNet-SS Behavior

  • Updated mask diagonal handling (for 2nt hairpins)
  • Revisit higher-order pseudoknot limit (add letter-based pairs, error if cardinality is too high)
  • Investigate quantization (Deferred - planning on investigating this + distillation with Shujun later)

Metrics

  • Fix energies still showing in puzzlemaker (Consistently disable energies when unavailable #757)
  • eF1 (+ eF1,crossed-pair)
  • eF1 (+ eF1,crossed-pair) constraints (Deferred - don't need this right now, need to think through how this behaves with other engines)
  • Investigate dot plot being largely gray instead of white
  • Fix dot plots missing target? (Deferred - need to figure out how to handle the fact this has caused an inconsistency in the data exposed to EternaScript)

Engine implementation and access

  • Change path to rnnet onnx file to engines-bin
  • Lazy-load/initialize rnnet on first usage
  • Async APIs for Eternascript
  • Add tests (fold, dot plot, eF1)

Asset handling

  • Look into enabling secure context to enable threading (EDIT: Looks like we're out of luck)
  • Ensure wasm/onnx/ort files are cached
  • Investigate loading bar (Deferred - lazy-loading is sufficient for now)
  • Download confirmation? (Deferred - lazy-loading is sufficient for now)
  • Investigate splitting out sourcemaps (Improve sourcemap behavior #760)
  • Determine whether we need a "minimal" onnxruntime build (or maybe import from /wasm?) onnxruntime appears to be a relatively limited portion of our bundle size. I tried importing from /wasm and it saved like 300k, but that's not really that meaningful in context, so I'm leaving it as-is so that we won't have to change it back later if we can eventually enable GPU acceleration.

luxaritas added a commit that referenced this pull request Jun 25, 2024
… + config cleanup (#752)

## Summary
This PR does two related things (each of which is required for the
other):
1) Instead of using `require` to synchronouosly get references to assets
(triggering webpack to either
inline the asset as a data URL or provide the dist output URL), we now
use the pattern `new URL(path, import.meta.url).href`.
This brings us in line with modern standards (ie, ESM, which we use
everywhere else) and allows us to remove a linter
exclusion
2) We update our Jest configuration to use ESM instead of CommonJS -
again modernizing a bit. In order for Jest to use ESM we
have to drop `require`, and in order to use `import.meta.url` we have to
use the ESM module mode.

## Implementation Notes
While I was here I also cleaned up a bit of the jest config that was
duplicatous from its preset. Additionally importing from `loglevel` had
to be changed
to use a default import instead of a `*` import - it looks like due to
commonjs interop it was wrapped with a default export twice.

This also requires Node's `--experimental-vm-modules` option which
creates a bit of noise in the console (and could potentially make this a
bit brittle if Node breaks backward compat somewhere), but felt this was
the right way to go - we can backtrack if necessary, and if/when we
migrate to vite/vitest that should be a nonissue.

## Testing
Ran unit tests + launched puzzlemaker

## Related Issues
Started using `import.meta.url` in #748 which broke tests - this
resolves that
@luxaritas luxaritas changed the title RibonanzaNet-SS, specbox in puzzlemaker, lib instructions cleanup, folding util extraction RibonanzaNet-SS, specbox in puzzlemaker, folding util extraction Jun 28, 2024
luxaritas added a commit that referenced this pull request Jun 28, 2024
## Summary
Being able to see the dot plot and additional specs in puzzlemaker can
be useful in a variety of cases
be it to help solve an in-progress puzzle, work on a "partial" extracted
from a puzzle, or
otherwise perform analysis with a free-form modifiable structure.

This is particularly important for the upcoming integration of
RibonanzaNet, where we want to encourage players
to experiment with the behavior of the model, and both the dot plot and
upcoming eF1 metrics will be important
for this

## Testing
Opened specbox in puzzlemaker + made changes to puzzle

## Related Issues
Split off from #748
@luxaritas luxaritas changed the title RibonanzaNet-SS, specbox in puzzlemaker, folding util extraction RibonanzaNet-SS, folding util extraction Jun 28, 2024
Copy link
Member Author

@luxaritas luxaritas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to note in PR description:

  • Async functions without await run synchronously
  • Explain why we introduced skipMelt

src/eterna/folding/RNNet.ts Show resolved Hide resolved
src/eterna/folding/RNNet.ts Show resolved Hide resolved
src/eterna/folding/RNNet.ts Outdated Show resolved Hide resolved
src/eterna/folding/RNNet.ts Outdated Show resolved Hide resolved
src/eterna/folding/RNNet.ts Show resolved Hide resolved
src/eterna/folding/RNNet.ts Show resolved Hide resolved
src/eterna/folding/FoldUtil.ts Outdated Show resolved Hide resolved
src/eterna/folding/RNNet.ts Show resolved Hide resolved
src/eterna/mode/PoseEdit/PoseEditMode.ts Show resolved Hide resolved
src/eterna/mode/PoseEdit/PoseEditMode.ts Show resolved Hide resolved
Copy link
Contributor

@tkaragianes tkaragianes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed in sync conversation with Jonathan, notes are in his review.

luxaritas added a commit that referenced this pull request Jul 3, 2024
#753)

## Summary
This replaces the implementation of SecStruct.getParenthesis with a more
efficient implementation. This also improves the efficiency of filtering
for only/no pseudoknots, which is one of the most expensive parts of
layout recomputation.

## Implementation Notes
Instead of turning a pair map into stems, then for each stem iterating
over all bases between the start and end of the pair to find if a
half-pair has previously been placed there, we now walk over each base
only once, keeping track of any half-open pairs that have not been
closed off yet. This should bring our time complexity from O(n^2) (both
in the stem construction and in the string construction) to O(n). Doing
a quick and dirty benchmark, I saw up to a 10x performance improvement
in this method. It was slower in some cases, but those situation should
already be very fast.

In earlier commits of the PR, I first simplified the existing
implementation to wrap my head around it a bit better.

## Testing
Added new unit tests, verifying against the previous implementation. As
part of my benchmarking experimentation I also validated against the
Ribonanza PDB test set.

## Related Issues
Surfaced during work on #748
luxaritas added a commit that referenced this pull request Jul 3, 2024
## Summary
Currently, our pseudoknot handling only supports 3rd-order pseudoknots
as the only supported sets of characters for pairs are `()`, `[]`, `{}`,
and `<>`. We now add pairs of lower and upper case letters as well as
explicitly erroring when encountering situations we can't handle.

This was change was specifically spurred by the in-progress RibonanzaNet
integration sometimes predicting these higher-order pseudoknots
(https://forum.eternagame.org/t/preview-ribonanzanet-ss-in-eterna/4955/15).

## Implementation Notes
* The setPairs implementation is sometimes slower than the previous
implementation due to the performance penalty of property access.
However, following the pattern of the previous approach would have
caused a significant performance penalty for longer structures without
higher-order pseudoknots, and I've done a fair amount of work here to
optimize performance in multiple different situations. This does mean
that the code does not read as well as it maybe could, as other patterns
have an additive performance penalty. Performance is important here as
this is used on every fold (consider user scripts that may call fold
many times in quick succession) as well as in the current implementation
of pseudoknot filtering (an expensive part of layout recomputation).
Some examples where this comes up include:
* Usage of assignment in conditionals to avoid excess property accesses.
This could have been done with additional conditional nesting instead,
but gut feeling is that this is clearer?
* Note that if the left pair is not found but it's a legal character, we
assign to the local variable then assign to both left and right maps -
again, limiting property accesses.
* Using a second object which maps to the same pair stacks instead of
mapping to the paired character (yet another reduction in property
access).
* Unpaired is the first case while cut point the last case. Admittedly I
didn't really benchmark this - it's possible locality could make it
better/about the same if moved earlier, but my rationale here is that it
should be very rare that we even need to check for this at all.
* Dynamically adding pair maps for additional characters as needed
instead of preallocating them (which is wasteful for structures that
don't need them, which is the most common case)
* Maintaining the "naive" option for when we know we have no pseudoknots
* We did not used to error on encountering structures we couldn't
handle, but we do now. This could technically cause issues somewhere,
but I feel like it is more important to know that something has gone
wrong rather than not knowing that some pairs have silently been changed
to unpaired bases.

## Testing
New and existing unit tests

## Related Issues
Continues/stacked on #753, raised as part of #748
@luxaritas luxaritas merged commit 6908a54 into dev Jul 8, 2024
1 check passed
@luxaritas luxaritas deleted the feat/rnnet-ss branch July 8, 2024 19:07
luxaritas added a commit that referenced this pull request Jul 8, 2024
## Summary
* Puzzlemaker now properly shows energies as disabled for engines that
cant score with pseudoknots
* Energies are noted as unavailable instead of hidden for both pknot
cases as well as engines that cant score
* Switching between engines with and without energies available now
consistently enables and disables energies

## Implementation Notes
Instead of conditionally setting the scoreFolder based on
canScoreStructures and then checking the pseudoknotted case,
canScoreStructures now takes a parameter for whether or not pknots are
enabled, and skipping computations
is now handled just within Pose2D. This avoids logic for whether an
engine can score being split in multiple
places.

## Related Issues
Raised as part of #748
luxaritas added a commit that referenced this pull request Jul 10, 2024
## Summary
* There are now new async versions of EternaScript methods which trigger
folding operations. This allows usage with async-only folding engines,
as well as allowing scripts to use more efficient or user friendly
behavior that is only available when folding happens async
* When calling any EternaScript getter or setter, we now verify that
there are no asyncronous folding operations in process to avoid getting
out of sync. This could happen already even with the existing syncronous
APIs, as a user could trigger an asynchronous fold and then a script
could trigger a synchronous fold before the async fold finishes.
* Fixed inverted error messages for sync/async behavior in specbox
updates
* We now wait until the specbox is fully updated before unlocking the UI
after a fold (and run synchronously if triggered from a synchronous
script call)
* Added missing sync check to select_folder

## Implementation Notes
There was some ambiguity around whether the eternascript setters or
getters should be async and how much actually needs to be completed
before these methods returned. It seemed most straightforward and
consistent to have setters wait until all UI update operations are
finished, treating them as "please trigger this update, and let me know
once it is done". This also makes it clear for how scripts indicate that
they allow async folding to happen (opting-in to better behavior even if
sync folding is not strictly required) .

While I was here
* Reorganized the eternascript callbacks into a more natural order
* Removed set_design_title and load_parameters_from_buffer which I
verified no one is using

## Testing
With the specbox open and closed
* Made normal mutations with eftk and rnnss
* Ran sync methods with eftk and rnnss
* Ran async methods with eftk and rnnss, including serial operations
with and without awaiting

## Related Issues
Followon to #748
@tkaragianes
Copy link
Contributor

Closes #788.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants