RibonanzaNet-SS, folding util extraction #748

luxaritas · 2024-05-25T00:37:27Z

Summary

Add support for the RibonanzaNet-SS folding engine
Extract some generic folding utilities (pUnpaired, threshknot) into FoldUtil for easier reuse (and readability)

Implementation Notes

RibonanzaNet-SS inference setup was based on https://www.kaggle.com/code/shujun717/ribonanzanet-2d-structure-inference/
We've introduced a new asset type, onnx (a standard format for ML models). Particularly notable is that our jest setup has to actually have imports of this asset provide its path so that it can be loaded to run in tests. I've changed all generic assets to do this (instead of just being an empty module) in case something else in the future requires the assets to be actually loaded.
onnxruntime (responsible for doing the actual model inference) can only be run async, so we've had to ensure all codepaths which trigger folding operations can operate asynchronously (or, in the case of scripts, error when attempting to run synchronously). We use the existing PoseOp mechanism (originally built for allowing multistrand folding to yield to the main thread intermittently when not run from scripts) to do this, with some slight modifications.
We have introduced new sometimes-sync-sometimes-async codepaths with function overloads, where a boolean determines whether it is marked async and returns a promise or is not marked async and returns the value directly. The actual implementation is always an async function, however the JS spec specifies that if no await is ever encountered, the function will run synchronously.
When dot plot calculations are needed for constraints, instead of triggering the calculation in the constraint evaluate function (which has to run synchronously), we set a property on the constraint so that we can queue the calculation during the folding process (so that the data is already available when evaluate is run). We've also introduced the skipMelt parameter to updateMeltingPointAndDotPlot - this way if we're only running this update for the constraints, we don't have to run the extra fold/dot plot operations needed for the melt plot (which no constraints use)
Unfortunately we can't make folding multithreaded, even though onnxruntime supports it (and we'd potentially gain something like a 2x speedup), as that requires a cross origin isolated browsing context, which we can't set up because of YouTube and Vimeo embeds :(

Testing

New unit tests
Verified a couple samples were consistent with the kaggle notebook
Misc interactive testing, including user feedback via the forums

…lding util extraction

luxaritas · 2024-05-25T00:42:53Z

TODOs

RNNet-SS Behavior

Updated mask diagonal handling (for 2nt hairpins)
Revisit higher-order pseudoknot limit (add letter-based pairs, error if cardinality is too high)
~~Investigate quantization~~ (Deferred - planning on investigating this + distillation with Shujun later)

Metrics

Fix energies still showing in puzzlemaker (Consistently disable energies when unavailable #757)
eF1 (+ eF1,crossed-pair)
~~eF1 (+ eF1,crossed-pair) constraints~~ (Deferred - don't need this right now, need to think through how this behaves with other engines)
Investigate dot plot being largely gray instead of white
~~Fix dot plots missing target?~~ (Deferred - need to figure out how to handle the fact this has caused an inconsistency in the data exposed to EternaScript)

Engine implementation and access

Change path to rnnet onnx file to engines-bin
Lazy-load/initialize rnnet on first usage
Async APIs for Eternascript
Add tests (fold, dot plot, eF1)

Asset handling

Look into enabling secure context to enable threading (EDIT: Looks like we're out of luck)
Ensure wasm/onnx/ort files are cached
~~Investigate loading bar~~ (Deferred - lazy-loading is sufficient for now)
~~Download confirmation?~~ (Deferred - lazy-loading is sufficient for now)
Investigate splitting out sourcemaps (Improve sourcemap behavior #760)
~~Determine whether we need a "minimal" onnxruntime build (or maybe import from /wasm?)~~ onnxruntime appears to be a relatively limited portion of our bundle size. I tried importing from /wasm and it saved like 300k, but that's not really that meaningful in context, so I'm leaving it as-is so that we won't have to change it back later if we can eventually enable GPU acceleration.

…irpins

… + config cleanup (#752) ## Summary This PR does two related things (each of which is required for the other): 1) Instead of using `require` to synchronouosly get references to assets (triggering webpack to either inline the asset as a data URL or provide the dist output URL), we now use the pattern `new URL(path, import.meta.url).href`. This brings us in line with modern standards (ie, ESM, which we use everywhere else) and allows us to remove a linter exclusion 2) We update our Jest configuration to use ESM instead of CommonJS - again modernizing a bit. In order for Jest to use ESM we have to drop `require`, and in order to use `import.meta.url` we have to use the ESM module mode. ## Implementation Notes While I was here I also cleaned up a bit of the jest config that was duplicatous from its preset. Additionally importing from `loglevel` had to be changed to use a default import instead of a `*` import - it looks like due to commonjs interop it was wrapped with a default export twice. This also requires Node's `--experimental-vm-modules` option which creates a bit of noise in the console (and could potentially make this a bit brittle if Node breaks backward compat somewhere), but felt this was the right way to go - we can backtrack if necessary, and if/when we migrate to vite/vitest that should be a nonissue. ## Testing Ran unit tests + launched puzzlemaker ## Related Issues Started using `import.meta.url` in #748 which broke tests - this resolves that

…unning multiple tests in parallel

## Summary Being able to see the dot plot and additional specs in puzzlemaker can be useful in a variety of cases be it to help solve an in-progress puzzle, work on a "partial" extracted from a puzzle, or otherwise perform analysis with a free-form modifiable structure. This is particularly important for the upcoming integration of RibonanzaNet, where we want to encourage players to experiment with the behavior of the model, and both the dot plot and upcoming eF1 metrics will be important for this ## Testing Opened specbox in puzzlemaker + made changes to puzzle ## Related Issues Split off from #748

luxaritas

Make sure to note in PR description:

Async functions without await run synchronously
Explain why we introduced skipMelt

src/eterna/folding/RNNet.ts

src/eterna/folding/FoldUtil.ts

src/eterna/folding/RNNet.ts

src/eterna/mode/PoseEdit/PoseEditMode.ts

tkaragianes

Reviewed in sync conversation with Jonathan, notes are in his review.

#753) ## Summary This replaces the implementation of SecStruct.getParenthesis with a more efficient implementation. This also improves the efficiency of filtering for only/no pseudoknots, which is one of the most expensive parts of layout recomputation. ## Implementation Notes Instead of turning a pair map into stems, then for each stem iterating over all bases between the start and end of the pair to find if a half-pair has previously been placed there, we now walk over each base only once, keeping track of any half-open pairs that have not been closed off yet. This should bring our time complexity from O(n^2) (both in the stem construction and in the string construction) to O(n). Doing a quick and dirty benchmark, I saw up to a 10x performance improvement in this method. It was slower in some cases, but those situation should already be very fast. In earlier commits of the PR, I first simplified the existing implementation to wrap my head around it a bit better. ## Testing Added new unit tests, verifying against the previous implementation. As part of my benchmarking experimentation I also validated against the Ribonanza PDB test set. ## Related Issues Surfaced during work on #748

## Summary Currently, our pseudoknot handling only supports 3rd-order pseudoknots as the only supported sets of characters for pairs are `()`, `[]`, `{}`, and `<>`. We now add pairs of lower and upper case letters as well as explicitly erroring when encountering situations we can't handle. This was change was specifically spurred by the in-progress RibonanzaNet integration sometimes predicting these higher-order pseudoknots (https://forum.eternagame.org/t/preview-ribonanzanet-ss-in-eterna/4955/15). ## Implementation Notes * The setPairs implementation is sometimes slower than the previous implementation due to the performance penalty of property access. However, following the pattern of the previous approach would have caused a significant performance penalty for longer structures without higher-order pseudoknots, and I've done a fair amount of work here to optimize performance in multiple different situations. This does mean that the code does not read as well as it maybe could, as other patterns have an additive performance penalty. Performance is important here as this is used on every fold (consider user scripts that may call fold many times in quick succession) as well as in the current implementation of pseudoknot filtering (an expensive part of layout recomputation). Some examples where this comes up include: * Usage of assignment in conditionals to avoid excess property accesses. This could have been done with additional conditional nesting instead, but gut feeling is that this is clearer? * Note that if the left pair is not found but it's a legal character, we assign to the local variable then assign to both left and right maps - again, limiting property accesses. * Using a second object which maps to the same pair stacks instead of mapping to the paired character (yet another reduction in property access). * Unpaired is the first case while cut point the last case. Admittedly I didn't really benchmark this - it's possible locality could make it better/about the same if moved earlier, but my rationale here is that it should be very rare that we even need to check for this at all. * Dynamically adding pair maps for additional characters as needed instead of preallocating them (which is wasteful for structures that don't need them, which is the most common case) * Maintaining the "naive" option for when we know we have no pseudoknots * We did not used to error on encountering structures we couldn't handle, but we do now. This could technically cause issues somewhere, but I feel like it is more important to know that something has gone wrong rather than not knowing that some pairs have silently been changed to unpaired bases. ## Testing New and existing unit tests ## Related Issues Continues/stacked on #753, raised as part of #748

## Summary * Puzzlemaker now properly shows energies as disabled for engines that cant score with pseudoknots * Energies are noted as unavailable instead of hidden for both pknot cases as well as engines that cant score * Switching between engines with and without energies available now consistently enables and disables energies ## Implementation Notes Instead of conditionally setting the scoreFolder based on canScoreStructures and then checking the pseudoknotted case, canScoreStructures now takes a parameter for whether or not pknots are enabled, and skipping computations is now handled just within Pose2D. This avoids logic for whether an engine can score being split in multiple places. ## Related Issues Raised as part of #748

## Summary * There are now new async versions of EternaScript methods which trigger folding operations. This allows usage with async-only folding engines, as well as allowing scripts to use more efficient or user friendly behavior that is only available when folding happens async * When calling any EternaScript getter or setter, we now verify that there are no asyncronous folding operations in process to avoid getting out of sync. This could happen already even with the existing syncronous APIs, as a user could trigger an asynchronous fold and then a script could trigger a synchronous fold before the async fold finishes. * Fixed inverted error messages for sync/async behavior in specbox updates * We now wait until the specbox is fully updated before unlocking the UI after a fold (and run synchronously if triggered from a synchronous script call) * Added missing sync check to select_folder ## Implementation Notes There was some ambiguity around whether the eternascript setters or getters should be async and how much actually needs to be completed before these methods returned. It seemed most straightforward and consistent to have setters wait until all UI update operations are finished, treating them as "please trigger this update, and let me know once it is done". This also makes it clear for how scripts indicate that they allow async folding to happen (opting-in to better behavior even if sync folding is not strictly required) . While I was here * Reorganized the eternascript callbacks into a more natural order * Removed set_design_title and load_parameters_from_buffer which I verified no one is using ## Testing With the specbox open and closed * Made normal mutations with eftk and rnnss * Ran sync methods with eftk and rnnss * Ran async methods with eftk and rnnss, including serial operations with and without awaiting ## Related Issues Followon to #748

tkaragianes · 2024-07-24T23:45:51Z

Closes #788.

luxaritas added 2 commits May 24, 2024 20:22

RibonanzaNet-SS, specbox in puzzlemaker, lib instructions cleanup, fo…

a7b88de

…lding util extraction

Merge branch 'dev' into feat/rnnet-ss

32d1f73

luxaritas changed the title ~~Feat/rnnet ss~~ RibonanzaNet-SS, specbox in puzzlemaker, lib instructions cleanup, folding util extraction May 25, 2024

luxaritas added 4 commits June 14, 2024 15:11

Apply mask diagonal logic to prevent rnnet-ss from predicting <3nt ha…

a3e5618

…irpins

Add eF1 and eF1,cross-pair metrics

cfa3c07

Sync ef1 weights with latest notebook version

7683cc1

Lazy-load rnnet

d2ef444

luxaritas mentioned this pull request Jun 25, 2024

Move from require to new URL with import.meta, update Jest to use ESM + config cleanup #752

Merged

luxaritas added 3 commits June 25, 2024 16:11

Merge branch 'dev' into feat/rnnet-ss

d15ce03

RNNet unit tests

dfefab6

Adjust overriding jsdom URL node's in a way that doesn't break when r…

19d9713

…unning multiple tests in parallel

This was referenced Jun 27, 2024

Improve performance of generating dot-bracket strings with pseudoknots #753

Merged

Support pseudoknots of order higher than 3 #754

Merged

Merge branch 'dev' into feat/rnnet-ss

0e30e67

luxaritas changed the title ~~RibonanzaNet-SS, specbox in puzzlemaker, lib instructions cleanup, folding util extraction~~ RibonanzaNet-SS, specbox in puzzlemaker, folding util extraction Jun 28, 2024

This was referenced Jun 28, 2024

Lib setup instructions cleanup #755

Merged

Support specbox in puzzlemaker #756

Merged

Merge branch 'dev' into feat/rnnet-ss

1a2bf17

luxaritas changed the title ~~RibonanzaNet-SS, specbox in puzzlemaker, folding util extraction~~ RibonanzaNet-SS, folding util extraction Jun 28, 2024

luxaritas commented Jul 1, 2024

View reviewed changes

tkaragianes approved these changes Jul 1, 2024

View reviewed changes

Clarity tweaks

73c5796

luxaritas added 3 commits July 3, 2024 16:50

Merge branch 'dev' into feat/rnnet-ss

7ecfc07

Comment tweak

b22fd55

RIP multithreading I guess

8add167

luxaritas added 4 commits July 3, 2024 18:51

Dont draw values that are noise in dot plot

0560abc

Fix race condition in specbox

fc457b1

Allow rnnet built model to be loaded from package

bc621dd

Some minor comments in TEA while we're here

454e0a9

luxaritas merged commit 6908a54 into dev Jul 8, 2024
1 check passed

luxaritas deleted the feat/rnnet-ss branch July 8, 2024 19:07

luxaritas mentioned this pull request Jul 8, 2024

Consistently disable energies when unavailable #757

Merged

luxaritas mentioned this pull request Jul 10, 2024

Support async folding operations in EternaScript #761

Merged

tkaragianes mentioned this pull request Jul 24, 2024

Kaggle models deployed to Eterna #788

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RibonanzaNet-SS, folding util extraction #748

RibonanzaNet-SS, folding util extraction #748

luxaritas commented May 25, 2024 •

edited

Loading

luxaritas commented May 25, 2024 •

edited

Loading

luxaritas left a comment

tkaragianes left a comment

tkaragianes commented Jul 24, 2024

RibonanzaNet-SS, folding util extraction #748

RibonanzaNet-SS, folding util extraction #748

Conversation

luxaritas commented May 25, 2024 • edited Loading

Summary

Implementation Notes

Testing

luxaritas commented May 25, 2024 • edited Loading

TODOs

RNNet-SS Behavior

Metrics

Engine implementation and access

Asset handling

luxaritas left a comment

Choose a reason for hiding this comment

tkaragianes left a comment

Choose a reason for hiding this comment

tkaragianes commented Jul 24, 2024

luxaritas commented May 25, 2024 •

edited

Loading

luxaritas commented May 25, 2024 •

edited

Loading