Migration to depthcharge v0.4.8 #350

Open
wants to merge 43 commits into
base: dev

andradesalazar

This version of Casanovo is now based on depthcharge v0.4.8 instead of v0.2.3.

@bittremieux bittremieux changed the base branch from main to dev July 2, 2024 06:33
@wfondrie wfondrie self-requested a review July 27, 2024 06:24
Collaborator

@wfondrie wfondrie left a comment

Thank you for this excellent PR 🎉

As a note for others, this PR also adds a few features and changes some functionality. Some that I noticed:

  • Checkpoints now have useful file names.
  • Early stopping can be enabled.
  • The learning rate is logged.
  • Model precision can now be changed.
  • Various gradient settings (clipping and such) can be configured. These are very useful for stability.

I've requested a few, mostly small changes. The biggest thing we need to address now is updating the unit tests so that all of our CI checks pass.

@wsnoble, @bittremieux, @melihyilmaz - with as big of a change as this is, you should all take it for a spin and make sure we didn't miss anything!

casanovo/data/ms_io.py (outdated, resolved)
casanovo/config.yaml (resolved)
casanovo/denovo/dataloaders.py (outdated, resolved)
casanovo/denovo/dataloaders.py (outdated, resolved)
casanovo/denovo/dataloaders.py (outdated, resolved)
casanovo/denovo/dataloaders.py (outdated, resolved)
casanovo/denovo/model_runner.py (outdated, resolved)
casanovo/denovo/model_runner.py (outdated, resolved)
casanovo/denovo/model_runner.py (resolved)
pyproject.toml (outdated, resolved)
@andradesalazar
Author

Hi @wfondrie,

thanks for the comments :)

are you taking care of the updates to the unit tests, so that the CI checks pass or is it better if I have a look?

I think the documentation probably needs to be updated a bit, as well as the download of the latest weights for prediction, as the old ones are not compatible anymore.

Best,
Daniela

@bittremieux
Collaborator

I think the documentation probably needs to be updated a bit, as well as the download of the latest weights for prediction, as the old ones are not compatible anymore.

Yes, we'll cut a new release v5.x.x for this implementation, as these are some breaking changes. We'll have to train a new model, but with the new major version the downloading code won't get confused.

are you taking care of the updates to the unit tests, so that the CI checks pass or is it better if I have a look?

Some of the first fixes might be relatively straightforward, with some renamed modules that have to be updated in the unit tests. If you have some bandwidth to look at it, feel free to do so.

# Configure early stopping
if config.early_stopping_patience is not None:
    self.callbacks.append(
        EarlyStopping(
Contributor

Removing this for now. A future PR will reintroduce it into casanovo/dev if we decide to add early stopping functionality to the mainline Casanovo release.
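
For readers unfamiliar with the setting, a patience-based early stopping rule can be sketched in plain Python. This is only an illustrative stand-in, not the EarlyStopping callback referenced in the diff; the class name `PatienceStopper` and its `min_delta` parameter are hypothetical, and the loss values are made up for the example.

```python
class PatienceStopper:
    """Hypothetical helper: stop after `patience` checks without improvement."""

    def __init__(self, patience: int, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")  # lowest validation loss seen so far
        self.bad_checks = 0       # consecutive checks without improvement

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


stopper = PatienceStopper(patience=2)
losses = [1.0, 0.8, 0.9, 0.85, 0.7]  # made-up validation losses
# Index of the first validation check at which training would stop.
stopped_at = next(
    (i for i, loss in enumerate(losses) if stopper.should_stop(loss)), None
)
```

With a patience of 2, the two non-improving checks after the best loss of 0.8 trigger the stop, so the later improvement to 0.7 is never seen. That trade-off is the main thing to weigh when choosing a patience value.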

# Configure learning rate monitor
if config.tb_summarywriter is not None:
    self.callbacks.append(
        LearningRateMonitor(logging_interval="step", log_momentum=True)
Contributor

Same here: removing this for now, as it will be reintroduced in an open PR.

assert os.path.basename(mgf_small.name) not in out_writer._run_map
assert os.path.abspath(mgf_small.name) in out_writer._run_map
assert mgf_small.name in out_writer._run_map
assert os.path.abspath(mgf_small.name) not in out_writer._run_map
Contributor

I updated this test to reflect the current behavior of MztabWriter, but it might be worth looking into whether we want to change the behavior of MztabWriter, especially if depthcharge is updated to include the full path in the spectrum dataloaders.
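
To illustrate why the assertions flip, here is a minimal sketch of how a run map keyed by the file name as given differs from one keyed by absolute path. The dictionaries and the `normalize_run_key` helper are hypothetical stand-ins, not MztabWriter internals, and `small.mgf` is a made-up file name.

```python
import os


def normalize_run_key(path: str) -> str:
    """Hypothetical normalization: always key runs by absolute path."""
    return os.path.abspath(path)


file_name = "small.mgf"  # made-up name, no directory component

# Keyed by the string exactly as it was passed in.
run_map_as_given = {file_name: 1}
# Keyed by a normalized absolute path instead.
run_map_normalized = {normalize_run_key(file_name): 1}

abs_name = os.path.abspath(file_name)
found_as_given = abs_name in run_map_as_given      # key is the bare name
found_normalized = abs_name in run_map_normalized  # key is the absolute path
```

Whichever convention is chosen, the writer and whatever supplies the file names need to agree on it; otherwise lookups silently miss.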

@Lilferrit
Contributor

Lilferrit commented Oct 2, 2024

It looks like there might be a bug in Spec2Pep._finish_beams: if the tokenizer doesn't have any residues with negative mass, beams that have not been predicted to end are never checked for early termination due to exceeding the precursor m/z tolerance.

In particular, this loop

for aa in ([None] if finished_beams[i] else aa_neg_mass_idx):

that does the early termination check is never entered if the beam isn't finished and there is nothing in aa_neg_mass_idx.

@Lilferrit
Contributor

I've initialized the model's tokenizer with the residues from the tiny config as a workaround to get the test_beam_search_decode test to run while we look into potential fixes, assuming this actually is a bug.

@bittremieux
Collaborator

Yes, from a quick check I think you're right.

In the current version, None is always included in the list, even when no AAs have a negative mass. This is missing in the changed version.
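
The difference can be sketched in isolation. The two helper functions below are hypothetical and only model which candidates the loop iterates over; `count_checks` stands in for the precursor m/z tolerance check, and the residue index 21 is made up.

```python
from typing import Iterable, List, Optional


def candidates_changed(
    finished: bool, aa_neg_mass_idx: List[int]
) -> List[Optional[int]]:
    # Mirrors the loop header quoted in the discussion above.
    return [None] if finished else list(aa_neg_mass_idx)


def candidates_seeded(
    finished: bool, aa_neg_mass_idx: List[int]
) -> List[Optional[int]]:
    # Assumed fix: always check the beam as-is (None); for unfinished
    # beams, also check each negative-mass residue.
    return [None] if finished else [None, *aa_neg_mass_idx]


def count_checks(candidates: Iterable[Optional[int]]) -> int:
    checks = 0
    for _aa in candidates:
        checks += 1  # stand-in for the precursor m/z tolerance check
    return checks


# Unfinished beam, tokenizer without negative-mass residues.
checks_changed = count_checks(candidates_changed(False, []))
checks_seeded = count_checks(candidates_seeded(False, []))
```

With an empty `aa_neg_mass_idx`, the changed version performs no check at all for an unfinished beam, while seeding the list with None guarantees at least one.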

beam = model.n_beams # S
model.decoder.reverse = True
Contributor

It doesn't look like the current PeptideDecoder supports the reverse option.

@bittremieux
Collaborator

bittremieux commented Oct 8, 2024

The tests still seem to fail on GitHub, in contrast to the latest commit message. @Lilferrit is this expected behavior?

@Lilferrit
Contributor

As best I can tell from the GitHub Actions logs, the tests fail on GitHub because of the Pylance hf_converter keyword issue in Depthcharge. The tests do pass in my local environment, but I manually downgraded Pylance to v0.15.0. Should I update Casanovo's pyproject.toml to require Pylance v0.15.0? That should solve the issue with the tests not passing on GitHub.

@bittremieux
Collaborator

bittremieux commented Oct 8, 2024

Yes. And also make an issue and link it to the one in DepthCharge to track this.

codecov bot commented Oct 8, 2024

Codecov Report

Attention: Patch coverage is 93.42105% with 15 lines in your changes missing coverage. Please review.

Project coverage is 88.85%. Comparing base (0f06ac9) to head (943dda4).

Files with missing lines            Patch %   Lines
casanovo/denovo/model_runner.py     90.00%    7 Missing ⚠️
casanovo/denovo/dataloaders.py      92.85%    4 Missing ⚠️
casanovo/denovo/model.py            94.73%    4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #350      +/-   ##
==========================================
- Coverage   94.37%   88.85%   -5.53%     
==========================================
  Files          13       14       +1     
  Lines        1102     1202     +100     
==========================================
+ Hits         1040     1068      +28     
- Misses         62      134      +72     
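
As a quick sanity check on the table above, the coverage percentages follow directly from the hit and line counts. The function name `coverage_pct` is just for illustration.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage of covered lines."""
    return 100.0 * hits / lines


base = coverage_pct(1040, 1102)  # dev
head = coverage_pct(1068, 1202)  # this PR

# Misses are simply the lines that were not hit.
base_misses = 1102 - 1040
head_misses = 1202 - 1068
```

Rounded to two decimals, these reproduce the 94.37% and 88.85% figures, and the miss counts match the 62 and 134 reported in the diff.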


@Lilferrit
Contributor

Yes. And also make an issue and link it to the one in DepthCharge to track this.

Done. All of the tests on GitHub pass now, and they seem to run faster than they have historically as well.

@Lilferrit
Contributor

I replaced the previous merge of dev with a fresh rebase, so that the diff is more representative of the new functionality.

4 participants